ICLR 2026

MetaMuse

Algorithm Generation via Creative Ideation

Ruiying Ma1    Chieh-Jan Mike Liang2    Yanjie Gao2    Francis Y. Yan3
1UC Berkeley    2Microsoft Research    3UIUC
35.76% Cache Miss Reduction
30.93% Bin Usage Reduction
1.8× More Diverse Solutions
1.3× Higher Entropy
Challenge

The Challenge: LLM Availability Bias

  • LLMs have broad knowledge of existing algorithms — but struggle to invent genuinely new ones
  • They exhibit availability bias: solutions cluster around well-known human heuristics (LRU, FIFO, LFU…), blocking diverse exploration
  • The bias is consistent across three LLM backbones — an architectural tendency, not a model quirk
[Figure: availability bias, with panels for GPT-4o, Llama 3.3-70B, and DeepSeek-V3]
Overview

MetaMuse: Three Self-Reflection Principles

MetaMuse workflow: Evaluating Diversity → Steering with External Stimuli → Waypoint Reasoning
Performance-Based Diversity Evaluation
Measures diversity in measurable performance space, not abstract idea space
External Stimuli Steering
Guides LLM ideation via external stimuli rather than internal randomness
Waypoint Reasoning
Constructs executable algorithms through structured waypoints, not free-form CoT
Optimization

Optimization: Learning to Steer Better

MetaMuse builds on previous solutions to select the most effective stimuli for the next iteration. It uses Gaussian Process Regression trained on (stimuli → diversity + performance) pairs from prior runs.

Train
Fit GP models on the history of stimuli sets and the diversity + performance of the solutions they produced
Predict
Generate two candidate stimuli sets; GP predicts expected diversity and usefulness for each
Select
Choose the set that fits the current goal — exploration (diversity) or exploitation (performance)
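The Train, Predict, and Select steps above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes each stimuli set is encoded as a numeric feature vector, that one GP is fitted per objective, that higher scores are better for both objectives, and that an RBF kernel is used. The names `TinyGP`, `select`, and the toy history are hypothetical. The GP is written in plain Python and computes only the posterior mean, so the example runs without third-party libraries.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

class TinyGP:
    """GP regression (posterior mean only) with a constant mean offset."""
    def __init__(self, noise=1e-3):
        self.noise = noise

    def fit(self, X, y):
        # Train: solve (K + noise * I) alpha = y - mean(y).
        self.X = X
        self.mean = sum(y) / len(y)
        K = [[rbf(a, b) + (self.noise if i == j else 0.0)
              for j, b in enumerate(X)] for i, a in enumerate(X)]
        self.alpha = solve(K, [v - self.mean for v in y])
        return self

    def predict(self, x):
        # Predict: posterior mean at a new stimuli encoding x.
        return self.mean + sum(a * rbf(x, xi) for a, xi in zip(self.alpha, self.X))

def select(candidates, gp_diversity, gp_performance, goal):
    """Select: pick the candidate with the best prediction for the current goal."""
    gp = gp_diversity if goal == "explore" else gp_performance
    return max(candidates, key=gp.predict)

# Hypothetical history: encoded stimuli sets and the scores they led to.
history = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
diversity = [0.2, 0.8, 0.3, 0.9]
performance = [0.9, 0.4, 0.8, 0.3]

gp_div = TinyGP().fit(history, diversity)
gp_perf = TinyGP().fit(history, performance)

candidates = [[0.9, 0.1], [0.1, 0.9]]  # two candidate stimuli sets
explore_choice = select(candidates, gp_div, gp_perf, "explore")
exploit_choice = select(candidates, gp_div, gp_perf, "exploit")
```

With this toy history, the exploration goal favors the candidate near past high-diversity stimuli, while the exploitation goal favors the candidate near past high-performance stimuli.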
Caching

Application 1: Cache Replacement

  • Setup: 96 real-world workloads from 4 scenarios; 14 baselines (5 LLM-driven methods + 9 human heuristics)
  • MetaMuse reduces cache miss ratios by up to 35.76%
[Figure: cache miss ratio reduction, with panels for GPT-4o, Llama 3.3-70B, DeepSeek-V3, and human heuristics]
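To make the metric concrete, the sketch below computes the miss ratio of two of the human heuristics named earlier (LRU and FIFO) on a toy access trace, then the reduction of one over the other. The trace, the cache capacity, and the function names are hypothetical, and the example assumes the reported reduction is measured relative to a baseline's miss ratio.

```python
from collections import OrderedDict

def miss_ratio(trace, capacity, refresh_on_hit):
    """Simulate a cache; refresh_on_hit=True gives LRU, False gives FIFO."""
    cache = OrderedDict()
    misses = 0
    for key in trace:
        if key in cache:
            if refresh_on_hit:
                cache.move_to_end(key)  # mark as most recently used
        else:
            misses += 1
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the oldest entry
    return misses / len(trace)

def relative_reduction(baseline, candidate):
    """Fraction by which candidate lowers the baseline's miss ratio."""
    return (baseline - candidate) / baseline

trace = ["a", "b", "c", "a", "b", "d", "a", "b", "c", "d"]  # toy trace
fifo = miss_ratio(trace, capacity=3, refresh_on_hit=False)
lru = miss_ratio(trace, capacity=3, refresh_on_hit=True)
print(f"FIFO {fifo:.2f}, LRU {lru:.2f}, reduction {relative_reduction(fifo, lru):.1%}")
```

On this trace FIFO misses 8 of 10 accesses and LRU misses 6 of 10, a 25% relative reduction.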
Bin Packing

Application 2: Online Bin Packing

  • Setup: 288 real-world workloads from 3 scenarios; 11 baselines (5 LLM-driven methods + 6 human heuristics)
  • MetaMuse reduces bin usage by up to 30.93%
[Figure: bin usage reduction, with panels for GPT-4o, Llama 3.3-70B, DeepSeek-V3, and human heuristics]
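For readers unfamiliar with the problem, the sketch below shows how bin usage is counted in online bin packing, where each item must be placed as it arrives. It uses two classic heuristics, Next Fit and First Fit, purely as illustrations. The item sizes and bin capacity are made up, and the source does not list which human heuristics serve as baselines.

```python
def next_fit(items, capacity):
    """Keep one open bin; open a new one when the next item does not fit."""
    bins = 0
    load = capacity  # forces a new bin for the first item
    for item in items:
        if load + item > capacity:
            bins += 1
            load = 0
        load += item
    return bins

def first_fit(items, capacity):
    """Place each item in the first existing bin with enough room."""
    loads = []
    for item in items:
        for i, load in enumerate(loads):
            if load + item <= capacity:
                loads[i] += item
                break
        else:
            loads.append(item)  # no bin had room, so open a new one
    return len(loads)

items = [4, 8, 5, 1, 7, 6, 1, 4, 2, 2]  # toy arrival sequence
nf = next_fit(items, capacity=10)
ff = first_fit(items, capacity=10)
print(f"Next Fit: {nf} bins, First Fit: {ff} bins, reduction {(nf - ff) / nf:.1%}")
```

Here Next Fit opens 6 bins and First Fit opens 5, so First Fit reduces bin usage by about 16.7% on this sequence.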
Diversity

Less Bias, More Diversity

1.8× more diverse solutions vs. best baseline
1.3× higher solution distribution entropy
[Figure: solution distribution entropy, with panels for GPT-4o, Llama 3.3-70B, and DeepSeek-V3]
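As a rough illustration of why entropy captures availability bias, the sketch below computes Shannon entropy over solutions that have been assigned to discrete groups. How MetaMuse groups solutions is not described here, so the group labels and counts are hypothetical. The point is only that a distribution concentrated on one familiar heuristic scores lower than an even spread.

```python
import math
from collections import Counter

def distribution_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution over labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical groupings of eight generated solutions.
biased = ["LRU-like"] * 6 + ["LFU-like"] + ["FIFO-like"]
spread = ["A", "A", "B", "B", "C", "C", "D", "D"]

h_biased = distribution_entropy(biased)
h_spread = distribution_entropy(spread)
print(f"biased: {h_biased:.2f} bits, spread: {h_spread:.2f} bits")
```

The evenly spread set reaches the maximum of 2 bits for four groups, while the set dominated by LRU-like solutions reaches roughly 1.06 bits.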
Summary

Summary: What MetaMuse Achieves

Bias Mitigation
Three self-reflection principles break LLMs' availability bias and push exploration beyond familiar human heuristics
Superior Diversity
1.8× more diverse solutions with 1.3× higher entropy — broadest solution coverage among all LLM-driven methods
Real-World Performance
Up to 35.76% cache miss reduction and 30.93% bin usage reduction on production cloud workloads
Adaptive Optimization
Gaussian Process Regression balances exploration and exploitation through learned stimuli selection
github.com/illinois-nsai/MetaMuse