Diversification Paradox: Why More Numerai Models Aren't Better
Do multi-model Numerai operators outperform? The data on portfolio size, intra-operator correlation, and diminishing returns to running more models.
Single-model operators average a Sharpe ratio of about 0.36 — higher than every 2-to-12-model bucket in the tournament, which cluster between 0.28 and 0.33. Across 300,000+ intra-operator model pairs, the median pair correlation is 0.52: most operators are running variations of the same idea, not independent bets. Once shared feature pipelines lock predictions together, the expected variance reduction from diversification mostly disappears.
This post examines the data on multi-model operators to see whether running more models actually improves outcomes.
How Many Models Do People Run?
The distribution of models per operator is heavily skewed. Most participants run a single model. A smaller group runs two to five. A long tail of operators runs ten, twenty, or more models at once.

The log-scale y-axis is needed because the single-model category dominates. Multi-model operators are a minority, but they are important because they test whether adding submissions actually diversifies tournament exposure.
Does running more models translate into better results?
Returns vs Model Count
If diversification works as expected, operators with more models should see similar average returns but lower volatility. Median return should be roughly flat across model counts, and standard deviation should decline.

Median returns do not meaningfully improve with model count. Across operators running one to twenty models, median returns bounce around in a narrow band of roughly 0.001 to 0.008, with no upward trend. Running ten models does not make you ten times more likely to earn positive payouts. Return standard deviation stays in a similar 0.015 to 0.025 range across model counts — the expected diversification-driven decline in volatility is not visible in the data. Whatever benefit multi-model operators get, it is not showing up as smoother aggregate returns.
The mechanics explain why. Models built from the same feature set, trained on the same target, and validated on the same eras produce correlated predictions. A tenth correlated model does little that the first or second did not already accomplish.
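The arithmetic behind that claim can be sketched in a few lines. This is an illustration only, assuming an equal-weight blend in which every model has the same volatility `sigma` and every pair shares a single correlation `rho`; the function name and the 0.02 volatility are hypothetical choices, and 0.52 is the median pair correlation quoted earlier.

```python
import math


def blended_volatility(sigma: float, n_models: int, rho: float) -> float:
    """Volatility of an equal-weight average of n_models return streams.

    Assumes (for illustration) that each stream has volatility sigma and
    that every pair of streams has the same correlation rho.
    """
    if n_models < 1:
        raise ValueError("n_models must be at least 1")
    # Var(mean) = sigma^2 * (1 + (N - 1) * rho) / N
    variance = sigma ** 2 * (1 + (n_models - 1) * rho) / n_models
    return math.sqrt(variance)


# Compare independent models (rho = 0) with correlated ones (rho = 0.52).
for n in (1, 2, 5, 10, 20):
    independent = blended_volatility(0.02, n, 0.0)
    correlated = blended_volatility(0.02, n, 0.52)
    print(f"{n:>2} models  independent={independent:.4f}  correlated={correlated:.4f}")
```

Under these assumptions, twenty independent models would cut volatility to about 22% of a single model's, while twenty models at a correlation of 0.52 still carry about 74% of it.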
The Correlation Problem
The core question: how correlated are models within the same operator compared to models from different operators?

The intra-operator distribution (305,910 pairs) is concentrated at the high end, with the mass piling up between roughly 0.5 and 1.0 and a visible peak near perfect correlation. Models built by the same person are not loosely related: a large share of intra-operator pairs are nearly interchangeable on MMC. The same person building multiple models typically reuses overlapping data pipelines, similar feature engineering, and related modeling approaches. Even deliberate attempts at diversification (different algorithms, different feature subsets) often produce predictions that move together because they share the same underlying signal sources.
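For readers who want to run the same comparison on their own models, here is a minimal sketch of splitting pair correlations into intra-operator and inter-operator groups. The data layout (a dict of per-round scores per model, aligned on the same rounds, plus a model-to-operator map), the model names, and the toy numbers are all hypothetical; this is not necessarily how the figures in this post were produced.

```python
import math
from itertools import combinations
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length score series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pair_correlations(scores, owner):
    """Split all model-pair correlations into (intra, inter) operator lists.

    scores: model name -> list of per-round scores on the same rounds
    owner:  model name -> operator name
    """
    intra, inter = [], []
    for a, b in combinations(sorted(scores), 2):
        r = pearson(scores[a], scores[b])
        (intra if owner[a] == owner[b] else inter).append(r)
    return intra, inter


# Toy data: two models from one operator, one model from another.
scores = {
    "alice_v1": [0.010, -0.004, 0.007, 0.002, -0.006],
    "alice_v2": [0.012, -0.003, 0.006, 0.001, -0.005],
    "bob_v1": [-0.002, 0.005, 0.001, -0.004, 0.003],
}
owner = {"alice_v1": "alice", "alice_v2": "alice", "bob_v1": "bob"}

intra, inter = pair_correlations(scores, owner)
print("median intra-operator correlation:", round(median(intra), 3))
print("median inter-operator correlation:", round(median(inter), 3))
```

With real data the score series would need to be restricted to rounds both models participated in before correlating them.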
The inter-operator sample (1,832 pairs) is much smaller and sits at a far lower baseline, consistent with the idea that different participants working independently produce more diverse predictions. Numerai's meta-model is designed to exploit exactly this: aggregating independent signals is far more valuable than aggregating correlated ones. The contrast also explains why portfolio-theory math fails here. Volatility falls with 1/√N (variance with 1/N) only when the N models are independent. At a median intra-operator correlation in the 0.5+ range, an operator's effective independent count tops out near 2, roughly 1/ρ under an equal-correlation approximation, regardless of how many models they actually submit.
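A short derivation shows where that ceiling comes from. It is a sketch under a simplifying assumption: every model has the same volatility σ and every pair of models shares one correlation ρ, blended with equal weights. The real distribution is spread out rather than fixed at one value, so treat the result as an approximation.

```latex
\sigma_p^2 = \frac{\sigma^2}{N}\bigl(1 + (N-1)\rho\bigr),
\qquad
N_{\text{eff}} = \frac{\sigma^2}{\sigma_p^2} = \frac{N}{1 + (N-1)\rho}
\;\xrightarrow{\;N \to \infty\;}\; \frac{1}{\rho}
```

Here N_eff is the number of independent models that would give the same blended variance. With ρ = 0.52, ten models give N_eff ≈ 10 / 5.68 ≈ 1.76, and no number of models can push it past 1 / 0.52 ≈ 1.92.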
The implication: simply running more models is not enough. The models need to be genuinely different. If your ten models are all variations on the same theme, your effective diversification is closer to two independent bets, the same problem that makes benchmark models a useful baseline rather than a portfolio.
Risk-Adjusted Returns
The real test is whether more models translate into better risk-adjusted performance. Sharpe ratio — mean return divided by return volatility — captures this directly.
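As a minimal sketch of that definition, the ratio can be computed from a list of per-round returns. The function name and sample numbers are hypothetical, and the choices here (sample standard deviation, no annualization, no risk-free rate) are assumptions rather than a description of how the chart was built.

```python
from statistics import mean, stdev


def sharpe_ratio(returns):
    """Mean return divided by the sample standard deviation of returns.

    Assumes per-round returns with no annualization and no risk-free rate.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    volatility = stdev(returns)
    if volatility == 0:
        raise ValueError("volatility is zero; Sharpe ratio is undefined")
    return mean(returns) / volatility


# Hypothetical per-round returns for one operator.
example_returns = [0.012, -0.004, 0.009, 0.001, -0.007, 0.015]
print(round(sharpe_ratio(example_returns), 3))
```

For a multi-model operator, the input would be the operator's aggregate return per round, so that correlation between models shows up in the volatility term.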

The relationship is not monotonically increasing. Single-model operators sit near the top at around 0.36. Operators running two to twelve models cluster tightly in the 0.28 to 0.33 range, slightly below the one-model baseline rather than above it. At the high end the chart becomes noisy: a spike near fifteen models touches roughly 0.48, then Sharpe drops below 0.20 at nineteen and twenty. Much of that tail reflects small sample sizes (few operators run fifteen or more models), but the overall message holds: there is no evidence that running many models systematically improves risk-adjusted returns, and the single-model baseline beats every bucket from two to twelve models.
That is the diversification paradox. In theory, more models should help. In practice, correlation caps the benefit. Operators who achieve genuine diversity across their models can capture real gains, while building variations on the same approach produces diminishing and eventually negligible returns to scale.
Takeaways
A few practical conclusions for Numerai participants:
Extra models only help when they are genuinely differentiated. The charts do not show a reliable risk-adjusted return benefit from simply increasing model count.
Correlation is the binding constraint. If you are going to run multiple models, focus on making them as uncorrelated as possible. Different feature sets, different targets, different modeling paradigms. Small parameter tweaks on the same architecture do not count.
Single-model operators are not at a disadvantage. The data shows no systematic penalty for running one model. A single strong model can outperform a portfolio of mediocre ones, and is often what earns medals and Grandmaster status.
Think about what Numerai rewards. The tournament pays for meta-model contribution, not individual model accuracy. A model that is highly correlated with your existing submissions — and with the broader meta-model — contributes little additional value. Impactful models provide unique signal, which is more about the quality of the approach than the quantity of submissions.
Multi-model strategies pay off for operators who have genuinely different ideas to express. For everyone else, time spent managing extra models is better invested in the primary one. The edge comes from surviving long enough to learn what works.