The Diversification Paradox: Why More Numerai Models Isn't Better
Do multi-model Numerai operators outperform? The data on portfolio size, intra-operator correlation, and diminishing returns to running more models.
Running multiple models on Numerai is a common strategy. The appeal is obvious: diversify your predictions, smooth out variance, and reduce the risk of any single model blowing up your stake. In traditional portfolio theory, diversification is close to a free lunch. Numerai is not a traditional portfolio. The tournament's payout structure, shared feature sets, and meta-model dynamics make the relationship between model count and performance messier than it looks.
This post examines the data on multi-model operators to see whether running more models actually improves outcomes.
How Many Models Do People Run?
The distribution of models per operator is heavily skewed. Most participants run a single model. A smaller group runs two to five. A long tail of operators runs ten, twenty, or more models at once.

The log-scale y-axis is needed because the single-model category dominates. Multi-model operators are a minority, but they control a disproportionate share of total stake and dominate the leaderboard. These tend to be the most experienced and capital-rich participants.
Does that sophistication translate into better results?
Returns vs Model Count
If diversification works as expected, operators with more models should see similar average returns but lower volatility. Median return should be roughly flat across model counts, and standard deviation should decline.

Median returns do not meaningfully improve with model count. Across operators running one to twenty models, median returns bounce around in a narrow band of roughly 0.001 to 0.008, with no upward trend. Running ten models does not meaningfully raise the odds of a positive payout. Return standard deviation stays in a similar 0.015 to 0.025 range across model counts — the expected diversification-driven decline in volatility is not visible in the data. Whatever benefit multi-model operators get, it is not showing up as smoother aggregate returns.
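The per-model-count statistics above come down to a simple two-level aggregation: summarize each operator's per-round returns, then average those summaries within each model-count bucket. A minimal sketch with pandas, using synthetic data — the column names (`operator`, `n_models`, `round_return`) are illustrative, not Numerai API fields:

```python
import pandas as pd

# Hypothetical per-round operator returns: one row per (operator, round).
df = pd.DataFrame({
    "operator":     ["a", "a", "b", "b", "c", "c"],
    "n_models":     [1, 1, 3, 3, 10, 10],
    "round_return": [0.004, -0.010, 0.006, 0.001, 0.002, 0.003],
})

# Summarize each operator's return series, then group by model count.
per_op = df.groupby(["operator", "n_models"])["round_return"].agg(["median", "std"])
by_count = per_op.groupby("n_models").mean()
print(by_count)
```

On real data, a flat `median` column and a flat `std` column across `n_models` is exactly the pattern described above.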
The mechanics explain why. Models built from the same feature set, trained on the same target, and validated on the same eras produce correlated predictions. A tenth correlated model does little that a first or second did not already accomplish.
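The diminishing benefit of correlated models has a clean closed form. For n predictions with equal variance σ² and equal pairwise correlation ρ, the variance of their average is σ²(1/n + (n−1)ρ/n), which floors at σ²ρ no matter how large n gets. A short illustration (the ρ = 0.7 figure is a hypothetical stand-in for a typical same-operator pair, not a measured value):

```python
def avg_variance(sigma2: float, n: int, rho: float) -> float:
    """Variance of the mean of n equally correlated predictions,
    each with variance sigma2 and pairwise correlation rho."""
    return sigma2 * (1.0 / n + (n - 1) * rho / n)

# At rho = 0.7, ten models barely improve on two, and a hundred
# never get below the correlation floor of 0.7.
for n in (1, 2, 10, 100):
    print(n, round(avg_variance(1.0, n, 0.7), 3))
```

With independent models (ρ = 0) the same formula gives the familiar 1/n decay — that gap is the entire diversification argument of this post.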
The Correlation Problem
The core question: how correlated are models within the same operator compared to models from different operators?

The intra-operator distribution (305,910 pairs) is heavily skewed toward the right tail, with the mass piling up between roughly 0.5 and 1.0 and a visible peak near perfect correlation. Models built by the same person are not loosely related — a large share of intra-operator pairs are nearly interchangeable on MMC. The same person building multiple models typically reuses overlapping data pipelines, similar feature engineering, and related modeling approaches. Even deliberate attempts at diversification (different algorithms, different feature subsets) often produce predictions that move together because they share the same underlying signal sources.
The inter-operator sample (1,867 pairs) is much smaller and sits at a far lower baseline, consistent with the idea that different participants working independently produce more diverse predictions. Numerai's meta-model is designed to exploit exactly this: aggregating independent signals is far more valuable than aggregating correlated ones.
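The intra- versus inter-operator comparison is a pairwise-correlation computation over per-model MMC series, with each pair labeled by whether its two models share an owner. A sketch on synthetic data — the ownership map and series here are fabricated for illustration; in practice the series would come from the Numerai API:

```python
import itertools
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Four models owned by two operators; same-owner models share a base signal.
owners = {"m1": "op_a", "m2": "op_a", "m3": "op_b", "m4": "op_b"}
base = {"op_a": rng.normal(size=200), "op_b": rng.normal(size=200)}
mmc = pd.DataFrame({
    m: base[o] + 0.5 * rng.normal(size=200) for m, o in owners.items()
})

intra, inter = [], []
for m1, m2 in itertools.combinations(mmc.columns, 2):
    r = mmc[m1].corr(mmc[m2])
    (intra if owners[m1] == owners[m2] else inter).append(r)

print("intra-operator mean corr:", round(np.mean(intra), 2))
print("inter-operator mean corr:", round(np.mean(inter), 2))
```

Because same-owner models share a pipeline (here, a shared base signal), intra pairs come out highly correlated while inter pairs hover near zero — the same shape as the two distributions described above.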
The implication: simply running more models is not enough. The models need to be genuinely different. If your ten models are all variations on the same theme, your effective diversification is closer to two or three independent bets — the same problem that makes benchmark models a useful baseline rather than a portfolio.
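The "ten models behave like two or three bets" intuition can be made precise with the standard effective-number-of-bets approximation for equal-weight portfolios, N_eff = n / (1 + (n−1)ρ̄), where ρ̄ is the average pairwise correlation. A one-function sketch (the ρ̄ values are illustrative, chosen to echo the correlation ranges above):

```python
def effective_bets(n: int, avg_corr: float) -> float:
    """Effective number of independent bets for n equally weighted
    models with average pairwise correlation avg_corr."""
    return n / (1.0 + (n - 1) * avg_corr)

# Ten models at an average intra-operator correlation of 0.5
# behave like fewer than two independent bets...
print(round(effective_bets(10, 0.5), 2))
# ...while genuinely diverse models (avg corr 0.1) recover most of n.
print(round(effective_bets(10, 0.1), 2))
```

At ρ̄ = 0.5 the ten-model portfolio collapses to roughly 1.8 effective bets, which is the arithmetic behind the claim in the paragraph above.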
Risk-Adjusted Returns
The real test is whether more models translate into better risk-adjusted performance. Sharpe ratio — mean return divided by return volatility — captures this directly.

The relationship is not monotonically increasing. Single-model operators sit near the top at around 0.36. Operators running two to twelve models cluster tightly in the 0.28 to 0.33 range — slightly lower than the one-model baseline, not higher. At the high end the chart becomes noisy: a spike near fifteen models touches roughly 0.48, then Sharpe drops below 0.20 at nineteen and twenty. Much of that tail reflects small sample sizes (few operators run fifteen or more models), but the overall message holds: no evidence that running many models systematically improves risk-adjusted returns, and the single-model baseline is competitive with every bucket below it.
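The Sharpe-by-bucket numbers follow from the definition given above: per-round returns grouped by model count, then mean divided by standard deviation within each group (no annualization). A sketch on synthetic data — the distribution parameters and column names are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical per-round returns, 100 rounds per model-count bucket.
rounds = pd.DataFrame({
    "n_models": np.repeat([1, 3, 10], 100),
    "ret": rng.normal(0.003, 0.02, size=300),
})

# Sharpe = mean / std of per-round returns, per bucket.
sharpe = rounds.groupby("n_models")["ret"].apply(lambda r: r.mean() / r.std())
print(sharpe)
```

Since all three buckets here are drawn from the same distribution, the Sharpe estimates differ only by sampling noise — which is also a useful reminder of how noisy the fifteen-plus-model tail in the real chart is with few operators per bucket.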
That is the diversification paradox. In theory, more models should help. In practice, correlation caps the benefit. Operators who achieve genuine diversity across their models capture real gains. Building variations on the same approach produces diminishing and eventually negligible returns to scale.
Takeaways
The data suggests several practical conclusions for Numerai participants:
Two to four well-differentiated models capture most of the diversification benefit. Beyond that, additional models are unlikely to improve your risk-adjusted returns unless they are genuinely independent.
Correlation is the binding constraint. If you are going to run multiple models, focus on making them as uncorrelated as possible. Different feature sets, different targets, different modeling paradigms. Small parameter tweaks on the same architecture do not count.
Single-model operators are not at a disadvantage. The data shows no systematic penalty for running one model. A single strong model can outperform a portfolio of mediocre ones, and is often what earns medals and Grandmaster status.
Think about what Numerai rewards. The tournament pays for meta-model contribution, not individual model accuracy. A model that is highly correlated with your existing submissions — and with the broader meta-model — contributes little additional value. Impactful models provide unique signal, which is more about the quality of the approach than the quantity of submissions.
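One way to see why correlated submissions contribute little is linear neutralization: project a prediction vector onto the meta-model and keep only the residual. This is a simplified sketch of the idea, not Numerai's exact MMC computation, and all data here is synthetic:

```python
import numpy as np

def neutralize(pred: np.ndarray, meta: np.ndarray) -> np.ndarray:
    """Remove the component of pred linearly explained by meta
    (least-squares projection), leaving the 'unique' residual signal."""
    meta = meta - meta.mean()
    pred = pred - pred.mean()
    beta = pred @ meta / (meta @ meta)
    return pred - beta * meta

rng = np.random.default_rng(0)
meta = rng.normal(size=1000)
# A model that is mostly meta-model plus a sliver of unique signal.
pred = 0.9 * meta + 0.1 * rng.normal(size=1000)

residual = neutralize(pred, meta)
print("fraction of variance surviving neutralization:",
      round(residual.var() / pred.var(), 3))
```

Almost none of this model's variance survives neutralization, so it has almost nothing unique to contribute — regardless of how accurate it is on its own.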
Multi-model strategies pay off for operators who have genuinely different ideas to express. For everyone else, time spent managing additional models is better invested in improving the primary one. As with most things in the tournament, the edge comes from surviving long enough to learn what works.