Apr 10, 2026

Return Persistence: Do Numerai's Winners Keep Winning?

Measured return persistence is weak: lag-20 autocorrelation is -0.061, the 50-round MMC quintile transition matrix is close to random, top-100 half-life is about 5 days, and rolling persistence is unstable.

Half the models that crack the Numerai top 100 drop out within five days. The 50-round MMC-quintile transition matrix is nearly indistinguishable from random reshuffling, and the lag-20 autocorrelation of returns is negative (-0.061). On the metrics that matter for stake-sizing, "winners keep winning" looks more like myth than mechanic.

Numerai makes this testable: models stake real NMR, scores resolve after a fixed delay, and performance history is public. So does past performance predict future results?

Autocorrelation of Returns

The cleanest test of persistence: for each model, compute the correlation, taken across rounds, between its return in round N and its return in round N+lag. Then average across all models with enough history for the estimate to be meaningful.
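The procedure just described can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the chart: the function name, the models-by-rounds array layout, and the `min_pairs` cutoff for "enough history" are all assumptions made for the example.

```python
import numpy as np

def mean_lagged_autocorr(returns, lag, min_pairs=30):
    """Average per-model correlation between return[N] and return[N + lag].

    returns: 2-D array, one row per model and one column per round,
             with NaN where a model has no score (assumed layout).
    min_pairs: hypothetical cutoff for "enough history to be meaningful".
    """
    returns = np.asarray(returns, dtype=float)
    per_model = []
    for series in returns:
        x, y = series[:-lag], series[lag:]      # round N paired with round N + lag
        ok = ~np.isnan(x) & ~np.isnan(y)        # keep rounds where both scores exist
        if ok.sum() < min_pairs:
            continue                            # too little history for this model
        x, y = x[ok], y[ok]
        if x.std() == 0 or y.std() == 0:
            continue                            # correlation undefined for flat series
        per_model.append(np.corrcoef(x, y)[0, 1])
    return float(np.mean(per_model)) if per_model else float("nan")

# Usage on synthetic noise (not tournament data): no persistence is built in,
# so the average should land close to zero at any lag.
rng = np.random.default_rng(0)
fake_returns = rng.normal(size=(200, 400))
print(mean_lagged_autocorr(fake_returns, lag=20))
```

Running the same function at lags of 20, 50, 100, and 200 would give one bar per lag, which is the shape of the chart that follows.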

Bar chart showing autocorrelation of model returns at lags of 20, 50, 100, and 200 rounds. Lag-20 is -0.061, lag-50 is 0.001, lag-100 is 0.000, and lag-200 is -0.018.

The chart does not support a simple "winners keep winning" story. At a 20-round lag, average autocorrelation is -0.061. At 50 and 100 rounds it is effectively zero (0.001 and 0.000), and at 200 rounds it is slightly negative again (-0.018).

Past returns are therefore a weak forward signal in this measurement. A strong recent stretch can still matter for staking decisions, but it should be treated as one input beside drawdown, score stability, age, and behavior across market regimes, not as a standalone forecast.

Rank Quintile Transitions

Bucket all models into MMC quintiles in one round, then check which quintile they occupy 50 rounds later. Perfect persistence would put 100% on the diagonal. Pure randomness would put 20% everywhere.
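A transition matrix of this kind can be built as in the sketch below. It is an illustration under assumptions, not the page's actual pipeline: the function names are hypothetical, both inputs are assumed to list the same models in the same order, and ties are broken by position rather than averaged.

```python
import numpy as np

def quintile_labels(scores):
    """Label each model 0..4 by rank (0 = lowest fifth). Needs at least 5 models."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()          # 0..n-1, ties broken by position
    return (ranks * 5) // len(scores)

def transition_matrix(mmc_start, mmc_later):
    """Row i, column j: share of models in quintile i at the start
    that sit in quintile j in the later round."""
    q0 = quintile_labels(mmc_start)
    q1 = quintile_labels(mmc_later)
    counts = np.zeros((5, 5))
    np.add.at(counts, (q0, q1), 1)              # tally each (start, later) pair
    return counts / counts.sum(axis=1, keepdims=True)

# Usage on synthetic scores (not tournament data): two independent draws
# mimic the "pure randomness" case, so every cell should be near 0.20.
rng = np.random.default_rng(1)
print(transition_matrix(rng.normal(size=10_000), rng.normal(size=10_000)).round(2))
```

Passing the same scores for both rounds returns the identity matrix, which is the perfect-persistence case with 100% on the diagonal.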

Heatmap of 50-round MMC quintile transition probabilities. Cells cluster near 19% to 21%, close to the 20% random baseline.

The matrix is close to random. Most cells sit near 19-21%, with no strong diagonal that would indicate durable quintile membership. In this view, a top-quintile round does not reliably predict top-quintile status 50 rounds later.

That does not mean every model is identical. It means round-level MMC ranking is noisy enough that broad buckets churn heavily over this horizon. For stakers, the transition matrix argues against treating a recent leaderboard bucket as a durable classification.

Top-100 Half-Life

For models that reach the top 100 by rank, how long do they stay? This survival curve tracks consecutive days in the top 100 from the moment of entry.
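One simple way to compute such a curve is sketched below. The helper names and the daily boolean "in the top 100" input are assumptions for illustration, and the sketch treats a spell that is still running at the end of the data as if it ended there, which understates long tenures; a production version would likely handle that censoring explicitly.

```python
import numpy as np

def spell_lengths(in_top):
    """Lengths of each run of consecutive True days in one model's daily flags."""
    lengths, run = [], 0
    for flag in in_top:
        if flag:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)                     # spell still open at end of data
    return lengths

def survival_curve(tenures, max_day):
    """Share of spells still in the top 100 d days after entry, for d = 0..max_day."""
    tenures = np.asarray(tenures)
    days = np.arange(max_day + 1)
    surv = np.array([(tenures > d).mean() for d in days])
    return days, surv

def half_life(tenures, max_day):
    """First day on which the survival share is at or below 50%, else None."""
    days, surv = survival_curve(tenures, max_day)
    below = np.nonzero(surv <= 0.5)[0]
    return int(days[below[0]]) if below.size else None

# Usage with made-up flags for one model: two spells, lasting 2 and 3 days.
print(spell_lengths([False, True, True, False, True, True, True]))
```

Pooling the spell lengths from every model and passing them to `half_life` gives the day at which the curve crosses 50%.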

Survival curve of top-100 tenure showing the percentage of models still in the top 100 versus days since entry. The curve crosses 50% around day 5 and settles near 35% after roughly 30 days.

The half-life is approximately 5 days: half of all models that enter the top 100 have dropped out by about day 5. After about a month, the curve flattens near 35%, leaving a persistent core that can remain visible through multiple scoring updates.

The early steepness is the main point. A top-100 entry is often a short-lived state, especially for models that arrive through a transient score spike. The durable core is more interesting than the headline entry count.

Rolling Persistence Over Tournament History

Is the tournament becoming more or less predictable? For each round, we compute the Spearman rank correlation between model returns in round N and round N+50, then track this rolling correlation over time.
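The per-round calculation can be sketched with pandas as follows. The function name, the rounds-by-models DataFrame layout, the assumption that rows are consecutive rounds, and the `min_models` cutoff are all illustrative choices, not details confirmed by the source. Spearman correlation is computed here as the Pearson correlation of ranks.

```python
import numpy as np
import pandas as pd

def rolling_rank_persistence(returns, lag=50, min_models=20):
    """Spearman correlation between the cross-section of model returns
    in round N and in round N + lag, for every round N with enough overlap.

    returns: DataFrame indexed by round (consecutive rows), one column per
             model, NaN where a model has no score (assumed layout).
    """
    out = {}
    rounds = returns.index
    for i in range(len(rounds) - lag):
        now, later = returns.iloc[i], returns.iloc[i + lag]
        both = now.notna() & later.notna()      # models scored in both rounds
        if both.sum() < min_models:
            continue
        # Spearman = Pearson correlation of the ranks
        out[rounds[i]] = now[both].rank().corr(later[both].rank())
    return pd.Series(out, dtype=float)

# Usage on synthetic noise (not tournament data): the series should
# wander around zero because no persistence is built in.
rng = np.random.default_rng(2)
fake = pd.DataFrame(rng.normal(size=(120, 300)))
print(rolling_rank_persistence(fake, lag=50).describe())
```

If the plotted line is smoothed, a further step such as `.rolling(window).mean()` on the result would do it; the window length would be another assumption, since the source does not state one.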

Line chart of rolling 50-round return correlation over tournament history, fluctuating around zero and turning negative in the most recent window.

Rolling persistence is unstable rather than steadily improving. The series moves above and below zero across the tournament history, with recent readings negative. Whatever short-lived persistence appears in one period can disappear in the next.

That instability matters because participants often infer too much from a recent run of rankings. When the rolling relationship itself changes sign, recent performance is better read as a fragile estimate than as evidence of a stable edge.

Takeaways

Past performance is not reliably persistent in this view. Lagged return autocorrelation is negative at 20 rounds and near zero at 50 and 100 rounds.

The quintile transition matrix is close to random. A 50-round horizon leaves most cells near the 20% baseline, so recent MMC bucket membership should not be treated as durable.

Top-100 tenure is short. A 5-day half-life means the leaderboard can change materially inside a single scoring cycle.

Rolling persistence is unstable. It fluctuates around zero across the tournament history and is negative in the most recent window, so there is no clean evidence that the tournament is becoming easier to forecast.

For stakers: do not extrapolate recent ranks mechanically. Consistency across market regimes, controlled drawdowns, score stability, and survival through weak periods carry more information than one leaderboard snapshot.

All charts on this page are generated from live tournament data tracked by nmrdash.