Return Persistence: Do Numerai's Winners Keep Winning?

Autocorrelation of model returns is 0.15 at a 20-round lag and decays to near zero (0.02) by 200 rounds. Top-100 models have a half-life of about 62 days, and rank quintile transition matrices show meaningful reversion toward the middle.

Does past performance predict future results? If top models stay on top, the leaderboard reflects genuine skill. If performance mean-reverts, rank is mostly noise and chasing winners is a losing strategy.

Numerai makes this testable. Models stake real NMR, scores resolve after a fixed delay, and performance history is public.

Autocorrelation of Returns

The cleanest test of persistence: for each model, compute the correlation between its return in round N and its return in round N+lag, taken over every round N in its history. Then average across all models with enough history for the estimate to be meaningful.

Bar chart showing autocorrelation of model returns at lags of 20, 50, 100, and 200 rounds. Autocorrelation starts at 0.15 for lag-20, drops to 0.09 at lag-50, 0.04 at lag-100, and reaches 0.02 at lag-200.

At a 20-round lag, the average autocorrelation is 0.15 — positive and statistically significant across thousands of models, but modest. By 50 rounds the signal has decayed to 0.09, and by 200 rounds it is indistinguishable from zero at 0.02.

Good models tend to stay good short-term, but a strong quarter does not predict performance a year out. This matches model survival data — early performance is weakly predictive of long-term persistence, but far from deterministic.

Rank Quintile Transitions

Bucket all models into rank quintiles on a given date, then check which quintile they occupy 90 days later. Perfect persistence would put 100% on the diagonal. Pure randomness would put 20% everywhere.
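
The bucketing can be sketched as follows, assuming ranks arrive as `{model_id: rank}` dictionaries for the two dates (rank 1 = best) and that quintiles are assigned within each date's own field. Index 0 corresponds to the article's Q1. Only models present on both dates contribute, and the helper names are hypothetical.

```python
from collections import Counter


def quintiles(ranks: dict[str, int]) -> dict[str, int]:
    """Map each model to a quintile index 0..4 (0 = best) from its rank on one date (1 = best)."""
    ordered = sorted(ranks, key=ranks.__getitem__)
    n = len(ordered)
    return {model: (5 * position) // n for position, model in enumerate(ordered)}


def transition_matrix(
    ranks_start: dict[str, int], ranks_end: dict[str, int]
) -> list[list[float]]:
    """Row-normalized 5x5 matrix: entry [i][j] is the share of models in quintile i
    on the start date that sit in quintile j on the end date (e.g. 90 days later)."""
    q_start = quintiles(ranks_start)
    q_end = quintiles(ranks_end)
    # Count (start quintile, end quintile) pairs for models present on both dates.
    counts = Counter((q_start[m], q_end[m]) for m in q_start if m in q_end)
    matrix = []
    for i in range(5):
        row_total = sum(counts[(i, j)] for j in range(5))
        matrix.append([counts[(i, j)] / row_total if row_total else 0.0 for j in range(5)])
    return matrix
```

Each row sums to 1, so the diagonal can be read directly as a retention rate and compared against the 0.2 a random reshuffle would give.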

Heatmap of rank quintile transition probabilities over 90-day windows. The diagonal shows values between 28% and 35%, indicating moderate persistence. The top quintile retains 35% of its members, while the bottom quintile retains 31%. Off-diagonal cells range from 10% to 22%.

The diagonal runs between 28% and 35% — above the 20% baseline but nowhere near deterministic. The top quintile retains 35% of its members after 90 days. The bottom quintile shows 31% retention — poor performers also persist somewhat.

Transitions favor adjacent quintiles. A top-quintile (Q1) model is more likely to drop one step to Q2 (22%) than to collapse to Q5 (10%). Extreme rank changes are rare, but over time the distribution pulls everyone toward the middle.

Top-100 Half-Life

For models that reach the top 100 by rank, how long do they stay? This survival curve tracks consecutive days in the top 100 from the moment of entry.
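
One way to build such a curve is a Kaplan-Meier estimator, sketched below. This is an assumption, not necessarily how the chart was produced: it treats spells still running at the data cutoff as right-censored instead of dropping them, which avoids biasing tenure downward. Each spell is a hypothetical `(days_in_top_100, exited)` pair, and the half-life is read off as the first day on which estimated survival falls to 50% or below.

```python
from collections import Counter
from typing import Optional


def kaplan_meier(spells: list[tuple[int, bool]]) -> list[tuple[int, float]]:
    """Kaplan-Meier survival estimate for top-100 tenure.

    Each spell is (days_in_top_100, exited). exited=False marks a spell still
    running at the data cutoff; it counts as at risk up to its last observed day
    and is then dropped without counting as an exit (right-censoring).
    Returns (day, survival) points for each day on which at least one exit occurred.
    """
    leaving = Counter(days for days, _ in spells)  # spells leaving the risk set per day
    exits = Counter(days for days, exited in spells if exited)
    at_risk = len(spells)
    survival = 1.0
    curve = []
    for day in sorted(leaving):
        if exits[day]:
            survival *= 1 - exits[day] / at_risk
            curve.append((day, survival))
        at_risk -= leaving[day]
    return curve


def half_life(curve: list[tuple[int, float]]) -> Optional[int]:
    """First day on which estimated survival is at or below 50%, or None if never."""
    for day, survival in curve:
        if survival <= 0.5:
            return day
    return None
```

Without censored spells this reduces to the plain fraction of entrants still in the top 100 after each day.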

Survival curve of top-100 tenure showing the percentage of models still in the top 100 versus days since entry. The curve drops steeply in the first 30 days, crosses 50% around day 62, and flattens near 15% past day 180.

The half-life is approximately 62 days. Half of all models that enter the top 100 have dropped out within two months. The curve flattens after 180 days — the 15% who survive six months tend to be benchmark-beating veterans with large stakes.

The early steepness comes from models that spike into the top 100 during a favorable regime, only to fall back when conditions shift.

Rolling Persistence Over Tournament History

Is the tournament becoming more or less predictable? For each round N, we compute the Spearman rank correlation, across all models active in both rounds, between returns in round N and returns in round N+50, then track this correlation over time.

Line chart of rolling 50-round return correlation over tournament history, fluctuating between 0.05 and 0.22. The series shows a mild upward trend from early rounds near 0.08 to recent rounds averaging 0.14, with notable dips around rounds 750 and 1050.

Rolling persistence has drifted upward from about 0.08 in early rounds to roughly 0.14 recently — the tournament is slightly more predictable now, plausibly because marginal models have churned out and the remaining field is more stable.

Dips around rounds 750 and 1050 correspond to payout factor shifts and market condition changes. During regime changes, persistence collapses temporarily before re-establishing itself.

Takeaways

Past performance is weakly predictive, not deterministic. An autocorrelation of 0.15 at a 20-round lag means current returns explain only about 2% (0.15² ≈ 0.0225) of the variance in returns 20 rounds later. Enough to be real, not enough to be relied upon.
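
The "about 2%" figure is the squared correlation, which is the share of variance a one-variable linear fit explains. A quick check across all four lags reported in this article:

```python
# Average autocorrelations reported in this article, keyed by lag in rounds.
autocorr_by_lag = {20: 0.15, 50: 0.09, 100: 0.04, 200: 0.02}


def variance_explained(r: float) -> float:
    """Share of one variable's variance explained by a linear fit on the other: r squared."""
    return r * r


for lag, r in autocorr_by_lag.items():
    print(f"lag {lag:>3}: r = {r:.2f}, r^2 = {variance_explained(r):.2%}")
```

At lag 20 this gives 2.25%, and every longer lag explains less than 1%.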

The leaderboard is more stable than random but less stable than it looks. Top-quintile models have a 35% chance of still being in the top quintile after 90 days: better than the 20% expected at random, but that still leaves a 65% chance of dropping out of it.

Top-100 tenure is short. A 62-day half-life means the leaderboard you see today will look substantially different in two months. Chasing last month's winners is a weak strategy.

Persistence is slowly increasing. The maturing participant pool has made performance slightly more predictable over time, consistent with the stake-weighted age trend showing experienced models accumulating influence.

For stakers: 20+ rounds of good performance is a mildly positive signal for the next 20 rounds, but extrapolating one strong quarter is not supported by the data. Consistent performance across market regimes is a far stronger indicator of skill than any short-term rank.

All charts on this page are generated from live tournament data tracked by nmrdash.