The Volatility Tax: Why Consistent Models Earn More

Steady Numerai models earn 3-5x the cumulative payout of volatile ones, and the penalty persists after controlling for mean MMC.

Numerai's payout formula penalizes inconsistency. Two models with identical average MMC (Meta Model Contribution) can produce very different cumulative returns depending on how much their scores swing round to round. The steadier model wins, and across 9,100+ staked models, the gap is 3-5x.

This article quantifies the volatility tax using tournament data, explains the mechanics behind it, and shows the penalty holds after controlling for skill level. For background on scoring, see How Numerai Works.

The Payout Formula and Its Asymmetry

Each round, a model's payout is:

payout = stake x clip(0.5 x CORJ60 + 2.0 x MMC, -0.25, 0.25) x payout_factor

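The clip function bounds the combined score between -0.25 and +0.25. Without clipping, variance would leave the expected per-round score unchanged, because positive and negative swings around the mean cancel on average. With clipping, extreme rounds are truncated in both directions: a model scoring 0.30 earns the same as one scoring 0.25, and a model scoring -0.30 loses the same as one scoring -0.25. The caps are symmetric, but for a model with a positive mean the upper tail crosses +0.25 more often than the lower tail crosses -0.25, so clipping trims more upside than downside.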
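A minimal Python sketch of the per-round formula makes the cap behavior easy to check. The function and parameter names (`round_payout`, `corj60`, `mmc`, `cap`) are illustrative, not part of any Numerai API; the ±0.25 cap and the 0.5 / 2.0 weights are taken from the formula above.

```python
def clip(x: float, lo: float, hi: float) -> float:
    """Bound x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def round_payout(stake: float, corj60: float, mmc: float,
                 payout_factor: float, cap: float = 0.25) -> float:
    """Per-round payout in NMR (hypothetical helper mirroring the formula above)."""
    raw = 0.5 * corj60 + 2.0 * mmc  # combined score before clipping
    return stake * clip(raw, -cap, cap) * payout_factor


# Raw scores of +0.30 and -0.30 are both cut back to the cap.
print(round_payout(100, 0.0, 0.15, payout_factor=1.0))   # 25.0
print(round_payout(100, 0.0, -0.15, payout_factor=1.0))  # -25.0
```

Because `clip` is applied to the combined score, the cap limits losses as well as gains; any asymmetry comes from where a model's score distribution sits relative to the two caps.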

In practice, the caps rarely bind. Most raw scores fall well within the range.

Histogram of raw payout scores showing a tight distribution centered near zero with annotated tail percentages and the plus-or-minus 25 percent caps marked far outside the distribution

The distribution clusters near zero: 7.4% of observations score above +0.05, and 3.0% fall below -0.05. The +/-0.25 caps sit far out in the tails. Clipping creates a theoretical asymmetry, but the real cost of volatility comes from a different source: variance drag on cumulative returns.

Where the Tax Actually Bites

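Even without hitting the caps, volatile scores compound poorly. A model alternating between +0.03 and -0.02 averages +0.005 per round, the same as a model that scores +0.005 every round. But the volatile model's burn rounds shrink the stake base, so the positive rounds that follow are worth less in absolute NMR. Treating each score as a per-round return with payouts reinvested, two rounds of the volatile pattern multiply the stake by 1.03 x 0.98 = 1.0094, versus 1.005 x 1.005 = 1.010025 for the steady one. Over hundreds of rounds, that drag accumulates.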
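The drag can be reproduced with plain arithmetic. This sketch assumes every payout is reinvested into the stake and fixes `payout_factor` at 1.0 so each score acts directly as a per-round return; both are simplifying assumptions, and `compound` is a hypothetical helper.

```python
def compound(scores, stake=1.0, payout_factor=1.0):
    """Final stake after reinvesting each round's payout (simplifying assumption)."""
    for score in scores:
        stake *= 1.0 + payout_factor * score
    return stake


rounds = 100
alternating = [0.03, -0.02] * (rounds // 2)  # mean +0.005, large swings
constant = [0.005] * rounds                  # mean +0.005, no swings

print(f"alternating: {compound(alternating):.3f}")
print(f"constant:    {compound(constant):.3f}")
```

Both sequences share the same arithmetic mean, yet after 100 rounds the alternating one ends near 1.60x the starting stake and the constant one near 1.65x. The shortfall grows with the number of rounds and with the size of the swings.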

Plotting MMC volatility against cumulative payout across 9,100+ staked models (each with 50+ rounds) makes the effect visible.

Scatter plot of MMC standard deviation on the x-axis versus cumulative NMR payout on the y-axis, with points colored by mean MMC quartile, showing that high-payout models cluster at low standard deviations

The largest cumulative payouts cluster on the left, among models with low MMC standard deviation. Some high-volatility models in the top mean-MMC quartile (Q4) do accumulate positive payouts, but big returns skew heavily toward steady performers. Negative outliers span the full volatility range but tilt toward the high-volatility (right) side.

The correlation between MMC standard deviation and cumulative payout is only -0.045. That looks negligible, but a single correlation coefficient misses the nonlinear shape visible in the scatter. Volatility acts more like a ceiling than a floor: steady models are not guaranteed to earn, but volatile models rarely reach the largest payouts and face a structural headwind. You can explore individual model histories on the Models page.

Controlling for Skill

An obvious counterargument: maybe volatile models just have worse average MMC. To test whether the penalty persists after controlling for skill, we split models into mean-MMC terciles (low, mid, high) and then into volatility quintiles (Q1 = steadiest, Q5 = most volatile) within each tercile.
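The two-level bucketing can be sketched with the standard library. The tuple layout `(mean_mmc, mmc_std, cumulative_nmr)` and the helper names are hypothetical, and ties are broken by input order, which may differ from whatever quantile method produced the chart.

```python
from statistics import median


def quantile_buckets(values, n):
    """Equal-count buckets 0..n-1 by rank (0 = smallest values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, i in enumerate(order):
        buckets[i] = rank * n // len(values)
    return buckets


def median_payout_grid(models, n_skill=3, n_vol=5):
    """Median cumulative payout per (skill tercile, volatility quintile) cell.

    models: list of (mean_mmc, mmc_std, cumulative_nmr) tuples.
    Tier 0 = low skill; quintile 0 = Q1 (steadiest), n_vol - 1 = Q5.
    """
    skill = quantile_buckets([m[0] for m in models], n_skill)
    cells = {}
    for tier in range(n_skill):
        members = [m for m, t in zip(models, skill) if t == tier]
        # Volatility quintiles are assigned within each skill tier, not globally.
        vol = quantile_buckets([m[1] for m in members], n_vol)
        for m, q in zip(members, vol):
            cells.setdefault((tier, q), []).append(m[2])
    return {cell: median(payouts) for cell, payouts in cells.items()}
```

Assigning quintiles inside each tier is what makes the comparison a within-skill one: a Q1 model in the high tier is compared only against other high-skill models.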

Grouped bar chart showing median cumulative NMR payout by volatility quintile, grouped by skill tercile, with Q1 steady models earning substantially more than Q5 volatile models within each skill tier

The pattern holds at every skill level. Within the high-skill tercile, steady models (Q1) earn a median cumulative payout roughly 5x that of the most volatile (Q5). Mid-skill shows a similar gap, with Q1 earning several times Q5. Low-skill models earn little regardless of volatility -- consistency cannot compensate for a weak signal.

Mean payouts point the same way, though the gap is narrower: high-skill Q1 averages 143.6 NMR per model versus 91.5 NMR for high-skill Q5, about 1.6x. The penalty grows with stake size, because larger stakers compound more aggressively in both directions.

A Simulation of the Mechanism

To isolate the mechanics, consider three simulated models over 500 rounds. "Steady" has mean MMC 0.010, standard deviation 0.005. "Moderate" has mean 0.012, standard deviation 0.015. "Volatile" has the highest raw skill (mean 0.015) but standard deviation 0.025. All use the same payout formula with clipping applied.

Line chart of simulated cumulative returns over 500 rounds for three models, showing the steady model with the smoothest path, the moderate model roughly tracking it, and the volatile model falling behind despite higher mean MMC

Steady finishes ahead despite the lowest mean MMC. Volatile, with 50% higher average skill, falls behind because its best rounds get clipped while its worst rounds compound against the stake base. Moderate lands in between. The gap widens over time -- this is a compounding effect, not a transient one.
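The compounding part of the mechanism can be isolated with a small Monte Carlo sketch. Unlike the three-model setup above, it gives both hypothetical models the same mean score so that only the standard deviation differs. Scores are drawn from a normal distribution, clipped at ±0.25, and reinvested each round with `payout_factor` fixed at 1.0; the distributional choice, the parameters, and the helper names are assumptions for illustration, not the simulation behind the chart.

```python
import random
from statistics import median


def final_stake(mean, sd, rounds, rng, payout_factor=1.0, cap=0.25):
    """One simulated path: clipped normal scores, payouts reinvested each round."""
    stake = 1.0
    for _ in range(rounds):
        score = max(-cap, min(cap, rng.gauss(mean, sd)))
        stake *= 1.0 + payout_factor * score
    return stake


def median_final_stake(mean, sd, rounds=500, trials=500, seed=0):
    """Median ending stake across many independent paths (seeded for repeatability)."""
    rng = random.Random(seed)
    return median(final_stake(mean, sd, rounds, rng) for _ in range(trials))


steady = median_final_stake(mean=0.01, sd=0.005)
volatile = median_final_stake(mean=0.01, sd=0.05)
print(f"steady median: {steady:.1f}x, volatile median: {volatile:.1f}x")
```

The median is used rather than the mean because it tracks the typical path, which is where variance drag shows up. Rerunning with larger `rounds` values should show the gap between the two medians widening, since the drag is paid again every round.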

Why This Matters for Stakers

The volatility tax has direct implications for anyone staking NMR.

  • Model selection: A model averaging 0.01 MMC with standard deviation below 0.005 will likely outperform one averaging 0.015 with standard deviation 0.02 over a year of rounds. Prioritize consistency over peak scores.
  • Stake sizing: The penalty compounds with stake size. Larger stakers should weight consistency more heavily, because burn rounds remove more absolute NMR from the compounding base.
  • Ensemble strategies: Running multiple models with moderate skill but low correlation reduces portfolio-level volatility. The diversification paradox explores this tradeoff.
  • The payout factor amplifies the effect: When the factor is low (~0.10), absolute payout per round shrinks, slowing recovery from burns. A volatile model needs more positive rounds to offset each bad one.
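For the ensemble point, a standard variance identity shows how much averaging helps. The sketch assumes equal stakes, equal score standard deviation across models, and a single pairwise correlation `rho`; real model portfolios will not be this uniform, and `ensemble_sd` is a hypothetical helper.

```python
import math


def ensemble_sd(sd, n, rho):
    """Std of the equal-weight average of n scores with common std `sd`
    and common pairwise correlation `rho`:
        Var = sd**2 * (1 + (n - 1) * rho) / n
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if n > 1 and rho < -1.0 / (n - 1):
        raise ValueError("rho is too negative for a valid correlation matrix")
    return sd * math.sqrt((1 + (n - 1) * rho) / n)


for rho in (0.0, 0.5, 1.0):
    print(rho, round(ensemble_sd(0.02, 4, rho), 4))
```

With four uncorrelated models the standard deviation halves (0.02 to 0.01); at rho = 0.5 it only falls to about 0.016, and at rho = 1.0 averaging does nothing. Low correlation, not model count alone, drives the reduction.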

Numerai's formula rewards models that are reliably decent over models that are occasionally brilliant. Variance is not free. Building for consistency (lower drawdowns, stable signal) is not just risk management. It is an expected-value optimization.