Wearable VO₂ Max Estimates
The number your watch calls "VO₂ max" or "Cardio Fitness" is a good trend tracker and a poor measurement. It is least accurate for exactly the people most likely to be watching it — the very fit and the very unfit — yet the direction it drifts over months is honest, and that is the part worth using.
A wearable cannot measure oxygen uptake. It infers it from the relationship between your heart rate and your work rate, then projects that line out to a predicted maximum. That inference is good enough to follow your own progress and bad enough that the absolute figure should never anchor a clinical decision or a training-zone calculation. This article covers how the estimate is built, how far it misses, what knocks it off, and how to get the cleanest reading your hardware allows.
How the watch guesses
Direct VO₂ max measurement requires a mask, a metabolic cart, and a maximal effort: cardiopulmonary exercise testing (CPET), the gold standard.[1] A watch has none of that, so it approximates the Fick relationship indirectly from the sensors it does have: optical heart rate, GPS speed, a barometer for grade, and an accelerometer. During a qualifying effort it fits a submaximal heart-rate-to-pace slope, extrapolates it to your age-predicted maximum heart rate, and converts the result to VO₂ using the standard American College of Sports Medicine metabolic equations.[2]
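The extrapolation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's algorithm: it uses the ACSM running equation (0.2 × speed in m/min, plus 0.9 × speed × fractional grade, plus 3.5 ml/kg/min) and the common "220 minus age" rule for maximum heart rate. The function names, the sample values, and the choice of a plain least-squares line are all hypothetical. The sketch converts pace to oxygen cost first and then fits the line, which is equivalent to fitting pace and converting afterwards because both steps are linear.

```python
# Minimal sketch: fit submaximal heart rate against estimated oxygen cost,
# then extrapolate the line to an age-predicted maximum heart rate.
# Function names and sample values are illustrative assumptions.

def acsm_running_vo2(speed_m_per_min, grade=0.0):
    """ACSM running equation: oxygen cost in ml/kg/min (grade as a fraction)."""
    return 0.2 * speed_m_per_min + 0.9 * speed_m_per_min * grade + 3.5

def age_predicted_hr_max(age):
    """Common 220 - age rule; real devices may use a different formula."""
    return 220 - age

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_vo2_max(samples, age):
    """samples: list of (heart_rate_bpm, speed_m_per_min, grade) tuples."""
    hrs = [hr for hr, _, _ in samples]
    vo2s = [acsm_running_vo2(speed, grade) for _, speed, grade in samples]
    slope, intercept = fit_line(hrs, vo2s)
    return slope * age_predicted_hr_max(age) + intercept

# Made-up steady-state samples from a flat run by a 40-year-old.
samples = [(130, 150.0, 0.0), (145, 180.0, 0.0), (160, 210.0, 0.0)]
estimate = estimate_vo2_max(samples, age=40)
print(round(estimate, 1))
```

For these made-up samples the estimate comes out at about 53.5 ml/kg/min. Every input to that figure (the heart-rate samples, the speed, the age-based ceiling) is a place where error can enter.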
The two dominant systems have different entry requirements:
- Apple ("Cardio Fitness") updates from an outdoor walk, run, or hike on flat ground (grade under 5%), lasting at least three minutes, with heart rate raised to at least 30% of your heart-rate reserve. It uses your entered age, sex, height, and weight, and lets you flag heart-rate-lowering medication such as beta-blockers so the estimate isn't dragged down by a blunted heart rate.[3]
- Garmin (Firstbeat) needs an outdoor run of at least ten minutes above 70% of maximum heart rate before it will revise the figure.[4]
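The two entry requirements above can be written as simple gate checks. The thresholds come from the list; everything else (function names, inputs, one heart-rate sample per minute) is a hypothetical simplification, and real devices also check GPS and heart-rate signal quality.

```python
# Hypothetical gate checks based on the entry requirements listed above.
# Thresholds come from the text; names and input shapes are assumptions.

def heart_rate_reserve_fraction(hr, hr_rest, hr_max):
    """Fraction of heart-rate reserve in use: (HR - rest) / (max - rest)."""
    return (hr - hr_rest) / (hr_max - hr_rest)

def apple_style_qualifies(minutes, grade, hr, hr_rest, hr_max):
    """Flat outdoor effort, at least 3 minutes, at least 30% of reserve."""
    return (
        minutes >= 3
        and abs(grade) < 0.05
        and heart_rate_reserve_fraction(hr, hr_rest, hr_max) >= 0.30
    )

def garmin_style_qualifies(per_minute_hr, hr_max):
    """At least 10 one-minute samples above 70% of maximum heart rate."""
    minutes_above = sum(1 for hr in per_minute_hr if hr > 0.70 * hr_max)
    return minutes_above >= 10

print(apple_style_qualifies(minutes=5, grade=0.01, hr=100, hr_rest=60, hr_max=180))
print(garmin_style_qualifies([135] * 12, hr_max=180))
```

Both calls print `True`. Note that each check depends on a maximum heart rate the watch usually has to estimate, so the gate itself inherits some of the uncertainty discussed later.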
Some newer models add passive machine-learning estimates from resting heart rate and everyday movement, which correlate well with lab values at the population level but inherit the same individual-level limits below.[5]
How accurate is the number?
Across validation studies the pattern is consistent: average agreement across a group is reasonable; agreement for any one person is not. Devices routinely "look right on average" while failing strict statistical equivalence tests — the spread of individual errors is too wide to call the watch interchangeable with a lab.[6]
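The difference between group-level and individual-level agreement is commonly quantified with a Bland-Altman analysis: the mean difference (bias) plus 95% limits of agreement at roughly ±1.96 standard deviations. The sketch below uses that technique on synthetic numbers invented purely for illustration; they are not taken from any of the cited studies.

```python
from statistics import mean, stdev

def bland_altman(device, lab):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [d - l for d, l in zip(device, lab)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample standard deviation
    return bias, bias - half_width, bias + half_width

# Synthetic ml/kg/min values, invented for illustration only.
lab = [30.0, 38.0, 45.0, 52.0, 60.0]
watch = [34.0, 40.0, 45.0, 49.0, 54.0]

bias, lower, upper = bland_altman(watch, lab)
print(f"bias {bias:.1f}, limits of agreement {lower:.1f} to {upper:.1f}")
```

In this toy data the average bias is under 1 ml/kg/min, which looks excellent, yet the limits of agreement span roughly 8 ml/kg/min in either direction. That is the pattern of "right on average, wrong for the individual" in numerical form.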
| Device (method) | Validation finding |
|---|---|
| Apple Watch (submaximal walk/run) | Good average agreement; systematically underestimates in highly fit adults |
| Apple Watch Series 7 (vs cycle test) | Overestimates the low-fit, heavily underestimates the high-fit |
| Garmin Fenix 6 (Firstbeat) | Strong average agreement, but fails equivalence — individual error too large |
| Garmin Forerunner 920XT | Significant underestimation, though within a 10% error limit |
| Garmin Venu 4 | Large positive bias; leans heavily on anthropometric baseline formulas |
| Polar V800 | Sex bias — overestimates in women, underestimates in men |
One methodological trap inflates some of these numbers: a watch derives its estimate from running or walking, but several studies validate it against a cycle ergometer. The INTERLIVE expert consortium's guidance is that the test modality should match the device's locomotion. Cycling recruits different muscle groups than running, so comparing a run-derived estimate with a cycling test widens the apparent error.[7]
It's most wrong at the extremes
The single most important finding is a textbook regression to the mean: wearables overestimate VO₂ max in unfit people and underestimate it in fit people, and the error grows toward both ends of the spectrum.[8] When the Apple Watch Series 7 was tested against a metabolic cart and participants were split by fitness, the poor-fitness group came out flattered and the excellent-fitness group came out short — underestimated by as much as 10–20%.[9]
The cause is structural. The algorithms are anchored to demographic priors, most importantly the age-predicted maximum heart rate, so they pull outliers back toward the population average. A highly fit runner has a strikingly low heart rate at a given pace, but the model can't see stroke volume or oxygen extraction directly, so it attributes part of that efficiency to "typical" demographics and clips the estimate. In a sedentary user the same priors work in the opposite direction and inflate the number. Garmin's Firstbeat engine shows the same compression: error stays small in moderately trained users and widens in the highly trained.[10]
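One concrete way an age-based prior can clip an estimate is through the extrapolation ceiling. The sketch below assumes a hypothetical fitted line (VO₂ as a linear function of heart rate) and the "220 minus age" rule, then compares the result with what the same line gives at a higher true maximum. It illustrates only this single mechanism; the numbers are invented and vendors' actual models are not public.

```python
def extrapolate(slope, intercept, hr_max):
    """Project a fitted heart-rate-to-VO2 line out to a maximum heart rate."""
    return slope * hr_max + intercept

# Hypothetical fitted line: ml/kg/min per bpm, and its intercept.
slope, intercept = 0.4, -18.5

age = 40
prior_hr_max = 220 - age  # population rule of thumb: 180 bpm
true_hr_max = 192         # assumed true maximum for this individual

watch_estimate = extrapolate(slope, intercept, prior_hr_max)
actual = extrapolate(slope, intercept, true_hr_max)
shortfall_pct = 100 * (watch_estimate - actual) / actual

print(round(watch_estimate, 1), round(actual, 1), round(shortfall_pct, 1))
```

A 12 bpm gap between the assumed and true maximum heart rate costs this hypothetical runner about 8% of the estimate, even though every submaximal data point was measured perfectly.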
What throws the number off
Because the model assumes a clean, fixed heart-rate-to-work relationship, anything that raises heart rate without raising oxygen demand, degrades the heart-rate signal, or corrupts an input reads as a fitness change that isn't real:
- Heat. In hot or humid conditions the body shunts blood to the skin to cool down, dropping stroke volume; heart rate climbs to compensate (cardiovascular drift). The watch reads the higher heart rate as lost fitness and underestimates.[11]
- Illness, poor sleep, stress, caffeine. All raise submaximal heart rate through sympathetic activation, again pushing the estimate down on the day.
- Cold. Peripheral vasoconstriction weakens the optical pulse signal at the wrist, degrading the heart-rate data the whole model depends on.
- Skin tone. The green light used by most wrist sensors is absorbed more by melanin, lowering signal quality in darker skin and making the reading more vulnerable to motion error.[12]
- Motion artifacts. A loose band or hard arm-swing can let the sensor lock onto stride cadence instead of pulse.
- Stale body weight. VO₂ max is reported per kilogram, so an out-of-date weight in the companion app produces a wrong relative number even when nothing physiological has changed — and a weight drop alone will make the watch report a fitness rise.
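The stale-weight effect in the last item is plain arithmetic. The sketch below assumes the simplest possible case, where a fixed absolute oxygen uptake is divided by whatever weight the app holds; how any given vendor actually uses the weight field is not documented here, and the values are hypothetical.

```python
def relative_vo2(absolute_l_per_min, weight_kg):
    """Convert absolute uptake (L/min) to relative uptake (ml/kg/min)."""
    return absolute_l_per_min * 1000.0 / weight_kg

absolute = 3.5       # hypothetical absolute uptake, L/min
true_weight = 75.0   # what the user weighs now, kg
stale_weight = 80.0  # what the companion app still holds, kg

true_value = relative_vo2(absolute, true_weight)
reported = relative_vo2(absolute, stale_weight)

print(round(true_value, 1), round(reported, 1))
```

With nothing physiological changed, a 5 kg discrepancy moves the relative figure by about 3 ml/kg/min in this example.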
The trend is the honest signal
Here is the redeeming feature: if your wearing habits and routes are consistent, the bias is consistent too. A watch that reads 10% low this month reads about 10% low next month, so the month-to-month change tracks real adaptation or decline even when the absolute value is off.[13] For longevity purposes — where the direction and rate of change matter more than a one-off measurement — that makes the watch a genuinely useful, low-cost early-warning system. A sustained downward drift is worth investigating; a slow climb is worth trusting.
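A short sketch makes the point concrete. It assumes a watch that reads a constant 10% low, as in the example above, and compares the least-squares slope of the watch series with that of the true series. The monthly values are invented for illustration.

```python
def slope_per_month(values):
    """Least-squares slope of evenly spaced monthly readings."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    return sxy / sxx

# Hypothetical true values (ml/kg/min) over six months of training.
true_series = [50.0, 50.5, 51.0, 51.5, 52.0, 52.5]
# A watch with a constant proportional bias of 10% low.
watch_series = [0.9 * v for v in true_series]

true_slope = slope_per_month(true_series)
watch_slope = slope_per_month(watch_series)
print(round(true_slope, 2), round(watch_slope, 2))
```

The watch's slope is scaled by the same 10% as its readings, so the direction and the relative rate of change survive intact. The argument breaks down only if the bias itself shifts, for example after a change of device, wrist, or typical route.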
The one thing not to do is let a wrong absolute number set your training intensities. Wearables derive zone boundaries as fixed percentages of the estimated VO₂ max or maximum heart rate, so an error cascades: an underestimate sets your Zone 2 ceiling too low and you under-stimulate, while an overestimate in a beginner pushes "easy" runs into anaerobic territory and invites overtraining. Calibrate zones from how you actually feel and breathe, not from the watch's headline figure.
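The cascade is easy to see with numbers. The sketch below assumes a zone ceiling set at a fixed fraction of maximum heart rate; the 70% figure and both heart-rate values are hypothetical, since zone definitions differ between vendors and coaching systems.

```python
def zone_ceiling(hr_max, fraction=0.70):
    """Upper heart-rate bound of a zone defined as a fixed fraction of HR max."""
    return fraction * hr_max

estimated_hr_max = 180  # what the device assumes, bpm (hypothetical)
true_hr_max = 190       # what a maximal test would show, bpm (hypothetical)

device_ceiling = zone_ceiling(estimated_hr_max)
true_ceiling = zone_ceiling(true_hr_max)
gap_bpm = true_ceiling - device_ceiling

print(round(device_ceiling), round(true_ceiling), round(gap_bpm))
```

A 10 bpm error in the anchor becomes a 7 bpm error in the ceiling here, and every other zone boundary shifts by its own fixed fraction of the same mistake.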
Getting a cleaner reading
You can't make a wrist estimate lab-accurate, but you can cut most of the avoidable error:
- Feed it clean heart rate. Pair a chest strap (such as a Polar H10) for runs; it bypasses the wrist's optical weaknesses and is the same upgrade that helps heart-rate variability readings.
- Wear it right. Snug band, sensor sitting just above the wrist bone, so it doesn't slide during arm-swing.
- Give it a good data point weekly. One steady, flat outdoor run of 20–40 minutes produces the clean pace-to-heart-rate segment the algorithm wants.
- Compare like with like. Read trends across similar seasons rather than against a single hot or cold week.
- Keep your weight current in the app.
- If you're highly trained, mentally add ~10–20% to the absolute figure to correct for the built-in compression.
- Anchor it once. A single lab CPET gives you a true baseline to calibrate the trend against — worth doing once if you train enough to care about the difference. See VO₂ max for what that test involves.
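The anchoring step in the last item can be expressed as a single scale factor. This sketch assumes the watch's bias is proportional and stays stable over time, which is the same assumption that makes the trend trustworthy; the lab and watch values are hypothetical.

```python
def calibration_factor(lab_value, watch_value):
    """Ratio of a lab-measured VO2 max to the watch estimate on the same day."""
    return lab_value / watch_value

def calibrated(watch_value, factor):
    """Scale a later watch reading by the factor from the lab baseline."""
    return watch_value * factor

# Hypothetical baseline: lab CPET says 58.0, the watch says 50.0 that week.
factor = calibration_factor(lab_value=58.0, watch_value=50.0)

# Months later the watch reads 52.0.
later = calibrated(52.0, factor)
print(round(factor, 2), round(later, 1))
```

A factor of 1.16 here sits inside the 10–20% correction suggested for highly trained users, but it comes from a measurement rather than a rule of thumb. If the device, firmware, or wearing habits change, the factor should be treated as stale.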
Bottom line
Treat the wearable VO₂ max as a fitness trend line, not a fitness measurement. Watch its direction over months, ignore the decimal places, distrust the absolute value most when you're very fit or very unfit, and verify with a lab test if a real number matters to you.
Further reading
- Investigating the accuracy of Apple Watch VO₂ max measurements — validation study. PLOS ONE.[14]
- Assessing the accuracy of smartwatch-based estimation of VO₂ max using the Apple Watch Series 7 — validation study. JMIR Biomedical Engineering.[15]
- Validity of estimating VO₂ max by consumer wearables — systematic review with meta-analysis and expert statement of the INTERLIVE network.[16]
- Accuracy of wearables for determining maximal oxygen uptake and lactate threshold — qualitative systematic review.[17]
- Validation of aerobic capacity (VO₂ max) and pulse oximetry in wearable technology (Garmin Fenix 6).[18]
- Validity of wrist-worn activity trackers for estimating VO₂ max (Garmin Forerunner 920XT, Polar V800).[19]
- Longitudinal cardiorespiratory fitness prediction through wearables — the Fenland Study.[20]