Beyond the Numbers: Why We Stopped the 20-Minute Test and What’s Really Stopping Your Progress?

By: Elad Faltin, March 2026

This week, several trainees, some even reaching out from afar, asked me the same question: “Why don’t we do the famous 20-minute test we used to do at PowerWatts anymore?” The question brought up many thoughts, not just about that specific test, but about how we, as coaches, athletes, and an industry, evaluate fitness, measure progress, and, all too often, lie to ourselves.

To understand the answer, we need to dive into three interrelated topics: the evolution of fitness testing, the “VO2 Max Illusion” in laboratories, and the psychological trap that prevents amateur athletes from truly improving.

Part A: Farewell to the 20-Minute Test and the Evolution of CP

Our old test was a brutal and unique assessment: 1 km on the flat, followed by 1 km at a 1% incline, with each subsequent kilometer steepening by one more percent (2%, 3%, 4%, and so on). It was a test that rewarded both riders with high absolute wattage and those with a high watts-per-kilogram ratio.

Its advantage was clear: it provided a precise and merciless measure of current fitness compared with previous tests. But its weakness stemmed from exactly that advantage: repeated year after year, it became boring, and burnout set in. Some of our clients are in their fourth, fifth, or even tenth year of riding with us. As they reached higher fitness levels, and as they matured (and aged), the probability of improving their score in this specific test decreased, which created clear frustration. (By the way, the overall first place for men was Guillaume Boivin, IPT (2016), followed by Michael Woods, IPT. For the women, I believe it was Clara Hughes, a world champion kayaker who tried to transition to cycling, if I’m not mistaken.)

Instead, we transitioned to “Endurance Challenges.” We use real Time Trial courses from the Pro Tour scene, courses that take between 20 and 30 minutes to complete. This variety is not only more interesting but also gives a reliable estimate of the rider’s CP20 (Critical Power for 20 minutes).

But the deeper truth is that our understanding at PowerWatts has evolved. Over time and with increased training levels, we realized there is no such thing as an absolute “CP20” or “FTP.” There are different riders with different physiological profiles. Today, instead of relying on one intimidating test, we update the CP dynamically and continuously from the training sessions themselves, according to each rider’s changing ability in real-time.
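
For readers who like to see the arithmetic, one common way to keep CP current from training data is the two-parameter critical power model, in which the work done in a maximal effort of duration t is roughly W' + CP × t. The sketch below fits that straight line to a rider's best recent efforts. It is an illustration only and not the PowerWatts algorithm: the function name fit_critical_power and the sample efforts are hypothetical.

```python
def fit_critical_power(efforts):
    """Least-squares fit of work = W' + CP * t.

    efforts: list of (duration_s, mean_power_w) for maximal efforts.
    Returns (cp_watts, w_prime_joules).
    """
    if len(efforts) < 2:
        raise ValueError("need at least two efforts")
    times = [t for t, _ in efforts]
    works = [t * p for t, p in efforts]  # joules = seconds * watts
    n = len(efforts)
    mean_t = sum(times) / n
    mean_w = sum(works) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("efforts must have different durations")
    sxy = sum((t - mean_t) * (w - mean_w) for t, w in zip(times, works))
    cp = sxy / sxx                # slope of the work-time line
    w_prime = mean_w - cp * mean_t  # intercept of the work-time line
    return cp, w_prime


# Hypothetical best efforts of 3, 5, 12 and 20 minutes (seconds, watts)
efforts = [(180, 380.0), (300, 340.0), (720, 305.0), (1200, 295.0)]
cp, w_prime = fit_critical_power(efforts)
print(f"CP = {cp:.0f} W, W' = {w_prime / 1000:.1f} kJ")  # CP = 280 W, W' = 18.0 kJ
```

Re-running the fit whenever a new best effort appears in training is one simple way to let the number follow the rider instead of waiting for a test day.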

Part B: The Raw Potential Trap – Why VO2 Max Testing is Mostly a Waste of Time

The understanding that “one number doesn’t tell the whole story” leads us to the most common mistake of amateur riders: the obsession with VO2 Max (Maximal Oxygen Consumption).

As a coach, I often encounter an athlete returning from a sports lab, puffing out their chest and boasting: “I have a VO2 Max of 56!” Then, in the same breath, they wonder why they got dropped from the peloton on the last climb of the Saturday ride. If you know such a rider, here are four arguments to help ground them (hint: cycling races aren’t won by the highest VO2-maxers).

The Car Analogy (“Engine Displacement” vs. “Power at the Wheels”): Your VO2 Max is simply your engine displacement (in liters). It’s nice to have a 5-liter engine, but if your gearbox is broken, your mechanical efficiency is poor, and your tires are flat – the small, efficient car with a 1.6-liter engine will overtake you at the turn. The bottom line: The 5-minute test (actual watts) is the power reaching the wheels. VO2 Max is just the theoretical potential of the engine to consume fuel.
The Actionability Argument (What do we do with it?): You cannot build a training program based on “oxygen.” If I tell you to ride tomorrow at 85% of your VO2 Max, you won’t have a clue what to do, because the computer on your handlebars doesn’t show oxygen. But if I tell you to ride at 280 watts (which is your pVO2max, the power at VO2 max you produced in the field), we have a common language and a working tool.
The Specificity Test (The lab is not the road): In the lab, you rode a stationary bike in a controlled environment with a mask and perfect temperature. But outside there is wind, technique, movement efficiency, accumulated fatigue, poor nutrition, and your actual bike. A performance test (like 5 minutes at PowerWatts) tests how you produce power under the specific conditions of your sport, incorporating everything: oxygen, efficiency, anaerobic power, and the mental capacity to suffer.
The Awakening Dissonance (The gap is the opportunity): If the lab determined you have the VO2 Max of an elite rider (e.g., 70), but over 5 minutes you produce only 281 watts, that is actually excellent news. It means you have a massive engine that simply isn’t connected to the wheels. Instead of chasing another “oxygen number,” the goal is to teach the body to use the existing potential.
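
To put rough numbers on that gap, oxygen uptake can be converted into an approximate mechanical power. The sketch below assumes an energy equivalent of about 20.9 kJ per litre of oxygen and a gross cycling efficiency of 22%, both typical textbook values rather than measurements, and a hypothetical 75 kg rider. None of these three figures comes from this article, and the result is only a ballpark.

```python
KJ_PER_LITRE_O2 = 20.9   # assumed energy equivalent of oxygen
GROSS_EFFICIENCY = 0.22  # assumed share of metabolic power reaching the pedals


def power_at_vo2(vo2_ml_kg_min, body_mass_kg, fraction=1.0,
                 efficiency=GROSS_EFFICIENCY):
    """Rough mechanical power (watts) at a given fraction of VO2 Max."""
    litres_per_min = vo2_ml_kg_min * body_mass_kg / 1000.0 * fraction
    metabolic_watts = litres_per_min * KJ_PER_LITRE_O2 * 1000.0 / 60.0
    return metabolic_watts * efficiency


lab_potential = power_at_vo2(70, 75)  # VO2 Max of 70, hypothetical 75 kg rider
field_5min = 281.0                    # the 5-minute power from the example above
print(f"Lab potential: {lab_potential:.0f} W")              # Lab potential: 402 W
print(f"Unused engine: {lab_potential - field_5min:.0f} W")  # Unused engine: 121 W
```

Under these assumptions the “engine” could support roughly 400 watts, so a 281-watt field result points to efficiency, technique, or pacing rather than oxygen. The exact figure moves a lot with the assumed efficiency, which is exactly why the field number is the one to train with.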

To summarize the lab issue: From the perspective of a field coach who sees dozens of workouts, a single VO2 Max test is a very narrow and limited “screenshot.” It is enough to arrive slightly dehydrated, stressed, or tired from a hard workout for the result to be skewed. A VO2 Max test becomes an informative tool only if it is repeated consistently (3-4 times a year) so that a trend can be identified. For most trainees, hundreds of hours of power data (watts) from the field and from training provide a picture that is many times more reliable, dynamic, and accurate than a one-time 12-minute effort on a treadmill.

Part C: Gadgets, Ego, and the Truth People Don’t Want to Hear

Relying on lab numbers like VO2 Max connects to a broader and deeper problem among amateur athletes. Instead of doing the hard work (facing their simple, known limitations and holding an honest, plain dialogue with the coach), athletes look for help in interesting and expensive gadgets, or in complicated tests whose connection to real progress is highly questionable.

Two main problems lie behind this. The first is the ability of coaches to identify, simply and clearly, the athlete’s true limitation. The second and greater problem is the coachability of the trainees themselves. Often, even when a good coach tells a trainee correct and clear things about their weaknesses, the trainee in the worst case refuses to listen, or in a slightly better (or perhaps worse) case misunderstands the coach’s words, which is the more likely of the two.

Why does this happen? Because the truth is hard. Working on the things we are less good at – whether it’s muscular endurance, technique, cadence, or excess weight – is very difficult and not fun.

It is much easier to look for the solution in a new app, an advanced sensor, or a lab test that gives us a flattering number. We behave exactly like the man in the famous story who searches for his lost coin under the streetlight simply because “there is light there,” rather than where the coin actually fell (in the darkness of our weaknesses, where progress truly lies).

And here we return to the role of assessment centers and labs today. If these centers do not reinvent themselves, they will gradually disappear. Instead of building new tests that truly redefine fitness capabilities and locate the limitations of the 2024 field athlete, they keep performing the same old, archaic tests and struggle to extract a different result or a new insight from them. It simply doesn’t work.

A “scratch my back and I’ll scratch yours” situation has been created in the industry, and it’s unclear who is more to blame or who wants it more: “the cow that wants to nurse” (the assessment centers and the industry selling illusions and static numbers) or “the calf that wants to suckle” (the amateur athlete who would rather pay for tests than confront their weaknesses in a gut-wrenching workout).

Summary

“So far there has been only criticism. What about solutions?” (asked Piglet, and not the Gemini editor). “Good question… sorry, excellent question,” answered Pooh. And seriously:

The future of testing will come from the rider’s own real and reliable databases, built from a wide range of diverse tests. These will give a very broad picture of the rider’s abilities and point out the differences between their performance level in training and in competitions. The data will be based primarily on comparing race data to training data, measured from the exact same source. Is this being done today? Certainly, and for several years already.

Coaches who know how to access this information immediately see the gaps between lab tests and field (race) tests. The idea is actually to narrow those gaps, striving for a rider to reach a performance level in competition that is almost identical to their optimal performance in the lab or training.
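
As a minimal sketch of what “seeing the gap” can look like in practice, the code below compares the best average power a rider held for a given duration in training with the best they held in a race. It assumes one power sample per second from the same power meter in both files; the function names and the tiny sample data are hypothetical and are not taken from any PowerWatts tool.

```python
def best_mean_power(samples, window_s):
    """Highest average power over any window_s consecutive 1 Hz samples."""
    if window_s <= 0 or window_s > len(samples):
        return None  # ride too short for this duration
    window_sum = sum(samples[:window_s])
    best = window_sum
    for i in range(window_s, len(samples)):
        # Slide the window one second forward
        window_sum += samples[i] - samples[i - window_s]
        best = max(best, window_sum)
    return best / window_s


def race_training_gap(training, race, durations):
    """Race best minus training best (watts) for each duration in seconds."""
    gaps = {}
    for d in durations:
        t = best_mean_power(training, d)
        r = best_mean_power(race, d)
        if t is not None and r is not None:
            gaps[d] = r - t
    return gaps


# Tiny made-up rides, just to show the mechanics
training = [200] * 10 + [300] * 5 + [200] * 10
race = [200] * 10 + [280] * 5 + [220] * 10
gaps = race_training_gap(training, race, [5, 10])
print(gaps)  # {5: -20.0, 10: 0.0}
```

A negative value means the rider produced less in the race than in training at that duration, which is the kind of gap a coach would then try to explain and narrow.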

How is this done? Simply: identify the limitation and work on it. At POWERWATTS training, we do this in every single workout. What do I mean? We measure the efficiency of the riders far beyond pure power, heart rate, or even cadence. We ask the trainees to raise their efficiency, so that they produce the same result with less energy expenditure and therefore with less fatigue.

But for this to succeed, several conditions must be met. The future is not about replacing one test with another, but about building a “sports-medical file” for the rider from a database of diverse efforts and searching it for the true performance limitation, especially the gap between training and competition. What else is important?

Correct and accurate data.
Correct analysis of the data – and not “forcing” the data or misinterpreting it just to fit what we want to see.
Openness to research – this work opens a wide door for research and application in training theory. A clear example is basing training programs on W' (W prime) data (search for more articles about this on the site).
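
To make the W' idea concrete, here is a minimal sketch of a simple differential W' balance model of the kind used in several training-analysis tools: above CP the “battery” drains by the work done over CP, and below CP it refills faster the emptier it is. This is a swapped-in textbook-style model for illustration, not necessarily the calculation used at POWERWATTS, and the CP, W', and power values are hypothetical.

```python
def w_prime_balance(power, cp, w_prime, dt=1.0):
    """Return the W' balance (joules) after each power sample.

    power: power samples in watts, one every dt seconds.
    cp: critical power in watts; w_prime: full W' in joules.
    """
    balance = w_prime
    trace = []
    for p in power:
        if p > cp:
            # Above CP: spend the work done over CP
            balance -= (p - cp) * dt
        else:
            # Below CP: recover in proportion to how empty the battery is
            balance += (cp - p) * (w_prime - balance) / w_prime * dt
        balance = min(balance, w_prime)  # never exceeds a full battery
        trace.append(balance)
    return trace


# Hypothetical rider: CP 280 W, W' 18 kJ; 60 s at 380 W, then 60 s at 180 W
trace = w_prime_balance([380] * 60 + [180] * 60, cp=280, w_prime=18000)
print(f"After the effort: {trace[59] / 1000:.1f} kJ left")  # After the effort: 12.0 kJ left
print(f"After recovery: {trace[-1] / 1000:.1f} kJ")
```

A balance that drops below zero in real data usually means the CP or W' estimate is too low, which is itself useful feedback for updating the rider's profile.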

Should the tests be changed? Yes, they need to be more specific to the relevant discipline (also mentioned in the articles below) or, in the case of cycling, more relevant to the rider’s role (climber, sprinter, GC, etc.). For amateurs, we need to build a fitness level norm and check how far we are from it in every aspect of fitness (limitation).

Examples from POWERWATTS and the changes we made: the ISIS test (2012), or “The Israeli Test,” covers a wide spectrum of fitness within 60 minutes (limitation: a rider’s high mental capacity is what keeps them from failing it). STAGE RACE challenges at POWERWATTS centers closely resemble a real stage, but they add power calculations and are not ALL OUT in every segment, as the ISIS test is.


3 Academic Articles from the Last 5 Years Supporting the Ideas

Leo et al., 2022 — “Power profiling and the power–duration relationship in cycling: a narrative review” A review highlighting the value of power profiling and the power–duration relationship, and the advantage of using field power data/profiles rather than relying on a single attempt as the “be-all and end-all” test.
Vinetti et al., 2023 — “Functional Threshold Power field test exceeds laboratory performance in junior road cyclists” (JSCR) A study demonstrating gaps between lab metrics and field performance/FTP tests, clarifying why “lab-to-field translation” can be problematic (especially among certain populations).
Maunder et al., 2022 — “Using V̇O₂max as a marker of training status in athletes: how bad is the bug?” (J Appl Physiol) An article challenging the use of VO₂max as a sole marker for training status/improvement, reinforcing the case for caution about making it the “main star” of training management.

For More Articles on the POWERWATTS Website

The Difference Between a Heart Rate Monitor and a Power Meter (Watts) This is the core article explaining why FTP and Watts are the true language of modern cycling. It breaks down VO2 Max vs. actual power and explains why the heart rate monitor is a “metric of the past” regarding intensity training and power improvement.
3 Different Ways to Measure Fatigue (W’PRIME) This article directly answers “what stops progress” through the concept of W’Prime. It uses CP and Watts concepts to show that progress in cycling training is not just one number, but the correct management of the anaerobic “battery.”
At PowerWatts We Don’t “Throw” Intensity at Riders – We Teach Intensity A relatively new article (January 2026) dealing with sports psychology and coachability. It connects to Part C of this article: the fact that athletes seek gadgets instead of facing the “black hole” of their weaknesses in deep endurance training.