Book Analysis

Half the AI you're sold can't work. Here's how to tell which half.

Narayanan and Kapoor wrote a field guide for telling real AI from the kind that does not and cannot work as advertised. We rebuilt it as a detector you run against the next vendor that walks in.

64%
relative accuracy of a criminal-risk score that helps decide who walks free. A coin flip scores 50%.
3
technologies the single word "AI" is hiding
0+
real-world AI failures logged in one public registry
01 / THE INSTRUMENT

One reading. Five tests. The needle moves as the claims hold or fail.

Each test below is pulled straight from the book. Take them in order and watch where a typical predictive-AI pitch ends up, and what happens when a claim is actually backed by evidence.

  • Test 01 / Which AI is it actually?
  • Test 02 / Does it claim to predict people's futures?
  • Test 03 / Can an outsider reproduce the result?
  • Test 04 / What happens when it is wrong, at scale?
  • Test 05 / Is the task narrow, with real evidence?

Test 01 / Name the thing

"AI" is three products with three track records.

Generative AI writes and draws. Predictive AI scores people. Content-moderation AI polices platforms. The book's core move is refusing to treat them as one thing, because almost nothing true of one is true of the others.

When a vendor says "our AI predicts," the first job is to find out which AI. Most danger lives in the predictive bucket.
Test 02 / The extraordinary claim

Scoring a person's future is close to a coin flip.

In 2016, ProPublica audited COMPAS, a criminal-risk tool, using records for 10,000 defendants in Broward County, Florida. The tool asks 137 questions per defendant and reaches 64 percent relative accuracy. Random guessing scores 50.

137
data points per defendant
64%
relative accuracy (50% = random)
45% / 23%
of Black vs. White defendants who did not reoffend but were flagged high-risk
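The two COMPAS numbers measure different things. Here is a minimal Python sketch of both, assuming "relative accuracy" means the pairwise ranking probability (the ROC AUC): pick one person who reoffended and one who did not, and ask how often the tool scores the reoffender higher. The six-person dataset, the group labels, and the high-risk cutoff of 6 are hypothetical, not COMPAS data.

```python
import math
from itertools import product


def relative_accuracy(scores, reoffended):
    """Chance that a random reoffender outscores a random non-reoffender.

    This is the pairwise reading of ROC AUC: 1.0 is a perfect ranking,
    0.5 is what a coin flip gets. Ties count as half a win.
    """
    pos = [s for s, r in zip(scores, reoffended) if r]
    neg = [s for s, r in zip(scores, reoffended) if not r]
    if not pos or not neg:
        raise ValueError("need at least one reoffender and one non-reoffender")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def false_positive_rate(flagged, reoffended):
    """Share of people who did NOT reoffend but were still flagged high-risk."""
    non_reoffenders = [f for f, r in zip(flagged, reoffended) if not r]
    if not non_reoffenders:
        return math.nan  # undefined: nobody in this group stayed clean
    return sum(non_reoffenders) / len(non_reoffenders)


def fpr_by_group(groups, flagged, reoffended):
    """False positive rate computed separately for each group label."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = false_positive_rate([flagged[i] for i in idx],
                                       [reoffended[i] for i in idx])
    return rates


# Hypothetical toy data: six people, risk scores 1-10, and whether they reoffended.
scores = [8, 3, 6, 2, 7, 5]
reoffended = [True, False, True, False, False, True]
groups = ["A", "A", "B", "B", "A", "B"]
flagged = [s >= 6 for s in scores]  # hypothetical "high-risk" cutoff

print(relative_accuracy(scores, reoffended))  # 7 of 9 pairs ranked correctly
print(fpr_by_group(groups, flagged, reoffended))  # {'A': 0.5, 'B': 0.0}
```

A single ranking score pools everyone, so it can look respectable while the false alarms land unevenly across groups. That is why the 64% and the 45/23 split are worth reading separately.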
Test 03 / Show your work

A 97 percent claim became random guessing once it was checked.

A 2023 paper said machine learning could predict hit songs with 97 percent accuracy. Scientific American and Axios ran with it. The authors of this book re-ran it, removed the data leakage, and the model dropped to no better than chance.

In a wider audit of 400 AI papers, not one could be fully reproduced by an outsider. A single bad line in ten thousand once flipped a paper's headline result. "Trust the demo" is not evidence.
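Leakage is easy to demonstrate and easy to miss. This is not a reconstruction of the hit-song study; it is a minimal Python sketch of one common kind of leakage, assuming naive oversampling (copying rows) before the train/test split. The labels are pure noise, so an honest score should sit near 50%, yet a model that can only memorize (1-nearest-neighbour) looks brilliant once copies of the same row land on both sides of the split. All function names and parameters are hypothetical.

```python
import random


def make_noise_dataset(n_rows, n_features, seed):
    """Rows of random numbers with random 0/1 labels: there is NO signal to learn."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    X = [tuple(rng.random() for _ in range(n_features)) for _ in range(n_rows)]
    y = [rng.randint(0, 1) for _ in range(n_rows)]
    return X, y


def oversample(X, y, copies):
    """Naive oversampling: repeat every row `copies` times."""
    return ([x for x in X for _ in range(copies)],
            [label for label in y for _ in range(copies)])


def split(X, y, test_fraction, seed):
    """Shuffle row indices, then cut them into train and test sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def nearest_neighbor_accuracy(X_train, y_train, X_test, y_test):
    """1-nearest-neighbour classifier: predicts the label of the closest training row."""
    correct = 0
    for x, label in zip(X_test, y_test):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in X_train]
        correct += y_train[dists.index(min(dists))] == label
    return correct / len(X_test)


def overlap(X_train, X_test):
    """Fraction of test rows that also appear, exactly, in the training set."""
    seen = set(X_train)
    return sum(x in seen for x in X_test) / len(X_test)


X, y = make_noise_dataset(n_rows=200, n_features=5, seed=0)

# Leaky pipeline: oversample FIRST, then split, so copies of a row land on both sides.
Xl, yl = oversample(X, y, copies=5)
Xtr, ytr, Xte, yte = split(Xl, yl, test_fraction=0.5, seed=1)
leaky_acc = nearest_neighbor_accuracy(Xtr, ytr, Xte, yte)
leaky_overlap = overlap(Xtr, Xte)

# Clean pipeline: split FIRST, then oversample only the training rows.
Xtr, ytr, Xte, yte = split(X, y, test_fraction=0.5, seed=1)
Xtr, ytr = oversample(Xtr, ytr, copies=5)
clean_acc = nearest_neighbor_accuracy(Xtr, ytr, Xte, yte)
clean_overlap = overlap(Xtr, Xte)

print(f"leaky: accuracy={leaky_acc:.2f}, test rows seen in training={leaky_overlap:.0%}")
print(f"clean: accuracy={clean_acc:.2f}, test rows seen in training={clean_overlap:.0%}")
```

On data with no signal at all, the leaky pipeline scores far above the clean one, and the overlap figure shows why. Counting how many test rows also appear in training is a cheap first question to ask of any headline accuracy.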
Test 04 / The blast radius

When predictive AI is wrong at scale, citizens pay for it.

This is not a lab problem. The same logic, deployed by governments, has wrecked real lives faster than any demo suggests.

NETHERLANDS: 30,000 parents falsely accused of fraud. The entire cabinet resigned.
AUSTRALIA: AUD 721M wrongly clawed back from citizens in the Robodebt scandal.
AVIATION: 75% of pilots trusted a faulty automated alert and shut down the wrong engine.
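The "at scale" arithmetic fits in a few lines. A minimal Python sketch, with hypothetical rates that are not figures from the Dutch or Robodebt cases: a fraud screen that catches 90% of real fraud and wrongly flags 5% of honest claimants, run over a million claims where 1% are fraudulent.

```python
def blast_radius(population, fraud_rate, catch_rate, false_alarm_rate):
    """Expected wrongful flags when a screening model runs over a whole population.

    catch_rate: share of real fraud the model flags (true positive rate).
    false_alarm_rate: share of honest cases it flags anyway (false positive rate).
    Returns (honest people flagged, share of all flags that are wrong).
    """
    fraudulent = population * fraud_rate
    honest = population - fraudulent
    rightly_flagged = fraudulent * catch_rate
    wrongly_flagged = honest * false_alarm_rate
    share_of_flags_wrong = wrongly_flagged / (rightly_flagged + wrongly_flagged)
    return wrongly_flagged, share_of_flags_wrong


# Hypothetical inputs, not figures from any real deployment.
wrong, share = blast_radius(population=1_000_000, fraud_rate=0.01,
                            catch_rate=0.90, false_alarm_rate=0.05)
print(f"{wrong:,.0f} honest people flagged; {share:.0%} of all flags are wrong")
```

Under those assumptions about 49,500 honest people get flagged, and roughly 85% of all flags point at someone innocent. When the thing being hunted is rare, even a small false-alarm rate swamps the real hits.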
Test 05 / The honest pass

Narrow task, real evidence, and the needle swings back.

The detector is not rigged to shout snake oil. Weather forecasts gain roughly one day of accurate lead time per decade, so today's five-day forecast is about as good as a four-day forecast was ten years earlier. Science named AI protein-structure prediction its Breakthrough of the Year for 2021. When the task is bounded and the evidence is independent, the reading clears.

+1 day
forecast accuracy gained per decade
2021
protein-structure AI named breakthrough of the year
Reading: Depends on the Claim

The detector does not say "never buy." It says "make it prove the claim."

Generative tools clear the bench all the time. So do narrow, well-evidenced models. The skill is telling them apart from the half that only works in the demo. Get that right, and you stop funding the half that evaporates.

Tends to clear the bench

  • Bounded tasks with low ambiguity
  • Independent, reproducible evidence
  • Generative tools judged on output
  • Aggregate forecasts, not individual fates

Tends to fail it

  • Predicting individual human futures
  • Accuracy that only exists in the demo
  • Decisions with no recourse for the subject
  • "Every" and "100%" doing the selling
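The two lists above read naturally as a checklist. A minimal Python sketch of the detector, assuming one yes/no answer per test; the `Claim` fields, the reading labels (including "snake-oil risk"), and the thresholds are hypothetical illustrations, not a scoring method from the book.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Test 01: the three kinds of "AI" this page separates.
CATEGORIES = {"generative", "predictive", "content_moderation"}


@dataclass
class Claim:
    category: str                      # Test 01: which AI is it actually?
    predicts_individual_futures: bool  # Test 02
    independently_reproduced: bool     # Test 03
    errors_can_be_contested: bool      # Test 04: recourse when it is wrong
    narrow_task: bool                  # Test 05
    absolute_language: bool            # "every" or "100%" in the pitch


def read_claim(claim: Claim) -> Tuple[str, List[str]]:
    """Return a reading plus the red flags that produced it.

    The thresholds below (0, 1-2, 3+ flags) are hypothetical.
    """
    if claim.category not in CATEGORIES:
        raise ValueError(f"name the thing first: unknown category {claim.category!r}")
    flags = []
    if claim.predicts_individual_futures:
        flags.append("predicts individual human futures")
    if not claim.independently_reproduced:
        flags.append("accuracy only exists in the demo")
    if not claim.errors_can_be_contested:
        flags.append("no recourse for the people it decides about")
    if not claim.narrow_task:
        flags.append("task is broad or ambiguous")
    if claim.absolute_language:
        flags.append('"every" and "100%" doing the selling')
    if not flags:
        return "clears the bench", flags
    if len(flags) <= 2:
        return "make it prove the claim", flags
    return "snake-oil risk", flags


# Hypothetical pitch: scores individuals, no outside replication, no appeal.
pitch = Claim("predictive", predicts_individual_futures=True,
              independently_reproduced=False, errors_can_be_contested=False,
              narrow_task=False, absolute_language=False)
print(read_claim(pitch))
```

Returning the red flags alongside the reading keeps the output arguable: a vendor can answer each flag with evidence instead of disputing a single number.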
SOURCE: Arvind Narayanan and Sayash Kapoor, AI Snake Oil: What Artificial Intelligence Can Do, What It Can't, and How to Tell the Difference (Princeton University Press, 2024). COMPAS figures via ProPublica, Chapter 3. Hit-song leakage, 400-paper audit, and reproducibility data, Chapter 7. Dutch welfare, Robodebt, and pilot automation cases, Chapter 2. Weather and protein-structure examples, Chapters 3 and 7. Incident count from the AI, Algorithmic, and Automation Incidents and Controversies Repository, 2024.

Want to talk about this?

If something here resonated with a problem you're working on, let's spend 15 minutes on it.

Schedule a Discovery Call