Half the AI you're sold can't work. Here's how to tell which half.
Narayanan and Kapoor wrote a field guide for telling real AI from the kind that does not and cannot work as advertised. We rebuilt it as a detector you run against the next vendor that walks in.
One reading. Five tests. The needle moves as the claims hold or fail.
The console on the left starts neutral. Each test below is pulled straight from the book. Scroll through them and watch where a typical predictive-AI pitch ends up, and watch what happens when a claim is actually backed by evidence.
"AI" is three products with three track records.
Generative AI writes and draws. Predictive AI scores people. Content-moderation AI polices platforms. The book's core move is refusing to treat them as one thing, because almost nothing true of one is true of the others.
Scoring a person's future is close to a coin flip.
In 2016, ProPublica audited COMPAS, the criminal-risk tool used in Broward County, across 10,000 people. The tool asks 137 questions and lands at 64 percent relative accuracy. Random guessing scores 50 percent.
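To make the 64-versus-50 comparison concrete, here is a minimal sketch of one common way to score a risk tool: take every pair made of one person who had the outcome and one who did not, and count how often the tool gave the first person the higher score. Treating "relative accuracy" as this kind of pairwise ranking score is an assumption for illustration, not a reconstruction of the audit. The function name and the toy numbers are invented.

```python
def pairwise_ranking_accuracy(scores, outcomes):
    """Share of (positive, negative) pairs where the positive case scored higher.

    Ties count as half a win, so a tool that cannot separate people lands at 0.5.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented toy data: eight people, a 1-8 risk score, and whether the outcome occurred.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8]
toy_outcomes = [0, 1, 0, 0, 1, 0, 1, 1]

informative = pairwise_ranking_accuracy(toy_scores, toy_outcomes)

# A tool that gives everyone the same score carries no information.
uninformative = pairwise_ranking_accuracy([5] * 8, toy_outcomes)

print(f"informative: {informative:.2f}  uninformative: {uninformative:.2f}")
```

On a scale like this, 50 percent is the floor, not zero. A 64 percent reading is 14 points above knowing nothing.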
A 97 percent claim became random guessing once it was checked.
A 2023 paper said machine learning could predict hit songs with 97 percent accuracy. Scientific American and Axios ran with it. The book's authors re-ran the analysis, removed the data leakage, and the model dropped to no better than chance.
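Data leakage is easier to believe once you see it manufacture a result. The sketch below is a generic illustration, not the paper's pipeline: it builds a dataset of pure noise, so there is nothing to learn, then scores a simple nearest-neighbour model twice. The clean run keeps test rows out of training. The leaky run lets them in. All names and sizes are hypothetical.

```python
import random


def nearest_label(train, features):
    """Return the label of the training row closest to `features` (1-nearest-neighbour)."""
    closest = min(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )
    return closest[1]


def accuracy(train, test):
    """Fraction of test rows whose label the model gets right."""
    hits = sum(nearest_label(train, features) == label for features, label in test)
    return hits / len(test)


def make_noise_dataset(n_rows, n_features, rng):
    """Random features with coin-flip labels: the features say nothing about the label."""
    return [
        ([rng.random() for _ in range(n_features)], rng.randint(0, 1))
        for _ in range(n_rows)
    ]


rng = random.Random(0)
data = make_noise_dataset(600, 5, rng)
train, test = data[:400], data[400:]

# Clean evaluation: the model has never seen the test rows.
clean_accuracy = accuracy(train, test)

# Leaky evaluation: the test rows are also in the training set,
# so each one finds itself as its own nearest neighbour.
leaky_accuracy = accuracy(train + test, test)

print(f"clean: {clean_accuracy:.2f}  leaky: {leaky_accuracy:.2f}")
```

The leaky run scores perfectly on data that contains no signal at all, while the clean run hovers around a coin flip. A headline accuracy number means little until you know the test data was kept out of training.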
When predictive AI is wrong at scale, citizens pay it back.
This is not a lab problem. The same logic, deployed by governments, has wrecked real lives faster than any demo suggests.
Narrow task, real evidence, and the needle swings back.
The detector is not rigged to shout snake oil. Weather forecasts gain roughly one day of useful lead time per decade: a five-day forecast today is about as accurate as a four-day forecast was ten years earlier. AI predicting protein structures was Science's Breakthrough of the Year for 2021. When the task is bounded and the evidence is independent, the reading clears.
The detector does not say "never buy." It says "make it prove the claim."
Generative tools clear the bench all the time. So do narrow, well-evidenced models. The skill is telling them apart from the half that only works in the demo. Get that right, and you stop funding the half that evaporates.
Tends to clear the bench
- Bounded tasks with low ambiguity
- Independent, reproducible evidence
- Generative tools judged on output
- Aggregate forecasts, not individual fates
Tends to fail it
- Predicting individual human futures
- Accuracy that only exists in the demo
- Decisions with no recourse for the subject
- "Every" and "100%" doing the selling