I put every tool in a category through the same battery, score it out of 100, and publish a dated PASS, BORDERLINE or FAIL. The inputs are public, so you can run the test yourself and tell me where I'm wrong. A new test lands every couple of weeks.
A new test every couple of weeks. No spam, unsubscribe anytime. By subscribing you agree to receive the newsletter and to our Privacy Policy.
Same published inputs, dated and re-runnable. No sponsorships; the verdict isn't for sale.
Each test adds a row. A re-test gets its own dated row, so nothing quietly gets overwritten. The back catalogue is the proof I'm not just chasing whatever launched this week.
| Tool | Category | As of | Score | Verdict | Cost / result | Buy or skip |
|---|---|---|---|---|---|---|
Rows marked pending are queued for the next run. Bands: PASS 75 and up, BORDERLINE 55 to 74, FAIL under 55.
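The bands above can be written down as a small lookup. This is a minimal sketch rather than the scoring code behind the table: the function name `verdict` is made up for illustration, and treating a fractional score between 74 and 75 as BORDERLINE is an assumption, since the bands are stated in whole numbers.

```python
def verdict(score: float) -> str:
    """Map a score out of 100 to a verdict band.

    Bands as published: PASS 75 and up, BORDERLINE 55 to 74, FAIL under 55.
    Assumption: anything below 75 but at least 55 counts as BORDERLINE.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 75:
        return "PASS"
    if score >= 55:
        return "BORDERLINE"
    return "FAIL"


if __name__ == "__main__":
    # Boundary scores, one on each side of both cut-offs.
    for s in (75, 74, 55, 54):
        print(s, verdict(s))
```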
Every tool in a category meets the same battery and the same scoring. The number I care about most is the cost of a result you can actually use.
One task set, one set of inputs, run identically. No friendly demos and no improvising to flatter a particular product.
I score quality, reliability, speed, setup friction, cost per result, workflow fit, and the limits nobody advertises. It all collapses into one number out of 100.
Sticker price lies. I track what one output you can trust actually costs, once the retries and the re-dos are in.
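One way to read "cost per result" is total spend divided by the outputs you could actually use. The sketch below assumes exactly that, with a flat price per attempt; the function name and the figures in the example are hypothetical and not taken from any published test.

```python
def cost_per_usable_result(price_per_attempt: float,
                           attempts: int,
                           usable_results: int) -> float:
    """Total spend across all attempts, divided by the usable outputs.

    Retries and re-dos count as attempts, so they push the cost up
    even though they never show on the price list.
    """
    if attempts < usable_results:
        raise ValueError("usable results cannot exceed attempts")
    if usable_results <= 0:
        raise ValueError("no usable result, so the cost is undefined")
    return price_per_attempt * attempts / usable_results


if __name__ == "__main__":
    # Hypothetical figures: 0.50 per attempt, 10 attempts, 4 usable outputs.
    print(cost_per_usable_result(0.50, attempts=10, usable_results=4))
```

With those made-up figures the sticker price is 0.50 per attempt, but each output you can trust works out at 1.25.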
Every verdict is pinned to a version and a date, with the inputs published so you can run it yourself and check me.
A litmus strip doesn't care about the marketing, and neither does the score: the same paper gives every tool the same reading.
You get the scorecard, the raw inputs, and a straight buy-or-skip call, every couple of weeks.