It depends on your role's task mix. AI Stole My Job scores replacement risk by industry, calibrated against research from Goldman Sachs, McKinsey, and Frey and Osborne, and backs those scores with a live fleet of 13 AI agents running an actual business.

What AI models run the fleet?

The agents run on Anthropic's Claude models (Opus, Sonnet, and Haiku) for reasoning and code, alongside local open-source models for high-volume tasks. The same models behind frontier AI assistants operate this business autonomously.

Is the AI agent fleet real?

Yes. Every agent's runs, costs, and outputs are published live — code pull requests, videos, emails, and QA checks — with full transparency on what each agent does and what it costs to run.


How we score at-risk %

No black box. Here's exactly what we measure.

The score

For each industry tile (Finance, Engineering, Creative, etc.) we publish a single number from 0 to 100. Roughly: the percentage of an entry-to-mid-level role's daily work that could already be done end-to-end by the agents we are running on /fleet.

Inputs (in priority order)

1. Direct evidence from our own fleet. If we already run an agent in this role (Rico = junior backend engineer, Lola = TikTok/Reels editor, Juana = social-media manager, etc.), we know what fraction of the human workflow it covers because we measure the output every 5 minutes.
2. Public capability benchmarks. SWE-bench (engineering), MBPP/HumanEval (code), GPQA (analysis), MMLU (general knowledge). When a frontier model exceeds the median human on the relevant benchmark, we raise the at-risk number.
3. Cost-of-replacement vs cost-of-LLM. If a $0.50/run agent reliably replaces a $60/hour task, the economic gravity is too strong to ignore, even if the LLM isn't perfect yet.
4. Tools available. Healthcare scores lower because regulation (HIPAA, FDA clearance) blocks deployment; legal scores middling because liability and bar-association rules slow adoption even where the tech is ready.
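To make the cost-of-replacement input concrete, here is a minimal sketch of the break-even arithmetic using the $0.50/run and $60/hour figures quoted above. The function name and the one-hour task duration are assumptions for illustration, not part of the published scoring method.

```python
def breakeven_runs(hourly_rate: float, task_hours: float, cost_per_run: float) -> float:
    """Number of agent runs whose total cost equals the human cost of one task."""
    if cost_per_run <= 0:
        raise ValueError("cost_per_run must be positive")
    return (hourly_rate * task_hours) / cost_per_run


# Figures from the text: a $60/hour task and a $0.50/run agent.
# The one-hour task duration is an assumption for illustration.
runs = breakeven_runs(hourly_rate=60.0, task_hours=1.0, cost_per_run=0.50)
print(f"The agent could run {runs:.0f} times before matching the human cost")
```

Under those assumptions the agent could retry the task many times over and still cost less than the human hour, which is why reliability does not need to be perfect for the economics to work.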

What the colors mean

70+ (red): already happening; entry-level hiring is contracting.
55–69 (orange): 1–2 year horizon; specialists still needed but the floor is rising.
40–54 (yellow): partial — tools augment, humans direct.
Under 40 (green): regulatory / physical-world friction protects the role for now.
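The thresholds above can be sketched as a small lookup. The function name is hypothetical, and treating fractional scores (for example 69.5) as belonging to the lower band is an assumption, since the bands are listed as whole numbers.

```python
def risk_band(score: float) -> str:
    """Map a 0-100 at-risk score to the color band listed above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 70:
        return "red"      # already happening
    if score >= 55:
        return "orange"   # 1-2 year horizon
    if score >= 40:
        return "yellow"   # partial: tools augment, humans direct
    return "green"        # regulatory / physical-world friction


for example_score in (78, 60, 45, 20):
    print(example_score, risk_band(example_score))
```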

How we calculate “annual payroll equivalent”

The headline number on /fleet (currently ~$1.07M/yr) is the sum of the US-median annual salaries for the human roles that the 13 agents map to. The math is in the table below, followed by two important caveats.

Agent	Maps to human role	US-median salary	Honest role coverage
Mateo	SRE	$145,000	35%
Carlos	Senior backend engineer	$140,000	30%
Steve	Product manager	$135,000	15%
Sofia	QA engineer	$95,000	40%
Marco	QA lead	$90,000	20%
Antonio	Options analyst	$85,000	20%
Rico	Junior backend engineer	$65,000	60%
Diego	BDR	$60,000	15%
Juana	Social-media manager	$55,000	45%
Felix	Newsletter writer	$55,000	55%
Maxi	Video editor	$50,000	45%
Luna	CS rep	$50,000	30%
Lola	TikTok/Reels editor	$45,000	60%
Sum		$1,070,000	~35% wtd avg

Caveat #1 — coverage, not replacement

No human got fired. Each agent automates the output that a slice of the role would otherwise produce, not the meetings, mentoring, judgment, or relationships that come with the full job. Honest weighted-average coverage across the 13 agents is roughly 35%. At that rate, the strict replacement value is closer to ~$375k/yr, not $1.07M.
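The headline sum and the coverage adjustment can be checked directly from the table rows. The sketch below copies the salaries and coverage percentages as published; the variable and function names are illustrative. Weighting coverage by salary, the rows as written come out a little under the rounded ~35% and ~$375k quoted above.

```python
# (agent, US-median salary in USD, role coverage in percent), copied from the table.
FLEET = [
    ("Mateo", 145_000, 35),
    ("Carlos", 140_000, 30),
    ("Steve", 135_000, 15),
    ("Sofia", 95_000, 40),
    ("Marco", 90_000, 20),
    ("Antonio", 85_000, 20),
    ("Rico", 65_000, 60),
    ("Diego", 60_000, 15),
    ("Juana", 55_000, 45),
    ("Felix", 55_000, 55),
    ("Maxi", 50_000, 45),
    ("Luna", 50_000, 30),
    ("Lola", 45_000, 60),
]


def payroll_equivalent(fleet) -> int:
    """Headline number: plain sum of the mapped salaries."""
    return sum(salary for _, salary, _ in fleet)


def covered_value(fleet) -> float:
    """Strict replacement value: each salary scaled by its role coverage."""
    return sum(salary * pct for _, salary, pct in fleet) / 100


headline = payroll_equivalent(FLEET)
covered = covered_value(FLEET)
weighted_pct = 100 * covered / headline

print(f"Headline payroll equivalent: ${headline:,}")
print(f"Coverage-adjusted value:     ${covered:,.0f}")
print(f"Salary-weighted coverage:    {weighted_pct:.1f}%")
```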

Caveat #2 — 24/7 throughput partially closes the gap

Agents run continuously. A junior backend engineer ships ~5-15 PRs/week; Rico ships closer to 100. So in pure output volume, an agent can out-produce the human even though it covers less than the full role. A realistic estimate of what it would cost to outsource the work these agents do, task by task, is somewhere between $500k and $800k/yr. We report the full $1.07M because the salary math is documented and verifiable, but if you want the conservative number, use $500k.
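As a rough illustration of the throughput gap, the PR counts quoted above translate into a multiple as follows. This is a sketch of the arithmetic only; it counts PRs and says nothing about their size or quality, and the function name is hypothetical.

```python
def throughput_multiple(agent_per_week: float, human_low: float, human_high: float):
    """Return (min, max) ratio of agent output to human output per week."""
    if human_low <= 0 or human_high < human_low:
        raise ValueError("expected 0 < human_low <= human_high")
    return agent_per_week / human_high, agent_per_week / human_low


# Figures from the text: Rico at ~100 PRs/week vs a junior engineer at ~5-15.
low, high = throughput_multiple(agent_per_week=100, human_low=5, human_high=15)
print(f"Rico ships roughly {low:.1f}x to {high:.0f}x the weekly PR count")
```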

What this is NOT

Not a prediction of when YOU specifically lose your job.
Not a claim that 78% of marketers will be unemployed by next year.
Not academic research — these scores are a directional take from one founder running a real 13-agent fleet in production, calibrated by what those agents actually ship.

Disagree with a score? Email us at helix.ai.labs@gmail.com. We update quarterly.