
How we score at-risk %

No black box. Here's exactly what we measure.

The score

For each industry tile (Finance, Engineering, Creative, etc.) we publish a single score from 0 to 100. Roughly, it is the percentage of an entry- to mid-level role's daily work that the agents we run on /fleet could already do end-to-end.
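As a rough illustration only, the score can be read as a time-weighted share of a role's day. The sketch below assumes exactly that. The `Task` type, the `at_risk_score` helper, the task list and the hours are all hypothetical examples, not the inputs behind any published tile.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hours_per_day: float    # hypothetical: time a human spends on this task per day
    agent_end_to_end: bool  # hypothetical: can a fleet agent finish it with no human step?


def at_risk_score(tasks: list[Task]) -> int:
    """Time-weighted share (0-100) of a role's daily work an agent can do end-to-end."""
    total = sum(t.hours_per_day for t in tasks)
    if total <= 0:
        raise ValueError("task list has no working hours")
    covered = sum(t.hours_per_day for t in tasks if t.agent_end_to_end)
    return round(100 * covered / total)


# Hypothetical 8-hour day for an entry-level engineer (illustrative numbers only).
day = [
    Task("write routine code", 3.0, True),
    Task("update docs", 1.0, True),
    Task("review teammates' PRs", 1.0, False),
    Task("meetings", 2.0, False),
    Task("on-call triage", 1.0, False),
]
print(at_risk_score(day))  # 50
```

Weighting by hours rather than counting tasks keeps one long, automatable task from being outvoted by several short ones a human still has to do.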

Inputs (in priority order)

  1. Direct evidence from our own fleet. If we already run an agent in this role (Rico = backend engineer, Lola = video editor, Juana = social-media manager, etc.), we know what fraction of the human workflow it covers because we measure the output every 5 minutes.
  2. Public capability benchmarks. SWE-bench (engineering), MBPP/HumanEval (code), GPQA (analysis), MMLU (general knowledge). When a frontier model exceeds the median human on the relevant benchmark, we raise the at-risk number.
  3. Cost of replacement vs. cost of the LLM. If a $0.50/run agent reliably replaces a task that costs $60/hour in human time, the economic gravity is too strong to ignore, even if the LLM isn't perfect yet.
  4. Deployment barriers. Healthcare scores lower because regulation (HIPAA, FDA clearance) blocks deployment; legal scores in the middle because liability and bar-association rules slow adoption even where the technology is ready.
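The cost comparison in input 3 is simple arithmetic. This is a minimal sketch, assuming one agent run replaces a fixed amount of human time; `cost_ratio` and its parameter names are hypothetical, chosen for illustration.

```python
def cost_ratio(human_rate_per_hour: float, human_hours_per_task: float,
               agent_cost_per_run: float, runs_per_task: int = 1) -> float:
    """How many times cheaper the agent is than a human for one task."""
    human_cost = human_rate_per_hour * human_hours_per_task
    agent_cost = agent_cost_per_run * runs_per_task
    return human_cost / agent_cost


# A $0.50/run agent vs. $60/hour of human work,
# assuming (hypothetically) that one run replaces one hour.
print(cost_ratio(60.0, 1.0, 0.50))  # 120.0
# A 15-minute task where the agent needs 4 runs (3 retries) to get it right.
print(cost_ratio(60.0, 0.25, 0.50, runs_per_task=4))  # 7.5
```

The `runs_per_task` parameter is there because retries are where an imperfect agent's cost shows up; even several retries leave a large gap at these prices.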

What the colors mean

How we calculate “annual payroll equivalent”

The headline number on /fleet (currently ~$1.07M/yr) is the sum of the US-median annual salaries of the human roles our 13 agents do work for. The math is documented below, followed by two important caveats.

Agent   | Maps to human role      | US-median salary | Honest role coverage
Mateo   | SRE                     | $145,000         | 35%
Carlos  | Senior backend engineer | $140,000         | 30%
Steve   | Product manager         | $135,000         | 15%
Sofia   | QA engineer             | $95,000          | 40%
Marco   | QA lead                 | $90,000          | 20%
Antonio | Options analyst         | $85,000          | 20%
Rico    | Junior backend engineer | $65,000          | 60%
Diego   | BDR                     | $60,000          | 15%
Juana   | Social-media manager    | $55,000          | 45%
Felix   | Newsletter writer       | $55,000          | 55%
Maxi    | Video editor            | $50,000          | 45%
Luna    | CS rep                  | $50,000          | 30%
Lola    | TikTok/Reels editor     | $45,000          | 60%
Sum     |                         | $1,070,000       | ~33% salary-weighted avg

Caveat #1 — coverage, not replacement

No human got fired. Each agent automates the output of a slice of its mapped role, not the meetings, mentoring, judgment, or relationships that come with the full job. Salary-weighted coverage across the 13 agents is roughly 33% (a simple average of the 13 coverage figures is about 36%). At that rate, the strict replacement value is closer to ~$354k/yr, not $1.07M.
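The payroll-equivalent math can be recomputed directly from the per-agent figures. This minimal sketch sums the salaries, then scales each salary by its coverage to get the strict replacement value and the salary-weighted coverage. It assumes "weighted average" means weighted by salary; the variable names are illustrative.

```python
# Per-agent figures from the table: (agent, human role, US-median salary in USD, coverage).
ROLES = [
    ("Mateo", "SRE", 145_000, 0.35),
    ("Carlos", "Senior backend engineer", 140_000, 0.30),
    ("Steve", "Product manager", 135_000, 0.15),
    ("Sofia", "QA engineer", 95_000, 0.40),
    ("Marco", "QA lead", 90_000, 0.20),
    ("Antonio", "Options analyst", 85_000, 0.20),
    ("Rico", "Junior backend engineer", 65_000, 0.60),
    ("Diego", "BDR", 60_000, 0.15),
    ("Juana", "Social-media manager", 55_000, 0.45),
    ("Felix", "Newsletter writer", 55_000, 0.55),
    ("Maxi", "Video editor", 50_000, 0.45),
    ("Luna", "CS rep", 50_000, 0.30),
    ("Lola", "TikTok/Reels editor", 45_000, 0.60),
]

# Headline number: plain sum of salaries, ignoring coverage.
payroll_equivalent = sum(salary for _, _, salary, _ in ROLES)
# Strict replacement value: each salary scaled by the share of the role the agent covers.
strict_value = sum(salary * coverage for _, _, salary, coverage in ROLES)
# Salary-weighted coverage vs. a plain average of the 13 coverage figures.
weighted_coverage = strict_value / payroll_equivalent
mean_coverage = sum(coverage for *_, coverage in ROLES) / len(ROLES)

print(f"payroll equivalent:       ${payroll_equivalent:,}")
print(f"strict replacement value: ${strict_value:,.0f}")
print(f"salary-weighted coverage: {weighted_coverage:.1%}")
print(f"unweighted mean coverage: {mean_coverage:.1%}")
```

Run as-is, it prints $1,070,000, $353,500, 33.0% and 36.2%. The weighting matters: the highest-paid roles (SRE, senior backend engineer, product manager) have below-average coverage, which pulls the salary-weighted figure under the simple average.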

Caveat #2 — 24/7 throughput partially closes the gap

Agents run continuously. A junior backend engineer ships ~5-15 PRs per week; Rico ships closer to 100. So in raw output volume, an agent can out-produce the human role it maps to even while covering only part of that role. A realistic estimate of what it would cost to outsource the work these agents do, task by task, is somewhere between $500k and $800k/yr. We report the full $1.07M because the salary math is documented and verifiable, but if you want the conservative number, use $500k.
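The throughput comparison reduces to a ratio. This is a minimal sketch using the 5 to 15 PRs/week and ~100 PRs/week figures from the paragraph above; `throughput_multiple` is a hypothetical helper, and PR counts measure volume, not the scope or quality of each change.

```python
def throughput_multiple(agent_per_week: float, human_low: float,
                        human_high: float) -> tuple[float, float]:
    """Return (min, max) agent output as a multiple of a human's weekly range."""
    # Dividing by the human's high end gives the smallest multiple, and vice versa.
    return agent_per_week / human_high, agent_per_week / human_low


low, high = throughput_multiple(100, 5, 15)
print(f"~{low:.1f}x to {high:.0f}x a junior engineer's PR volume")
# prints: ~6.7x to 20x a junior engineer's PR volume
```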

What this is NOT

Disagree with a score? Email us at helix.ai.labs@gmail.com. We update scores quarterly.