ai-for-less-suffering.com

šŸ–„ļø Steelman analysis

Generated 2026-04-19T16:10:04.325744Z

Target intervention

Scale funding for interpretability and alignment research.

Operator tension

The operator's stated ideal is AI pointed at suffering reduction, with interpretability as infrastructure for every downstream bet --- but the camp_global_health and camp_animal_welfare cases-against turn the operator's own 80K/EA overlay against that bet. If you take GiveWell-tier cost-effectiveness seriously, the marginal billion to interpretability has only a prior distribution over maybe-mattering, while the marginal billion to bednets or alt-protein has a measured DALY denominator and a numerator that includes the 80 billion land animals the operator's frame already counts. The operator bets on substrates over surfaces --- but interpretability is only substrate if you assume the frontier capability trajectory is the binding constraint on future suffering reduction, and that is exactly the accelerationist prior the 80K overlay is supposed to discipline.

The sharper discomfort: the camp_environmentalists case-against reads interpretability funding as the legitimacy layer that underwrites the datacenter build the operator is invested in through AI-disruption equities. The poker-brain process-over-outcome frame gets uncomfortable when 'process' means 'bet on the substrate' and the substrate bet happens to coincide with the operator's portfolio.

Both sides cite

Case FOR

Capability is compounding at 4-5x/year on compute, and algorithmic efficiency is halving the cost of a given capability every 8 months, while the US lead is only 6-18 months --- interpretability is the single variable that decides whether the systems we are definitely building are steerable. (A back-of-envelope combination of the two rates is sketched at the end of this case.) Hundreds of millions to low billions annually is rounding error against a frontier buildout whose individual training runs already cost $100M+, and alignment funding rides on near-zero friction across grid, public, and capex because it is labor and GPUs, not fabs and substations. If the lead-seeking strategy is coherent at all, it is only coherent paired with maximal interpretability spend; otherwise we ship a frontier we cannot read.

A pause is not on the table politically --- compute doubles every 5-6 months and algorithmic efficiency doubles every 8 months --- so the second-best world is one where interpretability funding scales fast enough to generate the auditable evidence that would make a pause legible if conditions warrant. Alignment research is the only intervention whose output (mechanistic understanding, evals, deception detection) directly shrinks the policy option space of 'build blind.' Low friction, high leverage on the exact bottleneck we care about.

Deployment-legibility obligations are unenforceable without a science of model inspection. Interpretability research is the upstream public good that makes audit, red-teaming, and contestability technically possible --- without it, 'AI governance' is paperwork over black boxes. Public funding of interpretability is the infrastructure a regulatory regime sits on top of; private labs will underfund it because the externality is public trust, not quarterly revenue.

Interpretability is the substrate bet. Every downstream suffering-reduction deployment --- drug discovery, mental-health triage, alt-protein --- requires that we can trust and steer the models doing the work. Self-hosting and sovereign-individual control of AI are impossible if nobody can read a weight; open-weights futures only have teeth when the weights are interpretable. Low friction across all five layers, direct action on the root-cause bottleneck. This is protocol-level, not app-level.

AI deployed inside LMIC health pipelines (triage, diagnostic support, drug-discovery acceleration) only reduces suffering if clinicians, ministries, and WHO can trust the outputs. Interpretability is the prerequisite for adoption in high-stakes low-resource contexts where error modes are opaque and recourse is thin. Funding alignment research is funding the conditions under which AI-for-health can actually be deployed at scale without iatrogenic harm.

Mechanistic interpretability is foundational research with the same compounding-returns profile as basic biomedical science --- its products (features, circuits, steering tools) propagate across every downstream application for decades. Public funding is appropriate precisely because private labs will only fund the slice tied to near-term product safety. This is NIH-model investment in the tooling layer for every future AI-driven biomedical advance.

Workers cannot contest deployments they cannot inspect. Interpretability research is the precondition for meaningful labor oversight of AI in the workplace --- without it, 'human in the loop' is theater. Funding alignment expands the surface area on which workers, unions, and employee-resistance coalitions can make substantive claims about which deployments are acceptable.

Open weights without interpretability tools is a library of untranslated books. Alignment research --- especially the mechanistic-interp and evals portions --- is dual-use in the good sense: its outputs transfer to any open model and make distributed capability actually governable by its recipients. Public funding for interpretability breaks the closed labs' epistemic monopoly on 'we alone understand what we ship.'

Human moral standing requires that the instruments humans build remain instruments --- legible, accountable, under human judgment. Interpretability research is the technical expression of the theological claim that creatures must not worship what they cannot see into. Public investment here preserves the creator/creature distinction at the technical layer.
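
A note on the arithmetic in the first two arguments of this case. Combining the cited rates is illustrative only: it takes compute doubling every ~5.5 months and algorithmic efficiency doubling every 8 months at face value and assumes the two trends compound independently, which nothing on this page establishes. Under those assumptions,

\[
\underbrace{2^{12/5.5}}_{\text{physical compute}} \times \underbrace{2^{12/8}}_{\text{algorithmic efficiency}} \approx 4.5 \times 2.8 \approx 13\times \text{ effective compute per year.}
\]

Even at the slow end of the cited range (compute doubling every 6 months, i.e. 4x/year), the product stays near 11x/year, so the urgency claim is not sensitive to which end of the range is right.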

Case AGAINST

A billion dollars a year routed to interpretability is a billion dollars not routed to compute, fabs, or grid --- the actual bottlenecks. Alignment research has low leverage on the friction layers that matter and a dubious track record of producing deployable safety at the frontier. Worse, the field is a regulatory on-ramp: every interpretability result becomes an audit requirement that slows deployment. The brake disguised as a tool.

Alignment tractability comes from scaled deployment, not from interpretability papers. RLHF, red-team-at-scale, and real-world misuse telemetry have produced more safety than a decade of mech-interp. Shifting hundreds of millions into interpretability funds an academic subfield that is decoupled from the feedback loops where alignment actually gets solved, and it strengthens the coalition that wants to gate releases.

Interpretability research legitimates continued frontier scaling by providing the safety veneer that justifies the next datacenter build. The harm_water 0.9 and harm_extraction 0.9 scores on this intervention are naïve --- alignment funding does not sit outside the compute pipeline, it underwrites it. Every interpretability result that clears a deployment is downstream of fresh aquifer draw and fresh mine-site harm in the DRC and Inner Mongolia. Funding the safety case is funding the build.

In practice, alignment-research funding flows to closed frontier labs (Anthropic, OpenAI, DeepMind) and becomes the intellectual property case for why only those labs can be trusted to release. Interpretability-as-gatekeeping is the dominant implementation. Concentration increases, not decreases. The resulting 'safe deployment' norms ratchet against open release precisely because open models cannot be audited by the lab's proprietary stack.

Hundreds of millions to low billions annually at GiveWell-tier cost-effectiveness averts orders of magnitude more DALYs than the speculative, long-horizon, low-probability payoff of interpretability research (illustrative arithmetic at the end of this case). The marginal dollar to bednets, vaccines, or LMIC mental-health programs has a measurable suffering-averted ratio; the marginal dollar to alignment research has a prior distribution over maybe-mattering. Default allocation goes to the pipeline with the measured denominator.

80+ billion land animals per year, 1-3 trillion aquatic --- alignment research funding does not touch this numerator. Every billion to interpretability is a billion not to alt-protein R&D, slaughter-throughput reduction, or welfare-standard enforcement. The suffering calculus is dominated by non-human numerator terms that interpretability is structurally indifferent to.

Alignment research operationally means 'make the model safer to deploy,' which translates to 'accelerate workforce absorption.' Interpretability does not fund retraining, transition support, or the structural replacement of role and meaning --- it funds the legitimacy case for the deployment that displaces. The welfare harm of displacement is not mitigated, it is oiled.

The US lead is 6-18 months and enterprise/government absorption already lags by years. Public funding for interpretability at scale will be captured by a policy coalition that wants pre-deployment audit regimes, which further widens the absorption gap precisely where national-security urgency is highest. The marginal dollar to mission-software integration and IC deployment beats the marginal dollar to academic interpretability on the national-advantage metric.

Alignment research is silent on the training-data consent violation. Funding interpretability makes the models the creators were not asked about safer to deploy --- it does not unwind the underlying rights violation at the data layer. It launders the harm by making the output more trustworthy while the input remains uncompensated.

Interpretability research produces increasingly sophisticated language for machine 'cognition,' 'values,' and 'deception' --- categories that erode the creator/creature distinction by attributing interiority to instruments. The technical vocabulary of alignment is precisely what blurs the line religious anthropology insists on. Funding it accelerates the philosophical category error regardless of intent.
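
To make the 'measured denominator' in the global-health argument concrete: assume, purely for illustration, GiveWell-tier cost-effectiveness on the order of $100 per DALY averted (a round figure consistent with GiveWell-style estimates, not a number taken from this page). A marginal $1B/year then buys

\[
\frac{\$1 \times 10^{9} \text{ per year}}{\$100 \text{ per DALY}} = 10^{7} \text{ DALYs averted per year},
\]

so the case FOR must claim that the same $1B routed to interpretability carries an expected value exceeding roughly ten million DALY-equivalents a year before the default allocation flips. That is the bar the 'prior distribution over maybe-mattering' has to clear.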

Contested claims

DoD-obligated AI-related contract spending rose substantially from 2022 to 2025, driven by JWCC, Project Maven, and CDAO-managed pilots; precise totals are hard to pin down because AI tagging on contract line items is inconsistent.

No other pure-play US defense-AI software vendor has matched Palantir's contract backlog or combatant-command integration depth; cloud-provider primes (AWS, Microsoft, Google, Oracle via JWCC) supply infrastructure, not mission-software integration.

Credible 2030 forecasts for US datacenter share of electricity consumption diverge by more than 2x --- from ~4.6% (IEA/EPRI conservative) to ~9% (Goldman Sachs, EPRI high scenario) --- reflecting genuine uncertainty, not measurement error.

Frontier-lab and big-tech employees have episodically resisted DoD contracts (Google Maven 2018, Microsoft IVAS 2019, Microsoft/OpenAI IDF deployments 2024), producing temporary pauses but no sustained shift in vendor willingness.
