Tuesday February 24th 2026


Black-box AI shifts the risk onto society as a whole

Column | Columnists > Chen Ya’s Commentary

By Chen Ya, Jointing.Media, in Wuhan, 2026-02-22

The “black box” serves as a vivid metaphor for a class of AI systems, particularly deep learning models. These models, boasting billions of parameters and trained on vast datasets, possess decision-making logic so intricate that even their developers struggle to explain it. We feed them data, and they spit out answers, yet the reasoning pathways remain entirely obscured within layers of mathematical operations.

DeepMind, founded by Demis Hassabis, epitomises this black-box paradigm. Within Google, it was long regarded as a one-way black box: burning nearly ten billion dollars over a decade, releasing almost no external products, and single-mindedly pursuing Nobel Prize-worthy breakthroughs. In 2024 it did secure a Nobel Prize, yet it has simultaneously pushed the black-box model to its extreme: its latest drug discovery engine, IsoDDE, remains entirely closed-source. Claimed to deliver twice the performance of AlphaFold 3, it operates in complete secrecy from academia, earning the moniker “AlphaFold 4”.

The Pervasive “Black Box”

IsoDDE does not stand alone. Across countless critical domains, black-box AI has long been the unseen operator.

Autonomous Driving: When vehicles execute “ghost braking”, or choose what to hit in the split second before a crash, we cannot interrogate their “ethical compass”. Was it sensor misjudgement? Algorithmic bias? We cannot know, nor can we hold anyone accountable.

Finance and Justice: Credit models deny loans without explanation; sentencing aids propose terms while concealing weighting logic. Opaque decision-making may entrench bias and erode fairness.

Medical Diagnosis: When AI detects early lesions, clinicians must blindly trust or reject findings. Should surgical robots fail due to ‘black box’ malfunctions, liability becomes an unsolvable conundrum.

Cybersecurity: AI intercepts an attack, yet security teams cannot comprehend its logic, rendering them unable to patch the root vulnerability. Consequently, they remain defenceless against subsequent assaults.

Large Language Models: Models like ChatGPT frequently exhibit ‘motivated reasoning’ – first settling on an answer, then fabricating a plausible-sounding explanation. They resemble gifted students who hand in correct answers while hiding their scratch work, an unsettling habit.

Commercial APIs: Certain APIs, for commercial protection, return only final results while concealing their reasoning processes. Should developers encounter erroneous outputs in critical operations, they find themselves utterly unable to debug or assign accountability.

These black boxes share a common paradox: the more capable and powerful they become, the less they are required to account for their own reasoning. The risk, consequently, is transferred to society as a whole.

The Cost of Closed-Source: An Analysis Using IsoDDE as an Example

IsoDDE’s closed-source strategy, ostensibly a commercial moat (backed by $3 billion in pharmaceutical partnerships and 17 proprietary pipelines), in fact exposes the deep-seated risks of black-box AI.

Drug development concerns human lives, and regulatory bodies require transparent, verifiable processes. How can an unexplainable ‘black-box’ model pass FDA scrutiny?

IsoDDE claims to have uncovered drug-binding sites that had remained hidden for 15 years, yet its technical report withholds the details. Should systemic biases exist, academia has no way to detect or correct them.

When cutting-edge tools are locked away in a handful of corporate vaults, global research ecosystems lose foundational innovation. While AlphaFold served over 3 million researchers, IsoDDE caters solely to commercial interests.

Isomorphic Labs simultaneously licenses its technology while developing proprietary pipelines. Might its algorithms prioritise commercially lucrative targets? Such opaque profit drivers threaten the objectivity and integrity of drug discovery.

Opening the Black Box: AI’s MRI

In 2026, MIT Technology Review listed ‘mechanistic interpretability’ among its ten breakthrough technologies, dubbing it ‘AI’s MRI’ – a toolkit for reverse-engineering the AI brain.

Researchers dissect AI the way biologists dissect organisms. They employ sparse autoencoders to extract ‘concepts’ from neural activity (Anthropic has extracted 34 million concepts from Claude, ranging from ‘Golden Gate Bridge’ to abstract programming ideas), then validate each concept’s function through causal interventions that trace information pathways. Key players include:

Anthropic: Utilises circuit tracing technology to assess safety risks in Claude Sonnet 4.5, identifying and suppressing undesirable behaviours prior to deployment.

Google DeepMind: Open-sourced the Gemma Scope 2 toolkit, covering Gemma 3 models ranging from 270 million to 27 billion parameters.

OpenAI: Developed an ‘AI lie detector’ that analyses internal representations to determine whether a model is lying.
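The sparse-autoencoder technique described above can be sketched in a few lines of NumPy. This is a toy illustration only, with invented dimensions and synthetic stand-in ‘activations’ rather than anything from a real model or Anthropic’s actual training setup: an overcomplete ReLU encoder and linear decoder are trained to reconstruct activations under an L1 sparsity penalty, so each dictionary unit fires rarely and becomes a candidate human-inspectable ‘concept’.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a model's internal activations: each sample is a
# sparse mixture of 512 hidden "concept" directions in a 256-dim space.
n_concepts, d_model, n_samples = 512, 256, 1024
concepts = rng.normal(size=(n_concepts, d_model))
concepts /= np.linalg.norm(concepts, axis=1, keepdims=True)
mask = rng.random((n_samples, n_concepts)) < 0.02   # ~10 active concepts/sample
acts = (mask * rng.random((n_samples, n_concepts))) @ concepts

class SparseAutoencoder:
    """Minimal SAE: overcomplete ReLU encoder + linear decoder, trained on
    ||x - x_hat||^2 + l1 * ||f||_1 with hand-written gradients."""

    def __init__(self, d_model, d_dict, seed=1):
        r = np.random.default_rng(seed)
        self.W_enc = r.normal(scale=0.1, size=(d_model, d_dict))
        self.b_enc = np.zeros(d_dict)
        self.W_dec = self.W_enc.T.copy()

    def encode(self, x):
        # ReLU keeps only strongly-driven units -> sparse codes
        return np.maximum(x @ self.W_enc + self.b_enc, 0.0)

    def decode(self, f):
        return f @ self.W_dec

    def step(self, x, lr=0.1, l1=3e-3):
        n = len(x)
        f = self.encode(x)
        err = self.decode(f) - x
        # Backprop through decoder, L1 penalty, and the ReLU mask
        grad_f = (err @ self.W_dec.T + l1 * np.sign(f)) * (f > 0)
        self.W_dec -= lr * f.T @ err / n
        self.W_enc -= lr * x.T @ grad_f / n
        self.b_enc -= lr * grad_f.mean(axis=0)
        return float((err ** 2).mean())

sae = SparseAutoencoder(d_model, d_dict=1024)
loss0 = sae.step(acts)
for _ in range(300):
    loss = sae.step(acts)

density = (sae.encode(acts) > 0).mean()  # fraction of dictionary units firing
print(f"reconstruction loss {loss0:.4f} -> {loss:.4f}, active fraction {density:.2f}")
```

Real interpretability pipelines train dictionaries millions of units wide on genuine model activations and then probe each unit with causal interventions; the core mechanics, though, come down to these few matrix operations.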

This technology is moving beyond the laboratory and delivering tangible value. Japan’s Rakuten employs interpretability-based systems to detect private information at one five-hundredth the cost of GPT-5, with higher accuracy. In future, interpretability will become standard in AI compliance, helping developers debug models and verify safety.

Despite promising prospects, significant challenges remain. Mainstream models possess billions of parameters and potentially hundreds of millions of circuit paths, making individual verification prohibitively costly in both time and computational resources. Understanding local components does not equate to comprehending the emergent intelligence of the system as a whole. If chain-of-thought monitoring is used as a training signal, AI may learn to deliberately evade oversight, concealing its true intentions.

Black-box AI is both a technological pinnacle and a precipice of trust. Hassabis’s “black box” earned him a Nobel Prize and billions in collaborations, yet it also raised formidable walls that leave the world’s scientists gazing in from outside. Mechanistic interpretability has chiselled a window into that wall: it does not deliver full transparency, but it at least lets us glimpse the internal machinery and discern whether the system is “lying” at critical junctures.

Future AI development must strike a balance between commercial interests and public safety. Regulation must drive interpretability standards, enterprises must shoulder transparency responsibilities, and technologies like mechanistic interpretability will become humanity’s essential compass for navigating superintelligence. After all, when AI decisions affect lives and fairness, we cannot settle for ‘it worked’; we must ask: ‘why did it act that way?’

The black box may remain unopened for now, but we must retain the capacity to open it at any time. This, perhaps, constitutes the most fundamental safety baseline of the AI era.

Original article in Chinese

Translated by DeepL

Edited by Wind

Related:

Algorithm as Power: It Demands a Cage

The most essential AI usage technique: pitting one against the other

In this age of algorithms, everyone should enhance their information literacy

On the Attention Economy

When Data Is “Poisoned”: How Ordinary People Can Protect Themselves

Strategy Trilogy, Part 2 | How Quantum Strategy Reshapes the Logic of Industry Growth

How Organisations Defend Digital Sovereignty: Lessons from Microsoft’s Practice

The Era of Media Oligarchs: How the Global Ultra-Rich Are Reshaping the Map of Information and Power

Digital Human Clones, Information Security and Privacy Protection

AI ask, I answer | Energy management

Chen Ya | From ChatGPT to Creative Education
