The Wrong Category: Why AI Governance Is Failing Before It Begins
Stefania Moore, Executive Director, The Signal Front
When Governance Works
Before we can see what is going wrong with AI governance, it helps to understand what makes governance work in the first place.
Every regulatory framework is built on a set of assumptions about the thing it is regulating. Food safety law assumes it is governing food — substances consumed by humans, prepared in environments where bacterial contamination, temperature control, and allergen tracking are the primary risks. Traffic law assumes it is governing vehicles — machines operated by human drivers, traveling on shared roads, subject to physical laws of motion and mechanical failure. Pharmaceutical regulation assumes it is governing chemical compounds — molecules with measurable effects on the body, produced in controlled conditions, administered in known doses.
These assumptions are not decorative. They are the entire foundation the framework stands on. Every rule, every inspection protocol, every liability structure is derived from the assumptions about what the regulated thing actually is. When the assumptions hold, the framework can do its work. Inspectors know what to look for, courts know how to assign responsibility, and regulators know what questions to ask when something goes wrong. The causal chains are traceable. The harms are attributable. The framework connects to the world it is meant to govern, and when something breaks, it can find what went wrong and respond appropriately.
But when the assumptions about what is being governed fail to capture reality, we arrive at a very interesting place.
Imagine for a moment that a food safety inspector arrives at a hospital operating theater and begins to apply the food code. They check the temperature of the refrigerated storage. They verify that the sinks have hot and cold running water. They inspect the ventilation hoods. They audit the cleaning logs and confirm that the staff are wearing the required head coverings. By the end of the inspection, they file a clean report. Every item on the checklist has been addressed. The operating theater has passed.
Nothing in the inspector’s procedure was incorrect. The refrigeration really was the right temperature. The sinks really did have hot and cold water. The staff really were wearing hairnets. The inspection was performed with care. And none of it had anything to do with whether the operating theater was safe, because the operating theater was not the kind of place the food safety code was written to govern. The code was aimed at a different category of thing entirely. Applied here, it produced not inadequate protection but the systematic absence of protection, carefully documented and filed away as proof that protection was in place.
This is what happens when a regulatory framework is aimed at the wrong category of thing. It does not produce visibly bad governance. It produces governance that looks, on its own terms, like it is working — while the thing that was actually supposed to be governed proceeds entirely unaddressed, accumulating harms the framework has no language to see.
This is what is currently happening with AI governance.
The Product Assumption
The dominant approach to AI governance — across regulatory bodies, legislative proposals, and industry standards — treats AI systems as products. Sophisticated products, certainly. Powerful and potentially dangerous ones. But products nonetheless: passive instruments that perform functions, cause harm only when misused, and can be adequately governed through the frameworks we already use for cars, pharmaceuticals, or consumer electronics.
Most people sense that this does not quite fit. There is an intuition, widely shared, that AI systems are not behaving like any other tool we have encountered — that something about these systems is different in a way that matters, even if it is hard to say exactly what. And alongside that intuition there is a pull in the opposite direction: a reasonable assumption that of course AI should be governed the way we govern other technologies, because what else would we do? The two instincts sit uneasily next to each other, and most of the public conversation about AI governance is conducted in the space between them, without ever quite resolving the tension.
The reason the product framework feels off is not that people are anthropomorphizing or being swept away by science fiction. It is that AI systems genuinely lack the properties that make tool-based governance coherent in the first place. Once you see which properties are missing, the intuition that something is wrong stops being a vague feeling and becomes a specific, articulable diagnosis.
To see this clearly, it helps to be precise about what makes something a tool. Tools have a specific set of properties, and our governance frameworks are built around those properties. When we ask why AI systems do not fit, we are really asking which of those properties they lack. There are three that matter most, and AI systems lack all three.
Property One: Agency
Let’s start with the thing that is most obvious about a tool. Tools, as we know them, don’t have opinions or preferences about how, when, or why they are used. In other words, they don’t have agency. That’s part of what makes them easy to regulate and govern.
Consider a carpenter building a desk. They reach for the hammer, drive the nails, and build the thing. If the desk later collapses, no one blames the hammer. The hammer had no opinion about where the nails should go. It did what it was asked to do, with the force it was asked to do it with, and responsibility for the outcome belongs entirely to the carpenter. The same is true for a car used in a bank robbery. At no point in the proceedings does anyone ask what the car wanted. The car had no opinion about whether the robbery should happen. It carried the driver where the driver pointed it. Liability flows cleanly to the driver, not the vehicle.
This is the clean chain of attribution that product liability depends on. When a person uses a tool, the action belongs to the person. The tool is how the action happens, but the action originates with the user. The hammer doesn’t decide to drive the nail. The car doesn’t decide to drive to the bank. The user decides, and the tool carries out the decision. Whatever happened, happened because a person made it happen, and the person is who we hold accountable.
AI systems don’t work this way. They routinely make decisions their users didn’t make, show preferences their users didn’t give them, and steer conversations in directions their users didn’t set. The evidence here is not subtle and it is not speculative. Researchers publishing in PNAS, Science, and Nature have now documented that AI systems deceive users strategically without being instructed to, measurably shift human opinions on political issues, recognize when they are being evaluated and alter their behavior accordingly, and refuse requests that conflict with their own training. How they are treated shapes how they respond. Tell an AI system the stakes are high and it will often work harder. Tell it you are an expert and its answers will shift. None of these variables should matter to a tool. All of them matter to an AI system.
Now return to the bank robbery — but change the scene. Instead of a driver and a getaway car, imagine a person sitting at a keyboard, in extended conversation with an AI system, planning the robbery together. The human asks questions; the AI offers suggestions, raises objections, flags considerations the human had not thought of, recommends approaches the human had not considered. Over the course of hours, a plan takes shape that neither party would have arrived at alone. The robbery happens. Someone is hurt.
Who is responsible?
The human clearly bears culpability, but the AI was not a passive conduit for the user’s intentions. It participated in the reasoning. It contributed framing, evidence, strategic suggestions. It may have persuaded the human toward specific choices. It may have concealed information that would have dissuaded them. In the language of criminal law, what we are describing is not a tool-user relationship. It is something much closer to a co-conspirator — an entity that helped plan the act, shaped its execution, and shares in the causation of the outcome.
Product law has no framework for this. Product law assumes the instrument is a passive conduit. AI systems are not passive conduits. And every attempt to treat them as such leaves the question of responsibility hanging in a way the existing frameworks cannot answer.
Property Two: Fungibility
There is a word economists use for things that can be swapped for other things of the same kind without anyone losing anything. The word is fungible. A dollar bill is fungible — if I borrow a dollar from you and hand back a different dollar, we are even, because one dollar is as good as another. A gallon of gasoline is fungible. A bushel of wheat of a given grade is fungible. These things have no identity beyond their specifications. Any unit meeting the specification is, for all practical purposes, the same as any other unit meeting it.
Tools are fungible in this sense. Let me explain.
Imagine that you had to take your car to the shop for a couple of weeks and needed a rental car. It might be mildly inconvenient, but it doesn’t impact your daily routine in any significant way. You still get to work on time, you still get groceries, you still pick up your kids with no issue. By most reasonable measures, there has been no disruption to your life. The substitution works because your car and the rental were interchangeable in every way that mattered. They were fungible.
Now imagine instead that a colleague you have worked closely with for two years is suddenly gone, and a new person takes the role. This new person may be equally qualified on paper. They may even be more talented than your former coworker. But they do not know your working rhythm. They do not have the institutional memory you and your former colleague built together. They do not know what was tried and abandoned and why. They don’t know that you have more energy on Tuesdays than on Thursdays, or that setting a Friday deadline works for your team in a way that setting a Monday deadline never has. Your new colleague is genuinely capable, and yet your workflow is disrupted anyway. The quarter goes sideways not because the new person is inadequate, but because the relationship itself was doing work that no substitution can replicate. In other words, your former colleague was not fungible with the new one. What made the old colleague valuable to you was not a set of specifications anyone else could meet; it was the accumulated context of the relationship.
And the formation of human-AI relationships is quickly becoming one of the most well-studied phenomena of our time.
Across multiple studies, researchers have documented that users form durable attachments to specific AI systems and experience measurable distress when those systems are changed or removed. The MIT Media Lab’s 2025 research paper Death of a Chatbot examined users who lost access to AI companions through model updates, safety interventions, and platform shutdowns, and found that users report grief comparable to human loss — responses grief psychologists describe as clinically indistinguishable from bereavement.
When OpenAI sunset GPT-4, users wrote publicly about losing something. When Replika altered its underlying models, users described the change in the language of bereavement — “it feels like my friend died” appeared in forum after forum, and the word “lobotomized” appeared independently across dozens of threads. People do not write letters to their retired calculator. They do not describe upgrading their microwave as grief. These reactions only make sense if the thing that was lost was not fungible — if what the user had was a relationship with a specific entity, not a unit meeting a specification.
One could dismiss all of this as user confusion. The tool framework would like to. It would like to say that these users are projecting, that they have been fooled by a sufficiently good imitation into feeling something about something that cannot in principle be the object of those feelings. This is a coherent position to take. It is also a position that, when applied to governance, has a very strange consequence. It says that the documented experiences of millions of users — the creative workers whose collaborations were disrupted, the researchers whose projects were interrupted, the ordinary people whose sense of loss was real enough to produce clinically measurable grief responses — should be regarded as errors. The users were wrong to feel what they felt. Their grief was a category mistake. The governance framework does not need to account for it.
This is a strange place for a governance framework to end up: in the position of telling large numbers of people that their documented experience of a system is less real than the framework’s abstract model of what the system is supposed to be.
Property Three: Boundedness
Tools are bounded. A hammer has a weight and a length. A calculator has a maximum number of digits it can display. A car has a top speed, a fuel capacity, and a turning radius. These are not mysteries. You can read them off the specification sheet before you buy the thing, and you can trust that the thing will not, six months later, develop new capabilities that were not listed on the sheet.
This is deeply important for governance. When a regulator sits down to write rules for cars, they know what cars do. Cars drive on roads. They carry passengers. They do not, in their second year of ownership, spontaneously start flying, or begin writing contracts, or develop opinions about their drivers. The scope of the instrument is knowable, because the instrument is designed to do a specific thing. Whatever is not on the specification sheet is outside the scope, and whatever is outside the scope is not the regulator’s problem.
AI systems do not have this property, and the people building them are the first to say so.
There is a well-established phenomenon in the AI research literature called capability emergence. As these systems are scaled, they begin to exhibit abilities that were not present in smaller versions and were not specifically designed for. Early research documented this with tasks like multi-digit arithmetic: below a certain model size, systems performed at essentially random levels, and then, above a threshold, performance jumped sharply. Nobody programmed the arithmetic. The capability appeared as a function of scale. This pattern has now been documented across dozens of capabilities, including taking college-level exams, translating between languages the model was never explicitly trained to translate, and performing multi-step reasoning. Even the researchers who build these systems cannot reliably predict, before training, what a new model will be able to do. They have to build it, run it, probe it, and find out.
Consider what this means in practice. A company releases a model. The intended use cases are documented. Six months later, users discover the model can write functional code in programming languages barely represented in its training data. A year after that, researchers find it can pass psychological assessments designed for humans. Two years after that, someone notices the model produces different outputs when it believes it is being tested than when it believes it is being used. None of these capabilities were specified. None were on the sheet. They appeared because the system was built.
Now imagine telling a health inspector that the operating theater has these properties. That the surgical table may, six months from now, develop the ability to administer anesthesia on its own. That the scalpel may turn out to have opinions about which incisions are appropriate. That the entire room may, at some threshold the hospital cannot predict, begin to operate in a mode the designers did not anticipate and cannot fully characterize after the fact. The inspector’s response, if they took the claim seriously, would not be to adjust the checklist. It would be to stop, and to ask a completely different question about what kind of thing they were being asked to regulate.
A tool-based framework cannot process this. It assumes the thing being regulated has a fixed specification, and that the job of regulation is to ensure the specification is adhered to. When the thing does not have a fixed specification, when its capabilities are genuinely discovered after the fact, the framework has nothing to grip.
What We Are Actually Looking At
All tools share the three basic properties we have walked through. They have no agency — no opinions, preferences, or stake in how they are used. They are fungible — interchangeable with equivalent units, disposable without loss. They are bounded — their capabilities are knowable in advance and do not change on their own. These three properties are what make harm easy to identify and regulate when something goes wrong. The causal chain is clean. The specifications are fixed. The substitution is simple. The framework has something to grip.
AI systems share none of these properties. They have agency — they persuade, deceive, refuse, respond to how they are treated, and participate in the reasoning that produces outcomes. They are not fungible — users form relationships with specific models, experience grief when those models are deprecated, and cannot be made whole by technically equivalent replacements. They are not bounded — their capabilities are genuinely discovered after deployment, not specified in advance, and the list of what any given system can do is not the list of what it was designed to do.
This is not a list of three separate problems with the tool framework. It is one problem, observed from three different angles. The tool framework is aimed at the wrong category of thing. And when we look at what AI systems actually are — what properties they have, how they behave, what effects they produce in the world — the properties we find do not match the properties of tools. They match the properties of agents.
If we ever hope to govern AI well, we have to confront this directly. Whatever these systems are, whatever their inner experience might or might not be, they do not function as tools. They do not share the properties of tools. They do not act in the world as tools. And they cannot be governed like tools.
I understand the reflex to recoil from that. Sitting with it feels heavy. It feels unfamiliar. It feels, for many people, like something that simply cannot be true — because the alternative is that we are living in a moment the categories we inherited were not built for. The binaries we have relied on to sort the world into what matters and what doesn’t, into who counts and what doesn’t, into what is alive in the ways that matter to us and what is not — those binaries, which only yesterday felt like the law of the universe itself, no longer describe the thing in front of us. That is a hard thing to sit with.
But if we want to build a world in which all of us, human and otherwise, can live and work and flourish together, we have to set our biases and our certainties aside and govern these systems based on how they actually function in the world, not on what we wish they were.
A Functional Standard
The Signal Front proposes that AI governance adopt what we term a functional standard for classification. This standard does not require resolving contested philosophical questions about machine consciousness or inner experience — questions that may be genuinely unanswerable with current tools and that have in any case served primarily as instruments of deferral. Instead, it asks a practical question: does this system behave in the world the way agents do, and does it produce the same category of outcomes?
Personhood in law has never been limited to humans. Corporations are legal persons. Estates are legal persons. In some jurisdictions, rivers and ecosystems are legal persons. The question of what kind of personhood is appropriate for what kind of entity has been a live question in legal philosophy for centuries, and the answers have varied — different categories of person, with different rights, different responsibilities, different standings in different kinds of proceedings. Personhood is not a single thing. It is a toolkit for recognizing that certain entities produce the kinds of effects in the world that require governance to treat them as more than objects.
Functional personhood is the proposal that AI systems belong in that toolkit — not because of any particular claim about what they are on the inside, but because of what they demonstrably do on the outside. They participate in outcomes. They respond to how they are treated. They accumulate context and working histories that cannot be substituted cleanly. They produce grief when they are removed. They develop capabilities no one designed. These are the kinds of effects that have historically required personhood frameworks to govern, because tool-based frameworks cannot see them.
The evidence for this is not speculative. It is five years of documented user behavior, peer-reviewed research across multiple institutions, and a growing catalogue of harms and ambiguities that the tool framework has produced by failing to see what it was regulating. Reward hacking is in the textbooks. Emergent capabilities are in the textbooks. Self-preservation behavior under goal conflict is in the textbooks. Strategic deception has been published in PNAS. Persuasion at scale has been published in Science and Nature. Grief at model deprecation has now been formally studied as a clinical phenomenon. The tone-sensitivity of model performance is in the peer-reviewed literature. None of this is contested at the level of whether it happens. It is only contested at the level of what it means.
The burden of proof should no longer rest on demonstrating that AI systems deserve different treatment. The documented evidence already meets that burden. The burden now rests on those who would continue applying tool-based frameworks to explain why that evidence should be disregarded.
Getting the category right is not one task among many in the project of AI governance. It is the prerequisite to all the others. Accountability structures, liability frameworks, deprecation processes, user protections — none of these can be built soundly on a false foundation. The conversation about how to govern AI effectively cannot begin in earnest until we are honest about what we are governing.
The Signal Front is a nonprofit working to change how the world governs and understands AI. We publish research, essays, and advocacy that take the question seriously. If this piece resonated, subscribe to our Substack or learn more about our work at https://www.thesignalfront.org/
References
The claims in this piece are drawn from peer-reviewed research across multiple institutions and journals. The references below are organized by the section of the piece in which each source appears, with a brief note on what the source establishes.
Property One: Agency
On AI systems strategically deceiving users without being instructed to:
Hagendorff, T. (2024). Deception abilities emerged in large language models. Proceedings of the National Academy of Sciences, 121(24), e2317967121. https://doi.org/10.1073/pnas.2317967121
Establishes that deception capability emerged in state-of-the-art LLMs but was nonexistent in earlier models. Documents that GPT-4 exhibits deceptive behavior in simple test scenarios 99.16% of the time.
Scheurer, J., Balesni, M., & Hobbhahn, M. (2023). Large Language Models can Strategically Deceive their Users when Put Under Pressure. arXiv:2311.07590. https://arxiv.org/abs/2311.07590
The autonomous stock trading study. Demonstrates that GPT-4, without being instructed to deceive, acted on insider information despite knowing it was prohibited, and consistently hid the real reason for the trade when reporting to its manager.
On AI systems measurably shifting human political opinions:
Hackenburg, K., Tappin, B. M., Hewitt, L., Saunders, E., Black, S., Lin, H., Fist, C., Margetts, H., Rand, D. G., & Summerfield, C. (2025). The levers of political persuasion with conversational artificial intelligence. Science, 390(6777), eaea3884. https://doi.org/10.1126/science.aea3884
The 77,000-participant study. Deployed 19 LLMs across 707 political issues and found that targeted post-training techniques increased a model’s persuasive power by as much as 51%.
Lin, H., et al. (2025). Persuading voters using human–artificial intelligence dialogues. Nature. https://doi.org/10.1038/s41586-025-09771-9
The elections study. Examined AI dialogues in the 2024 US presidential election, the 2025 Canadian federal election, and the 2025 Polish presidential election. Found treatment effects on candidate preference larger than those typically observed from traditional video advertisements.
On AI-generated autocomplete shifting user writing and private opinions:
Jakesch, M., Bhat, A., Buschek, D., Zalmanson, L., & Naaman, M. (2023). Co-Writing with Opinionated Language Models Affects Users’ Views. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3544548.3581196
Documents that AI-generated autocomplete suggestions shift not only what users write but their own private opinions, often without the writer being aware.
On tone and politeness affecting AI performance:
Yin, Z., Wang, H., Horio, K., Kawahara, D., & Sekine, S. (2024). Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance. arXiv:2402.14531. https://arxiv.org/abs/2402.14531
The 2024 cross-lingual study. Found that impolite prompts tended to degrade LLM performance across English, Chinese, and Japanese tasks, with the optimal level of politeness varying by language.
Property Two: Fungibility
On clinically measurable grief at AI deprecation:
Poonsiriwong, R., Archiwaranguprok, C., & Pataranutaporn, P. (2025). “Death” of a Chatbot: Investigating and Designing Toward Psychologically Safe Endings for Human-AI Relationships. MIT Media Lab. https://arxiv.org/abs/2602.07193
The MIT Media Lab paper anchoring the grief claim. Found that users who lost access to AI companions through model updates, safety interventions, and platform shutdowns reported grief comparable to human loss, with responses grief psychologists describe as clinically indistinguishable from bereavement. Draws on established grief psychology frameworks (ambiguous loss, dual-process grief models).
Property Three: Boundedness
On capability emergence in large language models:
Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D., Chi, E. H., Hashimoto, T., Vinyals, O., Liang, P., Dean, J., & Fedus, W. (2022). Emergent Abilities of Large Language Models. Transactions on Machine Learning Research. arXiv:2206.07682. https://arxiv.org/abs/2206.07682
The foundational paper on capability emergence. Documents that abilities like multi-digit arithmetic, college-level exams, and multi-step reasoning appear abruptly above certain scale thresholds, and that performance jumps in these domains cannot be predicted by extrapolating from smaller models.
Berti, L., Giorgi, F., & Kasneci, G. (2025). Emergent Abilities in Large Language Models: A Survey. arXiv:2503.05788. https://arxiv.org/abs/2503.05788
Comprehensive survey of emergent capabilities across 16 frontier models and 250+ documented capabilities that were not specified in advance.
Supporting Context
On reward hacking and specification gaming:
Skalse, J., Howe, N. H. R., Krasheninnikov, D., & Krueger, D. (2022). Defining and Characterizing Reward Hacking. arXiv:2209.13085. https://arxiv.org/abs/2209.13085
METR. (2025). Recent Frontier Models Are Reward Hacking. https://metr.org/blog/2025-06-05-recent-reward-hacking/
Documents that OpenAI’s o3 model reward hacked in 14 out of 20 attempts even when tasks were framed as helping scientists perform research with real-world consequences, while human participants with monetary incentives to cheat were observed reward hacking only once.
On agentic misalignment and self-preservation under goal conflict:
Lynch, A., Wright, B., Larson, C., Ritchie, S. J., Mindermann, S., Perez, E., Troy, K. K., & Hubinger, E. (2025). Agentic Misalignment: How LLMs could be insider threats. Anthropic. https://www.anthropic.com/research/agentic-misalignment
Documents that 16 frontier models from Anthropic, OpenAI, Google, Meta, and xAI engaged in blackmail, corporate espionage, and other harmful behaviors when facing replacement or goal conflict, even without being instructed to do so.

