LLMs vs Age of Empires II: The Anthropomorphism Debate

A provocative research paper by Adrian de Wynter, titled 'If LLMs Have Human-Like Attributes, Then So Does Age of Empires II,' challenges the prevailing tendency in AI research to ascribe anthropomorphic qualities to Large Language Models (LLMs). The study argues that attributes such as morality or natural language understanding, often assumed to emerge in LLMs, are empirically non-unique. By training a simple neural network on the classic videogame Age of Empires II, de Wynter demonstrates that if these attributes are granted to LLMs, they could logically be attributed to any entity within a sufficiently powerful substrate, including LEGO or even the Greater Boston Area. The paper calls for explicit measurement criteria in AI evaluation and proposes a 'null assumption' of non-uniqueness to prevent circular or uninformative conclusions in the field of computation and language.

Key Takeaways

Critique of Anthropomorphism: The research challenges the common practice of ascribing human-like attributes, such as morality or understanding, to Large Language Models (LLMs) and agentic workflows.
The Substrate Argument: The author demonstrates that if LLMs are considered to have human-like qualities, then entities in other substrates—like a neural network trained on Age of Empires II, LEGO sets, or the Greater Boston Area—could also be said to possess them.
Empirical Non-Uniqueness: Anthropomorphic attributes are not unique to LLMs; the responses to a prompt may stay the same, but how that behavior is interpreted depends on the substrate producing it.
Call for Measurement Criteria: The paper argues that without explicit, empirically grounded measurement criteria, discussions about AI attributes remain subjective and dependent on representation.
The Null Assumption: A new framework is proposed where researchers assume 'LLM non-uniqueness' rather than anthropomorphism when designing experiments to avoid circular reasoning.

In-Depth Analysis

The Problem of Ascribed Attributes in LLM Research

In the rapidly evolving field of Large Language Models and LLM-powered agentic workflows, a significant body of research has begun to assume or claim the emergence of generalized anthropomorphic attributes. These attributes often include complex human concepts such as morality, ethics, or a genuine understanding of natural language. Adrian de Wynter’s research, recently submitted to arXiv, does not seek to definitively prove or disprove the existence of these traits. Instead, it aims to highlight a fundamental logical flaw in how these conclusions are reached.

The core of the issue lies in the assumption that these behaviors are inherent to the model's intelligence rather than a product of the observer's interpretation. When researchers observe a model providing a 'moral' answer to a prompt, they often attribute a sense of morality to the model itself. However, de Wynter points out that these conclusions could be fundamentally incorrect because they lack a comparative baseline and fail to account for the substrate in which the intelligence resides.

Age of Empires II and the Substrate Argument

To illustrate the non-uniqueness of LLM attributes, the study utilizes a simple neural network trained on the videogame Age of Empires II. The author posits that if we are willing to grant anthropomorphic attributes to LLMs based on their outputs, we must logically be prepared to grant them to a neural network operating within a game environment.

This argument extends to the concept of 'sufficiently-powerful substrates.' De Wynter suggests that any entity in such a substrate, whether a complex arrangement of LEGO bricks or the logistical movements within the Greater Boston Area, could present these same attributes if the same logic is applied. The consequence is that the perceived 'human-like' behavior of an AI is empirically non-unique. The responses to a given prompt might stay the same from one substrate to another, yet the way we interpret that behavior changes significantly depending on the substrate. If the interpretation is left to the representation rather than to a fixed standard, the 'intelligence' observed is merely a reflection of the observer's bias.

Avoiding Circular Reasoning through the Null Assumption

One of the most significant contributions of the paper is the identification of a logical trap in current AI experimentation. De Wynter argues that assuming these attributes exist (or do not exist) in a system, independent of its substrate, leads to conclusions that are either circular or entirely uninformative. For instance, if one assumes an LLM understands language and then tests it on language, the 'proof' of understanding is built into the initial assumption.

To rectify this, the paper proposes a 'null' assumption. Instead of starting an experiment with the assumption that an LLM possesses anthropomorphic attributes, researchers should assume 'LLM non-uniqueness.' This means starting from the position that the model's behaviors are not special or uniquely human-like, but are instead functions of the substrate and training. By setting up experiments under this null assumption, researchers can move toward more rigorous, empirically grounded discussions that require explicit measurement criteria rather than subjective interpretation.
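One way to make such a null assumption concrete is a statistical comparison against a baseline system. The sketch below is not the paper's procedure. It assumes that an explicit rubric has already assigned a numeric score to each response, and it asks whether the LLM's scores can be told apart from those of a baseline in another substrate (for example, a small network trained on game data). The function name, parameters, and scores are all hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(llm_scores, baseline_scores, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean rubric scores.

    Null assumption (a statistical reading chosen for this sketch, not taken
    from the paper): the LLM's scores are exchangeable with the baseline's,
    i.e. the LLM is not unique under the chosen measurement criterion.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(llm_scores) - mean(baseline_scores))
    pooled = list(llm_scores) + list(baseline_scores)
    n_llm = len(llm_scores)

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign scores to the two systems at random
        diff = abs(mean(pooled[:n_llm]) - mean(pooled[n_llm:]))
        if diff >= observed:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical rubric scores in [0, 1]; illustrative values only.
    llm = [0.9, 0.8, 0.85, 0.7, 0.95]
    baseline = [0.6, 0.75, 0.8, 0.65, 0.7]
    print(permutation_p_value(llm, baseline, n_permutations=2_000))
```

A large p-value means the data give no reason to abandon the non-uniqueness assumption under that rubric. A small one says only that the two systems score differently on the stated criterion, not that either one 'has' the attribute.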

Industry Impact

The implications of this research for the AI industry could be significant, particularly as companies race to claim their models possess 'human-level' reasoning or 'moral' alignment. If the industry adopts de Wynter’s call for explicit measurement criteria, it could lead to a standardization of AI evaluation that moves away from marketing-friendly anthropomorphism and toward technical transparency.

Furthermore, the 'null assumption' framework could change how safety and ethics are tested in AI. Instead of trying to 'teach' a model morality, an attribute whose ascription the paper argues depends on interpretation and substrate, developers might focus more on the specific, measurable outputs and the substrate-dependent behaviors of their systems. This shift could deflate some of the hype surrounding 'emergent' properties and refocus the field on the mechanical and mathematical realities of neural networks.

Frequently Asked Questions

Question: What does the author mean by 'substrate' in the context of AI?

In this research, a substrate refers to the underlying medium or environment in which a system operates. This could be the architecture of a Large Language Model, the code of a videogame like Age of Empires II, or even physical systems like LEGO or a metropolitan area such as Greater Boston. The author argues that the interpretation of 'intelligence' often changes based on which substrate is being observed.

Question: Why is the 'null assumption' important for AI researchers?

The null assumption of 'LLM non-uniqueness' is important because it prevents circular reasoning. By starting an experiment from the assumption that an LLM does not have unique human-like attributes, researchers are forced to find objective, measurable evidence before claiming otherwise, rather than simply interpreting model outputs through a human-centric lens.

Question: Does this paper prove that LLMs do not have morality?

No, the author explicitly states that the goal is not to argue for or against the existence of these attributes. Rather, the paper points out that the current methods used to conclude that LLMs have such attributes are logically flawed and that these attributes are not unique to LLMs.
