
Anthropic Accidentally Gives the World a Peek Into Its Model’s ‘Soul’

Artificial intelligence models don’t have souls, but one of them does apparently have a “soul” document. A person named Richard Weiss was able to get Anthropic’s latest large language model, Claude Opus 4.5, to produce a document referred to as a “Soul overview,” which was seemingly used to shape how the model interacts with users and presents its “personality.” Amanda Askell, a philosopher who works on Anthropic’s technical staff, confirmed that the overview produced by Claude is “based on a real document” used to train the model.

In a post on Less Wrong, Weiss said that he prompted Claude for its system message, the set of instructions given to the model by the people who trained it that tells the LLM how to interact with users. In response, Claude highlighted several supposed documents it had been given, including one called “soul_overview.” Weiss asked the chatbot to produce that document specifically, and Claude spit out an 11,000-word guide to how the LLM should carry itself.

The document includes numerous references to safety, attempting to imbue the chatbot with guardrails to keep it from producing potentially dangerous or harmful outputs. The LLM is told by the document that “being truly helpful to humans is one of the most important things Claude can do for both Anthropic and for the world,” and forbidden from doing anything that would require it to “perform actions that cross Anthropic’s ethical bright lines.”

Weiss has apparently made a habit of searching for these types of insights into how LLMs are trained and operate, and said on Less Wrong that it’s not uncommon for models to hallucinate documents when asked to produce system messages. (Seems not great that the AI can make up what it thinks it was trained on, though who knows if its behavior is in any way affected by a made-up document generated in response to user prompting.) But the “soul overview” seemed legitimate to him: he claims that he prompted the chatbot to reproduce the document 10 times, and it spit out the exact same text in each and every instance.

Users on Reddit were also able to get Claude to produce snippets of the same document with identical text, suggesting that the LLM was pulling from something accessible internally in its training data rather than improvising.

Turns out his instincts may have been right. On X, Askell confirmed that the output from Claude is based on a document that was used during the model’s supervised learning period. “It’s something I’ve been working on for a while, but it’s still being iterated on and we intend to release the full version and more details soon,” she wrote. Askell added, “The model extractions aren’t always completely accurate, but most are pretty faithful to the underlying document. It became endearingly known as the ‘soul doc’ internally, which Claude clearly picked up on, but that’s not a reflection of what we’ll call it.”

Gizmodo reached out to Anthropic for comment on the document and its reproduction via Claude, but did not receive a response at the time of publication.

The so-called soul of Claude may just be some guidance to keep the chatbot from going off the rails, but it’s notable that a user was able to get the model to access and reproduce that document, and that we actually get to see it. So little of the sausage-making of AI models has been made public that any glimpse into the black box comes as a surprise, even if the guidelines themselves seem pretty straightforward.

Original Source: https://gizmodo.com/anthropic-accidentally-gives-the-world-a-peek-into-its-models-soul-2000694624

Disclaimer: This article is a reblogged/syndicated piece from a third-party news source. Content is provided for informational purposes only. For the most up-to-date and complete information, please visit the original source. Digital Ground Media does not claim ownership of third-party content and is not responsible for its accuracy or completeness.
