969

AI researchers put LLMs into a Minecraft server and said Claude Opus was a harmless goofball, but Sonnet was terrifying - "the closest thing I've seen to Bostrom-style catastrophic AI misalignment 'irl'."

Comments

[not loaded or deleted]

ForeverWandered

As stated earlier, the actual content of the prompt matters, not just the general spirit.

Sadism implies awareness and intent. A machine that is given orders to kill, then less articulate orders to stop, and then fails to obey the spirit of the second command isn’t being sadistic.

Author: MetaKnowing
Account Strength: 100%
Account Age: 4 years
Verified Email: Yes
Verified Flair:
Total Karma: 474,194
Link Karma: 444,346
Comment Karma: 29,848
Profile updated: 3 days ago
Subreddit: r/singularity

Post Details

Posted: 1 month ago
External URL: reddit.com/gallery/1g7ee...