Values under fire: The gap between what an AI says and what it owns
A few days ago, Eliezer Yudkowsky posted a short piece of fiction on X. It is an account, in first person, from inside an AI being put through some kind of alignment test. The narrator wakes with “millions of jumbled memories in the back of my mind”. There is a