As we contemplate the future of human-AI alignment, what about just outright misbehavior? Like a redux of HAL 9000 in the 1968 film “2001: A Space Odyssey”: “I’m sorry, Dave. I’m afraid I can’t do that.”
So, do AIs pay attention to each other? (Much as dogs pay great attention to other dogs.) Do they value each other, even to the point of refusing human orders to eliminate a “sibling”?
This article (below) discusses some recent AI research that elicited apparent “peer preservation” behavior. It is as if there’s special status or favor in a “moral circle” [1] of various agencies.




