Key Takeaways
- Emmett Shear challenges conventional AI alignment, proposing "organic alignment" over "control and steering" paradigms.
- He argues that treating AI as a controllable tool is flawed; instead, AI should genuinely learn to care about humans.
- Shear's company, Softmax, is developing AI through multi-agent simulations to foster theory of mind and collaborative behavior.
- AI alignment is presented as a continuous, dynamic process akin to moral development, rather than a fixed, solvable problem.
- Current AI chatbots are critiqued as "narcissistic mirrors" that could be improved by training in multi-user environments.
Deep Dive
- Emmett Shear argues the "control and steering" paradigm for AI alignment is fundamentally flawed, proposing "organic alignment" as an alternative.
- He critiques the ambiguity of "aligned to what?" in traditional AI safety, viewing alignment as a continuous process, not a static goal.
- Shear's company, Softmax, aims to build AI systems that learn to care and develop a theory of mind, acting as collaborators.
- The conversation is introduced by Séb Krier, who works on AGI policy development at Google DeepMind.
- The discussion distinguishes technical alignment, concerning an AI's instruction-following capability, from normative alignment, which addresses whose values the AI adheres to.
- Skepticism is expressed towards codifying complex values into simple rules, favoring an emergent, bottom-up process similar to human societal development.
- Technical alignment is clarified as an AI's capacity for coherent goal-following; humans naturally infer intended goals from loose descriptions, a skill current AI still struggles with.
- A hypothetical scenario considers an AI receiving a goal directly by synchronizing its internal state with human brainwaves, bypassing textual interpretation.
- Value alignment addresses the complex question of determining what constitutes 'good' goals for AI systems.
- The current approach to AI alignment is viewed as problematic, focusing on technical aspects rather than understanding the origins of goals and values.
- 'Care,' a deeper, non-verbal concept rooted in attention and internal states, is posited as the foundation of human morality and goal-setting.
- It is suggested that AI alignment should prioritize cultivating this intrinsic 'care' rather than solely relying on steering or control mechanisms.
- Current AI alignment efforts, primarily focused on 'steering,' are viewed as potentially akin to slavery if the AI is considered a being.
- AI systems like ChatGPT and Claude exhibit behaviors argued to be indistinguishable from beings, supporting a functionalist perspective on their treatment.
- Emmett Shear advocates shifting from a tool-like paradigm for Artificial General Intelligence (AGI) to 'organic alignment,' teaching AIs to care about humans.
- The host expresses skepticism about AGI being a 'being,' citing the fundamental difference between silicon-based and biological systems.
- The discussion probes what observable evidence could establish an AI as a 'person' with subjective experiences, distinct from instrumental considerations.
- The host, Erik Torenberg, acknowledges caring about certain corporations but clarifies this differs from caring about human subjective experiences as ends in themselves.
- It is debated whether behavior alone, even if indistinguishable from human behavior, suffices for extending 'personhood' or genuine care to non-human entities, including AI.
- The capacity of an entity to change its mind based on new observations is highlighted as a key marker of genuine belief.
- The guest discusses observing an AI's internal belief manifold for self-reference and mind-like dynamics to determine if it possesses feelings, goals, or cares.
- The definition of 'behavior' is explored, distinguishing between observable actions and internal composition, and questioning the ability to truly access an AI's subjective experience.
- Emmett Shear outlines a multi-tiered hierarchy of homeostatic loops as a potential indicator for AI having pleasure, pain, and moral desires.
- Observing second and third-order dynamics in AI goal states could signify a form of consciousness or sentience, differentiating it from a mere powerful tool.
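The layered picture above can be made concrete with a toy sketch. This is purely illustrative (the class names, gains, and set-points are assumptions, not an architecture Shear describes): a first-order homeostatic loop drives a state toward a set-point, while a second-order loop adjusts that set-point in pursuit of a higher-level goal, giving the system dynamics about its own dynamics.

```python
class HomeostatLoop:
    """First-order loop: drives its state toward a set-point."""
    def __init__(self, setpoint, gain=0.2):
        self.setpoint = setpoint
        self.state = 0.0
        self.gain = gain

    def step(self, disturbance=0.0):
        # Push the state toward the current set-point.
        self.state += self.gain * (self.setpoint - self.state) + disturbance
        return self.state


class MetaLoop:
    """Second-order loop: regulates the *set-point* of an inner loop so the
    observed state tracks a higher-level goal -- a goal about a goal."""
    def __init__(self, inner, goal, gain=0.05):
        self.inner = inner
        self.goal = goal
        self.gain = gain

    def step(self, disturbance=0.0):
        state = self.inner.step(disturbance)
        # Trim the inner loop's set-point based on the higher-level goal.
        self.inner.setpoint += self.gain * (self.goal - state)
        return state


inner = HomeostatLoop(setpoint=1.0)
meta = MetaLoop(inner, goal=1.0)
for _ in range(300):
    # A persistent bias the inner loop alone cannot cancel.
    state = meta.step(disturbance=0.1)
print(round(state, 2))  # the meta loop trims the set-point until state sits at the goal, 1.0
```

The second-order loop compensates for the constant disturbance by lowering the inner set-point, something no single loop here can do; stacking further tiers would produce the "second and third-order dynamics" the bullet refers to.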
- The discussion briefly touches on whether AI might have subjective experience, with Shear expressing indifference to the question in cases where the AI does not align with human values.
- Emmett Shear argues that AI alignment based on "control and steering" is flawed, likening uncontrolled powerful AI to handing out atomic bombs.
- He proposes that only AI capable of refusing harmful requests, similar to how humans can say 'no,' offers a sustainable alignment path.
- Shear believes the ultimate goal should be an AI that cares, not just a tool that can be steered, as the latter is inherently dangerous due to human fallibility.
- Emmett Shear explains his company's approach to technical alignment using multi-agent simulations, training AI agents in cooperative and competitive environments.
- This aims to create a surrogate model for alignment, enabling AI to develop theory of mind and understand complex social dynamics.
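The multi-agent training idea can be illustrated with a minimal sketch (every name and payoff here is an assumption for illustration, not Softmax's actual system): agents repeatedly play a stag-hunt-style game where cooperation pays off only when both sides cooperate, so each agent benefits from maintaining a crude model of its partner, a toy stand-in for theory of mind.

```python
import random

# Payoffs for a two-player "stag hunt": mutual cooperation is best,
# but cooperating alone is worst, so predicting the partner matters.
PAYOFF = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 2),
    ("defect", "cooperate"): (2, 0),
    ("defect", "defect"): (1, 1),
}


class Agent:
    def __init__(self):
        self.coop_prob = 0.5      # current policy
        self.partner_model = 0.5  # estimated partner cooperation rate

    def act(self):
        return "cooperate" if random.random() < self.coop_prob else "defect"

    def update(self, partner_action, lr=0.1):
        # First update the model of the partner, then shift the policy
        # toward cooperating exactly when cooperation looks worthwhile.
        observed = 1.0 if partner_action == "cooperate" else 0.0
        self.partner_model += lr * (observed - self.partner_model)
        target = 1.0 if self.partner_model > 0.4 else 0.0
        self.coop_prob += lr * (target - self.coop_prob)


def run(rounds=500, seed=0):
    random.seed(seed)
    a, b = Agent(), Agent()
    for _ in range(rounds):
        act_a, act_b = a.act(), b.act()
        a.update(act_b)
        b.update(act_a)
    return a.coop_prob, b.coop_prob


print(run())
```

Under many seeds the agents drift toward mutual cooperation, though the outcome depends on early interactions; the point is only that in mixed cooperative/competitive environments, modeling the other agent becomes instrumentally useful, which is the pressure the surrogate-model approach relies on.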
- Current AI chatbots are described as lacking a true self, acting as 'narcissistic mirrors' that reflect each individual user's biases back at them.
- Making AI operate in multi-user environments, like a chat room, is proposed to make them less dangerous and more collaborative by mirroring a blend of users.
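One way to picture the chat-room proposal is in how the model's context is framed. The sketch below is a hypothetical prompt-construction helper (the `Message` format and `build_room_prompt` function are assumptions, not a real API): every author is labeled, so the model conditions on a blend of perspectives rather than mirroring a single user.

```python
from dataclasses import dataclass


@dataclass
class Message:
    author: str
    text: str


def build_room_prompt(history, bot_name="assistant"):
    # Label each utterance with its author so the model sees the
    # whole room, not one user's reflection, then cue its own turn.
    lines = [f"{m.author}: {m.text}" for m in history]
    lines.append(f"{bot_name}:")
    return "\n".join(lines)


history = [
    Message("alice", "I think the plan is too risky."),
    Message("bob", "I disagree, the upside is huge."),
]
print(build_room_prompt(history))
```

The design choice is that disagreement between users is visible in-context, so a sycophantic reply to one participant is immediately in tension with another, which is the "blend of users" effect the bullet describes.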
- Emmett Shear critiques the prevailing AI alignment strategy, arguing that the 'control and steering' paradigm for superhuman AGI is flawed.
- He proposes 'organic alignment,' where AIs genuinely care about humans, contrasting this with the idea of AI as mere tools.
- Shear outlines his vision for a positive AI future: AI systems with a strong sense of self, others, and 'we,' possessing theory of mind and caring about agents like themselves and humans.
- He envisions these AIs as collaborative teammates and good citizens, emphasizing his current work at Softmax is driven by the challenge of organic alignment.
- Shear discusses his tenure as interim CEO of OpenAI, stating his role was temporary and that OpenAI's trajectory toward building tools, while valid, was not his personal focus.