Key Takeaways
- General Intuition leveraged 3.8 billion action-labeled game clips from Medal to build world models.
- Founder Pim De Witte declined a $500 million OpenAI offer, instead raising $134 million to establish an independent world model lab.
- The company develops vision-based AI agents that learn from gameplay for complex spatial reasoning and actions.
- These world models transfer capabilities from games to real-world video and are designed for robotics applications.
- General Intuition aims for spatial-temporal foundation models to power 80% of future atoms-to-atoms AI interactions.
Deep Dive
- Medal, with 12 million users and 3.8 billion video clips, captures footage with associated actions, creating a vast dataset for AI agents.
- The platform's privacy-first approach logs anonymized actions instead of specific key presses, resulting in almost perfect training data.
- A key innovation was retroactive video recording, which lets users save a clip after an event has already happened, increasing the data's value for training.
- General Intuition's agents operate solely on visual input (pixels) to predict real-time actions in games.
- These imitation learning models, trained on game highlights, demonstrate human-like and superhuman capabilities.
- Initial memory for these agents is limited to four seconds, enabling rapid reaction to game states.
- World models integrate game features, such as mouse sensitivity, and handle rapid movements to generate novel scenarios.
- Demonstrations include realistic camera shake during explosions, even when it was not present in the original footage, and consistent behavior across views.
- These models handle partial observability, continuing to track objects after they leave the field of view.
- Agents also demonstrate advanced spatial reasoning capabilities, such as hiding and peeking around corners.
- The General Intuition team was formed with expertise in models such as Genie and SIMA, with a focus on generalization capabilities.
- SIMA research showed that agents trained on nine games could navigate a tenth, unseen game with performance comparable to specialized agents.
- The approach applies the LLM concept to world models by training on vast gameplay data to predict actions and outcomes.
- Foundational papers such as DIAMOND, which enabled playable world models on consumer GPUs with limited data, garnered significant attention from major AI labs.
- General Intuition chose to remain independent, aiming to surpass competitors by concentrating on spatial-temporal agents and building a foundation model for that domain, much as Anthropic concentrated on code.
- The research team is expanding, with lead researchers from projects like Tyre and core contributors from the DIAMOND paper joining.
- Researchers recognized the unique potential of Medal's dataset for real-world applications, influencing their decision to join GI.
- Founders with proprietary datasets are advised to model their data to understand its unique capabilities before engaging with larger labs.
- It is crucial to understand whether data is intended for language models versus world models and how scaling laws might apply.
- General Intuition decided to build its own models, aligning with game developers and the gaming industry, rather than selling or licensing its foundational action-labeled data.
- World models are defined as systems that understand a full range of possibilities and outcomes based on actions, generating the next state, rather than just predicting the next frame.
- This involves understanding physics and material interactions, which increases simulation complexity exponentially with more agents and degrees of freedom.
- General Intuition's approach focuses on frame-based input and action output, advocating for video transfer and interaction with challenging environments as a maximal bet for world models.
- General Intuition partners with major game developers and engines to replace deterministic systems like behavior trees with a 'frames in, actions out' API.
- This API streams game frames and predicts actions, aiming for steerable agents that mimic human-like intuition in various situations, extending to real-world robotics.
- Game developers are interested in General Intuition's technology to improve bot quality, which is crucial for player retention across diverse games, including truck simulators.
- General Intuition aims to develop a general agent capable of playing any game in real time, extending to simulations such as GTA 5 and truck simulators.
- Their strategy leverages naturally occurring negative events and adversity from game clips to significantly reduce data acquisition burdens, targeting a 1% to 10% data reduction for companies controlling robots.
- Medal's clips serve as 'episodic memory of simulation,' facilitating the transition from imitation learning to reinforcement learning by analyzing undesirable outcomes for reward model training.
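
The 'frames in, actions out' interface and the four-second memory described above can be illustrated with a minimal Python sketch. It is not General Intuition's actual API: the `Action` fields, the `FrameWindowAgent` class, the 30 fps rate, and the `toy_policy` function are all hypothetical stand-ins, and the policy is a hand-written rule where the real system would use a learned model. The sketch only shows the shape of the loop: frames stream in, a rolling window keeps the last four seconds, and each step returns an action.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List

# Hypothetical frame format: a tiny grayscale image as rows of pixel values.
Frame = List[List[int]]


@dataclass
class Action:
    """Hypothetical action format: relative mouse movement plus a button label."""
    dx: float
    dy: float
    button: str


class FrameWindowAgent:
    """Keeps only the most recent frames and maps them to an action."""

    def __init__(
        self,
        policy: Callable[[List[Frame]], Action],
        fps: int = 30,                # assumed frame rate, not from the source
        memory_seconds: float = 4.0,  # four-second memory, as in the summary
    ) -> None:
        self.max_frames = int(fps * memory_seconds)
        # A deque with maxlen silently drops the oldest frame when full.
        self.frames: Deque[Frame] = deque(maxlen=self.max_frames)
        self.policy = policy

    def step(self, frame: Frame) -> Action:
        """Frames in, actions out: add one frame, return one action."""
        self.frames.append(frame)
        return self.policy(list(self.frames))


def toy_policy(frames: List[Frame]) -> Action:
    """Placeholder for a learned model: turn toward the brighter half
    of the most recent frame."""
    latest = frames[-1]
    mid = len(latest[0]) // 2
    left = sum(sum(row[:mid]) for row in latest)
    right = sum(sum(row[mid:]) for row in latest)
    if right > left:
        dx = 1.0
    elif left > right:
        dx = -1.0
    else:
        dx = 0.0
    return Action(dx=dx, dy=0.0, button="none")


agent = FrameWindowAgent(toy_policy)
action = agent.step([[0, 0, 9, 9], [0, 0, 9, 9]])  # bright region on the right
```

Because the agent sees only pixels and a bounded window, the same loop could in principle be pointed at any frame source, which is the property the summary attributes to the API when it mentions extending from games to robotics.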