World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI

Key Takeaways

General Intuition leveraged 3.8 billion action-labeled game clips from Medal to build world models.
Founder Pim De Witte declined a $500 million OpenAI offer, instead raising $134 million to establish an independent world model lab.
The company develops vision-based AI agents that learn complex spatial reasoning and action from gameplay.
Its world models transfer capabilities learned in games to real-world video and are designed with robotics applications in mind.
General Intuition aims to build spatial-temporal foundation models that will power 80% of future atoms-to-atoms AI interactions.

Medal, with 12 million users and 3.8 billion video clips, captures footage with associated actions, creating a vast dataset for AI agents.
The platform takes a privacy-first approach, logging anonymized actions rather than the specific keys pressed, and the result is still near-perfect training data.
A key innovation was retroactive recording, which lets users save a clip after an event has already happened; this makes the captured data more valuable for training.
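Retroactive recording amounts to a rolling buffer: the client keeps the last few seconds of frames in memory and writes them out only when the user asks for a clip. The sketch below pairs each frame with an abstract action label instead of the raw key, in the spirit of the privacy-first logging described above. It is a minimal illustration under stated assumptions, not Medal's implementation: the key-to-action table, the 30 fps rate, the 30-second default, and the `ClipRecorder` and `LabeledFrame` names are all hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass

# Hypothetical mapping from physical keys to abstract actions. Only the
# abstract label is stored, so typed text (chat, passwords) is never logged.
KEY_TO_ACTION = {
    "w": "move_forward",
    "s": "move_back",
    "a": "strafe_left",
    "d": "strafe_right",
    "space": "jump",
    "mouse1": "fire",
}


@dataclass
class LabeledFrame:
    frame_id: int
    action: str  # abstract action label, or "none" for unmapped/no input


class ClipRecorder:
    """Rolling buffer that lets a clip be saved after the event happened."""

    def __init__(self, seconds: float = 30.0, fps: int = 30):
        # deque(maxlen=...) drops the oldest entry once the buffer is full.
        self.buffer: deque[LabeledFrame] = deque(maxlen=int(seconds * fps))

    def on_frame(self, frame_id: int, pressed_key: str | None) -> None:
        # Map the raw key to an abstract action; unmapped keys become "none".
        action = KEY_TO_ACTION.get(pressed_key, "none") if pressed_key else "none"
        self.buffer.append(LabeledFrame(frame_id, action))

    def save_clip(self) -> list[LabeledFrame]:
        # Called when the user presses the clip hotkey: snapshot the recent past.
        return list(self.buffer)
```

For example, `ClipRecorder(seconds=2, fps=5)` holds 10 frames; after 25 calls to `on_frame`, `save_clip()` returns frames 15 through 24, so the moment before the user reacted is still captured.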

General Intuition's agents operate solely on visual input (pixels) and predict actions in real time.
These imitation-learning models, trained on game highlights, demonstrate human-like and even superhuman capabilities.
The agents' initial memory spans only four seconds, which is geared toward rapid reactions to the current game state.
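A four-second memory can be pictured as a fixed-length window of recent frames that the policy sees at every step, with nothing else about the game exposed to it. The sketch below assumes 30 fps (so a 120-frame window) and treats the trained imitation-learning network as an opaque callable; `PixelAgent`, `Frame`, and `Policy` are hypothetical names, not General Intuition's code.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Sequence

Frame = bytes  # raw pixels for one frame: the agent's only input
Policy = Callable[[Sequence[Frame]], str]  # stand-in for the trained network


class PixelAgent:
    """Frames-only agent with a fixed-length memory window (hypothetical)."""

    def __init__(self, policy: Policy, memory_seconds: float = 4.0, fps: int = 30):
        self.policy = policy
        # Frames older than memory_seconds fall out of the window automatically.
        self.window: deque[Frame] = deque(maxlen=int(memory_seconds * fps))

    def step(self, frame: Frame) -> str:
        self.window.append(frame)
        # The policy sees only the last few seconds of pixels, no game state.
        return self.policy(list(self.window))
```

Keeping the window short bounds the compute per step, which matters when an action has to be produced every frame.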

The world models account for game-specific settings such as mouse sensitivity, handle rapid movements, and can generate novel scenarios.
Demonstrations show realistic camera shake during explosions, even where none appears in the original footage, and consistent behavior across different views.
The models cope with partial observability, continuing to track objects after they leave the field of view.
Agents also demonstrate advanced spatial reasoning capabilities, such as hiding and peeking around corners.

The General Intuition team brings expertise from projects such as Genie and SIMA, with a focus on generalization.
SIMA research showed that agents trained on nine games could navigate a tenth, unseen game with performance comparable to agents specialized for that game.
The approach carries the LLM recipe over to world models: train on vast amounts of gameplay data to predict actions and their outcomes.
Foundational papers such as DIAMOND, which made playable world models possible on consumer GPUs with limited data, drew significant attention from major AI labs.

General Intuition chose to remain independent, aiming to surpass competitors by focusing on spatial-temporal agents and building a foundation model for them, much as Anthropic has focused on code.
The research team is growing: lead researchers from projects such as Tyre and core contributors to the DIAMOND paper have joined.
These researchers recognized the unique potential of Medal's dataset for real-world applications, which influenced their decision to join General Intuition (GI).

Founders with proprietary datasets are advised to train models on their data, to learn what it uniquely enables, before engaging with larger labs.
It is crucial to understand whether the data is better suited to language models or to world models, and how scaling laws might apply to it.
General Intuition decided to build its own models, aligning with game developers and the gaming industry, rather than selling or licensing its foundational action-labeled data.

Here, a world model is defined as a system that understands the full range of possible outcomes of an action and generates the next state, rather than merely predicting the next frame.
This requires modeling physics and material interactions, and simulation complexity grows exponentially as more agents and degrees of freedom are added.
General Intuition's approach takes frames as input and produces actions as output; the team sees transfer to real-world video, combined with interaction in challenging environments, as the maximal bet on world models.
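The distinction can be made concrete with a toy: a next-frame predictor outputs a single future, while a world model maps (state, action) to the next state and can therefore be queried once per candidate action. Querying it exhaustively also shows the exponential growth: with k actions per agent and N agents, one step has k^N joint actions, and a horizon of T steps has k^(N·T) action sequences. In the sketch below, the 1-D track, the clamping rule, and the `world_model` and `rollouts` functions are hypothetical stand-ins for a learned model, chosen only to keep the counting visible.

```python
from itertools import product

# Toy state: positions of N agents on a 1-D track with cells 0..9.
State = tuple[int, ...]
ACTIONS = (-1, 0, 1)  # each agent can step left, stay, or step right


def world_model(state: State, joint_action: tuple[int, ...]) -> State:
    """Stand-in for a learned model: next state given current state and actions."""
    # Clamping to the track edges plays the role of a (very) crude physics rule.
    return tuple(min(9, max(0, s + a)) for s, a in zip(state, joint_action))


def rollouts(state: State, horizon: int) -> list[State]:
    """Apply the model to every joint action sequence up to `horizon` steps.

    Returns one final state per sequence (duplicates kept), so the result has
    len(ACTIONS) ** (len(state) * horizon) entries.
    """
    frontier = [state]
    for _ in range(horizon):
        frontier = [
            world_model(s, joint)
            for s in frontier
            for joint in product(ACTIONS, repeat=len(state))
        ]
    return frontier
```

With three actions per agent and a horizon of three steps, one agent yields 27 branches, two agents yield 729, and three yield 19,683, which is why exhaustive simulation quickly becomes impractical and a learned model of likely outcomes is attractive.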

General Intuition partners with major game developers and engines to replace deterministic systems like behavior trees with a 'frames in, actions out' API.
The API takes a stream of game frames and predicts actions from them. The aim is steerable agents that act with human-like intuition across situations, with the same approach extending to real-world robotics.
Game developers are interested in General Intuition's technology to improve bot quality, which is crucial for player retention across diverse games, including truck simulators.
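From a game developer's side, the contrast is between hand-authored rules that read privileged game state and a model that only receives rendered frames. The sketch below is hypothetical throughout: the actual API is not described here, so `FramesInActionsOut`, `Action`, `behavior_tree_bot`, and `run_bot` are illustrative names, and steering is modeled as a free-text instruction string purely as an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class Action:
    move: tuple[float, float]  # (x, y) movement intent in [-1, 1]
    look: tuple[float, float]  # mouse delta
    buttons: frozenset[str]    # e.g. {"fire"}


class FramesInActionsOut(Protocol):
    """Hypothetical interface: rendered frames in, controller actions out."""

    def act(self, frame: bytes, instruction: str) -> Action: ...


def behavior_tree_bot(enemy_visible: bool) -> Action:
    # Simplified stand-in for a behavior tree: fixed rules over game state.
    if enemy_visible:
        return Action((0.0, 0.0), (0.0, 0.0), frozenset({"fire"}))
    return Action((0.0, 1.0), (0.0, 0.0), frozenset())


def run_bot(
    model: FramesInActionsOut, frames: Iterable[bytes], instruction: str
) -> Iterator[Action]:
    """Stream frames to the model and yield one action per frame."""
    for frame in frames:
        # The model sees only pixels; steering comes from the instruction.
        yield model.act(frame, instruction)
```

The design point is the narrow boundary: because the model needs only frames, the same integration could in principle sit on top of any engine, whereas the rule-based bot depends on engine internals such as `enemy_visible`.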

General Intuition aims to develop a general agent capable of playing any game in real time, extending to simulations such as GTA5 and truck simulators.
Its strategy uses the negative events and adversity that occur naturally in game clips to sharply reduce the data that companies controlling robots must collect, targeting roughly 1% to 10% of what they would otherwise need.
Medal's clips serve as an 'episodic memory of simulation': analyzing the undesirable outcomes they capture supplies training data for reward models, which eases the transition from imitation learning to reinforcement learning.
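One way to read the reward-model idea: each clip ends in an outcome, and clips that end badly (a death, a crash) become negative examples for a model that scores how desirable a situation is. The sketch below is a toy under loud assumptions: the outcome tags, the hand-picked per-clip features, and the choice of plain logistic regression are all illustrative, not General Intuition's method; a real system would learn features from pixels.

```python
import math

# Hypothetical outcome tags mapped to binary labels (1 = desirable outcome).
OUTCOME_LABEL = {"win_fight": 1, "objective": 1, "death": 0, "crash": 0}


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_reward_model(clips, epochs=500, lr=0.5):
    """Fit a logistic-regression reward model r(x) = sigmoid(w . x + b) by SGD.

    clips: list of (features, outcome_tag) pairs; features is a list of floats.
    Returns a function mapping a feature vector to a reward in (0, 1).
    """
    dim = len(clips[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, tag in clips:
            y = OUTCOME_LABEL[tag]
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Gradient of the log-loss with respect to the logit is (p - y).
            g = p - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return lambda x: _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

For instance, with hypothetical features `[health_lost, took_cover]`, clips ending in "death" or "crash" pull the reward down for situations resembling them, and a reinforcement-learning agent could then be trained to avoid those situations.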