Overview
- The AI model landscape has evolved into a three-horse race between OpenAI, Anthropic, and Google, with OpenAI's market share dropping from 95% to 50-75% as competitors introduce aggressive pricing strategies and improved capabilities.
- AI engineering has reached the peak of the hype curve, with the field shifting from research-heavy to more engineering-focused, while success metrics evolve from academic citations to real-world impact and monetization.
- The industry is experiencing a significant inference price collapse, with costs dropping from $40 per million tokens to around 7.5 cents per million tokens, close to a three-orders-of-magnitude improvement in roughly one year.
- The debate around AI scaling suggests current pre-training approaches may be reaching limitations, with focus shifting toward "inference time compute" and more efficient training methods beyond raw parameter size increases.
- On-device AI capabilities are expanding through Apple Intelligence, Chrome's built-in Gemini Nano, and Windows implementations, while multimodal models continue to advance, with specialized startups gaining traction alongside big tech's all-in-one approaches.
Content
Podcast Context and Milestone
- This is the 100th episode of the Latent Space Podcast, marking nearly two years of content
- Hosts are Alessio (partner and CTO at Decibel Partners) and swyx
- Podcast has evolved from initial format to more research-driven content
- The podcast's Discord community has grown to nearly 5,000 members with a positive culture
- Hosts encouraged listeners to subscribe to their YouTube channel
AI Engineering Trends and Industry Observations
- Gartner has placed AI engineering at the peak of the hype curve
- GitHub Models launch prominently featured "AI engineers" in marketing
- The concept of AI engineering is gaining widespread recognition
- Podcast growth mirrors the expansion of the AI engineering industry
- Increasing interest in applied AI, demonstrated by large conference attendance
- Anticipation that the field will shift from research-heavy to more engineering-focused
- AI Engineer World's Fair in June had over 2000 attendees
Research and Production Insights
- The hosts focus on bridging research and production, particularly in machine learning conferences
- Researchers are increasingly interested in how their work translates to practical applications
- Success metrics are shifting from academic citations to real-world impact and monetization
- The speaker created Latent Space Live to address perceived flaws in academic conferences
Conference Observations
- Conferences like NeurIPS and ICML are heavily oriented towards PhD students and job markets
- Conferences are adding tracks like position papers and benchmarks to improve information sharing
- The conference covered topics including startups, vision, open models, synthetic data, and agents
- Model papers are becoming less prominent, with datasets and benchmarks taking center stage
- Two notable dataset papers mentioned: DataComp and FineWeb
Scaling and AI Development Debate
- A significant discussion emerged about whether AI model scaling has hit a wall
- Key figures like Jonathan Frankle, Ilya Sutskever, and Noam Brown suggested current pre-training approaches may be reaching limitations
- The community is shifting focus towards "inference time compute" (ITC) as a new approach
- There's growing interest in compute-optimal training beyond just pre-training stages
- "Inference time compute" is becoming the preferred term over "test time compute"
AI Model Landscape and Market Trends
- The AI model competitive landscape is now effectively a "three-horse race" between Google (Gemini), Anthropic, and OpenAI
- OpenAI's market share has dropped from 95% to between 50-75% over 2023-2024
- Emerging competitive strategies include aggressive pricing and rapid capability improvements
- LMSys Elo scores have improved significantly, with top models now around 1275 (an Elo sketch follows this list)
- Multiple "frontier labs" are competing, with a clear tier zero and tier one
- xAI is often overlooked in benchmarks due to its slow API rollout
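
For context on these scores: Arena-style leaderboards use the Elo rating system, in which a rating gap maps directly to an expected head-to-head win rate. Below is a minimal sketch of that formula; the 1275 vs. 1200 matchup is illustrative, not a real leaderboard pairing.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of model A over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Update a rating after one head-to-head vote (actual: 1 win, 0.5 tie, 0 loss)."""
    return rating + k * (actual - expected)

# Illustrative ratings only: a ~1275 leader vs. a ~1200 challenger.
p = elo_expected_score(1275, 1200)
print(f"expected win rate: {p:.1%}")  # ~60.6%
```

A 75-point gap thus corresponds to only about a 61% expected win rate, which is why the top models feel closely matched in practice.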
Small Model Developments
- "Small models" now represent a more nuanced category, roughly 1-5 billion parameters
- Large labs are producing competitive small models (e.g., Gemini Nano, Apple's foundation models)
- Open source community has focused on 0.5-3B parameter models
- Apple Intelligence launched with a local ~3B-parameter transformer running on phones
- Not considered a game-changing release, but seen as the largest-scale transformer rollout since Google's BERT
Open Source AI Models
- Llama 3.1 405B performs well in comparisons with Gemini and GPT-4o
- However, the model is slow and expensive for inference, often not practical for production
- Many are using large models like the 405B primarily as "teacher models" to distill smaller models (a minimal sketch follows this list)
- The speaker believes the gap between open source and closed source AI is actually widening, not narrowing
- Llama 3 released open-source models (8B and 70B variants)
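
To make the "teacher model" idea concrete, here is a minimal soft-label distillation sketch in PyTorch. The temperature, batch shape, and vocabulary size are illustrative assumptions; in practice the teacher logits would come from the large model (e.g., the 405B) and the student would be a much smaller network.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Soft-label distillation: KL divergence between temperature-scaled
    teacher and student distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Illustrative shapes: a batch of 4 positions over a 32k-token vocabulary.
teacher_logits = torch.randn(4, 32_000)   # would come from the large teacher model
student_logits = torch.randn(4, 32_000, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

Training the student against the teacher's full output distribution transfers more signal per example than hard labels alone, which is what makes the expensive teacher worth running.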
Business and Economic Insights
- OpenAI raised $6.6 billion in its latest fundraising round
- Companies are shifting from fixed to variable cost models for AI compute
- This allows better margin control by attributing costs directly to usage
- Inference costs are being passed more directly to customers (a cost-attribution sketch follows this list)
- Expectation of a $1,000-$2,000/month ChatGPT tier in the near future
- The speaker suggests AI professionals should be prepared to spend ~$1,000/month on AI tools
- Some platforms like Google are offering free inference during preview periods
- Funding environment for large AI model pre-training is becoming more challenging
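
A minimal sketch of what shifting to variable, usage-based cost attribution looks like; the model names and per-million-token prices below are hypothetical placeholders, not actual vendor rates.

```python
# Per-million-token prices; placeholder figures for illustration only.
PRICING = {
    "frontier-model": {"input": 2.50, "output": 10.00},
    "small-model":    {"input": 0.075, "output": 0.30},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Attribute inference spend to a single request from its token usage."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Attributing cost per call makes margins observable per customer and per feature.
print(f"${request_cost('small-model', 3_000, 500):.6f}")  # $0.000375
```

Once cost is computed per request like this, it can be passed through to usage-based billing instead of being absorbed as a fixed infrastructure expense.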
Agents and AI Progress
- 2025 is being predicted as the "year of agents" (similar to previous predictions)
- Key frontier problems in agents include understanding environment and institutional knowledge
- Current agents are good at following instructions but struggle with extracting implicit knowledge
- Challenges include indexing and understanding undocumented enterprise processes
- A core set of standard tools for AI agents is consolidating
- Agent-specific infrastructure is an emerging area of development
AI Research and Development Highlights
- Jeff Dean's talk focused on using ML to optimize computer systems themselves, not just treating ML as a workload
- Speculative decoding is becoming the norm for faster inference, though with GPU-cost tradeoffs (a sketch follows this list)
- Synthetic data's growing importance in AI research and production
- A retrospective on Ilya Sutskever's insights across the decade (2014-2024)
- The OpenAI lawsuit revealed historical emails about GPU scaling
- Early insights from Ilya and Greg Brockman about AI's potential, even during the Dota training phase
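
A toy, self-contained sketch of the speculative decoding idea: a cheap draft model proposes several tokens, and the expensive target model verifies them so output always matches pure target-model decoding. Both "models" below are stand-in functions, and the per-token verification loop here would be a single batched forward pass in a real system.

```python
def draft_model(prefix: str) -> str:
    return "a"                      # a cheap model that always guesses "a"

def target_model(prefix: str) -> str:
    return "ab"[len(prefix) % 2]    # the "expensive" model; alternates a, b

def speculative_step(prefix: str, k: int = 4) -> str:
    """One greedy speculative-decoding step: the draft model proposes k
    tokens, the target model checks them, and we keep the longest agreeing
    run plus the target's token at the first mismatch."""
    proposed, p = [], prefix
    for _ in range(k):
        tok = draft_model(p)
        proposed.append(tok)
        p += tok
    out = prefix
    for tok in proposed:
        correct = target_model(out)  # batched into one forward pass in practice
        out += correct
        if tok != correct:
            break                    # discard the rest of the draft
    return out

print(speculative_step("a"))   # "ab": draft mismatched immediately, target token kept
print(speculative_step("ab"))  # "abab": one draft token accepted before the correction
```

The GPU-cost concern in the bullet above comes from the verification step: the large model still runs, just less often per generated token.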
Reinforcement Learning and Language Models
- RL is good at specific tasks but struggles with generalization
- Language models scale with data and compute, but data scaling is not keeping pace
- Ilya Sutskever predicts significant computational growth (potentially 10x per year)
- New approach shifts from data-driven to task-driven fine-tuning
- Fine-tuning increasingly uses a reward model instead of a traditional dataset (a sketch follows this list)
- Addresses challenges of finding sufficient fine-tuning data for models like Llama
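
A minimal sketch of one common reward-model-driven recipe, best-of-n rejection sampling, which manufactures fine-tuning targets instead of relying on a hand-built dataset. Both the policy sampler and the reward scorer below are toy stand-ins for real models.

```python
import random

def policy_sample(prompt: str) -> str:
    """Stand-in for sampling a completion from the current policy."""
    return prompt + " -> " + "".join(random.choice("xyz!") for _ in range(6))

def reward_score(prompt: str, completion: str) -> float:
    """Stand-in reward model: prefers completions without '!'. A real
    reward model is a trained scorer of helpfulness/correctness."""
    return -completion.count("!")

def build_finetune_batch(prompts, n_samples: int = 8):
    """Best-of-n rejection sampling: generate candidates, score them with
    the reward model, and keep the top candidate per prompt as a
    supervised fine-tuning target."""
    batch = []
    for prompt in prompts:
        candidates = [policy_sample(prompt) for _ in range(n_samples)]
        best = max(candidates, key=lambda c: reward_score(prompt, c))
        batch.append((prompt, best))
    return batch  # fed to a standard supervised fine-tuning step

print(build_finetune_batch(["summarize the report"]))
```

The appeal is exactly the point raised above: a reward model only has to judge outputs, which is far easier to obtain than enough high-quality demonstrations to fine-tune on directly.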
Scaling and Evolutionary Insights
- Compared mammalian brain-body mass scaling to AI model development
- Humans "broke off the slope" in brain development, potentially analogous to AI post-training development
- Potential future directions: agents, synthetic data, inference, compute
- Scaling big models seems to have hit a temporary wall
- No major expected releases of next-generation models (e.g., GPT-4.5, a next Gemini Pro)
- Focus shifting from pure scaling to more research-oriented compute usage
NeurIPS Highlights and Research Trends
- Paper acceptance at the conference works somewhat like a lottery, so many strong papers go overlooked
- Interesting paper discovered on AI agent collusion via steganography
- Agents potentially hiding messages in generated text through subtle encoding techniques
- Explored concepts like steganography, Schelling points, and low-coordination communication (a toy encoding sketch follows this list)
- Discussed a paper on optimal language model pricing from an Nvidia researcher
- Mentioned research on optimizer techniques (Shampoo, SOAP)
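
As a toy illustration of the collusion channel (this is not the paper's construction): two agents that share a synonym table can hide bits in otherwise natural word choices that a colluding reader decodes.

```python
# Toy steganography: each slot has two interchangeable words, and picking
# word 0 vs. word 1 encodes one hidden bit.
SLOTS = [("big", "large"), ("start", "begin"), ("fast", "quick"), ("said", "stated")]

def encode(bits):
    """Encode a bit string into an innocuous-looking word sequence."""
    return [pair[b] for pair, b in zip(SLOTS, bits)]

def decode(words):
    """Recover the hidden bits from the word choices."""
    return [pair.index(w) for pair, w in zip(SLOTS, words)]

msg = [1, 0, 1, 1]
cover = encode(msg)           # ['large', 'start', 'quick', 'stated']
assert decode(cover) == msg   # a colluding agent reads the hidden channel
```

The worry in the paper is the LLM-scale version of this: subtle distributional choices in generated text that monitors cannot easily distinguish from normal variation.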
Synthetic Data and Reasoning
- Synthetic data is increasingly seen as valuable, particularly for model distillation and fine-tuning
- The STaR series of papers (STaR, Quiet-STaR, V-STaR) demonstrates techniques for synthesizing reasoning steps
- OpenAI's approach involves using graders to validate synthetic data for reasoning-path fine-tuning (a sketch follows this list)
- Reasoning capabilities are showing promise beyond STEM fields
- o1 (OpenAI's reasoning model) is noted as particularly strong in creative writing, character-level manipulation, and instruction following
- There's uncertainty about how much of a competitive advantage reasoning capabilities provide
- Other models like Claude Sonnet and Gemini Pro are competitive without explicit reasoning models
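
A minimal, self-contained sketch of the generate-and-verify loop behind these approaches: sample reasoning traces, keep only those a grader accepts, and fine-tune on the survivors. The "model" and grader below are toy stand-ins; exact checking works here only because the task is arithmetic.

```python
import random

def sample_reasoning(problem: tuple[int, int]) -> tuple[str, int]:
    """Stand-in for sampling a chain-of-thought and final answer from a
    model; this 'model' adds two numbers but is deliberately unreliable."""
    a, b = problem
    answer = a + b if random.random() < 0.3 else a + b + random.choice([-1, 1])
    return f"{a} plus {b} gives {answer}", answer

def grader(problem: tuple[int, int], answer: int) -> bool:
    """Stand-in grader: for math, correctness can be checked exactly."""
    return answer == sum(problem)

def synthesize_dataset(problems, samples_per_problem: int = 16):
    """STaR-style loop: sample traces, keep only grader-verified ones, and
    return them as fine-tuning data for the next training round."""
    dataset = []
    for problem in problems:
        for _ in range(samples_per_problem):
            trace, answer = sample_reasoning(problem)
            if grader(problem, answer):
                dataset.append((problem, trace))
                break  # one verified trace per problem suffices here
    return dataset

print(synthesize_dataset([(2, 3), (10, 7)]))
```

This is also why reasoning gains show up first in STEM domains: those are the fields where a grader can verify answers cheaply and unambiguously.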
Intellectual Property and Legal Challenges
- Highlighted ongoing legal battles between AI companies and IP owners
- Belligerents include publishers and AI labs, most prominently The New York Times vs. OpenAI
- Discussed tensions between traditional data annotation and synthetic data approaches
- There are ongoing debates about data usage and IP in AI model training
- Some models (like DeepSeek) have quickly replicated advanced capabilities
GPU Landscape and Trends
- The GPU market has significantly changed from a year ago
- GPU-rich startup funding model (raising millions to spend on NVIDIA hardware) is now obsolete
- There's a new hierarchy emerging: GPU ultra-rich, GPU middle class, and GPU poor
- Major labs (xAI, Meta, OpenAI) are building massive GPU clusters
- Some are building clusters almost as an "article of faith" without clear immediate purpose
- Reached a potential plateau around 2 trillion parameters (not expecting to go to 10 trillion)
- Cloud models becoming more practical for "GPU poor" entities
- Increasing interest in heterogeneous and distributed computing
- Nvidia is now the most valuable company globally
- The Blackwell series is Nvidia's best-selling GPU line
- Nvidia has moved from a two-year to a one-year product cycle
On-Device AI and Multimodal Developments
- Apple Intelligence now available on phones, offering notification summaries
- Chrome's built-in Gemini Nano and Windows' RWKV deployment are rolling out on-device AI capabilities
- Trend towards more localized, device-level AI compute
- Some "GPU-poor" startups are nonetheless scaling successfully
- Emerging strategy of startups leveraging existing GPU cloud infrastructure
Multimodality War Observations
- Debate between specialized vs. generalist AI models
- Specialist startups are gaining traction (e.g., in image and music generation)
- Big tech labs pursuing all-in-one multimodal approaches
- Gemini 2 highlighted for native image output capabilities
- Midjourney praised for user-friendly UI and ease of use
- Recraft V3 has surprisingly overtaken Flux 1.1 in image model rankings
- Gemini's image editing demo highlighted as an ideal AI interaction model
- Midjourney continues to be preferred for thumbnails
- Black Forest Labs' models are potentially better pixel-for-pixel, but less user-friendly
- xAI's Grok launched its Aurora image model, having previously partnered with Black Forest Labs
Video and AI Research
- Google DeepMind appears ahead of OpenAI in video AI research
- DeepMind is working on models like Genie, Genie 2, and VideoPoet
- Potential 4-year advantage in world modeling compared to OpenAI
- Next frontier discussed: video-to-audio synchronization
- Expect continued video AI development over next 5 years
AI Project Growth and Trends
- Analyzing download stats on PyPI for AI projects like LangChain and LlamaIndex (a query sketch follows this list)
- Noting that projects with commercial products tend to have more sustained usage
- Observing that some projects like CrewAI have growing GitHub stars but flat usage
- Highlighting a trend of over-promising and under-delivering in AI startups
- Discussing the unique AI phenomenon of high GitHub star counts but low actual usage
- Noting that generalist AI projects often struggle to deliver on broad promises
- Additional project-by-project observations were discussed
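
The usage comparisons above can be reproduced with the public pypistats.org JSON API; a minimal sketch, assuming the endpoint shape as documented at the time of writing:

```python
import requests

def recent_downloads(package: str) -> int:
    """Monthly download count from the public pypistats.org JSON API."""
    url = f"https://pypistats.org/api/packages/{package}/recent"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()["data"]["last_month"]

# Compare sustained usage across the projects discussed above.
for pkg in ["langchain", "llama-index", "crewai"]:
    print(pkg, recent_downloads(pkg))
```

Download counts are a rough proxy (CI pipelines inflate them), but divergence between stars and downloads is exactly the signal the hosts point to.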
Memory and AI Systems
- Discussion of AI memory systems and their potential development
- Mentioned memory-related projects like LangGraph, Zep, and MemGPT
- Argued that memory is a crucial "building block" for consumer AI products
- Highlighted the need for long-lived, portable memories across different AI platforms
- Current memory products are considered immature
- A distinction was drawn between memory (personal, long-lived context) and knowledge (shared, static information); see the sketch after this list
- Currently no standardization for AI memory systems
- Big AI labs like Anthropic have more capability to set standards
- Role-play communities might be potential pioneers in memory standardization
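
A minimal sketch of the memory-vs-knowledge distinction as a data structure: memory here is a per-user, append-only store with naive keyword retrieval, kept separate from any shared knowledge base. The class, retrieval scheme, and example facts are all illustrative assumptions, not any product's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Per-user episodic memory: append-only facts with naive keyword
    retrieval. Shared, static knowledge would live in a separate store;
    memory is personal, long-lived, and ideally portable across platforms."""
    memories: list[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        self.memories.append(fact)

    def recall(self, query: str, k: int = 3) -> list[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(m.lower().split())), m) for m in self.memories]
        return [m for score, m in sorted(scored, reverse=True)[:k] if score > 0]

user_mem = MemoryStore()
user_mem.remember("prefers concise answers")
user_mem.remember("works on a Rust codebase")
print(user_mem.recall("rust project"))  # ['works on a Rust codebase']
```

The standardization question above is essentially about this interface: if every platform exposed something like remember/recall over a portable store, memories could follow the user rather than the vendor.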
Benchmarks Evolution
- Significant changes in AI benchmarks from previous years
- Emerging benchmarks include SWE-bench and FrontierMath
- Focus is shifting toward frontier-level math and coding capabilities
- Previous benchmarks like MMLU and GPQA becoming less relevant
- LMSys Arena emerging as a key comparative platform for AI models
- Current SWE-bench progress: scores started the year around 13% and are now around 50%
- Goal of reaching 80% by the end of next year, with 90% targeted by the Konwinski Prize
AI Capabilities Categorization
- Capabilities were grouped into four tiers: mature, emerging, frontier, and niche
Inference and Pricing Trends
- Ongoing "inference race to the bottom"
- Significant price reductions in AI model tokens
- Example: from $40 per million tokens to 50 cents per million tokens for similar Elo performance
- Two orders of magnitude improvement in pricing within a year (worked arithmetic follows this list)
- Competitive landscape with models like Claude 3 Haiku and Gemini 1.5 Pro
- Throughout 2024, there has been a significant reduction in AI model pricing
- The cost of intelligence is decreasing by approximately 3 orders of magnitude this year
- Models are becoming more efficient and cheaper across different intelligence levels (Elo)
- A snapshot of the September pricing frontier was discussed
- Amazon Nova recently entered the market with competitive pricing
- Gemini 1.5 Pro adjusted pricing to align with the market frontier
- From early 2024, pricing has dropped from $40-50 per million tokens to around 7.5 cents per million tokens
- The new Claude 3.5 Haiku saw a 4x price increase over its predecessor, potentially due to model size expansion
- The pricing trend is faster than previously predicted Moore's law-like improvements
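
The "two orders" and "three orders" of magnitude figures above are both consistent with the quoted prices, depending on which endpoint is used; a quick arithmetic check:

```python
import math

# Quoted prices in $ per million tokens over roughly one year.
start, mid_point, end = 40.0, 0.50, 0.075

print(f"{math.log10(start / mid_point):.2f}")  # ~1.90 -> about two orders of magnitude
print(f"{math.log10(start / end):.2f}")        # ~2.73 -> close to three orders of magnitude
```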
Key AI Releases and Developments in 2023-2024
- Jeff Bezos invested in Perplexity, which is seen as a potential "new Google"
- OpenAI discussions about GPT-5 were present, but no concrete release
- ChatGPT introduced memory feature in February
- Gemini launched with significant context window (1 million tokens)
- Claude 3 by Anthropic emerged as a strong competitor to OpenAI
- Devin AI launched with a notable PR campaign, available for $500/month
- AI music generation platforms like Suno and Udio gained attention
- GPT-4 Turbo released with model efficiency improvements
- OpenAI released several significant models and products over the year, including GPT-4o, o1, and Sora
- OpenAI is strategically competing with Google across multiple domains, most visibly search (SearchGPT)
OpenAI and Personnel Changes
- Ilya Sutskever's new startup, Safe Superintelligence Inc. (SSI), raised $1 billion, with Daniel Gross becoming full-time CEO
- Focused on direct path to superintelligence, avoiding intermediate product steps
- Noted exodus of safety-oriented team members, including Jan Leike
- The entire Superalignment team has left OpenAI
- Discussion of the withdrawn Scarlett Johansson-like "Sky" voice for ChatGPT
Future of Work and AI
- Prediction that 2025 might be the first year where AI sets the "skill floor" for certain roles
- Potential impact on jobs like customer support and software engineering
- Agents might replace or significantly change low-skill job requirements
- Speculation about OpenAI's future agent project "Operator"
- Researcher prediction about potential foreign espionage in AI labs
- Increasing awareness of security risks in AI research environments
- Prediction of always-on AI assistants