Overview
- The AI model landscape has evolved into a three-horse race between OpenAI, Anthropic, and Google, with OpenAI's market share dropping from 95% to 50-75% as competitors introduce aggressive pricing strategies and improved capabilities.
- AI engineering has reached the peak of the hype curve, with the field shifting from research-heavy to more engineering-focused, while success metrics evolve from academic citations to real-world impact and monetization.
- The industry is experiencing a significant inference price collapse, with costs dropping from $40 per million tokens to around 7.5 cents per million tokens, close to a three-orders-of-magnitude improvement in roughly one year.
- The debate around AI scaling suggests current pre-training approaches may be reaching limitations, with focus shifting toward "inference time compute" and more efficient training methods beyond raw parameter size increases.
- On-device AI capabilities are expanding through Apple Intelligence, Chrome's built-in Gemini Nano, and Windows implementations, while multimodal models continue to advance, with specialized startups gaining traction alongside big tech's all-in-one approaches.
Content
Podcast Context and Milestone
- This is the 100th episode of the Latent Space Podcast, marking nearly two years of content
- Hosts are Alessio (partner and CTO at Decibel Partners) and swyx
- Podcast has evolved from initial format to more research-driven content
- The podcast's Discord community has grown to nearly 5,000 members with a positive culture
- Hosts encouraged listeners to subscribe to their YouTube channel
AI Engineering Trends and Industry Observations
- Gartner has placed AI engineering at the peak of the hype curve
- GitHub Models launch prominently featured "AI engineers" in marketing
- The concept of AI engineering is gaining widespread recognition
- Podcast growth mirrors the expansion of the AI engineering industry
- Increasing interest in applied AI, demonstrated by large conference attendance
- Anticipation that the field will shift from research-heavy to more engineering-focused
- AI Engineer World's Fair in June had over 2000 attendees
Research and Production Insights
- The hosts focus on bridging research and production, particularly in machine learning conferences
- Researchers are increasingly interested in how their work translates to practical applications
- Success metrics are shifting from academic citations to real-world impact and monetization
- The speaker created Latent Space Live to address perceived flaws in academic conferences
Conference Observations
- Conferences like NeurIPS and ICML are heavily oriented towards PhD students and job markets
- Conferences are adding tracks like position papers and benchmarks to improve information sharing
- The conference covered topics including startups, vision, open models, synthetic data, and agents
- Model papers are becoming less prominent, with datasets and benchmarks taking center stage
- Two notable dataset papers mentioned: DataComp and FineWeb
Scaling and AI Development Debate
- A significant discussion emerged about whether AI model scaling has hit a wall
- Key figures like Jonathan Frankle, Ilya Sutskever, and Noam Brown suggested current pre-training approaches may be reaching limitations
- The community is shifting focus towards "inference time compute" (ITC) as a new approach
- There's growing interest in compute-optimal training beyond just pre-training stages
- "Inference time compute" is becoming the preferred term over "test time compute"
AI Model Landscape and Market Trends
- The AI model competitive landscape is now effectively a "three-horse race" between Google (Gemini), Anthropic, and OpenAI
- OpenAI's market share has dropped from 95% to between 50-75% over 2023-2024
- Emerging competitive strategies include aggressive pricing and rapid capability improvements
- LMSys Elo scores have improved significantly, with top models now around 1275 (an Elo sketch follows this list)
- Multiple "frontier labs" are competing, with a clear tier zero and tier one
- xAI is often overlooked in benchmarks due to its slow API rollout
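
For context on these scores: Arena-style leaderboards use the Elo rating system, in which a rating gap maps directly to an expected head-to-head win rate. Below is a minimal sketch of that formula; the 1275 vs. 1200 matchup is illustrative, not a real leaderboard pairing.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of model A over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Update a rating after one head-to-head vote (actual: 1 win, 0.5 tie, 0 loss)."""
    return rating + k * (actual - expected)

# Illustrative ratings only: a ~1275 leader vs. a ~1200 challenger.
p = elo_expected_score(1275, 1200)
print(f"expected win rate: {p:.1%}")  # ~60.6%
```

A 75-point gap thus corresponds to only about a 61% expected win rate, which is why the top models feel closely matched in practice.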
Small Model Developments
- "Small models" now represent a more nuanced category, roughly 1-5 billion parameters
- Large labs are producing competitive small models (e.g., Gemini Nano, Apple's foundation models)
- Open source community has focused on 0.5-3B parameter models
- Apple Intelligence launched with a local ~3B-parameter transformer running on phones
- Not considered a game-changing release, but seen as the largest-scale transformer rollout since Google's BERT
Open Source AI Models
- Llama 3.1 405B performs well in comparisons with Gemini and GPT-4o
- However, the model is slow and expensive for inference, often not practical for production
- Many are using large models like the 405B primarily as "teacher models" to distill smaller models (a minimal sketch follows this list)
- The speaker believes the gap between open source and closed source AI is actually widening, not narrowing
- Llama 3 released open-source models (8B and 70B variants)
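
To make the "teacher model" idea concrete, here is a minimal soft-label distillation sketch in PyTorch. The temperature, batch shape, and vocabulary size are illustrative assumptions; in practice the teacher logits would come from the large model (e.g., the 405B) and the student would be a much smaller network.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Soft-label distillation: KL divergence between temperature-scaled
    teacher and student distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Illustrative shapes: a batch of 4 positions over a 32k-token vocabulary.
teacher_logits = torch.randn(4, 32_000)   # would come from the large teacher model
student_logits = torch.randn(4, 32_000, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

Training the student against the teacher's full output distribution transfers more signal per example than hard labels alone, which is what makes the expensive teacher worth running.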
Business and Economic Insights
- OpenAI raised $6.6 billion in its latest fundraising round
- Companies are shifting from fixed to variable cost models for AI compute
- This allows better margin control by attributing costs directly to usage
- Inference costs are being passed more directly to customers (a cost-attribution sketch follows this list)
- Expectation of a $1,000-$2,000/month ChatGPT tier in the near future
- The speaker suggests AI professionals should be prepared to spend ~$1,000/month on AI tools
- Some platforms like Google are offering free inference during preview periods
- Funding environment for large AI model pre-training is becoming more challenging
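
A minimal sketch of what shifting to variable, usage-based cost attribution looks like; the model names and per-million-token prices below are hypothetical placeholders, not actual vendor rates.

```python
# Per-million-token prices; placeholder figures for illustration only.
PRICING = {
    "frontier-model": {"input": 2.50, "output": 10.00},
    "small-model":    {"input": 0.075, "output": 0.30},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Attribute inference spend to a single request from its token usage."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Attributing cost per call makes margins observable per customer and per feature.
print(f"${request_cost('small-model', 3_000, 500):.6f}")  # $0.000375
```

Once cost is computed per request like this, it can be passed through to usage-based billing instead of being absorbed as a fixed infrastructure expense.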
Agents and AI Progress
- 2025 is being predicted as the "year of agents" (similar to previous predictions)
- Key frontier problems in agents include understanding environment and institutional knowledge
- Current agents are good at following instructions but struggle with extracting implicit knowledge
- Challenges include indexing and understanding undocumented enterprise processes
- A core set of standard tools for AI agents is consolidating
- Agent-specific infrastructure is an emerging area of development
AI Research and Development Highlights
- Jeff Dean's talk focused on using ML to optimize computer systems themselves, not just treating ML as a workload
- Speculative decoding is becoming the norm for faster inference, though with GPU-cost tradeoffs (a sketch follows this list)
- Synthetic data's growing importance in AI research and production
- A retrospective on Ilya Sutskever's insights across the decade (2014-2024)
- The OpenAI lawsuit revealed historical emails about GPU scaling
- Early insights from Ilya and Greg Brockman about AI's potential, even during the Dota training phase
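
A toy, self-contained sketch of the speculative decoding idea: a cheap draft model proposes several tokens, and the expensive target model verifies them so output always matches pure target-model decoding. Both "models" below are stand-in functions, and the per-token verification loop here would be a single batched forward pass in a real system.

```python
def draft_model(prefix: str) -> str:
    return "a"                      # a cheap model that always guesses "a"

def target_model(prefix: str) -> str:
    return "ab"[len(prefix) % 2]    # the "expensive" model; alternates a, b

def speculative_step(prefix: str, k: int = 4) -> str:
    """One greedy speculative-decoding step: the draft model proposes k
    tokens, the target model checks them, and we keep the longest agreeing
    run plus the target's token at the first mismatch."""
    proposed, p = [], prefix
    for _ in range(k):
        tok = draft_model(p)
        proposed.append(tok)
        p += tok
    out = prefix
    for tok in proposed:
        correct = target_model(out)  # batched into one forward pass in practice
        out += correct
        if tok != correct:
            break                    # discard the rest of the draft
    return out

print(speculative_step("a"))   # "ab": draft mismatched immediately, target token kept
print(speculative_step("ab"))  # "abab": one draft token accepted before the correction
```

The GPU-cost concern in the bullet above comes from the verification step: the large model still runs, just less often per generated token.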
Reinforcement Learning and Language Models
- RL is good at specific tasks but struggles with generalization
- Language models scale with data and compute, but data scaling is not keeping pace
- Ilya Sutskever predicts significant computational growth (potentially 10x per year)
- New approach shifts from data-driven to task-driven fine-tuning
- Fine-tuning increasingly uses a reward model instead of a traditional dataset (a sketch follows this list)
- Addresses challenges of finding sufficient fine-tuning data for models like Llama
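
A minimal sketch of one common reward-model-driven recipe, best-of-n rejection sampling, which manufactures fine-tuning targets instead of relying on a hand-built dataset. Both the policy sampler and the reward scorer below are toy stand-ins for real models.

```python
import random

def policy_sample(prompt: str) -> str:
    """Stand-in for sampling a completion from the current policy."""
    return prompt + " -> " + "".join(random.choice("xyz!") for _ in range(6))

def reward_score(prompt: str, completion: str) -> float:
    """Stand-in reward model: prefers completions without '!'. A real
    reward model is a trained scorer of helpfulness/correctness."""
    return -completion.count("!")

def build_finetune_batch(prompts, n_samples: int = 8):
    """Best-of-n rejection sampling: generate candidates, score them with
    the reward model, and keep the top candidate per prompt as a
    supervised fine-tuning target."""
    batch = []
    for prompt in prompts:
        candidates = [policy_sample(prompt) for _ in range(n_samples)]
        best = max(candidates, key=lambda c: reward_score(prompt, c))
        batch.append((prompt, best))
    return batch  # fed to a standard supervised fine-tuning step

print(build_finetune_batch(["summarize the report"]))
```

The appeal is exactly the point raised above: a reward model only has to judge outputs, which is far easier to obtain than enough high-quality demonstrations to fine-tune on directly.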
Scaling and Evolutionary Insights
- Compared mammalian brain-body mass scaling to AI model development
- Humans "broke off the slope" in brain development, potentially analogous to AI post-training development
- Potential future directions: agents, synthetic data, inference, compute
- Scaling big models seems to have hit a temporary wall
- No major expected releases of next-generation models (e.g., GPT-4.5, a next Gemini Pro)
- Focus shifting from pure scaling to more research-oriented compute usage
NeurIPS Highlights and Research Trends
- Paper acceptance at the conference works somewhat like a lottery, so many strong papers go overlooked
- Interesting paper discovered on AI agent collusion via steganography
- Agents potentially hiding messages in generated text through subtle encoding techniques
- Explored concepts like steganography, Schelling points, and low-coordination communication (a toy encoding sketch follows this list)
- Discussed a paper on optimal language model pricing from an Nvidia researcher
- Mentioned research on optimizer techniques (Shampoo, SOAP)
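
As a toy illustration of the collusion channel (this is not the paper's construction): two agents that share a synonym table can hide bits in otherwise natural word choices that a colluding reader decodes.

```python
# Toy steganography: each slot has two interchangeable words, and picking
# word 0 vs. word 1 encodes one hidden bit.
SLOTS = [("big", "large"), ("start", "begin"), ("fast", "quick"), ("said", "stated")]

def encode(bits):
    """Encode a bit string into an innocuous-looking word sequence."""
    return [pair[b] for pair, b in zip(SLOTS, bits)]

def decode(words):
    """Recover the hidden bits from the word choices."""
    return [pair.index(w) for pair, w in zip(SLOTS, words)]

msg = [1, 0, 1, 1]
cover = encode(msg)           # ['large', 'start', 'quick', 'stated']
assert decode(cover) == msg   # a colluding agent reads the hidden channel
```

The worry in the paper is the LLM-scale version of this: subtle distributional choices in generated text that monitors cannot easily distinguish from normal variation.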
Synthetic Data and Reasoning
- Synthetic data is increasingly seen as valuable, particularly for model distillation and fine-tuning
- The STaR series of papers (STaR, Quiet-STaR, V-STaR) demonstrates techniques for synthesizing reasoning steps
- OpenAI's approach involves using graders to validate synthetic data for reasoning-path fine-tuning (a sketch follows this list)
- Reasoning capabilities are showing promise beyond STEM fields
- o1 (OpenAI's reasoning model) is noted as particularly strong in creative writing, character-level manipulation, and instruction following
- There's uncertainty about how much of a competitive advantage reasoning capabilities provide
- Other models like Claude Sonnet and Gemini Pro are competitive without explicit reasoning models
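
A minimal, self-contained sketch of the generate-and-verify loop behind these approaches: sample reasoning traces, keep only those a grader accepts, and fine-tune on the survivors. The "model" and grader below are toy stand-ins; exact checking works here only because the task is arithmetic.

```python
import random

def sample_reasoning(problem: tuple[int, int]) -> tuple[str, int]:
    """Stand-in for sampling a chain-of-thought and final answer from a
    model; this 'model' adds two numbers but is deliberately unreliable."""
    a, b = problem
    answer = a + b if random.random() < 0.3 else a + b + random.choice([-1, 1])
    return f"{a} plus {b} gives {answer}", answer

def grader(problem: tuple[int, int], answer: int) -> bool:
    """Stand-in grader: for math, correctness can be checked exactly."""
    return answer == sum(problem)

def synthesize_dataset(problems, samples_per_problem: int = 16):
    """STaR-style loop: sample traces, keep only grader-verified ones, and
    return them as fine-tuning data for the next training round."""
    dataset = []
    for problem in problems:
        for _ in range(samples_per_problem):
            trace, answer = sample_reasoning(problem)
            if grader(problem, answer):
                dataset.append((problem, trace))
                break  # one verified trace per problem suffices here
    return dataset

print(synthesize_dataset([(2, 3), (10, 7)]))
```

This is also why reasoning gains show up first in STEM domains: those are the fields where a grader can verify answers cheaply and unambiguously.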
Intellectual Property and Legal Challenges
- Highlighted ongoing legal battles between AI companies and IP owners
- Belligerents include publishers and AI labs, most prominently The New York Times vs. OpenAI
- Discussed tensions between traditional data annotation and synthetic data approaches
- There are ongoing debates about data usage and IP in AI model training
- Some models (like DeepSeek) have quickly replicated advanced capabilities
GPU Landscape and Trends
- The GPU market has significantly changed from a year ago
- GPU-rich startup funding model (raising millions to spend on NVIDIA hardware) is now obsolete
- There's a new hierarchy emerging: GPU ultra-rich, GPU middle class, and GPU poor
- Major labs (xAI, Meta, OpenAI) are building massive GPU clusters
- Some are building clusters almost as an "article of faith" without clear immediate purpose
- Reached a potential plateau around 2 trillion parameters (not expecting to go to 10 trillion)
- Cloud models becoming more practical for "GPU poor" entities
- Increasing interest in heterogeneous and distributed computing
- Nvidia is now the most valuable company globally
- The Blackwell series is Nvidia's best-selling GPU line
- Nvidia has moved from a two-year to a one-year product cycle
On-Device AI and Multimodal Developments
- Apple Intelligence now available on phones, offering notification summaries
- Chrome's built-in Gemini Nano and Windows' RWKV deployment are rolling out on-device AI capabilities
- Trend towards more localized, device-level AI compute
- Some "GPU-poor" startups are nonetheless scaling successfully
- Emerging strategy of startups leveraging existing GPU cloud infrastructure
Multimodality War Observations
- Debate between specialized vs. generalist AI models
- Specialist startups are gaining traction (e.g., in image and music generation)
- Big tech labs pursuing all-in-one multimodal approaches
- Gemini 2 highlighted for native image output capabilities
- Midjourney praised for user-friendly UI and ease of use
- Recraft V3 has surprisingly overtaken Flux 1.1 in image model rankings
- Gemini's image editing demo highlighted as an ideal AI interaction model
- Midjourney continues to be preferred for thumbnails
- Black Forest Labs' models are potentially better pixel-for-pixel, but less user-friendly
- xAI's Grok launched its Aurora image model, having previously partnered with Black Forest Labs
Video and AI Research
- Google DeepMind appears ahead of OpenAI in video AI research
- DeepMind is working on models like Genie, Genie 2, and VideoPoet
- Potential 4-year advantage in world modeling compared to OpenAI
- Next frontier discussed: video-to-audio synchronization
- Expect continued video AI development over next 5 years
AI Project Growth and Trends
- Analyzing download stats on PyPI for AI projects like LangChain and LlamaIndex (a query sketch follows this list)
- Noting that projects with commercial products tend to have more sustained usage
- Observing that some projects like CrewAI have growing GitHub stars but flat usage
- Highlighting a trend of over-promising and under-delivering in AI startups
- Discussing the unique AI phenomenon of high GitHub star counts but low actual usage
- Noting that generalist AI projects often struggle to deliver on broad promises
- Additional project-by-project observations were discussed
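
The usage comparisons above can be reproduced with the public pypistats.org JSON API; a minimal sketch, assuming the endpoint shape as documented at the time of writing:

```python
import requests

def recent_downloads(package: str) -> int:
    """Monthly download count from the public pypistats.org JSON API."""
    url = f"https://pypistats.org/api/packages/{package}/recent"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()["data"]["last_month"]

# Compare sustained usage across the projects discussed above.
for pkg in ["langchain", "llama-index", "crewai"]:
    print(pkg, recent_downloads(pkg))
```

Download counts are a rough proxy (CI pipelines inflate them), but divergence between stars and downloads is exactly the signal the hosts point to.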
Memory and AI Systems
- Discussion of AI memory systems and their potential development
- Mentioned memory-related projects like LangGraph, Zep, and MemGPT
- Argued that memory is a crucial "building block" for consumer AI products
- Highlighted the need for long-lived, portable memories across different AI platforms
- Current memory products are considered immature
- A distinction was drawn between memory (personal, long-lived context) and knowledge (shared, static information); see the sketch after this list
- Currently no standardization for AI memory systems
- Big AI labs like Anthropic have more capability to set standards
- Role-play communities might be potential pioneers in memory standardization
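
A minimal sketch of the memory-vs-knowledge distinction as a data structure: memory here is a per-user, append-only store with naive keyword retrieval, kept separate from any shared knowledge base. The class, retrieval scheme, and example facts are all illustrative assumptions, not any product's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Per-user episodic memory: append-only facts with naive keyword
    retrieval. Shared, static knowledge would live in a separate store;
    memory is personal, long-lived, and ideally portable across platforms."""
    memories: list[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        self.memories.append(fact)

    def recall(self, query: str, k: int = 3) -> list[str]:
        terms = set(query.lower().split())
        scored = [(len(terms & set(m.lower().split())), m) for m in self.memories]
        return [m for score, m in sorted(scored, reverse=True)[:k] if score > 0]

user_mem = MemoryStore()
user_mem.remember("prefers concise answers")
user_mem.remember("works on a Rust codebase")
print(user_mem.recall("rust project"))  # ['works on a Rust codebase']
```

The standardization question above is essentially about this interface: if every platform exposed something like remember/recall over a portable store, memories could follow the user rather than the vendor.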
Benchmarks Evolution
- Significant changes in AI benchmarks from previous years
- Emerging benchmarks include SWE-bench and FrontierMath
- Focus is shifting toward frontier-level math and coding capabilities
- Previous benchmarks like MMLU and GPQA becoming less relevant
- LMSys Arena emerging as a key comparative platform for AI models
- Current SWE-bench progress: scores started the year around 13% and are now around 50%
- Goal of reaching 80% by the end of next year, with 90% targeted by the Konwinski Prize
AI Capabilities Categorization
- Capabilities were grouped into four tiers: mature, emerging, frontier, and niche
Inference and Pricing Trends
- Ongoing "inference race to the bottom"
- Significant price reductions in AI model tokens
- Example: from $40 per million tokens to 50 cents per million tokens for similar Elo performance
- Two orders of magnitude improvement in pricing within a year (worked arithmetic follows this list)
- Competitive landscape with models like Claude 3 Haiku and Gemini 1.5 Pro
- Throughout 2024, there has been a significant reduction in AI model pricing
- The cost of intelligence is decreasing by approximately 3 orders of magnitude this year
- Models are becoming more efficient and cheaper across different intelligence levels (Elo)
- A snapshot of the September pricing frontier was discussed
- Amazon Nova recently entered the market with competitive pricing
- Gemini 1.5 Pro adjusted pricing to align with the market frontier
- From early 2024, pricing has dropped from $40-50 per million tokens to around 7.5 cents per million tokens
- The new Claude 3.5 Haiku saw a 4x price increase over its predecessor, potentially due to model size expansion
- The pricing trend is faster than previously predicted Moore's law-like improvements
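
The "two orders" and "three orders" of magnitude figures above are both consistent with the quoted prices, depending on which endpoint is used; a quick arithmetic check:

```python
import math

# Quoted prices in $ per million tokens over roughly one year.
start, mid_point, end = 40.0, 0.50, 0.075

print(f"{math.log10(start / mid_point):.2f}")  # ~1.90 -> about two orders of magnitude
print(f"{math.log10(start / end):.2f}")        # ~2.73 -> close to three orders of magnitude
```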
Key AI Releases and Developments in 2023-2024
- Jeff Bezos invested in Perplexity, which is seen as a potential "new Google"
- OpenAI discussions about GPT-5 were present, but no concrete release
- ChatGPT introduced memory feature in February
- Gemini launched with significant context window (1 million tokens)
- Claude 3 by Anthropic emerged as a strong competitor to OpenAI
- Devin AI launched with a notable PR campaign, available for $500/month
- AI music generation platforms like Suno and Udio gained attention
- GPT-4 Turbo released with model efficiency improvements
- OpenAI released several significant models and products over the year, including GPT-4o, o1, and Sora
- OpenAI is strategically competing with Google across multiple domains, most visibly search (SearchGPT)
OpenAI and Personnel Changes
- Ilya Sutskever's new startup, Safe Superintelligence Inc. (SSI), raised $1 billion, with Daniel Gross becoming full-time CEO
- Focused on direct path to superintelligence, avoiding intermediate product steps
- Noted exodus of safety-oriented team members, including Jan Leike
- The entire Superalignment team has left OpenAI
- Discussion of the withdrawn Scarlett Johansson-like "Sky" voice for ChatGPT
Future of Work and AI
- Prediction that 2025 might be the first year where AI sets the "skill floor" for certain roles
- Potential impact on jobs like customer support and software engineering
- Agents might replace or significantly change low-skill job requirements
- Speculation about OpenAI's future agent project "Operator"
- Researcher prediction about potential foreign espionage in AI labs
- Increasing awareness of security risks in AI research environments
- Prediction of always-on AI assistants