Latent Space: The AI Engineer Podcast

The Winds of AI Winter (Q2 Four Wars Recap) + ChatGPT Voice Mode Preview

Podcast Context and Recent Milestones

* The Latent Space Podcast has crossed 1 million downloads and 2 million Substack reads
* Hosts swyx and Alessio are recording in person in Singapore
* They're discussing recent AI developments, calling it the "winds of AI winter"

Singapore Visit and AI Summit

* First-time visit for swyx, who notes Singapore's unique characteristics:
  - Bustling urban environment
  - Compact geography (Indonesia and Malaysia are visible from high-rise buildings)
  - Dense urban infrastructure with extensive greenery

* Recently hosted a Sovereign AI Summit with participants including:
  - Temasek
  - GIC
  - Singtel
* Discussed national AI strategy, focusing on:
  - Productivity growth
  - Workforce supplementation
  - Potential investments (foundation models, infrastructure, GPUs, agents)
* Planning to bring the AI Engineer conference to Singapore next year

AI Model Landscape and Claude 3.5 Sonnet

* The hosts revisit their 2023 framework of the "four wars of AI": GPU rich vs. GPU poor, the data quality wars, the multimodality wars, and the RAG/Ops wars
* Claude (by Anthropic) has emerged as a strong competitor to OpenAI, potentially marking a "non-OpenAI-centric world"

* Claude 3.5 Sonnet highlights:
  - Ranked as the top model on some benchmarks for over a month
  - May have leveraged Anthropic's "Scaling Monosemanticity" research
  - That work allows model improvement by identifying and adjusting specific control features without full retraining
  - Represents a significant advance in model interpretability

* The hosts have switched primarily to Claude, finding it superior to ChatGPT for many tasks
  - Particularly strong at summarization and instruction following
  - They believe open-source models may eventually replicate similar improvement techniques

Llama 3 and Synthetic Data Developments

* Synthetic data is now a critical focus for model development
* Meta's Llama 3 paper provides detailed insights into synthetic data usage across domains:
  - Code
  - Math
  - Multilingual capabilities
  - Long-context understanding
  - Tool use
  - Audio/voice generation

* Model development observations:
  - Traditional benchmarks like MMLU are becoming less meaningful
  - Training on large-model outputs (like GPT-4's) is becoming common practice
  - The focus is shifting from creating large models to improving smaller models through synthetic data and fine-tuning
  - The 405B model is impressive but may face practical deployment challenges
  - It's unclear how easily it can be hosted and scaled by current infrastructure providers

Shifting Competitive Landscape

* There's a noted "vibe shift": companies like Anthropic and Meta are now seen as serious AI competitors to OpenAI
* Successive model releases are an important indicator of a company's potential
* Examples of improving model series include:
  - Microsoft's Phi models
  - Google's Gemma (Gemma 2 particularly noted as successful)
  - Llama 3 superseding previous open-source models

* Mistral analysis:
  - Mistral Large 2 was released shortly after Llama 3
  - Perceived as less exciting than Llama 3
  - Released with open weights under separate research and commercial licenses
  - Currently shifting away from a pure open-source approach
  - Deprecated several previous model versions
  - Still well funded (roughly $600 million raised)
  - Recently released Mistral NeMo alongside Mistral Large 2
  - Facing questions about licensing and commercial viability

Hardware and Optimization Landscape

* NVIDIA currently dominates the AI hardware market:
  - Competitors (AMD, MatX, Etched) struggle to challenge NVIDIA
  - FlashAttention-3 only works on NVIDIA H100s
  - Custom ASIC development becomes economically viable only at higher training-run costs ($500M-$1B)

* Character AI:
  - Handles LLM inference traffic equivalent to roughly 20% of Google Search's query volume
  - Noam Shazeer highlighted five optimization tricks
  - Discussed native int8 training as a potential efficiency improvement
  - The hosts are skeptical about its current use cases
  - Potentially exploring a sale (speculation about Google or Meta buying for $1 billion)

Emerging AI Architecture Trends

* Pre-quantizing models during training to reduce memory usage
* Growing interest in hybrid global-local attention architectures
* Research suggests optimal ratios of global to local attention layers (e.g., a 1:5 ratio)
* Multiple companies are exploring hybrid model architectures
* Potential fundamental AI building blocks: transformers, local-attention models, and other emerging techniques
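The 1:5 global-to-local ratio mentioned above can be sketched as a simple layer schedule. This is an illustrative sketch only; the function name and the exact interleaving rule are assumptions, not any specific model's architecture.

```python
# Sketch of a hybrid attention layer schedule: one full (global)
# attention layer after every five sliding-window (local) layers.
def attention_schedule(n_layers: int, local_per_global: int = 5) -> list[str]:
    """Return 'global' or 'local' for each transformer layer."""
    schedule = []
    for i in range(n_layers):
        # Every (local_per_global + 1)-th layer uses full attention.
        if (i + 1) % (local_per_global + 1) == 0:
            schedule.append("global")
        else:
            schedule.append("local")
    return schedule

sched = attention_schedule(12)
print(sched)  # in 12 layers, layers 6 and 12 are global; the rest are local
```

The appeal of this layout is that most layers pay only the sliding-window attention cost, while the occasional global layer preserves long-range information flow.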

On-Device AI Developments

* Increasing focus on on-device AI models
* Notable projects and initiatives:
  - Mozilla's llamafile
  - Google Chrome's built-in Gemini Nano
  - Apple Intelligence
  - Runtimes like llama.cpp and MLX
* Key advantages: local processing, privacy, potential for running on CPUs
* Potential performance differences between on-device AI models
* Challenges of model differentiation at smaller model sizes

Apple Intelligence and Model Routing

* Apple is developing a model-routing system with 14-20 adapters
* It can route to different AI providers like OpenAI or Google depending on the use case
* Potentially aims to commoditize OpenAI by offering provider choice
* Likely good for privacy and computational efficiency
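As a rough illustration of the routing idea described above (not Apple's actual implementation; the adapter names and routing rule below are invented):

```python
# Toy use-case router: prefer an on-device adapter when one matches the
# task, otherwise fall back to an external provider. All names invented.
ADAPTERS = {
    "summarize": "local-summarize-adapter",
    "rewrite": "local-rewrite-adapter",
}
FALLBACK = "external-provider"  # e.g. a cloud model for open-ended queries

def route(task: str) -> str:
    """Pick a local adapter if available, else the external fallback."""
    return ADAPTERS.get(task, FALLBACK)

print(route("summarize"))        # handled on-device
print(route("world-knowledge"))  # falls back to the external provider
```

The privacy and efficiency benefits follow directly from this shape: common, narrow tasks never leave the device, and only the long tail is sent out.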

* Apple's benchmarking approach:
  - Relied mostly on internal human evaluations
  - Only one industry-standard benchmark (IFEval) was used
* Apple also released DataComp-LM (DCLM), a 7B language model that outperforms previous state-of-the-art open-data models
  - Emphasizes open data, open weights, and open model approaches

* Future speculation:
  - Potential for OpenAI to develop its own phone
  - Apple potentially positioning itself as a central AI routing platform
  - OpenAI is currently not pursuing an on-device strategy due to its API business model

Legal and Business Developments

* The New York Times lawsuit against OpenAI is ongoing:
  - OpenAI's defense strategy involves challenging the originality of New York Times content
  - OpenAI claims it was close to a licensing deal with the NYT before negotiations broke down

* Data licensing deals:
  - Reddit secured over $200 million in data licensing deals with AI providers
  - Reddit's recent IPO has performed well (stock up roughly 25%)
  - The FTC has opened an inquiry into data licensing practices
  - Reddit has updated its robots.txt to allow only Google indexing, blocking other AI companies

* Content and copyright issues:
  - Scarlett Johansson publicly challenged OpenAI over voice simulation
  - The music industry (RIAA) is pursuing legal action against AI music generation companies like Udio and Suno
  - A potential Supreme Court case may determine whether AI training on data constitutes "transformative use"
  - Newspapers are taking different approaches to AI partnerships (some collaborating, some resisting)

AI and Mathematical Performance

* Google's AlphaProof system nearly achieved a gold medal at the International Math Olympiad, scoring full marks on 4 of 6 questions
* Eliezer Yudkowsky suggests AI is close to gold-medal mathematical performance, currently at silver/bronze level
* Hugging Face has related research on AI mathematical problem-solving

* Insights on AI intelligence:
  - Discussed "jagged intelligence": AI's inconsistent performance across different tasks
  - The current challenge is achieving general intelligence, not just specialized model performance
  - Models can solve complex math problems quickly but struggle with simple comparisons

Multimodal AI Developments

* The speaker is resubscribing to ChatGPT Plus because of new proprietary models
* OpenAI is developing voice technology, with a recent demo at the AI Engineer World's Fair
* The demo required significant technical effort to handle real-time voice interactions

* Llama 3 and Meta AI research:
  - Llama 3 is expected to become a multimodal model
  - Uses adapters for multimodality, integrating previous Meta AI research projects
  - Meta is working on voice capabilities using approximately 230,000 hours of speech recordings

* Chameleon is highlighted as a natively fused vision-and-language model:
  - Represents a more advanced approach to multimodality than adapter-based methods
  - Seen as a potential future direction for AI model development
  - The speaker views GPT-4o as a fully omnimodal, early-fusion model

* Google's multimodal developments:
  - PaliGemma was discussed, a late-fusion model for extracting structured text from PDFs
  - Considered state-of-the-art, outperforming Amazon Textract
  - Google has been making significant progress with small but capable models like Gemini Nano, Gemma 2, and PaliGemma

Google's AI Platform Strategy

* There is tension between AI Studio and Vertex AI
* AI Studio aims to be more developer-friendly, much as Netlify/Vercel compare to AWS
* Google is addressing previous complexity issues with Google Cloud Platform

Emerging AI Ecosystem Concept: LLMOS

* Introduced the "LLM OS" framing (an evolution of the earlier "RAG/Ops Wars" category)
* Focus on how LLMs can integrate with broader ecosystem capabilities
* Beyond chatbots: enabling AI to write code, work with agents, etc.
* ChatGPT plugin-style features represent potential startup opportunities
* E2B highlighted as a code-interpreter SDK service

AI Operations (Ops) Landscape

* Three broad categories identified: frameworks, gateways, and monitoring/tracing

* Frameworks:
  - Current prompt management tools are expensive and potentially overpriced
  - Developers want ways to manage prompts and bridge product management and development
  - Challenges include how to store and manage prompts, configurations, and models
  - Historically, the most successful frameworks emerge from internal company solutions

* Gateways:
  - Primary function is to proxy different AI endpoints
  - Typically normalize APIs, with OpenAI's as the initial reference point
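The normalization step a gateway performs can be sketched as a pure transformation over OpenAI-style chat messages. The provider request shapes below are simplified stand-ins for illustration, not exact vendor APIs.

```python
# Minimal sketch of an LLM gateway's translation layer: accept
# OpenAI-style chat messages, emit each provider's request shape.
def to_provider_request(provider: str, messages: list[dict], model: str) -> dict:
    if provider == "openai_compatible":
        # Many providers accept the OpenAI chat format verbatim.
        return {"model": model, "messages": messages}
    if provider == "anthropic_style":
        # Some APIs take the system prompt as a separate top-level field.
        system = " ".join(m["content"] for m in messages if m["role"] == "system")
        rest = [m for m in messages if m["role"] != "system"]
        return {"model": model, "system": system, "messages": rest}
    raise ValueError(f"unknown provider: {provider}")

msgs = [{"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"}]
req = to_provider_request("anthropic_style", msgs, "some-model")
print(req["system"])         # Be brief.
print(len(req["messages"]))  # 1
```

A real gateway also normalizes the responses, streaming formats, and error codes back into one shape, which is where most of the actual work lives.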

* Monitoring and tracing:
  - Current tools focus on traditional metrics like latency (P99)
  - Emerging tools include LangSmith, Langfuse, and others
  - Developers are frustrated by tool fragmentation and the lack of comprehensive solutions

* Key challenges:
  - The AI ops market is still immature and highly specialized
  - There is a need for more integrated, holistic tools
  - Quality of AI output may matter more than traditional performance metrics
  - Too many specialized tools requiring multiple integrations
  - Lack of clear, comprehensive recommendations for developers

MLOps and Vector Databases

* The speaker sees potential parallels between MLOps and emerging AI engineering ops
* Apple's Talaria was highlighted as an impressive internal MLOps tool that allows:
  - Performance profiling of transformer layers
  - A/B testing of model variations
  - Quantization performance comparisons

* Vector databases and memory:
  - Vector databases are considered too low-level, primarily focused on cosine similarity
  - Interest in the next evolution of vector databases, particularly memory-focused approaches
  - Distinction between factual memory and conversational memory
  - Interest in long-form memory for AI agents, assistants, and chatbots
  - Potential for tracking conversational context over time
  - LanceDB is exploring multimodal embeddings, as used by Character AI
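The cosine-similarity lookup the hosts call "too low-level" amounts to something like the following toy ranking (the 3-d embeddings are made up for brevity; real systems use hundreds of dimensions and approximate nearest-neighbor indexes):

```python
# Rank stored embeddings by cosine similarity to a query vector --
# the primitive operation underneath most vector databases.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]

ranked = sorted(store, key=lambda k: cosine(query, store[k]), reverse=True)
print(ranked[0])  # doc_a is the closest match
```

The "memory" layer the hosts want would sit above this primitive: deciding what to store, how to summarize it over time, and when a past conversation is relevant, rather than just returning nearest neighbors.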

Agent Ecosystems and Communication

* Growing focus on agent ecosystems and inter-agent communication
* Startups are building tooling and infrastructure for agent interactions
* Many potential startup opportunities in agent-related technologies
* Companies like OpenDevin (now All Hands AI) are exploring agent coordination

* Agent communication challenges:
  - The current state is primarily intra-agent connectivity
  - Lack of standardized communication protocols
  - Most companies are focused on internal capabilities
  - Need for specialized APIs for AI interactions

* Potential communication approaches:
  - OpenAPI and RESTful protocols
  - Workflow-based long-running request/response systems
  - Frameworks like AutoGen and CrewAI exploring inter-agent communication
  - Ideal future: communication in natural language (English)

Model Cost and Efficiency Trends

* Rapid decline in AI model costs observed
* Roughly an order-of-magnitude drop in cost for the same intelligence level every 4 months (previously estimated at 12 months)
* Created charts comparing price-efficiency frontiers in March/April 2024 and July 2024
* Models like Haiku and Mistral are moving quickly along this efficiency curve
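The claimed trend is easy to turn into a back-of-envelope formula: cost after t months is roughly c0 · 10^(−t/4). A small sketch, with an arbitrary illustrative starting price:

```python
# Back-of-envelope projection of the cost trend discussed above:
# a 10x price drop at a fixed capability level every ~4 months.
def projected_cost(c0: float, months: float, ten_x_months: float = 4.0) -> float:
    """Cost after `months`, given a 10x drop every `ten_x_months` months."""
    return c0 * 10 ** (-months / ten_x_months)

start = 10.0  # $ per 1M tokens, an illustrative number only
print(projected_cost(start, 4))  # 1.0 -> 10x cheaper after 4 months
print(projected_cost(start, 8))  # 0.1 -> 100x cheaper after 8 months
```

Whether the exponent holds is exactly what the hosts' frontier charts track; the formula just makes the "one order of magnitude every 4 months" claim concrete.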

* Strategic observations:
  - The emerging industry focus is on model efficiency, after initially pursuing capability
  - Recommended model development strategy: first pursue capability, then generalization, then efficiency
  - Large models can generate synthetic data to train more efficient smaller models
  - Smaller models (like 8B) can see significant performance improvements when trained from larger models

* Business implications:
  - A potential strategy for startups: build products that are economically non-viable today, anticipating future cost reductions
  - Model providers face pressure to continually reduce prices
  - Current AI models are "good but not amazing", not yet generating enough downstream value to justify high prices

AI Services and Labor Market Insights

* Most companies want AI to perform actual labor, not just provide tools for building AI
* Productivity gains from AI tools often benefit employees more than companies
* Emerging business model: selling AI-powered labor services instead of software tools

* Specific AI labor service examples:
  - Brightwave: provides research services for hedge funds and investment advisors
  - Dropzone: performs security alert (SOC) analysis, reducing cost per alert from $35 to $6

* Strategic observations:
  - Successful AI services are increasingly verticalized and focused on specific labor tasks
  - Startups are operating with smaller teams than in previous tech cycles
  - The most promising AI applications solve specific, high-value labor challenges
  - A shift from "software as a service" to "services as software"
  - Interest in AI agents that can perform complete labor tasks
  - Focus on producing consumable reports rather than complex retrieval systems

Benchmarking and Evaluation Directions

* The speaker outlines 10 key areas for evaluating AI models post-MMLU, including:
  - Multi-step reasoning (MuSR)
  - Math capabilities
  - Instruction following
  - Coding benchmarks
  - Context utilization
  - Function calling
  - Vision/multimodality
  - Multilingual capabilities

* Benchmarking insights:
  - General academic benchmarks are useful but cannot fully capture specific product use cases
  - AI engineers should develop product-specific evaluations
  - Benchmark correlations are helpful but not definitive (e.g., IQ tests correlate only weakly with job performance)
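A product-specific eval can be as small as a list of real prompts from your product plus a checker per case. The sketch below is illustrative; the toy model and cases are stand-ins for whatever your product actually calls.

```python
# Minimal product-specific eval harness: score a model function
# against hand-written cases drawn from your own product.
from typing import Callable

def run_eval(model_fn: Callable[[str], str],
             cases: list[tuple[str, Callable[[str], bool]]]) -> float:
    """Return the fraction of cases whose output passes its checker."""
    passed = sum(1 for prompt, check in cases if check(model_fn(prompt)))
    return passed / len(cases)

# Toy stand-in for a real model call, purely for illustration.
def toy_model(prompt: str) -> str:
    return "REFUND-APPROVED" if "refund" in prompt.lower() else "OK"

cases = [
    ("Customer asks for a refund", lambda out: "REFUND" in out),
    ("Customer says hello", lambda out: out == "OK"),
]
print(run_eval(toy_model, cases))  # 1.0
```

The point of the structure is that checkers encode *your* product's definition of correct output, which is exactly what generic academic benchmarks cannot do.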

AI Voice and Audio Capabilities Exploration

* The conversation explores AI voice synthesis capabilities:
  - Experimenting with multiple character voices (Michelle: high-pitched; John: deep voice)
  - Creating bedtime-story dialogues with specific vocal characteristics
  - Testing emotional tone and energy-level detection
  - Exploring Chinese language capabilities, including recitation of a famous poem
  - Testing accent detection (successfully identifying Midwest/St. Louis and Singaporean accents)
  - Simulating different emotional vocal responses (laughing, crying, physical exertion)

* Audio experiments:
  - Exploring the Shepard tone audio illusion
  - Attempting to generate musical tones (a G note)
  - Testing the AI's ability to mimic tone and precise speech patterns
  - Exploring the AI's problem-solving capabilities and reasoning processes
