Overview
* AI model efficiency has dramatically improved, with costs dropping 100x since 2022-2023 and previously high-end models now running on personal laptops, challenging assumptions about escalating AI development costs.
* While AI models haven't experienced a massive intelligence leap beyond GPT-4, they've become significantly more versatile and accessible through multimodal capabilities (handling images, video, audio), longer context lengths, and improved interaction methods.
* Current AI agents show promise in specific domains like research assistance and coding but face fundamental reliability challenges - particularly their inability to distinguish truth from fiction - limiting their autonomous capabilities.
* The user interface for AI remains a critical bottleneck, with current LLM interfaces compared to "dropping users into a Linux terminal," highlighting the urgent need for more intuitive interaction methods beyond text prompts.
* Creative industries are beginning to incorporate AI tools into workflows, with the most effective implementations maintaining human curation and editorial oversight to establish credibility and avoid low-quality "slop" content.
Content
AI Landscape in Early 2025
- Significant improvements in AI models throughout 2024, characterized by:
- Cheaper and faster models
- Multimodal capabilities (images, video, audio)
- Longer context lengths
- Improved interaction methods
- Model development observations:
- No massive step change from GPT-4 as initially expected
- 18 organizations have developed models that beat the original GPT-4
- Models didn't get dramatically "smarter" but became more versatile
- Computational efficiency and inference time are key development areas
- Open-source models are now competitive with previous state-of-the-art models
- Some advanced models can now run on personal laptops
Model Efficiency and Cost Trends
- AI models are becoming more efficient, smaller, and cheaper to run:
- Microsoft's Phi-4 model can now run on a MacBook Pro
- DeepSeek V3 is currently the best open weights model, trained for only $5.5 million
- Dramatic pricing developments:
- OpenAI models are now 100x cheaper compared to 2022-2023
- Google's Gemini 1.5 Flash model costs $0.075 per million tokens
- Gemini 1.5 Flash is 27 times cheaper than GPT-3.5 Turbo from a year ago
- Intense competition in the AI model market is driving prices down
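The "27 times cheaper" figure is easy to sanity-check, assuming GPT-3.5 Turbo's original 2023 launch price of $2.00 per million tokens (that price is an assumption on my part, not stated in the source):

```python
# Hypothetical sanity check of the "27x cheaper" claim.
# $2.00/M tokens: GPT-3.5 Turbo's 2023 launch price (assumed here).
# $0.075/M tokens: Gemini 1.5 Flash's published price.
gpt_35_turbo_per_million = 2.00
gemini_flash_per_million = 0.075

ratio = gpt_35_turbo_per_million / gemini_flash_per_million
print(f"{ratio:.1f}x")  # → 26.7x, i.e. roughly 27x cheaper
```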
- Some providers like Google Gemini and Amazon Nova are operating profitably at these low prices
- Cost of achieving GPT-4-level intelligence dropped approximately 1,000x from the start to the end of last year
DeepSeek's Impact and AI Efficiency
- DeepSeek's model training achievement is considered a "bombshell" - challenging previous assumptions about escalating AI training costs
- Key insights about DeepSeek:
- Relatively small company (around 150 employees)
- Part of a quant hedge fund
- Potentially demonstrating technological capabilities
- Raising questions about their motivations and progress
- Speculation about DeepSeek's model development:
- Possible copying or borrowing of ideas from other AI labs
- Rapid rate of progress attracting attention
- DeepSeek R1: a reasoning model that can run on a laptop
- Growing interest in more efficient, accessible AI models
- Uncertainty about whether this efficiency improvement is sustainable
- Hypothesis that "low-hanging fruit" in AI efficiency is being rapidly discovered
AI Reasoning and Limitations
- Interesting AI reasoning observations:
- QwQ and QVQ models demonstrate unique "thinking out loud" characteristics
- Anecdote about prompting a model to draw a pelican on a bicycle as SVG, during which it reasoned out loud in Chinese
- Reference to Andrej Karpathy's observation that advanced AI reasoning may eventually happen in non-English languages
- Skepticism about current AI agents due to fundamental reliability issues
- Main critique centers on AI models' "gullibility" - their tendency to believe anything presented to them
- Highlighted problem: AI agents cannot reliably distinguish truth from fiction
- Lack of clear definition of what constitutes an "AI agent" across different contexts
- Security risks demonstrated by the Claude example, where an AI was tricked into downloading malware
Agent Technology Perspectives
- Cautiously optimistic outlook on agent technology:
- Development compared to the gradual progress of self-driving cars
- Technological advancement viewed as a "slow cook" process over the next 10 years
- Research Assistant Agents: viewed as most credible
* Can analyze multiple sources (e.g., Google Gemini 1.5 Pro)
* Capable of comprehensive research and reporting
- Coding Agents: Proven to work well
* Can write code, execute, and self-correct based on error messages
* Continuously improving
- Autonomous agents making independent financial decisions
- Agents acting completely independently without human oversight
- Fully autonomous financial agents seen as an "AGI-level problem"
- Stripe released an agent toolkit with virtual spending cards
- Travel booking agents are a recurring technological "promise" across generations
- Existing solutions like Google Flights already work effectively
- NotebookLM viewed as two products: a good RAG tool with an interesting but somewhat "gimmicky" podcast feature
Multimodal AI Capabilities
- Rapid advancement in multimodal AI capabilities:
- Vision and audio models have made significant progress
- Video processing typically works by capturing roughly one frame per second and feeding the images into the model
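A minimal sketch of that sampling step, choosing which timestamps to grab frames at (the function name and 1 fps default are illustrative, not from any specific tool):

```python
def frame_timestamps(duration_s: float, fps: float = 1.0) -> list[float]:
    """Timestamps (in seconds) at which to capture one frame per 1/fps interval."""
    step = 1.0 / fps
    t, out = 0.0, []
    while t < duration_s:
        out.append(round(t, 3))
        t += step
    return out

frame_timestamps(5)  # → [0.0, 1.0, 2.0, 3.0, 4.0]
```

Each captured frame would then be sent to the model as an ordinary image input.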
- The ChatGPT iPhone app allows real-time video interaction and object/scene identification
- Specific model observations:
- GPT-4 Vision was impressive initially
- Google Gemini 1.5 Pro improved multimodal capabilities
- Recent models can process audio and images simultaneously
- Gemini Flash offers free-tier capabilities like continuous photo capture and prompting
- Cost and scalability improvements:
- Processing images has become extremely cost-effective
- Example: Generating captions for 68,000 photos would cost only $1.68
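That figure works out to a tiny fraction of a cent per photo:

```python
# Per-photo cost implied by the figure above ($1.68 for 68,000 captions).
total_cost_usd = 1.68
photos = 68_000

per_photo = total_cost_usd / photos
print(f"${per_photo:.7f} per photo")  # about $0.0000247, i.e. ~2.5 thousandths of a cent
```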
- Video processing is now economically feasible
- Sora and Google's Veo 2 were significant launches this year
- Discussion suggests Veo 2 might be technically superior to Sora
- The publicly released Sora might be a "lite" version
AI in Creative Industries
- Sora and generative AI in filmmaking:
- OpenAI's strategy appears to be developing Sora for Hollywood studios while letting others experiment with Sora Lite
- Generative AI likely to first impact film production in background and marginal elements, similar to early CGI techniques
- The VFX team for "Everything Everywhere All at Once" already used RunwayML in their workflow
- Potential AI applications include generating background video, music, and sound effects
- Industry and cultural perspectives:
- Perceived cultural resistance in Hollywood against AI technologies
- Some collaborative efforts emerging, like generative AI video creative hackathons
- Chinese AI models (Hailuo, Kling) making significant progress in video generation
- Video avatar companies like HeyGen and Swix developing specialized AI video technologies
- AI content creation workflows:
- HeyGen used to create avatar-based lip-synced videos from audio recordings
- Potential for AI influencers, though current attempts are often novelty-driven
- AI tools seen as workflow simplifiers that help creators do more ambitious work
- Credibility considerations:
- Critical importance of human credibility in content creation
- AI can generate variations, with humans selecting and endorsing final output
- Credibility comes from human review and willingness to "put your name" behind content
- LLMs cannot inherently establish credibility
UI Challenges and Innovation
- Concept of "Slop" in AI content:
- Defined as AI-generated content that is unrequested and unreviewed
- Crucial distinction is human curation/editorial review
- LLM user interface challenges:
- Current LLM interfaces compared to dropping users into a Linux terminal
- Urgent need for a more intuitive, user-friendly interface
- Parallel drawn to how GUIs replaced command-line interfaces
- Potential UI innovation directions:
- OpenAI's canvas collaboration interface
- Drawing-based UI that translates sketches into functional interfaces
- Prompt-driven UI development
* Generating custom HTML/JavaScript interfaces based on user prompts
* Potential for dynamic, interactive interfaces
- LLMs creating custom interfaces with interactive elements, sliders, and map selection tools
- Goal: More precise, intuitive interaction with AI models
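The prompt-driven pattern above can be sketched as building a chat request that constrains the model to emit one self-contained HTML file. The prompt wording and message structure here are illustrative assumptions, not taken from any specific product:

```python
# Sketch: wrap a user request in a system prompt that asks an LLM to
# return a single self-contained HTML/JavaScript interface.
SYSTEM_PROMPT = (
    "You are a UI generator. Reply with a single self-contained HTML file "
    "(inline CSS and JavaScript, no external dependencies) that implements "
    "the interface the user describes."
)

def build_ui_request(user_prompt: str) -> list[dict]:
    """Build the chat-style message list to send to an LLM API."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

messages = build_ui_request("A slider that controls the size of a circle")
```

The returned HTML could then be rendered directly in a sandboxed iframe, which is roughly how prompt-generated interfaces "close the loop" with the user.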
- Current limitation: interfaces aren't yet "closing the loop" by learning from user interactions
Software Creation and Local AI Models
- AI-powered software development:
- LLMs enabling easier software creation, with tools like Bolt allowing zero-shot app generation
- Potential for creating custom dashboards, web applications, and data exploration tools via prompts
- AI integration in productivity tools (like Gemini in Gmail and Google Sheets) simplifying complex tasks
- Complexity and usability challenges:
- While AI tools are becoming more powerful, they're also becoming more complex
- Many AI features have undocumented limitations and edge cases
- Understanding the full scope of what's possible requires increasing technical expertise
- Challenges include issues like CORS headers and API access limitations
- Local LLMs and personal computing:
- Local AI models have significantly improved in the past three months
- While not yet matching top-tier hosted models like Claude 3.5 Sonnet, local models are now practically usable
- RAM limitations currently pose a challenge for running large models (e.g., Llama 3 70B)
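Rough arithmetic shows why: a model's weight footprint is approximately parameter count times bytes per weight (the quantization levels below are typical examples, chosen by me for illustration):

```python
# Approximate RAM needed just to hold model weights,
# ignoring KV cache and runtime overhead.
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"Llama 3 70B at {bits}-bit: {weights_gb(70, bits):.0f} GB")
# 16-bit: 140 GB, 8-bit: 70 GB, 4-bit: 35 GB —
# even aggressive quantization makes 70B tight on a 64 GB laptop
```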
- Future laptops with more RAM could make local AI models more viable
- NVIDIA's new $3,000 128GB machine represents an interesting development in local AI computing
- Recommended local AI applications:
- MLC Chat (iPhone): Runs Llama 3B, good for creative tasks like generating movie plot outlines
- Ollama: recommended as an easy entry point for running local models
- LM Studio: Best user interface for local AI models
- Open WebUI: provides a good interface for Ollama models
- Local models range from 2GB to 20-30GB in size
- Smaller models (3B) are becoming more capable
Practical AI Tools and Industry Landscape
- MacWhisper: desktop app that can pull audio/transcripts from YouTube videos, works with MP3 files
- Superwhisper: speech-to-text tool that uses GPT-4 to clean up and rewrite transcripts
- Rosebud: AI journaling app, highlighting potential for AI in mental health applications
- Riverside: recording platform with a smart editing feature that automatically manages multi-track video editing, saving significant time in post-production
- OpenAI and AI landscape assessment:
- OpenAI no longer the unambiguous market leader
- Facing talent loss challenges
- Competitive pressure from Google Gemini and Anthropic
- GPT-4 helped maintain their position
- LLM criticism perspectives:
- Current AI discourse lacks nuanced criticism
- Typical criticism focuses on environmental impact, training data plagiarism, unlicensed data usage
- The speaker argues that LLMs are valuable when used correctly, despite their tendency to hallucinate
- Key concerns about AI technology:
- Training data usage (potentially legal but perceived as unfair)
- Environmental impact
- Potential job displacement in unexpected sectors like art and law
Regulatory Considerations and Emerging Technologies
- Previous AI regulation attempts (e.g., White House, California) have been ineffective
- Regulations often try to "regulate the last war" instead of addressing current technological developments
- Two areas of potential AI regulation:
1. Preventing opaque AI decision-making in critical areas like insurance claims
2. Strengthening privacy laws to protect user data and prevent unauthorized training
- Wearable technology resurgence:
- AI-enabled wearables becoming surprisingly affordable, increasingly capable, with decent battery life
- Companies like Limitless (formerly Rewind) developing voice-recording wearables
- Potential use cases include workplace meeting recording, personal memory preservation
- AI transforming smart glasses as a product category
- Future potential for integrated technologies combining smart glasses, advanced earbuds, and LLMs, with the smartphone as a central device
- Ongoing societal discussions needed about acceptable use of recording technologies
- Importance of user choice and consent in data collection
Upcoming Events and Projects
- AI.engineer conference in New York City on February 20-21:
- February 20th: Leadership day for management/VPs/CTOs
- February 21st: Engineer day for individual contributors
- Will feature labs from DeepMind, Anthropic, Meta, OpenAI
- Registration website: apply.ai.engineer
- Open source tools for data journalism
- Datasette (datasette.io): data publishing/exploration platform
- Currently developing AI tools for the platform
- Plans to add LLM-powered features for query crafting and dashboard building
- Hosts Techmeme Ride Home, a daily 15-minute tech news podcast
- Personal blog: simonwillison.net