Overview

* AI model efficiency has dramatically improved, with costs dropping 100x since 2022-2023 and previously high-end models now running on personal laptops, challenging assumptions about escalating AI development costs.

* While AI models haven't experienced a massive intelligence leap beyond GPT-4, they've become significantly more versatile and accessible through multimodal capabilities (handling images, video, audio), longer context lengths, and improved interaction methods.

* Current AI agents show promise in specific domains like research assistance and coding but face fundamental reliability challenges - particularly their inability to distinguish truth from fiction - limiting their autonomous capabilities.

* The user interface for AI remains a critical bottleneck, with current LLM interfaces compared to "dropping users into a Linux terminal," highlighting the urgent need for more intuitive interaction methods beyond text prompts.

* Creative industries are beginning to incorporate AI tools into workflows, with the most effective implementations maintaining human curation and editorial oversight to establish credibility and avoid low-quality "slop" content.

Content

AI Landscape in Early 2025

Significant improvements in AI models throughout 2024, characterized by:

- Cheaper and faster models - Multimodal capabilities (images, video, audio) - Longer context lengths - Improved interaction methods

Model development observations:

- No massive step change from GPT-4 as initially expected - 18 organizations have developed models that beat the original GPT-4 - Models didn't get dramatically "smarter" but became more versatile - Computational efficiency and inference time are key development areas - Open-source models are now competitive with previous state-of-the-art models - Some advanced models can now run on personal laptops

Model Efficiency and Cost Trends

AI models are becoming more efficient, smaller, and cheaper to run:

- Microsoft's 5.4 model can now run on a MacBook Pro - DeepSeek V3 is currently the best open weights model, trained for only $5.5 million

Dramatic pricing developments:

- OpenAI models are now 100x cheaper compared to 2022-2023 - Google's Gemini 1.5 Flash model costs 0.075 dollars per million tokens - Gemini 1.5 Flash is 27 times cheaper than GPT 3.5 Turbo from a year ago

Cost reduction factors:

- Intense competition in the AI model market is driving prices down - Some providers like Google Gemini and Amazon Nova are operating profitably at these low prices - Cost of achieving GPT-4 level intelligence dropped approximately 1,000x from start to end of last year

DeepSeek's Impact and AI Efficiency

DeepSeek's model training achievement is considered a "bombshell" - challenging previous assumptions about escalating AI training costs

Key insights about DeepSeek:

- Relatively small company (around 150 employees) - Part of a quant hedge fund - Potentially demonstrating technological capabilities - Raising questions about their motivations and progress

Speculation about DeepSeek's model development:

- Possible copying or borrowing of ideas from other AI labs - Rapid rate of progress attracting attention

Emerging developments:

- DeepSeek R1: A reasoning model that can run on a laptop - Growing interest in more efficient, accessible AI models - Uncertainty about whether this efficiency improvement is sustainable - Hypothesis that "low-hanging fruit" in AI efficiency is being rapidly discovered

AI Reasoning and Limitations

Interesting AI reasoning observations:

- QWQ and QVQ models demonstrate unique "thinking out loud" characteristics - Anecdote about an AI drawing a pelican on a bicycle in SVG, which processed thoughts in Chinese - Reference to Andrei Karpathy's observation about advanced AI reasoning potentially happening in non-English languages

AI agent limitations:

- Skepticism about current AI agents due to fundamental reliability issues - Main critique centers on AI models' "gullibility" - their tendency to believe anything presented to them - Highlighted problem: AI agents cannot reliably distinguish truth from fiction - Lack of clear definition of what constitutes an "AI agent" across different contexts - Security risks demonstrated by the Claud example, where an AI was tricked into downloading malware

Agent Technology Perspectives

Cautiously optimistic outlook on agent technology:

- Development compared to the gradual progress of self-driving cars - Technological advancement viewed as a "slow cook" process over the next 10 years

Promising agent types:

- Research Assistant Agents: Viewed as most credible * Can analyze multiple sources (e.g., Google Gemini 1.5 Pro) * Capable of comprehensive research and reporting - Coding Agents: Proven to work well * Can write code, execute, and self-correct based on error messages * Continuously improving

Skeptical areas:

- Autonomous agents making independent financial decisions - Agents acting completely independently without human oversight - Fully autonomous financial agents seen as an "AGI level problem"

Specific examples:

- Stripe released an agent toolkit with virtual spending cards - Travel booking agents are a recurring technological "promise" across generations - Existing solutions like Google Flights already work effectively - Notebook LM viewed as two products: a good RAG tool with an interesting but somewhat "gimmicky" podcast feature

Multimodal AI Capabilities

Rapid advancement in multimodal AI capabilities:

- Vision and audio models have made significant progress - Video processing now involves capturing images per second and feeding them into AI models - ChatGPT iPhone app allows real-time video interaction and object/scene identification

Specific model observations:

- GPT-4 Vision was impressive initially - Google Gemini 1.5 Pro improved multimodal capabilities - Recent models can process audio and images simultaneously - Gemini Flash offers free-tier capabilities like continuous photo capture and prompting

Cost and scalability improvements:

- Processing images has become extremely cost-effective - Example: Generating captions for 68,000 photos would cost only $1.68 - Video processing is now economically feasible

Video generation models:

- Sora and Google's VO2 were significant launches this year - Discussion suggests VO2 might be technically superior to Sora - The publicly released Sora might be a "lite" version

AI in Creative Industries

Sora and generative AI in filmmaking:

- OpenAI's strategy appears to be developing Sora for Hollywood studios while letting others experiment with Sora Lite - Generative AI likely to first impact film production in background and marginal elements, similar to early CGI techniques - The VFX team for "Everything Everywhere All at Once" already used RunwayML in their workflow - Potential AI applications include generating background video, music, and sound effects

Industry and cultural perspectives:

- Perceived cultural resistance in Hollywood against AI technologies - Some collaborative efforts emerging, like generative AI video creative hackathons - Chinese AI models (Hyluo, Kling) making significant progress in video generation - Video avatar companies like HeyGen and Swix developing specialized AI video technologies

AI content creation workflows:

- HeyGen used to create avatar-based lip-synced videos from audio recordings - Potential for AI influencers, though current attempts are often novelty-driven - AI tools seen as workflow simplifiers that help creators do more ambitious work

Credibility considerations:

- Critical importance of human credibility in content creation - AI can generate variations, with humans selecting and endorsing final output - Credibility comes from human review and willingness to "put your name" behind content - LLMs cannot inherently establish credibility

UI Challenges and Innovation

Concept of "Slop" in AI content:

- Defined as AI-generated content that is unrequested and unreviewed - Crucial distinction is human curation/editorial review

LLM user interface challenges:

- Current LLM interfaces compared to dropping users into a Linux terminal - Urgent need for a more intuitive, user-friendly interface - Parallel drawn to how GUI replaced command-line interfaces

Potential UI innovation directions:

- OpenAI's canvas collaboration interface - Drawing-based UI that translates sketches into functional interfaces - Prompt-driven UI development * Generating custom HTML/JavaScript interfaces based on user prompts * Potential for dynamic, interactive interfaces

Future UI vision:

- LLMs creating custom interfaces with interactive elements, sliders, map selection tools - Goal: More precise, intuitive interaction with AI models - Current limitation: Interfaces aren't yet "closing the loop" by learning from user interactions

Software Creation and Local AI Models

AI-powered software development:

- LLMs enabling easier software creation, with tools like Bolt allowing zero-shot app generation - Potential for creating custom dashboards, web applications, and data exploration tools via prompts - AI integration in productivity tools (like Gemini in Gmail and Google Sheets) simplifying complex tasks

Complexity and usability challenges:

- While AI tools are becoming more powerful, they're also becoming more complex - Many AI features have undocumented limitations and edge cases - Understanding the full scope of what's possible requires increasing technical expertise - Challenges include issues like CORS headers and API access limitations

Local LLMs and personal computing:

- Local AI models have significantly improved in the past three months - While not yet matching top-tier hosted models like Claude 3.5 Sonic, local models are now practically usable - RAM limitations currently pose a challenge for running large models (e.g., Llama 370B) - Future laptops with more RAM could make local AI models more viable - NVIDIA's new $3,000 128GB machine represents an interesting development in local AI computing

Recommended local AI applications:

- MLC Chat (iPhone): Runs Llama 3B, good for creative tasks like generating movie plot outlines - Llama: Recommended as an easy entry point for running local models - LM Studio: Best user interface for local AI models - Open Web UI: Provides a good interface for Ollama models - Local models range from 2GB to 20-30GB in size - Smaller models (3B) are becoming more capable

Practical AI Tools and Industry Landscape

Recommended AI tools:

- Mac Whisper: Desktop app that can pull audio/transcripts from YouTube videos, works with MP3 files - Super Whisper: Speech-to-text tool that uses GPT-4 to clean up and rewrite transcripts - Rosebud: AI journaling app, highlighting potential for AI in mental health applications - Riverside: Recording platform with smart editing feature that automatically manages multi-track video editing, saving significant time in post-production

OpenAI and AI landscape assessment:

- OpenAI no longer the unambiguous market leader - Facing talent loss challenges - Competitive pressure from Google Gemini and Anthropic - GPT-4 helped maintain their position

LLM criticism perspectives:

- Current AI discourse lacks nuanced criticism - Typical criticism focuses on environmental impact, training data plagiarism, unlicensed data usage - The speaker argues that LLMs are valuable when used correctly, despite their tendency to hallucinate

Key concerns about AI technology:

- Training data usage (potentially legal but perceived as unfair) - Environmental impact - Potential job displacement in unexpected sectors like art and law

Regulatory Considerations and Emerging Technologies

Regulatory perspectives:

- Previous AI regulation attempts (e.g., White House, California) have been ineffective - Regulations often try to "regulate the last war" instead of addressing current technological developments

Two areas of potential AI regulation:

1. Preventing opaque AI decision-making in critical areas like insurance claims 2. Strengthening privacy laws to protect user data and prevent unauthorized training

Wearable technology resurgence:

- AI-enabled wearables becoming surprisingly affordable, increasingly capable, with decent battery life - Companies like Limitless (formerly Rewind) developing voice-recording wearables - Potential use cases include workplace meeting recording, personal memory preservation - AI transforming smart glasses as a product category - Future potential for integrated technologies combining smart glasses, advanced earbuds, LLMs, with smartphone as a central device

Privacy considerations:

- Ongoing societal discussions needed about acceptable use of recording technologies - Importance of user choice and consent in data collection

Upcoming Events and Projects

AI.engineer conference in New York City on February 20-21:

- February 20th: Leadership day for management/VPs/CTOs - February 21st: Engineer day for individual contributors - Will feature labs from DeepMind, Anthropic, Meta, OpenAI - Registration website: apply.ai.engineer

Simon Willison's work:

- Open source tools for data journalism - Dataset.io (data publishing/exploration platform) - Currently developing AI tools for the platform - Plans to add LLM-powered features for query crafting and dashboard building - Hosts Tech Meme Ride Home, a daily 15-minute tech news podcast - Personal blog: simonwillison.net

[Ride Home] Simon Willison: Things we learned about LLMs in 2024