
Latent Space: The AI Engineer Podcast

AI Magic: Shipping 1000s of successful products with no managers and a team of 12 — Jeremy Howard of Answer.ai

Overview

  • Multi-phase training approaches are evolving beyond rigid pre-training/fine-tuning distinctions toward more continuous, flexible parameter scheduling that incorporates original datasets into later training stages, challenging the necessity of random initialization.
  • Answer AI exemplifies a non-traditional organizational structure with no management hierarchy, emphasizing talent over credentials and operating as "a large yard with narrow fences" - giving team members flexibility while maintaining a shared vision for more practical, accessible AI research.
  • The team advocates for encoder-decoder architectures for their superior feature representation capabilities, while developing techniques to fine-tune large language models with limited computational resources through approaches like FSDP, QLoRA, and adapter-based fine-tuning.
  • Projects like FastHTML (creating web applications in a single Python file) and "Dialogue Engineering" aim to bridge gaps in current development and AI interaction paradigms, moving beyond both ChatGPT-style interfaces and traditional coding environments.
  • Future research focuses on helping people learn to maintain AI-generated code, optimizing model inference through adapter distribution rather than full model merging, and exploring conceptual innovations like models that develop "sketches" before generating tokens.

Content

Podcast Context and Introduction

  • This is an episode of the "Latent Space" podcast featuring Jeremy Howard
  • Hosts are Alessio and swyx
  • This is Jeremy Howard's second or third appearance on the podcast
  • Episode was recorded over a month ago and delayed due to the Llama 3.1 and SAM 2 paper releases

Evolving Perspectives on Model Training

  • Jeremy Howard discusses evolving perspectives on model training approaches
  • Training steps (pre-training, instruction tuning, task training) are not as separate as originally thought
  • These steps should be treated more as a continuum
  • Howard advocates for incorporating original dataset into later training stages
  • He highlights the ability to significantly modify model behavior without starting from random weights
  • Howard is skeptical of starting model training from random weights
  • Argues there's likely always some data similarity that makes random initialization unnecessary

Multi-Phase Pre-Training and Optimization

  • Snowflake released a model called Snowflake Arctic with a three-phase training approach
  • Training phases involved gradually reducing web text and increasing code percentage
  • Discussion suggests multi-phase training is becoming more explicitly discussed in research
  • Preference for flexible parameter scheduling rather than rigid schedules
  • Mention of Meta's work on "schedule-free" optimizers
  • Emphasis on having configurable hyperparameters with good default settings
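
The phase-based mixture idea can be sketched in a few lines. This is an illustrative sketch only — the function name, phase boundaries, and percentages below are hypothetical, not Snowflake Arctic's actual recipe:

```python
# Illustrative sketch of a phase-based data mixture (hypothetical numbers,
# not Snowflake Arctic's actual recipe): sampling weight shifts from web
# text toward code as training progresses.
def mixture_weights(step: int, total_steps: int) -> dict[str, float]:
    """Return per-source sampling weights at a given training step."""
    frac = step / total_steps
    if frac < 1 / 3:      # phase 1: mostly general web text
        web = 0.75
    elif frac < 2 / 3:    # phase 2: shift toward code
        web = 0.5
    else:                 # phase 3: code-heavy
        web = 0.25
    return {"web": web, "code": 1.0 - web}

print(mixture_weights(0, 300))    # {'web': 0.75, 'code': 0.25}
print(mixture_weights(150, 300))  # {'web': 0.5, 'code': 0.5}
print(mixture_weights(299, 300))  # {'web': 0.25, 'code': 0.75}
```

A "flexible schedule" in this spirit would expose the boundaries and weights as configurable hyperparameters with sensible defaults, rather than hard-coding them.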

OpenAI Governance and Organizational Design

  • Critique of OpenAI's governance structure before Sam Altman's firing
  • Observation that the non-profit/for-profit hybrid model was fundamentally flawed
  • Argument that financial incentives (equity) made the governance model unsustainable
  • Reflection on how companies tend to become "sociopathic" and devour their original mission

Creating Better Company Structures

  • Discussion focuses on creating companies that are less "sociopathic" and more aligned with founders' intentions
  • Key strategies for maintaining company values include:
    - Setting up legal structures that enforce long-term value principles
    - Using specific legal mechanisms like voting agreements
    - Becoming a Public Benefit Corporation (PBC)
  • PBC Benefits:
    - Allows companies to reject acquisition offers that conflict with their stated public benefit
    - Provides legal protection against being forced to make decisions solely for short-term financial gain
    - Can be implemented with minimal legal complexity

Hiring Philosophy and Team Building

  • Emphasis on talent and potential over traditional institutional credentials
  • Interest in candidates with non-traditional backgrounds
  • Recognition that exceptional work often comes from people with unique life experiences
  • Valuing people who:
    - Succeed despite constraints
    - Take risks
    - Are creative
    - Are tenacious
    - Are open-minded
  • At Answer AI, many team members experience imposter syndrome
  • Mutual intimidation between developers and researchers, who each view the other group as impressive
  • Key philosophical points:
    - It's unreasonable to expect to be the best at everything
    - Being in an environment where you're not the best at everything can be healthy
    - The goal is collective learning and bringing different skills together

Answer AI's Organizational Approach

  • Brief mention of Answer AI, Howard's startup
  • Noted for shipping multiple open-source projects quickly
  • Small team (maximum 12 members)
  • No traditional management hierarchy:
    - No managers telling people what to do
    - Collaborative, experimental approach
    - Focus on learning from each other
  • Specific collaborative examples:
    - Ben Clavier initiating a new BERT project by gathering experts
    - Benjamin Warner creating a hackable Transformers implementation
    - Organic, self-driven collaboration without top-down direction
  • No required meetings, but regular meetings across time zones
  • Everyone is interviewed by the entire company during recruitment
  • Nearly all candidates in the recruiting pipeline have been hired
  • "A large yard with narrow fences" - giving team members flexibility while maintaining a shared vision

Research and Technical Focus

  • The group shares a common vision and critique of current AI research
  • They believe current research is:
    - Too expensive
    - Too complicated
    - Focused on unnecessary foundation models
  • They prioritize practical research with real-world outcomes
  • Technical improvements like:
    - Transitioning from LoRA to DoRA
    - Creating vLLM extensions
    - Exploring quantized model training
    - Improving WebGPU programming

Notable Team Members and Talent

  • Examples of standout people include:
    - Ben Clavier: Writes distinctive, high-quality code
    - Vic (ex-DataQuest CEO): Successful startup founder, won a Kaggle NLP competition
    - Karim: Created a state-of-the-art Turkish language model independently
    - Jono Whittaker: Creative tinkerer
    - Benjamin: Strong community contributor without formal qualifications
    - Austin: Experienced AI leader with a diverse background

Model Architecture Perspectives

  • Discussion of encoder-decoder vs. decoder-only model paradigms
  • Key arguments for encoder-decoder models:
    - Better feature representation of input information
    - More effective for tasks requiring context understanding
    - Critical for translation and complex encoding tasks
  • Model architecture observations:
    - Encoder-only models work well for classification tasks
    - Decoder-only models require more training resources and larger model sizes to be competitive
  • Current research trend tends to focus on incremental improvements rather than exploring promising approaches
  • Renewed interest in BERT models, particularly BERT24
  • Growing interest in state-based models (RNNs, LSTMs, xLSTM)
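
The encoder/decoder distinction discussed above largely comes down to the attention mask. As a minimal plain-Python sketch (illustrative only, not any specific model's code): a causal mask lets position i see only earlier positions, while a bidirectional mask lets every token represent the full input.

```python
# Minimal sketch: the attention mask is what separates encoder-style
# (bidirectional) from decoder-style (causal) attention.
def causal_mask(n: int) -> list[list[int]]:
    """Position i may attend only to positions <= i (decoder-only)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n: int) -> list[list[int]]:
    """Every position attends to every position (encoder / BERT-style)."""
    return [[1] * n for _ in range(n)]

# With 4 tokens, the first token of a causal model sees only itself,
# while an encoder token always sees the whole sequence.
print(causal_mask(4)[0])         # [1, 0, 0, 0]
print(bidirectional_mask(4)[0])  # [1, 1, 1, 1]
```

This is one intuition for why encoders can build richer features of the input: every position conditions on the entire context, not just the prefix.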

Technical Development Challenges

  • Developing techniques for fine-tuning large language models with limited computational resources
  • Key techniques mentioned include:
    - FSDP (Fully Sharded Data Parallel)
    - QLoRA (Quantized Low-Rank Adaptation)
    - Adapter-based fine-tuning
  • The development process was extremely complex and challenging
  • Significant obstacles included:
    - Poorly documented libraries
    - Complicated CUDA code
    - Interconnected Hugging Face ecosystem components
    - Lack of clear, minimal working examples
  • The goal was to prove it's possible to fine-tune large models (like 70B) on more modest hardware (e.g., RTX 4090 GPUs)
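
The core QLoRA idea can be shown with a toy sketch. This is a conceptual illustration only (hypothetical helper functions, not the bitsandbytes implementation): base weights are frozen in a low-bit format, and only a small full-precision adapter is trained.

```python
# Toy sketch of the QLoRA idea (not the bitsandbytes implementation):
# freeze base weights in a 4-bit format and train only a small
# full-precision adapter on top of them.
def quantize_4bit(weights: list[float]) -> tuple[list[int], float]:
    """Absmax quantization onto symmetric 4-bit levels (-7..7)."""
    scale = max(abs(w) for w in weights) / 7
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

w = [0.21, -0.07, 0.35, -0.14]       # original full-precision weights
q, scale = quantize_4bit(w)          # stored at ~4 bits per weight
w_hat = dequantize(q, scale)         # frozen, approximate base weights
adapter = [0.0, 0.0, 0.0, 0.0]       # small trainable fp weights (the LoRA part)

# The model computes with quantized base + trainable adapter delta.
effective = [b + a for b, a in zip(w_hat, adapter)]
print(q)  # [4, -1, 7, -3]
print(max(abs(a - b) for a, b in zip(w, w_hat)) < 0.05)  # True: small error
```

The memory win comes from storing the frozen base in 4 bits while gradients flow only through the tiny adapter; combining this with FSDP sharding is what made 70B fine-tuning feasible on consumer GPUs.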

Performance and Validation Challenges

  • Significant challenges in the open source AI ecosystem with performance evaluation
  • Many claims about model capabilities don't hold up under actual testing
  • Developing AI models requires extensive "janitorial work" and tenacious effort
  • Systems implementation is complex, not just theoretical mathematics

Inference and Model Optimization

  • Working on inference performance optimization
  • Goals include:
    - Avoiding model merging
    - Promoting quantized models with adapters
    - Making model downloads and inference faster
  • Collaborating with communities like CUDA Mode, PyTorch team, and HQQ quantization library
  • Recommendation is to distribute merged adapters, not full merged models
  • Merging adapters can often produce better results
  • Adapter-based approach allows smaller downloads, faster inference, and more efficient customization
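
The reason an adapter can stand in for a fully merged model is linear algebra: for a linear layer, (W + BA)x = Wx + B(Ax), so shipping the small low-rank factors B and A reproduces the merged weights' outputs exactly (ignoring LoRA's scaling factor). A small sketch with hand-rolled matrix helpers, using made-up numbers:

```python
# Sketch of why shipping an adapter matches shipping a merged model:
# (W + B @ A) @ x == W @ x + B @ (A @ x), so the low-rank factors
# B and A reproduce the merged weights' outputs exactly.
def matvec(m, v):
    return [sum(r * x for r, x in zip(row, v)) for row in m]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

W = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 frozen base weights
B = [[0.5], [0.25]]            # 2x1 low-rank factor
A = [[2.0, 4.0]]               # 1x2 low-rank factor
x = [3.0, 5.0]                 # input vector

merged = [[w + d for w, d in zip(wr, dr)]
          for wr, dr in zip(W, matmul(B, A))]
y_merged = matvec(merged, x)                              # full merged model
y_adapter = [b + d for b, d in
             zip(matvec(W, x), matvec(B, matvec(A, x)))]  # base + adapter
print(y_merged == y_adapter)  # True
```

The download-size win follows: B and A together are n·r + r·n numbers versus n·n for the merged delta, which is tiny when the rank r is small.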

FastHTML Web Development Project

  • Working on a new web development tool called FastHTML
  • FastHTML allows creating complete web applications in a single Python file
  • Unlike Streamlit and Gradio, it works directly with web foundations
  • Built on top of Starlette and closely matches FastAPI's interface
  • Key features:
    - No separate template, CSS, or JavaScript files
    - Can create components using libraries like Daisy UI, Bootstrap, Shoelace
    - Provides built-in session and security features
    - Designed to be easy to use "out of the box"
  • Received support from developers like Sebastián Ramírez (FastAPI creator), Carson Gross (htmx creator), and the Django community
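
The components-as-Python idea can be illustrated with a few lines. This is a hypothetical sketch of the concept, not FastHTML's real API: pages are built from plain functions that return HTML, so no separate template files are needed.

```python
# Illustrative sketch of the components-as-Python idea behind FastHTML
# (hypothetical helpers, NOT FastHTML's real API): pages are built from
# plain functions, so no separate template files are needed.
def tag(name: str):
    def make(*children: str, **attrs: str) -> str:
        # Map the Python-friendly `cls` keyword to HTML's `class` attribute.
        attr_str = "".join(
            f' {"class" if k == "cls" else k}="{v}"' for k, v in attrs.items()
        )
        return f"<{name}{attr_str}>{''.join(children)}</{name}>"
    return make

Div, H1, P = tag("div"), tag("h1"), tag("p")

page = Div(H1("Hello"), P("Built in one Python file"), cls="card")
print(page)
# <div class="card"><h1>Hello</h1><p>Built in one Python file</p></div>
```

In the real library, such components are returned directly from route handlers, which is how a whole app fits in one Python file.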

AI Interaction and "Dialogue Engineering"

  • Developing a new approach called "Dialogue Engineering"
  • Created a system named "AI Magic" that increases personal productivity
  • Compares current AI interfaces to 1970s teletype interactions
  • Different approaches to AI interaction:
    - ChatGPT-style chat interface (beginner-friendly)
    - Traditional coding environments like Visual Studio Code
    - A proposed middle ground focused on interactive dialogue
  • Developed libraries to improve AI API interactions:
    - Claudette: A library specifically optimized for Claude
    - Cosette: A library optimized for OpenAI APIs
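
A dialogue-engineering-style loop differs from one-shot prompting in that the full conversation state is kept and replayed each turn. The sketch below is hypothetical and uses a stub in place of a real model call (it is not Claudette's or Cosette's actual interface):

```python
# Hypothetical sketch of a stateful dialogue loop: the whole history is
# kept and sent on every turn. `fake_model` stands in for a real API
# call (e.g. to Claude or OpenAI) so the example runs without a key.
def fake_model(history: list[dict]) -> str:
    # Echo-style stub: a real model would generate a reply from the history.
    return f"Turn {len(history)}: you said {history[-1]['content']!r}"

class Dialogue:
    def __init__(self):
        self.history: list[dict] = []

    def __call__(self, prompt: str) -> str:
        self.history.append({"role": "user", "content": prompt})
        reply = fake_model(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply

chat = Dialogue()
print(chat("write a sort function"))  # Turn 1: you said 'write a sort function'
print(chat("now make it stable"))     # Turn 3: you said 'now make it stable'
```

Keeping the dialogue object around is what enables iterative refinement of code across turns, rather than starting from scratch each time.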

Future Projects and Research Directions

  • Planned "How to Solve It with Code" course:
    - Aimed at people learning to code through AI tools
    - Helps learners understand how to maintain and extend AI-generated code
  • Exploring emerging application strategies:
    - Comparing and combining Retrieval-Augmented Generation (RAG), in-context learning, and prompt engineering
    - KV cache creation and persistent context storage
  • Interest in conceptual model innovations:
    - JEPA and diffusion models as potential solutions for generative processes
    - Models that can develop a conceptual "sketch" before generating tokens
    - Gradually refining solutions and updating state dynamically
  • Determining optimal use cases for different approaches and how various model architectures can complement each other
