Latent Space: The AI Engineer Podcast

Why Compound AI + Open Source will beat Closed AI

Overview

  • Fireworks AI has evolved from a PyTorch cloud platform to a specialized generative AI inference platform, offering a comprehensive model catalog across text, audio, vision, and embedding modalities while maintaining OpenAI compatibility for easier developer adoption.
  • The company's technical edge comes from its distributed inference engine with custom CUDA kernels and unique GPU distribution approach, allowing them to chop models into pieces and scale differently based on bottlenecks while supporting multi-LoRA capabilities.
  • Lin Qiao, former PyTorch team lead at Meta, envisions a future of "Compound AI" where multiple models across different modalities combine with APIs, storage systems, and knowledge bases to solve complex business problems.
  • Fireworks AI is announcing a new declarative AI model inspired by OpenAI's o1, continuing their philosophy that developers should specify what they want, not how to achieve it, while acknowledging the ongoing tension between specialized and generalized models.
  • Despite being a small team of 40 people competing with larger AI companies, Fireworks AI maintains a culture focused on results and customer obsession, currently expanding across frontend, cloud infrastructure, backend optimization, and research roles.

Content

Podcast Context and Company Background

  • Recorded at Fireworks AI HQ in Redwood City as part of the Latent Space series
  • Features Lin Qiao, CEO of Fireworks AI, with hosts Alessio and swyx
  • Fireworks AI recently celebrated its two-year company anniversary
  • The company weathered early challenges, including the Silicon Valley Bank run and an operational data deletion incident
  • Underwent significant scaling and team building in a short period

Lin Qiao's Background and Industry Observations

  • Previously ran the PyTorch team at Meta
  • Has extensive background in distributed systems and database management
  • Observed AI's critical role in driving data generation
  • Witnessed Meta's transition from mobile-first to AI-first strategy
    - Mobile engagement generated unprecedented user data
    - This data subsequently powered AI development
  • As a founder, initially focused more on product and go-to-market strategy
  • Learned that operating a company involves more complexity than expected
  • Noticed a proliferation of AI frameworks with most focusing on production
  • Saw an opportunity to contribute to AI infrastructure development

PyTorch Origins and Fireworks AI Evolution

  • PyTorch originally started as a research-focused framework created by Soumith Chintala to address researchers' pain points
  • Meta strategically established PyTorch to drive massive open-source adoption
  • The framework evolved from research-only to supporting both research and production workloads
  • Took five years to architect PyTorch to handle production concerns like stability and low latency
  • Fireworks.ai was initially envisioned as a PyTorch cloud platform in 2022
  • Decided to take a verticalized approach instead of a horizontal platform
  • Pivoted focus to generative AI after ChatGPT's announcement
  • Chose to specialize in generative AI inference, specifically for PyTorch models
  • Believed GenAI would drive significant consumer and developer application innovation
  • Launched the public platform in August 2023
  • Started with a distributed inference engine and expanded to a full platform with multiple product lines

Product Strategy and Market Positioning

  • The company's goal is to make AI accessible to app developers and product engineers
  • Recognized that generative AI is fundamentally different from previous AI technologies
  • Foundation models eliminate the need for companies to train AI from scratch
  • Decided early to be OpenAI-compatible to ease developer adoption (see the sketch after this list)
  • Focused on building a distributed inference engine with custom CUDA and ROCm kernels
  • Created a specialized PyTorch build
  • Aimed to build an inference platform that adapts across different workloads rather than assuming one size fits all
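
Because the platform is OpenAI-compatible, existing OpenAI SDK code can be pointed at Fireworks by swapping the base URL. A minimal sketch, assuming the publicly documented endpoint and an example model id (verify both against current Fireworks docs):

```python
# Minimal sketch of the OpenAI-compatible pattern described above.
# The base URL and model id are taken from Fireworks' public docs;
# check current values before relying on them.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # OpenAI-compatible endpoint
    api_key="YOUR_FIREWORKS_API_KEY",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-8b-instruct",  # example model id
    messages=[{"role": "user", "content": "Summarize compound AI in one sentence."}],
)
print(response.choices[0].message.content)
```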

Industry Dynamics

  • OpenAI has become a de facto standard for AI APIs
  • Meta is developing Llama Stack, attempting to create a standardized open-source ecosystem
  • Uncertainty exists about whether Llama Stack will gain widespread community adoption
  • Compound AI is emerging as a new conceptual framework in the AI industry
  • Generative AI has dramatically lowered the barrier to entry for AI technology
  • Companies are now focusing on making AI models easily consumable rather than building from scratch

Model and Service Offerings

  • Introduced Fire Optimizer, a customization engine that helps users optimize across quality, latency, and cost
  • Offers a comprehensive model catalog including:
    - Text models
    - Audio models (transcription, translation, speech synthesis)
    - Vision models
    - Embedding models
    - Text-to-image, image-to-image, and text-to-video generation models
  • Provides open-source alternatives to OpenAI's services, potentially covering more modalities
  • Developed custom kernels (FireAttention) for improved model performance, especially for language models

Technical Capabilities and Infrastructure

  • Unique GPU distribution approach: chopping models into pieces and scaling differently based on bottlenecks
  • Distributed across regions (North America, EMEA, Asia) with global load balancing
  • Manages various hardware scales
  • Provides a developer experience layer for AI infrastructure
  • Offers serverless endpoints
  • Has a distributed inference engine with "Fire Optimizer" that improves performance through continuous feedback
  • Multi-LoRA capability: can upload and serve LoRA adapters at the same token cost as the base model (see the sketch after this list)
  • Can sustain 100-1000 LoRA adapters on a single base model
  • Reduces memory footprint by sharing base model across adapters
  • Maintains consistent token pricing
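
A back-of-the-envelope sketch (not Fireworks' implementation) of why multi-LoRA serving scales: the base model is loaded once and shared, while each adapter contributes only two small low-rank matrices per adapted weight. Layer count, dimensions, and rank below are illustrative assumptions:

```python
# Conceptual illustration of multi-LoRA memory sharing.

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter for a d_in x d_out weight:
    A is d_in x rank and B is rank x d_out."""
    return rank * (d_in + d_out)

base = 8_000_000_000                              # ~8B-parameter base model, shared once
adapter = 32 * lora_params(4096, 4096, rank=16)   # e.g. 32 adapted layers per adapter

print(f"one adapter adds {adapter / base:.4%} of the base parameters")
# Serving 1,000 adapters adds ~1000x that, still far cheaper than
# 1,000 full copies of the base model.
print(f"1,000 adapters add {1000 * adapter / base:.2%} of the base parameters")
```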

Compound AI System Concept

  • Proposed solution for complex business use cases
  • Involves combining (a minimal pipeline sketch follows this list):
    - Multiple models across different modalities
    - Public and proprietary APIs
    - Storage systems
    - Databases
    - Knowledge systems
  • Works closely with vector database providers (MongoDB is an investor)
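
A minimal sketch of a compound pipeline in this spirit: a retrieval step over a knowledge store feeds a model call. `search_vector_db` and the model id are hypothetical placeholders, not a Fireworks API:

```python
# Hedged sketch of a compound AI pipeline: retrieval + generation.
from openai import OpenAI

client = OpenAI(base_url="https://api.fireworks.ai/inference/v1",
                api_key="YOUR_FIREWORKS_API_KEY")

def search_vector_db(query: str, k: int = 3) -> list[str]:
    """Placeholder for a real vector store lookup (e.g. MongoDB Atlas)."""
    return ["doc snippet 1", "doc snippet 2", "doc snippet 3"][:k]

def answer(question: str) -> str:
    # Step 1: retrieve relevant context from the knowledge system.
    context = "\n".join(search_vector_db(question))
    # Step 2: generate an answer grounded in that context.
    resp = client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p1-8b-instruct",  # example id
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("What does the knowledge base say about pricing?"))
```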

AI Model Limitations and Philosophical Approach

  • Recognize that customer use cases often don't align with training data distribution
  • Acknowledge that AI models are:
    - Probabilistic, not deterministic
    - Not always sufficient to solve complex problems
    - Limited by finite training data
  • Preference for declarative systems where developers specify what they want, not how to do it
  • Focus on enabling innovation by abstracting technical complexities
  • Draws an analogy between database management systems (declarative) and hand-written ETL pipelines (imperative); a toy contrast follows this list
  • Suggests both approaches have value and neither will completely replace the other
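
A toy contrast of that analogy: the SQL query declares what result is wanted and lets the engine choose the execution plan, while the loop spells out how to compute it, step by step, like an imperative ETL job:

```python
# Declarative vs. imperative, in miniature.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("us", 10.0), ("eu", 5.0), ("us", 7.5)])

# Declarative: state the desired result; the engine picks the plan.
declarative = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()

# Imperative: every step of the aggregation is written by hand.
imperative: dict[str, float] = {}
for region, amount in conn.execute("SELECT region, amount FROM orders"):
    imperative[region] = imperative.get(region, 0.0) + amount

print(declarative)              # e.g. [('eu', 5.0), ('us', 17.5)]
print(sorted(imperative.items()))
```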

New Model Announcements and Development

  • Announcing a new declarative AI model inspired by OpenAI's o1
  • Previously released FireFunction, a function-calling model capable of dispatching multiple APIs (see the sketch after this list)
  • The new model will be benchmarked and may compete with Gemini and Meta models
  • Acknowledging OpenAI's high-caliber team while believing multiple approaches can achieve similar goals
  • Observing a trend of shrinking gaps between open-source and closed-source models
  • Predicting future model landscape will involve specialized "expert models" in narrow domains
  • Internally testing the model's capabilities, including asking it to define AGI
  • Model will be an endpoint service, not open-source, with pricing still undecided
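
A hedged sketch of function calling through the OpenAI-compatible endpoint; the FireFunction model id and the `get_weather` tool are illustrative assumptions, not confirmed details from the episode:

```python
# Function calling via the OpenAI-compatible tools API.
from openai import OpenAI

client = OpenAI(base_url="https://api.fireworks.ai/inference/v1",
                api_key="YOUR_FIREWORKS_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                    # hypothetical tool
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/firefunction-v2",  # example model id
    messages=[{"role": "user", "content": "Weather in Redwood City?"}],
    tools=tools,
)
# The model returns structured tool calls for the app to dispatch.
print(resp.choices[0].message.tool_calls)
```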

AI Development Trends and Scaling

  • Discussion of AI specialization vs. generalization, referencing the "bitter lesson"
  • Prediction that generalized models will eventually supersede domain-specific models
  • Approaching limits of training data, with synthetic data generation becoming crucial
  • Shifting focus from the training-time scaling law to the inference-time scaling law (see the toy sketch after this list)
  • Significant performance improvements over previous models
  • Small team (40 people) competing with larger AI companies
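
One way to picture the inference scaling law: spend more compute per query instead of training a larger model. A toy best-of-n sketch with stand-in `generate` and `score` functions (both hypothetical):

```python
# Inference-time scaling in miniature: sample n candidates, keep the best.
import random

def generate(prompt: str) -> str:
    """Stand-in for one sampled model completion."""
    return f"answer-{random.randrange(100)}"

def score(prompt: str, answer: str) -> float:
    """Stand-in for a reward model or verifier."""
    return random.random()

def best_of_n(prompt: str, n: int) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

# More samples -> more inference compute -> typically better answers.
for n in (1, 4, 16):
    print(n, best_of_n("prove the lemma", n))
```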

Team, Culture and Hiring

  • Most team members come from Meta and startups
  • Strong cultural alignment around two core principles:
    - Focus on results
    - Customer obsession
  • Willing to go above and beyond for customers (e.g., deploying models at midnight, working weekends)
  • Currently hiring and expanding rapidly across multiple roles:
    - Frontend engineers
    - Cloud infrastructure engineers
    - Backend system optimization engineers
    - Applied researchers
    - Post-training and fine-tuning specialists

Cursor Partnership and Technical Innovation

  • Cursor viewed as a unique team with high technical caliber
  • Cursor seeks strategic partnerships rather than building everything internally
  • Collaboration involved scaling infrastructure across multiple regions and developing high-intensity inference stack
  • Partnership built on mutual trust and close communication
  • Developed "Fire Optimizer" product line
  • Speculative decoding is not static; multiple approaches exist (a toy sketch follows this list):
    - Pairing a small draft model with a large target model
    - Using different decoding techniques (EAGLE, Medusa heads)
  • Optimization strategies are workload-specific
  • Demonstrated capability of achieving 1,000 tokens per second
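
A toy sketch of the draft-model flavor of speculative decoding listed above. Both "models" here are stand-in functions; real systems verify all draft tokens in a single batched target-model forward pass:

```python
# Speculative decoding in miniature: a fast draft model proposes tokens,
# the slow target model keeps only the prefix it agrees with.
import random

def draft_next(ctx: list[int]) -> int:    # small, fast model (stand-in)
    return (ctx[-1] + 1) % 50 if random.random() < 0.8 else random.randrange(50)

def target_next(ctx: list[int]) -> int:   # large, accurate model (stand-in)
    return (ctx[-1] + 1) % 50

def speculative_step(ctx: list[int], k: int = 4) -> list[int]:
    """Draft k tokens, then keep the prefix the target model agrees with."""
    draft = []
    for _ in range(k):
        draft.append(draft_next(ctx + draft))
    accepted = []
    for tok in draft:
        if target_next(ctx + accepted) == tok:
            accepted.append(tok)          # agreement: keep the draft token
        else:
            accepted.append(target_next(ctx + accepted))  # correct and stop
            break
    return accepted

ctx = [0]
for _ in range(5):
    ctx += speculative_step(ctx)
print(ctx)  # several tokens per target "call" when the draft is accurate
```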

Development Cycles and Business Strategy

  • Two main development cycles identified: experimentation and post-product market scaling
  • During experimentation, focus is on finding a good model and product fit
  • Post-market fit involves optimizing across quality, latency, and cost
  • Product decisions should prioritize overall user experience
  • Aim to make margin on open-source models
  • Pricing should correlate with delivered value
  • Referenced OpenAI's 2024 financials: $4 billion revenue, with significant costs in compute and research

Community Engagement

  • Seeking feedback from application developers on what works well, wishlist for improvements, and pain points
  • Active Discord channel for communication
  • Offer office hours with dev rel and engineering teams
  • Typically launch new products to small groups first
  • Excited to see how the community will use and test their models
