Latent Space: The AI Engineer Podcast

How NotebookLM Was Made

Podcast Introduction and Guest Backgrounds

* This Latent Space podcast episode features Raiza Martin and Usama Bin Shafqat from Google discussing their work on NotebookLM.

* Raiza Martin's Background:
  * Leads the NotebookLM team inside Google Labs
  * Previously worked in payments and ads at Google
  * Almost left Google several times but kept finding interesting teams and projects
  * Has been at Google Labs for two years
  * Considers herself a "zero-to-one" type of person

* Usama's Background:
  * Previously worked on supply chain planning for Google's data centers
  * Moved to Google Labs by way of Area 120 (now defunct)
  * Helped build Kaya, a creator commerce platform
  * Worked on AI Test Kitchen, an early platform for testing LLM prototypes

* Google Labs Context:
  * Relatively new organization (about two years old)
  * Mandate is to build AI products
  * Works closely with DeepMind
  * Focuses on experimenting and finding what resonates with users

NotebookLM Origins and Evolution

* Initially announced as Project Tailwind, part of Google's strategy to develop practical AI products

* Project Origins:
  * Started as "Talk to Small Corpus," an effort initiated by a colleague named Adam
  * Core idea was using large language models (LLMs) to interact with a user's own data
  * Initial motivation was to help adult learners study more effectively by "talking to" their textbooks

* Early Development and Features:
  * The first prototype was Project Tailwind
  * Early features included:
    - Q&A over uploaded documents
    - Automatic document summarization
    - Generation of each document's key topics
    - Support for up to 10 documents

* Development Milestones:
  * Presented at Google I/O 2023
  * Launched a Discord server for rapid user feedback
  * Progressively added features such as note-taking and suggested follow-up questions
  * Expanded availability from the United States to over 200 countries
  * Added multi-language support

* User Insights:
  * Users' first action was typically to ask for a document summary
  * Usage grew unexpectedly in Japan, where users valued the translation capabilities
  * The LLM could adapt its explanations to each user's preferred language and communication style

* Technical Approach:
  * Initially relied on retrieval-augmented generation (RAG), since the models of the time had short context windows (2K-4K tokens)
  * Focused on making complex texts more accessible through conversational interaction
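With only 2K-4K tokens of context, a RAG step has to pick the few source chunks most relevant to a question and stop before the budget runs out. The sketch below illustrates that idea under loud assumptions: word overlap stands in for a real embedding-based relevance score, whitespace-separated words stand in for tokens, and `Chunk`, `retrieve_within_budget`, and `build_prompt` are hypothetical names, not NotebookLM's implementation.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical label for where the text came from
    text: str


def overlap_score(query: str, text: str) -> int:
    """Crude relevance proxy (assumption): distinct query words found in the chunk."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def approx_tokens(text: str) -> int:
    """Stand-in tokenizer (assumption): one token per whitespace-separated word."""
    return len(text.split())


def retrieve_within_budget(query: str, chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks that still fit in the token budget."""
    # sorted() is stable, so chunks with equal scores keep their original order.
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c.text), reverse=True)
    selected: list[Chunk] = []
    used = 0
    for chunk in ranked:
        if overlap_score(query, chunk.text) == 0:
            break  # list is sorted descending, so every remaining chunk is irrelevant
        cost = approx_tokens(chunk.text)
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected


def build_prompt(query: str, chunks: list[Chunk], budget: int) -> str:
    """Assemble a grounded prompt from the retrieved chunks plus the question."""
    picked = retrieve_within_budget(query, chunks, budget)
    context = "\n\n".join(f"[{c.source}] {c.text}" for c in picked)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"
```

Raising `budget` lets more chunks through, which is one way to see why longer context windows (such as Gemini 1.5's) reduce how aggressively retrieval must filter.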

Community Building and Feature Development

* NotebookLM Community:
  * 65,000 members on Discord
  * Community helps the team quickly spot system issues and server outages
  * Reveals user motivations and use cases
  * Shows the team which features are working and which are not

* Feature Development Process:
  * Team uses feature flags and Mendel experiments to test and roll out features
  * Recently removed a text highlighting/transformation feature because of low usage
  * Aims to keep a strong core feature set and to be willing to "unlaunch" less critical features
  * Someone on Twitter managed to bypass the feature flags and expose experimental features

* Content Transformation Project:
  * Initially developed outside of NotebookLM
  * Focused on transforming content across modalities
  * Goal was to create more engaging, listenable audio formats
  * Explored conversational-style transformations of content
  * Found that the conversational format lets the model effectively prompt itself and unpack dense information
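The feature-flag rollouts mentioned above typically rely on stable user bucketing, so each user stays consistently inside or outside an experiment. Mendel is Google-internal and its interface is not described in the episode, so this is a generic, hypothetical stand-in: hash each (flag, user) pair into one of 100 buckets and enable the feature for buckets below the rollout percentage. The names `bucket`, `in_rollout`, and the flag string are illustrative assumptions.

```python
import hashlib


def bucket(user_id: str, flag: str, buckets: int = 100) -> int:
    """Map a (flag, user) pair to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    # First 8 bytes as an unsigned integer; modulo bias is negligible for 100 buckets.
    return int.from_bytes(digest[:8], "big") % buckets


def in_rollout(user_id: str, flag: str, percent: int) -> bool:
    """True if the user's bucket for this flag falls below the rollout percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return bucket(user_id, flag) < percent
```

Because buckets are deterministic, raising the percentage only ever adds users, and in this sketch "unlaunching" a feature amounts to setting its percentage back to 0.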

Audio Generation and Editorial Transform

* Key Technical Elements:
  * Working with DeepMind's audio teams
  * Using Gemini 1.5's ability to absorb long contexts
  * Creating a non-robotic, engaging text-to-speech experience

* Core Innovation - "Editorial Transform":
  * Goes beyond simply reading text aloud
  * Creates a dynamic dialogue between AI personas
  * Introduces unpredictability and a "wow" factor

* Technical Approach:
  * Experimenting with different generation techniques
  * Using multiple system prompts
  * Ensuring the personas have distinct perspectives
  * Avoiding monotonous, chatbot-style responses

* Steven Johnson Collaboration:
  * Steven Johnson, a New York Times bestselling author, joined the team
  * Brought the vision of building a "tool to help me think"
  * The team focused on preserving surprise and engagement in AI-generated content

Research Workflow and User Insights

* Steven's Expert Workflow:
  * The team observed Steven's thorough research and self-questioning techniques
  * Steven pushes tools like NotebookLM to their absolute limits

* Product Development Goals:
  * Turn Steven's expert workflow into a tool accessible to everyday users
  * Steven contributed by describing the features and use cases he wanted
  * Team aims to simplify complex workflows and reduce the user's effort to "zero work"

* Technological Capabilities:
  * NotebookLM supports multiple source types (Google Drive, PDFs, MP3s)
  * The tool can recognize images, including handwritten historical documents (e.g., Marie Curie's notes)
  * Steven uses the tool for advanced research and writing

* Key Challenges:
  * Bridging the gap between expert-level usage and mainstream accessibility
  * Developing intuitive onboarding that makes complex research techniques approachable
  * Building a product that can replicate sophisticated research workflows

* Comparative Perspective:
  * Draws a parallel to OpenAI's approach with Andrew Mason of using AI as a "tool for thought"
  * Suggests that users who push AI tools to their absolute limits are relatively rare

Product Dimensions and Engineering Challenges

* Three-Dimensional Product View:
  * Source inputs
  * Capabilities
  * Outputs

* Current Limitations in Source Support:
  * Handwritten notes
  * DOCX and PowerPoint files
  * Combining images and PDFs with text
  * Complex multimodal inputs

* Output and Knowledge Sharing Focus:
  * Users' primary goal is often to create something new, not just to ask questions
  * Roadmap includes:
    - Shared notebooks
    - One-click document generation
    - Outputs that follow the user's style and brand guidelines

* Engineering Challenges:
  * Handling multimodal inputs across large source collections
  * Managing context window limits
  * Integrating Gemini's image understanding
  * Building feedback loops between the product and modeling teams

* Audio Generation Insights:
  * The audio model aims to mimic human characteristics:
    - Natural intonation
    - Breathing
    - Pauses
    - Laughter
  * Transcribing the audio is "lossy": the text doesn't capture its emotional nuances
  * The existing audio output follows a consistent, generally upbeat "deep dive" format

Use Cases and Development Approach

* Popular Use Cases:
  * LinkedIn profile analysis
  * Startup founders testing the value propositions on their landing pages
  * Reviews of personal website "about" pages
  * Dream journal transcriptions
  * Audio summaries of Google performance reviews
  * Wikipedia article summaries (Andrej Karpathy created a Spotify podcast channel from Wikipedia articles)

* User Reactions:
  * Users found the tool therapeutic and confidence-boosting
  * The audio overviews sound convincingly human

* Product Development Approach:
  * The team prioritized internal "dogfooding" and critical listening over traditional engineering metrics
  * Maintained a high bar for human-like audio quality
  * Took an iterative, opinionated approach to improvement
  * Relied on the team's collective intuition rather than solely on formal rating processes

Evaluation and Quality Improvement

* Evaluation Process:
  * The team worked on improving both audio generation and transcript generation
  * Focused on making the output entertaining, which is hard to quantify
  * Used a Likert scale for formal rater evaluations
  * Broke "entertainment" down into multiple factors

* Key Evaluation Dimensions:
  * Entertainment and engagement
  * Avoiding hallucinations
  * Safety considerations
  * Coherent narrative and structure
  * Consistency and groundedness

* Challenges in Evaluation:
  * Non-deterministic models can produce variable outputs
  * Any individual generation can suffer from "bad luck"
  * Consistent quality across outputs is hard to guarantee

* User Control and Variation:
  * Users can influence the output by:
    - Changing the notebook title
    - Adding show notes
    - Changing the output language
  * These variations are tested less thoroughly than the core generation

* Team Dynamics:
  * Several team members (Usama, Steven) act as "tastemakers"
  * Collaborative approach to refining AI-generated content
  * Iterative process of catching and fixing subtle details in the generations
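The evaluation bullets describe breaking "entertainment" into several factors, scoring them on a Likert scale, and worrying about "bad luck" in individual generations. A minimal sketch of aggregating such ratings follows, assuming a 1-5 scale and dimension names borrowed from the list above; reporting the per-dimension minimum alongside the mean, and flagging any generation with a dimension below a threshold, are illustrative choices, not the team's actual rubric. `summarize` and `flag_weak_generations` are hypothetical names.

```python
from statistics import mean

SCALE = range(1, 6)  # assumed 1-5 Likert scale


def summarize(ratings: list[dict[str, int]]) -> dict[str, dict[str, float]]:
    """Per-dimension mean and minimum across rated generations."""
    if not ratings:
        raise ValueError("need at least one rated generation")
    summary: dict[str, dict[str, float]] = {}
    for dim in ratings[0]:
        scores = [r[dim] for r in ratings]  # KeyError if a generation skips a dimension
        if any(s not in SCALE for s in scores):
            raise ValueError(f"score out of range for {dim}")
        # The minimum exposes "bad luck" generations that a mean would hide.
        summary[dim] = {"mean": mean(scores), "min": min(scores)}
    return summary


def flag_weak_generations(ratings: list[dict[str, int]], threshold: int = 3) -> list[int]:
    """Indices of generations that score below the threshold on any dimension."""
    return [i for i, r in enumerate(ratings) if min(r.values()) < threshold]
```

For example, three generations rated on `entertainment`, `groundedness`, and `coherence` can average well while one of them is flagged for a weak entertainment score, which is the kind of variance that non-deterministic generation produces.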

Principles of Engaging Audio Content

* Key Principles for Compelling Audio:
  * Vary tone and speaking speed
  * Create narrative tension
  * Withhold information and reveal it gradually
  * Avoid complete agreement between speakers
  * Mimic natural human speech patterns
  * Use interjections and conversational cues

* Content Design Strategies:
  * Structure content around a clear ultimate goal
  * Transform static source material into dynamic dialogue
  * Bring different perspectives to the same topic
  * Create narrative depth by diving into sub-topics and returning to the main thread

* Character/Persona Development:
  * Distinct personas matter in podcasting
  * Experimenting with character backstories and personalities
  * Keeping character traits consistent with AI models is challenging
  * Audiences become attached to specific podcast personas

* Challenges in AI Content Creation:
  * AI engineers are trying to make audio engaging without specialized expertise
  * Linguistics training doesn't necessarily make someone good at using language
  * Chatting with a chatbot is novel at first, but it is the human who makes the interaction interesting

* Reflections on Humor in AI:
  * Humor is extremely difficult to generate
  * Humor is highly contextual
  * The team views humor as a potential marker of artificial general intelligence (AGI)

* Current Project Stage:
  * Still in very early days (the audio feature was only 2-3 weeks old at the time of recording)
  * Continuing to study how people actually use the technology
  * Open to future improvements and to engagement with experts

AI-Generated Content Possibilities

* Personalization Benefits:
  * AI lets content be transformed to fit each person's consumption preferences
  * Content can be regenerated to match personal tastes rather than aiming for universal appeal
  * AI can create content about niche or previously uninteresting material (e.g., personal diaries, city council meetings)

* Emerging Content Possibilities:
  * Converting information into preferred formats (e.g., turning a 100-slide deck into a 16-minute audio summary)
  * Generating highly personalized content that wouldn't traditionally exist
  * Content becomes more engaging when presented through interesting "personas" or perspectives

* AI Product Design Perspectives:
  * Exploring AI not just as a tool but as a way to create interactive, dynamic content
  * Letting AI "personas" develop their own perspectives and interactions
  * Goal is to make AI-generated content inherently interesting, not just functionally useful

* Comparative Content Observations:
  * Traditional content creators (e.g., YouTubers such as MrBeast) optimize for broad appeal
  * AI can potentially create more niche, personally relevant content
  * The novelty lies in transforming mundane information into engaging narratives

AI Approach and Strategy

* Two Main AI Approaches:
  * "Compound AI" (the Databricks framing): chains of smaller models
  * The OpenAI-style approach: a single model driven by large prompts
  * The best approach usually lies somewhere on the spectrum between these two extremes

* Choosing an AI Strategy Depends On:
  * Specific task requirements
  * Desired outcome
  * Engineering goals (e.g., delighting users, simplifying workflows)

* Key Philosophical Insights:
  * AI development is both a science and a craft
  * A strong point of view about what is valuable matters more than technical capability
  * Models will evolve rapidly, so focus on creating value today

* Product Development Considerations:
  * Balancing the end-user product experience against infrastructure for developers
  * Possibly offering an API

* Language and Multilingual Support:
  * Working on adding more languages
  * Ambitious goal of supporting 100-200 languages
  * Interest in supporting niche dialects (e.g., local Italian dialects)
  * Drawing on the expertise of the speech and modeling teams

Dialect Support and Product Philosophy

* Dialect and Voice Model Development:
  * Google's speech team is working on dialect-specific models
  * Challenges include keeping a dialect consistent across regional variations
  * Potential future features include:
    - Selecting specific dialects
    - Contributing or customizing personal voice models
  * Ensuring reliable speech quality requires significant work

* Product Development Philosophy:
  * Focus on streamlined, "magical" user experiences
  * Preference for simple, one-button solutions over complex configuration options
  * Prioritize user insights and delightful interactions

* Technical Challenges:
  * Adding a new feature (or "knob") is not as simple as adding a parameter
  * Each modification requires a comprehensive quality reassessment
  * Evaluation grows more complex as customization options increase

* Upcoming Plans:
  * Planning to launch audio-related features
  * Likely to start with a "fast follow" rather than a fully comprehensive solution
  * Iterating based on user feedback

Product Strategy and Future Direction

* Vision for NotebookLM:
  * Currently exploring different output modalities, with audio as an initial focus
  * Core value proposition is a flexible framework where users can:
    - Bring their own sources
    - Process the information
    - Generate outputs in different formats

* Monetization and Product Development:
  * The audio feature is seen as a "hook," but users stay for other functionality
  * Long-term business potential lies in the full product, not just audio
  * Prioritizing features that deliver clear user utility

* Development Approach:
  * Cautious about rushing out new features such as artifact generation
  * Wants to ensure differentiating value before shipping new capabilities
  * Willing to "fast follow" competitors' innovations, but with a high bar for implementation

* Future Considerations:
  * Exploring code integration (e.g., connecting to GitHub)
  * Interested in expanding interaction with data and documents
  * Open to learning from user feedback and market signals

Code Understanding and User Engagement

* NotebookLM Code Capabilities:
  * The tool can understand code
  * Anecdote about a student using NotebookLM for computer science homework:
    - The AI identified errors in the homework without giving away the answers
    - The student wanted to learn, not just get solutions

* Technical Exploration:
  * The team is exploring real-time chat
  * Actively weighing the value proposition of interactive features
  * Recognizing that real-time chat is complex to implement, especially handling interruptions through the API

* Call to Action:
  * Try NotebookLM at notebooklm.google.com
  * Share feedback, specifically:
    - Is the tool useful?
    - Why or why not?
    - Do you plan to keep using it?

* Advice for AI Product Managers:
  * "Always be building"
  * Personally experiment with different AI tools and APIs
  * Be disciplined about learning and exploring new technologies
  * Test and understand product functionality firsthand

Final Insights on AI Product Development

* Key Development Insights:
  * The most magical moments come from pushing the boundaries of what models can do
  * Showing early prototypes to others can lead to creative improvements
  * Challenge AI models with hard tasks rather than only aiming for reliable success

* Product Development Principles:
  * Have a strong opinion about what users should do, and guide them toward it
  * Don't be afraid to iterate and experiment
  * Combining multiple models can lead to better results
  * More "thinking time" consistently improves AI performance

* Observations on Product Innovation:
  * Successful products often emerge unexpectedly rather than through planned big launches
  * NotebookLM was compared to a "ChatGPT moment" for Google
  * Organic, focused development can be more effective than trying to engineer a massive launch

* Personal Anecdotes:
  * One speaker recalled requesting more TPUs (Tensor Processing Units) through a humorous "subtweet"
  * Compared learning to use an earlier technology (like Google Search) to today's challenge of using AI effectively
