Overview
- Snips is an AI-powered podcast app that transforms passive listening into active learning by allowing users to capture and retain key insights through features like triple-tap headphone marking, AI summarization, and searchable transcripts.
- The app's technical infrastructure combines a Python backend with a Flutter frontend, incorporating speaker diarization and transcript synchronization that keep audio and text aligned even when ads are dynamically inserted.
- The founders advocate for "invisible AI" that seamlessly integrates into products rather than being prominently marketed, comparing future AI to electricity - eventually becoming an unremarkable utility that powers experiences.
- Snips' development philosophy emphasizes rapid iteration and user feedback over perfect AI, relying on "vibe evals" and on LLM-as-judge techniques in which a more capable model judges the outputs of a cheaper one.
- Future development focuses on enhancing knowledge retention through post-listening interactions, expanding content types beyond podcasts, and improving discovery through AI-powered recommendations that users can guide in real-time.
Content
AI Engineer Summit and Zurich Tech Ecosystem
* The conversation takes place in New York, with participants discussing the AI Engineer Summit and Snips (an AI-powered podcast app)
* Kevin Ben Smith found the summit valuable for connecting with like-minded AI professionals
* Day two's engineering track was particularly interesting, especially talks about voice AI and AI agents
* The AI community was noted for its open and collaborative attitude
* Zurich is emerging as a significant tech hub, driven primarily by ETH (technical university)
* Major tech companies with AI presence in Zurich include:
  - Google (largest tech hub outside US)
  - Facebook
  - Apple
  - OpenAI
  - SwapBik
* Historically a finance hub (home to UBS), now transitioning to a tech center
* ETH and EPFL are key research institutions driving innovation
* Zurich is described as a livable city with proximity to nature, mountains, and lakes
* Cost of living is high, but still potentially cheaper than New York City (e.g., coffee prices)
Snips App Introduction and Concept
* Snips is an AI-powered podcast app focused on helping users learn and capture knowledge from podcasts
* Initial Concept:
  - Originally envisioned as a social platform like TikTok for podcast clips
  - Users could create and share short "Snips" (podcast snippets)
  - Founders expected creating Snips would be difficult, but found users were actually very eager to create them
* Key Features:
  - Triple tap headphones to automatically save and summarize interesting podcast moments
  - AI helps users capture and retain podcast insights
  - Addresses the problem of quickly forgetting podcast content after listening
* Motivation:
  - Recognize podcasts as a major knowledge source
  - Help users move beyond passive content consumption
  - Provide tools to effectively learn and retain information from audio content
* Evolution:
  - Started 3.5 years ago, before ChatGPT and Whisper
  - Initial social sharing model shifted to focus on knowledge capture and learning
  - Developed based on personal experiences of difficulty in manually capturing podcast insights
Founder's Background and Career Path
* The speaker studied mathematics and economics at ETH Zurich, specializing in quantitative finance
* Initially not passionate about finance, more interested in mathematical modeling
* Discovered machine learning through a course recommended by a friend and became immediately passionate
* Worked at a bank, trying to incorporate machine learning where possible
* Professional Journey:
  - Quit his job to join an early-stage tech startup in Zurich
  - Built and led an AI team over five years
  - Developed machine learning projects for banks, including:
    - Sales team models for client product targeting
    - Transaction text processing/clarification models
    - Natural language processing (NLP) for transaction descriptions
* Started exploring startup ideas with the friend who had introduced him to machine learning
* This friend had a significant influence on his career trajectory
Podcast App Origin and Development
* The founders met through their shared background in AI and data science
* They participated in HackZurich (Europe's largest hackathon) with a podcast search idea
* They created a natural language search tool for podcast episodes
* Won the hackathon with a demo using a Joe Rogan/Elon Musk episode
* Other hackathon participants showed strong interest in their concept
* Company Details:
  - Founded approximately 4 years ago after quitting their jobs
  - Currently a small team of 4 people, all technical
  - Team composition: 2 backend (AI) developers, 2 frontend (mobile app) developers
  - Developing iOS, Android, and Apple Watch apps
* App Features and Challenges:
  - Key use case: Podcast listening during activities like running/cycling
  - Apple Watch app allows downloading/streaming podcast episodes
  - Discovered a limitation with Substack-hosted podcasts not playable on Apple Watch
  - Attempted to reach out to Substack about the restriction without resolution
Comprehensive App Features
* Uses the Listen Notes search engine with a comprehensive podcast database
* Auto-download is off by default (users must opt in to download episodes)
* Default view shows episode "snips" indicating listener interest points
* Allows sharing snips on platforms like Discord and Twitter
* AI-Powered Features:
  - Full transcription
  - Speaker diarization and identification
  - Guest mini-bio extraction
  - Guest photo retrieval
  - AI-generated chapter breakdowns with descriptions
  - Book mention identification and detailed book information retrieval
* Advanced Search/Interaction Capabilities:
  - Keyword transcript search
  - AI-powered episode chat functionality
  - Ability to ask LLM-based questions about specific episode topics
  - Can retrieve information about mentioned books and authors
  - Cross-referencing guest appearances across different podcast episodes
Technical Infrastructure and Development
* Backend is 90% written in Python, hosted on Google Cloud Platform
* Frontend uses the Flutter framework (Dart), enabling cross-platform development for Android and iOS
* Developed a unique "Shazam for podcasts" algorithm that re-syncs audio and transcripts when ads are dynamically inserted
* Flutter Framework Insights:
  - Allows one codebase for Android and iOS
  - Front-end engineers are satisfied with the framework
  - Challenges include potential Google abandonment, less optimization for iOS, and limitations with Apple Watch app development
* AI and Transcription Technology:
  - Started developing before the ChatGPT/GPT-3.5 Turbo API
  - Initially used open-source models and attempted fine-tuning
  - Recognized early models had limited effectiveness
  - Continuously evolving their AI and transcription technology
* Podcast Technology Innovations:
  - Developed a feature to triple-tap headphones or click a button to have AI summarize podcast insights
  - Created a method to handle dynamically inserted podcast ads without breaking transcript synchronization
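The notes do not describe how the "Shazam for podcasts" re-sync works internally. As a rough illustration of the general idea only, the sketch below assumes each audio frame has already been reduced to an integer fingerprint (a hypothetical representation) and uses the standard library's `difflib.SequenceMatcher` as a stand-in for a real fingerprint matcher. The names `matching_blocks` and `played_to_ref` are invented for this example.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# (ref_start, played_start, length), all measured in frames
Block = Tuple[int, int, int]


def matching_blocks(ref: List[int], played: List[int]) -> List[Block]:
    """Find runs of frames shared by the transcribed reference audio and the
    audio the listener actually received (which may contain inserted ads)."""
    matcher = SequenceMatcher(None, ref, played, autojunk=False)
    # get_matching_blocks() ends with a zero-length sentinel; drop it.
    return [(m.a, m.b, m.size) for m in matcher.get_matching_blocks() if m.size > 0]


def played_to_ref(blocks: List[Block], played_frame: int) -> Optional[int]:
    """Map a playback position to the reference position the transcript uses.
    Returns None when the frame has no counterpart (for example, inside an ad)."""
    for ref_start, played_start, length in blocks:
        if played_start <= played_frame < played_start + length:
            return ref_start + (played_frame - played_start)
    return None


# Hypothetical fingerprints: frames 90 and 91 are an ad inserted mid-episode.
reference = [10, 11, 12, 13, 14, 15]
playback = [10, 11, 12, 90, 91, 13, 14, 15]
blocks = matching_blocks(reference, playback)
```

With these blocks, a transcript highlight for playback frame 6 would look up reference frame 4, while frames 3 and 4 map to nothing. A production system would need noise-tolerant fingerprints and an indexed lookup rather than a pairwise sequence comparison.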
AI Processing Infrastructure
* The speaker discusses a transformative moment in AI when the transformer architecture was first applied to continuous audio data (the wav2vec paper)
* Their startup currently processes over 1 million podcasts
* Transcription and processing workflow includes:
  - Transcription
  - Speaker diarization
  - Using LLMs to identify speakers and assign names to speech blocks
* AI Model and API Usage:
  - Primarily using OpenAI and Google (Gemini) models
  - Using Perplexity for web search capabilities
  - Evaluating different AI search and infrastructure providers
  - Price is a critical consideration in model selection
* Speaker Diarization:
  - Using open-source packages with custom tweaks
  - Acknowledging the complexity of diarization across different audio environments
  - Recognizing challenges in accurately identifying and separating speakers
* Technical Considerations:
  - Aim to balance intelligence level with cost-effectiveness
  - Goal is to enable features across as many podcasts as possible
  - Interested in increasing competition among AI service providers to drive down costs and improve quality
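The three-step workflow above (transcribe, diarize, then let an LLM name the speakers) can be sketched roughly as follows. This is not the team's code: `Segment`, `Turn`, `assign_speakers`, and `name_speakers` are hypothetical names, the transcription and diarization outputs are assumed to be already available, and the LLM is passed in as a plain callable so that no particular provider API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    """A transcribed span of speech (output of the transcription step)."""
    start: float
    end: float
    text: str
    speaker: str = "UNKNOWN"


@dataclass
class Turn:
    """A diarization turn with an anonymous label such as 'SPEAKER_0'."""
    start: float
    end: float
    label: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Segment]:
    """Give each segment the label of the diarization turn it overlaps most."""
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg.start, seg.end, t.start, t.end), default=None)
        if best is not None and overlap(seg.start, seg.end, best.start, best.end) > 0:
            seg.speaker = best.label
    return segments


def name_speakers(segments: List[Segment], ask_llm: Callable[[str], Dict[str, str]]) -> List[Segment]:
    """Ask an LLM (any callable) to map anonymous labels to real names.
    Labels the LLM does not return are left unchanged."""
    transcript = "\n".join(f"{s.speaker}: {s.text}" for s in segments)
    names = ask_llm(transcript)
    for seg in segments:
        seg.speaker = names.get(seg.speaker, seg.speaker)
    return segments
```

Keeping the LLM behind a callable makes it easy to swap providers on price, which the notes describe as a critical consideration.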
Speaker Diarization and Transcription Insights
* The team works with high-quality, controlled audio typically recorded in studios
* They use specific heuristics to improve speaker identification, such as:
  - Analyzing speaking duration (e.g., brief 30-second segments likely indicate advertisements)
  - Clustering speech embeddings to distinguish speakers
  - Leveraging podcast-specific structural knowledge
* Technical Approach:
  - Combine transcript processing with Large Language Models (LLMs)
  - Use LLMs to recalibrate speaker switching points
  - Acknowledge that speaker diarization is not perfect but has significantly improved over time
* User Experience Highlights:
  - Developed features like clickable timestamps in transcripts
  - Identified speaker names using AI
  - Compared favorably to competitor tools like Descript
* Future Perspectives:
  - Emphasize focusing on user needs over complex AI technology
  - Anticipate increased personalization of AI tools
  - Currently constrained by LLM processing costs for long-form content
  - Offer current workaround of using in-app chat for custom summaries
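The speaking-duration heuristic mentioned above could look something like the sketch below. It assumes diarization turns arrive as `(start, end, label)` tuples in seconds and treats any voice whose total speaking time is at most 30 seconds as a probable ad voice. The function name and the exact threshold handling are illustrative assumptions, not details from the conversation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (start_seconds, end_seconds, diarization_label)
TurnTuple = Tuple[float, float, str]


def likely_ad_speakers(turns: Iterable[TurnTuple], max_total_seconds: float = 30.0) -> Set[str]:
    """Return labels whose total speaking time across the episode is so short
    that they are more likely an advertisement voice than a host or guest."""
    totals: Dict[str, float] = defaultdict(float)
    for start, end, label in turns:
        totals[label] += end - start
    return {label for label, total in totals.items() if total <= max_total_seconds}
```

A heuristic like this would presumably be combined with the other signals listed (embedding clusters, episode structure), since a guest with a short cameo would otherwise be misclassified.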
AI Product Design Philosophy
* Key insights on AI product design:
  - Consumer apps need to move beyond simple chat interfaces
  - AI should become "invisible" and integrated naturally into products
  - Current AI features are often too prominently marketed
  - Future vision: AI will be like electricity, a ubiquitous and unremarkable utility that simply exists in products rather than a special feature
* Specific technical discussion points:
  - Prompt engineering is crucial for creating functional AI features
  - Moving from a quick demo to a production-ready feature requires extensive work
  - Regular expressions (regex) are still used to correct AI output formatting
  - Streaming text responses make real-time correction of AI-generated text difficult
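To make the regex and streaming points concrete, here is a minimal sketch of one way to correct formatting on a streamed response: buffer incoming chunks until a full line is available, then apply the regex fixes to that line. The specific rules (normalizing `*` and `•` bullets, stripping `**bold**` markers) and the function names are assumptions chosen for illustration, not the rules Snips applies.

```python
import re
from typing import Iterable, Iterator

BULLET = re.compile(r"^\s*[*•]\s+")   # "* item" or "• item" at line start
BOLD = re.compile(r"\*\*(.+?)\*\*")    # markdown bold markers


def clean_line(line: str) -> str:
    """Normalize one complete line of model output."""
    line = BULLET.sub("- ", line)
    line = BOLD.sub(r"\1", line)
    return line.rstrip()


def clean_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield cleaned lines from a stream of arbitrary text chunks.
    A pattern can straddle two chunks, so nothing is emitted until
    the line it belongs to is complete."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            yield clean_line(line)
    if buffer:
        yield clean_line(buffer)
```

The trade-off is latency: holding text back until the line ends is what makes real-time correction awkward in a UI that wants to display tokens as they arrive.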
LLM Engineering and Evaluation
* The discussion focuses on handling uncertainty with Large Language Models (LLMs)
* The startup embraces a "vibe evals" approach, allowing for faster iteration and risk-taking
* They created a "Snips Wrapped" feature with three LLM-powered elements:
  - Assigning a personality based on user snippets
  - Creating a learning scorecard of most-discussed topics
  - Selecting a standout quote
* LLM Evaluation Techniques:
  - They use an "LLM as a judge" approach to improve feature selection
  - For quote selection, they generate 5 candidate quotes with a cheaper model, then use a more sophisticated model to choose the best quote
  - Similar techniques used for books and speaker identification features
* Model Selection and Cost Considerations:
  - Prefer Claude for its language formulation and personality
  - Use Claude 3.5 Sonnet for coding and brainstorming
  - Considering transitioning to open-source models to reduce costs
  - Acknowledge the challenge of balancing model performance and expense
  - Recognize that as a startup, they can't operate like a dedicated AI research lab
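The quote-selection flow described above (several candidates from a cheaper model, one pick from a stronger model) might be sketched as follows. Both models are passed in as plain callables, so no provider API is assumed. The prompts, the function name `pick_best_quote`, and the fallback to the first candidate are illustrative choices rather than details from the conversation.

```python
import re
from typing import Callable, List


def pick_best_quote(
    snips_text: str,
    generate: Callable[[str], str],  # cheaper model: prompt -> one candidate quote
    judge: Callable[[str], str],     # stronger model: prompt -> reply containing a number
    n_candidates: int = 5,
) -> str:
    """Generate candidates with a cheap model and let a stronger model choose."""
    prompt = f"Pick one standout quote from these snips:\n\n{snips_text}"
    candidates: List[str] = [generate(prompt) for _ in range(n_candidates)]
    # Drop duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(candidates))
    numbered = "\n".join(f"{i + 1}. {quote}" for i, quote in enumerate(unique))
    verdict = judge(f"Which quote is the most memorable? Reply with its number only.\n\n{numbered}")
    match = re.search(r"\d+", verdict)
    index = int(match.group()) - 1 if match else 0
    if not 0 <= index < len(unique):
        index = 0  # unparseable or out-of-range verdict: fall back to the first candidate
    return unique[index]
```

The appeal of this split is cost: the expensive model reads only a short numbered list once, while the bulk of the generation runs on the cheaper model.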
LLM Infrastructure and Future Trends
* The speakers discuss the importance of fast iteration and learning from users
* Closed models from OpenAI, Google, etc. are seen as advantageous due to easy API calls
* They compare LLM providers to AWS, offering "intelligence on demand"
* Usage Patterns for Large Language Models:
  1. Predictable, batch processing (e.g., podcast transcription, running 24/7)
  2. Real-time, user-triggered actions with variable usage intensity
* Future AI Trends:
  - Multimodality in AI models is emerging
  - Potential for single multimodal LLMs to replace current pipeline-based approaches
  - Current limitation is cost difference between existing methods and new multimodal solutions
* Future Product Development Focus:
  - Content expansion (podcasts, audiobooks, YouTube videos)
  - AI-generated content
  - Discovery features
Future Vision for Discovery and Voice Interfaces
* Discovery and Recommendations:
  - The speaker believes AI will transform content discovery
  - Future discovery may involve direct communication with recommendation algorithms
  - Users could potentially guide content recommendations in real-time
  - Collaborative filtering and LLM technologies will likely drive recommendation improvements
* Voice and Interfaces:
  - Voice interfaces are seen as a promising area for innovation
  - The speaker is interested in finding natural "triggers" that encourage app usage
  - Duolingo is cited as a rare example of an app successfully creating engagement without a natural trigger
* Podcast Learning Features:
  - The goal is to help users maximize learning from podcast content
  - Key proposed features include encouraging active reflection on podcast insights, helping users distill a single key takeaway from each episode, and turning podcast learnings into actionable knowledge
  - Voice interfaces could potentially reduce friction in implementing these learning features
  - Voice could "hook into" existing podcast listening habits more naturally than text-based notifications
Enhanced Podcast Consumption and Voice Cloning
* The conversation focuses on a new approach to podcast consumption that goes beyond passive listening
* The key idea is creating an AI-assisted experience that helps users retain and process podcast information
* The goal is to develop a seamless, short (2-3 minute) post-listening interaction that enhances learning and retention
* Key Perspectives:
  - The speaker distinguishes between an engineering approach (simple chat functionality) and a product-focused approach that prioritizes learning and retention
  - Current podcast consumption is mostly "consume, consume, consume" without meaningful processing
  - There are over half a billion monthly active podcast listeners
* Potential Features:
  - An AI companion that helps users process podcast content immediately after listening
  - Potential integration with podcast snippets and contextual exploration
  - Avoiding the "chore" of manual note-taking that tools like Anki require
* Voice Cloning Discussion:
  - Voice cloning technology is becoming increasingly normalized
  - Societal attitudes toward such technologies are evolving
  - What seemed ethically controversial in 2017 is now more widely accepted
  - Platforms like ElevenLabs have made voice cloning more accessible
Podcast Platform Insights and Creator Needs
* YouTube is currently the best podcasting platform, primarily due to its social recommendation layer and user habits
* Video podcasts are increasingly important, with a focus on "backgroundable video": content people listen to while doing other activities
* Video's primary value is in discovery, making content more engaging and easier to find
* Podcast Platform Observations:
  - Most podcast consumption happens while doing other tasks (around 90% of listening)
  - Video helps listeners connect more with hosts and provides visual context when needed
* Platform Audiences:
  1. Podcast listeners
  2. Content annotators/data contributors
  3. Podcast creators
* Creator-Focused Platform Needs:
  - Creators want better discovery mechanisms
  - Current tools like Riverside and Descript have limitations
  - Creators need better editing, thumbnail generation, and short-form content creation tools
* Platform Comparison:
  - Riverside: Good for remote recording, but has rough edges in editing and shorts
  - Descript: Strong editing niche, but hasn't innovated significantly
  - Potential opportunity exists for platforms prioritizing creator needs
* Metrics and Engagement:
  - "Snip count" is seen as a more active engagement metric compared to passive download counts
Podcast Creation and Accessibility
* The conversation focuses on podcast creation and a tool called Snips
* Main goal is to simplify the podcast production process
* Discussion highlights barriers to podcast creation for busy professionals
* Key Insights:
  - Most podcasts are low quality because they're created by people without significant life experience
  - The ideal target is enabling busy professionals like CEOs to create podcasts easily
  - Switching to a new podcast tool can be challenging, but can also be an opportunity to "clean house" and refocus
* Podcast Tool (Snips) Highlights:
  - Offers a premium version
  - Provides a free trial for a month
  - Has OPML import functionality
  - Aims to make podcast creation more accessible
* Broader Themes:
  - Democratizing content creation for professionals
  - Reducing technical barriers to podcast production
  - Importance of making tools user-friendly for time-constrained individuals
* The podcast host appreciates featuring smaller B2C app developers and emphasizes the value of showcasing successful, confident entrepreneurs in challenging markets