Overview

* Deep Research is a Gemini-powered AI research assistant that browses the web and generates a comprehensive report in about 5-6 minutes. Its structured workflow begins with a customizable research plan and helps users quickly build substantial knowledge of a new topic.

* The system balances breadth and depth in its research approach, using web search and deep page exploration to gather information, connect insights across sources, and self-critique its findings before producing a final report that users can further refine through follow-up questions.

* Technical innovations include handling 1-2 million token contexts, choosing strategically between RAG and long context, and supporting asynchronous research with state preservation so that users can leave a task and return to it.

* User testing revealed that people actually value longer research times (perceiving them as more thorough), and the tool excels particularly for complex, specification-heavy research topics rather than visually-driven shopping.

* Future development aims to add personalization, multimodal interactions, private document integration, and more adaptive, context-aware capabilities while keeping the transparent approach that shows users the research process in real time.

Content: Deep Research - Gemini-Powered AI Research Assistant

Overview and Capabilities

* Deep Research is a Gemini-powered AI research assistant that generates comprehensive research reports by browsing the web.
* It takes approximately 5-6 minutes to produce detailed outputs, helping users go from "zero to 50" quickly on new topics.
* The tool aims to solve the problem of complex research requiring multiple browser tabs, representing a new category of "deep research agents."

Notable Industry Reactions

* Jason Calacanis described it as "like having a college-educated researcher in your pocket"
* Tyler Cowen compared it to a PhD-level research assistant, but much faster
* Ben Thompson called it "one of the best bargains in technology"
* Sam Altman noted it's capable of doing a "single-digit percentage of all economically valuable tasks"

Technical Foundation and Approach

* Uses Gemini 1.5 Pro with special post-training for Deep Research (not the base model)
* The technology can be fine-tuned using open-source Gemma models
* No special access is required to replicate the research process

Research Process Workflow

Initial Research Planning

* The system creates an initial research plan for users to review and potentially edit
* This plan serves as a "contract" for the subsequent investigation
* Users can conversationally edit the research plan
* While a button encourages plan editing, most users tend to just start the research

Research Execution

* The AI conducts research iteratively, exploring different aspects of the research plan
* Uses a breadth-first approach to initially explore topics
* Can parallelize research steps using two primary tools:
  - Web search
  - Webpage deep exploration
* Decides which sources to "double click" on based on inconsistencies or partial information
* Self-critiques and revises drafts to finalize a report
* Can reason across search results, connecting information from different sources
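The breadth-first, parallel pattern described above can be sketched roughly as follows. This is an illustration only, not the Deep Research implementation: `web_search` and `explore_page` are hypothetical stand-ins that return canned data, the `complete` flag is an assumed way to model "partial information," and the self-critique step is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Finding:
    step: str
    source: str
    text: str
    complete: bool  # False when the source gave only partial information


def web_search(step: str) -> list[Finding]:
    """Hypothetical search tool; returns canned results for illustration."""
    return [
        Finding(step, f"https://example.com/{step}/a", f"summary of {step}", True),
        Finding(step, f"https://example.com/{step}/b", f"partial note on {step}", False),
    ]


def explore_page(finding: Finding) -> Finding:
    """Hypothetical deep page exploration (the "double click")."""
    return Finding(finding.step, finding.source, finding.text + " (expanded)", True)


def run_research(plan: list[str], max_workers: int = 4) -> list[Finding]:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Breadth first: search every plan step in parallel before going deep.
        breadth = [f for results in pool.map(web_search, plan) for f in results]
        # Depth second: revisit only the sources that were partial.
        partial = [f for f in breadth if not f.complete]
        deepened = list(pool.map(explore_page, partial))
    return [f for f in breadth if f.complete] + deepened
```

Calling `run_research(["pricing", "efficiency"])` searches both steps concurrently, then expands only the two partial sources. A real agent would decide what to expand using the model's own judgment about inconsistencies, not a boolean flag.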

User Interaction Features

* Three main interaction modes:
  - Seeking additional deep research
  - Changing/modifying existing reports
  - Direct editing of content
* Users can ask follow-up questions about generated reports
* The model maintains context from all previously browsed web sources
* Supports continuous learning and exploration beyond initial reports

Google Ecosystem Integration

* Gemini extensions allow fetching content from various Google services
* Currently supports apps like Gmail, Calendar, YouTube, Maps
* Includes some third-party integrations like Spotify

Technical Considerations

Context and Retrieval Strategies

* The team balances trade-offs between context length, latency, and performance
* Explores when to use Retrieval-Augmented Generation (RAG) versus long context
* Guiding principles for choosing between RAG and context include:
  - Complexity of query attributes
  - Performance of newer model generations with long contexts
* The system can handle 1-2 million token contexts
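One way to picture the RAG-versus-long-context decision is a small routing function. The rules below are assumptions made for illustration, not the team's actual policy: the 1,000,000-token default reflects the lower end of the range quoted above, and the attribute threshold and the idea that simple lookups default to retrieval for lower latency are invented for this sketch.

```python
def choose_strategy(
    corpus_tokens: int,
    query_attributes: int,
    context_limit: int = 1_000_000,          # assumed lower bound of the 1-2M range
    complex_query_min_attributes: int = 3,   # hypothetical threshold
) -> str:
    """Return "long_context" or "rag" for a query over gathered sources."""
    if corpus_tokens > context_limit:
        # The gathered material does not fit, so retrieval is the only option.
        return "rag"
    if query_attributes >= complex_query_min_attributes:
        # Assumption: multi-attribute queries benefit from whole sources in context.
        return "long_context"
    # Assumption: simple lookups use retrieval to keep latency down.
    return "rag"
```

For example, `choose_strategy(200_000, 5)` picks long context, while `choose_strategy(3_000_000, 5)` falls back to retrieval because the material exceeds the window.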

Conversation Management

* Recommended to continue a conversation thread if:
  - Topics are related
  - There's potential for nuanced, connected follow-up research
* For completely unrelated topics, starting a new conversation is suggested
* No hard limits on conversation length
* Current UI suggests stopping when document is created

Evaluation and Testing

* Developed automatic metrics to track model performance, including:
  - Time spent on planning
  - Number of iterative steps
  - Length of research plans
  - Number of steps in planning process
* Uses a combination of automated metrics and human evaluation
* Focuses on attributes like comprehensiveness, completeness, and groundedness
* Created an "ontology of use cases" to understand research behaviors
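The automatic metrics listed above could be aggregated from logged runs along these lines. The `ResearchTrace` record and its field names are hypothetical; the source does not describe how the team stores or computes these values, and measuring plan length in words is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ResearchTrace:
    """Hypothetical log record for one research run."""
    planning_seconds: float
    plan_steps: list[str]
    iterations: int


def summarize(traces: list[ResearchTrace]) -> dict[str, float]:
    """Average the four automatic metrics over a batch of runs."""
    if not traces:
        raise ValueError("need at least one trace")
    n = len(traces)
    return {
        "mean_planning_seconds": sum(t.planning_seconds for t in traces) / n,
        "mean_iterations": sum(t.iterations for t in traces) / n,
        "mean_plan_steps": sum(len(t.plan_steps) for t in traces) / n,
        # Plan length measured in words (an assumed unit).
        "mean_plan_words": sum(
            len(" ".join(t.plan_steps).split()) for t in traces
        ) / n,
    }
```

Tracking these averages across model versions would show shifts in behavior, such as longer plans or more iterations, but quality attributes like groundedness would still need human evaluation.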

Research Behavior Ontology

* Spans a spectrum from broad/shallow to specific/deep research patterns
* Key research types include:
  - Broad exploration (e.g., finding summer camp options)
  - Deep topic investigation
  - Comparative research (e.g., US vs. EU milk/meat regulations)
  - Compound research (e.g., wedding planning project)

Product Development Approach

* The team intentionally avoided specializing in a specific vertical
* Tested with various user personas to discover unexpected use cases
* Developed multiple versions with different research depths
* Ultimately shipped a version with a 5-10 minute research time
* Deliberately avoided constraining the product to a single use case or user type

User Behavior Insights

* Core behavior in shopping/research is "options exploration" - sifting through information
* Deep research is particularly effective for complex, specification-heavy research (e.g., HVAC systems)
* Less suitable for visually-driven shopping (e.g., shoes)
* Counterintuitively, users value longer research processes, perceiving longer search times as more thorough
* Users are tolerant of extended research periods (5-10 minutes)

Research and Computation Trade-offs

* The team balances computational resources, exploration, and verification
* Key trade-offs involve:
  - Exploring broadly vs. verifying deeply
  - Compute resources
  - Time spent on research
  - User value
* Debate exists about giving users direct control over research parameters
* Concern that users would always maximize settings if given a toggle
* Ideal system would dynamically adjust research approach based on context

Technical Infrastructure

* Created an asynchronous platform for deep research with:
  - Support for users leaving and returning to tasks
  - Long-running jobs (5-6 minutes)
  - State preservation and retry mechanisms
* Currently available on Android, rolling out to iOS
* Comparable workflow systems for complex, long-running tasks include Apache Airflow, AWS Step Functions, and Temporal
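State preservation and retries for a long-running job can be illustrated with a checkpoint file. This is a minimal sketch under stated assumptions, not the platform the team built: `run_job`, the JSON checkpoint layout, and the use of `RuntimeError` to signal a transient failure are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def run_job(
    steps: list[str],
    checkpoint: Path,
    run_step: Callable[[str], str],
    max_retries: int = 3,
) -> dict[str, str]:
    """Run steps in order, persisting results so the job can resume later."""
    # Restore results from an earlier run, if a checkpoint exists.
    state = json.loads(checkpoint.read_text()) if checkpoint.exists() else {"done": {}}
    for step in steps:
        if step in state["done"]:
            continue  # finished before the user left; skip on return
        for attempt in range(1, max_retries + 1):
            try:
                state["done"][step] = run_step(step)
                break
            except RuntimeError:
                if attempt == max_retries:
                    raise  # give up; completed steps are already on disk
        # Persist after every completed step.
        checkpoint.write_text(json.dumps(state))
    return state["done"]
```

Because progress is written after each step, a second call with the same checkpoint path reruns only the unfinished steps. Workflow engines such as Temporal provide this kind of durability and retry handling as built-in features.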

Future Development Areas

* Personalization based on user context and learning stage
* Multimodal input and output (beyond text-only interactions)
* Customized content delivery for different audiences
* Ability to access and integrate personal/private document collections
* More adaptive, context-aware AI systems
* Integration of generative UI and personalized content generation
* Expansion beyond current text-based limitations to richer interactive experiences

Product Philosophy and Differentiation

* Designed with transparency in mind
* Shows research process in real-time
* Displays websites being browsed
* Presents information in a side-by-side format
* Provides upfront research plans
* Focuses on solving one specific problem well, rather than creating a generic platform

The Inventors of Deep Research