Overview
- Exa AI (formerly Metaphor Systems) is building a fundamentally new search engine that uses neural networks to predict relevant documents rather than relying on traditional keyword matching, with the goal of creating "perfect search" that truly understands user queries.
- The company's technology combines document prediction (training models to predict links based on surrounding text) with comprehensive infrastructure including URL discovery, web crawling, and AI processing—all designed to deliver more precise, high-quality results than conventional search engines.
- Exa's approach allows for variable compute allocation to searches, enabling more complex semantic queries and comprehensive results gathering that would be impossible with traditional search methods, though this sometimes requires longer processing times.
- The future of search likely involves LLM-based interfaces combined with powerful backend search capabilities, creating more collaborative "agentic" search experiences that balance automation with user involvement rather than full autonomy.
- Exa faces significant challenges, including web scraping difficulties, training data quality control, and managing computational costs, but believes rapidly decreasing LLM costs (potentially a 200x reduction in the coming years) will create new opportunities for innovative search technologies.
Content
Background and Origins
- Will Bryk is the CEO and co-founder of Exa AI (previously Metaphor Systems)
- Has been interested in search and high-quality information since childhood
- Built a mini search engine in college
- Founded Exa with the goal of creating a fundamentally better search engine
- Professional background includes:
Early Company Vision and Evolution
- Entered Y Combinator in summer 2021 with the pitch of "Google 2.0"
- Inspired by GPT-3's language understanding capabilities
- Aimed to create a search engine that truly understands user queries
- Believed Google hadn't meaningfully improved in a decade
- Always focused on building a better search algorithm
- Started as a research endeavor with the potential to become a significant business
- Initially released an early search engine as a "research preview"
- Transitioned from a research to a product-focused company
- Core mission: Create "perfect search" over the web, with downstream use cases to be determined later
Exa's Technology and Approach
- Describes itself as the "OpenAI of search": a research startup focused on fundamental AI research for search
- Originated with a unique approach to link/document prediction using a transformer-inspired model
- Initial training involved predicting links/documents based on surrounding text
- Process involves:
- More accurately described as "document prediction" rather than link prediction
- Similar to language model training, with potential for further refinement through synthetic data and supervised fine-tuning
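The training setup described above (predict the linked document from the text around the link) can be illustrated with a minimal sketch of how training pairs might be extracted. This is not Exa's pipeline: the function name `link_prediction_pairs`, the markdown-style link syntax, and the 80-character context window are all assumptions chosen for illustration.

```python
import re

# Assumption: links appear in markdown form, [anchor](url).
LINK = re.compile(r"\[([^\]]*)\]\((https?://[^)\s]+)\)")

def link_prediction_pairs(text, window=80):
    """Return (context, target_url) pairs: the text just before each link,
    paired with the document that link points to."""
    pairs = []
    for m in LINK.finditer(text):
        # Context is the text immediately preceding the link (hypothetical choice).
        context = text[max(0, m.start() - window):m.start()].strip()
        pairs.append((context, m.group(2)))
    return pairs

sample = (
    "A great essay on neural retrieval is [here](https://example.com/essay). "
    "Also see [docs](https://example.com/docs)."
)
pairs = link_prediction_pairs(sample)
for context, target in pairs:
    print(repr(context), "->", target)
```

A model trained on pairs like these would learn to map a natural-language description to the document it describes, which is why "document prediction" is the more accurate name.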
Exa's Current Components and Infrastructure
- Comprehensive search engine built from scratch
- Major subsystems include URL discovery, web crawling, and AI processing
- Relatively small teams compared to large tech companies like Google
- The name "Exa" refers to 10^18 (the SI prefix exa), in contrast to Google's name, which references a googol (10^100)
- Core philosophy: Smaller, more precise results (10^18) are better than massive, unfocused result sets (10^100)
Key Features and Capabilities
- Ability to create complex, semantic queries
- Find comprehensive lists of results (e.g., "startups working on hardware in SF")
- Handle semantic variations (robotics, wearables, hardware)
- Search can take varying amounts of time (from milliseconds to hours)
- Some search platforms offer previews and allow scaling of compute resources
- Complex searches might require longer processing times
- Fully neural-based search engine not relying on traditional keyword algorithms
- End-to-end neural search methodology
- "Link prediction objective" serves as a neural equivalent to PageRank
- Can capture content references in multiple ways, making it more powerful than traditional search methods
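The difference between keyword matching and neural retrieval can be shown with a toy sketch. The embeddings below are hand-written three-dimensional vectors (roughly hardware, software, food), not the output of any real model, and the names `keyword_search` and `neural_search` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Assumption: toy embeddings with axes (hardware, software, food).
DOCS = {
    "Robotics startup building warehouse arms": (0.9, 0.3, 0.0),
    "Wearables company shipping a smart ring": (0.8, 0.4, 0.0),
    "SaaS analytics dashboard": (0.1, 0.9, 0.0),
    "Sourdough bakery in the Mission": (0.0, 0.0, 1.0),
}

def keyword_search(query, docs):
    """Return documents containing any query term as a substring."""
    terms = query.lower().split()
    return [d for d in docs if any(t in d.lower() for t in terms)]

def neural_search(query_vec, docs, k=2):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

keyword_hits = keyword_search("hardware", DOCS)
neural_hits = neural_search((1.0, 0.2, 0.0), DOCS)  # toy vector for "hardware"
print(keyword_hits)
print(neural_hits)
```

None of the documents contains the word "hardware", so the keyword search returns nothing, while the embedding search surfaces the robotics and wearables entries. That is the kind of semantic variation the list above describes.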
Exa's Product Offerings
- Search API
- Excel search
- List builder
- Web scraping capabilities
- Ability to retrieve full content for URLs, not just links
- Can retrieve multiple URLs simultaneously
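Retrieving content for several URLs at once can be sketched as below. This is not Exa's API: `fetch_content`, `get_contents`, and the in-memory `FAKE_WEB` dictionary are hypothetical stand-ins so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: an in-memory stand-in for the web, so no network is needed.
FAKE_WEB = {
    "https://example.com/a": "Full text of page A",
    "https://example.com/b": "Full text of page B",
}

def fetch_content(url):
    """Hypothetical fetcher: returns the page text, or None if unavailable."""
    return {"url": url, "text": FAKE_WEB.get(url)}

def get_contents(urls, max_workers=4):
    """Fetch several URLs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_content, urls))

results = get_contents([
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/missing",
])
for r in results:
    print(r["url"], "->", r["text"])
```

Returning `None` for a page that cannot be fetched, rather than raising, lets one blocked site fail without discarding the rest of the batch.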
Challenges in Search Technology
- Subjectivity of search results
- Comprehensiveness of results
- Semantic understanding
- Training data quality is crucial - "if you train on a bunch of crap, your prediction will be crappy"
- Aim to control training data to ensure high-quality content
- Goal is to avoid "SEO slop" and low-quality search results that plague traditional search engines
- Increasing difficulty accessing content due to sites blocking bots and scrapers
- Potential solutions include data partnerships and leveraging long-tail open sites
- Scraping is a challenging technical problem
- Difficult to create a "perfect" scraper
Compute-based Approach to Search
- More compute can be applied to increase result comprehensiveness
- Analogous to the approach of OpenAI's o1, which applies variable computational resources to solve problems
- Potential to use large language models like GPT-4 to scan and classify web content
- Future considerations:
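The variable-compute idea in this section can be sketched as a search that classifies as many candidates as its budget allows. The classifier here is a trivial keyword rule standing in for an LLM call, and `budgeted_search`, `looks_like_hardware_startup`, and the candidate list are all hypothetical.

```python
def looks_like_hardware_startup(doc):
    """Stand-in for an LLM classification call (assumption): a trivial rule."""
    return any(word in doc.lower() for word in ("robot", "wearable", "chip"))

def budgeted_search(candidates, classify, budget):
    """Classify at most `budget` candidates; each call costs one unit of compute."""
    matches = []
    for doc in candidates[:budget]:
        if classify(doc):
            matches.append(doc)
    return matches

CANDIDATES = [
    "Robot arms for kitchens",
    "Recipe blog",
    "Wearable glucose monitor",
    "Tax software",
    "Chip design tools",
    "Travel diary",
]

quick = budgeted_search(CANDIDATES, looks_like_hardware_startup, budget=2)
thorough = budgeted_search(CANDIDATES, looks_like_hardware_startup, budget=6)
print(quick)
print(thorough)
```

With a budget of 2 the search finds one match; with a budget of 6 it finds all three. More compute buys a more comprehensive result list, at the price of a longer-running query.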
Business and Knowledge Implications
- Traditional information arbitrage is being disrupted by accessible search tools
- Search infrastructure enables building applications and direct user interfaces
- Potential to democratize access to information across industries
- Distinction between "super knowledge" and "super intelligence"
- Future AI systems may require robust search capabilities to overcome knowledge limitations
- Even advanced AI (like potential AGI) will need search tools to access information
Potential Use Cases for Exa
1. Dating
   - Finding potential partners based on specific criteria
   - Matching intellectual compatibility
   - Searching profiles across the web
2. Academic/Research
   - Writing assistants for students
   - Searching and summarizing research papers
   - Helping with research paper preparation
3. Business/Investment
   - Venture capital research
   - Finding lists of companies in specific industries/sectors
   - Competitor analysis
   - Identifying potential sales targets
4. Recruiting
   - Searching for potential candidates
   - Finding professionals who have written about relevant topics
   - Discovering candidates through blogs, LinkedIn, Twitter, etc.
5. Enterprise/Company Document Search (future expansion)
Search Engine Evolution and Future Vision
- Google has dominated search for more than two decades, conditioning people to think of search in limited ways
- ChatGPT has expanded people's understanding of what search can be
- Future search interfaces will likely involve Large Language Models (LLMs)
- Making oneself "discoverable" online is increasingly important
- Search engines fundamentally shape what content gets created
- Exa aims to optimize for high-quality, contextually relevant content, unlike keyword-based search engines
- LLMs will likely become the primary search interface
- Search engines should be designed to handle complex LLM-generated queries
- The goal is to create more intelligent, context-aware search experiences
Agentic Search Concept
- The search approach being developed is considered "agentic" - capable of taking actions and making decisions
- Combines algorithmic and agent-based approaches
- The goal is to create a search tool that feels collaborative, not completely autonomous
- Full autonomy (analogous to Level 5 in self-driving) tends to fail because users want to be involved in the process
- Users prefer "driver-assist" models where they can influence and understand the search/research process
- Current AI agents are not yet advanced enough to completely replace human involvement
- As AI agents improve, the term "agentic" may become less meaningful because agent-like capabilities will become standard
AI Search and Interface Challenges
- Exploring new search interfaces that are iterative and allow for refinement
- Identifying potential failure modes in AI agents:
- Current system prompts often feel performative (e.g., "you are a helpful assistant")
- Users need to be part of the process, not just give high-level commands
- There's uncertainty about how to effectively guide AI behavior
- Prompting techniques currently feel more like "cargo culting" than a scientific approach
AI Training and Model Ecosystem
- Discussion of creating self-training AI systems using reward signals
- Possibility of AI generating its own training tasks and learning from performance
- Belief that future AI models will be trained using this self-improvement paradigm
- OpenAI is unlikely to dominate all language model use cases
- Expect multiple models from different companies of varying sizes
- Some use cases will prioritize inference speed over complex reasoning
- Human labeling for search can be challenging and often keyword-based
- Large Language Models (LLMs) may be more effective at data labeling
- LLMs like GPT-4 can potentially improve search result relevance
Company Culture (Exa)
- Founded by long-time friends with a "counter consensus" approach
- Described as having an unconventional, meme-friendly culture
- Culture of fun, laughter, and unconventional problem-solving
- Implemented nap pods to address employee fatigue and promote creativity
- Purchased nap pods from China, with a humorous story about the heavy delivery
- Emphasis on employee well-being and providing flexible work environments
- Rejection of "hustle culture" in favor of enjoying work and building meaningful things
- Belief that building something from scratch with friends is a deeply satisfying experience
- Currently hiring and growing rapidly
Technical Infrastructure and Economics
- Purchased a $5 million H200 compute cluster
- Use a mix of their own cluster and AWS for inference and training
- Considering the economic constraints of AI search and inference costs
- Managing computational costs is a key challenge
- Strategically allocating compute resources
- Pre-processing and indexing to reduce real-time computational expenses
- Technical approach involves:
- LLM costs are rapidly decreasing (potentially a 200x reduction in a few years)
- This cost reduction creates new opportunities for rethinking search algorithms
- Suggests potential for more innovative and cost-effective search technologies
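To see why a 200x cost reduction matters for search, consider a rough worked example. The page count and per-page price below are invented for illustration; only the 200x factor comes from the notes above.

```python
def scan_cost(pages, cost_per_page, reduction=1.0):
    """Total cost in dollars of running an LLM over `pages` documents."""
    return pages * cost_per_page / reduction

PAGES = 1_000_000_000    # assumption: one billion pages to classify
COST_PER_PAGE = 0.001    # assumption: $0.001 of LLM inference per page

today = scan_cost(PAGES, COST_PER_PAGE)
later = scan_cost(PAGES, COST_PER_PAGE, reduction=200)
print(f"today: ${today:,.0f}")
print(f"after 200x reduction: ${later:,.0f}")
```

Under these assumed numbers, a scan that costs about $1,000,000 today would cost about $5,000 after a 200x reduction, which moves LLM-based scanning of large indexes from an occasional expense toward something that could run routinely.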