Latent Space: The AI Engineer Podcast

Production AI Engineering starts with Evals — with Ankur Goyal of Braintrust

Overview

Content

Early Career and Background

- First-generation Indian immigrant with doctor parents
- Explored career paths after rejecting traditional routes (big tech, academia)
- Sought a high-impact, creative work environment

- Internships at Microsoft working on Bing distributed compute infrastructure
- Felt unsatisfied with low-intensity work at large companies and in academic research
- Moved to San Francisco and connected with a recruiter
- Joined MemSQL (now SingleStore) as employee #2, despite thinking he had failed the interview
- Dropped out of school to pursue the opportunity
- Worked there for almost six years, running the engineering team

SingleStore Experience and Technology Insights

- Not suitable for small weekend projects due to hardware and software costs
- More expensive than alternatives but still cheaper than Oracle Exadata or SAP HANA
- Limited adoption due to complexity and pricing model
- Founder Nikita Shamgunov is now pursuing a different strategy with Neon (offering free/inexpensive Postgres)

- Advanced technology doesn't guarantee widespread usability
- Pricing and packaging can significantly limit technology adoption
- Some technologies have excellent technical capabilities but are constrained by economic/deployment limitations

- Discussed variant types in technologies like Snowflake, Redshift, ClickHouse
- Snowflake's variant type is considered an engineering marvel for semi-structured data storage
- DuckDB's struct type has limitations compared to variant types

Impira Journey and Challenges

- Make unstructured data as easy to use as structured data, leveraging machine learning
- Pre-LLM models and limited data collection made the initial concept challenging
- Worked with top financial services and public enterprises

- Initially misunderstood his own sales capabilities
- Realized the complexity of getting initial meetings, closing deals at scale, and managing revenue retention
- Received guidance from his father-in-law (a seasoned sales leader at Cloudflare/Palo Alto Networks)
- Hired Jason, an exceptional account executive who closed 90-95% of Impira's business

- Shifted from selling to technical customers (like at MemSQL) to selling to line-of-business customers
- Discovered that lacking a deep, intuitive understanding of customers makes everything more challenging

- The fundamental challenge is not technological, but organizational prioritization
- CEOs/CTOs are more likely to prioritize projects that create new user experiences rather than solving existing inefficient processes
- Unstructured data solutions often remain a second or third-tier priority for large organizations

Impira's Technical Evolution and Acquisition

- Prior to acquisition, Impira's key advantage was extracting data from PDF documents with minimal training examples
- Their approach was primarily computer vision-based, leveraging visual signals in documents
- The emergence of advanced language models like BERT and ChatGPT dramatically changed the landscape
- Text-based extraction techniques began to outperform previous computer vision methods

- The speaker had difficulty convincing his team about the potential of new AI technologies like LayoutLM and GPT-3
- Became a top non-employee contributor to Hugging Face
- Experimented with LayoutLM and GPT-3
- Noticed GPT-3 significantly outperformed their existing technology
- Recognized emerging AI models were rapidly improving and could potentially "cannibalize" their existing technology

- Acquisition by Figma occurred in December 2022
- Received inbound interest due to growing AI awareness
- Worked closely with an investor (Elad) during the process
- Ultimately chose to be acquired by Figma (before Adobe's acquisition)

- Found the process of shutting down the company "extremely devastating"
- Experienced significant sadness for 3-4 months
- Recognized the emotional complexity of shutting down a startup and letting customers down

Figma Experience and Startup Closure

- Worked to provide generous refunds and support for customers during the closure
- Emphasized making difficult but right entrepreneurial decisions, even when uncomfortable

- Figma was in a unique position, dealing with an acquisition, exploring identity beyond a design tool, and maintaining an annual release cycle
- Introducing AI into Figma was complex due to high product quality standards, challenges with visual AI, and technical difficulties in applying AI to design formats

- Designers are generally skeptical of AI replacing design work
- Potential AI value for designers lies in code generation, bridging UI engineering and design, and enhancing collaboration

- Found Figma's slower iteration pace challenging
- Appreciated the company and people, but preferred a more rapid shipping environment

Birth of Braintrust

- Developed an evaluation (eval) system that helped resolve technical disagreements
- The eval system transformed discussions from hypothetical to more scientific and data-driven

- Transformers and large language models have made AI development more accessible to software engineers
- Existing ML tools are difficult for software engineers to use
- Need for evaluation tools designed specifically for software engineers' workflows

- An end-to-end developer platform for building AI products
- Core belief: Embrace evaluation as a central workflow in AI engineering
- Started by creating a highly regarded evaluation product, initially targeting software engineers

- Began as an evaluation dashboard
- Evolved into a debugger-like tool
- Progressing towards becoming an integrated development environment (IDE)

Braintrust Platform Features

- Simplified data collection and ETL process
- Logging functionality that automatically captures data in eval-ready format
- Allows users to analyze eval results, investigate performance variations, compare metrics, and modify prompts or models for quick re-testing
- Offers a collaborative, save-friendly environment for working with AI prompts and models
- Users can compare multiple prompts and models side-by-side
- Recently added capability to run evaluations directly in the playground

- Allows testing prompts against different models, including fine-tuned models
- Enables creating custom evaluations with scoring mechanisms
- Supports running pre-built and user-created evaluations
- Demonstrates evaluating summary quality of press release documents

- Support for defining custom tools using TypeScript
- Integration of external APIs (like Exa search) directly into the platform
- Ability to run tool-augmented prompts in the playground environment
- Dynamic code evaluation in a sandbox environment
- Granular comparison of different AI-generated outputs
- Easy deployment of custom tools via a simple command
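To make the custom-tool idea concrete, here is a hypothetical sketch of what defining a tool in TypeScript can look like: a named handler plus a description that a prompt can invoke. The shape, the `defineTool` helper, and the `search_web` tool are all invented for illustration; this is not Braintrust's actual tool API.

```typescript
// Hypothetical custom-tool shape: a name, a description for the model,
// and a typed handler that runs when the tool is called.
interface ToolDef<A, R> {
  name: string;
  description: string;
  handler: (args: A) => R;
}

// In a real platform this would register and deploy the tool; here it
// simply returns the definition so it can be invoked locally.
function defineTool<A, R>(tool: ToolDef<A, R>): ToolDef<A, R> {
  return tool;
}

const searchWeb = defineTool({
  name: "search_web",
  description: "Search the web and return matching result titles",
  handler: ({ query }: { query: string }): string[] =>
    // Stubbed result; a real tool would call an external search API
    // (e.g. a service like Exa) and return its hits.
    [`stub result for: ${query}`],
});

const hits = searchWeb.handler({ query: "evals" });
// hits[0] === "stub result for: evals"
```

Keeping the handler a plain typed function is what makes tools easy to test and deploy independently of any prompt that calls them.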

Technical Approach and API Integration

- Developed a novel syntax for running evaluations without complex for loops
- Eval consists of an argument with data, a task function, and one or more scoring functions
- Enables parallel and efficient eval running
- Supports caching and async processing
- Provides consistent interfaces across Python and TypeScript
- Converts evals into a declarative data structure
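The declarative shape described above (data + task + scoring functions) can be sketched as a toy harness. This is an illustrative example written for this summary, not the actual Braintrust SDK API; the `EvalSpec` type and `runEval` function are invented here.

```typescript
// A scorer compares an expected value against the task's output.
type Scorer<O> = (expected: O, output: O) => number;

interface EvalSpec<I, O> {
  data: { input: I; expected: O }[]; // the cases to evaluate
  task: (input: I) => O;             // the thing being tested
  scores: Scorer<O>[];               // one or more scoring functions
}

// Run every case through the task and average each scorer's results —
// no hand-written for loops at the call site.
function runEval<I, O>(spec: EvalSpec<I, O>): number[] {
  return spec.scores.map(
    (score) =>
      spec.data.reduce(
        (sum, { input, expected }) => sum + score(expected, spec.task(input)),
        0
      ) / spec.data.length
  );
}

// Toy usage: the "task" uppercases text; one exact-match scorer.
const avgScores = runEval({
  data: [
    { input: "hi", expected: "HI" },
    { input: "ok", expected: "OK" },
  ],
  task: (s: string) => s.toUpperCase(),
  scores: [(expected, output) => (expected === output ? 1 : 0)],
});
// avgScores[0] === 1: both cases pass the exact-match scorer
```

Because the whole eval is a plain data structure, a real runner can cache results, execute cases in parallel, and expose the same shape in both Python and TypeScript.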

- Provides a REST API endpoint for each prompt
- Allows users to spend more time crafting use cases and reusing tools
- Integrates development process tightly with evaluation

- Allows sharing and publishing eval histories
- Enables team discussions around evaluation results
- Provides interactive debugging of task and scoring functions

Development Journey and Strategy

- Worked closely with early users like Brian from Zapier for critical feedback
- Iteratively improved the product based on user suggestions
- Developed features like prompt rerunning, model comparison, and token count correlation

- Initially considered "stupid" by many VCs and industry observers
- Deemed necessary by early customers like Zapier, Coda, and Airtable who wanted data to remain in their cloud
- Supported by some investors who saw value in the approach
- Compared to Databricks' similar hybrid model
- Leveraged serverless technology as a key unlock

- Hybrid on-premises approach
- Prioritizing TypeScript SDK (now used by ~75% of users)
- Focusing initially on evaluations (evals) as a critical pain point

- Initially, some VCs were skeptical, comparing the market to CI/CD
- Subsequent market interest validated their approach
- Impressive customer logos including Stripe and Vercel
- Notable quote from Malte (former Google search team member) praising Braintrust's workflow transformation

Market Insights and Technology Perspectives

- Parallels between the current AI market and the early cloud computing era
- The market is highly dynamic, with significant technological shifts happening rapidly
- Companies are treating AI as an existential question fundamentally changing software development

- Vector search is not typically a storage or performance bottleneck
- The real challenge is integrating vector search with other data systems
- Databases are not just storage, but also compilers
- Examples of innovative database approaches include Snowflake separating storage from compute and Databricks making arbitrary code a first-class citizen

- Fine-tuning is not necessarily a business in itself
- The core goal is "automatic optimization" of use cases
- Alternative optimization methods include DSPy-style prompt optimization, hand-crafting prompts, and in-context learning
- Very few customers are currently fine-tuning models in production
- The landscape of model optimization is rapidly changing

AI Model Landscape and Trends

- Pre-Claude 3, OpenAI dominated nearly 100% of the market
- Post-Claude 3, customers are now evaluating both OpenAI and Anthropic
- Anthropic's Haiku was particularly notable for being cheap, fast, and supporting tool calling
- Sonnet is now seen as both affordable and capable
- OpenAI remains the overwhelming majority choice in production environments

- OpenAI excels in model availability, rate limits, and reliability
- Their single endpoint approach is a significant engineering achievement
- Managing multiple cloud endpoints is complex and requires substantial engineering effort

- Big companies don't exclusively use specific cloud providers for AI models
- There's a diverse ecosystem with multiple options and tradeoffs
- Different model labs (OpenAI, Anthropic, Meta) are actively competing and innovating
- OpenAI's GPT-4o release is seen as potentially invigorating competition

AI Use Cases and Future Directions

- Approximately 50% involve single prompt manipulations (auto-generating ticket titles, video/document summaries)
- About 25% involve simple agents (prompt + tools, often RAG-based)
- Remaining 25% are advanced agents with more complex interactions

- Initial AI integration involved complex, mathematically oriented programming
- Current trend is "sprinkling intelligence" throughout applications
- Goal is to make AI implementation easy and low-friction
- Developers want AI to feel like a natural part of building software, not a separate paradigm

- Advocates for designing AI agents focused on user experience rather than complex technical implementations
- Suggests writing more UI code between LLM calls to craft user interactions
- Introduces the concept of "code core versus LLM core" — keeping the core system well-defined and using LLMs sparingly
- Highlights the Voyager agent as an innovative approach (writes and persists code for future reuse)
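The "code core versus LLM core" idea can be sketched in a few lines: keep control flow in ordinary code and isolate the model behind one narrow, typed function. The `generateTicketTitle` pipeline and the `LLMCall` stand-in below are invented for illustration, not taken from any real system.

```typescript
// The model is hidden behind a single narrow interface.
type LLMCall = (prompt: string) => string;

// The core pipeline is deterministic code; the model is only asked
// for the one step that genuinely needs it (drafting a title).
function generateTicketTitle(description: string, callLLM: LLMCall): string {
  const trimmed = description.trim().slice(0, 500);              // code: normalize input
  const draft = callLLM(`Write a short ticket title for: ${trimmed}`); // LLM: one call
  return draft.replace(/\s+/g, " ").trim().slice(0, 80);         // code: enforce invariants
}

// With a stubbed model, the surrounding code is fully testable.
const fakeLLM: LLMCall = () => "  Fix   login  crash  ";
const title = generateTicketTitle("Users report a crash on login...", fakeLLM);
// title === "Fix login crash"
```

The payoff of keeping the core in code is that output length, whitespace, and formatting are guaranteed by the pipeline regardless of what the model returns.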

Personal Reflections and Company Update

- Acknowledges a shift in perspective about technical expertise
- Recognizes that practical problem understanding is as valuable as deep technical knowledge
- Currently in a unique position of understanding AI tools from a user's perspective

- Deeply enjoying his current work environment at Braintrust
- Values working with a team he respects, including his brother, Eden (head of product, first designer at Airtable and Cruise), and Albert (handles business operations)
- Prioritizes working on meaningful problems, enjoying his work environment, and collaborating with people he respects

- Braintrust is currently hiring software engineers, salespeople, DevRel, and one designer
- Primarily seeking San Francisco-based candidates, with some flexibility for remote candidates
- Building AI software and passionate about their problem space
- Interested in working with high-quality customers and team members
