Overview
- OpenAI's API evolution reflects a strategic shift from basic completions to sophisticated structured outputs, with JSON Mode and structured outputs addressing developers' needs for reliable, format-constrained responses that integrate seamlessly with applications.
- The company's product development philosophy balances engineering and research approaches, focusing on the "80/20 principle" to deliver maximum value while maintaining a clear distinction between function calling (for tool invocation) and structured outputs (for response formatting).
- Model selection strategy follows a tiered approach: start with GPT-4o mini for most use cases, upgrade to GPT-4o for better performance, and consider fine-tuning (now generally available) for specialized applications, with as few as 100 high-quality examples potentially yielding significant improvements.
- The new o1 model represents a parallel development path focused on complex reasoning rather than replacing the GPT series, with GPT-4o remaining the "workhorse" for standard tasks while o1 handles more reasoning-intensive problems, demonstrating OpenAI's commitment to specialized models for different use cases.
- API features and roadmap include significant improvements to file handling (supporting 10,000 files per assistant), enhanced semantic search capabilities, batch processing with 50% cost savings, and ongoing development of vision, video, and speech capabilities, all designed to make complex AI implementation more accessible.
Content
Background and Career Journey
- Michelle Pokrass is a Canadian engineer who graduated from the University of Waterloo
- She completed six internships during her degree
- Waterloo has a strong startup culture that influenced her career path
- Her internship at Coinbase was particularly formative
- She tends to join companies at critical scaling moments
Early Career Progression
- Worked internships at Stripe and Coinbase
- Co-founded Readwise (briefly) during entrepreneurship co-op at Waterloo
- Joined Clubhouse as one of the early backend engineers during its rapid growth period
- At Clubhouse, she tackled significant technical scaling challenges
- Experienced recurring database challenges across multiple companies
- Returned to Coinbase to improve engineering skills
- Viewed early career roles as opportunities to learn and grow technically
Joining OpenAI
- Initially excited about GitHub Copilot's capabilities
- Mentions DALL-E as a product that helped explain their work to family
- Describes early API platform as small (around 5 people working on it)
- Notes significant company growth since initial days
ChatGPT Release Challenges
- Experienced scaling difficulties during the ChatGPT launch
JSON Mode and Structured Outputs Development
- JSON Mode was introduced at DevDay last year as a first step toward structured outputs
- Allows constraining model output to JSON format
- Addresses developers' need for reliable, format-constrained responses
- Motivated by consistent customer feedback over past year
- The team took a collaborative approach between engineering and research
- JSON Mode vs. Structured Outputs: JSON Mode only guarantees syntactically valid JSON, while structured outputs guarantees the response matches a supplied JSON Schema
- Function Calling vs. Structured Outputs: function calling is for invoking tools, while structured outputs constrains the format of the model's reply itself
- New Development: SDK improvements for function calling, including a "run tools" method that can automatically manage the conversation loop
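The distinction between the two response formats can be seen in the request payloads themselves. A minimal sketch (the event schema is an invented example; the response_format shapes follow the public Chat Completions API):

```python
# JSON Mode: guarantees syntactically valid JSON, but not any particular shape.
json_mode_request = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "List three fruits as JSON."}],
    "response_format": {"type": "json_object"},
}

# Structured Outputs: constrains the reply to a JSON Schema ("strict": True).
event_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
        "attendees": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "date", "attendees"],
    "additionalProperties": False,
}
structured_request = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "Extract the event details."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "event", "strict": True, "schema": event_schema},
    },
}
```

Note that strict mode requires every property to be listed in "required" and "additionalProperties" to be false, which is what lets the model's output be constrained to the schema exactly.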
Technical Implementation Details
- The API models team focuses on improving models based on developer feedback
- They collaborate closely with post-training and safety systems teams
- Currently supports a dialect of JSON Schema (a constrained subset)
- Future plans may include broader grammar support
- Developed a solution from scratch to meet specific needs
- Refusal Mechanism: safety refusals are surfaced in a dedicated "refusal" field rather than being forced into the requested schema
- Error Handling Considerations:
- API and Model Architecture:
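A minimal sketch of handling the refusal field when parsing a structured-outputs reply (the message dicts are mocks in the shape of a chat completion message; parse_structured_reply is an invented helper, not SDK code):

```python
import json

def parse_structured_reply(message):
    """Return (parsed, refusal); exactly one of the pair is non-None."""
    if message.get("refusal"):
        # The model declined; "content" is empty, so don't try to parse it.
        return None, message["refusal"]
    return json.loads(message["content"]), None

# Mock assistant messages as returned by the chat completions endpoint.
ok = {"role": "assistant", "content": '{"name": "demo"}', "refusal": None}
refused = {"role": "assistant", "content": None,
           "refusal": "I can't help with that."}
```

Checking the refusal field before JSON-parsing avoids treating a refusal as a schema violation or a parse error.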
Evaluation (Evals) Landscape
- Evals are considered the "hardest part of AI"
- Collaborated with BFCL (the Berkeley Function Calling Leaderboard)
- Current evals challenges:
- Recommended evals:
- SWE-bench targets code writing and file searching capabilities
- Low pass rates indicating room for improvement
Parallel Function Calling and Latency
- OpenAI supports parallel function calling in their API for newer models
- Currently not supported with structured outputs due to technical trade-offs
- Concerns about potential latency and complexity for developers
- The first request with a new schema incurs extra latency while the schema is processed
- Expected to improve over time
- Not currently a major concern for most developers during integration
- Exploring potential caching solutions
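When the model does return multiple tool calls in one turn, the application executes each and sends back one tool message per call. A sketch under assumptions (get_weather is an invented local tool; the tool_calls dicts are mocks in the shape the API returns):

```python
import json

def get_weather(city):
    # Hypothetical local tool; a real app would call a weather service.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(tool_calls):
    """Run every tool call from one assistant turn; build the tool replies."""
    replies = []
    for call in tool_calls:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],   # ties the reply to its call
            "content": fn(**args),
        })
    return replies

# Mock assistant turn containing two parallel tool calls.
calls = [
    {"id": "call_1",
     "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
    {"id": "call_2",
     "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}},
]
```

The SDK's "run tools" helper mentioned above automates exactly this execute-and-append loop.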
Agents and Structured Outputs
- Structured outputs and function calling seen as critical building blocks for agentic systems
- Goal is improving reliability from ~95% to near 100% accuracy
- Enables converting natural language intent into application actions
Assistants API
- Represents a bet on:
- Most developers still likely to use standard messages and completion endpoints
- The company has successful hosted tools, particularly a file search tool that saves time on building RAG pipelines
- They are iterating on making stateful tools more intuitive and easier to use
Structured Output and Model Capabilities
- Models have varying capabilities with inherent trade-offs
- New response format is currently available only for GPT-4o mini and the new GPT-4o models
- Function calling is enabled across all models that support it
- Key Use Cases for Structured Output:
- Technical Considerations:
- Design Principles:
- Additional Features:
Prompt and Instruction Strategies
- Multiple places to provide instructions: the system message, the user message, and the schema itself (property names and descriptions)
- Practical Insights:
- Evaluation and Development Approach:
- Practical Example:
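As an illustrative sketch of splitting instructions between the system message and the schema's own descriptions (the support-email scenario and field names are invented; the request shape follows the strict structured-outputs format):

```python
email_text = "Hi, my order #1234 arrived damaged. Please send a replacement."

request = {
    "model": "gpt-4o-2024-08-06",
    "messages": [
        # High-level behavior goes in the system message...
        {"role": "system",
         "content": "Extract structured data from support emails."},
        {"role": "user", "content": email_text},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "ticket",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    # ...while field-level guidance lives in descriptions.
                    "order_id": {
                        "type": "string",
                        "description": "Order number without the # prefix",
                    },
                    "issue": {"type": "string"},
                },
                "required": ["order_id", "issue"],
                "additionalProperties": False,
            },
        },
    },
}
```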
Advanced Features and Future Directions
- Discussed advanced features for generating structured outputs with AI models
- Emphasized saving on latency and cost by generating outputs "in one shot" rather than through multiple retries
- Exploring potential future features like custom grammars beyond JSON schema
Model Selection and Fine-Tuning
- Recommended model selection strategy:
- Announced general availability of GPT-4o fine-tuning
- Offering free training tokens (1 million per day until September 23rd)
- Fine-tuning can be effective with as few as 100 high-quality examples
- Additional Insights:
- Roadmap Considerations:
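A sketch of what a small fine-tuning dataset looks like: one chat-format example per line of a JSONL file (the ticket-classification task is invented; the file id is a placeholder, not a real id):

```python
import json

# One training example per line, in chat format (JSONL).
example = {
    "messages": [
        {"role": "system", "content": "You classify support tickets."},
        {"role": "user", "content": "My card was charged twice."},
        {"role": "assistant", "content": "billing"},
    ]
}
line = json.dumps(example)

# Request body to start the job once the JSONL file has been uploaded.
# "file-abc123" is a placeholder for the uploaded file's id.
job_request = {"model": "gpt-4o-2024-08-06", "training_file": "file-abc123"}
```

With the "100 high-quality examples" guidance above, a dataset in this format can be assembled by hand from real traffic rather than generated in bulk.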
Prompt Engineering and Model Performance
- Prompt engineering requires creativity and persistence
- Not everyone finds prompting equally easy; some are naturally skilled
- ChatGPT has helped people develop intuition about interacting with AI models
- Prompt engineering likely won't disappear, as clearly explaining requirements remains crucial
Model Releases and Distinctions
- OpenAI recently released two models: gpt-4o-2024-08-06 and chatgpt-4o-latest
- The models have different tuning focuses:
- API models have stable weights for developer reliability
- ChatGPT model is a "rolling" model with potentially changing weights
- Key Insights on Model Development:
OpenAI API and Strategy
- The API is viewed as the broadest vehicle for distributing AGI
- OpenAI values working with developers who often "see the future before anyone else"
- The API predates ChatGPT and was their first commercialization product
- They aim to expose all of OpenAI's models through the API, including multi-modal models
- Developer and Engineering Insights:
Assistants API Development
- Developed with a small team under tight deadlines
- Experienced technical challenges, including a brief outage just before a major launch event
- Made significant improvements, particularly in file search capabilities
API and Product Updates
- Increased file handling capacity: Now supports 10,000 files per assistant
- Enhanced semantic search capabilities across files
- Introducing more advanced chunking and re-ranking options
- Goal is to make RAG (Retrieval-Augmented Generation) easier to implement at scale
- Determinism and Technical Challenges:
- Logit Bias Feature:
- Tiering and Access:
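The file handling and chunking options above can be sketched as request payloads (field names follow the public Assistants API; the vector store id is a placeholder, and the 800/400 token values are the documented static-strategy defaults):

```python
# Assistant with the hosted file_search tool attached to a vector store.
assistant_request = {
    "model": "gpt-4o-mini",
    "tools": [{"type": "file_search"}],
    "tool_resources": {
        # "vs_..." stands in for a real vector store id.
        "file_search": {"vector_store_ids": ["vs_..."]},
    },
}

# Explicit chunking options when adding a file to a vector store.
vector_store_file_request = {
    "chunking_strategy": {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 800,
            "chunk_overlap_tokens": 400,
        },
    },
}
```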
Developer Ecosystem Strategy
- Applying 80/20 principle in product development
- Focus on building features that provide maximum value
- Prioritize developments based on developer feedback
- Aim to make complex processes more accessible
Batch API Features
- Offers 50% cost savings
- 24-hour turnaround time for batch jobs
- Works with GPT-4o mini
- Very cost-effective (around 7.5 cents per million input tokens)
- Potential use cases include:
- Potential future interest in exploring even cheaper/free GPU runtime
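Batch jobs take an input file with one request per line. A minimal sketch of building such a line (the custom_id and prompt are invented; the line shape follows the Batch API input format):

```python
import json

def batch_line(custom_id, prompt):
    """Build one line of a Batch API input file (JSONL)."""
    return json.dumps({
        "custom_id": custom_id,          # your key for matching results later
        "method": "POST",
        "url": "/v1/chat/completions",   # endpoint each request is routed to
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
    })
```

Writing many such lines to a file, uploading it, and creating a batch job is what yields the 50% discount with the 24-hour turnaround described above.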
Vision API Highlights
- Integrated across multiple OpenAI APIs
- Supports:
- Useful for complex data extraction involving spatial relationships
- Current limitation: primarily works with individual image frames
- Potential future exploration of continuous video streaming capabilities
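Individual frames are sent as image parts inside a chat message. A sketch assuming base64 data-URL input (image_message is an invented helper; the content-part shape follows the vision API):

```python
import base64

def image_message(question, image_bytes):
    """Build a chat message mixing text and one base64-encoded image frame."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }
```

For video today this means sampling frames and sending each (or a handful per request) through this path, which is the per-frame limitation noted above.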
Video and API Development
- Discussion about potential video analysis API with frame sampling
- Exploring batch processing capabilities for video frames
- Considering sequential processing options for video analysis
Whisper API Insights
- Whisper v3 has a diarization feature, but it is not yet implemented in the API
- Performance trade-offs between Whisper v2 and v3
- Diarization (speaker identification) is technically challenging
- Transcription quality is notably high, even with simple implementation
- Whisper Transcription Features:
- Future API Developments:
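Transcription responses in the verbose format include segment-level timestamps, which is what makes captioning straightforward even without diarization. A sketch on a mocked response (the response dict imitates the verbose_json shape; to_srt_times is an invented helper):

```python
# Mock of a verbose_json transcription response: segment-level timestamps
# alongside the full transcript text.
response = {
    "text": "Hello there. How are you?",
    "segments": [
        {"start": 0.0, "end": 1.2, "text": "Hello there."},
        {"start": 1.2, "end": 2.5, "text": "How are you?"},
    ],
}

def to_srt_times(segments):
    """Flatten segments into (start, end, text) tuples for a captions file."""
    return [(s["start"], s["end"], s["text"].strip()) for s in segments]
```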
OpenAI Enterprise Features
- Recently shipped admin and audit log APIs
- Improved SSO (Single Sign-On) offering
- Designed for enterprise users to manage API keys and projects programmatically
University of Waterloo Insights
- Co-op program provides significant practical experience
- Cold climate and limited entertainment encourage studying and project work
- Students typically graduate with two years of work experience
- Strong "hacker mentality" and entrepreneurial culture
Book Recommendations
- "The Making of Prince of Persia" - inspirational book about hard work
- "Misbehaving" by Richard Thaler - explores behavioral economics and irrational decision-making
OpenAI Hiring and Team Dynamics
- OpenAI is hiring across multiple teams and roles
- Ideal OpenAI employees are described as:
- They welcome people from diverse backgrounds
- No specific AI experience is required to join
- Hiring for engineering, research, and a "model behavior" role
o1 Model Release and Features
- Two models were released (o1-preview and o1-mini)
- Pricing is noted as being higher compared to previous models like Opus
- The model has accompanying documentation including blog posts, system cards, and technical resources
- API differences from previous models include:
- Reasoning tokens are a key feature, but can limit problem-solving complexity
- Performance and Evaluation:
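Reasoning tokens are billed as completion tokens but never appear in the visible answer, so the usage block is where the split shows up. A sketch on a mocked usage object (the numbers are invented; the field names follow the o1 usage shape):

```python
# Mock usage block in the shape returned for o1 models.
usage = {
    "prompt_tokens": 50,
    "completion_tokens": 800,  # includes hidden reasoning tokens
    "completion_tokens_details": {"reasoning_tokens": 650},
}

# Tokens that actually appear in the answer the caller sees.
visible = (usage["completion_tokens"]
           - usage["completion_tokens_details"]["reasoning_tokens"])
```

This is also why a tight completion-token budget can cut off problem solving: the reasoning tokens consume the budget before any visible answer is produced.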
OpenAI Model Strategy
- o1 is a new model focused on reasoning capabilities
- o1 is distinct from the GPT series, but not replacing it
- GPT-4o will continue to be supported and used
- Developers are expected to use both o1 and GPT-4o for different tasks
- Model Positioning:
- Future Development:
o1 Model Capabilities and Access
- Currently in preview stage without tool support
- Planned future features include:
- Multimodal capabilities built-in
- Uses reinforcement learning for reasoning performance
- Reasoning and Performance:
- Prompting and Usage: