
Latent Space: The AI Engineer Podcast

From API to AGI: Structured Outputs, OpenAI API platform and O1 Q&A — with Michelle Pokrass & OpenAI Devrel + Strawberry team

Overview

Content

Background and Career Journey

- A bank (working with Visual Basic)
- Google
- Coinbase (during the 2018–2020 crypto era)
  - Worked on ACH rails
  - Learned critical engineering skills
  - Got early production experience

Early Career Progression

- Improving notification speed (initially took 10 minutes)
- System reliability improvements
- Performance issues in Postgres
- Inefficient job-queuing infrastructure
- Database scaling challenges

Joining OpenAI

ChatGPT Release Challenges

- Postgres authorization system issues
- Complex GPU resource allocation decisions
- Linking developer and ChatGPT accounts

JSON Mode and Structured Outputs Development

- Ensuring output is always valid JSON
- Attempting to match specified schemas
- Engineering approach: constraining model outputs through token masking
- Modeling approach: training the model to better follow desired formats
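The token-masking idea above can be sketched in a few lines: at each decoding step, tokens that would violate the target grammar are masked out before sampling. This is a minimal illustration with a toy vocabulary and made-up scores, not OpenAI's implementation.

```python
import math

def mask_and_pick(logits, allowed):
    """Mask every token the grammar forbids at this step to -inf,
    renormalize with a softmax, and pick the highest-scoring survivor."""
    masked = {t: (s if t in allowed else float("-inf"))
              for t, s in logits.items()}
    z = sum(math.exp(s) for s in masked.values() if s != float("-inf"))
    probs = {t: (math.exp(s) / z if s != float("-inf") else 0.0)
             for t, s in masked.items()}
    return max(probs, key=probs.get), probs

# After emitting `{"name":`, only a string opener is grammatically valid,
# so even a token the model prefers (here `{`) gets zero probability.
logits = {'"': 1.2, '{': 2.5, '3': 0.7, 'true': 0.4}
token, probs = mask_and_pick(logits, allowed={'"'})
```

The key property is that masking happens before sampling, so invalid JSON can never be emitted regardless of what the raw model scores prefer.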

- Designed for developers who need precise, system-compatible function calls
- Works with tools like Pydantic or Zod objects
- Eliminates manual serialization complexity
- Allows easy parsing of model responses

- JSON mode is better for more creative, open-ended JSON generation
- Most developers are likely to prefer Structured Outputs

- Function calling: intended for actual tool/function invocation (e.g., querying databases, sending emails)
- Structured Outputs: focused on getting model responses in a specific format
- Previously, developers were "hacking" function calling to get desired response formats
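The distinction can be made concrete by comparing the two request shapes. The sketch below uses hypothetical names (`send_email`, `support_ticket`); the point is only that a tool definition describes an action to invoke, while a response format constrains the answer itself.

```python
# Function calling: the model decides to *invoke* an action.
send_email_tool = {
    "type": "function",
    "function": {
        "name": "send_email",                  # hypothetical tool name
        "description": "Send an email to a recipient.",
        "parameters": {
            "type": "object",
            "properties": {
                "to":      {"type": "string"},
                "subject": {"type": "string"},
                "body":    {"type": "string"},
            },
            "required": ["to", "subject", "body"],
            "additionalProperties": False,
        },
    },
}

# Structured Outputs: the model's *answer itself* must match a schema.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "support_ticket",              # hypothetical schema name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "category": {"type": "string"},
                "priority": {"type": "integer"},
            },
            "required": ["category", "priority"],
            "additionalProperties": False,
        },
    },
}
```

Before Structured Outputs shipped, developers who only wanted the second behavior had to define a dummy tool like the first and read its arguments back — the "hack" mentioned above.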

Technical Implementation Details

- Implemented a "refusal field" so models can refuse requests that violate policies
- Goals:
  - Preserve the model's ability to refuse inappropriate requests
  - Provide a clear developer experience
  - Make refusals easy to handle programmatically
- Chose a refusal field over traditional error codes because developers pay for the generated tokens
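In practice this means branching on a field of the parsed message rather than catching an HTTP error. A minimal sketch, using a plain dict in place of a real SDK response object:

```python
def handle_structured_response(message: dict) -> str:
    """Branch on the refusal field rather than an HTTP status code:
    a refusal is a normal, billed completion, not a transport failure."""
    if message.get("refusal"):
        # Surface the refusal to the end user; don't retry blindly.
        return f"Model declined: {message['refusal']}"
    return message["content"]

ok = handle_structured_response({"content": '{"step": 1}', "refusal": None})
declined = handle_structured_response(
    {"content": None, "refusal": "I can't help with that."}
)
```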

- Challenges in error handling for AI models, particularly around HTTP error codes
- Proposed new error-code ranges (a 600 series) specifically for AI model errors; potential codes include:
  - 601: auto-refusal
  - 602: ChatML format violations

- OpenAI transitioned from the completions endpoint to the chat completions API
- Chat completions use the ChatML format with defined message roles (user, assistant, system)
- Recent improvements in Structured Outputs have reduced certain model errors
- Models can now better constrain themselves to the ChatML format

Evaluation (Evals) Landscape

- Robust evaluation pipelines are difficult to create
- Many current evals are saturated
- Models can often achieve high performance with different prompting
- BFCL (Berkeley Function-Calling Leaderboard)
- SWE-bench (tests model performance on GitHub issues)

Parallel Function Calling and Latency

Agents and Structured Outputs

Assistants API

- Hosted tools (file search, code interpreter)
- Introducing statefulness to the API

Structured Output and Model Capabilities

- Extracting structured data from unstructured data
- Works with both text and vision inputs
- Dynamic UI generation
- Enterprise application improvements

- Supports recursive schemas for UI generation
- Provides reliability gains by ensuring strict type matching
- Uses a custom JSON Schema dialect with specific design choices:
  - Standard JSON Schema allows additional properties by default and requires explicit configuration for stricter enforcement
  - The dialect makes all keys required by default to match developer expectations
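The dialect's defaults can be expressed as a small transformation over an ordinary JSON Schema: forbid extra keys and mark every declared property required. This helper is a sketch of that tightening rule, not an official utility.

```python
def make_strict(schema: dict) -> dict:
    """Recursively tighten a JSON Schema the way the strict dialect
    described above does: forbid additional properties and require
    every declared key."""
    out = dict(schema)
    if out.get("type") == "object":
        props = out.get("properties", {})
        out["additionalProperties"] = False
        out["required"] = list(props)          # every key becomes required
        out["properties"] = {k: make_strict(v) for k, v in props.items()}
    elif out.get("type") == "array" and "items" in out:
        out["items"] = make_strict(out["items"])
    return out

loose = {"type": "object",
         "properties": {"name": {"type": "string"},
                        "age": {"type": "integer"}}}
strict = make_strict(loose)
```

This inversion of JSON Schema's permissive defaults is what lets the runtime guarantee that every generated object has exactly the declared shape.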

- Aim to be explicit
- Provide clear definitions
- Allow developers to choose the most suitable model
- Maintain compatibility with existing schema standards

- Developers can make keys nullable by using union types (e.g., integer and null)
- Allows specifying chain-of-thought fields before final answers
- Enables step-by-step rendering of model responses (e.g., in math-tutoring scenarios)
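Both ideas fit in one small schema. In this hypothetical math-tutor shape, `steps` is declared before `final_answer` so the model reasons before committing, and the answer is made nullable with a type union:

```python
import json

# Chain-of-thought field first, answer last (hypothetical tutor schema).
tutor_schema = {
    "type": "object",
    "properties": {
        "steps": {"type": "array", "items": {"type": "string"}},
        "final_answer": {"type": ["integer", "null"]},  # nullable via union
    },
    "required": ["steps", "final_answer"],
    "additionalProperties": False,
}

# A conforming reply can be rendered step by step as tokens stream in.
reply = json.loads('{"steps": ["2x = 6", "x = 3"], "final_answer": 3}')
unsolved = json.loads('{"steps": ["need more info"], "final_answer": null}')
```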

Prompt and Instruction Strategies

1. System message (good for function-call guidelines)
2. Function descriptions (best for explaining how to call a function)
3. Property names and descriptions can guide model behavior

- AI is early enough that developers are still discovering best practices
- Renaming properties to be more descriptive can potentially improve model performance
- No official benchmarks exist for very granular instruction techniques

- Focus on raising overall capabilities for users
- Recommend customers develop their own evaluations
- Goal is to make evaluation processes easier for developers

- One developer implemented Structured Outputs for AI News
- Reduced code by 20 lines
- Decreased API costs by approximately 55%

Advanced Features and Future Directions

Model Selection and Fine-Tuning

- Start with GPT-4o mini (cheapest; good for most use cases)
- Move to GPT-4o if more performance is needed
- Consider fine-tuning for advanced use cases
- Discussed challenges with rating systems and structured outputs
- Suggested using log probabilities for more accurate classification tasks
- Highlighted the importance of model calibration when generating structured responses
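The log-probability suggestion works like this: instead of parsing a sampled label out of generated text, exponentiate the returned logprobs of the candidate label tokens and normalize them into a distribution. A minimal sketch with made-up logprob values:

```python
import math

def class_probabilities(label_logprobs: dict) -> dict:
    """Turn log-probabilities of candidate label tokens into a
    normalized distribution; more calibrated than sampling a label."""
    weights = {label: math.exp(lp) for label, lp in label_logprobs.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Hypothetical logprobs for the first token of each sentiment label.
probs = class_probabilities({"pos": -0.2, "neg": -1.8})
best = max(probs, key=probs.get)
```

Beyond picking `best`, the normalized probabilities give a confidence score you can threshold on, which a sampled text label cannot.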

- Exploring parallel function calling
- Interested in developer feedback on potential new features
- Aiming to make model migration easier for users

Prompt Engineering and Model Performance

Model Releases and Distinctions

- One is function-calling-tuned (for the API)
- One is chat-tuned (for the ChatGPT interface)
- OpenAI aims to be transparent about model capabilities
- They're still learning how to communicate model changes effectively
- The goal is to give developers flexibility in model selection
- Future release notes will aim to be more comprehensive about model improvements

OpenAI API and Strategy

- Engineering remains the primary way to access AI models
- There's significant potential ("alpha") in writing code and deploying AI solutions
- "AI engineering" is emerging as a distinct discipline

Assistants API Development

API and Product Updates

- The seed parameter is not fully deterministic
- More determinism in initial tokens
- Challenges in balancing determinism with system reliability

- Primarily used for classification tasks
- Can help refine output by biasing toward specific tokens
- Considered a power-user feature with limited widespread adoption
- Potential use cases include guiding classification outputs and controlling punctuation
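Mechanically, logit bias is an additive nudge applied to the raw scores before the softmax. This toy sketch (made-up tokens and scores, not the real tokenizer) shows how a positive bias flips which token wins:

```python
import math

def softmax(logits: dict) -> dict:
    z = sum(math.exp(v) for v in logits.values())
    return {k: math.exp(v) / z for k, v in logits.items()}

def apply_logit_bias(logits: dict, bias: dict) -> dict:
    """Add a per-token bias to the raw logits before the softmax,
    the way the API's logit_bias parameter (roughly -100..100) does."""
    return {k: v + bias.get(k, 0.0) for k, v in logits.items()}

logits = {"yes": 1.0, "no": 1.5, ".": 0.2}
unbiased = softmax(logits)                       # "no" wins
biased = softmax(apply_logit_bias(logits, {"yes": 5.0}))
```

In the real API the keys are token IDs rather than strings, and a bias near +100 or -100 effectively forces or bans a token — which is how people steer classification outputs or suppress punctuation.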

- More transparent tier system
- Developers can now see their current tier in the dashboard
- Tier two and above have full access to fine-tuning
- Rollouts and feature access are tied to tier levels

Developer Ecosystem Strategy

Batch API Features

- User activation workflows
- Offline evaluations

Vision API Highlights

- Assistants API
- Batch API
- Structured Outputs

Video and API Development

Whisper API Insights

- Supports translation in ~50 languages
- Prompt feature allows vocabulary biasing, helpful for acronyms and specialized terms
- Developers use workarounds like dictionary replacement for transcription accuracy
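The dictionary-replacement workaround is just a post-processing pass over the transcript: map each common mis-hearing of an acronym or jargon term to its intended spelling. A minimal sketch (the mis-hearings here are hypothetical examples):

```python
import re

def fix_vocabulary(transcript: str, replacements: dict) -> str:
    """Swap known mis-hearings of acronyms/jargon for the intended
    spelling, matching on word boundaries, case-insensitively."""
    for wrong, right in replacements.items():
        transcript = re.sub(rf"\b{re.escape(wrong)}\b", right,
                            transcript, flags=re.IGNORECASE)
    return transcript

fixed = fix_vocabulary(
    "we called the open a i whisper a p i",
    {"open a i": "OpenAI", "a p i": "API"},   # hypothetical mis-hearings
)
```

Passing the expected vocabulary through the prompt parameter biases the model up front; a replacement table like this catches whatever still slips through.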

- Exploring new API shapes for advanced voice / speech-to-speech modes
- Likely to use a socket-based approach (like LiveKit) instead of traditional request-response
- Goal is to make new API paradigms easy for developers to adopt

OpenAI Enterprise Features

University of Waterloo Insights

Book Recommendations

OpenAI Hiring and Team Dynamics

- Low ego
- User-focused
- Driven
- Willing to "roll up their sleeves"
- Unpretentious

o1 Model Release and Features

- No system role
- No temperature setting
- No tool calling
- No streaming
- Limited visibility into token usage
- Introduces new evaluation benchmarks
- Highlights scaling laws for both training and test-time compute
- o1-mini is notably interesting for its relative outperformance compared to larger models
- Trained specifically for certain domains (e.g., o1-mini performs better in STEM)

OpenAI Model Strategy

- GPT-4o remains the "workhorse" for standard tasks like summarization
- o1 is designed for more complex, reasoning-intensive problems
- The goal is for developers to use both models in complementary ways

- OpenAI is working on improvements to speed up reasoning processes
- The API is in early stages, with more features to be added over time
- Focus on optimizing context and processing speed
- Committed to continuous learning and iteration

o1 Model Capabilities and Access

- Function calling
- Code interpreter
- Web browsing
- Streaming capabilities
- Generates hidden chains of thought during reasoning
- Strong performance in lateral tasks and philosophical reasoning
- Can creatively solve complex problems
- Demonstrates ability to generalize and handle challenging tasks

