Overview
- The AI industry has seen dramatic efficiency improvements in both cost and performance, with GPT-3 level intelligence dropping from $60 to $0.27 per million tokens (2020-2023) through advances in hardware, quantization, pruning, and model distillation.
- Optimization strategies must balance latency vs. throughput considerations, with techniques like batch processing, quantization, and pruning yielding different benefits depending on specific use cases and constraints.
- The development of synthetic data generation has become crucial as many AI applications hit "data walls," with game engines providing valuable physics-informed training environments that may become increasingly important for LLMs.
- Current 3D AI character development is advancing toward fully interactive experiences with conversational NPCs featuring emotional responsiveness, contextual awareness, and multimodal perception capabilities for applications beyond gaming.
- The future of AI may involve more efficient training approaches that prioritize data quality over quantity, with model distillation emerging as a promising technique to transfer capabilities from larger to smaller models while maintaining performance.
Content
Introduction and Background
- Podcast is Latent Space, hosted by Alessio and swyx
- Guest is Nyla Worker, currently at Google AI, previously at NVIDIA and Convai
- Background spans astrophysics research, machine learning, and AI efficiency
Career Journey
- Started in astrophysics manually categorizing astronomical images
- Discovered machine learning's potential through a 1996 paper using neural networks for astronomical classification
- Transitioned from manual image classification to exploring machine learning's broader applications
- Moved from CPU to GPU-based research, working on computer vision and edge devices
- Career focused on AI model training and inference optimization, particularly at eBay
- Later joined NVIDIA's solutions architect program, supporting AI customers across a range of sectors
- Recently worked at NVIDIA on 3D content creation acceleration
- At Convai at the time of recording, focusing on embodied conversational 3D characters
AI Efficiency Trends
- Dramatic price drops in AI intelligence: GPT-3-level capability fell from $60 to $0.27 per million tokens
- Parallels with computer vision, which saw ~3,000x throughput improvement in six years
- Efficiency improvements driven by advances in hardware, quantization, pruning, and model distillation
Inference Optimization Insights
- Inference is critical and should be a primary optimization focus
- It's not just about raw speed, but about meeting human-perceived latency requirements
- Optimization involves techniques like kernel fusion and model quantization
- Cited a specific optimization example from her time at eBay
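The latency-vs-throughput tension described above can be illustrated with a toy cost model; every timing constant here is hypothetical (not eBay's numbers), but the shape of the tradeoff is the point:

```python
# Hypothetical illustration of the latency/throughput tradeoff:
# larger batches improve accelerator utilization (tokens/sec),
# but each request waits longer per decode step, so
# human-perceived latency grows.

def serving_stats(batch_size, step_ms_base=20.0, step_ms_per_item=2.0):
    """Toy cost model: one decode step costs a fixed overhead plus a
    small per-sequence cost (illustrative numbers, not measured)."""
    step_ms = step_ms_base + step_ms_per_item * batch_size
    throughput = batch_size * 1000.0 / step_ms   # tokens/sec across the batch
    latency_per_token = step_ms                  # ms until each request's next token
    return throughput, latency_per_token

for bs in (1, 8, 32, 128):
    tput, lat = serving_stats(bs)
    print(f"batch={bs:4d}  tokens/s={tput:8.1f}  ms/token={lat:6.1f}")
```

Batching wins on throughput but loses on responsiveness, which is why the "right" configuration depends on the use case.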
Hardware Evolution and Strategy
- Dramatic performance improvements over time
- V100 (~130 teraflops) versus the newer GB200 (~20,000 teraflops)
- Hardware capabilities have significantly expanded
- Optimization strategies must be weighed against rapidly evolving hardware capabilities
- Observed significant learning gaps between hardware, platform, and AI research teams
Performance Optimization Techniques
- Batch size increases can lead to significant efficiency gains
- Dynamic/continuous batching provides performance improvements
- Quantization techniques evolved over time (FP16, Bfloat16, quantization-aware training)
- Pruning networks was effective in computer vision, less so currently for LLMs
- The right technique depends on the specific use case and its latency/throughput constraints
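The magnitude pruning mentioned above (effective in computer vision) can be sketched in a few lines; the 50% sparsity target and the toy weight matrix are illustrative, not from the episode, and real pipelines typically prune structurally and fine-tune afterward:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude
    `sparsity` fraction of entries zeroed out."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))       # stand-in for a layer's weights
p = magnitude_prune(w, 0.5)
print(f"sparsity after pruning: {np.mean(p == 0):.2f}")
```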
Model Quantization Insights
- Quantization reduces precision by storing information in fewer bits
- Vision models may preserve principal feature components more robustly
- Language models might be more sensitive to precision loss due to complex word interactions
- Smaller models potentially more impacted by quantization than larger models
- Discussion of extreme quantization techniques, including ternary models and 1.58-bit models
- Hypothesis that for large models, directional information (yes/no) might matter more than precise numerical weights
- Analogy drawn to physics constants, where directionality is key
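A minimal sketch of the ideas above, assuming simple per-tensor absmax scaling for int8 and a sign-based ternary scheme; production schemes (per-channel scales, quantization-aware training) are considerably more involved:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization via absmax scaling.
    Assumes x is not all zeros (scale would be 0)."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

def ternary(x: np.ndarray, threshold: float) -> np.ndarray:
    """Keep only directional information (sign), zeroing small weights --
    the intuition behind ternary / 1.58-bit models mentioned above."""
    return np.sign(x) * (np.abs(x) > threshold)

rng = np.random.default_rng(1)
w = rng.normal(size=1000).astype(np.float32)
q, s = quantize_int8(w)
err = np.max(np.abs(dequantize(q, s) - w))
print(f"max abs round-trip error: {err:.5f} (half a quantization step = {0.5 * s:.5f})")
```

The round-trip error is bounded by half a quantization step, which is why models whose behavior depends on fine-grained weight values suffer more than those that mostly need the "direction" of each weight.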
Synthetic Data Development
- Identified critical data challenges across industries
- Developed synthetic data solutions for specific customer use cases
- Collaborated with researchers like Jonathan Tremblay
- Coined the concept of "hitting a data wall" in AI development
- Predicts similar data limitations will emerge for Large Language Models (LLMs)
- Generating synthetic data requires specialized skills, considered an "art"
- In 3D environments, synthetic data generation is still relatively limited
- Game engines valuable for creating temporally coherent, physics-informed synthetic data
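Domain randomization is one common recipe behind the physics-informed synthetic data described above: render the same scene under randomized lighting, pose, and textures so models trained on it transfer to real images. This sketch only samples scene parameters (the rendering step is left to the engine), and every range is invented for illustration:

```python
import random

def sample_scene_params(rng: random.Random) -> dict:
    """Draw one randomized scene configuration; all ranges are
    illustrative placeholders, not values from any real pipeline."""
    return {
        "light_intensity": rng.uniform(0.2, 2.0),
        "light_azimuth_deg": rng.uniform(0.0, 360.0),
        "camera_distance_m": rng.uniform(0.5, 3.0),
        "object_yaw_deg": rng.uniform(0.0, 360.0),
        "texture_id": rng.randrange(100),  # index into a texture bank
    }

rng = random.Random(42)
dataset_params = [sample_scene_params(rng) for _ in range(5)]
for p in dataset_params:
    print(p)  # each dict would drive one rendered training frame
```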
3D Content Creation and AI Models
- Recent AI models are augmenting 3D content creation processes
- Current 3D generation technologies are still imperfect, often producing flawed outputs
- Ongoing research focuses on improving asset topology and generation quality
- Anticipates convergence of video and 3D generation technologies
- Envisions future interactive experiences built on converged video and 3D generation
Training and Model Development
- Current large language model training is relatively "brute force" with massive data ingestion
- Recognition that not all training data is equally valuable for specific use cases
- Computational efficiency has been a key driver in model architecture choices
- Potential for more efficient training by identifying truly valuable data
- Model distillation as a promising approach to reduce computational requirements
- Different models/approaches for different tasks (e.g., Databricks assistant using model collage)
- Multiple types of distillation are emerging
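Classic logit distillation (temperature-softened KL divergence between teacher and student outputs) is one concrete form of the distillation discussed above; the logits below are made up for illustration, and the conventional T² loss scaling is omitted for brevity:

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    z = np.asarray(z, dtype=np.float64) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions: the student is
    rewarded for matching the teacher's full output distribution, not
    just its top answer."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, 0.5]        # hypothetical teacher logits
good_student = [3.8, 1.1, 0.4]   # close to the teacher's distribution
bad_student = [0.5, 4.0, 1.0]    # ranks the classes differently
print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, bad_student))
```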
Benchmarks and Data Quality
- Concerns about benchmark gaming in AI, particularly in computer vision
- Researchers sometimes submit papers with checkpoints that are not reproducible
- Close benchmark numbers often considered unreliable due to potential manipulation
- FineWeb dataset from HuggingFace demonstrates potential for improving data quality using LLMs
- Initial results suggest training with fewer, higher-quality tokens can achieve similar or better model performance
- Draws parallel to education: quality of information matters more than quantity
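The FineWeb-Edu-style pipeline (score each document with an LLM, or with a small classifier distilled from LLM judgments, and keep only high scorers) can be sketched as below; `score_quality` here is a crude stand-in heuristic, not the actual FineWeb scorer:

```python
def score_quality(doc: str) -> float:
    """Placeholder for an LLM/classifier quality score in [0, 5].
    This toy version just rewards explanatory language."""
    informative = sum(doc.lower().count(w) for w in ("because", "therefore", "example"))
    return min(5.0, 1.0 + informative)

def filter_corpus(docs, threshold=3.0):
    """Keep only documents whose quality score clears the threshold."""
    return [d for d in docs if score_quality(d) >= threshold]

docs = [
    "buy now!!! click here click here",
    "Gravity accelerates objects because force is proportional to mass. For example...",
]
kept = filter_corpus(docs)
print(f"kept {len(kept)} of {len(docs)} documents")
```

The same training budget spent on the surviving tokens goes further, which is the "quality over quantity" point the FineWeb results support.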
AGI Perspectives
- AGI challenges include optimizing for "everything" without a specific problem domain
- Feedback loops are crucial for AI development (e.g., coding environments provide clear feedback)
- Robotics and reinforcement learning show promise, but LLMs are still approximating available knowledge
- Views text (especially structured sources like textbooks) as inherently labeled data
- Sees current LLM approach as a good approximation of human intelligence, but not necessarily achieving true AGI
- Defines potential AGI as self-improving and significantly surpassing human capabilities
- Skeptical about current LLM approaches achieving true AGI
Convai and Conversational 3D AI Characters
- Speaker discusses her work at Convai, creating conversational 3D AI characters
- Key technical capabilities: emotional responsiveness, contextual awareness, and multimodal perception
- Use cases span gaming as well as simulation and training beyond it
- Remaining technological gaps and challenges, particularly around latency
Technology and Interaction Advancements
- Emerging technologies are creating more realistic AI character interactions
- Observed holographic displays at Computex with screens embedded in transparent glass
- Latency identified as the most critical optimization factor for natural interactions
- Goal is to create AI interactions that feel seamless and responsive
- Emotional tone detection and appropriate character reactions are crucial
- An NVIDIA/Convai demo showcased these AI character capabilities
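Since latency is called out as the most critical factor for natural interaction, a back-of-the-envelope budget for a voice-to-character pipeline helps frame it; every stage timing and the 800 ms ceiling below are assumptions, not Convai's measurements:

```python
# Illustrative latency budget for a conversational 3D character:
# speech in -> text -> LLM -> speech out -> animated response.
# All numbers are guesses for illustration.

PIPELINE_MS = {
    "speech_to_text": 150,
    "llm_first_token": 250,
    "text_to_speech_start": 100,
    "animation_sync": 50,
}

def time_to_first_response_ms(stages=PIPELINE_MS):
    """Sum stage latencies; the simplest pipeline runs them sequentially.
    Real systems stream and overlap stages to cut perceived latency."""
    return sum(stages.values())

BUDGET_MS = 800  # assumed ceiling for an interaction that "feels responsive"
total = time_to_first_response_ms()
print(f"{total} ms to first response ({'within' if total <= BUDGET_MS else 'over'} {BUDGET_MS} ms budget)")
```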
NPCs Beyond Gaming
- Discussion on using NPCs beyond video games for simulation and training
- Potential enterprise use cases include simulation and training scenarios
- Gaming industry noted as somewhat conservative about adopting new mechanics
- Indie developers more experimental with AI-driven game experiences
- Speaker created an entirely AI-generated podcast as an early experiment
- While the video game market alone is limited, commercial applications for AI NPCs seem promising
Future Possibilities
- Potential for AI to expand and repurpose intellectual property across different media formats
- Excitement about AI's ability to extend the lifespan of existing games through modding and AI characters
- Anticipation of legal challenges surrounding AI and IP in the coming years
- Creating interactive experiences with virtual characters, including representations of real and historical figures
- Potential for "sanctioned" AI models approved by the original person/entity
- Challenges of accurately representing historical figures and their contexts
Contact Information
- Nyla is open to connections from listeners interested in these topics; contact details are shared in the episode