Latent Space: The AI Engineer Podcast

Why you should write your own LLM benchmarks — with Nicholas Carlini, Google DeepMind

Overview

* Nicholas Carlini, a DeepMind research scientist, advocates for a balanced perspective on AI capabilities - rejecting both overhyped promises and dismissive skepticism while emphasizing AI's practical utility for specific, well-defined tasks that would otherwise consume significant time.

* AI models provide distinct benefits for different users: experts can offload routine tasks while maintaining focus on complex problems, and non-experts gain access to previously inaccessible technical capabilities - though all users should verify outputs rather than blindly trust AI-generated content.

* Despite not enjoying writing itself, Carlini communicates technical ideas through time-boxed, minimally-edited publications that leverage his security expertise to provide nuanced perspectives, deliberately avoiding social media promotion to maintain productivity and topical freedom.

* As a security researcher, Carlini systematically identifies vulnerabilities in machine learning systems through an "attack first" approach, following responsible disclosure protocols while exploring issues like model stealing, training data extraction, and dataset vulnerabilities.

* Beyond professional research, Carlini maintains creativity through playful technical explorations like building CPUs in Conway's Game of Life, investigating paper-based data storage, and participating in code obfuscation contests - demonstrating his belief that following genuine interests produces higher quality work.

Content

Introduction and Background

* The podcast is Latent Space, hosted by Alessio and swyx, featuring guest Nicholas Carlini, a research scientist at DeepMind.
* Carlini has a PhD and BA from Berkeley and works at the intersection of machine learning and computer security.
* Despite not enjoying the act of writing itself, Carlini writes to share interesting work and projects.
* He took remedial writing classes due to initially poor writing skills but is motivated by communicating interesting ideas rather than by writing itself.

Personal Approach and Interests

* Carlini focuses on high-quality security research while also enjoying fun, sometimes purposeless side projects.
* He's fascinated by Turing completeness and by Conway's Game of Life as a cellular automaton.
* He's intrigued by the theoretical possibility of running any program within Game of Life and has worked on building complex circuits in the game.
* He enjoys exploring technical "rabbit holes" like creating a CPU in Game of Life, maintaining joy and creativity in technical work.
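The Turing completeness Carlini finds fascinating rests on remarkably simple local rules. As a minimal illustration (the function and variable names here are our own, not from the episode), one Game of Life update step on an unbounded grid fits in a few lines of Python:

```python
from itertools import product

def life_step(live):
    """One generation of Conway's Game of Life.

    `live` is a set of (x, y) coordinates of live cells; the grid is
    unbounded, so only cells near live ones need to be considered.
    """
    # Count live neighbors for every cell adjacent to a live cell.
    counts = {}
    for (x, y) in live:
        for dx, dy in product((-1, 0, 1), repeat=2):
            if (dx, dy) != (0, 0):
                cell = (x + dx, y + dy)
                counts[cell] = counts.get(cell, 0) + 1
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    return {c for c, n in counts.items()
            if n == 3 or (n == 2 and c in live)}

# A "blinker" oscillates between horizontal and vertical with period 2.
blinker = {(0, 0), (1, 0), (2, 0)}
assert life_step(life_step(blinker)) == blinker
```

Building a CPU in the game amounts to arranging stable patterns so that these same rules implement wires, gates, and memory.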

Blog Post on AI/LLMs

* Carlini wrote a blog post about using AI/LLMs to provide a balanced, reality-based perspective on AI capabilities.
* He was frustrated with polarized, ideologically driven views and aimed to present a nuanced one: AI has problems but can also be useful.
* He leveraged his credibility as a security researcher who studies AI's limitations.
* The post was written quickly (less than 10 hours) in one pass, with minimal editing.
* His goal was to counter extreme narratives about AI, both overly positive and overly negative.

Writing and Publishing Approach

* Carlini tends to create content, publish it, and rarely edit afterward.
* He treats published work as a fixed artifact, similar to the academic research approach.
* He uses time-boxing to limit effort on writing tasks.

Practical AI Applications

* Carlini used GPT-4 to help build a simple web application, finding AI models useful for generating boilerplate code and getting started with unfamiliar technologies.
* He used AI to learn Docker, a technology he was conceptually familiar with but hadn't practically used.
* His AI assistance workflow involves copying and pasting error messages to debug code, acknowledging outputs aren't perfect but still valuable.

Benefits of Using AI Models

* Carlini emphasizes using AI for specific, small-scale tasks without getting bogged down in unnecessary details.
* Key benefits he highlights include:
  * Quickly solving specific problems
  * Lowering activation energy for creating software
  * Usefulness for one-off, ephemeral projects
  * Saving time on tasks that would otherwise take hours
* He takes a focused approach, asking precise questions and being willing to abandon tasks if the model can't help.

AI for Technical Problem-Solving

* Carlini discusses practical uses of AI models like Claude for:
  * Helping solve specific technical problems (e.g., irrigation system programming)
  * Debugging complex technical errors
  * Assisting with research and development tasks
* He notes that most research/innovation involves 90% routine tasks that can be automated.
* Models are particularly useful for solving the "uninteresting parts" of problems.
* Contrary to skeptics' claims, models can help with novel research by handling repetitive groundwork.

AI Benefits for Different User Types

* For experts:
  * Can quickly get help with known algorithms or helper functions
  * Allows maintaining mental state while solving complex problems
  * Experts can verify and check the model's output
* For non-experts:
  * Can generate solutions for tasks they couldn't previously do
  * Useful in contexts like spreadsheet operations
  * Enables access to more complex problem-solving tools

Cautions and Concerns

* Carlini warns against blindly trusting AI models.
* He highlights risks of deploying models in inappropriate or adversarial situations.
* He references research showing AI-generated code can be less secure.
* He emphasizes the importance of rigorous testing and verification.

Evolution of Personal AI Perspective

* Carlini initially viewed AI models like GPT-2 as "toys" and was skeptical of instruction tuning and RLHF techniques.
* He acknowledges being wrong about early AI capabilities and wants others to be open to reconsidering past opinions.
* He advocates for a nuanced approach to AI adoption, critically evaluating applications rather than dismissing or blindly embracing them.

Specific AI Use Cases

* Google Sheets integration with Gemini for automatic formula writing
* Using AI to understand technologies and APIs when the underlying concepts are already known
* Navigating complex reference materials (e.g., FFmpeg command-line arguments)
* Security work: quickly skimming code and identifying important sections
* Generating code explanations and understanding technical concepts
* Decompiling binaries, where the model can translate disassembled code into readable Python

Future AI Perspectives

* Carlini advocates for a nuanced view of AI's potential, rejecting extreme positions.
* He suggests having "wide margins of error" when predicting AI capabilities.
* He believes models will likely improve, but warns against absolute certainty in either direction.
* He plans to write about language models' near-term future to provide a balanced perspective.

Online Presence and Content Sharing

* Uses an RSS feed and email list for content distribution.
* Deliberately avoids social media to maintain productivity.
* Posts content without specific promotional goals, prioritizing research work.
* Not concerned with view counts or building a personal brand.
* Wants to maintain freedom to write about diverse, unrelated topics.

Technical Exploration Projects

* Investigating data storage on paper (potentially around 1-2 megabytes per page)
* Exploring data recovery techniques with high accuracy (95-99.7%)
* Preparing a submission for the International Obfuscated C Code Contest
* Working on a gate-level CPU emulation project
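The 1-2 MB per page figure is plausible from first principles. A back-of-the-envelope calculation, where every parameter (printer resolution, cell size, error-correction overhead) is our own illustrative assumption rather than a figure from the episode:

```python
# Rough capacity of one US Letter page used as data storage.
# All parameters below are illustrative assumptions.
dpi = 1200                        # assumed printer resolution (dots/inch)
width_in, height_in = 8.0, 10.5   # printable area after margins (inches)

raw_bits = (dpi * width_in) * (dpi * height_in)  # one bit per dot
raw_mb = raw_bits / 8 / 1e6

# Scanners can't resolve single dots reliably, so suppose each data
# cell spans 2x2 dots, and half the payload goes to error correction.
usable_mb = raw_mb / 4 / 2
print(f"raw: {raw_mb:.1f} MB, usable: {usable_mb:.2f} MB")
```

Under these assumptions the usable capacity lands at roughly 1.9 MB, consistent with the 1-2 MB range mentioned; more conservative cell sizes or heavier redundancy would pull it lower.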

AI Benchmarking and Evaluation

* Discusses the importance of creating domain-specific benchmarks for evaluating AI models.
* Current AI model evaluations often use benchmarks that:
  * Are not directly relevant to specific user needs
  * Measure progress abstractly rather than practical utility
  * Can be easily "gamed" by training models to perform well on specific datasets
* Proposes developing a domain-specific language for creating personalized benchmarks.
* Notes that benchmarks often use multiple "shots" (example prompts) that aren't representative of typical use.
* Highlights that single-turn evaluations don't capture the interactive nature of modern AI chat systems.
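One way such a domain-specific language for personal benchmarks might look: small stages (run the model, execute its output, check the result) chained into a single test case. Every name below is a hypothetical sketch, not Carlini's actual benchmark library, and the model call is stubbed out:

```python
# Minimal sketch of a personal-benchmark DSL: compose pipeline
# stages with `>>`. All names here are hypothetical.

class Stage:
    def __init__(self, fn):
        self.fn = fn

    def __rshift__(self, other):          # `a >> b` composes stages
        return Stage(lambda x: other.fn(self.fn(x)))

    def __call__(self, x):
        return self.fn(x)

def llm_stub(prompt):
    # Stand-in for a real model call; returns canned code.
    return "def add(a, b):\n    return a + b"

def run_python(code):
    # Execute generated code and hand back its namespace.
    ns = {}
    exec(code, ns)
    return ns

LLMRun    = Stage(llm_stub)
PythonRun = Stage(run_python)
CheckAdd  = Stage(lambda ns: ns["add"](2, 3) == 5)

test_case = LLMRun >> PythonRun >> CheckAdd
print(test_case("Write a Python function add(a, b)."))  # True
```

The appeal of this shape is that each user writes test cases against their own real tasks, so the benchmark measures practical utility rather than an abstract dataset score.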

Prompt Engineering Perspectives

* Skeptical of spending excessive time crafting perfect prompts
* Prefers dumping in thoughts in their current form
* Believes models should adapt to user input, not vice versa
* Views "you're using it wrong" as an unhelpful response
* Speculates that models might develop self-prompting capabilities
* Notes recent models can now intrinsically break down complex tasks

Security Research in Machine Learning

* Takes an experimental approach where he's satisfied with a model getting "90% of the way there"
* Proposed using "adversarial examples" as canaries to detect model overtraining
* Discussed research fields like "membership inference" and "dataset inference"
* Highlighted vulnerabilities in datasets like the LAION-400M image dataset
* Advocates for proactively studying potential security issues in emerging technologies
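To make "membership inference" concrete: one classic baseline from the literature (not necessarily the method discussed in the episode) guesses that an example was in the training set when the model's loss on it is below a calibrated threshold, since models fit training points more tightly than unseen ones. A toy sketch with illustrative numbers:

```python
import math

# Loss-threshold membership inference, the simplest baseline attack.
# The confidences and threshold below are toy numbers; a real attack
# calibrates the threshold using shadow models or held-out data.

def nll(prob_of_true_label):
    """Negative log-likelihood the model assigns to the true label."""
    return -math.log(prob_of_true_label)

def is_member(prob_of_true_label, threshold):
    """Guess 'was in the training set' when the loss is unusually low."""
    return nll(prob_of_true_label) < threshold

train_conf, test_conf = 0.99, 0.60   # model confidence on each example
threshold = 0.3

print(is_member(train_conf, threshold))  # True  (loss ~ 0.01)
print(is_member(test_conf, threshold))   # False (loss ~ 0.51)
```

The gap between member and non-member losses is exactly the kind of signal "dataset inference" work aggregates over many examples.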

Model Stealing Research

* Systematically evaluates potential security vulnerabilities in machine learning models
* Explores techniques to replicate expensive machine learning models by querying them
* Conducted research on OpenAI models with explicit legal permission
* Followed responsible disclosure protocols:
  * Notified vulnerable parties
  * Gave a 90-day window to patch vulnerabilities
  * Confirmed all identified vulnerabilities were fixed
* Found that the primary fix for preventing model stealing was to stop showing logprobs when a logit bias is supplied
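Why does combining logprobs with logit bias leak information? A toy sketch of the core trick (the fake API and constants here are illustrative, not OpenAI's interface): bias a chosen token hard enough that it ranks first, then the gap between its reported logprob and a fixed reference token's, minus the bias, reveals the hidden logit difference.

```python
import math

# Toy demonstration of the logprob + logit-bias leak. The "API" below
# hides a logit vector and only reports the top-2 logprobs, like a
# restricted inference endpoint; the attacker recovers every logit
# relative to a reference token.

HIDDEN_LOGITS = [2.0, -1.0, 0.5, -3.0]   # the secret
BIG = 100.0                              # bias large enough to win argmax

def api_top2_logprobs(logit_bias):
    """Fake API: logprobs of the two highest-scoring tokens after biasing."""
    z = [l + logit_bias.get(i, 0.0) for i, l in enumerate(HIDDEN_LOGITS)]
    lse = math.log(sum(math.exp(v) for v in z))
    order = sorted(range(len(z)), key=lambda i: -z[i])
    return {i: z[i] - lse for i in order[:2]}

ref = 0  # token 0 is the natural argmax; use it as the reference
recovered = {ref: 0.0}
for tok in range(1, len(HIDDEN_LOGITS)):
    lps = api_top2_logprobs({tok: BIG})
    # (l_tok + BIG - lse) - (l_ref - lse) - BIG  ==  l_tok - l_ref
    recovered[tok] = lps[tok] - lps[ref] - BIG

print(recovered)  # logit differences relative to token 0
```

Refusing to return logprobs whenever a logit bias is supplied, as the summary notes, breaks exactly this subtraction.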

Memorization and Training Data

* While models can't memorize all training data, they can memorize some portions
* Evidence includes finding verbatim 50-word (and sometimes hundreds-of-words) matches between model outputs and existing internet documents
* Researchers have shown methods to potentially recover training data from production models
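The verbatim-match evidence can be checked mechanically: slide a fixed-length word window over model output and ask whether it appears verbatim in a reference corpus. (In the actual research the unit is typically tokens and the corpus is indexed efficiently, e.g. with a suffix array; this linear scan is a simplification.)

```python
# Naive verbatim-memorization check: report every 50-word window of
# the model output that also appears verbatim in the corpus.

def verbatim_matches(output, corpus, window=50):
    words = output.split()
    hits = []
    for i in range(len(words) - window + 1):
        span = " ".join(words[i:i + window])
        if span in corpus:          # linear scan; fine for small corpora
            hits.append(span)
    return hits

corpus = "the quick brown fox " * 40            # toy "training data"
generated = "model says: " + "the quick brown fox " * 15
print(len(verbatim_matches(generated, corpus)) > 0)  # True
```

A 50-word verbatim match is strong evidence of memorization because independent generation of such a long exact sequence is astronomically unlikely.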

Personal Work Philosophy

* Carlini explains his motivation for attacking systems rather than building defenses:
  * He finds attacking systems interesting and fun
  * Attacking helps understand what is truly secure
* Believes doing work you're passionate about leads to higher-quality outcomes
* Argues that doing meaningful work requires genuine interest and motivation
* Suggests that forcing yourself into work that doesn't excite you results in lower-quality output
* Believes people should focus on tasks they find intrinsically motivating

