Latent Space: The AI Engineer Podcast

SF Compute: Commoditizing Compute

CoreWeave's Business Model and GPU Cloud Computing Economics

- CPU cloud model: Commodity hardware, high software margins, on-demand compute
- GPU cloud model: Fundamentally different economics due to scaling laws, with significantly more expensive hardware (billions vs. millions of dollars)

- GPU customers are highly price-sensitive despite large budgets
- They see value in adding more compute capacity (unlike CPU customers)
- For model training and inference, adding GPUs consistently provides incremental performance improvements

- CoreWeave's approach: focusing exclusively on long-term contracts
- Ignoring the short-term/bottom-end market
- Maximizing long-term customer commitments

- One view of CoreWeave: an innovative cloud provider competitive with hyperscalers
- The other view: a potentially problematic business model

Hyperscalers and GPU Market Dynamics

- Hyperscalers are training their own models
- Competing with NVIDIA
- Leveraging their existing high-margin CPU businesses

- Top clients (Microsoft and OpenAI) represent 77% of CoreWeave's revenue
- Low-risk clients enable lower interest rates and favorable lending terms
- Long-term, prepaid or trusted payment contracts

- Selling to low-risk customers (like OpenAI) is optimal
- Large contracts paid over time are riskier but potentially viable

- High-price GPU hour strategies often fail due to customer price sensitivity
- Customers continually push prices down, regardless of software features
- High initial prices to offset depreciation risk are unsustainable

Business Model Challenges and NVIDIA's Strategy

- Short-term contracts with low prices paid over time create cash flow squeezes
- This increases interest rate risk
- Large tech players (Microsoft, NVIDIA) can potentially undercut smaller providers

- NVIDIA aims to avoid competing with its existing customers
- Maintaining a diverse customer base prevents price control by a few large customers
- This also avoids potential antitrust concerns

- Coupling software and hardware in compute businesses is economically difficult
- Companies like DigitalOcean and Together may struggle to make money on GPU clusters
- Price-sensitive customers have limited additional budget beyond hardware costs

- Suggested approach: treat GPU clouds as real estate businesses
- Separate hardware and software services
- Follow successful models like Modal (software services without hardware ownership) or CoreWeave (an effective real estate/infrastructure business)

SF Compute's Origin and Evolution

- Traditional providers only offered long-term (year-long) contracts
- Short-term/monthly compute rentals were essentially unavailable
- Providers faced economic risks with flexible arrangements

- SF Compute's founders purchased a year-long compute contract
- They planned to sublease unused months to manage costs
- They faced bankruptcy risk if unable to sell excess cluster capacity
- They needed to sell ~$500,000 of compute monthly to survive

- Transitioned to becoming a "GPU realtor" or compute broker, making "bespoke deals" by combining customers and vendors
- Began matching clients with available compute resources
- Developed expertise in navigating complex compute market dynamics

- SF Compute now offers hourly GPU rentals (thousands of H100s)
- Enables selling back unused GPU contracts
- Allows for short-term and burst capacity purchases
- Prices adjust dynamically to maintain near-100% utilization (a rough sketch of this idea follows)
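
The episode doesn't spell out the pricing mechanism, so as a rough illustration only, here is a minimal sketch of a utilization-driven price controller. All names and parameters (`adjust_price`, the 98% target, the 5% step) are assumptions, not SF Compute's actual algorithm:

```python
# Hypothetical sketch: nudge the hourly price toward near-100% utilization.
# Not SF Compute's actual mechanism; all parameters are invented.

def adjust_price(price: float, utilization: float,
                 target: float = 0.98, step: float = 0.05) -> float:
    """Raise the price when the cluster is oversubscribed,
    lower it when GPUs sit idle, so utilization hovers near target."""
    if utilization > target:
        return price * (1 + step)   # demand exceeds supply: price up
    if utilization < target:
        return price * (1 - step)   # idle capacity: price down
    return price

# Example: a lightly used cluster sees its price decay toward a clearing level
price = 2.50  # $/GPU-hour
for u in (0.60, 0.75, 0.99, 1.00):
    price = adjust_price(price, u)
    print(f"utilization={u:.0%} -> new price ${price:.2f}/hr")
```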

GPU Market Supply and Demand Dynamics

- Supply chain bottlenecks initially caused shortages
- Companies overordered GPUs to secure allocations, leading to temporary oversupply
- The current market has a functional "glut" of GPUs
- Prediction: by winter, the market may return to a shortage state

- Test-time inference is expected to significantly expand compute usage
- Most inference demand is concentrated in a few companies
- Enterprise sectors like bio/pharma are purchasing large numbers of H100s for training
- Inference use cases vary by industry, with some having limited but critical inference events

- OpenRouter operated with minimal GPU infrastructure (around 10 H100 nodes)
- Large consumer AI products like GPT-4 require significantly more computational resources

Perspectives on Distributed Computing and Financing

- Physical limitations like the speed of light make centralized, co-located data center clusters more efficient (a back-of-envelope latency calculation follows)
- Crypto tokens could initially subsidize compute prices, but that is not a long-term solution
- Centralized, high-speed interconnected clusters remain more practical than fully distributed compute networks
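
To make the speed-of-light point concrete, here is a back-of-envelope estimate (my own illustration, not a figure from the episode). Light in optical fiber propagates at roughly two-thirds of c, so even a perfectly straight long-haul link adds tens of milliseconds per hop, versus sub-microsecond latencies inside a co-located cluster:

```python
# Back-of-envelope: propagation delay over long-haul fiber vs. within a rack.
# Illustrative numbers only; real paths are longer and add switching delays.

C = 299_792_458        # speed of light in vacuum, m/s
FIBER_FACTOR = 2 / 3   # light in fiber travels at ~2/3 c

def one_way_latency_ms(distance_m: float) -> float:
    return distance_m / (C * FIBER_FACTOR) * 1e3

print(f"SF -> NYC (~4,000 km): {one_way_latency_ms(4_000_000):.1f} ms one way")
print(f"cross-rack (~10 m):    {one_way_latency_ms(10) * 1e3:.2f} us one way")
```

A gradient sync that costs ~20 ms per step across the country but ~0.05 us within a rack is the whole argument for co-location in one number.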

- There was a period when VCs providing GPU clusters made financial sense by arbitraging credit risk
- Startups typically struggle to get large loans ($50 million), while entities with existing assets find it easier
- VCs could offer equity for compute as a strategic financial hack
- This arbitrage opportunity has since been competed down

SF Compute's Pricing Mechanism and Market Development

- Prices drop as compute time approaches expiration
- Immediate/instant compute prices are essentially "preemptible" prices
- Market prices can be volatile but historically offer significant cost savings

- A common buyer strategy: setting a high limit price (e.g., $4/hour) to ensure continuous compute access
- Buying at the lowest available market price
- Potentially restricting purchases to specific conditions
- Achieving average prices as low as $0.80-$1 per GPU-hour (see the sketch below)
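
As a rough illustration of that strategy (my own sketch; the prices are made-up data and this is not SF Compute's actual API), a buyer places a standing bid at a high limit price and lets fills clear at whatever the market happens to charge, so the average paid stays low even through spikes:

```python
# Hypothetical sketch of the limit-price buying strategy described above.
# No real SF Compute client is used; `market_prices` is invented data.

LIMIT_PRICE = 4.00  # $/GPU-hour ceiling: guarantees access, rarely paid

def fill_price(market_price: float, limit: float = LIMIT_PRICE) -> float | None:
    """Buy whenever the market clears at or below our limit price."""
    return market_price if market_price <= limit else None

# Simulated hourly spot prices over a volatile trading day
market_prices = [0.85, 0.92, 1.10, 0.78, 3.50, 0.95, 0.88, 1.05]

fills = [p for p in (fill_price(m) for m in market_prices) if p is not None]
avg = sum(fills) / len(fills)
print(f"filled {len(fills)}/{len(market_prices)} hours at avg ${avg:.2f}/hr")
```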

- Create an underlying spot market with an index price
- Develop a cash-settled futures market for compute resources
- De-risk compute pricing and reduce capital costs for data centers (a worked settlement example follows)
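
For readers unfamiliar with cash settlement: rather than delivering GPUs, the contract pays the difference between the agreed futures price and the spot index at expiry. A minimal worked example, with all numbers invented and no claim about SF Compute's actual contract terms:

```python
# Cash-settled futures payoff: no GPUs change hands, only the price difference.
# Illustrative numbers; not a real SF Compute contract specification.

def cash_settlement(futures_price: float, index_at_expiry: float,
                    gpu_hours: int) -> float:
    """Payout to the long side (buyer). Negative means the buyer pays."""
    return (index_at_expiry - futures_price) * gpu_hours

# A data center locks in $2.00/hr on 100,000 GPU-hours (it is short the future).
# If the spot index settles at $1.60, the long side pays the short $40,000,
# exactly offsetting the data center's lower revenue on the spot market.
print(cash_settlement(futures_price=2.00, index_at_expiry=1.60,
                      gpu_hours=100_000))  # -40000.0
```

This is how a futures market de-risks the data center: its all-in revenue per GPU-hour is locked at the futures price regardless of where spot ends up.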

Technical Infrastructure and Reliability

- Burn-in process involves running Linpack and stress testing hardware for 48-72 hours
- Performance tests simulating realistic environments
- Active and passive testing during GPU operations
- Automated refund mechanisms for hardware failures (a sketch of such a harness follows)
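
A hedged sketch of what such a burn-in harness might look like, assuming each node can run a stress workload over SSH and report a throughput number. The node list, the `run_linpack` command, and the acceptance threshold are all hypothetical, not SF Compute's actual tooling:

```python
# Hypothetical burn-in harness: run a stress workload on every node for a
# fixed window and reject nodes that fail or underperform.
import subprocess

BURN_IN_HOURS = 48    # episode cites 48-72 hours
MIN_TFLOPS = 500.0    # made-up per-node acceptance threshold

def burn_in_node(host: str) -> bool:
    """Run the stress workload via ssh; return True if the node passes."""
    try:
        out = subprocess.run(
            ["ssh", host, "run_linpack", f"--hours={BURN_IN_HOURS}"],
            capture_output=True, text=True, check=True,
            timeout=BURN_IN_HOURS * 3600 + 600,
        )
        return float(out.stdout.strip()) >= MIN_TFLOPS  # sustained TFLOPS
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, ValueError):
        return False  # crash, hang, or garbled output all count as failure

nodes = ["node-01", "node-02", "node-03"]  # placeholder inventory
failed = [n for n in nodes if not burn_in_node(n)]
print("replace before go-live:", failed or "none")
```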

- Hardware failures are common, with components having varying probabilities of failure
- New and unexpected failure modes can emerge that aren't caught by standard checks

- Immediate replacement of problematic hardware
- Automatic refunds or prorated adjustments (a toy proration example follows)
- Strict Service Level Agreement (SLA) with cloud providers
- Website mechanism for customers to report hardware problems
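
The episode doesn't specify the refund formula; the simplest plausible proration, shown purely as an illustration, credits back the failed GPU-hours at the price actually paid:

```python
# Toy proration: refund the hours lost to a hardware fault at the paid rate.
# The formula is my assumption; the episode does not specify one.

def prorated_refund(hours_billed: float, hours_lost: float,
                    price_per_hour: float) -> float:
    return min(hours_lost, hours_billed) * price_per_hour

# 8 GPUs down for 3 hours of a paid block at $1.20/GPU-hour
print(prorated_refund(hours_billed=24 * 8, hours_lost=3 * 8,
                      price_per_hour=1.20))  # 28.8
```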

- Baseboard Management Controller (BMC) access for remote machine management
- Ability to reset/re-image machines remotely (see the sketch below)
- Direct engineering support, including debugging at any hour
- Running clusters "from bare metal up"
- Custom UEFI shims and boot images
- Flexible deployment options (Kubernetes, VMs)
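
A minimal sketch of BMC-driven recovery using the standard `ipmitool` CLI. The hostnames and credentials are placeholders; real fleets often use Redfish or vendor APIs, and this illustrates the capability rather than SF Compute's actual automation:

```python
# Minimal sketch of BMC-driven remote recovery via standard IPMI commands.
# Hostnames/credentials are placeholders; not SF Compute's actual tooling.
import subprocess

def ipmi(bmc_host: str, *args: str) -> str:
    """Run one ipmitool command against a node's BMC over the LAN."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", bmc_host,
           "-U", "admin", "-P", "REDACTED", *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def reimage(bmc_host: str) -> None:
    """PXE-boot the node on next power-up so it pulls a fresh image."""
    ipmi(bmc_host, "chassis", "bootdev", "pxe")   # one-shot PXE boot
    ipmi(bmc_host, "chassis", "power", "cycle")   # hard power cycle

print(ipmi("10.0.0.42", "chassis", "power", "status"))  # e.g. "Chassis Power is on"
```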

Risk Reduction Philosophy and Branding

- Optimizing for buyers and sellers
- Potentially exploring cash-settled futures in the future
- Creating standardized contracts for trading compute resources

- Futures/derivatives primarily for reducing risk, not speculation
- Addressing current market practices where data centers push risk onto startups and VCs
- Introducing more stability and risk management to counter the potentially unstable venture capital market

- Deliberately positioning as a "calm" alternative to the hyped-up AI market
- Seeking to be the "opposite force" of extreme, speculative approaches
- Intentionally avoiding a hyped-up, "magical" website experience
- Setting low expectations to ensure users are pleasantly surprised
- Embracing an "anti-hype" approach that paradoxically created its own form of hype

Personal Journey and Current Focus

- The founder started with Quirk (a mental health app)
- Transitioned to Room Service (a distributed systems company)
- Spent about four years trying multiple product ideas (approximately 40 different products)
- The primary goal throughout was to "not die" as a startup

- Hiring for systems engineering (Linux/Rust focus) and financial systems engineering
- CTO Eric Park described as extremely kind and chill
- Team culture emphasized as positive

- Managing financial ledgers
- Ensuring recording requirements are met
- Preventing loss of money flowing through the system
- Creating better pricing for vendors and buyers
- Enabling researchers to access expensive computational resources
- Supporting critical research (e.g., cancer research)
