Latent Space: The AI Engineer Podcast

SF Compute: Commoditizing Compute

CoreWeave's Business Model and GPU Cloud Computing Economics

- CPU cloud model: Commodity hardware, high software margins, on-demand compute
- GPU cloud model: Fundamentally different economics due to scaling laws, with significantly more expensive hardware (billions vs. millions of dollars)

- GPU customers are highly price-sensitive despite large budgets
- They see value in adding more compute capacity (unlike CPU customers)
- For model training and inference, adding GPUs consistently provides incremental performance improvements

- CoreWeave's approach: focusing exclusively on long-term contracts
- Ignoring the short-term/bottom-end market
- Maximizing long-term customer commitments

- One view of CoreWeave: an innovative cloud provider competitive with hyperscalers
- The other view: a potentially problematic business model

Hyperscalers and GPU Market Dynamics

- Hyperscalers are training their own models
- Competing with NVIDIA
- Leveraging their existing high-margin CPU businesses

- Top clients (Microsoft and OpenAI) represent 77% of CoreWeave's revenue
- Low-risk clients enable lower interest rates and favorable lending terms
- Long-term, prepaid or trusted payment contracts

- Selling to low-risk customers (like OpenAI) is optimal
- Large contracts paid over time are riskier but potentially viable

- High-price GPU hour strategies often fail due to customer price sensitivity
- Customers continually push prices down, regardless of software features
- High initial prices to offset depreciation risk are unsustainable

Business Model Challenges and NVIDIA's Strategy

- Short-term contracts with low prices paid over time create cash flow squeezes
- This increases interest rate risk
- Large tech players (Microsoft, NVIDIA) can potentially undercut smaller providers

- NVIDIA aims to avoid competing with its existing customers
- Maintaining a diverse customer base prevents price control by a few large customers
- This also avoids potential antitrust concerns

- Coupling software and hardware in compute businesses is economically difficult
- Companies like DigitalOcean and Together may struggle to make money on GPU clusters
- Price-sensitive customers have limited additional budget beyond hardware costs

- Suggested approach: treat GPU clouds as real estate businesses
- Separate hardware and software services
- Follow successful models like Modal (software services without hardware ownership) or CoreWeave (an effective real estate/infrastructure business)

SF Compute's Origin and Evolution

- Traditional providers only offered long-term (year-long) contracts
- Short-term/monthly compute rentals were essentially unavailable
- Providers faced economic risks with flexible arrangements

- SF Compute's founders purchased a year-long compute contract
- They planned to sublease unused months to manage costs
- They faced bankruptcy risk if unable to sell excess cluster capacity
- They needed to sell ~$500,000 of compute monthly to survive

- Transitioned to becoming a "GPU realtor" or compute broker, making "bespoke deals" by combining customers and vendors
- Began matching clients with available compute resources
- Developed expertise in navigating complex compute market dynamics

- SF Compute now offers hourly GPU rentals (thousands of H100s)
- Enables selling back unused GPU contracts
- Allows for short-term and burst capacity purchases
- Prices adjust dynamically to maintain near-100% utilization (a rough sketch of this idea follows)
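
The episode doesn't spell out the pricing mechanism, so as a rough illustration only, here is a minimal sketch of a utilization-driven price controller. All names and parameters (`adjust_price`, the 98% target, the 5% step) are assumptions, not SF Compute's actual algorithm:

```python
# Hypothetical sketch: nudge the hourly price toward near-100% utilization.
# Not SF Compute's actual mechanism; all parameters are invented.

def adjust_price(price: float, utilization: float,
                 target: float = 0.98, step: float = 0.05) -> float:
    """Raise the price when the cluster is oversubscribed,
    lower it when GPUs sit idle, so utilization hovers near target."""
    if utilization > target:
        return price * (1 + step)   # demand exceeds supply: price up
    if utilization < target:
        return price * (1 - step)   # idle capacity: price down
    return price

# Example: a lightly used cluster sees its price decay toward a clearing level
price = 2.50  # $/GPU-hour
for u in (0.60, 0.75, 0.99, 1.00):
    price = adjust_price(price, u)
    print(f"utilization={u:.0%} -> new price ${price:.2f}/hr")
```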

GPU Market Supply and Demand Dynamics

- Supply chain bottlenecks initially caused shortages
- Companies overordered GPUs to secure allocations, leading to temporary oversupply
- The current market has a functional "glut" of GPUs
- Prediction: by winter, the market may return to a shortage state

- Test-time inference is expected to significantly expand compute usage
- Most inference demand is concentrated in a few companies
- Enterprise sectors like bio/pharma are purchasing large numbers of H100s for training
- Inference use cases vary by industry, with some having limited but critical inference events

- OpenRouter operated with minimal GPU infrastructure (around 10 H100 nodes)
- Large consumer AI products like GPT-4 require significantly more computational resources

Perspectives on Distributed Computing and Financing

- Physical limitations like the speed of light make centralized, co-located data center clusters more efficient (a back-of-envelope latency calculation follows)
- Crypto tokens could initially subsidize compute prices, but that is not a long-term solution
- Centralized, high-speed interconnected clusters remain more practical than fully distributed compute networks
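
To make the speed-of-light point concrete, here is a back-of-envelope estimate (my own illustration, not a figure from the episode). Light in optical fiber propagates at roughly two-thirds of c, so even a perfectly straight long-haul link adds tens of milliseconds per hop, versus sub-microsecond latencies inside a co-located cluster:

```python
# Back-of-envelope: propagation delay over long-haul fiber vs. within a rack.
# Illustrative numbers only; real paths are longer and add switching delays.

C = 299_792_458        # speed of light in vacuum, m/s
FIBER_FACTOR = 2 / 3   # light in fiber travels at ~2/3 c

def one_way_latency_ms(distance_m: float) -> float:
    return distance_m / (C * FIBER_FACTOR) * 1e3

print(f"SF -> NYC (~4,000 km): {one_way_latency_ms(4_000_000):.1f} ms one way")
print(f"cross-rack (~10 m):    {one_way_latency_ms(10) * 1e3:.2f} us one way")
```

A gradient sync that costs ~20 ms per step across the country but ~0.05 us within a rack is the whole argument for co-location in one number.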

- There was a period when VCs providing GPU clusters made financial sense by arbitraging credit risk
- Startups typically struggle to get large loans ($50 million), while entities with existing assets find it easier
- VCs could offer equity for compute as a strategic financial hack
- This arbitrage opportunity has since been competed down

SF Compute's Pricing Mechanism and Market Development

- Prices drop as compute time approaches expiration
- Immediate/instant compute prices are essentially "preemptible" prices
- Market prices can be volatile but historically offer significant cost savings

- A common buyer strategy: setting a high limit price (e.g., $4/hour) to ensure continuous compute access
- Buying at the lowest available market price
- Potentially restricting purchases to specific conditions
- Achieving average prices as low as $0.80-$1 per GPU-hour (see the sketch below)
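
As a rough illustration of that strategy (my own sketch; the prices are made-up data and this is not SF Compute's actual API), a buyer places a standing bid at a high limit price and lets fills clear at whatever the market happens to charge, so the average paid stays low even through spikes:

```python
# Hypothetical sketch of the limit-price buying strategy described above.
# No real SF Compute client is used; `market_prices` is invented data.

LIMIT_PRICE = 4.00  # $/GPU-hour ceiling: guarantees access, rarely paid

def fill_price(market_price: float, limit: float = LIMIT_PRICE) -> float | None:
    """Buy whenever the market clears at or below our limit price."""
    return market_price if market_price <= limit else None

# Simulated hourly spot prices over a volatile trading day
market_prices = [0.85, 0.92, 1.10, 0.78, 3.50, 0.95, 0.88, 1.05]

fills = [p for p in (fill_price(m) for m in market_prices) if p is not None]
avg = sum(fills) / len(fills)
print(f"filled {len(fills)}/{len(market_prices)} hours at avg ${avg:.2f}/hr")
```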

- Create an underlying spot market with an index price
- Develop a cash-settled futures market for compute resources
- De-risk compute pricing and reduce capital costs for data centers (a worked settlement example follows)
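
For readers unfamiliar with cash settlement: rather than delivering GPUs, the contract pays the difference between the agreed futures price and the spot index at expiry. A minimal worked example, with all numbers invented and no claim about SF Compute's actual contract terms:

```python
# Cash-settled futures payoff: no GPUs change hands, only the price difference.
# Illustrative numbers; not a real SF Compute contract specification.

def cash_settlement(futures_price: float, index_at_expiry: float,
                    gpu_hours: int) -> float:
    """Payout to the long side (buyer). Negative means the buyer pays."""
    return (index_at_expiry - futures_price) * gpu_hours

# A data center locks in $2.00/hr on 100,000 GPU-hours (it is short the future).
# If the spot index settles at $1.60, the long side pays the short $40,000,
# exactly offsetting the data center's lower revenue on the spot market.
print(cash_settlement(futures_price=2.00, index_at_expiry=1.60,
                      gpu_hours=100_000))  # -40000.0
```

This is how a futures market de-risks the data center: its all-in revenue per GPU-hour is locked at the futures price regardless of where spot ends up.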

Technical Infrastructure and Reliability

- Burn-in process involves running Linpack and stress testing hardware for 48-72 hours
- Performance tests simulating realistic environments
- Active and passive testing during GPU operations
- Automated refund mechanisms for hardware failures (a sketch of such a harness follows)
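
A hedged sketch of what such a burn-in harness might look like, assuming each node can run a stress workload over SSH and report a throughput number. The node list, the `run_linpack` command, and the acceptance threshold are all hypothetical, not SF Compute's actual tooling:

```python
# Hypothetical burn-in harness: run a stress workload on every node for a
# fixed window and reject nodes that fail or underperform.
import subprocess

BURN_IN_HOURS = 48    # episode cites 48-72 hours
MIN_TFLOPS = 500.0    # made-up per-node acceptance threshold

def burn_in_node(host: str) -> bool:
    """Run the stress workload via ssh; return True if the node passes."""
    try:
        out = subprocess.run(
            ["ssh", host, "run_linpack", f"--hours={BURN_IN_HOURS}"],
            capture_output=True, text=True, check=True,
            timeout=BURN_IN_HOURS * 3600 + 600,
        )
        return float(out.stdout.strip()) >= MIN_TFLOPS  # sustained TFLOPS
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, ValueError):
        return False  # crash, hang, or garbled output all count as failure

nodes = ["node-01", "node-02", "node-03"]  # placeholder inventory
failed = [n for n in nodes if not burn_in_node(n)]
print("replace before go-live:", failed or "none")
```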

- Hardware failures are common, with components having varying probabilities of failure
- New and unexpected failure modes can emerge that aren't caught by standard checks

- Immediate replacement of problematic hardware
- Automatic refunds or prorated adjustments (a toy proration example follows)
- Strict Service Level Agreement (SLA) with cloud providers
- Website mechanism for customers to report hardware problems
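
The episode doesn't specify the refund formula; the simplest plausible proration, shown purely as an illustration, credits back the failed GPU-hours at the price actually paid:

```python
# Toy proration: refund the hours lost to a hardware fault at the paid rate.
# The formula is my assumption; the episode does not specify one.

def prorated_refund(hours_billed: float, hours_lost: float,
                    price_per_hour: float) -> float:
    return min(hours_lost, hours_billed) * price_per_hour

# 8 GPUs down for 3 hours of a paid block at $1.20/GPU-hour
print(prorated_refund(hours_billed=24 * 8, hours_lost=3 * 8,
                      price_per_hour=1.20))  # 28.8
```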

- Baseboard Management Controller (BMC) access for remote machine management
- Ability to reset/re-image machines remotely (see the sketch below)
- Direct engineering support, including debugging at any hour
- Running clusters "from bare metal up"
- Custom UEFI shims and boot images
- Flexible deployment options (Kubernetes, VMs)
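
A minimal sketch of BMC-driven recovery using the standard `ipmitool` CLI. The hostnames and credentials are placeholders; real fleets often use Redfish or vendor APIs, and this illustrates the capability rather than SF Compute's actual automation:

```python
# Minimal sketch of BMC-driven remote recovery via standard IPMI commands.
# Hostnames/credentials are placeholders; not SF Compute's actual tooling.
import subprocess

def ipmi(bmc_host: str, *args: str) -> str:
    """Run one ipmitool command against a node's BMC over the LAN."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", bmc_host,
           "-U", "admin", "-P", "REDACTED", *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def reimage(bmc_host: str) -> None:
    """PXE-boot the node on next power-up so it pulls a fresh image."""
    ipmi(bmc_host, "chassis", "bootdev", "pxe")   # one-shot PXE boot
    ipmi(bmc_host, "chassis", "power", "cycle")   # hard power cycle

print(ipmi("10.0.0.42", "chassis", "power", "status"))  # e.g. "Chassis Power is on"
```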

Risk Reduction Philosophy and Branding

- Optimizing for buyers and sellers
- Potentially exploring cash-settled futures in the future
- Creating standardized contracts for trading compute resources

- Futures/derivatives primarily for reducing risk, not speculation
- Addressing current market practices where data centers push risk onto startups and VCs
- Introducing more stability and risk management to counter the potentially unstable venture capital market

- Deliberately positioning as a "calm" alternative to the hyped-up AI market
- Seeking to be the "opposite force" of extreme, speculative approaches
- Intentionally avoiding a hyped-up, "magical" website experience
- Setting low expectations to ensure users are pleasantly surprised
- Embracing an "anti-hype" approach that paradoxically created its own form of hype

Personal Journey and Current Focus

- The founder started with Quirk (a mental health app)
- Transitioned to Room Service (a distributed systems company)
- Spent about four years trying multiple product ideas (approximately 40 different products)
- The primary goal throughout was to "not die" as a startup

- Hiring for systems engineering (Linux/Rust focus) and financial systems engineering
- CTO Eric Park described as extremely kind and chill
- Team culture emphasized as positive

- Managing financial ledgers
- Ensuring recording requirements are met
- Preventing loss of money flowing through the system
- Creating better pricing for vendors and buyers
- Enabling researchers to access expensive computational resources
- Supporting critical research (e.g., cancer research)
