Key Takeaways
- Nvidia's $5 billion investment in Intel signals a strategic shift in the semiconductor industry.
- China is actively pursuing self-sufficiency in AI chip production amidst US export restrictions.
- Nvidia maintains market leadership through aggressive inventory management and rapid product development.
- Hyperscalers are projected to spend up to $500 billion on AI infrastructure in the coming year.
- Oracle is emerging as a significant player in the AI compute market by securing major partnerships.
- Next-generation GPUs like Nvidia's GB200 offer substantial performance gains but introduce new reliability challenges.
- AI inference workloads are being disaggregated to optimize performance and user experience.
Deep Dive
- Nvidia has invested $5 billion in Intel for custom data center and PC products, a move that surprised the semiconductor industry.
- The deal is viewed as a potential 'lifeline' for Intel and could lead to the development of integrated NVIDIA graphics in x86 laptops.
- Guido Appenzeller, a former Intel CTO, suggests this could prompt Intel to reconsider or discontinue its internal graphics and AI product initiatives due to current market competitiveness.
- Nvidia's investment has already reportedly seen a 30% increase in value, exhibiting a 'Buffett effect'.
- China aims for self-sufficiency in AI hardware, with DeepSeek planning to use domestically produced chips for future models, aligning with government mandates.
- Huawei, banned from foreign supply chains in 2020, has developed its Ascend AI chip line and is now manufacturing custom High Bandwidth Memory (HBM).
- Despite developing advanced chips, Huawei's production capacity relies on foreign components, creating a bottleneck for scaling HBM manufacturing.
- The US government's restrictions on NVIDIA chip exports to China have spurred a domestic industry, with companies like Huawei and Cambricon gaining opportunities.
- Analyst Dylan Patel projects hyperscaler (Microsoft, Amazon, Google, Oracle, Meta) capital expenditure for AI infrastructure to reach $450-500 billion next year.
- This estimate significantly exceeds the consensus bank projection of $360 billion, with most spending directed towards NVIDIA.
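As a quick sanity check, the gap between that projection and the bank consensus works out to roughly 25-39%; a back-of-envelope sketch using the figures above:

```python
# Back-of-envelope: how far the $450-500B capex projection sits
# above the $360B bank consensus cited above.
consensus_b = 360  # consensus projection, in $B

for projection_b in (450, 500):
    pct_above = (projection_b - consensus_b) / consensus_b * 100
    print(f"${projection_b}B is {pct_above:.0f}% above consensus")
```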
- OpenAI's rapid revenue growth is estimated at $35-45 billion ARR by next year, with a projected $15 billion annual compute burn rate.
- Companies like Microsoft and OpenAI may not achieve profitability until 2029 due to substantial cash burn on AI compute and development.
- Nvidia CEO Jensen Huang is known for his 'bet the farm' approach, committing to large-volume chip orders for then-unproven products, such as the original Xbox, before deals were finalized.
- Huang's leadership emphasizes continuous innovation and winning the 'next game,' often prioritizing long-term vision over predictable quarterly results.
- His willingness to take significant risks, despite past failures in sectors like mobile, is a defining characteristic of his over 30-year tenure.
- Early AI presentations by Huang at CES around 2014-2015, focusing on AI and self-driving cars, were initially met with skepticism from a gaming-focused audience.
- Nvidia is recognized for its consistent ability to achieve successful first-silicon designs, often shipping A0 or A1 revisions, due to robust simulation and verification processes.
- The company initiates high-volume production of early chip revisions to prepare for metal layer transitions, avoiding delays faced by competitors requiring multiple revisions.
- Nvidia's speed in moving from design to shipment comes partly from cutting non-essential features, exemplified by the late integration of Tensor Cores into the Volta chip.
- This hardware development speed challenges Nvidia's software division to concurrently develop necessary drivers and infrastructure.
- Given regulatory limitations on major acquisitions like ARM, Nvidia faces questions about deploying its substantial cash flow.
- Potential avenues include building large-scale AI infrastructure or investing in robotics and other emerging AI applications.
- The company's investment strategy could disrupt venture capital by fully funding promising AI startups like Anthropic or OpenAI to secure strategic influence.
- Nvidia has shifted its pricing strategy from volume discounts for hyperscalers to a uniform price for all customers, influenced by antitrust considerations.
- Amazon's cloud infrastructure was optimized for scale-out computing and initially deemed ill-suited to AI's scale-up demands, but it is now undergoing an AI resurgence.
- AWS revenue growth, which had been decelerating year-over-year, is projected to re-accelerate past 20% as new data centers with NVIDIA GPUs and Amazon's own Trainium accelerators come online.
- Despite structural issues like lagging networking technology, Amazon holds the largest amount of data center capacity for AI revenue generation in the immediate future.
- Amazon's historical operation of high-density data centers positions it to implement advanced cooling solutions for AI build-outs, despite added costs.
- Oracle is identified as a leader in the AI compute market due to its large balance sheet, flexible hardware choices, and strong software capabilities.
- The company is significantly expanding its data center footprint, with multi-gigawatt commitments tracked through regulatory filings and supply chain analysis.
- Oracle's strategy involves leasing data center capacity to major AI players like OpenAI and ByteDance, mitigating its own risk by purchasing GPUs closer to rental dates.
- This methodology accurately predicted Oracle's revenue for recent years and forecasts continued growth based on identified future data center plans.
- Nvidia's GB200 GPUs are estimated to be 1.6 times more expensive than H100s in terms of total cost of ownership (TCO).
- While some GB200 performance metrics show 2-3x gains, DeepSeek inference shows a 6-7x improvement, which translates to roughly 3-4x performance per dollar.
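The performance-per-dollar figure follows directly from dividing the throughput gain by the TCO multiple; a minimal sketch using the estimates quoted above (these are the discussion's figures, not measured benchmarks):

```python
# Perf-per-dollar from the figures above: ~1.6x TCO vs. 6-7x
# DeepSeek inference throughput. Inputs are cited estimates,
# not measured benchmarks.
def perf_per_dollar(speedup: float, tco_multiple: float) -> float:
    """Throughput gain divided by the cost multiple."""
    return speedup / tco_multiple

low = perf_per_dollar(6.0, 1.6)
high = perf_per_dollar(7.0, 1.6)
print(f"perf per dollar: {low:.2f}x to {high:.2f}x")  # ~3.75x to ~4.38x
```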
- The GB200's 72-GPU rack-scale configuration presents reliability challenges: a single GPU failure in an H100 system takes only that eight-GPU server offline, whereas a failure in a GB200 rack can affect the entire 72-GPU domain.
- Cloud providers are adjusting Service Level Agreements (SLAs) for high-density GPU systems like GB200 to account for increased failure rates and significant 'blast radius' of single failures.
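The 'blast radius' problem is easy to see with a simple independence model: if each GPU fails with some annual probability, the chance that at least one GPU in a scale-up domain fails grows rapidly with domain size. The 5% per-GPU rate below is purely illustrative, not a measured failure rate:

```python
# Probability that at least one GPU in a scale-up domain fails,
# assuming independent failures. p_gpu = 0.05 is a hypothetical
# annual rate chosen for illustration only.
def p_domain_failure(p_gpu: float, n_gpus: int) -> float:
    return 1.0 - (1.0 - p_gpu) ** n_gpus

p = 0.05
print(f"8-GPU H100 server: {p_domain_failure(p, 8):.1%}")   # ~33.7%
print(f"72-GPU GB200 rack: {p_domain_failure(p, 72):.1%}")  # ~97.5%
```

Even with identical per-GPU reliability, the 72-GPU domain is almost certain to see at least one failure per year under this model, which is why SLAs are being rewritten.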
- AI chips are evolving from distinct training and inference designs toward unified architectures as workloads blur, with inference now dominating even training workflows such as reinforcement learning.
- AI inference involves two distinct workloads: pre-fill (calculating KV cache for initial document processing) and decode (auto-regressively generating each token), each with different computational demands.
- Companies like OpenAI, Anthropic, and Google disaggregate these workloads, running them on separate GPU sets to optimize for long context inputs (pre-fill) or long output generation (decode).
- This disaggregation improves user experience by guaranteeing a specific time to first token, which is often prioritized over total generation time.
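A toy latency model makes the disaggregation rationale concrete: pre-fill processes the whole prompt in parallel (compute-bound), while decode emits tokens one at a time (bandwidth-bound), so time to first token can stay short even when the full response takes far longer. The throughput constants below are made-up placeholders, not measured figures:

```python
# Toy model of the two inference phases described above. Throughput
# numbers are illustrative placeholders, not real benchmarks.
PREFILL_TOK_PER_S = 20_000  # prompt tokens processed per second (parallel)
DECODE_TOK_PER_S = 50       # output tokens generated per second (sequential)

def latency(prompt_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (time to first token, total generation time) in seconds."""
    ttft = prompt_tokens / PREFILL_TOK_PER_S
    total = ttft + output_tokens / DECODE_TOK_PER_S
    return ttft, total

ttft, total = latency(prompt_tokens=8_000, output_tokens=500)
print(f"TTFT: {ttft:.1f}s of a {total:.1f}s total")  # 0.4s TTFT vs ~10.4s total
```

Under these assumptions the user sees the first token in under half a second even though the full answer takes over ten, which is why providers tune and guarantee TTFT separately from total generation time.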