Did DeepSeek Use 50,000 NVIDIA GPUs for R1? AI Model Sparks Debate on Efficiency & Transparency
Image Credit: Adi Goldstein | Unsplash
Chinese AI startup DeepSeek has unveiled its latest model, R1, which reportedly rivals leading U.S. AI technologies. This achievement has ignited discussions about the resources employed in its development, particularly the speculated use of 50,000 NVIDIA GPUs. The implications of this claim are profound, potentially reshaping the landscape of AI research and development.
[Read More: DeepSeek’s 10x AI Efficiency: What’s the Real Story?]
DeepSeek's R1: Efficiency and Performance
DeepSeek, a Chinese AI startup based in Hangzhou, has developed an open-source AI model named R1. The company claims that R1 was developed in just two months at a cost of under US$6 million. This figure is notably lower than the expenditures of competitors like OpenAI, which reportedly spends over US$5 billion annually on AI development.
In terms of performance, DeepSeek-R1 has demonstrated strong results across various benchmarks. According to DeepSeek's official release, DeepSeek-R1 achieved a Pass@1 score of 79.8% on the AIME 2024 reasoning task, slightly surpassing OpenAI's o1-1217 model. Additionally, on the MMLU benchmark, which evaluates language models across multiple subjects, DeepSeek-R1 scored 90.8%, closely approaching OpenAI's o1 model, which scored 91.8%.
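For readers unfamiliar with the metric, Pass@1 is commonly estimated as the fraction of sampled answers that are correct, averaged across problems. The sketch below illustrates that general calculation in Python; the function name and sample data are illustrative only and do not reflect DeepSeek's or OpenAI's actual evaluation harnesses.

```python
from typing import Sequence

def pass_at_1(sample_results: Sequence[Sequence[bool]]) -> float:
    """Estimate Pass@1: for each problem, the fraction of sampled
    answers that are correct, averaged over all problems.
    A single sample per problem also works."""
    per_problem = [sum(samples) / len(samples) for samples in sample_results]
    return sum(per_problem) / len(per_problem)

# Hypothetical example: 3 problems, 4 sampled answers each.
results = [
    [True, True, False, True],    # problem 1: 3/4 correct
    [False, False, False, True],  # problem 2: 1/4 correct
    [True, True, True, True],     # problem 3: 4/4 correct
]
print(f"Pass@1 = {pass_at_1(results):.3f}")  # 0.667
```

In practice, labs sample multiple answers per problem to reduce variance, but the headline number reported is this averaged single-attempt accuracy.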
However, it's important to note that the reported US$6 million cost pertains specifically to the final training phase of the R1 model. This figure does not encompass other expenses such as computing infrastructure, prior training runs, or research and development efforts. Consequently, the total investment in developing R1 is likely higher than the stated amount.
Speculations on GPU Usage
The computational resources DeepSeek used to train R1 have been a topic of debate among industry analysts. While DeepSeek reported using approximately 2,048 NVIDIA H800 GPUs over a two-month period to train its V3 model, the company has not officially disclosed the number of GPUs used for R1. Some analysts are skeptical of its efficiency claims. A report from SemiAnalysis, for instance, suggests that DeepSeek may operate a significantly larger infrastructure, potentially involving up to 50,000 NVIDIA GPUs and roughly US$1.6 billion in hardware investment, that contributed to R1's development.
However, these figures are speculative and not officially confirmed by DeepSeek.
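For a rough sense of scale, the figures DeepSeek did publish for V3 can be converted into GPU-hours and an implied rental cost. The back-of-envelope sketch below assumes roughly 60 days of continuous use and an illustrative rental rate of about US$2 per H800 GPU-hour; both are assumptions made here for illustration, not figures confirmed by DeepSeek.

```python
# Back-of-envelope conversion of DeepSeek's reported V3 training figures.
gpus = 2_048                  # reported NVIDIA H800 GPUs
days = 60                     # assumed: "two months" as ~60 days of continuous use
rate_usd_per_gpu_hour = 2.0   # assumed illustrative rental price per H800 GPU-hour

gpu_hours = gpus * days * 24
implied_cost = gpu_hours * rate_usd_per_gpu_hour

print(f"GPU-hours: {gpu_hours:,}")                    # 2,949,120
print(f"Implied rental cost: ${implied_cost:,.0f}")   # ~$5,898,240
```

Under these assumptions, the implied figure lands near the sub-US$6 million training cost DeepSeek cites, which is consistent with that claim covering only the final training run rather than the full infrastructure investment discussed above.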
[Read More: DeepSeek AI Faces Security and Privacy Backlash Amid OpenAI Data Theft Allegations]
The Evolution of GPU Clusters in AI
The use of large GPU clusters in AI development has become a common practice among leading technology companies. Some notable examples include:
Meta's AI Infrastructure:
In March 2024, Meta announced the deployment of two new data center-scale clusters, each equipped with 24,576 NVIDIA H100 GPUs, designed to support advanced AI research and development, including the training of models like Llama 3.
By October 2024, Meta disclosed that it was using over 100,000 NVIDIA H100 GPUs to train Llama 4, marking a significant expansion of its AI infrastructure.
Tesla's Dojo Supercomputer:
In August 2023, Tesla launched a new AI training cluster featuring 10,000 NVIDIA H100 GPUs, alongside bringing its in-house Dojo supercomputer into production use. This infrastructure is designed to enhance Tesla's Full Self-Driving (FSD) technology by improving AI training efficiency.
[Read More: Italy Bans DeepSeek AI: First Nation to Block China’s AI Over Privacy Issues]
DeepSeek vs. Industry Peers
DeepSeek's reported approach contrasts with industry norms, where extensive GPU clusters are standard for training large AI models. While companies like Meta and Tesla have publicly detailed their GPU usage, DeepSeek's lack of transparency has led to skepticism. Some experts argue that such opacity makes it difficult to assess the true capabilities and scalability of DeepSeek's R1 model.
Elon Musk: The CEO of Tesla and SpaceX has expressed doubts about DeepSeek's claims, questioning whether such advancements are feasible with the resources the company says it used.
Palmer Luckey: The founder of Oculus VR has also expressed skepticism about DeepSeek's assertions, particularly concerning the cost and scale of their AI model development. He criticized the U.S. media for uncritically reporting DeepSeek's claims, suggesting that the narrative might be influenced by strategic interests.
Stacy Rasgon: The Bernstein analyst has been critical of the notion that DeepSeek achieved its feats with minimal investment, stating, "It seems categorically false that ‘China duplicated OpenAI for $5M’ and we don’t think it really bears further discussion."
Atif Malik: The Citi analyst has questioned DeepSeek's reported GPU usage, suggesting that the company's achievements would likely not have been possible without advanced GPUs for fine-tuning or for building the underlying large language models.
The tech community remains divided over DeepSeek's claims. Some experts praise the potential for more efficient AI development, while others express concerns about the lack of transparency. The absence of detailed information about DeepSeek's hardware infrastructure has led to calls for greater openness to validate its claims.
The Transparency Debate: DeepSeek’s Silence on Hardware Usage
DeepSeek’s reluctance to fully disclose the details of its hardware infrastructure raises questions about its strategic motivations. One possibility is that the company aims to maintain a competitive advantage by keeping its resources and methods confidential. In an industry where AI development is heavily reliant on computing power, revealing the exact scale of their infrastructure could expose DeepSeek’s capabilities to competitors, regulators, or even potential investors.
Another perspective suggests that the lack of transparency may indicate that DeepSeek’s reported efficiency claims are exaggerated or incomplete. The stark contrast between DeepSeek’s stated costs and those of other AI giants like OpenAI and Meta has fueled skepticism. If the company’s model was truly developed with a fraction of the usual investment, it would represent a major breakthrough in AI efficiency—but without verifiable data, such claims remain speculative.
Alternatively, DeepSeek’s secrecy could be part of a deliberate marketing strategy. By allowing speculation to circulate, the company keeps itself at the center of industry discussions, potentially boosting interest and investment. The ambiguity surrounding its AI model could also attract attention from policymakers, media outlets, and global AI researchers, positioning DeepSeek as a formidable player in the AI space.
Regardless of the underlying reason, the lack of transparency poses challenges for credibility. Without clearer disclosures, the tech community remains divided—some see DeepSeek as a disruptor, while others view its claims with deep skepticism. Until the company provides concrete evidence about its infrastructure and training process, questions about its true capabilities will persist.
Source: Financial Times, Tom’s Hardware, Time, Reuters, Engineering FB, The Register, Newsweek, The Verge