AI Community Bombshell: "$50" Recreation of DeepSeek R1
Recently, researchers from Fei-Fei Li's team at Stanford University, together with collaborators at the University of Washington and other institutions, released a new model, s1, which shows solid performance on mathematics and programming benchmarks. The news has sent shockwaves through the AI community, and many are eager to try reproducing the result. But is the recreation really as easy as the headlines suggest?
The "50 dollars for 26 minutes" recreation of DeepSeek R1 is actually not that simple!
The "Foundation" Role of the Qwen Model
The "$50 for 26 minutes" only refers to the resources and time spent on supervised fine-tuning (SFT) of an open-source foundation model, excluding the initial data preparation, the training of the foundation model, and the deployment time for various related components. While the fine-tuning process is relatively fast, the entire research still relies on the organization of SFT training data and the pre-training of the foundation model, both of which typically take several weeks to months. The S1 model did not emerge out of thin air but stands on the "shoulders" of two top models in the industry. One is Google’s recently launched Gemini Flash Thinking, which is responsible for generating 1,000 questions and their corresponding answers as a reasoning chain for the training dataset. The other is Alibaba’s recently launched Qwen 2.5-32B-Instruct, which serves as the base model for the SFT. It is through the combined support of these two models that the S1 model was able to emerge.
According to the s1 research paper, the model was fine-tuned from Alibaba's Qwen model and then refined on 1,000 training samples, with the Qwen model playing a crucial foundational role. Unlike "native" deep reasoning models, s1 was not trained with reinforcement learning (RL); instead, it uses 1,000 question-answer pairs annotated with reasoning chains to make the pre-trained base model (Qwen) better at reasoning.
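For readers who want to picture what such a small-scale SFT run looks like, the following is a minimal sketch using Hugging Face's TRL library on the Qwen base model. The hyperparameters and dataset file are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal SFT sketch: fine-tune a Qwen base model on ~1,000 reasoning-chain
# examples. Hyperparameters are illustrative, not the paper's exact recipe.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed JSONL file in which each row has a "text" field containing the
# question, the reasoning chain, and the final answer, concatenated.
dataset = load_dataset("json", data_files="s1k_style_dataset.jsonl", split="train")

config = SFTConfig(
    output_dir="s1-style-sft",
    dataset_text_field="text",
    num_train_epochs=5,              # a few epochs suffice for ~1K samples
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=1e-5,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # the base model named in the paper
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

With only around 1,000 examples, a run like this finishes quickly on a multi-GPU node, which is consistent with the "26 minutes" headline figure.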
The Stanford s1 paper itself explicitly states that the model was fine-tuned from Alibaba's Qwen model.
Many AI researchers outside China have also noted that a number of "new" models are in fact built on top of the Qwen model.
Low-Cost Training of Large Models Offers a New Direction for the AI Field
The s1 model, produced by supervised fine-tuning of Alibaba Cloud's Qwen model, was not built from scratch. Instead, it combines techniques such as budget forcing (controlling how much compute the model spends thinking at test time), training on an extremely small sample, efficient SFT, and test-time scaling to arrive at a low-cost model with strong reasoning ability. This low-cost training is only possible because a powerful open-source foundation model already exists; it demonstrates both the potential of this style of AI training and, once again, the advantages of open source. Its research approach undoubtedly gives the AI field a new direction to think about...
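To make "budget forcing" concrete: in the s1 paper it is a test-time control that caps how long the model is allowed to think, or, when the model stops too early, appends a cue such as "Wait" to push it to keep reasoning. The sketch below shows that general idea with Hugging Face transformers; the token budgets, prompt format, and retry cap are assumptions for illustration, not the team's exact implementation.

```python
# Illustrative budget-forcing loop. Token budgets, prompt format, and the
# retry cap are assumptions; this is not the s1 team's exact implementation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-32B-Instruct"  # stand-in for the fine-tuned s1 weights
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Question: If x + 2x = 12, what is x?\nThink step by step."
min_thinking_tokens = 512    # force at least this much reasoning
max_thinking_tokens = 2048   # hard cap on the thinking budget

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=max_thinking_tokens)
n_new = out.shape[1] - inputs["input_ids"].shape[1]

# If the model stops reasoning too early, append a continuation cue ("Wait")
# and let it keep going; this is the paper's trick for scaling test-time compute.
retries = 0
while n_new < min_thinking_tokens and retries < 4:
    text = tok.decode(out[0], skip_special_tokens=True) + "\nWait,"
    inputs = tok(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_thinking_tokens - n_new)
    n_new += out.shape[1] - inputs["input_ids"].shape[1]
    retries += 1

print(tok.decode(out[0], skip_special_tokens=True))
```

The appeal of this trick is that it trades extra inference-time compute for accuracy without any further training, which is exactly what "test-time scaling" means.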
The release of s1 vividly illustrates how quickly global large-model technology and industry are advancing, and it is strong evidence that the development of large models worldwide should take the form of healthy competition in an environment of high-level openness. Around the world, leading enterprises, research teams, and talent keep competing at the technological frontier, learning from one another as they compete and driving innovation as they converge. This kind of open, cooperative ecosystem is what can accelerate technological breakthroughs and lay a solid foundation for a thriving global AI industry.