Ascend Boosts DeepSeek Growth
The rapid rise of DeepSeek, an advanced AI model, has caused significant upheaval in the industry, leading to what many describe as a "server busy" epidemic: users attempting to access the platform are often met with the frustrating message to "please try again later." The demand underscores DeepSeek's startling success, with daily active users leaping from 347,000 to an astonishing 119 million within just one month, fueled by innovative algorithmic enhancements and an open-source strategy that promotes its adoption in niche markets.
As DeepSeek ignites discussions about computational power, the pressure on tech companies to keep pace has intensified. Prominent players in the field, ranging from Ascend and TianShu Zhixin to more recent entrants like Moole and Huaran Technology, have all announced compatibility with DeepSeek, indicating broad industry recognition of the urgent need to optimize hardware to support sophisticated AI operations. However, experts caution that merely achieving compatibility is a preliminary step. Fully harnessing DeepSeek's algorithms requires substantial investment in areas such as mixed-precision training (for example, FP8), balancing energy consumption across multiple scenarios, and deep collaborative optimization between software and hardware.
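To make the FP8 point concrete, here is a minimal NumPy sketch of the idea behind low-precision matrix multiplication with per-tensor scaling: operands are rounded to an E4M3-like grid and the product is accumulated in float32. The helper names and rounding scheme are illustrative assumptions, not DeepSeek's or any vendor's actual FP8 kernels, which run in hardware.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest magnitude representable in E4M3


def round_to_e4m3(x: np.ndarray) -> np.ndarray:
    """Round float32 values to an E4M3-like grid (coarse mantissa, clipped range).

    This is a software approximation for illustration only; real FP8 casts are
    done by hardware and handle subnormals and rounding modes properly.
    """
    m, e = np.frexp(x)              # x = m * 2**e, with |m| in [0.5, 1)
    m = np.round(m * 16) / 16       # keep roughly 3 mantissa bits
    return np.clip(np.ldexp(m, e), -FP8_E4M3_MAX, FP8_E4M3_MAX)


def fp8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Scale operands into the FP8 range, quantize, accumulate back in float32."""
    sa = FP8_E4M3_MAX / max(np.abs(a).max(), 1e-12)
    sb = FP8_E4M3_MAX / max(np.abs(b).max(), 1e-12)
    qa = round_to_e4m3(a * sa)
    qb = round_to_e4m3(b * sb)
    return (qa @ qb) / (sa * sb)    # accumulation kept in higher precision


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.standard_normal((64, 128)), rng.standard_normal((128, 32))
    err = np.abs(fp8_matmul(a, b) - a @ b).mean()
    print(f"mean abs error vs. float32 matmul: {err:.4f}")
```

The point of the per-tensor scale factors is that most of the accuracy loss comes from dynamic range, not from the matrix multiply itself, which is why mixed-precision schemes keep accumulation in a wider format.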
The emergence of DeepSeek has stimulated a bifurcation in computational power, which experts describe as a dual trajectory of technological sophistication and engineering innovation. As a result, demand for computational resources is expected to grow even further. Industry leaders are doubling down on investment in pre-trained foundational models, aiming to stay aligned with the Scaling Law while simultaneously pursuing the lofty goal of artificial general intelligence (AGI). They are prioritizing the development of efficient, stable, and open infrastructure, along with robust AI clusters and expansive ecosystems.
To illustrate this trend, Meta has raised its AI investment from $40 billion to $65 billion, while Google has increased its own from $52.5 billion to $75 billion.
Furthermore, model iteration and technological upgrades are accelerating, as evidenced by the release of Qwen 2.5-Max by Alibaba's Qianwen team and the Gemini 2.0 series by Google.
On the engineering front, new paradigms have emerged, lowering the entry barriers for post-training and distillation processes, leading to what some refer to as a resurgence of "a hundred models, a thousand variations." Companies are now focusing on user-friendly, affordable platforms that balance cost and performance in their distillation and fine-tuning approaches, as well as emphasizing quick deployment and agile business rollouts.
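As a rough illustration of the distillation step such platforms package up, the following PyTorch sketch blends a soft KL term against a teacher model's logits with a hard cross-entropy term on the true labels. The temperature and weighting defaults are illustrative assumptions, not any particular vendor's recipe.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Knowledge-distillation loss sketch.

    student_logits, teacher_logits: (batch, vocab); labels: (batch,) token ids.
    alpha balances imitation of the teacher against the ordinary hard-label loss.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2            # standard T^2 scaling of the soft term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In practice the teacher's logits are precomputed or served separately, so the student can be fine-tuned cheaply on commodity or lower-end accelerators.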
In the B2B sector, many enterprises are rapidly integrating DeepSeek to capitalize on the traffic it generates; within just 20 days of the R1 release, more than 160 businesses worldwide had connected to DeepSeek. On the consumer side, the user base has experienced explosive growth, giving rise to super apps that are accelerating the widespread adoption of large language models (LLMs). DeepSeek's extraordinary performance has played a crucial role in elevating societal awareness of LLMs, paving the way for new business models and promoting a virtuous cycle of commercial activity.
To cater to the divergent needs presented by this landscape, computational power structures must evolve to support varied demands. First, model architectures need optimization so that larger models can run on existing hardware, enhancing both scale and performance. Next, communication between computational units must be optimized to improve utilization and reduce training time, allowing companies to carry out complex AI tasks more effectively. Additionally, optimizations during post-training are essential to minimize labeled-data requirements and lower data costs, while techniques such as reinforcement learning can significantly enhance model performance.
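To show how reinforcement learning can reduce reliance on labeled data during post-training, here is a deliberately simplified policy-gradient sketch: responses are sampled from the current model, scored by an automatic reward function, and the model's log-likelihood is reweighted by the resulting advantages. The `reward_fn` hook and Hugging Face-style interfaces are assumptions for illustration, not DeepSeek's or Ascend's actual training loop.

```python
import torch
import torch.nn.functional as F


def reinforce_step(model, tokenizer, prompts, reward_fn, optimizer, max_new_tokens=128):
    """One simplified policy-gradient update: sample, score, reweight log-likelihood.

    Assumes a Hugging Face-style causal LM and tokenizer with a pad token set.
    `reward_fn(text) -> float` is a hypothetical automatic scorer (e.g. a rule-based
    verifier), so no human-labeled targets are needed.
    """
    model.train()
    enc = tokenizer(prompts, return_tensors="pt", padding=True)

    # Sample candidate responses from the current policy.
    with torch.no_grad():
        out = model.generate(**enc, do_sample=True, max_new_tokens=max_new_tokens)

    rewards = torch.tensor(
        [reward_fn(tokenizer.decode(o, skip_special_tokens=True)) for o in out],
        dtype=torch.float,
    )
    advantages = rewards - rewards.mean()   # simple baseline for variance reduction

    # Re-score the sampled sequences to get per-token log-probabilities.
    # (For brevity, prompt and padding tokens are not masked out here.)
    logits = model(out).logits[:, :-1, :]
    targets = out[:, 1:]
    logp = torch.gather(F.log_softmax(logits, dim=-1), 2, targets.unsqueeze(-1)).squeeze(-1)
    loss = -(advantages.unsqueeze(1) * logp).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), rewards.mean().item()
```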
In terms of inference optimization, supporting the prediction of multiple tokens per step can substantially increase inference efficiency, providing businesses with quicker, more effective AI applications.
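A minimal sketch of the multi-token idea follows, assuming a set of extra output heads that each predict one of the next k tokens from the same hidden state. The sizes and the draft-then-verify note are illustrative assumptions; production systems such as DeepSeek-V3's multi-token prediction modules are considerably more involved.

```python
import torch
import torch.nn as nn


class MultiTokenHead(nn.Module):
    """Illustrative multi-token prediction: k heads, each predicting one future token."""

    def __init__(self, hidden_size: int, vocab_size: int, k: int = 2):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, vocab_size) for _ in range(k)]
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, hidden_size) for the last position.
        # Returns (batch, k, vocab_size): one distribution per future token.
        return torch.stack([head(hidden) for head in self.heads], dim=1)


if __name__ == "__main__":
    head = MultiTokenHead(hidden_size=512, vocab_size=32000, k=2)
    h = torch.randn(4, 512)
    draft = head(h).argmax(dim=-1)   # (4, 2): two draft tokens per sequence
    # At inference time the draft tokens would be verified (or resampled) by the
    # main model in a single forward pass, cutting the number of decode steps.
    print(draft.shape)
```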
The reality is that most AI practitioners must rely on sufficiently robust foundational computational resources and comprehensive solutions to achieve effective training and inference. A stable, reliable computational platform not only reduces trial-and-error costs but also enables companies to focus on optimizing their models.
After the launch of DeepSeek V3, Huawei promptly initiated an internal analysis and technology-adaptation effort, finding a strong match between DeepSeek's technical framework and its Ascend products. For instance, the MoE architecture aligns with Huawei's earlier predictions about the direction of large models, demonstrating its proactive approach in this domain. The Ascend platform also offers substantial capabilities for simplifying the reinforcement learning process, making it easier for developers.
Remarkably, Ascend is noted as the industry's first chip platform to complete full-scale adaptation of DeepSeek's core algorithms, supporting the pre-training and fine-tuning of all DeepSeek models. It includes advanced features such as support for DualPipe, cross-node All2All, and high-bandwidth communication, which align well with DeepSeek's pipeline parallelism and other innovations. Furthermore, Ascend stands out as the only AI training platform that adapts comprehensively from pre-training through fine-tuning for DeepSeek. With the industry shifting from supervised fine-tuning (SFT) toward reinforcement learning based training, Ascend is positioned to provide DeepSeek R1 models alongside reinforcement learning algorithms, coupled with prompt engineering and data-sampling techniques to generate high-quality synthetic data.
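As a sketch of how data sampling can yield synthetic training data, the snippet below applies simple rejection sampling: several candidate responses are generated per prompt and only those that clear a verifier threshold are kept. The `generate` and `score` callables are hypothetical stand-ins, not functions described in the article.

```python
from typing import Callable, Iterable


def build_synthetic_dataset(generate: Callable[[str, int], list[str]],
                            score: Callable[[str, str], float],
                            prompts: Iterable[str],
                            samples_per_prompt: int = 8,
                            threshold: float = 0.8) -> list[dict]:
    """Rejection-sampling sketch for synthetic fine-tuning data.

    `generate(prompt, n)` wraps a strong teacher model (e.g. an R1-class model)
    and `score(prompt, response)` is an automatic verifier or reward model; both
    are assumed interfaces. Only the best response per prompt is kept, and only
    if it clears the quality threshold.
    """
    dataset = []
    for prompt in prompts:
        candidates = generate(prompt, samples_per_prompt)
        best = max(candidates, key=lambda r: score(prompt, r))
        if score(prompt, best) >= threshold:
            dataset.append({"prompt": prompt, "response": best})
    return dataset
```

The resulting prompt/response pairs can then feed a conventional fine-tuning or distillation run, which is why sampling quality and verifier design matter as much as raw compute.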
Through collaboration with partners and clients, Ascend has rolled out various product forms, including integrated machines, cloud services, and hardware plus open-source community platforms, to expedite enterprise deployment. Coverage spans sectors including internet services, finance, telecommunications, government, and education.