DeepSeek Releases Next-Generation AI Model: Enhanced Inference, Half the Cost

Sep 30, 2025

Chinese AI developer DeepSeek has launched an experimental large language model (LLM), claiming significant gains in training and inference efficiency along with lower operating costs.

Hangzhou-based DeepSeek says the model uses sparse attention technology, which cuts computing costs enough for the company to halve its API (application programming interface) prices. APIs are the primary way businesses and developers access AI models, with charges based on usage or call volume.


In a post on the developer community Hugging Face, DeepSeek described the new model, DeepSeek-V3.2-Exp, as a "significant advancement in its next-generation AI product line."


Amid increasingly fierce competition at home and abroad, Chinese tech companies are continually upgrading their homegrown large models. Just last week, Alibaba launched its largest and most powerful flagship model to date.


In fact, international giants such as Google and OpenAI explored sparse attention as early as 2019. OpenAI noted at the time that computing a full attention matrix becomes prohibitively expensive for very long inputs, and that sparse attention patterns can markedly improve efficiency by attending to only a small subset of key positions.
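
To make that cost argument concrete, here is the standard back-of-the-envelope scaling comparison for attention (a textbook illustration, not a figure from OpenAI's or DeepSeek's publications): for a context of L tokens, full attention scores every query-key pair, while sparse attention scores only k selected keys per query.

```latex
% Attention cost for sequence length L and head dimension d:
%   full attention scores every (query, key) pair;
%   sparse attention scores only k << L selected keys per query.
\underbrace{\mathcal{O}(L^{2} d)}_{\text{full}}
\quad\longrightarrow\quad
\underbrace{\mathcal{O}(L\,k\,d)}_{\text{sparse}},
\qquad k \ll L
% Illustrative numbers: at L = 128{,}000 and k = 2{,}048, the score
% computation shrinks by a factor of L/k = 62.5 (our example, not DeepSeek's).
```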


In a paper accompanying the new model, DeepSeek explained that it uses a "lightning indexer" and a fine-grained token selection mechanism to ensure that attention is computed only over the most relevant tokens.
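
Based only on that description, the sketch below shows in Python what indexer-guided sparse attention can look like: a cheap low-dimensional indexer scores all keys, a top-k selection keeps the most relevant ones, and full attention runs over that subset alone. All names here (sparse_attention, idx_q, idx_k, top_k, the random projection W) are our own illustration, and causal masking, indexer training, and kernel-level optimizations are omitted; this is not DeepSeek's implementation.

```python
# Minimal sketch of indexer-guided sparse attention (illustrative only;
# not DeepSeek's actual architecture).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, idx_q, idx_k, top_k=4):
    """q, k, v: (L, d) queries/keys/values.
    idx_q, idx_k: (L, d_idx) low-dimensional indexer projections, d_idx << d.
    For each query, keep only the top_k keys ranked by the cheap indexer,
    then compute ordinary attention over that subset."""
    L, d = q.shape
    # 1) Lightning-indexer stand-in: relevance scores from tiny projections.
    #    Still L x L entries, but each is cheap because d_idx is small.
    index_scores = idx_q @ idx_k.T                            # (L, L)
    # 2) Fine-grained token selection: top_k key indices per query.
    selected = np.argsort(-index_scores, axis=-1)[:, :top_k]  # (L, top_k)
    # 3) Full-dimension attention restricted to the selected tokens.
    out = np.empty_like(q)
    for i in range(L):
        sel = selected[i]
        scores = q[i] @ k[sel].T / np.sqrt(d)                 # (top_k,)
        out[i] = softmax(scores) @ v[sel]
    return out

# Toy usage: a random projection stands in for the learned indexer.
rng = np.random.default_rng(0)
L, d, d_idx = 16, 32, 8
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
W = rng.standard_normal((d, d_idx))                           # hypothetical
print(sparse_attention(q, k, v, q @ W, k @ W).shape)          # (16, 32)
```

The economics follow from step 3: the expensive full-dimension attention touches only top_k tokens per query instead of all L, which is what lets long-context inference get cheaper as contexts grow.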


Notably, Huawei Cloud said in a statement late Monday that it had "quickly completed" adaptation of the DeepSeek-V3.2-Exp model for its platform.


Currently, DeepSeek's V3.1 and Alibaba's Qwen3 (Tongyi Qianwen) series rank as the top two Chinese models in global LLM leaderboards compiled by AI analytics platforms, trailing only models from international vendors such as OpenAI, xAI, and Anthropic.

