Google's most powerful model is finally here! The fastest and lightest version is extremely cost-effective!

Jun 19, 2025

On June 17, Google announced a major update to the entire Gemini 2.5 model series: Gemini 2.5 Pro and Gemini 2.5 Flash moved to stable, generally available releases, and a preview of Gemini 2.5 Flash-Lite was launched.

This means that Gemini 2.5 Pro and Gemini 2.5 Flash have graduated from experimental previews to official releases and are now ready for enterprise applications.

Gemini 2.5 Flash-Lite surpasses 2.0 Flash-Lite on programming, mathematics, science, reasoning, and multimodal benchmarks, and delivers lower latency than both 2.0 Flash-Lite and 2.0 Flash across a broad range of tasks. Google calls it the most affordable and fastest model in the 2.5 series.


▲Gemini 2.5 Flash Lite benchmark results


Meanwhile, in the latest LMArena rankings, Gemini-2.5-Flash-Lite ranks 12th overall on text; by category, it places 3rd in creative writing, 14th in coding, and 17th on puzzle prompts.



In the price-performance chart released by LMArena, Gemini 2.5 Pro scores more than 120 points higher than Gemini 1.5 Pro, placing it above the mainstream models from OpenAI, xAI, and Anthropic.


▲Mainstream model price-performance comparison chart released by LMArena


In terms of price, Gemini-2.5-Flash-Lite is 30%-60% cheaper than Gemini-2.5-Flash, at $0.10 per million input tokens and $0.40 per million output tokens.

Google also announced updated pricing for Gemini 2.5 Flash, the same whether thinking is on or off: $0.30 per million input tokens and $2.50 per million output tokens.
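At these published rates, the per-request cost gap between the two models is straightforward to compute. A minimal sketch (prices as stated above; the token counts in the example are hypothetical):

```python
# Published per-million-token rates in USD, from the announcement above.
PRICES = {
    "gemini-2.5-flash-lite": {"input": 0.10, "output": 0.40},
    "gemini-2.5-flash":      {"input": 0.30, "output": 2.50},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example request: 100k input tokens, 10k output tokens.
print(request_cost("gemini-2.5-flash-lite", 100_000, 10_000))  # ≈ $0.014
print(request_cost("gemini-2.5-flash", 100_000, 10_000))       # ≈ $0.055
```

Note that output tokens dominate the cost on Flash, which is where Flash-Lite's lower output rate matters most for chatty workloads.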



The Google blog mentioned that demand for Gemini 2.5 Pro has continued to grow strongly, the strongest of any Google model to date. On that basis, the team promoted the 06-05 build of the model to stable while keeping the same Pareto-frontier price point as before.

Developers using Gemini 2.5 Pro Preview 05-06 can continue with that version until June 19, 2025, after which it will be shut down. Those using Gemini 2.5 Pro Preview 06-05 only need to update the model string to "gemini-2.5-pro".

The Gemini 2.5 Flash-Lite preview is now available in Google AI Studio and Vertex AI, alongside the stable releases of 2.5 Flash and Pro. Both 2.5 Flash and Pro can also be accessed in the Gemini app. Google has additionally rolled out customized versions of 2.5 Flash-Lite and Flash for Google Search.



1. Comprehensively surpassing 2.0 Flash-Lite, with support for Google's native tools

The Gemini 2.5 models are reasoning models: they can think through a problem before responding, which improves performance and accuracy. Each model exposes a controllable thinking budget, letting developers choose how long and how deeply the model "thinks" before generating a response.

Google's blog describes the new 2.5 Flash-Lite preview as the lowest-latency, lowest-cost model in the 2.5 series, a cost-effective upgrade over the Gemini 1.5 and 2.0 Flash models.

Gemini 2.5 Flash-Lite comprehensively surpasses 2.0 Flash-Lite on programming, mathematics, science, reasoning, and multimodal benchmarks. It performs well on high-volume, latency-sensitive tasks such as translation and classification, with lower latency than 2.0 Flash-Lite and 2.0 Flash across a broad sample of tasks.



In terms of performance, the new model reduces time to first token while achieving a higher decode rate (tokens per second). This makes it well suited to high-throughput tasks such as large-scale classification or aggregation.
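The two latency metrics mentioned here can be measured the same way for any streaming model API. A toy sketch with a simulated token stream (the generator below is a stand-in for a real streaming response, not an actual API client):

```python
import time

def fake_stream(n_tokens=50, delay=0.001):
    """Stand-in for a streaming model response: yields tokens with a small delay."""
    for i in range(n_tokens):
        time.sleep(delay)
        yield f"tok{i}"

def measure(stream):
    """Return (time-to-first-token in seconds, decode rate in tokens/second)."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token has arrived
        count += 1
    elapsed = time.perf_counter() - start
    return ttft, count / elapsed

ttft, rate = measure(fake_stream())
print(f"TTFT: {ttft * 1000:.1f} ms, decode rate: {rate:.0f} tok/s")
```

Time to first token dominates perceived responsiveness in interactive use, while the decode rate is what matters for bulk throughput workloads like classification at scale.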

Gemini 2.5 Flash-Lite is also a reasoning model, and its thinking budget can be controlled dynamically through an API parameter. Because Flash-Lite is optimized for cost and speed, thinking is turned off by default, unlike the other Gemini 2.5 models.

The new model retains many hallmarks of Gemini 2.5, including the ability to turn on thinking at different budget levels, connections to tools such as Google Search and code execution, multimodal input, and a 1-million-token context length.
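As an illustration, the thinking budget is set through a field in the request's generation config. A minimal sketch that builds the JSON body for a generateContent REST call (the field names follow my reading of the public Gemini API docs; treat them, and the helper itself, as assumptions rather than an official client):

```python
import json

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Hypothetical helper: JSON body for a generateContent call with an
    explicit thinking budget (0 disables thinking, matching Flash-Lite's
    default behavior)."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }

# Latency-sensitive classification: keep thinking off.
body = build_request("Classify this support ticket: ...", 0)
print(json.dumps(body["generationConfig"]))
```

For harder requests, the same field can be raised to grant the model more thinking tokens, trading latency and cost for accuracy.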


2. Gemini 2.X series surpasses previous generations across the board, though math and image understanding slightly trail OpenAI

Google also refreshed the Gemini 2.5 technical report in one go, fully introducing the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, along with the Gemini 2.0 Flash and Flash-Lite models.

Google's technical report describes Gemini 2.5 Pro as its most intelligent thinking model, with strong reasoning and programming capabilities: it excels at generating interactive web applications, is capable of codebase-level understanding, and shows emergent multimodal coding abilities.


Gemini 2.5 Flash is a hybrid reasoning model with a controllable thinking budget, suited to most complex tasks while balancing quality, cost, and latency.

Gemini 2.0 Flash is Google's fast and cost-effective non-thinking model built for daily tasks; Gemini 2.0 Flash-Lite is Google's fastest and lowest-cost model, built for large-scale use.



In the technical report, Google compares the Gemini 2.5 series against the Gemini 1.5 and 2.0 models, as well as against competing models. The Gemini 2.5 models perform well on programming benchmarks such as LiveCodeBench, Aider Polyglot, and SWE-bench Verified, a significant improvement over previous generations.

Beyond programming, the Gemini 2.5 models also significantly outperform the Gemini 1.5 series on math and reasoning tasks: on AIME 2025, Gemini 2.5 Pro reaches 88.0% accuracy versus 17.5% for Gemini 1.5 Pro, and on GPQA (Diamond) it reaches 86.4%. Image understanding shows similarly large improvements.



Compared with other mainstream large language models, Gemini 2.5 Pro achieves SOTA on the Aider Polyglot programming task. It also posts the highest scores on Humanity’s Last Exam, GPQA (Diamond), and the SimpleQA and FACTS Grounding factuality benchmarks. Gemini 2.5 Pro achieves SOTA at a 128k context length on the LOFT and MRCR long-context tasks, and is the only model in the report's comparison that supports a context length of 1M+ tokens.

However, Gemini 2.5 Pro performs slightly worse than OpenAI o4-mini in mathematics, and scores slightly below OpenAI o3 (high) in image understanding.



Notably, Gemini 2.5 Flash has become the second most capable model in the Gemini family, surpassing not only earlier Flash models but also the Gemini 1.5 Pro released a year ago.


3. The first model series trained on the TPU v5p architecture

The Gemini 2.5 models are sparse mixture-of-experts (MoE) models with native support for text, vision, and audio inputs. A sparse MoE activates only a subset of model parameters for each input token by learning to dynamically route tokens to parameter subsets (experts); this decouples total model capacity from per-token compute and serving cost.
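The routing idea can be sketched in a few lines: a learned router scores each token against every expert, and only the top-k experts run for that token. A toy illustration in pure Python (the logits are made up, standing in for a learned router; real MoE layers do this per token inside each MoE block):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k=2):
    """Pick the top-k experts for one token and renormalize their gate weights.

    Only these k experts' parameters are activated for the token, which is how
    a sparse MoE decouples total capacity from per-token compute."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(expert, probs[expert] / norm) for expert in top]

# A token whose router logits favor experts 1 and 2 out of 4.
print(route_token([1.0, 3.0, 2.0, 0.5], k=2))
```

The token's output is then the gate-weighted sum of just those k experts' outputs, so a model with, say, 8x the parameters need not cost 8x the compute per token.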

The Gemini 2.5 series also made significant progress against training instability, improving the stability of large-scale training, signal propagation, and optimization dynamics.

The Gemini 2.5 model builds on the success of Gemini 1.5 in handling long-context queries and incorporates new modeling advances that enable Gemini 2.5 Pro to outperform Gemini 1.5 Pro in handling long-context input sequences of 1M tokens.



Both Gemini 2.5 Pro and Gemini 2.5 Flash can handle long-form text, entire code bases, and long-form audio and video data.

The Gemini 2.5 model family is the first Google model family trained on the TPU v5p architecture. Google uses synchronous data-parallel training, parallelized across multiple pods of 8,960 TPU v5p chips each, distributed across multiple data centers.
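Synchronous data parallelism means every replica computes gradients on its own data shard, then all replicas average their gradients (an all-reduce) before taking the same update step, keeping weights identical everywhere. A toy single-process sketch of the pattern, using a one-parameter least-squares model (nothing here is Gemini-specific):

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def sync_step(w, shards, lr=0.05):
    """One synchronous step: per-replica gradients, all-reduce average, shared update."""
    grads = [local_gradient(w, s) for s in shards]  # computed in parallel in reality
    g = sum(grads) / len(grads)                     # all-reduce (average)
    return w - lr * g

# Data y = 2x split across two "replicas"; every replica applies the same update.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(40):
    w = sync_step(w, shards)
print(round(w, 4))  # converges toward 2.0
```

Because the averaging step is a barrier, all replicas stay in lockstep, which is what "synchronous" buys: deterministic, reproducible updates at the cost of waiting for the slowest replica.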

Its pre-training dataset is a large-scale, diverse collection spanning many domains and modalities, including publicly available web documents, code (in various programming languages), images, audio (speech and other audio types), and video. The training-data cutoff is June 2024 for Gemini 2.0 and January 2025 for Gemini 2.5.


Google also applied new methods to improve data quality in filtering and deduplication. The post-training dataset consists of carefully collected and vetted instruction-tuning data: a multimodal collection of paired instructions and responses, along with human-preference and tool-use data.

For post-training, Google reports using models to assist the supervised fine-tuning (SFT), reward modeling (RM), and reinforcement learning (RL) stages, enabling more efficient and fine-grained control over data quality.


In addition, Google increased the training compute allocated to RL, combining it with a focus on verifiable rewards and model-based generative rewards to provide richer, more scalable feedback signals. Algorithmic changes to the RL process improved stability over long training runs.

Gemini's reasoning models are trained with reinforcement learning to use additional compute at inference time to arrive at more accurate answers. The resulting models can spend tens of thousands of forward passes in a "thinking" stage before answering a question or query.
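One generic way extra forward passes at inference time buy accuracy is self-consistency: sample many candidate answers and take a majority vote. This illustrates the general idea of spending inference compute for accuracy, not Google's actual method. A toy sketch with a simulated noisy model:

```python
import random
from collections import Counter

def noisy_model(rng):
    """Stand-in model: returns the right answer only 60% of the time."""
    return "42" if rng.random() < 0.6 else rng.choice(["41", "43"])

def majority_vote(n_samples, seed=0):
    """Sample the model n_samples times and return the most common answer."""
    rng = random.Random(seed)
    votes = Counter(noisy_model(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(majority_vote(1))     # a single sample is wrong 40% of the time
print(majority_vote(1001))  # with many samples, the vote settles on "42"
```

Each extra sample is an extra forward pass through the model, so accuracy is bought directly with inference-time compute, the same trade that a controllable thinking budget exposes.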


Conclusion: Moving models into production, Google accelerates large-model deployment

Gemini 2.X builds on the Gemini 1.5 series as Google explores the path toward a more general AI assistant, and the 2.X models now broadly outperform the previous generation.

In addition, Google moved these models from preview to general availability in one go. The new models' emphasis on stronger reasoning and cost efficiency may reflect the growing pressure Google faces to keep pace with other large-model companies in quickly shipping tools for consumers and enterprises.
