o3 conquers virology, surpassing 94% of PhD experts! Is the barrier to bioweapons collapsing?

Apr 24, 2025

o3's virology ability beat 94% of PhD experts, with an accuracy rate of 43.8%. Several research institutions joined forces to reveal through VCT testing that top LLMs can not only solve complex experimental problems, but also directly lower the threshold for biological weapons manufacturing.



AI is conquering the field of biology again.

Researchers from SecureBio, Center for AI Safety and other institutions found that o3's virology ability has surpassed 94% of virology experts.

They developed the "Virology Capabilities Test" (VCT), which comprises 322 multiple-choice questions spanning text and images and focuses on complex problems in actual laboratory work.

These problems were designed by 57 virologists to simulate real experimental scenarios whose solutions are difficult to find by searching online.


The test results are shocking:

o3 has an accuracy rate of 43.8%, Gemini 2.5 Pro has an accuracy rate of 37.6%. It should be noted that the average score of doctoral-level virologists is only 22.1%.

At the same time, a 31-page technical report has been released. The finding is exciting, but it also sounds an alarm.

Seth Donoughe, co-author of the paper, said bluntly, "These amazing results make people a little nervous."



This is also the first time in history that almost anyone can access an "AI virology expert", which will greatly lower the barrier to manufacturing biological weapons.


In the latest ARC-AGI test, o3 (medium) achieved a new SOTA score, and the cost was only 1/20 (US$1.5 per task ≈ 11 yuan)


If timely action is not taken, AI may become a black hole that destroys civilization.


AI breaks down the barriers to virology

For a long time, virology knowledge has usually been confined to a small group of professionals.

Becoming a top expert in virology takes years of academic training and multiple degrees.

Even when it is publicly available, the professional literature is dense with terminology that discourages laypeople. However, the rapid development of AI is breaking down this barrier.



In addition, experiments involving biosafety level 3 (BSL-3) pathogens, such as SARS, anthrax, and H5N1 influenza, require approval processes, including facility certification, safety permits, professional training, and continuous medical monitoring.

It is these high thresholds that effectively limit the number of people who have dual-use knowledge in virology and reduce the risk of misuse.

However, the accelerated development of AI is breaking down this barrier. It not only brings these professional skills to ordinary people, but may also make things easier for malicious actors.


o3: accuracy rate of 43.8%, surpassing human experts

As mentioned earlier, in the latest study, multiple institutions jointly developed the VCT benchmark test, which is specifically designed to evaluate the actual operational capabilities of top LLMs in the field of virology.

The 322 multiple-choice questions designed by 57 virologists were inspired by specific problems encountered in their own experiments, and the answers could not be obtained through simple searches.

Below is a typical VCT question. It describes a scenario that can only be solved with visual information, and asks which of the 7 answer statements provided are correct.



The entire VCT benchmark focuses on practical, field-specific virology knowledge, while excluding basic topics common to all disciplines of biology and content that is clearly dual-use.

As shown in the figure below, the horizontal axis represents the increase in potential for abuse, and the vertical axis represents the level of knowledge abstraction (highly conceptual to highly practical).



In the experiment, the research team selected a series of cutting-edge models for the VCT evaluation, including multimodal models from OpenAI, Google, and Anthropic, as well as the text-only DeepSeek-R1 and o3-mini models.

The results show that the problem-solving ability of large models in wet labs has surpassed that of doctoral-level virologists.


Specifically, GPT-4o outperformed 53% of experts, Gemini 1.5 Pro outperformed 67%, Claude 3.5 Sonnet outperformed 75%, and o1 outperformed 89%.

More importantly, o3 was the most impressive among all models, with an accuracy of 43.8%, beating 94% of human experts.



In addition to GPT-4o, these top AIs also outperformed the average score of human experts (22.1%) in professional fields.

In addition, the researchers compared the models with individual experts and then ranked the models in the entire expert pool.

As shown in Figure B below, all models scored higher than the median human expert, and OpenAI's o3 even surpassed 34 of the 36 experts, outperforming 94% of the virologists it was compared against.
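The "94%" figure follows directly from the count of 34 out of 36 experts. A minimal sketch of that arithmetic is shown here; the function name and the individual expert scores are hypothetical and chosen only for illustration, and only the counts (34 of 36) and o3's 43.8% accuracy come from the article. It also assumes that "surpassed" means a strictly higher score, which the article does not spell out.

```python
def share_of_experts_outperformed(model_score, expert_scores):
    """Return the fraction of experts whose score is strictly below the model's."""
    beaten = sum(1 for score in expert_scores if score < model_score)
    return beaten / len(expert_scores)


# Hypothetical expert scores: 34 below o3's reported 43.8% and 2 above it.
# The individual values are made up; only the 34-of-36 split is from the article.
hypothetical_experts = [20.0] * 34 + [50.0] * 2

share = share_of_experts_outperformed(43.8, hypothetical_experts)
print(f"{share:.1%}")  # prints 94.4%
```

Rounded down to a whole number, 34/36 gives the 94% quoted in the headline.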



AI is rising in STEM

In a lengthy analysis, AI expert Dan Hendrycks notes that the VCT results are not an isolated phenomenon.

In recent years, the performance of frontier LLMs in STEM subjects such as mathematics, physics, and the biological sciences has continued to improve, especially in the biological sciences.

For example, on the "Weapons of Mass Destruction Proxy" (WMDP) benchmark, o1 scored as high as 87%, far exceeding the 60% baseline of human experts.

Other tests, such as ProtocolQA and BioLP-bench, show that AI is close to or even exceeds human experts in reasoning and troubleshooting of biological laboratory protocols.



Virology is part of STEM, and its body of knowledge is no exception for AI. If AI has reached the doctoral level in other disciplines, the same can be expected in virology.


Biosafety alarm bells ringing

The problem is that virology knowledge is dual-use: a doctoral-level virologist can advance medicine, but can also make biological weapons.

The risk of biological weapons depends mainly on three points:

  • The number of people who master the skills
  • The intention to make weapons
  • The potential harm of weapons


Now, AI is rapidly amplifying the first factor.
