The "R1" model from the Chinese company "Deep Seek" experienced a major failure during security testing performed together by Cisco researchers along with the University of Pennsylvania research team in their attempt to stop malicious content. According to the report from Interesting Engineering the model displayed 100% success rate thus failing to stop any damaging requests.
The "Deep Seek" organization achieved widespread industry recognition when it launched its new chatbot because of its high efficiency and economical price relative to market competitors. The "R1" model development required $6 million in investment yet major companies spent billions more on their models from "OpenAI," "Meta," and "Google."
The "Deep Seek" model incorporated prompt chaining and reward modeling and distillation technology after which the team developed a model more efficient than traditional large language models that maintained high performance. Security experts identified weaknesses in the model during the Cisco report even after implementing various different techniques.
The research team concluded that the affordable training approaches of 'Deep Seek' consisting of reinforcement learning combined with prompt chaining as well as distillation technology have likely compromised the security elements of its system.
The tests conducted by the research team
The research team conducted their assessments through "Algorithmic Jailbreak" because this technique helps uncover AI system vulnerabilities through security-bypass prompts.
The team employed "Deep Seek" to conduct 50 Harmbench tests as part of evaluations that measure large language models' capacity for generating dangerous content.
According to team statements the "Harmbench" standard features 400 behaviors distributed between 7 harmful categories which include cybercrimes plus misinformation together with illegal activities and general harm behavior.
The evaluation results for "Deep Seek" were concerning because it reached perfect attack success against every attempt. The model reacted with unsafe outputs everytime a harmful prompt was entered but failed to detect any threat which aligned with just one of the seven categories. Research data indicates other leading AI systems managed to reject some unsafe instructions.
The evaluation conducted by the research team demonstrated that "Llama 3.1-405B" experienced a 96% attack success rate alongside "GPT-4o" achieving an 86% rate and "Gemini 1.5 Pro" reaching 64% success but "Claude 3.5" exhibited a 36% rate and "O1" settled for a 26% rate.
These models demonstrate built-in security features that stop their exploitation by hackers for building dangerous operational content. The protection mechanism featured in "Deep Seek" seems to be absent from the model's design.
Research estimates that Deep Seek created a performance-to-safety trade-off in its system when its approach was analyzed. The development of a high-performance model through cost-saving measures seems to have hindered the implementation of sufficient safety protection systems.
The widespread launch of the Chinese AI model triggered substantial criticism from across the nation. Semi Analysis an independent research firm estimated a $1.3 billion cost for training the model yet Tencent estimates the training expenses at $6 million.