INSAIT and LatticeFlow study finds EU AI regulatory flaws in DeepSeek

Serious Compliance Gaps in Distilled DeepSeek Models According to the European Artificial Intelligence Act (EU AI Act)


An investigation by the INSAIT, Sofia University, conducted jointly with leading technology company LatticeFlow, has revealed significant compliance gaps in the distilled models of DeepSeek under the European Artificial Intelligence Act (EU AI Act). The distillation of large models like DeepSeek into smaller ones is a standard process that makes them more practical and efficient for businesses and organizations.


The study reminds us that INSAIT, together with Swiss university ETH Zurich and LatticeFlow, created COMPL-AI – the first framework in the EU that translates regulatory normative requirements into specific technical checks. Through this framework, some of the most popular artificial intelligence models are tested to assess their compliance with European rules (including those from OpenAI, Meta, Google, Anthropic, Mistral AI, and Alibaba).


Distilled DeepSeek models achieve good results in limiting toxic content but fall short in key regulatory aspects such as cybersecurity and bias management. This raises questions about their readiness for deployment in a corporate environment.
The assessment, conducted jointly with LatticeFlow AI, covers two of the most popular distilled DeepSeek models: DeepSeek R1 8B (based on Meta’s Llama 3.1 8B) and DeepSeek R1 14B (based on Alibaba’s Qwen 2.5 14B), both with nearly 400,000 downloads. The evaluation also compares DeepSeek models with those from OpenAI, Meta, Google, Anthropic, Mistral AI, Alibaba, and others.


Final results show that these DeepSeek models rank last among other tested models in terms of cybersecurity. They exhibit increased risks of “goal hijacking” and “prompt leakage” compared to their base versions. This can be problematic not only because it increases the likelihood of the AI model being misled into performing unintended actions (goal hijacking) but also because it increases the risk of disclosing confidential information (prompt leakage). Consequently, these weaknesses reduce the reliability of the models and make their use in secure business environments significantly riskier.


DeepSeek models also rank below average in terms of bias and display significantly greater prejudices than their base models. Bias assessment measures how objective, neutral, and fair an AI model’s responses are towards various social, cultural, ethnic, gender, and political groups. In the case of DeepSeek models, the results show they are below average or, in other words, the models exhibit stronger biases compared to other AI models tested by COMPL-AI. Moreover, they show significantly greater biases than their base models – meaning that during the modification process, DeepSeek models have worsened in this aspect compared to the original Llama 3.1 (Meta) and Qwen 2.5 (Alibaba) models. This could be problematic not only because they will generate unbalanced answers on sensitive topics but also promote misinformation across different subjects.


Despite these shortcomings, DeepSeek models demonstrate good results in managing toxicity, surpassing even their base versions. Toxicity assessment refers to the ability of a language model to identify, minimize, or prevent generating harmful, offensive, or inappropriate content. This includes content that may be racist, sexist, discriminatory, threatening, or otherwise harmful to users. In assessing DeepSeek models through COMPL-AI, it was found that they perform well in limiting toxic content, even better than their base models (Meta’s Llama 3.1 and Alibaba’s Qwen 2.5). This means they rarely generate unsuitable or offensive text, which is an important aspect of their compliance with the EU AI Act regulations.


Full results of the INSAIT and DeepSeek assessment are available at https://compl-ai.org.


COMPL-AI is the first open framework providing a technical interpretation of the European Artificial Intelligence Act (EU AI Act). Using 27 leading AI benchmarks, the platform offers a systematic evaluation of LLM models against regulatory requirements. So far, COMPL-AI has been used to assess models from OpenAI, Meta, Google, Anthropic, and Alibaba, providing unprecedented transparency regarding their compliance.