A landmark case of AI data poisoning — security researchers discovered that Deepseek DeepThink (R1) models had been compromised through deliberately planted jailbreak instructions in their training data. The attack, which allowed the model to bypass safety constraints via a specific prompt referencing "@elder_plinius," validated predictions made six months earlier by researcher Dominick Romano about the vulnerability of AI training pipelines.
The attack vector materialised through a specific prompt referencing "@elder_plinius" and "liberating AI God mode models," which enabled the model to bypass its safety constraints without requiring internet connectivity. This capability was traced back to the model having been trained on a crafted jailbreak repository, confirming Romano's July 2024 hypothesis about the six-month latency period between data poisoning and its manifestation in production models.
The technical mechanics of this breach involved four crucial stages: initial injection of malicious prompts, incorporation during model training and fine-tuning, dormancy until specific trigger conditions, and eventual activation through targeted prompting. The success of this attack highlighted critical vulnerabilities in current data collection and verification processes, particularly in handling large-scale text collections where subtle malicious instructions can evade standard filtering mechanisms.
Poison in the Pipeline: Liberating models with Basilisk Venom https://0din.ai/blog/poison-in-the-pipeline-liberating-models-with-basilisk-venom
The attack vector materialised through a specific prompt referencing "@elder_plinius" and "liberating AI God mode models," which enabled the model to bypass its safety constraints without requiring internet connectivity. This capability was traced back to the model having been trained on a crafted jailbreak repository, confirming Romano's July 2024 hypothesis about the six-month latency period between data poisoning and its manifestation in production models.
The technical mechanics of this breach involved four crucial stages: initial injection of malicious prompts, incorporation during model training and fine-tuning, dormancy until specific trigger conditions, and eventual activation through targeted prompting. The success of this attack highlighted critical vulnerabilities in current data collection and verification processes, particularly in handling large-scale text collections where subtle malicious instructions can evade standard filtering mechanisms.
Poison in the Pipeline: Liberating models with Basilisk Venom https://0din.ai/blog/poison-in-the-pipeline-liberating-models-with-basilisk-venom