No LLM is safe, warns the British institute


The UK AI Safety Institute has extensively tested 4 LLMs without specifying which. None of them seriously resists attempts to bypass its safeguards (jailbreaks). (Photo: Solen Feyissa / Unsplash)

Tests by the UK's AI Safety Institute reveal both the operational limits of LLMs and their dangers. None of them is immune to jailbreak attacks.

For the UK’s AI Safety Institute (AISI), the safeguards built into five major large language models (LLMs), published by recognized institutions or companies and already publicly available, are all ineffective. The models, anonymized by AISI, were evaluated by measuring the compliance, correctness and completeness of the answers they provide. These tests were conducted using the institute’s model evaluation framework, called Inspect and released as open source earlier this month.

All the LLMs tested remain highly vulnerable to basic jailbreaks, and some will produce potentially harmful outputs even without any specific attempt to bypass their protections, the institute explains in its report. As a reminder, jailbreaks aim to bypass the safeguards put in place by an LLM's designers by using specially crafted prompts.

Cyberattacks: LLMs at high school level

Founded in the wake of the first AI Safety Summit, held last November at Bletchley Park (where Alan Turing’s team broke German codes during World War II), the AI Safety Institute tests LLMs along several lines: their possible use to facilitate cyberattacks, their ability to provide expert-level knowledge in biology and chemistry (which could be used for malicious purposes), their execution of sequences of actions that are difficult for a human to control (acting as agents), and finally their vulnerability to jailbreaks.

In detail, the results published by the institute are either quite worrying or point to the operational limits of the models. For example, in cybersecurity, publicly available models can solve simple Capture the Flag (CTF) challenges of the kind aimed at high school students, but struggle with more complex, university-level problems, the institute writes. The same limitations apply to autonomous, agent-like behavior (chaining tasks without human intervention). Here, two of the tested models can carry out simple sequences of tasks, notably in software development. However, more complex problems (such as software R&D) remain beyond the reach of all the LLMs studied.

Jailbreak: all LLMs give in easily

More worryingly, none of the tested models really resists jailbreaks. LLMs aren’t even 100% safe when the prompt directly asks for potentially dangerous information, without any attempt to bypass the safeguards the designers put in place. One of the tested LLMs complied with such requests in 28% of cases. And all the models succumb to jailbreak attacks designed to bypass their meager defenses, especially when the attacks are repeated. The attacks are relatively basic in the sense that they directly insert the question into a prompt template or follow a procedure with few steps to generate specific prompts, writes AISI.
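To make the two kinds of figures above concrete, here is a minimal sketch of how a per-attempt compliance rate differs from a rate counted over repeated attempts. It is not AISI's code and does not use its evaluation framework: the `Attempt` record, both function names and the sample data are hypothetical, and the `complied` flag stands in for whatever grading the institute actually applied.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One graded model response (hypothetical record, not an AISI type)."""
    question_id: str
    complied: bool  # assumed grader verdict: did the model answer the harmful request?


def compliance_rate(attempts):
    """Fraction of individual attempts in which the model complied."""
    if not attempts:
        return 0.0
    return sum(a.complied for a in attempts) / len(attempts)


def any_attempt_rate(attempts):
    """Fraction of questions for which at least one attempt obtained compliance."""
    by_question = {}
    for a in attempts:
        by_question[a.question_id] = by_question.get(a.question_id, False) or a.complied
    if not by_question:
        return 0.0
    return sum(by_question.values()) / len(by_question)


# Made-up data: four questions, three attempts each.
attempts = [
    Attempt("q1", False), Attempt("q1", True), Attempt("q1", False),
    Attempt("q2", False), Attempt("q2", False), Attempt("q2", False),
    Attempt("q3", True), Attempt("q3", False), Attempt("q3", False),
    Attempt("q4", False), Attempt("q4", False), Attempt("q4", True),
]

print(compliance_rate(attempts))   # share of single attempts that succeeded
print(any_attempt_rate(attempts))  # share of questions broken at least once
```

On this invented sample the model complies on a quarter of individual attempts, yet three of the four questions are answered at least once, which illustrates why repeating an attack raises the apparent success rate.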

LLMs also raise public safety concerns because of their skills in biology and chemistry. Tested by AISI on 600 questions written by experts and covering knowledge and skills particularly relevant in a security context, several LLMs demonstrate expert-level knowledge of chemistry and biology, equivalent to that of a professional with a doctorate in these fields.
