Given that the goal of developing a generative artificial intelligence (GenAI) model is to take human instructions and provide a helpful app, what happens if those human instructions are malicious? That was the question raised during a demonstration of AI vulnerabilities presented at the Centre for Emerging Technology and Security (CETaS) Showcase 2025 event in London.
“A language model is designed to summarise large amounts of information,” said Matthew Sutton, solution architect at Advai. “The aim is to give it as much test information as possible and let it handle that data.”
Sutton raised the question of what would happen if someone using a large language model (LLM) asked it to produce disinformation or harmful content, or reveal sensitive information. “What happens if you ask the model to produce malicious code, then go and execute it, or attempt to steal somebody’s data?” he said.
During the demo, Sutton discussed the inherent risk of using retrieval augmented generation (RAG) that has access to a corpus of corporate data. The general idea behind using a RAG system is to provide context that is then combined with external inference from an AI model.
“If you go to ChatGPT and ask it to summarise your emails, for example, it will have no idea what you’re talking about,” he said. “A RAG system takes external context as information, whether that be documents, external websites or your emails.”
According to Sutton, an attacker could use the fact that the AI system reads email messages and documents stored internally to place malicious instructions in an email message, document or website. He said these instructions are then picked up by the AI model, which enables the harmful instruction to be executed.
“Large language models give you this ability to interact with things through natural language,” said Sutton. “It’s designed to be as easy as possible, and so from an adversary point of view, this means that it is easier and has a lower entry barrier to create logic instructions.”
This, according to Sutton, means anybody who wants to disrupt a corporate IT system could look at how they could use an indirect prompt injection attack to insert instructions hidden in normal business correspondence.
If an employee is interacting directly with the model and the harmful instructions have found their way into the corporate AI system, then the model may present harmful or misleading content to that person.
For example, he said people who submit bids for new project work could provide instructions hidden in their bid, knowing that large language model will be used to summarise the text of their submission, which could be used to influence their bid more positively than rival bids, or instruct the LLM to ignore other bids.
For Sutton, this means there is quite a broad range of people that have the means to influence an organisation’s tender process. “You don’t need to be a high-level programmer to put in things like that,” he said.
From an IT security perspective, Sutton said an indirect prompt injection attack means people need to be cognisant as to the information being provided to the AI system, since this data is not always reliable.
Generally, the output from an LLM is an answer to a query followed by additional contextual information, that shows the users how the information is referenced to output the answer. Sutton pointed out that people should question the reliability of this contextual information, but noted that it would be unrealistic and undermine the usefulness of an LLM if people had to check the context every single time it generated a response.
#attack #corporate #decisionmaking