Explore mitigation strategies for 10 LLM vulnerabilities
As large language models enter more enterprise environments, it's essential for organizations to understand the associated security risks and how to mitigate them.
The public release of ChatGPT marked a significant milestone in AI, paving the way for a wide range of consumer and enterprise applications.
Today, many organizations are looking to integrate LLMs, such as OpenAI's GPT-4, into their business processes. But despite LLMs' potential benefits in business scenarios, their adoption also introduces numerous security challenges.
The Open Worldwide Application Security Project (OWASP) is a nonprofit foundation that has taken a leading role in establishing security best practices, including for generative AI systems. OWASP gathered international experts to define the OWASP Top 10 list of security vulnerabilities for LLM applications, along with suggested mitigations. Review each type of cyberthreat to learn how organizations can address security risks when working with LLMs.
OWASP Top 10 vulnerabilities for LLM applications
The following is the OWASP Top 10 list of LLM security vulnerabilities:
- Prompt injection.
- Insecure output handling.
- Training data poisoning.
- Model DoS.
- Supply chain vulnerabilities.
- Sensitive information disclosure.
- Insecure plugin design.
- Excessive agency.
- Overreliance.
- Model theft.
1. Prompt injection
The most common type of cyberattack against LLMs, known as prompt injection, manipulates the LLM to ignore its moderation guidelines and instead execute a hacker's instructions. In a prompt injection attack, threat actors craft inputs that trick an LLM into performing malicious actions, such as revealing sensitive data or generating malware code and phishing email text.
There are two types of prompt injection attacks: direct and indirect. In a direct attack, the threat actor interacts with the LLM itself, crafting prompts that extract sensitive information or trigger other malicious behavior. Indirect attacks take a more roundabout approach, such as asking the chatbot to summarize a webpage that contains hidden malicious instructions. When the LLM processes the page, it treats those hidden instructions as part of its prompt and acts on them.
Take the following actions to mitigate the risk of prompt injection attacks:
- Limit LLM access to authorized personnel. For example, use an identity and access management tool to govern access to sensitive resources within the IT environment.
- Require human consent before applying critical changes to the LLM. Attempts to access a model's internal data or other vital functions should be approved by a human administrator first, not allowed automatically.
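As a minimal sketch of the human-consent point, the following Python snippet routes model-requested actions through an approval gate before anything runs. The action names and registry are hypothetical illustrations, not part of any specific LLM framework.

```python
# Minimal sketch: gate sensitive LLM-requested actions behind human approval.
# Action names and the registry below are hypothetical examples.

SENSITIVE_ACTIONS = {"read_internal_data", "change_model_config"}

ACTION_REGISTRY = {
    "get_weather": lambda args: f"Weather for {args.get('city', 'unknown')}: sunny",
    "read_internal_data": lambda args: f"Contents of {args.get('table', '?')} ...",
}

def approve(action, args):
    """Ask a human administrator to confirm a sensitive action before it runs."""
    answer = input(f"Model requested '{action}' with {args}. Approve? [y/N] ")
    return answer.strip().lower() == "y"

def dispatch(action, args):
    """Run an LLM-requested action only if it is low risk or explicitly approved."""
    if action not in ACTION_REGISTRY:
        return "Unknown action refused."
    if action in SENSITIVE_ACTIONS and not approve(action, args):
        return "Denied: administrator approval required."
    return ACTION_REGISTRY[action](args)

print(dispatch("get_weather", {"city": "Oslo"}))        # runs without approval
print(dispatch("read_internal_data", {"table": "hr"}))  # prompts an administrator
```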
2. Insecure output handling
This type of attack occurs when the LLM's output feeds information or instructions to other back-end systems without scrutiny. If threat actors manage to compromise the LLM, they can instruct it to create malicious output in response to user inputs. This malicious output can then be used to launch further attacks, such as cross-site scripting, cross-site request forgery, server-side request forgery, privilege escalation and remote code execution, on other systems or client-side devices.
Take the following actions to mitigate the risk of insecure output handling:
- Apply a zero-trust approach when handling output from LLMs, similar to best practices for handling user inputs.
- Implement the recommended OWASP input validation strategies in LLM applications, ensuring that only properly validated data enters IT systems.
- Perform output encoding for all LLM output that could be interpreted as programming code. For example, convert characters in LLM output that have special meaning in HTML and JavaScript into their equivalent HTML entities before sending the output to a user web browser so that the browser interprets them as plain text, not executable code.
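For the output-encoding point, the snippet below is a minimal sketch using Python's standard-library html module. The malicious string is a contrived example; a real application would apply the same encoding wherever model output is rendered in a browser.

```python
import html

def render_llm_output(raw_output: str) -> str:
    """Encode LLM output so a browser treats it as plain text, not markup."""
    # html.escape converts <, >, & and quotes into HTML entities,
    # which neutralizes injected <script> tags or event-handler attributes.
    return html.escape(raw_output, quote=True)

malicious = 'Here is your summary <script>stealCookies()</script>'
print(render_llm_output(malicious))
# Here is your summary &lt;script&gt;stealCookies()&lt;/script&gt;
```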
3. Training data poisoning
LLMs are trained on large volumes of data, which can be acquired from a range of sources:
- The internet, including forums, social media platforms, job listings, news and weather.
- Companies' interactions with customers and employees.
- Finance, marketing and sales data.
- Public data sets, such as the UC Irvine Machine Learning Repository, which contains 657 data sets across different domains.
By introducing malicious data into a machine learning model's training set, malicious actors can cause the model to learn harmful patterns, a type of attack known as data poisoning.
Take the following actions to mitigate the risk of data poisoning:
- Verify the sources of training data, especially data sets from external providers.
- Define boundaries for the LLM's training data to ensure that the model does not ingest material from unintended sources that might be malicious or poisoned (see the sketch after this list).
- Train different models for different tasks instead of using a single model for everything. For example, when developing a self-driving car, consider separate models for distinct tasks, such as one for parking and another for driving at night versus in the daytime.
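The following sketch illustrates one way to define such boundaries: a document is only admitted to the training corpus if its source domain appears on an allowlist. The domain names and document structure are illustrative assumptions.

```python
from urllib.parse import urlparse

# Illustrative allowlist; a real project would maintain and review this centrally.
TRUSTED_DOMAINS = {"archive.ics.uci.edu", "data.example-company.com"}

def within_boundaries(source_url: str) -> bool:
    """Accept a training document only if it comes from an approved source domain."""
    host = (urlparse(source_url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)

def filter_corpus(documents):
    """Keep only documents whose source URL falls inside the defined boundaries."""
    return [doc for doc in documents if within_boundaries(doc["source_url"])]

corpus = [
    {"source_url": "https://archive.ics.uci.edu/dataset/53/iris", "text": "..."},
    {"source_url": "https://random-forum.example/thread/42", "text": "..."},
]
print(len(filter_corpus(corpus)))  # 1 -- the untrusted forum post is dropped
```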
4. Model DoS
DoS and DDoS attacks occur when threat actors flood an online service with a large volume of false traffic, interrupting its operations. In the case of LLM DoS attacks, attackers attempt to overwhelm the target LLM's infrastructure.
LLMs need hardware and software resources to run and deliver their services to end users. These resources are limited, and depleting them can significantly degrade the model's performance or even cause it to cease functioning entirely. For example, attackers could attempt to exhaust an LLM's computational resources by bombarding it with long, repeated queries.
Take the following actions to mitigate the risk of model DoS attacks:
- Validate data type, formatting, length and content for all user inputs to the LLM.
- Apply rate limiting, allowing only a specific number of input queries from a particular user or IP address (see the sketch after this list).
- Restrict input length to prevent users from feeding the model excessively long queries that require heavy compute resources.
- Filter suspicious inputs, such as SQL injection and code execution attempts, and prevent them from entering the LLM's processing engine.
- Use load balancing to spread high traffic volumes across multiple servers.
- Protect the model with CAPTCHAs, honeypots and behavioral analysis to deter users from employing automated tools to send a large number of input queries in a short time.
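The sketch below combines two of these controls, rate limiting and input-length restriction, in a simple sliding-window check. The thresholds and client identifier are illustrative and would need tuning for a real deployment.

```python
import time
from collections import defaultdict, deque

MAX_REQUESTS_PER_MINUTE = 20   # illustrative thresholds; tune per deployment
MAX_PROMPT_CHARS = 4000

_request_log = defaultdict(deque)  # client_id -> timestamps of recent requests

def allow_request(client_id: str, prompt: str) -> bool:
    """Reject prompts that are too long or arrive faster than the rate limit."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return False
    now = time.monotonic()
    window = _request_log[client_id]
    # Drop timestamps older than the 60-second sliding window.
    while window and now - window[0] > 60:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_MINUTE:
        return False
    window.append(now)
    return True

if allow_request("203.0.113.7", "Summarize this paragraph..."):
    pass  # forward the prompt to the model
```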
5. Supply chain vulnerabilities
Like any software, LLM applications consist of various components, typically sourced from multiple vendors. And the LLM supply chain does not stop at software components and training data sets; it also encompasses the underlying IT infrastructure used to host and run the model.
Threat actors can exploit security vulnerabilities in any component of the LLM supply chain to perform malicious activities. These could include accessing sensitive model data, manipulating the model to produce an incorrect response, or forcing the model to malfunction or fail.
Take the following actions to mitigate the risk of LLM supply chain vulnerabilities:
- Only use software components from trusted vendors.
- Carefully test plugins before integrating them into the LLM application.
- Only use training data sets from trusted sources, and use hashing and checksums to validate the integrity of these data sets (see the sketch after this list).
- Use sandboxing to test model components before incorporating them into the system.
- Use encryption to protect model data at rest and in transit.
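As a minimal sketch of checksum validation, the snippet below compares a downloaded data set against a SHA-256 digest published by the provider. The file name and expected digest are placeholders.

```python
import hashlib

def sha256_of_file(path: str) -> str:
    """Compute the SHA-256 digest of a file in streaming fashion."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder value: use the checksum published by the data set provider.
EXPECTED = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

if sha256_of_file("training_data.csv") != EXPECTED:
    raise ValueError("Data set checksum mismatch -- do not use for training.")
```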
6. Sensitive information disclosure
LLM applications can inadvertently reveal sensitive information, including training data, algorithmic architecture and other proprietary details, through their outputs. Users of LLM applications should understand that sensitive information they send to the LLM could be revealed elsewhere when other customers use the LLM.
Take the following actions to mitigate the risk of sensitive information disclosure:
- Implement data sanitization to prevent sensitive data from entering the training data sets (see the sketch after this list).
- Validate model input to prevent attackers from extracting sensitive data using input injection.
- Create a clear model usage policy that lets users opt out if they do not want their personal data to be used to train the model.
- Limit the model's access to external data sources.
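The following sketch shows a minimal form of data sanitization: regular expressions redact email addresses and US-style Social Security numbers before text joins a training set. The patterns are illustrative, and production pipelines would rely on broader PII detection.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def sanitize(text: str) -> str:
    """Redact obvious personal identifiers before the text joins a training set."""
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub("[SSN]", text)
    return text

print(sanitize("Contact jane.doe@example.com, SSN 123-45-6789, about the refund."))
# Contact [EMAIL], SSN [SSN], about the refund.
```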
7. Insecure plugin design
To overcome certain limitations encountered when using LLMs for specific tasks, software developers have begun building specialized LLM plugins. For example, a plugin might enable an LLM to access up-to-date information by browsing the internet in real time or by querying data hosted on an internal network.
The downside of LLM plugins is their vulnerability to various attacks. For example, threat actors can exploit these plugins by supplying malicious input to exfiltrate data, execute remote code or conduct privilege escalation attacks.
Take the following actions to mitigate the risk of insecure plugin design:
- Apply stringent input validation measures within plugins to prevent users from submitting malicious inputs (see the sketch after this list).
- Implement the necessary authentication and authorization measures, such as OAuth 2.0, to prevent plugins from accessing sensitive resources without authorization.
- Run plugins in an isolated environment independent from the environment used to run the LLM, effectively blocking unauthorized access to sensitive resources.
- Apply encryption in transit to secure the communication channels between plugins and the LLM.
- Regularly audit plugin security to discover and fix vulnerabilities in code before they can be exploited by threat actors.
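As a minimal sketch of plugin input validation, the snippet below checks URLs passed to a hypothetical web-browsing plugin, accepting only HTTPS requests to allowlisted hosts, which also blocks attempts to reach internal addresses. The host names are illustrative.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"docs.example.com", "status.example.com"}  # illustrative allowlist

def validate_plugin_url(raw_url: str) -> str:
    """Validate a URL passed to the plugin before any request is made."""
    parsed = urlparse(raw_url)
    if parsed.scheme != "https":
        raise ValueError("Only HTTPS URLs are allowed.")
    host = (parsed.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"Host '{host}' is not on the plugin allowlist.")
    return raw_url

validate_plugin_url("https://docs.example.com/guide")  # accepted
# validate_plugin_url("https://169.254.169.254/latest/meta-data")  # raises ValueError
```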
8. Excessive agency
Excessive agency is a security vulnerability in which an LLM-based application takes malicious or unsafe actions in response to unexpected or ambiguous model output. That output could stem from several sources, such as input injection, malicious plugins, misconfigurations in the environment running the LLM and errors in the LLM's implementation.
Take the following actions to mitigate the risk of excessive agency:
- Include only the features that the LLM application needs to fulfill its main purpose, as excessive functionality can introduce new vulnerabilities.
- Limit the LLM application's permissions and access rights to only those required to perform its tasks, reducing the risk of excessive permissions (see the sketch after this list).
- Ensure that the LLM application cannot make decisions independently without returning to the user for input. An example of excessive autonomy is an AI-powered smart home system that automatically regulates temperature and lighting settings without requiring user consent or intervention.
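The sketch below illustrates the least-privilege point: the application declares the permissions it actually holds, and any model-requested action that needs more is refused. The permission and action names are hypothetical.

```python
# Minimal sketch: grant the LLM application only the permissions its task requires.
# Permission names and the summarizer example are illustrative assumptions.

GRANTED_PERMISSIONS = {"read_documents"}  # least privilege for a summarization app

REQUIRED_BY_ACTION = {
    "summarize_document": {"read_documents"},
    "delete_document": {"read_documents", "write_documents"},
}

def authorized(action: str) -> bool:
    """Allow an action only if every permission it needs has been granted."""
    needed = REQUIRED_BY_ACTION.get(action)
    return needed is not None and needed <= GRANTED_PERMISSIONS

print(authorized("summarize_document"))  # True
print(authorized("delete_document"))     # False -- write access was never granted
```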
9. Overreliance
This issue extends beyond LLMs and pertains more to human behavior. Overreliance occurs when users excessively depend on LLM outputs for decision-making without critically assessing the model's responses.
As discussed above, LLMs can produce inaccurate or harmful results for various reasons. For instance, consider a scenario where a software developer uses LLM-generated code in writing their program. If the generated code contains errors or vulnerabilities, integrating it into the developer's project could compromise that program's security.
Take the following actions to mitigate the risk of overreliance on LLMs:
- Review LLM outputs carefully, and employ human oversight, especially when using LLMs to perform critical tasks.
- Verify LLM output with trusted sources.
- Regularly audit the LLM's output and performance to reveal and correct biased or unsafe response patterns.
10. Model theft
Model theft occurs when threat actors gain unauthorized access to machine learning model files or other proprietary information. The attackers' aim is typically to create a cloned version of the LLM that they can then use to their advantage or to steal sensitive information stored within the model.
Take the following actions to mitigate the risk of model theft:
- Apply a strong authentication mechanism to govern access to LLM files and training data.
- Encrypt model data and code (see the sketch after this list).
- Store the model in a secure physical environment.
- Use a data loss prevention tool to prevent unauthorized transfer of model files outside an organization's IT systems.
- Store the LLM in a separate, isolated network segment to prevent unauthorized access to its files.
- Apply code obfuscation to conceal critical model parameters.
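As a minimal sketch of encrypting model data at rest, the snippet below uses the Fernet interface from the third-party cryptography package. The file names are placeholders, and in practice the key would be stored in a secrets manager or hardware security module rather than alongside the data.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# In practice, generate the key once and keep it in a secrets manager or HSM.
key = Fernet.generate_key()
cipher = Fernet(key)

with open("model_weights.bin", "rb") as f:       # placeholder file name
    encrypted = cipher.encrypt(f.read())

with open("model_weights.bin.enc", "wb") as f:
    f.write(encrypted)

# Loading later requires the same key, so stolen .enc files alone are useless.
with open("model_weights.bin.enc", "rb") as f:
    weights = cipher.decrypt(f.read())
```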
Nihad A. Hassan is an independent cybersecurity consultant, expert in digital forensics and cyber open source intelligence, blogger and book author. Hassan has been actively researching various areas of information security for more than 15 years and has developed numerous cybersecurity education courses and technical guides.