XML External Entity (XXE) injection is one of the most common security issues across applications, APIs, and microservices. Although the XXE family of vulnerabilities is not as well known as SQL injection or XSS, it holds the A4 position in the 2017 edition of the OWASP Top 10.
The essence of this risk is a misconfiguration in server endpoints that accept XML input from untrusted clients, particularly when the client can supply a custom document type definition (DTD). Attackers abuse this to inject XML payloads that extract data, perform server-side request forgery (SSRF), or cause a denial of service (DoS). Frequently, XXE vulnerabilities hide in third-party dependencies.
Having looked in detail at other OWASP Top 10 risks, such as Insecure Deserialization and Insecure Direct Object References (IDOR), a type of the more general Broken Access Control risk, we explain in this blog post how XML External Entity injection attacks work and provide guidance on how to prevent them.
Table of contents
- What are XXE Attacks?
- How to prevent XXE XML External Entity Injections?
- Manual XXE Prevention
- XXE Vulnerability detection
- WAFs are easy to bypass
- How instrumentation prevents XML External Entity attacks (XXE)
- How Hdiv can help
- XML External Entity (XXE): prevention takeaways
What are XXE Attacks?
XML External Entity vulnerabilities concern services that rely on XML as the messaging format, as opposed to the classic web format based on human-readable HTML.
An XML External Entity vulnerability occurs when the service that parses (in simpler words, reads and processes) the XML messages sent by the client accepts an external definition of the message format itself. This definition, known as an external DTD, allows for extraordinary flexibility: the sender and receiver can agree on new message formats at runtime. External DTDs are designed to be used when both parties are trusted.

However, if untrusted clients are allowed to provide their own custom DTD, they can exploit this flexibility and prepare requests that bypass the server controls, ultimately resulting in serious breaches such as:
- Confidentiality issues: for example, accessing the server filesystem
- Integrity issues: execution of external code injected as part of the attack
- Availability issues: DoS attacks such as Billion Laughs
As a result of the popularity of XML-based communication formats, the impact of XXE vulnerabilities has been steadily increasing. A simple search of the term “xxe” on the Common Vulnerabilities and Exposures database (CVE) returns more than 500 distinct vulnerabilities involving this risk.
Libraries vulnerable to XML External Entity attacks
Many teams are not aware that their own applications include XML processing features, because one of the most prevalent sources of this risk is third-party code that includes XML processing functionality. This third-party code can be open-source libraries, including web development frameworks (Spring, Struts, etc.), and/or ancillary internal compiled packages. The XML processing dependencies can be nested two or more levels deep in the dependency hierarchy, which makes them very hard to identify manually.
XXE attack examples
The basic building block is the definition of a custom XML entity. The following is an example:
<!DOCTYPE Response [
<!ENTITY message 'Hello World'>
]>
<Response>&message;</Response>
In the code above, the XML document Response defines a custom entity named “message” that can be referenced in the body with the &message; syntax, which effectively inserts “Hello World” in the body of Response.
This basic approach can be organized in different ways resulting in each of the three types of major breaches mentioned above.
In the case of accessing a private server file (a confidentiality breach) the attack looks like this:
<!DOCTYPE Response [
<!ENTITY message SYSTEM "file:///etc/passwd" >
]>
<Response>&message;</Response>
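To make the confidentiality breach concrete, here is a minimal Java sketch that feeds the same payload to a JAXP parser created with no extra configuration. The class name DefaultParserDemo and the helper readViaEntity are hypothetical, and a temporary file stands in for /etc/passwd so the example is safe to run. It assumes a JDK whose default JAXP settings still resolve external entities; a hardened runtime may reject the document instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: shows what an unconfigured parser does with an external entity.
final class DefaultParserDemo {

    // Writes `contents` to a temp file, references that file through an external
    // entity, and returns whatever text the parser placed in the document body.
    static String readViaEntity(String contents) {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            try {
                Files.write(secret, contents.getBytes(StandardCharsets.UTF_8));
                String xml = "<!DOCTYPE Response [\n"
                        + "<!ENTITY message SYSTEM \"" + secret.toUri() + "\">\n"
                        + "]>\n"
                        + "<Response>&message;</Response>";
                // No features set: this is the configuration the article warns about.
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
                return doc.getDocumentElement().getTextContent();
            } finally {
                Files.deleteIfExists(secret);
            }
        } catch (Exception e) {
            // A hardened runtime ends up here instead of leaking the file.
            throw new IllegalStateException("Parser rejected or failed on the payload", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readViaEntity("top-secret"));
    }
}
```

If the parser resolves the entity, the file contents come back as ordinary element text, which is exactly how the data would reach an attacker in a response or an error message.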
In the case of external execution, like the probing of local network resources, the corresponding attack follows:
<!DOCTYPE Response [
<!ENTITY message SYSTEM "https://192.168.1.1/private" >
]>
<Response>&message;</Response>
To compose a DoS availability attack, one can simply build a complex hierarchy based on the initial example, resulting in an exponential processing time:
<!DOCTYPE Response [
<!ENTITY ha "Ha!">
<!ENTITY ha2 "&ha; &ha;">
<!ENTITY ha3 "&ha2; &ha2;">
<!ENTITY ha4 "&ha3; &ha3;">
<!ENTITY ha5 "&ha4; &ha4;">
...
<!ENTITY ha128 "&ha127; &ha127;">
]>
<Response>&ha128;</Response>
The example above is a variant of the popular Billion Laughs attack. Each entity expands to two copies of the previous one, so once all 128 “ha” entities are defined, expanding &ha128; produces 2^127 copies of the message “Ha!”. Just a few of these concurrent requests, or even just one, can easily bring down a server due to memory exhaustion.
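The growth is easy to check with a little arithmetic. The sketch below is illustrative only (the class EntityExpansion and its methods are hypothetical names): it counts the base entity ha as level 1, so each later level doubles the previous one and ha128 expands to 2^(128-1) copies. The byte estimate ignores the separating spaces, so it is a lower bound.

```java
import java.math.BigInteger;

// Hypothetical helper: sizes the expansion of the nested "ha" entities.
final class EntityExpansion {

    // Level 1 is the literal "Ha!"; every further level references the previous one twice.
    static BigInteger copiesAtLevel(int level) {
        if (level < 1) {
            throw new IllegalArgumentException("level must be >= 1");
        }
        return BigInteger.valueOf(2).pow(level - 1);
    }

    // Lower bound on the expanded size: "Ha!" is 3 bytes in ASCII/UTF-8,
    // and the single spaces between copies are not counted.
    static BigInteger minimumBytes(int level) {
        return copiesAtLevel(level).multiply(BigInteger.valueOf(3));
    }

    public static void main(String[] args) {
        System.out.println("ha5 copies:   " + copiesAtLevel(5));
        System.out.println("ha128 copies: " + copiesAtLevel(128));
        System.out.println("ha128 bytes:  " + minimumBytes(128));
    }
}
```

Even a far shallower hierarchy is enough in practice: around 30 levels already yields roughly a billion copies, which is where the attack gets its name.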
How to prevent XXE XML External Entity Injections?
Let’s review some of the different methods available to prevent XXE, including both manual configuration options and automated protection approaches.
Manual XXE Prevention
To avoid an XML External Entity vulnerability, teams must configure their XML parsers so that they don’t accept custom document type definitions (DTDs). There are very few instances in which an application truly requires a custom DTD, so the functionality trade-off is small. However, the difficulty lies in the fact that each parser, in each programming language, has its own way of setting this configuration parameter. So, if a project includes multiple parsers, each one has to be configured properly and manually.
For parsers that support the Apache Xerces feature names (Java’s built-in JAXP parser is one of them), the configuration looks something like this:
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
Again, each parser has its own feature names and characteristics, so the relevant documentation should be reviewed. The OWASP XML External Entity Prevention Cheat Sheet is a good place to start.
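As a fuller illustration of the single setFeature line, the following sketch configures a JAXP DocumentBuilderFactory and checks that a document carrying a DOCTYPE is refused. The class HardenedXmlParser and the helper accepts are hypothetical names; the feature URIs are the ones commonly recommended for the JDK's built-in parser, and other parser implementations may not recognize all of them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical wrapper around a JAXP factory with DTD processing switched off.
final class HardenedXmlParser {

    static DocumentBuilderFactory newHardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Primary defense: refuse any document that contains a DOCTYPE declaration.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth, in case the line above is ever relaxed.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    // Returns true if the document parses, false if the parser rejects it.
    static boolean accepts(String xml) {
        try {
            newHardenedFactory().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // includes the "DOCTYPE is disallowed" fatal error
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<Response>ok</Response>"));
        System.out.println(accepts("<!DOCTYPE Response [<!ENTITY message 'Hello World'>]>"
                + "<Response>&message;</Response>"));
    }
}
```

Centralizing the factory in one place like this only helps for parsers your own code creates; parsers instantiated inside third-party libraries still need to be reviewed separately.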
XXE Vulnerability detection
Static Application Security Testing (SAST) tools are often used to detect XXE vulnerabilities. However, this approach is not ideal because XXE vulnerabilities do not follow a clear pattern, making it difficult for SAST tools to correctly pinpoint the actual vulnerabilities and resulting in false positives.
We often see an approach based on flagging all XML parsers in the code, but with this approach there is no easy way for the tool to specifically find the misconfigured instances and single them out. As a result, a manual code review process will be necessary, which opens the door to human error.
Dynamic Application Security Testing (DAST) tools probe applications by sending payloads and checking the results of the attack. DAST tools are able to identify some XXE vulnerabilities, but only if they can identify all exposed endpoints. As APIs become more dynamic and evolve over time, it becomes increasingly difficult to keep track of them all.
WAFs are easy to bypass
External defenses like Web Application Firewalls (WAFs) struggle to stop XML External Entity attacks. Multiple levels of atypical encoding, such as UTF-16 and UTF-32, plus encryption dramatically hamper the WAF’s visibility of traffic.
Additionally, WAFs would have to completely parse and reconstruct the payloads to identify XXE attacks, which has significant performance implications and, in and of itself, constitutes the basis of a DoS vulnerability.
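The encoding problem can be illustrated with a deliberately naive signature check. This is a sketch under stated assumptions, not a model of any real WAF: the class NaiveSignatureCheck is hypothetical and simply looks for the single-byte-encoded text `<!ENTITY` in the raw request body. The same payload encoded as UTF-16, which XML parsers are required to accept, no longer contains that byte sequence.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical byte-level rule, standing in for a pattern-matching filter.
final class NaiveSignatureCheck {

    // Flags a body if it contains the ASCII bytes of "<!ENTITY".
    // ISO-8859-1 maps every byte to exactly one char, so this is a raw byte search.
    static boolean flags(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1).contains("<!ENTITY");
    }

    public static void main(String[] args) {
        String payload = "<!DOCTYPE Response [<!ENTITY message SYSTEM \"file:///etc/passwd\">]>"
                + "<Response>&message;</Response>";
        // Same characters, two encodings.
        System.out.println(flags(payload.getBytes(StandardCharsets.UTF_8)));
        System.out.println(flags(payload.getBytes(StandardCharsets.UTF_16)));
    }
}
```

In UTF-16 every ASCII character is stored as two bytes, so a zero byte sits between `<` and `!` and the signature never matches. A filter would have to decode the body the way the parser does before inspecting it, which is the cost described above.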
How instrumentation prevents XML External Entity attacks (XXE)
Application Server Instrumentation is a technique used to insert checkpoints in certain parts of the code to monitor execution flow at runtime. Adding instrumented security sensors to your server provides excellent real-time visibility of the application architecture and data for each request. The RASP and IAST product categories leverage code instrumentation to solve application security problems with ease.
Instrumentation is very valuable for preventing XXE attacks because it allows for automatic monitoring of the key classes involved in XML processing and validates any activity concerning external DTDs. As we described above, XML parsers can be part of third-party code in your application. It is very easy to miss some of the parsers and endpoints in your application, which makes a manual configuration process inherently dangerous. Instrumentation removes this manual verification process by automating the prevention of XML External Entity attacks.
Instrumentation also ensures that some of the most pernicious exploits are contained. For instance, instrumentation can limit the time that a particular request is allowed to execute, which significantly reduces the impact of DoS attacks like Billion Laughs.
How Hdiv can help
Hdiv Protection RASP relies on the runtime instrumentation technique described in this post to keep your applications secure and protected from XML External Entity attacks. Adding the Hdiv Agent to your server instances removes the need to manually locate and configure the XML parsers packaged in your application. Hdiv Protection will prevent the exploitation of XXE vulnerabilities, including the examples cited above.
Hdiv protection covers not only your own source code, but also third-party code included in your binary (even dependencies provided as binaries) and also potential XXE vulnerabilities in the web development framework that your app uses, such as Spring MVC or Struts.
XML External Entity (XXE): prevention takeaways
Here is a quick formula that summarizes the steps that should be taken to prevent XML External Entity attacks:
- Avoid by design: choose APIs that use other formats such as JSON or YAML
- Watch out for dependencies: remember, third-party code might be introducing XXE vulnerabilities
- Automate protection: eliminate the manual configuration process and ensure protection by relying on automatic security checkpoints based on instrumentation, such as Hdiv Protection
- Proper configuration: if it’s not possible to remove external DTDs by design or automate protection, then configure all your XML parsers to whitelist only trusted external DTDs
- Containment: properly configure the execution context of the server to limit external code execution or server file system browsing
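For the "proper configuration" case, one way to allow only trusted external DTDs in Java is a custom SAX EntityResolver. The sketch below is one possible approach, not a prescribed one: the class AllowlistEntityResolver, the helper parseText, and the URL https://example.com/response.dtd are all hypothetical. Trusted system identifiers are mapped to local copies of the DTD, so the parser never fetches anything from the network or the file system, and every other external reference is refused.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver: serves local copies of trusted DTDs, rejects everything else.
final class AllowlistEntityResolver implements EntityResolver {

    private final Map<String, String> trustedDtds; // system ID -> local DTD text

    AllowlistEntityResolver(Map<String, String> trustedDtds) {
        this.trustedDtds = new HashMap<>(trustedDtds);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException {
        String localCopy = trustedDtds.get(systemId);
        if (localCopy == null) {
            // Not on the allowlist: abort the parse instead of dereferencing the URI.
            throw new SAXException("External entity not on the allowlist: " + systemId);
        }
        InputSource source = new InputSource(new StringReader(localCopy));
        source.setSystemId(systemId);
        return source;
    }

    // Parses `xml` with the allowlist in place; empty if the parser rejects the document.
    static Optional<String> parseText(String xml, Map<String, String> trustedDtds) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(new AllowlistEntityResolver(trustedDtds));
            return Optional.of(builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent());
        } catch (SAXException e) {
            return Optional.empty();
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A resolver like this only governs external references. It does nothing about entity expansion inside an inline DTD, so it complements, rather than replaces, the parser limits and the containment measures listed above.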