Deserialization considered harmful: the security case for persistent objects

I’ve done a fair amount of work on persistent object systems, starting with the Thor distributed storage system and, more recently, the Fabric system. I used to think the point of persistent object systems was to make programming easier. Lately I think security might be an even stronger argument.

For programmers, the great thing about persistent objects is that your persistent data looks almost exactly like your nonpersistent data: it is packaged up into objects that don’t require any special coding style to create and modify. These systems save you the trouble of writing often tedious deserialization/unmarshaling code whose purpose is to translate from the persistent representation into the in-memory representation. Object-relational mapping (ORM) systems (e.g., Hibernate, Django) have become highly popular for the same reason: they reduce the impedance mismatch between the in-memory and persistent representations. (However, ORMs have some weaknesses in programmability and performance compared to “real” persistent object systems.)

One standard argument against persistent objects is that serializing and deserializing data cleans the data, by fixing inconsistencies or simply by discarding garbage. But this argument ignores the danger of deserialization code: in practice, deserialization is a rich source of security vulnerabilities. Deserializers are, in essence, parsers, and they are typically written in an ad hoc style rather than with higher-level tools such as parser generators. Ad hoc parsers are very tricky to get right, especially when the parsed data might be influenced by an adversary, and they are especially dangerous when written in a type-unsafe language.
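
To make the point concrete, here is a minimal sketch in Python of a hand-written deserializer for a made-up record format (a 4-byte big-endian length, a UTF-8 name, then a 4-byte age). The format, the User class, the function names, and the MAX_NAME_LEN limit are all assumptions invented for illustration, not taken from any real system. Even for a record this small, the deserializer needs several separate checks, and each one is easy to forget.

```python
import struct
from dataclasses import dataclass

MAX_NAME_LEN = 256  # hypothetical limit, chosen only for this illustration


@dataclass
class User:
    name: str
    age: int


def serialize_user(user: User) -> bytes:
    """Encode as: 4-byte length, UTF-8 name bytes, 4-byte unsigned age."""
    raw = user.name.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw + struct.pack(">I", user.age)


def deserialize_user(buf: bytes) -> User:
    """Ad hoc parser: every check below guards against adversarial input."""
    if len(buf) < 4:
        raise ValueError("truncated length field")
    (name_len,) = struct.unpack_from(">I", buf, 0)
    # The length comes from the input, so it cannot be trusted.
    if name_len > MAX_NAME_LEN:
        raise ValueError("name length exceeds limit")
    end = 4 + name_len
    # Reject both truncated records and records with trailing bytes.
    if len(buf) != end + 4:
        raise ValueError("buffer size does not match declared length")
    try:
        name = buf[4:end].decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("name is not valid UTF-8") from exc
    (age,) = struct.unpack_from(">I", buf, end)
    return User(name, age)
```

In Python, a forgotten check here mostly means an unexpected exception or a silently wrong object. The equivalent omission in a type-unsafe language, such as trusting the declared length when copying into a fixed-size buffer, can corrupt memory instead.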

This is why I’ve always believed that to get to secure computing systems, it is essential that we raise the level of abstraction. Programmers should avoid, to the extent possible, writing code whose purpose is simply to transform data between different representations. Simpler code is easier to reason about and to make secure, and eliminating low-level data transformations avoids opportunities to insert security vulnerabilities.
