I’ve done a fair amount of work on persistent object systems, starting with the Thor distributed storage system and, more recently, the Fabric system. For programmers, the great thing about this approach to persistence is that your persistent data looks almost exactly like your nonpersistent data: it can be packaged up into objects that don’t require any special coding style to create and modify. These systems save you the trouble of writing often tedious deserialization/unmarshaling code whose purpose is to translate from the persistent representation into the in-memory representation. Object-relational mapping (ORM) systems (e.g., Hibernate, Django) have become highly popular for the same reason: they reduce the impedance mismatch between the in-memory and persistent representations. (However, ORMs have some weaknesses in programmability and performance compared to “real” persistent object systems.)
One standard argument against persistent objects is that there is virtue in having such code, because it cleans the data, either by fixing inconsistencies or simply by discarding garbage. But this argument ignores the danger of deserialization code: in practice, it is a rich source of security vulnerabilities. Deserializers are, in essence, parsers, and they are typically written in an ad hoc style rather than with higher-level tools such as parser generators. Ad hoc parsers are very tricky to get right, especially when the parsed data might be influenced by an adversary, and they are especially dangerous when written in a type-unsafe language.
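To make the point about ad hoc parsers concrete, here is a minimal sketch of a hand-written deserializer. The record layout (a 4-byte ID, a 2-byte length, then a name) and both function names are hypothetical, invented purely for illustration; they are not taken from Thor, Fabric, or any real system. The naive version trusts a length field that an adversary controls, while the checked version refuses input whose declared length disagrees with the actual payload.

```python
import struct
from typing import Tuple

# Hypothetical wire format: big-endian u32 user ID, u16 name length, name bytes.
HEADER = struct.Struct(">IH")


def parse_record_naive(buf: bytes) -> Tuple[int, bytes]:
    """Ad hoc parser that trusts the length field found in the input."""
    user_id, name_len = HEADER.unpack_from(buf, 0)
    # Python slicing silently truncates if name_len overstates the payload.
    # In a type-unsafe language the same logic could read past the buffer.
    name = buf[HEADER.size:HEADER.size + name_len]
    return user_id, name


def parse_record_checked(buf: bytes) -> Tuple[int, bytes]:
    """Same format, but every claim made by the input is verified."""
    if len(buf) < HEADER.size:
        raise ValueError("truncated header")
    user_id, name_len = HEADER.unpack_from(buf, 0)
    end = HEADER.size + name_len
    if end != len(buf):
        raise ValueError("length field does not match payload size")
    return user_id, buf[HEADER.size:end]
```

Even in this tiny format the safe version needs two explicit checks that are easy to forget, and every additional field multiplies the number of such checks.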
This is why I’ve always believed that to get to secure computing systems, it is essential that we raise the level of abstraction so programmers avoid, to the extent possible, writing code whose purpose is simply to transform data between different representations. Simpler code is easier to reason about and to make secure, and eliminating data transformations avoids opportunities to insert security vulnerabilities.