Append-only databases and the GDPR conundrum
The GDPR (General Data Protection Regulation) gives customers, and any other individuals whose personal data you hold, the right of erasure. That is, they can demand that you remove all of their details, provided you are not required to retain them for contractual or legal reasons.
For most database technologies this is not an issue. However, a significant number of NoSQL databases use an append-only approach. The advantage of append-only storage is that the database is “immutable”: it keeps an entire history of all the transactions that have been performed. This is useful for log data, is recommended for Kappa architectures and has become widespread. In particular, HDFS, the bedrock of Hadoop, was designed in this fashion.
In these databases, when data is deleted it is merely flagged as “deleted”. It is not actually removed from disk and therefore, at least in theory, it can still be accessed. In the case of a GDPR erasure request, simply flagging the data as deleted – when in reality it is still there – is not likely to stand up in court. To truly remove data you must use a workaround, typically “dropping” an object. Unfortunately, objects tend to be large constructs (tables, partitions and so forth), which means that you would have to drop the whole customer table and append a new customer table without the erased data. This is clearly not feasible, given that you may well have multiple requests for erasure every month.
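To make the problem concrete, here is a minimal sketch in Python of a toy append-only table. It is not how HDFS or any particular NoSQL product is implemented; the class `AppendOnlyTable`, the helper `rewrite_without` and the sample customer records are all hypothetical, invented purely for illustration. It shows that a logical delete hides a record from normal reads while leaving it in the underlying storage, and that genuine removal means rebuilding the whole table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    key: str
    value: Optional[str]  # None marks a tombstone (the "deleted" flag)


class AppendOnlyTable:
    """Toy append-only table: writes and deletes only ever add entries."""

    def __init__(self) -> None:
        self.log: List[Entry] = []

    def put(self, key: str, value: str) -> None:
        self.log.append(Entry(key, value))

    def delete(self, key: str) -> None:
        # Logical delete: append a tombstone, leave earlier entries in place.
        self.log.append(Entry(key, None))

    def get(self, key: str) -> Optional[str]:
        # Normal read path: the latest entry for a key wins.
        for entry in reversed(self.log):
            if entry.key == key:
                return entry.value
        return None

    def raw_values(self, key: str) -> List[str]:
        # What someone reading the underlying storage directly would find.
        return [e.value for e in self.log
                if e.key == key and e.value is not None]


def rewrite_without(table: AppendOnlyTable, erased_key: str) -> AppendOnlyTable:
    # "Drop and re-append": build an entirely new table omitting the erased
    # key. The work is proportional to the whole table, not to one record.
    fresh = AppendOnlyTable()
    for entry in table.log:
        if entry.key != erased_key:
            fresh.log.append(entry)
    return fresh


customers = AppendOnlyTable()
customers.put("alice", "alice@example.com")
customers.put("bob", "bob@example.com")
customers.delete("alice")

visible_after_delete = customers.get("alice")   # hidden from normal reads
still_on_disk = customers.raw_values("alice")   # but still physically stored

customers = rewrite_without(customers, "alice")
on_disk_after_rewrite = customers.raw_values("alice")
```

In this sketch the "deleted" email address remains in the log until the entire table is rewritten, which is exactly the step that becomes impractical when erasure requests arrive regularly.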
The bottom line is that if you want to build a data lake with customer or consumer data in it, then you’d better opt for a platform that does not run on HDFS. MapR-DB is one such platform, but there are plenty of others.
From a vendor perspective, this means that append-only databases are effectively limited to – or, at least, should be limited to – targeting environments where private data is not involved. Do I think that the suppliers of such products will pay attention to this? No. On the other hand, do I think that the EU legislators who put GDPR in place knew what they were doing? No. They probably weren’t even aware of append-only databases. On a related note, did they think about database back-ups? That every time a record is erased you should, technically, throw away all earlier back-ups? No, I don’t think so either. Is the law a ass? Very probably.