False Positive “Rates” of Data Loss Prevention (DLP) Solutions
I saw an interesting request posted in a DLP discussion group today asking for the false positive rates for some of the top DLP products in the marketplace. (Just the question itself, I think, goes to prove that the DLP space is still misunderstood by a lot of would-be DLP users.)
Oh, that it were that easy to have someone provide the “official” false positive rates of each vendor and go and buy the vendor with the lowest false positive rate. Not only are false positive rates of DLP vendor products impossible to effectively and fairly determine, but the question seems to over-simplify the whole idea of DLP as it discounts dozens of other critical criteria for selecting the right DLP product.
A Note About False Positive Rates
The question of false positives was one of the early complaints about first-to-market DLP technologies. False positives cast a negative shadow on DLP technologies because of user experience with other commonly-used security technologies. What added more to the concern was the idea that a false positive could have the unintended effect of hobbling business efficiency. I have heard horror stories of business production being shut down single-handedly by DLP enforcement technologies. While the effect is possible, it’s hardly likely if today’s legitimate DLP technologies are configured and used effectively in the enterprise. (Maybe a specific post on that at a later time…?)
Unfortunately, while false positives still occur with DLP, many DLP detractors beat that drum with the assumption that false positives will undermine the effectiveness of DLP in general. Too often, these detractors make such accusations without first-hand experience with legitimate, comprehensive DLP technologies.
By way of example, many of my customers have used content monitoring technologies of various email security platforms in what they then considered to be DLP. You can’t really blame them for expecting these solutions to effectively prevent sensitive data from leaving the network since almost all email security platforms use the term “Data Loss Prevention (DLP)” in marketing literature. The difference is that these solutions are limited in how they detect sensitive data. They rely almost wholly on regular expression patterns for identifying this data, so throw in a pattern for a US SSN and lo and behold, you get a bunch of false positives. (That’s why I hate how the term DLP is so loosely applied to all kinds of security technologies.)
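To make the regular expression problem concrete, here is a minimal sketch in Python. The pattern and the sample messages are made up for illustration and are not taken from any particular email security product; the point is only that a pattern matches the shape of the data, not its meaning.

```python
import re

# Naive pattern of the kind described above: anything shaped like NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical outbound messages; assume only the first contains a real SSN.
messages = [
    "Employee SSN: 078-05-1120",
    "Part number 123-45-6789 ships Monday",
    "Ticket ref 555-12-3456 closed",
    "Lunch at noon?",
]

# Every message containing an SSN-shaped string is flagged,
# whether or not the number is actually an SSN.
flagged = [m for m in messages if SSN_PATTERN.search(m)]

for m in flagged:
    print("FLAGGED:", m)
```

In this sample, the part number and the ticket reference are flagged right alongside the real SSN, because all three have the same shape.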
The good news is that today’s legitimate DLP technologies rely on far more effective means of sensitive data detection, including exact data matching. This methodology makes a fingerprint of the known sensitive data (whether that’s sensitive database fields or complete documents) and detects actual matches to these fingerprints. This, along with a number of other detection methods, reduces false positives to next to nothing when used correctly. This is *the* advantage of legitimate DLP technologies over technologies that include DLP as a feature.
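The fingerprinting idea can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not how any specific vendor implements exact data matching: the helper names (fingerprint, contains_known_ssn), the sample value, and the choice of a plain SHA-256 hash are all hypothetical, and a real product would handle normalization and hash protection with far more care.

```python
import hashlib
import re

def fingerprint(value: str) -> str:
    """Hash a normalized value so the raw data need not be stored in the policy."""
    digits = re.sub(r"\D", "", value)  # keep digits only, so formatting does not matter
    return hashlib.sha256(digits.encode("utf-8")).hexdigest()

# Hypothetical known-sensitive values, e.g. exported from an HR database.
known_ssns = ["078-05-1120"]
FINGERPRINTS = {fingerprint(v) for v in known_ssns}

# The pattern only finds candidates; the fingerprint lookup makes the decision.
CANDIDATE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def contains_known_ssn(text: str) -> bool:
    """Return True only if an SSN-shaped string matches a known fingerprint."""
    return any(fingerprint(c) in FINGERPRINTS for c in CANDIDATE.findall(text))

print(contains_known_ssn("Employee SSN: 078-05-1120"))
print(contains_known_ssn("Part number 123-45-6789 ships Monday"))
```

Here an SSN-shaped part number no longer triggers a match, because it does not correspond to any value the organization actually holds.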
My recommendation is to let legitimate DLP technologies do what they do best: detect and deal with sensitive data. Let the email security solutions of the world do what they’re good at.
Determining False Positive Rates
I also contend that it’s terribly difficult, if not impossible, to get fair and accurate data on the false positive rates of the major DLP vendors. Here’s why:
Legitimate DLP vendors use very similar data detection methods. Not all, but most, combine a) regular expression patterns (SSN or credit card number pattern matches); b) data fingerprinting (hashes of specific known sensitive data, database fields, files, etc.); and c) content analysis techniques (in their many varied forms). With a combination of these technologies, it’s likely that each DLP technology can be tuned to accurately detect the same stuff as the next guy. The problem is that fair and accurate testing requires that tuning be performed over a period of time longer than most tests are willing to run.
This also means that users will likely be forced to rely on the studies paid for by the DLP vendors themselves. Not exactly what I would consider to be fair and accurate reporting of false positive rates.
In the end, because every customer has different data, they will need to test and determine the best solution for their specific needs. There are DLP vendors that, because of their specific detection methods, may handle certain data types better than others. That’s why it’s critical to always understand your sensitive data and then seek a solution that matches your needs.
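If you do run your own test, the arithmetic is the easy part. Here is a minimal sketch, assuming you have a sample of your own traffic that someone has hand-labeled as sensitive or not. The function name and the sample data are hypothetical, and I am using the textbook definition of false positive rate: flagged-but-harmless items divided by all harmless items.

```python
def false_positive_rate(results):
    """results: list of (flagged, actually_sensitive) boolean pairs."""
    false_pos = sum(1 for flagged, sensitive in results if flagged and not sensitive)
    true_neg = sum(1 for flagged, sensitive in results if not flagged and not sensitive)
    harmless = false_pos + true_neg
    # With no harmless items in the sample there is nothing to measure.
    return false_pos / harmless if harmless else 0.0

# Hypothetical hand-labeled sample: (did the product flag it?, was it really sensitive?)
sample = [
    (True, True),    # correctly flagged
    (True, False),   # false positive
    (True, False),   # false positive
    (False, False),  # correctly ignored
    (False, False),  # correctly ignored
    (False, False),  # correctly ignored
]

rate = false_positive_rate(sample)
print(f"False positive rate: {rate:.0%}")
```

The number only means something for the data and the policy tuning it was measured against, which is exactly why a vendor-wide figure tells you so little.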
What Could Be More Important Than Accuracy?
Really, sensitive data detection accuracy is the most critical component of effective DLP. However, there are so many other criteria for selecting the right solution, including coverage areas (gateway, endpoint, discovery), appliance versus software, tolerance for architectural complexity, etc.
All the effectiveness in the world won’t do a bit of good if the platform is too complex for your organization to manage or if it doesn’t provide the coverage you need.
Ultimately, do your homework. But do not get bogged down with this idea of having to know false positive rates of each vendor. If you wait to move on your DLP project until you get this data, you’ll be waiting a long, long time.