Security in Statistical Databases
A statistical database is often meant to allow query access only to aggregate data, not to individual records. Securing such a database is difficult, because a determined user can combine several aggregate queries to derive information about a single individual.
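To make the risk concrete, here is a minimal sketch of how two aggregate queries can be combined to isolate one record. The table, the names, and the `sum_salary` helper are all hypothetical and invented for illustration; a real system would expose SQL rather than Python predicates.

```python
# Hypothetical toy table: (name, department, salary). All values are invented.
RECORDS = [
    ("alice", "sales", 52000),
    ("bob", "sales", 48000),
    ("carol", "sales", 61000),
    ("dave", "engineering", 75000),
]


def sum_salary(predicate):
    """Aggregate-only interface: returns a SUM, never an individual row."""
    return sum(salary for name, dept, salary in RECORDS if predicate(name, dept))


# Two aggregate queries, neither of which returns a single record...
all_sales = sum_salary(lambda name, dept: dept == "sales")
sales_without_carol = sum_salary(
    lambda name, dept: dept == "sales" and name != "carol"
)

# ...but their difference is exactly one person's salary.
carol_salary = all_sales - sales_without_carol
print(carol_salary)  # 61000
```

Each query on its own looks like legitimate statistical use; only the combination reveals an individual value.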
Some common approaches are:
- allow only aggregate queries (SUM, COUNT, AVG, STDEV, etc.)
- rather than returning exact values for sensitive data such as income, return only the partition a value belongs to (e.g. 35k-40k)
- return imprecise counts (e.g. rather than reporting that 141 records matched the query, indicate only that 130-150 records matched)
- disallow overly selective WHERE clauses
- audit all users' queries, so that users who misuse the system can be investigated
- use intelligent agents to detect inappropriate use of the system automatically
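Three of the approaches above (value partitions, imprecise counts, and rejecting overly selective queries) can be sketched as follows. The thresholds, bucket widths, and function names are assumptions chosen for illustration, not values taken from any particular system.

```python
MIN_QUERY_SET_SIZE = 3  # hypothetical threshold for "too selective"
COUNT_BUCKET = 20       # hypothetical width of an imprecise count range
INCOME_BUCKET = 5000    # hypothetical width of an income partition


def income_partition(income):
    """Return the partition an income falls in instead of the exact value."""
    low = (income // INCOME_BUCKET) * INCOME_BUCKET
    return f"{low // 1000}k-{(low + INCOME_BUCKET) // 1000}k"


def imprecise_count(n):
    """Return a (low, high) range containing n instead of n itself."""
    low = (n // COUNT_BUCKET) * COUNT_BUCKET
    return (low, low + COUNT_BUCKET)


def guarded_count(rows, predicate):
    """Count matching rows, refusing queries that match too few records."""
    matched = sum(1 for row in rows if predicate(row))
    if matched < MIN_QUERY_SET_SIZE:
        raise PermissionError("query is too selective")
    return imprecise_count(matched)


print(income_partition(37500))  # 35k-40k
print(imprecise_count(141))     # (140, 160)
```

With these bucket boundaries a count of 141 is reported as 140-160; a different alignment of the buckets would give a different range for the same count.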
Research in this area has largely stalled; reference 3 below showed that, in general, securing statistical databases was an impossible aim: if they were open to legitimate use, they were also open to abuse; and if they were restricted so tightly as to be incapable of abuse, they would then be useless for practical statistical purposes. To quote:
- The conclusion is that statistical databases are almost always subject to compromise. Severe restrictions on allowable query set sizes will render the database useless as a source of statistical information but will not secure the confidential records.
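As an illustration of why query set size restrictions alone do not protect records, the sketch below pads a refused query with a broader predicate, a technique sometimes called a tracker. It is a toy example under stated assumptions, not the construction used in the cited reference: the table, the limit `K`, and the `restricted_sum` helper are all hypothetical.

```python
# Hypothetical table: (id, department, gender, salary). All values are invented.
ROWS = [
    (1, "legal", "F", 90000),
    (2, "sales", "F", 50000),
    (3, "sales", "M", 52000),
    (4, "sales", "F", 48000),
    (5, "eng", "M", 70000),
    (6, "eng", "F", 72000),
    (7, "eng", "M", 68000),
    (8, "ops", "M", 45000),
]
K = 2  # assumed limit: a query must match between K and len(ROWS) - K rows


def restricted_sum(pred):
    """SUM(salary) over matching rows, refusing too-small or too-large sets."""
    matched = [row for row in ROWS if pred(row)]
    if not (K <= len(matched) <= len(ROWS) - K):
        raise PermissionError("query set size not allowed")
    return sum(row[3] for row in matched)


def target(row):
    return row[1] == "legal"  # matches one record, so it is refused directly


def tracker(row):
    return row[2] == "F"      # matches 4 of 8 records, so it is allowed


# Total over all rows, obtained from two allowed queries.
total = restricted_sum(tracker) + restricted_sum(lambda r: not tracker(r))

# SUM(target) = SUM(target or T) + SUM(target or not T) - SUM(all)
leaked = (
    restricted_sum(lambda r: target(r) or tracker(r))
    + restricted_sum(lambda r: target(r) or not tracker(r))
    - total
)
print(leaked)  # 90000
```

Every query issued here satisfies the size restriction, yet the result is the salary of the single record that the restriction was meant to protect.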