Statistical Database - Security in Statistical Databases

In a statistical database, the goal is often to allow queries over aggregate data only, never over individual records. Securing such a database is difficult, because a determined user can combine several aggregate queries to derive information about a single individual.
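As a rough illustration of how aggregate queries can be combined, the sketch below runs two SUM queries whose difference isolates one person. The table, the names, and the `sum_income` helper are hypothetical and exist only for this example.

```python
# Hypothetical toy table; every name and figure is invented for illustration.
RECORDS = [
    {"name": "Ann", "dept": "Sales", "age": 34, "income": 52000},
    {"name": "Bob", "dept": "Sales", "age": 41, "income": 61000},
    {"name": "Cy", "dept": "Sales", "age": 58, "income": 75000},
    {"name": "Dee", "dept": "IT", "age": 29, "income": 48000},
]


def sum_income(predicate):
    """Aggregate-only interface: SUM(income) over the rows matching predicate."""
    return sum(r["income"] for r in RECORDS if predicate(r))


# Two aggregate queries, each covering more than one person.
all_sales = sum_income(lambda r: r["dept"] == "Sales")
sales_under_50 = sum_income(lambda r: r["dept"] == "Sales" and r["age"] < 50)

# Their difference is the income of the only Sales employee aged 50 or over.
inferred_income = all_sales - sales_under_50
print(inferred_income)
```

Neither query returns an individual record, yet together they reveal one exact value.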

Some common approaches are:

  • allowing only aggregate queries (SUM, COUNT, AVG, STDEV, etc.)
  • returning the partition that a sensitive value such as income falls into (e.g. 35k-40k) rather than the exact value
  • returning imprecise counts (e.g. reporting that 130-150 records matched rather than exactly 141)
  • disallowing overly selective WHERE clauses
  • auditing all user queries, so that users who misuse the system can be investigated
  • using intelligent agents to detect inappropriate use of the system automatically
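Three of the approaches above (partitioned values, imprecise counts, and refusing overly selective queries) might be sketched as follows. The function names, band widths, and the minimum query-set size are assumptions made for this example, not part of any particular system.

```python
# All thresholds, widths, and function names here are illustrative assumptions.

class QueryRefused(Exception):
    """Raised when a query is too selective to answer safely."""


def income_band(income, width=5000):
    """Report the partition an income falls into instead of the exact value."""
    low = (income // width) * width
    return f"{low // 1000}k-{(low + width) // 1000}k"


def imprecise_count(exact, width=20):
    """Report a half-open range [low, high) that contains the exact count."""
    low = (exact // width) * width
    return (low, low + width)


def guarded_count(records, predicate, min_size=3):
    """Answer COUNT only when the query set is neither too small nor too large."""
    n = sum(1 for r in records if predicate(r))
    # A very large query set is as dangerous as a very small one,
    # because its complement singles out the few remaining records.
    if n < min_size or n > len(records) - min_size:
        raise QueryRefused(f"query set size {n} is outside the allowed range")
    return n
```

With these defaults, `income_band(37500)` gives `"35k-40k"` and `imprecise_count(141)` gives `(140, 160)`.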

Research in this area has largely stalled. Reference 3 below showed that, in general, securing a statistical database is an impossible aim: if the database is open to legitimate use, it is also open to abuse, and if it is restricted so tightly that abuse is impossible, it is useless for practical statistical purposes. To quote:

The conclusion is that statistical databases are almost always subject to compromise. Severe restrictions on allowable query set sizes will render the database useless as a source of statistical information but will not secure the confidential records.
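To make the point about query set sizes concrete, here is a minimal sketch in which a size restriction refuses the direct query but still leaks the value through two permitted ones. The data, the threshold `K`, and the `guarded_sum` helper are hypothetical; the padding query plays the role that the literature usually calls a tracker.

```python
K = 3  # assumed minimum query-set size; the maximum is len(ROWS) - K

# Hypothetical rows of (department, sex, income).
ROWS = [
    ("IT", "F", 70000), ("IT", "M", 50000), ("IT", "M", 55000), ("IT", "M", 60000),
    ("Sales", "F", 40000), ("Sales", "F", 42000), ("Sales", "M", 45000),
    ("Sales", "M", 47000), ("Sales", "F", 52000), ("Sales", "M", 58000),
]


def guarded_sum(predicate):
    """SUM(income), refused (None) when the query set size is outside [K, n - K]."""
    matched = [income for dept, sex, income in ROWS if predicate(dept, sex)]
    if not K <= len(matched) <= len(ROWS) - K:
        return None
    return sum(matched)


# The direct query matches a single person, so it is refused.
direct = guarded_sum(lambda dept, sex: dept == "IT" and sex == "F")

# Both of these queries have permitted sizes (4 and 3 rows).
whole_dept = guarded_sum(lambda dept, sex: dept == "IT")
tracker = guarded_sum(lambda dept, sex: dept == "IT" and sex != "F")

# Subtracting the padding query recovers the refused answer.
leaked = whole_dept - tracker
print(direct, leaked)
```

Raising `K` far enough to block this pair of queries also blocks most legitimate queries on a table this small, which is the trade-off the quotation describes.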
