Data Vault Modeling - History and Philosophy

History and Philosophy

In data warehouse modeling there are two well-known competing options for modeling the layer where the data is stored: Ralph Kimball's approach, with conformed dimensions and an enterprise data bus, and Bill Inmon's approach, with a normalized database. Both techniques have issues when dealing with changes in the systems feeding the data warehouse. Conformed dimensions also require cleansing the data (to conform it), which is undesirable in a number of cases because cleansing inevitably loses information. Data Vault is designed to avoid or minimize the impact of those issues in two ways: by moving them to areas of the data warehouse outside the historical storage area (cleansing is done in the data marts), and by separating the structural items (business keys and the associations between the business keys) from the descriptive attributes.
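The separation of structural items from descriptive attributes can be sketched with an in-memory SQLite database. This is a minimal illustration, not a reference implementation: the table and column names (`hub_customer`, `link_customer_order`, `sat_customer_details`, `sat_customer_loyalty`, `load_ts`) are assumptions made up for the example, and the `hub_`, `link_` and `sat_` prefixes borrow the names Data Vault commonly uses for business-key tables, association tables and attribute tables. The point it tries to show is that a new attribute from a source system can be absorbed by adding a table, without altering the tables that hold the keys.

```python
import sqlite3


def build_vault(conn):
    """Create a tiny vault: structural tables hold keys, a separate table holds attributes."""
    conn.executescript("""
        -- Structural: one row per business key.
        CREATE TABLE hub_customer (customer_bk TEXT PRIMARY KEY);
        CREATE TABLE hub_order (order_bk TEXT PRIMARY KEY);

        -- Structural: an association between two business keys.
        CREATE TABLE link_customer_order (
            customer_bk TEXT REFERENCES hub_customer(customer_bk),
            order_bk TEXT REFERENCES hub_order(order_bk),
            PRIMARY KEY (customer_bk, order_bk)
        );

        -- Descriptive: customer attributes, stored apart from the keys.
        CREATE TABLE sat_customer_details (
            customer_bk TEXT REFERENCES hub_customer(customer_bk),
            load_ts TEXT,
            name TEXT,
            PRIMARY KEY (customer_bk, load_ts)
        );
    """)


def add_loyalty_satellite(conn):
    """Hypothetical source change: a new attribute arrives, so add a table and alter nothing."""
    conn.execute("""
        CREATE TABLE sat_customer_loyalty (
            customer_bk TEXT REFERENCES hub_customer(customer_bk),
            load_ts TEXT,
            tier TEXT,
            PRIMARY KEY (customer_bk, load_ts)
        )
    """)


def table_names(conn):
    """Return the names of all user tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def table_sql(conn, name):
    """Return the CREATE TABLE statement SQLite recorded for a table."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row[0]
```

Calling `build_vault` and then `add_loyalty_satellite` on a `sqlite3.connect(":memory:")` connection leaves the definitions of `hub_customer` and `link_customer_order` exactly as they were. Only the set of attribute tables grows.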

Dan Linstedt, the creator of the method, describes the resulting database as follows:

The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise.

Data Vault's philosophy is that all data is relevant data, even if it is not in line with established definitions and business rules. If data does not conform to these definitions and rules, that is a problem for the business, not the data warehouse. Judging data to be "wrong" is an interpretation of the data that stems from a particular point of view, one that may not be valid for everyone or at every point in time. Therefore the Data Vault must capture all data; the data is interpreted only when it is reported on or extracted from the Data Vault.
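The "capture everything, interpret on the way out" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of any Data Vault specification: `RawRow`, `load`, `report` and `has_plausible_email` are hypothetical names, and the e-mail check stands in for whatever business rule a particular report might apply.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class RawRow:
    """A row stored exactly as the source system delivered it (hypothetical shape)."""
    customer_bk: str
    email: str


def load(vault: List[RawRow], incoming: Iterable[RawRow]) -> None:
    """Append every incoming row; nothing is validated, cleansed or rejected."""
    vault.extend(incoming)


def report(vault: List[RawRow], rule: Callable[[RawRow], bool]) -> List[RawRow]:
    """Apply a business rule only when reading; the stored rows stay untouched."""
    return [row for row in vault if rule(row)]


def has_plausible_email(row: RawRow) -> bool:
    """One point of view on what counts as valid; another report may use another rule."""
    return "@" in row.email
```

Because the rule is passed to `report` rather than applied in `load`, two reports with different definitions of "wrong" can read the same stored rows, and a rule that changes later does not require reloading history.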

Data Vault also responds to the growing need for complete auditability and traceability of all the data in the data warehouse. Because of Sarbanes-Oxley requirements in the USA and similar measures in Europe, this is a relevant topic for many business intelligence implementations, hence the focus of any Data Vault implementation on complete traceability and auditability of all information.

Data Vault 2.0 is the new specification. It contains components that define the implementation best practices, the methodology (SEI/CMMI, Six Sigma, SDLC, etc.), the architecture, and the model. Data Vault 2.0 adds a focus on new components such as Big Data and NoSQL, and also on the performance of the existing model. The old specification (documented here for the most part) is highly focused on Data Vault modeling.

Evolving the specification to include the new components and the best practices is necessary to keep the EDW and BI systems current with the needs and desires of today's businesses.
