Data Vault Modeling - Loading Practices

Loading Practices

The ETL for updating a Data Vault model is fairly straightforward (see Data Vault Series 5 – Loading Practices). First, load all the Hubs, creating surrogate IDs for any new business keys. Once that is done, every business key can be resolved to a surrogate ID by querying its Hub. Second, resolve the Links between Hubs and create surrogate IDs for any new associations. In the same step you can also load all Satellites that are attached to Hubs, since their keys can already be resolved to surrogate IDs. Third, once all new Links have been created with their surrogate keys, load the Satellites attached to the Links.
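The three steps can be illustrated with a minimal in-memory sketch. Everything in it is an assumption made for illustration: the `KeyTable` class, the customer/product/order entities, and the column names are hypothetical, and a real load would run against database tables rather than Python dictionaries.

```python
from itertools import count


class KeyTable:
    """Hypothetical stand-in for a Hub or Link: maps a key to a surrogate ID."""

    def __init__(self):
        self._ids = {}
        self._next_id = count(1)

    def get_or_create(self, key):
        # Create a surrogate ID only for keys that have not been seen before.
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]

    def lookup(self, key):
        # Resolve an existing key to its surrogate ID.
        return self._ids[key]


def load_batch(rows, customer_hub, product_hub, order_link, customer_sat, order_sat):
    # Step 1: load the Hubs, creating surrogate IDs for new business keys.
    for row in rows:
        customer_hub.get_or_create(row["customer_no"])
        product_hub.get_or_create(row["product_no"])

    # Step 2: resolve business keys through the Hubs, then load the Link
    # and the Satellite attached to a Hub.
    for row in rows:
        customer_id = customer_hub.lookup(row["customer_no"])
        product_id = product_hub.lookup(row["product_no"])
        order_link.get_or_create((customer_id, product_id))
        customer_sat.append((customer_id, row["customer_name"]))

    # Step 3: with Link surrogate IDs available, load the Link's Satellite.
    for row in rows:
        link_key = (
            customer_hub.lookup(row["customer_no"]),
            product_hub.lookup(row["product_no"]),
        )
        order_sat.append((order_link.lookup(link_key), row["quantity"]))


rows = [
    {"customer_no": "C-1", "customer_name": "Ada", "product_no": "P-9", "quantity": 2},
    {"customer_no": "C-2", "customer_name": "Bob", "product_no": "P-9", "quantity": 1},
]
customer_hub, product_hub, order_link = KeyTable(), KeyTable(), KeyTable()
customer_sat, order_sat = [], []
load_batch(rows, customer_hub, product_hub, order_link, customer_sat, order_sat)
```

Both rows share product P-9, so the product Hub receives a single surrogate ID while the Link receives one ID per distinct customer/product association.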

Since the Hubs are not joined to each other except through Links, you can load all the Hubs in parallel. Since Links are not attached directly to each other, you can load all the Links in parallel as well. Satellites attach only to Hubs and Links, never to each other, so they can likewise be loaded in parallel once the Hub or Link they belong to has been loaded.
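One way to exploit this independence is to group the loaders into stages and run each stage concurrently. The sketch below uses Python's standard `concurrent.futures` module; the loader names and the `make_loader` helper are hypothetical placeholders for real load jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage(loaders):
    """Run all loaders of one stage concurrently and wait until every one finishes."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(loader) for loader in loaders]
        # result() blocks, so the next stage starts only after this one is done.
        return [future.result() for future in futures]


def make_loader(name, log):
    """Hypothetical loader factory: a real loader would write to a table."""
    def loader():
        log.append(name)
        return name
    return loader


log = []
stages = [
    # Stage 1: all Hubs.
    [make_loader("hub_customer", log), make_loader("hub_product", log)],
    # Stage 2: all Links, plus Satellites attached to Hubs.
    [make_loader("link_order", log), make_loader("sat_customer", log)],
    # Stage 3: Satellites attached to Links.
    [make_loader("sat_order", log)],
]
results = [run_stage(stage) for stage in stages]
```

Within a stage the execution order is not defined, but no loader of a later stage starts before the earlier stage has completed.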

Because every table type is loaded in the same way, the ETL lends itself to easy automation or templating. Problems occur only with Links that relate to other Links, because resolving the keys in such a Link leads to another Link that has to be resolved first. Such a Link is equivalent to a Link that references multiple Hubs, so the difficulty can be avoided by remodeling these cases, which is in fact the recommended practice.
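The equivalence can be sketched as a small function that replaces every reference to another Link with the Hubs that Link ultimately refers to. The `links` dictionary and the order/shipment/carrier names are hypothetical, and the sketch assumes that Links do not refer to each other in a cycle.

```python
def flatten_link(name, links):
    """Return the Hubs a Link ultimately references, expanding nested Links.

    `links` maps each Link name to the names it references; any name that
    is not itself a key of `links` is treated as a Hub.
    """
    hubs = []
    for ref in links[name]:
        # A reference to another Link is replaced by that Link's Hubs.
        expanded = flatten_link(ref, links) if ref in links else [ref]
        for hub in expanded:
            if hub not in hubs:
                hubs.append(hub)
    return hubs


links = {
    "order": ["customer", "product"],
    "shipment": ["order", "carrier"],  # a Link that references another Link
}
flat_shipment = flatten_link("shipment", links)
```

After remodeling, the shipment Link references the customer, product, and carrier Hubs directly, so it can be loaded in the same step as every other Link.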

Data is never deleted from the Data Vault, unless you have a technical error while loading data.
