Record Linkage - Mathematical Model

Mathematical Model

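In an application with two files, A and B, denote the rows (records) by α(a) in file A and β(b) in file B. Assign K characteristics to each record. The set of record pairs that represent identical entities is defined by

 M = \left\{ (a,b);\ a = b;\ a \in A;\ b \in B \right\}

and the complement of set M, namely the set U of record pairs representing different entities, is defined as

 U = \left\{ (a,b);\ a \neq b;\ a \in A;\ b \in B \right\}.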
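
As a minimal sketch of this partition, the snippet below builds the set of matched pairs and its complement for two tiny files. The records, the field names, and the `entity` column (a true identifier, which is normally unknown in practice) are assumptions made up for illustration.

```python
from itertools import product

# Hypothetical toy files; "entity" is an assumed ground-truth identifier.
file_a = [
    {"entity": 1, "sex": "F", "age": 34},
    {"entity": 2, "sex": "M", "age": 51},
]
file_b = [
    {"entity": 2, "sex": "M", "age": 51},
    {"entity": 3, "sex": "F", "age": 34},
    {"entity": 1, "sex": "F", "age": 35},
]

# Every pair (a, b) in A x B, referenced by row index.
pairs = list(product(range(len(file_a)), range(len(file_b))))

# M: pairs whose records represent the same entity; U: all remaining pairs.
M = {(a, b) for a, b in pairs if file_a[a]["entity"] == file_b[b]["entity"]}
U = set(pairs) - M
```

Every pair lands in exactly one of the two sets, so the sizes of M and U add up to the size of A × B.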

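A vector γ is defined that contains the coded agreements and disagreements on each characteristic:

 \gamma\left[\alpha(a), \beta(b)\right] = \left\{ \gamma^{1}\left[\alpha(a), \beta(b)\right], \ldots, \gamma^{K}\left[\alpha(a), \beta(b)\right] \right\}

where the superscripts 1, ..., K index the characteristics (sex, age, marital status, etc.) in the files. The conditional probabilities of observing a specific vector γ given (a,b) ∈ M and given (a,b) ∈ U are defined as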

 m(\gamma) = P \left\{ \gamma\left[\alpha(a), \beta(b)\right] \mid (a,b) \in M \right\} = \sum_{(a, b) \in M} P \left\{ \gamma\left[\alpha(a), \beta(b)\right] \right\} \cdot P \left[ (a,b) \mid M \right]

and

 u(\gamma) = P \left\{ \gamma\left[\alpha(a), \beta(b)\right] \mid (a,b) \in U \right\} = \sum_{(a, b) \in U} P \left\{ \gamma\left[\alpha(a), \beta(b)\right] \right\} \cdot P \left[ (a,b) \mid U \right],
respectively.
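
The two conditional probabilities can be illustrated with a small sketch. It assumes that each characteristic is coded 1 for exact agreement and 0 for disagreement, that every pair within the matched set (or within the unmatched set) is equally likely, and that the true match status is known through a hypothetical `entity` column. Under those assumptions each probability reduces to the relative frequency of an agreement pattern within its set. The data and all names are illustrative only.

```python
from collections import Counter
from itertools import product

# Hypothetical toy files; "entity" is an assumed ground-truth identifier.
file_a = [
    {"entity": 1, "sex": "F", "age": 34},
    {"entity": 2, "sex": "M", "age": 51},
]
file_b = [
    {"entity": 2, "sex": "M", "age": 51},
    {"entity": 3, "sex": "F", "age": 34},
    {"entity": 1, "sex": "F", "age": 35},
]
FIELDS = ("sex", "age")  # the compared characteristics


def gamma(rec_a, rec_b):
    """Agreement vector: 1 where a characteristic agrees exactly, else 0."""
    return tuple(int(rec_a[f] == rec_b[f]) for f in FIELDS)


def pattern_probabilities(pair_set):
    """Relative frequency of each agreement pattern within a set of pairs."""
    counts = Counter(gamma(file_a[a], file_b[b]) for a, b in pair_set)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


pairs = set(product(range(len(file_a)), range(len(file_b))))
matched = {(a, b) for a, b in pairs
           if file_a[a]["entity"] == file_b[b]["entity"]}
unmatched = pairs - matched

m = pattern_probabilities(matched)    # pattern frequencies among matches
u = pattern_probabilities(unmatched)  # pattern frequencies among non-matches
```

In this toy data, full agreement on both fields occurs in half of the matched pairs and in a quarter of the unmatched pairs, so that pattern is more typical of true matches.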
