SDTM - Datasets and Domains

Observations are normally collected for all subjects in a series of domains. A domain is a collection of logically related observations that share a common topic about the subjects in the trial. The logic of the relationship may relate to the scientific subject matter of the data or to its role in the trial.

Typically, each domain is represented by a dataset, but information relevant to the same topic may be spread among multiple datasets. Each dataset is distinguished by a unique, two-character DOMAIN code that should be used consistently throughout the submission. This DOMAIN code is used in the dataset name, as the value of the DOMAIN variable within that dataset, and as a prefix for most variable names in the dataset.
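The three uses of the DOMAIN code described above can be sketched as a small consistency check. This is only an illustration: the function names, the row layout (one dictionary per observation), and the sample values are assumptions made for this example and are not part of the SDTM.

```python
def check_domain_code(dataset_name, rows):
    """Return a list of problems; an empty list means both checks passed."""
    code = dataset_name.upper()
    problems = []
    # The dataset name is expected to be the two-character DOMAIN code.
    if len(code) != 2:
        problems.append(f"dataset name {dataset_name!r} is not a two-character code")
    # Every row should carry the same code in its DOMAIN variable.
    for index, row in enumerate(rows):
        found = row.get("DOMAIN")
        if found != code:
            problems.append(f"row {index}: DOMAIN is {found!r}, expected {code!r}")
    return problems


def prefixed_variables(code, variable_names):
    """List the variable names that start with the domain code."""
    return [name for name in variable_names if name.startswith(code)]


# Illustrative adverse-events rows; the values are made up for this sketch.
ae_rows = [
    {"STUDYID": "STUDY01", "DOMAIN": "AE", "USUBJID": "STUDY01-001",
     "AESEQ": 1, "AETERM": "HEADACHE"},
    {"STUDYID": "STUDY01", "DOMAIN": "AE", "USUBJID": "STUDY01-002",
     "AESEQ": 1, "AETERM": "NAUSEA"},
]

print(check_domain_code("ae", ae_rows))
print(prefixed_variables("AE", ae_rows[0].keys()))
```

The prefix check only reports which names carry the code rather than flagging the others, because the prefix applies to most variable names, not all of them.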

The dataset structure for observations is a flat file representing a table with one or more rows and columns. Normally, one dataset is submitted for each domain. Each row of the dataset represents a single observation and each column represents one of the variables. Each dataset or table is accompanied by metadata definitions that provide information about the variables used in the dataset. The metadata are described in a data definition document named 'Define' that is submitted along with the data to regulatory authorities.

The Submission Metadata Model uses seven distinct metadata attributes, which are defined for each dataset variable in the metadata definition document:

  • The Variable Name (limited to 8 characters for compatibility with the SAS System V5 Transport format)
  • A descriptive Variable Label, using up to 40 characters, which should be unique for each variable in the dataset
  • The data Type (e.g., whether the variable value is a character or numeric)
  • The set of controlled terminology for the value, or the presentation format of the variable (Controlled Terms or Format)
  • The Origin or source of each variable
  • The Role of the variable, which determines how the variable is used in the dataset. Roles are used to represent the categories of variables as Identifier, Topic, Timing, or the five types of Qualifiers. Since these roles are predefined for all domains that follow the general classes, they do not need to be specified by sponsors in their Define data definition document. Actual submission metadata may use additional role designations, and more than one role may be assigned per variable to meet different needs.
  • Comments or other relevant information about the variable or its data.
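As a rough sketch, the seven attributes listed above could be held in one record per variable, with the stated limits on name and label checked mechanically. The class name, field names, and function name are hypothetical choices for this example; the actual definition document has its own format, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class VariableMetadata:
    """One record per dataset variable, mirroring the seven attributes."""
    name: str                             # Variable Name
    label: str                            # Variable Label
    data_type: str                        # Type, e.g. "Char" or "Num"
    controlled_terms_or_format: str = ""  # Controlled Terms or Format
    origin: str = ""                      # Origin
    role: str = ""                        # Role
    comments: str = ""                    # Comments


def find_metadata_problems(variables):
    """Check name length, label length, and label uniqueness in a dataset."""
    problems = []
    seen_labels = set()
    for var in variables:
        if len(var.name) > 8:
            problems.append(f"{var.name}: name is longer than 8 characters")
        if len(var.label) > 40:
            problems.append(f"{var.name}: label is longer than 40 characters")
        if var.label in seen_labels:
            problems.append(f"{var.name}: label {var.label!r} is not unique")
        seen_labels.add(var.label)
    return problems


# Illustrative metadata; the second entry is deliberately invalid.
example = [
    VariableMetadata("USUBJID", "Unique Subject Identifier", "Char", role="Identifier"),
    VariableMetadata("AETERMLONG", "Unique Subject Identifier", "Char"),
]

for problem in find_metadata_problems(example):
    print(problem)
```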

Data stored in dataset variables include both raw values (as originally collected) and derived values (e.g., converted into standard units, or computed from multiple values, such as an average). In the SDTM itself, only the name, label, and type are listed for each variable, together with a set of CDISC guidelines that give a general description of each variable used by a general observation class.
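The two kinds of derived values mentioned above, a conversion into standard units and a value computed from several others, can be illustrated with a short sketch. The helper names, the choice of kilograms as the standard unit, and the sample measurements are assumptions made purely for this example.

```python
LB_TO_KG = 0.45359237  # definition of the international avoirdupois pound


def to_standard_kg(value, unit):
    """Derive a weight in kilograms from a raw value and its collected unit."""
    if unit == "kg":
        return value
    if unit == "lb":
        return value * LB_TO_KG
    raise ValueError(f"unsupported unit: {unit!r}")


def average(values):
    """Derive a single value as the arithmetic mean of several values."""
    values = list(values)
    if not values:
        raise ValueError("cannot average an empty list")
    return sum(values) / len(values)


# Raw values as collected (made up), each with its original unit.
raw_weights = [(154.0, "lb"), (70.5, "kg"), (156.0, "lb")]

# Derived values: every weight in the standard unit, then their average.
standard_weights = [to_standard_kg(value, unit) for value, unit in raw_weights]
print(standard_weights)
print(average(standard_weights))
```

Keeping the raw value and unit alongside the derived value preserves a trace from what was collected to what was computed.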

Comments are included as necessary according to the needs of individual studies. The presence of an asterisk (*) in the 'Controlled Terms or Format' column indicates that a discrete set of values (controlled terminology) is expected to be made available for this variable. This set of values may be sponsor-defined in cases where standard vocabularies have not yet been defined (represented by a single *) or from an external published source such as MedDRA (represented by **).
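The asterisk convention can be read mechanically, as in the following minimal sketch. The function name and the returned category strings are hypothetical; any cell that is not exactly * or ** is simply reported as "other" here, with no attempt to interpret it further.

```python
def classify_controlled_terms(cell):
    """Classify the content of a 'Controlled Terms or Format' cell."""
    value = cell.strip()
    if value == "**":
        # Terminology taken from an external published source.
        return "external"
    if value == "*":
        # Terminology defined by the sponsor.
        return "sponsor-defined"
    # Anything else (empty, a format, or other content) is not interpreted.
    return "other"


for cell in ["*", "**", " * ", ""]:
    print(repr(cell), "->", classify_controlled_terms(cell))
```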
