Methods
The most common approach in GWA studies is the case-control setup, which compares two large groups of individuals: a healthy control group and a case group affected by a disease. All individuals in each group are genotyped for the majority of common known SNPs. The exact number of SNPs depends on the genotyping technology but is typically one million or more. For each of these SNPs, it is then tested whether the allele frequency differs significantly between the case and the control group. In such setups, the fundamental unit for reporting effect sizes is the odds ratio. The odds ratio is the ratio of two odds, which in the context of GWA studies are the odds of a specific allele in the case group (its frequency divided by one minus its frequency) and the corresponding odds of the same allele in the control group. When the allele frequency is higher in the case group than in the control group, the odds ratio is greater than 1; when it is lower, the odds ratio is less than 1. Additionally, a P-value for the significance of the odds ratio is typically calculated using a simple chi-squared test. Finding odds ratios that are significantly different from 1 is the objective of the GWA study, because this shows that a SNP is associated with the disease.
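The allelic version of this calculation can be sketched in Python as follows, assuming allele counts (two per genotyped individual) arranged as a 2×2 table of cases versus controls and allele A versus allele B. The function name `allelic_test` and the example counts are hypothetical, and dedicated tools may differ in details such as the handling of missing genotypes.

```python
import math


def allelic_test(case_a, case_b, ctrl_a, ctrl_b):
    """Allelic odds ratio and 1-df chi-squared P-value for one SNP.

    Arguments are allele counts (each genotyped individual contributes two):
    case_a / case_b = copies of allele A / allele B in the case group
    ctrl_a / ctrl_b = copies of allele A / allele B in the control group
    """
    # Odds of allele A among cases divided by odds of allele A among controls.
    odds_ratio = (case_a / case_b) / (ctrl_a / ctrl_b)

    # Pearson chi-squared statistic on the 2x2 table of allele counts.
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    total = case_a + case_b + ctrl_a + ctrl_b
    row_totals = [case_a + case_b, ctrl_a + ctrl_b]
    col_totals = [case_a + ctrl_a, case_b + ctrl_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom the upper tail is P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls, i.e. 2,000 alleles per group.
odds_ratio, chi2, p_value = allelic_test(case_a=600, case_b=1400, ctrl_a=500, ctrl_b=1500)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, P = {p_value:.1e}")
```

With these made-up counts, allele A has a frequency of 0.30 in cases and 0.25 in controls, giving an odds ratio of about 1.29 and a P-value between 10⁻⁴ and 10⁻³: nominally significant for a single test, but nowhere near the thresholds used once millions of SNPs are tested.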
There are several variations of this case-control approach. A common alternative to case-control GWA studies is the analysis of quantitative phenotypic data, such as height, biomarker concentrations, or even gene expression. Likewise, alternative statistics designed for dominant or recessive penetrance patterns can be used. Calculations are typically done using bioinformatics software such as PLINK, which also supports many of these alternative statistics.
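For a quantitative phenotype, one simple approach is to regress the trait on the genotype at each SNP, and the way genotypes are encoded selects an additive, dominant, or recessive model. The sketch below assumes genotypes are given as allele dosages (0, 1 or 2 copies of the tested allele) and uses ordinary least squares; the function names `encode` and `snp_effect` and the height data are hypothetical, and the t-statistic is returned without a P-value to keep the example dependency-free. It illustrates the idea and is not how PLINK implements these tests.

```python
import math
import statistics


def encode(dosage, model):
    """Map an allele dosage (0, 1 or 2 copies of the tested allele) to a predictor."""
    if model == "additive":
        return dosage                   # each copy adds the same effect
    if model == "dominant":
        return 1 if dosage >= 1 else 0  # one copy is enough
    if model == "recessive":
        return 1 if dosage == 2 else 0  # both copies are required
    raise ValueError(f"unknown genetic model: {model}")


def snp_effect(dosages, phenotype, model="additive"):
    """Least-squares slope (effect size) and its t-statistic for one SNP.

    Assumes at least two distinct encoded genotypes and a noisy phenotype,
    otherwise the slope or its standard error is undefined.
    """
    x = [encode(d, model) for d in dosages]
    n = len(x)
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(phenotype)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, phenotype))
    beta = sxy / sxx                    # change in phenotype per unit of predictor
    intercept = mean_y - beta * mean_x
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, phenotype))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return beta, beta / std_err


# Hypothetical genotypes and heights (cm) for ten individuals.
dosages = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
heights = [170.1, 169.8, 172.2, 171.9, 174.0, 174.3, 170.0, 172.1, 173.8, 171.8]
for model in ("additive", "dominant", "recessive"):
    beta, t_stat = snp_effect(dosages, heights, model)
    print(f"{model:>9}: beta = {beta:.2f} cm, t = {t_stat:.1f}")
```

Only the encoding step changes between models, which is why software can offer dominant and recessive variants alongside the default test at little extra cost.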
In addition to calculating the association itself, it is common to take into account several variables that could confound the results; sex and age are common examples. Moreover, many genetic variations are known to be associated with the geographical and historical populations in which the mutations first arose. Because of this, studies must account for the geographical and ethnic background of participants by controlling for what is called population stratification.
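How unaccounted-for population structure can create a spurious association, and how stratifying removes it, can be sketched with the Mantel-Haenszel pooled odds ratio, which compares cases with controls only inside the same stratum. This is one classical technique for discrete confounders such as sex or a known population label; large GWA studies more commonly adjust for ancestry using principal components or mixed models. The function names and allele counts below are hypothetical, constructed so that the allele has no effect within either population.

```python
def crude_odds_ratio(strata):
    """Odds ratio after pooling every stratum into one 2x2 table (ignores ancestry)."""
    case_a = sum(s[0] for s in strata)
    case_b = sum(s[1] for s in strata)
    ctrl_a = sum(s[2] for s in strata)
    ctrl_b = sum(s[3] for s in strata)
    return (case_a * ctrl_b) / (case_b * ctrl_a)


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel odds ratio: cases are compared with controls only within a stratum."""
    numerator = 0.0
    denominator = 0.0
    for case_a, case_b, ctrl_a, ctrl_b in strata:
        n = case_a + case_b + ctrl_a + ctrl_b
        numerator += case_a * ctrl_b / n
        denominator += case_b * ctrl_a / n
    return numerator / denominator


# Hypothetical allele counts (case_a, case_b, ctrl_a, ctrl_b) for two populations.
# Within each population, allele A has the same frequency in cases and controls,
# but population 1 contributes most of the cases and carries allele A more often.
strata = [
    (800, 200, 400, 100),  # population 1: allele A frequency 0.8 in cases and controls
    (40, 160, 200, 800),   # population 2: allele A frequency 0.2 in cases and controls
]
print(crude_odds_ratio(strata))            # 3.5: a spurious association
print(mantel_haenszel_odds_ratio(strata))  # 1.0: no association once stratified
```

The crude analysis mistakes "belongs to population 1" for "carries allele A"; comparing like with like removes the artefact.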
After odds ratios and P-values have been calculated for all SNPs, a common approach is to create a Manhattan plot. In the context of GWA studies, this plot shows the negative logarithm of the P-value as a function of genomic location. Thus the SNPs with the most significant association stand out on the plot, usually as stacks of points because of haploblock structure: neighboring SNPs that are inherited together with an associated variant show similar signals. Importantly, the P-value threshold for significance is corrected for multiple testing. The exact threshold varies by study, but P-values typically must be very low (10⁻⁷ to 10⁻⁸) to be considered significant, given the millions of SNPs tested. Modern GWA studies typically perform the first analysis in a discovery cohort, followed by validation of the most significant SNPs in an independent validation cohort.
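The threshold, the plot coordinates, and the validation step can be sketched as follows, assuming a Bonferroni correction (one common choice; studies differ). Dividing 0.05 by one million tests gives a per-SNP cutoff of 5 × 10⁻⁸, inside the range quoted above. The function names, SNP records and validation P-values are hypothetical, chromosome lengths are approximated by the largest observed position, and actually drawing the plot is left to any plotting library.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-SNP P-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def manhattan_points(snps):
    """Convert (chromosome, position, p_value) records into Manhattan-plot coordinates.

    x: cumulative genomic position, with chromosomes laid end to end in order
       (each chromosome's length is approximated by its largest observed position).
    y: -log10(P), so the most significant SNPs become the tallest points.
    """
    chrom_end = {}
    for chrom, pos, _ in snps:
        chrom_end[chrom] = max(chrom_end.get(chrom, 0), pos)
    offsets, running = {}, 0
    for chrom in sorted(chrom_end):
        offsets[chrom] = running
        running += chrom_end[chrom]
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in sorted(snps)]


def replicated(validation_p, alpha=0.05):
    """SNPs carried over from discovery that remain significant in the validation cohort.

    The cutoff is corrected only for the number of SNPs being validated.
    """
    cutoff = bonferroni_threshold(len(validation_p), alpha)
    return sorted(snp for snp, p in validation_p.items() if p < cutoff)


# Discovery: one million tests give a per-SNP cutoff of about 5e-8.
threshold = bonferroni_threshold(1_000_000)
discovery = [(1, 100, 0.01), (1, 300, 1e-9), (2, 50, 0.5), (2, 80, 2e-8)]  # hypothetical
hits = [snp for snp in discovery if snp[2] < threshold]
points = manhattan_points(discovery)

# Validation: hypothetical P-values for the two discovery hits in an independent cohort.
print(replicated({"snp_1_300": 0.001, "snp_2_80": 0.2}))  # ['snp_1_300']
```

Because only a handful of SNPs are carried forward, the validation cutoff (here 0.05 / 2) is far less strict than the genome-wide one.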