Risk Calculations
The creation of a model to calculate the overall genetic risk involves two steps: i) conversion of odds-ratios for a single genetic variant into relative risk and ii) combination of risk from multiple variants in different genetic loci into a single relative risk value.
Deriving risk from odds-ratios
Most gene discovery studies for complex diseases published to date in authoritative journals have employed a case-control design because of its retrospective setup. These studies sample and genotype a selected set of cases (people who have the specified disease condition) and control individuals. The interest is in genetic variants (alleles) whose frequencies differ significantly between cases and controls.
The results are typically reported as odds-ratios, that is, the odds of carrying the risk variant (carriers, “c”) versus not carrying it (non-carriers, “nc”) among the affected individuals (“A”), divided by the same odds among the controls (“C”). Expressed in terms of probabilities conditional on affection status:
OR = (Pr(c|A)/Pr(nc|A)) / (Pr(c|C)/Pr(nc|C))
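As a minimal sketch of the odds-ratio defined above, the following computes it from a 2×2 table of carrier counts. The function name and the counts are hypothetical, chosen only for illustration; they are not taken from any study.

```python
def odds_ratio(carriers_cases, noncarriers_cases,
               carriers_controls, noncarriers_controls):
    """Odds of carrying the risk variant among cases, divided by the same odds among controls."""
    odds_cases = carriers_cases / noncarriers_cases
    odds_controls = carriers_controls / noncarriers_controls
    return odds_cases / odds_controls


# Hypothetical counts, for illustration only.
example_or = odds_ratio(300, 700, 200, 800)
print(round(example_or, 2))
```

Using counts within each group keeps the calculation independent of the case-to-control sampling ratio, which is why the odds-ratio can be estimated from a case-control design at all.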
It is, however, the absolute risk of the disease that we are interested in, i.e. the fraction of individuals carrying the risk variant who get the disease, or in other words the probability of getting the disease given the variant. This number cannot be measured directly in case-control studies, in part because the ratio of cases to controls is typically not the same as in the general population. However, under certain assumptions, we can estimate the risk from the odds-ratio.
It is well known that under the rare disease assumption, the relative risk of a disease can be approximated by the odds-ratio. This assumption may, however, not hold for many common diseases. Still, it turns out that the risk of one genotype variant relative to another can be estimated from the odds-ratio expressed above. The calculation is particularly simple under the assumption of random population controls, where the controls are a random sample from the same population as the cases and may therefore include affected people, rather than being strictly unaffected individuals. To increase sample size and power, many of the large genome-wide association and replication studies used controls that were neither age-matched with the cases nor carefully screened to ensure that they did not have the disease at the time of the study. Hence, such controls often approximate, although not exactly, a random sample from the general population. This assumption is rarely expected to be satisfied exactly, but the risk estimates are usually robust to moderate deviations from it.
In summary, the calculations (see appendix) show that for the dominant and the recessive models, where we have a risk variant carrier, “c”, and a non-carrier, “nc”, the odds-ratio equals the risk ratio between carriers and non-carriers:
OR = Pr(A|c)/Pr(A|nc) = r
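The core step of this result can be sketched as follows, assuming random population controls so that Pr(c|C) = Pr(c) and Pr(nc|C) = Pr(nc). Bayes' rule gives Pr(c|A) = Pr(A|c)Pr(c)/Pr(A), and similarly for non-carriers, so Pr(A) and the carrier frequencies cancel:

```latex
\mathrm{OR}
  = \frac{\Pr(c \mid A)/\Pr(nc \mid A)}{\Pr(c)/\Pr(nc)}
  = \frac{\Pr(A \mid c)\,\Pr(c)}{\Pr(A \mid nc)\,\Pr(nc)} \cdot \frac{\Pr(nc)}{\Pr(c)}
  = \frac{\Pr(A \mid c)}{\Pr(A \mid nc)} = r
```

This is only an outline under the stated assumption; if the controls are strictly unaffected individuals, the identity holds only approximately.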
Likewise, for the multiplicative model, where the risk is the product of the risks associated with the two allele copies, the allelic odds-ratio equals the risk factor:
OR = Pr(A|aa)/Pr(A|ab) = Pr(A|ab)/Pr(A|bb) = r
Here “a” denotes the risk allele and “b” the non-risk allele. The factor “r” is therefore the relative risk between the allele types.
For many of the studies published in the last few years, reporting common variants associated with complex diseases, the multiplicative model has been found to summarize the effect adequately and most often provide a fit to the data superior to alternative models such as the dominant and recessive models.
The risk relative to the average population risk
It is most convenient to represent the risk of a genetic variant relative to the population average, since this makes it easier to communicate the lifetime risk of developing the disease compared with the baseline population risk. For example, in the multiplicative model we can calculate the relative population risk for variant “aa” as:
RR(aa) = Pr(A|aa)/Pr(A) = (Pr(A|aa)/Pr(A|bb))/(Pr(A)/Pr(A|bb)) = r^2/(Pr(aa) r^2 + Pr(ab) r + Pr(bb)) = r^2/(p^2 r^2 + 2pq r + q^2) = r^2/R
Here “p” and “q” are the allele frequencies of “a” and “b” respectively. Likewise, we get that RR(ab) = r/R and RR(bb) = 1/R. The allele frequency estimates are obtained from the scientific publications that report the odds-ratios and from the HapMap database. Note that in the case where we do not know the genotypes of an individual, the relative genetic risk for that test or marker is simply equal to one.
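The denominator R follows from the law of total probability over the three genotypes. The sketch below assumes Hardy-Weinberg proportions for the genotype frequencies, which is what the use of allele frequencies for Pr(aa), Pr(ab) and Pr(bb) implies:

```latex
\frac{\Pr(A)}{\Pr(A \mid bb)}
  = \frac{\Pr(aa)\Pr(A \mid aa) + \Pr(ab)\Pr(A \mid ab) + \Pr(bb)\Pr(A \mid bb)}{\Pr(A \mid bb)}
  = p^2 r^2 + 2pq\,r + q^2 = (pr + q)^2 = R
```

Weighting the three relative risks by the genotype frequencies gives p^2 r^2/R + 2pq r/R + q^2/R = 1, so the average RR over the population is one.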
As an example, in type-2 diabetes, allele T of the disease-associated marker rs7903146 in the TCF7L2 gene on chromosome 10 has an allelic OR of 1.37 and a frequency (p) of around 0.28 in non-Hispanic white populations. The genotype risks relative to genotype CC are estimated based on the multiplicative model.
For TT it is 1.37×1.37 = 1.88; for CT it is simply the OR 1.37, and for CC it is 1.0 by definition.
The frequency of allele C is q = 1 – p = 1 – 0.28 = 0.72. Population frequency of each of the three possible genotypes at this marker is:
Pr(TT) = p^2 = 0.08, Pr(CT) = 2pq = 0.40, and Pr(CC) = q^2 = 0.52
The average population risk relative to genotype CC (which is defined to have a risk of one) is:
R = 0.08×1.88 + 0.40×1.37 + 0.52×1 = 1.22
Therefore, the risk relative to the general population (RR) for individuals who have one of the following genotypes at this marker is:
RR(TT) = 1.88/1.22 = 1.54, RR(CT) = 1.37/1.22 = 1.12, RR(CC) = 1/1.22 = 0.82.
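The worked example can be reproduced with a short sketch. The function name and the dictionary keyed by risk-allele count are illustrative choices, not part of the original method description; the inputs are the allelic OR and allele frequency quoted in the text.

```python
def population_relative_risks(allelic_or, risk_allele_freq):
    """Genotype risks relative to the population average under the multiplicative model.

    Returns a dict keyed by the number of risk alleles carried (2, 1, 0).
    """
    r = allelic_or
    p = risk_allele_freq
    q = 1.0 - p
    # Average population risk relative to the non-risk homozygote.
    R = p**2 * r**2 + 2 * p * q * r + q**2
    return {2: r**2 / R, 1: r / R, 0: 1.0 / R}


# TCF7L2 rs7903146 values quoted in the text: allelic OR 1.37, T frequency 0.28.
rr = population_relative_risks(1.37, 0.28)
for copies, genotype in [(2, "TT"), (1, "CT"), (0, "CC")]:
    print(genotype, round(rr[copies], 2))
```

Unlike the hand calculation, the sketch does not round intermediate values, but the results agree with the figures in the text to two decimals.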
Combining the risk from multiple markers
When genotypes of many SNP variants are used to estimate the risk for an individual, unless otherwise stated, a multiplicative model for risk is assumed. This means that the combined genetic risk relative to the population is calculated as the product of the corresponding estimates for individual markers, e.g. for two markers g1 and g2:
RR(g1,g2) = RR(g1)RR(g2)
The underlying assumption is that the risk factors occur and behave independently, i.e. that the joint conditional probabilities can be represented as products:
Pr(A|g1,g2) = Pr(A|g1)Pr(A|g2)/Pr(A) and Pr(g1,g2) = Pr(g1)Pr(g2)
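To see why the independence assumption yields the product rule, divide the first identity by Pr(A):

```latex
\mathrm{RR}(g_1, g_2)
  = \frac{\Pr(A \mid g_1, g_2)}{\Pr(A)}
  = \frac{\Pr(A \mid g_1)}{\Pr(A)} \cdot \frac{\Pr(A \mid g_2)}{\Pr(A)}
  = \mathrm{RR}(g_1)\,\mathrm{RR}(g_2)
```

The second identity, Pr(g1,g2) = Pr(g1)Pr(g2), ensures that the combined RR still averages to one over the population.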
Obvious violations of this assumption are markers that are closely spaced on the genome, i.e. in linkage disequilibrium, such that the co-occurrence of two or more risk alleles is correlated. In such cases, we use so-called haplotype modeling, where the odds-ratios are defined for all allele combinations of the correlated SNPs. For these cases, the SNPs are listed together in the risk variant table in the scientific details section of the disease.
As in most situations where a statistical model is used, the model applied is not expected to be exactly true, since it is not based on an underlying biophysical model. However, the multiplicative model has so far been found to fit the data adequately, i.e. no significant deviations have been detected for many common diseases for which many risk variants have been discovered. We are, however, prepared to make appropriate adjustments in the future if and when significant deviations from the multiplicative model are identified. Note also that the calculations may not include all genetic risk markers (many of which may still be unknown) and do not include non-genetic risk factors such as environmental risk factors.
As an example, consider an individual with the following genotypes at four markers associated with risk of type-2 diabetes, shown along with the risk relative to the population at each marker:
| Chromosome | Gene | Genotype | Risk relative to population |
| 3 | PPARG | CC | RR(CC) = 1.03 |
| 6 | CDKAL1 | GG | RR(GG) = 1.30 |
| 9 | CDKN2A | AG | RR(AG) = 0.88 |
| 10 | TCF7L2 | TT | RR(TT) = 1.54 |
Combined, the overall risk relative to the population for this individual is: 1.03×1.30×0.88×1.54 = 1.81
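The same product can be written as a short sketch. The dictionary layout is an illustrative choice; the values are the per-marker risks listed for this individual.

```python
import math

# Per-marker risks relative to the population, as listed for this individual.
marker_rr = {"PPARG": 1.03, "CDKAL1": 1.30, "CDKN2A": 0.88, "TCF7L2": 1.54}

# Multiplicative model: the combined RR is the product of the per-marker RRs.
# A marker with an unknown genotype would contribute a factor of 1.0.
combined_rr = math.prod(marker_rr.values())
print(round(combined_rr, 2))
```

Note that `math.prod` requires Python 3.8 or later.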
Adjusted lifetime risk
Finally, the lifetime risk of the individual is derived by multiplying the overall genetic risk relative to the population by the average lifetime risk of the disease in the general population of the same ethnicity and gender in the region of the individual’s geographical origin. As there are usually several epidemiologic studies to choose from when defining the general population risk, we pick studies that are well powered for the disease definition used for the genetic variants.
For example, with type-2 diabetes, if the overall genetic risk relative to the population is 1.8 for a white male, and the average lifetime risk of type-2 diabetes for individuals of his demographic is 20%, then his adjusted lifetime risk is 20% × 1.8 = 36%.
Note that since the average RR over the population is one, this multiplicative model preserves the population average: the adjusted lifetime risk, averaged over all individuals, equals the average lifetime risk of the disease. Furthermore, since the actual lifetime risk cannot exceed 100%, there must be an upper limit to the genetic RR. For some diseases, however, the RR for the highest-risk combinations of genotypes multiplied by the population average lifetime risk may exceed 100%. This can be due to various reasons, such as differing disease specifications between studies (e.g. age range), imperfect parameters such as risks and population allele frequencies, and unaccounted saturation effects in the biological risk model (i.e. deviations from the multiplicative model). Currently, we never report a lifetime risk over 90% for an individual.
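The final step can be sketched as below. The text states only that no lifetime risk above 90% is reported; implementing this as a simple truncation with `min` is an assumption made for illustration, as are the function and constant names.

```python
REPORTING_CAP = 0.90  # the text states that no lifetime risk above 90% is reported


def adjusted_lifetime_risk(combined_rr, population_lifetime_risk, cap=REPORTING_CAP):
    """Scale the population lifetime risk by the combined RR, then apply the reporting cap.

    Truncating at the cap is an assumed mechanism, not a documented one.
    """
    return min(combined_rr * population_lifetime_risk, cap)


# The worked example from the text: 20% population risk, combined RR of 1.8.
print(adjusted_lifetime_risk(1.8, 0.20))
# A hypothetical high-risk combination whose raw product would exceed 100%.
print(adjusted_lifetime_risk(6.0, 0.20))
```

Risks are expressed as fractions rather than percentages so that the cap and the product use the same scale.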


