Cohort Study


Step 5: Data Analysis

Now that you have collected the data, you quickly glance over the information and realize that there are a number of ways to analyze it. The most appropriate analysis of the data collected in this study uses person-time, which takes into account the fact that subjects may have been followed for different lengths of time (Please see Aschengrau pp. 212-213).

In our retrospective cohort study, all individuals enter the study at the same moment in time (September 1, two years ago). However, not all of them exit at the same time. How can they exit the study? In any of several ways, including:

  1. The development of Susser Syndrome (once they have the disease, they are no longer at risk of developing it);
  2. Death from other competing causes;
  3. Loss to follow-up (Please see Aschengrau pg. 184).
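The exit rules above can be sketched in Python to show how each subject's person-time is counted. This is a minimal illustration under stated assumptions, not the study's actual procedure: the `Subject` record, its field names, and the example follow-up times are hypothetical. Time is measured in years since the common entry date (September 1, two years ago), so no one can contribute more than 2 person-years.

```python
from dataclasses import dataclass
from typing import Optional

# Length of the follow-up window described in the text.
STUDY_LENGTH_YEARS = 2.0


@dataclass
class Subject:
    """Hypothetical record: years from study entry to each possible exit event.

    None means the event was not observed during follow-up.
    """
    years_to_disease: Optional[float] = None
    years_to_death: Optional[float] = None
    years_to_loss: Optional[float] = None


def person_years(subject: Subject) -> float:
    """Time at risk: from entry until the first exit event or the end of the study."""
    exit_times = [
        t
        for t in (subject.years_to_disease, subject.years_to_death, subject.years_to_loss)
        if t is not None
    ]
    # Subjects with no exit event are followed until the study ends.
    exit_times.append(STUDY_LENGTH_YEARS)
    return min(exit_times)


# Hypothetical cohort of four subjects.
cohort = [
    Subject(),                      # followed for the full 2 years
    Subject(years_to_disease=0.5),  # developed Susser Syndrome after 6 months
    Subject(years_to_loss=1.25),    # lost to follow-up
    Subject(years_to_death=1.5),    # died of another cause
]
total = sum(person_years(s) for s in cohort)
print(f"Total person-time: {total} person-years")
```

Each subject stops contributing person-time at the earliest exit, which is why four subjects followed for up to 2 years each can contribute less than 8 person-years.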

Loss to follow-up presents a particular challenge in epidemiological studies. Without regular contact with study participants, it may not be possible to determine whether, and when, a person developed the disease of interest. In these situations, your estimates of disease frequency may be severely compromised.

Epidemiologists employ two different measures of effect to assess exposure-disease relationships in cohort studies: the risk ratio and the rate ratio (Please see Aschengrau pp. 66-68). Since this is your first real work as a budding epidemiologist, you decide to analyze the data using both measures of effect and compare them later.

7. Calculation of the risk ratio [Aschengrau, Chapter 3]

The data collected by your team yield the following information:
  • Number of cases among exposed - 74
  • Number of cases among unexposed - 120
  • Total number of exposed individuals - 1,900
    • Low exposure group - 1,000
    • Medium exposure group - 650
    • High exposure group - 250
  • Total number of unexposed individuals - 7,400
a. How would you present the data in the 2x2 table?
b. Calculate the risk of disease among the exposed. The formula for calculating risk is: (Number of new cases among exposed persons during the 2-yr time period) / (Total number of exposed persons at risk at the start of the 2-yr time period)
c. Calculate the risk of disease among the unexposed.
d. Calculate the risk ratio.
e. Interpret your findings.
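To check your arithmetic, the 2x2 table and the risk calculation can be sketched in Python. This is an illustrative sketch only; the helper name `risk` and the table layout are our own choices, and the counts are the ones listed above.

```python
# Counts from question 7 (2-year follow-up).
exposed_cases, exposed_total = 74, 1_900
unexposed_cases, unexposed_total = 120, 7_400


def risk(cases: int, persons: int) -> float:
    """Cumulative incidence: new cases divided by persons at risk at the start."""
    return cases / persons


# 2x2 table: rows = exposure status, columns = disease status.
table = {
    "Exposed": (exposed_cases, exposed_total - exposed_cases),
    "Unexposed": (unexposed_cases, unexposed_total - unexposed_cases),
}
print(f"{'':10}{'Cases':>8}{'Non-cases':>11}{'Total':>8}")
for group, (cases, non_cases) in table.items():
    print(f"{group:10}{cases:>8}{non_cases:>11}{cases + non_cases:>8}")

risk_exposed = risk(exposed_cases, exposed_total)
risk_unexposed = risk(unexposed_cases, unexposed_total)
risk_ratio = risk_exposed / risk_unexposed
print(f"Risk (exposed):   {risk_exposed * 1000:.1f} per 1,000 over 2 years")
print(f"Risk (unexposed): {risk_unexposed * 1000:.1f} per 1,000 over 2 years")
print(f"Risk ratio:       {risk_ratio:.2f}")
```

Scaling the risks to 1,000 persons is only for readability; the risk ratio is unaffected by the scaling.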


Intellectually curious?

In the preceding example, you estimated the magnitude of risk associated with exposure to SUPERCLEAN by comparing those with exposure to those without exposure. However, the exposure data could be characterized more accurately by dividing them into three exposure categories, i.e., low, medium and high exposure. If the risk increases as the exposure level increases, then one can conclude that there is a dose-response relationship in the data, i.e., a biological gradient. The presence of a dose-response relationship strengthens the case that the relationship is causal.

Please calculate the incidence risk in the three exposure groups using the following data:

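A dose-response check can be sketched in Python as well. The number of cases in each exposure level comes from the data table for this exercise, which is not reproduced here, so the sketch takes those counts as input; the group sizes are the ones given in question 7. The helper names `group_risks` and `increases_with_dose` are hypothetical, and a strictly rising pattern is only a simple descriptive check, not a formal test for trend.

```python
# Number of exposed persons in each exposure level (from question 7).
GROUP_SIZES = {"low": 1_000, "medium": 650, "high": 250}


def group_risks(cases_by_group: dict) -> dict:
    """2-year risk (cases / persons at risk) for each exposure level."""
    return {group: cases_by_group[group] / n for group, n in GROUP_SIZES.items()}


def increases_with_dose(risks: dict) -> bool:
    """True if risk rises strictly from low to medium to high exposure."""
    ordered = [risks[g] for g in ("low", "medium", "high")]
    return all(lower < higher for lower, higher in zip(ordered, ordered[1:]))
```

Usage: pass a dictionary such as `{"low": ..., "medium": ..., "high": ...}` filled in with the case counts from the table, then call `increases_with_dose` on the result.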

 

8. Calculation of the rate ratio [Aschengrau, Chapter 3].

The data collected by your team yield the following information:
  • Number of cases among exposed - 74
  • Number of cases among unexposed - 120
  • Number of exposed person-years of observation (PYO) - 3,675
    • Low exposure group - 2,000 PYO
    • Medium exposure group - 1,225 PYO
    • High exposure group - 450 PYO
  • Number of unexposed PYO - 14,550
a. How would you present the data in the 2x2 format?
b. Calculate the incidence rate among the exposed. The formula for calculating incidence rate is: (Number of new cases among exposed persons during the 2-yr time period) / (PYO among exposed persons during the 2-yr time period)
c. Calculate the incidence rate among the unexposed.
d. Calculate the rate ratio.
e. Interpret your findings.
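The same comparison with person-time denominators can be sketched in Python. As before, this is only an illustrative check of the arithmetic; `incidence_rate` is a hypothetical helper name, and rates are scaled to 1,000 PYO purely for readability.

```python
# Cases and person-years of observation (PYO) from question 8.
exposed_cases, exposed_pyo = 74, 3_675
unexposed_cases, unexposed_pyo = 120, 14_550


def incidence_rate(cases: int, person_years: float) -> float:
    """Incidence rate: new cases per person-year of observation."""
    return cases / person_years


rate_exposed = incidence_rate(exposed_cases, exposed_pyo)
rate_unexposed = incidence_rate(unexposed_cases, unexposed_pyo)
rate_ratio = rate_exposed / rate_unexposed
print(f"Rate (exposed):   {rate_exposed * 1000:.2f} per 1,000 PYO")
print(f"Rate (unexposed): {rate_unexposed * 1000:.2f} per 1,000 PYO")
print(f"Rate ratio:       {rate_ratio:.2f}")
```

Comparing `rate_ratio` with the risk ratio from question 7 is the comparison of the two measures of effect mentioned at the start of this step.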


9. Calculation of rate ratio in different age strata. [Aschengrau]

The data collected by your team yield the following information:
| Age group | Exposed: number of cases | Exposed: PYO | Unexposed: number of cases | Unexposed: PYO |
|---|---|---|---|---|
| < 30 | 43 | 2,188 | 75 | 9,249 |
| ≥ 30 | 31 | 1,487 | 45 | 5,301 |
| Total | 74 | 3,675 | 120 | 14,550 |
a. Calculate the rate of disease among the exposed in each age group.
b. Calculate the rate of disease among the unexposed in each age group.
c. Calculate the rate ratio in each age group.
d. Interpret your findings.
e. Does the association between SUPERCLEAN and Susser Syndrome seem to vary by age group?
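The stratum-specific calculation repeats the rate ratio within each age group. A minimal Python sketch, using the counts from the table above; the dictionary layout and the helper name `stratum_rate_ratios` are our own illustrative choices.

```python
# Cases and person-years of observation (PYO) by age group, from question 9.
strata = {
    "< 30": {"exp_cases": 43, "exp_pyo": 2_188, "unexp_cases": 75, "unexp_pyo": 9_249},
    "≥ 30": {"exp_cases": 31, "exp_pyo": 1_487, "unexp_cases": 45, "unexp_pyo": 5_301},
}


def stratum_rate_ratios(strata: dict) -> dict:
    """Return (exposed rate, unexposed rate, rate ratio) for each age group."""
    results = {}
    for age_group, d in strata.items():
        rate_exp = d["exp_cases"] / d["exp_pyo"]
        rate_unexp = d["unexp_cases"] / d["unexp_pyo"]
        results[age_group] = (rate_exp, rate_unexp, rate_exp / rate_unexp)
    return results


results = stratum_rate_ratios(strata)
for age_group, (rate_exp, rate_unexp, rr) in results.items():
    print(
        f"{age_group}: exposed {rate_exp * 1000:.2f}, "
        f"unexposed {rate_unexp * 1000:.2f} per 1,000 PYO; rate ratio {rr:.2f}"
    )
```

To answer part e, compare the two stratum-specific rate ratios with each other and with the crude (unstratified) rate ratio from question 8.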

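If you had chosen instead to compare the rate of Susser Syndrome in the exposed workers at Glop Industries with the rate of Susser Syndrome in the general population (e.g., the city of Epiville), the resulting ratio would be called the Standardized Incidence Ratio (SIR). The SIR compares the number of cases observed among the workers with the number expected if the general-population rates had applied to the workers' person-time.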

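The observed-to-expected logic of the SIR can be sketched in Python. Epiville's age-specific rates are not given in this section, so `reference_rates` is left as an input to be filled in; the function name and argument layout are hypothetical. The exposed workers' person-time by age group and their observed case count are taken from question 9.

```python
# Exposed workers' person-years of observation by age group (question 9).
EXPOSED_PYO = {"< 30": 2_188, "≥ 30": 1_487}
OBSERVED_CASES = 74  # total cases among exposed workers


def standardized_incidence_ratio(observed, person_years, reference_rates):
    """SIR = observed cases / expected cases.

    Expected cases apply the reference population's age-specific rates
    (cases per person-year) to the study group's person-time in each age group.
    """
    expected = sum(person_years[g] * reference_rates[g] for g in person_years)
    return observed / expected
```

Usage: call `standardized_incidence_ratio(OBSERVED_CASES, EXPOSED_PYO, reference_rates)` with a dictionary of Epiville's rates per person-year, keyed by the same age groups. An SIR above 1 means more cases were observed than the general-population rates would predict.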

10. After putting exhaustive effort into data analysis, you present your findings to your supervisor.  What should you tell her?
a. The elevated estimates (both risk and rate ratios) lend support to your hypothesis that exposure to SUPERCLEAN production is associated with Susser Syndrome.
b. Exposure to SUPERCLEAN production is the definite cause of Susser Syndrome. Those elevated rates are very convincing.
c. The data clearly suggest an association between exposure to SUPERCLEAN at Glop Industries and the subsequent development of Susser Syndrome. I think we might want to explore other potential exposure sources as well and try to improve exposure measurement.
