Cohort Study


Step 6: Data Analysis

Before crunching the numbers, you glance over the data and realize that an appropriate analysis uses person-years to take into account the fact that subjects may be followed for varying amounts of time (see Gordis, Ch.5, pg.83-85). This allows the researcher to account for those who dropped out of the study for a variety of reasons (moved away, refused to participate, died from unrelated causes, etc.) and therefore no longer contribute person-years at risk. At the end of the follow-up period, all person-years are summed to represent the cumulative time at risk for disease. The time at risk for each person is calculated from the time the individual entered the study until the time he/she exits the study. As previously stated, all individuals enter the study at the same moment in time. However, not all will exit at the same time. How can they exit the study? In any number of ways, including:

  1. the development of Susser Syndrome (once they have the endpoint, they are no longer at risk of developing it);
  2. death; or
  3. loss-to-follow-up, meaning they choose to no longer participate in the study.
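
The person-year bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the `Subject` record, its field names, and the three sample subjects are hypothetical and are not part of the study data.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    entry_year: float        # time the subject entered the study
    exit_year: float         # disease onset, death, loss to follow-up, or end of study
    developed_disease: bool  # True if the subject exited because of the endpoint


def person_years(subjects):
    """Sum each subject's time at risk (exit time minus entry time)."""
    return sum(s.exit_year - s.entry_year for s in subjects)


# Hypothetical records, invented for illustration only.
sample = [
    Subject(0.0, 2.0, False),  # followed until the end of the study
    Subject(0.0, 1.5, True),   # developed the disease after 1.5 years
    Subject(0.0, 0.5, False),  # lost to follow-up after 0.5 years
]

total_pyo = person_years(sample)
cases = sum(s.developed_disease for s in sample)
print(f"{cases} case(s) over {total_pyo} person-years")
```

Each subject contributes time only while still at risk, which is why the subject who develops the disease stops accumulating person-years at onset.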

Loss-to-follow-up results in data not being collected for the epidemiological study. We may not know when the study participants dropped out, and thus we may not know whether they developed the disease. When that happens, it becomes impossible to directly calculate person-years. In these situations, epidemiologists may use simple counts of subjects to calculate measures of effect. This is not the best choice, but it provides an estimate of the true measure of effect. This is your first real work as a budding epidemiologist, and you decide to analyze the data using both simple counts and person-years. It is time to get to work!

8. Calculation of the rate ratio based on simple counts. [See Gordis, Ch.3, pg. 32-33, Ch.10, pg. 159-162, 171]

The data collected by your team yield the following counts:
  • Total number of exposed individuals - 1900
    • low exposure group - 1000
    • medium exposure group - 650
    • high exposure group - 250
  • Total number of unexposed individuals - 7400
  • Number of exposed diseased (all people who develop Susser Syndrome among the exposed) - 74
  • Number of unexposed diseased - 120
a. The first step is to tabulate the data in the classic 2x2 table. How would you do this?
b. Calculate cumulative incidence among all exposed
c. Calculate cumulative incidence among unexposed
d. Calculate rate ratio
e. Interpret your finding
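
As a rough check on the arithmetic for this question, the sketch below builds the 2x2 counts and divides the two cumulative incidences. It uses only the counts listed above; the variable names are illustrative, and it is not the official answer key. Note that a ratio of cumulative incidences is more commonly called a risk ratio.

```python
# Counts taken from the list above.
exposed_total, exposed_cases = 1900, 74
unexposed_total, unexposed_cases = 7400, 120

# 2x2 layout: rows are exposure status, columns are diseased / not diseased.
two_by_two = {
    "exposed": (exposed_cases, exposed_total - exposed_cases),
    "unexposed": (unexposed_cases, unexposed_total - unexposed_cases),
}

# Cumulative incidence = new cases / people at risk at the start of follow-up.
ci_exposed = exposed_cases / exposed_total
ci_unexposed = unexposed_cases / unexposed_total
ratio = ci_exposed / ci_unexposed

for group, (diseased, not_diseased) in two_by_two.items():
    print(f"{group:10s} diseased={diseased:4d} not diseased={not_diseased:5d}")
print(f"CI exposed   = {ci_exposed * 1000:.1f} per 1000")
print(f"CI unexposed = {ci_unexposed * 1000:.1f} per 1000")
print(f"ratio        = {ratio:.2f}")
```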


9. Calculation of the rate ratio from person-year information. [See Gordis, Ch.10, pg. 159- 162, 171]
  • Number of exposed person-years of observation (PYO) - 3700, i.e.,
    • low exposure group - 2000 PYO
    • medium exposure group - 1250 PYO
    • high exposure group - 450 PYO
  • Number of unexposed person-years - 14500 PYO
  • Number of exposed cases - 74
  • Number of unexposed cases - 120
a. Again, how would you present the data in the 2x2 format?
b. Calculate incidence rate among all exposed
c. Calculate incidence rate among unexposed
d. Calculate rate ratio
e. Interpret your finding
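
The same comparison with person-time denominators can be sketched as follows. The figures come from the list above; the variable names are illustrative, and this is a sketch rather than the official answer.

```python
# Cases and person-years of observation (PYO) from the list above.
exposed_cases, exposed_pyo = 74, 3700
unexposed_cases, unexposed_pyo = 120, 14500

# Incidence rate = new cases / person-years at risk.
rate_exposed = exposed_cases / exposed_pyo
rate_unexposed = unexposed_cases / unexposed_pyo
rate_ratio = rate_exposed / rate_unexposed

print(f"rate exposed   = {rate_exposed * 1000:.1f} per 1000 PYO")
print(f"rate unexposed = {rate_unexposed * 1000:.1f} per 1000 PYO")
print(f"rate ratio     = {rate_ratio:.2f}")
```

Because the denominators are person-years rather than people, the quantities here are rates, not cumulative incidences.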

The above analyses are called "crude analyses." They suggest that there is an association between involvement with SUPERCLEAN production and the development of Susser Syndrome. You decide to better characterize this association using the information you have collected detailing the exposure sub-groups.

10. Calculation of rate ratio in exposure sub-groups. [See Gordis, Ch.10, pg. 159-162, 171]
  • Number of exposed person-years of observation (PYO) - 3700, i.e.,
    • low exposure group - 2000 PYO
    • medium exposure group - 1250 PYO
    • high exposure group - 450 PYO
  • Number of unexposed person-years - 14500 PYO
  • Number of exposed cases - 74
    • low exposure group - 32
    • medium exposure group - 30
    • high exposure group - 12
a. There is too much information here to present in the simple 2x2 format. How would you present the data in the table according to different exposure sub-groups?
b. Calculate incidence rate among exposed by level of exposure
c. Calculate incidence rate among unexposed
d. Calculate rate ratio at each level of exposure
e. Interpret your finding
f. What is this pattern of increase in the rate ratio consistent with?
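
One way to lay out the sub-group calculation is to compare each exposure level against the same unexposed reference rate. The sketch below assumes the 120 unexposed cases from question 9 as the reference; the dictionary layout and names are illustrative only.

```python
# (cases, PYO) for each exposure level, from the list above.
exposure_groups = {
    "low": (32, 2000),
    "medium": (30, 1250),
    "high": (12, 450),
}
# Unexposed reference group: 120 cases (from question 9) over 14500 PYO.
unexposed_cases, unexposed_pyo = 120, 14500
rate_unexposed = unexposed_cases / unexposed_pyo

rate_ratios = {}
for level, (cases, pyo) in exposure_groups.items():
    rate = cases / pyo
    rate_ratios[level] = rate / rate_unexposed
    print(f"{level:6s} rate = {rate * 1000:5.1f} per 1000 PYO, "
          f"rate ratio = {rate_ratios[level]:.2f}")
```

Listing the levels from low to high makes it easy to see whether the rate ratio rises steadily with exposure.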


11. Calculation of rate ratio in different age strata. [See Gordis, Ch.10, pg. 159-162, 171]

The crack team of field agents has presented you with the data on the age distribution of all subjects in the cohort, detailed as follows:

Age Group       | Cases in Exposed | PYO in Exposed | Cases in Unexposed | PYO in Unexposed
Younger than 30 | 17               | 1000           | 30                 | 4257
30 - 39         | 26               | 1200           | 45                 | 5037
40 - 49         | 21               | 1000           | 40                 | 4606
50 and older    | 10               | 500            | 5                  | 600
Total           | 74               | 3700           | 120                | 14500

a. Calculate incidence rate among exposed in each age group
b. Calculate incidence rate among unexposed in each age group
c. Calculate rate ratio in each age group
d. Interpret your findings
e. Does the association between exposure and outcome seem to vary by age group?
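
The stratum-specific calculation can be sketched by looping over the age groups in the table above. The tuple layout and variable names are illustrative, and the output is a sketch of the arithmetic rather than the official answer.

```python
# (exposed cases, exposed PYO, unexposed cases, unexposed PYO) per age group.
age_strata = {
    "Younger than 30": (17, 1000, 30, 4257),
    "30 - 39": (26, 1200, 45, 5037),
    "40 - 49": (21, 1000, 40, 4606),
    "50 and older": (10, 500, 5, 600),
}

age_rate_ratios = {}
for age, (exp_cases, exp_pyo, unexp_cases, unexp_pyo) in age_strata.items():
    rate_exp = exp_cases / exp_pyo
    rate_unexp = unexp_cases / unexp_pyo
    age_rate_ratios[age] = rate_exp / rate_unexp
    print(f"{age:16s} exposed = {rate_exp * 1000:5.1f}, "
          f"unexposed = {rate_unexp * 1000:5.1f} per 1000 PYO, "
          f"rate ratio = {age_rate_ratios[age]:.2f}")
```

Comparing the stratum-specific rate ratios with one another, and with the crude rate ratio, is one way to judge whether the association differs by age.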


12. Calculation of standardized incidence ratio (extra credit). [See Gordis, Ch.3, pg. 54-56]

You have data available from the local department of health on the annual incidence rate of the neurological disorder in Epiville. These data allow you to calculate the standardized incidence ratio (indirect method) to determine whether the incidence among SUPERCLEAN employees is higher than the incidence in the general population. Because the age distribution of the general population is quite different from the age distribution of the working population, you have to take into account the age structure of the respective groups.

Age Group       | Annual Incidence Rate
Younger than 30 | 0.0039
30 - 39         | 0.0052
40 - 49         | 0.0047
50 and older    | 0.0062

a. Calculate the number of observed cases (total of cases among exposed and unexposed) and PYO in each age stratum
b. Calculate the number of expected cases in each stratum
c. Calculate standardized incidence ratio (SIR)
d. How do you interpret your findings?
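
The indirect method can be sketched as follows: apply each general-population rate to the cohort's person-years in that stratum to get expected cases, then divide total observed by total expected. The observed cases and PYO below are the exposed and unexposed columns of the question 11 table added together, as part a. describes. Variable names are illustrative, and the rates are assumed to be per person-year.

```python
# General-population annual incidence rates by age group (from the table above).
reference_rates = {
    "Younger than 30": 0.0039,
    "30 - 39": 0.0052,
    "40 - 49": 0.0047,
    "50 and older": 0.0062,
}
# Cohort totals per stratum: (observed cases, PYO), exposed + unexposed combined.
cohort = {
    "Younger than 30": (17 + 30, 1000 + 4257),
    "30 - 39": (26 + 45, 1200 + 5037),
    "40 - 49": (21 + 40, 1000 + 4606),
    "50 and older": (10 + 5, 500 + 600),
}

# Expected cases = cohort person-years x reference rate, stratum by stratum.
expected = {age: pyo * reference_rates[age] for age, (_, pyo) in cohort.items()}
observed_total = sum(cases for cases, _ in cohort.values())
expected_total = sum(expected.values())
sir = observed_total / expected_total

for age, (cases, pyo) in cohort.items():
    print(f"{age:16s} observed = {cases:3d}, PYO = {pyo:5d}, "
          f"expected = {expected[age]:.2f}")
print(f"SIR = {observed_total} / {expected_total:.2f} = {sir:.2f}")
```

An SIR above 1 means more cases were observed in the cohort than the general-population rates would predict for a group with the same age structure.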


13. After putting in an exhaustive effort of data analysis, you present your findings to your supervisor. What should you tell her?
a. It looks like we were chasing a red herring. In my opinion, there does not appear to be any relationship between working with SUPERCLEAN and the development of Susser Syndrome. Let's look at another source of exposure.
b. The data regarding the possible association between SUPERCLEAN and the development of Susser Syndrome are totally inconclusive. I think we should repeat the study with more participants.
c. The exposure to SUPERCLEAN production is the definite cause of Susser Syndrome. Those elevated rates are very convincing.
d. The data clearly suggest an association between exposure to SUPERCLEAN production and later development of Susser Syndrome. I think we might want to explore other potential exposure sources to be sure, as well as further characterize this association.
