Essentials of Biostatistics. Indian Pediatrics 2000; 37: 967-981
9. Statistical Inference From Qualitative Data: Proportions, Relative Risks and Odds Ratios
Earlier in this series(1) we used the terms qualitative and quantitative for different types of data. Inference for the latter will be discussed in the next Article. This Article is restricted to the qualitative data. These include all those that are summarized in terms of proportions or rates and ratios. For example, systolic blood pressure is quantitative but becomes qualitative for our purpose when divided into categories such as <120, 120-139 and 140 +. This occurs when the interest is in the proportion of subjects falling into these categories rather than in mean. A large number of statistical methods are available for different situations of this type of data but we include those in this Article that are commonly used in health and medicine. Methods of inference from proportions in one variable setup are given in Section 9.1. Sections 9.2 and 9.3 are devoted to inference on proportions in two-variable setup from independent and paired samples, respectively. Inference on relative risks and odds ratios is included in Section 9.4.
Presence or absence of jaundice among HBsAg positive children, and occurrence of fewer than two or at least two diarrheal episodes per month in children, are examples of a binary variable. Such a variable is also called dichotomous. In contrast, the distribution of children in five birthweight groups or in four grades of malnutrition are examples of a polytomous variable. Let the interest be in finding whether the subjects in the target population do or do not follow a prespecified pattern. Examples are: (i) whether or not occurrence of sudden infant death syndrome is twice as much in winter as in summer, and (ii) whether or not the distribution of none, mild, moderate and severe grades of malnutrition in primary school children in a community is 5%, 35%, 40% and 20%, respectively. Statistically, such a problem is known as the problem of goodness of fit because the interest is in finding whether or not the pattern observed in the sample fits the specified pattern. A statistical method known as Chi-square is used to test goodness of fit.
We explain this procedure with the help of an example.
Example 1: A consecutive sample of 107 children with thalassemia admitted to a pediatric ward in a Delhi hospital is investigated for blood group to examine the possibility of a preponderance of a particular blood group in such cases. If there is no preponderance, the profile would be the same as in the general population. Suppose this is 6:5:8:1 or 0.30:0.25:0.40:0.05 for blood groups O, A, B and AB, respectively. The sample observations are shown in Table I. Does this pattern in thalassemia cases conform to that in the general population? Denote the population proportions in the four groups by p1, p2, p3 and p4, respectively. The null hypothesis(2) in this case is H0 : p1 = 0.30, p2 = 0.25, p3 = 0.40 and p4 = 0.05. This is what is sought to be refuted by conducting the investigation. The interest is in testing whether the sample provides enough evidence against this H0. The alternative hypothesis H1 is that the pattern is any other than that specified above. Testing of such hypotheses is based on the Chi-square method.
Chi-square and its explanation: Denote the observed frequencies given in Table I in the four blood group categories in the sample by O1, O2, O3 and O4, respectively. That is, O1 = 37, O2 = 13, O3 = 48 and O4 = 9. If H0 is really true then the frequency expected for blood group O would be E1 = 107 × 0.30 = 32.10. Similarly, E2 = 26.75, E3 = 42.80 and E4 = 5.35 for blood groups A, B and AB. These are based on the pattern seen in the general population and conform to the null hypothesis H0. A large difference between observed and expected frequencies would suggest that the pattern in thalassemia cases is different from that stipulated in H0. This then would be evidence against H0 and would favour H1. It seems that an examination of the differences (Ok − Ek) for different k could be helpful.
Since the total of the expected frequencies has to be the same 107 as that of the observed frequencies, some of these differences are necessarily negative and some positive. The sum Σ(Ok − Ek) would always be zero. As in the case of the deviations (xi − x̄) for calculating SD, squaring these differences gets rid of the negative sign. This gives (Ok − Ek)². The quantity (Ok − Ek)²/Ek becomes relatively free of the differentials existing in the expected frequencies in different groups and helps to give nearly equal weight to the groups. In place of taking the average of these quantities, this time obtain the sum Σ[(Ok − Ek)²/Ek]. To remind that the quantity is a square, the sum is called Chi-square. This sum for the blood group example is 10.94 as shown in the last row of Table I. The Chi-square is valid when n is large (say more than 30) and each Ek ≥ 5. The value of Chi-square so obtained is referred to the values in standard Chi-square tables to find the P-value. This is dependent on degrees of freedom.
TABLE I Chi-square Computation for Blood Group Data
Ok : Observed frequency in the kth category. For the first column (O group), Ok − Ek = 37 − 32.10 = 4.90. Similarly for other columns. * Source: Dubey A P (personal communication).
Degrees of freedom (df): There are four categories of blood group, namely, O, A, B, and AB. However, the frequency in only three of them can be freely chosen; the fourth is automatically determined by the total. If the frequencies chosen for O, A and AB are 35, 15 and 10 then the frequency in B group has to be 47 since the total is 107. Thus, there is freedom to choose only three out of four cells. This is called the degrees of freedom (df).
Example 1 (continued): The Chi-square computation is shown in Table I. From the standard Chi-square table, the P-value for 3 df and χ² = 10.94 is less than 0.05. From a computer package, P = 0.012. When the P-value from a computer package is available, there is no need to consult the standard table. Thus, the value 10.94 of χ² obtained for these data is extremely unlikely when H0 is true. That is, the frequencies observed in different blood groups in this example are very inconsistent with H0. The sample values do provide sufficient evidence against H0. Therefore, H0 can be rejected. Conclude that the blood group pattern in thalassemics is not the same as in the general population. An examination of the data in Table I reveals that the observed frequency in blood group A is very much lower than expected from the pattern in the general population. Other differences are not as large. To check that this really is so, check first that the pattern in blood groups O, B and AB is nearly the same as expected, and then check the difference in blood group A. This can be achieved by computing one Chi-square for blood groups O, B and AB ignoring group A, and by computing the other for blood group A versus the total of the remaining groups. This division of the earlier four-group Chi-square into two parts is called partitioning of Chi-square.
We repeatedly stated in our previous Article(2) that P = 0.05 is the conventional cut-off.
There is a growing feeling that the use of such a cut-off should be discontinued and the exact P-value, which in any case automatically comes from computer packages, be used instead. The user then is in a better position to decide how much significance to attach to the results. However, a threshold is always helpful in making a definitive statement. A P <0.05 is considered significant and P <0.01 even stronger because the chance of Type I error is so much smaller. When the conclusion is of a crucial nature, a smaller P is desirable.
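The arithmetic of Example 1 can be retraced with a short Python sketch. This is only an illustration under stated assumptions: the variable and function names are ours, and the tail probability uses a closed-form expression that holds for 3 df only, so it is not a general Chi-square routine. A statistical package would normally be used instead.

```python
import math

# Observed frequencies for blood groups O, A, B and AB (Example 1)
observed = [37, 13, 48, 9]
# Proportions stipulated under H0 (pattern in the general population)
h0_proportions = [0.30, 0.25, 0.40, 0.05]

n = sum(observed)
expected = [n * p for p in h0_proportions]

# Chi-square is the sum of (O - E)^2 / E over the categories
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # one cell is fixed by the total


def chi_square_tail_3df(x):
    """Upper-tail probability P(Chi-square >= x); this closed form is for 3 df only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi_square_tail_3df(chi_square)
print(f"Chi-square = {chi_square:.2f} with {df} df, P = {p_value:.3f}")
```

The sketch reproduces the values quoted in the text (Chi-square 10.94, P about 0.012).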
We now discuss the problem of whether the proportion of subjects possessing a particular characteristic is the same in one group as in another. First we discuss only a dichotomy in the variable, i.e., a characteristic present or absent. An example is the proportion of deaths in separate groups of patients receiving two methods of treatment. An alternative way of displaying these proportions is through a 2 × 2 contingency table. A general structure of a 2 × 2 contingency table is given in Table II. This is also known as a four-fold table. This setup is essentially bivariate. In our earlier Article(3), we introduced forms of a 2 × 2 table that can arise in: (i) a prospective study, (ii) a retrospective study, and (iii) a cross-sectional study. The null hypothesis in the first two cases is called the hypothesis of homogeneity (column homogeneity or row homogeneity) and in the third case the hypothesis of independence. These distinctions are important for a valid inference. The basic philosophy of statistical tests was discussed in the previous Article of this series(2) and a review of that would be helpful to appreciate the statistical procedures that we discuss now. The following are some situations where these statistical tests are used to draw inferences from 2 × 2 tables.
TABLE II General Structure of a 2 × 2 Contingency Table
O1. and O.1 are first row and first column totals. O2. and O.2 are second row and second column totals. n is the overall total.
Chi-square test
Although the null hypothesis mentioned above is different in the three types of study, and consequently the interpretation is also different, it can be shown that the test criterion is the same for all of them. Under any of the three H0s, the method of computing expected frequencies is the same. An example of a two-way (2 × 2) table is given in Table III. In square brackets are expected frequencies under the hypothesis that the prevalence of ARI in the two communities is the same. The expected frequencies are computed as E11 = 500 × 35/900 = 19.4, E21 = 500 × 865/900 = 480.6, etc. These are obtained by multiplying the corresponding row and column totals and dividing by the overall total n. In notation, the expected frequency for the (r,c)th cell is given by Erc = (Or. × O.c)/n, (r,c = 1,2), where Or. is the total of the rth row and O.c is the total of the cth column. In Table II, O11, O12, O21 and O22 are observed frequencies. The corresponding expected frequencies are computed as follows: E11 = (O1. × O.1)/n, E21 = (O2. × O.1)/n, E12 = (O1. × O.2)/n and E22 = (O2. × O.2)/n. The test criterion is: Chi-square = Σ[(Orc − Erc)²/Erc], where r = 1, 2 and c = 1, 2. For a 2 × 2 table, this test criterion can be written as:
$$\chi^2 = \frac{(O_{11}-E_{11})^2}{E_{11}} + \frac{(O_{12}-E_{12})^2}{E_{12}} + \frac{(O_{21}-E_{21})^2}{E_{21}} + \frac{(O_{22}-E_{22})^2}{E_{22}}$$
The test procedure is to calculate χ², and find the probability (P-value) of obtaining this or a higher value. If the P-value(2) is sufficiently small, say less than 0.05, reject H0, otherwise not. In a 2 × 2 table, df = 1. There is freedom to arbitrarily choose the frequency in only one cell. Others are automatically decided because the row and column totals are considered fixed. These days we do not expect that any researcher would actually go through these calculations by hand. A computer is almost invariably available to do all the calculations. The reason that we give the details of the calculations is so that the concepts behind these weird looking formulas are understood and appreciated in the right perspective.
TABLE III Presence of ARI in the Two Communities
In square brackets are expected frequencies under H0.
Example 2: Consider the data in Table III. The actual null hypothesis in this cross-sectional study is of independence. If the type of community does not affect the ARI prevalence then it would be the same in the two communities. The expected frequencies under this H0 are also given in Table III as already indicated. For these data
$$\chi^2 = \frac{(15-19.4)^2}{19.4} + \frac{(20-15.6)^2}{15.6} + \frac{(485-480.6)^2}{480.6} + \frac{(380-384.4)^2}{384.4} = 2.38.$$
A computer based statistical package gives P (Chi-square ≥ 2.38) = 0.123. Alternatively, from Chi-square tables for 1 df, P <0.05 only when χ² ≥ 3.84. Since the P-value is not sufficiently small in this example, the chance of committing Type I error is not small. We cannot confidently say that the prevalence is different in the two communities.
Gaussian Test (or Z-test)
The other method to find whether or not the two groups are different is by using the standard error of the difference of proportions explained in our previous Article(2). This procedure is also valid only for large n. The test criterion is as follows:
$$Z = \frac{p_1 - p_2}{SE(p_1 - p_2)} = \frac{p_1 - p_2}{\sqrt{p(1-p)\,(1/n_1 + 1/n_2)}}$$
where p1 = O11/n1, p2 = O12/n2 and p = O1./n; n1 and n2 are the number of subjects in the two groups. The value of Z is referred to the Gaussian distribution and a two-sided P-value obtained. An advantage of the Z-test is that H0 can be tested against a one-sided alternative also. In the case of the Chi-square test the P-value obtained would always be two-tailed.
Example 3: The data on ARI prevalences in Example 2 can be used to illustrate the Z-test also. In these data, p1 = 0.03 and p2 = 0.05, and n1 = 500 and n2 = 400. Also, p = 35/900 = 0.039, which gives SE(p1 − p2) = √[p(1 − p)(1/n1 + 1/n2)] = 0.013. Therefore, Z = (0.05 − 0.03)/0.013 = 1.54. From the Gaussian table, P = 0.123 for Z = 1.54. The interpretation is the same as explained in Example 2. This Z should give the same P-value for large n as the preceding χ². In fact there is a theoretical relationship saying that Z² = χ² with 1 df. This is called the equivalence of χ² with 1 df and Z.
Detecting a medically important difference in proportions
As mentioned in the previous Article of this series(2), a small difference between the groups can become statistically significant when the sample size is large and a large difference can be statistically not significant when n is small.
If the cure rate after one month of a particular new therapy is 70 per cent in a sample of subjects, against 60 per cent in the existing therapy group, the difference would be statistically significant if the number of subjects in each group is 123 or more. But this small gain of 10 per cent may not be worth the trouble of switching over to the new therapy if it is difficult to implement because of high cost, increased inconvenience to the patient or a need for intensive investigations. Thus, medical significance of the difference is always a separate consideration. In view of the importance of medical significance of the result vis-a-vis statistical significance, we discuss this aspect in detail in the next Article of this series. For the time being, our concern is with the method that could fairly ensure that a medically important difference between the groups is detected. The investigator specifies the minimum difference that would be considered medically important. If a difference in excess of 20 per cent is considered of some consequence, then H0 : p1 − p2 = 0.20. If this is rejected, the alternative hypothesis H1 : p1 − p2 >0.20 is accepted. In this example, 20 per cent is the minimum difference. The statistical procedure is then geared to test not that a difference is present but that it is more than 20 per cent.
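The Chi-square and Z computations for the ARI data of Examples 2 and 3 can be sketched as follows. This is an illustrative sketch, not the routine of any particular package: the variable names are ours, the cell counts are those appearing in the worked formula of Example 2, and the P-value expressions assume 1 df and the Gaussian distribution, respectively.

```python
import math

# ARI present / absent in the two communities (counts from Example 2)
present = [15, 20]
absent = [485, 380]
observed = [present, absent]

row_totals = [sum(present), sum(absent)]             # 35 and 865
col_totals = [present[0] + absent[0],
              present[1] + absent[1]]                # 500 and 400
n = sum(row_totals)                                  # 900

# Chi-square with expected frequency = row total x column total / n
chi_square = 0.0
for r in range(2):
    for c in range(2):
        expected = row_totals[r] * col_totals[c] / n
        chi_square += (observed[r][c] - expected) ** 2 / expected

# For 1 df, P(Chi-square >= x) = erfc(sqrt(x / 2))
p_chi = math.erfc(math.sqrt(chi_square / 2))

# Z-test using the pooled proportion p
n1, n2 = col_totals
p1, p2 = present[0] / n1, present[1] / n2
p = row_totals[0] / n
se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se
p_z = math.erfc(abs(z) / math.sqrt(2))               # two-sided Gaussian P-value

print(f"Chi-square = {chi_square:.2f}, P = {p_chi:.3f}")
print(f"Z = {z:.2f}, P = {p_z:.3f}, Z squared = {z * z:.2f}")
```

Because the expected frequencies are not rounded here, the Chi-square agrees with the 2.38 quoted in the text, and the square of Z equals the Chi-square, as the stated equivalence requires.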
Fisher's Exact Test (Small n)
Chi-square is not valid for small n. Fisher's exact test is needed for this situation. This can be easily calculated manually for really small n (say ≤ 10) but can become difficult for larger n. We advise taking the help of a computer package. Many packages issue a warning in case n or the expected cell frequencies are small, or would automatically compute Fisher's exact test for 2 × 2 tables. Most statistical software would give the exact P-value for small n based on Fisher's test that uses the following formula:
$$P = \frac{O_{1.}!\; O_{2.}!\; O_{.1}!\; O_{.2}!}{n!\; O_{11}!\; O_{12}!\; O_{21}!\; O_{22}!}$$
where n! = 1 × 2 × ... × n. For example, 5! = 1 × 2 × 3 × 4 × 5, and 0! = 1. For different possible values of O11, there will be different 2 × 2 tables with the same marginal totals. We can use the above formula for calculating P for each table, and the sum of these probabilities would be unity. Now, to assess the significance in an observed table, we calculate the probability of that table, and of all other more extreme configurations that favour H1. The sum of such probabilities would be the required Fisher's exact one-tail P-value. An appropriate computer package would do that for us. This can be doubled in many cases (though not always) to obtain the two-tailed P-value. For large n, the two-tailed Fisher's P would be the same as the P obtained from a regular Chi-square test. Reject H0 if P <0.05, otherwise do not reject the assertion made in H0.
Example 4: Consider survival in treated and untreated groups as shown in Table IV. The H0 is that there is equal survival in the two groups and H1 is that survival in the two groups differs. To compute Fisher's exact P-value, cell frequencies are changed keeping marginal totals fixed. In Table IV one cell frequency is already zero. Thus, no further configuration favouring H1 can be obtained. We compute P as follows: P = 6! 8! 8! 6! / (14! 6! 0! 2! 6!) = 0.0093. Thus, the one-tail P-value is less than 0.01 and the two-tail P would be less than 0.05.
Therefore, the null hypothesis can be safely rejected. Conclude that survival in the treated group is indeed higher. The chance that the conclusion is wrong is less than 5 per cent, in this case even less than two per cent. Thus, this is a safe conclusion.
TABLE IV A Hypothetical Example of Surviving Children in Treated and Untreated Categories
Fisher's exact P = 0.0093.
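The probability formula of Fisher's exact test, and the one-tail sum over more extreme tables, can be sketched in Python as below. The function names are hypothetical, and the cell arrangement (6, 0, 2, 6) is taken from the factorials shown in Example 4. The one-tail routine assumes that "more extreme" means a larger first cell than observed; a different direction would need the loop reversed.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of the 2 x 2 table [[a, b], [c, d]] when all margins are fixed.

    Algebraically the same as (row totals! x column totals!) / (n! x cells!).
    """
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_one_tail(a, b, c, d):
    """Sum probabilities of the observed table and of tables with a larger first cell."""
    total = 0.0
    while b >= 0 and c >= 0:
        total += table_probability(a, b, c, d)
        # Move one subject along the diagonal; the margins stay unchanged
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return total


# Example 4: cell frequencies 6, 0, 2, 6 (row totals 6 and 8, column totals 8 and 6)
p_one_tail = fisher_one_tail(6, 0, 2, 6)
print(f"One-tail P = {p_one_tail:.4f}")

# Probabilities of all tables with these margins add up to unity
all_tables = sum(table_probability(a, 6 - a, 8 - a, a) for a in range(7))
print(f"Sum over all tables = {all_tables:.4f}")
```

Since one cell is already zero, the loop runs only once for this example and the one-tail P is simply the probability of the observed table.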
Tables III and IV are examples of setups with two dichotomous variables each. But there are a large number of variables that have more than two categories. Contingency tables with R rows and C columns are not uncommon. The df for such tables is (R − 1) × (C − 1). An example of a 4 × 2 table is in Table V on hemoglobin levels in urban and rural adolescent girls. In this table, the variable in the columns has nominal categories and the variable in the rows has ordinal categories. The df for this table is (4 − 1) × (2 − 1) = 3. The Chi-square value can be computed in the same way as explained in Example 2. For this table, χ² = 37.3 with 3 df. From a computer package, P <0.001. Thus, the conclusion is that the Hb level distribution in the urban area is different from that in the rural area.
Study of linear trend in proportions in the case of ordinal categories is relatively simple when such categories can be assigned a valid score. These scores could be 0, 1, 2 and 3 for the Hb level categories in the example in Table V. For details of the method of Chi-square for trend, refer to Armitage and Berry(5). Trend can be studied only when the categories are ordinal. For other methods to analyse ordinal data, see Agresti(6).
The methods stated in Section 9.1 and here in 9.2 are restricted to one-way and two-way contingency tables, respectively. A three-way contingency table arises when the classification of the subjects is done with respect to three variables. An example of this is the age distribution of girls according to menstruating status and socio-economic status given in one of our earlier Articles(1) of this series. As in the case of two-way tables, the null hypothesis in a three-way table could be of different types of homogeneity of different groups or of independence, depending upon the individual variables being factor or response. No matter what type H0 is, the calculation of Chi-square proceeds on the same lines.
However, in this case, log-linear models may be extremely useful in separating the effect of one variable from the effect of the others. For details, see Haberman(7).
TABLE V Hemoglobin Levels of Adolescent Girls According to Area of Residence
The procedures given in the previous sections are valid only when the two groups of subjects are independent. Independence is lost when there is some kind of matching or pairing. Matching is a frequently adopted mechanism in medical studies. For example, a study comparing supplementation and non-supplementation of curd in the diet can be done on two groups of age, sex and weight matched children of age 2-3 years, with the change in nutritional status observed. Pairing also occurs when the same group of subjects is observed before therapy and after therapy. An example is two consecutive blood samples drawn from children with acute flaccid paralysis for observing the poliovirus.
The tabulation of paired data is different from that of independent samples. This can be noted from the difference between Table VI and Table II. A very popular test criterion in the case of matched pairs is as follows:
$$\text{McNemar } \chi^2_M = \frac{(|b - c| - 1)^2}{b + c}$$
where b and c are as in Table VI. This continues to be referred to the Chi-square distribution with 1 df for obtaining the P-value.
Example 5: To evaluate the role of a therapy in relieving common cold within a week, 50 cases were given the therapy and another group of 50 cases served as controls. The experiment and the control subjects were one-to-one matched for age, gender and BMI so that these factors do not act as confounders. The results obtained are shown in Table VII. There are 22 pairs in which both the subjects, with therapy and without therapy, felt relieved in one week's time. In 15 pairs the subject with therapy felt relieved but the subject without therapy did not. The frequencies in the second row can also be similarly explained. For Table VII,
McNemar χ²M = (| 15 − 5 | − 1)²/(15 + 5) = 4.05. Again, there is only 1 df in a 2 × 2 table. A computer package gives P = 0.044 and from the Chi-square table, P <0.05. The null hypothesis in this case is that the therapy has no effect. But the likelihood of this being true is extremely small, less than 5 per cent. Thus, reject H0 and conclude that the therapy is helpful in relieving common cold within one week. Note that the number of those relieved by therapy (15 subjects) is much more than those relieved without therapy (5 subjects) among the discordant pairs. McNemar's test shows that such a large difference is extremely unlikely to have arisen due to chance.
Example 6: Tables VIII and IX show the isolation pattern of poliovirus in 97 cases in first and second stool samples in Delhi(8). The null hypothesis in this case is that the pattern of isolation of poliovirus is the same in both the samples. This can be checked by McNemar's test as follows: McNemar χ²M = (| 8 − 6 | − 1)²/(8 + 6) = 0.071. For 1 df at 5 per cent level of significance, the table value of χ² is 3.84. Our value of Chi-square, 0.071, is much less and so P >0.05. There is no evidence to reject the contention that the two samples tend to give a similar result. In fact, in this sample, the number of pairs with a positive result in the first sample and negative in the second is only slightly different from the number with negative in the first and positive in the second. As in most other cases, McNemar's test ceases to follow Chi-square when n is small. The test is then done with the help of a binomial distribution. For details, see Le(9).
TABLE VI Structure of Table for Matched Pairs with Dichotomous Antecedent
TABLE VII A Hypothetical Example of Children Relieved From Common Cold Within
TABLE VIII Isolation of Poliovirus in First and Second Stool Sample in Delhi, 1997
TABLE IX Pattern of Isolation of Poliovirus in Two Stool Samples of 97 Cases
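McNemar's criterion for Examples 5 and 6 uses only the two discordant counts, which makes it easy to sketch in Python. The function names below are ours, and the P-value expression assumes the large-sample Chi-square distribution with 1 df; for small numbers of discordant pairs the binomial approach mentioned in the text would be needed instead.

```python
import math


def mcnemar_chi_square(b, c):
    """Continuity-corrected McNemar Chi-square from the discordant pair counts b and c."""
    return (abs(b - c) - 1) ** 2 / (b + c)


def p_value_1df(x):
    """Upper-tail probability of Chi-square with 1 df."""
    return math.erfc(math.sqrt(x / 2))


# Example 5: 15 pairs relieved only with therapy, 5 pairs relieved only without
chi_cold = mcnemar_chi_square(15, 5)
# Example 6: 8 and 6 discordant pairs in the two stool samples
chi_polio = mcnemar_chi_square(8, 6)

print(f"Common cold: Chi-square = {chi_cold:.2f}, P = {p_value_1df(chi_cold):.3f}")
print(f"Poliovirus:  Chi-square = {chi_polio:.3f}, P = {p_value_1df(chi_polio):.3f}")
```

The concordant pairs (for example, the 22 pairs of Example 5 in which both subjects were relieved) do not enter the calculation at all.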
We discussed different types of designs in medical studies in our earlier Article(3). One of these is a cross-over design. Since knowledge of contingency tables with binary response and pairing is necessary for cross-over designs, we deferred its discussion to this point. Cross-over is an effective strategy to minimize the effect of interindividual variation in those situations where there is no carry-over effect of the drug. A patient is given one drug for a specified duration, and then the other drug with a wash-out period in between. Sometimes the sequence of administration makes a difference in the outcome. That is, the patients who receive drug B then drug A (BA sequence) give different results than those who receive the AB sequence. In order to study the effect of sequencing, the patients are divided into two random groups: one is given the BA sequence and the other the AB sequence. Our concern here is with a situation where the response or the outcome is binary. This is a yes/no, present/absent, relieved/not relieved type of response. In such cases, the number of subjects with different responses in a cross-over trial can be listed as in Table X. Each row of this table is of the type given in Table VI. Thus, the null hypothesis that the drugs are equally effective can be tested separately for each sequence by using McNemar's Chi-square test. For finding statistical significance across the two sequences, the concordant pairs in the first and the last columns are ignored. They do not provide any evidence against the null hypothesis. The frequencies in only the middle two columns are analysed. This will indicate whether the response from one treatment is different from that of the other treatment. If (a2 + b3) is large relative to (a3 + b2) then drug B is more effective. If (a3 + b2) is larger then drug A is more effective. Setup (ii) of Table XI gives counts of relieved subjects in the two groups.
In both the setups (i) and (ii) the prerequisite is that the performance of the drug is not dependent on whether it is given first or given second. This really means that there is no interaction. Most practical situations would meet this condition. Statistical testing of this can be done by using setup (iii). For all these three setups the usual Chi-square can be calculated for large n and Fisher's exact test for small n. We have stated these three setups in reverse order. In practice, do (iii), then (ii) and then (i). If the presence of interaction is detected by (iii), it is not worthwhile to proceed to (ii) or (i). Presence of interaction means that the sequence of the drugs makes a difference in the outcome. In this case, find why this is occurring and do the trial again after taking steps to remove the likelihood of interaction.
TABLE X Binary Responses Observed in a Cross-Over Design
Codes for response: 0 = No relief, 1 = Relief.
Statements on risk for development of a condition in the presence of an exposure are common in medical literature. For example, low weight gain during pregnancy is an important risk factor for perinatal and infant deaths(10). In a 2 × 2 table, the magnitude of risk or of association is measured by calculating relative risk (RR) in case of prospective studies, and by odds ratio (OR) in case of retrospective and case-control studies. When there are more than two groups, the comparison could be made between two groups at a time. Apart from the relative risk and odds ratio, attributable risks are also discussed in this section. All these come under the category of epidemiological measures.
TABLE XI Rearrangement of Responses From Cross-Over Design in Three Set-ups
Risk of cerebral palsy in multiple births is nearly six times that in singletons. In this example, relative risk (RR) = 6. Relative risk is the ratio of the risk of developing an outcome such as disease (D) in those with the antecedent factor (A) to the risk in those without this factor. This obviously requires a prospective study. The antecedent factor would generally be an exposure believed to cause the disease. The term risk here has the same meaning as incidence. Using the notation of Table II, the RR is computed as: RR = (O11/O.1) / (O12/O.2). [1] It measures the degree of association of the outcome with the antecedent factor.
Confidence Interval for RR: When n is large, the 95 per cent CI for RR can be obtained as exp[lnRR ± 2SE(lnRR)] [2] where lnRR is the natural logarithm of RR and exp is the exponential function. The logarithm is taken because lnRR has an approximate Gaussian distribution for large n. The formula for SE(lnRR) is:
$$SE(\ln RR) = \sqrt{\frac{1}{O_{11}} - \frac{1}{O_{.1}} + \frac{1}{O_{12}} - \frac{1}{O_{.2}}}$$
Significance test for RR: The H0 in this case mostly is RR = 1 and H1 : RR <1 or RR >1. The test procedure is to calculate: Z = lnRR / SE(lnRR) [3] H0 is rejected if the corresponding P-value based on the Gaussian distribution is less than 0.05. This test is in addition to the usual Chi-square that can always be used for two-sided alternatives.
Example 8: Consider a prospective study where the outcome is an infant death and the antecedent factor is pregnancy weight gain categorized as <7 kg and ≥ 7 kg(10). The bivariate data on antecedent factor and outcome are tabulated in Table XII. The relative risk of infant death in pregnancies with weight gain less than 7 kg can be computed as: RR = (181/1847)/(36/687) = 1.87. This means there is an approximately two fold risk of infant death if the pregnancy weight gain is <7 kg as against a higher weight gain.
In this example, SE(lnRR) = √(1/181 − 1/1847 + 1/36 − 1/687) = 0.177. Thus, 95 per cent CI for RR = exp[lnRR ± 2 SE(lnRR)] = exp[ln 1.87 ± 2 × 0.177] = exp[0.626 ± 0.354] = (exp[0.272], exp[0.980]) = (1.31, 2.66). The RR is likely to be in the range of 1.3 to 2.7 when repeated samples of this type are studied.
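The relative risk, its standard error on the log scale and the confidence interval of Example 8 can be retraced with the following Python sketch. The variable names are ours; the multiplier 2 follows formula [2] in the text, and the Z statistic follows formula [3].

```python
import math

# Example 8: infant deaths and number of pregnancies by weight gain
deaths_low, total_low = 181, 1847     # weight gain < 7 kg
deaths_high, total_high = 36, 687     # weight gain >= 7 kg

# Relative risk, formula [1]
rr = (deaths_low / total_low) / (deaths_high / total_high)

# Standard error of ln(RR) for large n
se_ln_rr = math.sqrt(1 / deaths_low - 1 / total_low
                     + 1 / deaths_high - 1 / total_high)

# 95 per cent CI, formula [2], using the multiplier 2 as in the text
ln_rr = math.log(rr)
ci_low = math.exp(ln_rr - 2 * se_ln_rr)
ci_high = math.exp(ln_rr + 2 * se_ln_rr)

# Test statistic for H0: RR = 1, formula [3]
z = ln_rr / se_ln_rr

print(f"RR = {rr:.2f}, SE(lnRR) = {se_ln_rr:.3f}")
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f}), Z = {z:.2f}")
```

The interval excludes 1, which agrees with a Z value well above the usual Gaussian cut-off.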
A general situation of matched pairs with regard to antecedent and outcome in a prospective study is shown in Table VI. In this case, RR is estimated as follows: RR (matched pairs) RRm = (a + b) / (a + c) [4] The numerator is the number of subjects developing the disease among the exposed and the denominator is the number of subjects developing the disease among the nonexposed. For the data in Example 5, RRm = (22 + 15) / (22 + 5) = 1.37. That is, in this sample of 50 pairs, the estimated chance of relief within one week is nearly 1.4 times in the therapy group relative to the nontherapy group.
TABLE XII Number of Infant Deaths According to Weight Gain During Pregnancy
RR = (181/1847) / (36/687) = 1.87
In betting, it is stated, for example, that the odds of winning are 1 : 3. These odds mean that a loss is 3 times more likely than a win. Or, the probability of a win is 1/4. Similarly, in case-control studies, the odds are the frequency of presence of the antecedent relative to its absence. This is calculated for the cases and the controls. The ratio of these two odds is called the odds ratio. For example, one may find that the odds of asthmatic parents in children with asthma are two times the odds in children without asthma. The comparison here is between the frequency of occurrence or of presence of the antecedent among the cases relative to that among the controls. Ideally, all other possible factors are appropriately matched so that they do not influence the result. If there are some that are not matched then the statistical analysis is geared to minimize the influence of these factors on the result. In case-control studies, OR is estimated using the notation in Table II as follows: OR = (O11 × O22)/(O12 × O21) [5] Since the numerator is the product of the elements in the leading diagonal (Table II) and the denominator that of the elements in the other diagonal, OR is also sometimes called the cross-product ratio. The value of OR in [5] becomes undefined or zero if any of the cell frequencies is zero. The modified estimate of OR is:
$$OR = \frac{(O_{11} + 0.5)\,(O_{22} + 0.5)}{(O_{12} + 0.5)\,(O_{21} + 0.5)}$$
An OR = 3 says that the odds of presence of the antecedent are three times as high among the cases as among the controls. It can be shown that the odds ratio approximates the relative risk fairly well when the outcome of interest is rare, say less than 5 per cent, in the target population. Most outcomes of medical interest are rare. This property of OR obviates the need to do expensive prospective studies. The results obtained from relatively inexpensive case-control studies can be used to draw inferences on the relative risk.
Example 9: Consider a case-control study that evaluates risk factors for persistent diarrhea(11).
Table XIII depicts the frequency of malnutrition in cases with persistent diarrhea and in controls, along with the OR and the 95 per cent CI for OR. The odds of malnutrition in cases of persistent diarrhea are nearly four times those in subjects with no persistent diarrhea.
CI for OR: OR is a ratio, and its natural logarithm (ln) is used since lnOR has a nearly Gaussian pattern for large n. It has been established for large n that
$SE(\ln OR) = \sqrt{\dfrac{1}{O_{11}} + \dfrac{1}{O_{12}} + \dfrac{1}{O_{21}} + \dfrac{1}{O_{22}}}$
where $O_{11}$, $O_{12}$, $O_{21}$ and $O_{22}$ are as in Table II.
95% CI for OR: $\exp[\ln OR \pm 2\,SE(\ln OR)]$ [6]
In Example 9, the OR can vary between two and nine approximately in repeated samples.
Significance test for OR: The H0 in this case almost invariably is that OR = 1. This says that the presence of the antecedent is as common in cases as in controls. Since OR approximates RR if the outcome is rare, H0 : OR = 1 also says in that case that presence or absence of the antecedent does not influence the outcome. A simple statement that takes care of both directions of the relationship is that there is no association between antecedent and outcome. The alternative could be one-sided, H1 : OR < 1 or H1 : OR > 1, or two-sided, H1 : OR ≠ 1. The latter is applicable when there is no a priori assurance that the relationship could only be one-sided. The two-sided hypothesis for the data on malnutrition as a risk factor for persistent diarrhea shown in Table XIII is tested by the classical Chi-square as stated earlier. For a one-sided alternative, use the Z-test and refer to Gaussian tables to obtain the P-value. The test statistic is
$Z = \dfrac{\ln OR}{SE(\ln OR)}$ [7]
where SE(lnOR) is the same as given for the CI earlier.
TABLE XIII Frequency of Malnutrition in Persistent Diarrhea Cases and Controls
OR = (48 × 38) / (16 × 26) = 4.38, 95% CI of OR: (2.1, 9.3).
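The calculations of Example 9 can be retraced with a short Python sketch. It assumes that the four counts in the footnote of Table XIII correspond to O11, O12, O21 and O22 in the order implied by equation [5]; the function names are illustrative and not part of the original Article. The multiplier 1.96 is used instead of the rounded value 2 in [6] because it reproduces the interval reported for Table XIII.

```python
import math

def odds_ratio(o11, o12, o21, o22):
    """Cross-product ratio of equation [5]."""
    return (o11 * o22) / (o12 * o21)

def modified_odds_ratio(o11, o12, o21, o22):
    """Modified estimate: 0.5 is added to every cell (for tables with a zero cell)."""
    return ((o11 + 0.5) * (o22 + 0.5)) / ((o12 + 0.5) * (o21 + 0.5))

def se_ln_or(o11, o12, o21, o22):
    """Large-sample standard error of ln(OR)."""
    return math.sqrt(1 / o11 + 1 / o12 + 1 / o21 + 1 / o22)

def ci_or(o11, o12, o21, o22, multiplier=1.96):
    """Confidence interval of equation [6], built on the ln scale."""
    ln_or = math.log(odds_ratio(o11, o12, o21, o22))
    half_width = multiplier * se_ln_or(o11, o12, o21, o22)
    return math.exp(ln_or - half_width), math.exp(ln_or + half_width)

def z_statistic(o11, o12, o21, o22):
    """Z of equation [7] for testing H0: OR = 1."""
    return math.log(odds_ratio(o11, o12, o21, o22)) / se_ln_or(o11, o12, o21, o22)

# Counts from the Table XIII footnote, assumed order: O11, O12, O21, O22.
cells = (48, 16, 26, 38)
print(round(odds_ratio(*cells), 2))               # 4.38
print(tuple(round(x, 1) for x in ci_or(*cells)))  # (2.1, 9.3)
print(round(z_statistic(*cells), 2))              # 3.84
```

Since the SE formula is symmetric in the four cells, the interval does not depend on which off-diagonal count is taken as O12 and which as O21.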
Since the SE depends on the number of subjects in the study, a useful strategy could be to include more control subjects per case. Multiple controls can also be used as an effective strategy to reduce the number of cases needed when cases are difficult to enlist or are very costly. In many practical situations, controls are easily available, less cumbersome to investigate and more cooperative. Thus their number can be increased without a corresponding increase in the cost. In that case, in place of one-to-one matching, C : 1 matching can be done, i.e., C controls per case. This increase in the number of controls reduces the SE and increases the power of the study to detect an association. However, the returns are diminishing. In general, as the ratio of controls to cases increases beyond 4 : 1, the additional gain in statistical power may be small compared to the cost involved. For details see Breslow and Day(12).
When several 2 × 2 tables are available, it is possible to draw a joint conclusion by pooling the evidence from the different strata. For example, data of the type shown in Table XII may be available for women of different age groups. The pooled OR is obtained by using the Mantel-Haenszel procedure. For the formula of the pooled OR and the details of the procedure, refer to Armitage and Berry(5). When stratification is to be done for more than one variable, methods such as logistic regression are used(12). This aspect will be discussed briefly in a future Article of this series.
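The diminishing returns from adding controls can be illustrated with a rough Python sketch. It is only an approximation: it applies the large-sample SE(lnOR) formula for unmatched data given earlier, not a matched analysis, and the study counts (100 cases of which 40 are exposed, 20% of controls exposed) are hypothetical values chosen for illustration.

```python
import math

def se_ln_or_c_controls(cases_exposed, cases_unexposed, control_exposure_rate, c):
    """SE(lnOR) when each case has c controls (unmatched large-sample formula)."""
    n_controls = c * (cases_exposed + cases_unexposed)
    controls_exposed = control_exposure_rate * n_controls
    controls_unexposed = n_controls - controls_exposed
    return math.sqrt(1 / cases_exposed + 1 / cases_unexposed
                     + 1 / controls_exposed + 1 / controls_unexposed)

# Hypothetical study: 100 cases (40 exposed), 20% of controls exposed.
for c in (1, 2, 3, 4, 5, 10):
    se = se_ln_or_c_controls(40, 60, 0.20, c)
    print(f"{c:>2} controls per case: SE(lnOR) = {se:.3f}")
```

In this hypothetical setting the SE falls noticeably from one to two controls per case but changes little beyond four, because the terms contributed by the cases do not shrink however many controls are added.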
Consider a table similar to Table VI on matched pairs. The total number of pairs is a + b + c + d. Let a be the number of pairs in which both the case and the control are exposed, and d the number of pairs in which both are nonexposed. These two together are the concordant pairs. The odds ratio is computed on the basis of the discordant pairs: b is the number of pairs where the case-partner is exposed but the control-partner is nonexposed, and c is the number of pairs where the case-partner is nonexposed but the control-partner is exposed. If exposure is positively associated with disease, b should clearly be more than c.
Odds ratio (matched pairs): $OR_m = b/c$ [8]
and $SE(\ln OR_m) = \sqrt{\dfrac{1}{b} + \dfrac{1}{c}}$
95% CI for ORm: $\exp[\ln OR_m \pm 2\,SE(\ln OR_m)]$ [9]
Test of significance for ORm: The relevant null hypothesis in this case is H0 : ORm = 1. To test this against the one-sided alternative H1 : ORm > 1, calculate
$Z = \dfrac{b - c}{\sqrt{b + c}}$ [10]
For large n, refer it to the usual Gaussian distribution to find whether or not the P-value is sufficiently small. For a two-tailed test, it may be easier to calculate McNemar's chi-square with one df.
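As a minimal sketch of the matched-pairs calculations in [8] to [10], the following Python function uses only the discordant pairs. The counts b = 30 and c = 10 are hypothetical and the function name is illustrative; the CI is obtained by exponentiating the limits computed on the ln scale, with the multiplier 2 used in the text.

```python
import math

def matched_pairs_or(b, c, multiplier=2.0):
    """Matched-pairs odds ratio, SE of its ln, CI and Z from discordant pairs."""
    or_m = b / c                                   # equation [8]
    se = math.sqrt(1 / b + 1 / c)                  # SE of ln(ORm)
    ln_or = math.log(or_m)
    ci = (math.exp(ln_or - multiplier * se),       # equation [9], back-transformed
          math.exp(ln_or + multiplier * se))
    z = (b - c) / math.sqrt(b + c)                 # equation [10]
    return or_m, se, ci, z

# Hypothetical discordant pairs: 30 with only the case exposed,
# 10 with only the control exposed.
or_m, se, ci, z = matched_pairs_or(b=30, c=10)
print(or_m)          # 3.0
print(round(z, 2))   # 3.16
print(ci)
```

The square of this Z equals McNemar's chi-square computed without continuity correction, which is why the two-tailed test can be carried out either way.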
The magnitude of excess risk due to an exposure is called attributable risk (AR). In a classical study on British doctors, Doll and Hill(13) compared the mortality from lung cancer and cardiovascular disease in nonsmokers and heavy smokers. Relative risks indicated a very strong association with lung cancer (RR = 32.4) and a low association with cardiovascular disease (RR = 1.36). The AR in these two cases was nearly the same. Thus, the elimination of heavy smoking among British male doctors would have reduced the cause-specific mortality for lung cancer almost as much in absolute terms as for cardiovascular disease. AR is basically the difference in the risk between the exposed and the nonexposed subjects:
$AR = \dfrac{O_{11}}{O_{\cdot 1}} - \dfrac{O_{12}}{O_{\cdot 2}}$
TABLE XIV Summary of Procedures on Inference From Proportions
Methods for small n not discussed except Fisher's exact.
TABLE XV Summary of Procedures for Inference From Relative Risks and Odds Ratios
Figures in square brackets are the concerned equation numbers.
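To make the definition of attributable risk concrete, the short Python sketch below reuses the two risks that appear in the RR calculation earlier in this Article (181/1847 and 36/687). It assumes, as the form of that RR suggests, that the first is the risk in the exposed group and the second the risk in the nonexposed group; the function names are illustrative.

```python
def relative_risk(risk_exposed, risk_unexposed):
    """Ratio of the risk in the exposed to the risk in the nonexposed."""
    return risk_exposed / risk_unexposed

def attributable_risk(risk_exposed, risk_unexposed):
    """Difference of the risk in the exposed and the risk in the nonexposed."""
    return risk_exposed - risk_unexposed

# Assumed: 181/1847 is the risk in the exposed, 36/687 in the nonexposed.
risk_exposed = 181 / 1847
risk_unexposed = 36 / 687

print(round(relative_risk(risk_exposed, risk_unexposed), 2))      # 1.87
print(round(attributable_risk(risk_exposed, risk_unexposed), 4))  # 0.0456
```

Under this assumption the exposure nearly doubles the risk in relative terms, while the excess in absolute terms is about 4.6 per 100 subjects.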
There are the following variations of this concept. Population attributable risk (PAR) is concerned with the excess rate of disease in the total population that is associated with a specific exposure, for example, the excess risk of pneumonia in all children, including the undernourished, compared with children who are not undernourished. It is calculated as
$PAR = \text{(rate of disease in the population)} - \text{(rate of disease in the nonexposed group)}$
Thus, PAR is the rate of disease in the population minus the rate in the nonexposed group. This is different from AR since the population comprises both the exposed and the nonexposed groups of people. Attributable fraction or PAR fraction is this excess calculated as a proportion of the rate of disease in the population. It measures the proportion of disease in the population that is attributable to the exposure, and is the portion of the disease incidence that could be eliminated if the exposure were eliminated. When the proportion of persons with the given risk factor is known, the PAR fraction can be obtained directly from RR as follows:
$\text{PAR fraction} = \dfrac{p\,(RR - 1)}{p\,(RR - 1) + 1}$
where p is the proportion of persons having the given risk factor. Since RR can be approximated by OR in most cases, the PAR fraction can also be estimated on the basis of case-control data.
This Article is necessarily a concise version of the methods more commonly used in medical investigations. A summary is provided in Tables XIV and XV. Among the methods that we have generally not discussed are the ones that can be used for small n. A small sample size requires consideration of exact methods such as Fisher's exact test. Other such methods are far too complex for inclusion in this Article.
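As a numerical illustration of the PAR and PAR fraction formulas given above, the following Python sketch computes both and checks that they agree. The exposure prevalence p = 0.3 and the risks 0.10 and 0.05 are hypothetical values, and the function names are illustrative.

```python
def population_attributable_risk(p, risk_exposed, risk_unexposed):
    """Rate of disease in the population minus the rate in the nonexposed group."""
    population_rate = p * risk_exposed + (1 - p) * risk_unexposed
    return population_rate - risk_unexposed

def par_fraction(p, rr):
    """PAR fraction from the exposure prevalence p and the relative risk rr."""
    excess = p * (rr - 1)
    return excess / (excess + 1)

# Hypothetical values: 30% of the population exposed, risk 0.10 in the
# exposed and 0.05 in the nonexposed, so that RR = 2.
p, risk_exposed, risk_unexposed = 0.3, 0.10, 0.05
rr = risk_exposed / risk_unexposed

par = population_attributable_risk(p, risk_exposed, risk_unexposed)
population_rate = p * risk_exposed + (1 - p) * risk_unexposed

print(round(par, 3))                    # 0.015
print(round(par_fraction(p, rr), 3))    # 0.231
print(round(par / population_rate, 3))  # 0.231, the same fraction computed directly
```

With an OR from a case-control study in place of RR, the same function gives an approximate PAR fraction when the outcome is rare.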