Research Papers
Indian Pediatrics 2007; 44:675-681

Impact of Training on Observer Variation in Chest Radiographs of Children with Severe Pneumonia
Pneumonia is the leading cause of childhood death in developing countries, contributing globally to 21% of deaths in under-five children(1). The World Health Organization's (WHO) clinical case definitions for diagnosing pneumonia are sensitive, appropriate when the consequences of a missed diagnosis are serious, and useful for treatment decisions, but they have low positive predictive value(2,3). For epidemiological purposes, therefore, chest radiography offers an alternative method of diagnosing pneumonia, one that also allows readers to be blinded to the intervention or to the clinical course of the patient. However, inter-observer variation in the interpretation of chest radiographs of children with acute respiratory infection (ARI) is well known, as no strict radiological definition of pneumonia exists(4-6). Standardization, simplification and categorization of radiological features, along with training, can help mitigate this problem(7). For vaccine efficacy trials, WHO established a working group to standardize the categorization of radiological pneumonia, in order to establish burden estimates of likely bacterial pneumonia and to estimate vaccine impact(7). Using this categorization of radiological features of pneumonia, we undertook this study to estimate the inter-observer agreement in interpretation of chest radiographs in children with severe pneumonia and to examine whether the WHO training intervention increases inter-observer agreement.

Subjects and Methods

The present study was conducted at Indira Gandhi Government Medical College and Hospital (IGGMC), a tertiary care center at Nagpur, India. This site participated in the Amoxicillin Penicillin Pneumonia International Study (APPIS), a multicenter randomized study conducted in 8 countries to determine whether oral amoxicillin and parenteral penicillin were equivalent in the treatment of WHO-defined severe pneumonia (fast breathing with lower chest wall indrawing) in children aged 3-59 months(8).
The chest radiographs of 172 of 200 children with WHO-defined severe pneumonia were assigned a unique code number to maintain confidentiality and to blind the observers. Three observers, a pediatric faculty member (ABP), a radiology faculty member (SZS) and a radiology resident (APA), blinded to each other's observations, independently read the chest radiographs before undergoing training with the standardized software. Based on their experience, they recorded what they thought was: film adequacy, significant pathology in lung fields, consolidation on the left side, consolidation on the right side, other infiltrates/abnormality on the left side, other infiltrates/abnormality on the right side, pleural effusion on the left side, and pleural effusion on the right side. These were recorded as "before-training readings". Standardized training by the WHO expert, using the software, began a week after these readings. After training, the three clinicians re-interpreted the 172 chest radiographs in random order, under the same conditions. These were the "after-training readings".

The training intervention

AiMS multimedia is a commercial software package with a repository of normal and abnormal films showing a spectrum of radiological changes in pneumonia. It contains training, teaching and assessment tutorials. A team of experts for the WHO vaccine trials standardized the observed radiological changes of these chest radiographs (Table I). Each observer (ABP, SZS and APA) was trained for 3 days to recognize the standard radiological features under the supervision of a WHO representative who was also a member of the team that developed the software and its training program for the vaccine trials. The software has several pneumonia tutorial sets (20 chest radiographs in each). The observers read these radiographs and their responses are compared with those provided by the software.
To be considered successfully trained, an observer's responses had to reach a sensitivity and specificity of 80% against the software responses.

TABLE I  WHO Standardized Chest Radiographic Features in Pneumonia
Statistical analysis

Using latent class analysis, which takes the consensus among the observers as a measure of the estimated 'true' prevalence, we estimated the prevalence of 'uninterpretable' and 'adequate' films before and after training. For each radiographic feature (Table I), Cohen's kappa statistic was used to assess agreement between pairs of observers, and Fleiss's multiple-rater kappa statistic was used to assess agreement among all three clinicians(9,10). Unweighted kappa was used for radiological features with two outcome categories (e.g., a 'Yes' or 'No' outcome), while weighted kappa was used for features with more than two ordered categories (e.g., the 'uninterpretable', 'suboptimal' and 'normal' categories for "Film Adequacy"). We defined 'unanimity' as complete agreement by all observers on each category of a radiographic feature. For instance, 'unanimity' was said to exist if all three observers agreed on the presence or absence of consolidation on the left side. To estimate the strength of association between training and 'unanimity' for each radiographic feature, univariate logistic regression models were used. Agreement 1.0 (Lata Medical Research Foundation, Nagpur, India) and Stata 7.0 (Stata Corp, College Station, TX, USA) software programs were used for analysis.

Results

The prevalence of 'uninterpretable' films was 16.6% before training and reduced significantly to 8.1% after training (P < 0.001). The prevalence of 'adequate' films was 54.22% before training and increased significantly to 70% after training (P < 0.001). There was 'unanimity' for the absence of pleural effusion on the left side both before and after training. Observers ABP, SZS and APA identified 2, 2 and 1 films, respectively, as having pleural effusion on the right side before training. This lack of pre-training 'unanimity' improved after training, when right pleural effusion was identified by all three observers in just one film.
For further agreement analysis, we excluded only the films classified as uninterpretable by at least one observer and the films with pleural effusion, as their numbers were small.

TABLE II  Comparison of Pair-wise Agreement (Cohen's Kappa) Before and After Training for All Radiographic Features
* Observer 1 is the pediatric faculty member (ABP), Observer 2 is the radiology faculty member (SZS) and Observer 3 is the radiology resident (APA).

A comparison of pair-wise Cohen's kappa values before and after training is shown in Table II. For agreement between observers ABP and SZS, there was a significant improvement for all features except infiltrates on the left side. For observers ABP and APA, the improvement in agreement was significant for all features. For the pair of observers SZS and APA, a significant improvement in agreement was seen only for the feature of primary end-point consolidation on the left side; all other kappa values for this pair of observers were moderate-to-high even before training.
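The pair-wise statistic compared above can be reproduced directly from the coded readings. A minimal sketch of unweighted Cohen's kappa in Python, using hypothetical illustrative readings rather than the study data:

```python
from collections import Counter

def cohens_kappa(reader_a, reader_b):
    """Unweighted Cohen's kappa: chance-corrected agreement between two readers."""
    assert len(reader_a) == len(reader_b)
    n = len(reader_a)
    # Observed proportion of films on which the two readers agree
    p_observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance, from each reader's marginal frequencies
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical 'consolidation present?' readings for eight films
pediatrician = ["Y", "Y", "N", "N", "Y", "N", "N", "Y"]
radiologist  = ["Y", "N", "N", "N", "Y", "N", "Y", "Y"]
print(cohens_kappa(pediatrician, radiologist))  # 0.5
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, which is why chance-corrected values are preferred over raw percentage agreement when category prevalences are unbalanced.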
Figure 1 shows the multiple-rater Fleiss's kappa estimates before and after training. The training intervention contributed to improved agreement among the three observers for all radiographic features. The Fleiss's kappa values after training ranged from 0.37 to 0.52, indicating moderate-to-good agreement, and were highly significant, with Z values ranging from 7.9 to 11.4. The maximum improvement in Fleiss's kappa was observed for primary end-point consolidation on the left side, followed by infiltrates on the right side.

TABLE III  Effect of Training on Agreement for Various Radiographic Features
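The multiple-rater statistic behind these estimates generalizes Cohen's kappa to more than two readers. A minimal sketch of Fleiss's kappa for three observers, again on hypothetical readings rather than the study data:

```python
def fleiss_kappa(films, categories):
    """Fleiss's kappa for multiple raters; `films` is a list of per-film
    lists holding one categorical reading per rater."""
    n = len(films)                  # number of films
    k = len(films[0])               # raters per film (3 in this study)
    # n_ij: how many raters put film i into category j
    counts = [[film.count(c) for c in categories] for film in films]
    # Per-film observed agreement among the k raters
    p_film = [(sum(c * c for c in row) - k) / (k * (k - 1)) for row in counts]
    # Overall proportion of readings falling in each category
    p_cat = [sum(row[j] for row in counts) / (n * k) for j in range(len(categories))]
    p_bar = sum(p_film) / n             # mean observed agreement
    p_exp = sum(p * p for p in p_cat)   # agreement expected by chance
    return (p_bar - p_exp) / (1 - p_exp)

# Hypothetical 'infiltrates present?' readings by three observers on four films
films = [["Y", "Y", "Y"], ["Y", "Y", "N"], ["N", "N", "N"], ["Y", "N", "N"]]
print(round(fleiss_kappa(films, ["Y", "N"]), 4))  # 0.3333
```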
The odds ratio represents how much more likely unanimous agreement was consequent to the training intervention. Finally, we assessed whether 'unanimity' improved significantly after training by using a logistic regression model. Table III summarizes the results of this analysis. Again, the greatest improvement in 'unanimity' was for the feature of primary end-point consolidation on the left side, followed by infiltrates on the right side.

Discussion

This study emphasizes the importance and benefit of standardizing the interpretation of the chest radiograph using a training intervention. Chest radiographs are often used in epidemiological studies and in antimicrobial or vaccine clinical trials to determine the outcome of pneumonia(11,12). The reporting of this important study outcome is often difficult and ambiguous if there is a lack of agreement between observers, which can contribute to bias and misclassification(13). Reported agreements between readers also vary from study to study(4,6,14). There are many possible reasons for a lack of agreement between observers. First, a wide spectrum of radiological findings is observed in children, ranging from the typical appearance of lobar consolidation in bacterial pneumonia to mild interstitial and perihilar changes often associated with bacterial or viral infections, asthma, or normal children. The varied radiological manifestations in patients with Human Immunodeficiency Virus infection may further complicate the issue(6). Second, clinicians can describe the radiographic features using different terminologies, and a single feature can have different grades; standardizing definitions for radiological features and simplifying their grades can therefore improve agreement(15,16). Third, observers can differ in specialization and experience(6).
In this study, the radiologist observers (SZS and APA) had less disagreement, whereas the pediatrician's observations were more often in disagreement with those of the radiologist colleagues. There was a significant improvement in agreement for all radiological features subsequent to training. Although the outcome of infiltrates was scaled on a simple two-point scale as presence or absence, training further enhanced agreement. Overall, the reporting of "adequate" films improved and the diagnosis of "significant pathology" decreased. This shows that training even experienced clinicians could help reduce the number of repeat orders for chest radiographs judged "uninterpretable" in clinical practice. It also helped to decrease over-interpretation of abnormality, which often occurs when observers know that they are reading chest radiographs of children already diagnosed with pneumonia. Previously, agreement on these simplified and standardized WHO-defined radiographic features was studied in 20 radiologists and clinicians against a reference reading, but improvement with training was not assessed(17). The agreement for any abnormality ranged from 71-85%, with kappa values ranging from 0.31-0.68. The post-training agreement between observers in our study is similar to that reported between trained readers and the reference standard in the WHO study. This simplified, standardized method of reporting chest radiographic features has been used, without training, in a double-blind randomized trial of a 9-valent pneumococcal conjugate vaccine enrolling 39,836 children in South Africa(11). However, our study showed that standardization and simplification alone may not be enough to achieve even moderate agreement, and that training helps improve agreement. Finally, in spite of a significant improvement in agreement for all radiographic features, the post-training kappa values indicated only moderate-to-good agreement.
The training was provided using computer images, but the chest radiographs of patients were read on a viewbox, which could be one limitation in achieving better agreement. Our findings also endorse the fact that radiological diagnosis from chest radiographs in children with severe pneumonia is inherently subject to substantial inter-observer variation, which reinforces the importance of standardization and training.

Acknowledgements

We are thankful to the APPIS project for providing the chest radiographs, to Dr. Shamim Qazi for providing the WHO standard training software package, and to Dr. Thomas Cherian, WHO, for imparting training to the observers. We are also thankful to Dr. Jon Simon, Director, Center for International Health, Boston University, for reviewing the manuscript, and to Elizabeth Bertone, DSc, for her help in the analysis and in reviewing the manuscript. We are indebted to Smita Puppalwar for her help in the study.

Contributors: ABP and AA designed the study, collected the data and wrote the manuscript. AA and HK analyzed the data and also participated in writing the manuscript. SZS and AAt assisted in data collection and its management.

Funding: World Health Organization.

Competing interests: None.
References