Performance of Quantitative Ultrasound and Six Osteoporosis Risk Indexes in Post-Menopausal Women: Validation and Comparative Evaluation Study

Background: A number of questionnaire-based systems and portable quantitative ultrasound (QUS) scanners have been devised in an attempt to produce a cost-effective method of screening for osteoporosis. Objective: To assess the sensitivity and specificity of these techniques and their ability to act as screening tools in relation to dual energy X-ray absorptiometry (DXA). Methods: 295 white postmenopausal women aged over 60 were enrolled. Each subject completed a standardized questionnaire that allows six osteoporosis risk indexes to be calculated and had bone mineral density (BMD) measured using QUS and DXA. Sensitivity and specificity of the different techniques in relation to DXA were plotted as receiver-operating characteristic (ROC) curves at a DXA total hip T-score ≤ -2.5 (osteoporosis). Results: Sensitivity and specificity of broadband ultrasound attenuation (BUA) were 76.8% and 51.2%, respectively, at the total hip. The optimal cut-off T-score for QUS was -2 at the total hip. The osteoporosis self-assessment tool (OST) consistently provided the highest area under the curve (AUC, 0.80) among the clinical tools and had the best balance of sensitivity and specificity (90.2% and 44.5%). The negative likelihood ratio of OST was 0.22. Conclusion: OST (based only on weight and age) performed slightly better than QUS and the other risk questionnaires in predicting low BMD at the total hip.


INTRODUCTION
Osteoporosis has enormous health and socioeconomic implications in terms of morbidity, mortality and disability worldwide. For a 50-year-old white woman, the lifetime risk of suffering a fragility fracture is estimated to be 30-40% (1)(2)(3). The most widely used technique to diagnose osteoporosis and assess fracture risk is dual-energy X-ray absorptiometry (DXA). However, due to increases in public awareness and the introduction of novel therapies for osteoporosis, there has been an increase in the demand for bone density measurements. Due to cost considerations and limited availability of bone mineral density (BMD) technology in some communities, it has been proposed that BMD measurements should target subjects with risk factors for osteoporosis. Recently, many epidemiological studies have validated risk assessment indices for osteoporosis in women. The purpose of the risk assessment indices is not to diagnose osteoporosis or low BMD, but to identify people who are more likely to have low BMD. Such indices, while not identifying all cases of osteoporosis, increase the efficiency of BMD measurement by focusing on subjects who are at an increased risk (4)(5)(6)(7)(8). Several questionnaires have been devised in an attempt to produce a cost-effective method of screening for osteoporosis. These questionnaires focus on well-known clinical risk factors for osteoporosis and combine a varying number of them to produce quantitative scores. The scores are designed to identify patients at risk of having low bone mineral density, who need to undergo a full assessment of their bone status (9).
Examples of previously examined questionnaires are the Osteoporosis Self-assessment Tool (OST) (6,10,11), the Osteoporosis Risk Assessment Instrument (ORAI) (12), the Simple Calculated Osteoporosis Risk Estimation (SCORE) (13), the OSteoporosis Index of RISk (OSIRIS) (14), the risk index derived using data from the Study of Osteoporotic Fractures (SOFSURF) (15) and pBW, which is based purely on the patient's body weight (16). Quantitative ultrasonography (QUS), which has been used to assess bone (especially calcaneal) status for almost two decades, has proven widely useful in clinical practice. QUS is a portable and radiation-free system with a shorter investigation time and lower cost than DXA, and may be a better proposition for screening large populations, especially where DXA availability is limited (17)(18)(19). In this study, we aimed to compare the performance of six questionnaire-based screening systems (OST, ORAI, OSIRIS, SOFSURF, SCORE and pBW) and calcaneal QUS in identifying women with hip osteoporosis as assessed by DXA.

MATERIALS AND METHODS
Patients
A total of 295 consecutive women aged 60 years and over who had no previous diagnosis of osteoporosis were entered into the study. Women were recruited prospectively, with consent, from our Rheumatology Department or were referred by private rheumatologists in the Rabat area who had been invited to participate in the study. General exclusion criteria were non-Caucasian origin and diseases, drugs, and other major determinants known to affect bone metabolism. Thus, we excluded subjects with gastrectomy, intestinal resection, recent hyperthyroidism or hyperparathyroidism, recent severe immobilization or treatment with corticosteroids (more than 3 months). Our institutional review board approved this study. The procedures of the study were in accordance with the Declaration of Helsinki, and formal ethics committee approval was obtained for the study. All participants gave informed written consent. Each subject completed a standardized questionnaire designed to document putative risk factors of osteoporosis. Lifestyle (alcohol consumption, gymnastics or jogging/walking, smoking) and diet (milk, yogurt, cheese) habits were also recorded. The women were asked whether they usually drank milk, coffee, or alcohol, if they ate cheese or yogurt, if they did gymnastics or jogging/walking, and if they smoked tobacco. Menstrual and reproductive history were assessed. Height and weight were measured in our centre before DXA measurement, in light indoor clothes without shoes. Body mass index (BMI) was calculated by dividing weight in kilograms by height in meters squared. To ensure that the DXA examination, QUS and questionnaire tools were performed in a blinded fashion, every patient was first interviewed by a clinician (FM) and then referred for the DXA and QUS analyses, which were performed by two technicians who did not have access to the questionnaire results.

Densitometry measurements
BMD was determined by a Lunar Prodigy Vision DXA system (Lunar Corp., Madison, WI) and QUS (Euromedix corp. Leuven, Belgium). All BMD measurements were carried out by two experienced technicians. The DXA scans were obtained by standard procedures supplied by the manufacturer for scanning and analysis. Daily quality control was carried out by measurement of a Lunar phantom. At the time of the study, phantom measurements showed stable results. The phantom precision expressed as the CV (%) was 0.08. Moreover, reproducibility has been assessed recently in clinical practice and showed a smallest detectable difference of 0.04 g/cm² (spine) and 0.02 g/cm² (hips) (20,21). Patient BMD was measured at the lumbar spine (anteroposterior projection at L1-L4) and the femurs (dual femur), and the mean of the two femur measurements (total hip) was used. BMD values, expressed in g/cm², were converted into T-scores expressed in standard deviations (SDs) using our reference values (22). We used the total hip T-score to categorise subjects as normal (T>−1), osteopenic (−2.5<T≤−1) or osteoporotic (T≤−2.5). The QUS machine underwent system quality verification tests each day prior to any measurements. All ultrasound scans were performed on the patient's non-dominant side, using an ultrasound gel to provide the coupling between the ultrasound probes and the skin surface. The precision of the QUS machines was previously examined in a group of 30 normal subjects (aged 25-58). Two repeated measurements were performed on each individual, with repositioning between scans, and gave a CV of 0.29% for broadband ultrasound attenuation (BUA) and 0.31% for speed of sound (SOS). The QUS T-score was computed using the database supplied with the system.

Calculation of risk indices
The OST, ORAI, OSIRIS, SOFSURF, and SCORE indices were derived according to the algorithms suggested by their developers. The variables used, method of calculation, and interpretation of each test are described in Table 1.
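The two threshold rules that the text states explicitly, the total hip T-score categories and the pBW weight bands, can be sketched as follows. This is a minimal illustration rather than the software used in the study: the function names are ours, and the reference mean and SD in the example are hypothetical placeholders, not the reference values cited in (22).

```python
def t_score(bmd: float, ref_mean: float, ref_sd: float) -> float:
    """Express a BMD value (g/cm2) in SDs from a reference mean."""
    return (bmd - ref_mean) / ref_sd


def classify_t_score(t: float) -> str:
    """Total hip categories: normal (T > -1), osteopenic (-2.5 < T <= -1),
    osteoporotic (T <= -2.5)."""
    if t <= -2.5:
        return "osteoporotic"
    if t <= -1.0:
        return "osteopenic"
    return "normal"


def pbw_risk(weight_kg: float) -> str:
    """pBW bands: >70 kg low risk, 57 to 70 kg moderate risk, <57 kg high risk."""
    if weight_kg > 70:
        return "low"
    if weight_kg >= 57:
        return "moderate"
    return "high"


# Hypothetical reference values and patient, for illustration only.
REF_MEAN, REF_SD = 0.94, 0.12
example_t = t_score(0.60, REF_MEAN, REF_SD)
print(classify_t_score(example_t), pbw_risk(62.0))
```

Both functions treat the boundary values as the text does: a T-score of exactly −2.5 is osteoporotic, and a weight of exactly 57 kg or 70 kg falls in the moderate band.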

Cut-off points
The purpose of a screening tool is to correctly select patients who are at risk of having osteoporosis and to exclude patients who would subsequently be found to have normal BMD levels. The optimum screening tool would provide a cut-off point that gives the correct diagnosis of every individual's bone status, with no false positives or false negatives. It is therefore important that a point be selected above which patients are considered to be normal, and below which they are deemed to need further investigation. Previous studies validating screening tools have used cut-offs that supply a sensitivity of 90%, regardless of the specificity, to ensure that the percentage of patients with low BMD correctly selected is high. In this study, the best balance between sensitivity and specificity was investigated. Sensitivity and specificity were summed at each candidate cut-off level, and the level with the highest combined value was used as the cut-off.
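The cut-off selection described above can be sketched as follows. This is an illustrative reconstruction, not the procedure actually run in SPSS: it assumes that a test is positive when the score is at or below the cut-off (lower score, higher risk), and the scores and disease labels are invented. Maximising sensitivity + specificity is equivalent to maximising Youden's index (sensitivity + specificity − 1).

```python
def sensitivity_specificity(scores, has_disease, cutoff):
    """Sensitivity and specificity when 'score <= cutoff' counts as positive."""
    tp = sum(1 for s, d in zip(scores, has_disease) if d and s <= cutoff)
    fn = sum(1 for s, d in zip(scores, has_disease) if d and s > cutoff)
    tn = sum(1 for s, d in zip(scores, has_disease) if not d and s > cutoff)
    fp = sum(1 for s, d in zip(scores, has_disease) if not d and s <= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, has_disease):
    """Return (cutoff, sensitivity, specificity) maximising sens + spec.

    Every observed score is tried as a cut-off; ties keep the lowest cut-off.
    """
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, has_disease, cutoff)
        if best is None or sens + spec > best[1] + best[2]:
            best = (cutoff, sens, spec)
    return best


# Invented data: index scores and DXA-defined osteoporosis (True/False).
scores = [-4, -3, -3, -2, -2, -1, 0, 1, 2, 3]
has_disease = [True, True, True, False, True, False, False, False, False, False]

cutoff, sens, spec = best_cutoff(scores, has_disease)
print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

A sensitivity-first strategy, such as the 90% target used in earlier validation studies, would instead pick the lowest cut-off whose sensitivity reaches the target, accepting whatever specificity results.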

Table 1. Variables used, method of calculation, and interpretation of each risk index. Columns: Factor; Method of calculation; Interpretation. First row: OST (Osteoporosis Self-assessment Tool). pBW row: the risk index states that a body weight >70 kg indicates a low risk, between 57 kg and 70 kg a moderate risk, and below 57 kg a high risk of having low BMD.

Statistical analysis
The study was conducted in several steps. The first step consisted of a description of the study population. In the second step, we assessed correlations between QUS, the risk indexes and DXA measurements using the Spearman test. In the third step, patients were divided into two risk categories, which were obtained using different cut-offs. We evaluated QUS and the risk indexes at the total hip BMD T-score threshold of −2.5 to assess their performance in predicting hip osteoporosis. Receiver operating characteristic (ROC) analyses were performed, and the area under the curve (AUC) was computed. To assess the internal validity of QUS and the risk indexes, sensitivity was defined as the proportion of the population with low BMD correctly classified (true positive fraction), and specificity was defined as the proportion with normal BMD correctly identified by QUS and the risk indexes (true negative fraction). ROC curves provided a graphical representation of the overall accuracy of a test by plotting sensitivity against (1 − specificity) at all thresholds, while the AUC quantified the accuracy of the test. We also calculated the positive predictive value (PPV), negative predictive value (NPV) and negative likelihood ratio (LR) to evaluate the external validity of QUS and the osteoporosis risk indexes. The PPV and NPV represent the proportion of women who tested positive or negative (as classified by QUS and the risk indexes) and who truly had, or did not have, BMD below the T-score threshold being tested, respectively. The negative LR indicates how much to decrease the probability of disease when the test is negative. Statistical analysis was performed with SPSS statistical software (SPSS, Chicago, IL).
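The accuracy measures defined in the statistical analysis can be written out from a 2×2 table as in the sketch below. The function name is ours. The example counts are not reported in the text: they are the whole numbers that reproduce the published OST figures (sensitivity 90.2%, specificity 44.5%, NPV 96.6%, negative LR 0.22) given 41 women with and 254 without total hip osteoporosis, and should be read as an assumption.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy measures of a binary test against a reference standard."""
    sensitivity = tp / (tp + fn)   # true positive fraction
    specificity = tn / (tn + fp)   # true negative fraction
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),     # positives who truly have the condition
        "npv": tn / (tn + fn),     # negatives who truly do not have it
        "negative_lr": (1 - sensitivity) / specificity,
    }


# Assumed counts, back-calculated from the reported OST percentages.
ost = diagnostic_metrics(tp=37, fn=4, tn=113, fp=141)
for name, value in ost.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity depend only on the test, whereas PPV and NPV also depend on how common osteoporosis is in the sample, which is why the text treats them as measures of external validity.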

RESULTS
The mean age of the women in our sample was 66.3 (±5.3) years, ranging from 60 to 84 years.

To evaluate whether OST and QUS identified different subgroups of patients, we determined the concordance of the two tests. Among the 41 subjects with T-score ≤ -2.5 at the total hip, 23 were correctly classified as increased risk by both methods and 2 were misclassified as low risk by both methods. Among the 254 subjects with total hip T-score above -2.5, 89 were correctly classified by both methods as low risk, and 54 were misclassified as increased risk by both methods; hence, the two methods gave the same classification in 168 of the 295 subjects, a concordance of 56.9%. Concordance as indicated by Bennett's kappa was 0.19, indicating fair agreement between OST and QUS for classifying risk. Adding the QUS T-score to OST improved sensitivity to 95.1% but reduced specificity to 35%, increased NPV to 97.8% and decreased the negative LR to 0.14.
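The concordance and combined-test figures can be checked from the counts given in this paragraph. The sketch below assumes that the combined test is positive when either OST or QUS indicates increased risk, so that only subjects classified as low risk by both tests count as negative; under that assumption the counts reproduce the reported values.

```python
# Counts stated in the text (reference: total hip T-score <= -2.5).
n_osteoporotic = 41
n_not_osteoporotic = 254
both_high_osteoporotic = 23      # flagged by both tests, truly osteoporotic
both_low_osteoporotic = 2        # missed by both tests
both_low_not_osteoporotic = 89   # cleared by both tests, truly not osteoporotic
both_high_not_osteoporotic = 54  # flagged by both tests, not osteoporotic

# Overall agreement: subjects given the same classification by OST and QUS.
n_total = n_osteoporotic + n_not_osteoporotic
n_agree = (both_high_osteoporotic + both_low_osteoporotic
           + both_low_not_osteoporotic + both_high_not_osteoporotic)
concordance = n_agree / n_total

# Combined test (assumption): positive if either test is positive.
fn = both_low_osteoporotic
tp = n_osteoporotic - fn
tn = both_low_not_osteoporotic
fp = n_not_osteoporotic - tn

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
npv = tn / (tn + fn)
negative_lr = (1 - sensitivity) / specificity

print(f"concordance {concordance:.1%}")
print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")
print(f"NPV {npv:.1%}, negative LR {negative_lr:.2f}")
```

The trade-off is visible in the counts: requiring both tests to be negative leaves only 2 of 41 osteoporotic women undetected, but also clears only 89 of the 254 women without osteoporosis.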

DISCUSSION
This study compared the performance of six osteoporosis risk indexes (OST, ORAI, OSIRIS, SOFSURF, SCORE and pBW) and QUS as potential screening tools to identify patients to refer for DXA. The study also investigated the cut-off levels for the various techniques. The aim was not necessarily to replace DXA, but to explore strategies by which the demand on DXA services could be reduced, for instance by screening large sections of the population to exclude individuals whose DXA examination would have been normal.
In the present study, OST showed the best correlation with DXA at the total hip (r=0.49). This is by no means disappointing: considering that the study population was biased towards the lower end of the BMD range and that the questionnaires were designed not to replace BMD measurement but to indicate a patient's bone status, it is very encouraging. Overall, moderate correlations (r=0.35-0.46) were seen between the various questionnaires. In our study, OST successfully identified most women with hip osteoporosis, with a sensitivity of 90.2%, a specificity of 44.5% and an NPV of 96.6% at the total hip site. OST, based only on age and weight, performed as well as the more complex risk assessment indices (SCORE, ORAI, and OSIRIS) in identifying women at low risk of osteoporosis who would not need DXA testing (96.6% of patients classified as low risk with OST do not have osteoporosis at the total hip).
The QUS results in this study compared closely with those of previous studies. Correlations between measurement sites using ultrasound are affected by the physical principle of the ultrasound application and the kind of bone matrix that is scanned (cortical or cancellous). The correlations for BUA in this study (r=0.47 and 0.65) were in close agreement with previous studies (r=0.20 and 0.64) (24). The moderate correlations for QUS can be further explained: the attenuation and velocity of an ultrasound pulse are affected by the structure of the material it passes through, and a strong, complex trabecular structure affects the ultrasound differently from fragile, broken trabeculae, a factor not taken into account by the measurement of density alone.
Previous studies of QUS prediction of osteoporosis at the hip reported an AUC of 0.72 for BUA (25,26). The results of the present study are in agreement with previous results for BUA and the QUS T-score, with AUCs of 0.70 and 0.67, respectively (5)(27)(28)(29). The AUC also provides information on the diagnostic accuracy of each technique. In this study, the AUC results indicated moderate diagnostic accuracy for the majority of the methods, with ORAI and pBW showing low diagnostic accuracy.
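For readers less familiar with the AUC, it can be read as the probability that a randomly chosen woman with hip osteoporosis has a more abnormal score than a randomly chosen woman without it, with ties counted as one half. The sketch below computes it by comparing all pairs; it assumes that lower scores indicate higher risk, the scores are invented, and statistical packages such as SPSS use equivalent but more efficient calculations.

```python
def auc(scores_diseased, scores_healthy):
    """Area under the ROC curve by pairwise comparison.

    Counts the pairs in which the diseased subject has the lower
    (more abnormal) score; tied pairs count as one half.
    """
    wins = 0.0
    for d in scores_diseased:
        for h in scores_healthy:
            if d < h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_healthy))


# Invented scores, for illustration only.
diseased = [-4, -3, -3, -2]
healthy = [-2, -1, 0, 1, 2, 3]
print(f"AUC = {auc(diseased, healthy):.3f}")
```

An AUC of 0.5 corresponds to a test that ranks women no better than chance, and 1.0 to a test that separates the two groups perfectly.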
In the present study, 93% (NPV) of women with a QUS T-score above -2 would not have hip osteoporosis on DXA. Several cut-offs have been proposed in the literature, ranging from -1 to -2.3, with sensitivity ranging from 69% to 61% and specificity from 51% to 83%, respectively (30)(31)(32)(33). Previous studies have provided evidence that screening with QUS is cost-effective relative to clinical criteria and DXA (34,35). However, other authors did not support this conclusion in higher-risk patients (36,37). According to our findings, with the use of OST as a first-step screening tool and applying strict cut-off values generated by a likelihood ratio analysis for the diagnosis of densitometric osteoporosis or non-osteoporosis, approximately 60% of the population of women 60 years or older can reasonably be excluded from DXA, while only 40% of the women should be referred for DXA to confirm the diagnosis. Using the QUS T-score as a first-step screening tool, approximately 35% of women 60 years or older would have been excluded from DXA. When combining both systems, an unnecessary DXA analysis would have been avoided in 69% of women. However, in this study, adding the QUS T-score to OST improved sensitivity to 95.1% but reduced specificity to 35%, increased NPV to 97.8% and decreased the negative LR to 0.14.
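The way a negative likelihood ratio lowers the probability of osteoporosis can be illustrated with the figures reported in this study. The sketch below takes the pre-test probability to be the sample prevalence of total hip osteoporosis (41 of 295 women) and applies the reported negative LRs of 0.22 (OST alone) and 0.14 (OST plus QUS T-score); the function name is ours.

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


prevalence = 41 / 295  # women with total hip T-score <= -2.5 in the sample

after_negative_ost = post_test_probability(prevalence, 0.22)
after_negative_combined = post_test_probability(prevalence, 0.14)

print(f"pre-test probability:        {prevalence:.1%}")
print(f"after a negative OST:        {after_negative_ost:.1%}")
print(f"after negative OST and QUS:  {after_negative_combined:.1%}")
```

Under these assumptions the probability of hip osteoporosis falls from about 14% to roughly 3.4% after a negative OST and to roughly 2.2% when both tests are negative, which matches the complements of the reported NPVs (96.6% and 97.8%).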
Although the correlation between DXA and QUS is statistically significant, there is poor concordance between the two measurement techniques. For clinical and ethical reasons, we stress that sensitivity rather than specificity is the critical value to concentrate on because the diagnosis of positive individuals from QUS measurements would eventually depend on DXA. When screening for low BMD, a high sensitivity, as we focused on in this study, could only be achieved by lowering specificity. This is comparable to previously reported results.
As with most studies, our study has limitations. For example, the subjects in our sample were either referred or came in spontaneously for osteoporosis evaluations, and may differ in some ways from the general population. Another limitation of this kind of study is that it does not take into account the risk of fracture, the prevention of which is the main purpose of treating osteoporosis. DXA itself has a low sensitivity, and about half of patients who fracture do not have densitometric osteoporosis. However, the main objective of our study and similar studies is to identify patients with low BMD in order to avoid unnecessary examinations, which is very important in developing countries, whereas developing a fracture risk assessment tool requires prospective longitudinal cohorts. Recent studies by WHO experts enhance the assessment of fracture risk in both men and women by integrating clinical risk factors alone and/or in combination with BMD (38,39).

CONCLUSION
In summary, we demonstrated that OST performed slightly better than QUS and many other risk questionnaires in predicting low BMD at the total hip. Although the combination of OST and QUS was somewhat better than OST alone in identifying women with total hip osteoporosis, the difference was small and probably not clinically relevant. Moreover, the ability to identify a few extra subjects has to be weighed against the extra cost of the QUS examination, which may defeat the purpose of an inexpensive screening strategy, especially when the OST clinical risk assessment tool is free and simple. The purpose of these tools is not to diagnose osteoporosis, but to identify subjects who should be tested using DXA, thereby minimizing the total number of BMD measurements needed to do so.

All the authors actively participated in drafting and revising the manuscript and approved this final revised version.

PATIENT CONSENT
Written informed consent was obtained from patients for publication of this study.