Missing data are a common feature in many areas of research,
especially those involving survey data in the biological, health and social
sciences. Most analyses of survey data take a complete-case approach, that is,
list-wise deletion of all cases with missing values, on the assumption that the
values are missing completely at random (MCAR). Methods that substitute a
single value for each missing value (single imputation), such as the last
value carried forward, the mean or a regression prediction, are also used.
These methods often result in biased estimates, loss of statistical
information and loss of distributional relationships between variables. In
addition, the strong MCAR assumption is not tenable in most practical instances.
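One way to see how single imputation distorts a distribution: filling every gap with the observed mean leaves the mean unchanged but shrinks the variance. The sketch below is only an illustration in Python; the function name `mean_impute` and the data values are hypothetical and are not taken from the study.

```python
from statistics import mean, variance


def mean_impute(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical measurements; None marks a missing value.
data = [2.0, 4.0, None, 6.0, None, 8.0]

observed = [v for v in data if v is not None]
imputed = mean_impute(data)

# The mean is preserved, but the sample variance drops because the
# imputed points sit exactly at the centre of the distribution.
print("mean:    ", mean(observed), "->", mean(imputed))
print("variance:", variance(observed), "->", variance(imputed))
```

The understated variance carries through to standard errors that are too small, which is the uncertainty that single imputation fails to account for.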
Since missing data are a major problem in HIV research, the
current research seeks to illustrate and highlight the strength of the multiple
imputation procedure as a method of handling missing data. That strength comes
from its ability to draw multiple values for the missing observations from
plausible predictive distributions for them. This is particularly important in HIV
research in sub-Saharan Africa, where accurate collection of (complete) data is
still a challenge. Furthermore, multiple imputation accounts for the
uncertainty introduced by the very process of imputing values for the missing
observations. In particular, national and subgroup estimates of HIV prevalence
in Zimbabwe were computed using multiply imputed data sets from the 2010–11
Zimbabwe Demographic and Health Survey (2010–11 ZDHS). A survey logistic
regression model relating HIV prevalence to demographic and socio-economic
variables was used as the substantive analysis model. The results of both the
complete-case analysis and the multiple imputation analysis are presented and
discussed.
Across different subgroups of the population, the crude
estimates of HIV prevalence are generally not identical, but their variations are
consistent between the two approaches (complete-case analysis and multiple
imputation analysis). The estimated standard errors under multiple
imputation are predominantly smaller than under the complete-case analysis,
leading to narrower confidence intervals. Under the logistic regression model,
adjusted odds ratios vary greatly between the two approaches. The model-based
confidence intervals for the adjusted odds ratios are wider under multiple
imputation, which reflects the inclusion of a combined measure of the
within- and between-imputation variability.
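A combined measure of within- and between-imputation variability of this kind is commonly obtained with Rubin's rules: the total variance is T = W + (1 + 1/m) B, where W is the average within-imputation variance, B is the variance of the estimates across the m imputed data sets, and the pooled estimate is their mean. The Python sketch below assumes this standard pooling; whether the authors pooled in exactly this way is an assumption, and the function name `pool_rubin` and the numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors.

    Returns (pooled estimate, within variance W, between variance B,
    total variance T) following Rubin's rules.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, w, b, t


# Hypothetical log odds ratios and squared standard errors from m = 5 imputations.
log_or = [0.42, 0.47, 0.39, 0.45, 0.44]
se_sq = [0.010, 0.011, 0.009, 0.010, 0.012]

q_bar, w, b, t = pool_rubin(log_or, se_sq)
print("pooled estimate:", q_bar)
print("pooled standard error:", sqrt(t))
```

Because T adds the between-imputation term to W, the pooled standard error is never smaller than the average within-imputation one, which is consistent with the wider intervals reported for the adjusted odds ratios.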
There is considerable variation between the estimates obtained
from the two approaches. The use of multiple imputation allows the
uncertainty brought about by the imputation process to be measured. This
consequently yields more reliable estimates of the parameters of interest and
reduces the chances of falsely declaring effects significant (type I
error). In addition, the use of the powerful and flexible statistical
computing packages in R facilitates the computations.
Full article at: http://goo.gl/U1QbRp
By: Amos Chinomona1,2* and Henry Mwambi2
1 Department of Statistics, Rhodes University, Grahamstown, South Africa
2 School of Mathematics, Statistics and Computer Science, University of
KwaZulu-Natal, Pietermaritzburg, South Africa
More at: https://twitter.com/hiv_insight