Weekly Paper Review: What is the impact of missing data on bias and precision when estimating change in patient-reported outcomes?

Arimoro Olayinka
Jul 3, 2020

This week I read the paper “Impact of missing data on bias and precision when estimating change in patient-reported outcomes from a clinical registry” by Ayilara et al. (2019).

The main objective of the study was to compare the impact of several missing data methods on the precision of the estimated change in patient-reported outcome (PRO) measures in longitudinal data from a clinical registry.

Clinical registries are databases that capture information about the health and healthcare use of patients having a specific health condition or healthcare treatment.

Note that studies involving clinical registry data are often longitudinal in nature.

Longitudinal data, sometimes referred to as panel data, track the same sample at different points in time. A longitudinal study is therefore a research design that involves repeated observations of the same variables (e.g., people) over short or long periods of time (Wikipedia).

For example, in this study, the authors sought to examine the change in patient-reported outcomes (PROs) before and after an intervention or healthcare treatment.

Preamble

Patient-reported outcomes (PROs) provide insight into the patient’s perspective on their own health. However, missing data can affect PRO scores, which are used for healthcare decision making.

Therefore, this study aimed to compare the precision and bias of different missing data methods, namely complete case analysis (CCA), maximum likelihood (ML), which uses the idea of expectation maximization (EM), and multiple imputation (MI) with and without an auxiliary variable, when estimating longitudinal change in PRO scores.

As you might suspect, missing data is a serious issue in surveys as a whole. Recently, I completed a project with some colleagues on optimizing healthcare performance in one of Lagos’s top hospitals. One major challenge we had to tackle was missing data. It took serious and careful consideration, such as examining the underlying assumptions about the missing data mechanisms.

When I read this paper, the quality of its analysis “blew” my mind. It is indeed a beautiful piece of work, and it reads like two studies in one paper.

Are you ready? Let’s see some interesting things in this paper.

Longitudinal studies and Missing data

It is important to discuss issues of longitudinal studies and missing data.

Do you know that the findings of longitudinal studies may be strongly influenced by missing data? I guess you know. This can happen when study participants die, miss scheduled visits, or fail to respond to hospital questionnaires or interviews (this last one is about the most common source of missing data).

In fact, according to Bell & Fairclough (2014), missing data can lead to under- or over-estimation of treatment effects, depending on the characteristics of the missingness. As I hinted earlier, the choice of method for handling missing data depends on the missingness mechanism. These mechanisms could be:

  • Missing Completely at Random (MCAR)
  • Missing at Random (MAR)
  • Missing Not at Random (MNAR)

MCAR: Data are MCAR if the reason for the missingness is unrelated to the outcomes. Put another way, if the probability of being missing is the same for all cases, then the data are said to be missing completely at random. An example of MCAR is a weighing scale that ran out of batteries: some of the data will be missing simply because of bad luck.

MAR: MAR arises if the reason for dropout depends on the observed outcomes and possibly on observed covariates at any or all occasions before the individual is lost to follow up. That is, if the probability of being missing is the same only within groups defined by the observed data, then the data are missing at random (MAR). MAR is a much broader class than MCAR. For example, when placed on a soft surface, a weighing scale may produce more missing values than when placed on a hard surface.

MNAR: The MNAR mechanism depends, in whole or in part, on unobserved measurements. If the probability of being missing varies for reasons that are unknown to us, then we say the data are MNAR. For example, the weighing scale mechanism may wear out over time, producing more missing data as time progresses, but we may fail to note this.

Source: https://stefvanbuuren.name/fimd/sec-MCAR.html
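To make the three mechanisms concrete, here is a minimal Python sketch. It is my own illustration, not code from the paper or from the source above. It simulates a hypothetical baseline score and follow-up score, deletes follow-up values under each mechanism, and compares the mean of what remains with the true mean. The function name, the score distributions, and the missingness probabilities are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def simulate(mechanism, n=20000, seed=1):
    """Return (true mean, complete-case mean) of a follow-up score."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        baseline = rng.gauss(50, 10)                    # always observed
        follow_up = 0.6 * baseline + rng.gauss(25, 8)   # may go missing
        if mechanism == "MCAR":
            p_missing = 0.3                             # same for every case
        elif mechanism == "MAR":
            # depends only on the observed baseline score
            p_missing = 1 / (1 + math.exp((baseline - 45) / 5))
        elif mechanism == "MNAR":
            # depends on the follow-up value itself, which is what goes missing
            p_missing = 1 / (1 + math.exp((follow_up - 50) / 5))
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        full.append(follow_up)
        if rng.random() >= p_missing:
            observed.append(follow_up)
    return statistics.mean(full), statistics.mean(observed)

for mech in ("MCAR", "MAR", "MNAR"):
    true_mean, cc_mean = simulate(mech)
    print(f"{mech}: true mean {true_mean:.2f}, complete-case mean {cc_mean:.2f}")
```

In this setup the complete-case mean should land close to the true mean under MCAR, while under MAR and MNAR lower scores are more likely to be lost, so the complete-case mean drifts upward. The difference is that under MAR the drift can in principle be corrected using the observed baseline, whereas under MNAR the observed data alone are not enough.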

You can check out the textbook titled: Multivariate Data Analysis (7th Ed.) by Joseph F. Hair Jr., William C. Black, Barry J. Babin, Rolph E. Anderson for better explanation of these missing data mechanisms.

Also, there are a lot of commonly used missing data methods in longitudinal studies, such as list-wise deletion, complete-case analysis (CCA), average available observation carried forward, last observation carried forward, conditional or unconditional mean imputation, ML, and MI.

However, ML and MI are practical to implement on real-world data. They are recommended when the missing data mechanism is ignorable, which in practice usually rests on the assumption that the data are MAR.

Many times, missing data are expected and part of the research design. In these instances, the missing data are termed ignorable missing data, meaning that specific remedies for the missing data are not needed because the allowances for missing data are inherent in the technique used. The justification for designating missing data as ignorable is that the missing data mechanism is operating at random (i.e., the observed values are a random sample of the total set of values, observed and missing) or explicitly accommodated in the technique used. (Source: Multivariate Data Analysis book by Joseph F. Hair Jr, et al.)

Some machine learning algorithms, such as k-nearest neighbours (K-NN), decision trees, and random forests, can also be used to build predictive models that replace missing values with estimated ones. However, these algorithms may distort the distribution of the data if not carefully implemented.

One useful approach when assumptions about the ignorability of missing data are valid is the “use of auxiliary or supplementary variables”, that is, variables that are potential correlates of missingness and/or the outcome of interest.

The use of auxiliary variables related to the outcome of interest may reduce the bias due to missing data in model estimates, by adding information associated with missingness to the model. Auxiliary variables are typically found in external data sources. An example of a data source that may contain useful auxiliary variables is administrative health data, which captures information about healthcare use and health status of patients.

Methods

This study used both real-world cohort data and computer-simulated data. It is true that previous studies have compared MI and other missing data methods on real-world and simulated data. However, no previous study has compared the precision or the bias of MI with and without auxiliary variables in PROs from a clinical registry.

Isn’t that a lovely way to fill a gap in knowledge? Let’s see the approach used in the paper. Yeah?

Data Source

The study conducted analyses of clinical registry data and simulated data. The registry data were from a population-based regional joint replacement registry for Manitoba, Canada; the study cohort consisted of 5,631 patients having total knee arthroplasty (TKA) between 2009 and 2015.

PROs were measured using the 12-item Short Form Survey version 2 (SF-12v2) at pre- and post-operative occasions. The simulation cohort was a subset of 3,000 patients from the study cohort with complete PRO information at both pre- and post-operative occasions.

Patients with inaccurate data on sex and BMI were excluded.

Study Measures

Patients completed self-report questionnaires in the pre-operative assessment clinic and completed mailed self-report questionnaires one year following surgery. The Short Form Survey version 2 (SF-12v2), a 12-item generic measure of physical and mental well-being, was used. The survey produces the Physical Component Summary (PCS) and Mental Component Summary (MCS) scores, which can range in value from 0 (worst) to 100 (best).

Demographic information collected included patient age, BMI, and sex, defined at the time of the pre-operative assessment. Information about comorbid health conditions, such as heart disease, was also obtained via self-report at the pre-operative assessment.

Missing data methods

ML and MI were selected for the study because they are computationally efficient, recommended in the literature, and widely adopted in practice. Both methods also rely on the assumption that the missingness mechanism is MAR.

You can decide to read more about the approach these methods (ML and MI) take in handling missing data.

Statistical Analysis

Descriptive statistics including means, standard deviations (SD), frequencies, and percentages were used to describe the cohorts at the pre-operative measurement occasion. Patterns of missing data were described for the study cohort using percentages.

A linear mixed-effects model was used to estimate change in SF-12v2 PCS and MCS scores between pre- and post-operative occasions; the choice of models and covariates was based on previous research with these data by Zhang et al. (2018).

Linear mixed-effects regression models are extensions of linear regression models for data that are collected and summarized in groups. These models describe the relationship between a response variable and independent variables, with coefficients that can vary with respect to one or more grouping variables. A mixed-effects model consists of two parts, fixed effects and random effects.

Specifically, the model included a random intercept and multiple fixed covariates: time, age, sex (male [reference], female), body mass index (BMI < 24.9, 25.0–29.9, 30.0+ [reference]), and comorbid chronic conditions (heart disease, depression, high blood pressure, diabetes, and back pain; each coded No [reference], Yes). In addition, a two-way interaction of sex and time was included.
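Written out, a random-intercept model of this kind takes roughly the following form. The notation is mine rather than the paper’s: i indexes patients, j indexes the occasion, time is assumed to be coded 0 (pre-operative) and 1 (post-operative), x_i collects the remaining covariates (age, BMI category, comorbid conditions), and b_i is the patient-specific random intercept. The symbols β0, βT and βTS match the ones used further down in this review.

```latex
y_{ij} = \beta_0 + \beta_T\,\mathrm{time}_{ij} + \beta_S\,\mathrm{sex}_i
       + \beta_{TS}\,(\mathrm{time}_{ij} \times \mathrm{sex}_i)
       + \mathbf{x}_i^{\top}\boldsymbol{\gamma} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Under this coding, βT is the average change for the reference sex and βTS is how much that change differs for the other sex.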

Mixed-effects regression models based on CCA, ML, and MI methods were applied to the study cohort data; separate analyses were conducted for the PCS and MCS.

CCA was conducted for the subset of patients who had no missing observations on any variables at either the pre- or post-operative occasions.

For the MI method, Markov Chain Monte Carlo (MCMC) sampling of the full predictive distribution was adopted. Ten imputations were conducted, as this number has been shown to be sufficient for achieving a reasonable efficiency for high proportions of missing observations according to Raghunathan (2015).
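The paper’s imputation code is not reproduced in this review, but the step that turns the results from several imputed datasets into a single estimate is conventionally done with Rubin’s rules. Below is a minimal Python sketch of that pooling step, together with Rubin’s relative-efficiency formula, which is the usual argument for why a modest number of imputations is enough. The function names and all the numbers are hypothetical and for illustration only.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)        # average sampling variance
    between = statistics.variance(estimates)   # variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

def relative_efficiency(fraction_missing_info, m):
    """Efficiency of m imputations relative to an infinite number of imputations."""
    return 1 / (1 + fraction_missing_info / m)

# Hypothetical estimates of change from m = 10 imputed datasets
estimates = [6.1, 5.8, 6.4, 6.0, 5.9, 6.3, 6.2, 5.7, 6.0, 6.1]
variances = [0.25] * 10   # each imputed-data standard error assumed to be 0.5

estimate, se = pool_rubin(estimates, variances)
print(f"pooled estimate {estimate:.2f}, pooled SE {se:.3f}")
print(f"efficiency, 50% missing information, m = 10: {relative_efficiency(0.5, 10):.3f}")
```

With ten imputations and half of the information missing, the relative efficiency works out to about 0.95, so going well beyond ten imputations buys little extra precision in the point estimate.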

All analyses were carried out in R using the lme function and multiple imputation by chained equations.

Simulation Study

The simulation study used all variables used for the study cohort in addition to a single hypothetical auxiliary variable, Z, which was generated from a bivariate normal distribution. Random samples of size n = 1000 were selected from the simulation cohort; mixed-effects models, as specified previously, were applied to PCS scores.
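One common way to obtain a variable with a chosen correlation to the outcome is to mix the standardized outcome with independent normal noise, which is equivalent to drawing the pair from a bivariate normal distribution. The sketch below shows the idea in Python; it is not necessarily how the authors generated Z, and the stand-in scores, function names, and seed are assumptions.

```python
import math
import random
import statistics

def make_auxiliary(outcome, rho, rng):
    """Return Z whose correlation with the outcome is approximately rho."""
    mu = statistics.mean(outcome)
    sd = statistics.stdev(outcome)
    noise_scale = math.sqrt(1 - rho ** 2)
    return [rho * (y - mu) / sd + noise_scale * rng.gauss(0, 1) for y in outcome]

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)
pcs = [rng.gauss(40, 9) for _ in range(5000)]   # stand-in PCS scores, not registry data
z = make_auxiliary(pcs, rho=0.8, rng=rng)
print(f"sample correlation: {pearson(pcs, z):.2f}")
```

Changing rho lets the same outcome be paired with a weakly or strongly related auxiliary variable, which is what a simulation needs in order to vary the strength of the correlation.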

Pre-specified amounts (10%, 25% and 50%) of data were removed from the outcome variable via MCAR, MAR, and MNAR mechanisms by modeling the probability of the missing indicator conditional on the outcome variable using a logistic regression model. The ML, MI and MI-Aux (i.e., multiple imputation with Z included in the imputation model) methods were used to address missingness.

A total of 1000 replications were conducted for each of the 27 simulation conditions, which were obtained by crossing all possible combinations of types and amounts of missingness with the magnitude of correlation of the hypothetical auxiliary variable with the outcome measure.
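The count of 27 follows from a full 3 × 3 × 3 crossing, which a few lines of Python can enumerate. Only the correlation of 0.8 is named in this review, so the other two correlation levels are placeholders here rather than values taken from the paper.

```python
from itertools import product

mechanisms = ["MCAR", "MAR", "MNAR"]
missing_rates = [0.10, 0.25, 0.50]
# Only 0.8 is mentioned in this review; the other two levels are placeholders.
correlations = ["low", "moderate", 0.8]

conditions = list(product(mechanisms, missing_rates, correlations))
print(len(conditions))  # 27
```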

The authors evaluated bias and error in the regression parameter estimates including the intercept (β0), which is the estimated average PRO score at the pre-operative occasion, change (βT) between the pre- and postoperative occasions, and time-sex interaction (βTS). Specifically, the authors computed standardized bias, root mean squared error (RMSE), 95% confidence interval (CI) coverage, and the average width of the 95% CI for each regression parameter mentioned above.

Standardized bias was the bias divided by the SD of the estimates, expressed as a percentage. Here, bias is the difference between the estimate obtained from the model applied to the random sample of n = 1000 observations and the estimate obtained from all data in the simulation cohort; smaller values indicate less bias.

The RMSE was calculated as the square root of the sum of the squared bias and the variance; smaller values indicate less error. Coverage was calculated as the proportion of the replications for which the 95% CI contained the true value of the parameter of interest; good performance is evident when the actual coverage is approximately equal to the nominal coverage rate of 95%.

The average width of the 95% CI was the difference between the upper and lower limits of the interval averaged over the number of replications. Shorter intervals imply greater precision and higher power, provided the 95% CI coverage is high.
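The four performance measures described above can be sketched in a few lines of Python. This is my reading of the definitions, not the authors’ code: the function name is hypothetical, the intervals are assumed to be normal-approximation intervals (estimate ± 1.96 × SE), and the replicated estimates are simulated stand-ins.

```python
import math
import random
import statistics

def performance(estimates, std_errors, true_value, z=1.96):
    """Summarize replicated estimates of a single regression parameter."""
    bias = statistics.mean(estimates) - true_value
    lowers = [e - z * s for e, s in zip(estimates, std_errors)]
    uppers = [e + z * s for e, s in zip(estimates, std_errors)]
    covered = [lo <= true_value <= hi for lo, hi in zip(lowers, uppers)]
    widths = [hi - lo for lo, hi in zip(lowers, uppers)]
    return {
        "standardized_bias_pct": 100 * bias / statistics.stdev(estimates),
        "rmse": math.sqrt(bias ** 2 + statistics.pvariance(estimates)),
        "coverage": sum(covered) / len(covered),
        "avg_ci_width": statistics.mean(widths),
    }

# Stand-in for 1000 replications of an unbiased estimator with SE 0.5
rng = random.Random(3)
true_change = 6.0
estimates = [rng.gauss(true_change, 0.5) for _ in range(1000)]
std_errors = [0.5] * 1000

results = performance(estimates, std_errors, true_change)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

For an unbiased estimator with correctly sized standard errors, the coverage should sit near 0.95 and the RMSE near the standard error; departures from that pattern are what the simulation tables are designed to reveal.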

Results

Basically, the results from the study were split into three areas:

  1. Description of cohorts and missing data
  2. Results for the study cohort
  3. Simulation study results

Let’s see each of these areas and some interesting insights from the paper.

Description of cohorts and missing data

Table 1 below describes characteristics of the study and simulation cohorts. It is clear that the approximate average age of patients was 67 years in both cohorts. More than half of the patients were obese. The most common chronic conditions were high blood pressure and back pain.

Source: Result section of the paper

Overall, 57.4% of the study cohort had complete data at both pre- and post-operative occasions. Almost one-third of this cohort had missing data at the post-operative occasion only.

Results for the study cohort

The mixed-effects regression model results for the study cohort on both the SF-12v2 PCS and MCS measures for the intercept, time, and time-sex effects are presented in Table 3 below.

Source: Result section of the paper

Table 3 above provides parameter estimates, standard errors, and 95% CI widths for the CCA, ML, and MI methods. Overall, the three methods did not differ on the statistical significance of the parameter estimates for the intercept, time, and time-sex effects. On the whole, the CCA method yielded 95% CIs that were substantially wider than those for the ML and MI methods. ML and MI produced similar estimates and 95% CI widths.

Simulation study results

The performance measures for the computer simulation, including the standardized bias, RMSE, and average width of the 95% CI for the CCA, ML, and MI methods are reported in Table 4 below.

Source: Result section of the paper

From Table 4 above, the RMSE and the average width of the 95% CI increased as the rate of missingness increased, reflecting the expected loss of information that occurs with increased rates of missing data. In addition, the standardized bias for the CCA, ML, and MI methods when data were MNAR was twice the size of the bias observed when the data were missing under MCAR and MAR mechanisms.

As the rate of missingness increased, the standardized bias also increased. However, when the missing data were MCAR, the standardized bias for the MI and ML methods was larger than for the CCA method, while the RMSE of the CCA method was substantially larger than for the MI and ML methods.

Simulation results for the MI and MI-Aux methods are reported in Table 5 below. Including the hypothetical auxiliary variable in the imputation model reduced the average width of the 95% CI as its correlation with the outcome variable increased.

In addition, the inclusion of the hypothetical auxiliary variable in the imputation model reduced the bias and RMSE, particularly in cases where the rate of missing data was high and the correlation coefficient was 0.8.

Discussion

The authors investigated the effect of missing data methods on the precision of estimates of change between pre- and post-operative PROs. Standard errors were consistently larger for the CCA method than for the ML and MI methods, while the ML and MI methods produced consistent parameter estimates and standard errors.

In the simulation study, the authors investigated the potential benefit of using a supplementary variable on the bias and precision of the MI method. The impact of the auxiliary variable on bias and precision was substantial when the amount of missing data was large and when the correlation between the hypothetical auxiliary variable and the outcome of interest was high.

It was noticed that when missingness on the outcome of interest was ignorable, inclusion of an auxiliary variable that was strongly associated with the outcome variable added extra information to the imputation model. This agrees with the recommendation of the International Society of Arthroplasty Registries on how to deal with missing data in arthroplasty registries.

The inclusion of the auxiliary variable produced a significant reduction in standard errors and consequently increased the precision of the analysis. The results were consistent with previous research by Collins et al. (2001) and Wang and Hall (2010).

Also, including an auxiliary variable in the imputation model helped moderate the amount of bias and size of the RMSE when missingness was non-ignorable.

An exhaustive analysis, I must say!

Limitations & Conclusion

The authors identified some limitations. First, in the simulation study they considered only the hypothetical situation of a single auxiliary variable in the imputation model, because of the substantial computation time the simulation required.

Another limitation noted by the authors was that they only considered the case where the relationship between the auxiliary variable and outcome of interest was linear. They noted that there could be cases where the relationship is non-linear.

In all, the results of the study showed that using auxiliary information in the imputation model can increase the precision and reduce the bias of parameter estimates, especially in cases where the percentage of missing data is high.

In the absence of an auxiliary variable, the simulation results revealed that the ML method is more precise in estimating longitudinal change in PRO measures than the MI method, especially when there is complete data on the covariates. However, MI offers an advantage of straightforward inclusion of one or more auxiliary variables in the imputation model over the ML method.

I believe this is a beautiful conclusion to hold onto from this review. It was long but full of information for our consumption.

Thank you for taking time to read through this. See you next time!

Link to Paper

https://hqlo.biomedcentral.com/articles/10.1186/s12955-019-1181-2

References

  • Wikipedia: https://en.wikipedia.org/wiki/Longitudinal_study
  • Textbook: Multivariate Data Analysis by Joseph F. Hair Jr, et al
  • Bell MB, Fairclough DL. Practical and statistical issues in missing data for longitudinal patient-reported outcomes. Stat Methods Med Res. 2014;23(5):440–9
  • Zhang L, Lix L, Ayilara O, Sawatzky R, Bohm E. The effect of multimorbidity on changes in health-related quality of life following hip and knee arthroplasty. Bone Jt J. 2018;100–B(9):1168–74
  • Raghunathan T. Missing data analysis in practice. Michigan: CRC Press; 2015
  • Collins LM, Schafer JL, Kam C-M. A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychol Methods. 2001;6(4):330–51
  • Wang C, Hall CB. Correction of bias from non-random missing longitudinal data using auxiliary information. Stat Med. 2010;29(6):671–9
