Weekly Paper Review: Twitter data as a means of measuring patient perceived-quality of healthcare in US hospitals

Arimoro Olayinka
8 min read · Jul 24, 2020

This week I read the paper titled: “Measuring patient-perceived quality of care in US hospitals using Twitter” by Hawkins JB, Brownstein JS, Tuli G, et al. (2015).

Preamble

Common instruments for assessing patients' perception of the quality of their own healthcare, such as the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey, have drawbacks like the time lag before official data are released and low response rates. In recent years, social media data have therefore attracted significant attention as a source for health research.

The main objective of the paper was to use Twitter as a novel real-time supplementary data stream to identify and measure patient-perceived quality of care in US hospitals. That is, the authors wanted to see whether Twitter data are associated with quality of care as measured by established metrics.

Methods

Hospital Twitter data

The authors compiled a list of 4,679 hospitals with Twitter accounts. Over the course of one year (October 1st, 2012 to September 30th, 2013), a total of 404,065 tweets mentioned (were directed at) these hospitals.

As a criterion, the authors analyzed only tweets that were completely public (no privacy settings) and original (not retweets), in order to capture patients' feedback about their hospital experience. In addition, as a safeguard, no personal identifiers were used in the analysis.
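To make that filtering step concrete, here is a minimal, hypothetical sketch of such a filter. The field names (`retweeted_status`, `protected`) follow the Twitter API's tweet objects, and the sample data is invented:

```python
# Hypothetical sketch of the filtering criteria described above: keep only
# fully public, original tweets (no retweets, no protected accounts).
def is_analyzable(tweet):
    if tweet.get("retweeted_status") is not None:      # drop retweets
        return False
    if tweet.get("user", {}).get("protected", False):  # drop protected accounts
        return False
    return True

tweets = [
    {"text": "Great care at @SomeHospital today!",
     "retweeted_status": None, "user": {"protected": False}},
    {"text": "RT @friend: long wait at @SomeHospital",
     "retweeted_status": {"id": 1}, "user": {"protected": False}},
]

original_public = [t for t in tweets if is_analyzable(t)]
print(len(original_public))  # -> 1
```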

Machine Learning Classifier

The authors manually curated a random subset of hospital tweets to identify those that pertained to patients' healthcare experiences. Examples of patient experience topics included: interactions with staff, treatment effectiveness, hospital environment (food, cleanliness, parking, etc.), mistakes or errors in treatment or medication administration, and timing of or access to treatment.

In addition, the curation of tweets was achieved using two methods:

  1. A custom web app that allowed multiple curators to label tweets as related (or not) to patient experiences.
  2. Amazon Mechanical Turk (AMT) for crowd-sourced labeling.

It is important to note that only tweets whose labels the curators agreed on were used. Also, to test the reliability of the curators under both methods, the authors calculated inter-rater agreement and Cohen’s kappa values between raters.

After multiple rounds of curation, curator pairs rated 24,408 tweets using the web-app (overall agreement of 90.64%) and 15,000 tweets using AMT (overall agreement of 80.64%). These two sets were combined to create a training set of 2,216 tweets relating to patient experiences and 22,757 tweets covering other aspects of the hospital.

The resulting training set was used to build a classifier that could automatically label the full database of tweets. The machine learning approach looked at features of each tweet, such as the number of friends/followers/tweets of the user and the user's location, and used this information to develop the classifier.

For the text of a tweet, the authors used a bag-of-words approach and included unigrams, bigrams and trigrams in the analysis. They also compared multiple classifiers (naive Bayes and support vector machines), using 10-fold cross-validation to evaluate them, and chose the best classifier based on metrics such as F1 score, precision, recall and accuracy.
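As an illustration (not the authors' actual code), a pipeline like this can be sketched in a few lines with scikit-learn. The toy tweets and labels below are invented:

```python
# Sketch of the approach described above: bag-of-words with unigrams to
# trigrams, two candidate classifiers, cross-validation on several metrics.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_validate

# Toy labeled tweets: 1 = patient experience, 0 = other hospital content
texts = ["the nurses were wonderful and the food was great",
         "proud to sponsor the local charity run this weekend",
         "waited five hours in the ER before anyone saw me",
         "join us for our annual fundraising gala"]
labels = [1, 0, 1, 0]

for clf in (MultinomialNB(), LinearSVC()):
    pipe = make_pipeline(CountVectorizer(ngram_range=(1, 3)), clf)
    scores = cross_validate(pipe, texts, labels, cv=2,  # the paper used 10-fold
                            scoring=["accuracy", "precision", "recall", "f1"])
    print(type(clf).__name__, {k: v.mean() for k, v in scores.items()})
```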

Beautiful approach I must say!

Sentiment Calculation

Next, the authors used Natural Language Processing (NLP) to measure the sentiment of all patient experience tweets. TextBlob, an open-source Python library, was used to generate the sentiment scores, which range from -1 to 1; tweets scoring exactly 0.0 were discarded, since this indicates there was not enough context to relate the tweet to patient experience topics.

Finally, to obtain an accurate assessment of sentiment, the authors calculated a mean sentiment score only for hospitals with 50 or more patient experience tweets.
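Here is a minimal sketch of this sentiment step, assuming TextBlob's default polarity scorer; the `tweets_by_hospital` pairs are hypothetical:

```python
# Sketch of the sentiment pipeline described above, using TextBlob's
# polarity score in [-1, 1]; 0.0 scores are discarded as lacking context.
from collections import defaultdict
from textblob import TextBlob

MIN_TWEETS = 50  # the paper's threshold; lower it to run this toy example

tweets_by_hospital = [  # hypothetical (hospital, tweet text) pairs
    ("Hospital A", "the nurses were kind and attentive"),
    ("Hospital A", "parking here is a nightmare"),
]

scores = defaultdict(list)
for hospital, text in tweets_by_hospital:
    polarity = TextBlob(text).sentiment.polarity
    if polarity != 0.0:            # neutral 0.0 = not enough context
        scores[hospital].append(polarity)

mean_sentiment = {h: sum(s) / len(s)
                  for h, s in scores.items() if len(s) >= MIN_TWEETS}
```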

Topic Classification & Hospitals Surveys

Recall that I mentioned that random subsets of tweets were curated using two methods, and that the authors analyzed only tweets whose labels the curators agreed on.

Therefore, owing to the vast number of topics, the authors calculated average agreement and Cohen’s kappa values for each topic.

Cohen’s kappa statistic, κ, is a measure of agreement between categorical variables X and Y. For example, kappa can be used to compare the ability of different raters to classify subjects into one of several groups.

Source: https://online.stat.psu.edu/stat509/node/162/
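For a concrete sense of the statistic: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement between raters and p_e is the agreement expected by chance. scikit-learn computes it directly; the two rater label lists below are invented:

```python
# Cohen's kappa for two raters labelling the same five tweets.
from sklearn.metrics import cohen_kappa_score

rater_1 = ["experience", "other", "other", "experience", "other"]
rater_2 = ["experience", "other", "experience", "experience", "other"]

# 1.0 = perfect agreement, 0.0 = chance-level agreement
print(cohen_kappa_score(rater_1, rater_2))
```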

The authors found that the topics Food, Money, Pain, General, Room condition, and Time had an average agreement of 91.7% and a moderate κ of 0.52 (p<0.001), while the topics Communication, Discharge, Medication instructions, and Side effects had an average agreement of 97.4% and a low κ of 0.18 (p<0.001).

Separately, the authors emailed the 297 hospitals that had 50 or more patient experience tweets (111 unique Twitter accounts, as some hospitals shared accounts with larger healthcare provider networks) and asked them for feedback regarding their use of Twitter for patient relations. Of these 297 hospitals, about 49.5% (roughly half) responded.

Comparison with validated measures of quality of care

I am sure you still recall the aim of the study: to use Twitter data to identify and measure patient-perceived quality of care. To achieve this, one approach the authors took was to validate the results from Twitter against established measures of quality in healthcare.

Therefore, they chose two validated measures of quality of care. The first was HCAHPS, the formal US nationwide patient experience survey.

HCAHPS provides a standardised survey instrument and data collection methodology for measuring patients’ perspectives on hospital care, which enables valid comparisons to be made across all hospitals.

The authors analysed data from the HCAHPS period October 1st, 2012 to September 30th, 2013, and focused on the percentage of patients who rated a hospital a 9 or 10 (out of 10), since this has been shown to correlate with direct measures of quality.

The second validated measure of quality of care was the Hospital Compare 30-day hospital readmission rate calculated from the period July 1st, 2012 to June 30th, 2013.

This is a standardised metric covering 30-day overall rate of unplanned readmission after discharge from the hospital and includes patients admitted for internal medicine, surgery/gynaecology, cardiorespiratory, cardiovascular and neurology services. (Medicare.gov, 2015)

According to the Centers for Medicare and Medicaid Services, the score represents the ratio of predicted readmissions (within 30 days) to the number of expected readmissions, multiplied by the national observed rate.
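As a worked example of that formula (the numbers below are illustrative, not from the paper):

```python
# rate = (predicted readmissions / expected readmissions) * national rate
predicted_readmissions = 52.0   # hypothetical risk-adjusted prediction
expected_readmissions = 48.0    # hypothetical expectation for an average hospital
national_observed_rate = 0.155  # hypothetical national 30-day rate (15.5%)

hospital_rate = (predicted_readmissions / expected_readmissions) * national_observed_rate
print(f"{hospital_rate:.1%}")   # -> 16.8%, slightly worse than the national rate
```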

Statistical Analysis

The authors used Pearson’s correlation to assess the linear relationship between numeric variables, Fisher’s exact test to compare proportions between categorical variables, and a two-tailed independent t-test to compare the means of quantiles.

Multivariable linear regression was used to adjust for potential confounders such as: region, size, bed count, profit status, rural/urban status, teaching status, nurse-to-patient ratio, percentage of patients on Medicare and percentage of patients on Medicaid.
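To make the toolkit concrete, here is a minimal sketch of two of these analyses with SciPy and statsmodels (synthetic data, not the authors' code; the variable names are hypothetical):

```python
import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({                                  # synthetic hospital-level data
    "sentiment": rng.normal(0.3, 0.1, 100),
    "readmission_rate": rng.normal(0.16, 0.02, 100),
    "bed_count": rng.integers(50, 800, 100),
})

# Pearson's correlation between two numeric variables
r, p = stats.pearsonr(df["sentiment"], df["readmission_rate"])
print(f"r = {r:.2f}, p = {p:.3f}")

# Multivariable linear regression, adjusting for a confounder (bed count)
model = smf.ols("readmission_rate ~ sentiment + bed_count", data=df).fit()
print(model.summary().tables[1])
```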

Now, let’s consider some interesting results from the study.

Results

During the 1-year study period, the authors found 404,065 total tweets directed towards these hospitals (data from 1,418 Twitter handles, representing 2,137 hospitals). Out of these 404,065 tweets, 369,197 (91.4%) were original tweets (data from 1,417 Twitter handles, representing 2,136 hospitals).

The classifier tagged 34,725 (9.4%) original tweets relating to patient experiences and 334,472 (90.6%) relating to other aspects of the hospital. Patient experience tweets were found for 1,065 Twitter handles, representing 1,726 hospitals (36.9%).

Table 1 below shows the common characteristics for all of the hospitals with Twitter accounts.

Source: Results section of paper

Overall, the mean number of patient experience tweets received for all hospitals during the 1-year study period was 43. The median sentiment values for the highest and lowest quartiles were 0.362 and 0.211, respectively.

They also found no correlation between sentiment and Twitter account characteristics, except for a weak negative correlation (r = -0.18, p = 0.002) with the total number of days the account had been active.

In Table 2 below, the authors identified the topics of patient experience that were discussed in a random subset of tweets.

Source: Results section of paper

Finally, let’s look at the results linking Twitter data to quality of care.

After adjusting for hospital characteristics using multivariable linear regression, the authors found that Twitter sentiment was not associated with HCAHPS ratings (though having a Twitter account was), although there was a weak association with 30-day hospital readmission rates (p=0.003).

Discussion

As seen earlier, the authors found that approximately half of the hospitals in the USA have a presence on Twitter, and that sentiment towards hospitals was positive on average. Also, of the 297 hospitals surveyed, roughly half responded, and all confirmed that they closely monitor social media and interact with users.

Therefore, the authors concluded that the stakeholders of these hospitals see the value of capturing information on the quality of care in general, and patient experience in particular.

In addition, the authors found only a weak association with one measure of hospital quality (30-day readmissions) and none with an established standard of patient experience (HCAHPS). Taken together, these results suggest that Twitter is a unique platform for engaging with patients and collecting potentially untapped feedback, and possibly a useful supplement to traditional approaches for assessing and improving quality of care.

Link to paper

You can download the paper here.

Limitations & Conclusion

The authors noted some limitations to the study. I have listed out some here:

  • Selection bias due to age: the largest group of Twitter users is under 30 years of age.
  • Possible response and selection bias in the hospital questionnaire, as in most surveys.
  • Although the authors showed an association between organisations’ use of Twitter and their interactions with patients, they cannot confirm any causal relationship.

In all, I believe it was a great read. The authors showed that monitoring Twitter can provide useful, unsolicited, and real-time data that might not be captured by traditional feedback mechanisms. This is reflected in the finding that Twitter sentiment had a weak correlation with 30-day readmission rates and no correlation with HCAHPS ratings.

Writing a paper about 30-day readmission rates sounds interesting. I think I will consider that.

Thank you for reading this review. Please give this a clap if you loved it!
