Dr. Joel Gitlit gave flu vaccinations to some 200 seniors at a Board of Health clinic in New York City during 1968 Pandemic

Introduction

The abstract and introduction for the project can be obtained by visiting here.

Methods

Sources of data

National Health Interview Survey (NHIS)

The National Health Interview Survey (NHIS) is one of the major data collection programs of the National Center for Health Statistics (NCHS) which is part of the Centers for Disease Control and Prevention (CDC). While the NHIS has been conducted continuously since 1957, the content of the survey has been updated about every 15-20 years to incorporate advances in survey methodology and coverage of health topics. In January 2019, NHIS launched a redesigned content and structure that differs from its previous questionnaire design. Persons excluded from the sample are those with no fixed household address, military personnel, persons in long-term care institutions, persons in correctional facilities, and U.S. nationals living in foreign countries. Data collection on the NHIS is continuous, i.e., from January to December each year.

Access data set here

Behavioral Risk Factor Surveillance System (BRFSS)

The Behavioral Risk Factor Surveillance System (BRFSS) is the nation’s system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. Established in 1984 BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world. The basic philosophy was to collect data on actual behaviors, rather than on attitudes or knowledge, that would be especially useful for planning, initiating, supporting, and evaluating health promotion and disease prevention programs. In addition to age, gender, and race/ethnicity, raking permits more demographic variables to be included in weighting such as education attainment, marital status, tenure (property ownership), and telephone ownership.

Access data set here

NYC Open Data

Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. In New York City, access to open data is the law. In 2012, the “Open Data Law,” which amended the New York City administrative code to mandate that all public data be made available on a single web portal by the end of 2018. Since then, there have been several amendments approved to the Open Data Law. These laws, which include stronger requirements on data dictionaries and data retention, response timelines for public requests, and an extension of the Open Data mandate into perpetuity, help make it easier for New Yorkers to access City data online and anchor the city’s transparency initiatives around Open Data.

New York City Locations Providing Seasonal Flu Vaccinations

Data provide the location and facility information for places in New York City providing seasonal flu vaccinations. Data were provided by Department of Health and Mental Hygiene (DOHMH)

Access data set here

New York City influenza-like and pneumonia hospital admissions

Data provide the total emergency department visits, and visits and admissions for influenza-like and/or pneumonia illness by modified ZIP code tabulation area of patient residence. Data were provided by Department of Health and Mental Hygiene (DOHMH).

Access data set here

Twitter

Using rtweet, data were scraped from the website.

Overview of research design

Our project is based on our research questions:

  • “What are the attitudes of adults regarding the seasonal influenza vaccine?”

  • “What trends emerge regarding adults’ seasonal influenza vaccine?”

  • “What are the effects from adult’s engagement or disengagement with the seasonal influenza vaccine?”

Attitudes structure can be described in terms of three components.

  • Affective component: this involves a person’s feelings / emotions about the attitude object. “I hate vaccines”.

  • Behavioral (or conative) component: the way the attitude we have influences on how we act or behave. For example: “I will obtain a influenza vaccine once it becomes available”.

  • Cognitive component: this involves a person’s belief / knowledge about an attitude object. For example: “I believe vaccines are safe”.

Through engagement with the allotted datasets we will begin to understand:

  1. What feelings do adults have regarding flu vaccines?

Why? This will help us understand how adults might feel in regards to available or theoretical vaccines.

  1. How do adults behave towards flu vaccines?

Why? This will help us understand what actions should be encouraged or discontinued in regards to available or theoretical vaccines.

  1. What do adults know about vaccines?

Why? This will help us understand gaps in education, misinformation that might be occurring, and where information should be directed towards.

Hypothesis - What do we think we will find?

  • The proportion of individuals who receive a flu vaccine will increase over the years
  • There might be a decrease in flu vaccinations due to the Anti-Vaxx movement
  • Communities with lower allocations of resources will have less access to obtain vaccinations
  • Insurance coverage will have a significant association with vaccination status
  • There will be more individuals with polarizing views on vaccines than those with indifferent stances

Research aim

Highlights trends in flu vaccine data to ultimately support efforts in research and program implementation, to ensure that all of those eligible to obtain a flu vaccine to get one. This is in a desire to encourage immunization rates for other communicable disease, especially for the ongoing COVID-19 pandemic.

Research setting

Our study utilizes data composed of a random sample of inhabitants from the United States, as BRFSS and NHIS data consists of respondents from all US geographical and social communities. However, for part of this study, the geographical community is specifically geared towards those in the NYC metro area. We find this selection to be fitting considering our research question is aimed at understanding the attitudes, associations, and influences surrounding flu vaccination engagement. This geographic awareness brings us to the current situation unfolding in the United States, that the country is leading the world in having one of the poorest responses to the current COVID-19 pandemic. Consequently, we found it appropriate to explore any trends that might arise from this area. We are all students of Columbia University’s Mailman School of Public Health. Attending a graduate institution located in NYC and being residents in the city allows for us to be particularly engaged with the study for its’ apparent relevance. Dense, metropolitan areas tend to be the most affected during health crisis, lending support to our exploration of data regarding influenza like symptoms.

Data collection methods

Tidying data

The process of tidying data began with the import of the collected data sets. Considering the size of the data gathered, we then selected variables that were relevant and appropriate to our research question. Using respective codebooks, we then mutated and recoded variable names so that the values were readable and gave the actual description. For example, values were coded as “female” or “male”, rather than “0” or “1”. In order to engage with the discrete nature of the data, we then changed the variables classes into a format that was appropriate for analysis. Additionally, any appropriate releveling occurred in order to present the logical progression of some information. Tidy data was then saved and exported to our project repository. The intention of these steps was to ensure that the code was readable and reproducible.

Scraping Twitter

Twitter data was easily scraped from the web using the rtweet package and the command search_tweets, where search terms can be specified. rtweet connects to twitter’s API through a web-based app, and API access tokens are limited to once per hour. As a result, two separate .csv files were generated using the write_as_csv function within the package to limit the number of “calls” being made to request twitter data. Further information and code examples can be found here

Analytical methods

Analytical methods for the project can be obtained by visiting the page below.

Statistical information from National Health Interview Survey

Because we are interested in the binary outcome of whether an individual received an influenza vaccine or not, we will be primarily analyzing the data using logistic regression. We hypothesize that an individual’s health insurance status is an important predictor of their decision to obtain a flu shot, so we will consider this our main effect variable. We will also include other important covariates, and attempt to build a parsimonious model that explains this relationship in a clear way. We also will look at the distribution of demographic and health variables in our data set.

Ethical concerns and protection of human subjects

IRB approval was not necessary to access the data. Data from the NHIS, BRFSS, and NYC Open Data were de-identified. Additionally, data scraped from Twitter have been de-identified to protect all human subjects.

Discussion

Problems we ran into

R programming

We are all aware of the fickleness of R. Although not rampant throughout the project, we did encounter some hiccups with codes, inability to knit, and general lack of knowledge regarding some R language and grammar. Considering the size of some of our data, we had difficulties committing and pushing data files within Github’s size allotment. After troubleshooting and learning about the benefits of the gitignore file, future instances of the problem were averted.

Data

Although quite satisfied with the quality of the data, we would have greatly appreciated more variables that consisted of individual level data (continuous). Most of the variables were discrete which limited some of the visualizations that we were interested in exploring. Additionally, this project showcased a dearth of open data available concerning vaccines within the Americas. Initially we wanted to hinge our research question on adult immunization schedules, concerning Tetanus, HPV, etc. However, this information was limited. As a result, we shifted direction during the project to focus on flu vaccines considering the universal nature of its’ effect.

Conclusions

In conclusion, we used 4 sources of data for our exploratory and statistical analysis. We discovered that…

  • National Health Interview Survey
    • Biological sex appears to have a significant association with flu shot engagement, among other variables
  • Behavioral Risk Factor Surveillance System
    • The Western Region of the US has lower frequencies of flu vaccinations
  • NYC Open Data
    • Flu shot locations are distributed throughout NYC, but that doesn’t mean all New Yorkers get vaccinated
  • Twitter
    • Vaccines are a hot topic right now on twitter

Our project resulted in

  • A unique Twitter data frame that will later be used for a semantic analysis using an appropriate lexicon package
  • An interactive display of facilities that offer flu shots within the NYC metro area
  • An awesome website, with an even better Spotify playlist
  • Long-lasting memories and friendships, and a mutual love of data <3

Next steps

To advance the project, it would be necessary to obtain more data, from more diverse sources, in order to deepen our exploratory and statistical analysis. Especially, we would like this data to concern communities that traditionally disengage from vaccinations or those that have low resources. Additionally, to touch more on the cognitive component of attitudes, it would be fruitful to collect data from surveys that assess respondents’ knowledge regarding vaccines and immunization schedules. To engage with a deeper dive of the affective component of attitudes, it would also be fruitful to gather information from more nuanced qualitative sources, aside from Twitter, such as Reddit. Reddit can contain separate “communities” internationally, in which polarized thoughts and opinions can be analyzed.

Things to think about

We recognize that the data did not gather information from those with no fixed household address, military personnel, persons in long-term care institutions, persons in correctional facilities, and U.S. nationals living in foreign countries. This is a problem. Ethical research efforts and program implementation should be directed towards these vulnerable populations, especially persons experiencing homelessness, living in long term institutions, and in “correctional” facilities. We also must recognize that those apart of black and brown communities are significantly less likely to engage with the medical community, health professionals, and resources such as vaccines, due to the trauma that has occurred in the past. In order to properly engage with data science concerning vaccine engagement, this barrier must be broached and trust must be amended to gather appropriate data.