The data acquisition

As part of this project, we thought it would be useful to gather information from the web to see what people had to say about the flu vaccine, or vaccines in general. To do so, our team used the rtweet package to pull data from Twitter. A few notes on this:

  • To gain access to Twitter’s API, you have to either request a developer account or authorize the app that comes with rtweet (MUCH easier)
  • As I explored the package, fixing and re-running code multiple times, I remembered that I was accessing resources on someone else’s server…
  • So I learned the hard way that requests are rate limited: I was allowed one “access” at a time (and by “time,” I mean per hour)

As a workaround, I “called” the data on two separate days and saved the resulting data frames as CSV files.

The tweets_df.csv file was created on Nov. 29, 2020 at 11:40 am:

tweets_df =
  read_csv("./final_project_large_data/tweets_df.csv")

The tweets_df2.csv file was created on Dec. 1, 2020 at 10:40 pm:

tweets_df2 =
  read_csv("./final_project_large_data/tweets_df2.csv")

The process that created them is below:

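library(rtweet)   # search_tweets(), ts_plot()
library(ggplot2)  # labs(), theme_minimal()

# First pull: up to 18,000 English-language tweets tagged #vaccine,
# excluding retweets and replies
tweets1 = search_tweets(q = "#vaccine",
                        n = 18000,
                        include_rts = FALSE,
                        `-filter` = "replies",
                        lang = "en")

# Plot tweet frequency over time, aggregated by week
ts_plot(tweets1, "weeks") +
  labs(x = NULL, y = NULL,
       title = "Frequency of tweets mentioning #vaccine",
       subtitle = paste0(format(min(tweets1$created_at), "%d %B %Y"),
                         " to ",
                         format(max(tweets1$created_at), "%d %B %Y")),
       caption = "Data collected from Twitter's REST API via rtweet") +
  theme_minimal()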


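# Second pull: up to 5,000 English-language tweets matching any of the keywords
tweets_df2 =
  search_tweets(q = "flu OR vaccine OR flushot",
                n = 5000,
                include_rts = FALSE,
                `-filter` = "replies",
                lang = "en")

# Save in a separate step so tweets_df2 holds the tweets,
# not whatever write_as_csv() returns
write_as_csv(tweets_df2, "./final_project_large_data/tweets_df2.csv")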
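Since every call to search_tweets() counts against the rate limit, one way to make the "query once, then read the CSV" workaround automatic is to check for the saved file before querying. The following is only a minimal sketch of that idea, not the code used for this project: `load_or_fetch()` and `fake_search()` are hypothetical names, the rows are made up, and base R's read.csv()/write.csv() stand in for read_csv()/write_as_csv() so the example runs without Twitter access.

```r
# Hypothetical helper: read a cached CSV if it exists, otherwise fetch and save
load_or_fetch = function(path, fetch_fn) {
  if (file.exists(path)) {
    # Cache hit: no request is made
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  result = fetch_fn()
  write.csv(result, path, row.names = FALSE)
  result
}

# Stand-in for a real search_tweets() call (made-up rows, assumption)
fake_search = function() {
  data.frame(
    created_at = c("2020-11-29 11:40:00", "2020-11-29 11:41:00"),
    text = c("Got my flu shot today", "Is the vaccine safe?"),
    stringsAsFactors = FALSE
  )
}

cache_path = tempfile(fileext = ".csv")
first_call = load_or_fetch(cache_path, fake_search)   # fetches, then writes the CSV
second_call = load_or_fetch(cache_path, fake_search)  # reads the CSV instead
```

With real rtweet output, the fetch function would wrap search_tweets(), and the save step would likely need write_as_csv(), since tweet data frames can contain list columns that plain write.csv() cannot handle.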

Let’s explore!

The data

With the code below, we can sample 10 tweets from the first dataset generated. Because sample_n() draws 10 observations at random, a new table of tweets appears each time the code is run. For the tweets_df data, I am commenting out the code.

Why?

Because, if you look back at my search terms, I included flu OR vaccine OR flushot. When I sampled a few times while testing this code, I noticed a few irrelevant tweets, referring to the “hit me with your best shot” kind of shot that Pat Benatar sang about.

tweets_df %>%
  sample_n(10) %>%
  select(created_at, text) %>%
  knitr::kable()
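An alternative to setting the dataset aside would be to screen out the off-topic "shot" tweets with a keyword filter. This is a rough sketch under assumptions, not something done in this project: the rows in `toy_tweets` are invented, and `relevant_pattern` is a hypothetical keyword list that would need tuning against real tweets.

```r
# Toy stand-in for tweets_df (made-up rows, not real tweets)
toy_tweets = data.frame(
  text = c(
    "Got my flu shot this morning",
    "Hit me with your best shot!",
    "Is the new vaccine safe for kids?",
    "He took a shot at the buzzer"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical pattern: keep tweets that mention flu- or vaccine-related words
relevant_pattern = "\\b(flu|influenza|flushot|vaccin\\w*)\\b"

# TRUE for rows whose text matches the pattern, ignoring case
is_relevant = grepl(relevant_pattern, toy_tweets$text,
                    ignore.case = TRUE, perl = TRUE)

relevant_tweets = toy_tweets[is_relevant, , drop = FALSE]
```

A filter like this trades recall for precision: a tweet that says only "got my shot" with no other keyword would be dropped too.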

So let’s sample from the data frame that was created with the search term “vaccine” to get a feel for what people are saying:

tweets_df2 %>%
  sample_n(10) %>%
  select(text) %>%
  knitr::kable(format = "markdown")
|text|
|:---|
|@randycorporon @dbongino If they can come up with a covid vaccine in 9 months, why is there still no cure for cancers?|
|as though we are not likely to benefit from the addition reduction in influenza. So, I expect this will be a much more severe flu season for the US than it was for the Southern Hemisphere. The one saving grace may be that, while I don’t have good data, it appears through|
|@arneduncan @RexChapman Temperature required for the vaccine would make this very difficult logistically.|
|Tomorrow will be… interesting. The 10 y/o (sore throat) asked me to culture her in the AM. The 5 y/o (fever 102.5) will be home with me and we’ll see the ped to swab for flu/COVID. Ill be doing televisits with 30 chemo patients. The 5 y/o can… help?|
|@dwilliam9940 @nypost I refuse the vaccine|
|Perfect marriage, carried the mic (MJ) thru the door, like it caught the flu after game 4, this a hell of a series, of events. Stories of my past tense jump like Vince, I know to some it don’t make sense ⏪⏪⏪⏪ https://t.co/JZ4eYPGKoQ|
|@bennedose here’s my set of slides on the topic, doc. i’m no anti-vaxxer. i’m only objecting to these specific vaccines because they are put out there TOO DAMN SOON without testing (< 7 months). the fastest vaccine in history took 4 YEARS to be approved (MMR). https://t.co/K34Spl3Yrs|
|@BarackObama @MichelleObama @ObamaFoundation @realDonaldTrump #hiv @WHO @WhoopiGoldberg #WorldAIDSDay2020 Hi with the new #disruption tech that will be soon the end of #aids. The #COVID19 #vaccine was build in 2 days with pc. Its the #terminator party https://t.co/FSEkpVU0va|
|Coronavirus diary - Part 35 An enduring issue about the vaccine is a trust which is accentuated by Robert F. Kennedy Jr. and Dr. Carey Maddey. The former raises a red-flag in respect of the nucleic acid, namely, DNA and RN https://t.co/3oo97NDzVy|
|@DrRosena It’s appalling that vulnerable people are no longer advised to shield; with a vaccine on the horizon they’re being called back into work and are having to make a choice between their lives and livelihoods. If they were in care homes all social contact would be tightly controlled!|
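The sample shown here will differ on every run, because the rows are drawn at random. If a stable table were wanted (for example, so the write-up always discusses the same tweets), one option is to fix the random seed before sampling. The sketch below uses base R and a made-up `toy_tweets` data frame so it runs on its own; `sample_rows()` is a hypothetical helper, and the seed value is arbitrary. Calling set.seed() before dplyr's sample_n() should have the same effect.

```r
# Toy stand-in for tweets_df2 (made-up rows, not real tweets)
toy_tweets = data.frame(
  id = 1:20,
  text = paste("example tweet", 1:20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw n rows at random after fixing the seed
sample_rows = function(df, n, seed) {
  set.seed(seed)
  df[sample(nrow(df), n), , drop = FALSE]
}

draw_a = sample_rows(toy_tweets, 10, seed = 2020)
draw_b = sample_rows(toy_tweets, 10, seed = 2020)

# Same seed, same rows in the same order
identical(draw_a, draw_b)
```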

Why do we care?

The other analyses in this report deal with numbers. While numbers and statistics are highly important, we believe that research can be further supplemented by looking at people’s behaviors and attitudes. Tools like rtweet and other data scraping methods let us draw on unconventional data sources to get a deeper picture of the current landscape regarding the flu vaccine. It is important to note, however, that using Twitter data introduces selection bias: we only get information from people who use Twitter, and we do not capture the opinions and attitudes of those who choose not to, or cannot, use it.