Extract Tweets From Within R

The 'twitteR' package can extract tweets from a specific user or retrieve recent tweets containing a given search term. It also collects related data such as the number of favorites and retweets, the geolocation (if the user has chosen to share it), and the source device of the tweet, for example an iPhone or an Android device. This page shows how to collect the tweets and, optionally, clean them of special characters, hashtags, etc.

To do this, you need to install the latest version of the ‘twitteR’ package from GitHub and create an app in Twitter from your Twitter account, both of which are straightforward. Here’s how:

Install ‘twitteR’ Package from GitHub

Below we use the updated version of the ‘twitteR’ package from GitHub because it has more features than the version available in the CRAN repository.
library(devtools)
install_github("geoffjentry/twitteR") # install 'twitteR'

How To Create An App In Twitter

Step 1: Sign in to your Twitter account and go to the Twitter apps site.

Twitter Apps Page

Step 2: Click the Create App button and fill out the form.

Create twitter app

Step 3: Get the credentials

Your app's credentials must be supplied every time you use ‘twitteR’ to retrieve tweets. Four values are needed: the consumer key, consumer secret, access token and access token secret. You can find them on the ‘Keys and access tokens’ tab of your app page. Save them for later use whenever you need to retrieve tweets with ‘twitteR’.

twitter credentials

How To Retrieve Tweets?

To access tweets from R, you will need the consumer key, consumer secret, access token and access token secret of your app, as generated in the steps above.

setup_twitter_oauth (consumerKey, consumerSecret, accessToken, accessTokenSecret)  # authenticate
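The four values passed above have to come from somewhere, and hard-coding them in a script makes them easy to leak. One possible approach, sketched below, is to read them from environment variables. The variable names (TWITTER_CONSUMER_KEY and so on) and the helper read_twitter_credentials() are hypothetical and not part of ‘twitteR’; pick whatever names suit your setup.

```r
# Hypothetical environment variable names (an assumption of this sketch);
# they could be set in ~/.Renviron or in the shell before starting R.
cred_vars <- c(
  consumerKey       = "TWITTER_CONSUMER_KEY",
  consumerSecret    = "TWITTER_CONSUMER_SECRET",
  accessToken       = "TWITTER_ACCESS_TOKEN",
  accessTokenSecret = "TWITTER_ACCESS_TOKEN_SECRET"
)

# Read the four credentials and stop early if any of them is missing or empty.
read_twitter_credentials <- function(vars = cred_vars) {
  creds <- Sys.getenv(vars, unset = NA)
  names(creds) <- names(vars)
  missing <- is.na(creds) | creds == ""
  if (any(missing)) {
    stop("Missing credentials: ", paste(vars[missing], collapse = ", "))
  }
  creds
}
```

With the variables set, creds <- read_twitter_credentials() returns a named character vector, and the authentication call would become setup_twitter_oauth(creds[["consumerKey"]], creds[["consumerSecret"]], creds[["accessToken"]], creds[["accessTokenSecret"]]).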

Extract Tweets

To get tweets from a particular user

userTimeline('r_programming', n=10) # tweets from a user
homeTimeline(n=15) # tweets from your home timeline
mentions(n=15) # tweets that mention you
favs <- favorites("r_programming", n=10) # tweets a user has favorited

To extract all tweets with a particular hashtag or user mention

tweets <- searchTwitter("rstats", n=25) # up to 25 recent tweets that contain the search term
tweetsDF <- twListToDF(tweets) # convert the list of tweets to a data frame

The twListToDF() function converts the collected list of tweets into a data frame. More information, such as the number of retweets and the source, is available as columns in the data frame after this conversion.
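As a rough illustration of working with those columns, the sketch below pulls the device name out of the source column and saves the result as a CSV file. It runs on a small mock data frame with invented values, not on real API output. It assumes the source column is called statusSource and holds an HTML link, which is how twListToDF() output usually looks; check names(tweetsDF) on your own data before relying on it.

```r
# Mock data standing in for the output of twListToDF(); all values are invented.
mockDF <- data.frame(
  text = c("Learning #rstats today", "RT @someone: ggplot2 tips http://t.co/abc"),
  retweetCount = c(3, 12),
  statusSource = c(
    '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'
  ),
  stringsAsFactors = FALSE
)

# Strip the HTML tags so that only the client (device) name remains.
mockDF$device <- gsub("<[^>]+>", "", mockDF$statusSource)

# Save the data frame as a CSV file (written to a temporary directory here).
csv_path <- file.path(tempdir(), "tweets.csv")
write.csv(mockDF, csv_path, row.names = FALSE)
```

The same two steps should apply to the real tweetsDF, provided it has the column names assumed here.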

tweets sample

Let's further clean up these tweets. The code below strips URLs, hashtags, mentions, punctuation, etc. from your tweets. Feel free to comment out the appropriate lines if you wish to keep some of these features. For example, to retain the URLs in your tweets, comment out both the URL line and the punctuation line.
# Clean the tweet text; 'tweetsDF' is the data frame created above
Sentences <- unlist(lapply(tweetsDF$text, function(x) {
x = gsub('http\\S+\\s*', '', x) ## Remove URLs
x = gsub('\\bRT\\b', '', x) ## Remove the retweet marker 'RT'
x = gsub('#\\S+', '', x) ## Remove Hashtags
x = gsub('@\\S+', '', x) ## Remove Mentions
x = gsub('[[:cntrl:]]', '', x) ## Remove control characters
x = gsub("\\d", '', x) ## Remove digits
x = gsub('[[:punct:]]', '', x) ## Remove punctuation
x = gsub("^[[:space:]]*","",x) ## Remove leading whitespace
x = gsub("[[:space:]]*$","",x) ## Remove trailing whitespace
x = gsub(' +',' ',x) ## Collapse repeated spaces
x ## Return the cleaned text
}))
Sentences

# Sample output of processed tweets
[1] "Nice post on maps with different class intervals in ggplot Similar options in rworldmap"               
[2] "Rabbit introduction to R free web book"                                                                
[3] "lets you learn how stuff was made via"                                                                 
[4] "You can now see the coding process for the following graphic at enjoy"
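Because gsub() is vectorized, the same clean-up can also be wrapped in a small reusable function and tried on a couple of made-up strings before running it on real tweets. The sketch below is one possible arrangement, not the original code: the name clean_tweet() and the sample strings are hypothetical, and the whitespace trimming is done last with trimws().

```r
# Hypothetical helper: applies the clean-up steps to a whole character vector.
clean_tweet <- function(x) {
  x <- gsub("http\\S+\\s*", "", x)   # remove URLs
  x <- gsub("\\bRT\\b", "", x)       # remove the retweet marker
  x <- gsub("#\\S+", "", x)          # remove hashtags
  x <- gsub("@\\S+", "", x)          # remove mentions
  x <- gsub("[[:cntrl:]]", "", x)    # remove control characters
  x <- gsub("\\d", "", x)            # remove digits
  x <- gsub("[[:punct:]]", "", x)    # remove punctuation
  x <- gsub(" +", " ", x)            # collapse repeated spaces
  trimws(x)                          # drop leading and trailing whitespace
}

# Invented sample tweets used only to exercise the function.
samples <- c(
  "RT @user: Nice post on #rstats maps http://t.co/abc123 !!",
  "Learn R in 10 steps: http://example.com"
)
cleaned <- clean_tweet(samples)
print(cleaned)
```

On real data the call would be clean_tweet(tweetsDF$text). Note that the order of the steps matters: URLs, hashtags and mentions are removed before punctuation, since stripping punctuation first would leave their text behind.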
  • Srihari Mohan

    Hi, I tried using this tweet extraction, but R restricts itself to extracting only 140 characters. I have also tried using rtweet, for which I get the following error: "Error in readRDS(pat): error in connection".
    Could you please help me retrieve complete tweets from Twitter?