
In this article by Vikram Garg and Sharan Kumar Ravindran, the authors of the book Mastering Social Media Mining with R, we learn about data mining using Facebook as our data source.


We will see how to use the R package Rfacebook, which provides access to the Facebook Graph API from R. It includes a series of functions that allow us to extract various data about our network, such as friends, likes, comments, followers, and newsfeeds. We will also discuss how to visualize our Facebook network and look at some ways of using the available data to address business cases.

Rfacebook package installation and authentication

The Rfacebook package is authored and maintained by Pablo Barbera and Michael Piccirilli, and it provides an interface to the Facebook Graph API. It requires R version 2.12.0 or later and depends on a few other packages, such as httr, rjson, and httpuv; make sure these packages are installed before starting. Version 0.6 of the httr package is recommended.
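Before installing Rfacebook, it can help to check these requirements from R itself. The following is a minimal sketch in base R: check_rfacebook_deps is a hypothetical helper name, and the package list and version numbers are simply the ones given above.

```r
# Hypothetical helper: return the names of required packages that are not installed
check_rfacebook_deps <- function(pkgs = c("httr", "rjson", "httpuv")) {
  # requireNamespace() returns FALSE, without an error, if a package is missing
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Rfacebook needs R version 2.12.0 or later
if (getRversion() < "2.12.0") stop("Rfacebook needs R >= 2.12.0")

missing_pkgs <- check_rfacebook_deps()
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
}

# Report the installed httr version (version 0.6 is recommended above)
if (requireNamespace("httr", quietly = TRUE)) print(packageVersion("httr"))
```

The helper only reports what is missing; installing is left as an explicit, commented-out step so that running the check never changes your library.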

Installation

We will now install the Rfacebook package. The latest version can be downloaded and installed from GitHub with the install_github function; one prerequisite for using install_github is to have the devtools package loaded into the R environment. Alternatively, the package can be installed from CRAN. After installation, we load the package with the library function. The code is as follows:

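library(devtools)
# Development version from GitHub (the package sits in the Rfacebook subdirectory)
install_github("pablobarbera/Rfacebook/Rfacebook")
# Alternatively, the release version from CRAN:
# install.packages("Rfacebook")
library(Rfacebook)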

After installing the Rfacebook package, we need to authenticate with the API. There are two ways to do this. The first is to use an access token generated through the Graph API Explorer, which is short-lived (valid for two hours); the second is to create a long-lived token using the fbOAuth function.

Let’s first create a temporary token. Go to https://developers.facebook.com/tools/explorer, click on Get Token, and select the required user data permissions.

The Facebook Graph API Explorer will open with an access token, which is valid for two hours. The status and scope of the access token can be checked by clicking the Debug button. Once the token expires, we can simply generate a new one.

Now we can access the data from R. Copy the access token generated at the link and assign it to the token variable. Looking up users by username in the getUsers function is deprecated in the latest Graph API, so we pass a user ID instead; you can find your own ID on the same page used to generate the token. getUsers can pull the details of any user, provided the generated token has the required access. Usually, access is limited to a few users with public settings or those who use your app, and it also depends on the permissions selected on the user data permissions page during token generation. In the following code, paste your token inside the double quotes so that it can be reused across functions without repeating the actual token.

token <- "XXXXXXXXX"  # paste your temporary access token here
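If you prefer not to paste the token into every script, one option is to keep it in an environment variable and read it at the start of the session. This is a sketch under assumptions: FB_TOKEN and get_fb_token are hypothetical names and are not part of Rfacebook.

```r
# Hypothetical helper: read the access token from an environment variable
get_fb_token <- function(var = "FB_TOKEN") {
  token <- Sys.getenv(var)  # returns "" when the variable is not set
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set; ",
         "generate a token in the Graph API Explorer and set it first.")
  }
  token
}

# Set it once per session (or in ~/.Renviron), then reuse it:
# Sys.setenv(FB_TOKEN = "XXXXXXXXX")
# token <- get_fb_token()
```

Failing loudly when the variable is empty makes an expired or forgotten token easier to spot than a later API error.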

A closer look at how the package works

When called with a token, the getUsers function sends a request to the Facebook Graph API. Facebook uses the token to uniquely identify the user and the permissions granted for accessing information. If all the checks pass, the API returns the requested data.

Copy the token from the URL mentioned earlier and paste it within the double quotes, remembering that the token is valid for only two hours. Then use the getUsers function to get the details of a user. Earlier, getUsers accepted either a friend's name or an ID; since API Version 2.0, data can no longer be accessed by name. Consider the following code:

token <- "XXXXXXXXX"
# Look up a user by numeric ID; private_info = TRUE also requests private fields
me <- getUsers("778278022196130", token, private_info = TRUE)

Then, the details of the user, such as name and hometown, can be retrieved using the following code:

me$name

The output is as follows:

[1] "Sharan Kumar R"

Similarly, for the hometown:

me$hometown

The output is as follows:

[1] "Chennai, Tamil Nadu"

Now, let’s see how to create a long-lasting token. Open your Facebook app page by going to https://developers.facebook.com/apps/ and choosing your app.

On the Dashboard tab, you will see the App ID and App Secret values. Use them in the following code.

require("Rfacebook")
# Replace the placeholder values with your own App ID and App Secret
fb_oauth <- fbOAuth(app_id="11", app_secret="XX",
                    extended_permissions = TRUE)

On executing the preceding statements, you will find the following message in your console:

Copy and paste into Site URL on Facebook App Settings:
http://localhost:1410/
When done, press any key to continue...

Copy the URL displayed and open your Facebook app; on the Settings tab, click on the Add Platform button and paste the copied URL in the Site URL text box. Make sure to save the changes.

Then return to the R console and press any key to continue; you will be prompted to enter your Facebook username and password. After that, you will return to the R console. If you see the following message, your long-lived token is ready to use. Note that even after authentication completes, you may not be able to access any information right away, so it is advisable to run the fbOAuth function a few minutes after creating the Facebook application.

Authentication complete.
Authentication successful.

After successfully authenticating, we can save the token and load it on demand using the following code:

save(fb_oauth, file="fb_oauth")  # write the token object to the file fb_oauth
load("fb_oauth")                 # in a later session, restore the fb_oauth object

Automating tasks or using Rfacebook extensively is difficult with temporary tokens, because they must be regenerated every two hours. Hence, it is advisable to create a long-lived token to authenticate the user, save it, and load it from the local file whenever required.
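The save-then-load pattern above can be wrapped in a small function that reuses a saved token when one exists and only runs the interactive authentication otherwise. This is a minimal sketch: load_or_create_token, the file name fb_oauth.rds, and the create_fn argument are hypothetical, and it uses saveRDS and readRDS instead of save and load so that the caller chooses the object name.

```r
# Hypothetical helper: reuse a cached token, or create and cache a new one
load_or_create_token <- function(path = "fb_oauth.rds", create_fn) {
  if (file.exists(path)) {
    return(readRDS(path))  # reuse the token saved in an earlier session
  }
  tok <- create_fn()       # interactive authentication happens only here
  saveRDS(tok, path)       # cache it for later sessions
  tok
}

# Usage (not run): create_fn wraps the fbOAuth call shown earlier
# fb_oauth <- load_or_create_token(create_fn = function()
#   fbOAuth(app_id = "11", app_secret = "XX", extended_permissions = TRUE))
```

Passing the authentication step in as a function keeps the helper independent of Rfacebook and means fbOAuth is only called when no cached file exists.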

Note that Facebook authentication might take several minutes. Hence, if authentication fails, wait for some time before pressing any key to retry, and check that version 0.6 of the httr package is installed. If you still cannot generate the token, don't worry; the temporary token is enough to follow along.

Exercise

Create a Facebook app and authenticate using either of the methods discussed.

A basic analysis of your network

In this section, we will discuss how to extract our Facebook network of friends and some more information about the people in it.

After completing the app creation and authentication steps, let’s move forward and learn to pull some basic network data from Facebook. First, let’s find out which friends we have access to, using the following command in R. Let’s use the temporary token for accessing the data:

token <- "XXXXXXXXX"
friends <- getFriends(token, simplify = TRUE)
head(friends)  # show the first few friends

The preceding function returns all our Facebook friends whose data is accessible. Version 1 of the API allowed us to download all friends' data by default, but the newer versions restrict access. Since we have set simplify to TRUE, we pull only each friend's name and Facebook ID. Setting the same parameter to FALSE gives access to additional data such as gender, location, hometown, profile picture, relationship status, and full name.

We can use the getUsers function to get additional information about users. Gender, location, and language are available by default; by setting the parameter private_info to TRUE, we can also get information such as relationship status, birthday, and current location:

friends_data<- getUsers(friends$id, token, private_info = TRUE)
table(friends_data$gender)

The output is as follows:

female   male
     5     21
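Raw counts are easier to compare as shares. As an illustration, the following rebuilds a vector with the counts shown above (5 female, 21 male) and converts it to proportions with prop.table(); with real data, you would pass table(friends_data$gender) instead.

```r
# Vector reproducing the counts above: 5 "female" and 21 "male" values
gender <- c(rep("female", 5), rep("male", 21))

counts <- table(gender)
shares <- round(prop.table(counts), 3)  # each count divided by the total (26)
print(shares)                           # female 5/26 = 0.192, male 21/26 = 0.808
```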

We can also find out the language, location, and relationship status. The commands to generate the details as well as the respective outputs are given here for your reference:

#Language
table(substr(friends_data$locale, 1, 2))

The output is as follows:

en
26

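The country can be estimated from the last two characters of the locale (for example, US in en_US). Note that this reflects the friend's language setting rather than where they actually live. The code is as follows:

# Country, from the locale setting (e.g. "US" in "en_US")
table(substr(friends_data$locale, 4, 5))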

The output is as follows:

GB US
 1 25
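Both the language and country commands rely on the locale field having the form language_COUNTRY (such as en_US), so characters 1-2 give the language code and characters 4-5 the country code. The sketch below uses a hypothetical vector of locale strings to show what substr extracts, along with an equivalent split on the underscore.

```r
# Hypothetical locale strings; with real data use friends_data$locale
locales <- c("en_US", "en_GB", "en_US", "ta_IN")

language <- substr(locales, 1, 2)  # characters 1-2: language code
country  <- substr(locales, 4, 5)  # characters 4-5: country code

# Equivalent: split each locale at the underscore into two columns
parts <- do.call(rbind, strsplit(locales, "_", fixed = TRUE))

table(language)
table(country)
```

The strsplit version does not depend on fixed character positions, which makes it slightly more robust if a code were ever longer than two characters.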

Here’s the code to find the relationship status:

# Relationship Status
table(friends_data$relationship_status)

Here’s the output:

Engaged Married Single
     1       1       3
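The statuses above add up to 5, while the gender table counted 26 friends. By default, table() silently drops missing values, so friends who do not share their relationship status disappear from the output. The following is a minimal sketch, assuming the missing statuses are stored as NA, that rebuilds those counts and makes the missing values visible with useNA.

```r
# 5 known statuses (as above) plus 21 assumed-missing values: 26 friends in total
status <- c("Engaged", "Married", rep("Single", 3), rep(NA, 21))

table(status)                   # NA values are dropped: only 5 friends counted
table(status, useNA = "ifany")  # adds an <NA> column counting the 21 missing values
```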

Now, let's see what we have liked on Facebook. The getLikes function returns the list of Facebook pages liked by a user, along with each page's ID, name, and associated website. To get our own likes, we specify user as me; to get the likes of people in our network, we pass their Facebook ID instead. Remember that we can only access the data of people who are accessible from our app. We can also limit the number of results retrieved by setting the parameter n. The code is as follows:

likes<- getLikes(user="me", token=token)
head(likes)

After exploring these functions for pulling data, let's see how to use the Facebook Query Language (FQL) through the getFQL function, which sends queries to the API. The following query gets the list of friends in your network:

friends<- getFQL("SELECT uid2 FROM friend
WHERE uid1=me()", token=token)

To get the complete details of your friends, the following query can be used. It returns the name, Facebook ID, and profile picture link of each friend. Note that we might not be able to access data for our complete network of friends, since access to all your friends' data is deprecated as of Version 2.0. The code is as follows:

# Details about friends
friends_details <- getFQL("SELECT uid, name, pic_square FROM user
WHERE uid = me() OR uid IN (SELECT uid2 FROM friend
WHERE uid1 = me())", token=token)
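For readers less familiar with query languages, the nested query above can be read as two steps: first collect the IDs of your friends, then keep the users who are either you or one of those friends. The following sketch mirrors that logic on small, hypothetical in-memory data frames named after the FQL tables; it does not call Facebook.

```r
# Hypothetical stand-ins for the FQL "friend" and "user" tables
my_id  <- "1"                       # plays the role of me()
friend <- data.frame(uid1 = c("1", "1", "2"),
                     uid2 = c("2", "3", "3"),
                     stringsAsFactors = FALSE)
user   <- data.frame(uid        = c("1", "2", "3", "4"),
                     name       = c("Me", "Asha", "Ben", "Chen"),
                     pic_square = c("p1.jpg", "p2.jpg", "p3.jpg", "p4.jpg"),
                     stringsAsFactors = FALSE)

# Subquery: SELECT uid2 FROM friend WHERE uid1 = me()
friend_ids <- friend$uid2[friend$uid1 == my_id]

# Outer query: SELECT uid, name, pic_square FROM user
#              WHERE uid = me() OR uid IN (subquery)
details <- user[user$uid == my_id | user$uid %in% friend_ids,
                c("uid", "name", "pic_square")]
print(details)
```

User 4 is not returned because no friend row links them to my_id, which is exactly what the IN clause filters out.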

To learn more about the Facebook Query Language, see https://developers.facebook.com/docs/technical-guides/fql. This method of extracting information may be preferred by people familiar with query languages, and it can also help extract only the data that satisfies specific conditions.

Exercise

Download your Facebook network and perform an exploratory analysis of the languages your friends speak, the places where they live, the total number of pages they have liked, and their marital status. Try all of these with the Facebook Query Language as well.

Network analysis and visualization

So far, we have used a few functions to get details about our Facebook profile and our friends. Now, let's learn more about our network as a whole. Before retrieving the network data, let's understand what a network is and a few important concepts related to it.

Any set of things connected to one another can be viewed as a network. Many things in real life are connected, for example, people, machines, and events, so it often makes sense to analyze them as a network. Consider a network of people: the people are the nodes of the network, and the relationships between them are the edges (the lines connecting them).

Social network analysis

The study and analysis of such networks is called social network analysis. In this section, we will see how to create a simple plot of the friends in our network.

To understand the nodes (people, places, and so on) of a network in social network analysis, we need to evaluate their positions in the network. One way to do this is with centrality, which can be measured in different ways, such as degree, betweenness, and closeness. Let's first get our Facebook network and then look at the centrality measures in detail.
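To make the three measures concrete: degree counts a node's direct connections; closeness is the inverse of the total shortest-path distance from a node to the nodes it can reach; and betweenness sums, over pairs of other nodes, the fraction of shortest paths between them that pass through the node. The following base R sketch computes them for a small undirected, unweighted network stored as a symmetric 0/1 adjacency matrix, using breadth-first search and Brandes' algorithm for betweenness. The function names and the toy network are hypothetical; in practice, igraph's degree(), closeness(), and betweenness() functions do this work.

```r
degree_centrality <- function(adj) rowSums(adj != 0)

# Breadth-first search: hop distances from node s (Inf = unreachable)
bfs_dist <- function(adj, s) {
  dist <- rep(Inf, nrow(adj)); dist[s] <- 0
  queue <- s
  while (length(queue) > 0) {
    v <- queue[1]; queue <- queue[-1]
    for (w in which(adj[v, ] != 0)) {
      if (is.infinite(dist[w])) { dist[w] <- dist[v] + 1; queue <- c(queue, w) }
    }
  }
  dist
}

# Closeness: 1 / (sum of distances to the reachable nodes)
closeness_centrality <- function(adj) {
  sapply(seq_len(nrow(adj)), function(s) {
    d <- bfs_dist(adj, s)
    reach <- d[is.finite(d) & d > 0]
    if (length(reach) == 0) NA_real_ else 1 / sum(reach)
  })
}

# Betweenness via Brandes' algorithm (unweighted, undirected)
betweenness_centrality <- function(adj) {
  n <- nrow(adj); cb <- numeric(n)
  for (s in seq_len(n)) {
    stack <- integer(0); pred <- vector("list", n)
    sigma <- numeric(n); sigma[s] <- 1      # number of shortest paths from s
    dist <- rep(-1, n); dist[s] <- 0
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]; stack <- c(stack, v)
      for (w in which(adj[v, ] != 0)) {
        if (dist[w] < 0) { dist[w] <- dist[v] + 1; queue <- c(queue, w) }
        if (dist[w] == dist[v] + 1) {        # v precedes w on a shortest path
          sigma[w] <- sigma[w] + sigma[v]
          pred[[w]] <- c(pred[[w]], v)
        }
      }
    }
    delta <- numeric(n)                      # dependency of s on each node
    for (w in rev(stack)) {
      for (v in pred[[w]]) delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      if (w != s) cb[w] <- cb[w] + delta[w]
    }
  }
  cb / 2  # each unordered pair was counted once from each endpoint
}

# Hypothetical four-person network: A-B, A-C, B-C, C-D
people <- c("A", "B", "C", "D")
adj <- matrix(0, 4, 4, dimnames = list(people, people))
for (e in list(c("A", "B"), c("A", "C"), c("B", "C"), c("C", "D"))) {
  adj[e[1], e[2]] <- 1; adj[e[2], e[1]] <- 1
}
data.frame(degree      = degree_centrality(adj),
           closeness   = closeness_centrality(adj),
           betweenness = betweenness_centrality(adj))
```

In this toy network, C scores highest on all three measures: it has the most friends, is closest to everyone, and is the only link between D and the rest.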

We use the getNetwork function to download our Facebook network, specifying the format of the data with the format parameter. When format is set to adj.matrix, the data is returned as an adjacency matrix: the people in the network become the row and column names, and the cell at the intersection of two people holds a non-zero value if they are connected to each other. The command is as follows:

network<- getNetwork(token, format="adj.matrix")
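To see what this adjacency-matrix format looks like, the following sketch builds one from a small, hypothetical edge list of mutual friendships, with people's names as row and column names and 1 marking a connection. It only illustrates the format; the exact values getNetwork stores may differ.

```r
# Hypothetical edge list: each row is one mutual friendship
edges <- data.frame(from = c("Asha", "Asha", "Ben"),
                    to   = c("Ben",  "Chen", "Chen"),
                    stringsAsFactors = FALSE)

people <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0, nrow = length(people), ncol = length(people),
              dimnames = list(people, people))

# Index the matrix by (row name, column name) pairs; friendship is mutual,
# so each edge sets both adj[a, b] and adj[b, a]
adj[cbind(edges$from, edges$to)] <- 1
adj[cbind(edges$to, edges$from)] <- 1
print(adj)
```

Because friendship goes both ways, the resulting matrix is symmetric, and its diagonal stays 0 since nobody is listed as their own friend.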

We now have our Facebook network downloaded. Let's visualize it before exploring the centrality concepts one by one with our own network. To visualize the network, we will use the igraph package. Since we downloaded our network as an adjacency matrix, we will build the graph with igraph's function for adjacency matrices. We use a layout function to determine the placement of the vertices and then the plot function to draw the network. To explore the other options of these functions, run ?function_name (for example, ?graph.adjacency) in the R console or RStudio, and the help window will show the function's description. Let's use the following code to load the igraph package into R.

require(igraph)

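We will now build the graph using the graph.adjacency function, which creates a network graph from an adjacency matrix. Since friendship is mutual, we build it as an undirected graph. To place the vertices, we use the layout.drl function, which computes a force-directed layout; force-directed layouts help make the graph more readable. The commands are as follows:

# Friendship is mutual, so build an undirected graph
social_graph <- graph.adjacency(network, mode = "undirected")
# DrL force-directed layout; options tunes DrL's phases, here setting the
# attraction in its final "simmer" phase to 0
layout <- layout.drl(social_graph,
   options = list(simmer.attraction = 0))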
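layout.drl implements DrL, one of several force-directed layout algorithms. To show the idea behind such layouts, here is a sketch of a different, simpler one: a basic Fruchterman-Reingold spring embedder in base R. All pairs of nodes repel each other, connected nodes attract, and a cooling "temperature" limits how far a node moves in each iteration. fr_layout and its parameters are hypothetical; this is neither igraph's implementation nor DrL.

```r
# Simplified Fruchterman-Reingold layout for a symmetric adjacency matrix
fr_layout <- function(adj, iterations = 300, seed = 42) {
  set.seed(seed)                                  # reproducible start positions
  n <- nrow(adj)
  pos <- matrix(runif(2 * n, -1, 1), ncol = 2)    # random start in a 2 x 2 square
  k <- sqrt(4 / n)                                # ideal distance between nodes
  temp <- 0.2                                     # maximum step size
  for (it in seq_len(iterations)) {
    disp <- matrix(0, n, 2)
    for (i in seq_len(n)) {
      d <- matrix(pos[i, ], n, 2, byrow = TRUE) - pos   # vectors from each node to i
      dist <- pmax(sqrt(rowSums(d^2)), 1e-9)            # avoid division by zero
      force <- k^2 / dist                               # repulsion from every node
      linked <- adj[i, ] != 0
      force[linked] <- force[linked] - dist[linked]^2 / k  # attraction along edges
      force[i] <- 0                                     # no force from itself
      disp[i, ] <- colSums(d / dist * force)
    }
    len <- pmax(sqrt(rowSums(disp^2)), 1e-9)
    pos <- pos + disp / len * pmin(len, temp)           # move, capped by temperature
    temp <- temp * 0.98                                 # cool down
  }
  pos
}

# Hypothetical four-person network: A-B, A-C, B-C, C-D
adj <- matrix(0, 4, 4)
adj[cbind(c(1, 1, 2, 3), c(2, 3, 3, 4))] <- 1
adj <- adj + t(adj)                                     # make it symmetric
pos <- fr_layout(adj)

# Draw the nodes and one segment per edge with base graphics
e <- which(upper.tri(adj) & adj != 0, arr.ind = TRUE)
plot(pos, pch = 19, xlab = "", ylab = "")
segments(pos[e[, 1], 1], pos[e[, 1], 2], pos[e[, 2], 1], pos[e[, 2], 2])
```

The returned two-column matrix can also be passed as the layout argument of igraph's plot function in place of the DrL layout.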

Finally, we use the plot function with various built-in parameters to make the graph more readable. For example, we can label the nodes, set the size of the nodes and edges, and color the graph and its components. Use the following code to see what the network looks like. The plot can be saved locally using the dev.copy function, with the image type and size passed as parameters:

plot(social_graph, vertex.size=10, vertex.color="green",
     vertex.label=NA,        # hide node labels (names)
     vertex.label.cex=0.5,   # label size, applies once labels are shown
     edge.arrow.size=0, edge.curved=TRUE,
     layout=layout)          # use the DrL layout computed earlier
# Copy the plot on screen to a 600 x 600 PNG file in the working directory
dev.copy(png, filename="3973-03-community.png",
         width=600, height=600)
dev.off()

With the preceding plot function, my network looks like the following figure. The node labels (people's names) have been disabled; they can be shown by removing the vertex.label=NA argument.

Summary

In this article, we discussed how to use the various functions implemented in the Rfacebook package and how to analyze our network. It covered important techniques that help in performing network analysis and pointed to the wide range of business problems that could be addressed with Facebook data, giving us a glimpse of its great potential for various analyses.
