





















































For demonstration purposes, it will be assumed that a fire attack was chosen as the optimal battle strategy. Throughout this segment, we will retrace the steps that led us to this decision. Meanwhile, we will make sure to organize and clarify our analyses so they can be easily communicated to others.
Suppose we determined our fire attack will take place 225 miles away in Anding, which houses 10,000 Wei soldiers. We will deploy 2,500 soldiers for a period of 7 days and assume that they are able to successfully execute the plans. Let us return to the beginning to develop this strategy with R in a clear and concise manner.
To begin our analysis, we must first launch R and set our working directory:
> #set the R working directory using setwd(dir)
> setwd("/Users/johnmquick/rBeginnersGuide/")
> #verify the location of your working directory
> getwd()
[1] "/Users/johnmquick/rBeginnersGuide/"
We prepared R to begin our analysis by launching the software and setting our working directory. At this point, you should be very comfortable completing these steps.
Next, we need to import our battle data into R and isolate the portion pertaining to past fire attacks:
> #read the contents of battleHistory.csv into an R variable
> #battleHistory contains data from 120 previous battles between the Shu and Wei forces
> battleHistory <- read.table("battleHistory.csv", TRUE, ",")
> #use the subset(data, ...) function to create a subset of the battleHistory dataset
> #that contains data only from battles in which the fire attack strategy was employed
> subsetFire <- subset(battleHistory, battleHistory$Method == "fire")
> #display the fire attack data subset
> subsetFire
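The subset step can be written in more than one way. The following is a minimal sketch, assuming a toy data frame named toyBattles whose values are invented for illustration; only the Method and Result column names come from our battle data.

```r
# Toy stand-in for battleHistory (values invented for illustration)
toyBattles <- data.frame(
  Method = c("fire", "ambush", "fire", "surround"),
  Result = c("Victory", "Defeat", "Defeat", "Victory")
)

# subset() evaluates the condition inside the data frame,
# so the column can be named directly without the $ prefix
fireA <- subset(toyBattles, Method == "fire")

# Equivalent row selection using logical indexing with square brackets
fireB <- toyBattles[toyBattles$Method == "fire", ]

# Display both results to compare them
fireA
fireB
```

Both forms keep only the rows whose Method is "fire". The subset() form is usually easier to read, while bracket indexing is the more general tool.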
We imported our dataset and then created a subset containing our fire attack data. This time, however, we used a slightly different function, called read.table(...), to import our external data into R.
Up to this point, we have always used the read.csv() function to import data into R. However, you should know that there are often many ways to accomplish the same objectives in R. For instance, read.table(...) is a generic data import function that can handle a variety of file types. While it accepts several arguments, the following three are required to properly import a CSV file, like the one containing our battle history data: file (the path to the file), header (whether the first row contains column names), and sep (the character that separates the values).
Using these arguments, we were able to import the data in battleHistory.csv into R. Since our file contained headings, we used a value of TRUE for the header argument, and because it is a comma-separated values file, we used "," for our sep argument:
> battleHistory <- read.table("battleHistory.csv", TRUE, ",")
This is just one example of how a different technique can be used to achieve a similar outcome in R. We will continue to explore new methods in our upcoming activities.
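To make the relationship between the two import functions concrete, here is a small sketch. It writes a tiny CSV to a temporary file (the contents are invented for illustration, and csvPath is a hypothetical name) and then reads it back three ways.

```r
# Write a tiny CSV to a temporary file (contents invented for illustration)
csvPath <- tempfile(fileext = ".csv")
writeLines(c("Method,Result", "fire,Victory", "ambush,Defeat"), csvPath)

# Positional form, as used in the text: file, header, sep
byPosition <- read.table(csvPath, TRUE, ",")

# The same call with the arguments named, which is easier to read
byName <- read.table(file = csvPath, header = TRUE, sep = ",")

# read.csv() wraps read.table() with header = TRUE and sep = "," as defaults
viaCsv <- read.csv(csvPath)

# Display the imported data
byName
```

Naming the arguments costs a few extra characters, but it removes any doubt about which value is the header flag and which is the separator.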
To begin our analysis, we will examine the summary statistics and correlations of our data. These will give us an overview of the data and inform our subsequent analyses:
> #generate a summary of the fire subset
> summaryFire <- summary(subsetFire)
> #display the summary
> summaryFire
Before calculating correlations, we will have to convert our nonnumeric data from the Method, SuccessfullyExecuted, and Result columns into numeric form.
> #represent categorical data numerically using as.numeric(data)
> #recode the Method column into Fire = 1
> numericMethodFire <- as.numeric(subsetFire$Method) - 1
> #recode the SuccessfullyExecuted column into N = 0 and Y = 1
> numericExecutionFire <- as.numeric(subsetFire$SuccessfullyExecuted) - 1
> #recode the Result column into Defeat = 0 and Victory = 1
> numericResultFire <- as.numeric(subsetFire$Result) - 1
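This recoding relies on the text columns being stored as factors. Since R 4.0.0, read.table() leaves text columns as character vectors by default, and calling as.numeric() on those returns NA with a warning. A minimal sketch of one way around this, assuming toy values invented for illustration, is to build the factor explicitly and state the level order:

```r
# Toy values standing in for subsetFire$SuccessfullyExecuted (invented)
executed <- c("Y", "N", "Y", "Y")

# Declare the level order explicitly so that N is level 1 and Y is level 2
executedFactor <- factor(executed, levels = c("N", "Y"))

# as.numeric() returns the level codes; subtracting 1 gives N = 0 and Y = 1
numericExecution <- as.numeric(executedFactor) - 1
numericExecution
# [1] 1 0 1 1
```

Stating the levels also documents the coding scheme, so the mapping does not depend on alphabetical order.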
With the Method, SuccessfullyExecuted, and Result columns coded into numeric form, let us now add them back into our fire dataset.
> #save the data in the numeric Method, SuccessfullyExecuted, and Result columns
> #back into the fire attack dataset
> subsetFire$Method <- numericMethodFire
> subsetFire$SuccessfullyExecuted <- numericExecutionFire
> subsetFire$Result <- numericResultFire
> #use cor(data) to calculate all of the correlations in the fire attack dataset
> cor(subsetFire)
Note that the warning message and NA values in our correlation output result from the fact that our Method column contains only a single value. Its standard deviation is therefore zero, and no correlation can be computed for it. This is irrelevant to our analysis and can be ignored.
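If we would rather avoid the NA values altogether, one option is to leave constant columns out before calling cor(). The following is a sketch under stated assumptions: toyFire is a hypothetical data frame with invented values, and varies is a helper name introduced here for illustration.

```r
# Toy numeric data (invented); Method is constant, as in the fire subset
toyFire <- data.frame(
  Method = c(1, 1, 1, 1),
  SuccessfullyExecuted = c(1, 0, 1, 1),
  Result = c(1, 0, 0, 1)
)

# Flag the columns that take more than one distinct value
varies <- sapply(toyFire, function(column) length(unique(column)) > 1)

# Correlate only the columns that vary; drop = FALSE keeps a data frame
corFire <- cor(toyFire[, varies, drop = FALSE])
corFire
```

The Method column carries no information within a single-strategy subset, so leaving it out changes nothing about the remaining correlations.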
Initially, we calculated summary statistics for our fire attack dataset using the summary(object) function. From this information, we can derive the following useful insights about our past battles:
Next, we recoded the text values in our dataset's Method, SuccessfullyExecuted, and Result columns into numeric form. After adding the data from these variables back into our original dataset, we were able to calculate all of its correlations. This allowed us to learn even more about our past battle data:
The insights gleaned from our summary statistics and correlations put us in a prime position to begin developing our regression model.