





















































The ever-popular line chart, or line graph, depicts relationships as a continuous series of connected data points. Line charts are particularly useful for visualizing specific values and trends over time. Just as a hand-drawn line chart is an extension of a scatterplot, a line chart in R is created using an extended form of the plot(...) function. Let us explore how to extend the plot(...) function to create line charts in R:
> #create a line chart that depicts the durations of past fire attacks
> #get the data to be used in the chart
> lineFireDurationDataX <- c(1:30)
> lineFireDurationDataY <- subsetFire$DurationInDays
> #customize the chart
> lineFireDurationMain <- "Duration of Past Fire Attacks"
> lineFireDurationLabX <- "Battle Number"
> lineFireDurationLabY <- "Duration in Days"
> #use the type argument to connect the data points with a line
> lineFireDurationType <- "o"
> #use plot(...) to create and display the line chart
> plot(x = lineFireDurationDataX, y = lineFireDurationDataY,
main = lineFireDurationMain, xlab = lineFireDurationLabX,
ylab = lineFireDurationLabY, type = lineFireDurationType)
We expanded our use of the plot(...) function to generate a line chart and encountered a new data notation in the process. Let us review these features.
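In the plot(...) function, the type argument determines what kind of line, if any, should be used to connect a chart's data points. The type argument receives a single character value: "p" for points only, "l" for lines only, "b" for both points and lines, "c" for the lines of "b" with the points left out, "o" for lines that overplot the points, "h" for histogram-like vertical lines, "s" or "S" for stair steps, and "n" for no plotting.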
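As a rough illustration of how the type argument changes a chart, the following sketch draws the same data six times, once for each of the type values "p", "l", "b", "o", "h", and "s". The durations and all variable names here are hypothetical and are not taken from our battle data.

```r
# Hypothetical durations, for illustration only (not the battle data)
exampleDurations <- c(3, 7, 2, 9, 4, 6, 1, 8)
exampleBattleNumbers <- seq_along(exampleDurations)

# A subset of the values accepted by the type argument
exampleTypes <- c("p", "l", "b", "o", "h", "s")

# Arrange six small charts in a 2 x 3 grid, then restore the old layout
oldPar <- par(mfrow = c(2, 3))
for (currentType in exampleTypes) {
  plot(x = exampleBattleNumbers, y = exampleDurations,
       main = paste0("type = \"", currentType, "\""),
       xlab = "Battle Number", ylab = "Duration in Days",
       type = currentType)
}
par(oldPar)
```

Comparing the panels side by side makes it easier to pick a type value before committing to one in a finished chart.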
Our chart, which represented the duration of past fire attacks, featured a line that overlapped the plotted points. First, we defined our desired line type in an R variable:
> lineFireDurationType <- "o"
Then the type argument was placed within our plot(...) function to generate the line chart:
> plot(lineFireDurationDataX, lineFireDurationDataY,
main = lineFireDurationMain, xlab = lineFireDurationLabX,
ylab = lineFireDurationLabY,
type = lineFireDurationType)
You may have noticed that we specified a vector for the x-axis data in our plot(...) function.
> lineFireDurationDataX <- c(1:30)
This vector used number-colon-number notation. Essentially, this notation has the effect of enumerating a range of values that lie between the number that precedes the colon and the number that follows it. To do so, it adds one to the beginning value until it reaches a final value that is equal to or less than the number that comes after the colon. For example, the code > 14:21 would yield eight whole numbers, beginning with 14 and ending with 21, as follows:
[1] 14 15 16 17 18 19 20 21
Furthermore, the code > 14.2:21 would yield seven values, beginning with 14.2 and ending with 20.2, as follows:
[1] 14.2 15.2 16.2 17.2 18.2 19.2 20.2
Number-colon-number notation is a useful way to enumerate a series of values without having to type each one individually. It can be used in any circumstance where a series of values is acceptable input into an R function.
Number-colon-number notation can also enumerate values from high to low. For instance, 21:14 would yield a vector of values beginning with 21 and ending with 14, decreasing by one at each step.
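The behavior of number-colon-number notation described here can be checked directly in the R console. The following is a minimal sketch; the variable names are hypothetical, and the closing call to seq(...), a related base R function that accepts steps other than one, is included only for comparison.

```r
# Ascending whole numbers from 14 to 21
ascendingRange <- 14:21
# Non-integer start: steps of one from 14.2, stopping before exceeding 21
fractionalRange <- 14.2:21
# Descending whole numbers from 21 to 14
descendingRange <- 21:14

length(ascendingRange)    # 8
length(fractionalRange)   # 7
rev(descendingRange)      # same values as 14:21

# seq(...) allows a step size other than one
seq(from = 14, to = 21, by = 2)   # 14 16 18 20
```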
Since we do not have exact dates or other identifying information for our 30 past battles, we simply enumerated the numbers 1 through 30 on the x-axis. This had the effect of assigning a generic identification number to each of our past battles, which in turn allowed us to plot the duration of each battle on the y-axis.
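If the number of battles were not known in advance, one possible alternative to typing 1:30 would be to derive the identification numbers from the data itself. The sketch below assumes a short, hypothetical vector of durations in place of subsetFire$DurationInDays and uses the base R function seq_along(...).

```r
# Hypothetical durations standing in for subsetFire$DurationInDays
exampleDurationDataY <- c(5, 12, 3, 8, 10)

# seq_along(...) returns 1, 2, ... up to the length of its argument
exampleDurationDataX <- seq_along(exampleDurationDataY)

exampleDurationDataX   # 1 2 3 4 5
```

With this approach the x-axis values stay in step with the data if battles are added or removed.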
A box plot is a useful way to convey a collection of summary statistics for a dataset. This type of graph depicts a dataset's minimum and maximum, as well as its lower quartile, median, and upper quartile, in a single diagram. Let us look at how box plots are created in R:
> #create a box plot that depicts the number of soldiers required to launch a fire attack
> #get the data to be used in the plot
> boxplotFireShuSoldiersData <- subsetFire$ShuSoldiers
> #customize the plot
> boxPlotFireShuSoldiersLabelMain <- "Number of Soldiers Required to Launch a Fire Attack"
> boxPlotFireShuSoldiersLabelX <- "Fire Attack Method"
> boxPlotFireShuSoldiersLabelY <- "Number of Soldiers"
> #use boxplot(...) to create and display the box plot
> boxplot(x = boxplotFireShuSoldiersData,
main = boxPlotFireShuSoldiersLabelMain,
xlab = boxPlotFireShuSoldiersLabelX,
ylab = boxPlotFireShuSoldiersLabelY)
> #create a box plot that compares the number of soldiers required across the battle methods
> #get the data formula to be used in the plot
> boxplotAllMethodsShuSoldiersData <- battleHistory$ShuSoldiers ~ battleHistory$Method
> #customize the plot
> boxPlotAllMethodsShuSoldiersLabelMain <- "Number of Soldiers Required by Battle Method"
> boxPlotAllMethodsShuSoldiersLabelX <- "Battle Method"
> boxPlotAllMethodsShuSoldiersLabelY <- "Number of Soldiers"
> #use boxplot(...) to create and display the box plot
> boxplot(formula = boxplotAllMethodsShuSoldiersData,
main = boxPlotAllMethodsShuSoldiersLabelMain,
xlab = boxPlotAllMethodsShuSoldiersLabelX,
ylab = boxPlotAllMethodsShuSoldiersLabelY)
We just created two box plots using R's boxplot(...) function, one with a single box and one with multiple boxes.
We started by generating a single box plot that was composed of a dataset, main title, and x and y labels. The basic format for a single box plot is as follows:
boxplot(x = dataset)
The x argument contains the data to be plotted. Technically, only x is required to create a box plot, although you will often include additional arguments. Our boxplot(...) function used the main, xlab, and ylab arguments to display text on the plot, as shown:
> boxplot(x = boxplotFireShuSoldiersData,
main = boxPlotFireShuSoldiersLabelMain,
xlab = boxPlotFireShuSoldiersLabelX,
ylab = boxPlotFireShuSoldiersLabelY)
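To see the numbers behind a single box, the summary statistics can be computed without drawing anything. The sketch below uses hypothetical soldier counts rather than our battle data. It relies on two base R features: fivenum(...), and the plot = FALSE argument of boxplot(...), which returns the statistics instead of displaying them. Note that R builds the box from hinges, which approximate the lower and upper quartiles.

```r
# Hypothetical soldier counts, for illustration only
exampleSoldiers <- c(20000, 35000, 15000, 50000, 28000, 42000, 31000)

# fivenum(...) returns minimum, lower hinge, median, upper hinge, maximum
exampleSummary <- fivenum(exampleSoldiers)
exampleSummary

# boxplot(...) returns the statistics it would draw; plot = FALSE skips drawing
exampleBoxStats <- boxplot(x = exampleSoldiers, plot = FALSE)
exampleBoxStats$stats   # whisker ends, hinges, and median
exampleBoxStats$out     # points that would be drawn individually as outliers
```

When a dataset contains extreme values, the whisker ends in stats can differ from the minimum and maximum, because such points are reported in out instead.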
Next, we created a multiple box plot that compared the number of Shu soldiers deployed by each battle method. The main, xlab, and ylab arguments remained from our single box plot; however, our multiple box plot used the formula argument instead of x. Here, a formula allows us to break a dataset down into separate groups, thus yielding multiple boxes.
The basic format for a multiple box plot is as follows:
boxplot(formula = dataset ~ group)
In our case, we took our entire Shu soldier dataset (battleHistory$ShuSoldiers) and separated it by battle method (battleHistory$Method):
> boxplotAllMethodsShuSoldiersData <- battleHistory$ShuSoldiers ~ battleHistory$Method
Once incorporated into the boxplot(...) function, this formula resulted in a plot that contained four distinct boxes, one for each battle method: ambush, fire, head to head, and surround.
> boxplot(formula = boxplotAllMethodsShuSoldiersData,
main = boxPlotAllMethodsShuSoldiersLabelMain,
xlab = boxPlotAllMethodsShuSoldiersLabelX,
ylab = boxPlotAllMethodsShuSoldiersLabelY)
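The grouping performed by a formula can also be inspected without drawing the plot. The following sketch assumes a small, hypothetical data frame in place of battleHistory. It also uses the data argument of boxplot(...), which lets the formula refer to columns by name alone rather than with the $ notation.

```r
# Hypothetical stand-in for battleHistory, for illustration only
exampleHistory <- data.frame(
  ShuSoldiers = c(10000, 15000, 30000, 45000, 20000, 25000, 50000, 60000),
  Method = c("ambush", "ambush", "fire", "fire",
             "head to head", "head to head", "surround", "surround")
)

# With the data argument, the formula can use bare column names
exampleGroupedStats <- boxplot(formula = ShuSoldiers ~ Method,
                               data = exampleHistory, plot = FALSE)

exampleGroupedStats$names   # one entry per box, in plotting order
exampleGroupedStats$n       # number of observations in each group
```

Each entry in names corresponds to one box, so a four-method dataset such as ours yields four boxes.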