This article explores more advanced functions for customizing the layout of heat maps. The main focus lies on the use of different color palettes, but we will also cover other useful features, such as cell notes and column side colors.
To ensure that our heat maps look good in any situation, we will make use of different color palettes in this recipe, and we will even learn how to create our own.
Further, we will add some more extras to our heat maps including visual aids such as cell note labels, which will make them even more useful and accessible as a tool for visual data analysis.
The following image shows a heat map with cell notes and an alternative color palette created from the arabidopsis_genes.csv data set:
Download the 5644OS_03_01.r script and the arabidopsis_genes.csv data set from your account at http://www.packtpub.com and save them to your hard drive.
I recommend that you save the script and data file to the same folder on your hard drive. If you execute the script from a different location to the data file, you will have to change the current R working directory accordingly.
The script will check automatically if any additional packages need to be installed in R.
Execute the following code in R via the 5644OS_03_01.r script and take a look at the PDF file custom_heatmaps.pdf that will be created in the current working directory:
### loading packages
if (!require("gplots")) {
install.packages("gplots", dependencies = TRUE)
library(gplots)
}
if (!require("RColorBrewer")) {
install.packages("RColorBrewer", dependencies = TRUE)
library(RColorBrewer)
}
### reading in data
gene_data <- read.csv("arabidopsis_genes.csv")
row_names <- gene_data[,1]
gene_data <- data.matrix(gene_data[,2:ncol(gene_data)])
rownames(gene_data) <- row_names
### setting heatmap.2() default parameters
heat2 <- function(...) heatmap.2(gene_data,
tracecol = "black",
dendrogram = "column",
Rowv = NA,
trace = "none",
margins = c(8,10),
density.info = "density", ...)
pdf("custom_heatmaps.pdf")
### 1) customizing colors
# 1.1) in-built color palettes
heat2(col = terrain.colors(n = 1000),
main = "1.1) Terrain Colors")
# 1.2) RColorBrewer palettes
heat2(col = brewer.pal(n = 9, "YlOrRd"),
main = "1.2) Brewer Palette")
# 1.3) creating own color palettes
my_colors <- c(y1 = "#F7F7D0",
y2 = "#FCFC3A",
y3 = "#D4D40D",
b1 = "#40EDEA",
b2 = "#18B3F0",
b3 = "#186BF0",
r1 = "#FA8E8E",
r2 = "#F26666",
r3 = "#C70404")
heat2(col = my_colors,
main = "1.3) Own Color Palette")
my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 1000)
heat2(col = my_palette, main = "1.3) ColorRampPalette")
# 1.4) gray scale
heat2(col = gray(level = (0:100)/100),
main ="1.4) Gray Scale")
### 2) adding cell notes
fold_change <- 2^gene_data
rounded_fold_changes <- round(fold_change, 2)
heat2(cellnote = rounded_fold_changes,
notecex = 0.5,
notecol = "black",
col = my_palette,
main = "2) Cell Notes")
### 3) adding column side colors
heat2(ColSideColors = c("red", "gray", "red",
rep("green",13)),
main = "3) ColSideColors")
dev.off()
Primarily, we will be using read.csv() and heatmap.2() to read data into R and construct our heat maps. In this recipe, however, we will focus on advanced features to enhance our heat maps, such as customizing color and other visual elements:
gene_data <- read.csv("arabidopsis_genes.csv")
row_names <- gene_data[,1]
gene_data <- data.matrix(gene_data[,2:ncol(gene_data)])
rownames(gene_data) <- row_names
heat2 <- function(...) heatmap.2(gene_data,
tracecol = "black",
dendrogram = "column",
Rowv = NA,
trace = "none",
margins = c(8,10),
density.info = "density", ...)
So, each time we call our newly defined heat2() function, it will behave like the heatmap.2() function with these defaults already set, plus any additional arguments that we pass along. We also set the tracecol parameter to "black" to better distinguish the density plot in the color key from the background.
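To see how the ... forwarding in such a wrapper behaves, here is a minimal sketch in base R. The functions report_args() and with_defaults() are hypothetical stand-ins for heatmap.2() and heat2(); they only report which arguments arrive, so no plotting package is needed.

```r
# Hypothetical stand-in for heatmap.2(): returns the arguments it received
report_args <- function(x, trace = "column", ...) {
  c(list(trace = trace), list(...))
}

# Hypothetical wrapper: fixes one default and forwards the rest through `...`
with_defaults <- function(...) report_args(1:3, trace = "none", ...)

received <- with_defaults(col = "red", main = "Demo")
print(names(received))

# Supplying an argument that the wrapper already fixes raises an error,
# because the same formal argument would be matched twice
clash <- tryCatch(with_defaults(trace = "row"),
                  error = function(e) "error")
print(clash)
```

The practical consequence for a wrapper like heat2() is that only parameters not already fixed inside the wrapper can be passed at call time.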
So let us make use of the terrain.colors color palette now, which will give us a nice color transition from green over yellow to rose:
heat2(col = terrain.colors(n = 1000),
main = "1.1) Terrain Colors")
Larger values of the parameter n add more colors to the palette, which makes the transition smoother. A value of 1000 for the n parameter should be more than sufficient to make the transition between the individual colors indistinguishable to the human eye.
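The effect of n can be checked numerically without drawing anything. The following sketch uses only base R; max_step() is a hypothetical helper introduced here that measures the largest jump in any RGB channel between two neighboring palette entries.

```r
coarse <- terrain.colors(n = 12)
smooth <- terrain.colors(n = 1000)

# Hypothetical helper: largest per-channel difference between neighbors
max_step <- function(pal) max(abs(diff(t(col2rgb(pal)))))

print(c(coarse = max_step(coarse), smooth = max_step(smooth)))
```

A smaller maximum step means neighboring colors are closer together, which is what we perceive as a smoother color key.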
The following image shows a side-by-side comparison of the heat.colors and terrain.colors color palettes using a different number of color shades:
Further, it is also possible to reverse the direction of the color transition. For example, if we want to have a heat.colors transition from yellow to red instead of red to yellow in our heat map, we could simply define a reverse function:
rev_heat.colors <- function(x) rev(heat.colors(x))
heat2(col = rev_heat.colors(500))
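As a quick check of what the reversal does, this base R sketch compares a small palette with its reversed counterpart. The palette size of 5 is an arbitrary choice for illustration.

```r
n <- 5
forward  <- heat.colors(n)       # starts at red
backward <- rev(heat.colors(n))  # same colors, opposite order

print(forward)
print(backward)
```

The reversed palette contains exactly the same colors, so only the mapping between low and high values in the heat map changes.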
heat2(col = brewer.pal(n = 9, "YlOrRd"),
main = "1.2) Brewer Palette")
The following image gives you a good overview of all the different color palettes that are available from the RColorBrewer package:
The most convenient way to assign new colors to a color palette is using hex colors (hexadecimal colors). Many different online tools are freely available that allow us to obtain the necessary hex codes. A great example is color picker (http://www.colorpicker.com), which allows us to choose from a rich color table and provides us with the corresponding hex codes.
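If you are unsure what a hex code encodes, base R can translate it. This small sketch converts one of the hex colors used in this recipe into its red, green, and blue components and back again; the variable names are chosen for illustration only.

```r
hex <- "#F7F7D0"

# col2rgb() returns a matrix with rows red, green, blue (0 to 255)
channels <- col2rgb(hex)[, 1]
print(channels)

# rgb() builds the hex code back from the three components
rebuilt <- rgb(channels[["red"]], channels[["green"]], channels[["blue"]],
               maxColorValue = 255)
print(rebuilt)
```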
Once we gather all the hexadecimal codes for the colors that we want to use for our color palette, we can assign them to a variable as we have done before with the explicit color names:
my_colors <- c(y1 = "#F7F7D0",
y2 = "#FCFC3A",
y3 = "#D4D40D",
b1 = "#40EDEA",
b2 = "#18B3F0",
b3 = "#186BF0",
r1 = "#FA8E8E",
r2 = "#F26666",
r3 = "#C70404")
heat2(col = my_colors,
main = "1.3) Own Color Palette")
This is a very handy approach for creating a color key with very distinct colors. However, the downside of this method is that we have to provide a lot of different colors if we want to create a smooth color gradient; we have used 1000 different colors for the terrain.colors() palette to get a smooth transition in the color key!
my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 1000)
heat2(col = my_palette, main = "1.3) ColorRampPalette")
In this case, it is more convenient to use discrete color names over hex colors, since we are using the colorRampPalette() function to create a gradient and do not need all the different shades of a particular color.
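To make the two-step nature of colorRampPalette() more visible, the sketch below separates creating the palette function from calling it. The name make_palette is a hypothetical choice for this example.

```r
# Step 1: colorRampPalette() returns a function, not a vector of colors
make_palette <- colorRampPalette(c("blue", "yellow", "red"))

# Step 2: calling that function with n yields n interpolated hex colors
anchors  <- make_palette(3)     # just the three anchor colors
gradient <- make_palette(1000)  # a smooth gradient through the anchors

print(anchors)
print(length(gradient))
```

Keeping the palette function around makes it easy to regenerate the same gradient with a different number of shades later.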
The level parameter of the gray() function takes a vector with values between 0 and 1 as an argument, where 0 represents black and 1 represents white. For a smooth gradient, we use a vector of 101 equally spaced shades of gray ranging from 0 to 1:
heat2(col = gray(level = (0:100)/100),
main ="1.4) Gray Scale")
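The gray levels can be inspected directly in base R. This sketch builds the vector from the main script and looks at its length and its two end points.

```r
shades <- gray(level = (0:100) / 100)

print(length(shades))       # number of shades in the palette
print(shades[c(1, 101)])    # darkest and lightest entries
```

Note that 0:100 contains both end points, so dividing by 100 yields 101 values rather than 100.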
We can make use of the same color palettes for the levelplot() function too. It works in a similar way as it did for the heatmap.2() function that we are using in this recipe. However, inside the levelplot() function call, we must use col.regions instead of the simple col, so that we can include a color palette argument.
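A minimal sketch of this, assuming the lattice package is installed (it normally ships with R, but the code checks first). The matrix m holds made-up values for illustration and is not the Arabidopsis data.

```r
pal <- colorRampPalette(c("blue", "yellow", "red"))(100)

# Made-up 4 x 5 matrix, used only to have something to plot
m <- matrix(seq(-2, 2, length.out = 20), nrow = 4)

p <- NULL
if (requireNamespace("lattice", quietly = TRUE)) {
  # col.regions takes the palette that heatmap.2() receives via col
  p <- lattice::levelplot(m, col.regions = pal)
  # print(p) would draw the plot on the active graphics device
}
```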
As we recall, the data we read from arabidopsis_genes.csv represents log2 ratios of sample and reference gene expression levels. Let us now calculate the fold changes of the gene expression levels and display them, rounded to two digits after the decimal point, as cell notes on our heat map:
fold_change <- 2^gene_data
rounded_fold_changes <- round(fold_change, 2)
heat2(cellnote = rounded_fold_changes,
notecex = 0.5,
notecol = "black",
col = rev_heat.colors,
main = "Cell Notes")
The notecex parameter controls the size of the cell notes. Its default value is 1; values between 0 and 1 make the font smaller, whereas values larger than 1 make it larger. Here, we decreased the font size of the cell notes by 50 percent to fit them into the cell boundaries. Also, we want to display the cell notes in black to have a nice contrast to the colored background; this is controlled by the notecol parameter.
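The conversion from log2 ratios to fold changes is easy to verify on a tiny example. The matrix below contains made-up values with hypothetical gene and sample names; it is not taken from arabidopsis_genes.csv.

```r
# Made-up log2 ratios for two genes in two samples
log2_ratios <- matrix(c(-1, 0, 1, 1.5), nrow = 2,
                      dimnames = list(c("gene_a", "gene_b"),
                                      c("s1", "s2")))

fold_change <- 2^log2_ratios       # undo the log2 transformation
rounded <- round(fold_change, 2)   # two digits after the decimal point
print(rounded)
```

A log2 ratio of 0 corresponds to a fold change of 1 (no change), while -1 and 1 correspond to a halving and a doubling, respectively. The result keeps the dimensions of the input, which is what the cellnote parameter needs.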
heat2(ColSideColors = c("red", "gray", "red", rep("green", 13)),
main = "ColSideColors")
You can see in the following image what the column side colors look like when we include the ColSideColors argument as shown previously:
Attentive readers may have noticed that the order of colors in the column color box slightly differs from the order of colors we passed as a vector to ColSideColors. We see red two times next to each other, followed by a green and a gray box. This is due to the fact that the columns of our heat map have been reordered by the hierarchical clustering algorithm.
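The reordering can be mimicked with plain vector indexing. In this sketch, col_order is a hypothetical column order chosen for illustration; in a real heat map the order comes from the clustering, and as far as I know heatmap.2() returns it invisibly as the colInd element of its result.

```r
# Colors in the order of the original data columns
side_colors <- c("red", "gray", "red", "green")

# Hypothetical column order produced by clustering
col_order <- c(1, 3, 4, 2)

# Each color travels with its column to the new position
displayed <- side_colors[col_order]
print(displayed)
```

Because every color stays attached to its column, the side colors remain a reliable label for the columns even though their left-to-right order changes.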