Many modelers have switched to R for its flexible graphics and visualization capabilities, and the best way to master graphs in R is to learn base graphics. Once you understand how to make and customize basic plots such as scatterplots, box plots and bar plots, as explained below, you will have the recipe to make almost any plot you want.
Let's begin by making a very basic plot. The code to plot something in R, say, the numbers from 1 to 10, looks like this:
plot(1:10) # make a scatterplot of numbers 1 to 10
The principal function for creating graphs in R is the ‘plot()’ function. It can take a number of arguments that decide what is drawn on the graph and how it looks. However, not all of the arguments are mandatory; in fact, the only argument it needs to make a valid plot is a single number or a vector of numbers. In the plot() call above, we supplied just that: a numeric vector (1:10). With only this information, R makes a scatterplot by placing the supplied values on the Y-axis and automatically creating an X-axis, labelled ‘Index’, from their positions (1, 2, 3, ...). Here is how it looks:
That looks ‘OK’, but a number of features are missing, such as a title, a legend and customised axis labels. In the sections that follow, we will learn how to add all of these and also how to make other types of plots such as bar plots and box plots.
What does one look for in a scatterplot? Scatterplots help to visualise whether any (linear) relationship exists between the x and y variables, or whether any groups or clusters exist amongst the data points.
To do that, we generally need to supply both the X and Y axis data to the plot. Let's see how to do this with the built-in ‘cars’ dataset, which contains the ‘speed’ and stopping distance (‘dist’) of each car. We will also learn how to add some of the important plot elements such as the title, axis labels, plotting symbols and colors.
Note that the methods and arguments we use to add these elements apply not only to scatterplots but also to the other basic plot types, such as box plots and line graphs.
1. Basic scatterplot with x and y
- x axis values: x
- y axis values: y
plot(x=cars$speed, y=cars$dist) # basic scatterplot with x and y
The x and y axes get default labels taken from the expressions supplied as the x and y arguments to the plot function (here, cars$speed and cars$dist).
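The same scatterplot can also be drawn with the formula interface of plot(), which takes y ~ x plus a data argument; the axis labels then default to the bare column names. A minimal sketch using the built-in cars data:

```r
# Formula interface: y ~ x, with the columns looked up in `data`
plot(dist ~ speed, data = cars)  # axis labels default to "speed" and "dist"

# Inspect what was plotted: 50 rows with columns speed and dist
str(cars)
```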
2. Adding title, subtitle, axis labels and plotting character
- plot’s main title: main
- plot’s sub title: sub
- x axis title: xlab
- y axis title: ylab
- plotting character (symbol): pch
# scatterplot with title, axis labels etc.
plot(x=cars$speed, y=cars$dist, main="Dist vs Speed", sub="from the 'cars' dataset", xlab="Speed", ylab="Dist", pch='*')
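Besides a single character such as '*', pch also accepts the integer codes 0 to 25 for R's built-in symbols; codes 21 to 25 are filled shapes whose fill color comes from the bg argument. The sketch below simply draws all 26 symbols with their codes so you can pick one; the layout is an illustrative choice, not a standard chart.

```r
# Draw each built-in plotting symbol (pch 0 to 25) with its code above it
symbols <- 0:25
plot(symbols, rep(1, length(symbols)), pch = symbols, cex = 2,
     bg = "yellow",                 # fill color, used only by pch 21 to 25
     ylim = c(0.5, 1.5), axes = FALSE, xlab = "", ylab = "",
     main = "Plotting symbols (pch)")
text(symbols, 1.2, labels = symbols, cex = 0.8)  # label each symbol with its code
```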
3. Changing color
Next, we extend the previous plot to change the colors of the title, points, subtitle, axis labels and axis annotations.
- color of symbols: col
- color of main title: col.main
- color of axis annotations (tick labels): col.axis
- color of axis labels: col.lab
- color of subtitle: col.sub
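- background color: bg. In plot() this fills the open symbols pch = 21 to 25; to color the whole plot background, set it with par(bg = ...) before plotting.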
plot(x=cars$speed, y=cars$dist, main="Dist vs Speed", sub="from the 'cars' dataset", xlab="Speed", ylab="Dist", pch='*', col="blue", col.main="red", col.axis="orange", col.lab="darkgreen", col.sub="grey")
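Because bg inside plot() only fills the symbols pch = 21 to 25, one way to color the whole background is to set it with par() before plotting. A minimal sketch, assuming you want the previous graphical settings restored afterwards:

```r
old_par <- par(bg = "lightyellow")   # par() invisibly returns the old values
plot(x = cars$speed, y = cars$dist, main = "Dist vs Speed",
     xlab = "Speed", ylab = "Dist",
     pch = 21, col = "blue",         # pch 21 is a circle with a separate fill
     bg = "orange")                  # bg here fills the circles
par(old_par)                         # restore the previous background setting
```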
4. Changing size
The arguments for changing the size of plot elements are similar to those that change color, but use ‘cex’ instead of ‘col’ in their names. The value passed to a cex argument is a magnification factor relative to the base size. For example, in the code below, the symbols, title, axis annotations, axis labels and subtitle are all drawn at 1.5 times the base size.
- size of symbols: cex
- size of main title: cex.main
- size of axis annotations (tick labels): cex.axis
- size of axis labels: cex.lab
- size of subtitle: cex.sub
plot(x=cars$speed, y=cars$dist, main="Dist vs Speed", sub="from the 'cars' dataset", xlab="Speed", ylab="Dist", pch='*', col="blue", col.main="red", col.axis="orange", col.lab="darkgreen", col.sub="grey", cex=1.5, cex.main=1.5 , cex.axis=1.5, cex.lab=1.5, cex.sub=1.5)
5. Adding a line of best fit, changing line type and line width
Once the scatterplot is drawn, the function abline() can draw a line of best fit when it is given a fitted linear model, such as one returned by lm(). Note that abline() adds to an existing plot, so it can only be called after the scatterplot has been drawn.
In the code below, the axes are turned off with axes=FALSE. After adding the line of best fit, we add a grid and a box around the plot using the grid() and box() functions respectively. The lwd and lty arguments used here can also be used when drawing line graphs with the plot() function.
plot(x=cars$speed, y=cars$dist, main="Dist vs Speed", sub="from the 'cars' dataset", xlab="Speed", ylab="Dist", pch='*', col="blue", axes=FALSE) # turn off plotting axes
abline(lm(dist ~ speed, data=cars)) # line of best fit
grid() # add grid
box(col="red", lwd=3, lty=3) # draw a box around the graph.
In the call to box(), the line type (lty) and line width (lwd) arguments set the style of the box drawn around the plot.
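abline() can also draw a line from an explicit intercept (a) and slope (b), which makes it clear what "a line of best fit from a linear model" means: the model's two coefficients. The sketch below extracts them with coef() and passes them to abline() together with the lty and lwd arguments; lty accepts integer codes or names such as "dashed".

```r
fit <- lm(dist ~ speed, data = cars)    # model dist as a linear function of speed
coef(fit)                                # intercept and slope of the fitted line

plot(dist ~ speed, data = cars, pch = 19, col = "blue",
     main = "Dist vs Speed with fitted line")
# Same line as abline(fit): y = a + b * x using the fitted coefficients
abline(a = coef(fit)[1], b = coef(fit)[2],
       lty = "dashed",   # equivalent to lty = 2
       lwd = 2)          # twice the default line width
```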
6. Adding Legend
Once the plot has been drawn, use the legend() function to add a legend to it. It accepts a number of arguments, but the key ones needed to style the legend are shown below:
- location on plot goes as the first argument. This can be x and y coordinates or one of the following keywords: “bottomright”, “bottom”, “bottomleft”, “left”, “topleft”, “top”, “topright”, “right” and “center”.
- inset distance from the margins, as a fraction of the plot region: inset
- title of legend: title
- legend text: legend. If there is more than one entry, pass them as a character vector.
- colors of the filled boxes next to the legend text: fill. If multiple colors are needed, pass them as a vector in the same order as the legend text.
- whether to lay the legend out horizontally: horiz
legend("bottomright", inset=.01, title="Distance vs Speed", legend=c("dist and speed"), fill="blue", horiz=TRUE) # adding legend once plot is drawn
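The fill argument draws filled boxes, which suits bar charts best; for points and lines, legend() also accepts pch, col, lty and lwd so that each key matches what was actually drawn. A sketch with three entries, assuming a linear fit and a lowess smoother (neither is part of the example above) as the two lines:

```r
plot(dist ~ speed, data = cars, pch = 19, col = "blue", main = "Dist vs Speed")
abline(lm(dist ~ speed, data = cars), col = "black", lwd = 2)         # linear fit
lines(lowess(cars$speed, cars$dist), col = "red", lty = 2, lwd = 2)   # smoother

# NA in pch or lty suppresses the symbol or line for that entry
lg <- legend("topleft", inset = 0.01,
             legend = c("observations", "linear fit", "lowess"),
             col = c("blue", "black", "red"),
             pch = c(19, NA, NA), lty = c(NA, 1, 2), lwd = c(1, 2, 2))
```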
7. Text Annotations
Writing on plot’s margins
The function for writing in the margins is mtext(), where the ‘m’ is presumably an abbreviation for ‘margin’. mtext() takes a number of arguments, the most important of which are “text”, which takes the margin text itself, and “side”, which specifies the side to write on: 1=bottom, 2=left, 3=top, 4=right.
plot(1:10, 1:10, main = "Sample plot to demonstrate margin text") # main plot
mtext("margin text at bottom", side=1, col=1) # bottom margin text
mtext("margin text at left", side=2, col=2)
mtext("margin text at top", side=3, col=3)
mtext("margin text at right", side=4, col=4)
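mtext() also takes a line argument giving the margin line to write on (0 is right next to the plot region, larger values move outwards) and an adj argument for placement along that side (0 = left or bottom end, 1 = right or top end). A short sketch; the particular line values are illustrative and fit within R's default margins:

```r
plot(1:10, 1:10, main = "Margin text on different lines")
# line counts outwards from the plot region; adj = 0 left-aligns, adj = 1 right-aligns
mtext("line 0, left", side = 3, line = 0, adj = 0)
mtext("line 2, right", side = 3, line = 2, adj = 1, col = "blue")
mtext("bottom, line 4", side = 1, line = 4, col = "red")  # default bottom margin is 5.1 lines
```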
Writing at particular spots inside the plot
Use the text() function to add text inside a plot at specific points of your choice. This gives a lot of control over where the text is placed, and it also makes it possible to write text next to the plotted points themselves. The ‘adj’ parameter moves the attaching point (hinge point) of the text away from its default position at the center of the text. It takes values from 0 to 1, where 0 means the text starts at the given point (left-justified) and 1 means it ends there (right-justified); an optional second value does the same vertically (0 = bottom, 1 = top).
plot(1:10, 1:10, main = "text(...) examples\n~~~~~~~~~~~~~~")
points(c(6,2), c(2,1), pch = 3, cex = 4, col = "red") # add two red +'s
text(6, 2, "the text is CENTERED around (x,y) = (6,2) by default", cex = .8) # text at (6,2)
text(2, 1, "or Left/Bottom - JUSTIFIED at (2,1) by 'adj = c(0,0)'", adj = c(0,0), cex=0.8) # text at (2,1), with hinge point of text moved to 0,0 so the text doesn't disappear off the plot
text(x=(1:10)-0.25, y=1:10, 1:10, col=1:10) # add numbers as text near the points
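Instead of shifting x by hand as in the last line above, text() can offset each label from its point with the pos argument (1 = below, 2 = left, 3 = above, 4 = right). A sketch that labels a few points with their row names, using the built-in mtcars dataset purely for illustration:

```r
cars_subset <- head(mtcars, 8)   # first 8 cars, to keep the labels readable
plot(cars_subset$wt, cars_subset$mpg, pch = 19,
     xlim = range(cars_subset$wt) + c(0, 0.8),  # leave room for labels on the right
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Labelling points with text(pos = ...)")
# pos = 4 places each label to the right of its point; cex shrinks the text
text(cars_subset$wt, cars_subset$mpg, labels = rownames(cars_subset),
     pos = 4, cex = 0.7)
```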
8. Saving plot to file
R plots can be saved as various image file formats such as .png, .svg, .pdf, .tiff, .bmp and .jpeg. The procedure to do it is as follows:
- Open the plotting device using the respective function: png() for a .png file, svg() for a .svg file, and so on. Each of these functions takes the filename as its first argument and, optionally, the width and height. For bitmap formats (png(), jpeg(), bmp() and tiff()) these are in pixels and default to 480 × 480; for pdf() and svg() they are in inches and default to 7 × 7.
- Draw the plot
- Turn off the plotting device using dev.off()
# Saving the plot to file
png("path/to/fileDir/imagefilename.png", height=300, width=300) # plot will be saved to this file
plot(1:10) # draw the plot
dev.off() # turn off device
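For vector formats the size is given in inches rather than pixels. The sketch below writes the cars scatterplot to a PDF in a temporary file and wraps the three steps in a small helper, hypothetically named save_scatter(), that uses on.exit() so the device is closed even if plotting fails.

```r
# Hypothetical helper: draw a scatterplot of y against x into a PDF file
save_scatter <- function(x, y, file, width = 6, height = 4) {
  pdf(file, width = width, height = height)  # 1. open the device (size in inches)
  on.exit(dev.off())                          # 3. always close it when the function exits
  plot(x, y, pch = 19, col = "blue")          # 2. draw the plot
  invisible(file)
}

out_file <- tempfile(fileext = ".pdf")
save_scatter(cars$speed, cars$dist, out_file)
file.exists(out_file)
```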
Line graphs, box plots and bar plots
Line graphs can be drawn with the plot() function itself by setting the type argument to “l” for lines only or “b” for both lines and points.
Alternatively, you can add a line to a plot already drawn with plot() using the lines() function.
Y <- rnorm(50, mean=10, sd=1) # some numbers
plot(Y, type="l", ylim=c(5,15), main="Line Plot", ylab="some numbers") # line graph
plot(Y, type="b", ylim=c(5,15), main="Line Plot", ylab="some numbers") # line and points graph
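The lines() approach adds to an existing plot, which is handy for overlaying a second series on the same axes. A minimal sketch, assuming a second simulated series Z (not part of the original example) and set.seed() for reproducible random numbers:

```r
set.seed(1)                          # make the random numbers reproducible
Y <- rnorm(50, mean = 10, sd = 1)    # first series
Z <- rnorm(50, mean = 8, sd = 1)     # hypothetical second series
plot(Y, type = "l", ylim = c(5, 15), main = "Two series", ylab = "some numbers")
lines(Z, col = "red", lty = 2)       # overlay the second series as a dashed red line
points(Z, col = "red", pch = 20)     # optionally mark the individual values
```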
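Box plots are drawn with the boxplot() function. Given a numeric vector it draws a single box; given a formula such as len ~ dose, it draws one box per group. The boxwex argument scales the width of the boxes.
boxplot(ToothGrowth$len, main="ToothGrowth length") # single box plot
boxplot(len ~ dose, data = ToothGrowth, boxwex = 0.25, col = "orange", main="ToothGrowth length for each dosage") # one box per dose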
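boxplot() also returns the numbers it draws. Setting plot = FALSE computes them without drawing anything, which is one way to check the summaries behind each box:

```r
bp <- boxplot(len ~ dose, data = ToothGrowth, plot = FALSE)
bp$names   # one group per dose level
bp$n       # number of observations in each group
bp$stats   # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
```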
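Bar plots are drawn with the barplot() function. The examples below use the built-in VADeaths matrix of death rates (per 1000) in Virginia in 1940, which looks like this:
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0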
barplot(VADeaths) # stacked: one bar per column, one segment per age group
barplot(VADeaths, beside = TRUE) # grouped: age groups side by side
barplot(VADeaths, beside = TRUE, col = c("lightblue", "mistyrose", "lightcyan", "lavender", "cornsilk"), legend.text = rownames(VADeaths), ylim = c(0, 100)) # grouped, colored, with legend
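barplot() invisibly returns the x positions of the bar midpoints; with beside = TRUE this is a matrix with the same shape as the data (one row per age group, one column per population group). Passing it to text() is one way to print each value above its bar, as in this sketch:

```r
mids <- barplot(VADeaths, beside = TRUE, ylim = c(0, 100),
                col = c("lightblue", "mistyrose", "lightcyan", "lavender", "cornsilk"),
                legend.text = rownames(VADeaths))
# mids and VADeaths are both 5 x 4, so flattening them keeps positions and values aligned
text(x = as.vector(mids), y = as.vector(VADeaths),
     labels = as.vector(VADeaths), pos = 3, cex = 0.6)  # value above each bar
```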