
R Package: Drawing Layered Plots With ggplot2

17 Apr 2015 8:36am, by

This is the fifth in a series of posts about the R programming language. In the fourth post we learned how to draw quick plots. In the third post, Manjusha examined how to understand data graphically through R. In the second post, she explored how to pull data from R for smart analysis. The first post in the series explored data visualization with R.

If qplot() is an integral part of ggplot2, then the ggplot() command is its most powerful component. While qplot() produces a quick plot with less flexibility, ggplot() supports layered graphics and gives control over every aesthetic of the graph. ggplot() needs its data in a data.frame, whereas qplot() can also work directly with plain vectors. Beginners tend to like qplot(), while advanced users prefer ggplot(). Read more on stackoverflow.com.

Before we begin, remember to first load the ggplot2 library into the current working session.
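A minimal sketch of that first step. The install guard and the CRAN mirror URL are additions for convenience, not part of the original post; if ggplot2 is already installed, only library(ggplot2) matters.

```r
# Install ggplot2 once if it is missing (mirror URL is an assumption; any CRAN mirror works)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}
# Load ggplot2 into the current working session
library(ggplot2)
packageVersion("ggplot2")  # useful when comparing with older tutorials
```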

ggplot

ggplot is designed to work in multiple layers, starting with a layer of raw data and then adding layers of statistical information. The basic call is ggplot(data, mapping), where the mapping is usually built with aes(). The examples use mtcars, a data set built into R (it comes with R's base datasets package, so it is available even without ggplot2).

Here is a simple example. R's help page for mtcars describes its Format as a data frame with 32 observations on 11 variables. Printing the first few rows shows what those variables look like.

One of the variables, cyl, is the number of cylinders for each car. It takes only three distinct values, so as a factor it has just three categories.
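The exploration steps above can be sketched as follows. Using dim(), head() and levels(factor(...)) is one reasonable way to reproduce them; the original commands are not shown in the post.

```r
# mtcars ships with R's datasets package, so no library() call is needed here.
# In an interactive session, ?mtcars opens its help page (see the Format section).
dim(mtcars)                 # 32 observations (rows) on 11 variables (columns)
head(mtcars)                # first six rows of the data set
levels(factor(mtcars$cyl))  # the three cylinder categories: "4" "6" "8"
```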

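Calling ggplot() with only the data and an aesthetic mapping does not draw any bars or points, because no geometry has been specified yet. So assign the result to a variable first, and then add a geom (geometric object) layer, such as a bar or line graph.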
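A minimal sketch of this two-step pattern, assuming the x axis is factor(cyl) as in the rest of the post; the variable names p and bar_plot are illustrative.

```r
library(ggplot2)

# Data plus mapping only: printing p alone shows an empty panel with no bars
p <- ggplot(mtcars, aes(x = factor(cyl)))

# Add a bar layer; by default each bar's height is the number of cars in that category
bar_plot <- p + geom_bar()
print(bar_plot)
```

Because + returns a new plot object, p itself still has no layers and can be reused with a different geom.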

By setting options such as colour and size inside the geom, we can give the bars a colored border.

Observe the violet border around the bars, drawn with thickness size = 2 (the default outline thickness is 0.5; from ggplot2 3.4.0 onward this option is called linewidth).
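A sketch of a bordered bar chart. The grey fill is an arbitrary choice added for contrast, and the code assumes ggplot2 3.4.0 or later, where linewidth sets the outline thickness; on older versions, replace linewidth = 2 with size = 2.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl)))

# colour draws the bar outline; linewidth sets its thickness (size = 2 on ggplot2 < 3.4.0)
bordered <- p + geom_bar(fill = "grey60", colour = "violet", linewidth = 2)
print(bordered)
```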
The width option adjusts the width of the bars.

Observe that with width = 0.5 the bars are narrower: each bar now spans half the distance between neighboring categories, instead of the default 0.9.
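A short sketch of the width option on the same bar chart. The check in the comment reads the bar edges back out of ggplot_build(), which is one way to confirm the effect numerically.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl)))

# width is a fraction of the spacing between categories (default 0.9)
narrow <- p + geom_bar(width = 0.5)
print(narrow)

# Each bar's computed span (xmax - xmin) is now 0.5
bars <- ggplot_build(narrow)$data[[1]]
as.numeric(bars$xmax - bars$xmin)
```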

More About Geometric Objects 

ggplot2 offers many geoms (geometric objects), each of which draws one layer of graphics. Some common ones:

geom_point() plots points at (x, y) coordinates
geom_jitter() plots points with a small random offset, so duplicate points become visible
geom_line() connects points with a line in order of their x values
geom_path() connects points with a line in the order they appear in the data
geom_bar() draws a bar chart; by default each bar's height is the count of cases at that x value
geom_histogram() divides a single continuous column into bins and shows the count in each bin
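The difference between geom_line() and geom_path() is easiest to see with rows that are not sorted by x. This sketch uses a small hypothetical data frame and reads the drawing order back from ggplot_build().

```r
library(ggplot2)

# Hypothetical data whose rows are not sorted by x
df <- data.frame(x = c(3, 1, 2), y = c(1, 3, 2))

line_plot <- ggplot(df, aes(x, y)) + geom_line()  # joins points in order of x
path_plot <- ggplot(df, aes(x, y)) + geom_path()  # joins points in row order

line_order <- as.numeric(ggplot_build(line_plot)$data[[1]]$x)
path_order <- as.numeric(ggplot_build(path_plot)$data[[1]]$x)
line_order  # 1 2 3
path_order  # 3 1 2
```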

Interchanging Coordinates

Interchanging coordinates is possible with coord_flip(). For example, to turn the earlier bar chart on its side, so that the cyl categories run along the vertical axis and the counts along the horizontal axis, add coord_flip() as one more operation on the earlier graph.

The result of the new command is shown below:
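A minimal sketch of the flipped bar chart, rebuilding the earlier plot so the block runs on its own; bar_plot and flipped are illustrative names.

```r
library(ggplot2)

bar_plot <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# coord_flip() swaps the axes: categories vertical, counts horizontal
flipped <- bar_plot + coord_flip()
print(flipped)
```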

The Aesthetics of a Graph

Aesthetic mappings are created with aes(). The aes() function accepts many options (such as fill, color, shape and size) that describe how variables in the data are mapped to the visual properties of geoms. A mapping can be an expression rather than just a column name, so with ggplot we can write formulas inside aes() that calculate the data points to plot.
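A sketch of a computed mapping, assuming a hypothetical data frame of integers from -5 to 5. The y aesthetic is the expression x^2, evaluated against the data, so the plotted points trace a parabola.

```r
library(ggplot2)

# Hypothetical example data
df <- data.frame(x = -5:5)

# y is not a column: ggplot evaluates x^2 in the data to get the values
parabola <- ggplot(df, aes(x = x, y = x^2)) + geom_point()
print(parabola)
```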

Once we store the plot in a variable, say p, we can add layers on top of it, which effectively superimposes new graphs on the earlier graph. Observe how one can add layers to the raw graph stored in p:
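Layering can be sketched like this, reusing the hypothetical parabola data: p holds only the data and mapping, and each + adds one more layer on top.

```r
library(ggplot2)

df <- data.frame(x = -5:5)  # hypothetical example data

p <- ggplot(df, aes(x = x, y = x^2))        # raw graph: data and mapping only
p_layers <- p + geom_point() + geom_line()  # points, then a line through them
print(p_layers)
```

Since + builds a new object, p keeps zero layers and can be combined with other geoms later.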


Plot Your Data

To plot your own data with ggplot(), first store it in a data frame.
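A sketch with hypothetical values, chosen to include exactly repeated (x, y) pairs so they can be reused when comparing geom_point() and geom_jitter(). The name q matches the variable used in the next section.

```r
library(ggplot2)

# Hypothetical data: several (x, y) pairs appear more than once
df <- data.frame(
  x = c(1, 1, 1, 2, 2, 3, 4, 4),
  y = c(2, 2, 2, 3, 3, 5, 4, 4)
)
str(df)

# Store the base plot in q; geoms are added later
q <- ggplot(df, aes(x = x, y = y))
```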

geom_point() Versus geom_jitter()

geom_point() draws every row, but repeated points land at exactly the same position, so they look like a single point. geom_jitter() adds a small random offset to each point, so every repeated copy becomes visible. Now we will compare the graphs of q + geom_point(color = "red") and q + geom_jitter(color = "blue").
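The comparison can be sketched numerically as well as visually. This assumes the same hypothetical data frame as above (8 rows but only 4 distinct positions) and counts the distinct drawn positions for each geom.

```r
library(ggplot2)

df <- data.frame(x = c(1, 1, 1, 2, 2, 3, 4, 4), y = c(2, 2, 2, 3, 3, 5, 4, 4))
q <- ggplot(df, aes(x = x, y = y))

points_plot <- q + geom_point(color = "red")
jitter_plot <- q + geom_jitter(color = "blue")

set.seed(1)  # jitter is random; fix the seed so the build is reproducible
pts <- ggplot_build(points_plot)$data[[1]]
jit <- ggplot_build(jitter_plot)$data[[1]]

nrow(unique(pts[, c("x", "y")]))  # 4 positions: duplicates overlap
nrow(unique(jit[, c("x", "y")]))  # every point drawn at its own position
```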


Getting More From Graphs

Plotting graphs requires knowledge of the commands, whereas interpreting them requires some statistical knowledge. When we look at data points, we usually want to see how the value on the y axis relates to the value on the x axis. Linear regression summarizes this relationship as a line of best fit, which can be used to predict y when x is known.

To draw the regression line over the data points, pass method = "lm" to the smoothing layer.
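A sketch using geom_smooth(method = "lm") on mtcars, plotting mpg against wt (the same columns used in the faceting section). The explicit formula = y ~ x only suppresses an informational message, and the direct lm() call is added for comparison.

```r
library(ggplot2)

# Scatter plot of fuel economy (mpg) against weight (wt)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# method = "lm" fits and draws a straight line of best fit with a confidence band
lm_plot <- p + geom_smooth(method = "lm", formula = y ~ x)
print(lm_plot)

# The same regression fitted directly
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)  # intercept about 37.29, slope about -5.34
```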


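To add a smoothed conditional mean instead of a straight line, leave out method = "lm"; for small data sets such as mtcars, ggplot2 then fits a loess curve by default.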
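A sketch of the smoothed conditional mean on the same scatter plot. It names method = "loess" explicitly, which is what the default picks for fewer than 1,000 points, so the output does not depend on that automatic choice.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# loess curve with a confidence band, evaluated at 80 points along x by default
loess_plot <- p + geom_smooth(method = "loess", formula = y ~ x)
print(loess_plot)
```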


Viewport

The viewport() function from the grid package lets us draw the main graph first and then place a subplot in a small area on top of it. Its parameters x, y, width and height control the position and size of that area.

By default, x and y give the location of the center of the viewport.


Steps to Generate Subplot With the Viewport

Save the plots in two variables, say a and b.

Now open a new PDF file, tryvp.pdf, as the graphics device.

Then define a subwindow with the viewport() function.

Next, draw the main graph stored in b by typing b at the prompt (inside a script, use print(b)).

Then superimpose the graph stored in a on the viewport.

Finally, close the PDF device so that graphics output goes back to the default device.

Open tryvp.pdf to see graph a superimposed on graph b.
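The steps above can be sketched as one script. The contents of plots a and b, the page size, and the viewport position are illustrative assumptions; the post does not show which graphs it used.

```r
library(ggplot2)
library(grid)

# Hypothetical inset (a) and main plot (b)
a <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
b <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

pdf("tryvp.pdf", width = 7, height = 5)  # open a new PDF graphics device

# Subwindow: x and y are the viewport's center, in npc units (0 to 1)
subvp <- viewport(x = 0.75, y = 0.75, width = 0.35, height = 0.35)

print(b)              # main graph fills the page
print(a, vp = subvp)  # inset drawn inside the small viewport, same page

invisible(dev.off())  # close the PDF device
```

Inside a script, plots must be drawn with print(); typing b only auto-prints at the interactive prompt.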

Adding Legends

Using the mpg data set that ships with ggplot2, the two columns displ and hwy are plotted on the axes and the plot is stored in p. Adding a third column, cyl, from the mpg data set via aes() colors the points by the number of cylinders. Notice the automatically generated legend.
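A sketch of the legend example. Wrapping cyl in factor() is a choice made here to get one legend entry per cylinder count; mapping the numeric cyl directly would give a continuous color bar instead.

```r
library(ggplot2)

# mpg ships with ggplot2: displ = engine displacement, hwy = highway miles per gallon
p <- ggplot(mpg, aes(x = displ, y = hwy))

# Mapping a third column to colour produces a legend automatically
legend_plot <- p + geom_point(aes(colour = factor(cyl)))
print(legend_plot)
```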

Faceting Multiple Graphs Together

Faceting splits the data into subsets and draws the same layers for each subset, placing the resulting graphs side by side.

First assign the raw mtcars data to the variable p. Then add a layer of points to p, plotting wt on the x axis and mpg on the y axis. A 2-D plot normally shows only two columns of the data, but by mapping a third column, cyl, to the color parameter we can represent it as well.
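A sketch of that step. factor(cyl) is an assumption made so that each of the three cylinder levels gets its own discrete color; cyl_plot is an illustrative name.

```r
library(ggplot2)

p <- ggplot(mtcars)  # raw data only, no mapping yet

# wt and mpg on the axes, cyl shown through color
cyl_plot <- p + geom_point(aes(x = wt, y = mpg, colour = factor(cyl)))
print(cyl_plot)
```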

“cyl” has only three different levels: 4, 6 and 8.

facet_wrap() lets us bring in one more column, gear. It splits the data into one subset for each value of gear and plots each subset in its own panel. gear has only three different levels: 3, 4 and 5.

The output shows one panel for each gear level.
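A sketch of the faceted plot, rebuilding the colored scatter plot so the block is self-contained. facet_wrap(~ gear) creates one panel per distinct value of gear.

```r
library(ggplot2)

cyl_plot <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, colour = factor(cyl)))

sort(unique(mtcars$gear))  # 3 4 5

# One panel per gear level, each showing the same layers
facet_plot <- cyl_plot + facet_wrap(~ gear)
print(facet_plot)
```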


Hold More Plots in One Table

With the gridExtra package, it is possible to arrange two independent graphs side by side in one table-like layout. First, install the package gridExtra.
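A sketch of installing gridExtra and arranging two plots with its grid.arrange() function. The two plots and the two-column layout are illustrative choices, and the CRAN mirror URL is an assumption.

```r
# Install gridExtra once if it is missing
if (!requireNamespace("gridExtra", quietly = TRUE)) {
  install.packages("gridExtra", repos = "https://cloud.r-project.org")
}
library(ggplot2)
library(gridExtra)

# Two independent, hypothetical plots
a <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
b <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Draw both in one layout with two columns; the combined table is also returned
combined <- grid.arrange(a, b, ncol = 2)
```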

That brings us to the end of this post, which introduced several advanced features of ggplot2. To learn more about ggplot2, visit the ggplot2 website. The code for the exercises can be found here.

Manjusha Joshi is a freelancer working with free and open source software for scientific computing. She is a mathematician and a member of the Pune Linux user group.

Feature image via Flickr Creative Commons.
