This is the fifth in a series of posts about the R programming language. In the fourth post we learned how to draw quick plots. In the third post, Manjusha examined how to understand data graphically through R. In the second post, she explored how to pull data from R for smart analysis. The first post in the series explored data visualization with R.

If qplot is the quick entry point to ggplot2, the ggplot command is the core of the package. While qplot produces a quick plot with less flexibility, ggplot supports layered graphics and gives control over every aesthetic of the graph. ggplot expects its data in data.frame format, whereas qplot also accepts plain vectors. Beginners tend to like qplot, while advanced users prefer ggplot. Read more from stackoverflow.com.

Before we begin, remember to first load the ggplot2 library into the current working session.
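A minimal sketch of that setup step. The install line is an assumption for readers who have not installed the package yet, and is left commented out.

```r
# One-time installation, if ggplot2 is not yet on this machine (assumption):
# install.packages("ggplot2")

# Attach ggplot2 for the current working session
library(ggplot2)
```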

**ggplot**

ggplot is designed to work in multiple layers, starting with a layer of raw data, then adding layers of statistical information. A plot is created with ggplot(data, mapping). For the examples we can use the built-in data set mtcars, which ships with R itself (in the datasets package), so it is available as soon as R starts.

Here is a simple example:

    colnames(mtcars)

The output will be:

     [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
    [11] "carb"

    help(mtcars)

The help page describes the format (a data frame with 32 observations on 11 variables) as:

    Format:
    [, 1]  mpg   Miles/(US) gallon
    [, 2]  cyl   Number of cylinders
    [, 3]  disp  Displacement (cu.in.)
    [, 4]  hp    Gross horsepower
    [, 5]  drat  Rear axle ratio
    [, 6]  wt    Weight (lb/1000)
    [, 7]  qsec  1/4 mile time
    [, 8]  vs    V/S
    [, 9]  am    Transmission (0 = automatic, 1 = manual)
    [,10]  gear  Number of forward gears
    [,11]  carb  Number of carburettors

    head(mtcars)

This shows the first few entries in the data set mtcars:

                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

    factor(mtcars$cyl)

This will output categories of cyl (number of cylinders for the car). You can see there are only three categories available:

     [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
    Levels: 4 6 8
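As a quick cross-check, base R's table() counts how many cars fall into each category. This is only an illustrative aside; by default, geom_bar() uses these same counts as bar heights.

```r
# Count the cars in each cylinder category
cyl_counts <- table(factor(mtcars$cyl))
print(cyl_counts)
```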

    ggplot(mtcars, aes(factor(cyl)))

On its own this draws no data, because no geom layer has been added yet. Assign the plot to a variable first, and then add a geom layer (the geometry used to draw the data), such as a bar or line graph:

    g <- ggplot(mtcars, aes(x=factor(cyl)))
    g + geom_bar(fill = "green")

By setting the below options with geom, we can obtain a colored border:

    g + geom_bar(fill = "green",color="violet",size=2)

Observe the violet border around the bars, drawn with thickness size=2 (the default border thickness is 0.5; ggplot2 releases from 3.4.0 onward name this argument linewidth).

More options can adjust the width of the bars:

    g + geom_bar(fill = "green",color="violet",size=2,width=.5)

Observe that with width=.5 the bars are narrower than the default, which is 90% of the spacing between categories.

**More About Geometric Objects**

Many geoms (geometries for the data) are available, each drawing a different kind of layer:

| Geom | What it draws |
|---|---|
| geom_point() | points at (x,y) coordinates |
| geom_jitter() | points with a little random noise added, so that duplicate points become visible |
| geom_line() | a line connecting the points in order of increasing x |
| geom_path() | a line connecting the points in their original order in the data |
| geom_bar() | bar plot; by default the bar height is the number of rows for each x value |
| geom_histogram() | histogram of a single data column: ranges of values versus their frequency |
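The difference between geom_line() and geom_path() is easiest to see on a tiny data frame whose rows are not sorted by x. The data frame d below is made up for illustration and is not part of the original examples.

```r
library(ggplot2)

# Hypothetical three-point data set, deliberately not sorted by x
d <- data.frame(x = c(3, 1, 2), y = c(1, 2, 3))

# geom_line() joins the points in order of increasing x: (1,2) -> (2,3) -> (3,1)
line_plot <- ggplot(d, aes(x = x, y = y)) + geom_line() + geom_point()

# geom_path() joins them in row order: (3,1) -> (1,2) -> (2,3)
path_plot <- ggplot(d, aes(x = x, y = y)) + geom_path() + geom_point()

print(line_plot)
print(path_plot)
```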

**Interchanging Coordinates**

Interchanging coordinates is possible with the coord_flip() command. For example, to obtain the graph count versus factor(cyl), add one more operation onto the earlier graph as follows:

    g + geom_bar(fill = "green",color="violet",size=2,width=.5) + coord_flip()

The result of the new command is shown below:

**The Aesthetics of a Graph**

Aesthetics are declared with the "aes" function, which describes how variables in the data are mapped to the visual properties of geoms, such as position, fill, color, shape and size. Because aes accepts expressions, we can also write formulas that calculate the data points to plot:

    p <- ggplot(mtcars, aes(x = mpg^2, y = wt/cyl))
    p + geom_point(color="magenta",size=4,shape=8)

Once we store the plot in a variable, say p, we can add layers on top of it, which amounts to superimposing new graphs on the earlier graph. Observe how layers are added to the raw graph stored in p (each line ends with + so that R knows the expression continues):

    p + geom_point(color="magenta",size=4,shape=8) +
      geom_line(color="green") +
      geom_jitter(color="yellow",size=2)
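One distinction worth sketching here is between mapping an aesthetic to a data column inside aes() and setting it to a fixed value outside aes(). The variable names mapped and fixed are illustrative and do not come from the examples above.

```r
library(ggplot2)

# Mapped: color depends on the data, one color per cylinder category
mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

# Fixed: every point gets the same color, regardless of the data
fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "magenta")

print(mapped)
print(fixed)
```

The mapped version also produces a legend, since the color now carries information about the data.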

**Plot Your Data**

To plot the data using ggplot(), we need to store it in a data.frame:

    > a <- c(10,10,20,20,35,45,50,50,50)
    > b <- a
    > da <- data.frame(a,b)
    > da
       a  b
    1 10 10
    2 10 10
    3 20 20
    4 20 20
    5 35 35
    6 45 45
    7 50 50
    8 50 50
    9 50 50
    > q <- ggplot(da,aes(x=a,y=b))

**geom_point() Versus geom_jitter()**

With geom_point(), points that are repeated in the data are drawn exactly on top of each other, so each one appears only once. geom_jitter(), on the other hand, adds a small random offset to every point, which makes the repeats visible. Now we will compare the graphs of q+geom_point(color="red") and q+geom_jitter(color="blue"):
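The comparison can be sketched end to end as follows. The width and height passed to geom_jitter() are illustrative choices, not values from the text above.

```r
library(ggplot2)

a <- c(10, 10, 20, 20, 35, 45, 50, 50, 50)
da <- data.frame(a = a, b = a)
q <- ggplot(da, aes(x = a, y = b))

# Nine rows in the data, but fewer distinct (a, b) pairs
n_distinct <- nrow(unique(da))

# geom_point() overplots the repeats, so only the distinct pairs are visible
print(q + geom_point(color = "red"))

# geom_jitter() adds random noise, so all nine points can be seen
print(q + geom_jitter(color = "blue", width = 1, height = 1))
```

Because the noise is random, the jittered plot looks slightly different on every run.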

**Getting More From Graphs**

Plotting graphs requires command knowledge, whereas understanding graphical information requires some statistical knowledge. When we view data points, we want to see how the values on the *x* axis relate to those on the *y* axis. This can be seen through linear regression. The line of best fit for the data is useful for predicting *y* when *x* is known.

    p + geom_point(color="green",size=3)

To draw the least-squares line through the data points, use method="lm":

    p + geom_smooth(fill="purple",color="darkorange",size=2,method="lm") +
      geom_point()

    p + geom_smooth(fill="purple",color="darkorange",size=2,method="lm",formula=y ~ poly(x,2)) + geom_point()
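The lines that geom_smooth() draws with method="lm" can also be obtained as numbers with base R's lm(). The sketch below refits the same two models on the transformed columns used in p; the data frame d and the value of 20 mpg are illustrative choices.

```r
# Same transformed variables as in p: x = mpg^2, y = wt/cyl
d <- data.frame(x = mtcars$mpg^2, y = mtcars$wt / mtcars$cyl)

# Straight-line fit, matching method = "lm" with the default formula y ~ x
fit_linear <- lm(y ~ x, data = d)

# Quadratic fit, matching formula = y ~ poly(x, 2)
fit_quad <- lm(y ~ poly(x, 2), data = d)

# Predict y for a hypothetical car doing 20 mpg, so x = 20^2
new_car <- data.frame(x = 20^2)
pred_linear <- predict(fit_linear, newdata = new_car)
pred_quad <- predict(fit_quad, newdata = new_car)
print(c(linear = unname(pred_linear), quadratic = unname(pred_quad)))
```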

    p + geom_smooth(fill="red")

To add a smooth conditional mean:

    ggplot(mtcars, aes(x = mpg^2, y = wt/cyl)) +
      geom_smooth(fill="purple",color="darkorange",size=2) +
      geom_point(color="green")

    ggplot(mtcars, aes(x = mpg^2, y = wt/cyl)) +
      geom_smooth(fill="purple",color="darkorange",size=2) +
      geom_jitter(color="green",shape=2) +
      geom_point(color="yellow")

**Viewport**

The viewport() function, which comes from the grid package, lets us plot the main graph first and then draw a subplot in a small area on top of it. It has the parameters *x*, *y*, height and width to control the position and size.

    library(grid)
    vp <- viewport(width=unit(2,"cm"),height=unit(3,"cm"))

By default, *x* and *y* give the location of the center of the viewport.
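A small sketch of positioning a viewport explicitly. It assumes the grid package (shipped with R) is loaded, and the coordinates are arbitrary illustrative values. Plain numbers are interpreted by grid as fractions of the page ("npc" units).

```r
library(grid)

# Viewport centred at 75% across and 35% up the page, 40% of the page in each direction
corner_vp <- viewport(x = 0.75, y = 0.35, width = 0.4, height = 0.4)

# Draw a rectangle inside it to see where the viewport sits
grid.newpage()
pushViewport(corner_vp)
grid.rect(gp = gpar(col = "violet"))
popViewport()
```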

### Steps to Generate Subplot With the Viewport

Save the plots in two variables, say *a* and *b*:

    a <- p + geom_point(color="green",size=3)
    b <- p + geom_smooth(fill="purple",color="darkorange",size=2,method="lm",formula=y ~ poly(x,2)) + geom_point()

Now open a new PDF file as the graphics device:

    pdf("tryvp.pdf",width=4,height=4)

Then, define a subwindow with the viewport() function:

    subvp <- viewport(width=.4,height=.4,x=.75,y=.35)

Next, draw the main graph, which was stored in *b*, by typing *b* at the prompt:

    b

Then, superimpose the graph stored in *a* on the viewport as:

    print(a,vp=subvp)

Close the graphics device, so that the file is written and output returns to the console:

    graphics.off()

Open tryvp.pdf to see the output of graphics *a* superimposed on graphics *b*.

**Adding Legends**

    p <- ggplot(mpg, aes(displ, hwy))
    p + geom_jitter(aes(colour = cyl))

Using the mpg data set, a plot with the two axes displ and hwy is stored in *p*. By adding a third component (*cyl*) from the mpg data set via *aes*, we see the color difference in *cyl*. Notice the automatically generated legend.
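Because cyl is stored as a whole-number column in mpg, the legend for the plot above is a continuous color bar. A common variation, sketched here as an optional alternative rather than the author's code, wraps the column in factor() to get one legend entry per cylinder count.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy))

# Treat cyl as categorical so the legend shows one discrete key per value
discrete_legend <- p + geom_jitter(aes(colour = factor(cyl)))
print(discrete_legend)
```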

## Faceting Multiple Graphs Together

Faceting works on layers by splitting the data into subsets to create multiple graphs, each shown side by side.

    p

First assign the raw data of mtcars to the variable *p*. Then add a layer of points to *p*, plotting them along the axes wt and mpg. Normally, in a 2-D plot, we can consider only two columns from the data; however, by using color we can represent a third column (cyl) as well:

    > factor(mtcars$cyl)
     [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
    Levels: 4 6 8

“cyl” has only three different levels: 4, 6 and 8.

facet_wrap() lets us bring in one more column, "gear." The data are split into subsets by the values of this column, and each subset is plotted in its own panel. See how gear has only three different levels: 3, 4 and 5:

    > factor(mtcars$gear)
     [1] 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4
    Levels: 3 4 5

The output will produce subsets based on the gear levels:
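The code listing for this step shows only p, so the following is a reconstruction from the description in this section, not the author's original code: points plotted along wt and mpg, colored by cyl, and split by gear with facet_wrap().

```r
library(ggplot2)

# Assumed reconstruction: points on wt versus mpg, colored by cylinder count
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl)))

# One panel per gear level (3, 4 and 5)
faceted <- p + facet_wrap(~ gear)
print(faceted)
```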

## Hold More Plots in One Table

With the gridExtra library, it is possible to place two independent graphs side by side in one table. First, install the gridExtra package (for example with install.packages("gridExtra")), then load it and arrange the plots:

    library(gridExtra)
    grid.arrange(plot1, plot2, ncol=2)
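plot1 and plot2 are not defined in the listing above. The sketch below fills them in with two illustrative mtcars plots (an assumption), so that the call can be run end to end.

```r
library(ggplot2)
library(gridExtra)

# Two independent plots (illustrative choices)
plot1 <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar(fill = "green")
plot2 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(color = "magenta")

# Place them side by side in a layout with one row and two columns
combined <- grid.arrange(plot1, plot2, ncol = 2)
```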

That brings us to the end of this post, which introduced several advanced features of ggplot2. To learn more about ggplot2, visit the ggplot2 website. The code for the exercises can be found here.

Manjusha Joshi is a freelancer of free, open source software for scientific computing. She is a mathematician and a member of the Pune Linux user group.

Feature image via Flickr Creative Commons.