
Understanding Data Graphically Through R

10 Mar 2015 5:07pm, by Manjusha Joshi

This is the third post in a series about the R programming language. The first post in the series explored data visualization with R. In the second post, Manjusha explored how to pull data from R for smart analysis. In the fourth and final post, she explains how to draw quick plots using ggplot2.

Survey data is often easiest to understand when it is presented as a clear, effective graph. In R, a few simple yet useful commands are enough to produce such graphs. This post shows how to represent data graphically using R commands.

Basic Table and Bar Graph

Let us consider the classic case of analyzing a survey report. Every year, a survey records the number of students enrolled in each study course. The table below shows the data collected from 1996 to 2000.

Year   Management   Commerce   Science
1996   2810          890        540
1997   3542         1363        471
1998   4301         1663        652
1999   5363         2071        895
2000   6567         2752       1113

Bar graph 1: Number of students

From bar graph 1, above, one can easily conclude that the total number of students increases steadily every year.

Enhanced Detail for More Information

What if we want more information from the graph? We might want to know which stream has the most students, or to compare the number of students by year and by study stream. Bar graph 2, below, comes closer to these requirements: it still shows that the number of students is increasing, and it also breaks each year down by study course.


Bar Graph 2: Number of Students by Study Course


Bar Graph 3: Number of Students by Study Course

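Bar graph 3, above, shows the growth in each study course separately. Over the five years, the number of students rises in every course, although the table shows a dip in Science in 1997 (from 540 to 471).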

In general, a graph of such a table makes it easy to see which branches of study are popular and how the number of students per branch grows.

A Picture is Worth a Thousand Words

R provides commands that depict data pictorially. Given the tabular data collected from the survey, for instance, a few R commands produce the same data as a graph that is much easier to comprehend. As they say, a picture is truly worth a thousand words.

Draw Graphs With R Programming Commands

With R, one can draw a variety of graphs which help the viewer better understand the data. Some of the supported graphs in R are:

  • Histogram
  • Bar plot
  • Dotchart
  • Plot points: use command plot(c(3,12,15),c(23,1,14))
  • Pie graph: use command pie(1:10,col=rainbow(10))

  • Plot functions: use command curve(sin(x),-3,3,col=3)
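
The first three graph types in the list have no command attached. As a minimal sketch, assuming a small made-up vector of exam marks (the name marks and its values are hypothetical and do not come from the survey data), they can be drawn with the base R functions hist, barplot and dotchart:

```r
# Hypothetical sample data: exam marks of ten students (not from the article).
marks <- c(42, 55, 61, 61, 68, 70, 73, 77, 85, 91)
names(marks) <- paste0("S", seq_along(marks))

# Histogram: distribution of the marks across bins.
h <- hist(marks, main = "Histogram of marks", xlab = "Marks")

# Bar plot: one bar per student.
barplot(marks, main = "Marks per student", ylab = "Marks")

# Dot chart: the same values as labelled points on a common scale.
dotchart(marks, main = "Marks per student", xlab = "Marks")
```

Each call opens a new plot, so in an interactive session the three graphs appear one after the other.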


Add Colors to Graphs

Colors can be added to graphs to make their elements easier to tell apart. For instance, col="red" is one of the options of the plot command and sets the color of the graph. Another variation is to use palette(), which returns the current set of colors, to draw the graph in several different colors.
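
As a rough illustration of both options, the sketch below reuses the points from the plot example earlier in this post. The variable name cols and the choice of a bar plot to display the palette are assumptions made for this example only.

```r
# A single color name applies to every point.
plot(c(3, 12, 15), c(23, 1, 14), col = "red", pch = 19)

# palette() returns the colors that numeric values of col refer to,
# so a call such as curve(sin(x), -3, 3, col = 3) picks the third entry.
cols <- palette()
print(cols)

# One bar per palette entry, each drawn in its own color.
barplot(rep(1, length(cols)), col = cols, names.arg = seq_along(cols))
```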

Plot Inbuilt Data

As we have seen earlier in this series, R ships with built-in datasets, and several of them make interesting plotting examples.
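
The original listing for this section is not available here, so the following is only a sketch of the idea. It assumes two datasets from the datasets package that is loaded with every R session, cars and mtcars; the choice of datasets and of plot types is ours, not necessarily the author's.

```r
# Running data() at an interactive prompt lists every available dataset.

# cars: speed and stopping distance recorded for 50 cars.
head(cars)
plot(cars, main = "Stopping distance vs. speed")

# mtcars: fuel consumption (mpg) grouped by number of cylinders.
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")
```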


Populate the Table With Data

The table is entered into R in two steps. First, input the numerical data as a matrix. Next, add the required row names and column names.
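
The two steps can be sketched as follows, using the survey table from the start of this post. The name year.stud is the one used in the plotting section; the exact calls in the author's original listing may differ.

```r
# Step 1: enter the numbers as a matrix, one row per year.
year.stud <- matrix(
  c(2810,  890,  540,
    3542, 1363,  471,
    4301, 1663,  652,
    5363, 2071,  895,
    6567, 2752, 1113),
  nrow = 5, byrow = TRUE
)

# Step 2: label the rows with years and the columns with study courses.
rownames(year.stud) <- 1996:2000
colnames(year.stud) <- c("Management", "Commerce", "Science")

year.stud
```

byrow = TRUE is needed because matrix() fills column by column by default, whereas the numbers above are typed in row by row.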

Plot the Data

To plot the data, use the barplot command. The data is stored in year.stud, and we apply t() to transpose it: assuming the rows of year.stud are years, as in the table above, each bar then represents a year and each segment of the bar a study course. To change the colors, use the col argument.

 

Here are some more options that can be used:

  • beside=TRUE allows bar graphs to be displayed side by side.
  • The legend shows which item each color represents.
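
A minimal sketch of the plotting step is shown below. It rebuilds year.stud from the survey table so that it runs on its own; the color names, axis labels and legend position are arbitrary choices for illustration.

```r
# Survey table: rows are years, columns are study courses.
year.stud <- matrix(
  c(2810, 890, 540, 3542, 1363, 471, 4301, 1663, 652,
    5363, 2071, 895, 6567, 2752, 1113),
  nrow = 5, byrow = TRUE,
  dimnames = list(1996:2000, c("Management", "Commerce", "Science"))
)
course.cols <- c("steelblue", "orange", "darkgreen")

# Stacked bars: barplot() draws one bar per column, so after t()
# each bar is a year and each colored segment is a study course.
barplot(t(year.stud), col = course.cols,
        xlab = "Year", ylab = "Number of students")

# Side-by-side bars with a legend built from the course names.
mids <- barplot(t(year.stud), beside = TRUE, col = course.cols,
                legend.text = TRUE, args.legend = list(x = "topleft"),
                xlab = "Year", ylab = "Number of students")
```

With legend.text = TRUE, barplot takes the legend labels from the row names of the matrix it is given, which here are the study courses.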

Unique Graphs

In this section, let’s look at some unique graphs that R offers.

Sunflower Plot

In a sunflower plot, points that coincide are drawn as ‘sunflowers’ with multiple ‘petals’, so that over-plotting is made visible instead of being hidden behind a single point. This graph is useful for finding patterns in the data. Consider a data collection, labeled “a”, that clearly contains repeated data.


To plot it, issue sunflowerplot(a,a) at the R prompt. In the output, the petals of each flower are drawn in red, and the number of petals denotes the number of times the data item is present.
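
The author's original values for a are not reproduced here, so the sketch below assumes a small hypothetical vector with repeats. It also prints the counts with table(), which is added only to show what the petals encode.

```r
# Hypothetical data with repeats: 1 occurs twice, 2 once, 3 three times.
a <- c(1, 1, 2, 3, 3, 3)

# How often each value occurs; this is what the petals encode.
counts <- table(a)
print(counts)

# Coinciding points are merged into one flower with one petal per
# occurrence; a value that occurs only once is drawn as a plain point.
sunflowerplot(a, a, main = "Sunflower plot of a against itself")
```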

ggplot2

ggplot2 is a popular plotting library in R. It is based on the grammar of graphics and tries to take the good parts of base and lattice graphics. First, install it with install.packages("ggplot2"). Then, in every new session in which you want to use it, load it with the command library(ggplot2).

With a few quick commands, ggplot2 brings out many details of the data in colorful, meaningful graphs that help us understand the data quickly.

Some of the benefits of using ggplot2 are:

  • It helps generate reports as tidy graphs.
  • It makes reports readable for data analysis.
  • It is useful to make quick decisions due to its ease of use.

Diamond Graph One

Let us plot a graph of diamonds based on carat and price. Here, diamonds is a data set that ships with ggplot2; carat is plotted along the horizontal axis and price along the vertical axis.


Observe how the legend gets generated automatically.
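
A plot of this kind can be sketched as below. The original listing is not available here, so two things are assumptions: the use of ggplot() with geom_point() rather than whatever call the author used, and the choice of the cut column for the color. Mapping a column to colour is what makes ggplot2 generate a legend without further code.

```r
library(ggplot2)

# carat on the x-axis, price on the y-axis, one point per diamond.
# Mapping cut to colour (an assumption for this example) makes
# ggplot2 add the matching legend automatically.
p <- ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
  geom_point()

print(p)
```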

 

Diamond Graph Two

The code for this graph uses two elements:

  • qplot is the basic plotting command in ggplot2, similar to the plot command.
  • geom stands for geometry: it lets the user choose which type of graph to draw for the given data.
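
To illustrate how geom switches the type of graph, here is a hedged sketch with two qplot calls on the diamonds data. The columns and geoms chosen are assumptions, not necessarily those of the original graph. Note that qplot is deprecated as of ggplot2 3.4.0: it still runs but prints a warning, and ggplot() is the recommended replacement.

```r
library(ggplot2)

# Same data, two geometries: geom decides how the values are drawn.
# Scatter plot of price against carat.
p.point <- qplot(carat, price, data = diamonds, geom = "point")

# Box plot of price for each quality of cut.
p.box <- qplot(cut, price, data = diamonds, geom = "boxplot")

print(p.point)
print(p.box)
```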


Diamond Graph Three


For more details about ggplot2, see http://www.cookbook-r.com/Graphs/.

The code for this post is available on GitHub: https://github.com/thenewstack/R-code/tree/master/visualizedata.

Manjusha Joshi is a freelancer working with free, open source software for scientific computing. She is a mathematician and a member of the Pune Linux user group.

Feature image via Flickr Creative Commons.
