Pulling Data in R for Smart Analysis

17 Feb 2015 10:32am, by

Here is part two of a series about the R programming language. In part one we explored how R is used for data visualization. Part two examines how to pull data into R for analysis purposes. In part three, Manjusha explains how to understand data graphically through the R programming language, and in part four she has us drawing quick plots using ggplot2.

Purchasing a mobile phone can be a challenging process. With so many models and brands to choose from, finding the right phone for your usage takes research and an understanding of what each product offers. Fortunately, many product reviews and price comparisons are available to help consumers make the right selection. This practical data is collected and stored so that consumers can draw on it when making decisions, or for a comprehensive analysis of the product itself. For a consumer, finding and making sense of the right data takes considerable effort. This is where R programming is useful. With R, one can use a script to quickly draw statistics suited to one’s analysis. Let us look at some of the features and uses of R programming.

Smartanalysis_fig1

Different Ways to Handle Data in R

R can read data from:

  • Spreadsheets
  • Excel sheets
  • Databases
  • Images
  • Text files
  • Many other special formats

Smartanalysis_fig2

Get Data Into R

Whether data is local or available on the Web, with R programming you will be able to successfully import data in different formats.

Read Data From Files

Typically, the data is in a file stored on the local system. All that is required to read or write this data is to identify the directory in which the file is stored.

Setting Directories

The first step is to set up the working directory.
To identify the current directory (folder), use the command getwd().
On a Linux PC, the output shows the path as follows:

On Windows, it appears as:
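
A minimal sketch of the call follows. The paths in the comments are illustrative placeholders, not the output of the original session.

```r
# Ask R for the current working directory
current_dir <- getwd()
print(current_dir)

# The result is a single character string whose form depends on the system:
#   Linux:   something like "/home/<user>"               (illustrative)
#   Windows: something like "C:/Users/<user>/Documents"  (illustrative)
```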

To set the directory in which the data file is saved, use the command setwd("path"), where path is the full path, including subdirectories, of the folder that holds the data file. For example, if the data is in the file temp.txt and the file is in the folder /home/test/example/, then issue:

On Windows it will be represented as:

It is necessary to know the folder in which the file is saved.
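
The commands could look like the sketch below. The Linux path is the one from the example above; the Windows path is a hypothetical equivalent. Because those folders may not exist on your machine, the runnable part uses a temporary directory instead.

```r
# The example from the text (commented out, since the folder may not exist here):
# setwd("/home/test/example/")   # Linux
# setwd("C:/test/example")       # Windows, hypothetical path; use forward
#                                # slashes or doubled backslashes

# Runnable variant with a temporary directory
old_dir  <- getwd()                              # remember where we started
demo_dir <- file.path(tempdir(), "example")
dir.create(demo_dir, showWarnings = FALSE)

setwd(demo_dir)                                  # switch the working directory
writeLines("sample", "temp.txt")                 # file lands in demo_dir
print(file.exists("temp.txt"))

setwd(old_dir)                                   # switch back
```

Once the working directory is set, file names such as temp.txt can be given without their full path.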

Reading Text File

Data contained in text files can be read into an R session using the scan command.
Remember to pass the option what="" to scan, which indicates that the input is of character data type.
For this session, I have created the file textsample.txt, which can be read into the R session.

Now fdata holds the data from the .txt file.
Let’s review the first few entries with the command head(fdata):
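
A minimal sketch of these two steps follows. The contents of the original textsample.txt are not reproduced in this post, so the sketch writes a short stand-in file with hypothetical contents to a temporary folder first.

```r
# Stand-in for textsample.txt (hypothetical contents)
sample_file <- file.path(tempdir(), "textsample.txt")
writeLines(c("The file is a sample text file",
             "R reads the words in the file one by one"),
           sample_file)

# what = "" makes scan() read each whitespace-separated item as a character string
fdata <- scan(sample_file, what = "")

# Show the first six entries
print(head(fdata))
```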

To convert the words to lower case, use tolower.

Each word in the file is stored as a separate element, and some of the words occur more than once.

To count the frequency of the words, use:
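
The original listing is missing here. One common way to count words, assumed in this sketch, is the base R function table(). The sample sentence is a hypothetical stand-in for the contents of textsample.txt.

```r
# Hypothetical stand-in for the words read from textsample.txt
fdata <- scan(text = "The file is a sample text file R reads the words in the file one by one",
              what = "", quiet = TRUE)

# Fold everything to lower case so that "The" and "the" count as one word
fdata <- tolower(fdata)

# table() counts how often each distinct word occurs
ft <- table(fdata)
print(ft)
```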

To view a pie chart of ft, use the command:
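
Assuming ft is a table of word counts, the base R function pie() draws the chart. The sketch rebuilds ft from a hypothetical sample sentence so that it runs on its own.

```r
# Hypothetical stand-in for the words read from textsample.txt
fdata <- tolower(scan(text = "The file is a sample text file R reads the words in the file one by one",
                      what = "", quiet = TRUE))
ft <- table(fdata)

# One slice per distinct word, sized by its frequency
pie(ft, main = "Word frequencies")
```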

Smartanalysis_fig3

From the above graph, the words “file” and “the” have the highest frequency.

The maximum frequency of the words in ft can be found directly by using the max command.
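
As a sketch, again on a hypothetical sample sentence rather than the original file: max() returns the highest count, and comparing ft against that value recovers the words that reach it.

```r
# Hypothetical stand-in for the words read from textsample.txt
fdata <- tolower(scan(text = "The file is a sample text file R reads the words in the file one by one",
                      what = "", quiet = TRUE))
ft <- table(fdata)

# Highest count in the table
highest <- max(ft)
print(highest)        # 3 for this sample

# Words that occur that many times
top_words <- names(ft)[ft == highest]
print(top_words)      # "file" "the" for this sample
```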

Look at the output of the command.

The plot shows each word against its frequency.
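
Such a plot can be produced by passing the frequency table to plot(). This is a sketch on a hypothetical sample sentence; the axis labels are my own choice.

```r
# Hypothetical stand-in for the words read from textsample.txt
fdata <- tolower(scan(text = "The file is a sample text file R reads the words in the file one by one",
                      what = "", quiet = TRUE))
ft <- table(fdata)

# For a one-dimensional table, plot() draws one vertical line per word
plot(ft, xlab = "word", ylab = "frequency")
```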

Smartanalysis_fig4

Commands to Read Data From File

Two of the most common data file formats are .csv and .xls: a csv file holds comma-separated values, and .xls is the file extension of an Excel file.

Smartanalysis_fig5

The most common commands for reading such data files are read.csv and read.table:
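
A short sketch of both commands on the same file. The file and its columns (model, price) are hypothetical and are written to a temporary location so the example runs anywhere.

```r
# Write a small, hypothetical CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("model,price",
             "A,199",
             "B,349"),
           csv_path)

# read.csv() assumes a header row and comma separators by default
data_csv <- read.csv(csv_path)

# read.table() is the general form: header and separator must be stated
data_tbl <- read.table(csv_path, header = TRUE, sep = ",")

print(data_csv)
print(all.equal(data_csv, data_tbl))   # compares the two data frames
```

Both commands return a data frame; read.csv is read.table with defaults suited to comma-separated files.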

Fetch Data Directly From the Web

It is possible to read data directly from the Web. R fetches the data at the given link or URL straight into memory. For example, a data set is available on the network at http://lib.stat.cmu.edu/datasets/csb/ch3a.dat.

Read the data directly with the read.csv or read.table command.
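
A hedged sketch of both calls on the URL above. I have not verified the layout of that file (header row, separator), so the resulting columns may need adjusting. The helper fetch() is hypothetical; it only keeps the script from stopping when the network or the file is unavailable.

```r
data_url <- "http://lib.stat.cmu.edu/datasets/csb/ch3a.dat"

# Hypothetical helper: return NULL instead of failing if the read does not succeed
fetch <- function(reader) {
  tryCatch(reader(data_url), error = function(e) NULL)
}

data1 <- fetch(read.csv)     # commas as separators, first line taken as header
data2 <- fetch(read.table)   # whitespace as separator, no header by default

# Inspect whatever could be read
if (!is.null(data1)) str(data1)
if (!is.null(data2)) str(data2)
```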

Here data1 and data2 are objects that hold the same file, read in different formats.

Reading Spreadsheets

To read spreadsheet data, we need to install the package gdata.

With this package loaded, the new command read.xls becomes available.
The data file test.xls can be read with read.xls("test.xls").
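
The steps might look like the sketch below. It assumes that test.xls sits in the working directory and that Perl is installed, which gdata's read.xls relies on. The guard makes the script skip the read when either the package or the file is missing.

```r
# One-time installation (needs a network connection):
# install.packages("gdata")

xls_data <- NULL
if (requireNamespace("gdata", quietly = TRUE) && file.exists("test.xls")) {
  # Read the first worksheet of the file into a data frame
  xls_data <- gdata::read.xls("test.xls", sheet = 1)
  print(head(xls_data))
}
```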

Fill Spreadsheet-Type Data Through the Editor in R

Smartanalysis_fig6
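
The post does not name the command behind this figure. In base R, edit() and fix() open a spreadsheet-style data editor, so the sketch below assumes one of them was meant. The column names are hypothetical.

```r
# Start from an empty data frame with two hypothetical columns
mydata <- data.frame(model = character(), price = numeric())

# The data editor needs an interactive session; it is skipped in batch mode
if (interactive()) {
  mydata <- edit(mydata)   # type values into the grid, then close the editor
}

str(mydata)
```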

Datasets in R

R ships with built-in data sets. The command data() lists the data sets that are available.
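
For example, the listing can also be captured and inspected as a table rather than only viewed:

```r
# List the data sets in the currently attached packages
available <- data()

# The listing is stored as a character matrix; show names and titles of the first few
print(head(available$results[, c("Item", "Title")]))
```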

To see the description of the data use the command:

To see the actual data, use the head command:
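
The original listings do not say which data set was used. As an illustration, this sketch uses mtcars, one of the data sets that comes with R.

```r
# Description of the data set (equivalent to typing ?mtcars)
help("mtcars")

# First six rows of the actual data
first_rows <- head(mtcars)
print(first_rows)

# Number of rows and columns in the full data set
print(dim(mtcars))   # 32 11
```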

More about data handling can be found in the R manual.
Here is the GitHub repo link for the code we have used in this post.

Featured image via Flickr Creative Commons.

Manjusha Joshi is a freelancer for free open source software in scientific computing. She is a mathematician and a member of the Pune Linux User group.
