One of the fundamental objectives of the R programming language is to process datasets containing large amounts of data and then generate insight from them.

Thus, in any data science or data analytics project, the very first step is invariably to import data available outside R into the R environment.

The data to be imported into R could be in any of several formats -
Text files with data delimited by tabs
Text files with data separated by commas
Excel files
Stata data
SPSS data
SAS data
etc.
The most common data formats imported into R are tab-delimited and CSV files.

Before we learn how to read / import tab-delimited or CSV files, let's spend some time understanding what these files are -

Tab Delimited File / TSV ( Tab Separated Values )

A tab-separated values (TSV) file is a simple text format for storing data in a tabular structure, e.g., database table or spreadsheet data, and a way of exchanging information between databases.
Each record in the table is one line of the text file. Each field value of a record is separated from the next by a tab character. 
The TSV format is thus a type of the more general delimiter-separated values format.

CSV ( Comma Separated Values )

A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. A CSV file stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format.
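
To make the two formats concrete, here is a minimal sketch that writes the same small table once as a tab-delimited file and once as a CSV file, then prints the raw lines of each. The data frame, its column names and the temporary file paths are made up purely for illustration.

```r
# A tiny, made-up table used only for illustration
scores <- data.frame(name = c("Asha", "Ravi"), score = c(81, 74))

# Temporary files, so nothing is written into the working directory
tsv_path <- tempfile(fileext = ".txt")
csv_path <- tempfile(fileext = ".csv")

# Same data, two different field separators
write.table(scores, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
write.table(scores, csv_path, sep = ",", row.names = FALSE, quote = FALSE)

# Read the files back as plain text to see what is actually stored
tsv_lines <- readLines(tsv_path)
csv_lines <- readLines(csv_path)

print(tsv_lines)  # fields separated by a tab character (shown as \t)
print(csv_lines)  # fields separated by a comma
```

Each element of tsv_lines and csv_lines is one record, and the only difference between the two files is the character sitting between the fields.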

Note: Any Excel file can be saved as a tab-delimited (*.txt) or comma-separated values (*.csv) file.

Now, let's learn how to read / import TSV or CSV files into R.

Let's look at 3 base functions available in R to read / import TSV or CSV files -
read.table()
read.csv()
read.delim()

read.table()

read.table() is the most general of these functions and a commonly used approach in R for importing data.
This function reads a file in table format and creates a data frame from it, with rows corresponding to lines and variables to fields in the file.

Commonly used syntax of read.table() -

read.table( file_name , sep = "\t" , header = TRUE , row.names = n , nrows = k , skip = x , stringsAsFactors = FALSE )

Only the file name is mandatory; all the other arguments are optional.

file_name : Name (or path) of the file to be imported into R
sep : The field separator used in the file ( "\t" for tab-delimited & "," for comma-separated ); the default "" treats any whitespace as a separator
header : Specify TRUE if the first line read contains the column names, else specify FALSE
row.names : Specify 1 if the first column in the file contains the row names, else omit this parameter
nrows : The maximum number of rows to be read from the file
skip : The number of lines to skip from the start of the file before reading begins; with header = TRUE, the first line after the skipped lines is used as the header
stringsAsFactors : Specify FALSE if you do not want character columns to be converted into factors ( FALSE is already the default from R 4.0.0 onwards )
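
The arguments above can be tried out without any file on disk. The sketch below assumes a small made-up table passed through the text argument of read.table(); the names raw_text and sales, and the values in the table, are purely illustrative.

```r
# A made-up tab-delimited table held in a single string (one record per line)
raw_text <- paste(
  "id\tcity\tsales",
  "r1\tPune\t120",
  "r2\tDelhi\t95",
  "r3\tPune\t143",
  sep = "\n"
)

sales <- read.table(
  text = raw_text,           # read from the string instead of a file
  sep = "\t",                # fields are separated by tabs
  header = TRUE,             # first line holds the column names
  row.names = 1,             # first column ("id") becomes the row names
  nrows = 2,                 # read only the first 2 records
  stringsAsFactors = FALSE   # keep "city" as character, not factor
)

print(sales)
str(sales)
```

Here the id column is used up as row names, so the resulting data frame has only the city and sales columns, and the third record is never read because of nrows = 2.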

Example Code 1 - To read a tab-delimited file in which the first 5 lines are skipped, the following line is read as the header & only 8 records are read.
> Read_Tab_File <- read.table("Tab_Delim_Data.txt" ,
+                              sep = '\t' ,
+                              header = TRUE ,
+                              row.names = 1,
+                              skip = 5,
+                              nrows = 8)
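
Since Tab_Delim_Data.txt is not available here, the following is a self-contained sketch of the same call on a made-up file. It assumes the file starts with 5 lines of notes, followed by a header line and 10 records; all file contents and variable names are hypothetical.

```r
tab_path <- tempfile(fileext = ".txt")

# Hypothetical file layout: 5 note lines, then the header, then 10 records
notes  <- paste("note line", 1:5)
header <- "id\tcity\tsales"
rows   <- paste0("r", 1:10, "\t", rep(c("Pune", "Delhi"), 5), "\t", 101:110)
writeLines(c(notes, header, rows), tab_path)

dat <- read.table(tab_path,
                  sep = "\t",
                  header = TRUE,   # taken from the first line AFTER the skipped ones
                  row.names = 1,
                  skip = 5,        # drop the 5 note lines
                  nrows = 8)       # keep only the first 8 records

print(dat)
```

Note that skip is applied before the header is looked for. If the header were on line 1 of the file, skip = 5 would throw it away and the 6th line would be treated as the header instead.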

Example Code 2 - To read a CSV file in which the first 2 lines are skipped, the following line is read as the header & only 4 records are read.
> Read_Comma_File <- read.table("Comma_Delim_Data.csv" ,
+                              sep = ',' ,
+                              header = TRUE ,
+                              row.names = 1,
+                              skip = 2,
+                              nrows = 4)

read.csv()

read.csv() is a wrapper around read.table() with defaults suited to CSV files: the separator defaults to a comma and header defaults to TRUE.
Thus, with read.csv() -
the input file is expected to be in CSV format
and there is no need to specify the separator parameter

Example Code - To read a CSV file using read.csv(), without having to specify the separator explicitly, where the first 2 lines are skipped, the following line is read as the header & only 4 records are read.
> Read_CSV_File <- read.csv("Comma_Delim_Data.csv" ,
+                              header = TRUE ,
+                              row.names = 1,
+                              skip = 2,
+                              nrows = 4)
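
As a quick check of the relationship between the two functions, the sketch below reads one small made-up CSV file with read.csv() and with read.table() and compares the results. The file contents and variable names are hypothetical, and the comparison is only expected to hold for simple files like this one (no quoted fields, no '#' characters, no ragged rows), because the two functions also differ in a few other defaults.

```r
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "a,Asha,81",
             "b,Ravi,74"), csv_path)

# read.csv(): separator and header come from its defaults
via_csv <- read.csv(csv_path, row.names = 1)

# read.table(): the same settings spelled out by hand
via_table <- read.table(csv_path, sep = ",", header = TRUE, row.names = 1)

# Compare the two data frames
same_result <- isTRUE(all.equal(via_csv, via_table))
print(same_result)
```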

read.delim()

read.delim() is a wrapper around read.table() with defaults suited to tab-delimited files: the separator defaults to a tab and header defaults to TRUE.
Thus, with read.delim() -
the input file is expected to be in tab-delimited format
and there is no need to specify the separator parameter

Example Code - To read a tab-delimited file using read.delim(), without having to specify the separator explicitly, where the first 5 lines are skipped, the following line is read as the header & only 8 records are read.

> Read_delim_File <- read.delim("Tab_Delim_Data.txt" ,
+                              header = TRUE ,
+                              row.names = 1,
+                              skip = 5,
+                              nrows = 8)
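
One way to see how the three functions differ is to look at their default arguments directly. The sketch below uses formals() from base R to pull out the default separator of each function, and then reads a small made-up tab-delimited file with read.delim() relying only on those defaults. The file contents and variable names are hypothetical.

```r
# Default separators of the three functions, taken from their definitions
default_seps <- c(
  read.table = formals(read.table)$sep,
  read.csv   = formals(read.csv)$sep,
  read.delim = formals(read.delim)$sep
)
print(default_seps)

# A made-up tab-delimited file with a header line
tsv_path <- tempfile(fileext = ".txt")
writeLines(c("id\tcity", "r1\tPune", "r2\tDelhi"), tsv_path)

# No sep and no header needed: read.delim() supplies both by default
cities <- read.delim(tsv_path, row.names = 1)
print(cities)
```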

Thanks & Happy Learning
Priyaranjan Mohanty
@AUTHOR : Admin
