
R is primarily regarded as a language for 'Quantitative Analysis' or 'Statistical Analysis', and thus we rarely think of R's utility for Data Processing.

But there are multiple use cases in Data Analytics which need 'String Processing' capabilities:

a) Text Mining 
b) Sentiment Analysis
c) Word Cloud creation
d) Web Scraping 
e) Natural Language Processing
etc ....

So, let's build our understanding of 'String Processing' using R.

One of the most popular packages in R for string handling is "stringr".

The "stringr" package is not available by default in R, and thus it has to be installed and loaded into the R environment before we can use the functions available within it.

1) Installing & Loading the 'stringr' package

> install.packages('stringr', dependencies = TRUE)

> library('stringr')
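
In a script that is run repeatedly, re-installing the package every time is unnecessary. A minimal sketch of a guarded install is given below; it assumes a CRAN mirror is already configured, and the variable name stringr_loaded is only for illustration.

```r
# Install stringr only when it is not already present, then load it.
if (!requireNamespace("stringr", quietly = TRUE)) {
  install.packages("stringr", dependencies = TRUE)
}
library(stringr)

# Confirm that the package is attached to the current session.
stringr_loaded <- "package:stringr" %in% search()
print(stringr_loaded)
```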

2) There are a few string functions available in base R (which are useful in String Processing)

Example : To check whether a value is of character (string) type
> is.character("String Data") 
[1] TRUE
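
A few more base R string functions are sketched below for comparison with the "stringr" functions that follow. These are standard base R calls; the variable names are only illustrative.

```r
s <- "String Data"

n_chars  <- nchar(s)                             # 11
upper    <- toupper(s)                           # "STRING DATA"
part     <- substr(s, 1, 6)                      # "String"
joined   <- paste("String", "Data", sep = "-")   # "String-Data"
has_data <- grepl("Data", s, fixed = TRUE)       # TRUE

print(list(n_chars, upper, part, joined, has_data))
```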

3) Now, let's look at various functions available in the "stringr" package

Example 1 : To convert a string to all capital letters ( str_to_upper() )
> str_to_upper("Convert to All Caps") 
[1] "CONVERT TO ALL CAPS"

Example 2 : str_to_upper() is vectorized, i.e. this function also takes a character vector as input
> str_to_upper(c("String1" , "string2")) 
[1] "STRING1" "STRING2"

Example 3 : To convert a string to all small (lowercase) letters ( str_to_lower() )
> str_to_lower("Convert to All Small") 
[1] "convert to all small"

Example 4 : str_to_lower() is vectorized, i.e. this function also takes a character vector as input
> str_to_lower(c("String1" , "string2")) 
[1] "string1" "string2"

Example 5 : To convert a string to title case ( str_to_title() ) [ capitalizes the first character of each word in the sentence ]
> str_to_title("Convert to title case") 
[1] "Convert To Title Case"

Example 6 : str_to_title() is vectorized, i.e. this function also takes a character vector as input
> str_to_title(c("String1" , "string2")) 
[1] "String1" "String2"

Example 7 : word() is used to extract words from a string containing a sentence; by default it returns the first word
> word("string for test") 
[1] "string"

Example 8 : word() can also extract a range of words, from a start position to an end position (here, words 2 to 3).
> word("string for test" , 2,3)
[1] "for test"
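
word() also accepts negative positions (counted from the end of the string) and a custom separator through its sep argument. A short sketch, with illustrative variable names:

```r
library(stringr)

sentence <- "string for test"

last_word <- word(sentence, -1)                    # "test"
first_two <- word(sentence, 1, 2)                  # "string for"
csv_field <- word("a,b,c", 2, sep = fixed(","))    # "b"

print(c(last_word, first_two, csv_field))
```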


Example 9 : str_detect() returns TRUE or FALSE based on whether the pattern is found in the string

> str_detect("Steve Jobs" , "Job")
[1] TRUE

> str_detect(c("Steve Jobs" , "Steve Ballmer"), "Job")
[1]  TRUE FALSE


Example 10 : str_which() takes a string vector and returns the index of each element that matches the search pattern.

> str_which(c("Steve Wozniak","Steve Jobs" , "Steve Ballmer"), "Job")
[1] 2

Example 11 : str_subset() returns only those strings from the string vector which match the search pattern.

> str_subset(c("Steve Wozniak","Steve Jobs" , "Steve Ballmer"), "Job")
[1] "Steve Jobs"
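
The three functions above are closely related: str_which() gives the positions where str_detect() is TRUE, and str_subset() gives the elements at those positions. The sketch below checks this on a small illustrative vector, and also shows a case-insensitive search using regex().

```r
library(stringr)

people <- c("Steve Wozniak", "Steve Jobs", "Steve Ballmer")

hits <- str_detect(people, "Job")              # FALSE TRUE FALSE

# str_which() matches which() applied to the str_detect() result.
same_index <- identical(which(hits), str_which(people, "Job"))

# str_subset() matches logical subsetting with the str_detect() result.
same_subset <- identical(people[hits], str_subset(people, "Job"))

# Case-insensitive matching through the regex() helper.
hits_ci <- str_detect(people, regex("job", ignore_case = TRUE))

print(c(same_index, same_subset))
print(hits_ci)
```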

Example 12 : str_sub() returns the substring of each string in the vector, between the start and end positions (both inclusive)

> str_sub("Test String" , 2 , 4)
[1] "est"

> str_sub(c("Test String","last String") , 2 , 4)
[1] "est" "ast"
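
str_sub() also understands negative positions, which count backwards from the end of the string, and it can be used on the left side of an assignment to replace a substring. A small sketch with illustrative variable names:

```r
library(stringr)

x <- "Test String"

last_six <- str_sub(x, -6, -1)    # "String"
from_six <- str_sub(x, 6)         # "String" (end defaults to the last character)

# Replace characters 1 to 4 in a copy of the string.
y <- x
str_sub(y, 1, 4) <- "Best"        # y is now "Best String"

print(c(last_six, from_six, y))
```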

Example 13 : str_length() returns the number of characters in each string

> str_length("Test String" )
[1] 11

> str_length(c("Test String","Test String 2 "))
[1] 11 14

Example 14 : str_c() concatenates individual strings into a single string

Plain vanilla string concatenation
> str_c("Test" , "String")
[1] "TestString"

String Concatenation with separator (blank Space ) between the combined strings
> str_c("Test" , "String" , sep = " ") 
[1] "Test String"

Vectorized String Concatenation with separator (blank Space ) between the combined strings
> str_c(c("Test","Last") , "String" , sep = " ")
[1] "Test String" "Last String"

Collapsing the vectorized result into a single string, using "/" between the concatenated elements
> str_c(c("Test","Last") , "String" , sep = " " , collapse = "/")
[1] "Test String/Last String"
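
Two more behaviours of str_c() are worth knowing: collapse also works on a single vector, and a missing value (NA) in any input makes the result NA. A minimal sketch, where str_replace_na() is used to turn NA into the text "NA" first:

```r
library(stringr)

# Collapse one character vector into a single string.
joined <- str_c(c("a", "b", "c"), collapse = ", ")      # "a, b, c"

# NA is contagious in str_c().
with_na <- str_c("a", NA_character_)                     # NA

# Convert NA to the string "NA" before concatenating.
na_as_text <- str_c("a", str_replace_na(NA_character_))  # "aNA"

print(joined)
print(with_na)
print(na_as_text)
```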

Example 15 : str_count() counts the number of times the search pattern occurs in the original string

> str_count("aabra kaa daabra" , "aa")
[1] 3
> str_count("aabra kaa daabra" , "\\s")
[1] 2

Example 16 : str_locate() returns the position of the first match of the pattern in the string, while str_locate_all() returns the positions of all matches

> str_locate("aabra kaa daabra" , "aa")
     start end
[1,]     1   2

> str_locate_all("aabra kaa daabra" , "aa")
[[1]]
     start end
[1,]     1   2
[2,]     8   9
[3,]    12  13
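
The start and end columns returned by str_locate_all() can be passed on to str_sub() to pull out the matched text. A short sketch on the same string, with illustrative variable names:

```r
library(stringr)

text <- "aabra kaa daabra"

# Matrix of match positions for the first (and only) input string.
loc <- str_locate_all(text, "aa")[[1]]

starts <- loc[, "start"]                                  # 1 8 12
pieces <- str_sub(text, loc[, "start"], loc[, "end"])     # "aa" "aa" "aa"

# The number of located matches agrees with str_count().
same_count <- nrow(loc) == str_count(text, "aa")

print(starts)
print(pieces)
print(same_count)
```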

Example 17 : str_replace_all() replaces every occurrence of the pattern in the string with the replacement substring

> str_replace_all(c("Test 1 x","Test 2"),"\\s" ,"_")
[1] "Test_1_x" "Test_2" 
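
For comparison, str_replace() (without "_all") replaces only the first match in each string. The sketch below also uses fixed() so that a dot is treated as a literal character and not as a regular expression wildcard.

```r
library(stringr)

first_only <- str_replace("Test 1 x", "\\s", "_")        # "Test_1 x"
all_spaces <- str_replace_all("Test 1 x", "\\s", "_")    # "Test_1_x"

# fixed() matches the pattern literally.
dots <- str_replace_all("a.b.c", fixed("."), "-")        # "a-b-c"

print(c(first_only, all_spaces, dots))
```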

Example 18 : str_split() splits the original string into individual substrings, using the given pattern as the delimiter.
Note : str_split() returns a "list", with one element per input string

> str_split("Testing the split","\\s")
[[1]]
[1] "Testing" "the"     "split"  

> str_split("Testing_the_split","_")
[[1]]
[1] "Testing" "the"     "split" 

> str_split(c("Testing_01","Testing_02"),"_")
[[1]]
[1] "Testing" "01"     

[[2]]
[1] "Testing" "02"
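
Since the result is a list, a little extra work is needed to get at the individual pieces. Two common options are sketched below: picking the n-th piece from every list element with sapply(), or asking str_split() for a character matrix with simplify = TRUE. Variable names are illustrative.

```r
library(stringr)

ids <- c("Testing_01", "Testing_02")

# Option 1: keep the list and pick the second piece of each element.
parts    <- str_split(ids, "_")
suffixes <- sapply(parts, `[`, 2)                 # "01" "02"

# Option 2: get a character matrix, one row per input string.
mat <- str_split(ids, "_", simplify = TRUE)

print(suffixes)
print(mat)
```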

Example 19 : str_trim() removes all the leading and trailing whitespace in the input string.

> str_trim(" String   having   Spaces   ")

[1] "String   having   Spaces"


Example 20 : str_squish() removes all the leading and trailing whitespace in the input string, and also reduces repeated whitespace inside the string to a single space.

> str_squish(" String   having   Spaces   ")
[1] "String having Spaces"
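
str_trim() can also work on one side only, through its side argument ("left", "right" or "both"). A small sketch using the same input string:

```r
library(stringr)

padded <- " String   having   Spaces   "

left_trimmed  <- str_trim(padded, side = "left")    # trailing spaces are kept
right_trimmed <- str_trim(padded, side = "right")   # leading space is kept

print(left_trimmed)
print(right_trimmed)
```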

Example 21 : str_match_all() returns all the substrings of each input string that match the search pattern, as a list of character matrices (one per input string).

> str_match_all(c("aabraa","ka","daabraa"),"a{2}")
[[1]]
     [,1]
[1,] "aa"
[2,] "aa"

[[2]]
     [,1]

[[3]]
     [,1]
[1,] "aa"
[2,] "aa"
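
When only the matched text is needed, str_extract_all() returns plain character vectors in place of matrices. The matrix form of str_match_all() becomes useful when the pattern has capture groups, since every group gets its own column. A minimal sketch; the "key=value" input is made up for illustration:

```r
library(stringr)

x <- c("aabraa", "ka", "daabraa")

# List of character vectors, one per input string.
extracted <- str_extract_all(x, "a{2}")
n_matches <- lengths(extracted)                   # 2 0 2

# Column 1 is the full match, columns 2 and 3 are the capture groups.
m <- str_match_all("k1=v1;k2=v2", "(\\w+)=(\\w+)")[[1]]
keys   <- m[, 2]                                  # "k1" "k2"
values <- m[, 3]                                  # "v1" "v2"

print(n_matches)
print(keys)
print(values)
```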


Thanks & Happy Learning !!
Priyaranjan Mohanty
@AUTHOR : Admin
