how to subset strings
By Jen Richmond
September 29, 2022
Sometimes you have a column in your data frame that is text, but there is some of it that you don’t need. Lets say your data looks like this…
df <- data.frame(animals = c("this is a bear", "this is a lion", "this is a tiger"))
df
## animals
## 1 this is a bear
## 2 this is a lion
## 3 this is a tiger
And perhaps only want the animal names… you can use sub_str()
from the stringr
package to strip out the extra characters. The sub_str()
function allows you to specify the position of the character you want to start and end with.
Here I want to start at the 11th character and keep the rest.
Note: spaces are included in your character count.
df_new <- df %>%
mutate(new_col = str_sub(animals, start = 11))
df_new
## animals new_col
## 1 this is a bear bear
## 2 this is a lion lion
## 3 this is a tiger tiger
If you wanted to get just the “is a” piece of the string, you can specify both a start and end character.
df_new <- df_new %>%
mutate(new_col2 = str_sub(animals, start = 6, end = 9))
df_new
## animals new_col new_col2
## 1 this is a bear bear is a
## 2 this is a lion lion is a
## 3 this is a tiger tiger is a
You can also use - to count backwards from the end. Here I am selecting the last 9 characters.
df_new <- df_new %>%
mutate(new_col3 = str_sub(animals, start = -9))
df_new
## animals new_col new_col2 new_col3
## 1 this is a bear bear is a is a bear
## 2 this is a lion lion is a is a lion
## 3 this is a tiger tiger is a s a tiger
Yay- I no longer have to google (IDHTG) how to subset strings with stringr!