my favourite things about R

I am prepping a talk for R-Ladies Sydney about my favourite R things, the packages and functions that end up in every script I write.

joining dataframes

The dplyr package includes several different kinds of join functions, which allow you to join dataframes together when they share a common id variable.
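As a rough sketch of what that looks like, here are three of the dplyr joins applied to two small made-up dataframes. The dataframe names, the `id` column, and the values are all invented for illustration.

```r
library(dplyr)

# Two hypothetical dataframes that share an id variable called `id`
participants <- data.frame(id = c(1, 2, 3), age = c(25, 31, 47))
scores <- data.frame(id = c(2, 3, 4), score = c(10, 15, 20))

# left_join() keeps every row of the first dataframe,
# filling in NA where the second has no matching id
left <- left_join(participants, scores, by = "id")

# inner_join() keeps only the ids that appear in both dataframes
inner <- inner_join(participants, scores, by = "id")

# full_join() keeps every id that appears in either dataframe
full <- full_join(participants, scores, by = "id")
```

Which join to reach for depends on which rows you can afford to lose: `left_join()` is probably the safest default when the first dataframe is the one you care about.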

across and summary tables

Why you shouldn’t make objects that have the same name as a function, and how to use across() to get summary statistics.
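A minimal sketch of the across() idea, assuming dplyr 1.0.0 or later and using the built-in mtcars data rather than anything from the original post. The result is called `car_summary` (a made-up name) rather than `summary`, so it does not share a name with the base R summary() function.

```r
library(dplyr)

# Mean and standard deviation of two columns, by number of cylinders.
# The named list gives output columns such as mpg_mean and mpg_sd.
car_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(c(mpg, hp), list(mean = mean, sd = sd)))

car_summary
```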

group_by and summarise

Some students have been asking me how they can calculate means and standard errors by condition. Here is a quick example using the Palmer penguins data. Details of the Palmer penguins data, with art by Allison Horst, can be found here.

load packages

library(palmerpenguins)
library(tidyverse)

read in data

penguins <- penguins
glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm <dbl> 39.
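The excerpt stops before the summary itself, so here is one way the group_by() and summarise() step might look. This is a sketch, not necessarily the code from the full post: the choice of `body_mass_g` as the outcome and the names `mass_summary`, `mean_mass`, `sd_mass`, and `se_mass` are my own assumptions.

```r
library(palmerpenguins)
library(dplyr)

# Mean and standard error of body mass for each species.
# Standard error = standard deviation / sqrt(number of non-missing values).
mass_summary <- penguins %>%
  group_by(species) %>%
  summarise(
    mean_mass = mean(body_mass_g, na.rm = TRUE),
    sd_mass = sd(body_mass_g, na.rm = TRUE),
    n = sum(!is.na(body_mass_g)),   # count only non-missing values
    se_mass = sd_mass / sqrt(n)
  )

mass_summary
```

Counting with `sum(!is.na(...))` rather than `n()` keeps missing values out of the denominator, which matters here because the data contain some NAs.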

recoding variables

Series: IDHTG

I don’t often deal with questionnaire data in R, but Ariana and I have started trying to import her Qualtrics data into R and to write a script to score her measures. The first step is to recode the variables to make it possible to add up scores on subscales.

load packages

library(tidyverse)

make a little dataframe

df <- data.frame("pp_no" = 1:12,
                 "sectionA_1" = c("Strongly Agree", "Agree", "Disagree", "Strongly Disagree"),
                 "sectionA_2" = c("Strongly Agree", "Agree", "Disagree", "Strongly Disagree"),
                 "sectionB_1" = c("Frequently", "Sometimes", "Infrequently"),
                 "sectionB_2" = c("Frequently", "Sometimes", "Infrequently"))

Option 1: use mutate() and case_when()

My first intuition is to use case_when(), which I have written about before.
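The excerpt ends before the recoding code, so here is a hedged sketch of how mutate() and case_when() could be used on the sectionA items. The numeric scores (4 down to 1), the `_score` suffix, and the `sectionA_total` column are assumptions made for illustration, not the scoring key for any real measure.

```r
library(dplyr)

# Same little dataframe as above; data.frame() recycles the shorter
# vectors because 12 is a whole multiple of their lengths
df <- data.frame("pp_no" = 1:12,
                 "sectionA_1" = c("Strongly Agree", "Agree", "Disagree", "Strongly Disagree"),
                 "sectionA_2" = c("Strongly Agree", "Agree", "Disagree", "Strongly Disagree"),
                 "sectionB_1" = c("Frequently", "Sometimes", "Infrequently"),
                 "sectionB_2" = c("Frequently", "Sometimes", "Infrequently"))

# Recode each sectionA response to a number (assumed scoring), then add up
df_scored <- df %>%
  mutate(across(starts_with("sectionA"),
                ~ case_when(.x == "Strongly Agree" ~ 4,
                            .x == "Agree" ~ 3,
                            .x == "Disagree" ~ 2,
                            .x == "Strongly Disagree" ~ 1),
                .names = "{.col}_score")) %>%
  mutate(sectionA_total = sectionA_1_score + sectionA_2_score)
```

Writing the scores to new `_score` columns keeps the original text responses around, which makes it easier to check that the recoding did what you expected.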

mutate + if else = new conditional variable

I keep googling these slides by David Ranzolin each time I try to combine mutate with ifelse to create a new variable that is conditional on values in other variables.
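For the record, the basic pattern looks something like the sketch below. The `exam` dataframe, its columns, and the pass mark of 50 are made up for illustration and do not come from the slides.

```r
library(dplyr)

# Hypothetical marks for four students, one of them missing
exam <- data.frame(student = c("a", "b", "c", "d"),
                   mark = c(45, 50, 72, NA))

# New variable whose value depends on the value of another variable
exam <- exam %>%
  mutate(result = ifelse(mark >= 50, "pass", "fail"))

exam
```

A missing mark gives a missing result, because `ifelse()` returns NA when the condition is NA.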

more wrangling tips

It is definitely true that it takes much longer to get your data ready for analysis than it does to actually analyse it. Apparently up to 80% of the data analysis time is spent wrangling data (and cursing and swearing).

Did you know up to 80% of data analysis is spent on the process of cleaning and preparing data? - cf. Wickham, 2014 and Dasu and Johnson, 2003

So here is an excellent approach to data wrangling in #rstats https://t.