datawrangling

my favourite things about R

I am prepping a talk for R-Ladies Sydney about my favourite R things, the packages and functions that end up in every script I write.

Pivoting

Cute #rstats monster art by the amazing Allison Horst. I have been using gather() and spread() a lot lately. I’m on the tidy data train; long data is essential for ggplot etc., but sometimes you want to do calculations row-wise, which is kinda complicated when the data is long. For example, this week Matilda and I were working with her language/locomotion data, looking at the number of action-directed, affirmative, and descriptive responses that parents make to their infants.
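Here is a minimal sketch of that back-and-forth. The data frame and its column names (id, action_directed, affirmative, descriptive) are made up for illustration and are not the real language/locomotion data; the only assumption is that tidyr is installed.

```r
library(tidyr)

# Hypothetical wide data: one row per infant, one column per response type
wide <- data.frame(
  id = c(1, 2),
  action_directed = c(4, 2),
  affirmative = c(3, 5),
  descriptive = c(1, 6)
)

# Row-wise calculations are straightforward while the data is wide
wide$total <- wide$action_directed + wide$affirmative + wide$descriptive

# gather() makes it long for ggplot: one row per infant per response type
long <- gather(wide, key = "response_type", value = "count",
               action_directed, affirmative, descriptive)

# spread() takes the long data back to wide
wide_again <- spread(long, key = response_type, value = count)
```

Newer versions of tidyr also offer pivot_longer() and pivot_wider() for the same job, but gather() and spread() still work.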

Just Three Things

I love me a good #rstats screencast. David Robinson has been screencasting his #TidyTuesday efforts for the past few months and while it is GREAT to watch a master at work, I just don’t have time to watch someone code for an hour in order to extract a handful of tips. So when I saw Nick Tierney tweet about posting short videos that contain Just Three Things, I thought “that is a GREAT idea.”

more wrangling tips

It is definitely true that it takes much longer to get your data ready for analysis than it does to actually analyse it. Apparently up to 80% of data analysis time is spent wrangling data (and cursing and swearing). “Did you know up to 80% of data analysis is spent on the process of cleaning and preparing data? - cf. Wickham, 2014 and Dasu and Johnson, 2003 So here is an excellent approach to data wrangling in #rstats https://t.

lesser known stars of the tidyverse

Emily Robinson writes a great blog called www.hookedondata.org. She talked at the 2018 New York R conference recently and shared some of her favourite (less well known) stars of the Tidyverse. Here are her slides www.tiny.cc/nyrtalk and my notes… 1. Use as_tibble(). Tibble = modern dataframe. Use it before printing your dataset to the console: a tibble will only print the first 10 rows and the columns that fit on the screen.
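A quick sketch of the as_tibble() tip, using the built-in mtcars dataset as a stand-in (my choice of example, not one from the slides) and assuming the tibble package is installed:

```r
library(tibble)

# mtcars is a plain data.frame; printing it dumps every row to the console
cars_tbl <- as_tibble(mtcars)

# Printing the tibble shows the first 10 rows and only the columns that fit
print(cars_tbl)
```

Note that as_tibble() drops row names by default, so the car names in mtcars disappear unless you ask for them with the rownames argument.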

dirty data

I have been doing lots of data wrangling recently and decided I needed a quick rundown of data cleaning in R. Here are notes on useful things I learned recently. Quick summaries: class() will let you know whether you are working with a dataframe or not; dim() gives you a little info about the dimensions of your data by telling you how many rows and columns you have
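For example, on a tiny made-up data frame (the names df, name and score are just placeholders), these two base R checks look like this:

```r
# Hypothetical toy data, only for illustration
df <- data.frame(
  name = c("a", "b", "c"),
  score = c(1.5, 2, 3.5)
)

class(df)  # "data.frame"
dim(df)    # 3 2 (rows first, then columns)
```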