group_by and summarise

in data wrangling dplyr

August 31, 2020

Some students have been asking me how they can calcuate means and standard errors by condition. Here is a quick example using the palmer penguin data.

Details of the palmer penguin data, with art by Allison Horst, can be found here.

load packages

library(palmerpenguins) 
library(tidyverse)

read in data

penguins <- penguins

glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex               <fct> male, female, female, NA, female, male, female, male…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

use group_by() and summarise() to get standard error by condition (aka species)

penguins %>%
  group_by(species) %>%
  summarise(mean = mean(flipper_length_mm, na.rm = TRUE), 
            n = n(), 
            stdev = sd(flipper_length_mm, na.rm = TRUE), 
            stderr = stdev/sqrt(n)) 
## # A tibble: 3 × 5
##   species    mean     n stdev stderr
##   <fct>     <dbl> <int> <dbl>  <dbl>
## 1 Adelie     190.   152  6.54  0.530
## 2 Chinstrap  196.    68  7.13  0.865
## 3 Gentoo     217.   124  6.48  0.582
Posted on:
August 31, 2020
Length:
2 minute read, 234 words
Categories:
data wrangling dplyr
See Also: