writing in Rmd with inline code

in Rmd writing reproducibility

January 14, 2022

One of the best things about RMarkdown is that you can use inline code to report summary and inferential statistics in your text. This removes the copy-and-paste step where transcription errors creep in, and if your data/values change, the text updates automatically the next time you knit.

Here I play with some penguin data and report summary stats using inline code.

load packages/data

library(tidyverse)
library(janitor)
library(palmerpenguins)
library(gt)

options(digits=2)

penguins <- penguins

count the penguins

Let's make a table that counts how many penguins there are in each species.

Here I'm using the tabyl() function from the janitor package to count how many penguins there are in each species, adding a totals row with adorn_totals(), then printing the table using gt().

count_penguins <- penguins %>%
  tabyl(species) %>%
  adorn_totals() 


gt(count_penguins)

species n percent
Adelie 152 0.44
Chinstrap 68 0.20
Gentoo 124 0.36
Total 344 1.00

Now I can use inline code to refer to values in the count_penguins dataframe. The syntax goes like this (the backticks are part of it)…

`r dataframe$column[rownumber]`

For example, the following text in my Rmd file…

… knits into the text below.

In the Palmer penguins dataset, there are body measurements from a total of 344 penguins. There are 3 species represented (N = 124 Gentoo, N = 68 Chinstrap and N = 152 Adelie).
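The Rmd source for that paragraph isn't reproduced above, so here is a minimal sketch of what it could look like. The row indices in the comments are an assumption based on the row order of the printed table (Adelie, Chinstrap, Gentoo, Total), and the table is rebuilt by hand in base R from the printed counts so the sketch runs without janitor or palmerpenguins.

```r
# Counts copied from the table printed above (not recomputed from the data).
count_penguins <- data.frame(
  species = c("Adelie", "Chinstrap", "Gentoo", "Total"),
  n = c(152, 68, 124, 344),
  stringsAsFactors = FALSE
)

# Assumed Rmd source; each `r ...` chunk is evaluated at knit time:
#   ... body measurements from a total of `r count_penguins$n[4]` penguins.
#   There are 3 species represented (N = `r count_penguins$n[3]` Gentoo,
#   N = `r count_penguins$n[2]` Chinstrap and
#   N = `r count_penguins$n[1]` Adelie).

# The same expressions, evaluated in the console.
total     <- count_penguins$n[4]
gentoo    <- count_penguins$n[3]
chinstrap <- count_penguins$n[2]
adelie    <- count_penguins$n[1]

# Assemble the sentence by hand to preview what the knitted text would say.
sentence <- paste0(
  "There are body measurements from a total of ", total, " penguins ",
  "(N = ", gentoo, " Gentoo, N = ", chinstrap, " Chinstrap and N = ",
  adelie, " Adelie)."
)
print(sentence)
```

Running the expressions in the console before knitting is a quick way to check that each index points at the row you think it does.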

Let's get some summary statistics.

body_mass <- penguins %>%
  group_by(species) %>%
  summarise(mean = mean(body_mass_g, na.rm = TRUE))

gt(body_mass)

species mean
Adelie 3701
Chinstrap 3733
Gentoo 5076

Now this text in my Rmd file…

… knits into the text below.

On average, Gentoo penguins are the heaviest (M = 5076.02 g); Chinstrap (M = 3733.09 g) and Adelie (M = 3700.66 g) penguins are smaller.
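The knitted means above show two decimals while the gt table shows whole numbers, so it can help to fix the precision inside the inline chunk itself rather than relying on a global option. The sketch below is one way to do that, not necessarily what this post did: the means are copied from the knitted sentence above, and the choice of sprintf() and round() is my own assumption.

```r
# Means copied from the knitted sentence above (already rounded there).
body_mass <- data.frame(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  mean = c(3700.66, 3733.09, 5076.02),
  stringsAsFactors = FALSE
)

# Assumed Rmd source with explicit formatting in each inline chunk:
#   ... the heaviest (M = `r sprintf("%.2f", body_mass$mean[3])` g) ...

# Always two decimals, returned as a character string.
gentoo_2dp <- sprintf("%.2f", body_mass$mean[3])

# Whole grams, returned as a number.
gentoo_0dp <- round(body_mass$mean[3])

print(gentoo_2dp)
print(gentoo_0dp)
```

Formatting per value means the text stays the same even if options(digits = ...) is changed elsewhere in the document.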

Reproducibility risks with inline code

Writing reports with Rmd can save you tons of time because once you have the code, you can reuse it with different data. But there are also risks… what if, in the next penguin experiment, the mean body mass that ended up in the 3rd row of this table wasn't for the Gentoo penguins, but for some other species? The inline code would still knit without complaint, and the sentence would report the wrong number.
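To make the risk concrete, here is a small sketch in base R. It uses the means from the knitted text above and simply re-sorts the table, which is a stand-in I've assumed for "the data changed"; no new species or values are invented.

```r
# Means copied from the knitted sentence above.
body_mass <- data.frame(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  mean = c(3700.66, 3733.09, 5076.02),
  stringsAsFactors = FALSE
)

# With the original row order, row 3 is Gentoo.
original_row3 <- body_mass$species[3]

# Sort heaviest first: same data, different row order.
reordered <- body_mass[order(-body_mass$mean), ]

# Row 3 is now a different species, so `r reordered$mean[3]` would
# silently report the Adelie mean in a sentence about Gentoo penguins.
reordered_row3 <- reordered$species[3]
reordered_mean3 <- reordered$mean[3]

print(original_row3)
print(reordered_row3)
print(reordered_mean3)
```

Nothing errors here, which is exactly the problem: positional indexing only works as long as the row order never changes.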

You can refer to rows by name in inline code using the column_to_rownames() function from tibble.

body_mass <- penguins %>%
  group_by(species) %>%
  summarise(mean = mean(body_mass_g, na.rm = TRUE)) %>%
  column_to_rownames(var = "species") # this replaces rownames that are numbers with species values

# print the rownames
rownames(body_mass)
## [1] "Adelie"    "Chinstrap" "Gentoo"

Once the species values are rownames, you can refer to a particular row and column by name within square brackets, e.g. body_mass["Gentoo", "mean"]. The quotes are needed because row and column names are character strings; without them R would look for an object called Gentoo.

On average, Gentoo penguins are the heaviest (M = 5076.02 g); Chinstrap (M = 3733.09 g) and Adelie (M = 3700.66 g) penguins are smaller.
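As a rough check that name-based lookup really is robust to row order, here is a sketch in base R. The data frame is built by hand with the means from the knitted text (so it runs without tibble), and the inline chunk in the comment is my assumption about what the Rmd source would look like.

```r
# Means copied from the knitted sentence above; species used as row names,
# mimicking the result of column_to_rownames(var = "species").
body_mass <- data.frame(
  mean = c(3700.66, 3733.09, 5076.02),
  row.names = c("Adelie", "Chinstrap", "Gentoo")
)

# Assumed Rmd source:
#   ... Gentoo penguins are the heaviest
#   (M = `r body_mass["Gentoo", "mean"]` g) ...
gentoo_by_name <- body_mass["Gentoo", "mean"]

# Shuffle the rows; drop = FALSE keeps the one-column result a data frame.
shuffled <- body_mass[c("Gentoo", "Adelie", "Chinstrap"), , drop = FALSE]

# The lookup by name returns the same value regardless of row order.
gentoo_after_shuffle <- shuffled["Gentoo", "mean"]

print(gentoo_by_name)
print(gentoo_after_shuffle)
```

If a species name is missing from the table, this lookup returns NA rather than another species' value, which is much easier to spot in the knitted text.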
