group_walk

Code
R
Author

M. L. DeBusk-Lane

Published

February 25, 2020

I ran across this little gem at work today trying to build a function to easily iterate out a series of excel files (.xlsx) that needed to be generated from an identifier column in a larger dataset. Lets take a look.

library(tidyverse)
library(writexl)

Let’s grab some data!!!

data(iris)

Let’s take a look.

iris %>%
  skimr::skim()
Data summary
Name Piped data
Number of rows 150
Number of columns 5
_______________________
Column type frequency:
factor 1
numeric 4
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Sepal.Length 0 1 5.84 0.83 4.3 5.1 5.80 6.4 7.9 ▆▇▇▅▂
Sepal.Width 0 1 3.06 0.44 2.0 2.8 3.00 3.3 4.4 ▁▆▇▂▁
Petal.Length 0 1 3.76 1.77 1.0 1.6 4.35 5.1 6.9 ▇▁▆▇▂
Petal.Width 0 1 1.20 0.76 0.1 0.3 1.30 1.8 2.5 ▇▁▇▅▃

You’ll notice that aside from the 4 numeric species attributes (Sepal & Petal length and width), there is an identification/classification variable (actually a factor variable) that denotes which Species each flower is.

For this demonstration, I’d like to create three seperate ‘.xlsx’ files simply based on this identification. What would be helpful, in the end, is to have each file named after this identification.

Thankfully, the {dplyr} package has made this super simple.

iris %>%
  group_by(Species) %>% # Group by the variable for which you wish to iterate over to create individual files from.
  group_walk(~ write_xlsx(.x, paste0("iris_", .y$Species, ".xlsx")), keep = TRUE)

group_by

To walk through this a bit, you’ll notice a simple group_by function that serves to identify which column we wish to not only group our data by, but the one which we’d like to iterate over to make individual files from.

group_walk (group_map)

Next, the group_walk function, which is an extension of the group_map series (more info here) that mimics many purrr functions, except through a series of groups… not just a list, df columsn, or some type of nest.

Identical to the other walk functions that are used for their ‘side effects’ (think output, not what it returns… ), the group_walk function silently returns the .x argument. We’re interested in what it does, not the data it may produce. Afterall, I just want the output of the function. I plan to dive into this in a future map post in the future… it can be confusing. Nicely, the group_walk function also includes an option to keep the grouping variable through the keep = option that retains the variable in each .x.

To wrap this up, the paste0 function serves to meet the second argument requirement for write_xlsx. If you’re not familiar with paste0/paste, it simply combindes all arguments together. In this case, it is simply taking the string prefix “iris_,” each Species, and “.xlsx” and concatenates them.

Reuse

Citation

BibTeX citation:
@online{l.debusk-lane2020,
  author = {M. L. DeBusk-Lane},
  title = {Group\_walk},
  date = {2020-02-25},
  langid = {en}
}
For attribution, please cite this work as:
M. L. DeBusk-Lane. 2020. “Group_walk.” February 25, 2020.