RandomCode - group_walk

I ran across this little gem at work today while trying to build a function that writes out a series of Excel files (.xlsx), one for each value of an identifier column in a larger dataset. Let’s take a look.

library(tidyverse)
library(writexl)

Let’s grab some data!!!

data(iris)

Let’s take a look.

iris %>% 
  skimr::skim()
Table 1: Data summary

Name                     Piped data
Number of rows           150
Number of columns        5
_______________________
Column type frequency:
  factor                 1
  numeric                4
________________________
Group variables          None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
Species        0          1              FALSE    3         set: 50, ver: 50, vir: 50

Variable type: numeric

skim_variable  n_missing  complete_rate  mean  sd    p0   p25  p50   p75  p100  hist
Sepal.Length   0          1              5.84  0.83  4.3  5.1  5.80  6.4  7.9   ▆▇▇▅▂
Sepal.Width    0          1              3.06  0.44  2.0  2.8  3.00  3.3  4.4   ▁▆▇▂▁
Petal.Length   0          1              3.76  1.77  1.0  1.6  4.35  5.1  6.9   ▇▁▆▇▂
Petal.Width    0          1              1.20  0.76  0.1  0.3  1.30  1.8  2.5   ▇▁▇▅▃

You’ll notice that aside from the four numeric measurements (sepal and petal length and width), there is an identification/classification variable (a factor, to be precise) that denotes the Species of each flower.

For this demonstration, I’d like to create three separate ‘.xlsx’ files, one per Species. It would also be helpful to have each file named after the Species it contains.

Thankfully, the {dplyr} package has made this super simple.

iris %>% 
  group_by(Species) %>% # Group by the column whose values define the individual files.
  group_walk(~write_xlsx(.x, paste0("iris_", .y$Species, ".xlsx")), .keep = TRUE) # .keep was named keep before dplyr 1.0.0
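If you’d like to confirm the files actually land on disk, here is a minimal sketch of the same pipeline that writes into a temporary folder and then lists what was created. The out_dir folder name is my own placeholder, not part of the original code, and I’m assuming dplyr 1.0.0 or later for the .keep argument.

```r
library(dplyr)
library(writexl)

# Hypothetical output folder inside the session's temp directory.
out_dir <- file.path(tempdir(), "iris_by_species")
dir.create(out_dir, showWarnings = FALSE)

iris %>%
  group_by(Species) %>%
  # Write one workbook per group; .y$Species supplies the file name.
  group_walk(
    ~ write_xlsx(.x, file.path(out_dir, paste0("iris_", .y$Species, ".xlsx"))),
    .keep = TRUE
  )

# One file per Species should now exist.
written <- list.files(out_dir, pattern = "\\.xlsx$")
print(written)
```

Writing to tempdir() keeps the experiment out of your working directory; swap in a real folder once you’re happy with the result.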

group_by

To walk through this a bit: the group_by call identifies the column we want to group the data by, which is also the column whose values we iterate over to make the individual files.
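To see which groups a later step will iterate over, a quick sketch using dplyr’s group_keys() and n_groups() can help. The variable names grouped and keys are just mine for illustration.

```r
library(dplyr)

# Grouping leaves the rows untouched; it only attaches grouping metadata.
grouped <- iris %>% group_by(Species)

# One row per group: these are the values that get iterated over.
keys <- group_keys(grouped)
print(keys)

# The number of groups is the number of files to expect.
n_files <- n_groups(grouped)
print(n_files)
```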

group_walk (group_map)

Next comes the group_walk function, a member of the group_map family (see the dplyr help page for group_map) that mimics many purrr functions, except that it iterates over the groups of a grouped data frame rather than over a list, data frame columns, or some type of nest. Inside the function, .x is the subset of rows for the current group and .y is a one-row tibble holding that group’s key.

Like the other walk functions, which are used for their ‘side effects’ (think output, not what they return), group_walk invisibly returns its input. We’re interested in what it does, not the data it may produce. After all, I just want the files the function writes. I plan to dive into this in a future map post… it can be confusing. Nicely, group_walk also includes an option to retain the grouping variable in each .x: the .keep argument (named keep in dplyr versions before 1.0.0).
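Here is a small sketch of that behaviour, assuming dplyr 1.0.0 or later for the .keep spelling. It uses message() as the side effect instead of writing files, and leans on group_map() to collect values so the effect of .keep is visible. The variable names are mine.

```r
library(dplyr)

grouped <- group_by(iris, Species)

# Side effect only: report each group's size. .y is a one-row tibble with
# the group key; .x holds that group's rows.
returned <- group_walk(grouped, function(.x, .y) {
  message(.y$Species, ": ", nrow(.x), " rows")
})

# group_walk() hands back its input invisibly, so a pipeline can continue.
print(dim(returned))

# group_map() collects return values instead. Here: the column count of .x
# without and with the grouping column.
cols_default <- unlist(group_map(grouped, ~ ncol(.x)))
cols_keep <- unlist(group_map(grouped, ~ ncol(.x), .keep = TRUE))
print(cols_default)
print(cols_keep)
```

The only difference between the last two calls is whether Species travels along inside each .x, which matters if you want that column to appear in the exported file.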

To wrap this up, the paste0 function supplies the second argument of write_xlsx: the path of the file to write. If you’re not familiar with paste0/paste, it simply combines all of its arguments into one string (paste0 uses no separator). In this case, it takes the string prefix “iris_”, each Species, and “.xlsx” and concatenates them.
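As a quick illustration in base R, here is a sketch of how paste0 builds the three file names at once, alongside paste with an explicit separator for comparison. The species vector is typed out by hand here rather than pulled from the data.

```r
# The three Species levels in iris, written out for illustration.
species <- c("setosa", "versicolor", "virginica")

# paste0() is vectorised: one file name per element, no separator inserted.
file_names <- paste0("iris_", species, ".xlsx")
print(file_names)

# paste() inserts a separator (a space by default); sep = "_" changes it.
prefix <- paste("iris", "setosa", sep = "_")
print(prefix)
```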
