library(tidyverse)
library(writexl)
I ran across this little gem at work today trying to build a function to easily iterate out a series of excel files (.xlsx
) that needed to be generated from an identifier column in a larger dataset. Lets take a look.
Let’s grab some data!!!
data(iris)
Let’s take a look.
%>%
iris ::skim() skimr
Name | Piped data |
Number of rows | 150 |
Number of columns | 5 |
_______________________ | |
Column type frequency: | |
factor | 1 |
numeric | 4 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
Species | 0 | 1 | FALSE | 3 | set: 50, ver: 50, vir: 50 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Sepal.Length | 0 | 1 | 5.84 | 0.83 | 4.3 | 5.1 | 5.80 | 6.4 | 7.9 | ▆▇▇▅▂ |
Sepal.Width | 0 | 1 | 3.06 | 0.44 | 2.0 | 2.8 | 3.00 | 3.3 | 4.4 | ▁▆▇▂▁ |
Petal.Length | 0 | 1 | 3.76 | 1.77 | 1.0 | 1.6 | 4.35 | 5.1 | 6.9 | ▇▁▆▇▂ |
Petal.Width | 0 | 1 | 1.20 | 0.76 | 0.1 | 0.3 | 1.30 | 1.8 | 2.5 | ▇▁▇▅▃ |
You’ll notice that aside from the 4 numeric species attributes (Sepal & Petal length and width), there is an identification/classification variable (actually a factor variable) that denotes which Species
each flower is.
For this demonstration, I’d like to create three seperate ‘.xlsx’ files simply based on this identification. What would be helpful, in the end, is to have each file named after this identification.
Thankfully, the {dplyr}
package has made this super simple.
%>%
iris group_by(Species) %>% # Group by the variable for which you wish to iterate over to create individual files from.
group_walk(~ write_xlsx(.x, paste0("iris_", .y$Species, ".xlsx")), keep = TRUE)
group_by
To walk through this a bit, you’ll notice a simple group_by
function that serves to identify which column we wish to not only group our data by, but the one which we’d like to iterate over to make individual files from.
group_walk (group_map)
Next, the group_walk
function, which is an extension of the group_map
series (more info here) that mimics many purrr
functions, except through a series of groups… not just a list, df columsn, or some type of nest.
Identical to the other walk
functions that are used for their ‘side effects’ (think output, not what it returns… ), the group_walk
function silently returns the .x
argument. We’re interested in what it does, not the data it may produce. Afterall, I just want the output of the function. I plan to dive into this in a future map
post in the future… it can be confusing. Nicely, the group_walk
function also includes an option to keep the grouping variable through the keep =
option that retains the variable in each .x
.
To wrap this up, the paste0
function serves to meet the second argument requirement for write_xlsx
. If you’re not familiar with paste0/paste, it simply combindes all arguments together. In this case, it is simply taking the string prefix “iris_,” each Species
, and “.xlsx” and concatenates them.
Reuse
Citation
@online{l.debusk-lane2020,
author = {M. L. DeBusk-Lane},
title = {Group\_walk},
date = {2020-02-25},
langid = {en}
}