Rowwise operation in dplyr 1.0.0
I wrote a post on rowwise operations on data frames in R a while ago. That post recommended purrr::pmap() for rowwise operations, since the other methods each have their own disadvantages. However, dplyr 1.0.0 (to be released soon) will bring better support for rowwise operations, and it is intuitive, simple, and easy to use.
Basic
library(dplyr)
df <- tibble(x = 1:3, y = 2:4, z = 3:5)
df %>% rowwise() %>% mutate(m = mean(c(x, y, z)))
## # A tibble: 3 x 4
## # Rowwise:
##       x     y     z     m
##   <int> <int> <int> <dbl>
## 1     1     2     3     2
## 2     2     3     4     3
## 3     3     4     5     4
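To see why rowwise() matters here, it may help to compare against the same mutate() call without it. The following is a minimal sketch using the same df as above; the names no_rowwise and with_rowwise are only illustrative.

```r
library(dplyr)

df <- tibble(x = 1:3, y = 2:4, z = 3:5)

# Without rowwise(): c(x, y, z) concatenates the three whole columns,
# so every row receives the same overall mean
no_rowwise <- df %>% mutate(m = mean(c(x, y, z)))
no_rowwise$m

# With rowwise(): each row is treated as its own group;
# ungroup() drops the rowwise grouping once the row-by-row work is done
with_rowwise <- df %>%
  rowwise() %>%
  mutate(m = mean(c(x, y, z))) %>%
  ungroup()
with_rowwise$m
```

The first result repeats one value for all rows, while the second gives one mean per row.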
We can use tidy selection syntax to succinctly select any variables with c_across().
df %>% rowwise() %>% mutate(m = mean(c_across(everything())))
## # A tibble: 3 x 4
## # Rowwise:
##       x     y     z     m
##   <int> <int> <int> <dbl>
## 1     1     2     3     2
## 2     2     3     4     3
## 3     3     4     5     4
# equivalent to (predicates should be wrapped in where())
df %>% rowwise() %>% mutate(m = mean(c_across(where(is.numeric))))
## # A tibble: 3 x 4
## # Rowwise:
##       x     y     z     m
##   <int> <int> <int> <dbl>
## 1     1     2     3     2
## 2     2     3     4     3
## 3     3     4     5     4
df %>% rowwise() %>% mutate(m = mean(c_across(x:z)))
## # A tibble: 3 x 4
## # Rowwise:
##       x     y     z     m
##   <int> <int> <int> <dbl>
## 1     1     2     3     2
## 2     2     3     4     3
## 3     3     4     5     4
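One thing to watch out for: inside c_across(), x:z is a column selection, but a bare x:z in a rowwise mutate() is the ordinary base R sequence from the value of x to the value of z. With the df above both happen to give the same mean, so here is a small sketch with hypothetical data (df_gap is made up for illustration) where they differ.

```r
library(dplyr)

# Hypothetical data where the columns are not consecutive integers
df_gap <- tibble(x = c(1, 2), y = c(10, 20), z = c(3, 4))

res <- df_gap %>%
  rowwise() %>%
  mutate(
    seq_mean = mean(x:z),           # base R sequence from x to z; y is ignored
    col_mean = mean(c_across(x:z))  # mean of the columns x, y and z
  ) %>%
  ungroup()
res
```

In the first row, seq_mean is the mean of 1:3, while col_mean is the mean of 1, 10 and 3.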
Variables passed to rowwise() behave somewhat like the grouping variables passed to group_by(): they are preserved, for example in the output of summarise(). We preserve a variable with rowwise(<var_to_preserve>).
# .before = x places the new variable `v` before variable `x`
df2 <- mutate(df, v = letters[1:3], .before = x)
# c_across(x:z) selects columns x to z; a bare x:z would be the sequence from x to z
df2 %>% rowwise(v) %>% mutate(m = mean(c_across(x:z)))
## # A tibble: 3 x 5
## # Rowwise:  v
##   v         x     y     z     m
##   <chr> <int> <int> <int> <dbl>
## 1 a         1     2     3     2
## 2 b         2     3     4     3
## 3 c         3     4     5     4
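With mutate() all columns are kept anyway, so the effect of rowwise(v) is easier to see with summarise(). A minimal sketch, rebuilding df2 directly so that the snippet runs on its own (plain and kept are illustrative names):

```r
library(dplyr)

df2 <- tibble(v = letters[1:3], x = 1:3, y = 2:4, z = 3:5)

# rowwise() alone: summarise() returns only the computed column
plain <- df2 %>% rowwise() %>% summarise(m = mean(c_across(x:z)))
names(plain)

# rowwise(v): v is carried along as an identifier for each row
kept <- df2 %>% rowwise(v) %>% summarise(m = mean(c_across(x:z)))
names(kept)
```

Depending on the dplyr version, summarise() may also print a message about the grouping of its output.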
Row-wise summary functions in base R
For better efficiency, we can use the row-wise summary functions in base R, such as rowMeans(). across() is required to select multiple columns.
# use rowMeans
df %>% mutate(m = rowMeans(across(everything())))
## # A tibble: 3 x 4
##       x     y     z     m
##   <int> <int> <int> <dbl>
## 1     1     2     3     2
## 2     2     3     4     3
## 3     3     4     5     4
# equal to
mutate(df, m = rowMeans(df))
## # A tibble: 3 x 4
##       x     y     z     m
##   <int> <int> <int> <dbl>
## 1     1     2     3     2
## 2     2     3     4     3
## 3     3     4     5     4
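Passing the whole data frame to rowMeans() only works while every column is numeric. The sketch below assumes a data frame with one character column (the same shape as df2 earlier) and shows how across() can restrict the columns handed to the base R functions; res and res2 are illustrative names.

```r
library(dplyr)

df2 <- tibble(v = letters[1:3], x = 1:3, y = 2:4, z = 3:5)

# rowMeans(df2) would fail here because column v is character;
# across() hands over only the selected columns
res <- df2 %>% mutate(m = rowMeans(across(where(is.numeric))))
res$m

# An explicit column range works the same way, and rowSums() is used likewise
res2 <- df2 %>% mutate(m = rowMeans(across(x:z)), s = rowSums(across(x:z)))
res2
```

No rowwise() is needed in this case, because rowMeans() and rowSums() already work row by row on the selected columns.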
Advanced usage
Run a function many times with different arguments
# example from dplyr vignette, rowwise
df <- tribble(
  ~n, ~min, ~max,
   1,    0,    1,
   2,   10,  100,
   3,  100, 1000,
)
# list() is required here, because mutate() expects each row's result to have length 1
df %>%
  rowwise() %>%
  mutate(data = list(runif(n, min, max)))
## # A tibble: 3 x 4
## # Rowwise:
##       n   min   max data
##   <dbl> <dbl> <dbl> <list>
## 1     1     0     1 <dbl [1]>
## 2     2    10   100 <dbl [2]>
## 3     3   100  1000 <dbl [3]>
# we can also use `purrr::pmap()`
df %>% mutate(data = purrr::pmap(., .f = runif))
## # A tibble: 3 x 4
##       n   min   max data
##   <dbl> <dbl> <dbl> <list>
## 1     1     0     1 <dbl [1]>
## 2     2    10   100 <dbl [2]>
## 3     3   100  1000 <dbl [3]>
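The pmap() call above works because the column names n, min and max happen to match the argument names of runif(). As a sketch of what changes when they do not match, here is a variant with hypothetical column names lo and hi (df_named, res_rowwise and res_pmap are made up for this illustration).

```r
library(dplyr)

# Hypothetical column names (lo, hi) that do not match runif()'s arguments
df_named <- tribble(
  ~n, ~lo,  ~hi,
   1,   0,    1,
   2,  10,  100,
   3, 100, 1000
)

# rowwise(): columns are referred to explicitly, so their names do not matter
res_rowwise <- df_named %>%
  rowwise() %>%
  mutate(data = list(runif(n, lo, hi))) %>%
  ungroup()

# pmap(): the columns have to be mapped to runif()'s argument names by hand
res_pmap <- df_named %>%
  mutate(data = purrr::pmap(list(n = n, min = lo, max = hi), runif))

lengths(res_rowwise$data)
lengths(res_pmap$data)
```

Both versions produce a list column whose elements have 1, 2 and 3 values, but the rowwise() version reads more like an ordinary function call.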
A more complicated problem: varying the function being called
# example from dplyr vignette, rowwise
df <- tribble(
  ~rng,    ~params,
  "runif", list(n = 10),
  "rnorm", list(n = 20),
  "rpois", list(n = 10, lambda = 5),
)
df %>%
  rowwise() %>%
  mutate(data = list(do.call(rng, params)))
## # A tibble: 3 x 3
## # Rowwise:
##   rng   params           data
##   <chr> <list>           <list>
## 1 runif <named list [1]> <dbl [10]>
## 2 rnorm <named list [1]> <dbl [20]>
## 3 rpois <named list [2]> <int [10]>
# purrr::map2() also works, but is more complicated
df %>% mutate(data = purrr::map2(rng, params, ~ do.call(.x, .y)))
## # A tibble: 3 x 3
##   rng   params           data
##   <chr> <list>           <list>
## 1 runif <named list [1]> <dbl [10]>
## 2 rnorm <named list [1]> <dbl [20]>
## 3 rpois <named list [2]> <int [10]>
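Both versions rely on do.call(), which accepts the function as a name in a string and the arguments as a list. A small base R sketch of that equivalence (the seed value is arbitrary):

```r
# do.call() with a function name and a named list of arguments ...
set.seed(42)
a <- do.call("rpois", list(n = 10, lambda = 5))

# ... makes the same call as writing it out directly
set.seed(42)
b <- rpois(n = 10, lambda = 5)

identical(a, b)
```

This is why a character column of function names and a list column of parameters are enough to drive the calls row by row.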
For more details, see the rowwise vignette in dplyr.