Frank's answer is by far the simplest, but here's a swatch of code I've worked on for mid-pipe debugging and such.
Caveat emptor:
- this code is under-tested;
- even if well-tested, there is no intention for this to be used in production or unattended use;
- it has not been blessed or even reviewed by any authors or contributors to
dplyr
and related packages;
- it currently works in R-3.4 and
dplyr-0.7.4
, but it is not taking advantage of many "goodnesses" that should be used, such as rlang
and/or lazyeval
;
- it works for my uses, not tested for yours.
Bug reports welcome, if/when you find something amyss.
Mid-pipe message
This can include just about anything you want:
mtcars %>%
group_by(cyl) %>%
pipe_message(whichcyl = cyl[1], bestmpg = max(mpg)) %>%
summarize(mpg=mean(mpg))
# Mid-pipe message (2018-05-01 09:39:26):
# $ :List of 2
# ..$ whichcyl: num 4
# ..$ bestmpg : num 33.9
# $ :List of 2
# ..$ whichcyl: num 6
# ..$ bestmpg : num 21.4
# $ :List of 2
# ..$ whichcyl: num 8
# ..$ bestmpg : num 19.2
# # A tibble: 3 x 2
# cyl mpg
# <dbl> <dbl>
# 1 4. 26.7
# 2 6. 19.7
# 3 8. 15.1
Mid-pipe assert
You can optionally just realize what's going on and look at the data quickly, allowing you to see the moment and then exit out of the pipe:
mtcars %>%
group_by(cyl) %>%
pipe_assert(all(mpg > 12), .debug=TRUE) %>%
summarize(mpg = mean(mpg))
# #
# # all(mpg > 12) is not TRUE ... in Group: cyl:8
# # 'x' is the current data that failed the assertion.
# #
# Called from: pipe_assert(., all(mpg > 12), .debug = TRUE)
# Browse[1]>
# debug at c:/Users/r2/Projects/StackOverflow/pipe_funcs.R#81: if (identical(x, .x[.indices[[.ind]], ])) {
# stop(.msg, call. = FALSE)
# } else {
# .x[.indices[[.ind]], ] <- x
# return(.x)
# }
# Browse[2]>
x
# # A tibble: 14 x 11
# # Groups: cyl [1]
# mpg cyl disp hp drat wt qsec vs am gear carb
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 18.7 8. 360. 175. 3.15 3.44 17.0 0. 0. 3. 2.
# 2 14.3 8. 360. 245. 3.21 3.57 15.8 0. 0. 3. 4.
# 3 16.4 8. 276. 180. 3.07 4.07 17.4 0. 0. 3. 3.
# 4 17.3 8. 276. 180. 3.07 3.73 17.6 0. 0. 3. 3.
# 5 15.2 8. 276. 180. 3.07 3.78 18.0 0. 0. 3. 3.
# 6 10.4 8. 472. 205. 2.93 5.25 18.0 0. 0. 3. 4.
# 7 10.4 8. 460. 215. 3.00 5.42 17.8 0. 0. 3. 4.
# 8 14.7 8. 440. 230. 3.23 5.34 17.4 0. 0. 3. 4.
# 9 15.5 8. 318. 150. 2.76 3.52 16.9 0. 0. 3. 2.
# 10 15.2 8. 304. 150. 3.15 3.44 17.3 0. 0. 3. 2.
# 11 13.3 8. 350. 245. 3.73 3.84 15.4 0. 0. 3. 4.
# 12 19.2 8. 400. 175. 3.08 3.84 17.0 0. 0. 3. 2.
# 13 15.8 8. 351. 264. 4.22 3.17 14.5 0. 1. 5. 4.
# 14 15.0 8. 301. 335. 3.54 3.57 14.6 0. 1. 5. 8.
# Browse[2]>
c
# Error: all(mpg > 12) is not TRUE ... in Group: cyl:8
or you can optionally update/change the data; realize that this modifies the data in the pipe, not the source, so is really only good in dev and/or one-off fixes:
mtcars %>%
group_by(cyl) %>%
pipe_assert(all(mpg > 12), .debug=TRUE) %>%
summarize(mpg = mean(mpg))
# #
# # all(mpg > 12) is not TRUE ... in Group: cyl:8
# # 'x' is the current data that failed the assertion.
# #
# Called from: pipe_assert(., all(mpg > 12), .debug = TRUE)
# Browse[1]>
# debug at c:/Users/r2/Projects/StackOverflow/pipe_funcs.R#81: if (identical(x, .x[.indices[[.ind]], ])) {
# stop(.msg, call. = FALSE)
# } else {
# .x[.indices[[.ind]], ] <- x
# return(.x)
# }
(Ignore the current line of debugged code, if ...
, that's my stuff and not beautiful.) I'm in the debugger now, I can look at and alter/fix the data:
# Browse[2]>
x
# ...as before...
x$mpg <- x$mpg + 1000
If the data is changed, the pipe continues, otherwise it'll stop
.
# Browse[2]>
c
# # A tibble: 3 x 2
# cyl mpg
# <dbl> <dbl>
# 1 4. 26.7
# 2 6. 19.7
# 3 8. 1015.
(The data can be changed but the labels cannot ... so if we had done x$cyl <- 99
, it still would have shown 8
in rest of the pipe. This is a consequence of dplyr
not allowing you to change grouping variables ... which is a good thing, IMO.)
There's also pipe_debug
which always debugs, but it is less impressive. It also does not (currently) pass on changed data, so use pipe_assert
for that (e.g., pipe_assert(FALSE,.debug=TRUE)
).
Source, also available in my gist:
#' Mid-pipe assertions
#'
#' Test assertions mid-pipe. Each assertion is executed individually
#' on each group (if present) of the piped data. Any failures indicate
#' the group that caused the fail, terminating on the first failure.
#'
#' If `.debug`, then the interpreter enters the `browser()`, allowing
#' you to look at the specific data, stored as `x` (just the grouped
#' data if `is.grouped_df(.x)`, all data otherwise). If the data is
#' changed, then the altered data will be sent forward in the pipeline
#' (assuming you fixed the failed assertion), otherwise the assertion
#' will fail (as an assertion should).
#'
#' @param .x data.frame, potentially grouped
#' @param ... unnamed expression(s), each must evaluate to a single
#' 'logical'; similar to [assertthat::assert_that()], rather than
#' combining expressions with `&&`, separate them by commas so that
#' better error messages can be generated.
#' @param .msg a custom error message to be printed if one of the
#' conditions is false.
#' @param .debug logical, whether to invoke [browser()] if the
#' assertion fails; if `TRUE`, then when the debugger begins on a
#' fail, the grouped data will be in the variable `x`
#' @return data.frame (unchanged)
#' @export
#' @import assertthat
#' @md
#' @examples
#' \dontrun{
#'
#' library(dplyr)
#' library(assertthat)
#'
#' mtcars %>%
#' group_by(cyl) %>%
#' pipe_assert(
#' all(cyl < 9),
#' all(mpg > 10)
#' ) %>%
#' count()
#' # # A tibble: 3 x 2
#' # cyl n
#' # <dbl> <int>
#' # 1 4 11
#' # 2 6 7
#' # 3 8 14
#'
#' # note here that the "4" group is processed first and does not fail
#' mtcars %>%
#' group_by(cyl, vs) %>%
#' pipe_assert( all(cyl < 6) ) %>%
#' count()
#' # Error: all(cyl < 6) is not TRUE ... in Group: cyl:6, vs:0
#'
#' }
pipe_assert <- function(.x, ..., .msg = NULL, .debug = FALSE) {
if (is.grouped_df(.x)) {
.indices <- lapply(attr(.x, "indices"), `+`, 1L)
.labels <- attr(.x, "labels")
} else {
.indices <- list(seq_len(nrow(.x)))
}
for (assertion in eval(substitute(alist(...)))) {
for (.ind in seq_along(.indices)) {
.out <- assertthat::see_if(eval(assertion, .x[.indices[[.ind]],]))
if (! .out) {
x <- .x[.indices[[.ind]],]
if (is.null(.msg)) .msg <- paste(deparse(assertion), "is not TRUE")
if (is.grouped_df(.x)) {
.msg <- paste(.msg,
paste("in Group:",
paste(sprintf("%s:%s", names(.labels),
sapply(.labels, function(z) as.character(z[.ind]))),
collapse = ", ")),
sep = " ... ")
}
if (.debug) {
message("#\n", paste("#", .msg), "\n# 'x' is the current data that failed the assertion.\n#\n")
browser()
}
if (identical(x, .x[.indices[[.ind]],])) {
stop(.msg, call. = FALSE)
} else {
.x[.indices[[.ind]],] <- x
return(.x)
}
}
}
}
.x # "unmodified"
}
#' Mid-pipe debugging
#'
#' Mid-pipe peek at the data, named `x` within [browser()], but
#' *changes are not preserved*.
#'
#' @param .x data.frame, potentially grouped
#' @return data.frame (unchanged)
#' @export
#' @md
#' @examples
#' \dontrun{
#'
#' library(dplyr)
#'
#' mtcars %>%
#' group_by(cyl, vs) %>%
#' pipe_debug() %>%
#' count()
#'
#' }
pipe_debug <- function(.x) {
if (is.grouped_df(.x)) {
.indices <- lapply(attr(.x, "indices"), `+`, 1L)
.labels <- attr(.x, "labels")
} else {
.indices <- list(seq_len(nrow(.x)))
}
# I used 'lapply' here instead of a 'for' loop because
# browser-stepping after 'browser()' in a 'for' loop could continue
# through all of *this* code, not really meaningful; in pipe_assert
# above, since the next call after 'browser()' is 'stop()', there's
# little risk of stepping in or out of this not-meaningful code
.ign <- lapply(seq_along(.indices), function(.ind, .x) {
x <- .x[.indices[[.ind]],]
message("#",
if (is.grouped_df(.x)) {
paste("\n# in Group:",
paste(sprintf("%s:%s", names(.labels),
sapply(.labels, function(z) as.character(z[.ind]))),
collapse = ", "),
"\n")
},
"# 'x' is the current data (grouped, if appropriate).\n#\n")
browser()
NULL
}, .x = .x)
.x # "unmodified"
}
#' Mid-pipe status messaging.
#'
#' @param .x data.frame, potentially grouped
#' @param ... unnamed or named expression(s) whose outputs will be
#' captured, aggregated with [utils::str()], and displayed as a
#' [base::message()]; if present, a '.' literal is replace with a
#' reference to the `data.frame` (in its entirety, not grouped)
#' @param .FUN function, typically [message()] or [warning()] (for
#' when messages are suppressed); note: if set to `warning`, the
#' argument `call.=FALSE` is appended to the arguments
#' @param .timestamp logical, if 'TRUE' then a POSIXct timestamp is
#' appended to the header of the `str`-like output (default 'TRUE')
#' @param .stropts optional list of options to pass to [utils::str()],
#' for example `list(max.level=1)`
#' @return data.frame (unchanged)
#' @export
#' @md
#' @examples
#' \dontrun{
#'
#' library(dplyr)
#'
#' mtcars %>%
#' pipe_message( # unnamed
#' "starting",
#' group_size(.)
#' ) %>%
#' group_by(cyl) %>%
#' pipe_message( # named
#' msg = "grouped",
#' grps = group_size(.)
#' ) %>%
#' count() %>%
#' ungroup() %>%
#' pipe_message( # alternate function, for emphasis!
#' msg = "done",
#' .FUN = warning
#' )
#'
#' head(mtcars) %>%
#' pipe_message(
#' list(a = list(aa=1, bb=2, cc=3))
#' )
#' head(mtcars) %>%
#' pipe_message(
#' list(a = list(aa=1, bb=2, cc=3)),
#' .stropts = list(max.level = 2)
#' )
#'
#' }
pipe_message <- function(.x, ..., .FUN = message, .timestamp = TRUE, .stropts = NULL) {
.expressions <- eval(substitute(alist(...)))
if (is.grouped_df(.x)) {
.indices <- lapply(attr(.x, "indices"), `+`, 1L)
.labels <- attr(.x, "labels")
} else {
.indices <- list(seq_len(nrow(.x)))
.labels <- ""
}
lst <- mapply(function(.ind, .lbl) {
.x <- .x[.ind,,drop=FALSE]
lapply(.expressions, function(.expr) {
if (is.call(.expr)) .expr <- as.call(lapply(.expr, function(a) if (a == ".") as.symbol(".x") else a))
eval(.expr, .x)
})
}, .indices, .labels, SIMPLIFY=FALSE)
.out <- capture.output(
do.call("str", c(list(lst), .stropts))
)
.out[1] <- sprintf("Mid-pipe message%s:",
if (.timestamp) paste(" (", Sys.time(), ")", sep = ""))
do.call(.FUN, c(list(paste(.out, collapse = "\n")),
if (identical(.FUN, warning)) list(call. = FALSE)))
.x # "unmodified"
}
tryCatch
orpurrr::possibly
to your function to try to catch your error and then have it return something else in that case? You don't have to know what the error is in advance. – aosmith