An R-focused pipeline toolkit for reproducibility and high-performance computing
These changes invalidate some targets in some workflows, but they are necessary bug fixes.
$<-() and @<-() (#1144).
bind_plans() (#1136, @jennysjaarda).
analyze_assign() (#1119, @jennysjaarda).
"running" progress of dynamic targets.
"fst_tbl" format for large tibble targets (#1154, @kendonB).
format argument to make(), an optional custom storage format for targets without an explicit target(format = ...) in the plan (#1124).
lock_cache argument to make() to optionally suppress cache locking (#1129). (It can be annoying to interrupt make() repeatedly and unlock the cache manually every time.)
cancel() and cancel_if() functions to cancel targets mid-build (#1131).
subtarget_list argument to loadd() and readd() to optionally load a dynamic target as a list of sub-targets (#1139, @MilesMcBain).
file_out() (#1141).
drake_config() level (#1156, @MilesMcBain).
config argument in all user-side functions (#1118, @vkehayas). Users can now supply the plan and other make() arguments directly, without bothering with drake_config(). Now, you only need to call drake_config() in the _drake.R file for r_make() and friends. Old code with config objects should still work. Affected functions:
  make()
  outdated()
  drake_build()
  drake_debug()
  recoverable()
  missed()
  deps_target()
  deps_profile()
  drake_graph_info()
  vis_drake_graph()
  sankey_drake_graph()
  drake_graph()
  text_drake_graph()
  predict_runtime(). Needed to rename the targets argument to targets_predict and jobs to jobs_predict.
  predict_workers(). Same argument name changes as predict_runtime().
drake_config() is to serve functions r_make() and friends.
@ operator. For example, in the static code analysis of x@y, do not register y as a dependency (#1130, @famuvie).
deps_profile() (#1134, @kendonB).
deps_target() output (#1134, @kendonB).
drake_meta_() objects.
drake_envir() and id_chr() (#1132).
drake_envir() to select the environment with imports (#882).
vctrs paradigm and its type stability for dynamic branching (#1105, #1106).
target as a symbol by default in read_trace(). Required for the trace to make sense in #1107.
"future" backend (#1083, @jennysjaarda).
log_build_times argument to make() and drake_config(). Allows users to disable the recording of build times. Produces a speedup of up to 20% on Macs (#1078).
make(), outdated(make_imports = TRUE), recoverable(make_imports = TRUE), vis_drake_graph(make_imports = TRUE), clean(), etc. on the same cache.
format trigger to invalidate targets when the specialized data format changes (#1104, @kendonB).
cache_planned() and cache_unplanned() to help selectively clean workflows with dynamic targets (#1110, @kendonB).
drake_config() objects and analyze_code() objects.
"qs" format (#1121, @kendonB).
%||% (%|||% is faster) (#1089, @billdenney).
%||NA due to slowness (#1089, @billdenney).
is_dynamic() and is_subtarget() (#1089, @billdenney).
getVDigest() instead of digest() (#1089, #1092, https://github.com/eddelbuettel/digest/issues/139#issuecomment-561870289, @eddelbuettel, @billdenney).
backtick and .deparseOpts() to speed up deparse() (#1086, https://stackoverflow.com/users/516548/g-grothendieck, @adamkski).
build_times() (#1098).
mget_hash() in progress() (#1098).
drake_graph_info() (#1098).
outdated() (#1098).
make(), avoid checking for nonexistent metadata for missing targets.
drake_config().
use_drake() (#1097, @lorenzwalthert, @tjmahr).
drake's interpretation of the plan. In the plan, all the dependency relationships among targets and files are implicit. In the spec, they are all explicit. We get from the plan to the spec using static code analysis, e.g. analyze_code().
drake::drake_plan(x = target(...)) from throwing an error if drake is not loaded (#1039, @mstr3336).
transformations lifecycle badge to the proper location in the docstring (#1040, @jeroen).
readd() / loadd() from turning an imported function into a target (#1067).
disk.frame targets with their stored values (#1077, @brendanf).
subtargets() function to get the cached names of the sub-targets of a dynamic target.
subtargets arguments to loadd() and readd() to retrieve specific sub-targets from a parent dynamic target.
get_trace() and read_trace() functions to help track which values of grouping variables go into the making of dynamic sub-targets.
id_chr() function to get the name of the target while make() is running.
plot(plan) (#1036).
vis_drake_graph(), drake_graph_info(), and render_drake_graph() now take arguments that allow behavior to be defined upon selection of nodes (#1031, @mstr3336).
max_expand argument to make() and drake_config() to scale down dynamic branching (#1050, @hansvancalster).
drake_config() objects.
prework is a language object, list of language objects, or character vector (#1 at pat-s/multicore-debugging on GitHub, @pat-s).
config$layout. Supports internal modifications by reference. Required for #685.
dynamic a formal argument of target().
storrs and decorated storrs (#1071).
setdiff() and avoiding names(config$envir_targets).
dir_size(). Incurs rehashing for some workflows, but should not invalidate any targets.
which_clean() function to preview which targets will be invalidated by clean() (#1014, @pat-s).
storr (#1015, @billdenney, @noamross).
"diskframe" format for larger-than-memory data (#1004, @xiaodaigh).
drake_tempfile() function to help with "diskframe" format. It makes sure we are not copying large datasets across different physical storage media (#1004, @xiaodaigh).
code_to_function() to allow for parsing script-based workflows into functions so drake_plan() can begin to manage the workflow and track dependencies (#994, @thebioengineer).
max_expand in drake_plan(). max_expand is now the maximum number of targets produced by map(), split(), and cross(). For cross(), this reduces the number of targets (less cumbersome) and makes the subsample of targets more representative of the complete grid. It also ensures consistent target naming when .id is FALSE (#1002). Note: max_expand is not for production workflows anyway, so this change does not break anything important. Unfortunately, we do lose the speed boost in drake_plan() originally due to max_expand, but drake_plan() is still fast, so that is not so bad.
NULL targets (#998).
cross() (#1009). The same fix should apply to map() and split() too.
map() (#1010).
fst-powered saving of data.table objects.
transform a formal argument of target() so that users do not have to type "transform =" all the time in drake_plan() (#993).
ropensci.github.io/drake to docs.ropensci.org/drake.
target(format = "keras") (#989).
verbose argument in various caching functions. The location of the cache is now only printed in make(). This made the previous feature easier to implement.
combine() (#1008).
storr (#968).
Fix broken README links.
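The max_expand entry above says that, for cross(), capping the number of targets should leave a subsample that is representative of the complete grid. The base R sketch below illustrates that idea only; it is not drake's implementation. The helper subsample_grid() and the evenly spaced row selection are assumptions made for this example.

```r
# Hypothetical helper (not part of drake): keep at most `max_expand` rows
# of a grid, spread evenly from the first row to the last.
subsample_grid <- function(grid, max_expand) {
  n <- nrow(grid)
  if (n <= max_expand) {
    return(grid)
  }
  keep <- unique(round(seq(from = 1, to = n, length.out = max_expand)))
  grid[keep, , drop = FALSE]
}

# Complete grid of two grouping variables, similar to what cross() enumerates.
grid <- expand.grid(a = 1:4, b = c("x", "y", "z"), stringsAsFactors = FALSE)

# Evenly spaced subsample: touches every level of b.
small <- subsample_grid(grid, max_expand = 5)
print(small)

# For comparison, plain truncation never reaches b == "z".
print(head(grid, 5))
```

Spreading the retained rows across the whole grid is one simple way to keep every level of each grouping variable in play, which plain truncation does not do.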
format argument of target() (#971). This allows users to leverage faster ways to save and load targets, such as write_fst() for data frames and save_model_hdf5() for Keras models. It also reduces memory usage because it prevents storr from making a serialized in-memory copy of large data objects.
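A minimal sketch of how the format argument might appear in a plan, assuming the drake package is installed. The target name big_data and its command are made up for illustration.

```r
library(drake)

# Illustrative plan: the target name and command are invented for this sketch.
plan <- drake_plan(
  big_data = target(
    data.frame(x = runif(1000), y = rnorm(1000)),
    format = "fst"  # ask drake to store this data frame with the fst package
  )
)

# The requested format is recorded in the plan alongside the command.
print(plan)

# Calling make(plan) would then build big_data and save it in that format,
# which requires the fst package to be installed.
```

Only the plan is constructed here, so the sketch runs without fst. The specialized storage takes effect when make() builds the target.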
tidyselect functionality for ... in progress(), analogous to loadd(), build_times(), and clean().
do_stuff() and the method stuff.your_class() are defined in envir, and if do_stuff() has a call to UseMethod("stuff"), then drake's code analysis will detect stuff.your_class() as a dependency of do_stuff().
file_in() URLs. Requires the new curl_handles argument of make() and drake_config() (#981).
drake_plan(transform = slice()) understand .id and grouping variables (#963).
clean(garbage_collection = TRUE, destroy = TRUE). Previously it destroyed the cache before trying to collect garbage.
r_make() passes informative error messages back to the calling process (#969).
map() and cross() on topologically side-by-side targets (#983).
dsl_left_outer_join() so cross() selects the right combinations of existing targets (#986). This bug was probably introduced in the solution to #983.
progress() more consistent, less dependent on whether tidyselect is installed.
target(), map(), split(), cross(), and combine() (#979).
file_out() files in clean() unless garbage_collection is TRUE. That way, make(recover = TRUE) is a true "undo button" for clean(). clean(garbage_collection = TRUE) still removes data in the cache, as well as any file_out() files from targets currently being cleaned.
clean() only appears if garbage_collection is TRUE. Also, this menu is added to rescue_cache(garbage_collection = TRUE).
.drake/. The old .drake_history/ folder was awkward. Old histories are migrated during drake_config(), and drake_history().
.drake_history/ in plan_to_code(), plan_to_notebook(), and the help file examples. Should fix the note at https://win-builder.r-project.org/incoming_pretest/drake_7.5.1_20190721_153755/Debian/00check.log.
make(recover = TRUE).
recoverable() and r_recoverable() to show targets that are outdated but recoverable via make(recover = TRUE).
drake_history(). Powered by txtq (#918, #920).
no_deps() function, similar to ignore(). no_deps() suppresses dependency detection but still tracks changes to the literal code (#910).
transform_plan().
seed column of drake plans to set custom seeds (#947).
seed trigger to optionally ignore changes to the target seed (#947).
drake_plan(), interpret custom columns as non-language objects (#942).
clustermq >= 0.8.8.
ensure_workers in drake_config() and make().
make() after config is already supplied.
make() from inside the cache (#927).
CITATION file with JOSS paper.
deps_profile(), include the seed and change the names.
make(). All this does is invalidate old targets.
set_hash() and get_hash() in storr to double the speed of progress tracking.
$ (#938).
xxhash64 as the default hash algorithm for non-storr hashing if the driver does not have a hash algorithm.
These changes are technically breaking changes, but they should only affect advanced users.
rescue_cache() no longer returns a value.
clustermq (#898). Suggest version >= 0.8.8 but allow 0.8.7 as well.
drake recomputes config$layout when knitr reports change (#887).
make() (#878).
r_drake_build().
r_make() (#889).
expose_imports(): do not do the environment<- trick unless the object is a non-primitive function.
assign() vs delayedAssign().
file_in() files and other strings (#896).
ignore() work inside loadd(), readd(), file_in(), file_out(), and knitr_in().
file_in() and file_out(). drake now treats file_in()/file_out() files as URLs if they begin with "http://", "https://", or "ftp://". The fingerprint is a concatenation of the ETag and last-modified timestamp. If neither can be found or if there is no internet connection, drake throws an error.
"unload" and "none", which do not attempt to load a target's dependencies from memory (#897).
drake_slice() to help split data across multiple targets. Related: #77, #685, #833.
drake_cache() function, which is now recommended instead of get_cache() (#883).
r_deps_target() function.
r_make(), r_vis_drake_graph(), and r_outdated() (#892).
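The file_in()/file_out() entry above describes the URL fingerprint as a concatenation of the ETag and the last-modified timestamp, with an error when neither is available. The base R sketch below mirrors that rule on a plain list of response headers. It is not drake's code: the helper url_fingerprint(), the single-space separator, and the case-insensitive header lookup are all assumptions for illustration, and no network request is made.

```r
# Hypothetical helper (not part of drake): build a fingerprint from
# HTTP response headers supplied as a named list.
url_fingerprint <- function(headers) {
  # Header names are case-insensitive, so normalize them first.
  names(headers) <- tolower(names(headers))
  etag <- headers[["etag"]]
  mtime <- headers[["last-modified"]]
  if (is.null(etag) && is.null(mtime)) {
    stop("no ETag or Last-Modified header found", call. = FALSE)
  }
  # Concatenate whichever of the two values are present.
  paste(c(etag, mtime), collapse = " ")
}

# Made-up headers, shaped like what a HEAD request might return.
headers <- list(
  ETag = "\"abc123\"",
  `Last-Modified` = "Wed, 21 Oct 2015 07:28:00 GMT"
)
fingerprint <- url_fingerprint(headers)
print(fingerprint)

# A change in either header changes the fingerprint.
headers$ETag <- "\"def456\""
print(url_fingerprint(headers))
```

Because the fingerprint depends only on these two headers, a remote file counts as changed whenever the server reports a new ETag or modification time, without downloading the file itself.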