Ropensci Targets Versions

Function-oriented Make-like declarative workflows for R

1.7.0

3 weeks ago

targets 1.7.0

Invalidating changes

Use secretbase::siphash13() instead of digest(algo = "xxhash64", serializationVersion = 3) so hashes of in-memory objects no longer depend on serialization version 3 headers (#1244, @shikokuchuo). Unfortunately, pipelines built with earlier versions of targets will need to rerun.
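Here is a minimal sketch of computing this kind of hash directly. It assumes the secretbase package is installed. It only shows that secretbase::siphash13() returns a deterministic hash of an in-memory R object. It is not a description of how targets stores hashes internally, and the object names are arbitrary.

```r
library(secretbase)

# Hash an in-memory R object. Per the release note, this hash no longer
# depends on serialization version 3 headers.
hash_a <- siphash13(mtcars)
hash_b <- siphash13(mtcars)

# Identical objects give identical hashes.
identical(hash_a, hash_b)

# Changing the data changes the hash.
modified <- mtcars
modified$mpg[1] <- modified$mpg[1] + 1
identical(hash_a, siphash13(modified))
```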

Other improvements

Ensure patterns marshal properly (#1266, #1264, https://github.com/njtierney/geotargets/issues/52, @Aariq, @njtierney).
Inform and prompt the user when the pipeline was built with an old version of targets and changes to the package will cause the current work to rerun (#1244). For the tar_make*() functions, utils::menu() prompts the user, giving them a chance to downgrade if necessary.
For type safety in the internal database class, read all columns as character vectors in data.table::fread(), then convert them to the correct types.
Add a new tar_resources_custom_format() function which can pass environment variables to customize the behavior of custom tar_format() storage formats (#1263, #1232, @Aariq, @noamross).
Only marshal dependencies if actually sending the target to a parallel worker.
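Here is a minimal _targets.R sketch of tar_resources_custom_format(). The storage methods of a custom tar_format() read a separator from an environment variable, and the target supplies that variable through its resources. The variable name HYPOTHETICAL_TABLE_SEP, the target name, and the format itself are made up for illustration. The sketch assumes the envvars argument takes a named character vector.

```r
# _targets.R
library(targets)

# Custom storage format. The read/write methods must be self-contained,
# so they look up the (hypothetical) separator variable themselves.
format_table <- tar_format(
  read = function(path) {
    utils::read.table(path, header = TRUE, sep = Sys.getenv("HYPOTHETICAL_TABLE_SEP", ","))
  },
  write = function(object, path) {
    utils::write.table(
      object,
      path,
      sep = Sys.getenv("HYPOTHETICAL_TABLE_SEP", ","),
      row.names = FALSE
    )
  }
)

list(
  tar_target(
    table_tsv,
    data.frame(x = 1:3, y = c("a", "b", "c")),
    format = format_table,
    # Pass the environment variable to the custom format's storage methods.
    resources = tar_resources(
      custom_format = tar_resources_custom_format(
        envvars = c(HYPOTHETICAL_TABLE_SEP = "\t")
      )
    )
  )
)
```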

1.6.0

1 month ago

targets 1.6.0

Modernize extras in tar_renv().
tar_target() gains a description argument for free-form text describing what the target is about (#1230, #1235, #1236, @tjmahr).
tar_visnetwork(), tar_glimpse(), tar_network(), tar_mermaid(), and tar_manifest() now optionally show target descriptions (#1230, #1235, #1236, @tjmahr).
tar_described_as() is a new wrapper around tidyselect::any_of() to select specific subsets of targets based on the description rather than the name (#1136, #1196, @noamross, @mattmoo).
Fix the documentation of the names argument (nudge users toward tidyselect expressions).
Make assertions on the pipeline process more robust (to check if two processes are trying to access the same data store).
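Here is a minimal sketch of the new description argument used together with tar_described_as(). It assumes that tar_described_as() accepts a tidyselect expression such as starts_with() applied to target descriptions, as the note above implies. The project directory, target names, and descriptions are hypothetical.

```r
library(targets)

# Hypothetical throwaway project directory, so no real data store is touched.
project <- tempfile("targets_descriptions_")
dir.create(project)
old_wd <- setwd(project)

# Write a small _targets.R with a free-form description on each target.
tar_script({
  list(
    tar_target(raw_data, data.frame(x = 1:10, y = 2 * (1:10)), description = "input data"),
    tar_target(model_linear, lm(y ~ x, data = raw_data), description = "model: linear"),
    tar_target(model_null, lm(y ~ 1, data = raw_data), description = "model: intercept only")
  )
})

# Select targets by description instead of by name. Upstream dependencies
# (here raw_data) are still built as needed.
tar_make(names = tar_described_as(starts_with("model")))
built <- tar_progress()

setwd(old_wd)
```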

1.5.1

2 months ago

targets 1.5.1

Avoid arrow-related CRAN check NOTE.
use_targets() only writes the _targets.R script. The run.sh and run.R scripts are superseded by the as_job argument of tar_make(). Users not using the RStudio IDE can call tar_make() with callr_function = callr::r_bg to run the pipeline as a background process. tar_make_clustermq() and tar_make_future() are superseded in favor of tar_make(use_crew = TRUE), so template files for tar_make_clustermq() and tar_make_future() are no longer written automatically.
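Here is a minimal sketch of running a pipeline in the background with callr_function = callr::r_bg. It assumes callr is installed, and that tar_make() invisibly returns a handle to the background process in this case. The project directory and target name are hypothetical.

```r
library(targets)

# Hypothetical throwaway project directory for this sketch.
project <- tempfile("targets_background_")
dir.create(project)
old_wd <- setwd(project)

tar_script(list(tar_target(answer, 6 * 7)))

# Start the pipeline in a background R process and keep the handle.
process <- tar_make(callr_function = callr::r_bg)

# The current session stays free; here we simply wait for completion.
process$wait()
result <- tar_read(answer)

setwd(old_wd)
```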

1.4.1

4 months ago

targets 1.4.1

Print "errored pipeline" when at least one target errors.
Bump minimum clustermq version to 0.9.2.
Repair the tar_debug_instructions() tips for when commands are long.
Do not look for dependencies of primitive functions (#1200, @smwindecker, @joelnitta).

1.4.0

5 months ago

targets 1.4.0

Invalidating changes

Because of the changes below, upgrading to this version of targets will unavoidably invalidate previously built targets in existing pipelines. Your pipeline code should still work, but any targets you ran before will most likely need to rerun after the upgrade.

Use SHA512 during the creation of target-specific pseudo-random number generator seeds (#1139). This change decreases the risk of overlapping/correlated random number generator streams. See the "RNG overlap" section of the tar_seed_create() help file for details and justification. Unfortunately, this change will invalidate all currently built targets because the seeds will be different. To avoid rerunning your whole pipeline, set cue = tar_cue(seed = FALSE) in tar_target().
For cloud storage: instead of the hash of the local file, use the ETag for AWS S3 targets and the MD5 hash for GCP GCS targets (#1172). Sanitize with targets:::digest_chr64() in both cases before storing the result in the metadata.
For a cloud target to be truly up to date, the hash in the metadata now needs to match the current object in the bucket, not the version recorded in the metadata (#1172). In other words, targets now tries to ensure that the up-to-date data objects in the cloud are in their newest versions. So if you roll back the metadata to an older version, you will still be able to access historical data versions with e.g. tar_read(), but the pipeline will no longer be up to date.
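Here is a minimal _targets.R sketch of the seed workaround described above. It sets the seed cue on one target, as the note suggests. It also assumes you may prefer a pipeline-wide default through tar_option_set(). The target name is hypothetical, and in practice you would normally pick one of the two approaches.

```r
# _targets.R
library(targets)

# Pipeline-wide default: a changed seed alone no longer invalidates targets.
tar_option_set(cue = tar_cue(seed = FALSE))

list(
  # Per-target alternative: the same cue set on a single target.
  tar_target(simulation, rnorm(100), cue = tar_cue(seed = FALSE))
)
```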

Other changes to seeds

Add a new exported function tar_seed_create() which creates target-specific pseudo-random number generator seeds.
Add an "RNG overlap" section in the tar_seed_create() help file to justify and defend how targets and tarchetypes approach pseudo-random numbers.
Add function tar_seed_set(), which sets a seed and resets all the RNG algorithms to the defaults of the user's R installation. Each target now uses tar_seed_set() to set its seed before running its R command (#1139).
Deprecate tar_seed() in favor of the new tar_seed_get() function.
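Here is a minimal sketch of the two new seed functions outside a pipeline. It assumes tar_seed_create() falls back on the global seed from tar_option_get("seed") when no global_seed is given. The name "my_simulation" is hypothetical.

```r
library(targets)

# Derive a target-specific seed from a target name (and the global seed).
seed <- tar_seed_create("my_simulation")

# tar_seed_set() sets the seed with R's default RNG algorithms,
# so the same seed reproduces the same random numbers.
tar_seed_set(seed)
first_draw <- runif(3)
tar_seed_set(seed)
second_draw <- runif(3)

identical(first_draw, second_draw)
```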

Other cloud storage improvements

For all cloud targets, check hashes in batched LIST requests instead of individual HEAD requests (#1172). This dramatically speeds up checking whether cloud targets are up to date.
For AWS S3 targets, tar_delete(), tar_destroy(), and tar_prune() now use efficient batched calls to delete_objects() instead of costly individual calls to delete_object() (#1171).
Add a new verbose argument to tar_delete(), tar_destroy(), and tar_prune().
Add a new batch_size argument to tar_delete(), tar_destroy(), and tar_prune().
Add new arguments page_size and verbose to tar_resources_aws() (#1172).
Add a new tar_unversion() function to remove version IDs from the metadata of cloud targets. This makes it easier to interact with just the current version of each target, as opposed to the version ID recorded in the local metadata.
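Here is a minimal _targets.R sketch of an AWS S3 pipeline that uses the new arguments. The bucket, prefix, and target are hypothetical, and the numeric values are arbitrary. The sketch assumes page_size sets how many objects each paginated LIST request returns, and that verbose prints progress for expensive operations such as listing a large bucket.

```r
# _targets.R
library(targets)

tar_option_set(
  repository = "aws",
  resources = tar_resources(
    aws = tar_resources_aws(
      bucket = "my-hypothetical-bucket",
      prefix = "my_project",
      page_size = 1000, # items per page in paginated requests such as LIST
      verbose = TRUE    # print messages during expensive cloud operations
    )
  )
)

list(
  tar_target(data, data.frame(x = 1:10))
)

# Later, from the R console, delete stale cloud objects in batches:
# tar_prune(batch_size = 1000, verbose = TRUE)
```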

Other improvements

Migrate to the changes in clustermq 0.9.0 (@mschubert).
In progress statuses, change "started" to "dispatched" and change "built" to "completed" (#1192).
Deprecate tar_started() in favor of tar_dispatched() (#1192).
Deprecate tar_built() in favor of tar_completed() (#1192).
Console messages from reporters say "dispatched" and "completed" instead of "started" and "built" (#1192).
The crew scheduling algorithm no longer waits on saturated controllers, and targets that are ready are greedily dispatched to crew even if all workers are busy (#1182, #1192). To appropriately set expectations for users, reporters print "dispatched (pending)" instead of "dispatched" if the task load is backlogged at the moment.
In the crew scheduling algorithm, waiting for tasks is now a truly event-driven process and uses 5-10x less CPU (#1183). Only the auto-scaling of workers uses polling (with an inexpensive default polling interval of 0.5 seconds, configurable through seconds_interval in the controller).
Simplify stored target tracebacks.
Print the traceback on error.
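Here is a minimal _targets.R sketch of a crew controller with a custom polling interval for worker auto-scaling. It assumes crew is installed and that crew::crew_controller_local() in your crew version accepts workers and seconds_interval, as the note above implies. The target names and the interval value are hypothetical.

```r
# _targets.R
library(targets)

# Two local workers; poll less often than the 0.5-second default.
tar_option_set(
  controller = crew::crew_controller_local(workers = 2, seconds_interval = 1)
)

list(
  tar_target(slow_a, Sys.sleep(1)),
  tar_target(slow_b, Sys.sleep(1))
)

# Run from the R console with tar_make().
```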

1.3.2

7 months ago

targets 1.3.2

Try to fix function help files for CRAN.

1.3.1

7 months ago

targets 1.3.1

Add tar_config_projects() and tar_config_yaml() (#1153, @psychelzh).
Apply error modes to builder_wait_correct_hash() in target_conclude.tar_builder() (#1154, @gadenbuie).
Remove duplicated error message from builder_error_null().
Allow tar_meta_upload() and tar_meta_download() to avoid errors if one or more metadata files do not exist. Add a new argument strict to control error behavior.
Add new arguments meta, progress, process, and crew to control individual metadata files in tar_meta_upload(), tar_meta_download(), tar_meta_sync(), and tar_meta_delete().
Avoid newly deprecated arguments and functions in crew 0.5.0.9003 (https://github.com/wlandau/crew/issues/131).
Allow tar_read() etc. inside a pipeline whenever it uses a different data store (#1158, @MilesMcBain).
Set seed = FALSE in future::future() (#1166, @svraka).
Add a new physics argument to tar_visnetwork() and tar_glimpse() (#925, @Bdblodgett-usgs).
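Here is a minimal sketch of tar_config_projects() and tar_config_yaml(). It registers two projects in a _targets.yaml file and then reads them back. It assumes tar_config_set() writes each project's settings under its project name. The directory, project names, scripts, and stores are hypothetical.

```r
library(targets)

# Hypothetical throwaway directory, so no real _targets.yaml is modified.
project_dir <- tempfile("targets_projects_")
dir.create(project_dir)
old_wd <- setwd(project_dir)

# Register two projects, each with its own target script and data store.
tar_config_set(script = "script_a.R", store = "store_a", project = "project_a")
tar_config_set(script = "script_b.R", store = "store_b", project = "project_b")

projects <- tar_config_projects() # project names found in _targets.yaml
config <- tar_config_yaml()       # the whole file parsed as a nested list

setwd(old_wd)
```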

1.3.0

8 months ago

targets 1.3.0

Invalidating changes

Because of these changes, upgrading to this version of targets will unavoidably invalidate previously built targets in existing pipelines. Your pipeline code should still work, but any targets you ran before will most likely need to rerun after the upgrade.

In the hash_deps() method of the metadata class, exclude symbols which are not actually dependencies, rather than just giving them empty strings. This change decouples the dependency hash from the hash of the target's command (#1108).

Cloud metadata

Continuously upload metadata files to the cloud during tar_make(), tar_make_clustermq(), and tar_make_future() (#1109). Upload them to the repository specified in the repository_meta tar_option_set() option, and use the bucket and prefix set in the resources tar_option_set() option. repository_meta defaults to the existing repository tar_option_set() option.
Add new functions tar_meta_download(), tar_meta_upload(), tar_meta_sync(), and tar_meta_delete() to directly manage cloud metadata outside the pipeline (#1109).
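Here is a minimal _targets.R sketch of cloud metadata uploads. The bucket, prefix, and target are hypothetical. repository_meta is set explicitly here even though, per the note above, it defaults to the repository option.

```r
# _targets.R
library(targets)

# Data and metadata both go to AWS S3 under the same bucket and prefix.
tar_option_set(
  repository = "aws",
  repository_meta = "aws",
  resources = tar_resources(
    aws = tar_resources_aws(bucket = "my-hypothetical-bucket", prefix = "my_project")
  )
)

list(
  tar_target(data, data.frame(x = 1:10))
)

# On another machine with the same _targets.R, fetch the metadata first:
# tar_meta_download()
```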

Other changes

Fix the solution to #1103 so the copy fallback actually runs (@jds485, #1102, #1103).
Switch back to tempdir() for #1103.
Move path_scratch_dir_network() to file.path(tempdir(), "targets") and make sure tar_destroy("all") and tar_destroy("cloud") delete it.
Display tar_mermaid() subgraphs with transparent fills and black borders.
Allow database$get_data() to work with list columns.
Disallow functions that access the local data store (including metadata) from inside a target while the pipeline is running (#1055, #1063). The only exception is local file targets, such as those created by tarchetypes literate programming target factories like tar_render() and tar_quarto().
In the hash_deps() method of the metadata class, use a new custom sort_chr() function which temporarily sets the LC_COLLATE locale to "C" for sorting. This ensures lexicographic comparisons are consistent across platforms (#1108).
In tar_source(), use the file argument and keep.source = TRUE to help with interactive debugging (#1120).
Deprecate seconds_interval in tar_config_get(), tar_make(), tar_make_clustermq(), and tar_make_future(). Replace it with seconds_meta (to control how often metadata gets saved) and seconds_reporter (to control how often to print messages to the R console) (#1119).
Respect seconds_meta and seconds_reporter for writing metadata and console messages even for currently building targets (#1055).
Retry all cloud REST API calls that fail with HTTP error codes 429 or 500-599, using the exponential backoff algorithm from googleAuthR (#1112).
For format = "url", only retry on the HTTP error codes above.
Make cloud temp file instances unique in order to avoid file conflicts with the same target.
Un-deprecate seconds_interval and seconds_timeout from tar_resources_url(), and implement max_tries arguments in tar_resources_aws() and tar_resources_gcp() (#1127).
Use file and keep.source in parse() in callr utils and target Markdown.
Automatically convert "file_fast" format to "file" format for cloud targets.
In tar_prune() and tar_delete(), do not try to delete pattern targets which have no cloud storage.
Add new arguments seconds_timeout, close_connection, s3_force_path_style to tar_resources_aws() to support the analogous arguments in paws.storage::s3() (#1134, @snowpong).
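Here is a minimal base R sketch of the locale-independent sorting idea behind the sort_chr() change. The helper sort_chr_c() is hypothetical and is not the internal targets function. It temporarily sets LC_COLLATE to "C", sorts, and then restores the previous setting.

```r
# Hypothetical helper: sort strings by byte order, independent of the user's locale.
sort_chr_c <- function(x) {
  old_collate <- Sys.getlocale("LC_COLLATE")
  # Restore the caller's collation locale even if sorting fails.
  on.exit(Sys.setlocale("LC_COLLATE", old_collate), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

before <- Sys.getlocale("LC_COLLATE")
sorted <- sort_chr_c(c("b", "B", "a", "A"))
sorted # "A" "B" "a" "b": uppercase sorts before lowercase in the C locale
```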

1.2.2

9 months ago

Fix a documentation issue in an Rd file.

1.2.1

9 months ago

targets 1.2.1

Add tar_prune_list() (#1090, @mglev1n).
Wrap file.rename() in tryCatch() and fall back on a copy-then-remove workaround (@jds485, #1102, #1103).
Stage temporary cloud upload/download files in tools::R_user_dir(package = "targets", which = "cache") instead of tempdir(). tar_destroy(destroy = "cloud") and tar_destroy(destroy = "all") remove any leftover files from failed uploads/downloads (@jds485, #1102, #1103).
Use paws.storage instead of all of paws.
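Here is a minimal base R sketch of the rename-with-fallback approach described above. The helper move_file() is hypothetical, and the targets implementation may differ. It tries file.rename() first, and if that signals a problem or returns FALSE, it copies the file and then removes the original.

```r
# Hypothetical helper: move a file, falling back on copy-then-remove.
move_file <- function(from, to) {
  renamed <- tryCatch(
    file.rename(from, to),
    warning = function(condition) FALSE,
    error = function(condition) FALSE
  )
  if (!isTRUE(renamed)) {
    # For example, renaming across file systems can fail; copy instead.
    copied <- file.copy(from, to, overwrite = TRUE)
    if (!isTRUE(copied)) {
      stop("could not move ", from, " to ", to, call. = FALSE)
    }
    unlink(from)
  }
  invisible(to)
}

source_file <- tempfile(fileext = ".txt")
writeLines("hello", source_file)
destination <- tempfile(fileext = ".txt")
move_file(source_file, destination)
```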