Big Data Bag Utilities
Release Notes
Bugfix release.
/) similar to payload file manifest entries.
materialize() function.
Release Notes
Bugfix release.
ensure_valid_output_path in fetch_globus.py.
false in bdbag_ro.py.
Release Notes
Compatibility and feature micro release.
Added a monkeypatch for hashlib.algorithms_guaranteed prior to the import of any bagit code so that bagit-1.7.0 (which assumes algorithms_guaranteed is present, but in reality only consistently exists on Python 2.7.9 or greater) can still be used by bdbag on systems that only have Python 2.7.0 to 2.7.8 installed.
Lifted the strict pin on Python>=2.7.9. Note that this won't make standalone bagit installations work on these systems, but it will allow bdbag to successfully import and use bagit as a library.
Additional notes here.
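As a rough illustration of the hashlib monkeypatch described above, the following sketch adds algorithms_guaranteed when it is missing. The helper name and the fallback tuple are assumptions made for this example and are not taken from the bdbag source.

```python
import hashlib

# Assumed fallback: the algorithms hashlib has always documented as available.
_FALLBACK_ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def ensure_algorithms_guaranteed(module=hashlib):
    """Add `algorithms_guaranteed` to `module` if it is missing (hypothetical helper)."""
    if not hasattr(module, "algorithms_guaranteed"):
        # Python 2.7.0 to 2.7.8 exposes `hashlib.algorithms` instead.
        names = getattr(module, "algorithms", _FALLBACK_ALGORITHMS)
        module.algorithms_guaranteed = set(names)
    return module.algorithms_guaranteed


# Apply the patch before importing bagit, so its lookup of the attribute succeeds.
ensure_algorithms_guaranteed()
```

On interpreters that already provide the attribute, the helper leaves it untouched, so applying it unconditionally is harmless.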
Added code to properly URL-encode whitespace and other illegal characters in the filename field of fetch.txt, per the bagit spec. The filename will automatically be encoded when bdbag generates a bag from a remote-file-manifest, and will automatically be decoded when attempting to resolve files via fetch. Added a corresponding unit test.
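The encode and decode round trip described above can be sketched with the Python 3 standard library. The helper names are hypothetical, and the exact set of characters bdbag encodes is not stated in these notes; this example simply percent-encodes everything except the path separator.

```python
from urllib.parse import quote, unquote


def encode_fetch_filename(filename):
    """Percent-encode characters that would break a fetch.txt line (hypothetical helper)."""
    # Keep "/" unescaped so payload paths such as data/sub/file.txt stay readable.
    return quote(filename, safe="/")


def decode_fetch_filename(encoded):
    """Reverse encode_fetch_filename when resolving files via fetch."""
    return unquote(encoded)


original = "data/my file\tv1.txt"
encoded = encode_fetch_filename(original)
print(encoded)  # data/my%20file%09v1.txt
print(decode_fetch_filename(encoded) == original)  # True
```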
Added a new CLI validate option: --completeness. This is in parity with bagit CLI options and is useful primarily for determining which files in fetch.txt have not yet been retrieved. Added a corresponding unit test.
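To make the idea of "files in fetch.txt that have not yet been retrieved" concrete, here is a minimal standalone sketch. It is not the bdbag or bagit implementation; the function name is hypothetical, and it assumes each fetch.txt line has the form URL LENGTH FILENAME with a percent-encoded filename.

```python
import os
from urllib.parse import unquote


def missing_fetch_entries(bag_dir):
    """Return filenames listed in fetch.txt that do not exist under bag_dir."""
    fetch_path = os.path.join(bag_dir, "fetch.txt")
    missing = []
    if not os.path.isfile(fetch_path):
        return missing
    with open(fetch_path, encoding="utf-8") as fetch_file:
        for line in fetch_file:
            line = line.strip()
            if not line:
                continue
            # Split into URL, LENGTH and FILENAME on the first two whitespace runs.
            _url, _length, filename = line.split(None, 2)
            filename = unquote(filename)
            local_path = os.path.join(bag_dir, *filename.split("/"))
            if not os.path.exists(local_path):
                missing.append(filename)
    return missing
```

An empty result means every file referenced in fetch.txt is already present locally; it says nothing about whether the checksums match.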
Added code in the CLIs to print stack traces when --debug is specified.
Release Notes
Bugfix release
bdbagit.save() and "strict mode" version check logic that prohibited mixing of checksum types for payload files when the bagit specification version of the bag being updated was < 1.0. Added a unit test that would have caught it.
Release Notes
Milestone feature release
Added materialize CLI and API function. The materialize function is basically a bag bootstrapper. When invoked, it will attempt to fully reconstitute a bag by performing multiple actions depending on the context of the input path parameter. If path is an actionable URL or a URI of a resolvable identifier scheme, the file referenced by this value will first be downloaded to the current directory. Next, if the path value (or previously downloaded file) is a local path to a supported archive format, the archive will be extracted to the current directory. Then, if the path value (or previously extracted file) is a valid bag directory, an attempt will be made to resolve any remote file references contained within the bag's fetch.txt file. Finally, full validation will be run on the materialized bag. If any one of these steps fails, an error is raised.
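The ordering of the steps above can be sketched as a small planning function. This is only an illustration of the control flow, not bdbag's code: the function name, the scheme set, and the archive extensions are assumptions, and a real implementation would inspect the downloaded or extracted file rather than just its name.

```python
import os
from urllib.parse import urlparse

# Assumed values for illustration; these notes do not list the supported sets.
URL_SCHEMES = {"http", "https", "ftp", "ark", "minid", "doi"}
ARCHIVE_EXTENSIONS = (".zip", ".tar", ".tgz", ".tar.gz")


def plan_materialize(path):
    """Return the ordered steps a materialize-style bootstrap would run (sketch)."""
    steps = []
    name = path
    parsed = urlparse(path)
    if parsed.scheme.lower() in URL_SCHEMES:
        steps.append("download")
        # After the download, continue with the local file name.
        name = os.path.basename(parsed.path)
    if name.lower().endswith(ARCHIVE_EXTENSIONS):
        steps.append("extract")
    # The resulting bag directory is then resolved and fully validated.
    steps.extend(["resolve fetch.txt", "validate"])
    return steps


print(plan_materialize("https://example.org/bags/my-bag.zip"))
# ['download', 'extract', 'resolve fetch.txt', 'validate']
```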
Refactored identifier resolution into a modular plug-in system. Added support for DOI and DataGUID identifier schemes in addition to existing ARK/Minid schemes. Additional schemes can be supported by creating a compliant "plug-in" resolver class and configuring it via the bdbag.json configuration file.
Bagit specification version compliance is now configurable. The default specification version used is 0.97, which permits heterogeneous mixing of checksums in bag payload manifests. Fixes #27 and reverts the restriction introduced in release 1.3.0.
Implemented cloud storage fetch transports for access to secured Amazon S3 and Google Cloud Storage via the boto3 library. GCS bucket and object access via boto3 is only supported when the target GCS bucket is set to "interoperability mode". The boto3 library is an optional runtime dependency and need only be installed if support for automatic download of S3 or GS URLs from fetch.txt entries is desired. Various parameters relating to the operation of this fetch handler are exposed via the bdbag.json configuration file and can be tuned accordingly. Fixes #25.
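For readers unfamiliar with boto3, the sketch below shows one way an S3 or GS URL could be split into bucket and key and then downloaded. It is not bdbag's fetch handler; the function names are hypothetical, and the download step assumes boto3 is installed and that credentials (HMAC keys in the GCS interoperability case) are already configured.

```python
from urllib.parse import urlparse

# Public endpoint GCS exposes for S3-compatible ("interoperability") access.
GCS_INTEROP_ENDPOINT = "https://storage.googleapis.com"


def split_bucket_url(url):
    """Split an s3:// or gs:// URL into (scheme, bucket, key)."""
    parsed = urlparse(url)
    scheme = parsed.scheme.lower()
    if scheme not in ("s3", "gs"):
        raise ValueError("unsupported scheme: %r" % parsed.scheme)
    return scheme, parsed.netloc, parsed.path.lstrip("/")


def download(url, destination):
    """Download one object to a local path (hypothetical helper)."""
    import boto3  # imported lazily because it is an optional dependency

    scheme, bucket, key = split_bucket_url(url)
    # GCS is reached through the same S3 client by overriding the endpoint.
    endpoint = GCS_INTEROP_ENDPOINT if scheme == "gs" else None
    client = boto3.client("s3", endpoint_url=endpoint)
    client.download_file(bucket, key, destination)


print(split_bucket_url("s3://my-bucket/path/to/file.txt"))
# ('s3', 'my-bucket', 'path/to/file.txt')
```

Importing boto3 inside the download function keeps the module importable on systems where the optional dependency is not installed.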
Numerous improvements to HTTP fetch handler:
keychain.json configuration file. This authentication mode allows for Bearer Token authentication scenarios such as those used in OAuth 2.0 authorization flows.
requests module's session parameters in the bdbag.json configuration file. This allows for tuning such values as connect/read retry count, backoff factor, and the status code retry forcelist, along with the option of disabling automatic redirect following.
Refactored bdbag.json configuration file processing into a separate module and significantly increased the scope of the configuration file. Added a basic mechanism for versioning the configuration file and upgrading existing config files to newer versions while preserving forward-compatible configuration settings, when possible.
Improved unit test coverage.
Updated documentation.
Release Notes
Bugfix release
Release Notes
Minor feature release
globus-sdk to a run-time dependency.
Release Notes
Enhanced RO/JSON-LD tagfile metadata support. Additions to the CLI and API now support the creation of the RO tagfile metadata directory and any associated JSON-LD files from a single JSON "meta-manifest". Coupling this with remote-file-manifest-based bag creation allows for entirely remote payloads but with local RO/JSON-LD metadata using only two metadata input files.
Refactored the overridden manifest saving functions in bdbagit.py to be more in line with the current bagit approach and upcoming bagit 1.0 spec changes.
IMPORTANT NOTES:
md5 and sha256) and not be able to calculate or provide all specified checksum types for each payload file, including those listed in fetch.txt.
remote-file-manifest, since the checksums for these files must be known a priori and therefore all remote file references must provide the same checksum algorithm type(s) uniformly across the entire set of payload files.
Allow the bag-info.txt metadata value Contact-Orcid to be specified when using the CLI via the argument --contact-orcid.
Fixed an issue with the handling of the metadata and metadata_file arguments of make_bag that allowed for arbitrarily complex JSON content as bag-info.txt lines. Per the bagit spec, only string values are supported.
Ensure URL escaping (of whitespace only) in generated fetch.txt URLs, per bagit spec.
Build universal (Python 2 and 3 compat.) wheels by default. Fixes #19
Release Notes