Big Data Bag Utilities
importlib_metadata for Python < 3.8. This module is already included as importlib.metadata in Python versions 3.8 and above.
requests session after redirect.
distutils and distutils.util.strtobool function.
is_bag API function will no longer attempt to instantiate a Bag object on non-directories.

Fix issue with packaging.parse
throwing InvalidVersion in the upgrade_config() function when trying to parse the informational version string VERSION set by bdbag when it is running in a "frozen" (e.g., with cx_Freeze) environment. In such cases, VERSION is set to something like 1.7.1-frozen, which is not PEP-440 compliant.
This was not an issue in previous releases because the implementation used pkg_resources.parse_version, which was not as strict.
The code in upgrade_config() has been changed to parse the PEP-440 compliant version returned by distribution("bdbag").version from importlib_metadata, rather than the global string VERSION, which can still be (and is) used elsewhere for purely informational and descriptive purposes.
Note that this bug only affects bdbag when it is running in a frozen environment. Otherwise, release 1.7.0 is equivalent in functionality.
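The difference between the informational VERSION string and the distribution metadata version can be illustrated with a minimal sketch. This is not bdbag's actual code: installed_version and is_canonical_pep440 are hypothetical helpers, the regular expression is the canonical-form check from the PEP 440 appendix used here as a stand-in for the parsing done by packaging, and the standard-library importlib.metadata (Python 3.8+) is assumed in place of the importlib_metadata backport.

```python
import re
from importlib.metadata import PackageNotFoundError, distribution

# Canonical-form pattern from the PEP 440 appendix. It only accepts versions
# that are already in normalized form, so it is a conservative stand-in for
# the full parser in the packaging library.
_CANONICAL = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?"
    r"(\.dev(0|[1-9][0-9]*))?$"
)


def is_canonical_pep440(version):
    """Return True if the string is a canonical PEP 440 version."""
    return _CANONICAL.match(version) is not None


def installed_version(dist_name):
    """Return the version recorded in the distribution metadata, or None
    if no distribution with that name is installed."""
    try:
        return distribution(dist_name).version
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # A "frozen" informational string fails the check; a plain release passes.
    print(is_canonical_pep440("1.7.1-frozen"))
    print(is_canonical_pep440("1.7.1"))
    print(installed_version("bdbag"))
```

The point of the design is that the metadata version is produced by the packaging toolchain and is therefore safe to parse strictly, while a free-form display string is not.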
pkg_resources with importlib-metadata and packaging.
tzlocal unless Python<3.
gs:// URLs in fetch.txt.

Note that this is a soft dependency and you must install the gcloud CLI on the system where you will be running bdbag in order for this handler to function.
This handler supports the requester pays usage pattern by allowing the billable project_id to be specified in the auth_params object for a corresponding keychain.json entry for a matching gs:// URI pattern.
For example, to configure (and allow) requester pays for a GS bucket, you would add a keychain.json entry similar to the following:
    {
        "uri": "gs://gcs-bdbag-integration-testing/",
        "auth_type": "gcs-credentials",
        "auth_params": {
            "project_id": "bdbag-204999",
            "allow_requester_pays": true
        }
    }
You can also explicitly disallow requester pays on the client side in the following ways:
Set allow_requester_pays to false.
Omit the allow_requester_pays field.
Omit the project_id field.
Omit the auth_params object entirely.

Note that if you do any of the above, data retrieval requests to buckets which have requester pays enabled will fail. The use case for this configuration option is to ensure that you don't pay for requests when requester pays is disabled on the bucket. Per the following GCS documentation:
Important: Buckets that have Requester Pays disabled still accept requests that include a billing project,
and charges are applied to the billing project supplied in the request.
Consider any billing implications prior to including a billing project in all of your requests.
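The client-side rules above can be summarized in a short sketch. It is an illustration under stated assumptions, not the handler's real implementation: billing_project is a hypothetical helper, and it assumes a keychain entry has already been loaded into a Python dict.

```python
def billing_project(entry):
    """Return the project to bill for a keychain entry, or None when
    requester pays is disallowed on the client side.

    Hypothetical helper: a billing project is returned only when
    auth_params exists, allow_requester_pays is true, and project_id is set.
    """
    auth_params = entry.get("auth_params")
    if not isinstance(auth_params, dict):
        return None  # auth_params omitted entirely
    if auth_params.get("allow_requester_pays") is not True:
        return None  # field omitted or set to false
    return auth_params.get("project_id") or None  # None if project_id omitted


if __name__ == "__main__":
    entry = {
        "uri": "gs://gcs-bdbag-integration-testing/",
        "auth_type": "gcs-credentials",
        "auth_params": {
            "project_id": "bdbag-204999",
            "allow_requester_pays": True,
        },
    }
    print(billing_project(entry))
```

Returning None for every disallowed case keeps the decision in one place, so a caller only attaches a billing project to a request when the entry explicitly allows it.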
IMPORTANT NOTE:
At the time of this writing, when using the gcloud CLI from Google Cloud SDK 416.0.0 and earlier, it is still possible to be billed for bucket usage even if you've disallowed requester pays for a given bucket in keychain.json. This is because the gcloud init process requires that you specify a default project_id, and this project id is subsequently stored as quota_project_id in the application_default_credentials.json file used by the GCS APIs (which the bdbag fetch handler uses). If this value is present, it will be passed on all GCS API calls as a fallback, even if no project is explicitly passed to the API call.
This can be worked around by removing the quota_project_id from application_default_credentials.json.
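The workaround can be scripted. The following is a minimal sketch, assuming the credentials file is a flat JSON object; remove_quota_project is a hypothetical helper and the file path is supplied by the caller, since the location of application_default_credentials.json varies by platform.

```python
import json
from pathlib import Path


def remove_quota_project(credentials_path):
    """Remove quota_project_id from a credentials JSON file.

    Returns True if the key was present and removed, False if the file
    was left untouched.
    """
    path = Path(credentials_path)
    data = json.loads(path.read_text(encoding="utf-8"))
    if "quota_project_id" not in data:
        return False
    del data["quota_project_id"]
    # Rewrite the file in place; all other keys are preserved as-is.
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return True
```

It is worth keeping a backup copy of the file before editing it, and note that re-running gcloud init may write the value back.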
It is also possible to specify a service_account_credentials_file, which is a file path referencing a service account credentials JSON file provided by Google Cloud Storage. For example:
    {
        "uri": "gs://bdbag-dev/",
        "auth_type": "gcs-credentials",
        "auth_params": {
            "project_id": "bdbag-204400",
            "service_account_credentials_file": "/home/bdbag/bdbag-204400-41babdd46e24.json"
        }
    }
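To show how an entry like the ones above might be selected for a given gs:// URL, here is a rough sketch. It assumes that keychain.json holds a JSON list of entries and that matching is a simple prefix comparison on the uri field, with the longest prefix winning; both are assumptions for illustration, and find_gcs_entry is a hypothetical helper rather than part of the bdbag API.

```python
import json

KEYCHAIN_JSON = """
[
  {
    "uri": "gs://gcs-bdbag-integration-testing/",
    "auth_type": "gcs-credentials",
    "auth_params": {"project_id": "bdbag-204999", "allow_requester_pays": true}
  },
  {
    "uri": "gs://bdbag-dev/",
    "auth_type": "gcs-credentials",
    "auth_params": {
      "project_id": "bdbag-204400",
      "service_account_credentials_file": "/home/bdbag/bdbag-204400-41babdd46e24.json"
    }
  }
]
"""


def find_gcs_entry(keychain, url):
    """Return the gcs-credentials entry whose uri is the longest prefix
    of url, or None if no entry matches (assumed matching rule)."""
    best = None
    for entry in keychain:
        uri = entry.get("uri")
        if entry.get("auth_type") != "gcs-credentials":
            continue
        if not isinstance(uri, str) or not uri or not url.startswith(uri):
            continue
        if best is None or len(uri) > len(best["uri"]):
            best = entry
    return best


if __name__ == "__main__":
    keychain = json.loads(KEYCHAIN_JSON)
    entry = find_gcs_entry(keychain, "gs://bdbag-dev/data/file.txt")
    print(entry["auth_params"]["project_id"] if entry else None)
```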
Release Notes
Bugfix release and dependency update.
bdbag_api.validate() where underlying BagError exceptions were not being propagated correctly.
setup.py for the python-requests dependency. This marker specifies that no version greater than requests-2.25.1 be used with Python3.5 environments, due to underlying incompatibilities between the requests dependency chain and Python3.5 as of requests-2.26.0. Reported in issue #47.
Note that bdbag support for Python3.5 is planned to be dropped in the 1.7.0 release.
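For readers unfamiliar with environment markers, the kind of constraint described above might look like the following in a setup.py. This is only a sketch: the exact specifiers and marker expressions used by bdbag are not shown in these notes, so the strings below are assumptions.

```python
# Hypothetical requirement strings using PEP 508 environment markers.
# The first line caps requests on old interpreters; the second leaves it
# unconstrained everywhere else.
INSTALL_REQUIRES = [
    "requests<=2.25.1; python_version < '3.6'",
    "requests; python_version >= '3.6'",
]

# In a real setup.py this list would be passed to setuptools, e.g.:
#     setup(name="example", install_requires=INSTALL_REQUIRES)

if __name__ == "__main__":
    for requirement in INSTALL_REQUIRES:
        print(requirement)
```

pip evaluates the marker after the semicolon against the running interpreter and ignores requirement lines whose marker is false.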
Release Notes
python-requests) to "bdbag/{version} (requests/{version})".
sha1 support for bdbag_utils function create-rfm-from-url-list. See PR #46.
fetch.txt, RO metadata.json, keychain.json, and remote-file-manifest JSON files.
fetch.txt and RO metadata.json. Per the spec, only CR, LF, whitespace, and literal percent should be encoded.
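The encoding rule stated above can be sketched as follows. This is an illustrative helper, not bdbag's implementation: encode_fetch_url is a hypothetical name, and "whitespace" is assumed here to mean the space and tab characters.

```python
def encode_fetch_url(url):
    """Percent-encode only CR, LF, whitespace, and literal percent signs.

    All other characters, including non-ASCII ones and reserved URL
    characters, are left untouched.
    """
    replacements = [
        ("%", "%25"),   # must come first so later escapes are not re-encoded
        ("\r", "%0D"),
        ("\n", "%0A"),
        (" ", "%20"),
        ("\t", "%09"),
    ]
    for raw, escaped in replacements:
        url = url.replace(raw, escaped)
    return url


if __name__ == "__main__":
    print(encode_fetch_url("http://example.org/my file 100%.txt"))
```

Because a literal percent is always escaped, the function expects an unencoded URL; passing a URL that is already percent-encoded would encode it a second time.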
Release Notes
Minor feature release with bugfixes and dependency updates.
--output-path CLI (and corresponding API) argument for specifying output path for extracted archives.
bypass_ssl_cert_verification configuration option for the https fetch handler so that SSL certificate verification can be disabled either globally (not recommended) or on a whitelisted set of URL paths used in simple substring matches against a bag's fetch.txt URLs.
--validate-profile CLI argument so that it can take an optional keyword argument, bag-only, which can be used to bypass the otherwise automatic profile serialization validation, and therefore is suitable to use on extracted bag directories.
archive_bag API function not including empty directories when creating zip format archives.
extract_bag API function to accurately include the bag root directory path of the extracted bag archive in the return value. Previously, this value could have ended up different from the file archive base name, for example if the archive file was renamed or was created in such a way that the base file name never matched the archived bag directory root.
bagit-profile support. This module is no longer "vendored" internally and is now a proper external dependency intended to be pulled from PyPI. The Profile class is patched internally, as needed. This dependency is currently pinned to 1.3.1.
bdbag-profile.json and bdbag-ro-profile.json to leverage newer features of bagit-profile version 1.3. Loosened "Manifests-Required" to only require md5 for both profiles.
bagit-python dependency version to 1.8.1.
setup.py metadata and Travis builds.
Release Notes
Bugfix release with minor feature addition.
update_keychain API function in auth/keychain.py for programmatic add/update/delete of keychain entries.
setup.py metadata and Travis builds.
Release Notes
Bugfix release.
/) similar to payload file manifest entries.
materialize() function.