Utils for streaming large files (S3, HDFS, gzip, bz2...)
__next__ method to FileLikeProxy (PR #811, @ddelange)
~/.ssh/config (PR #790, @wbeardall)
KeyError: 'ContentRange' when the full content is received from S3 (PR #789, @messense)

This release deprecates the old ignore_ext parameter.
Use the compression parameter instead.
fin = smart_open.open("/path/file.gz", ignore_ext=True) # No
fin = smart_open.open("/path/file.gz", compression="disable") # Yes
fin = smart_open.open("/path/file.gz", ignore_ext=False) # No
fin = smart_open.open("/path/file.gz") # Yes
fin = smart_open.open("/path/file.gz", compression="infer_from_extension") # Yes, if you want to be explicit
fin = smart_open.open("/path/file", compression=".gz") # Yes
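For call sites that still build their keyword arguments with ignore_ext, the mapping shown above can be captured in a small helper. This is only a sketch: migrate_open_kwargs is a hypothetical name, not part of smart_open, and it assumes just the two compression values that appear above ("disable" and "infer_from_extension").

```python
def migrate_open_kwargs(kwargs):
    """Return a copy of kwargs with ignore_ext translated to compression.

    Hypothetical helper for illustration; it is not part of smart_open.
    """
    migrated = dict(kwargs)
    if "ignore_ext" not in migrated:
        # Nothing to translate; leave the arguments untouched.
        return migrated
    ignore_ext = migrated.pop("ignore_ext")
    if "compression" in migrated:
        raise ValueError("pass either ignore_ext or compression, not both")
    # ignore_ext=True meant "do not decompress"; False meant "infer from extension".
    migrated["compression"] = "disable" if ignore_ext else "infer_from_extension"
    return migrated
```

The result could then be passed along as smart_open.open(path, **migrate_open_kwargs(old_kwargs)).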
urllib.parse.urlsplit (PR #633, @judahrand)

This release introduces a new top-level parameter: compression.
It controls compression behavior and partially overlaps with the old ignore_ext parameter.
For details, see the README.rst file.
You may continue to use the ignore_ext parameter for now, but it will be deprecated in the next major release.

This release modifies the handling of transport parameters for the S3 back-end in a backwards-incompatible way. See the migration docs for details.
tests directory from package (PR #589, @e-nalepa)
s3 submodule to minimize resource usage (PR #569, @mpenkov)
download_as_string to download_as_bytes in gcs submodule (PR #571, @alexandreyc)
requests from install_requires dependency list. If you need it, use pip install smart_open[http] or pip install smart_open[webhdfs].

This release modifies the behavior of setup.py with respect to dependencies.
Previously, boto3 and other AWS-related packages were installed by default.
Now, in order to install them, you need to run either:
pip install smart_open[s3]
to install the AWS dependencies only, or
pip install smart_open[all]
to install all dependencies, including AWS, GCS, etc.
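Because the AWS packages are now optional, code that relies on them may want to check for them up front. The sketch below uses only the standard library to test whether a module such as boto3 is importable; install_hint is a hypothetical helper name, and the hint it returns simply repeats the pip commands given above.

```python
import importlib.util


def install_hint(module_name, extra):
    """Return None if module_name is importable, else a pip command to try.

    Hypothetical helper for illustration; it is not part of smart_open.
    """
    if importlib.util.find_spec(module_name) is not None:
        return None
    return f"pip install smart_open[{extra}]"
```

For example, install_hint("boto3", "s3") returns None when boto3 is present and the install command otherwise.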
Summary of changes:
newline parameter to built-in open function (PR #478, @burkovae)
to_boto3 method (PR #539, @interpolatio)

Functionality on the left hand side will be removed in future releases. Use the functions on the right hand side instead.
smart_open.s3_iter_bucket → smart_open.s3.iter_bucket

newline parameter to built-in open function (PR #478, @burkovae)
pathlib.Path.open (PR #436, @menshikh-iv)

Starting with this release, you will have to run:
pip install smart_open[gcs]
to use the GCS transport.
In the future, all extra dependencies will be optional. If you want to continue installing all of them, use:
pip install smart_open[all]
See the README.rst for details.
__version__ attribute

This release rolls back support for transparently decompressing .xz files, previously introduced in 1.8.1. This is a useful feature, but it requires a tricky dependency. It's still possible to handle .xz files with relatively little effort. Please see the README.rst file for details.
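As one illustration of handling .xz data without built-in support, the standard library's lzma module can wrap any binary file object. The sketch below is an assumption about what "relatively little effort" might look like, not necessarily the README's recipe; open_xz is a hypothetical name, and the in-memory buffer stands in for whatever binary stream you already have.

```python
import io
import lzma


def open_xz(file_obj, mode="rb"):
    """Wrap a binary file object so reads and writes pass through xz.

    Hypothetical helper for illustration; it is not part of smart_open.
    """
    return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)


# Round trip through an in-memory buffer standing in for a real stream.
buffer = io.BytesIO()
with open_xz(buffer, "wb") as fout:
    fout.write(b"hello xz\n")

buffer.seek(0)
with open_xz(buffer, "rb") as fin:
    data = fin.read()
```

LZMAFile does not close a file object it was handed, so the underlying stream stays usable after each with block.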
smart_open.open function (PR #268, @mpenkov)

This new function replaces smart_open.smart_open, which is now deprecated.
Main differences:
transport_params dict parameter to contain keyword parameters for the transport layer (S3, HTTPS, HDFS, etc).
Main advantages of the new function:
The old smart_open.smart_open function is deprecated, but continues to work as previously.
python3.7 support (PR #240, @menshikh-iv)
http/https schema correctly (PR #242, @gliv)
S3 (PR #235, @rileypeterson)
_parse_uri_s3x, resolve edge cases (PR #237, @mpenkov)
sudo from travis config (PR #256, @cclauss)
ValueError if s3 key does not exist (PR #245, @adrpar)
_list_bucket uses continuation token for subsequent pages (PR #246, @tcsavage)
python3.3 and python3.4 & workaround for broken moto (PR #225, @menshikh-iv)
s3a:// support for S3. Fix #210 (PR #229, @mpenkov)
@ in object (key) names for S3. Fix #94 (PRs #204 & #224, @dkasyanov & @mpenkov)
close idempotent & add dummy flush for S3 (PR #212, @mpenkov)
open whenever possible. Fix #207 (PR #208, @mpenkov)
uri in smart_open_lib.py. Fix #213 (PR #214, @cclauss)
boto3. Fix #43 (PR #164, @mpenkov)
python2.6 compatibility. Fix #156 (PR #192, @mpenkov)
boto3.Session instance (support STS AssumeRole). Fix #130, #149, #199 (PR #201, @eschwartz)
multipart_upload parameters (supports ServerSideEncryption) for S3. Fix (PR #202, @eschwartz)
pathlib.Path. Fix #170 (PR #175, @clintval)
ParsedUri class with functions, cleanup internal argument parsing (PR #191, @mpenkov)
f._current_pos when calling f.readline() (PR #182, @inksink)
seek for S3. Fix #187 (PR #188, @inksink)
setup.py, avoid bug from setuptools==39.0.0 and add workaround for botocore and python==3.3. Fix #176 (PR #178 & #177, @menshikh-iv & @baldwindc)
s3_iter_bucket_process_key to address S3 Read Timeout errors. (PR #96, @bbbco)
encrypt_key and other parameters to initiate_multipart_upload (PR #63, @asieira)
host and profile_name to smart_open (PR #71 #68, @robcowie)
LC_ALL=C environment variable setup (PR #40, @nikicc)
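
The _list_bucket entry above refers to paging with a continuation token. A minimal sketch of that pattern follows, assuming a client whose list_objects_v2 method behaves like the boto3 S3 client's (ContinuationToken in the request; IsTruncated and NextContinuationToken in the response). iter_keys and FakeClient are hypothetical names used for illustration; this is not smart_open's implementation.

```python
def iter_keys(client, bucket, prefix=""):
    """Yield every key under prefix, following continuation tokens page by page."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        for item in response.get("Contents", []):
            yield item["Key"]
        if not response.get("IsTruncated"):
            break
        # Ask for the next page, starting where this one stopped.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


class FakeClient:
    """Stand-in for an S3 client that serves keys a few at a time."""

    def __init__(self, keys, page_size=2):
        self._keys = list(keys)
        self._page_size = page_size

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        # The token here is just the offset of the next page, as a string.
        start = int(ContinuationToken or 0)
        end = start + self._page_size
        matching = [k for k in self._keys if k.startswith(Prefix)]
        response = {
            "Contents": [{"Key": k} for k in matching[start:end]],
            "IsTruncated": end < len(matching),
        }
        if response["IsTruncated"]:
            response["NextContinuationToken"] = str(end)
        return response


keys = list(iter_keys(FakeClient(["a", "b", "c", "d", "e"]), "my-bucket"))
```

With a real boto3 client in place of FakeClient, the same loop would keep requesting pages until the service reports that the listing is no longer truncated.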
Full Changelog: https://github.com/RaRe-Technologies/smart_open/compare/v6.2.0...v6.3.0