Smart Open

Utils for streaming large files (S3, HDFS, gzip, bz2...)

v7.0.4

7.0.4, 2024-03-26

  • Fix wb mode with zstd compression (PR #815, @djudd)
  • Remove GCS bucket.exists call to avoid storage.buckets.get permission (PR #813, @ddelange)

7.0.3, 2024-03-21

  • Add support for zst writing (PR #812, @mpenkov)
  • Roll back PR #812, restore compatibility with built-in open function (@mpenkov)

7.0.2, 2024-03-21

7.0.1, 2024-02-26

  • Do not touch botocore unless it is installed (PR #803, @ddelange)

7.0.0, 2024-02-26

  • Upgrade dev status classifier to stable (PR #798, @seebi)
  • Add zstandard compression support (PR #801, @rlrs)
  • Support moto 4 & 5 (PR #802, @jayvdb)
  • Add logic for handling large files in MultipartWriter uploads to S3 (PR #796, @jakkdl)
  • Add support for SSH connection via aliases from ~/.ssh/config (PR #790, @wbeardall)
  • Secure the connection using SSL when connecting to the FTPS server (PR #793, @wammaster)
  • Make GCS I/O 1000x faster by avoiding unnecessary API call (PR #788, @JohnHBrock)
  • Retry finalizing multipart S3 upload (PR #785, @ddelange)
  • Handle exceptions during writes to Azure (PR #783, @ddelange)
  • Fix formatting of python code in MIGRATING_FROM_OLDER_VERSIONS.rst (PR #795, @kenahoo)
  • Fix __str__ method in SinglepartWriter (PR #791, @ThosRTanner)
  • Fix KeyError: 'ContentRange' when receiving full content from S3 (PR #789, @messense)
  • Propagate exit call to the underlying filestream (PR #786, @ddelange)
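Any writer that handles large files in S3 multipart uploads has to respect S3's hard limits. According to the AWS documentation, every part except the last must be at least 5 MiB, and one upload can have at most 10,000 parts. The sketch below shows one way to choose a part size within those limits. The function names are hypothetical, and this is not the logic merged in PR #796.

```python
# Limits assumed from the AWS S3 documentation; verify before relying on them.
MIN_PART_SIZE = 5 * 1024 ** 2  # every part except the last: at least 5 MiB
MAX_PARTS = 10_000             # maximum number of parts in one upload


def choose_part_size(total_bytes: int, preferred: int = MIN_PART_SIZE) -> int:
    """Return a part size that fits total_bytes into at most MAX_PARTS parts."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    # Smallest part size that still covers the whole object in MAX_PARTS parts
    # (ceiling division on integers, so very large sizes stay exact).
    needed = -(-total_bytes // MAX_PARTS)
    return max(MIN_PART_SIZE, preferred, needed)


def count_parts(total_bytes: int, part_size: int) -> int:
    """Number of parts needed to cover total_bytes (at least one)."""
    return max(1, -(-total_bytes // part_size))


# Example: a 100 GiB object cannot use 5 MiB parts (that would be 20,480 parts).
size = 100 * 1024 ** 3
part = choose_part_size(size)
print(part, count_parts(size, part))
```

With the 5 MiB default, a 100 GiB object would need 20,480 parts. The sketch therefore raises the part size to about 10.24 MiB, which keeps the upload at 10,000 parts.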

6.4.0, 2023-09-07

6.3.0, 2022-12-12

6.2.0, 14 September 2022

6.1.0, 21 August 2022

  • Add cert parameter to http transport params (PR #703, @stev-0)
  • Allow passing additional kwargs for Azure writes (PR #702, @ddelange)

6.0.0, 24 April 2022

This release deprecates the old ignore_ext parameter. Use the compression parameter instead.

import smart_open

# Read a .gz file without decompressing it:
fin = smart_open.open("/path/file.gz", ignore_ext=True)  # No
fin = smart_open.open("/path/file.gz", compression="disable")  # Yes

# Decompress according to the file extension (the default behavior):
fin = smart_open.open("/path/file.gz", ignore_ext=False)  # No
fin = smart_open.open("/path/file.gz")  # Yes
fin = smart_open.open("/path/file.gz", compression="infer_from_extension")  # Yes, if you want to be explicit

# Force gzip decompression regardless of the file name:
fin = smart_open.open("/path/file", compression=".gz")  # Yes

  • Make Python 3.7 the required minimum (PR #688, @mpenkov)
  • Drop deprecated ignore_ext parameter (PR #661, @mpenkov)
  • Drop support for passing buffers to smart_open.open (PR #660, @mpenkov)
  • Support working directly with file descriptors (PR #659, @mpenkov)
  • Added support for viewfs:// URLs (PR #665, @ChandanChainani)
  • Fix AttributeError when reading passthrough zstandard (PR #658, @mpenkov)
  • Make UploadFailedError picklable (PR #689, @birgerbr)
  • Support container client and blob client for azure blob storage (PR #652, @cbare)
  • Pin google-cloud-storage to >=1.31.1 in extras (PR #687, @PLPeeters)
  • Expose certain transport-specific methods e.g. to_boto3 in top layer (PR #664, @mpenkov)
  • Use pytest instead of parameterizedtestcase (PR #657, @mpenkov)
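The compression values above work like a small decision table. "disable" means no codec is applied. "infer_from_extension" (the default) picks a codec from the file extension. An explicit extension such as ".gz" forces that codec no matter what the file is called. The sketch below shows that resolution step using only the standard-library gzip and bz2 modules. The helper names are hypothetical, and smart_open's own implementation and its set of supported codecs differ.

```python
import bz2
import gzip
import os

DISABLE = "disable"             # hypothetical constants mirroring the
INFER = "infer_from_extension"  # string values shown in the entry above

# Codecs this sketch knows about, keyed by extension.
_CODECS = {".gz": gzip.decompress, ".bz2": bz2.decompress}


def resolve_compression(path: str, compression: str = INFER):
    """Return the extension whose codec applies to path, or None for raw bytes."""
    if compression == DISABLE:
        return None
    if compression == INFER:
        # This sketch matches extensions case-insensitively.
        ext = os.path.splitext(path)[1].lower()
        return ext if ext in _CODECS else None
    if compression in _CODECS:
        return compression  # an explicit codec wins over the file name
    raise ValueError(f"unsupported compression: {compression!r}")


def decode(path: str, data: bytes, compression: str = INFER) -> bytes:
    """Decompress data read from path according to the compression setting."""
    ext = resolve_compression(path, compression)
    return data if ext is None else _CODECS[ext](data)
```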

5.2.1, 28 August 2021

5.2.0, 18 August 2021

5.1.0, 25 May 2021

This release introduces a new top-level parameter: compression. It controls compression behavior and partially overlaps with the old ignore_ext parameter. For details, see the README.rst file. You may continue to use the ignore_ext parameter for now, but it will be deprecated in the next major release.

5.0.0, 30 Mar 2021

This release modifies the handling of transport parameters for the S3 back-end in a backwards-incompatible way. See the migration docs for details.

  • Refactor S3, replace high-level resource/session API with low-level client API (PR #583, @mpenkov)
  • Fix potential infinite loop when reading from webhdfs (PR #597, @traboukos)
  • Add timeout parameter for http/https (PR #594, @dustymugs)
  • Remove tests directory from package (PR #589, @e-nalepa)

4.2.0, 15 Feb 2021

  • Support tell() for text mode write on s3/gcs/azure (PR #582, @markopy)
  • Implement option to use a custom buffer during S3 writes (PR #547, @mpenkov)

4.1.2, 18 Jan 2021

  • Correctly pass boto3 resource to writers (PR #576, @jackluo923)
  • Improve robustness of S3 reading (PR #552, @mpenkov)
  • Replace codecs with TextIOWrapper to fix newline issues when reading text files (PR #578, @markopy)
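The newline fix relies on a property of io.TextIOWrapper: it applies Python's universal-newlines translation on top of any binary stream, which a codecs stream reader does not do. The short standard-library example below illustrates that behavior. It shows the mechanism only, not smart_open's internal code.

```python
import io

# Bytes as they might arrive from a remote object, with mixed line endings.
raw = io.BytesIO(b"first\r\nsecond\nthird\rfourth")

# newline=None (the default) enables universal newlines: "\r\n", "\n" and "\r"
# all come back as "\n", as with the built-in open() in text mode.
text = io.TextIOWrapper(raw, encoding="utf-8", newline=None)
lines = text.readlines()
print(lines)  # ['first\n', 'second\n', 'third\n', 'fourth']
```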

4.1.0, 30 Dec 2020

  • Refactor s3 submodule to minimize resource usage (PR #569, @mpenkov)
  • Change download_as_string to download_as_bytes in gcs submodule (PR #571, @alexandreyc)

4.0.1, 27 Nov 2020

  • Exclude requests from install_requires dependency list. If you need it, use pip install smart_open[http] or pip install smart_open[webhdfs].

4.0.0, 24 Nov 2020

  • Fix reading empty file or seeking past end of file for s3 backend (PR #549, @jcushman)
  • Fix handling of rt/wt mode when working with gzip compression (PR #559, @mpenkov)
  • Bump minimum Python version to 3.6 (PR #562, @mpenkov)

3.0.0, 8 Oct 2020

This release modifies the behavior of setup.py with respect to dependencies. Previously, boto3 and other AWS-related packages were installed by default. Now, in order to install them, you need to run either:

pip install smart_open[s3]

to install the AWS dependencies only, or

pip install smart_open[all]

to install all dependencies, including AWS, GCS, etc.
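Once dependencies are split into extras, a missing extra may only show up as an ImportError the first time a transport is used. A caller that wants to fail early could check for the packages an extra pulls in. The sketch below does this with importlib.util.find_spec. The mapping from extra to module is an assumption made for illustration, so check smart_open's setup.py for the actual dependency lists.

```python
import importlib.util

# Assumed mapping from extras to one module each pulls in (illustration only).
EXTRA_PROBES = {
    "s3": "boto3",
    "gcs": "google.cloud.storage",
    "http": "requests",
}


def has_module(name: str) -> bool:
    """True if `name` can be found on the import path (parents of dotted names are imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports the parents of a dotted name; a missing parent raises.
        return False


def missing_extras(*extras: str) -> list:
    """Return the extras whose probe module cannot be found."""
    return [extra for extra in extras if not has_module(EXTRA_PROBES[extra])]


missing = missing_extras("s3", "gcs")
if missing:
    print("try: pip install smart_open[" + ",".join(missing) + "]")
```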

2.2.1, 1 Oct 2020

  • Include S3 dependencies by default, because removing them in the 2.2.0 minor release was a mistake.

2.2.0, 25 Sep 2020

This release modifies the behavior of setup.py with respect to dependencies. Previously, boto3 and other AWS-related packages were installed by default. Now, in order to install them, you need to run either:

pip install smart_open[s3]

to install the AWS dependencies only, or

pip install smart_open[all]

to install all dependencies, including AWS, GCS, etc.

Summary of changes:

  • Correctly pass newline parameter to built-in open function (PR #478, @burkovae)
  • Remove boto as a dependency (PR #523, @isobit)
  • Performance improvement: avoid redundant GetObject API queries in s3.Reader (PR #495, @jcushman)
  • Support installing smart_open without AWS dependencies (PR #534, @justindujardin)
  • Take object version into account in to_boto3 method (PR #539, @interpolatio)

Deprecations

Functionality on the left-hand side will be removed in future releases. Use the functions on the right-hand side instead.

  • smart_open.s3_iter_bucket → smart_open.s3.iter_bucket

2.1.1, 27 Aug 2020

  • Bypass unnecessary GCS storage.buckets.get permission (PR #516, @gelioz)
  • Allow SFTP connection with SSH key (PR #522, @rostskadat)

2.1.0, 1 July 2020

2.0.0, 27 April 2020, "Python 3"

  • This version supports Python 3 only (3.5+).
    • If you still need Python 2, install the smart_open==1.10.1 legacy release instead.
  • Prevent smart_open from writing to logs on import (PR #476, @mpenkov)
  • Modify setup.py to explicitly support only Py3.5 and above (PR #471, @Amertz08)
  • Include all the test_data in setup.py (PR #473, @sikuan)

1.10.1, 26 April 2020

  • This is the last version to support Python 2.7. Versions 1.11 and above will support Python 3 only.
  • Use only if you need Python 2.

1.11.1, 8 Apr 2020

  • Add missing boto dependency (Issue #468)

1.11.0, 8 Apr 2020

Starting with this release, you will have to run:

pip install smart_open[gcs] to use the GCS transport.

In the future, all extra dependencies will be optional. If you want to continue installing all of them, use:

pip install smart_open[all]

See the README.rst for details.

1.10.0, 16 Mar 2020

1.9.0, 3 Nov 2019

1.8.4, 2 Jun 2019

1.8.3, 26 April 2019

1.8.2, 17 April 2019

  • Removed dependency on lzma (PR #262, @tdhopper)
  • backward compatibility fixes (PR #294, @mpenkov)
  • Minor fixes (PR #291, @mpenkov)
  • Fix #289: the smart_open package now correctly exposes a __version__ attribute
  • Fix #285: handle edge case with question marks in an S3 URL
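Edge cases like the one in #285 come up because a generic URL parser treats "?" as the start of a query string, while S3 keys can legitimately contain "?". The sketch below splits the URI so that everything after the bucket becomes the key. The function name is hypothetical, only the s3:// scheme is handled, and this is not smart_open's parser.

```python
from urllib.parse import urlsplit


def split_s3_uri(uri: str):
    """Split 's3://bucket/key' into (bucket, key), keeping '?' and '#' in the key."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    return bucket, key


uri = "s3://my-bucket/reports/what?.csv"
print(urlsplit(uri).path)  # /reports/what  (the generic parser moved "?.csv" into the query)
print(split_s3_uri(uri))   # ('my-bucket', 'reports/what?.csv')
```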

This release rolls back support for transparently decompressing .xz files, previously introduced in 1.8.1. This is a useful feature, but it requires a tricky dependency. It's still possible to handle .xz files with relatively little effort. Please see the README.rst file for details.

1.8.1, 6 April 2019

smart_open.open

This new function replaces smart_open.smart_open, which is now deprecated. Main differences:

  • ignore_extension → ignore_ext
  • new transport_params dict parameter to contain keyword parameters for the transport layer (S3, HTTPS, HDFS, etc).

Main advantages of the new function:

  • Simpler interface for the user, fewer parameters
  • Greater API flexibility: adding additional keyword arguments will no longer require updating the top-level interface
  • Better documentation for keyword parameters (previously, they were documented via examples only)

The old smart_open.smart_open function is deprecated, but continues to work as before.
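A transport_params dict lets the top-level signature stay fixed while transports gain options. One way to get that property is to dispatch on the URI scheme and pass the dict, unchanged, as keyword arguments to the chosen transport. The toy sketch below follows that pattern. open_uri and the stand-in transports are hypothetical, and they return a description of the call instead of opening anything.

```python
from urllib.parse import urlsplit


# Hypothetical transports: each declares only the keyword parameters that
# make sense for its own back-end, and returns a description of the call.
def _open_local(path, mode, buffering=-1):
    return ("local", path, mode, {"buffering": buffering})


def _open_https(url, mode, timeout=None, cert=None):
    return ("https", url, mode, {"timeout": timeout, "cert": cert})


_TRANSPORTS = {"": _open_local, "file": _open_local, "https": _open_https}


def open_uri(uri, mode="rb", transport_params=None):
    """Pick a transport from the URI scheme and forward transport_params to it."""
    scheme = urlsplit(uri).scheme
    opener = _TRANSPORTS.get(scheme)
    if opener is None:
        raise NotImplementedError(f"no transport for scheme {scheme!r}")
    target = uri[len("file://"):] if scheme == "file" else uri
    return opener(target, mode, **(transport_params or {}))


print(open_uri("https://example.com/data.csv", transport_params={"timeout": 5}))
```

With this sketch, adding an option to one transport means changing only that transport's function. A parameter the transport does not accept raises a TypeError. smart_open itself may handle unknown parameters differently.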

1.8.0, 17th January 2019

1.7.1, 18th September 2018

  • Unpin boto/botocore for regular installation. Fix #227 (PR #232, @menshikh-iv)

1.7.0, 18th September 2018

1.6.0, 29th June 2018

  • Migrate to boto3. Fix #43 (PR #164, @mpenkov)
  • Refactor smart_open to share compression and encoding functionality (PR #185, @mpenkov)
  • Drop python2.6 compatibility. Fix #156 (PR #192, @mpenkov)
  • Accept a custom boto3.Session instance (support STS AssumeRole). Fix #130, #149, #199 (PR #201, @eschwartz)
  • Accept multipart_upload parameters (supports ServerSideEncryption) for S3. Fix (PR #202, @eschwartz)
  • Add support for pathlib.Path. Fix #170 (PR #175, @clintval)
  • Fix performance regression using local file-system. Fix #184 (PR #190, @mpenkov)
  • Replace ParsedUri class with functions, cleanup internal argument parsing (PR #191, @mpenkov)
  • Handle edge case (read 0 bytes) in read function. Fix #171 (PR #193, @mpenkov)
  • Fix bug with changing f._current_pos when calling f.readline() (PR #182, @inksink)
  • Close the old body explicitly after seek for S3. Fix #187 (PR #188, @inksink)
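Accepting pathlib.Path objects (and, per the 6.0.0 entry above, file descriptors) can be done by normalizing the first argument before dispatch. The minimal sketch below uses os.fspath, which accepts any os.PathLike object. normalize_uri is a hypothetical name, not smart_open's API.

```python
import os
import pathlib


def normalize_uri(uri):
    """Return a str path/URI or an int file descriptor, ready for dispatch."""
    if isinstance(uri, bool):
        raise TypeError("bool is not a valid file descriptor")
    if isinstance(uri, int):
        return uri  # file descriptors pass through unchanged
    if isinstance(uri, (str, os.PathLike)):
        return os.fspath(uri)  # pathlib.Path and other PathLike objects become str
    raise TypeError(f"unsupported type: {type(uri).__name__}")


print(normalize_uri(pathlib.PurePosixPath("/data/file.gz")))  # /data/file.gz
```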

1.5.7, 18th March 2018

  • Fix author/maintainer fields in setup.py, avoid bug from setuptools==39.0.0 and add workaround for botocore and python==3.3. Fix #176 (PR #178 & #177, @menshikh-iv & @baldwindc)

1.5.6, 28th December 2017

1.5.5, 6th December 2017

  • Fix problems from 1.5.4 release. Fix #153, #154, partial fix #152 (PR #155, @mpenkov)

1.5.4, 30th November 2017

1.5.3, 18th May 2017

  • Remove GET parameters from url. Fix #120 (PR #121, @mcrowson)

1.5.2, 12th Apr 2017

  • Enable compressed formats over http. Avoid filehandle leak. Fix #109 and #110. (PR #112, @robottwo)
  • Make it possible to change the number of retries (PR #102, @shaform)

1.5.1, 16th Mar 2017

  • Bugfix for compressed formats (PR #110, @tmylk)

1.5.0, 14th Mar 2017

  • HTTP/HTTPS read support w/ Kerberos (PR #107, @robottwo)

1.4.0, 13th Feb 2017

  • HdfsOpenWrite implementation similar to read (PR #106, @skibaa)
  • Support custom S3 server host, port, ssl. (PR #101, @robottwo)
  • Add retry around s3_iter_bucket_process_key to address S3 Read Timeout errors. (PR #96, @bbbco)
  • Include test data in sdist and install it. (PR #105, @cournape)

1.3.5, 5th October 2016

  • Add MANIFEST.in required for conda-forge recipe (PR #90, @tmylk)
  • Fix #92. Allow hash in filename (PR #93, @tmylk)

1.3.4, 26th August 2016

  • Relative path support (PR #73, @yupbank)
  • Move gzipstream module to smart_open package (PR #81, @mpenkov)
  • Ensure reader objects never return None (PR #81, @mpenkov)
  • Ensure read functions never return more bytes than asked for (PR #84, @mpenkov)
  • Add support for reading gzipped objects until EOF, e.g. read() (PR #81, @mpenkov)
  • Add missing parameter to read_from_buffer call (PR #84, @mpenkov)
  • Add unit tests for gzipstream (PR #84, @mpenkov)
  • Bundle gzipstream to enable streaming of gzipped content from S3 (PR #73, @mpenkov)
  • Update gzipstream to avoid deep recursion (PR #73, @mpenkov)
  • Implemented readline for S3 (PR #73, @mpenkov)
  • Added pip requirements.txt (PR #73, @mpenkov)
  • Invert NO_MULTIPROCESSING flag (PR #79, @Janrain-Colin)
  • Add ability to add query to webhdfs uri. (PR #78, @ellimilial)
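To stream gzipped content from S3 without downloading the whole object, the reader has to decompress chunk by chunk while keeping the read contract described above. read(size) returns at most size bytes and never None, and read() with no argument continues to EOF. The sketch below builds such a reader on zlib.decompressobj. GzipStreamReader is hypothetical, handles only single-member gzip data, and is not the bundled gzipstream module.

```python
import io
import zlib


class GzipStreamReader:
    """Decompress a gzip byte stream incrementally from a binary file object."""

    def __init__(self, raw, chunk_size=64 * 1024):
        self._raw = raw
        self._chunk_size = chunk_size
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        self._decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        self._buffer = b""
        self._eof = False

    def _fill(self):
        """Decompress one more compressed chunk into the buffer."""
        chunk = self._raw.read(self._chunk_size)
        if chunk:
            self._buffer += self._decomp.decompress(chunk)
        else:
            self._buffer += self._decomp.flush()
            self._eof = True

    def read(self, size=-1):
        """Return at most `size` bytes (all remaining bytes if size < 0); b"" at EOF."""
        if size is None or size < 0:
            while not self._eof:
                self._fill()
            data, self._buffer = self._buffer, b""
            return data
        while len(self._buffer) < size and not self._eof:
            self._fill()
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


# Build a gzip payload in memory; wbits=16+MAX_WBITS writes gzip framing.
payload = b"line of text\n" * 5000
comp = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
gz_bytes = comp.compress(payload) + comp.flush()

reader = GzipStreamReader(io.BytesIO(gz_bytes), chunk_size=256)
head = reader.read(5)
rest = reader.read()
print(head, len(rest))
```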

1.3.3, 16th May 2016

  • Accept an instance of boto.s3.key.Key to smart_open (PR #38, @asieira)
  • Allow passing encrypt_key and other parameters to initiate_multipart_upload (PR #63, @asieira)
  • Allow passing boto host and profile_name to smart_open (PR #71 #68, @robcowie)
  • Write an empty key to S3 even if nothing is written to S3OpenWrite (PR #61, @petedmarsh)
  • Support LC_ALL=C environment variable setup (PR #40, @nikicc)
  • Python 3.5 support

1.3.2, 3rd January 2016

  • Bug fix release to enable 'wb+' file mode (PR #50)

1.3.1, 18th December 2015

  • Disable multiprocessing if unavailable. Allows running on Google Compute Engine. (PR #41, @nikicc)
  • Httpretty updated to allow LC_ALL=C locale config. (PR #39, @jsphpl)
  • Accept an instance of boto.s3.key.Key (PR #38, @asieira)

1.3.0, 19th September 2015

  • WebHDFS read/write (PR #29, @ziky90)
  • re-upload last S3 chunk in failed upload (PR #20, @andreycizov)
  • return the entire key in s3_iter_bucket instead of only the key name (PR #22, @salilb)
  • pass optional keywords on S3 write (PR #30, @val314159)
  • Make smart_open a no-op if passed a file-like object with a read attribute (PR #32, @gojomo)
  • various improvements to testing (PR #30, @val314159)

1.1.0, 1st February 2015

  • support for multistream bzip files (PR #9, @pombredanne)
  • introduce this CHANGELOG
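A multistream bzip2 file is several complete bzip2 streams concatenated, as produced by parallel compressors or by simply concatenating .bz2 files. A decompressor that stops at the end of the first stream silently drops the rest. The sketch below loops over BZ2Decompressor.unused_data. On Python 3.3 and later, bz2.decompress and bz2.BZ2File already handle multiple streams, so this only illustrates the idea.

```python
import bz2


def decompress_multistream(data: bytes) -> bytes:
    """Decompress one or more concatenated bzip2 streams."""
    out = []
    while data:
        decomp = bz2.BZ2Decompressor()  # a fresh decompressor per stream
        out.append(decomp.decompress(data))
        if not decomp.eof:
            raise ValueError("truncated bzip2 stream")
        data = decomp.unused_data  # bytes after the end of this stream
    return b"".join(out)


two_streams = bz2.compress(b"first ") + bz2.compress(b"second")
print(decompress_multistream(two_streams))  # b'first second'
```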

v7.0.2

1 month ago

7.0.2, 2024-03-21

7.0.1, 2024-02-26

  • Do not touch botocore unless it is installed (PR #803, @ddelange)

7.0.0, 2024-02-26

  • Upgrade dev status classifier to stable (PR #798, @seebi)
  • Add zstandard compression support (PR #801, @rlrs)
  • Support moto 4 & 5 (PR #802, @jayvdb)
  • Add logic for handling large files in MultipartWriter uploads to S3 (PR #796, @jakkdl)
  • Add support for SSH connection via aliases from ~/.ssh/config (PR #790, @wbeardall)
  • Secure the connection using SSL when connecting to the FTPS server (PR #793, @wammaster)
  • Make GCS I/O 1000x faster by avoiding unnecessary API call (PR #788, @JohnHBrock)
  • Retry finalizing multipart S3 upload (PR #785, @ddelange)
  • Handle exceptions during writes to Azure (PR #783, @ddelange)
  • Fix formatting of python code in MIGRATING_FROM_OLDER_VERSIONS.rst (PR #795, @kenahoo)
  • Fix str method in SinglepartWriter (PR #791, @ThosRTanner)
  • Fix KeyError: 'ContentRange' when received full content from S3 (PR #789, @messense)
  • Propagate exit call to the underlying filestream (PR #786, @ddelange)

6.4.0, 2023-09-07

6.3.0, 2022-12-12

6.2.0, 14 September 2022

6.1.0, 21 August 2022

  • Add cert parameter to http transport params (PR #703, @stev-0)
  • Allow passing additional kwargs for Azure writes (PR #702, @ddelange)

6.0.0, 24 April 2022

This release deprecates the old ignore_ext parameter. Use the compression parameter instead.

fin = smart_open.open("/path/file.gz", ignore_ext=True)  # No
fin = smart_open.open("/path/file.gz", compression="disable")  # Yes

fin = smart_open.open("/path/file.gz", ignore_ext=False)  # No
fin = smart_open.open("/path/file.gz")  # Yes
fin = smart_open.open("/path/file.gz", compression="infer_from_extension")  # Yes, if you want to be explicit

fin = smart_open.open("/path/file", compression=".gz")  # Yes
  • Make Python 3.7 the required minimum (PR #688, @mpenkov)
  • Drop deprecated ignore_ext parameter (PR #661, @mpenkov)
  • Drop support for passing buffers to smart_open.open (PR #660, @mpenkov)
  • Support working directly with file descriptors (PR #659, @mpenkov)
  • Added support for viewfs:// URLs (PR #665, @ChandanChainani)
  • Fix AttributeError when reading passthrough zstandard (PR #658, @mpenkov)
  • Make UploadFailedError picklable (PR #689, @birgerbr)
  • Support container client and blob client for azure blob storage (PR #652, @cbare)
  • Pin google-cloud-storage to >=1.31.1 in extras (PR #687, @PLPeeters)
  • Expose certain transport-specific methods e.g. to_boto3 in top layer (PR #664, @mpenkov)
  • Use pytest instead of parameterizedtestcase (PR #657, @mpenkov)

5.2.1, 28 August 2021

5.2.0, 18 August 2021

5.1.0, 25 May 2021

This release introduces a new top-level parameter: compression. It controls compression behavior and partially overlaps with the old ignore_ext parameter. For details, see the README.rst file. You may continue to use ignore_ext parameter for now, but it will be deprecated in the next major release.

5.0.0, 30 Mar 2021

This release modifies the handling of transport parameters for the S3 back-end in a backwards-incompatible way. See the migration docs for details.

  • Refactor S3, replace high-level resource/session API with low-level client API (PR #583, @mpenkov)
  • Fix potential infinite loop when reading from webhdfs (PR #597, @traboukos)
  • Add timeout parameter for http/https (PR #594, @dustymugs)
  • Remove tests directory from package (PR #589, @e-nalepa)

4.2.0, 15 Feb 2021

  • Support tell() for text mode write on s3/gcs/azure (PR #582, @markopy)
  • Implement option to use a custom buffer during S3 writes (PR #547, @mpenkov)

4.1.2, 18 Jan 2021

  • Correctly pass boto3 resource to writers (PR #576, @jackluo923)
  • Improve robustness of S3 reading (PR #552, @mpenkov)
  • Replace codecs with TextIOWrapper to fix newline issues when reading text files (PR #578, @markopy)

4.1.0, 30 Dec 2020

  • Refactor s3 submodule to minimize resource usage (PR #569, @mpenkov)
  • Change download_as_string to download_as_bytes in gcs submodule (PR #571, @alexandreyc)

4.0.1, 27 Nov 2020

  • Exclude requests from install_requires dependency list. If you need it, use pip install smart_open[http] or pip install smart_open[webhdfs].

4.0.0, 24 Nov 2020

  • Fix reading empty file or seeking past end of file for s3 backend (PR #549, @jcushman)
  • Fix handling of rt/wt mode when working with gzip compression (PR #559, @mpenkov)
  • Bump minimum Python version to 3.6 (PR #562, @mpenkov)

3.0.0, 8 Oct 2020

This release modifies the behavior of setup.py with respect to dependencies. Previously, boto3 and other AWS-related packages were installed by default. Now, in order to install them, you need to run either:

pip install smart_open[s3]

to install the AWS dependencies only, or

pip install smart_open[all]

to install all dependencies, including AWS, GCS, etc.

2.2.1, 1 Oct 2020

  • Include S3 dependencies by default, because removing them in the 2.2.0 minor release was a mistake.

2.2.0, 25 Sep 2020

This release modifies the behavior of setup.py with respect to dependencies. Previously, boto3 and other AWS-related packages were installed by default. Now, in order to install them, you need to run either:

pip install smart_open[s3]

to install the AWS dependencies only, or

pip install smart_open[all]

to install all dependencies, including AWS, GCS, etc.

Summary of changes:

  • Correctly pass newline parameter to built-in open function (PR #478, @burkovae)
  • Remove boto as a dependency (PR #523, @isobit)
  • Performance improvement: avoid redundant GetObject API queries in s3.Reader (PR #495, @jcushman)
  • Support installing smart_open without AWS dependencies (PR #534, @justindujardin)
  • Take object version into account in to_boto3 method (PR #539, @interpolatio)

Deprecations

Functionality on the left-hand side will be removed in future releases. Use the functions on the right-hand side instead.

  • smart_open.s3_iter_bucket → smart_open.s3.iter_bucket
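A deprecated name is typically kept working by forwarding it to the new function and emitting a `DeprecationWarning`. The sketch below shows that general pattern with hypothetical stand-ins. The local `iter_bucket` only yields fake key names and is not smart_open's real function. `deprecated_alias` and the bucket name `my-bucket` are illustrative.

```python
import functools
import warnings


def iter_bucket(bucket_name, prefix=""):
    """Hypothetical stand-in for the new-style function; yields fake key names."""
    for i in range(3):
        yield f"{prefix}key{i}"


def deprecated_alias(new_func, old_name, new_name):
    """Wrap new_func so calls under the old name still work but warn."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated, use {new_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)
    return wrapper


s3_iter_bucket = deprecated_alias(iter_bucket, "s3_iter_bucket", "s3.iter_bucket")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    keys = list(s3_iter_bucket("my-bucket"))

print(keys)                         # ['key0', 'key1', 'key2']
print(caught[0].category.__name__)  # DeprecationWarning
```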

2.1.1, 27 Aug 2020

  • Bypass unnecessary GCS storage.buckets.get permission (PR #516, @gelioz)
  • Allow SFTP connection with SSH key (PR #522, @rostskadat)

2.1.0, 1 July 2020

2.0.0, 27 April 2020, "Python 3"

  • This version supports Python 3 only (3.5+).
    • If you still need Python 2, install the smart_open==1.10.1 legacy release instead.
  • Prevent smart_open from writing to logs on import (PR #476, @mpenkov)
  • Modify setup.py to explicitly support only Py3.5 and above (PR #471, @Amertz08)
  • Include all the test_data in setup.py (PR #473, @sikuan)

1.10.1, 26 April 2020

  • This is the last version to support Python 2.7. Versions 1.11 and above will support Python 3 only.
  • Use only if you need Python 2.

1.11.1, 8 Apr 2020

  • Add missing boto dependency (Issue #468)

1.11.0, 8 Apr 2020

Starting with this release, you will have to run:

pip install smart_open[gcs] to use the GCS transport.

In the future, all extra dependencies will be optional. If you want to continue installing all of them, use:

pip install smart_open[all]

See the README.rst for details.

1.10.0, 16 Mar 2020

1.9.0, 3 Nov 2019

1.8.4, 2 Jun 2019

1.8.3, 26 April 2019

1.8.2, 17 April 2019

  • Removed dependency on lzma (PR #262, @tdhopper)
  • Backward compatibility fixes (PR #294, @mpenkov)
  • Minor fixes (PR #291, @mpenkov)
  • Fix #289: the smart_open package now correctly exposes a __version__ attribute
  • Fix #285: handle edge case with question marks in an S3 URL

This release rolls back support for transparently decompressing .xz files, previously introduced in 1.8.1. This is a useful feature, but it requires a tricky dependency. It's still possible to handle .xz files with relatively little effort. Please see the README.rst file for details.
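For example, a small handler built on the standard-library `lzma` module can wrap any binary file object so that reads and writes are transparently `.xz`-compressed. The name `_handle_xz` is illustrative. In recent smart_open versions a callback of this shape (file object, mode) can be registered with `smart_open.register_compressor('.xz', _handle_xz)`, but check the README for the version you use. The round trip below uses `io.BytesIO` in place of a real file and does not need smart_open.

```python
import io
import lzma


def _handle_xz(file_obj, mode):
    """Wrap a binary file object so reads/writes are transparently .xz-compressed."""
    # LZMAFile accepts an existing file object and leaves it open on close().
    return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)


# Round trip through an in-memory buffer standing in for a remote file.
buf = io.BytesIO()
with _handle_xz(buf, "wb") as fout:
    fout.write(b"hello xz\n")

buf.seek(0)
with _handle_xz(buf, "rb") as fin:
    data = fin.read()

print(data)  # b'hello xz\n'
```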

1.8.1, 6 April 2019

smart_open.open

This new function replaces smart_open.smart_open, which is now deprecated. Main differences:

  • ignore_extension → ignore_ext
  • new transport_params dict parameter to contain keyword parameters for the transport layer (S3, HTTPS, HDFS, etc).

Main advantages of the new function:

  • Simpler interface for the user, fewer parameters
  • Greater API flexibility: adding additional keyword arguments will no longer require updating the top-level interface
  • Better documentation for keyword parameters (previously, they were documented via examples only)

The old smart_open.smart_open function is deprecated, but continues to work as before.

1.8.0, 17th January 2019

1.7.1, 18th September 2018

  • Unpin boto/botocore for regular installation. Fix #227 (PR #232, @menshikh-iv)

1.7.0, 18th September 2018

1.6.0, 29th June 2018

  • Migrate to boto3. Fix #43 (PR #164, @mpenkov)
  • Refactor smart_open to share compression and encoding functionality (PR #185, @mpenkov)
  • Drop Python 2.6 compatibility. Fix #156 (PR #192, @mpenkov)
  • Accept a custom boto3.Session instance (support STS AssumeRole). Fix #130, #149, #199 (PR #201, @eschwartz)
  • Accept multipart_upload parameters (supports ServerSideEncryption) for S3. Fix (PR #202, @eschwartz)
  • Add support for pathlib.Path. Fix #170 (PR #175, @clintval)
  • Fix performance regression using local file-system. Fix #184 (PR #190, @mpenkov)
  • Replace ParsedUri class with functions, cleanup internal argument parsing (PR #191, @mpenkov)
  • Handle edge case (read 0 bytes) in read function. Fix #171 (PR #193, @mpenkov)
  • Fix bug with changing f._current_pos when calling f.readline() (PR #182, @inksink)
  • Close the old body explicitly after seek for S3. Fix #187 (PR #188, @inksink)
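Accepting `pathlib.Path` usually comes down to normalizing any `os.PathLike` argument to a plain string before the URI is parsed. This is a minimal sketch with a hypothetical helper name, `normalize_uri`. It is not necessarily how PR #175 implemented the feature.

```python
import os
import pathlib


def normalize_uri(uri):
    """Return a plain string for str inputs or os.PathLike inputs such as pathlib.Path."""
    if isinstance(uri, os.PathLike):
        uri = os.fspath(uri)
    if not isinstance(uri, str):
        raise TypeError(f"expected str or os.PathLike, got {type(uri).__name__}")
    return uri


print(normalize_uri(pathlib.PurePosixPath("/tmp/data.txt")))  # /tmp/data.txt
print(normalize_uri("s3://bucket/key"))                        # s3://bucket/key
```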

1.5.7, 18th March 2018

  • Fix author/maintainer fields in setup.py, avoid bug from setuptools==39.0.0 and add workaround for botocore and python==3.3. Fix #176 (PR #178 & #177, @menshikh-iv & @baldwindc)

1.5.6, 28th December 2017

1.5.5, 6th December 2017

  • Fix problems from the 1.5.4 release. Fix #153, #154, partial fix #152 (PR #155, @mpenkov)

1.5.4, 30th November 2017

1.5.3, 18th May 2017

  • Remove GET parameters from url. Fix #120 (PR #121, @mcrowson)

1.5.2, 12th Apr 2017

  • Enable compressed formats over HTTP. Avoid filehandle leak. Fix #109 and #110. (PR #112, @robottwo)
  • Make it possible to change the number of retries (PR #102, @shaform)

1.5.1, 16th Mar 2017

  • Bugfix for compressed formats (PR #110, @tmylk)

1.5.0, 14th Mar 2017

  • HTTP/HTTPS read support w/ Kerberos (PR #107, @robottwo)

1.4.0, 13th Feb 2017

  • HdfsOpenWrite implementation similar to read (PR #106, @skibaa)
  • Support custom S3 server host, port, SSL. (PR #101, @robottwo)
  • Add retry around s3_iter_bucket_process_key to address S3 Read Timeout errors. (PR #96, @bbbco)
  • Include test data in the sdist and install it. (PR #105, @cournape)

1.3.5, 5th October 2016

  • Add MANIFEST.in required for conda-forge recipe (PR #90, @tmylk)
  • Fix #92. Allow hash in filename (PR #93, @tmylk)

1.3.4, 26th August 2016

  • Relative path support (PR #73, @yupbank)
  • Move gzipstream module to smart_open package (PR #81, @mpenkov)
  • Ensure reader objects never return None (PR #81, @mpenkov)
  • Ensure read functions never return more bytes than asked for (PR #84, @mpenkov)
  • Add support for reading gzipped objects until EOF, e.g. read() (PR #81, @mpenkov)
  • Add missing parameter to read_from_buffer call (PR #84, @mpenkov)
  • Add unit tests for gzipstream (PR #84, @mpenkov)
  • Bundle gzipstream to enable streaming of gzipped content from S3 (PR #73, @mpenkov)
  • Update gzipstream to avoid deep recursion (PR #73, @mpenkov)
  • Implement readline for S3 (PR #73, @mpenkov)
  • Add pip requirements.txt (PR #73, @mpenkov)
  • Invert NO_MULTIPROCESSING flag (PR #79, @Janrain-Colin)
  • Allow adding a query to the WebHDFS URI. (PR #78, @ellimilial)
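The streaming idea behind these gzip changes can be sketched with the standard-library `gzip` module. It decompresses incrementally from any binary file object, so only a small chunk is held in memory at a time. This uses `gzip.GzipFile`, not the bundled `gzipstream` module from these releases. `io.BytesIO` stands in for an S3 stream, and the 4-byte chunk size is arbitrary.

```python
import gzip
import io

# Build some gzip-compressed bytes to stand in for a remote .gz object.
payload = b"line one\nline two\nline three\n"
remote = io.BytesIO(gzip.compress(payload))

# GzipFile decompresses lazily as data is read from the underlying stream.
chunks = []
with gzip.GzipFile(fileobj=remote, mode="rb") as stream:
    while True:
        chunk = stream.read(4)  # never returns more than 4 bytes
        if not chunk:           # empty bytes signals EOF
            break
        chunks.append(chunk)

print(b"".join(chunks) == payload)  # True
```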

1.3.3, 16th May 2016

  • Accept an instance of boto.s3.key.Key to smart_open (PR #38, @asieira)
  • Allow passing encrypt_key and other parameters to initiate_multipart_upload (PR #63, @asieira)
  • Allow passing boto host and profile_name to smart_open (PR #71, #68, @robcowie)
  • Write an empty key to S3 even if nothing is written to S3OpenWrite (PR #61, @petedmarsh)
  • Support LC_ALL=C environment variable setup (PR #40, @nikicc)
  • Python 3.5 support

1.3.2, 3rd January 2016

  • Bug fix release to enable 'wb+' file mode (PR #50)

1.3.1, 18th December 2015

  • Disable multiprocessing if unavailable. This allows smart_open to run on Google Compute Engine. (PR #41, @nikicc)
  • Httpretty updated to allow LC_ALL=C locale config. (PR #39, @jsphpl)
  • Accept an instance of boto.s3.key.Key (PR #38, @asieira)

1.3.0, 19th September 2015

  • WebHDFS read/write (PR #29, @ziky90)
  • Re-upload last S3 chunk in failed upload (PR #20, @andreycizov)
  • Return the entire key in s3_iter_bucket instead of only the key name (PR #22, @salilb)
  • Pass optional keywords on S3 write (PR #30, @val314159)
  • Make smart_open a no-op if passed a file-like object with a read attribute (PR #32, @gojomo)
  • Various improvements to testing (PR #30, @val314159)
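The pass-through behaviour amounts to duck typing. Anything that already has a `read` attribute is returned unchanged, and everything else is treated as a path to open. This is a minimal sketch with a hypothetical helper, `open_or_passthrough`, that handles local paths only.

```python
import io
import os
import tempfile


def open_or_passthrough(uri_or_file, mode="rb"):
    """Return file-like objects as-is; otherwise open the path (local files only here)."""
    if hasattr(uri_or_file, "read"):
        return uri_or_file
    return open(uri_or_file, mode)


existing = io.BytesIO(b"already open")
assert open_or_passthrough(existing) is existing  # no-op for file-like objects

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as fout:
        fout.write(b"from disk")
    with open_or_passthrough(path) as fin:
        from_disk = fin.read()

print(from_disk)  # b'from disk'
```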

1.1.0, 1st February 2015

  • Support for multistream bzip files (PR #9, @pombredanne)
  • Introduce this CHANGELOG
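A multistream bzip2 file is several complete bzip2 streams concatenated back to back, as produced by parallel compressors such as pbzip2. Python 3's `bz2` module reads all the streams, not just the first, as this small check shows:

```python
import bz2
import io

# Two independently compressed streams concatenated into one "file".
multistream = bz2.compress(b"first stream\n") + bz2.compress(b"second stream\n")

# bz2.decompress and bz2.BZ2File both continue past the end of the first stream.
print(bz2.decompress(multistream))  # b'first stream\nsecond stream\n'

with bz2.BZ2File(io.BytesIO(multistream)) as fin:
    from_file = fin.read()

print(from_file)                    # b'first stream\nsecond stream\n'
```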

v7.0.1

7.0.1, 2024-02-26

  • Do not touch botocore unless it is installed (PR #803, @ddelange)
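A common way to avoid touching an optional dependency unless it is installed is to check for it first and branch on the result. This is a generic sketch of that pattern, not the code from PR #803. `describe_error` is a hypothetical helper.

```python
import importlib.util

# Only import botocore if it is actually installed.
if importlib.util.find_spec("botocore") is not None:
    import botocore.exceptions as botocore_exceptions
else:
    botocore_exceptions = None


def describe_error(exc):
    """Classify an exception, consulting botocore types only when available."""
    if botocore_exceptions is not None and isinstance(exc, botocore_exceptions.ClientError):
        return "botocore client error"
    return type(exc).__name__


print(describe_error(ValueError("bad value")))  # ValueError
```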

v7.0.0

7.0.0, 2024-02-26

  • Upgrade dev status classifier to stable (PR #798, @seebi)
  • Add zstandard compression support (PR #801, @rlrs)
  • Support moto 4 & 5 (PR #802, @jayvdb)
  • Add logic for handling large files in MultipartWriter uploads to S3 (PR #796, @jakkdl)
  • Add support for SSH connection via aliases from ~/.ssh/config (PR #790, @wbeardall)
  • Secure the connection using SSL when connecting to the FTPS server (PR #793, @wammaster)
  • Make GCS I/O 1000x faster by avoiding unnecessary API call (PR #788, @JohnHBrock)
  • Retry finalizing multipart S3 upload (PR #785, @ddelange)
  • Handle exceptions during writes to Azure (PR #783, @ddelange)
  • Fix formatting of python code in MIGRATING_FROM_OLDER_VERSIONS.rst (PR #795, @kenahoo)
  • Fix str method in SinglepartWriter (PR #791, @ThosRTanner)
  • Fix KeyError: 'ContentRange' when receiving full content from S3 (PR #789, @messense)
  • Propagate exit call to the underlying filestream (PR #786, @ddelange)
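Very large multipart uploads must respect S3's documented limits. An upload can have at most 10,000 parts, and each part must be at least 5 MiB (except the last) and at most 5 GiB. The arithmetic below uses a hypothetical helper, `choose_part_size`, to show how a writer could grow its part size for very large files. It is not smart_open's MultipartWriter logic.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

MIN_PART_SIZE = 5 * MIB  # S3 minimum for every part except the last
MAX_PART_SIZE = 5 * GIB  # S3 maximum size of a single part
MAX_PARTS = 10_000       # S3 maximum number of parts per upload


def choose_part_size(total_size, preferred=50 * MIB):
    """Return the larger of the preferred size and the size needed to fit into MAX_PARTS parts."""
    part_size = max(preferred, MIN_PART_SIZE)
    # Ceiling division: the smallest part size for which MAX_PARTS parts suffice.
    needed = -(-total_size // MAX_PARTS)
    part_size = max(part_size, needed)
    if part_size > MAX_PART_SIZE:
        raise ValueError("object too large for a single multipart upload")
    return part_size


print(choose_part_size(100 * MIB) // MIB)  # 50
print(choose_part_size(1024 ** 4))         # 109951163 (1 TiB needs ~105 MiB parts)
```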

v6.4.0

6.4.0, 2023-09-07

v6.3.0

What's Changed

New Contributors

Full Changelog: https://github.com/RaRe-Technologies/smart_open/compare/v6.2.0...v6.3.0

v6.2.0

6.2.0, 14 September 2022

v6.1.0

6.1.0, 21 August 2022

  • Add cert parameter to http transport params (PR #703, @stev-0)
  • Allow passing additional kwargs for Azure writes (PR #702, @ddelange)

v6.0.0

6.0.0, 24 April 2022

This release deprecates the old ignore_ext parameter. Use the compression parameter instead.

import smart_open

fin = smart_open.open("/path/file.gz", ignore_ext=True)  # 🚫 No
fin = smart_open.open("/path/file.gz", compression="disable")  # Yes

fin = smart_open.open("/path/file.gz", ignore_ext=False)  # 🚫 No
fin = smart_open.open("/path/file.gz")  # Yes
fin = smart_open.open("/path/file.gz", compression="infer_from_extension")  # Yes, if you want to be explicit

fin = smart_open.open("/path/file", compression=".gz")  # Yes, force gzip even without a .gz extension

  • Make Python 3.7 the required minimum (PR #688, @mpenkov)
  • Drop deprecated ignore_ext parameter (PR #661, @mpenkov)
  • Drop support for passing buffers to smart_open.open (PR #660, @mpenkov)
  • Support working directly with file descriptors (PR #659, @mpenkov)
  • Added support for viewfs:// URLs (PR #665, @ChandanChainani)
  • Fix AttributeError when reading passthrough zstandard (PR #658, @mpenkov)
  • Make UploadFailedError picklable (PR #689, @birgerbr)
  • Support container client and blob client for azure blob storage (PR #652, @cbare)
  • Pin google-cloud-storage to >=1.31.1 in extras (PR #687, @PLPeeters)
  • Expose certain transport-specific methods e.g. to_boto3 in top layer (PR #664, @mpenkov)
  • Use pytest instead of parameterizedtestcase (PR #657, @mpenkov)
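Conceptually, the three modes of the compression parameter behave as follows. `compression="infer_from_extension"` picks a codec from the file suffix, `"disable"` skips decompression, and an explicit extension such as `".gz"` forces a codec regardless of the file name. The sketch below mimics that behaviour for local files using the standard library only. `open_with_compression` and `_CODECS` are hypothetical names, not smart_open internals, and zstandard is omitted because it is not in the standard library.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Hypothetical mapping from extension to a stdlib opener.
_CODECS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_with_compression(path, mode="rb", compression="infer_from_extension"):
    """Open a local file, choosing a codec the way the compression parameter describes."""
    if compression == "disable":
        return open(path, mode)
    if compression == "infer_from_extension":
        ext = os.path.splitext(path)[1].lower()
    else:
        ext = compression  # an explicit extension such as ".gz"
    opener = _CODECS.get(ext)
    if opener is None:
        if compression != "infer_from_extension":
            raise ValueError(f"unsupported compression: {compression!r}")
        return open(path, mode)  # unknown suffix: read the bytes as-is
    return opener(path, mode)


with tempfile.TemporaryDirectory() as tmp:
    no_suffix = os.path.join(tmp, "file")  # gzip data without a .gz suffix
    with open_with_compression(no_suffix, "wb", compression=".gz") as fout:
        fout.write(b"hello")
    with open_with_compression(no_suffix, "rb", compression=".gz") as fin:
        forced = fin.read()
    with open_with_compression(no_suffix, "rb") as fin:
        raw = fin.read()  # inferred: no suffix, so the raw gzip bytes come back

print(forced)                  # b'hello'
print(raw[:2] == b"\x1f\x8b")  # True: gzip magic number
```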