PyFunctional Versions

Python library for creating data pipelines with chain functional programming

v1.4.3

3 years ago

v1.0.0

7 years ago

Reaching 1.0 primarily means that the API has stabilized, so I don't expect to run into many new breaking changes. The library is also relatively feature complete at this point, almost two years after the first commit (February 5, 2015).

This release includes several minor new features and usability improvements for Jupyter notebook environments.

New Features

Added optional initial value for reduce (https://github.com/EntilZha/PyFunctional/issues/86)
Added table of contents to readme (https://github.com/EntilZha/PyFunctional/issues/88)
Added data interchange tutorial with pandas (https://github.com/EntilZha/PyFunctional/blob/master/examples/PyFunctional-pandas-tutorial.ipynb)
Implemented itertools.starmap as Sequence.starmap and Sequence.smap (https://github.com/EntilZha/PyFunctional/issues/90)
Added interface to csv.DictReader via seq.csv_dict_reader (https://github.com/EntilZha/PyFunctional/issues/92)
Improved _repr_html_, show and tabulate by auto-detecting named tuple fields as column names (https://github.com/EntilZha/PyFunctional/issues/91)
Improved _repr_html_ and show to indicate that only 10 of N rows are shown when there are more than 10 rows (https://github.com/EntilZha/PyFunctional/issues/94)
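Two of these features mirror standard-library behavior. As a rough sketch of the underlying semantics, using only functools.reduce and itertools.starmap rather than PyFunctional itself, an initial value seeds the reduction (and makes reducing an empty input well defined), while starmap unpacks each tuple into the function's positional arguments:

```python
from functools import reduce
from itertools import starmap

# reduce with an initial value: the seed 100 is combined with the first element
total = reduce(lambda acc, x: acc + x, [1, 2, 3], 100)

# without a seed, reducing an empty list raises TypeError; with one it returns the seed
empty_total = reduce(lambda acc, x: acc + x, [], 0)

# starmap calls f(1, 2), f(3, 4), f(5, 6) by unpacking each tuple
pairs = [(1, 2), (3, 4), (5, 6)]
products = list(starmap(lambda a, b: a * b, pairs))

print(total, empty_total, products)  # 106 0 [2, 12, 30]
```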

Dependencies and Supported Python Versions

Bumped version dependencies (https://github.com/EntilZha/PyFunctional/issues/89)
Added Python 3.6 to Travis CI testing

v0.7.1

7 years ago

This is a hotfix release that separates the Python 2 and Python 3 wheels on PyPI, primarily because each has different installation requirements: the Python 2 version depends on several libraries that are backports of Python 3 libraries.

v0.7.0

7 years ago

New Features

Auto parallelization by using pseq instead of seq. Details at https://github.com/EntilZha/PyFunctional/issues/47
Parallel functions: map, select, filter, filter_not, where, flatten, and flat_map
Compressed file IO support for gzip/lzma/bz2 as detailed at https://github.com/EntilZha/PyFunctional/issues/54
Cartesian product from itertools.product implemented as Pipeline.cartesian
Website at pyfunctional.org and docs at docs.pyfunctional.org
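The cartesian and compressed IO features correspond to standard-library building blocks: itertools.product (named above) and, presumably, Python's gzip/lzma/bz2 modules. The following stdlib-only sketch, which is not the PyFunctional API, shows a Cartesian product and a gzip text round trip; lzma.open and bz2.open accept the same text-mode arguments.

```python
import gzip
import os
import tempfile
from itertools import product

# Cartesian product: every combination of one element from each input
pairs = list(product([1, 2], ["a", "b"]))

# Compressed text IO: write lines through gzip, then read them back
path = os.path.join(tempfile.mkdtemp(), "example.txt.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")
with gzip.open(path, "rt", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

print(pairs)  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
print(lines)  # ['first line', 'second line']
```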

Bug Fixes

No option for encoding in to_json https://github.com/EntilZha/PyFunctional/issues/70

Internal Changes

Pinned versions of all dependencies

Contributors

Thanks to versae for implementing most of the pseq feature!
Thanks to ChuyuHsu for implementing large parts of the compression feature!

v0.6.0

8 years ago

The largest changes in this release are SQLite support and renaming the project to PyFunctional.

Name Change

Details can be found in the RFC issue. On PyPI, 0.6.0 was published as both PyFunctional and ScalaFunctional to ease the transition to the new name. Overall, the new name better suits the package: it is about functional programming in Python, even though it is inspired by Scala/Spark.

New Features

Added support for reading from and writing to SQLite databases with seq.sqlite3
Added pandas integration via to_pandas
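Reading from SQLite ultimately amounts to running a query and iterating over the resulting rows. The sketch below uses only the standard sqlite3 module with a made-up users table; it does not reproduce the seq.sqlite3 signature, it just shows the rows a pipeline would start from.

```python
import sqlite3

# In-memory database with a small, hypothetical example table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Tom"), (2, "Jack")])
conn.commit()

# Each row comes back as a tuple, ready to be transformed further
rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
names = [name.upper() for _, name in rows]
conn.close()

print(rows)   # [(1, 'Tom'), (2, 'Jack')]
print(names)  # ['TOM', 'JACK']
```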

Internal Changes

Changed code quality check service

v0.5.0

8 years ago

Release 0.5.0 groups a few new features and bug fixes into one release.

Breaking Changes

Sequence.zip_with_index has changed behavior, to extend its usability and conform to the Scala/Spark APIs; this breaks compatibility with prior versions. The drop-in fix for code bases upgrading to 0.5.0 is to replace zip_with_index with enumerate.
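The difference comes down to tuple order. Assuming the Scala/Spark convention, where zipWithIndex yields (element, index) pairs while Python's enumerate yields (index, element), the two behaviors can be sketched with hypothetical helpers like these:

```python
def enumerate_pairs(items, start=0):
    # (index, element) pairs, like Python's built-in enumerate
    return list(enumerate(items, start))


def zip_with_index_pairs(items, start=0):
    # (element, index) pairs, the Scala/Spark zipWithIndex ordering
    return [(item, i) for i, item in enumerate(items, start)]


letters = ["a", "b", "c"]
print(enumerate_pairs(letters))       # [(0, 'a'), (1, 'b'), (2, 'c')]
print(zip_with_index_pairs(letters))  # [('a', 0), ('b', 1), ('c', 2)]
```

Code that relied on the old (index, element) order keeps working by switching to enumerate.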

New Features

Delimiter option on to_file
Sequence.sliding for sliding windows over a sequence of elements
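A sliding window yields consecutive, possibly overlapping, chunks of fixed size. The function below is a hypothetical standalone sketch that returns only full windows; PyFunctional's own Sequence.sliding may treat a trailing partial window differently, especially when the step is larger than 1.

```python
def sliding(items, size, step=1):
    """Return consecutive windows of `size` elements, advancing by `step`.

    Only full windows are returned; a trailing partial window is dropped.
    """
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items) - size + 1, step)]


print(sliding([1, 2, 3, 4, 5], 2))     # [[1, 2], [2, 3], [3, 4], [4, 5]]
print(sliding([1, 2, 3, 4, 5], 3, 2))  # [[1, 2, 3], [3, 4, 5]]
```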

Internal Changes

Changed relative imports to absolute imports

Bug Fixes

_wrap incorrectly converted tuples to arrays
to_file documentation fixed
zip_with_index, as described under Breaking Changes

Changelog: https://github.com/EntilZha/ScalaFunctional/blob/master/CHANGELOG.md
Milestone: https://github.com/EntilZha/ScalaFunctional/milestones/0.5.0

v0.4.1

8 years ago

The primary goals of this release were to:

Support reading and writing data from files in common formats
Improve LINQ support

Reading and Writing text, json, jsonl, and csv

The largest feature additions of this release are functions to natively read and write text, json, jsonl, and csv files. Details can be found in issue #19. The examples in README.md illustrate how to use these functions. A full list of changes can be found in CHANGELOG.md or in the copy of it at the bottom of these release notes.
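Of these formats, jsonl (JSON Lines) is the least familiar: each line holds one complete JSON document. The stdlib-only round trip below illustrates the format that seq.jsonl and Sequence.to_jsonl read and write; it does not use PyFunctional, and the file name and records are made up.

```python
import json
import os
import tempfile

records = [{"name": "a", "value": 1}, {"name": "b", "value": 2}]
path = os.path.join(tempfile.mkdtemp(), "records.jsonl")

# Writing: serialize each record as one JSON document per line
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Reading: parse each non-empty line independently
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(loaded == records)  # True
```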

LINQ

While doing research, I found that a common use case where ScalaFunctional could help is LINQ-like data manipulation. To better serve these users, functions like select and where were added, and the documentation was improved to cover this use case.
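In LINQ, Select and Where play the roles of map and filter. The sketch below assumes select and where behave as aliases of map and filter; Seq is a hypothetical minimal stand-in written for illustration, not ScalaFunctional's actual class.

```python
class Seq:
    """Hypothetical minimal chainable wrapper, for illustration only."""

    def __init__(self, items):
        self._items = list(items)

    def map(self, func):
        return Seq(func(x) for x in self._items)

    def filter(self, func):
        return Seq(x for x in self._items if func(x))

    # LINQ-style names for the same operations
    select = map
    where = filter

    def to_list(self):
        return list(self._items)


result = Seq(range(10)).where(lambda x: x % 2 == 0).select(lambda x: x * x).to_list()
print(result)  # [0, 4, 16, 36, 64]
```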

Breaking Changes

The bug detailed in #44 exposed that fold_left and fold_right were calling the passed function with its arguments in the wrong order. This was corrected, but it is a breaking change relative to all prior versions.
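The argument order matters whenever the combining function is not symmetric. The sketch below assumes the Scala convention: fold_left calls func(accumulator, element) moving left to right, and fold_right calls func(element, accumulator) moving right to left. These helpers are hypothetical and take the sequence as an explicit argument.

```python
from functools import reduce


def fold_left(items, zero, func):
    # func(accumulator, element), applied left to right
    return reduce(func, items, zero)


def fold_right(items, zero, func):
    # func(element, accumulator), applied right to left
    acc = zero
    for item in reversed(list(items)):
        acc = func(item, acc)
    return acc


print(fold_left([1, 2, 3], [], lambda acc, x: acc + [x]))   # [1, 2, 3]
print(fold_right([1, 2, 3], [], lambda x, acc: acc + [x]))  # [3, 2, 1]
```

With subtraction, fold_left gives ((0 - 1) - 2) - 3 = -6 while fold_right gives 1 - (2 - (3 - 0)) = 2, so swapping the arguments silently changes results.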

0.4.1 enum34 Removed

In the 0.4.0 release, an issue was found where the wheel built with Python 2 included enum34, which broke installation on Python 3. If the wheel were built with Python 3 instead, it would not include enum34, causing problems on Python 2. The solution was to remove enum34 and use plain Python instead.

Changelog

Release 0.4.0

New Features

Official and tested support for Python 3.5; ScalaFunctional is now tested on Python 2.7, 3.3, 3.4, 3.5, pypy, and pypy3
aggregate from LINQ
order_by from LINQ
where from LINQ
select from LINQ
average from LINQ
sum modified to allow LINQ projected sum
product modified to allow LINQ projected product
seq.jsonl to read jsonl files
seq.json to read json files
seq.open to read files
seq.csv to read csv files
seq.range to create range sequences
Sequence.to_jsonl to save jsonl files
Sequence.to_json to save json files
Sequence.to_file to save files
Sequence.to_csv to save csv files
Improved documentation with more examples and mention LINQ explicitly
Changed PyPI keywords to improve discoverability
Created a Google Groups mailing list
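A "projected" sum or average applies a function to each element before combining, the way LINQ's Sum(selector) does. The plain-Python sketch below illustrates projection, ordering, and aggregation on made-up records; projected_sum and projected_average are hypothetical helpers, not ScalaFunctional's signatures.

```python
from functools import reduce

# Hypothetical records; the field names are made up for illustration
orders = [
    {"item": "pen", "qty": 3, "price": 1.5},
    {"item": "book", "qty": 1, "price": 12.0},
    {"item": "cup", "qty": 2, "price": 4.0},
]


def projected_sum(items, projection=None):
    # Sum projection(x) when a projection is given, otherwise x itself
    return sum(projection(x) if projection else x for x in items)


def projected_average(items, projection=None):
    values = [projection(x) if projection else x for x in items]
    return sum(values) / len(values)


total_qty = projected_sum(orders, lambda o: o["qty"])
avg_price = projected_average(orders, lambda o: o["price"])

# order_by: sort by a key function
by_price = [o["item"] for o in sorted(orders, key=lambda o: o["price"])]

# aggregate: a seeded fold, similar to LINQ's Aggregate
revenue = reduce(lambda acc, o: acc + o["qty"] * o["price"], orders, 0.0)

print(total_qty, by_price, revenue)  # 6 ['pen', 'cup', 'book'] 24.5
```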

Bug Fixes

fold_left and fold_right had incorrect order of arguments for passed function

Release 0.4.1

Fixed a Python 3 build error caused by the wheel installing enum34. The package no longer depends on enum34.

Contributors

Thank you to adrian17 for contributing seq.range to the release.

v0.4.0

8 years ago

Refer to the release notes for 0.4.1 for a summary of the changes in 0.4.0. The two versions are nearly identical; 0.4.1 is a hotfix for a pip install issue on Python 3.

v0.3.1

8 years ago

This is a very minor release that adds distinct_by to the API. distinct_by takes a single key function as its argument. The returned sequence contains one element per distinct key: the first element found with that key. Code example below:

from functional import seq

seq([(1, 2), (1, 3), (2, 3), (4, 5), (0, 1), (0, 0)]).distinct_by(lambda x: x[0])
# [(0, 1), (1, 2), (2, 3), (4, 5)]
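The rule "keep the first element seen for each key" can be sketched in plain Python as follows. This hypothetical helper preserves first-seen input order, so it returns the same elements as the example above, though not necessarily in the same order as the library's output.

```python
def distinct_by(items, key):
    # Keep the first element seen for each key, in first-seen order
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


pairs = [(1, 2), (1, 3), (2, 3), (4, 5), (0, 1), (0, 0)]
print(distinct_by(pairs, lambda x: x[0]))  # [(1, 2), (2, 3), (4, 5), (0, 1)]
```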

v0.3.0

8 years ago

The primary goal of this release was to improve the performance of longer data pipelines. This release also adds to the API and includes several minor breaking changes.

Performance Improvements

The largest under-the-hood change is that all operations are now lazy by default, whereas 0.2.0 computed a new list at every transformation. Laziness was initially implemented using generators, but this could lead to unexpected behavior. The problem with this approach is highlighted in #20. Code sample below:

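from functional import seq

def gen():
    for e in range(5):
        yield e

# Behavior under a purely generator-based lazy implementation:
nums = gen()
s = seq(nums)
s.map(lambda x: x * 2).sum()
# returns 20
s.map(lambda x: x * 2).sum()
# returns 0, because the first sum exhausted the generator
s = seq([1, 2, 3, 4])
a = s.map(lambda x: x * 2)
a.sum()
# returns 20
a.sum()
# returns 0, because the lazy map was already consumed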
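The root cause is ordinary Python generator semantics: a generator can be iterated only once. This stdlib-only snippet reproduces the behavior without ScalaFunctional, and shows that materializing the data first makes repeated evaluation safe.

```python
def gen():
    for e in range(5):
        yield e


nums = gen()
first = sum(x * 2 for x in nums)   # iterates the generator fully: 20
second = sum(x * 2 for x in nums)  # generator already exhausted: 0

# Materializing the data once makes repeated evaluation safe
data = list(gen())
third = sum(x * 2 for x in data)   # 20
fourth = sum(x * 2 for x in data)  # still 20

print(first, second, third, fourth)  # 20 0 20 20
```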

Either ScalaFunctional would need to aggressively cache results, or a new approach was needed. That approach is called lineage. The basic concept is that ScalaFunctional:

Tracks the most recent concrete data (e.g. a list of objects)
Tracks the list of transformations that need to be applied to that data to compute the answer
Whenever an expression is evaluated, caches the result as the new concrete data and returns it

As a result, the problems above are fixed. Below is an example showing how the backend calculates results:

from functional import seq

In [8]: s = seq(1, 2, 3, 4)

In [9]: s._lineage
Out[9]: Lineage: sequence

In [10]: s0 = s.map(lambda x: x * 2)

In [11]: s0._lineage
Out[11]: Lineage: sequence -> map(<lambda>)

In [12]: s0
Out[12]: [2, 4, 6, 8]

In [13]: s0._lineage
Out[13]: Lineage: sequence -> map(<lambda>) -> cache

Note that initially, since the expression has not been evaluated, it is not cached. Displaying s0 in the REPL calls __repr__, which evaluates and caches it, so it is not recomputed if s0 is used again. You can also call cache() directly if desired. You may also notice that seq can now take its elements directly as arguments, as in seq(1, 2, 3, 4), rather than only as a single list (added in #27).
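The lineage idea can be sketched in a few dozen lines. LineageSeq below is a hypothetical, simplified illustration written for these notes, not ScalaFunctional's implementation: it stores the most recent concrete data plus a list of pending transformations, and evaluating (via cache(), to_list() or __repr__) applies them once and stores the result as the new concrete data.

```python
class LineageSeq:
    """Hypothetical sketch of lineage-based lazy evaluation with caching."""

    def __init__(self, data, transformations=()):
        self._data = list(data)                        # most recent concrete data
        self._transformations = list(transformations)  # pending, not yet applied

    def map(self, func):
        # Record the step and return a new sequence; nothing runs yet
        step = lambda xs: [func(x) for x in xs]
        return LineageSeq(self._data, self._transformations + [step])

    def filter(self, func):
        step = lambda xs: [x for x in xs if func(x)]
        return LineageSeq(self._data, self._transformations + [step])

    def cache(self):
        # Apply pending steps once, then keep the result as the concrete data
        for step in self._transformations:
            self._data = step(self._data)
        self._transformations = []
        return self

    def to_list(self):
        return list(self.cache()._data)

    def __repr__(self):
        return repr(self.to_list())


s = LineageSeq([1, 2, 3, 4])
s0 = s.map(lambda x: x * 2)
pending_before = len(s0._transformations)  # 1: map recorded but not run
print(s0)                                  # [2, 4, 6, 8]; repr evaluates and caches
pending_after = len(s0._transformations)   # 0: result is now concrete data
```

Because evaluated results are stored as plain lists, summing s0 twice gives the same answer both times, unlike the generator-based approach.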

Next up

Documentation improvements and a rewrite of README.md. The next release will focus on extending ScalaFunctional to work with other data input/output formats, along with more usability improvements. This release also marks relative stability in the collections API: everything that seemed worth porting from Scala/Spark has been completed, with a few additions (predominantly left, right, inner, and outer joins). There are currently no foreseeable breaking changes.