PyFunctional Versions

Python library for creating data pipelines with chain functional programming

v1.4.3

3 years ago

v1.0.0

7 years ago

Reaching 1.0 primarily means that API stability has been reached so I don't expect to run into many new breaking changes. The library is also relatively feature complete at this point almost two years after the first commit (February 5, 2015).

This release includes several new minor features and usability improvements in Jupyter notebook environments.

New Features

Dependencies and Supported Python Versions

v0.7.1

7 years ago

This is a hotfix release that separates the Python 2 and Python 3 wheels on PyPI. It is primarily motivated by the different installation requirements for each: the Python 2 version depends on several libraries that are backports of Python 3 libraries.

v0.7.0

7 years ago

New Features

Bug Fixes

Internal Changes

  • Pinned versions of all dependencies

Contributors

  • Thanks to versae for implementing most of the pseq feature!
  • Thanks to ChuyuHsu for implementing large parts of the compression feature!

v0.6.0

8 years ago

The largest changes in this release are the addition of SQLite support and the renaming of the project to PyFunctional.

Name Change

Details can be found in the RFC issue. On PyPI, 0.6.0 was published as both PyFunctional and ScalaFunctional to support the transition to the new name. Overall, the new name better suits the package, since it is about functional programming with Python, even if it is inspired by Scala/Spark.

New Features

  • Added support for reading from and writing to SQLite databases with seq.sqlite3
  • Added to_pandas call integration

Internal Changes

  • Changed code quality check service

v0.5.0

8 years ago

Release 0.5.0 groups a few new features and bug fixes into a single release.

Breaking Changes

  • Sequence.zip_with_index now behaves differently in order to extend usability and conform to the Scala/Spark APIs, which breaks compatibility with prior versions. The drop-in replacement for code bases upgrading to 0.5.0 is to change zip_with_index to enumerate.
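The difference between the two pairing conventions can be sketched in plain Python. This is only an illustration and does not call the library; it assumes that the new zip_with_index follows Scala's zipWithIndex, which pairs each element as (element, index), while enumerate keeps the (index, element) order of Python's built-in.

```python
# Illustrative only: plain Python, not the library's implementation.
letters = ["a", "b", "c"]

# Python's built-in enumerate pairs each item as (index, element).
enumerate_style = list(enumerate(letters))

# Scala's zipWithIndex pairs each item as (element, index). That this is
# the convention zip_with_index now follows is an assumption here.
scala_style = [(element, index) for index, element in enumerate(letters)]

print(enumerate_style)  # [(0, 'a'), (1, 'b'), (2, 'c')]
print(scala_style)      # [('a', 0), ('b', 1), ('c', 2)]
```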

New Features

  • Delimiter option on to_file
  • Sequence.sliding for sliding windows over sequence of elements
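As a rough picture of what a sliding window over a sequence means, here is a minimal plain-Python sketch. The function name and the size and step parameters are hypothetical and chosen for illustration; this version returns only full windows, and the library's handling of a trailing partial window may differ.

```python
def sliding(items, size, step=1):
    """Return consecutive windows of `size` elements, advancing by `step`.

    Hypothetical helper for illustration; only full windows are returned.
    """
    items = list(items)
    last_start = len(items) - size
    return [items[i:i + size] for i in range(0, last_start + 1, step)]


print(sliding([1, 2, 3, 4, 5], 2))     # [[1, 2], [2, 3], [3, 4], [4, 5]]
print(sliding([1, 2, 3, 4, 5], 3, 2))  # [[1, 2, 3], [3, 4, 5]]
```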

Internal Changes

  • Changed relative imports to absolute imports

Bug Fixes

  • _wrap incorrectly converted tuples to arrays
  • to_file documentation fixed
  • zip_with_index behavior, as described under Breaking Changes above

Changelog: https://github.com/EntilZha/ScalaFunctional/blob/master/CHANGELOG.md

Milestone: https://github.com/EntilZha/ScalaFunctional/milestones/0.5.0

v0.4.1

8 years ago

The primary goals of this release were to:

  1. Support reading and writing data from files in common formats
  2. Improve LINQ support

Reading and Writing text, json, jsonl, and csv

The large feature additions of this release include functions to natively read and write from text, json, jsonl, and csv files. Details on the issue can be found at #19. The examples on the README.md page illustrate how these can be used and their usefulness. A full list of changes can be found in CHANGELOG.md or the copy of it at the bottom of the release notes.
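For readers unfamiliar with the jsonl format (one JSON document per line), the following standard-library sketch shows the kind of round trip that a jsonl reader and writer perform. The helper names write_jsonl and read_jsonl are hypothetical and are not the library's functions.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write each record as one JSON document per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a jsonl file back into a list, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.jsonl")
    write_jsonl(path, records)
    round_tripped = read_jsonl(path)

print(round_tripped == records)  # True
```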

LINQ

While doing research, I found that a common use case where ScalaFunctional could be helpful is LINQ-like data manipulation. To better serve this group of users, functions like select and where were added, and the documentation was improved to cover this use case.

Breaking Changes

The bug detailed at #44 exposed that fold_left and fold_right were using the passed function incorrectly. This was corrected, but the fix is a breaking change relative to all prior versions.
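To make the argument-order question concrete, here is a plain-Python sketch of the Scala convention for folds, where fold_left calls the function as func(accumulator, element) and fold_right calls it as func(element, accumulator). That the corrected library behavior matches this convention is an assumption; the helpers below are stand-alone and do not use the library.

```python
from functools import reduce


def fold_left(items, zero, func):
    """Fold from the left, calling func(accumulator, element)."""
    return reduce(func, items, zero)


def fold_right(items, zero, func):
    """Fold from the right, calling func(element, accumulator)."""
    return reduce(lambda acc, x: func(x, acc), reversed(list(items)), zero)


left = fold_left(["a", "b", "c"], "", lambda acc, x: acc + x)
right = fold_right(["a", "b", "c"], "", lambda x, acc: acc + x)

print(left)   # abc
print(right)  # cba
```

With a non-commutative operation such as string concatenation, swapping the two arguments changes the result, which is why a fix of this kind breaks existing callers.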

0.4.1 enum34 Removed

In the release of 0.4.0, an issue was found where the wheel built with Python 2 contained enum34, which broke the Python 3 installation. If the wheel were built with Python 3 instead, it would not include enum34, causing problems with Python 2. The solution was to remove enum34 and use vanilla Python instead.

Changelog

Release 0.4.0

New Features

  • Official and tested support for python 3.5. Thus ScalaFunctional is tested on Python 2.7, 3.3, 3.4, 3.5, pypy, and pypy3
  • aggregate from LINQ
  • order_by from LINQ
  • where from LINQ
  • select from LINQ
  • average from LINQ
  • sum modified to allow LINQ projected sum
  • product modified to allow LINQ projected product
  • seq.jsonl to read jsonl files
  • seq.json to read json files
  • seq.open to read files
  • seq.csv to read csv files
  • seq.range to create range sequences
  • Sequence.to_jsonl to save jsonl files
  • Sequence.to_json to save json files
  • Sequence.to_file to save files
  • Sequence.to_csv to save csv files
  • Improved documentation with more examples and mention LINQ explicitly
  • Changed PyPI keywords to improve discoverability
  • Created Google groups mailing list

Bug Fixes

  • fold_left and fold_right had incorrect order of arguments for passed function

Release 0.4.1

Fix Python 3 build error due to wheel installation of enum34. The package no longer depends on enum34.

Contributors

Thank you to adrian17 for contributing seq.range to the release.

v0.4.0

8 years ago

Refer to the release notes for 0.4.1 for a summary of the changes in 0.4.0. The two versions are nearly identical, with 0.4.1 being a hotfix for a pip install issue on Python 3.

v0.3.1

8 years ago

This is a very minor release which adds distinct_by to the API. distinct_by takes a single identity function as argument. The returned sequence is unique by the identity function and consists of the first element found for each identity key. Code example below:

from functional import seq

seq([(1, 2), (1, 3), (2, 3), (4, 5), (0, 1), (0, 0)]).distinct_by(lambda x: x[0])
# [(0, 1), (1, 2), (2, 3), (4, 5)]
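The "first element found for each identity key" rule can be sketched in plain Python as follows. This is a stand-alone illustration, not the library's code; the result is sorted before printing only so that the output does not depend on iteration order.

```python
def distinct_by(items, key_func):
    """Keep the first item seen for each key (illustrative helper)."""
    first_seen = {}
    for item in items:
        key = key_func(item)
        if key not in first_seen:
            first_seen[key] = item
    return list(first_seen.values())


pairs = [(1, 2), (1, 3), (2, 3), (4, 5), (0, 1), (0, 0)]
result = distinct_by(pairs, lambda x: x[0])

print(sorted(result))  # [(0, 1), (1, 2), (2, 3), (4, 5)]
```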

v0.3.0

8 years ago

The primary goal of this release was to improve the performance of longer data pipelines. It also includes several API additions and a few minor breaking changes.

Performance Improvements

The largest under-the-hood change is that all operations are now lazy by default, whereas 0.2.0 calculates a new list at every transformation. Laziness was initially implemented using generators, but this could lead to unexpected behavior because a generator can only be consumed once. The problem with that approach is highlighted in #20. Code sample below:

from functional import seq
def gen():
    for e in range(5):
        yield e

nums = gen()
s = seq(nums)
s.map(lambda x: x * 2).sum()
# prints 20
s.map(lambda x: x * 2).sum()
# prints 0
s = seq([1, 2, 3, 4])
a = s.map(lambda x: x * 2)
a.sum()
# prints 20
a.sum()
# prints 0

Either ScalaFunctional would need to aggressively cache results, or a new approach was needed. That approach is called lineage. The basic concept is that ScalaFunctional:

  1. Tracks the most recent concrete data (e.g. a list of objects)
  2. Tracks the list of transformations that need to be applied to the list to find the answer
  3. Whenever an expression is evaluated, the result is cached for (1) and returned
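The three steps above can be sketched with a small stand-alone class. This is a simplified illustration of the idea under stated assumptions, not the library's implementation; the class name LazySeq and its attributes are hypothetical.

```python
class LazySeq:
    """Toy lineage tracker: concrete data plus pending transformations."""

    def __init__(self, data, pending=(), names=("sequence",)):
        self._data = list(data)        # (1) most recent concrete data
        self._pending = list(pending)  # (2) transformations still to apply
        self._names = list(names)      # human-readable lineage

    def map(self, func):
        # Record the transformation instead of applying it right away.
        step = lambda items: [func(x) for x in items]
        return LazySeq(self._data, self._pending + [step], self._names + ["map"])

    def evaluate(self):
        # (3) apply pending steps once, then cache the result as the new data.
        if self._pending:
            for step in self._pending:
                self._data = step(self._data)
            self._pending = []
            self._names.append("cache")
        return self._data

    def sum(self):
        return sum(self.evaluate())

    def lineage(self):
        return " -> ".join(self._names)


doubled = LazySeq(x for x in range(5)).map(lambda x: x * 2)
before = doubled.lineage()
first = doubled.sum()
second = doubled.sum()
after = doubled.lineage()

print(before)         # sequence -> map
print(first, second)  # 20 20
print(after)          # sequence -> map -> cache
```

Because the input is materialized into a list once and the evaluated result replaces it, repeated calls return the same answer even when the original input was a generator.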

As a result, the problems above are fixed. Below is an example showing how the backend calculates results:

from functional import seq

In [8]: s = seq(1, 2, 3, 4)

In [9]: s._lineage
Out[9]: Lineage: sequence

In [10]: s0 = s.map(lambda x: x * 2)

In [11]: s0._lineage
Out[11]: Lineage: sequence -> map(<lambda>)

In [12]: s0
Out[12]: [2, 4, 6, 8]

In [13]: s0._lineage
Out[13]: Lineage: sequence -> map(<lambda>) -> cache

Note how initially, since the expression is not evaluated, it is not cached. Since printing s0 in the REPL calls __repr__, it is evaluated and cached, so it is not recomputed if s0 is used again. You can also call cache() directly if desired. You may also notice that seq can now take a list of arguments like list (added in #27).

Next up

Next up are improvements in documentation and a redo of README.md. The next release will focus on extending ScalaFunctional to work with other data input/output and on more usability improvements. This release also marks relative stability in the collections API. Everything that seemed worth porting from Scala/Spark has been completed, with a few additions (predominantly left, right, inner, and outer joins). There aren't currently any foreseeable breaking changes.