Tuplex Versions

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.
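As a rough illustration of the Spark-like API described above, the sketch below pairs each input value with its double. The `Context`, `parallelize`, `map`, and `collect` calls follow the project's quickstart as commonly shown; the helper names `double_pairs` and `double_pairs_tuplex` are hypothetical and exist only for this example. The Tuplex import is kept inside the function so the plain-Python reference runs even where Tuplex is not installed.

```python
def double_pairs(values):
    """Plain-Python reference: pair each value with its double."""
    return [(x, x * 2) for x in values]


def double_pairs_tuplex(values):
    """Same pipeline expressed with Tuplex (assumes `pip install tuplex`)."""
    import tuplex  # imported lazily so this module loads without Tuplex

    c = tuplex.Context()
    # The lambda is compiled by Tuplex instead of being run by the interpreter.
    return c.parallelize(values).map(lambda x: (x, x * 2)).collect()


if __name__ == "__main__":
    print(double_pairs([1, 2, 3, 4]))
```

Where Tuplex is available, `double_pairs_tuplex([1, 2, 3, 4])` is expected to return the same pairs as the reference function.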

v0.3.6

4 months ago

[1] Features

  • Update to LLVM 16.
  • Support building Tuplex for newer LLVM versions, up to LLVM v17.
  • Remove typed pointers in LLVM IR code generation, use explicit LLVM types instead.
  • Enable Apple Silicon support for Python 3.9+.
  • Add script support for building arm64 wheels for Apple Silicon (scripts/build_macos_wheels.sh).
  • Add GitHub Actions to build and test wheels for Intel Mac and Ubuntu 22.04.
  • Add fallback in the generated CSV parser for platforms without SSE4.2 instructions.
  • Add support for Python 3.11.
  • Use GitHub Actions to build wheels for Python 3.8, 3.9, 3.10, and 3.11 under both manylinux and macOS.

[2] Bug Fixes

  • Explicitly shut down the AWS SDK (when Tuplex is built with it) to avoid a race condition among AWS threads in destructors.
  • Fix ResolveTask issue when parsing single column CSV file with heterogeneous data.
  • Refactor decoding of exceptions in ResolveTask into function.
  • Fix Protobuf issue when using -DBUILD_WITH_ORC: ORC libraries may have been built with a different Protobuf version than Tuplex, causing issues when delocating the wheel under macOS (Intel/ARM).
  • Fix exception handling in individual tasks, which did not unlock partitions on error, leading to a crash.
  • Fix toPythonString() conversion in Row.
  • Fix serialization bug for lists and tuples used together with options.
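The partition-unlocking fix above concerns a general pattern: a lock taken by a task must be released on the error path too. The sketch below is not Tuplex's code (the actual fix lives in its C++ runtime); `Partition` and `run_task` are hypothetical names used only to show the idea with a `try`/`finally` block.

```python
import threading


class Partition:
    """Hypothetical stand-in for a lockable data partition."""

    def __init__(self, rows):
        self.rows = rows
        self._lock = threading.Lock()

    def lock(self):
        self._lock.acquire()

    def unlock(self):
        self._lock.release()

    def is_locked(self):
        return self._lock.locked()


def run_task(partition, fn):
    """Apply fn to every row; the partition is unlocked even if fn raises."""
    partition.lock()
    try:
        return [fn(row) for row in partition.rows]
    finally:
        # Runs on success and on error, so a failing task cannot leave
        # the partition locked for the tasks that come after it.
        partition.unlock()
```

If `fn` raises for some row, the exception still propagates to the caller, but the partition is left unlocked and can be used by a later task.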

[3] Improvements

  • Explicitly mark export symbols in runtime shared object.
  • Change test script for exceptions to use pytest.mark.parametrize.
  • Add dwarf/elf library for backward under macOS to print stack traces.
  • Change to batch processing for result-set conversion for any strategy and explicitly check signals to avoid fine-grained locking of the GIL.
  • Lower parameters/test data size for exception (merge) testing for faster CI.
  • Change to pick up compatible zstd version in cmake to avoid conflicts with pre-installed, older zstd library.
  • Add cache step for macOS in GitHub Actions to speed up brew installs.
  • Fix issue in setup.py when Python executable is not in /usr/local.
  • Remove Ubuntu 18.04 support, update Azure CI runner to use Ubuntu 22.04.
  • Deprecate Python 3.6 and 3.7 support.
  • Use newer ANTLR4 version 4.13.1 and require Java 11 as dependency.
  • Support setting Python version when creating Lambda runner in scripts/create_lambda_zip.sh.
  • Update dependencies in tuplex/ci containers, build Python version specific containers.
  • Refactor code generation logic for iterators.
  • Update to recent pybind11 version 2.11.1 from 2.10.4.