Parquet.jl Versions

Julia implementation of Parquet columnar file format reader

v0.8.4

1 year ago

Parquet v0.8.4

Diff since v0.8.3

Closed issues:

  • Reading Int64 columns throws Inexact Error (#149)
  • Failing to precompile Snappy.jl..... (#153)
  • KeyError when writing DataFrame to Parquet (#157)
  • Not thread safe? (#160)
  • Support Snappy@0.4 (#164)

Merged pull requests:

  • Update writer.jl usage of Tables.Columns (#158) (@quinnj)
  • Add compat for Snappy v0.4 (#166) (@giordano)
  • Move CI to GitHub Actions (#167) (@giordano)

v0.8.3

2 years ago

Parquet v0.8.3

Diff since v0.8.2

Closed issues:

  • update to support Missings.jl version 1.0 (#147)

Merged pull requests:

  • updated Missings.jl to v1 (#148) (@xiaodaigh)
  • Update to 0.8.3 release to support latest categorical arrays. (#151) (@xiaodaigh)

v0.8.2

3 years ago

Parquet v0.8.2

Diff since v0.8.1

Merged pull requests:

  • Support CategoricalArrays 0.10 and bump version to 0.8.2 (#146) (@nalimilan)

v0.8.1

3 years ago

Parquet v0.8.1

Diff since v0.8.0

Closed issues:

  • Can't read a specific file (#52)
  • Integration with Tables.jl (#70)
  • Mention Tables.jl and possibly DataFrames in the docs (#98)
  • add a high-level, easy to use read_parquet (#100)
  • Add Finaliser to Parquet.File (#119)
  • Write is very slow due to call to Base.summarysize (#126)
  • How to read Parquet files from a structured folder? (#139)

Merged pull requests:

  • ENH: allow writer to write to arbitrary IO (#127) (@sglyon)
  • Attempt to improve efficiency of read_parquet by using ChainedVector … (#128) (@quinnj)
  • use Table methods on read_parquet result (#130) (@tanmaykm)
  • read_parquet improvements (#134) (@tanmaykm)
  • avoid warning when mmap mode is set (#135) (@tanmaykm)
  • finalizer for Parquet.File to close file handle (#136) (@tanmaykm)
  • support reading metadata only files (#137) (@tanmaykm)
  • support for reading partitioned datasets (#138) (@tanmaykm)
  • Extrapolate size of sample for writing. (#140) (@altre)
  • fix reading dataset without metadata file (#141) (@tanmaykm)
  • fill missing partitioned col from partition path (#142) (@tanmaykm)
  • bump version (#143) (@tanmaykm)

v0.8.0

3 years ago

Parquet v0.8.0

Diff since v0.7.1

Closed issues:

  • Out of range error (#120)

Merged pull requests:

  • Simple reader (#116) (@xiaodaigh)
  • handle trailing bytes in bitpacked run (#123) (@tanmaykm)
  • bump version for tagging (#124) (@tanmaykm)

v0.7.1

3 years ago

Parquet v0.7.1

Diff since v0.7.0

Closed issues:

  • overloading of Base.eltype (#107)
  • "UndefVarError: File not defined" when trying to use Parquet.File (#109)
  • Inconsistent Results when Querying Rows (#111)
  • update to Thrift 0.8 (#113)
  • Update to MemPool v0.3 (#114)

Merged pull requests:

  • [Tiny] Fix ColCursor length bug (#110) (@sa-)
  • updated CategoricalArrays to 0.9 (#115) (@xiaodaigh)
  • correct Base.eltype overloading (#117) (@tanmaykm)
  • fix loading parquet file with a start offset (#118) (@tanmaykm)

v0.7.0

3 years ago

Parquet v0.7.0

Diff since v0.6.1

Closed issues:

  • Write parquet file (#1)
  • String columns read back as Vector{UInt8} (#94)
  • Why ParFile instead of ParquetFile? (#99)
  • Corrupt reads with large data files (#105)

Merged pull requests:

  • reader improvements (#97) (@tanmaykm)
  • read decimals encoded as fixed width byte arrays (#101) (@tanmaykm)
  • update to use Thrift v0.8 (#102) (@tanmaykm)
  • improve tests and coverage (#103) (@tanmaykm)
  • support some more logical type mappings (#104) (@tanmaykm)
  • rename ParFile to Parquet.File (#106) (@tanmaykm)

v0.6.1

3 years ago

Parquet v0.6.1

Diff since v0.6.0

Merged pull requests:

  • fix setrow throw BoundsError (#95) (@ssikdar1)
  • Accept any Tables.jl-compatible source to write_parquet (#96) (@quinnj)

v0.6.0

3 years ago

Parquet v0.6.0

Diff since v0.5.2

Closed issues:

  • Bug with BatchedColumnCursor reader (#89)

Merged pull requests:

  • Parquet Writer (#66) (@xiaodaigh)
  • Added a batched columns iterator (#83) (@tanmaykm)
  • fix order of definition and repetition levels (#86) (@tanmaykm)
  • read files with nested arrays (#87) (@tanmaykm)
  • fix BatchedColumnsIterator missing value handling (#91) (@tanmaykm)
  • bump version to v0.6 (#92) (@tanmaykm)
  • Update Parquet.jl (#93) (@xiaodaigh)

v0.5.2

3 years ago

Parquet v0.5.2

Diff since v0.5.1

Closed issues:

  • values(....)[1] is outputting too many values for column with missing? (#81)
  • string columns with missing are read back as Int32 (#82)

Merged pull requests:

  • fix dict page mapping for missing values (#84) (@tanmaykm)