Ibis Versions

the portable Python dataframe library

4.1.0 (2023-01-25)

Features

  • add ibis.get_backend function (2d27df8)
  • add py.typed to allow mypy to type check packages that use ibis (765d42e)
  • api: add ibis.set_backend function (e7fabaf)
  • api: add selectors for easier selection of columns (306bc88)
  • bigquery: add JS UDF support (e74328b)
  • bigquery: add SQL UDF support (db24173)
  • bigquery: add to_pyarrow method (30157c5)
  • bigquery: implement bitwise operations (55b69b1)
  • bigquery: implement ops.Typeof (b219919)
  • bigquery: implement ops.ZeroIfNull (f4c5607)
  • bigquery: implement struct literal (c5f2a1d)
  • clickhouse: properly support native boolean types (31cc7ba)
  • common: add support for annotating with coercible types (ae4a415)
  • common: make frozendict truly immutable (1c25213)
  • common: support annotations with typing.Literal (6f89f0b)
  • common: support generic mapping and sequence type annotations (ddc6603)
  • dask: support connect() with no arguments (67eed42)
  • datatype: add optional timestamp scale parameter (a38115a)
  • datatypes: add as_struct method to convert schemas to structs (64be7b1)
  • duckdb: add read_json function for consuming newline-delimited JSON files (65e65c1)
  • mssql: add a bunch of missing types (c698d35)
  • mssql: implement inference for DATETIME2 and DATETIMEOFFSET (aa9f151)
  • nicer repr for Backend.tables (0d319ca)
  • pandas: support connect() with no arguments (78cbbdd)
  • polars: allow ibis.polars.connect() to function without any arguments (d653a07)
  • polars: handle casting to scaled timestamps (099d1ec)
  • postgres: add Map(string, string) support via the built-in HSTORE extension (f968f8f)
  • pyarrow: support conversion to pyarrow map and struct types (54a4557)
  • snowflake: add more array operations (8d8bb70)
  • snowflake: add more map operations (7ae6e25)
  • snowflake: any/all/notany/notall reductions (ba1af5e)
  • snowflake: bitwise reductions (5aba997)
  • snowflake: date from ymd (035f856)
  • snowflake: fix array slicing (bd7af2a)
  • snowflake: implement ArrayCollect (c425f68)
  • snowflake: implement NthValue (0dca57c)
  • snowflake: implement ops.Arbitrary (45f4f05)
  • snowflake: implement ops.StructColumn (41698ed)
  • snowflake: implement StringSplit (e6acc09)
  • snowflake: implement StructField and struct literals (286a5c3)
  • snowflake: implement TimestampFromUNIX (314637d)
  • snowflake: implement TimestampFromYMDHMS (1eba8be)
  • snowflake: implement typeof operation (029499c)
  • snowflake: implement exists/not exists (7c8363b)
  • snowflake: implement extract millisecond (3292e91)
  • snowflake: make literal maps and params work (dd759d3)
  • snowflake: regex extract, search and replace (9c82179)
  • snowflake: string to timestamp (095ded6)
  • sqlite: implement _get_schema_using_query in SQLite backend (7ff84c8)
  • trino: compile timestamp types with scale (67683d3)
  • trino: enable ops.ExistsSubquery and ops.NotExistsSubquery (9b9b315)
  • trino: map parameters (53bd910)
  • ux: improve error message when column is not found (b527506)

Bug Fixes

  • backend: read the default backend setting in _default_backend (11252af)
  • bigquery: move connection logic to do_connect (42f2106)
  • bigquery: remove invalid operations from registry (911a080)
  • bigquery: resolve deprecation warnings for StructType and Schema (c9e7078)
  • clickhouse: fix position call (702de5d)
  • correctly visualize array type (26b0b3f)
  • deps: make sure pyarrow is not an implicit dependency (10373f4)
  • duckdb: make read_csv on URLs work (9e61816)
  • duckdb: only try to load extensions when necessary for csv (c77bde7)
  • duckdb: remove invalid operations from registry (ba2ec59)
  • fallback to default backend with to_pyarrow/to_pyarrow_batches (a1a6902)
  • impala: remove broken alias elision (32b120f)
  • ir: error for order_by on nonexistent column (57b1dd8)
  • ir: ops.Where output shape should consider all arguments (6f87064)
  • mssql: infer bit as boolean everywhere (24f9d7c)
  • mssql: pull nullability from column information (490f8b4)
  • mysql: fix mysql query schema inference (12f6438)
  • polars: remove non-working Binary and Decimal literal inference (0482d15)
  • postgres: use permanent views to avoid connection pool defeat (49a4991)
  • pyspark: fix substring constant translation (40d2072)
  • set ops: raise if no tables passed to set operations (bf4bdde)
  • snowflake: bring back bitwise operations (260facd)
  • snowflake: don't always insert a cast (ee8817b)
  • snowflake: implement working TimestampNow (42d95b0)
  • snowflake: make sqlalchemy 2.0 compatible (8071255)
  • snowflake: re-enable ops.TableArrayView (a1ad2b7)
  • snowflake: remove invalid operations from registry (2831559)
  • sql: add typeof test and bring back implementations (7dc5356)
  • sqlalchemy: 2.0 compatibility (837a736)
  • sqlalchemy: fix view creation with select stmts that have bind parameters (d760e69)
  • sqlalchemy: handle correlated exists sanely (efa42bd)
  • sqlalchemy: handle generic geography/geometry by name instead of geotype (23c35e1)
  • sqlalchemy: use exec_driver_sql in view teardown (2599c9b)
  • sqlalchemy: use the backend's compiler instead of AlchemyCompiler (9f4ff54)
  • sql: fix broken call to ibis.map (045edc7)
  • sqlite: interpolate pathlib.Path correctly in attach (0415bd3)
  • trino: ensure connecting works with trino 0.321 (07cee38)
  • trino: remove invalid operations from registry (665265c)
  • ux: remove extra trailing newline in expression repr (ee6d58a)

Documentation

  • add BigQuery backend docs (09d8995)
  • add streamlit app for showing the backend operation matrix (3228f64)
  • allow deselecting geospatial ops in backend support matrix (012da8c)
  • api: document more public expression APIs (337018f)
  • backend-info: prevent app from trying to install duckdb extensions (3d94082)
  • clean up gen_matrix.py after adding streamlit app (deb80f2)
  • duckdb: add to_pyarrow_batches documentation (ec1ffce)
  • embed streamlit operation matrix app to docs (469a50d)
  • make firefox render the proper iframe height (ff1d4dc)
  • publish raw data for operation matrix (62e68da)
  • re-order when to download test data (8ce8c16)
  • release: update breaking changes in the release notes for 4.0.0 (4e91401)
  • remove trailing parenthesis (4294397)
  • update ibis-version-4.0.0-release.md (f6701df)
  • update links to contributing guides (da615e4)

Refactors

  • bigquery: explicitly disallow INT64 in JS UDF (fb33bf9)
  • datatype: add custom sqlalchemy nested types for backend differentiation (dec70f5)
  • datatype: introduce to_sqla_type dispatching on dialect (a8bbc00)
  • datatypes: remove Geography and Geometry types in favor of GeoSpatial (d44978c)
  • datatype: use a mapping to store StructType fields rather than names and types tuples (ff34c7b)
  • dtypes: expose nbytes property for integer and floating point datatypes (ccf80fd)
  • duckdb: remove .raw_sql call (abc939e)
  • duckdb: use sqlalchemy-views to reduce string hacking (c162750)
  • ir: remove UnnamedMarker (dd352b1)
  • postgres: use a bindparam for metadata queries (b6b4669)
  • remove empty unused file (9d63fd6)
  • schema: use a mapping to store Schema fields rather than names and types tuples (318179a)
  • simplify _find_backend implementation (60f1a1b)
  • snowflake: remove unnecessary parse_json call in ops.StructField impl (9e80231)
  • snowflake: remove unnecessary casting (271554c)
  • snowflake: use unary instead of fixed_arity(..., 1) (4a1c7c9)
  • sqlalchemy: clean up quoting implementation (506ce01)
  • sqlalchemy: generalize handling of failed type inference (b0f4e4c)
  • sqlalchemy: move _get_schema_using_query to base class (296cd7d)
  • sqlalchemy: remove the need for deferred columns (e4011aa)
  • sqlalchemy: remove use of deprecated isnot (4ec53a4)
  • sqlalchemy: use exec_driver_sql everywhere (e8f96b6)
  • sql: finally remove _CorrelatedRefCheck (f49e429)

Deprecations

  • api: deprecate .to_projection in favor of .as_table (7706a86)
  • api: deprecate get_column/s in favor of __getitem__/__getattr__ syntax (e6372e2)
  • ir: schedule DatabaseTable.change_name for removal (e4bae26)
  • schema: schedule Schema.delete() and Schema.append() for removal (45ac9a9)

4.0.0

1 year ago

3.2.0 (2022-09-15)

Features

  • add api to get backend entry points (0152f5e)
  • api: add and_ and or_ helpers (94bd4df)
  • api: add argmax and argmin column methods (b52216a)
  • api: add distinct to Intersection and Difference operations (cd9a34c)
  • api: add ibis.memtable API for constructing in-memory table expressions (0cc6948)
  • api: add ibis.sql to easily get a formatted SQL string (d971cc3)
  • api: add Table.unpack() and StructValue.lift() APIs for projecting struct fields (ced5f53)
  • api: allow transmute-style select method (d5fc364)
  • api: implement all bitwise operators (7fc5073)
  • api: promote psql to a show_sql public API (877a05d)
  • clickhouse: add dataframe external table support for memtables (bc86aa7)
  • clickhouse: add enum, ipaddr, json, lowcardinality to type parser (8f0287f)
  • clickhouse: enable support for working window functions (310a5a8)
  • clickhouse: implement argmin and argmax (ee7c878)
  • clickhouse: implement bitwise operations (348cd08)
  • clickhouse: implement struct scalars (1f3efe9)
  • dask: implement StringReplace execution (1389f4b)
  • dask: implement ungrouped argmin and argmax (854aea7)
  • deps: support duckdb 0.5.0 (47165b2)
  • duckdb: handle query parameters in ibis.connect (fbde95d)
  • duckdb: implement argmin and argmax (abf03f1)
  • duckdb: implement bitwise xor (ca3abed)
  • duckdb: register tables from pandas/pyarrow objects (36e48cc)
  • duckdb: support unsigned integer types (2e67918)
  • impala: implement bitwise operations (c5302ab)
  • implement dropna for SQL backends (8a747fb)
  • log: make BaseSQLBackend._log print by default (12de5bb)
  • mysql: register BLOB types (1e4fb92)
  • pandas: implement argmin and argmax (bf9b948)
  • pandas: implement NotContains on grouped data (976dce7)
  • pandas: implement StringReplace execution (578795f)
  • pandas: implement Contains with a group by (c534848)
  • postgres: implement bitwise xor (9b1ebf5)
  • pyspark: add option to treat nan as null in aggregations (bf47250)
  • pyspark: implement ibis.connect for pyspark (a191744)
  • pyspark: implement Intersection and Difference (9845a3c)
  • pyspark: implement bitwise operators (33cadb1)
  • sqlalchemy: implement bitwise operator translation (bd9f64c)
  • sqlalchemy: make ibis.connect with sqlalchemy backends (b6cefb9)
  • sqlalchemy: properly implement Intersection and Difference (2bc0b69)
  • sql: implement StringReplace translation (29daa32)
  • sqlite: implement bitwise xor and bitwise not (58c42f9)
  • support table.sort_by(ibis.random()) (693005d)
  • type-system: infer pandas' string dtype (5f0eb5d)
  • ux: add duckdb as the default backend (8ccb81d)
  • ux: use rich to format Table.info() output (67234c3)
  • ux: use sqlglot for pretty printing SQL (a3c81c5)
  • variadic union, intersect, & difference functions (05aca5a)

Bug Fixes

  • api: make sure column names that are already inferred are not overwritten (6f1cb16)
  • api: support deferred objects in existing API functions (241ce6a)
  • backend: ensure that chained limits respect prior limits (02a04f5)
  • backends: ensure select after filter works (e58ca73)
  • backends: only recommend installing ibis-foo when foo is a known backend (ac6974a)
  • base-sql: fix String-generating backend string concat implementation (3cf78c1)
  • clickhouse: add IPv4/IPv6 literal inference (0a2f315)
  • clickhouse: cast repeat times argument to UInt64 (b643544)
  • clickhouse: fix listing tables from databases with no tables (08900c3)
  • compilers: make sure memtable rows have names in the SQL string compilers (18e7f95)
  • compiler: use repr for SQL string VALUES data (75af658)
  • dask: ensure predicates are computed before projections (5cd70e1)
  • dask: implement timestamp-date binary comparisons (48d5058)
  • dask: set dask upper bound due to large scale test breakage (796c645), closes #9221
  • decimal: add decimal type inference (3fe3fd8)
  • deps: update dependency duckdb-engine to >=0.1.8,<0.4.0 (113dc8f)
  • deps: update dependency duckdb-engine to >=0.1.8,<0.5.0 (ef97c9d)
  • deps: update dependency parsy to v2 (9a06131)
  • deps: update dependency shapely to >=1.6,<1.8.4 (0c787d2)
  • deps: update dependency shapely to >=1.6,<1.8.5 (d08c737)
  • deps: update dependency sqlglot to v5 (f210bb8)
  • deps: update dependency sqlglot to v6 (5ca4533)
  • duckdb: add missing types (59bad07)
  • duckdb: ensure that in-memory connections remain in their creating thread (39bc537)
  • duckdb: use fetch_arrow_table() to be able to handle big timestamps (85a76eb)
  • fix bug in pandas & dask difference implementation (88a78fa)
  • fix dask where implementation (49f8845)
  • impala: add date column dtype to impala to ibis type dict (c59e94e), closes #4449
  • pandas where supports scalar for left (48f6c1e)
  • pandas: fix anti-joins (10a659d)
  • pandas: implement timestamp-date binary comparisons (4fc666d)
  • pandas: properly handle empty groups when aggregating with GroupConcat (6545f4d)
  • pyspark: fix broken StringReplace implementation (22cb297)
  • pyspark: make sure ibis.connect works with pyspark (a7ab107)
  • pyspark: translate predicates before projections (b3d1c80)
  • sqlalchemy: fix float64 type mapping (8782773)
  • sqlalchemy: handle reductions with multiple arguments (5b2039b)
  • sqlalchemy: implement SQLQueryResult translation (786a50f)
  • sql: fix sql compilation after making InMemoryTable a subclass of PhysicalTable (aac9524)
  • squash several bugs in sort_by asc/desc handling (222b2ba)
  • support chained set operations in SQL backends (227aed3)
  • support filters on InMemoryTable exprs (abfaf1f)
  • typo: in BaseSQLBackend.compile docstring (0561b13)

Deprecations

  • right kwarg in union/intersect/difference (719a5a1)
  • duckdb: deprecate path argument in favor of database (fcacc20)
  • sqlite: deprecate path argument in favor of database (0f85919)
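
For readers unfamiliar with how a renamed keyword argument is usually phased out, here is a minimal sketch of the general pattern. It assumes a hypothetical `connect` function and is not ibis's actual implementation; it only illustrates accepting the old `path` keyword while warning callers to switch to `database`. The `":memory:"` fallback is likewise an assumption made for the example.

```python
import warnings
from typing import Optional


def connect(database: Optional[str] = None, path: Optional[str] = None) -> str:
    """Hypothetical connect function that still accepts the old keyword."""
    if path is not None:
        # Warn the caller, pointing at their call site via stacklevel=2.
        warnings.warn(
            "`path` is deprecated; use `database` instead",
            FutureWarning,
            stacklevel=2,
        )
        if database is None:
            database = path
    # Fall back to an in-memory database when nothing is given (assumption).
    return database or ":memory:"


# Old-style call: still works, but records a FutureWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = connect(path="data.ddb")

# New-style call: no warning is emitted.
new_style = connect(database="data.ddb")
```

Callers migrating away from a deprecated keyword only need to rename the argument; the value passed stays the same.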

Performance

  • pandas: remove reexecution of alias children (64efa53)
  • pyspark: ensure that pyspark DDL doesn't use VALUES (422c98d)
  • sqlalchemy: register DataFrames cheaply where possible (ee9f1be)

Documentation

  • add to_sql (e2821a5)
  • add back constraints for transitive doc dependencies and fix docs (350fd43)
  • add coc reporting information (c2355ba)
  • add community guidelines documentation (fd0893f)
  • add HeavyAI to the readme (4c5ca80)
  • add how-to bfill and ffill (ff84027)
  • add how-to for ibis+duckdb register (73a726e)
  • add how-to section to docs (33c4b93)
  • duckdb: add installation note for duckdb >= 0.5.0 (608b1fb)
  • fix memtable docstrings (72bc0f5)
  • fix flake8 line length issues (fb7af75)
  • fix markdown (4ab6b95)
  • fix relative links in tutorial (2bd075f), closes #4064 #4201
  • make attribution style uniform across the blog (05561e0)
  • move the blog out to the top level sidebar for visibility (417ba64)
  • remove underspecified UDF doc page (0eb0ac0)

3.1.0 (2022-07-26)

Features

  • add __getattr__ support to StructValue (75bded1)
  • allow selection subclasses to define new node args (2a7dc41)
  • api: accept Schema objects in public ibis.schema (0daac6c)
  • api: add .tables accessor to BaseBackend (7ad27f0)
  • api: add e function to public API (3a07e70)
  • api: add ops.StructColumn operation (020bfdb)
  • api: add cume_dist operation (6b6b185)
  • api: add toplevel ibis.connect() (e13946b)
  • api: handle literal timestamps with timezone embedded in string (1ae976b)
  • api: ibis.connect() default to duckdb for parquet/csv extensions (ff2f088)
  • api: make struct metadata more convenient to access (3fd9bd8)
  • api: support tab completion for backends (eb75fc5)
  • api: underscore convenience api (81716da)
  • api: unnest (98ecb09)
  • backends: allow column expressions from non-foreign tables on the right side of isin/notin (e1374a4)
  • base-sql: implement trig and math functions (addb2c1)
  • clickhouse: add ability to pass arbitrary kwargs to Clickhouse do_connect (583f599)
  • clickhouse: implement ops.StructColumn operation (0063007)
  • clickhouse: implement array collect (8b2577d)
  • clickhouse: implement ArrayColumn (1301f18)
  • clickhouse: implement bit aggs (f94a5d2)
  • clickhouse: implement clip (12dfe50)
  • clickhouse: implement covariance and correlation (a37c155)
  • clickhouse: implement degrees (7946c0f)
  • clickhouse: implement proper type serialization (80f4ab9)
  • clickhouse: implement radians (c7b7f08)
  • clickhouse: implement strftime (222f2b5)
  • clickhouse: implement struct field access (fff69f3)
  • clickhouse: implement trig and math functions (c56440a)
  • clickhouse: support subsecond timestamp literals (e8698a6)
  • compiler: restore intersect_class and difference_class overrides in base SQL backend (2c46a15)
  • dask: implement trig functions (e4086bb)
  • dask: implement zeroifnull (38487db)
  • datafusion: implement negate (69dd64d)
  • datafusion: implement trig functions (16803e1)
  • duckdb: add register method to duckdb backend to load parquet and csv files (4ccc6fc)
  • duckdb: enable find_in_set test (377023d)
  • duckdb: enable group_concat test (4b9ad6c)
  • duckdb: implement ops.StructColumn operation (211bfab)
  • duckdb: implement approx_count_distinct (03c89ad)
  • duckdb: implement approx_median (894ce90)
  • duckdb: implement arbitrary first and last aggregation (8a500bc)
  • duckdb: implement NthValue (1bf2842)
  • duckdb: implement strftime (aebc252)
  • duckdb: return the ir.Table instance from DuckDB's register API (0d05d41)
  • mysql: implement FindInSet (e55bbbf)
  • mysql: implement StringToTimestamp (169250f)
  • pandas: implement bitwise aggregations (37ff328)
  • pandas: implement degrees (25b4f69)
  • pandas: implement radians (6816b75)
  • pandas: implement trig functions (1fd52d2)
  • pandas: implement zeroifnull (48e8ed1)
  • postgres/duckdb: implement covariance and correlation (464d3ef)
  • postgres: implement ArrayColumn (7b0a506)
  • pyspark: implement approx_count_distinct (1fe1d75)
  • pyspark: implement approx_median (07571a9)
  • pyspark: implement covariance and correlation (ae818fb)
  • pyspark: implement degrees (f478c7c)
  • pyspark: implement nth_value (abb559d)
  • pyspark: implement nullifzero (640234b)
  • pyspark: implement radians (18843c0)
  • pyspark: implement trig functions (fd7621a)
  • pyspark: implement Where (32b9abb)
  • pyspark: implement xor (550b35b)
  • pyspark: implement zeroifnull (db13241)
  • pyspark: topk support (9344591)
  • sqlalchemy: add degrees and radians (8b7415f)
  • sqlalchemy: add xor translation rule (2921664)
  • sqlalchemy: allow non-primitive arrays (4e02918)
  • sqlalchemy: implement approx_count_distinct as count distinct (4e8bcab)
  • sqlalchemy: implement clip (8c02639)
  • sqlalchemy: implement trig functions (34c1514)
  • sqlalchemy: implement Where (7424704)
  • sqlalchemy: implement zeroifnull (4735e9a)
  • sqlite: implement BitAnd, BitOr and BitXor (e478479)
  • sqlite: implement cotangent (01e7ce7)
  • sqlite: implement degrees and radians (2cf9c5e)

Bug Fixes

  • api: bring back null datatype parsing (fc131a1)
  • api: compute the type from both branches of Where expressions (b8f4120)
  • api: ensure that Deferred objects work in aggregations (bbb376c)
  • api: ensure that nulls can be cast to any type to allow caller promotion (fab4393)
  • api: make ExistsSubquery and NotExistsSubquery pure boolean operations (dd70024)
  • backends: make execution transactional where possible (d1ea269)
  • clickhouse: cast empty result dataframe (27ae68a)
  • clickhouse: handle empty IN and NOT IN expressions (2c892eb)
  • clickhouse: return null instead of empty string for group_concat when values are filtered out (b826b40)
  • compiler: fix bool bool comparisons (1ac9a9e)
  • dask/pandas: allow limit to be None (9f91d6b)
  • dask: aggregation with multi-key groupby fails on dask backend (4f8bc70)
  • datafusion: handle predicates in aggregates (4725571)
  • deps: update dependency datafusion to >=0.4,<0.7 (f5b244e)
  • deps: update dependency duckdb to >=0.3.2,<0.5.0 (57ee818)
  • deps: update dependency duckdb-engine to >=0.1.8,<0.3.0 (3e379a0)
  • deps: update dependency geoalchemy2 to >=0.6.3,<0.13 (c04a533)
  • deps: update dependency geopandas to >=0.6,<0.12 (b899c37)
  • deps: update dependency Shapely to >=1.6,<1.8.3 (87a49ad)
  • deps: update dependency toolz to >=0.11,<0.13 (258a641)
  • don't mask udf module in __init__.py (3e567ba)
  • duckdb: ensure that paths with non-extension . chars are parsed correctly (9448fd3)
  • duckdb: fix struct datatype parsing (5124763)
  • duckdb: force string_agg separator to be a constant (21cdf2f)
  • duckdb: handle multiple dotted extensions; quote names; consolidate implementations (1494246)
  • duckdb: remove timezone function invocation (33d38fc)
  • geospatial: ensure that later versions of numpy are compatible with geospatial code (33f0afb)
  • impala: explicitly declare a delimited table as stored as textfile (04086a4), closes #4260
  • impala: remove broken nth_value implementation (dbc9cc2)
  • ir: don't attempt fusion when projections aren't exactly equivalent (3482ba2)
  • mysql: cast mysql timestamp literals to ensure correct return type (8116e04)
  • mysql: implement integer to timestamp using from_unixtime (1b43004)
  • pandas/dask: look at pre_execute for has_operation reporting (cb44efc)
  • pandas: execute negate on bool as not (330ab4f)
  • pandas: fix struct inference from dict in the pandas backend (5886a9a)
  • pandas: force backend options registration on trace.enable() calls (8818fe6)
  • pandas: handle empty boolean column casting in Series conversion (f697e3e)
  • pandas: handle struct columns with NA elements (9a7c510)
  • pandas: handle the case of selection from a join when remapping overlapping column names (031c4c6)
  • pandas: perform correct equality comparison (d62e7b9)
  • postgres/duckdb: cast after milliseconds computation instead of after extraction (bdd1d65)
  • pyspark: handle predicates in Aggregation (842c307)
  • pyspark: prevent spark from trying to convert timezone of naive timestamps (dfb4127)
  • pyspark: remove xpassing test for #2453 (c051e28)
  • pyspark: specialize implementation of has_operation (5082346)
  • pyspark: use empty check for collect_list in GroupConcat rule (df66acb)
  • repr: allow DestructValue selections to be formatted by fmt (4b45d87)
  • repr: when formatting DestructValue selections, use struct field names as column names (d01fe42)
  • sqlalchemy: fix parsing and construction of nested array types (e20bcc0)
  • sqlalchemy: remove unused second argument when creating temporary views (8766b40)
  • sqlite: register conversion to isoformat for pandas.Timestamp (fe95dca)
  • sqlite: test case with whitespace at the end of the line (7623ae9)
  • sql: use isoformat for timestamp literals (70d0ba6)
  • type-system: infer null datatype for empty sequence of expressions (f67d5f9)
  • use bounded precision for decimal aggregations (596acfb)

Performance Improvements

  • analysis: add _projection as cached_property to avoid reconstruction of projections (98510c8)
  • lineage: ensure that expressions are not traversed multiple times in most cases (ff9708c)

Reverts

  • ci: install sqlite3 on ubuntu (1f2705f)

3.0.2 (2022-04-28)

Bug Fixes

  • docs: fix tempdir location for docs build (dcd1b22)

3.0.1 (2022-04-28)

Bug Fixes

  • build: replace version before exec plugin runs (573139c)

3.0.0 (2022-04-25)

⚠ BREAKING CHANGES

  • ir: The following are breaking changes due to simplifying expression internals
    • ibis.expr.datatypes.DataType.scalar_type and DataType.column_type factory methods have been removed, DataType.scalar and DataType.column class fields can be used to directly construct a corresponding expression instance (though prefer to use operation.to_expr())
    • ibis.expr.types.ValueExpr._name and ValueExpr._dtype fields are not accessible anymore. While these were not supposed to be used directly, the ValueExpr.has_name(), ValueExpr.get_name() and ValueExpr.type() methods are now the only way to retrieve the expression's name and datatype.
    • ibis.expr.operations.Node.output_type is now a property, not a method; decorate methods that override it with @property
    • ibis.expr.operations.ValueOp subclasses must define output_shape and output_dtype properties from now on (note the datatype abbreviation dtype in the property name)
    • ibis.expr.rules.cast(), scalar_like() and array_like() rules have been removed
  • api: Replace t["a"].distinct() with t[["a"]].distinct().
  • deps: The sqlalchemy lower bound is now 1.4
  • ir: Schema.names and Schema.types attributes now have tuple type rather than list
  • expr: Columns that were added or used in an aggregation or mutation would be alphabetically sorted in compiled SQL outputs. This was a vestige from when Python dicts didn't preserve insertion order. Now columns will appear in the order in which they were passed to aggregate or mutate
  • api: dt.float is now dt.float64; use dt.float32 for the previous behavior.
  • ir: Relation-based execute_node dispatch rules must now accept tuples of expressions.
  • ir: removed ibis.expr.lineage.{roots,find_nodes} functions
  • config: The graphviz repr in notebooks is now disabled by default. Use ibis.options.graphviz_repr = True to enable it
  • hdfs: Use fsspec instead of HDFS from ibis
  • udf: Vectorized UDF coercion functions are no longer a public API.
  • The minimum supported Python version is now Python 3.8
  • config: register_option is no longer supported, please submit option requests upstream
  • backends: Read tables with pandas.read_hdf and use the pandas backend
  • The CSV backend is removed. Use Datafusion for CSV execution.
  • backends: Use the datafusion backend to read parquet files
  • Expr() -> Expr.pipe()
  • coercion functions previously in expr/schema.py are now in udf/vectorized.py
  • api: materialize is removed. Joins with overlapping columns now have suffixes.
  • kudu: use impala instead: https://kudu.apache.org/docs/kudu_impala_integration.html
  • Any code that was relying implicitly on string-y behavior from UUID datatypes will need to add an explicit cast first.
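
Two of the breaking changes above are easy to illustrate without ibis itself. The following is a minimal sketch using plain Python stand-ins (`FakeSchema` and `aggregate_columns` are hypothetical names, not ibis APIs). It shows why code that concatenated `Schema.names` with a list needs adjusting now that the attribute is a tuple, and how keyword-argument order is preserved rather than sorted alphabetically.

```python
from typing import NamedTuple, Tuple


class FakeSchema(NamedTuple):
    """Hypothetical stand-in for a schema whose names/types are tuples."""

    names: Tuple[str, ...]
    types: Tuple[str, ...]


def aggregate_columns(**exprs: str) -> list:
    """Hypothetical helper: return output column names in call order.

    Python keyword arguments preserve insertion order (PEP 468), so no
    alphabetical sorting is needed to get a deterministic column order.
    """
    return list(exprs)


schema = FakeSchema(names=("b", "a"), types=("int64", "string"))

# Tuples cannot be concatenated with lists; this raises TypeError.
try:
    schema.names + ["c"]
    concat_failed = False
except TypeError:
    concat_failed = True

# Convert explicitly when a list is needed.
extended_names = list(schema.names) + ["c"]

# Columns come back in the order they were passed, not sorted.
ordered = aggregate_columns(total="sum(x)", avg="mean(x)")
```

Code that only iterates over or indexes into the names and types should keep working unchanged; only list-specific operations such as in-place mutation or list concatenation need the explicit conversion.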

Features

  • add _repr_html_ for expressions to print as tables in ipython (cd6fa4e)
  • add duckdb backend (667f2d5)
  • allow construction of decimal literals (3d9e865)
  • api: add ibis.asc expression (efe177e), closes #1454
  • api: add has_operation API to the backend (4fab014)
  • api: implement type for SortExpr (ab19bd6)
  • clickhouse: implement string concat for clickhouse (1767205)
  • clickhouse: implement StrRight operation (67749a0)
  • clickhouse: implement table union (e0008d7)
  • clickhouse: implement trim, pad and string predicates (a5b7293)
  • datafusion: implement Count operation (4797a86)
  • datatypes: unbounded decimal type (f7e6f65)
  • date: add ibis.date(y,m,d) functionality (26892b6), closes #386
  • duckdb/postgres/mysql/pyspark: implement .sql on tables for mixing sql and expressions (00e8087)
  • duckdb: add functionality needed to pass integer to interval test (e2119e8)
  • duckdb: implement _get_schema_using_query (93cd730)
  • duckdb: implement now() function (6924f50)
  • duckdb: implement regexp replace and extract (18d16a7)
  • implement force argument in sqlalchemy backend base class (9df7f1b)
  • implement coalesce for the pyspark backend (8183efe)
  • implement semi/anti join for the pandas backend (cb36fc5)
  • implement semi/anti join for the pyspark backend (3e1ba9c)
  • implement the remaining clickhouse joins (b3aa1f0)
  • ir: rewrite and speed up expression repr (45ce9b2)
  • mysql: implement _get_schema_from_query (456cd44)
  • mysql: move string join impl up to alchemy for mysql (77a8eb9)
  • postgres: implement _get_schema_using_query (f2459eb)
  • pyspark: implement Distinct for pyspark (4306ad9)
  • pyspark: implement log base b for pyspark (527af3c)
  • pyspark: implement percent_rank and enable testing (c051617)
  • repr: add interval info to interval repr (df26231)
  • sqlalchemy: implement ilike (43996c0)
  • sqlite: implement date_truncate (3ce4f2a)
  • sqlite: implement ISO week of year (714ff7b)
  • sqlite: implement string join and concat (6f5f353)
  • support of arrays and tuples for clickhouse (db512a8)
  • ver: dynamic version identifiers (408f862)

Bug Fixes

  • added wheel to pyproject toml for venv users (b0b8e5c)
  • allow major version changes in CalVer dependencies (9c3fbe5)
  • annotable: allow optional arguments at any position (778995f), closes #3730
  • api: add ibis.map and .struct (327b342), closes #3118
  • api: map string multiplication with integer to repeat method (b205922)
  • api: thread suffixes parameter to individual join methods (31a9aff)
  • change TimestampType to Timestamp (e0750be)
  • clickhouse: disconnect from clickhouse when computing version (11cbf08)
  • clickhouse: use a context manager for execution (a471225)
  • combine windows during windowization (7fdd851)
  • conform epoch_seconds impls to expression return type (18a70f1)
  • context-adjustment: pass scope when calling adjust_context in pyspark backend (33aad7b), closes #3108
  • dask: fix asof joins for newer version of dask (50711cc)
  • dask: workaround dask bug (a0f3bd9)
  • deps: update dependency atpublic to v3 (3fe8f0d)
  • deps: update dependency datafusion to >=0.4,<0.6 (3fb2194)
  • deps: update dependency geoalchemy2 to >=0.6.3,<0.12 (dc3c361)
  • deps: update dependency graphviz to >=0.16,<0.21 (3014445)
  • duckdb: add casts to literals to fix binding errors (1977a55), closes #3629
  • duckdb: fix array column type discovery on leaf tables and add tests (15e5412)
  • duckdb: fix log with base b impl (4920097)
  • duckdb: support both 0.3.2 and 0.3.3 (a73ccce)
  • enforce the schema's column names in apply_to (b0f334d)
  • expose ops.IfNull for mysql backend (156c2bd)
  • expr: add more binary operators to char list and implement fallback (b88184c)
  • expr: fix formatting of table info using tabulate (b110636)
  • fix float vs real data type detection in sqlalchemy (24e6774)
  • fix list_schemas argument (69c1abf)
  • fix postgres udfs and reenable ci tests (7d480d2)
  • fix tablecolumn execution for filter following join (064595b)
  • format: remove some newlines from formatted expr repr (ed4fa78)
  • histogram: cross_join needs onclause=True (5d36a58), closes #622
  • ibis.expr.signature.Parameter is not pickleable (828fd54)
  • implement coalesce properly in the pandas backend (aca5312)
  • implement count on tables for pyspark (7fe5573), closes #2879
  • infer coalesce types when a non-null expression occurs after the first argument (c5f2906)
  • mutate: do not lift table column that results from mutate (ba4e5e5)
  • pandas: disable range windows with order by (e016664)
  • pandas: don't reassign the same column to silence SettingWithCopyWarning warning (75dc616)
  • pandas: implement percent_rank correctly (d8b83e7)
  • prevent unintentional cross joins in mutate + filter (83eef99)
  • pyspark: fix range windows (a6f2aa8)
  • regression in Selection.sort_by with resolved_keys (c7a69cd)
  • regression in sort_by with resolved_keys (63f1382), closes #3619
  • remove broken csv pre_execute (93b662a)
  • remove importorskip call for backend tests (2f0bcd8)
  • remove incorrect fix for pandas regression (339f544)
  • remove passing schema into register_parquet (bdcbb08)
  • repr: add ops.TimeAdd to repr binop lookup table (fd94275)
  • repr: allow ops.TableNode in fmt_value (6f57003)
  • reverse the predicate pushdown substitution (f3cd358)
  • sort_index to satisfy pandas 1.4.x (6bac0fc)
  • sqlalchemy: ensure correlated subqueries FROM clauses are rendered (3175321)
  • sqlalchemy: use corresponding_column to prevent spurious cross joins (fdada21)
  • sqlalchemy: use replace selectables to prevent semi/anti join cross join (e8a1a71)
  • sql: retain column names for named ColumnExprs (f1b4b6e), closes #3754
  • sql: walk right join trees and substitute joins with right-side joins with views (0231592)
  • store schema on the pandas backend to allow correct inference (35070be)

Performance Improvements

  • datatypes: speed up str and hash (262d3d7)
  • fast path for simple column selection (d178498)
  • ir: global equality cache (13c2bb2)
  • ir: introduce CachedEqMixin to speed up equality checks (b633925)
  • repr: remove full tree repr from rule validator error message (65885ab)
  • speed up attribute access (89d1c05)
  • use assign instead of concat in projections when possible (985c242)

Miscellaneous Chores

  • deps: increase sqlalchemy lower bound to 1.4 (560854a)
  • drop support for Python 3.7 (0afd138)

Code Refactoring

  • api: make primitive types more cohesive (71da8f7)
  • api: remove distinct ColumnExpr API (3f48cb8)
  • api: remove materialize (24285c1)
  • backends: remove the hdf5 backend (ff34f3e)
  • backends: remove the parquet backend (b510473)
  • config: disable graphviz-repr-in-notebook by default (214ad4e)
  • config: remove old config code and port to pydantic (4bb96d1)
  • dt.UUID inherits from DataType, not String (2ba540d)
  • expr: preserve column ordering in aggregations/mutations (668be0f)
  • hdfs: replace HDFS with fsspec (cc6eddb)
  • ir: make Annotable immutable (1f2b3fa)
  • ir: make schema annotable (b980903)
  • ir: remove unused lineage roots and find_nodes functions (d630a77)
  • ir: simplify expressions by not storing dtype and name (e929f85)
  • kudu: remove support for use of kudu through kudu-python (36bd97f)
  • move coercion functions from schema.py to udf (58eea56), closes #3033
  • remove blanket call for Expr (3a71116), closes #2258
  • remove the csv backend (0e3e02e)
  • udf: make coerce functions in ibis.udf.vectorized private (9ba4392)

2.1.1 (2022-01-12)

Bug Fixes

  • setup.py: set the correct version number for 2.1.0 (f3d267b)

2.1.0 (2022-01-12)

Bug Fixes

  • consider all packages' entry points (b495cf6)
  • datatypes: infer bytes literal as binary #2915 (#3124) (887efbd)
  • deps: bump minimum dask version to 2021.10.0 (e6b5c09)
  • deps: constrain numpy to ensure wheels are used on windows (70c308b)
  • deps: update dependency clickhouse-driver to ^0.1 || ^0.2.0 (#3061) (a839d54)
  • deps: update dependency geoalchemy2 to >=0.6,<0.11 (4cede9d)
  • deps: update dependency pyarrow to v6 (#3092) (61e52b5)
  • don't force backends to override do_connect until 3.0.0 (4b46973)
  • execute materialized joins in the pandas and dask backends (#3086) (9ed937a)
  • literal: allow creating ibis literal with uuid (#3131) (b0f4f44)
  • restore the ability to have more than two option levels (#3151) (fb4a944)
  • sqlalchemy: fix correlated subquery compilation (43b9010)
  • sqlite: defer db connection until needed (#3127) (5467afa), closes #64

Features

  • allow column_of to take a column expression (dbc34bb)
  • ci: More readable workflow job titles (#3111) (d8fd7d9)
  • datafusion: initial implementation for Arrow Datafusion backend (3a67840), closes #2627
  • datafusion: initial implementation for Arrow Datafusion backend (75876d9), closes #2627
  • make dayofweek impls conform to pandas semantics (#3161) (9297828)

Reverts

  • "ci: install gdal for fiona" (8503361)