Ibis Versions

the portable Python dataframe library

8.0.0 (2024-02-05)

⚠ BREAKING CHANGES

  • backends: Columns with Ibis date types are now returned as object dtype containing datetime.date objects when executing with the pandas backend.
  • impala: Direct HDFS integration is removed, as is support for ingesting pandas DataFrames directly. The Impala backend still works with HDFS, but data in HDFS must be managed outside of ibis.
  • api: replace ibis.show_sql(expr) calls with print(ibis.to_sql(expr)), or just ibis.to_sql(expr) when using Jupyter or IPython; see the migration sketch after this list
  • bigquery: nullifzero is removed; use nullif(0) instead
  • bigquery: zeroifnull is removed; use fillna(0) instead
  • bigquery: list_databases is removed; use list_schemas instead
  • bigquery: the bigquery current_database method returns the data_project instead of the dataset_id. Use current_schema to retrieve dataset_id. To explicitly list tables in a given project and dataset, you can use f"{con.current_database}.{con.current_schema}"
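
A minimal migration sketch for the breaking changes above. The connection parameters, table name, and column name are hypothetical, and the exact set of methods depends on the backend in use:

    import ibis

    # hypothetical BigQuery connection and a table with an integer column "x"
    con = ibis.bigquery.connect(project_id="my-project", dataset_id="my_dataset")
    t = con.table("events")

    # show_sql was removed; render SQL with to_sql instead
    print(ibis.to_sql(t))

    # nullifzero/zeroifnull were removed in favor of the general-purpose methods
    t.x.nullif(0)
    t.x.fillna(0)

    # list_databases was removed on the BigQuery backend; datasets are listed via list_schemas
    con.list_schemas()

    # current_database now returns the data project; the dataset id lives in current_schema
    fully_qualified = f"{con.current_database}.{con.current_schema}"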

Features

  • api: define RegexSplit operation and re_split API (07beaed); see the sketch after this list
  • api: support median and quantile on more types (#7810) (49c75a8)
  • clickhouse: implement RegexSplit (e3c507e)
  • datafusion: implement ops.RegexSplit using pyarrow UDF (37b6b7f)
  • datafusion: set ops (37abea9)
  • datatypes: add decimal and basic geospatial support to the sqlglot type parser/generator (59783b9)
  • datatypes: make intervals round trip through sqlglot type mapper (d22f97a)
  • duckdb-geospatial: add support for flipping coordinates (d47088b)
  • duckdb-geospatial: enable use of literals (23ad256)
  • duckdb: implement RegexSplit (229a1f4)
  • examples: add zones geojson example (#8040) (2d562b7), closes #7958
  • flink: add new temporal operators (dfef418)
  • flink: add primary key support (da04679)
  • flink: export result to pyarrow (9566263)
  • flink: implement array operators (#7951) (80e13b4)
  • flink: implement struct field, clean up literal, and adjust timecontext test markers (#7997) (2d5e108)
  • impala: rudimentary date support (d4bcf7b)
  • mssql: add hashbytes and test for binary output hash fns (#8107) (91f60cd), closes #8082
  • mssql: use odbc (f03ad0c)
  • polars: implement ops.RegexSplit using pyarrow UDF (a3bed10)
  • postgres: implement RegexSplit (c955b6a)
  • pyspark: implement RegexSplit (cfe0329)
  • risingwave: init impl for Risingwave (#7954) (351747a), closes #8038
  • snowflake: implement RegexSplit (2c1a726)
  • snowflake: implement insert method (2162e3f)
  • trino: implement RegexSplit (9d1295f)
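
A quick sketch of the new re_split API defined in the first entry of this list; the data is hypothetical and the example assumes re_split takes a regular expression and returns an array of strings:

    import ibis

    t = ibis.memtable({"line": ["a,b,c", "x,,z"]})  # hypothetical data
    parts = t.line.re_split(",")  # ArrayValue of string pieces
    parts.length()                # number of pieces per row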

Bug Fixes

  • api: deferred values are not truthy (00b3ece)
  • backends: ensure that returned date results are actually proper date values (0626fb2)
  • backends: preserve order_by position in window function when subsequent expressions are duplicated (#7943) (89056b9), closes #7940
  • common: do not convert callables to resolvable objects (9963705)
  • datafusion: work around lack of support for uppercase units in intervals (ebb6cde)
  • datatypes: ensure that array construction supports literals and infers their shape from its inputs (#8049) (899dce1), closes #8022
  • datatypes: fix bad references in to_numpy() (6fd4550)
  • deps: remove filelock from required dependencies (76dded5)
  • deps: update dependency black to v24 (425f7b1)
  • deps: update dependency datafusion to v34 (601f889)
  • deps: update dependency datafusion to v35 (#8224) (a34af25)
  • deps: update dependency oracledb to v2 (e7419ca)
  • deps: update dependency pyarrow to v15 (ef6a9bd)
  • deps: update dependency pyodbc to v5 (32044ea)
  • docs: surround executable code blocks with interactive mode on/off (4c660e0)
  • duckdb: allow table creation from expr with geospatial datatypes (#7818) (ecac322)
  • duckdb: ensure that casting to floating point values produces valid types in generated sql (424b206)
  • examples: use anonymous access when reading example data from GCS (8e5c0af)
  • impala: generate memtables using UNION ALL to work around sqlglot bug (399a5ef)
  • mutate/select: ensure that unsplatted dictionaries work in mutate and select APIs (#8014) (8ed19ea), closes #8013
  • mysql: catch PyMySQL OperationalError exception (#7919) (f2c2664), closes #6010 #7918
  • pandas: support non-string categorical columns (5de08c7)
  • polars: avoid using unnecessary subquery for schema inference (0f43667)
  • polars: handle integers coming out of high precision numpy datetime64 values (bcf36cb)
  • postgres: ensure that no timezone conversion takes place on timestamptz columns when selecting them out (7b79ec8)
  • repr: default to pa.binary for all geospatial dtypes (#7817) (066d3fc)
  • repr: force exception message to console in IPython in interactive mode (414c49a)
  • snowflake: insert into the correct object (5e1efe3)
  • sqlalchemy: properly handle aliases of extracted subqueries (38aaf8f)
  • sqlglot: stop using removed singletons for true, false, null (4fb0aad)

Documentation

  • add composable data ecosystem concept (#7898) (d78a887), closes #6618
  • add exasol to list of supported backends (4fae620)
  • add ibis.join() to docs (#7913) (de2e282), closes #7895
  • add image preview for index page (#7920) (ac2375a)
  • add post about move to Zulip chat (#7889) (88f1ee8), closes #7888
  • add quotes around install in 1brc post (#8065) (5998143)
  • add user testimonials page (#7897) (c0714f8), closes #7341
  • blog for the 1 billion row challenge (#8004) (141edea)
  • blog-post: replicate spatial dev guru blog (4b73c3b)
  • blog: redux array blog with equivalent duckdb and bq expressions (5bde8da)
  • blog: show how to install geospatial dependencies (951a169)
  • blog: update geospatial - no need to_array() (78434a0)
  • contrib: add pull request template (effd461)
  • deps: bump quarto version to pick up dashboard feature (79657db)
  • dev: update maintainers guide (d67409c)
  • document possible range of seed values to Table.sample (6a652ec)
  • duckdb: correct wording for empty path logic (72b2cde)
  • fix formatting for note on _name, _dtype (#7911) (e58be2e)
  • fix rolling date on bigquery/duckdb array blog (#8059) (fb09b78)
  • flink: add to the set of documented backends (83eab61)
  • flink: override default install instructions (4fc8e75)
  • geospatial: add examples for duckdb supported methods (#8128) (2a92306), closes #7959
  • geospatial: fix flaky ci geo-literals doctests (417e81d)
  • hyphenate "properly formatted" and add colon (5ab1c27)
  • ibis-analytics blog post (#7990) (17a1ef2)
  • improve UDF signature docs (#8194) (3cdc6ce)
  • include American spelling usage in style guide (#8163) (ac72157), closes #8162
  • kedro blog post link (#8150) (1ffe435)
  • meta: add goatcounter to header of all quarto pages (fd2e6c9)
  • minor edit to the who supports ibis doc (#7896) (d5a0779)
  • minor update to composable data ecosystem concept (a46bd4a)
  • pandas: fix format for kwarg warning callout (0f6d45d)
  • pyspark: document ibis.connect using a URL (d6049f8)
  • pyspark: mention using ibis.connect (33c855a)
  • random: document behavior of repeated use of ibis.random() instance (f4b67e5)
  • row_number always starts at 0 (#8209) (5a26c05)
  • security: add a security policy (33e9f26)
  • sql-tutorial: fix minor typo in union section of SQL user tutorial (ca6c2a5)
  • style: add style guide to contributing (#8092) (b807555), closes #7094
  • support-matrix: replace the backend info streamlit app with a static quarto dashboard (f9da637)
  • update quickstart to use rename (#8196) (9ed4e92)
  • update release date on Ibis geospatial dev guru post (175f141)
  • who supports Ibis (#7892) (1a5a420), closes #7743

Refactors

  • api: remove show_sql in favor of print(to_sql) (36da8c1)
  • bigquery: remove list_databases (22e5ada)
  • bigquery: remove nullifzero (8447b9a)
  • bigquery: remove zeroifnull (8be3c25)
  • bigquery: return data_project as database, not dataset_id (05608eb)
  • deps: make pins an optional dependency through an examples extra (#7878) (3d6c3f1), closes #7844
  • flink: expose raw_sql over _exec_sql (0b66b94)
  • impala: modernize the impala backend (252833d)

Deprecations

  • deprecate Value.least() and Value.greatest() (f711337)
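
Migrating off the deprecated methods is a one-line change to the top-level functions; a minimal sketch with hypothetical data:

    import ibis

    t = ibis.memtable({"a": [1, 5], "b": [4, 2]})

    # instead of t.a.least(t.b) / t.a.greatest(t.b)
    ibis.least(t.a, t.b)
    ibis.greatest(t.a, t.b)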

7.2.0 (2023-12-18)

Features

  • api: add ArrayValue.flatten method and operation (e6e995c)
  • api: add ibis.range function for generating sequences (f5a0a5a); see the sketch after this list
  • api: add timestamp range (c567fe0)
  • base: add to_pandas method to BaseBackend (3d1cf66)
  • clickhouse: implement array flatten support (d15c6e6)
  • common: node.replace() now supports mappings for quick lookup-like substitutions (bbc93c7)
  • common: add node.find_topmost() method to locate matching nodes without descending further to their children (15acf7d)
  • common: allow matching on dictionaries in possibly nested patterns (1d314f7)
  • common: expose node.__children__ property to access the flattened list of children of a node (2e91476)
  • duckdb: add initial support for geospatial functions (65f496c)
  • duckdb: add read_geo function (b19a8ce)
  • duckdb: enforce aswkb for projections, coerce to geopandas (33327dc)
  • duckdb: implement array flatten support (0a0eecc)
  • exasol: add exasol backend (295903d)
  • export: allow passing keyword arguments to PyArrow ParquetWriter and CSVWriter (40558fd)
  • flink: implement nested schema support (057fabc)
  • flink: implement windowed computations (256767f)
  • geospatial: add support for GeoTransform on duckdb (ec533c1)
  • geospatial: update read_geo to support url (3baf509)
  • pandas/dask: implement flatten (c2e8d9d)
  • polars: add streaming kwarg to to_pandas (703507f)
  • polars: implement array flatten support (19b2aa0)
  • pyspark: enable multiple values in .substitute (291a290)
  • pyspark: implement array flatten support (5d1fadf)
  • snowflake: implement array flatten support (d3c754f)
  • snowflake: read_csv with https (72752eb)
  • snowflake: support udf arguments for reading from staged files (529a3a2)
  • snowflake: use upstream array_sort (9624341)
  • sqlalchemy: support expressions in window bounds (5dbb3b1)
  • trino: implement array flatten support (0d1faaa)
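
A short sketch of the ibis.range and ArrayValue.flatten additions above. The Python-range-like semantics and the nested-array memtable inference are assumptions:

    import ibis

    ibis.range(5)            # array expression, assumed to yield [0, 1, 2, 3, 4]
    ibis.range(5).unnest()   # expand the sequence into a column of rows

    t = ibis.memtable({"xs": [[[1, 2], [3]], [[4]]]})  # hypothetical nested arrays
    t.xs.flatten()           # remove one level of nesting per row, e.g. [1, 2, 3]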

Bug Fixes

  • api: avoid casting to bool for table.info() nullable column (3b3bd7b)
  • bigquery: escape the schema (project ID) for BQ builtin UDFs (8096552)
  • bigquery: fully qualified memtable names in compile (a81e432)
  • clickhouse: use backwards compatible methods of getting query metadata (975556f)
  • datafusion: bring back UDF registration (43084fa)
  • datafusion: ensure that non-matching re_search calls return bool values when patterns do not match (088b027)
  • datafusion: support computed group by when the aggregation is count distinct (18bdb7e)
  • decompile: handle isin (6857751)
  • deferred: don't pass expression in fstringified error message (724859d)
  • deps: update dependency datafusion to v33 (57047a2)
  • deps: update dependency sqlglot to v20 (13bc6e2)
  • duckdb: ensure that already quoted identifiers are not erased (45ee391)
  • duckdb: ensure that parameter names are unlikely to overlap with column names (d93dbe2)
  • duckdb: gate geoalchemy import in duckdb geospatial (8f012c4)
  • duckdb: render dates, times, timestamps and none literals correctly (5d8866a)
  • duckdb: use functions for temporal literals (b1407f8)
  • duckdb: use the UDF's signature instead of arguments' output type for generating a duckdb signature (233dce1)
  • flink: add more tests (33e1a31)
  • flink: add os to the cache key (1b92b33)
  • flink: add test cases for recreate table (1413de9)
  • flink: customize the list of base identifiers (0b5d343)
  • flink: fix recreating table/view issue on flink backend (0c9791f)
  • flink: implement TypeMapper and SchemaMapper for Flink backend (f983bfa)
  • flink: use lazy import to prevent premature loading of pyflink during gen_matrix (d042402)
  • geospatial: pretty print data in interactive mode (afb04ed)
  • ir: ensure that join projection columns are all always nullable (f5f35c6)
  • ir: handle renaming for scalar operations (6f77f17)
  • ir: handle the case of non-overlapping data and add a test (1c9ae1b)
  • ir: implicitly convert None literals with dt.Null type to the requested type during value coercion (d51ec4e)
  • ir: merge window frames for bound analytic window functions with a subsequent over call (e12ce8d)
  • ir: raise if Concrete.copy() receives unexpected arguments (442199a)
  • memtable: ensure column names match provided data (faf99df)
  • memtables: disallow duplicate column names when constructing memtables (4937b48)
  • mssql: compute the length of strings correctly (64d2957)
  • mssql: render dates, times and timestamps correctly (aca30e1)
  • mysql: render dates and timestamps correctly (19e878c)
  • oracle: ensure that .sql metadata results are in column-definition order (26a3c1f)
  • oracle: render dates and timestamps correctly (66fbad6)
  • pandas-format: convert map keys (bb92e9f)
  • pandas: ensure that empty arrays unnest to nothing (fa9831f)
  • pandas: fix integer wraparound when extracting epoch seconds (e98fa3c)
  • pandas: handle non-nullable type mapping (c6a6c56)
  • parse_sql: parse IN clauses (8b1f7b5)
  • polars: handle new categorical types (5d6d6ae)
  • polars: handle the case of an empty InValues list (b26aa55)
  • polars: project first when creating computed grouping keys (7f9fdd4)
  • postgres: render dates, times, timestamps and none literals correctly (a3c1c07)
  • pyarrow: avoid catching ValueError and hiding legitimate failures (b7f650c)
  • pyspark,polars: add packaging extra (bdde3a4)
  • pyspark: custom format converter to handle pyspark timestamps (758ec25)
  • snowflake: convert arrays, maps and structs using the base class implementation (f361891)
  • snowflake: convert path to str when checking for a prefix (c5f884c)
  • snowflake: ensure that empty arrays unnest to nothing (28c2498)
  • snowflake: fix array printing by using a pyarrow extension type (7d8fe5a)
  • snowflake: fix creating table in a different database (9b65b48)
  • snowflake: fix quoting across all apis (7bf8e84)
  • substitute: allow mappings with None keys (4b28ff1)

Documentation

  • add exasol to the backend coverage app (3575858)
  • arrays: document behavior of unnest in the presence of empty array rows (5526c40)
  • backends: include docs for inherited members (c04bf67)
  • blog-post: add blog post comparing ibis to pandas and dask (a7fd32b)
  • blog-post: add blogpost ibis duckdb geospatial (def8031)
  • blog-post: pydata performance part 2; polars and datafusion (36e1db5)
  • blog: add dbt-ibis post (d73c156)
  • blog: add pypi compiled file extension blog (751cfcf)
  • build: allow building individual docs without rendering api docs first (529ee6c)
  • build: turn off interactive mode before every example (502b88c)
  • fix minor typo in sql.qmd (17aa929)
  • fix typo in ir.Table docstring (e3b9611)
  • fix typos (9a4d1f8)
  • make minor edits to duckdb-geospatial post (2365e10)
  • name: improve docstring of ibis.param API (2f9ec90)
  • name: improve docstring of Value.name API (dd66af2)
  • perf: use an unordered list instead of an ordered one (297be44)
  • pypi-metadata-post: add Fortran pattern and fix regex (12058f2)
  • remove confusing backend page (c1d19c7)
  • replace deprecated relabels with renames (6bc9e15)
  • sql: emphasize the need to close a raw_sql cursor only when using SELECT statements (74379a8)
  • tests: add API docs for the testing base classes (173e9a9)
  • tests: document class variables in BackendTest (e814c6b)

Refactors

  • analysis: always merge frames during windowization (66fd69c)
  • bigquery: move BigQueryType to use sqlglot for type parsing and generation (6e3219f)
  • clickhouse: clean up session timezone handling (66220c7)
  • clickhouse: use isoformat instead of manual specification (a3fac3e)
  • common: consolidate the finder and replacer inputs for the various graph methods (a1881eb)
  • common: remove traverse() function's filter argument since it can be expressed using the visitor (e4e2993)
  • common: unify the node.find() and node.match() methods to transparently support types and patterns (3c14091)
  • datafusion: simplify execute and to_pyarrow implementations (c572eab)
  • duckdb: use pyarrow for all memtable registration (d6a2f09)
  • formats: move the TableProxy object to formats from the operations (05964b1)
  • pandas-format: move to classmethods to pickup super class behavior where possible (7bb0470)
  • snowflake: use upstream map-from-arrays function instead of a custom UDF (318459c)
  • tests: remove test rounding mixins (3b730d9)
  • tests: remove UnorderedComparator class (ab0a8f6)

Performance

  • common: improve the performance of replacing nodes by using a specialized node.__recreate__() method (f3da926)

7.1.0 (2023-11-16)

Features

  • api: add bucket method for timestamps (ca0f7bc)
  • api: add Table.sample method for sampling rows from a table (3ce2617); see the sketch after this list
  • api: allow selectors in order_by (359fd5e)
  • api: move analytic window functions to top-level (8f2ced1)
  • api: support deferred in reduction filters (349f475)
  • api: support specifying signature in udf definitions (764977e)
  • bigquery: add location parameter (d652dbb)
  • bigquery: add read_csv, read_json, read_parquet support (ff83110)
  • bigquery: support temporary tables using sessions (eab48a9)
  • clickhouse: add support for timestamp bucket (10a5916)
  • clickhouse: support Table.fillna (5633660)
  • common: better inheritance support for Slotted and FrozenSlotted (9165d41)
  • common: make Slotted and FrozenSlotted pickleable (13cbce0)
  • common: support Self annotations for Annotable (0c60146)
  • common: use patterns to filter out nodes during graph traversal (3edd8f7)
  • dask: add read_csv and read_parquet (e9260af)
  • dask: enable pyarrow conversion (2d36722)
  • dask: support Table.sample (09a7626)
  • datafusion: add case and if-else statements (851d560)
  • datafusion: add corr and covar (edc42be)
  • datafusion: add isnull and isnan operations (0076c25)
  • datafusion: add some array functions (0b96b68)
  • datafusion: add StringLength, FindInSet, ArrayStringJoin (fd03831)
  • datafusion: add TimestampFromUNIX and subtract/add operations (2bffa5a)
  • datafusion: add TimestampTruncate / fix broken extract time part functions (940ed21)
  • datafusion: support dropping schemas (cc6870c)
  • duckdb: add attach and detach methods for adding and removing databases to the current duckdb session (162b058)
  • duckdb: add ntile support (bf08a2a)
  • duckdb: add dict-like for DuckDB settings (ea2d317)
  • duckdb: add support for specific timestamp scales (3518b78)
  • duckdb: allow users to register fsspec filesystem with DuckDB (6172f07)
  • duckdb: expose option to force reinstall extension (98080d0)
  • duckdb: implement Table.sample as a TABLESAMPLE query (3a80f3a)
  • duckdb: implement partial json collection casting (aae28e9)
  • flink: add remaining operators for Flink to pass/skip the common tests (b27adc6)
  • flink: add several temporal operators (f758228)
  • flink: implement the ops.TryCast operation (752e587)
  • formats: map ibis JSON type to pyarrow strings (79b6eac)
  • impala/pyspark: implement to_pyarrow (6b33454)
  • impala: implement Table.sample (8e78dfc)
  • implement window table valued functions (a35a756)
  • improve generated column names for methods receiving intervals (c319ed3)
  • mssql: add support for timestamp bucket (1ffac11)
  • mssql: support cross-db/cross-schema table list (3e0f0fa)
  • mysql: support ntile (9a14ba3)
  • oracle: add fixes after running pre-commit (6538b70)
  • oracle: add fixes after running pre-commit (e3d14b3)
  • oracle: add support for loading Oracle RAW and BLOB types (c77eeb2)
  • oracle: change parsing of Oracle NUMBER data type (649ab86)
  • oracle: remove redundant brackets (2905484)
  • pandas: add read_csv and read_parquet (34eeca6)
  • pandas: support Table.sample (77215be)
  • polars: add support for timestamp bucket (c59518c)
  • postgres: add support for timestamp bucket (4d34afc)
  • pyspark: support Table.sample (6aa897e)
  • snowflake: support ntile (39eed1a)
  • snowflake: support cross-db/cross-schema table list (2071897)
  • snowflake: support timestamp bucketing (a95ffa9)
  • sql: implement Table.sample as a random() filter across several SQL backends (e1870ea)
  • trino: implement Table.sample as a TABLESAMPLE query (f3d044c)
  • trino: support ntile (2978d1a)
  • trino: support temporal operations (8b8e885)
  • udf: improve mypy compatibility for udf functions (65b5bb7)
  • use to_pyarrow instead of to_pandas in the interactive repr (72aa573)
  • ux: fix long links, add repr links in vscode (734bd91)
  • ux: implement recursive element conversion for nested types and json (8ddfa94)
  • ux: render url strings as links in rich table output (1c7a9b6)
  • ux: show syntax-highlighted SQL if pygments is installed (09881b0)
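
A minimal sketch of the Table.sample and timestamp bucket additions above. The data is hypothetical, sampling behavior and seed handling vary by backend, and the keyword-interval form of bucket (e.g. minutes=15) is an assumption:

    import ibis

    t = ibis.memtable({"ts": ["2023-11-01 00:05:00", "2023-11-01 00:17:00"], "v": [1.0, 2.0]})
    t = t.mutate(ts=t.ts.cast("timestamp"))

    t.sample(0.1, seed=42)  # keep roughly 10% of rows

    # aggregate into 15-minute buckets
    t.group_by(t.ts.bucket(minutes=15)).agg(total=t.v.sum())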

Bug Fixes

  • bigquery: apply unnest transformation in other methods that execute SQL (2cc9d0e)
  • bigquery: avoid trying to filter separator argument to GroupConcat operation (ed3b017)
  • bigquery: ensure that the identifier is parsed according to the dialect (f5bb555)
  • bigquery: move sql code to proper argument (abb0bdd)
  • datafusion: do_connect: properly deal with config-is-actually-context (649480c)
  • datafusion: fix some temporal operations (3206dbc)
  • datatypes: correct uint upper bounds (5ca56d5)
  • datatypes: correct unsigned integer bounds (1e40d4e)
  • deps: bump pins lower bound to pickup transitive fsspec upper bound (983e23e)
  • deps: bump sqlglot lower bound (a47be79)
  • deps: pin pyspark to a working version (7eb8a19)
  • deps: update dependency datafusion to v32 (1afbe9c)
  • deps: update dependency pyarrow to v14 (bce86c4)
  • deps: update dependency sqlglot to v19 (1f3ae07)
  • duckdb: ensure proper quoting when compiling cross database/schema tables (8d7b5fa)
  • duckdb: query table list directly instead of relying on sqlalchemy (5d7822c)
  • duckdb: use connect instead of begin to avoid nesting transactions (6889543)
  • flink: cast argument to integer for reduction (5059eed)
  • flink: correct the filtered count translation (2cbca74)
  • flink: re-implement ops.ApproxCountDistinct (2e3a5a0)
  • ir: ibis.parse_sql() removes where clause (522f3a4)
  • ir: coerce integers passed to Value[dt.Floating] annotated values as dt.float64 (b8a924a)
  • ir: ensure that windowization directly wraps the reduction/analytic function (772df36)
  • mssql: support translation of ops.Neg() when projecting a field (ca49d2a)
  • oracle: change filter inside select into case when (c743fa2)
  • oracle: disable if_exists for Oracle drop view command (973133b)
  • oracle: fix fallback column type inference (fb5d56d)
  • pandas: drop __index_level_N__ cols before applying schema (b53feac)
  • patterns: Object pattern should match on positional arguments first (96c796f)
  • patterns: PatternList should keep the original pattern's type (6552639)
  • polars: bump lower bound to 0.19.8 and clean up a bunch of backcompat code (462bd17)
  • polars: various polars enhancements (5948dd6)
  • repr: add dispatch for repr of GeoSpatialBinOps (843d086)
  • snowflake: include views when listing tables for backwards compatibility (094881b)
  • snowflake: support snowflake 3.3.0 (nanoarrow) (a0f24e8)
  • sqlalchemy: ensure that limit on .sql calls works (a5e3062)
  • sqlite: handle BLOB datatype (d36ed1c)
  • sqlite: truncate week to previous week not following (6239794)
  • sql: subtract one from ntile output in string-generating backends (1d264dc)
  • support self joins on memtables (f24e355)
  • trino: enable passing the database argument when accessing tables (e7ce43e)
  • trino: ensure that a schema is not required upon connection when accessing tables with explicit schema (8bde3e0)
  • use pyarrow_hotfix where necessary (0fa1e5d)

Documentation

  • add .nullif() example (6d405df)
  • add "similar to pandas ..." to docstrings (cd7be29)
  • add basic intro docstring to Table class (1a68f31)
  • add callout note for Table.sample (51027d9)
  • add copyright holders to license (ca97dfb)
  • add deprecation to .nullifzero docstring (8502e81)
  • add example to Value.hash() (501ae92)
  • add examples to Value.typeof() (c146381)
  • add more examples to Table.select() (735bbd0)
  • add See Also sections to some APIs (be8938f)
  • clickhouse: freeze clickhouse backend docs to avoid rate limit from upstream playground (e3a7eac)
  • contribute: fix instructions for nix environment setup (013cedd)
  • contribute: fix path to conda-lock files for contributors (ef5bdf9)
  • dedupe 6.2.0 and 7.0.0 release notes (7ce4b1a)
  • fix and improve .isin() docstring (063cfba)
  • fix dask compile docstring typo (d38d2c4)
  • fix link in Value.type() docstring (43b798c)
  • fixup link (d4c97b0)
  • flink: add backend back to support matrix df (e846e80)
  • improve .between() docstring (a086134)
  • improve .case() and .cases() docstrings (7fc89e8)
  • improve cast() and try_cast() docstrings (0b686e8)
  • improve cross-linking within reference (9e45194)
  • improve examples for Table.order_by() (9465b2a)
  • improve join() docstring (84c08c6), closes #7424
  • improve re_replace docstring (f55d0db)
  • improve Table.columns docstring (d50558b)
  • mysql: render do_connect docs for mysql instead of mssql (3c2da6c)
  • pandas: show methods from BasePandasBackend (20fd120)
  • ranking: add ranking function docstrings (750bfeb)
  • setup codespace configuration [skip ci] (5363b94)
  • style: replace Black with Ruff in guidelines (1db3047)
  • temporal: add Literal annotation to display possible units for delta method (ee94cb5)
  • trino: add details for connecting to starburst (ca9873a)
  • trino: add note about SSO configuration (457534b)
  • udfs: fix udf interlink locations (c26e48b)

Refactors

  • analysis: remove _rewrite_filter() in favor of using replacement patterns (4c0ac2e)
  • analysis: remove is_reduction() (2acc31f)
  • analysis: remove pushdown_aggregation_filters() (cf95ff7)
  • analysis: remove sub_for(), substitute(), find_toplevel_aggs() (492b296)
  • analysis: remove substitute_parents() (cd91a7e)
  • analysis: remove substitute_unbound() since it is used at a single place (6a6ad19)
  • analysis: simplify and improve pushdown_selection_filters() (2e47738)
  • analysis: vastly simplify windowize_function (998bbaa)
  • backends: move read_delta to base io handler (3d5a684)
  • bigquery: add schema kwarg to list_tables (95be62f)
  • bigquery: remove session use (60e7900)
  • bigquery: remove unused BigQueryTable object (b83e60e)
  • clean up lit usage (1bc6cee)
  • clickhouse: apply repetitive transformations as pattern replacements (e966af8)
  • clickhouse: replace lit with builtin sqlglot functions (221b630)
  • clickhouse: use a pattern for one-to-zero index conversion of ranking window functions (732c031)
  • clickhouse: use sqlglot for create_table implementation (ea0826d)
  • common: remove ibis.common.bases.Base in favor of Abstract (8ed313c)
  • datafusion: create registry of time udfs to create them only once (9ed0a89)
  • docker-compose: clean up unused exposed ports and make envar spec uniform (7ee518d)
  • duckdb: remove lit (6f77df9)
  • flink: use FILTER syntax when counting (815c12f)
  • imports: move pandas-importing object to method (103a524)
  • ir: remove ibis.expr.streaming (70df318)
  • ir: remove ops.Negatable, ops.NotAny, ops.NotAll, ops.UnresolvedNotExistsSubquery (e31e8fd)
  • ir: unify ibis.common.pattern builders and ibis.expr.deferred (652ceab)
  • make _WellKnownText not a NamedTuple (9a9e733)
  • oracle: deprecate database for schema in list_tables (c8ea79f)
  • patterns: support more flexible sequence matching (b8e463d)
  • postgres: deprecate database for schema in list_tables (d622730)
  • remove unused *args in udf functions (e22236c)
  • sql: align logic for filtered reductions (0347036)
  • temporal: remove unnecessary Temporal* classes (d3bcf73)
  • trino: support better cross-db/cross-schema table list (d2cf1c9)
  • use rewrite rules to handle fillna/dropna in sql backends (f5e06a6)

Performance

  • bigquery: use more efficient representation for memtables (697d325)

7.0.0

6 months ago

6.2.0 (2023-08-31)

Features

  • trino: add source application to trino backend (cf5fdb9)

Bug Fixes

  • bigquery,impala: escape all ASCII escape sequences in string literals (402f5ca)
  • bigquery: correctly escape ASCII escape sequences in regex patterns (a455203)
  • release: pin conventional-changelog-conventionalcommits to 6.1.0 (d6526b8)
  • trino: ensure that list_databases look at all catalogs not just the current one (cfbdbf1)
  • trino: override incorrect base sqlalchemy list_schemas implementation (84d38a1)

Documentation

  • trino: add connection docstring (507a00e)

6.1.0 (2023-08-03)

Features

  • api: add ibis.dtype top-level API (867e5f1); see the sketch after this list
  • api: add table.nunique() for counting unique table rows (adcd762)
  • api: allow mixing literals and columns in ibis.array (3355dd8)
  • api: improve efficiency of __dataframe__ protocol (15e27da)
  • api: support boolean literals in join API (c56376f)
  • arrays: add concat method equivalent to __add__/__radd__ (0ed0ab1)
  • arrays: add repeat method equivalent to __mul__/__rmul__ (b457c7b)
  • backends: add current_schema API (955a9d0)
  • bigquery: fill out CREATE TABLE DDL options including support for overwrite (5dac7ec)
  • datafusion: add count_distinct, median, approx_median, stddev and var aggregations (45089c4)
  • datafusion: add extract url fields functions (4f5ea98)
  • datafusion: add functions sign, power, nullifzero, log (ef72e40)
  • datafusion: add RegexSearch, StringContains and StringJoin (4edaab5)
  • datafusion: implement in-memory table (d4ec5c2)
  • flink: add tests and translation rules for additional operators (fc2aa5d)
  • flink: implement translation rules and tests for over aggregation in Flink backend (e173cd7)
  • flink: implement translation rules for literal expressions in flink compiler (a8f4880)
  • improved error messages when missing backend dependencies (2fe851b)
  • make output of to_sql a proper str subclass (084bdb9)
  • pandas: add ExtractURLField functions (e369333)
  • polars: implement ops.SelfReference (983e393)
  • pyspark: read/write delta tables (d403187)
  • refactor ddl for create_database and add create_schema where relevant (d7a857c)
  • sqlite: add scalar python udf support to sqlite (92f29e6)
  • sqlite: implement extract url field functions (cb1956f)
  • trino: implement support for .sql table expression method (479bc60)
  • trino: support table properties when creating a table (b9d65ef)
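
A small sketch of the ibis.dtype, Table.nunique, and mixed literal/column ibis.array additions above, using hypothetical data:

    import ibis

    ibis.dtype("array<int64>")  # parse a datatype from its string form

    t = ibis.memtable({"a": [1, 1, 2], "b": [3, 4, 5]})
    t.nunique()                 # number of distinct rows, as a scalar expression

    ibis.array([1, t.a, t.b])   # literals and columns can now be mixed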

Bug Fixes

  • api: allow scalar window order keys (3d3f4f3)
  • backends: make current_database implementation and API consistent across all backends (eeeeee0)
  • bigquery: respect the fully qualified table name at the init (a25f460)
  • clickhouse: check dispatching instead of membership in the registry for has_operation (acb7f3f)
  • datafusion: always quote column names to prevent datafusion from normalizing case (310db2b)
  • deps: update dependency datafusion to v27 (3a311cd)
  • druid: handle conversion issues from string, binary, and timestamp (b632063)
  • duckdb: avoid double escaping backslashes for bind parameters (8436f57)
  • duckdb: cast read_only to string for connection (27e17d6)
  • duckdb: deduplicate results from list_schemas() (172520e)
  • duckdb: ensure that current_database returns the correct value (2039b1e)
  • duckdb: handle conversion from duckdb_engine unsigned int aliases (e6fd0cc)
  • duckdb: map hugeint to decimal to avoid information loss (4fe91d4)
  • duckdb: run pre-execute-hooks in duckdb before file export (5bdaa1d)
  • duckdb: use regexp_matches to ensure that matching checks containment instead of a full match (0a0cda6)
  • examples: remove example datasets that are incompatible with case-insensitive file systems (4048826)
  • exprs: ensure that left_semi and semi are equivalent (bbc1eb7)
  • forward arguments through __dataframe__ protocol (50f3be9)
  • ir: change "it not a" to "is not a" in errors (d0d463f)
  • memtable: implement support for translation of empty memtable (05b02da)
  • mysql: fix UUID type reflection for sqlalchemy 2.0.18 (12d4039)
  • mysql: pass-through kwargs to connect_args (e3f3e2d)
  • ops: ensure that name attribute is always valid for ops.SelfReference (9068aca)
  • polars: ensure that pivot_longer works with more than one column (822c912)
  • polars: fix collect implementation (c1182be)
  • postgres: by default use domain socket (e44fdfb)
  • pyspark: make has_operation method a @classmethod (c1b7dbc)
  • release: use @google/[email protected] to avoid module loading bug (673aab3)
  • snowflake: fix broken unnest functionality (207587c)
  • snowflake: reset the schema and database to the original schema after creating them (54ce26a)
  • snowflake: reset to original schema when resetting the database (32ff832)
  • snowflake: use regexp_instr != 0 instead of REGEXP keyword (06e2be4)
  • sqlalchemy: add support for sqlalchemy string subclassed types (8b33b35)
  • sql: handle parsing aliases (3645cf4)
  • trino: handle all remaining common datatype parsing (b3778c7)
  • trino: remove filter index warning in Trino dialect (a2ae7ae)

Documentation

  • add conda/mamba install instructions for specific backends (c643fca)
  • add docstrings to DataType.is_* methods (ed40fdb)
  • backend-matrix: add ability to select a specific subset of backends (f663066)
  • backends: document memtable support and performance for each backend (b321733)
  • blog: v6.0.0 release blog (21fc5da)
  • document versioning policy (242ea15)
  • dot-sql: add examples of mixing ibis expressions and SQL strings (5abd30e)
  • dplyr: small fixes to the dplyr getting started guide (4b57f7f)
  • expand docstring for dtype function (39b7a24)
  • fix functions names in examples of extract url fields (872445e)
  • fix heading in 6.0.0 blog (0ad3ce2)
  • oracle: add note about old password checks in oracle (470b90b)
  • postgres: fix postgres memtable docs (7423eb9)
  • release-notes: fix typo (a319e3a)
  • social: add social media preview cards (e98a0a6)
  • update imports/exports for pyspark backend (16d73c4)

Refactors

  • pyarrow: remove unnecessary calls to combine_chunks (c026d2d)
  • pyarrow: use schema.empty_table() instead of manually constructing empty tables (c099302)
  • result-handling: remove result_handler in favor of expression specific methods (3dc7143)
  • snowflake: enable multiple statements and clean up duplicated parameter setting code (75824a6)
  • tests: clean up backend test setup to make non-data-loading steps atomic (16b4632)

6.0.0

9 months ago

5.1.0 (2023-04-11)

Features

  • api: expand distinct API for dropping duplicates based on column subsets (3720ea5); see the sketch after this list
  • api: implement pyarrow memtables (9d4fbbd)
  • api: support passing a format string to Table.relabel (0583959)
  • api: thread kwargs around properly to support more complex connection arguments (7e0e15b)
  • backends: add more array functions (5208801)
  • bigquery: make to_pyarrow_batches() smarter (42f5987)
  • bigquery: support bignumeric type (d7c0f49)
  • default repr to showing all columns in Jupyter notebooks (91a0811)
  • druid: add re_search support (946202b)
  • duckdb: add map operations (a4c4e77)
  • duckdb: support sqlalchemy 2 (679bb52)
  • mssql: implement ops.StandardDev, ops.Variance (e322f1d)
  • pandas: support memtable in pandas backend (6e4d621), closes #5467
  • polars: implement count distinct (aea4ccd)
  • postgres: implement ops.Arbitrary (ee8dbab)
  • pyspark: pivot_longer (f600c90)
  • pyspark: add ArrayFilter operation (2b1301e)
  • pyspark: add ArrayMap operation (e2c159c)
  • pyspark: add DateDiff operation (bfd6109)
  • pyspark: add partial support for interval types (067120d)
  • pyspark: add read_csv, read_parquet, and register (7bd22af)
  • pyspark: implement count distinct (db29e10)
  • pyspark: support basic caching (ab0df7a)
  • snowflake: add optional 'connect_args' param (8bf2043)
  • snowflake: native pyarrow support (ce3d6a4)
  • sqlalchemy: support unknown types (fde79fa)
  • sqlite: implement ops.Arbitrary (9bcdf77)
  • sql: use temp views where possible (5b9d8c0)
  • table: implement pivot_wider API (60e7731)
  • ux: move ibis.expr.selectors to ibis.selectors and deprecate for removal in 6.0 (0ae639d)
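
A sketch touching several of the features above: pyarrow memtables, the expanded distinct API, format-string relabeling, and the new ibis.selectors location. The subset parameter name (on), the {name} placeholder, and the data are assumptions:

    import pyarrow as pa

    import ibis
    import ibis.selectors as s  # moved here from ibis.expr.selectors

    t = ibis.memtable(pa.table({"a": [1, 1, 2], "b": ["x", "y", "y"]}))

    t.distinct(on="a")          # drop duplicates based on a column subset
    t.relabel("{name}_right")   # rename every column via a format string
    t.select(s.numeric())       # column selection with a selector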

Bug Fixes

  • api: disambiguate attribute errors from a missing resolve method (e12c4df)
  • api: support filter on literal followed by aggregate (68d65c8)
  • clickhouse: do not render aliases when compiling aggregate expression components (46caf3b)
  • clickhouse: ensure that clickhouse depends on sqlalchemy for make_url usage (ea10a27)
  • clickhouse: ensure that truncate works (1639914)
  • clickhouse: fix create_table implementation (5a54489)
  • clickhouse: workaround sqlglot issue with calling match (762f4d6)
  • deps: support pandas 2.0 (4f1d9fe)
  • duckdb: branch to avoid unnecessary dataframe construction (9d5d943)
  • duckdb: disable the progress bar by default (1a1892c)
  • duckdb: drop use of experimental parallel csv reader (47d8b92)
  • duckdb: generate SIMILAR TO instead of tilde to workaround sqlglot issue (434da27)
  • improve typing signature of .dropna() (e11de3f)
  • mssql: improve aggregation on expressions (58aa78d)
  • mssql: remove invalid aggregations (1ce3ef9)
  • polars: backwards compatibility for the time_zone and time_unit properties (3a2c4df)
  • postgres: allow inference of unknown types (343fb37)
  • pyspark: fail when aggregation contains a having filter (bd81a9f)
  • pyspark: raise proper error when trying to generate sql (51afc13)
  • snowflake: fix new array operations; remove ArrayRemove operation (772668b)
  • snowflake: make sure ephemeral tables follow backend quoting rules (9a845df)
  • snowflake: make sure pyarrow is used when possible (01f5154)
  • sql: ensure that set operations resolve to a single relation (3a02965)
  • sql: generate consistent pivot_longer semantics in the presence of multiple unnests (6bc301a)
  • sqlglot: work with newer versions (6f7302d)
  • trino,duckdb,postgres: make cumulative notany/notall aggregations work (c2e985f)
  • trino: only support how='first' with arbitrary reduction (315b5e7)
  • ux: use guaranteed length-1 characters for NULL values (8618789)

Refactors

  • api: remove explicit use of .projection in favor of the shorter .select (73df8df)
  • cache: factor out ref counted cache (c816f00)
  • duckdb: simplify to_pyarrow_batches implementation (d6235ee)
  • duckdb: source loaded and installed extensions from duckdb (fb06262)
  • duckdb: use native duckdb parquet reader unless auth required (e9f57eb)
  • generate uuid-based names for temp tables (a1164df)
  • memtable: clean up dispatch code (9a19302)
  • memtable: dedup table proxy code (3bccec0)
  • sqlalchemy: remove unused _meta instance attributes (523e198)

Deprecations

  • api: deprecate Table.set_column in favor of Table.mutate (954a6b7)
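
Migrating off Table.set_column is a direct swap to Table.mutate; a minimal sketch with hypothetical data:

    import ibis

    t = ibis.memtable({"a": [1, 2]})

    # instead of t.set_column("b", t.a + 1)
    t.mutate(b=t.a + 1)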

Documentation

  • add a getting started guide (8fd03ce)
  • add warning about comparisons to None (5cf186a)
  • blog: add campaign finance blog post (383c708)
  • blog: add campaign finance to SUMMARY.md (0bdd093)
  • clean up agg argument descriptions and add join examples (93d3059)
  • comparison: add a "why ibis" page (011cc19)
  • move conda before nix in dev setup instructions (6b2cbaa)
  • nth: improve docstring for nth() (fb7b34b)
  • patch docs build to fix anchor links (51be459)
  • penguins: add citation for palmer penguins data (679848d)
  • penguins: change to flipper (eec3706)
  • refresh environment setup pages (b609571)
  • selectors: make doctests more complete and actually run them (c8f2964)
  • style and review fixes in getting started guide (3b0f8db)

5.0.0 (2023-03-15)

⚠ BREAKING CHANGES

  • api: Snowflake identifiers are now kept as is from the database. Many table names and column names may now be in SHOUTING CASE. Adjust code accordingly.
  • backend: Backends now raise ibis.common.exceptions.UnsupportedOperationError in more places during compilation. You may need to catch this error type instead of the previous type, which differed between backends.
  • ux: Table.info now returns an expression
  • ux: Passing a sequence of column names to Table.drop is removed. Replace drop(cols) with drop(*cols). See the migration sketch after this list.
  • The spark plugin alias is removed. Use pyspark instead
  • ir: removed ibis.expr.scope and ibis.expr.timecontext modules, access them under ibis.backends.base.df.<module>
  • some methods have been removed from the top-level ibis.<backend> namespaces, access them on a connected backend instance instead.
  • common: removed ibis.common.geospatial, import the functions from ibis.backends.base.sql.registry.geospatial
  • datatypes: JSON is no longer a subtype of String
  • datatype: Category, CategoryValue/Column/Scalar are removed. Use string types instead.
  • ux: The metric_name argument to value_counts is removed. Use Table.relabel to change the metric column's name.
  • deps: the minimum version of parsy is now 2.0
  • ir/backends: removed the following symbols:
      • ibis.backends.duckdb.parse_type() function
      • ibis.backends.impala.Backend.set_database() method
      • ibis.backends.pyspark.Backend.set_database() method
      • ibis.backends.impala.ImpalaConnection.ping() method
      • ibis.expr.operations.DatabaseTable.change_name() method
      • ibis.expr.operations.ParseURL class
      • ibis.expr.operations.Value.to_projection() method
      • ibis.expr.types.Table.get_column() method
      • ibis.expr.types.Table.get_columns() method
      • ibis.expr.types.StringValue.parse_url() method
  • schema: Schema.from_dict(), .delete() and .append() methods are removed
  • datatype: struct_type.pairs is removed, use struct_type.fields instead
  • datatype: Struct(names, types) is not supported anymore, pass a dictionary to Struct constructor instead
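
A migration sketch for a few of the removals above (splatting column names into Table.drop, the dictionary-based Struct constructor, struct_type.fields, and the replacement for Schema.from_dict); the table and field names are hypothetical:

    import ibis
    import ibis.expr.datatypes as dt

    t = ibis.table({"a": "int64", "b": "string", "c": "float64"}, name="t")

    cols = ["b", "c"]
    t.drop(*cols)                                       # instead of t.drop(cols)

    dtype = dt.Struct({"x": dt.int64, "y": dt.string})  # mapping instead of (names, types)
    dtype.fields                                        # instead of the removed .pairs

    ibis.schema({"a": "int64", "b": "string"})          # instead of Schema.from_dict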

Features

  • add max_columns option for table repr (a3aa236)
  • add examples API (b62356e)
  • api: add map/array accessors for easy conversion of JSON to stronger-typed values (d1e9d11)
  • api: add array to string join operation (74de349)
  • api: add builtin support for relabeling columns to snake case (1157273); see the sketch after this list
  • api: add support for passing a mapping to ibis.map (d365fd4)
  • api: allow single argument set operations (bb0a6f0)
  • api: implement to_pandas() API for ecosystem compatibility (cad316c)
  • api: implement isin (ac31db2)
  • api: make cache evaluate only once per session per expression (5a8ffe9)
  • api: make create_table uniform (833c698)
  • api: more selectors (5844304)
  • api: upcast pandas DataFrames to memtables in rlz.table rule (8dcfb8d)
  • backends: implement ops.Time for sqlalchemy backends (713cd33)
  • bigquery: add BIGNUMERIC type support (5c98ea4)
  • bigquery: add UUID literal support (ac47c62)
  • bigquery: enable subqueries in select statements (ef4dc86)
  • bigquery: implement create and drop table method (5f3c22c)
  • bigquery: implement create_view and drop_view method (a586473)
  • bigquery: support creating tables from in-memory tables (c3a25f1)
  • bigquery: support in-memory tables (37e3279)
  • change Rich repr of dtypes from blue to dim (008311f)
  • clickhouse: implement ArrayFilter translation (f2144b6)
  • clickhouse: implement ops.ArrayMap (45000e7)
  • clickhouse: implement ops.MapLength (fc82eaa)
  • clickhouse: implement ops.Capitalize (914c64c)
  • clickhouse: implement ops.ExtractMillisecond (ee74e3a)
  • clickhouse: implement ops.RandomScalar (104aeed)
  • clickhouse: implement ops.StringAscii (a507d17)
  • clickhouse: implement ops.TimestampFromYMDHMS, ops.DateFromYMD (05f5ae5)
  • clickhouse: improve error message for invalid types in literal (e4d7799)
  • clickhouse: support asof_join (7ed5143)
  • common: add abstract mapping collection with support for set operations (7d4aa0f)
  • common: add support for variadic positional and variadic keyword annotations (baea1fa)
  • common: hold typehint in the annotation objects (b3601c6)
  • common: support Callable arguments and return types in Validator.from_annotable() (ae57c36)
  • common: support positional only and keyword only arguments in annotations (340dca1)
  • dask/pandas: raise OperationNotDefinedError exc for not defined operations (2833685)
  • datafusion: implement ops.Degrees, ops.Radians (7e61391)
  • datafusion: implement ops.Exp (7cb3ade)
  • datafusion: implement ops.Pi, ops.E (5a74cb4)
  • datafusion: implement ops.RandomScalar (5d1cd0f)
  • datafusion: implement ops.StartsWith (8099014)
  • datafusion: implement ops.StringAscii (b1d7672)
  • datafusion: implement ops.StrRight (016a082)
  • datafusion: implement ops.Translate (2fe3fc4)
  • datafusion: support substr without end (a19fd87)
  • datatype/schema: support datatype and schema declaration using type annotated classes (6722c31)
  • datatype: enable inference of Decimal type (8761732)
  • datatype: implement Mapping abstract base class for StructType (5df2022)
  • deps: add Python 3.11 support and tests (6f3f759)
  • druid: add Apache Druid backend (c4cc2a6)
  • druid: implement bitwise operations (3ac7447)
  • druid: implement ops.Pi, ops.Modulus, ops.Power, ops.Log10 (090ff03)
  • druid: implement ops.Sign (35f52cc)
  • druid: implement ops.StringJoin (42cd9a3)
  • duckdb: add support for reading tables from sqlite databases (9ba2211)
  • duckdb: add UUID type support (5cd6d76)
  • duckdb: implement ArrayFilter translation (5f35d5c)
  • duckdb: implement ops.ArrayMap (063602d)
  • duckdb: implement create_view and drop_view method (4f73953)
  • duckdb: implement ops.Capitalize (b17116e)
  • duckdb: implement ops.TimestampDiff, ops.IntervalAdd, ops.IntervalSubtract (a7fd8fb)
  • duckdb: implement uuid result type (3150333)
  • duckdb: support dt.MACADDR, dt.INET as string (c4739c7)
  • duckdb: use read_json_auto when reading json (4193867)
  • examples: add imdb dataset examples (3d63203)
  • examples: add movielens small dataset (5f7c15c)
  • examples: add wowah_data data to examples (bf9a7cc)
  • examples: enable progressbar and faster hashing (4adfe29)
  • impala: implement ops.Clip (279fd78)
  • impala: implement ops.Radians, ops.Degrees (a794ace)
  • impala: implement ops.RandomScalar (874f2ff)
  • io: add to_parquet, to_csv to backends (fecca42)
  • ir: add ArrayFilter operation (e719d60)
  • ir: add ArrayMap operation (49e5f7a)
  • mysql: support in-memory tables (4dfabbd)
  • pandas/dask: implement bitwise operations (4994add)
  • pandas/dask: implement ops.Pi, ops.E (091be3c)
  • pandas: add basic unnest support (dd36b9d)
  • pandas: implement ops.StartsWith, ops.EndsWith (2725423)
  • pandas: support more pandas extension dtypes (54818ef)
  • polars: implement ops.Union (17c6011)
  • polars: implement ops.Pi, ops.E (6d8fc4a)
  • postgres: allow connecting with an explicit schema (39c9ea8)
  • postgres: fix interval literal (c0fa933)
  • postgres: implement argmin/argmax (82668ec)
  • postgres: parse tsvector columns as strings (fac8c47), closes #5402
  • pyspark: add support for ops.ArgMin and ops.ArgMax (a3fa57c)
  • pyspark: implement ops.Between (ed83465)
  • return Table from create_table(), create_view() (e4ea597)
  • schema: implement Mapping abstract base class for Schema (167d85a)
  • selectors: support ranges (e10caf4)
  • snowflake: add support for alias in snowflake (b1b947a)
  • snowflake: add support for bulk upload for temp tables in snowflake (6cc174f)
  • snowflake: add UUID literal support (436c781)
  • snowflake: implement argmin/argmax (8b998a5)
  • snowflake: implement ops.BitwiseAnd, ops.BitwiseNot, ops.BitwiseOr, ops.BitwiseXor (1acd4b7)
  • snowflake: implement ops.GroupConcat (2219866)
  • snowflake: implement remaining map functions (c48c9a6)
  • snowflake: support binary variance reduction with filters (eeabdee)
  • snowflake: support cross-database table access (79cb445)
  • sqlalchemy: generalize unnest to work on backends that don't support it (5943ce7)
  • sqlite: add sqlite type support (addd6a9)
  • sqlite: support in-memory tables (1b24848)
  • sql: support for creating temporary tables in sql based backends (466cf35)
  • tables: cast table using schema (96ce109)
  • tables: implement pivot_longer API (11c5736)
  • trino: enable MapLength operation (a7ad1db)
  • trino: implement ArrayFilter translation (50f6fcc)
  • trino: implement ops.ArrayMap (657bf61)
  • trino: implement ops.Between (d70b9c0)
  • trino: support sqlalchemy 2 (0d078c1)
  • ux: accept selectors in Table.drop (325140f)
  • ux: allow creating unbound tables using annotated class definitions (d7bf6a2)
  • ux: easy interactive setup (6850146)
  • ux: expose between, rows and range keyword arguments in value.over() (5763063)

Bug Fixes

  • analysis: extract Limit subqueries (62f6e14)
  • api: add a name attribute to backend proxy modules (d6d8e7e)
  • api: fix broken __radd__ array concat operation (121d9a0)
  • api: only include valid python identifiers in struct tab completion (8f33775)
  • api: only include valid python identifiers in table tab completion (031a48c)
  • backend: provide useful error if default backend is unavailable (1dbc682)
  • backends: fix capitalize implementations across all backends (d4f0275)
  • backends: fix null literal handling (7f46342)
  • bigquery: ensure that memtables are translated correctly (d6e56c5)
  • bigquery: fix decimal literals (4a04c9b)
  • bigquery: regenerate negative string index sql snapshots (3f02c73)
  • bigquery: regenerate sql for predicate pushdown fix (509806f)
  • cache: remove bogus schema argument and validate database argument type (c4254f6)
  • ci: fix invalid test id (f70de1d)
  • clickhouse: fix decimal literal (4dcd2cb)
  • clickhouse: fix set ops with table operands (86bcf32)
  • clickhouse: raise OperationNotDefinedError if operation is not supported (71e2570)
  • clickhouse: register in-memory tables in pyarrow-related calls (09a045c)
  • clickhouse: use a bool type supported by clickhouse_driver (ab8f064)
  • clickhouse: workaround sqlglot's insistence on uppercasing (6151f37)
  • compiler: generate aliases in a less clever way (04a4aa5)
  • datafusion: support sum aggregation on bool column (9421400)
  • deps: bump duckdb to 0.7.0 (38d2276)
  • deps: bump snowflake-connector-python upper bound (b368b04)
  • deps: ensure that pyspark depends on sqlalchemy (60c7382)
  • deps: update dependency pyarrow to v11 (2af5d8d)
  • deps: update dependency sqlglot to v11 (e581e2f)
  • don't expose backend methods on ibis.<backend> directly (5a16431)
  • druid: remove invalid operations (19f214c)
  • duckdb: add null to duckdb datatype parser (07d2a86)
  • duckdb: ensure that temp_directory exists (00ba6cb)
  • duckdb: explicitly set timezone to UTC on connection (6ae4a06)
  • duckdb: fix blob type in literal (f66e8a1)
  • duckdb: fix memtable to_pyarrow/to_pyarrow_batches (0e8b066)
  • duckdb: in-memory objects registered with duckdb show up in list_tables (7772f79)
  • duckdb: quote identifiers if necessary in struct_pack (6e598cc)
  • duckdb: support casting to unsigned integer types (066c158)
  • duckdb: treat g re_replace flag as literal text (aa3c31c)
  • duckdb: workaround an ownership bug at the interaction of duckdb, pandas and pyarrow (2819cff)
  • duckdb: workaround duckdb bug that prevents multiple substitutions (0e09220)
  • imports: remove top-level import of sqlalchemy from base backend (b13cf25)
  • io: add read_parquet and read_csv to base backend mixin (ce80d36), closes #5420
  • ir: incorrect predicate pushdown (9a9204f)
  • ir: make find_subqueries return in topological order (3587910)
  • ir: properly raise error if literal cannot be coerced to a datatype (e16b91f)
  • ir: reorder the right schema of set operations to align with the left schema (58e60ae)
  • ir: use rlz.map_to() rule instead of isin to normalize temporal units (a1c46a2)
  • ir: use static connection pooling to prevent dropping temporary state (6d2ae26)
  • mssql: set sqlglot to tsql (1044573)
  • mysql: remove invalid operations (8f34a2b)
  • pandas/dask: handle non numpy scalar results in wrap_case_result (a3b82f7)
  • pandas: don't try to dispatch on arrow dtype if not available (d22ae7b)
  • pandas: handle casting to arrays with None elements (382b90f)
  • pandas: handle NAs in array conversion (06bd15d)
  • polars: back compat for concat_str separator argument (ced5a61)
  • polars: back compat for the reverse/descending argument (f067d81)
  • polars: ensure that polars execute respects the limit kwarg (d962faf)
  • polars: properly infer polars categorical dtype (5a4707a)
  • polars: use metric name in aggregate output to dedupe columns (234d8c1)
  • pyspark: fix incorrect ops.EndsWith translation rule (4c0a5a2)
  • pyspark: fix isnan and isinf to work on bool (8dc623a)
  • snowflake: allow loose casting of objects and arrays (1cf8df0)
  • snowflake: ensure that memtables are translated correctly (b361e07)
  • snowflake: ensure that null comparisons are correct (9b83699)
  • snowflake: ensure that quoting matches snowflake behavior, not sqlalchemy (b6b67f9)
  • snowflake: ensure that we do not try to use a None schema or database (03e0265)
  • snowflake: handle the case where pyarrow isn't installed (b624fa3)
  • snowflake: make array_agg preserve nulls (24b95bf)
  • snowflake: quote column names on construction of sa.Column (af4db5c)
  • snowflake: remove broken pyarrow fetch support (c440adb)
  • snowflake: return NULL when trying to call map functions on non-object JSON (d85fb28)
  • snowflake: use _flatten to avoid overriding unrelated function in other backends (8c31594)
  • sqlalchemy: ensure that isin contains full column expression (9018eb6)
  • sqlalchemy: get builtin dialects working; mysql/mssql/postgres/sqlite (d2356bc)
  • sqlalchemy: make strip family of functions behave like Python (dd0a04c)
  • sqlalchemy: reflect most recent schema when view is replaced (62c8dea)
  • sqlalchemy: use sa.true instead of Python literal (8423eba)
  • sqlalchemy: use indexed group by key references everywhere possible (9f1ddd8)
  • sql: ensure that set operations generate valid sql in the presence of additional constructs such as sort keys (3e2c364)
  • sqlite: explicitly disallow arrays in literals (de73b37)
  • sqlite: fix random scalar range (26d0dde)
  • support negative string indices (f84a54d)
  • trino: workaround broken dialect (b502faf)
  • types: fix argument types of Table.order_by() (6ed3a97)
  • util: make convert_unit work with python types (cb3a90c)
  • ux: give the value_counts aggregate column a better name (abab1d7)
  • ux: make string range selectors inclusive (7071669)
  • ux: make top level set operations work (f5976b2)

Performance

  • duckdb: faster to_parquet/to_csv implementations (6071bb5)
  • fix duckdb insert-from-dataframe performance (cd27b99)
  • deps: bump minimum required version of parsy (22020cb)
  • remove spark alias to pyspark and associated cruft (4b286bd)

Refactors

  • analysis: slightly simplify find_subqueries() (ab3712f)
  • backend: normalize exceptions (065b66d)
  • clickhouse: clean up parsing rules (6731772)
  • common: move frozendict and DotDict to ibis.common.collections (4451375)
  • common: move the geospatial module to the base SQL backend (3e7bfa3)
  • dask: remove unneeded create_table() (86885a6)
  • datatype: clean up parsing rules (c15fb5f)
  • datatype: remove Category type and related APIs (bb0ee78)
  • datatype: remove StructType.pairs property in favor of identical fields attribute (6668122)
  • datatypes: move sqlalchemy datatypes to specfic backend (d7b49eb)
  • datatypes: remove String parent type from JSON type (34f3898)
  • datatype: use a dictionary to store StructType fields rather than names and types tuples (84455ac)
  • datatype: use lazy dispatch when inferring pandas Timedelta objects (e5280ea)
  • drop limit kwarg from to_parquet/to_csv (a54460c)
  • duckdb: clean up parsing rules (30da8f9)
  • duckdb: handle parsing timestamp scale (16c1443)
  • duckdb: remove unused list<...> parsing rule (f040b86)
  • duckdb: use a proper sqlalchemy construct for structs and reduce casting (8daa4a1)
  • ir/api: introduce window frame operation and revamp the window API (2bc5e5e)
  • ir/backends: remove various deprecated functions and methods (a8d3007)
  • ir: reorganize the scope and timecontext utilities (80bd494)
  • ir: update ArrayMap to use the new callable_with validation rule (560474e)
  • move pretty repr tests back to their own file (4a75988)
  • nix: clean up marker argument construction (12eb916)
  • postgres: clean up datatype parsing (1f61661)
  • postgres: clean up literal arrays (21b122d)
  • pyspark: remove another private function (c5081cf)
  • remove unnecessary top-level rich console (8083a6b)
  • rules: remove unused non_negative_integer and pair rules (e00920a)
  • schema: remove deprecated Schema.from_dict(), .delete() and .append() methods (8912b24)
  • snowflake: remove the need for parsy (c53403a)
  • sqlalchemy: set session parameters once per connection (ed4b476)
  • sqlalchemy: use backend-specific startswith/endswith implementations (6101de2)
  • test_sqlalchemy.py: move to snapshot testing (96998f0)
  • tests: reorganize rules test file to the ibis.expr subpackage (47f0909)
  • tests: reorganize schema test file to the ibis.expr subpackage (40033e1)
  • tests: reorganize datatype test files to the datatypes subpackage (16199c6)
  • trino: clean up datatype parsing (84c0e35)
  • ux: return expression from Table.info (71cc0e0)
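
One user-visible change in the list above is the removal of `StructType.pairs` in favor of a `fields` attribute; a rough sketch of reading that information, assuming the long-standing `Struct.from_tuples` constructor and the `names`/`types` properties still behave as before:

```python
import ibis.expr.datatypes as dt

ty = dt.Struct.from_tuples([("a", dt.int64), ("b", dt.string)])

# field name -> type information now lives on .fields (the old .pairs property
# exposed the same data and has been removed); names/types remain available
print(ty.fields)
print(list(zip(ty.names, ty.types)))
```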

Deprecations

  • api: deprecate summary API (e449c07)
  • api: mark ibis.sequence() for removal (3589f80)

Documentation

  • add a bunch of string expression examples (18d3112)
  • add Apache Druid to backend matrix (764d9c3)
  • add CNAME file to mkdocs source (6d19111)
  • add druid to the backends index docs page (ad0b6a3)
  • add missing DataFusion entry to the backends in the README (8ce025a)
  • add redirects for common old pages (c9087f2)
  • api: document deferred API and its pitfalls (8493604)
  • api: improve collect method API documentation (b4fcef1)
  • array expression examples (6812c17)
  • backends: document default backend configuration (6d917d3)
  • backends: link to configuration from the backends list (144044d)
  • blog: blog on ibis + substrait + duckdb (5dc7a0a)
  • blog: add examples sneak peek blog post + assets folder (fcbb3d5)
  • blog: add to-file sneak peek blog post (128194f)
  • blog: specify parsy 2.0 in substrait blog article (c264477)
  • bump query engine count in README and use project-preferred names (11169f7)
  • don't sort backends by coverage percentage by default (68f73b1)
  • drop docs versioning (d7140e7)
  • duckdb: fix broken docstring examples (51084ad)
  • enable light/dark mode toggle in docs (b9e812a)
  • fill out table API with working examples (16fc8be)
  • fix notebook logging example (04b75ef)
  • how-to: fix sessionize.md to use ibis.read_parquet (ff9cbf7)
  • improve Expr.substitute() docstring (b954edd)
  • improve/update pandas walkthrough (80b05d8)
  • io: doc/ux improvements for read_parquet and friends (2541556), closes #5420
  • io: update README.md to recommend installing duckdb as default backend (0a72ec0), closes #5423 #5420
  • move tutorial from docs to external ibis-examples repo (11b0237)
  • parquet: add docstring examples for to_parquet incl. partitioning (8040164)
  • point to ibis-examples repo in the README (1205636)
  • README.md: clean up readme, fix typos, alter the example (383a3d3)
  • remove duplicate "or" (b6ef3cc)
  • remove duplicate spark backend in install docs (5954618)
  • render __dunder__ method API documentation (b532c63)
  • rerender ci-analysis notebook with new table header colors (50507b6)
  • streamlit: fix url for support matrix (594199b)
  • tutorial: remove impala from sql tutorial (7627c13)
  • use teal for primary & accent colors (24be961)

4.1.0

1 year ago

4.1.0 (2023-01-25)

Features

  • add ibis.get_backend function (2d27df8)
  • add py.typed to allow mypy to type check packages that use ibis (765d42e)
  • api: add ibis.set_backend function (e7fabaf)
  • api: add selectors for easier selection of columns (306bc88); see the sketch after this list
  • bigquery: add JS UDF support (e74328b)
  • bigquery: add SQL UDF support (db24173)
  • bigquery: add to_pyarrow method (30157c5)
  • bigquery: implement bitwise operations (55b69b1)
  • bigquery: implement ops.Typeof (b219919)
  • bigquery: implement ops.ZeroIfNull (f4c5607)
  • bigquery: implement struct literal (c5f2a1d)
  • clickhouse: properly support native boolean types (31cc7ba)
  • common: add support for annotating with coercible types (ae4a415)
  • common: make frozendict truly immutable (1c25213)
  • common: support annotations with typing.Literal (6f89f0b)
  • common: support generic mapping and sequence type annotations (ddc6603)
  • dask: support connect() with no arguments (67eed42)
  • datatype: add optional timestamp scale parameter (a38115a)
  • datatypes: add as_struct method to convert schemas to structs (64be7b1)
  • duckdb: add read_json function for consuming newline-delimited JSON files (65e65c1)
  • mssql: add a bunch of missing types (c698d35)
  • mssql: implement inference for DATETIME2 and DATETIMEOFFSET (aa9f151)
  • nicer repr for Backend.tables (0d319ca)
  • pandas: support connect() with no arguments (78cbbdd)
  • polars: allow ibis.polars.connect() to function without any arguments (d653a07)
  • polars: handle casting to scaled timestamps (099d1ec)
  • postgres: add Map(string, string) support via the built-in HSTORE extension (f968f8f)
  • pyarrow: support conversion to pyarrow map and struct types (54a4557)
  • snowflake: add more array operations (8d8bb70)
  • snowflake: add more map operations (7ae6e25)
  • snowflake: any/all/notany/notall reductions (ba1af5e)
  • snowflake: bitwise reductions (5aba997)
  • snowflake: date from ymd (035f856)
  • snowflake: fix array slicing (bd7af2a)
  • snowflake: implement ArrayCollect (c425f68)
  • snowflake: implement NthValue (0dca57c)
  • snowflake: implement ops.Arbitrary (45f4f05)
  • snowflake: implement ops.StructColumn (41698ed)
  • snowflake: implement StringSplit (e6acc09)
  • snowflake: implement StructField and struct literals (286a5c3)
  • snowflake: implement TimestampFromUNIX (314637d)
  • snowflake: implement TimestampFromYMDHMS (1eba8be)
  • snowflake: implement typeof operation (029499c)
  • snowflake: implement exists/not exists (7c8363b)
  • snowflake: implement extract millisecond (3292e91)
  • snowflake: make literal maps and params work (dd759d3)
  • snowflake: regex extract, search and replace (9c82179)
  • snowflake: string to timestamp (095ded6)
  • sqlite: implement _get_schema_using_query in SQLite backend (7ff84c8)
  • trino: compile timestamp types with scale (67683d3)
  • trino: enable ops.ExistsSubquery and ops.NotExistsSubquery (9b9b315)
  • trino: map parameters (53bd910)
  • ux: improve error message when column is not found (b527506)
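
The selectors feature above is the most ergonomics-focused addition in this release; a minimal sketch, assuming `ibis.selectors` provides dtype-based selectors such as `numeric` and `of_type`, with an illustrative schema:

```python
import ibis
import ibis.selectors as s

t = ibis.table(
    [("user_id", "int64"), ("name", "string"), ("score", "float64")],
    name="t",
)

# select columns by kind instead of listing them explicitly
numeric_part = t.select(s.numeric())          # user_id, score
string_part = t.select(s.of_type("string"))   # name
```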

Bug Fixes

  • backend: read the default backend setting in _default_backend (11252af)
  • bigquery: move connection logic to do_connect (42f2106)
  • bigquery: remove invalid operations from registry (911a080)
  • bigquery: resolve deprecation warnings for StructType and Schema (c9e7078)
  • clickhouse: fix position call (702de5d)
  • correctly visualize array type (26b0b3f)
  • deps: make sure pyarrow is not an implicit dependency (10373f4)
  • duckdb: make read_csv on URLs work (9e61816)
  • duckdb: only try to load extensions when necessary for csv (c77bde7)
  • duckdb: remove invalid operations from registry (ba2ec59)
  • fallback to default backend with to_pyarrow/to_pyarrow_batches (a1a6902); see the sketch after this list
  • impala: remove broken alias elision (32b120f)
  • ir: error for order_by on nonexistent column (57b1dd8)
  • ir: ops.Where output shape should consider all arguments (6f87064)
  • mssql: infer bit as boolean everywhere (24f9d7c)
  • mssql: pull nullability from column information (490f8b4)
  • mysql: fix mysql query schema inference (12f6438)
  • polars: remove non-working Binary and Decimal literal inference (0482d15)
  • postgres: use permanent views to avoid connection pool defeat (49a4991)
  • pyspark: fix substring constant translation (40d2072)
  • set ops: raise if no tables passed to set operations (bf4bdde)
  • snowflake: bring back bitwise operations (260facd)
  • snowflake: don't always insert a cast (ee8817b)
  • snowflake: implement working TimestampNow (42d95b0)
  • snowflake: make sqlalchemy 2.0 compatible (8071255)
  • snowflake: re-enable ops.TableArrayView (a1ad2b7)
  • snowflake: remove invalid operations from registry (2831559)
  • sql: add typeof test and bring back implementations (7dc5356)
  • sqlalchemy: 2.0 compatibility (837a736)
  • sqlalchemy: fix view creation with select stmts that have bind parameters (d760e69)
  • sqlalchemy: handle correlated exists sanely (efa42bd)
  • sqlalchemy: handle generic geography/geometry by name instead of geotype (23c35e1)
  • sqlalchemy: use exec_driver_sql in view teardown (2599c9b)
  • sqlalchemy: use the backend's compiler instead of AlchemyCompiler (9f4ff54)
  • sql: fix broken call to ibis.map (045edc7)
  • sqlite: interpolate pathlib.Path correctly in attach (0415bd3)
  • trino: ensure connecting works with trino 0.321 (07cee38)
  • trino: remove invalid operations from registry (665265c)
  • ux: remove extra trailing newline in expression repr (ee6d58a)
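
A sketch of the default-backend fallback restored by the fix flagged in the list above, assuming pyarrow and a usable default backend (for example DuckDB) are installed; the data is illustrative:

```python
import ibis

# a memtable is not bound to any particular backend
t = ibis.memtable({"x": [1, 2, 3]})

# with no backend attached to the expression, these fall back to the
# configured default backend instead of raising
arrow_table = t.to_pyarrow()
batch_reader = t.to_pyarrow_batches()
```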

Documentation

  • add BigQuery backend docs (09d8995)
  • add streamlit app for showing the backend operation matrix (3228f64)
  • allow deselecting geospatial ops in backend support matrix (012da8c)
  • api: document more public expression APIs (337018f)
  • backend-info: prevent app from trying to install duckdb extensions (3d94082)
  • clean up gen_matrix.py after adding streamlit app (deb80f2)
  • duckdb: add to_pyarrow_batches documentation (ec1ffce)
  • embed streamlit operation matrix app to docs (469a50d)
  • make firefox render the proper iframe height (ff1d4dc)
  • publish raw data for operation matrix (62e68da)
  • re-order when to download test data (8ce8c16)
  • release: update breaking changes in the release notes for 4.0.0 (4e91401)
  • remove trailing parenthesis (4294397)
  • update ibis-version-4.0.0-release.md (f6701df)
  • update links to contributing guides (da615e4)

Refactors

  • bigquery: explicitly disallow INT64 in JS UDF (fb33bf9)
  • datatype: add custom sqlalchemy nested types for backend differentiation (dec70f5)
  • datatype: introduce to_sqla_type dispatching on dialect (a8bbc00)
  • datatypes: remove Geography and Geometry types in favor of GeoSpatial (d44978c)
  • datatype: use a mapping to store StructType fields rather than names and types tuples (ff34c7b)
  • dtypes: expose nbytes property for integer and floating point datatypes (ccf80fd)
  • duckdb: remove .raw_sql call (abc939e)
  • duckdb: use sqlalchemy-views to reduce string hacking (c162750)
  • ir: remove UnnamedMarker (dd352b1)
  • postgres: use a bindparam for metadata queries (b6b4669)
  • remove empty unused file (9d63fd6)
  • schema: use a mapping to store Schema fields rather than names and types tuples (318179a)
  • simplify _find_backend implementation (60f1a1b)
  • snowflake: remove unnecessary parse_json call in ops.StructField impl (9e80231)
  • snowflake: remove unnecessary casting (271554c)
  • snowflake: use unary instead of fixed_arity(..., 1) (4a1c7c9)
  • sqlalchemy: clean up quoting implementation (506ce01)
  • sqlalchemy: generalize handling of failed type inference (b0f4e4c)
  • sqlalchemy: move _get_schema_using_query to base class (296cd7d)
  • sqlalchemy: remove the need for deferred columns (e4011aa)
  • sqlalchemy: remove use of deprecated isnot (4ec53a4)
  • sqlalchemy: use exec_driver_sql everywhere (e8f96b6)
  • sql: finally remove _CorrelatedRefCheck (f49e429)

Deprecations

  • api: deprecate .to_projection in favor of .as_table (7706a86)
  • api: deprecate get_column/s in favor of __getitem__/__getattr__ syntax (e6372e2); see the migration sketch at the end of these notes
  • ir: schedule DatabaseTable.change_name for removal (e4bae26)
  • schema: schedule Schema.delete() and Schema.append() for removal (45ac9a9)
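
Both API deprecations above have direct replacements; a small migration sketch with an illustrative table:

```python
import ibis

t = ibis.table([("a", "int64"), ("b", "string")], name="t")

# instead of the deprecated t.get_column("a") / t.get_columns(...):
col = t.a       # attribute access
same = t["a"]   # item access

# instead of the deprecated .to_projection(), lift a value expression
# to a table with .as_table()
totals = t.a.sum().as_table()
```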