pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
wr.cloudwatch queries by @LeonLuttenberger in #2430
athena.to_iceberg wait_query by @jaidisido in #2428
athena.to_iceberg by @jaidisido in #2446
wr.s3.to_parquet by @kukushking in #2455
Full Changelog: https://github.com/aws/aws-sdk-pandas/compare/3.3.0...3.4.0
cleanrooms.wait_query by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2381
Full Changelog: https://github.com/aws/aws-sdk-pandas/compare/3.2.1...3.3.0
No module named 'pyarrow._orc' by @LeonLuttenberger in #2341 #2337
packaging version requirement by @LeonLuttenberger in #2340
Full Changelog: https://github.com/aws/aws-sdk-pandas/compare/3.2.0...3.2.1
s3.read_orc and s3.to_orc by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2312
wr.athena.create_spark_session & wr.athena.run_spark_calculation by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2314
to_sql for RDS Data API by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2287
UNLOAD by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2284
allowed_to_use and allowed_to_manage when creating QuickSight resources by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2278
PARTITIONED BY and additional table properties support by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2322
s3.read_parquet by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2328
test_spectrum_decimal_cast by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2283
dtype_backend use in read_parquet_table by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2307
register_func to handle type checking by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2309
Full Changelog: https://github.com/aws/aws-sdk-pandas/compare/3.1.1...3.2.0
packaging dependency by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2281
Full Changelog: https://github.com/aws/aws-sdk-pandas/compare/3.1.0...3.1.1
neptune.bulk_load for bulk loading data into Neptune by @LeonLuttenberger in #2238 #2267
s3.to_deltalake function by @LeonLuttenberger in #2228
chunked parameter to DynamoDB read functions by @LeonLuttenberger in #2227
ignore_metadata to False by default by @jaidisido in #2206
path_ignore_suffix by @LeonLuttenberger in #2240
test_spectrum_decimal_cast test by @LeonLuttenberger in #2244
emr.create_cluster was not passing security configuration to internal method by @malachi-constant in #2246
timestream.list_tables by @SukruHan in #2275
layers.rst with Python 3.10 layers by @LeonLuttenberger in #2219
pyi files by @LeonLuttenberger in #2229 #2255 #2256
Full Changelog: https://github.com/aws/aws-sdk-pandas/compare/3.0.0...3.1.0
pip install awswrangler[<MODULE_NAME>], for example pip install awswrangler[redshift]
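Because service-specific dependencies are installed as optional extras, calling code may want to check for a driver before relying on it. The following is a minimal sketch, not part of the library: `has_module` is a hypothetical helper, and the module name `redshift_connector` is an assumption about what the redshift extra installs.

```python
import importlib.util


def has_module(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the redshift extra provides a module named "redshift_connector".
    if has_module("redshift_connector"):
        print("Redshift support is available")
    else:
        print("Run: pip install awswrangler[redshift]")
```

Using `find_spec` avoids importing the driver just to test for its presence.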
dt.datetime is parsed into DATETIME xxxx-xx-xx xx:xx:xx, while a parameter of type str is formatted into "x"
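The type-based formatting described above can be illustrated with a small sketch. This is not the library's implementation: `format_param` is a hypothetical function, and the output patterns are copied literally from the note (a `DATETIME` prefix for `dt.datetime`, quotes around `str`).

```python
import datetime as dt


def format_param(value: object) -> str:
    """Hypothetical formatter: render a Python value as query text based on its type."""
    if isinstance(value, dt.datetime):
        # Pattern from the note: DATETIME xxxx-xx-xx xx:xx:xx
        return f"DATETIME {value:%Y-%m-%d %H:%M:%S}"
    if isinstance(value, str):
        # Strings are wrapped in quotes, as in the note; no escaping is attempted here.
        return f'"{value}"'
    raise TypeError(f"Unsupported parameter type: {type(value).__name__}")


if __name__ == "__main__":
    print(format_param(dt.datetime(2023, 1, 2, 3, 4, 5)))
    print(format_param("x"))
```

The sketch does no escaping of quotes inside strings, so it is suitable only for illustrating the dispatch on type.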
TypeDict by @LeonLuttenberger and @kukushking in #1855 #1996 #2016 #2055 #2081
to_parquet, to_csv and to_json
wr.s3.merge_upsert_table by @kukushking in #2076 ⚠️
updated_name parameter in update_ruleset by @jaidisido in #2122 ⚠️
AWS SDK for pandas can now run at scale
use_threads parameter to dynamodb.read_items by @LeonLuttenberger in #2113
wr.dynamodb.put_df with executor task by @LeonLuttenberger in #2118
DatabaseInput by @malachi-constant in #2067
timestream.create_table by @cnfait in #1819
_read_parquet_metadata_file function based on the PyArrow file system by @LeonLuttenberger in #2050
@Experimental and @Deprecated annotations by @kukushking in #2062
describe_objects by @jaidisido in #2069
bulk_read option for reading large amounts of Parquet files quickly by @LeonLuttenberger in #2033
s3.to_json and s3.to_csv by @LeonLuttenberger in #1631
s3.read_csv, s3.read_json and s3.read_fwf by @LeonLuttenberger in #1567 #1607
s3.wait_objects by @LeonLuttenberger in #1539
s3.to_parquet by @kukushking in #1526
s3.delete_objects by @malachi-constant in #1474
s3.read_parquet by @jaidisido in #1513
s3.select_query by @kukushking in #1446
Literal typing for mode and projection_types by @LeonLuttenberger in #2191
read_parquet_metadata_distributed by @jaidisido in #2196
utcnow argument in start_query by @LeonLuttenberger in #2193
awswrangler.distributed from coverage report by @LeonLuttenberger in #1884
Full Changelog: https://github.com/aws/aws-sdk-pandas/compare/2.20.1...3.0.0
chunksize=True by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2087
to_csv and to_json by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2104
Full Changelog: https://github.com/aws/aws-sdk-pandas/compare/2.20.0...2.20.1
names parameter support to PyArrow reading by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2008
_read_parquet_metadata_file function based on the PyArrow file system by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2050
bulk_read option for reading large amounts of Parquet files quickly by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2033
parallelism and bulk_read into ray_modin_args by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2081
awswrangler.distributed from coverage report by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1884
test_modin_s3_read_parquet_many_files by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2096
Full Changelog: https://github.com/aws/aws-sdk-pandas/compare/3.0.0rc2...3.0.0rc3
dynamodb.read_partiql no longer performs a Scan operation under the hood. Instead, it uses the ExecuteStatement API, which means the PartiQL* IAM permission is now required instead of Scan.
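To make the permission change concrete, here is a minimal sketch of an IAM policy document that grants a PartiQL read permission on a single table. The helper name `partiql_read_policy`, the example ARN, and the choice of `dynamodb:PartiQLSelect` as the only granted action are illustrative assumptions; the permissions a given workload needs may differ.

```python
import json


def partiql_read_policy(table_arn: str) -> dict:
    """Build an IAM policy document allowing PartiQL SELECT statements on one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: read-only access; other PartiQL* actions cover writes.
                "Action": ["dynamodb:PartiQLSelect"],
                "Resource": [table_arn],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical table ARN used purely for illustration.
    arn = "arn:aws:dynamodb:us-east-1:123456789012:table/example-table"
    print(json.dumps(partiql_read_policy(arn), indent=2))
```

A role that previously had only `dynamodb:Scan` would need a statement along these lines added before upgrading.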
ExecuteStatement instead of Scan for DynamoDB read_partiql by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1964
xfail's in tests by @malachi-constant in https://github.com/aws/aws-sdk-pandas/pull/1930
Full Changelog: https://github.com/aws/aws-sdk-pandas/compare/2.19.0...2.20