Releases: aws/aws-sdk-pandas

AWS SDK for pandas 2.17.0

20 Sep 23:11
3bcd8d3

Enhancements

  • Returning empty DataFrame for empty TimeStream query #1430
  • Added support for INSERT IGNORE for mysql.to_sql #1429
  • Added use_column_names to redshift.copy akin to redshift.to_sql #1437
  • Enable passing kwargs to redshift.connect #1467
  • Add timestream_endpoint_url property to the config #1483
  • Add support for upserting to an empty Glue table #1579

Documentation

  • Fix typos in documentation #1434

Bug Fix

  • validate_schema=True for wr.s3.read_parquet breaks with partition columns and dataset=True #1426
  • wr.neptune.to_property_graph failing for Neptune version 1.1.1.0 #1407
  • ValueError when using opensearch.index_df with documents with an array field #1444
  • Missing catalog_id in wr.catalog.create_database #1480
  • Check for pair of brackets in query preparation for Athena cache #1529
  • Fix wrong type hint for TagColumnOperation in quicksight.create_athena_dataset #1570
  • s3.to_json compression parameters is passed twice when dataset=True #1585
  • Cast Athena array, map & struct types to pandas object #1581
  • In the OpenSearch module, use SSL only for HTTPS (port 443) #1603
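One practical consequence of the nested-type fix above (#1581) is that values read from Athena `array`, `map`, and `struct` columns now land in pandas columns of `object` dtype, holding native Python lists and dicts. A minimal local sketch of what such a frame looks like (plain pandas, no AWS call involved):

```python
import pandas as pd

# Columns holding native Python lists and dicts carry the generic
# "object" dtype, mirroring how nested Athena types are now surfaced.
df = pd.DataFrame({
    "tags": [["a", "b"], ["c"]],      # e.g. an Athena array<string> column
    "attrs": [{"k": 1}, {"k": 2}],    # e.g. an Athena map<string,int> column
})
print(df["tags"].dtype, df["attrs"].dtype)  # object object
```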

Noteworthy

AWS Lambda Managed Layers

Since the last release, the library has been accepted as an official AWS SDK and rebranded as AWS SDK for pandas 🚀. The Python module names remain the same. One noteworthy change, however, is that the AWS Lambda managed layer has been renamed from AWSDataWrangler to AWSSDKPandas.

You can view the ARN value for the layers here.

PyArrow 7 Support

⚠️ For platforms without PyArrow 7 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Thanks

We thank the following contributors/users for their work on this release:

@bechbd, @maxispeicher, @timgates42, @aeeladawy, @KhueNgocDang, @szemek, @malachi-constant, @cnfait, @jaidisido, @LeonLuttenberger, @kukushking

3.0.0a2

17 Aug 10:35
b471c5c
Pre-release

This is a pre-release for the Wrangler@Scale project

What's Changed

Full Changelog: 3.0.0a1...3.0.0a2

3.0.0a1

17 Aug 10:06
b4d13bf
Pre-release

This is a pre-release for the Wrangler@Scale project

What's Changed

  • (feat): Add distributed config flag and initialise method by @jaidisido in #1389
  • (feat): Add distributed Lake Formation read by @jaidisido in #1397
  • (feat): Distribute S3 select over multiple paths and scan ranges by @jaidisido in #1445
  • (refactor): Refactor threading/ray; add single-path distributed s3 select impl by @kukushking in #1446

Full Changelog: 2.16.1...3.0.0a1

2.16.1

28 Jun 16:39

Noteworthy

🐛 Fixed issue introduced by 2.16.0 to method s3.read_parquet()

Patch

  • Fix bug: pq_file.schema.names(): TypeError: 'list' object is not callable s3.read_parquet() #1412

P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our public S3 bucket!

Full Changelog: 2.16.0...2.16.1

AWS Data Wrangler 2.16.0

22 Jun 18:21

Noteworthy

⚠️ For platforms without PyArrow 7 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

New Functionalities

  • Add support for Oracle Database 🔥 #1259. Check out the tutorial.

Enhancements

  • add test infrastructure for oracle database #1274
  • revisiting S3 Select performance #1287
  • migrate test infra from cdk v1 to cdk v2 #1288
  • to_sql() make column names quoted identifiers to allow sql keywords #1392
  • throw NoFilesFound exception on 404 #1290
  • fast executemany #1299
  • add precombine key to upsert method for Redshift #1304
  • pass precombine to redshift.copy() #1319
  • use DataFrame column names in INSERT statement for UPSERT operation #1317
  • add data_source param to athena.repair_table #1324
  • modify athena2quicksight datatypes to allow startswith for varchar #1332
  • add TagColumnOperation to quicksight.create_athena_dataset #1342
  • enable list timestream databases and tables #1345
  • enable s3.to_parquet to receive "zstd" compression type #1369
  • create a way to perform PartiQL queries to a Dynamo DB table #1390
  • s3 proxy support with data wrangler #1361
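The quoted-identifier change (#1392) means generated SQL wraps column names in quotes, so reserved words such as `order` or `select` can be used as column names. A hypothetical sketch of the idea, not the library's actual implementation:

```python
# Hypothetical sketch: quote identifiers so reserved words are safe in
# a generated INSERT statement. Embedded quote characters are doubled,
# per standard SQL escaping.
def quote_identifier(name: str, quote: str = '"') -> str:
    return quote + name.replace(quote, quote * 2) + quote

columns = ["id", "order", "select"]
insert = "INSERT INTO t ({}) VALUES ({})".format(
    ", ".join(quote_identifier(c) for c in columns),
    ", ".join(["%s"] * len(columns)),
)
print(insert)  # INSERT INTO t ("id", "order", "select") VALUES (%s, %s, %s)
```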

Documentation

  • be more explicit about awswrangler.s3.to_parquet overwrite behavior #1300
  • fix Python Version in Readme #1302

Bug Fix

  • set encoding to utf-8 when no encoding is specified when reading/writing to s3 #1257
  • fix Redshift Locking Behavior #1305
  • specify cfn deletion policy for sqlserver and oracle instances #1378
  • to_sql() make column names quoted identifiers to allow sql keywords #1392
  • fix extension dtype index handling #1333
  • fix issue with redshift.to_sql() method when mode set to "upsert" and schema contains a hyphen #1360
  • timestream - array cols to str #1368
  • read_parquet Does Not Throw Error for Missing Column #1370

Thanks

We thank the following contributors/users for their work on this release:

@bnimam, @IldarAlmakaev, @syokoysn, @thomasniebler, @maxdavidson91, @takeknock, @Sleekbobby1011, @snikolakis, @willsmith28, @malachi-constant, @cnfait, @jaidisido, @kukushking


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our public S3 bucket!

AWS Data Wrangler 2.15.1

11 Apr 15:35
7708c80

Noteworthy

⚠️ Dropped Python 3.6 support

⚠️ For platforms without PyArrow 7 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Patch

  • Add sparql extra & make SPARQLWrapper dependency optional #1252

P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our public S3 bucket!

AWS Data Wrangler 2.15.0

28 Mar 14:36

Noteworthy

⚠️ Dropped Python 3.6 support

⚠️ For platforms without PyArrow 7 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Enhancements

  • Timestream module - support multi-measure records #1214
  • Warnings for implicit float conversion of nulls in to_parquet #1221
  • Support additional sql params in Redshift COPY operation #1210
  • Add create_ctas_table to Athena module #1207
  • S3 Proxy support #1206
  • Add Athena get_named_query_statement #1183
  • Add manifest parameter to 'redshift.copy_from_files' method #1164
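The null-conversion warning above (#1221) concerns a pandas behavior worth knowing: a plain integer column that contains a null is silently upcast to float64, which then changes what gets written to Parquet. A quick local demonstration (plain pandas, no awswrangler call):

```python
import pandas as pd

# A plain integer column containing a null is silently upcast to
# float64 by pandas -- the implicit conversion the new warning flags.
upcast = pd.Series([1, 2, None])
print(upcast.dtype)  # float64

# The nullable Int64 extension dtype keeps the integers intact instead.
kept = pd.Series([1, 2, None], dtype="Int64")
print(kept.dtype)  # Int64
```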

Documentation

  • Update install section #1242
  • Update lambda layers section #1236

Bug Fix

  • Give precedence to user path for Athena UNLOAD S3 Output Location #1216
  • Honor User specified workgroup in athena.read_sql_query with unload_approach=True #1178
  • Support map type in Redshift copy #1185
  • data_api.rds.read_sql_query() does not preserve data type when column is all NULLS - switches to Boolean #1158
  • Allow decimal values within struct when writing to parquet #1179

Thanks

We thank the following contributors/users for their work on this release:

@bechbd, @sakti-mishra, @mateogianolio, @jasadams, @malachi-constant, @cnfait, @jaidisido, @kukushking


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our public S3 bucket!

AWS Data Wrangler 2.14.0

28 Jan 14:24
7604507

Caveats

⚠️ For platforms without PyArrow 6 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

New Functionalities

  • Support Athena Unload 🚀 #1038

Enhancements

  • Add the ExcludeColumnSchema=True argument to the glue.get_partitions call to reduce response size #1094
  • Add PyArrow flavor argument to write_parquet via pyarrow_additional_kwargs #1057
  • Add rename_duplicate_columns and handle_duplicate_columns flag to sanitize_dataframe_columns_names method #1124
  • Add timestamp_as_object argument to all database read_sql_table methods #1130
  • Add ignore_null to read_parquet_metadata method #1125
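The `timestamp_as_object` argument above exists because pandas' default `datetime64[ns]` dtype cannot represent dates much past the year 2262; keeping values as plain Python `datetime` objects sidesteps that limit. A minimal local illustration of the object-dtype representation (plain pandas, no database call):

```python
import datetime as dt
import pandas as pd

# Far-future dates overflow datetime64[ns]; storing them with dtype
# "object" keeps them as plain Python datetime values instead, which is
# the representation timestamp_as_object selects.
far_future = dt.datetime(9999, 12, 31)
s = pd.Series([far_future], dtype="object")
print(s.iloc[0].year)  # 9999
```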

Documentation

  • Improve documentation on installing SAR Lambda layers with the CDK #1097
  • Fix broken link to tutorial in to_parquet method #1058

Bug Fix

  • Ensure that partition locations retrieved from AWS Glue always end in a "/" #1094
  • Fix bucketing overflow issue in Athena #1086

Thanks

We thank the following contributors/users for their work on this release:

@dennyau, @kailukowiak, @lucasmo, @moykeen, @RigoIce, @vlieven, @kepler, @mdavis-xyz, @ConstantinoSchillebeeckx, @kukushking, @jaidisido


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our public S3 bucket!

AWS Data Wrangler 2.13.0

03 Dec 20:09

Caveats

⚠️ For platforms without PyArrow 6 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Breaking changes

  • Fix sanitize methods to align with Glue/Hive naming conventions #579
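Glue/Hive naming conventions expect lower-case names restricted to letters, digits, and underscores. A hypothetical sketch of that style of sanitization, for illustration only (the library's actual rules may differ in detail):

```python
import re

# Hypothetical sketch of Glue/Hive-style name sanitization: lower-case
# the name and replace runs of characters outside [a-z0-9_] with "_".
def sanitize_name(name: str) -> str:
    return re.sub(r"[^a-z0-9_]+", "_", name.lower())

print(sanitize_name("Camel Case-Column"))  # camel_case_column
```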

New Functionalities

  • AWS Lake Formation Governed Tables 🚀 #570
  • Support for Python 3.10 🔥 #973
  • Add partitioning to JSON datasets #962
  • Add ability to use unbuffered cursor for large MySQL datasets #928

Enhancements

  • Add awswrangler.s3.list_buckets #997
  • Add partitions_parameters to catalog partitions methods #1035
  • Refactor pagination config in list objects #955
  • Add error message to EmptyDataframe exception #991

Documentation

  • Clarify docs & add tutorial on schema evolution for CSV datasets #964

Bug Fix

  • catalog.add_column() without column_comment triggers exception #1017
  • catalog.create_parquet_table Key in dictionary does not always exist #998
  • Fix Catalog StorageDescriptor get #969

Thanks

We thank the following contributors/users for their work on this release:

@csabz09, @Falydoor, @moritzkoerber, @maxispeicher, @kukushking, @jaidisido


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our public S3 bucket!

AWS Data Wrangler 2.12.1

18 Oct 12:02
829c306

Caveats

⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Patch

  • Removing unnecessary dev dependencies from main #961

P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them from our public S3 bucket!