Unable to read delta table created using Uniform #2578

Closed
jeppe742 opened this issue Jun 7, 2024 · 4 comments · Fixed by #2685
Labels
bug Something isn't working

Comments

@jeppe742
Contributor

jeppe742 commented Jun 7, 2024

Environment

Delta-rs version:
0.17.4

Binding:
Python

Environment:

  • Cloud provider: N/A
  • OS: Ubuntu
  • Other:

Bug

What happened:
We are investigating Delta Uniform so that our Spark jobs also write Iceberg metadata.
To enable the generation of Iceberg metadata, you have to set the delta.enableIcebergCompatV2 property on the table.
When this property is set, the Delta transaction log includes some additional information.

For example, if you run the example from the Uniform documentation:

CREATE TABLE uniform_table(c1 INT) USING DELTA TBLPROPERTIES(
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg');

You will get a Delta transaction log that looks something like this:

{"commitInfo":{"timestamp":1717753754287,"operation":"CREATE TABLE","operationParameters":{"isManaged":"true","description":null,"partitionBy":"[]","properties":"{\"delta.enableIcebergCompatV2\":\"true\",\"delta.universalFormat.enabledFormats\":\"iceberg\",\"delta.columnMapping.mode\":\"name\",\"delta.columnMapping.maxColumnId\":\"1\"}"},"isolationLevel":"Serializable","isBlindAppend":true,"operationMetrics":{},"engineInfo":"Apache-Spark/3.5.1 Delta-Lake/3.1.0","txnId":"a4d4593f-835c-4d00-81d8-27c1103343d2"}}
{"metaData":{"id":"a8477f73-f004-4a08-8397-3420d4df98a2","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"c1\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"delta.columnMapping.id\":1,\"delta.columnMapping.nested.ids\":{},\"delta.columnMapping.physicalName\":\"col-fdc375c2-e5f2-44c5-a5e9-2cdafca1ddfd\"}}]}","partitionColumns":[],"configuration":{"delta.enableIcebergCompatV2":"true","delta.universalFormat.enabledFormats":"iceberg","delta.columnMapping.mode":"name","delta.columnMapping.maxColumnId":"1"},"createdTime":1717753754108}}
{"protocol":{"minReaderVersion":2,"minWriterVersion":7,"writerFeatures":["columnMapping","icebergCompatV2"]}}

If you try to read this table, you get the following error:

_internal.DeltaProtocolError: Invalid JSON in file stats: data did not match any variant of untagged enum MetadataValue at line 1 column 147

This seems to be caused by Delta adding "delta.columnMapping.nested.ids":{} to the column metadata inside the metaData schemaString, while the delta kernel doesn't support nested objects as metadata values.
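
The wording of the error is what serde reports when no variant of an untagged enum matches the input. The sketch below mimics that behaviour in Python, under the assumption that MetadataValue accepted only numbers and strings at the time; the variant list is inferred from the error message, not copied from the delta-kernel-rs source, and `parse_metadata_value` is a hypothetical name.

```python
def parse_metadata_value(value):
    """Try each assumed variant in order, the way an untagged enum decoder would."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return ("Number", value)
    if isinstance(value, str):
        return ("String", value)
    # No assumed variant matched: this is the situation the error describes.
    raise ValueError("data did not match any variant of untagged enum MetadataValue")


# Column metadata as it appears in the schemaString of the failing table.
field_metadata = {
    "delta.columnMapping.id": 1,
    "delta.columnMapping.nested.ids": {},
    "delta.columnMapping.physicalName": "col-fdc375c2-e5f2-44c5-a5e9-2cdafca1ddfd",
}

results = {}
for key, value in field_metadata.items():
    try:
        results[key] = parse_metadata_value(value)
    except ValueError as exc:
        results[key] = ("Error", str(exc))

for key, outcome in results.items():
    print(key, "->", outcome)
```

Under that assumption, the id and physicalName entries decode and only the empty object fails, which matches a single error pointing into the middle of the schema string.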

What you expected to happen:
I should be able to read a Delta table written with Uniform enabled.

How to reproduce it:

  1. Create a table with Uniform (and Iceberg) enabled
CREATE TABLE uniform_table(c1 INT) USING DELTA TBLPROPERTIES(
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg');
  2. Try to read the table
from deltalake import DeltaTable
DeltaTable("uniform_table")

More details:

@jeppe742 added the bug label Jun 7, 2024
@ion-elgreco
Collaborator

Can you try it against 0.18.0? If it still persists, then it deserves an upstream issue in the delta-kernel-rs repo.

@jeppe742
Contributor Author

jeppe742 commented Jun 7, 2024

@ion-elgreco It's also an issue with 0.18.0.
Will try to create an issue in the delta-kernel-rs repo 😃

@jeppe742
Contributor Author

Hey @ion-elgreco
Just fyi, the bug in delta-kernel-rs has finally been fixed and released in 0.2.0.
Would it be possible to bump the dependency to get the fix?

@ion-elgreco
Collaborator

Feel free to open a PR to bump it, and I'll approve it.
