Merge predicate fails with a field with a space #2167

Closed
echai58 opened this issue Feb 5, 2024 · 6 comments · Fixed by #2271
Labels
binding/python Issues for the Python package bug Something isn't working

Comments

@echai58

echai58 commented Feb 5, 2024

Environment

Delta-rs version: 0.15.1

Binding: python


Bug

What happened:
If the table schema has a field with a space in its name, merging on that field in the merge predicate raises an error. It seems the space is not parsed correctly in the merge predicate.

What you expected to happen:
Should be able to write a predicate on this column.

How to reproduce it:

from deltalake import DeltaTable, write_deltalake
import pyarrow as pa
import pandas as pd

data = pd.DataFrame.from_dict(
    {
        "a a": [],
        "b": [],
        "c": [],
    }
)
schema = pa.schema(
    [
        ("a a", pa.string()),
        ("b", pa.int32()),
        ("c", pa.int32()),
    ]
)

table = pa.Table.from_pandas(data, schema=schema)


write_deltalake(
    "test",
    table,
    mode="overwrite",
)

dt = DeltaTable("test")

new_data = pd.DataFrame.from_dict(
    {
        "a a": ["abc", "def"],
        "b": [2, 3],
        "c": [4, 5],
    }
)
new_table = pa.Table.from_pandas(new_data, schema=schema)

dt = DeltaTable("test")
dt.merge(
    source=new_table,
    predicate='s."a a" = t."a a"',
    source_alias="s",
    target_alias="t",

).when_matched_update_all().when_not_matched_insert_all().execute()

DeltaError: Generic DeltaTable error: Schema error: No field named s.a. Valid fields are s."a a", s.b, s.c, __delta_rs_source, t."a a", t.b, t.c, t.__delta_rs_path, __delta_rs_target.

It looks like it's trying to parse s."a a" as s.a instead of s."a a".

@echai58 added the bug label on Feb 5, 2024
@Blajda
Collaborator

Blajda commented Feb 6, 2024

@echai58 Please try using backticks in this case, e.g.:

target.`a a` = source.`a a`

@echai58
Author

echai58 commented Feb 6, 2024

@echai58 Please try using backticks in this case, e.g.:

target.`a a` = source.`a a`

I changed just the predicate line in dt.merge to predicate="s.`a a` = t.`a a`" and I still get the same error.

@rtyler added the binding/python label on Feb 6, 2024
@Blajda
Collaborator

Blajda commented Feb 6, 2024

Oh, got it. It's an issue with the *_all implementation. As a workaround, you can explicitly define the columns to be inserted / updated.

@echai58
Author

echai58 commented Feb 6, 2024

Oh, got it. It's an issue with the *_all implementation. As a workaround, you can explicitly define the columns to be inserted / updated.

I'm not sure I understand. How do I implement this workaround in my example?

@Blajda
Collaborator

Blajda commented Feb 6, 2024

Something like this. The documentation goes into more detail.

(
    dt.merge(
        source=new_table,
        predicate='s."a a" = t."a a"',
        source_alias="s",
        target_alias="t",
    )
    # List every column explicitly instead of using the *_all helpers;
    # backticks quote the column name that contains a space.
    .when_matched_update(updates={"`a a`": "s.`a a`", "b": "s.b", "c": "s.c"})
    .when_not_matched_insert(updates={"`a a`": "s.`a a`", "b": "s.b", "c": "s.c"})
    .execute()
)
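
For tables with many columns, the explicit mapping in the workaround above could be generated instead of typed out by hand. The following is only a minimal sketch: `quote_column` and `build_updates` are hypothetical helpers, not part of deltalake, and the rule that anything other than a lowercase plain identifier gets backtick-quoted is an assumption. Column names that themselves contain a backtick are not handled.

```python
import re

# Hypothetical helpers, not part of deltalake.
# Assumption: a name made only of lowercase letters, digits and underscores
# (not starting with a digit) is safe unquoted; everything else is quoted.
_PLAIN_NAME = re.compile(r"^[a-z_][a-z0-9_]*$")


def quote_column(name: str) -> str:
    """Wrap a column name in backticks unless it is a plain identifier."""
    return name if _PLAIN_NAME.match(name) else f"`{name}`"


def build_updates(columns, source_alias="s"):
    """Map each target column to the matching source column expression."""
    return {
        quote_column(col): f"{source_alias}.{quote_column(col)}"
        for col in columns
    }


# Column names from the example in this issue.
updates = build_updates(["a a", "b", "c"])
print(updates)
```

The resulting dict can be passed as `updates=updates` to both `when_matched_update` and `when_not_matched_insert`. With a pyarrow schema, the column list is available as `schema.names`.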

@ion-elgreco
Collaborator

@Blajda I'll make a small update to show your example in the docs : )


4 participants