JobEvent with outputs populated fails to write with nullPointerException #2925

Open
seanmullane opened this issue Oct 15, 2024 · 1 comment

Comments

@seanmullane

Emitting a JobEvent with input and/or output datasets returns an HTTP 500 error from the API, which results from a NullPointerException in Marquez.

Fixing this matters because it would let static lineage graphs be generated without being tied to active runs. That is useful when an integration is not yet available to consume pipeline runs for a given system, or when a pipeline is not yet fleshed out but we want to enter the job in Marquez to see how it would relate to other jobs.

The attached code includes a pure JSON version, generated by the OpenLineage client, that triggers the bug in Marquez. I also included the Python code the JSON derives from and the Marquez error log.
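For readers without the attachment, here is a rough sketch of what a run-less JobEvent payload of this kind might look like, and how it could be posted to Marquez. It is not the attached file. The namespace, dataset names, producer URL, and `schemaURL` version are placeholders, and the endpoint assumes the default local docker-compose setup (`http://localhost:5000/api/v1/lineage`).

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: Marquez API exposed locally by the docker-compose example.
MARQUEZ_LINEAGE_URL = "http://localhost:5000/api/v1/lineage"


def build_job_event() -> dict:
    """Build a static JobEvent (no "run" section) with one input and one output."""
    return {
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Placeholder producer and illustrative schema version (assumptions).
        "producer": "https://example.com/hypothetical-producer",
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/JobEvent",
        "job": {"namespace": "example-namespace", "name": "example-job"},
        "inputs": [{"namespace": "example-namespace", "name": "input_table"}],
        "outputs": [
            {
                "namespace": "example-namespace",
                "name": "output_table",
                # Empty facets object on the output dataset.
                "outputFacets": {},
            }
        ],
    }


def build_request(event: dict) -> urllib.request.Request:
    """Wrap the event in a JSON POST request without sending it."""
    return urllib.request.Request(
        MARQUEZ_LINEAGE_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(event: dict) -> int:
    """Send the event and return the HTTP status.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so a 500
    from the API surfaces as an exception rather than a return value.
    """
    with urllib.request.urlopen(build_request(event)) as response:
        return response.status
```

Calling `send(build_job_event())` against a running Marquez instance is how one would check whether a given payload is accepted; the exact payload that triggers the error is the one in the attached zip.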

Environment:

Marquez 0.49.0 running via docker-compose per the Marquez example with --seed
openlineage-python 1.22.0
python 3.11.9

nullPointerException.txt
reproduce_bug.zip

More detail on this from phix on Slack:

It looks like we’re not processing the “outputFacets” on the IO fields without a runId provided. The event should save if you drop that field that’s the empty object for now… We should take a look at the OL spec for this
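The workaround phix describes (dropping the field that holds the empty object) could be sketched as a small pre-processing step on the event JSON before it is sent. This is only an illustration working on plain dictionaries. The helper name `drop_empty_io_facets` is hypothetical, and handling `inputFacets` the same way as `outputFacets` is an assumption made by symmetry, not something confirmed in the thread.

```python
import copy

# "outputFacets" is the field named in the Slack message; "inputFacets" is
# included by symmetry (assumption).
EMPTY_FACET_KEYS = ("inputFacets", "outputFacets")


def drop_empty_io_facets(event: dict) -> dict:
    """Return a copy of the event with empty IO facet objects removed."""
    cleaned = copy.deepcopy(event)  # leave the caller's event untouched
    for side in ("inputs", "outputs"):
        for dataset in cleaned.get(side, []):
            for key in EMPTY_FACET_KEYS:
                # Only remove the key when it is present and exactly {}.
                if dataset.get(key) == {}:
                    del dataset[key]
    return cleaned


# Hypothetical event fragment for illustration.
example_event = {
    "job": {"namespace": "example-namespace", "name": "example-job"},
    "inputs": [{"namespace": "example-namespace", "name": "input_table"}],
    "outputs": [
        {"namespace": "example-namespace", "name": "output_table", "outputFacets": {}}
    ],
}
cleaned_event = drop_empty_io_facets(example_event)
```

Non-empty facet objects are left in place, so this only sidesteps the empty-object case mentioned above.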


boring-cyborg bot commented Oct 15, 2024

Thanks for opening your first issue in the Marquez project! Please be sure to follow the issue template!
