
Feature Request: Add Batch Ingestion Endpoint for OpenLineage Events #2918

Open
algorithmy1 opened this issue Oct 9, 2024 · 0 comments

algorithmy1 (Contributor) commented Oct 9, 2024

Currently, the Marquez API endpoint for OpenLineage events (/api/v1/lineage) accepts a single event per request, as seen in OpenLineageResource.java#L67. This works well for real-time ingestion, but it becomes inefficient when many events need to be ingested at once.

Use Cases:

  • Database Migration or Restoration: When changing the database or restoring from backups, we may need to re-ingest a large number of events to rebuild the lineage graph.
  • Bulk Event Replay: In scenarios like system recovery or batch processing, ingesting events one by one is not practical.
  • Performance Optimization: Sending many events per request reduces the number of HTTP round trips and the associated per-request overhead, which can significantly improve ingestion throughput.
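To make the request-count argument concrete, here is a minimal client-side sketch. It assumes a batch endpoint such as the /api/v1/lineage/batch path suggested in the proposal, that this endpoint accepts a JSON array of events, and that events are already serialized as JSON strings. The class name `LineageBatchClient`, the base URL, and the batch size are hypothetical; the sketch only builds requests with the JDK's `java.net.http` API and does not send them.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side helper: groups serialized OpenLineage events into batches
// for a *proposed* batch endpoint. This is not an existing Marquez API.
final class LineageBatchClient {
  private LineageBatchClient() {}

  /** Splits events into consecutive chunks of at most batchSize elements. */
  static <T> List<List<T>> partition(List<T> events, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < events.size(); start += batchSize) {
      int end = Math.min(start + batchSize, events.size());
      batches.add(new ArrayList<>(events.subList(start, end)));
    }
    return batches;
  }

  /** Builds one POST per batch; each body is a JSON array of the already-serialized events. */
  static List<HttpRequest> buildBatchRequests(URI baseUrl, List<String> eventJson, int batchSize) {
    URI batchUri = baseUrl.resolve("/api/v1/lineage/batch"); // assumed path from the proposal
    List<HttpRequest> requests = new ArrayList<>();
    for (List<String> batch : partition(eventJson, batchSize)) {
      String body = "[" + String.join(",", batch) + "]";
      requests.add(
          HttpRequest.newBuilder(batchUri)
              .header("Content-Type", "application/json")
              .POST(HttpRequest.BodyPublishers.ofString(body))
              .build());
    }
    return requests;
  }
}
```

With 10,000 events and an (arbitrary) batch size of 500, partition returns 20 batches, so a replay would issue 20 requests instead of 10,000 against the current single-event endpoint.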

Proposal:

  • New Endpoint: Introduce a batch ingestion endpoint (e.g., /api/v1/lineage/batch) that accepts an array of OpenLineage events.
  • Batch Processing: Update the OpenLineageResource class to handle a list of events in a single request.
  • Response Format: Provide a response that indicates the success or failure of each event within the batch.
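The per-event response described above could take roughly the following shape. This is a minimal sketch under stated assumptions: `BatchIngestion`, `EventResult`, and `Status` are hypothetical names, not Marquez classes, and `ingestOne` stands in for whatever the existing single-event path in OpenLineageResource already does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of per-event results for a batch request.
// None of these types exist in Marquez; they only illustrate the response idea.
final class BatchIngestion {
  private BatchIngestion() {}

  enum Status { SUCCESS, FAILED }

  /** Outcome for the event at position {@code index} in the submitted array. */
  record EventResult(int index, Status status, String error) {}

  /**
   * Ingests each event independently: a failing event is recorded as FAILED
   * and does not stop the remaining events in the batch from being processed.
   */
  static <E> List<EventResult> ingestAll(List<E> events, Consumer<E> ingestOne) {
    List<EventResult> results = new ArrayList<>(events.size());
    for (int i = 0; i < events.size(); i++) {
      try {
        ingestOne.accept(events.get(i));
        results.add(new EventResult(i, Status.SUCCESS, null));
      } catch (RuntimeException e) {
        results.add(new EventResult(i, Status.FAILED, String.valueOf(e.getMessage())));
      }
    }
    return results;
  }
}
```

Processing events independently fits the request for per-event success or failure. The alternative would be all-or-nothing semantics (for example, one database transaction per batch), which makes a single bad event reject the whole batch; which behavior is preferable is a design decision for the maintainers.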

Alternatively, the existing /api/v1/lineage endpoint could be extended to accept either a single event (a JSON object) or a batch of events (a JSON array).
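For the single-endpoint alternative, the resource has to tell a single event from a batch. A real implementation would more likely let its JSON library decide after parsing (for instance, reading the body as a Jackson JsonNode and checking isArray()); the sketch below only illustrates the dispatch by looking at the first non-whitespace character. `LineagePayload` and `Shape` are hypothetical names.

```java
// Hypothetical helper: classifies a request body as a single OpenLineage event
// (JSON object) or a batch (JSON array) from its first non-whitespace character.
final class LineagePayload {
  private LineagePayload() {}

  enum Shape { SINGLE, BATCH }

  static Shape detect(String body) {
    for (int i = 0; i < body.length(); i++) {
      char c = body.charAt(i);
      // Skip the four whitespace characters allowed by the JSON grammar.
      if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
        continue;
      }
      if (c == '{') {
        return Shape.SINGLE;
      }
      if (c == '[') {
        return Shape.BATCH;
      }
      break; // any other leading token is neither an event nor a batch
    }
    throw new IllegalArgumentException("expected a JSON object or array");
  }
}
```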

Benefits:

  • Efficiency: Streamlines the ingestion process for multiple events.
  • Scalability: Enhances Marquez's ability to handle large-scale data operations.
  • User Convenience: Simplifies workflows that require bulk event ingestion.