bug: cannot query lineage if job namespace contains colon character #2806

Open
mgorsk1 opened this issue Apr 30, 2024 · 2 comments
Labels: bug, web

Comments

mgorsk1 commented Apr 30, 2024

If the job namespace is:

{"job": {"namespace": "trino://trino-integration-test:1337" }}

then querying lineage for a job registered under this namespace fails with the following error:

192.168.64.4 - - [30/Apr/2024:09:31:21 +0000] "GET /api/v1/namespaces/trino%3A%2F%2Ftrino-integration-test%3A1337/jobs?limit=20&offset=0 HTTP/1.1" 200 21280 "http://localhost:32914/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36" 472
ERROR [2024-04-30 09:31:23,466] io.dropwizard.jersey.errors.IllegalStateExceptionMapper: Error handling a request: acc3ebe74db78369
! java.lang.IllegalStateException: No match available
! at java.base/java.util.regex.Matcher.start(Matcher.java:450)
! at marquez.service.models.NodeId.parts(NodeId.java:233)
! at marquez.service.models.NodeId.asJobId(NodeId.java:251)
! at marquez.api.BaseResource.throwIfNotExists(BaseResource.java:165)
! at marquez.api.OpenLineageResource.getLineage(OpenLineageResource.java:118)
! at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
! at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
! at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
! at java.base/java.lang.reflect.Method.invoke(Method.java:568)
! at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)
! at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
! at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:256)
! at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
! at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
! at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
! at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
! at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
! at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
! at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:235)
! at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
! at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
! at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
! at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:358)
! at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:311)
! at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
! at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
! at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
! at io.dropwizard.servlets.ThreadNameFilter.doFilter(ThreadNameFilter.java:35)
! at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
! at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
! at io.dropwizard.jersey.filter.AllowedMethodsFilter.handle(AllowedMethodsFilter.java:47)
! at io.dropwizard.jersey.filter.AllowedMethodsFilter.doFilter(AllowedMethodsFilter.java:41)
! at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
! at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
! at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
! at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
! at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
! at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
! at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
! at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
! at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
! at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
! at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
! at com.codahale.metrics.jetty9.InstrumentedHandler.handle(InstrumentedHandler.java:322)
! at io.dropwizard.jetty.RoutingHandler.handle(RoutingHandler.java:52)
! at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
! at io.dropwizard.jetty.ZipExceptionHandlingGzipHandler.handle(ZipExceptionHandlingGzipHandler.java:26)
! at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:54)
! at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:181)
! at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
! at org.eclipse.jetty.server.Server.handle(Server.java:516)
! at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
! at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
! at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
! at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
! at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
! at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
! at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
! at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
! at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
! at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
! at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
! at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
! at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
! at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
! at java.base/java.lang.Thread.run(Thread.java:840)
ERROR [2024-04-30 09:31:23,480] marquez.logging.LoggingMdcFilter: status: 500
192.168.64.4 - - [30/Apr/2024:09:31:23 +0000] "GET /api/v1/lineage?nodeId=job:trino://trino-integration-test:1337:20240430_093112_00001_hhtt7&depth=2 HTTP/1.1" 500 110 "http://localhost:32914/lineage/job/trino%3A%2F%2Ftrino-integration-test%3A1337/20240430_093112_00001_hhtt7" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36" 149

Renaming the job namespace to a string that does not contain a colon fixes the issue. That workaround should not be necessary, and the same characters cause no problem when they appear in a dataset name.

A suggested fix would be to Base64-encode the NodeId parts (still delimited by colons) before sending them to the API, and to decode them API-side after splitting on the colon.
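A minimal sketch of that suggestion, assuming a three-part job ID (`type:namespace:name`) and `":"` as the delimiter. `EncodedNodeId`, `encode`, and `decode` are hypothetical names, not part of Marquez. The URL-safe Base64 alphabet without padding is an assumed choice: its output contains only `A-Z a-z 0-9 - _`, so an encoded part can never contain a colon and also needs no percent-encoding in a query string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper illustrating the proposed encoding; not Marquez code.
public class EncodedNodeId {
  static final String ID_DELIM = ":";

  // URL-safe alphabet, no '=' padding: encoded parts never contain ':'.
  private static final Base64.Encoder ENC = Base64.getUrlEncoder().withoutPadding();
  private static final Base64.Decoder DEC = Base64.getUrlDecoder();

  // Client side: encode namespace and name, keep the type prefix readable.
  static String encode(String type, String namespace, String name) {
    return type + ID_DELIM + b64(namespace) + ID_DELIM + b64(name);
  }

  // API side: a plain split is unambiguous because only real delimiters remain.
  static String[] decode(String nodeId) {
    String[] parts = nodeId.split(ID_DELIM, -1); // -1 keeps empty trailing parts
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected type:namespace:name, got " + nodeId);
    }
    return new String[] {parts[0], unb64(parts[1]), unb64(parts[2])};
  }

  private static String b64(String s) {
    return ENC.encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }

  // The JDK decoder accepts unpadded input.
  private static String unb64(String s) {
    return new String(DEC.decode(s), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String id = encode("job", "trino://trino-integration-test:1337", "20240430_093112_00001_hhtt7");
    System.out.println(id);
    System.out.println(String.join(" | ", decode(id)));
    // prints job | trino://trino-integration-test:1337 | 20240430_093112_00001_hhtt7
  }
}
```

The trade-off of this approach is that node IDs stop being human-readable in URLs, and the web UI and the API would have to switch to the new format together.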


boring-cyborg bot commented Apr 30, 2024

Thanks for opening your first issue in the Marquez project! Please be sure to follow the issue template!

mgorsk1 closed this as completed Apr 30, 2024
mgorsk1 reopened this Apr 30, 2024
mgorsk1 changed the title from "bug: cannot query lineage if job namespace contains special characters" to "bug: cannot query lineage if job namespace contains colon character" Apr 30, 2024
wslulciuc added the bug label May 1, 2024
wslulciuc added this to the 0.47.0 milestone May 1, 2024
wslulciuc modified the milestones: 0.47.0, 0.48.0 May 18, 2024
wslulciuc added the web label May 18, 2024
phixMe (Member) commented Jun 4, 2024

@wslulciuc and I spent a bit of time looking into this one today.

The issue is actually caused by your job name: because it begins with digits (20240430_093112_00001_hhtt7), the regex used in our NodeId class erroneously skips the colon in front of it instead of treating that colon as a delimiter.

Pattern p = Pattern.compile("(?:" + ID_DELIM + "(?!//|\\d+))");

job:trino://trino-integration-test:1337:asdf // works
job:trino://trino-integration-test:1337:1234 // does not work

This is because the regex deliberately excludes any colon followed by one or more digits. That correctly skips the colon before the port, but it also skips the delimiter before the job name, since in your case that colon is followed by digits as well.
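The behavior can be reproduced with a small standalone sketch that applies the quoted pattern via `Pattern.split`. It assumes `ID_DELIM` is `":"` (implied by the node IDs above) and uses splitting only as an approximation of what `NodeId.parts` does; `NodeIdSplitDemo` is a hypothetical class. Judging by the stack trace, the real method calls `Matcher.start`, which throws `IllegalStateException: No match available` when the expected delimiter match was never found.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class NodeIdSplitDemo {
  // Assumption: the node ID delimiter is ":", as in the IDs quoted in this issue.
  static final String ID_DELIM = ":";

  // The pattern quoted above: match a delimiter only when it is NOT followed by "//" or by digits.
  static final Pattern DELIM = Pattern.compile("(?:" + ID_DELIM + "(?!//|\\d+))");

  // Splits a node ID on every colon the pattern accepts as a delimiter.
  static String[] split(String nodeId) {
    return DELIM.split(nodeId);
  }

  public static void main(String[] args) {
    // "asdf" starts with a letter, so the colon before it counts as a delimiter.
    System.out.println(Arrays.toString(split("job:trino://trino-integration-test:1337:asdf")));
    // prints [job, trino://trino-integration-test:1337, asdf]

    // "1234" starts with a digit, so that colon is skipped just like the port colon.
    System.out.println(Arrays.toString(split("job:trino://trino-integration-test:1337:1234")));
    // prints [job, trino://trino-integration-test:1337:1234]
  }
}
```

Because the lookahead only inspects the character right after the colon, any job name that begins with a digit (such as the one in the log) collapses into the namespace in the same way.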

We're going to change some processing around to handle this case. Thanks for reporting the bug with an example.

Projects
Status: In Progress
Development

No branches or pull requests

3 participants