
Support upstream timeout #53

Closed
stefanprodan opened this issue Apr 2, 2019 · 25 comments
Assignees
Labels
Roadmap: Accepted We are planning on doing this work.

Comments

@stefanprodan

Allow the upstream timeout to be specified.
https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto#envoy-api-field-route-routeaction-timeout

@mhausenblas mhausenblas added the Roadmap: Proposed We are considering this for inclusion in the roadmap. label Apr 3, 2019
@mhausenblas mhausenblas changed the title [request]: Support upstream timeout Support upstream timeout Apr 3, 2019
@TrueBurn

TrueBurn commented Aug 8, 2019

Even when adding the "x-envoy-upstream-rq-timeout-ms" header to the request to override the default 15s timeout, requests still fail with an upstream request timeout.

https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/router_filter#x-envoy-upstream-rq-timeout-ms

@stphivos

Overriding via the x-envoy-upstream-rq-timeout-ms header is ignored for me as well. As a result, under heavy load, all our slow responses get terminated at the gateway level. Would really appreciate an update on this :)

@SleeperSmith

Any update on this? We are blocked on a full roll out because of this.

@bigdefect
Contributor

bigdefect commented Sep 25, 2019

We're researching this right now to see why the header isn't being honored and, importantly, to clarify when it can be honored.

In the meantime, @stphivos @SleeperSmith, would you be able to tell us the following?

  1. Is the downstream requester behind an Envoy Proxy?
  2. Is the upstream responder behind Envoy?
  3. Is the communication between the downstream and the upstream modeled in App Mesh?

@shubharao shubharao added the Roadmap: Awaiting Customer Feedback We need to get more information in order to understand how we will implement this feature. label Sep 27, 2019
@SleeperSmith

SleeperSmith commented Oct 1, 2019

  1. Yes
  2. Yes
  3. Yes.

The downstream requester: Nginx as ingress controller -> Envoy proxy as sidecar.
The upstream responder: custom app service registered as a virtual service + virtual router + virtual node.
I am injecting the "x-envoy-upstream-rq-timeout-ms" header in nginx.

@bigdefect bigdefect assigned dastbe and unassigned bigdefect Oct 2, 2019
@dastbe
Contributor

dastbe commented Oct 3, 2019

A quick update on why this isn't working.

Broadly, the issue is an interaction between how Envoy determines whether a request is internal (which is what allows the use of Envoy control headers) and the use_remote_address setting.

Currently, App Mesh configuration turns use_remote_address off. The result is that the downstream Envoy doesn't consider the request internal (since no x-forwarded-for is set) and doesn't append to x-forwarded-for, and the upstream won't respect the control header for the same reason.

Looking forward, we'd like to support upstream timeouts on both the route (respected at the downstream) and on a VNode listener (respected at the upstream).
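For context, the behavior described in this comment hinges on a single HTTP connection manager field. The fragment below is a minimal, hypothetical Envoy (v2 API) filter config illustrating that setting; it is not the actual App Mesh-generated configuration:

```json
{
  "name": "envoy.http_connection_manager",
  "typed_config": {
    "@type": "type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager",
    "stat_prefix": "ingress_http",
    "use_remote_address": false
  }
}
```

With use_remote_address false, Envoy classifies a request as internal based on the x-forwarded-for header; when that header is absent, the request is treated as external and control headers such as x-envoy-upstream-rq-timeout-ms are not honored, which matches what users in this thread are seeing.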

@shubharao shubharao added Roadmap: Accepted We are planning on doing this work. and removed Roadmap: Awaiting Customer Feedback We need to get more information in order to understand how we will implement this feature. Roadmap: Proposed We are considering this for inclusion in the roadmap. labels Oct 3, 2019
@stphivos

@efe-selcuk, @dastbe thanks. In our case a standard request is as follows:

[client] => [aws alb] => [nginx] => [flask-http-app] => [grpc-tcp-app] => [aws rds]

where nginx (vn), flask (vs+vn), and grpc (vs+vn) are all modeled in App Mesh and have the Envoy proxy sidecar. All were updated to support higher timeouts (3m), but when some responses are slow, nginx replies with 504. As a temporary workaround we removed the services from the mesh.

@SleeperSmith

@stphivos that sounds like the timeout is working? What settings / host headers did you set? I have a similar setup, and that should work well enough for me.

Can't you just set nginx timeout to something higher?

@stphivos

@SleeperSmith both aws alb and nginx gateway have timeouts set to 3m:

http {
    server {
        listen 80;
        server_name _;
        proxy_http_version 1.1;

        client_header_timeout  3m;
        client_body_timeout    3m;
        send_timeout           3m;
        keepalive_timeout      3m;

        gzip on;
        gzip_min_length  1100;
        gzip_buffers     4 8k;
        gzip_types       text/plain;

        location / {
            try_files $uri @app;
        }

        location @app {
            proxy_redirect  off;
            proxy_pass      http://backend;

            proxy_set_header   Host $host;
            proxy_set_header   X-Real-IP $remote_addr;
            proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header   X-Forwarded-Host $server_name;

            proxy_connect_timeout  70;
            proxy_send_timeout     180;
            proxy_read_timeout     180;            
        }
    }
}

The request times out at nginx while waiting for the flask app to respond. I can confirm that from X-Ray:

mymesh/service-gateway-vn AWS::AppMesh::Proxy
  => mymesh/service-gateway-vn  504 15.0 sec

mymesh/flask-app-vn AWS::ECS::Container
  => mymesh/flask-app-vn        200 16.4 sec

Also, when logging the HTTP headers at the flask app, I get:

  "X-Real-Ip": "127.0.0.1",
  "X-Forwarded-For": "<public-ip-addr>, 127.0.0.1",
  "X-Forwarded-Host": "_",
  "X-Envoy-Expected-Rq-Timeout-Ms": "15000"

I tried setting different headers from the http client but still no luck:

x-envoy-upstream-rq-timeout-ms          180000
x-envoy-upstream-rq-per-try-timeout-ms  180000
X-Envoy-Expected-Rq-Timeout-Ms          180000

X-Envoy-Expected-Rq-Timeout-Ms in the request is always set to 15000 and the client receives a 504: upstream request timeout.

I'm using the latest release of appmesh-inject (v0.2.0) which uses aws-appmesh-envoy:v1.11.2.0-prod. Thanks.

@gdowmont

gdowmont commented Nov 20, 2019

I have an application that streams the response back to the client. Due to the size, it can take more than a minute to finish. However, Envoy always terminates the connection after 15 seconds.

I have tried sending keep-alives, chunking, and streaming; it is always 15 seconds and the connection gets closed.

Are there any workarounds I could try? As others mentioned before, the headers are being ignored.
I have tried with the latest aws-appmesh-envoy:v1.12.1.0-prod.

@craiggoddenpayne

I have the same issue. I came across this thread after searching for some time, and x-envoy-expected-rq-timeout-ms does not seem to work for me either.

I also cannot find any mention of this in the documentation on the AWS site, only in the Envoy documentation. Does App Mesh support changing the timeout based on request headers as described in the Envoy docs? https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/router_filter

@SleeperSmith

@bcelenza so this has been accepted as being worked on, but I'm wondering why I don't see it on the project roadmap. Is this being worked on, or is it under a slightly different status?

@bcelenza
Contributor

bcelenza commented Dec 4, 2019

@SleeperSmith sorry about that, added it to the roadmap project. We are actively working on it now. 😄

@gary-cowell

Imagine my disappointment upon getting 29 meshed services deployed with CDK, only to find a slew of hard-limit 15s timeouts. This is not usable in this state. Awaiting a resolution.

@hanxu2050

Any update on this? We also just migrated two of our applications to App Mesh and are now facing timeout issues.

@SleeperSmith

Just to chime in here: we've withheld further rollout of AWS App Mesh specifically because of this too. Some 15 or so services.

@LancerRainier

Hey, just a quick update: this is still a work in progress, but we are getting more traction here. We had some delays in early 2020 and are now back on track. We are looking at launching this in Q2 2020 (most likely in May 2020) and will keep this thread updated if anything changes. Apologies for the delay; we are aware that this is a very high-priority item and are actively working on it. Thanks!

@shsahu

shsahu commented Apr 8, 2020

As part of providing support for timeouts, we intend to add configurable timeout fields (request timeout and idle timeout) at the Route, the Virtual Node listener, and probably the Virtual Gateway as well (Virtual Gateway is available in preview).

Current Implementation:
We currently support defining a timeout at the Route in the preview channel. There is a walkthrough available for this feature. Below is a sample spec of an HTTP route with timeouts:

{
  "virtualRouterName": "color-router",
  "routeName": "color-route",
  "spec": {
    "priority": 1,
    "httpRoute": {
      "match": {
        "prefix": "/"
      },
      "action": {
        "weightedTargets": [
          {
            "virtualNode": "color-node",
            "weight": 1
          }
        ]
      },
      "timeout" : {
        "perRequest": {
          "value" : 5,
          "unit" : "s"
        },
        "idle": {
          "value" : 10,
          "unit" : "s"
        }
      }
    }
  }
}

Limitation:
The current feature implementation only lets you decrease the default request timeout (15 secs). You can set the timeout to higher values, but the request will still time out at 15 secs because the default value of 15 secs still applies at the upstream.

Future Enhancements (WIP) :
To set the timeout to higher values (>15 secs), the value needs to be set at both the Route and the Virtual Node listener that the Route points to. We are working to add timeouts at the listener of the Virtual Node. Below is the sample config we will support for Virtual Nodes with a timeout defined at the listener:

{
    "virtualNodeName": "color-node",
    "spec": {
        "listeners": [
            {
                "healthCheck": {
                    "healthyThreshold": 2,
                    "intervalMillis": 5000,
                    "port": 8080,
                    "protocol": "http",
                    "path": "/ping",
                    "timeoutMillis": 2000,
                    "unhealthyThreshold": 3
                },
                "portMapping": {
                    "port": 8080,
                    "protocol": "http"
                },
                "timeout": {
                    "http": {
                        "timeout" : {
                            "perRequest": {
                                "value" : 20,
                                "unit" : "s"
                            },
                            "idle": {
                                "value" : 30,
                                "unit" : "s"
                            }
                        }
                    }
                }
            }
        ],
        "serviceDiscovery": {
            "awsCloudMap": {
                "namespaceName": "http.local",
                "serviceName": "color"
            }
        }
    }
}

We are also working to finalize the design for adding timeouts at the Virtual Gateway route.

Thanks.

@shsahu

shsahu commented May 30, 2020

Hi all, timeouts for Routes and Virtual Node listeners are available for testing in our preview channel. Check out the walkthrough to get started, and refer to the docs for Route and Virtual Node for more info.

Looking forward to your feedback.

@shsahu

shsahu commented Jun 19, 2020

Timeouts are now available in the App Mesh APIs, SDKs, and Console in all regions. The Kubernetes controller supports timeouts in the latest version, v1.0.0.

Timeout configuration is supported at the Route and at Virtual Node listeners. Refer to the API docs and the user guide for more information.

Please note that CloudFormation support is not released yet; it will be available soon.
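For Kubernetes users, the listener timeout is expressed in the controller's VirtualNode custom resource. The sketch below is a hedged example against the v1beta2 API (the names flask-app, the namespace, and the DNS hostname are hypothetical; verify the field names against the controller docs). kubectl accepts JSON manifests as well as YAML:

```json
{
  "apiVersion": "appmesh.k8s.aws/v1beta2",
  "kind": "VirtualNode",
  "metadata": { "name": "flask-app", "namespace": "prod" },
  "spec": {
    "podSelector": { "matchLabels": { "app": "flask-app" } },
    "listeners": [
      {
        "portMapping": { "port": 8080, "protocol": "http" },
        "timeout": {
          "http": {
            "perRequest": { "value": 60, "unit": "s" },
            "idle": { "value": 120, "unit": "s" }
          }
        }
      }
    ],
    "serviceDiscovery": { "dns": { "hostname": "flask-app.prod.svc.cluster.local" } }
  }
}
```

As noted earlier in the thread, pair the listener timeout with a matching perRequest timeout on the route so both the downstream and upstream sides allow the longer request.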

@herrhound
Contributor

Closing the issue, as the feature is now released.

@RafalMaleska

Hi, the issue was solved for Virtual Nodes, but what about Virtual Gateways?

Is there a way to set the timeout on them?

@RafalMaleska

OK, solved this by using a router and setting a timeout on one of the routes.

@visit1985

Even when setting timeouts at the VirtualRouter and VirtualNode level, there is still no timeout defined on the Virtual Gateway's internal self-redirect to port 15001. This should be set to the max of all timeouts across all possible routes, but it currently defaults to 15s.

@rajal-amzn
Contributor

@visit1985 Thank you for reporting this. We are aware of this regression where the timeouts are not getting configured at the Virtual Gateway when hostname rewrite is disabled. We are currently working on the fix, and it is being tracked here.
