Gateway 2.X performance issues #1861
Howdy @v3i1r4in! Thanks for upgrading and filing this issue! Really sorry there are some performance problems. While we look into this, do you mind giving a few extra details?
Here are the answers:
I believe this is happening because fetching from subgraphs now goes through `make-fetch-happen` as the default fetcher. As a workaround, I raised the socket limit with a custom data source:

```typescript
import { Fetcher } from "@apollo/utils.fetcher";
import { RemoteGraphQLDataSource } from "@apollo/gateway";
import * as makeFetchHappen from "make-fetch-happen";

class CustomDataSource extends RemoteGraphQLDataSource {
  fetcher: Fetcher = makeFetchHappen.defaults({ maxSockets: 100 });
}
```

**Investigation**

During my testing, I found there were similar performance characteristics between the configurations I compared.
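For anyone trying the same workaround: a custom data source like this is typically wired into the gateway through the `buildService` option. Below is a minimal configuration sketch, assuming a supergraph SDL (`supergraphSdl`) is loaded elsewhere; the class and option names besides `buildService` and `RemoteGraphQLDataSource` are illustrative:

```typescript
import { ApolloGateway, RemoteGraphQLDataSource } from "@apollo/gateway";
import { Fetcher } from "@apollo/utils.fetcher";
import * as makeFetchHappen from "make-fetch-happen";

// Assumed to be loaded elsewhere (file, registry, etc.).
declare const supergraphSdl: string;

class CustomDataSource extends RemoteGraphQLDataSource {
  // Raise the per-origin socket limit from the default.
  fetcher: Fetcher = makeFetchHappen.defaults({ maxSockets: 100 });
}

const gateway = new ApolloGateway({
  supergraphSdl,
  // Use the custom data source for every subgraph.
  buildService({ url }) {
    return new CustomDataSource({ url });
  },
});
```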
Hey, thanks for the detailed investigation! That makes a lot of sense. We discovered yesterday, too, that it might be something related to the fetcher. We were using our own customized HTTP agent for connection and keep-alive behavior and were passing it in. We are currently perf testing on one pod, so no scale-up yet.
The change that no longer allows extra options (such as keep-alive settings) to be passed into the HTTP agent may also be related.
@v3i1r4in -- Please let us know what `maxSockets` value you change to, and also approximately how many unique host/port combinations you have across all of the subgraphs your gateway connects to. That has a bearing on your performance, since it determines how many total sockets can actually be created.
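To make that relationship concrete: Node's HTTP agent applies `maxSockets` per origin, so the rough ceiling on concurrent subgraph connections is the per-origin limit multiplied by the number of unique host/port combinations. A sketch with hypothetical numbers:

```typescript
// Hypothetical numbers for illustration only.
const maxSockets = 100;   // per-origin limit, as passed to make-fetch-happen
const uniqueOrigins = 4;  // unique subgraph host:port combinations

// Node's http.Agent enforces maxSockets per origin, so the rough upper
// bound on concurrent subgraph connections is the product of the two:
const socketCeiling = maxSockets * uniqueOrigins;
console.log(`up to ${socketCeiling} concurrent subgraph sockets`); // up to 400
```

If many requests fan out to few origins, raising `maxSockets` helps; if you have many origins, the aggregate socket count can grow faster than expected.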
I am linking this here for reference. There is still more investigation to be done, but there have been some noticeable perf differences between graphql-js v15 and v16, which also came in with gateway 2. One contributing factor may be the TS compilation target: graphql/graphql-js#3648
Howdy all, just wanted to provide an update on the ongoing performance investigation in v2. It's worth noting that the right solution will differ depending on your version, subgraph topology, and schema & operation size. There are a few threads we're pulling on:
We hope to get more clarity on these soon. For now, my best advice for anyone investigating performance issues is:
Any chance this "FIXME: heavy-handed" mechanism of re-executing the query altogether in the post-processing step could be exacerbating this issue as well?
Yes, I believe it's related to my point #1 above, though I don't believe that call is the dominant cost. Of course, removing that re-execution entirely would help.
Apollo has published this tech note to help provide some guidance on how to investigate performance issues: https://www.apollographql.com/docs/technotes/TN0009-gateway-performance/. We're still investigating some of the issues above, but that tech note is a good way to ensure everything is set up correctly.
We are using apollo server `3.6.7` with apollo gateway `2.0.2`. We are seeing longer event-loop delays and generally fewer event-loop iterations after the upgrade. This is a graph of our runtime metrics before and after we shipped the upgrade.

The performance of the graph took a hit after the upgrade, which I suspect is caused by this. Our P99 doubled and our P95 is 1.3 times longer.

Another observation (that I couldn't quite understand) from this graph: before the upgrade, peaks in event-loop delay corresponded to dips in event-loop iterations, which makes sense; but after the upgrade, the dips in event-loop delay correspond to even fewer event-loop iterations.