-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate request to got (part 1 of many) #6160
Conversation
|
I have a frustratingly difficult proxy environment where I currently run a self-hosted server, so happy to take a look/test/etc. anything on that front whenever we get there if that'd be helpful |
bb1d747
to
ca33c99
Compare
const got = require('got') | ||
const { Inaccessible, InvalidResponse } = require('./errors') | ||
|
||
function requestOptions2GotOptions(options) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows us to pass a request-like options object to got without modifying service code (mostly).
I reckon once we've completely removed all our usage of request, we remove this translation layer, update the service code and update https:/badges/shields/blob/master/doc/TUTORIAL.md to point to got's docs.
} | ||
|
||
if (Object.keys(requestOptions).length > 0) { | ||
throw new Error(`Found unrecognised options ${Object.keys(requestOptions)}`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decided to fail loudly if we don't know what to do with it. Obviously this massively constrains what selection of options we can use without modifying this, but I'm pretty certain I've covered the full subset of options we are actively using. If we try to use another one, we need to work out the translation not make an assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason I started with this file from sendRequest
below, and then as I was reading through requestOptions2GotOptions
found myself wondering why the keys were being deleted on requestOptions
since it seemed like that was tossed/unused for the actual request but now see why as we're using it for validation.
I don't expect any issues with this, but since it will add processing steps on every request do we want to do some kind of perf run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually have thinking on this some more, such a hypothetical test would really need to be executed under load over an extended duration which doesn't really seem feasible/viable. i think this will be a case better handled by simply keeping an eye on prod for a while after deployed to be prepared to scale out or even roll back should something really go wildly unexpected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think performance should be a worry here. The options object is always going to be a small object with only a few keys.
|
||
async function sendRequest(gotWrapper, url, options) { | ||
const gotOptions = requestOptions2GotOptions(options) | ||
gotOptions.throwHttpErrors = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to consider here:
By default got auto-retries failed requests. See
https:/sindresorhus/got#api
https:/nock/nock#common-issues
This behaviour is somewhat idiosyncratic I'm not 100% certain if this is desirable behaviour for us. I haven't done it, but we could turn this off by setting
gotOptions.retry = 0
here. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having played around with it a bit more locally, I think this behaviour is going to be undesirable for us so I've turned it off in ece2f57
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there are cases with a flaky upstream service where auto-retry could potentially be helpful, though I think the constraints of our max response time are a bigger factor (i.e. better to return a failed/error badge that renders than take 5-6 seconds and have the badge fail to render).
There are some service tests that I'd love to have some auto-backoff and retry logic for, but that's entirely separate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think the magic 4 second limit is important here, at least in production
One of the nice things about got is as well as passing a number we can also pass retry an object including a calculateDelay
function (which allows you to implement backoff) see https:/sindresorhus/got#retry so we could possibly find a way to make this a setting. We set retry to 0 in production/dev, but under test we set retry to an object which allows N retries with a backoff. I think we should leave this for now and come back to it once we've migrated everything to use got and dropped request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the nice things about got is as well as passing a number we can also pass retry an object including a calculateDelay function (which allows you to implement backoff) see https:/sindresorhus/got#retry so we could possibly find a way to make this a setting. We set retry to 0 in production/dev, but under test we set retry to an object which allows N retries with a backoff. I think we should leave this for now and come back to it once we've migrated everything to use got and dropped request.
that sounds potentially glorious
Proxying
Yes this is a useful offer. I was going to ask you about this anyway as I don't really have a good handle on this and I am not behind a proxy so it is really difficult for me to meaningfully test this. I think proxying is now the last outstanding big issue (hopefully). I haven't implemented anything yet, but I have done more reading. You've suggested global-agent as a solution. Looking over the docs for global-agent, it says it works with both got and request The thing I'm not sure about is https:/gajus/global-agent#what-is-the-reason-global-agentbootstrap-does-not-use-http_proxy I guess the situation is:
I guess the options are:
|
My understanding is that the only risk would be the potential side effects on any other apps running in the same environment as the Shields server, with those apps themselves explicitly setting certain properties (either directly or via something in their own dep tree). I think the potential of that is relatively low, given how infrequently you encounter multiple apps running on the same host machine with no isolation in today's world. However, the impact could be subtle/difficult to detect, and tricky to track down too, especially if you have App Foo running directly on the same host as your Shields server, and App Foo needs some finally tuned socket configurations. That seems like a rather unlikely scenario to me though, as I wouldn't expect someone to haphazardly drop a Shield server in that same environment. I can't envision any risk on the Shields server itself, provided we aren't relying on any lib/SDK anywhere in our own codebase that's handling the http interaction with some upstream service. I believe folks running in a Docker container, or in a dedicated/isolated hosting env would be fine.
Yup, think that sums them up well. I'm open to any option at this point, but I will definitely lean towards any option that's less invasive/less-risky for the Shields.io case even if that adds a step or two, or potential breaking change, for self-hosting. |
I see. My reading of
is different to that. My understanding was that global-agent implements proxying lower down in the stack than request, so because request implements its own proxy agent, if we use I don't really have a meaningful way to test this, but I've pushed 302a2a5. This bootstraps global-agent when we initialise the server (doing it in One further option I've thought of since yesterday is we could try and do the full request --> got migration all at once and only merge this once we've completely removed request. Then any incompatibility between global-agent and request becomes irrelevant. That said I have got a strong preference for doing the rest of the work in 8 smaller PRs rather than having to do the whole thing in one go if at all possible. |
I think the global-agent readme could use an update/some clarity
Suppose we could poke around the
This could certainly be a case of me being overly optimistic though!
Will do, should have some bandwidth later this week |
Excitingly, turning off retry has given us whole load of tests failing with "Mocks not yet satisfied" so that's something for me to dig into 🤔 |
Turns out it was actually global agent rather than retry that was causing all the nock tests to fail. I just didn't check the build after pushing 302a2a5 😄 I haven't really dug too deep into this but I guess both nock and global-agent are patching the node HTTP stack and terrible things are happening. Given the live tests work and the mocked tests all fail, I think rather than get lost in those weeds the solution is probably not to bootstrap global-agent under test, but happy to review it if anyone else has ideas.. |
I'd be inclined to see if our resident nock expert has any insights or suggestions @paulmelnikow 😄 It would give me some minor angst if we end up having to running our service tests in a different configuration than we run the server, but that wouldn't be a blocker for me if we end up needing to go that route |
We talked about this issue in the ops meeting. Tl;dr - lets not bootstrap global-agent under test or in production.
I'll do some reading + follow up on that. |
Things seem to be working great behind a proxy, no env/config changes needed 👍 I did notice an |
This is a behavioural difference between request and got request will send the request, then we'll get a `400 Bad Request` back and re-throw at as invalid got will pick up that the URL is invalid and throw `RequestError: URI malformed` before attempting to send it which we'll re-throw as inaccessible
Got doesn't natively support assmebling a querystring from nested objects because it uses node's URLSearchParams internally. Use qs and pass qs a string. Wordpress is the only service that needs this, so we could build the string manually in this case if we don't want to take qs as a prod dependency. It is mostly hard-coded values anyway.
got overwrites any ?foo=bar in the URL string if searchParams is also passed whereas request appends see https:/sindresorhus/got#url
b7dbcf3
to
d1f8907
Compare
d1f8907
to
94f8c43
Compare
OK, so the resolution I arrived at for global-agent is only bootstrapping it if proxy vars are configured. That way we know global agent definitely isn't touching the HTTP stack unless we explicitly want it to and it doesn't require us or self-hosting users to change anything. Additionally I've rebased onto master to resolve merge conflicts and squashed the commit history into something easier to review (as things meandered a bit while this was WIP). I'm going to mark this ready for review now as it is.. ready for review 🎉 There are a few failing service tests but I think all they are all transient or due to other known issues like |
@@ -60,7 +60,7 @@ module.exports = class KeybaseBTC extends KeybaseProfile { | |||
|
|||
async handle({ username }) { | |||
const options = { | |||
form: { | |||
qs: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're changing these, could we go ahead and convert to got's searchParams
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it would throw Found unrecognised options searchParams
:)
I think the reasons for that are explained in the top post and #6160 (comment) but just to run over it again: My proposal for the migration plan is we keep everything communicating using request's options
format regardless of whether we are actually using request or got (at least in the service code. Maybe the more 'standalone' stuff makes sense to bypass that). Then once we've removed request completely, the last stage of the migration will be:
- remove
requestOptions2GotOptions()
- update everything to use got's syntax
- update https:/badges/shields/blob/master/doc/TUTORIAL.md to point to got's docs instead of request's
I think allowing both a got-like options object or a request-like options object during the transition period probably introduces unnecessary complexity, but maybe it becomes useful to be able to break that down into smaller chunks nearer the time. I think the way I'd probably do that would be to do one parameter at a time over all the services rather than have some services using got's format and some using request's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it would throw
Found unrecognised options searchParams
:)
Ah so it would. Apologies, have been reviewing this in small chunks over a long period of time and clearly forgetting things along the way.
I think allowing both a got-like options object or a request-like options object during the transition period probably introduces unnecessary complexity, but maybe it becomes useful to be able to break that down into smaller chunks nearer the time
As I'm thinking about this (again 😄) I find myself wondering if I understand all the scenarios we're trying to catch/protect against with that check (https:/badges/shields/pull/6160/files#r575695319). I gather these are the three(ish) :
- flag any usage of
request
-specific options that we haven't yet properly mapped togot
's equivalent - prevent the introduction of any direct
got
option usage/mixed usage of request and got options - structure the conversion to
got
in batches, specifically here updating the service class fetch params for options
If so and I'm following correctly then I think I'm on board with that. I suppose one thing that's at least worth brief consideration is whether we want to abstract away the http client lib request options (http request, not request
lib) from the service classes as a long term approach (not just for conversions).
I'm not proposing that we do, and think I would probably vote against, but tossing it out there anyway. I think the frequency (or lack thereof really) at which the underlying lib is changed, and the benefits of using a popular/well-known lib which many folks will know/can easily lookup outweigh trying to have our own Shields-specific set of options and translation layer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that summary 👍
I think unless we've been very unlucky in backing the wrong horse here, changing what HTTP lib we are using should be something that we will not do any time soon and there isn't any benefit in building a permanent abstraction over got. This should just be here temporarily for transitional purposes.
Think I may want to do one more pass through this, largely to be sure I've got my head wrapped around the changes and plan, but have finished going through once and things LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we're good to go on this first (big) step 🚀 The failing service tests look to be the usual, unrelated, suspects, but will differ to you to adjust/eliminate the target tests to whatever you think is worth running at this point.
Don't think there's any additional pending changes nor conflicts that would cause a dismissed review, but happy to re-approve if need be
Thanks for your time on reviewing this. I'm going to deploy it and keep an eye on sentry for any issues we haven't thought of that crop up. |
This reverts commit 2359eb2.
Its a dirty job, but someone's gotta do it. Having started this, I now slightly regret it 😬
Basically, what I've done here is picked one place we call request:
shields/core/base-service/base.js
Line 443 in 7f7b101
This is also the most frequent call we make to request. Most of our upstream API requests go via here. The idea is if we can build up the structure to allow
request
andgot
to live side by side and replace one occurrence of it, then we can gradually move the rest of the calls to request over to use got one at a time (in nice small easy to review PRs, hopefully with more than one person being able to contribute) until there is nothing left calling request and we can completely remove it and tidy up the mess.The other place we use request are:
shields/core/base-service/base.js
Line 444 in 7f7b101
all of which are their own special unique snowflakes ❄️ Tbh, some of those are probably going to be much easier than this. Maybe I should have started with the low hanging fruit.
A design decision: At the moment I'm trying to absolutely minimise the changes to the structure of
BaseService
. I think in future we can probably remove some (mostly pointless?) layers of indirection here. For example, I think one thing we could do is instead of injecting a fetch function into the constructor ofBaseService
viaregister()
andinvoke()
(which we callsendAndCacheRequest
.. because it doesn't cache anything 😁) we can probably move to just declaringdirectly on
BaseService
and that might clean things up. That refactor will require a lot of changes to the test code to accommodate though, so what I'm focussing on for now is just migrating from one HTTP lib to another and keeping the interface as similar as possible (even if that interface is a bit janky). This means we can really rely on our existing test suite to verify the new functionality. Then refactoring improvements can come later. That should keep the changeset as small as possible.TODO:
Deal with proxyingWrite tests, esp for:requestOptions2GotOptions()
transformMax response size exceeded error, Inaccessible errorDeal with specific services that have known problems to fix:dynamickeybaseopmwerckerwordpressyoutube(this was a transient failure)Rebase and resolve merge conflictsRefs #4655