Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate request to got (part 1 of many) #6160

Merged
merged 10 commits into from
Mar 9, 2021
Merged

Conversation

chris48s
Copy link
Member

@chris48s chris48s commented Feb 7, 2021

Its a dirty job, but someone's gotta do it. Having started this, I now slightly regret it 😬

Basically, what I've done here is picked one place we call request:

sendAndCacheRequest: request.asPromise,

This is also the most frequent call we make to request. Most of our upstream API requests go via here. The idea is if we can build up the structure to allow request and got to live side by side and replace one occurrence of it, then we can gradually move the rest of the calls to request over to use got one at a time (in nice small easy to review PRs, hopefully with more than one person being able to contribute) until there is nothing left calling request and we can completely remove it and tidy up the mess.
The other place we use request are:

A design decision: At the moment I'm trying to absolutely minimise the changes to the structure of BaseService. I think in future we can probably remove some (mostly pointless?) layers of indirection here. For example, I think one thing we could do is instead of injecting a fetch function into the constructor of BaseService via register() and invoke() (which we call sendAndCacheRequest.. because it doesn't cache anything 😁) we can probably move to just declaring

async _requestFetcher(url, options) {
  // go get the things
  return data
}

directly on BaseService and that might clean things up. That refactor will require a lot of changes to the test code to accommodate though, so what I'm focussing on for now is just migrating from one HTTP lib to another and keeping the interface as similar as possible (even if that interface is a bit janky). This means we can really rely on our existing test suite to verify the new functionality. Then refactoring improvements can come later. That should keep the changeset as small as possible.

TODO:

  • Deal with proxying
  • Write tests, esp for:
    • requestOptions2GotOptions() transform
    • Max response size exceeded error, Inaccessible error
  • Deal with specific services that have known problems to fix:
    • dynamic
    • keybase
    • opm
    • wercker
    • wordpress
    • youtube (this was a transient failure)
    • Other service test failures are transient/unrelated to this PR
  • Rebase and resolve merge conflicts

Refs #4655

@shields-ci
Copy link

shields-ci commented Feb 7, 2021

Warnings
⚠️ This PR modified service code for wercker but not its test code.
That's okay so long as it's refactoring existing code.
⚠️ This PR modified service code for opm but not its test code.
That's okay so long as it's refactoring existing code.
⚠️ This PR modified service code for keybase but not its test code.
That's okay so long as it's refactoring existing code.
⚠️ This PR modified the server but none of its tests.
That's okay so long as it's refactoring existing code.
Messages
📖 ✨ Thanks for your contribution to Shields, @chris48s!

Generated by 🚫 dangerJS against f19a894

@calebcartwright
Copy link
Member

I have a frustratingly difficult proxy environment where I currently run a self-hosted server, so happy to take a look/test/etc. anything on that front whenever we get there if that'd be helpful

@chris48s chris48s changed the title Very WIP: Migrate request to got (part 1 of many); run [*****] Very WIP: Migrate request to got (part 1 of many); run [youtube] Feb 13, 2021
@chris48s chris48s changed the title Very WIP: Migrate request to got (part 1 of many); run [youtube] Very WIP: Migrate request to got (part 1 of many); run [*****] Feb 13, 2021
@chris48s chris48s changed the title Very WIP: Migrate request to got (part 1 of many); run [*****] Still quite WIP: Migrate request to got (part 1 of many); run [*****] Feb 13, 2021
const got = require('got')
const { Inaccessible, InvalidResponse } = require('./errors')

function requestOptions2GotOptions(options) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows us to pass a request-like options object to got without modifying service code (mostly).
I reckon once we've completely removed all our usage of request, we remove this translation layer, update the service code and update https:/badges/shields/blob/master/doc/TUTORIAL.md to point to got's docs.

}

if (Object.keys(requestOptions).length > 0) {
throw new Error(`Found unrecognised options ${Object.keys(requestOptions)}`)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to fail loudly if we don't know what to do with it. Obviously this massively constrains what selection of options we can use without modifying this, but I'm pretty certain I've covered the full subset of options we are actively using. If we try to use another one, we need to work out the translation not make an assumption.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason I started with this file from sendRequest below, and then as I was reading through requestOptions2GotOptions found myself wondering why the keys were being deleted on requestOptions since it seemed like that was tossed/unused for the actual request but now see why as we're using it for validation.

I don't expect any issues with this, but since it will add processing steps on every request do we want to do some kind of perf run?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually have thinking on this some more, such a hypothetical test would really need to be executed under load over an extended duration which doesn't really seem feasible/viable. i think this will be a case better handled by simply keeping an eye on prod for a while after deployed to be prepared to scale out or even roll back should something really go wildly unexpected

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think performance should be a worry here. The options object is always going to be a small object with only a few keys.


async function sendRequest(gotWrapper, url, options) {
const gotOptions = requestOptions2GotOptions(options)
gotOptions.throwHttpErrors = false
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to consider here:

By default got auto-retries failed requests. See
https:/sindresorhus/got#api
https:/nock/nock#common-issues
This behaviour is somewhat idiosyncratic I'm not 100% certain if this is desirable behaviour for us. I haven't done it, but we could turn this off by setting

gotOptions.retry = 0

here. Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having played around with it a bit more locally, I think this behaviour is going to be undesirable for us so I've turned it off in ece2f57

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there are cases with a flaky upstream service where auto-retry could potentially be helpful, though I think the constraints of our max response time are a bigger factor (i.e. better to return a failed/error badge that renders than take 5-6 seconds and have the badge fail to render).

There are some service tests that I'd love to have some auto-backoff and retry logic for, but that's entirely separate

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think the magic 4 second limit is important here, at least in production

One of the nice things about got is as well as passing a number we can also pass retry an object including a calculateDelay function (which allows you to implement backoff) see https:/sindresorhus/got#retry so we could possibly find a way to make this a setting. We set retry to 0 in production/dev, but under test we set retry to an object which allows N retries with a backoff. I think we should leave this for now and come back to it once we've migrated everything to use got and dropped request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the nice things about got is as well as passing a number we can also pass retry an object including a calculateDelay function (which allows you to implement backoff) see https:/sindresorhus/got#retry so we could possibly find a way to make this a setting. We set retry to 0 in production/dev, but under test we set retry to an object which allows N retries with a backoff. I think we should leave this for now and come back to it once we've migrated everything to use got and dropped request.

that sounds potentially glorious

@chris48s
Copy link
Member Author

chris48s commented Feb 13, 2021

Proxying

I have a frustratingly difficult proxy environment where I currently run a self-hosted server, so happy to take a look/test/etc. anything on that front whenever we get there if that'd be helpful

Yes this is a useful offer. I was going to ask you about this anyway as I don't really have a good handle on this and I am not behind a proxy so it is really difficult for me to meaningfully test this. I think proxying is now the last outstanding big issue (hopefully). I haven't implemented anything yet, but I have done more reading.

You've suggested global-agent as a solution. Looking over the docs for global-agent, it says it works with both got and request
https:/gajus/global-agent#supported-libraries
which sounds good, given we want to use both got and request side-by-side for some amount of time. I guess the way we would implement this is we just bootstrap global-agent somewhere pretty early on in our own bootstrapping process (probably server.js and it works its magic on both request and got.. hopefully).

The thing I'm not sure about is https:/gajus/global-agent#what-is-the-reason-global-agentbootstrap-does-not-use-http_proxy I guess the situation is:

  • If we just use global-agent then out-of-the-box, that doesn't give us backwards-compatibility: self-hosting users who are using HTTP_PROXY, HTTPS_PROXY and NO_PROXY will not get the same behaviour after they upgrade.
  • If we set GLOBAL_AGENT_ENVIRONMENT_VARIABLE_NAMESPACE= (lets sidestep the issue that this is hard if the user just starts up using node server instead of via a npm script or Dockerfile for the moment), global-agent respects the HTTP_PROXY, HTTPS_PROXY and NO_PROXY variables, but may result in some kind of vague/unspecified "conflicting library behaviour", presumably because request implements its own proxying behaviour at a higher level. I don't really know what that means though. I don't know whether that means "definitely don't do this if you're using request" or "it'll probably be fine, but we don't guarantee it. As someone who actually hosts your own instance behind a proxy, do you have any clearer idea what the risk is here?

I guess the options are:

  • We use global-agent and make a breaking change: If you host your own instance behind a proxy you need to change HTTP_PROXY, HTTPS_PROXY and NO_PROXY to GLOBAL_AGENT_HTTP_PROXY, GLOBAL_AGENT_HTTPS_PROXY and GLOBAL_AGENT_NO_PROXY or set GLOBAL_AGENT_ENVIRONMENT_VARIABLE_NAMESPACE= (which may result in some kind of "conflicting library behaviour")
  • We use global-agent and try and find some way of setting GLOBAL_AGENT_ENVIRONMENT_VARIABLE_NAMESPACE= for all users (easy for dockerfile/NPM script. More annoying if you just run node server). No breaking change but again, maybe there is some kind of "conflicting library behaviour"?
  • We avoid global-agent and use a non-global proxy agent (e.g: tunnel) explicitly with got but configure it ourselves based on HTTP_PROXY, HTTPS_PROXY and NO_PROXY. Then request does its own thing internally with those vars and got uses its own different proxy agent but we remove this risk of "conflicting library behaviour" or a breaking change. Tbh, maybe this one is actually the way forward?

@calebcartwright
Copy link
Member

As someone who actually hosts your own instance behind a proxy, do you have any clearer idea what the risk is here

My understanding is that the only risk would be the potential side effects on any other apps running in the same environment as the Shields server, with those apps themselves explicitly setting certain properties (either directly or via something in their own dep tree).

I think the potential of that is relatively low, given how infrequently you encounter multiple apps running on the same host machine with no isolation in today's world. However, the impact could be subtle/difficult to detect, and tricky to track down too, especially if you have App Foo running directly on the same host as your Shields server, and App Foo needs some finally tuned socket configurations. That seems like a rather unlikely scenario to me though, as I wouldn't expect someone to haphazardly drop a Shield server in that same environment.

I can't envision any risk on the Shields server itself, provided we aren't relying on any lib/SDK anywhere in our own codebase that's handling the http interaction with some upstream service. I believe folks running in a Docker container, or in a dedicated/isolated hosting env would be fine.

I guess the options are:

Yup, think that sums them up well. I'm open to any option at this point, but I will definitely lean towards any option that's less invasive/less-risky for the Shields.io case even if that adds a step or two, or potential breaking change, for self-hosting.

@chris48s
Copy link
Member Author

My understanding is that the only risk would be the potential side effects on any other apps running in the same environment

I see. My reading of

Some libraries (e.g. request) change their behaviour when HTTP_PROXY environment variable is present. Using a namespaced environment variable prevents conflicting library behaviour.

is different to that. My understanding was that global-agent implements proxying lower down in the stack than request, so because request implements its own proxy agent, if we use HTTP_PROXY, HTTPS_PROXY and NO_PROXY then we're essentially double-implementing the proxy logic: request is implementing proxying at one level but then global-agent is implementing it again at a lower level, which could lead to some odd or conflicting results (within the shields app).

I don't really have a meaningful way to test this, but I've pushed 302a2a5. This bootstraps global-agent when we initialise the server (doing it in start() ensures we use the proxy even when we are running tests or running npm run badge etc). Could you give this a try behind a proxy and see if it seems to behave sensibly? I think the code that still uses request (e.g: github, php-version, suggest etc) is probably the stuff I'm more worried about testing.


One further option I've thought of since yesterday is we could try and do the full request --> got migration all at once and only merge this once we've completely removed request. Then any incompatibility between global-agent and request becomes irrelevant. That said I have got a strong preference for doing the rest of the work in 8 smaller PRs rather than having to do the whole thing in one go if at all possible.

@calebcartwright
Copy link
Member

I think the global-agent readme could use an update/some clarity

then we're essentially double-implementing the proxy logic: request is implementing proxying at one level but then global-agent is implementing it again at a lower level, which could lead to some odd or conflicting results (within the shields app)

Suppose we could poke around the request codebase, but I'm fairly confident we wouldn't run into any intra-Shields because:

  • global agent appears to act as a steam roller (always overriding agents unless explicitly told not to) that will always use its agents over any specified by libs (as request does). If it didn't/if that behavior gets disabled then it's a bit easier to picture how there could be some strange behavior between libs that did conditionally change the agent, like request, and others that didn't

By default, global-agent overrides agent property of any HTTP request, even if agent property was explicitly set when constructing a HTTP request. This behaviour allows to intercept requests of libraries that use a custom instance of an agent per default

  • The readme says they've tested against request so I'm assuming that means there wouldn't be any butchering of other options (url, headers, w/e) when using the two in conjunction

This could certainly be a case of me being overly optimistic though!

Could you give this a try behind a proxy and see if it seems to behave sensibly?

Will do, should have some bandwidth later this week

@chris48s
Copy link
Member Author

Excitingly, turning off retry has given us whole load of tests failing with "Mocks not yet satisfied" so that's something for me to dig into 🤔

@chris48s
Copy link
Member Author

Turns out it was actually global agent rather than retry that was causing all the nock tests to fail. I just didn't check the build after pushing 302a2a5 😄 I haven't really dug too deep into this but I guess both nock and global-agent are patching the node HTTP stack and terrible things are happening. Given the live tests work and the mocked tests all fail, I think rather than get lost in those weeds the solution is probably not to bootstrap global-agent under test, but happy to review it if anyone else has ideas..

@calebcartwright
Copy link
Member

Given the live tests work and the mocked tests all fail, I think rather than get lost in those weeds the solution is probably not to bootstrap global-agent under test, but happy to review it if anyone else has ideas..

I'd be inclined to see if our resident nock expert has any insights or suggestions @paulmelnikow 😄 It would give me some minor angst if we end up having to running our service tests in a different configuration than we run the server, but that wouldn't be a blocker for me if we end up needing to go that route

@chris48s
Copy link
Member Author

We talked about this issue in the ops meeting. Tl;dr - lets not bootstrap global-agent under test or in production.
Possible approaches:

  • Set an env var in production and use that as a condition
  • Check if proxy vars are set, only bootstrap if they are
  • Does global-agent have a feature/flag that tells it not to touch the HTTP stack if no proxy configured?

I'll do some reading + follow up on that.
Still with @calebcartwright to test the case where we do bootstrap global-agent behind a real-life proxy.

@calebcartwright
Copy link
Member

Things seem to be working great behind a proxy, no env/config changes needed 👍

I did notice an internal server error for GitHub badges, but I've not tried that before with my self-hosted instance so that may not be new/could just be the absence of GH auth

This is a behavioural difference between request and got

request will send the request, then we'll get a
`400 Bad Request` back and re-throw at as invalid

got will pick up that the URL is invalid and throw
`RequestError: URI malformed` before attempting to send it
which we'll re-throw as inaccessible
Got doesn't natively support assmebling a querystring
from nested objects because it uses node's URLSearchParams
internally. Use qs and pass qs a string.

Wordpress is the only service that needs this,
so we could build the string manually in this case
if we don't want to take qs as a prod dependency.
It is mostly hard-coded values anyway.
got overwrites any ?foo=bar in the URL string if
searchParams is also passed whereas request appends
see https:/sindresorhus/got#url
@chris48s chris48s changed the title Still quite WIP: Migrate request to got (part 1 of many); run [*****] Migrate request to got (part 1 of many); run [*****] Feb 27, 2021
@chris48s
Copy link
Member Author

OK, so the resolution I arrived at for global-agent is only bootstrapping it if proxy vars are configured. That way we know global agent definitely isn't touching the HTTP stack unless we explicitly want it to and it doesn't require us or self-hosting users to change anything.

Additionally I've rebased onto master to resolve merge conflicts and squashed the commit history into something easier to review (as things meandered a bit while this was WIP). I'm going to mark this ready for review now as it is.. ready for review 🎉

There are a few failing service tests but I think all they are all transient or due to other known issues like

@chris48s chris48s marked this pull request as ready for review February 27, 2021 17:31
@@ -60,7 +60,7 @@ module.exports = class KeybaseBTC extends KeybaseProfile {

async handle({ username }) {
const options = {
form: {
qs: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're changing these, could we go ahead and convert to got's searchParams

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it would throw Found unrecognised options searchParams :)

I think the reasons for that are explained in the top post and #6160 (comment) but just to run over it again: My proposal for the migration plan is we keep everything communicating using request's options format regardless of whether we are actually using request or got (at least in the service code. Maybe the more 'standalone' stuff makes sense to bypass that). Then once we've removed request completely, the last stage of the migration will be:

I think allowing both a got-like options object or a request-like options object during the transition period probably introduces unnecessary complexity, but maybe it becomes useful to be able to break that down into smaller chunks nearer the time. I think the way I'd probably do that would be to do one parameter at a time over all the services rather than have some services using got's format and some using request's.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it would throw Found unrecognised options searchParams :)

Ah so it would. Apologies, have been reviewing this in small chunks over a long period of time and clearly forgetting things along the way.

I think allowing both a got-like options object or a request-like options object during the transition period probably introduces unnecessary complexity, but maybe it becomes useful to be able to break that down into smaller chunks nearer the time

As I'm thinking about this (again 😄) I find myself wondering if I understand all the scenarios we're trying to catch/protect against with that check (https:/badges/shields/pull/6160/files#r575695319). I gather these are the three(ish) :

  • flag any usage of request-specific options that we haven't yet properly mapped to got's equivalent
  • prevent the introduction of any direct got option usage/mixed usage of request and got options
  • structure the conversion to got in batches, specifically here updating the service class fetch params for options

If so and I'm following correctly then I think I'm on board with that. I suppose one thing that's at least worth brief consideration is whether we want to abstract away the http client lib request options (http request, not request lib) from the service classes as a long term approach (not just for conversions).

I'm not proposing that we do, and think I would probably vote against, but tossing it out there anyway. I think the frequency (or lack thereof really) at which the underlying lib is changed, and the benefits of using a popular/well-known lib which many folks will know/can easily lookup outweigh trying to have our own Shields-specific set of options and translation layer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that summary 👍

I think unless we've been very unlucky in backing the wrong horse here, changing what HTTP lib we are using should be something that we will not do any time soon and there isn't any benefit in building a permanent abstraction over got. This should just be here temporarily for transitional purposes.

@calebcartwright
Copy link
Member

Think I may want to do one more pass through this, largely to be sure I've got my head wrapped around the changes and plan, but have finished going through once and things LGTM

Copy link
Member

@calebcartwright calebcartwright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're good to go on this first (big) step 🚀 The failing service tests look to be the usual, unrelated, suspects, but will differ to you to adjust/eliminate the target tests to whatever you think is worth running at this point.

Don't think there's any additional pending changes nor conflicts that would cause a dismissed review, but happy to re-approve if need be

@chris48s chris48s changed the title Migrate request to got (part 1 of many); run [*****] Migrate request to got (part 1 of many) Mar 9, 2021
@chris48s chris48s requested a deployment to shields-staging-pr-6160 March 9, 2021 19:59 Abandoned
@chris48s chris48s merged commit 2359eb2 into badges:master Mar 9, 2021
@chris48s
Copy link
Member Author

chris48s commented Mar 9, 2021

Thanks for your time on reviewing this. I'm going to deploy it and keep an eye on sentry for any issues we haven't thought of that crop up.
I'll also update #4655 with the next steps so we can keep this moving

@muety muety mentioned this pull request Mar 9, 2021
chris48s added a commit to chris48s/shields that referenced this pull request Mar 16, 2021
chris48s added a commit that referenced this pull request Mar 17, 2021
* Revert "send custom user agent when using got (#6256)"
This reverts commit fd7eddc.

* Revert "Migrate request to got (part 1 of many) (#6160)"
This reverts commit 2359eb2.

* install got as a prod dependency, allow npm 7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants