Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TransfomHeadersAgent #26

Merged
merged 12 commits into from
Aug 4, 2021
Merged

Add TransfomHeadersAgent #26

merged 12 commits into from
Aug 4, 2021

Conversation

szmarczak
Copy link
Contributor

@szmarczak szmarczak commented Aug 3, 2021

Moved from #25

The should add custom headers test is flaky, as the headers are fixed in the agent. Will fix this asap. Fixed.

@szmarczak szmarczak requested a review from mnmkng August 3, 2021 13:15
Copy link
Member

@mnmkng mnmkng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, the code looks pretty good and I think it's a nice improvement. There are only two things we should do before merging:

  1. Please add some unit tests for the TransformHeadersAgent class. Especially some sanity checks for the private APIs it uses. I noticed that you made some comments in the previous PR, but in the PR comments, not in code.
  2. I made a lot of comments about comments 😄 We try to explain everything that's not "common knowledge", or at least point to the relevant resource via link. Not all devs have deep knowledge of Node.js core HTTP (or any other complex system) so it's always nice to at least point them in the right direction. We also try to follow the KISS principle, because as a company, we never know who might be the next guy who will need to read this code and make changes to it.

src/agent/transform-headers-agent.js Show resolved Hide resolved
src/agent/transform-headers-agent.js Outdated Show resolved Hide resolved
src/agent/transform-headers-agent.js Outdated Show resolved Hide resolved
src/agent/transform-headers-agent.js Outdated Show resolved Hide resolved
src/agent/transform-headers-agent.js Outdated Show resolved Hide resolved
src/agent/transform-headers-agent.js Outdated Show resolved Hide resolved
src/agent/transform-headers-agent.js Outdated Show resolved Hide resolved
@@ -25,27 +26,25 @@ exports.proxyHook = async function (options) {
const parsedProxy = new URL(proxyUrl);

validateProxyProtocol(parsedProxy.protocol);
const agents = await getAgents(parsedProxy, options.https.rejectUnauthorized);
options.agent = await getAgents(parsedProxy, options.https.rejectUnauthorized);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit torn whether this improves readability. I know it's shorter this way, but it made more sense to me when both the options were next to each other and not separated by the big comment. I don't have a strong opinion about this, but would like to learn why you think it's better this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's required. Otherwise it would fail if the user provided their own agents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because http2.request is used here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm blind, but I don't see any difference in behavior between:

const agents = await getAgents(parsedProxy, options.https.rejectUnauthorized);
if (resolvedRequestProtocol === 'http2') {
            options.agent = agents[resolvedRequestProtocol];
} else {
            options.agent = agents;
}

and

options.agent = await getAgents(parsedProxy, options.https.rejectUnauthorized);
if (resolvedRequestProtocol === 'http2') {
        options.agent = options.agent[resolvedRequestProtocol];
}

🤷‍♂️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if (resolvedRequestProtocol === 'http2') { is outside if (proxyUrl) {, previously it was inside. You're not blind, it just may not be obvious at first sight. Let me add appropriate comments in this regards :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* The `if` below cannot be placed inside the `if` above.
* Otherwise `http2.request` would receive the entire `agent` object
* __when not using proxy__.
* ---
* `http2.request`, in contrary to `http2.auto`, expects an instance of `http2.Agent`.
* `http2.auto` expects an object with `http`, `https` and `http2` properties.

Unless you mean something else 🤔

test/helpers/dummy-server.js Outdated Show resolved Hide resolved
@szmarczak
Copy link
Contributor Author

Some other tests are still flaky. We would need a server that would return rawHeaders. browser-info API would suit this.

@szmarczak szmarczak marked this pull request as draft August 3, 2021 17:42
@szmarczak szmarczak requested a review from mnmkng August 4, 2021 07:46
@szmarczak szmarczak marked this pull request as ready for review August 4, 2021 07:47
@szmarczak
Copy link
Contributor Author

szmarczak commented Aug 4, 2021

Should I update the readme as well? People may get confused when they're expecting lower-case headers to be sent and got Pascal-Case instead.

Copy link
Member

@mnmkng mnmkng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small nitpick in tests. I think you've done a great job 👏

test/agent.test.js Outdated Show resolved Hide resolved
@mnmkng
Copy link
Member

mnmkng commented Aug 4, 2021

Should I update the readme as well? People may get confused when they're expecting lower-case headers to be sent and got Pascal-Case instead.

Yeah, that's a good idea. There's just one thing that did not occur to me before and I probably have not mentioned it. We want to pascal case the headers sent by the browser itself for HTTP/1.1, that's for sure. But custom headers like x-something-stuff should probably stay with the user provided case. What do you think?

@szmarczak
Copy link
Contributor Author

What do you think?

Good idea as well.

Copy link
Member

@mnmkng mnmkng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Is it mergeable or do we need to wait for header-generator or something else?

Comment on lines +27 to +31
if (key.toLowerCase().startsWith('x-')) {
headers[key] = request.getHeader(key);
} else {
headers[this.toPascalCase(key)] = request.getHeader(key);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry for the confusion. The x-something was just an example. The header could be my-header as well or some other randomness. We've seen quite a few.

Also I think there are some X- headers which are actually sent by the browsers like X-Requested-With.

So the determination of the "custom headers" is a bit more complex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lemme patch this real quick

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the generator returns in the correct casing then it should be no problem I think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the generator should always return correct casing. Unless we encounter some crap UA like the applebot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yeah, I might be wrong about the X-Requested-With. I've seen it multiple times so I thought it's sent by the browser but maybe it's just common among developers to include it.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then this PR is good to go I think. I'm planning to optimize HTTP/2 related stuff next such as the ALPN negotiation. I think there's no need to manually store the cache anymore.

@szmarczak
Copy link
Contributor Author

It's mergeable. I think it'd be better to add header-generator order integration in a different PR.

https:/apify/header-generator/blob/2ebee495b88efb144fc75e374ca3acb59e4d2832/src/header-generator.js#L288-L294

header-generator returns already sorted headers, but it's sorted at the wrong point. This way we don't know where the Connection header could go. I believe it would be best to return the headers as they are and additionally return an array of sorted headers.

README.md Outdated Show resolved Hide resolved
@mnmkng mnmkng merged commit fe2bcc8 into master Aug 4, 2021
@szmarczak szmarczak deleted the transform-headers branch August 4, 2021 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants