Add TransformHeadersAgent #26
Thanks for the PR, the code looks pretty good and I think it's a nice improvement. There are only two things we should do before merging:
- Please add some unit tests for the `TransformHeadersAgent` class, especially some sanity checks for the private APIs it uses. I noticed that you made some comments in the previous PR, but in the PR comments, not in code.
- I made a lot of comments about comments 😄 We try to explain everything that's not "common knowledge", or at least point to the relevant resource via link. Not all devs have deep knowledge of Node.js core HTTP (or any other complex system), so it's always nice to at least point them in the right direction. We also try to follow the KISS principle because, as a company, we never know who might be the next person who will need to read this code and make changes to it.
@@ -25,27 +26,25 @@ exports.proxyHook = async function (options) {
 const parsedProxy = new URL(proxyUrl);

 validateProxyProtocol(parsedProxy.protocol);
-const agents = await getAgents(parsedProxy, options.https.rejectUnauthorized);
+options.agent = await getAgents(parsedProxy, options.https.rejectUnauthorized);
I'm a bit torn whether this improves readability. I know it's shorter this way, but it made more sense to me when both the options were next to each other and not separated by the big comment. I don't have a strong opinion about this, but would like to learn why you think it's better this way.
It's required. Otherwise it would fail if the user provided their own agents.
It's because `http2.request` is used here.
Maybe I'm blind, but I don't see any difference in behavior between:
const agents = await getAgents(parsedProxy, options.https.rejectUnauthorized);
if (resolvedRequestProtocol === 'http2') {
options.agent = agents[resolvedRequestProtocol];
} else {
options.agent = agents;
}
and
options.agent = await getAgents(parsedProxy, options.https.rejectUnauthorized);
if (resolvedRequestProtocol === 'http2') {
options.agent = options.agent[resolvedRequestProtocol];
}
🤷‍♂️
The `if (resolvedRequestProtocol === 'http2') {` is outside `if (proxyUrl) {`; previously it was inside. You're not blind, it just may not be obvious at first sight. Let me add appropriate comments in this regard :)
got-scraping/src/hooks/proxy.js
Lines 46 to 51 in 8ead6d3
* The `if` below cannot be placed inside the `if` above.
* Otherwise `http2.request` would receive the entire `agent` object
* __when not using proxy__.
* ---
* `http2.request`, in contrary to `http2.auto`, expects an instance of `http2.Agent`.
* `http2.auto` expects an object with `http`, `https` and `http2` properties.
Unless you mean something else 🤔
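To make the placement issue concrete, here is a minimal sketch of the selection logic. `selectAgent`, its parameters, and the string placeholders standing in for agent instances are hypothetical and not taken from the PR; the guard against a missing `options.agent` is also an assumption added for the sketch.

```javascript
// Hypothetical helper, not the PR's code.
// `proxyAgents` stands in for the result of `getAgents(...)`;
// it is undefined when no proxy is used.
function selectAgent(options, resolvedRequestProtocol, proxyAgents) {
    if (proxyAgents) {
        // With a proxy: use the proxy agents ({ http, https, http2 }).
        options.agent = proxyAgents;
    }

    // Deliberately outside the proxy branch: user-provided agents are also
    // an object with `http`, `https` and `http2` properties, while
    // `http2.request` expects a single agent.
    if (resolvedRequestProtocol === 'http2' && options.agent) {
        options.agent = options.agent[resolvedRequestProtocol];
    }

    return options.agent;
}

// No proxy, user-provided agents (strings used as placeholders):
const userAgents = { http: 'http-agent', https: 'https-agent', http2: 'http2-agent' };
const picked = selectAgent({ agent: userAgents }, 'http2');
```

If the second `if` lived inside the proxy branch, the no-proxy call above would leave the whole `userAgents` object in `options.agent`, which is the failure described in the comment quoted from `proxy.js`.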
Some other tests are still flaky. We would need a server that would return |
Should I update the readme as well? People may get confused when they're expecting lower-case headers to be sent and got |
Just one small nitpick in tests. I think you've done a great job 👏
Yeah, that's a good idea. There's just one thing that did not occur to me before and I probably have not mentioned it. We want to pascal case the headers sent by the browser itself for HTTP/1.1, that's for sure. But custom headers like |
Good idea as well.
Cool. Is it mergeable or do we need to wait for `header-generator` or something else?
if (key.toLowerCase().startsWith('x-')) {
    headers[key] = request.getHeader(key);
} else {
    headers[this.toPascalCase(key)] = request.getHeader(key);
}
Oh, sorry for the confusion. The `x-something` was just an example. The header could be `my-header` as well or some other randomness. We've seen quite a few.
Also I think there are some `X-` headers which are actually sent by the browsers, like `X-Requested-With`.
So the determination of the "custom headers" is a bit more complex.
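A small sketch may help show why the prefix check is not enough. The `toPascalCase` below is a hypothetical implementation (the PR's actual method is not shown in this thread), and `transformHeaders` is an illustrative stand-in that mirrors the branch from the diff on a plain object rather than a request.

```javascript
// Hypothetical implementation: capitalize each dash-separated part.
function toPascalCase(name) {
    return name
        .split('-')
        .map((part) => part.charAt(0).toUpperCase() + part.slice(1).toLowerCase())
        .join('-');
}

// Illustrative stand-in for the branch in the diff, applied to a plain object.
function transformHeaders(headers) {
    const transformed = {};
    for (const [key, value] of Object.entries(headers)) {
        if (key.toLowerCase().startsWith('x-')) {
            // Treated as custom: casing is left exactly as provided.
            transformed[key] = value;
        } else {
            transformed[toPascalCase(key)] = value;
        }
    }
    return transformed;
}

const result = transformHeaders({
    'user-agent': 'test',
    'my-header': '1',
    'x-requested-with': 'XMLHttpRequest',
});
```

Under these assumptions `my-header` is rewritten to `My-Header` even though it is a custom header, while `x-requested-with` keeps its lower-case form even though browsers may send it, so the `x-` prefix alone misclassifies both cases.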
Lemme patch this real quick
`X-Requested-With` doesn't exist in the spec: https://www.iana.org/assignments/message-headers/message-headers.xhtml nor in the MDN docs: https://developer.mozilla.org/pl/docs/Web/HTTP/Headers
If the generator returns the headers in the correct casing, then it should be no problem, I think?
Yeah, the generator should always return correct casing. Unless we encounter some crap UA like the `applebot`.
Then this PR is good to go I think. I'm planning to optimize HTTP/2 related stuff next such as the ALPN negotiation. I think there's no need to manually store the cache anymore.
It's mergeable. I think it'd be better to add
Moved from #25
The `should add custom headers` test is flaky, as the headers are fixed in the agent. Will fix this asap. Fixed.