Proxy changes for same session #2503
Thank you @harm-matthias-harms for bringing this up. Indeed, there is an issue with the way we're handling sessions in the browser crawlers. A running browser instance can be reused for multiple requests, but for technical reasons it always has only one proxy URL / session tied to it. We'll try to straighten this out in upcoming patches. In the meantime, you can get the expected behavior by enabling the `useIncognitoPages` launch option:

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    launchContext: {
        useIncognitoPages: true, // Each page gets its own browser context, which fixes the session/proxy pairing issue
    },
    requestHandler: async ({ enqueueLinks, session, proxyInfo }) => {
        // ...
    },
});
```
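The explanation above (one proxy per running browser, many requests per browser) can be illustrated with a toy model. This sketch is hypothetical: `ToyBrowserPool`, `newPage` and `isPaired` are illustrative names, not Crawlee's `BrowserPool` API, and the `useIncognitoPages` flag only mimics the real launch option. It assumes a shared browser picks its proxy once, for the first session it serves, while an incognito page gets a proxy for its own session.

```typescript
// Hypothetical model, NOT Crawlee's real BrowserPool API.
interface ToyPage {
  sessionId: string;      // session assigned to the request
  proxySessionId: string; // session the page's proxy URL was created for
}

class ToyBrowserPool {
  private readonly useIncognitoPages: boolean;
  // Proxy session fixed when the shared browser is launched (assumption).
  private sharedProxySessionId: string | null = null;

  constructor(useIncognitoPages: boolean) {
    this.useIncognitoPages = useIncognitoPages;
  }

  newPage(sessionId: string): ToyPage {
    if (this.useIncognitoPages) {
      // Each page has its own context, so its proxy can follow its session.
      return { sessionId, proxySessionId: sessionId };
    }
    // Shared browser: the proxy is chosen once, for the first session seen,
    // and every later page reuses it.
    if (this.sharedProxySessionId === null) {
      this.sharedProxySessionId = sessionId;
    }
    return { sessionId, proxySessionId: this.sharedProxySessionId };
  }
}

// True when the page's proxy belongs to the same session as the request.
function isPaired(page: ToyPage): boolean {
  return page.sessionId === page.proxySessionId;
}

const shared = new ToyBrowserPool(false);
const incognito = new ToyBrowserPool(true);
for (const id of ["session_a", "session_b"]) {
  console.log(id, "shared:", isPaired(shared.newPage(id)), "incognito:", isPaired(incognito.newPage(id)));
}
```

In this model only the first session is paired correctly in shared mode; the second request runs with the first session's proxy, while incognito pages stay paired for every session.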
Which package is this bug report for?
@crawlee/browser (BrowserCrawler)
Issue description
According to the documentation, proxies and sessions are bound together to avoid getting blocked when the same session is used with a different IP address. The documentation gives a similar example:
But if I check the proxy and session in the router, the session ID does not match the proxy's session ID:
This outputs something like:
The problem seems to be that the proxy is loaded before the page context is enhanced, and the enhancement can change the session.
A fix that works locally is to load the proxy after the session has been loaded again, i.e. to move the code block below the last mentioned line.
After the change the output looks like this:
I'm sorry for not providing a PR for this; I don't know whether the change has other implications, and it's not easy for me to add an adequate test quickly.
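The ordering problem described in this issue can be sketched as a small model. It is hypothetical: it assumes the proxy URL is derived from the session ID and that enhancing the context may replace the session with the one tied to a reused page; `resolveProxy`, `enhanceContext`, `buildContext` and the `proxy.example` URL are illustrative names, not Crawlee internals.

```typescript
// Hypothetical model of the ordering bug; names are illustrative only.
interface ToyProxyInfo {
  url: string;
  sessionId: string;
}

interface ToyContext {
  sessionId: string;
  proxyInfo?: ToyProxyInfo;
}

// Assumption: a "sticky" proxy URL is derived from the session ID.
function resolveProxy(sessionId: string): ToyProxyInfo {
  return { url: `http://session-${sessionId}@proxy.example:8000`, sessionId };
}

// Assumption: enhancing the context swaps in the session bound to the page.
function enhanceContext(ctx: ToyContext, pageSessionId: string): void {
  ctx.sessionId = pageSessionId;
}

function buildContext(initialSessionId: string, pageSessionId: string, proxyFirst: boolean): ToyContext {
  const ctx: ToyContext = { sessionId: initialSessionId };
  if (proxyFirst) {
    // Reported behavior: proxy resolved from the pre-enhancement session.
    ctx.proxyInfo = resolveProxy(ctx.sessionId);
    enhanceContext(ctx, pageSessionId);
  } else {
    // Proposed fix: resolve the proxy from the final session.
    enhanceContext(ctx, pageSessionId);
    ctx.proxyInfo = resolveProxy(ctx.sessionId);
  }
  return ctx;
}

const before = buildContext("session_a", "session_b", true);
const after = buildContext("session_a", "session_b", false);
console.log("proxy first:", before.sessionId, before.proxyInfo?.sessionId);
console.log("session first:", after.sessionId, after.proxyInfo?.sessionId);
```

With the proxy resolved first, the context ends up with `session_b` but a proxy created for `session_a`; resolving it after the session is final keeps both on `session_b`.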
Related to https://discord.com/channels/801163717915574323/1243449005820874763
Code sample
No response
Package version
latest
Node.js version
20
Operating system
macOS
Apify platform
I have tested this on the `next` release
No response
Other context
No response