Multithread & gevent framework built into newspaper #4
Comments
Okay, I added a public API for multithreading article downloads (while also respecting news source domains). Instead of going news source by source and spamming each source with X threads, we spread 1-2 threads across each desired news source and download all of their articles concurrently, so it's a win-win. Check it out in the updated README!
This is still a very rough implementation; I'm going to need a few more commits to clean it up fully. Ideally, users should be able to customize how many threads they want to allocate per news source. I'm also aware that you can use "privoxy" to avoid rate limiting. Not sure if we need to build that in.
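For anyone following along, here is a minimal standard-library sketch of the idea described above: group article URLs by domain, give each domain its own small thread pool, and run all the pools at once. This is not newspaper's actual implementation. The names `group_by_source`, `download_all`, `fetch`, and `threads_per_source` are hypothetical and chosen only for illustration, and `fetch` stands in for whatever function really downloads a page.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def group_by_source(urls):
    """Group article URLs by domain, treating each domain as one news source."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlparse(url).netloc].append(url)
    return dict(groups)


def download_all(urls, fetch, threads_per_source=2):
    """Fetch every URL with at most `threads_per_source` workers per domain.

    `fetch` is a hypothetical callable that takes a URL and returns its body.
    """
    groups = group_by_source(urls)
    # One small pool per source. All pools run concurrently, so no single
    # domain ever sees more than `threads_per_source` requests at once.
    pools = {
        source: ThreadPoolExecutor(max_workers=threads_per_source)
        for source in groups
    }
    try:
        futures = {
            url: pools[source].submit(fetch, url)
            for source, source_urls in groups.items()
            for url in source_urls
        }
        # Collect results; an exception raised by `fetch` propagates here.
        return {url: future.result() for url, future in futures.items()}
    finally:
        for pool in pools.values():
            pool.shutdown(wait=True)
```

With three sources and `threads_per_source=2`, this runs up to six downloads at a time overall while keeping the load on any one site at two requests.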
Very cool.
Thanks man! Feel free to open an issue or send a pull request :D Hopefully this project remains active. (P.S. You are from Spokane? Good to see another Washingtonian here lol, I'm from Issaquah.)
Not only do I live in Washington, I work at the Spokesman-Review newspaper in Spokane :) We're a Django/Python shop, and I'm always looking for cool new toys.
That's pretty cool man, I'm a huge Django fanatic also! Go Seahawks :D
updating with better extractors for mismatched languages (i.e. french…
I will add this feature tonight or tomorrow. Opening an issue for it because it is so important. Multithreading has always existed in newspaper but there hasn't been a public API for it.
Downloading multiple articles concurrently is super useful and newspaper has an effective setup to do so.