Blacklist requests that are duplicates of existing resources or bound to fail #28

Open
Popolechien opened this issue Mar 2, 2022 · 6 comments
Labels
enhancement New feature or request prio1

Comments

@Popolechien
Contributor

Following openzim/zimit#113, we should think about implementing an easily editable list (hosted on drive.kiwix.org?) of blacklisted sites that cannot be requested on zimit, e.g.:

  • kiwix.org subdomains (download and library);
  • very large corporate websites (Facebook, Twitter, Reddit, YouTube, etc.);
  • websites for which past scraping attempts failed.

It's probably a matter for a separate ticket, but requests for websites we already have a scraper for (Wikipedia, Stack Overflow, etc.) should also be soft-blocked, with the user offered a direct link to the ZIM file instead.
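To make the hard-block / soft-block distinction concrete, here is a minimal sketch of how such a check might look. Everything in it is an assumption for illustration: the `HARD_BLOCKED` and `SOFT_BLOCKED` data, the `check_request` function, and the `example.org` links are hypothetical, and nothing here reflects how zimit actually handles requests. In practice the two collections would be loaded from the editable list rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical data: in practice this would be loaded from the editable list.
HARD_BLOCKED = {
    "download.kiwix.org",
    "library.kiwix.org",
    "facebook.com",
    "twitter.com",
    "reddit.com",
    "youtube.com",
}

# Hypothetical mapping: domain -> placeholder link to the existing ZIM file.
SOFT_BLOCKED = {
    "wikipedia.org": "https://example.org/zim/wikipedia",
    "stackoverflow.com": "https://example.org/zim/stackoverflow",
}


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str
    redirect: Optional[str] = None  # set only for soft-blocked sites


def _matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def check_request(url: str) -> Verdict:
    """Decide whether a requested URL may be scraped."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return Verdict(False, "invalid URL")
    for domain in HARD_BLOCKED:
        if _matches(host, domain):
            return Verdict(False, f"{domain} is blocked")
    for domain, link in SOFT_BLOCKED.items():
        if _matches(host, domain):
            return Verdict(False, f"a ZIM file already exists for {domain}", link)
    return Verdict(True, "ok")
```

Matching on the domain suffix (rather than the exact host) means `en.wikipedia.org` and `www.facebook.com` are caught by a single entry each, while an unrelated host such as `notfacebook.com` is not.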

@Popolechien Popolechien added the enhancement New feature or request label Mar 2, 2022
@rgaudin
Member

rgaudin commented Mar 2, 2022

Can you move your comment to #25 and close this? This is the scraper's repo.

@Popolechien Popolechien transferred this issue from openzim/zimit Mar 2, 2022
@Popolechien
Contributor Author

@rgaudin Moved it, but I'd keep it open as this ticket is a little bit different.

@rgaudin
Member

rgaudin commented Mar 2, 2022

This one's better; closing the other one, but the problem raised there remains: where do we point users for content that we know already exists?

@Popolechien
Contributor Author

Is your question "in case there are several versions of the same zim" (e.g., Wikipedia mini/nopic/maxi)?

The basic assumption here is that zimit provides a copy of the real thing, so we should send users the maxi ZIM file.
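As a rough sketch of that rule: given the flavours available for a site, prefer maxi. The function name, the dictionary shape, and the fallback order after maxi are my assumptions and not something decided in this thread; the links are placeholders.

```python
from typing import Dict, Optional

# Assumption: maxi first (closest to "the real thing"); the order of the
# fallbacks is a guess for illustration only.
FLAVOUR_PREFERENCE = ("maxi", "nopic", "mini")


def pick_zim_link(available: Dict[str, str]) -> Optional[str]:
    """Return the link for the preferred flavour, or None if none is known.

    `available` maps a flavour name to a (hypothetical) download link.
    """
    for flavour in FLAVOUR_PREFERENCE:
        if flavour in available:
            return available[flavour]
    return None
```

With this, a site offering all three flavours resolves to the maxi link, and a site with only mini still gets a link rather than nothing.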

@stale

stale bot commented May 3, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@kelson42
Contributor

kelson42 commented Nov 4, 2023

See also #33
