Pattypan no longer checks for duplicate files using hashes before an upload #151

Open
Abbe98 opened this issue Feb 7, 2022 · 2 comments

Comments

@Abbe98
Collaborator

Abbe98 commented Feb 7, 2022

Following the migration to a newer version of Wiki.java, we lost the feature that checked for duplicate files using hashes. We should bring this back, either by adding support for such a feature to Wiki.java or by implementing it on our end.

Keep in mind that we need to support the upload by URL feature.
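For illustration, a minimal sketch of what a check on our end might look like, assuming the MediaWiki `list=allimages` query with its `aisha1` parameter is used for the lookup. The class and method names are hypothetical and are not part of Pattypan or Wiki.java. A non-empty `query.allimages` array in the response would indicate an existing file with the same hash. For upload by URL, the remote bytes would presumably have to be fetched first before they can be hashed locally.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not existing Pattypan or Wiki.java code.
final class DuplicateCheckSketch {

    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected in practice: SHA-1 ships with standard JDKs.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Builds a query URL that asks the wiki for files with the given SHA-1.
     * apiEndpoint is assumed to be something like
     * https://commons.wikimedia.org/w/api.php
     */
    static String duplicateQueryUrl(String apiEndpoint, String sha1Hex) {
        return apiEndpoint
                + "?action=query&list=allimages&aisha1=" + sha1Hex
                + "&format=json";
    }
}
```

A caller would hash the file contents with `sha1Hex`, request the URL from `duplicateQueryUrl`, and warn the user before uploading if the result list is not empty.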

@don-vip
Contributor

don-vip commented Feb 7, 2022

Hi! In case it helps, this is a feature I implemented in my tool, with good results:

https:/toolforge/tool-spacemedia/blob/master/sm-apps/sm-cronjobs/sm-downloader/src/main/java/org/wikimedia/commons/donvip/spacemedia/downloader/HashHelper.java

https:/toolforge/tool-spacemedia/blob/master/sm-legacyapp/src/main/java/org/wikimedia/commons/donvip/spacemedia/service/MediaService.java#L116

It's based on https:/KilianB/JImageHash, so it searches not only for an exact SHA-1 match but also by "perceptual" hash, so that other duplicates are also detected.
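To make the perceptual hash idea concrete, here is a rough sketch of a simple average hash using only the JDK. It is not JImageHash's API and not the code from the linked tool; the class name, the 8x8 size, and any distance threshold a caller picks are assumptions for illustration. Two images whose hashes differ in only a few bits are likely visually similar even when their SHA-1 digests differ.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical illustration of an average hash; not JImageHash code.
final class AverageHashSketch {

    /** Computes a 64-bit average hash from an 8x8 grayscale thumbnail. */
    static long averageHash(BufferedImage image) {
        BufferedImage small = new BufferedImage(8, 8, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = small.createGraphics();
        g.drawImage(image, 0, 0, 8, 8, null); // scale down and convert to gray
        g.dispose();

        int[] pixels = new int[64];
        long sum = 0;
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++) {
                int value = small.getRaster().getSample(x, y, 0);
                pixels[y * 8 + x] = value;
                sum += value;
            }
        }
        long mean = sum / 64;

        // Set one bit per pixel that is brighter than the mean.
        long hash = 0L;
        for (int i = 0; i < 64; i++) {
            if (pixels[i] > mean) {
                hash |= 1L << i;
            }
        }
        return hash;
    }

    /** Number of differing bits between two hashes; smaller means more similar. */
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

An exact SHA-1 lookup would still run first; a hash comparison like this would only be a second pass for files that were re-encoded or resized.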

@Abbe98
Collaborator Author

Abbe98 commented Feb 10, 2022

I started a discussion upstream around more generic support for warnings: MER-C/wiki-java#154 (comment)
