Option to preserve filenames in download_images
#2983
Merged
I apologize in advance if I missed a requirement for PR creation. I did my best to make sure I did not miss anything.
The motivation for this change is that I use this utility together with the `ImageClassifierCleaner` widget, and I ran into the following situation. I have a list of URLs for class A and another list of URLs for class B, so I first call `download_images` for the class A URL list and then for the class B URL list to download the images.
`download_images` uses numbers as names for the files it downloads, so we end up with a 0001.jpg image labeled as A and a 0001.jpg image labeled as B (and so on). The problem I hit is that if, while inspecting the images with the cleaner widget, I confirm that 0001.jpg in class A really belongs to class B, then moving the file to directory B as it is done in chapter 2 of the book (https:/fastai/fastbook/blob/master/02_production.ipynb) throws an error because 0001.jpg already exists in directory B.
The first solution I thought of was to preserve the filenames when downloading the images, which is what this PR enables. In summary, this PR adds a `preserve_filename` option to `download_images`, which is disabled by default and, when enabled, preserves the image file names.
I later realized that it would have been muuuuuuch easier to just rename the file when moving it from directory A to directory B... but I thought of that after I made this change locally.
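For reference, the rename-on-move alternative could look roughly like the sketch below. It is not part of this PR and not how the book does it; `move_without_clobbering` is a hypothetical helper name, and the `_1`, `_2`, ... suffix scheme is just an assumption for illustration.

```python
import shutil
from pathlib import Path


def move_without_clobbering(src, dest_dir):
    """Move src into dest_dir, adding a numeric suffix if the name is taken.

    Hypothetical helper: the suffix scheme (stem_1, stem_2, ...) is an
    assumption, not something fastai provides.
    """
    src, dest_dir = Path(src), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Start with the original file name and only rename on a collision.
    target = dest_dir / src.name
    counter = 1
    while target.exists():
        target = dest_dir / f"{src.stem}_{counter}{src.suffix}"
        counter += 1

    shutil.move(str(src), str(target))
    return target
```

With this, moving `A/0001.jpg` into a directory `B` that already holds a `0001.jpg` would leave the existing file untouched and store the moved one under a suffixed name.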
As I described above, this change was not really necessary to solve my original problem, but since I had already done it, I went ahead and created this PR. I also found someone who needed this feature in the past, which further motivated me to go ahead: https://forums.fast.ai/t/using-download-images-retaining-source-file-name/39463.
This is a change that I had locally, and converting it into a PR did not require much effort. Still, if this does not sound like a good idea, do not hesitate to reject my PR. The learning experience was worth it :)