Filtering by subfolder option in parse_folder script #215

Closed
wants to merge 10 commits into from


crohkohl

This pull request is the result of the discussion in #212.

It is aimed at handling the following import case when parsing data folders.

Simple example: there are images from two categories, and for every image we create an augmented version by mirroring, resulting in a folder structure like:

├── images/
│   ├── cat/
│   │   ├── img1/
│   │   │   ├── 1_original.jpg
│   │   │   └── 1_mirrorX.jpg
│   │   ├── img2/
│   │   │   ├── 2_original.jpg
│   │   │   └── 2_mirrorX.jpg

Currently, the structure is simply flattened and then split into train and validation datasets, so the following split could occur:

Training:

│   ├── cat/
│   │  └── img1/1_original.jpg
│   │  └── img2/2_mirrorX.jpg

Validation:

│   ├── cat/
│   │  └── img1/1_mirrorX.jpg
│   │  └── img2/2_original.jpg

That is not the desired result because it mixes data that originated from the same source image in training and validation. What you would want is:

Training:

│   ├── cat/
│   │  └── img1/1_original.jpg
│   │  └── img1/1_mirrorX.jpg

Validation:

│   ├── cat/
│   │  └── img2/2_original.jpg
│   │  └── img2/2_mirrorX.jpg

A new argument called --split_by_subfolder has been added to

https:/crohkohl/DIGITS/blob/split_by_subfolder/tools/parse_folder.py#L524

which produces that behaviour. The data is first grouped in a dictionary keyed by the deepest sub-folder name, then the groups are divided into train / val / test, and finally each group's items are added to the corresponding image list.
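
The grouping idea can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the function name `split_groups`, its parameters, and the seeded shuffle are all assumptions made for the example, and it ignores the per-category handling that `parse_folder.py` performs.

```python
import os
import random
from collections import defaultdict


def split_groups(paths, pct_val=25, pct_test=0, seed=0):
    """Hypothetical helper: split paths so files sharing a folder stay together."""
    # Group files by the folder that directly contains them.
    groups = defaultdict(list)
    for path in paths:
        groups[os.path.dirname(path)].append(path)

    # Shuffle the groups (not the individual files) reproducibly.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    # Percentages apply to the number of groups, not the number of files.
    n_val = int(round(len(keys) * pct_val / 100.0))
    n_test = int(round(len(keys) * pct_test / 100.0))
    val_keys = keys[:n_val]
    test_keys = keys[n_val:n_val + n_test]
    train_keys = keys[n_val + n_test:]

    def expand(selected):
        # Flatten the chosen groups back into a list of file paths.
        return [p for key in selected for p in sorted(groups[key])]

    return expand(train_keys), expand(val_keys), expand(test_keys)


# Usage with the example layout from the description above.
paths = [
    "cat/img1/1_original.jpg", "cat/img1/1_mirrorX.jpg",
    "cat/img2/2_original.jpg", "cat/img2/2_mirrorX.jpg",
]
train, val, test = split_groups(paths, pct_val=50)
```

Because whole groups are assigned to a split, an original image and its mirrored copy can never end up on opposite sides of the train / validation boundary. The trade-off is that the split percentages are only approximate when groups differ in size.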

crohkohl and others added 10 commits August 7, 2015 15:56
None of these are functional changes. Just cleaning up the code.
This is just a quick fix until we get a proper job management page.
Add widget to the homepage showing available gpus
Travis started failing with this error:

      File "/home/travis/miniconda/lib/python2.7/os.py", line 157, in makedirs
        mkdir(name, mode)
    OSError: [Errno 13] Permission denied: '/home/travis/.cache/pip/wheels/b7'

I fixed this by chowning all of ~/.cache to travis:travis
Also changed the verbosity of the install_caffe script
Double LMDB map_size on MapFullError
@lukeyeager
Member

Neat! I tested it and it seems to work.

  1. This should be exposed as an option through the UI so people can actually use it.
    • Add a field to digits/dataset/images/classification/forms.py
    • Display it in the template at digits/templates/datasets/images/classification/new.html
  2. Add some tests, please!
    • At least in tools/test_parse_folder.py
    • For bonus points, add a test to TestCreation in digits/dataset/images/classification/test_views.py. That may require modifying digits/dataset/images/classification/test_imageset_creator.py to create classes with subfolders.

@crohkohl
Author

Okay, I will look into that.

@lukeyeager
Member

If it's helpful, you can look at what I've done in #226 as reference for how to do (1) and (2).

@lukeyeager
Member

@crohkohl, are you still there? This is great stuff and I would like to see it merged. Do you want me to take over?

@lukeyeager lukeyeager closed this Sep 4, 2015