Filtering by subfolder option in parse_folder script #215

crohkohl · 2015-08-13T11:12:58Z

This pull request is the result of the discussion in #212.

It is aimed at handling the following import case when parsing data folders.

Simple example: there are images from two categories, and for every image we create augmented versions by mirroring, which results in a folder structure comparable to:

├── images/
│   ├── cat/
│   │   ├── img1/
│   │   │   ├── 1_original.jpg
│   │   │   └── 1_mirrorX.jpg
│   │   ├── img2/
│   │   │   ├── 2_original.jpg
│   │   │   └── 2_mirrorX.jpg

Currently, the structure is simply flattened and then split into training and validation datasets, so a split like this can occur:

Training:

│   ├── cat/
│   │  └── img1/1_original.jpg
│   │  └── img2/2_mirrorX.jpg

Validation:

│   ├── cat/
│   │  └── img1/1_mirrorX.jpg
│   │  └── img2/2_original.jpg

That is not the desired result, because it puts data that originated from the same source image into both training and validation. What you would want is:

Training:

│   ├── cat/
│   │  └── img1/1_original.jpg
│   │  └── img1/1_mirrorX.jpg

Validation:

│   ├── cat/
│   │  └── img2/2_original.jpg
│   │  └── img2/2_mirrorX.jpg
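
The difference between the two splits above can be checked mechanically. The following is a minimal sketch, not code from this pull request: `leaked_groups` is a hypothetical helper name, and it assumes that the parent directory of each file identifies the source image.

```python
import os


def leaked_groups(train_paths, val_paths):
    """Return the sub-folders that contribute images to both sets."""
    train_dirs = {os.path.dirname(p) for p in train_paths}
    val_dirs = {os.path.dirname(p) for p in val_paths}
    return sorted(train_dirs & val_dirs)


# The flattened split from the first example: both source images leak.
mixed = leaked_groups(
    ["cat/img1/1_original.jpg", "cat/img2/2_mirrorX.jpg"],
    ["cat/img1/1_mirrorX.jpg", "cat/img2/2_original.jpg"],
)

# The grouped split: every source image stays on one side.
clean = leaked_groups(
    ["cat/img1/1_original.jpg", "cat/img1/1_mirrorX.jpg"],
    ["cat/img2/2_original.jpg", "cat/img2/2_mirrorX.jpg"],
)

print(mixed)  # ['cat/img1', 'cat/img2']
print(clean)  # []
```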

A new argument called --split_by_subfolder has been added to

https:/crohkohl/DIGITS/blob/split_by_subfolder/tools/parse_folder.py#L524

which produces that behaviour. The data is first grouped in a dictionary by the deepest sub-folder name, then the groups are divided into train / val / test, and finally all items of each group are added to the corresponding image list.
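
The grouping approach described above could look roughly like the sketch below. It is an illustration under stated assumptions and not the code in parse_folder.py: the function name `split_by_subfolder`, the parameters `pct_val`, `pct_test` and `seed`, and the choice to apply the percentages to groups instead of individual images are all hypothetical. It handles the paths of a single class.

```python
import os
import random
from collections import defaultdict


def split_by_subfolder(paths, pct_val=25, pct_test=0, seed=None):
    """Split image paths so that each deepest sub-folder stays in one set."""
    # Group the paths by their parent directory (the deepest sub-folder).
    groups = defaultdict(list)
    for path in paths:
        groups[os.path.dirname(path)].append(path)

    # Shuffle the group keys, not the individual images.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    # Assumption: the percentages are applied to the number of groups.
    n_val = int(round(len(keys) * pct_val / 100.0))
    n_test = int(round(len(keys) * pct_test / 100.0))
    val_keys = keys[:n_val]
    test_keys = keys[n_val:n_val + n_test]
    train_keys = keys[n_val + n_test:]

    def flatten(selected):
        # Add every item of each selected group to the resulting list.
        return [p for key in selected for p in sorted(groups[key])]

    return flatten(train_keys), flatten(val_keys), flatten(test_keys)


paths = [
    "cat/img1/1_original.jpg",
    "cat/img1/1_mirrorX.jpg",
    "cat/img2/2_original.jpg",
    "cat/img2/2_mirrorX.jpg",
]
train, val, test = split_by_subfolder(paths, pct_val=50, seed=0)
```

Because whole groups are assigned, the resulting image counts only approximate the requested percentages when the groups differ in size.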


Travis started failing with this error:

    File "/home/travis/miniconda/lib/python2.7/os.py", line 157, in makedirs
      mkdir(name, mode)
    OSError: [Errno 13] Permission denied: '/home/travis/.cache/pip/wheels/b7'

I fixed this by chowning all of ~/.cache to travis:travis.

Also changed the verbosity of the install_caffe script.


lukeyeager · 2015-08-13T18:20:04Z

Neat! I tested it and it seems to work.

1. This should be exposed as an option through the UI so people can actually use it.
   - Add a field to digits/dataset/images/classification/forms.py
   - Display it in the template at digits/templates/datasets/images/classification/new.html
2. Add some tests, please!
   - At least in tools/test_parse_folder.py
   - For bonus points, add a test to TestCreation in digits/dataset/images/classification/test_views.py. That may require modifying digits/dataset/images/classification/test_imageset_creator.py to create classes with subfolders.
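
As a starting point for such tests, a fixture with one sub-folder per source image could be built along these lines. This is only a sketch: `create_grouped_folder` and its parameters are hypothetical and are not part of test_imageset_creator.py, and the files it writes are empty placeholders instead of real images.

```python
import os
import tempfile


def create_grouped_folder(root, classes=("cat", "dog"), groups_per_class=4,
                          variants=("original", "mirrorX")):
    """Create root/<class>/img<N>/<N>_<variant>.jpg placeholder files."""
    created = []
    for class_name in classes:
        for n in range(1, groups_per_class + 1):
            group_dir = os.path.join(root, class_name, "img%d" % n)
            os.makedirs(group_dir)
            for variant in variants:
                path = os.path.join(group_dir, "%d_%s.jpg" % (n, variant))
                # Empty placeholder; a real test would write image data here.
                open(path, "wb").close()
                created.append(path)
    return created


# Remove the directory with shutil.rmtree(root) once the test is done.
root = tempfile.mkdtemp()
files = create_grouped_folder(root)
```

A test could then run the parser on `root` with the new option and assert that no `img<N>` folder contributes files to more than one of the resulting lists.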

crohkohl · 2015-08-13T19:45:37Z

Okay, I will look into that.

lukeyeager · 2015-08-20T23:16:51Z

If it's helpful, you can look at what I've done in #226 as reference for how to do (1) and (2).

lukeyeager · 2015-08-28T00:51:52Z

@crohkohl, are you still there? This is great stuff and I would like to see it merged. Do you want me to take over?

crohkohl and others added 10 commits August 7, 2015 15:56

use send_from_directory for file serving in development server

3c260bd

Fix errors and smells as reported by Landscape

4723004

None of these are functional changes. Just cleaning up the code.

Merge pull request NVIDIA#203 from lukeyeager/remove-leveldb

1bf8be3

Remove all references to LevelDB

Add widget to the homepage showing available gpus

08699a0

This is just a quick fix until we get a proper job management page.

Merge pull request NVIDIA#207 from lukeyeager/display-gpus-available

1094932

Add widget to the homepage showing available gpus

Merge pull request NVIDIA#214 from lukeyeager/travis-fixes

a84260a

TravisCI fixes

Double LMDB map_size on MapFullError - close NVIDIA#206

af5b875

Now requires py-lmdb >= 0.87

Merge pull request NVIDIA#209 from lukeyeager/lmdb-map-size

7f9d50a

Double LMDB map_size on MapFullError

Filtering by subfolder option in parse_folder script

6deb1d5

crohkohl mentioned this pull request Aug 13, 2015

Grouping during image folder parsing for db creation #212

Closed

crohkohl force-pushed the split_by_subfolder branch from 3733c19 to 6deb1d5 Compare August 16, 2015 10:36

lukeyeager closed this Sep 4, 2015
