
Dataset train/eva/test partitions #2

Open
michaeltrs opened this issue Mar 2, 2020 · 1 comment

michaeltrs commented Mar 2, 2020

Hi,

For the provided dataset, I noticed there are more data saved on disk than the total across the partitions listed in the tileids folder. For example, for the 48x48-pixel data there are 28515 .tfrecord.gz files in total, while eval.tileids, train_fold*.tileids, and test_fold*.tileids collectively contain 10494 samples per year. That leaves 28515 - 2*10494 = 7527 samples that are not assigned to train/eval/test for 2016 and 2017.
Is anything wrong in the above description? If not, how should we treat the unassigned data?
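For reference, the counting described above can be reproduced with a short script. The directory layout in the glob patterns is my assumption and should be adjusted to the actual dataset layout:

```python
# Sketch: count tfrecord files on disk vs. ids listed in the tileids files.
# The paths below are assumptions about the dataset layout.
from glob import glob


def count_partition_ids(tileid_files):
    """Count the ids listed across a set of *.tileids files."""
    total = 0
    for path in tileid_files:
        with open(path) as f:
            total += sum(1 for line in f if line.strip())
    return total


n_records = len(glob("data/48x48/**/*.tfrecord.gz", recursive=True))
n_ids = count_partition_ids(glob("data/48x48/tileids/*.tileids"))

# Two years of data, so each tileid corresponds to two tfrecord files;
# with the numbers above this would be 28515 - 2*10494 = 7527.
unassigned = n_records - 2 * n_ids
```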

Many thanks,
Michael

MarcCoru (Owner) commented Mar 23, 2020

Hi Michael,

Thanks for your issue and your patience.

The tileids files are used for the results in the paper. All results are obtained from the tiles of tileids/eval.tileids.

The number of tfrecord files can differ from the tileids in the data splits due to two effects: 1) data preprocessing failed for some tiles (the tileid is then listed in failedtiles201*.txt), and 2) some tiles lie in the margin region between the train/valid/eval blocks, as shown in Figure 4 of the paper.

Overall the preprocessing chain looked like this:

a) for each tile within the AOI: crop the images and store them to a tfrecord; on error, add the id to failedtiles201*.txt

b) separate the area of interest into train/valid/eval blocks with a margin, and store the ids of the tiles that lie within the respective blocks into the tileids folder.

Since b) defines the split, not all tiles that were processed in a) will be used by the training script, so the number of tfrecord files and tileids can differ.
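The two effects above can be illustrated with a toy example (the tile ids and counts are invented for illustration, not taken from the dataset):

```python
# Toy illustration of why steps a) and b) yield different counts.
aoi_tiles = {f"tile{i:03d}" for i in range(100)}  # all tiles in the AOI

# a) preprocessing: some tiles fail and would be logged to failedtiles201*.txt
failed = {"tile003", "tile042"}
on_disk = aoi_tiles - failed          # tfrecord files actually written

# b) block split: tiles in the margin between blocks are assigned to no split
margin = {"tile010", "tile011", "tile012"}
assigned = on_disk - margin           # ids that end up in the tileids folder

assert len(on_disk) == 98
assert len(assigned) == 95            # fewer ids than tfrecord files on disk
```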

We decided to separate the tileids from the actual data samples to allow for different folds and for experiments with different data splits, similar to what we did in the CVPR paper (east vs. west, block sizes). In the end, we did not include these experiments in the IJGI paper.

I hope this clarifies things.
We quantitatively evaluated the models on the eval.tileids of the 24px by 24px tiles. These are the data tiles on which you could compare your method directly with the results of the paper.
