Dataset train/eval/test partitions #2
Hi Michael,

Thanks for your issue and your patience. The tileids files are used for the results in the paper; all results are obtained from the tiles in tileids/eval.tileids.

The number of tfrecord files can differ from the tileids in the data splits due to two effects: 1) data preprocessing failed (the tileid is listed in failedtiles201*.txt), and 2) the tileids lie in the margin region between the train/valid/eval blocks, as shown in Figure 4 of the paper.

Overall, the preprocessing chain looked like this:

a) For each tile within the AOI: crop the images and store them to a tfrecord; on error, add the id to failedtiles201*.txt.

b) Separate the area of interest into blocks for train/valid/eval with a margin, and store the ids of the tiles that lie within the respective blocks in the tileids folder.

Since b) defines the split, not all tiles that have been processed in a) are used by the training script. We decided to separate the tileids from the actual data samples to allow for different folds and experiments with different data splits, similar to what we did in the CVPR paper (east vs. west, size of blocks). In the end, we did not include these experiments in the IJGI paper.

I hope this clarifies things.
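The block-with-margin assignment in step b) could be sketched roughly as below. The block size, margin width, and block-to-split rule here are illustrative assumptions, not the values used in the paper:

```python
# Hypothetical sketch of step b): partition tiles into train/valid/eval
# blocks along one axis, skipping tiles in the margin between blocks.
# Block size, margin, and the block-to-split rule are illustrative only.

def assign_splits(tile_coords, block=10, margin=2):
    """Map {tile_id: (x, y)} to split lists; margin tiles get no split."""
    splits = {"train": [], "valid": [], "eval": []}
    for tid, (x, _y) in tile_coords.items():
        bx, off = divmod(x, block)
        if off < margin or off >= block - margin:
            continue  # tile lies in the margin region between blocks
        splits[("train", "valid", "eval")[bx % 3]].append(tid)
    return splits

# Tiles processed in step a) that fall into a margin are never written
# to any .tileids file, which is one source of "extra" tfrecord files.
coords = {f"tile{i:02d}": (i, 0) for i in range(30)}
splits = assign_splits(coords)
```

With these toy parameters, 18 of the 30 tiles land in a split and 12 fall into margins, mirroring how processed tiles can end up unassigned.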
Hi,
For the provided dataset, I noticed there are more data files saved on disk than the total across the partitions found in the tileids folder. For example, for the 48x48 pixel data there are 28515 .tfrecord.gz files in total, while eval.tileids, train_fold*.tileids, and test_fold*.tileids collectively contain 10494 samples per year. That leaves 28515 - 2*10494 = 7527 samples that are not split into train/eval/test for 2016 and 2017.
Is there something wrong in the description above? If not, how should we treat the unassigned data?
Many thanks,
Michael
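The bookkeeping described in the question (files on disk vs. ids listed in the split files) can be checked with a short script. The directory layout and naming pattern here are assumptions, not the repository's exact structure:

```python
from pathlib import Path

def unassigned_tiles(data_dir, tileids_dir):
    """Return tile ids that have a .tfrecord.gz file on disk but do not
    appear in any *.tileids split file (hypothetical layout)."""
    on_disk = {p.name.split(".")[0] for p in Path(data_dir).glob("*.tfrecord.gz")}
    assigned = set()
    for f in Path(tileids_dir).glob("*.tileids"):
        assigned |= {ln.strip() for ln in f.read_text().splitlines() if ln.strip()}
    return on_disk - assigned
```

With the counts from the question, 28515 files on disk minus 2*10494 assigned ids over the two years would leave 7527 such unassigned tiles.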