Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileStitcher: do not use detected patterns that include missing files #3129

Merged
merged 1 commit into from
Jul 2, 2018

Conversation

melissalinkert
Copy link
Member

See https://trello.com/c/hTxcpCtl/37-full-repository-skipped-filesets

When detecting a pattern based upon a list of file names, ignore any patterns that include file names not in the list. To test, use the TIFF files in data_repo/curated/cellr/kees/7Z5GR81B_DocumentFiles. Without this change, showinf -nopix -stitch on any of them should throw a FileNotFoundException. With this change, there should be no exception and a single file should be on the used files list.

Unit tests should continue to pass with this change, but there is a chance that this will impact https://web-proxy.openmicroscopy.org/west-ci/job/BIOFORMATS-test-repo/ and require additional fixes.

@sbesson
Copy link
Member

sbesson commented Apr 25, 2018

With this change included, the repository tests are still green so no obvious regression. While more robustness in the detection of the file stitching is certainly useful, I still struggle to understand what causes a wrong pattern to be detected and issue a FileNotFoundException in the first place.

With

[sbesson@ome-c6100-1 ~]$ ls test1
test_T02.tif  test_T03.tif  test_T04.tif  test_T06.tif  test_T07.tif

executing showinf -nopix -stitch on any input file with Bio-Formats 5.8.2 returns a single file pattern as expected.

@melissalinkert
Copy link
Member Author

The files need to have variation in more than one digit, as this causes multiple numerical blocks to be detected. The file names shouldn't have consistent values in the second digit for all values of the first digit.

As a minimal example, with these names:

$ ls
t03.tif  t04.tif  t12.tif  t13.tif

the patterns previously detected are t<0-1><3-4>.tif and t<0-1><2-3>.tif, which should result in a FileNotFoundException with 5.8.2. Renaming t12.tif to t14.tif would result in t<0-1><3-4>.tif, which should pick up all 4 files without throwing an exception (with or without this PR).

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. In that case, I would definitely consider the previous FileNotFoundException behavior as a bug. If no multi-file pattern can be found, falling back the behavior to a single-file pattern sounds like a reasonable behavior and matches existing cases as in as in #3129 (comment)

Given the complexity of the file stitcher, I think a second opinion might be necessary here though.

@dgault dgault added this to the 5.9.0 milestone Apr 30, 2018
@dgault
Copy link
Member

dgault commented Jun 18, 2018

A message on the users list today (http://lists.openmicroscopy.org.uk/pipermail/ome-users/2018-June/007075.html) was resulting in the FileNotFoundException when using patterns. I checked to see if the FileStitcher changes here would have any impact in this particular scenario but it still produced the same issue. In this particular case though that is probably the expected result.

@dgault
Copy link
Member

dgault commented Jul 2, 2018

i do not see this having an negative consequences and fixes the FileNotFoundException with a sensible fallback behaviour. Merging for 5.9.0

@dgault dgault merged commit 859e908 into ome:develop Jul 2, 2018
@melissalinkert melissalinkert deleted the pattern-missing-files branch July 11, 2018 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants