Filter files via regular expression (#177)
* add regex filter

* add complete command line call to readme

* handle pattern exception
KochTobi authored Jul 11, 2024
1 parent 3557137 commit 4963628
Showing 6 changed files with 102 additions and 17 deletions.
48 changes: 36 additions & 12 deletions README.md
@@ -102,11 +102,11 @@ When using the application, you can either:
1. enter your password interactively `--password`
2. enter the name of a system property containing your password `--password:prop my.awesome.property`
```bash
java -jar -Dmy.awesome.property=ABCDEFG postman.jar -u qbc001a --password:prop my.awesome.property
java -jar -Dmy.awesome.property=ABCDEFG postman.jar -u qbc001a --password:prop my.awesome.property @path/to/config.txt
```
3. enter the name of an environment variable containing your password `--password:env MY_PASSWORD`. Make sure to *not* use the `$` sign before the environment variable (as in bash variables) otherwise the password is not recognized (`--password:env $MY_PASSWORD` will fail)
```bash
MY_PASSWORD=ABCDEFG java -jar postman.jar -u qbc001a --password:env MY_PASSWORD
MY_PASSWORD=ABCDEFG java -jar postman.jar -u qbc001a --password:env MY_PASSWORD @path/to/config.txt
```
### How to provide QBiC identifiers
To specify which data you want to list or download, you need to provide us with QBiC identifiers.
@@ -128,7 +128,7 @@ QSTTS001AB
QSTTS002BC
```
```bash
java -jar postman.jar -f myids.txt
java -jar postman.jar -f myids.txt @path/to/config.txt
```

### How to filter files by suffix
@@ -138,17 +138,36 @@ Multiple suffixes can be provided separated by a comma. A suffix does not have t

If you only want to download `fastq` and `fastq.gz` files you can run postman with
```bash
java -jar postman.jar -s .fastq,.fastq.gz
java -jar postman.jar -s .fastq,.fastq.gz @path/to/config.txt
```

### How to filter by regular expression
Both the `download` and the `list` command allow you to filter files using [a regular expression](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html).
The pattern is compiled as a Java regular expression (`java.util.regex.Pattern`) and is matched against the path of the file, including the filename. The pattern must match the entire path, not just a part of it. Only matching files are included in the output.

If you only want to download files containing `test` in their path or in their name, you can run postman with
```bash
java -jar postman.jar --pattern ".*test.*" @path/to/config.txt
```
Although this can be used to filter for suffixes as well, please use the `--suffix` option.

Instead of
```bash
java -jar postman.jar --pattern ".*\.fastq\.gz" @path/to/config.txt
```
do
```bash
java -jar postman.jar -s .fastq.gz @path/to/config.txt
```
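
The reason the example above uses `.*test.*` rather than plain `test` is that the filter calls `matches()`, which requires the whole path to match. The following is a minimal sketch of that behaviour using only `java.util.regex`; the class `PathPatternDemo`, its `filter` method, and the sample paths are hypothetical and not part of postman.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Illustrative only: this class and the sample paths are hypothetical. */
class PathPatternDemo {

  /** Returns the paths that match the regular expression in their entirety. */
  static List<String> filter(List<String> paths, String regex) {
    Pattern pattern = Pattern.compile(regex);
    return paths.stream()
        // matches() succeeds only if the whole string matches, unlike find()
        .filter(path -> pattern.matcher(path).matches())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> paths = List.of(
        "run1/test_sample.fastq.gz",
        "run1/sample.fastq.gz",
        "test/report.pdf");
    // Keeps the two paths that contain "test" somewhere.
    System.out.println(filter(paths, ".*test.*"));
    // Keeps nothing: no path consists of exactly "test".
    System.out.println(filter(paths, "test"));
  }
}
```

If substring semantics were wanted instead, `Matcher.find()` would be the call to use; the commit shown here uses `matches()`.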

## `list`
```txt
Usage: postman-cli list [-hV] [--exact-filesize] [--with-checksum]
[--without-header] [--format=<outputFormat>] -u=<user>
[-s=<suffix>[,<suffix>...]]... (--password:
env=<environment-variable> | --password:
prop=<system-property> | --password) (-f=<filePath> |
SAMPLE_IDENTIFIER...)
[--without-header] [--format=<format>]
[--pattern=<regex_pattern>] -u=<user> [-s=<suffix>[,
<suffix>...]]... (--password:env=<environment-variable>
| --password:prop=<system-property> | --password)
(-f=<filePath> | SAMPLE_IDENTIFIER...)
Description:
lists all the datasets found for the given identifiers
@@ -169,6 +188,8 @@ Options:
-s, --suffix=<suffix>[,<suffix>...]
only include files ending with one of these
(case-insensitive) suffixes
--pattern=<regex_pattern>
only include files with paths matching this pattern
--with-checksum list the crc32 checksum for each file
--exact-filesize use exact byte count instead of unit suffixes:
Byte, Kilobyte, Megabyte, Gigabyte, Terabyte and
@@ -220,8 +241,9 @@ NGSQSTTS015A0 (20211026111452695-847006) NGSQSTTS015A0 2021-10-26T09:14:53.14381

```txt
Usage: postman-cli download [-hV] [--ignore-subdirectories] [-o=<outputPath>]
-u=<user> [-s=<suffix>[,<suffix>...]]...
(--password:env=<environment-variable> | --password:
[--pattern=<regex_pattern>] -u=<user> [-s=<suffix>[,
<suffix>...]]... (--password:
env=<environment-variable> | --password:
prop=<system-property> | --password) (-f=<filePath>
| SAMPLE_IDENTIFIER...)
@@ -244,6 +266,8 @@ Options:
-s, --suffix=<suffix>[,<suffix>...]
only include files ending with one of these
(case-insensitive) suffixes
--pattern=<regex_pattern>
only include files with paths matching this pattern
-o, --output-dir=<outputPath>
specify where to write the downloaded data
--ignore-subdirectories
@@ -252,7 +276,7 @@ Options:
with files with equal names are not addressed
-h, --help Show this help message and exit.
-V, --version Print version information and exit.
Optional: specify a config file by running postman with '@/path/to/config.txt'.
A detailed documentation can be found at https:/qbicsoftware/postman-cli#readme.
```
@@ -2,6 +2,7 @@

import java.nio.file.Path;
import java.util.Optional;
import life.qbic.qpostman.common.functions.FileFilter.MalformedPatternException;
import life.qbic.qpostman.common.options.AuthenticationOptions.NoPasswordException;
import life.qbic.qpostman.common.options.SampleIdentifierOptions.IdentityFileEmptyException;
import life.qbic.qpostman.common.options.SampleIdentifierOptions.IdentityFileNotFoundException;
@@ -49,6 +50,13 @@ private void logError(RuntimeException e) {
log.error(
"Please provide at least 5 letters for your sample identifiers. The following sample identifiers are too short: "
+ toShortSampleIdsException.getIdentifiers());
} else if (e instanceof MalformedPatternException malformedPatternException) {
// Build the message once and reuse it for both log levels.
String message = "The pattern %s is malformed: %s".formatted(
malformedPatternException.getPatternString(),
malformedPatternException.getErrorDescription());
log.error(message);
// The debug entry additionally carries the stack trace.
log.debug(message, malformedPatternException);
} else {
log.error("Something went wrong. For more detailed output see " + Path.of(LOG_PATH, "postman.log").toAbsolutePath());
log.debug(e.getMessage(), e);
52 changes: 49 additions & 3 deletions src/main/java/life/qbic/qpostman/common/functions/FileFilter.java
@@ -1,31 +1,74 @@
package life.qbic.qpostman.common.functions;

import static java.util.Objects.isNull;
import static java.util.Objects.nonNull;

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;
import life.qbic.qpostman.common.structures.DataFile;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

/**
* A class implementing the {@link Predicate} interface and providing file filtering functionality based on suffixes and an optional regular expression pattern.
*/
public class FileFilter implements Predicate<DataFile> {
private static final Logger log = LogManager.getLogger(FileFilter.class);

private final List<String> suffixes;
private final boolean caseSensitive;
private final Pattern pattern;

public static FileFilter create() {
return new FileFilter(new ArrayList<>(), false);
return new FileFilter(new ArrayList<>(), false, null);
}

public FileFilter withSuffixes(List<String> suffixes) {
var temp = new ArrayList<>(this.suffixes);
temp.addAll(suffixes);
return new FileFilter(temp, caseSensitive);
return new FileFilter(temp, caseSensitive, pattern);
}

public FileFilter withPattern(String pattern) {
if (isNull(pattern)) {
return new FileFilter(suffixes, caseSensitive, null);
}
Pattern compiledPattern;
try {
compiledPattern = Pattern.compile(pattern);
} catch (PatternSyntaxException e) {
// Wrap the syntax error so the exception handler can report the offending pattern.
throw new MalformedPatternException(e, pattern, e.getMessage());
}
return new FileFilter(suffixes, caseSensitive, compiledPattern);
}
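
As a rough illustration of what `withPattern` guards against, the sketch below compiles a pattern and reports the syntax error instead of letting `PatternSyntaxException` escape. The class `PatternCheckDemo` and the method `describeProblem` are hypothetical names used only for this example. It uses `getDescription()` and `getIndex()`, which yield a single-line summary, whereas the commit passes `getMessage()`, which the JDK documents as a multi-line string.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Illustrative only: not part of postman. */
class PatternCheckDemo {

  /**
   * Tries to compile the regular expression.
   * Returns an empty Optional if it is valid, otherwise a one-line problem description.
   */
  static Optional<String> describeProblem(String regex) {
    try {
      Pattern.compile(regex);
      return Optional.empty();
    } catch (PatternSyntaxException e) {
      // getDescription() is the short error text, getIndex() the approximate position.
      return Optional.of(e.getDescription() + " near index " + e.getIndex());
    }
  }

  public static void main(String[] args) {
    // A valid pattern yields no problem description.
    System.out.println(describeProblem(".*test.*").isPresent());
    // An unclosed character class cannot be compiled.
    System.out.println(describeProblem("[a-z").orElse("no problem found"));
  }
}
```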

public static class MalformedPatternException extends RuntimeException {

private final String patternString;
private final String errorDescription;
public MalformedPatternException(Throwable cause, String patternString,
String errorDescription) {
super(cause);
this.patternString = patternString;
this.errorDescription = errorDescription;
}

public String getPatternString() {
return patternString;
}

public String getErrorDescription() {
return errorDescription;
}
}

private FileFilter(List<String> suffixes, boolean caseSensitive) {
private FileFilter(List<String> suffixes, boolean caseSensitive, Pattern pattern) {
this.suffixes = suffixes;
this.caseSensitive = caseSensitive;
this.pattern = pattern;
}

@Override
@@ -35,6 +78,9 @@ public boolean test(DataFile dataFile) {
result &= suffixes.stream()
.anyMatch(suffix -> hasSuffix(dataFile.fileName(), suffix));
}
if (nonNull(pattern)) {
result &= pattern.matcher(dataFile.filePath()).matches();
}
return result;
}
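
The `test` method above combines both criteria with a logical AND: a file passes only if it satisfies the suffix filter (when suffixes are given) and the pattern (when one is given). Below is a self-contained sketch of that logic under stated assumptions: `SimpleFile` is a hypothetical stand-in for `DataFile`, the suffix check is assumed to be a case-insensitive `endsWith` (the real `hasSuffix` is not shown in this diff), and the suffix filter is assumed to be skipped when the list is empty.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Illustrative only: a simplified model of the combined filter. */
class CombinedFilterDemo {

  /** Hypothetical stand-in for DataFile with just a name and a path. */
  static final class SimpleFile {
    final String fileName;
    final String filePath;

    SimpleFile(String fileName, String filePath) {
      this.fileName = fileName;
      this.filePath = filePath;
    }
  }

  /** Builds a predicate; an empty suffix list or a null pattern disables that criterion. */
  static Predicate<SimpleFile> filter(List<String> suffixes, Pattern pattern) {
    return file -> {
      boolean result = true;
      if (!suffixes.isEmpty()) {
        // Assumption: suffix comparison ignores case.
        String name = file.fileName.toLowerCase(Locale.ROOT);
        result &= suffixes.stream()
            .anyMatch(suffix -> name.endsWith(suffix.toLowerCase(Locale.ROOT)));
      }
      if (pattern != null) {
        // The whole path has to match the pattern.
        result &= pattern.matcher(file.filePath).matches();
      }
      return result;
    };
  }

  public static void main(String[] args) {
    SimpleFile file = new SimpleFile("a.FASTQ.gz", "run/test/a.FASTQ.gz");
    Pattern containsTest = Pattern.compile(".*test.*");
    // Both criteria hold.
    System.out.println(filter(List.of(".fastq.gz"), containsTest).test(file));
    // The pattern holds but the suffix does not, so the file is rejected.
    System.out.println(filter(List.of(".bam"), containsTest).test(file));
  }
}
```

Because the criteria are ANDed, adding `--pattern` to a call that already uses `--suffix` can only narrow the selection, never widen it.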

@@ -12,6 +12,11 @@ public class FilterOptions {
paramLabel = "<suffix>")
public List<String> suffixes = new ArrayList<>(0);

@CommandLine.Option(names = {"--pattern"},
description= "only include files with paths matching this pattern",
paramLabel = "<regex_pattern>")
public String pattern = null;

@Override
public String toString() {
return new StringJoiner(", ", FilterOptions.class.getSimpleName() + "[", "]")
@@ -94,7 +94,8 @@ private Functions functions() {
IApplicationServerApi applicationServerApi = ServerFactory.applicationServer(serverOptions.as_url, serverOptions.timeoutInMillis);
OpenBisSessionProvider.init(applicationServerApi, authenticationOptions.user, new String(authenticationOptions.getPassword()));
SearchDataSets searchDataSets = new SearchDataSets(applicationServerApi);
FileFilter myAwesomeFileFilter = FileFilter.create().withSuffixes(filterOptions.suffixes);
FileFilter myAwesomeFileFilter = FileFilter.create().withSuffixes(filterOptions.suffixes)
.withPattern(filterOptions.pattern);
WriteFileToDisk writeFileToDisk = new WriteFileToDisk(dataStoreServerApis().toArray(IDataStoreServerApi[]::new)[0],
downloadOptions.bufferSize, Path.of(downloadOptions.outputPath), downloadOptions.successiveDownloadAttempts,
downloadOptions.ignoreSubDirectories);
3 changes: 2 additions & 1 deletion src/main/java/life/qbic/qpostman/list/ListCommand.java
@@ -90,7 +90,8 @@ private Functions setupFunctions() {
serverOptions.dss_urls, serverOptions.timeoutInMillis);
SearchDataSets searchDataSets = new SearchDataSets(applicationServerApi);
FileFilter myAwesomeFileFilter = FileFilter.create()
.withSuffixes(filterOptions.suffixes);
.withSuffixes(filterOptions.suffixes)
.withPattern(filterOptions.pattern);
SearchFiles searchFiles = new SearchFiles(dataStoreServerApis, number -> {});
FindSourceSample findSourceSample = new FindSourceSample(serverOptions.sourceSampleType);
