how to Write or Display interval types #9
Comments
Now, we get a ... Instead of handling all of the branching there (in the example code), we probably want to have something like:

    writer = Writer::init(&args)
    ...
    for intersection ... {
        let report = intersection.report(...)
        writer.write(report)
    }

so what goes to ...

    use sniff::FileFormat;
    impl Writer {
        // `in` is a reserved word in Rust, so the arguments are named in_fmt / out_fmt
        pub fn init(in_fmt: FileFormat, out_fmt: FileFormat) -> Result<Writer, FormatCompatibilityError> { ... }
    }

This can be a good start, but it will need a way to pass all of the options; for example, the option to count overlaps should be known by the writer. But this can be in an additional struct.
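A minimal sketch of the "additional struct" idea, under stated assumptions: `WriterOptions`, the simplified `Report`, the two-variant `FileFormat`, and the generic `Writer<W>` are all hypothetical stand-ins (the real `FileFormat` would come from `sniff`), and only BED-style output is sketched.

```rust
use std::io::{self, Write};

/// Hypothetical subset of formats; the real enum would be `sniff::FileFormat`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileFormat {
    Bed,
    Vcf,
}

/// Hypothetical options struct, so `init` does not grow one argument per flag.
#[derive(Debug, Clone, Copy, Default)]
pub struct WriterOptions {
    pub count: bool,
    pub count_bases: bool,
}

/// Simplified stand-in for the result of `intersection.report(...)`.
pub struct Report {
    pub chrom: String,
    pub start: u64,
    pub stop: u64,
    pub count: u64,
    pub bases: u64,
}

pub struct Writer<W: Write> {
    out_fmt: FileFormat,
    opts: WriterOptions,
    inner: W,
}

impl<W: Write> Writer<W> {
    pub fn init(out_fmt: FileFormat, opts: WriterOptions, inner: W) -> Writer<W> {
        Writer { out_fmt, opts, inner }
    }

    /// Write one report; the writer, not the caller, decides which
    /// optional columns to append based on the stored options.
    pub fn write(&mut self, r: &Report) -> io::Result<()> {
        match self.out_fmt {
            FileFormat::Bed => {
                write!(self.inner, "{}\t{}\t{}", r.chrom, r.start, r.stop)?;
                if self.opts.count {
                    write!(self.inner, "\t{}", r.count)?;
                }
                if self.opts.count_bases {
                    write!(self.inner, "\t{}", r.bases)?;
                }
                writeln!(self.inner)
            }
            // Not sketched here: for VCF the count would go to an INFO field.
            FileFormat::Vcf => Err(io::Error::new(
                io::ErrorKind::Unsupported,
                "VCF output is not sketched here",
            )),
        }
    }

    /// Return the underlying sink (useful for inspecting in-memory output).
    pub fn into_inner(self) -> W {
        self.inner
    }
}
```

With this shape the loop stays as `writer.write(&report)` and new flags only touch `WriterOptions` and the writer itself.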
The above seems good enough to proceed to specify input format X to output format Y. For BAM, it can go to tags and for BED, it is appended as a column. Who is responsible for keeping that information? Is it:

    let wtr = Writer::init(in_fmt, out_fmt, compression, args.count, args.count_bases, ...)

or

    wtr.write(report, count, count_bases)

but then any possible additional count or column must be an argument to these. We can likely special-case those and also accept a function:

    impl Writer {
        pub fn init(
            in_fmt: FileFormat,
            out_fmt: Option<FileFormat>,
            compression: Compression,
            column_fn: fn(Option<&Report>) -> String,
        )

with the convention that when the argument is ...

    pub trait ColumnReporter {
        /// report the name, e.g. `count` for the INFO field of the VCF
        fn name(&self) -> String;
        /// report the type, for the INFO field of the VCF
        /// (named `ftype` because `type` is a reserved word in Rust)
        fn ftype(&self) -> Type; // Type is some enum from noodles or here that limits to relevant types
        fn description(&self) -> String;
        fn number(&self) -> Number; // Number is some enum ...
        fn value(&self, r: &Report) -> Value; // Value probably something from noodles that encapsulates Float/Int/Vec<Float>/String/...
    }

This makes it more work to implement, but I think it's worth it and actually not too onerous.

    impl Writer {
        ...
        // a slice of boxed trait objects; `Vec<dyn ColumnReporter>` is unsized and would not compile
        pub fn write(&mut self, r: &Report, columns: &[Box<dyn ColumnReporter>]) {...}
    }
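To get a feel for how onerous implementing the trait is, here is a small self-contained sketch with a single count column. `Value`, `Type`, the one-field `Report`, `CountReporter`, and `render_columns` are hypothetical placeholders (the thread suggests noodles types could fill the `Value`/`Type` roles), and `number()` is left out to keep it short.

```rust
/// Hypothetical value type standing in for whatever noodles (or this crate) provides.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Float(f64),
    Text(String),
}

/// Hypothetical type tag, limited to types relevant for a VCF INFO field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Type {
    Integer,
    Float,
    String,
}

/// Simplified stand-in for a report: only the number of overlapping intervals.
pub struct Report {
    pub n_overlaps: usize,
}

pub trait ColumnReporter {
    fn name(&self) -> String;
    /// `type` is a reserved word in Rust, so the method is `ftype` here.
    fn ftype(&self) -> Type;
    fn description(&self) -> String;
    fn value(&self, r: &Report) -> Value;
}

/// Reports how many intervals overlapped.
pub struct CountReporter;

impl ColumnReporter for CountReporter {
    fn name(&self) -> String {
        "count".to_string()
    }
    fn ftype(&self) -> Type {
        Type::Integer
    }
    fn description(&self) -> String {
        "number of overlapping intervals".to_string()
    }
    fn value(&self, r: &Report) -> Value {
        Value::Int(r.n_overlaps as i64)
    }
}

/// Render the extra columns as tab-separated text, as they might be
/// appended to a BED line.
pub fn render_columns(r: &Report, cols: &[Box<dyn ColumnReporter>]) -> String {
    cols.iter()
        .map(|c| match c.value(r) {
            Value::Int(i) => i.to_string(),
            Value::Float(f) => f.to_string(),
            Value::Text(s) => s,
        })
        .collect::<Vec<_>>()
        .join("\t")
}
```

A VCF writer could use the same reporters but take `name()`, `ftype()` and `description()` to build the INFO header line, which is the part a plain `fn(...) -> String` cannot provide.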
This is started in 9e6f440. It's necessary to have a way to link each ...

    pub struct ReportFragment {
        pub a: Option<Position>,
        pub b: Vec<Position>,
        pub id: usize,
    }

so we might want ... where the outer vec has length of the number of ...

    struct BedReporter {
        name: String,
        idx: usize,
        ftype: Type, // `type` is a reserved word in Rust
        desc: String,
    }
    impl ColumnReporter for BedReporter {
        fn name(&self) -> String { self.name.clone() }
        fn ftype(&self) -> Type { self.ftype }
        ...
        // pseudocode: pick one column (by index) from each `b` position;
        // how the per-position values are combined into one Value is left open
        fn value(&self, r: &ReportFragment) -> Result<Value, ColumnError> {
            r.b.iter().map(|p| {
                match p {
                    Position::Bed(br) => match self.idx {
                        0 => br.chrom(),
                        1 => br.start(),
                        2 => br.stop(),
                        _ => Error, // column index out of range
                    },
                    _ => Error
                }
            })
        }
    }

This is very flexible, but would get quite complicated just to extract a single column.
I am exploring using Lua snippets again as this seems flexible and user-friendly enough. The cost is performance, but I think that's not the main concern.
We could use std::fmt::Display, but what if we have VCF input and want to write, e.g., BCF output?

Currently, we're writing to standard output, but should likely switch to writers. This will require tracking that, for example, we can have a BCF writer from a VCF query, or a compressed (BGZF) writer from a VCF, but not a BAM writer from a VCF query.
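One way to track which writer can follow which query is to group formats into families and only allow conversions within a family. This is a sketch under that assumption, not a settled rule: the `Family` grouping, the `FileFormat` variants, and `check_compatible` are hypothetical, and whether BED output should be allowed from any input is left open.

```rust
use std::fmt;

/// Hypothetical format enum; the real one would come from `sniff`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileFormat {
    Bed,
    Vcf,
    Bcf,
    Sam,
    Bam,
    Cram,
}

/// Assumed grouping: conversions are only allowed within one family.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Family {
    Interval,
    Variant,
    Alignment,
}

impl FileFormat {
    fn family(self) -> Family {
        match self {
            FileFormat::Bed => Family::Interval,
            FileFormat::Vcf | FileFormat::Bcf => Family::Variant,
            FileFormat::Sam | FileFormat::Bam | FileFormat::Cram => Family::Alignment,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct FormatCompatibilityError {
    pub input: FileFormat,
    pub output: FileFormat,
}

impl fmt::Display for FormatCompatibilityError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "cannot write {:?} output from {:?} input", self.output, self.input)
    }
}

impl std::error::Error for FormatCompatibilityError {}

/// Accept the pair only if both formats belong to the same family,
/// e.g. VCF -> BCF is fine, VCF -> BAM is rejected.
pub fn check_compatible(
    input: FileFormat,
    output: FileFormat,
) -> Result<(), FormatCompatibilityError> {
    if input.family() == output.family() {
        Ok(())
    } else {
        Err(FormatCompatibilityError { input, output })
    }
}
```

Compression (BGZF or not) is orthogonal to this check and could be carried separately from the format.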
The user should be able to specify, e.g.

    --output-format BAM | CRAM | SAM

or

    --output-format BCF | VCF

along with a

    --compression-level = 0..9

For BED, perhaps it's

    --output-format BED12 | BED3 | ???

where perhaps ??? is a template that can be filled.

I need to think about how to implement this and to look at changes to noodles in 0.61.0 that unify BAM/CRAM/SAM reading.
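As a rough sketch of how these option values could be validated, the following parses them with `std::str::FromStr` only. `OutputFormat` and `CompressionLevel` are hypothetical names, matching is assumed to be case-insensitive, and the `???` BED template is not handled.

```rust
use std::str::FromStr;

/// Hypothetical enum for the values of `--output-format`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    Bam,
    Cram,
    Sam,
    Bcf,
    Vcf,
    Bed3,
    Bed12,
}

impl FromStr for OutputFormat {
    type Err = String;

    /// Case-insensitive match on the format name (an assumption).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_uppercase().as_str() {
            "BAM" => Ok(OutputFormat::Bam),
            "CRAM" => Ok(OutputFormat::Cram),
            "SAM" => Ok(OutputFormat::Sam),
            "BCF" => Ok(OutputFormat::Bcf),
            "VCF" => Ok(OutputFormat::Vcf),
            "BED3" => Ok(OutputFormat::Bed3),
            "BED12" => Ok(OutputFormat::Bed12),
            other => Err(format!("unknown output format: {}", other)),
        }
    }
}

/// Hypothetical wrapper for `--compression-level`, restricted to 0 through 9.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CompressionLevel(pub u8);

impl FromStr for CompressionLevel {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let n: u8 = s
            .parse()
            .map_err(|_| format!("not a valid compression level: {}", s))?;
        if n <= 9 {
            Ok(CompressionLevel(n))
        } else {
            Err(format!("compression level must be in 0..9, got {}", n))
        }
    }
}
```

Because both types implement `FromStr`, they can be used with `str::parse` directly, and argument-parsing crates that accept `FromStr` types should be able to use them as well.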