Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix documentation for AT_DROPOUT and GC_DROPOUT in PanelMetricsBase.java #1977

Merged
merged 2 commits into from
Aug 22, 2024

Conversation

kockan
Copy link
Contributor

@kockan kockan commented Aug 19, 2024

The metrics AT_DROPOUT and GC_DROPOUT are calculated in a pretty much identical manner in two different places in Picard. One is TargetMetricsCollector.java which will perform this functionality for CollectHsMetrics.java if a reference sequence is provided and the other is GcBiasMetricsCollector.java. A user has highlighted an inconsistency with the documentation in #1973 . If we look at GcBiasSummaryMetrics.java, we see:

    /**
     * Illumina-style AT dropout metric.  Calculated by taking each GC bin independently and calculating
     * (%ref_at_gc - %reads_at_gc) and summing all positive values for GC=[0..50].
     */
    public double AT_DROPOUT;

    /**
     * Illumina-style GC dropout metric.  Calculated by taking each GC bin independently and calculating
     * (%ref_at_gc - %reads_at_gc) and summing all positive values for GC=[50..100].
     */
    public double GC_DROPOUT;

However, in PanelMetricsBase.java, we have:

    /**
     * A measure of how undercovered <= 50% GC regions are relative to the mean. For each GC bin [0..50]
     * we calculate a = % of target territory, and b = % of aligned reads aligned to these targets.
     * AT DROPOUT is then abs(sum(a-b when a-b < 0)). E.g. if the value is 5% this implies that 5% of total
     * reads that should have mapped to GC<=50% regions mapped elsewhere.
     */
    public double AT_DROPOUT;

    /**
     * A measure of how undercovered >= 50% GC regions are relative to the mean. For each GC bin [50..100]
     * we calculate a = % of target territory, and b = % of aligned reads aligned to these targets.
     * GC DROPOUT is then abs(sum(a-b when a-b < 0)). E.g. if the value is 5% this implies that 5% of total
     * reads that should have mapped to GC>=50% regions mapped elsewhere.
     */
    public double GC_DROPOUT;

which is inconsistent with the previous definition and contains confusing notation.

The implementation inside GcBiasMetricsCollector.java although is consistent with its definition above:

    /////////////////////////////////////////////////////////////////////////////
    // Calculates the Illumina style AT and GC dropout numbers
    /////////////////////////////////////////////////////////////////////////////
    private void calculateDropoutMetrics(final Collection<GcBiasDetailMetrics> details,
                                         final GcBiasSummaryMetrics summary) {
        // First calculate the totals
        double totalReads = 0;
        double totalWindows = 0;

        for (final GcBiasDetailMetrics detail : details) {
            totalReads += detail.READ_STARTS;
            totalWindows += detail.WINDOWS;
        }

        double atDropout = 0;
        double gcDropout = 0;

        for (final GcBiasDetailMetrics detail : details) {
            final double relativeReads = detail.READ_STARTS / totalReads;
            final double relativeWindows = detail.WINDOWS / totalWindows;
            final double dropout = (relativeWindows - relativeReads) * 100;

            if (dropout > 0) {
                if (detail.GC <= 50) atDropout += dropout;
                else{ gcDropout += dropout; }
            }
        }

        summary.AT_DROPOUT = atDropout;
        summary.GC_DROPOUT = gcDropout;
    }

TargetMetricsCollector.java doesn't have a separate function for this, but the implementation is as follows:

                // Total things up
                long totalTarget = 0;
                long totalBases  = 0;
                for (int i=0; i<targetBasesByGc.length; ++i) {
                    totalTarget += targetBasesByGc[i];
                    totalBases  += alignedBasesByGc[i];
                }

                // Re-express things as % of the totals and calculate dropout metrics
                for (int i=0; i<targetBasesByGc.length; ++i) {
                    final double targetPct  = targetBasesByGc[i]  / (double) totalTarget;
                    final double alignedPct = alignedBasesByGc[i] / (double) totalBases;

                    double dropout = (alignedPct - targetPct) * 100d;
                    if (dropout < 0) {
                        dropout = Math.abs(dropout);

                        if (i <=50) this.metrics.AT_DROPOUT += dropout;
                        if (i >=50) this.metrics.GC_DROPOUT += dropout;
                    }
                }

The proposed change here tries to make the documentation for the functionality above more consistent across Picard and easier to read.

@kockan kockan requested a review from kachulis August 20, 2024 13:19
@kockan kockan requested a review from kachulis August 22, 2024 14:58
Copy link
Contributor

@kachulis kachulis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @kockan !

@kockan kockan merged commit e047518 into master Aug 22, 2024
9 checks passed
@kockan kockan deleted the can_panel_metrics_docs_fix branch August 22, 2024 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants