Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Primer -> Oligo, PrimerLike -> OligoLike #51

Merged
merged 2 commits into from
Oct 7, 2024

Conversation

emmcauley
Copy link
Collaborator

@emmcauley emmcauley commented Sep 26, 2024

The PR refactors the Primer object and planned Probe object into the same dataclass, now called Oligo. An Oligo object can represent single primer and probe designs. The corresponding base class, previously called PrimerLike, has been refactored to OligoLike.

The new Oligo object has the attributes that a Primer object had (tm, penalty, span and optional attributes bases, tail) plus three new optional attributes: self_any_th, self_end_th, and hairpin_th. These three attributes are emitted from Primer3 for probe design and, after an update to PrimerAndAmpliconParams, will also be emitted for primer designs.

For backwards compatibility, the Primer object now wraps an Oligo object and also emits a DeprecationWarning.

I left PrimerPair largely untouched -- a PrimerPair is now composed of two Oligo objects but the rest of the functionality of PrimerPair I left intact.

Copy link

codecov bot commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 92.67241% with 17 lines in your changes missing coverage. Please review.

Project coverage is 96.76%. Comparing base (f71dc65) to head (c0e869e).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
prymer/primer3/primer3.py 77.19% 8 Missing and 5 partials ⚠️
prymer/api/oligo.py 96.36% 1 Missing and 1 partial ⚠️
prymer/primer3/primer3_parameters.py 92.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #51      +/-   ##
==========================================
- Coverage   97.25%   96.76%   -0.49%     
==========================================
  Files          25       26       +1     
  Lines        1603     1701      +98     
  Branches      303      328      +25     
==========================================
+ Hits         1559     1646      +87     
- Misses         23       29       +6     
- Partials       21       26       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Base automatically changed from em_add_probe_params to em_feat_pick_hyb_probe September 26, 2024 18:08
@emmcauley emmcauley changed the base branch from em_feat_pick_hyb_probe to main September 26, 2024 18:25
@emmcauley emmcauley marked this pull request as ready for review September 26, 2024 18:25
@emmcauley emmcauley force-pushed the em_refactor_primer_to_oligo branch 5 times, most recently from 6b57128 to e7bf242 Compare September 26, 2024 22:50
Copy link
Member

@tfenne tfenne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we're doing this we should also think about the future where we want to enable running primer3 in the mode where it will produce left and right primers and a probe all in one go.

In the examples, can you please use more realistic values. My 2c is that documentation that uses unrealistic values reduces my confidence in the documentation. Examples are to use more realistic Tms, penalty values that are >=0, sequences that are not G*20 etc.

I don't love that this PR also introduces a ton of new parameters that we weren't previously exposing for the homodimer, end stability and hairpin thresholds ... I would ideally prefer to see those in a separate PR where we can discuss them a bit more.

docs/overview.md Outdated
@@ -3,8 +3,8 @@
The `prymer` Python library is intended to be used for three main purposes:

1. [Clustering targets](#clustering-targets) into larger amplicons prior to designing primers.
2. [Designing primers](#designing-primers) (left or right) or primer pairs using Primer3 for each target from (1).
3. [Build and Picking a set of primer pairs](#build-and-picking-primer-pairs) from the designed primer pairs produced in (2).
2. [Designing oligos](#designing-primers) using Primer3 for each target from (1).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's helpful to have the documentation still refer to primers, primer pairs and probes rather than "oligos".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emmcauley Let's revert this line and address more substantial documentation updates in subsequent PRs.

docs/overview.md Outdated Show resolved Hide resolved
docs/overview.md Outdated Show resolved Hide resolved
"""
# Oligo Class and Methods

This module contains a class and class methods to represent an oligo designed by Primer3.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To @msto's point elsewhere, I don't think they always have to be designed by primer3. It would be helpful to think about whether we might make more values optional (or have a default value) and also to document that they don't have to come from primer3.

I could see making penalty optional, or defaulting it to 1.0 for example.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional penalty would be nice for me, but we could do that in a subsequent PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a noble but perhaps lofty goal to expand the scope of this package beyond primer3, certainly beyond the scope of this PR.

I agree we should be mindful of making decisions that are overly specific to primer3. However I'd like to be cognizant that the content of this file is largely the same as the original primer.py, and would like to limit this PR's scope to the planned refactoring (Primer -> Oligo)

prymer/api/oligo.py Outdated Show resolved Hide resolved
primer_designs: Primer3Result = Primer3Result(
filtered_designs=valid_primer_designs, failures=failures
design_candidates: Primer3Result = Primer3Result(
filtered_designs=valid_oligo_designs, failures=failures
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came up elsewhere in conversation that perhaps filtered_designs should just be designs? If primer3 implemented a dinuc check, would we feel the need to call this filtered_designs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the docstring and changed the variable from filtered_designs to designs. These are filtered for out-of-spec characteristics and the corresponding failures list contains the reasons why (so I have been thinking of these designs as "filtered" because they have undergone one round of filtering for primer-specific properties like GC content, TM, etc.).

Comment on lines +122 to +131
primer_weights: Optional[PrimerAndAmpliconWeights] = None
probe_weights: Optional[ProbeWeights] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to allow default weights? I.e. does primer3 blow up if you set probe weights and then design left primers?

Copy link
Collaborator Author

@emmcauley emmcauley Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes -- if you allow default weights for both tasks, but do not supply both sets of params, Primer3 complains, because those weights are part of the objective scoring function, and it doesn't know how far off optimal it is when optimal is left undefined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find your answer confusing. I'm asking "if you provide too many weights, isn't that ok?", but the last line of your answer - "it doesn't know how far off optimal it is when optimal is left undefined." - makes it sound like you're answering "what if we didn't provide enough weights"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To use your example: Primer3 blows up if you set probe weights and then design left primers. By giving ProbeWeights(), you are telling Primer3 how you want the resultant designs to be scored. But Primer3 complains because you didn't give it ProbeParameters() to score against -- that is what I meant by "optimal is undefined".

Primer3 does not know what to do with the weights you provide it in the absence of corresponding parameters. Because each set of parameters is now defaulted to None, that is the context we are working within.

Specifically, it is the PRIMER_INTERNAL_WT_GC_PERCENT_* weights that cause the problem, here is the error when you specify PRIMER_INTERNAL_WT_GC_PERCENT_LT and PRIMER_INTERNAL_WT_GC_PERCENT_GT but ask for left primers only:

PRIMER_ERROR=Hyb probe GC content is part of objective function while optimum gc_content is not defined
=
primer3_core: Hyb probe GC content is part of objective function while optimum gc_content is not defined

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Circling back to this with one more thought. Would it be hard to still have these defaulted on the input, but then depending on the task, only translate the appropriate weights to tags and send them to primer3?

Comment on lines 157 to 161
# Set melting temperature thresholds to be 10 degrees less than the minimum primer tm
default_thermo_tm: float = self.primer_tms.min - 10.0
object.__setattr__(self, "primer_max_self_any_thermo", default_thermo_tm)
object.__setattr__(self, "primer_max_self_end_thermo", default_thermo_tm)
object.__setattr__(self, "primer_max_hairpin_thermo", default_thermo_tm)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just do this above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point in the code, you have validated other user inputs and you know that you have valid PrimerAndAmpliconParameters. It made more sense to me to set these attributes after you know that you have valid input and can continue.

prymer/primer3/primer3_parameters.py Outdated Show resolved Hide resolved
prymer/primer3/primer3_parameters.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@msto msto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @tfenne @emmcauley

I'm optimistic we can get this across the line in the next day or so.

@emmcauley I agree with Tim that this should be split up into a few PRs - one to execute the grand Primer -> Oligo renaming, and additional PR(s) to introduce the functional changes (e.g. ProbeParameters, the various new parameters and attributes)

@tfenne If this becomes a purely refactoring PR, I think we can merge in without much additional review. I agree the points you've highlighted are worth addressing, but I think we'll be able to progress more quickly if we keep the scope of this PR focused on the renaming.

docs/overview.md Outdated Show resolved Hide resolved
docs/overview.md Outdated
@@ -3,8 +3,8 @@
The `prymer` Python library is intended to be used for three main purposes:

1. [Clustering targets](#clustering-targets) into larger amplicons prior to designing primers.
2. [Designing primers](#designing-primers) (left or right) or primer pairs using Primer3 for each target from (1).
3. [Build and Picking a set of primer pairs](#build-and-picking-primer-pairs) from the designed primer pairs produced in (2).
2. [Designing oligos](#designing-primers) using Primer3 for each target from (1).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emmcauley Let's revert this line and address more substantial documentation updates in subsequent PRs.

"""
# Oligo Class and Methods

This module contains a class and class methods to represent an oligo designed by Primer3.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a noble but perhaps lofty goal to expand the scope of this package beyond primer3, certainly beyond the scope of this PR.

I agree we should be mindful of making decisions that are overly specific to primer3. However I'd like to be cognizant that the content of this file is largely the same as the original primer.py, and would like to limit this PR's scope to the planned refactoring (Primer -> Oligo)

prymer/api/oligo.py Show resolved Hide resolved
prymer/api/oligo.py Show resolved Hide resolved
Comment on lines 129 to 130
OligoLikeType = TypeVar("OligoLikeType", bound=OligoLike)
"""Type variable for classes generic over `OligoLike` types."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - @emmcauley could you please remove these two lines while you're in here?

@@ -14,14 +14,14 @@
```python
>>> from prymer.api.span import Span, Strand
>>> primer_span = Span(refname="chr1", start=1, end=20)
>>> primer = Primer(tm=70.0, penalty=-123.0, span=primer_span)
>>> primer = Oligo(tm=70.0, penalty=-123.0, span=primer_span)
Copy link
Collaborator

@msto msto Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - @emmcauley could you please remove the duplicate docs?

I think the complete contents of the file should look something like this:

"""This module is deprecated - see prymer/api/oligo.py"""

import warnings
from dataclasses import dataclass

from prymer.api.oligo import Oligo


@dataclass(frozen=True, init=True, slots=True)
class Primer(Oligo):
    """
    A deprecated alias for `Oligo`.
    
    This class exists to maintain backwards compatibility with earlier releases of `prymer`, and may be removed in a future version.
    """

    warnings.warn(
        "The Primer class was deprecated, use Oligo instead",
        DeprecationWarning,
        stacklevel=2,
    )

prymer/api/primer_pair.py Show resolved Hide resolved
f"target length: {design_input.target.length}, "
f"minimal probe size: {design_input.probe_params.probe_sizes.min}"
)
design_region = design_input.target
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emmcauley This line caught my eye due to the lack of test coverage.

This looks like a functional change - I'd suggest backing this out to be introduced by another PR (e.g. #50 where PickHybProbeOnly is introduced).

prymer/primer3/primer3.py Show resolved Hide resolved
Comment on lines +6 to +7
2. [Designing](#designing-primers) primers (single and paired) and internal hybridization probes using Primer3 for each target from (1).
3. [Build and Picking a set of primer pairs](#build-and-picking-primer-pairs) from the design candidates produced in (2).
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we converged on design() instead of design_oligos() or design_primers()

bases: Optional[str] = None
tail: Optional[str] = None

def __post_init__(self) -> None:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of these methods derive from the original Primer class and are unchanged, just ported from primer.py to oligo.py

@@ -124,7 +123,3 @@ def _strand_to_location_string(self) -> str:
case _: # pragma: no cover
# Not calculating coverage on this line as it should be impossible to reach
assert_never(f"Encountered unhandled Strand value: {self.span.strand}")


PrimerLikeType = TypeVar("PrimerLikeType", bound=PrimerLike)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed per PR comment

@@ -279,17 +287,17 @@ def is_acceptable_primer_pair(primer_pair: PrimerPair, params: FilteringParams)


def build_primer_pairs(
lefts: Iterable[Primer],
rights: Iterable[Primer],
left_primers: Iterable[Oligo],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per Tim's request on OG PR, updating "lefts" and "rights" to "left_primers" and "right_primers"


```
"""
"""This module is deprecated - see prymer/api/oligo.py"""
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updating deprecation of Primer via Matt/Tim

@@ -37,7 +37,7 @@ class methods to represent a primer pair. The primer pair is comprised of a lef
Span(refname='chr1', start=21, end=100, strand=<Strand.POSITIVE: '+'>)

>>> list(primer_pair)
[Primer(name=None, tm=70.0, penalty=-123.0, span=Span(refname='chr1', start=1, end=20, strand=<Strand.POSITIVE: '+'>), bases='GGGGGGGGGGGGGGGGGGGG', tail=None), Primer(name=None, tm=70.0, penalty=-123.0, span=Span(refname='chr1', start=101, end=120, strand=<Strand.NEGATIVE: '-'>), bases='TTTTTTTTTTTTTTTTTTTT', tail=None)]
[Oligo(name=None, tm=70.0, penalty=-123.0, span=Span(refname='chr1', start=1, end=20, strand=<Strand.POSITIVE: '+'>), bases='GGGGGGGGGGGGGGGGGGGG', tail=None), Oligo(name=None, tm=70.0, penalty=-123.0, span=Span(refname='chr1', start=101, end=120, strand=<Strand.NEGATIVE: '-'>), bases='TTTTTTTTTTTTTTTTTTTT', tail=None)]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we stripped out the thermo-related attributes, I reverted the doc-test examples

failures: ordered list of Primer3Failures detailing design failure reasons and corresponding
count
"""

filtered_designs: list[PrimerLikeType]
designs: list[OligoLikeType]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per Tim's request, updating filtered_designs to designs

@@ -308,17 +309,10 @@ def get_design_sequences(self, region: Span) -> tuple[str, str]:
hard_masked = "".join(soft_masked_list)
return soft_masked, hard_masked

@staticmethod
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method was renamed to a function has_acceptable_dinuc_run and made into a module-level function

max_amplicon_length=design_input.params.max_amplicon_length,
min_primer_length=design_input.params.min_primer_length,
)
design_region: Span
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept this in, @tfenne flagging this for you because I wasn't sure if we decided to take it out.

@@ -760,3 +744,44 @@ def _create_design_region(
)

return design_region


def _check_design_results(design_input: Primer3Input, design_results: dict[str, str]) -> int:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was existing code that I re-organized into a private function because I was trying to limit the sprawl of _build_oligos

return count


def _has_acceptable_dinuc_run(design_input: Primer3Input, oligo_design: Oligo) -> bool:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same -- existing code that I re-orged into a module-level function and renamed to be more descriptive

@@ -0,0 +1,181 @@
import pytest
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are new tests and the intention is to make sure that when one or both sets of Params are provided, that Primer3Input is instantiated as expected.

@emmcauley emmcauley requested a review from msto October 2, 2024 21:47
Copy link
Member

@tfenne tfenne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments, but please squash and merge once you've addressed or told me why you won't address!

prymer/primer3/primer3.py Outdated Show resolved Hide resolved
Comment on lines +122 to +131
primer_weights: Optional[PrimerAndAmpliconWeights] = None
probe_weights: Optional[ProbeWeights] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Circling back to this with one more thought. Would it be hard to still have these defaulted on the input, but then depending on the task, only translate the appropriate weights to tags and send them to primer3?

@emmcauley emmcauley merged commit 93ea6e7 into main Oct 7, 2024
4 of 6 checks passed
@emmcauley emmcauley deleted the em_refactor_primer_to_oligo branch October 7, 2024 21:04
msto added a commit that referenced this pull request Oct 16, 2024
So the task can be imported from `prymer.primer3` like the other tasks.

@tfenne I should have thought of this while we were discussing #51, but
do you have any thoughts on whether this task might be renamed for
consistency with the other tasks and ease of discovery? e.g.
`PickHybProbeOnlyTask`, or `DesignProbeTask`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants