Add findfirst method for BioSymbol #166

Merged: 2 commits into BioJulia:v3 on Jul 24, 2021

jakobnissen
Member

See #141

This PR adds findfirst and findlast methods with signatures of the form findfirst(::BioSymbol, ::BioSequence). The current behavior is:

  • findfirst(::BioSymbol, ::BioSequence) matches symbols that are compatible but not identical. For example, searching dna"TACNA" for DNA_G matches position 4, because the N there could be a G. This is the new functionality in this PR.
  • findfirst(::BioSequence, ::BioSequence) likewise matches compatible symbols, as it did before.
  • To get a literal match against a symbol, use findfirst(isequal(DNA_A), dna"GGA").
  • There is currently no way to search for literal matches between two biological sequences. Perhaps there should be, but I'm not sure what the syntax should be.
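To make the compatible-versus-literal distinction above concrete, here is a minimal, self-contained sketch. It does not use BioSequences.jl: plain characters stand in for symbols, and the bitmask table `CODES` and the helpers `compatible`, `find_compatible`, and `find_literal` are hypothetical names invented for illustration. It assumes that two symbols are compatible when their sets of possible bases overlap.

```julia
# Hypothetical 4-bit encoding of a few IUPAC nucleotide codes: each bit marks
# one base the symbol may stand for. An illustration only, not the library API.
const CODES = Dict('A' => 0b0001, 'C' => 0b0010, 'G' => 0b0100,
                   'T' => 0b1000, 'N' => 0b1111)

# Assumption: two symbols are compatible when their sets of bases overlap.
compatible(a::Char, b::Char) = (CODES[a] & CODES[b]) != 0

# Compatible search: first position whose symbol could be `sym`.
find_compatible(sym::Char, seq::AbstractString) =
    findfirst(c -> compatible(sym, c), seq)

# Literal search: first position whose symbol is exactly `sym`.
find_literal(sym::Char, seq::AbstractString) = findfirst(isequal(sym), seq)

println(find_compatible('G', "TACNA"))  # 4, because N may stand for G
println(find_literal('G', "TACNA"))     # nothing, there is no literal G
println(find_literal('A', "GGA"))       # 3
```

Under these assumptions the ambiguous N at position 4 counts as a hit for G in the compatible search, while the isequal-based search reports no match.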

Issues / things to mull over:

@codecov

codecov bot commented Jun 24, 2021

Codecov Report

Merging #166 (e7cec9d) into v3 (87f806c) will increase coverage by 0.03%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##               v3     #166      +/-   ##
==========================================
+ Coverage   82.14%   82.17%   +0.03%     
==========================================
  Files          31       31              
  Lines        2156     2171      +15     
==========================================
+ Hits         1771     1784      +13     
- Misses        385      387       +2     
| Flag | Coverage Δ |
|---|---|
| unittests | 82.17% <85.71%> (+0.03%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| src/search/exact.jl | 91.96% <85.71%> (-0.90%) ⬇️ |
| src/BioSequences.jl | 0.00% <0.00%> (ø) |
| src/longsequences/longsequence.jl | 100.00% <0.00%> (ø) |
| src/biosequence/predicates.jl | 98.07% <0.00%> (+0.03%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@TransGirlCodes
Member

TransGirlCodes commented Jul 17, 2021

I think this is good.

As for your question about a future non-compatibility (literal) version of findfirst(::BioSequence, ::BioSequence):

Have you considered having the method accept an optional keyword that toggles between finding the first compatible subsequence and the first exact subsequence? For example, with strict = true the subsequence's symbols would have to match exactly, not just be compatible.
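A rough sketch of how such a keyword could behave, written without BioSequences.jl so that it runs on its own. Everything here is hypothetical: `BASE_BITS`, `symbols_match`, and `find_subsequence` are illustrative names, ASCII strings stand in for sequences, and the naive scan says nothing about how the library would implement the search. Returning a range mirrors how Base string search reports substring matches, which is an assumption about the desired result type.

```julia
# Hypothetical 4-bit masks for a few IUPAC codes; not the library's encoding API.
const BASE_BITS = Dict('A' => 0b0001, 'C' => 0b0010, 'G' => 0b0100,
                       'T' => 0b1000, 'N' => 0b1111)

# strict = true requires identical symbols; otherwise overlapping base sets suffice.
symbols_match(a::Char, b::Char; strict::Bool) =
    strict ? a == b : (BASE_BITS[a] & BASE_BITS[b]) != 0

# Naive scan for the first window of `target` that matches `query`.
# Returns the matching index range, or `nothing` when there is no match.
# Assumes ASCII input so that integer indexing into the strings is valid.
function find_subsequence(query::AbstractString, target::AbstractString;
                          strict::Bool=false)
    n, m = length(query), length(target)
    for start in 1:(m - n + 1)
        if all(symbols_match(query[i], target[start + i - 1]; strict=strict)
               for i in 1:n)
            return start:(start + n - 1)
        end
    end
    return nothing
end

println(find_subsequence("AG", "TANGA"))               # 2:3, N is compatible with G
println(find_subsequence("AG", "TANGA"; strict=true))  # nothing
println(find_subsequence("GA", "TANGA"; strict=true))  # 4:5
```

Defaulting strict to false keeps the existing compatible-match behavior, so callers would opt in to exact matching explicitly.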

@jakobnissen jakobnissen merged commit f1fdcdc into BioJulia:v3 Jul 24, 2021
@jakobnissen
Member Author

Yes, that sounds like a good idea. I'll see if I can make a PR.
