Question on appropriate use of BUSTED-PH #35

Open
Emilyaoc opened this issue Feb 17, 2023 · 1 comment

Comments

@Emilyaoc

Hello,

I'd like to ask for some help with how best to use BUSTED-PH. I am trying to test whether selection on genes differs between species according to a binary trait (CB vs PB). Example tree 1 is pasted below to demonstrate what I mean, where the foreground branches are labelled {CB} and the background branches {PB}. There are many independent pairs of CB / PB across the tree. I understand that one option is to use BUSTED-PH, specifying all the CB tips as the test group and all the PB tips as the comparison group, in which case the command would be:
hyphy BUSTED-PH.bf --alignment alignment.fas --tree gene_tree.txt --branches CB --comparison PB --srv No

I was wondering whether an alternative option, one that could be more sensitive to lineage-specific differences, would be to run BUSTED-PH separately for each CB vs PB pair. One example is shown in example tree 2 below, which tests Spp1 (CB) against Spp24 (PB). I would then repeat this for each of the paired comparisons. I don't mean all possible combinations of CB vs PB, but only the actual pairs across the tree (e.g. Spp1 vs Spp24, Spp6 vs Spp31, Spp32 vs Spp7 & Spp8). I realize this would create a multiple testing problem, but perhaps I could correct for it later? This approach would let me look at each set of comparisons separately (which I'd like to do). What are your thoughts on this approach? Would it be acceptable to perform independent BUSTED-PH tests for the different PB and CB pairs, or is there a reason this approach would be flawed?

Thank you for your help.

Emily

Example tree 1:
((((((((((Spp27{PB}:0.1061445121,(Spp2{CB}:0.0270584277,Spp3{CB}:0.0261794982):0.0344910033):0.0035020037,(((Spp5{CB}:0.0035841027,Spp30{PB}:0.0057611215):0.0219621149,(Spp24{PB}:0.0321787049,Spp1{CB}:0.0178073773):0.0100827858):0.0352484102,((Spp4{CB}:0.0152403061,Spp28{PB}:0.0155103987):0.0038109277,Spp29{PB}:0.0079769122):0.0779856776):0.0087964965):0.001588438,(Spp26{PB}:0.0286713022,Spp25{PB}:0.0527021885):0.1245980631):0.0184857182,(Spp6{CB}:0.028444044,Spp31{PB}:0.0225514116):0.0596485678):0.0296406922,(((Spp40{PB}:0.1201390145,(Spp39{PB}:0.0548268922,(Spp16{CB}:0.0361122117,Spp17{CB}:0.0358397551):0.0246823189):0.0136822679):0.0397902583,((Spp34{PB}:0.0426910453,(Spp12{CB}:0.0185448918,Spp11{CB}:0.0237418132):0.0244439469):0.090134941,((Spp45{PB}:0.0045815149,Spp21{CB}:0.0046461625):0.0337996368,Spp46{PB}:0.0312817224):0.0816084984):0.0090666726):0.0057031342,((Spp19{CB}:0.0468786749,Spp43{PB}:0.0382526367):0.1378100849,(((Spp14{CB}:0.0111939159,Spp36{PB}:0.0137225212):0.0591854198,Spp37{PB}:0.0738589427):0.0666284331,(Spp35{PB}:0.0104790193,Spp13{CB}:0.0095368591):0.0733485724):0.0576597698):0.0074808915):0.0420763111):0.0090785591,Spp33{PB}:0.1539224942):0.0084542564,(Spp9{CB}:0.0176071397,Spp10{CB}:0.0141676561):0.1158838754):0.0150910971,((Spp47{PB}:0.0366681243,Spp22{CB}:0.0386648034):0.063511177,(Spp32{PB}:0.0189333796,(Spp7{CB}:0.0142000867,Spp8{CB}:0.0107546269):0.0084506914):0.0844803839):0.0411672233):0.2360367563,(((Spp20{CB}:0.0025737016,Spp44{PB}:0.0028672275):0.2623635034,(Spp23{CB}:0.0270701033,Spp48{PB}:0.0196880331):0.2388883853):0.0568027823,((Spp41{PB}:0.0124954283,Spp18{CB}:0.0171455802):0.0306188918,Spp42{PB}:0.0605509651):0.1576290146):0.0447594415):0.6098442627,Spp38{PB}:0.1317968626,Spp15{CB}:0.3308590102)

Example tree 2:
((((((((((Spp27:0.1061445121,(Spp2:0.0270584277,Spp3:0.0261794982):0.0344910033):0.0035020037,(((Spp5:0.0035841027,Spp30:0.0057611215):0.0219621149,(Spp24{PB}:0.0321787049,Spp1{CB}:0.0178073773):0.0100827858):0.0352484102,((Spp4:0.0152403061,Spp28:0.0155103987):0.0038109277,Spp29:0.0079769122):0.0779856776):0.0087964965):0.001588438,(Spp26:0.0286713022,Spp25:0.0527021885):0.1245980631):0.0184857182,(Spp6:0.028444044,Spp31:0.0225514116):0.0596485678):0.0296406922,(((Spp40:0.1201390145,(Spp39:0.0548268922,(Spp16:0.0361122117,Spp17:0.0358397551):0.0246823189):0.0136822679):0.0397902583,((Spp34:0.0426910453,(Spp12:0.0185448918,Spp11:0.0237418132):0.0244439469):0.090134941,((Spp45:0.0045815149,Spp21:0.0046461625):0.0337996368,Spp46:0.0312817224):0.0816084984):0.0090666726):0.0057031342,((Spp19:0.0468786749,Spp43:0.0382526367):0.1378100849,(((Spp14:0.0111939159,Spp36:0.0137225212):0.0591854198,Spp37:0.0738589427):0.0666284331,(Spp35:0.0104790193,Spp13:0.0095368591):0.0733485724):0.0576597698):0.0074808915):0.0420763111):0.0090785591,Spp33:0.1539224942):0.0084542564,(Spp9:0.0176071397,Spp10:0.0141676561):0.1158838754):0.0150910971,((Spp47:0.0366681243,Spp22:0.0386648034):0.063511177,(Spp32:0.0189333796,(Spp7:0.0142000867,Spp8:0.0107546269):0.0084506914):0.0844803839):0.0411672233):0.2360367563,(((Spp20:0.0025737016,Spp44:0.0028672275):0.2623635034,(Spp23:0.0270701033,Spp48:0.0196880331):0.2388883853):0.0568027823,((Spp41:0.0124954283,Spp18:0.0171455802):0.0306188918,Spp42:0.0605509651):0.1576290146):0.0447594415):0.6098442627,Spp38:0.1317968626,Spp15:0.3308590102)
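
For what it's worth, a pair-specific tree like example tree 2 could be generated from the fully labelled tree rather than edited by hand. The following is only a minimal sketch: the function name `label_pair` and the toy tree are hypothetical, and it assumes tip names consist of letters, digits and underscores and that labels only appear on tips in the `{CB}` / `{PB}` form shown above.

```python
import re

# Matches a tip name immediately followed by a {LABEL} annotation, e.g. "Spp1{CB}".
# Assumption: tip names contain only letters, digits and underscores.
LABELLED_TIP = re.compile(r"([A-Za-z0-9_]+)(\{[^}]*\})")


def label_pair(newick: str, keep: set[str]) -> str:
    """Return the tree with labels removed from every tip not listed in `keep`."""

    def strip_unless_kept(match: re.Match) -> str:
        name, label = match.group(1), match.group(2)
        # Keep the annotation only for tips in the chosen comparison.
        return name + label if name in keep else name

    return LABELLED_TIP.sub(strip_unless_kept, newick)


# Toy tree (hypothetical, not the real gene tree) with two CB/PB pairs.
toy_tree = "((A{CB}:0.1,B{PB}:0.2):0.05,(C{CB}:0.1,D{PB}:0.3):0.02)"

# Keep labels for the A/B pair only; C and D become unlabelled.
pair_tree = label_pair(toy_tree, {"A", "B"})
print(pair_tree)
# prints ((A{CB}:0.1,B{PB}:0.2):0.05,(C:0.1,D:0.3):0.02)
```

For a comparison with more than two tips, such as Spp32 vs Spp7 & Spp8, all three names would go into `keep`. Each resulting tree would then be written to its own file and passed to the same command as above.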

@spond
Member

spond commented Feb 21, 2023

Dear @Emilyaoc,

Other than multiple comparisons and loss of power due to a reduced sample size (number of branches), I see no fundamental statistical issues. But the double whammy of multiple-testing corrections and few branches per test is likely to result in a large set of null results. An alternative, more positive, possibility is that by looking at smaller branch sets you will better capture their specific selective regimes, which could be "smoothed" out to the tree average in the complete analysis.
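
To give a rough sense of what the correction costs, here is a minimal sketch of one common choice, the Holm-Bonferroni step-down adjustment. Nothing in this thread prescribes a particular method, so Holm is an assumption on my part, and the p-values below are made up purely for illustration; they are not BUSTED-PH results.

```python
def holm_adjust(pvalues: list[float]) -> list[float]:
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the tests, from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Made-up p-values for three hypothetical pairwise tests.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f}  Holm-adjusted p = {q:.3f}")
```

With only three tests, a raw p-value of 0.03 is already pushed above 0.05, and the penalty grows with the number of pairs tested.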

I would suggest running the test on Tree 1 first to see whether the joint analysis detects anything; if it does, then explore the individual comparisons.

Best,
Sergei
