Problem with RAXML v8.2.12 - number of states in partition #51

Open
iimog opened this issue Dec 28, 2023 · 5 comments

Comments

@iimog
Member

iimog commented Dec 28, 2023

Hello @iimog,

It seems that with the newest version of RAxML, v8.2.12, there are always problems with the partition file (no problem with v8.0 or earlier):

RAxML was called as follows:

raxmlHPC-PTHREADS -f a -m GTRGAMMA -p 740281 -q /Users/jianshuzhao/Github/bcgTree/data/SAR11_SAG_try_bcgtree_full_alignment.concat.partition -s /Users/jianshuzhao/Github/bcgTree/data/SAR11_SAG_try_bcgtree_full_alignment.concat.fa -w /Users/jianshuzhao/Github/bcgTree/data/ -n SAR11_SAG_bcgtree_final -T 4 -x 28840 -N 100

Partition PF00276.16 number 1 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition PF00281.15 number 2 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition PF00297.18 number 3 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition PF00410.15 number 7 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition PF00416.18 number 9 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition PF00466.16 number 10 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition PF01025.15 number 13 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00001 number 15 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00002 number 16 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00009 number 17 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00012 number 18 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00029 number 20 has a problem, the number of expected states is 20 the number of states that are present is 17.
Please go and fix your data!

Partition TIGR00059 number 22 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00060 number 23 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00061 number 24 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00062 number 25 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00082 number 27 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00115 number 30 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00152 number 32 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00158 number 33 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00165 number 34 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00168 number 36 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00442 number 50 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00663 number 59 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00810 number 61 has a problem, the number of expected states is 20 the number of states that are present is 17.
Please go and fix your data!

Partition TIGR00855 number 62 has a problem, the number of expected states is 20 the number of states that are present is 17.
Please go and fix your data!

Partition TIGR00952 number 64 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00959 number 65 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00964 number 67 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00981 number 69 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01021 number 73 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01024 number 74 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01031 number 76 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR01032 number 77 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01044 number 78 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01049 number 79 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR01050 number 80 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01066 number 82 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01067 number 83 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01079 number 85 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01169 number 87 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01632 number 91 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition PF00276.16 number 1 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition PF00281.15 number 2 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition PF00297.18 number 3 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition PF00410.15 number 7 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition PF00416.18 number 9 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition PF00466.16 number 10 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition PF01025.15 number 13 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00001 number 15 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00002 number 16 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00009 number 17 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00012 number 18 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00029 number 20 has a problem, the number of expected states is 20 the number of states that are present is 17.
Please go and fix your data!

Partition TIGR00059 number 22 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00060 number 23 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00061 number 24 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00062 number 25 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00082 number 27 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00115 number 30 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00152 number 32 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00158 number 33 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00165 number 34 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00168 number 36 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00442 number 50 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00663 number 59 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00810 number 61 has a problem, the number of expected states is 20 the number of states that are present is 17.
Please go and fix your data!

Partition TIGR00855 number 62 has a problem, the number of expected states is 20 the number of states that are present is 17.
Please go and fix your data!

Partition TIGR00952 number 64 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00959 number 65 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR00964 number 67 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR00981 number 69 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01021 number 73 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01024 number 74 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01031 number 76 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR01032 number 77 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01044 number 78 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01049 number 79 has a problem, the number of expected states is 20 the number of states that are present is 18.
Please go and fix your data!

Partition TIGR01050 number 80 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01066 number 82 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01067 number 83 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01079 number 85 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01169 number 87 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Partition TIGR01632 number 91 has a problem, the number of expected states is 20 the number of states that are present is 19.
Please go and fix your data!

Segmentation fault: 11

I attached the alignment file and partition file from the java GUI:

Archive.zip

Any idea?

Thanks

Originally posted by @jianshu93 in #50 (comment)

@iimog
Member Author

iimog commented Dec 28, 2023

Hi @jianshu93, thanks for reporting. I created this separate issue to track it. I need to look into this once I'm back at work.

@iimog
Member Author

iimog commented Jan 3, 2024

This issue seems to be a bit tricky. I found this discussion: https://groups.google.com/g/raxml/c/yHvXAKdk7OA

The problem is that, after alignment and trimming, not all of the proteins contain each of the 20 amino acids (I'm not sure whether all of them do in the first place). As each protein is used as a partition, a separate substitution model is estimated for each of them. If not all states are present, it is impossible to estimate a full 20x20 substitution model. The message about this is only a warning, but it can lead to numerical problems, which cause the segfault.
The suggested solution is to merge partitions in order to have all 20 states. But I'm not sure which partitions we could merge. The only quick fix I can think of is to not use partitions at all and instead estimate the substitution model on the full combined alignment. But I'm also not sure whether that makes sense.
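
To see which partitions are affected before running RAxML, something like the following could be used. This is only a minimal sketch: it assumes the partition file uses RAxML-style lines of the form `MODEL, name = start-end` with a single contiguous range per partition, and that the alignment is a plain FASTA file. The function names (`read_fasta`, `read_partitions`, `missing_states`) are made up for this illustration and are not part of bcgTree or RAxML.

```python
import re

# The 20 standard amino-acid states a protein model expects.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text):
    """Return {sequence name: sequence} from FASTA-formatted text."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(chunks) for n, chunks in seqs.items()}


def read_partitions(text):
    """Parse lines like 'WAG, PF00276.16 = 1-120' into (name, start, end).

    Assumption: one contiguous, 1-based, inclusive range per partition.
    Lines that do not match this shape are skipped.
    """
    pattern = re.compile(r"^\s*[^,]+,\s*(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*$")
    partitions = []
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            partitions.append(
                (match.group(1), int(match.group(2)), int(match.group(3)))
            )
    return partitions


def missing_states(seqs, partitions):
    """Map partition name -> sorted amino acids that never occur in it."""
    result = {}
    for name, start, end in partitions:
        seen = set()
        for seq in seqs.values():
            # Convert the 1-based inclusive range to a Python slice.
            seen.update(seq[start - 1:end].upper())
        result[name] = sorted(AMINO_ACIDS - seen)
    return result


if __name__ == "__main__":
    # Tiny made-up alignment, only to show the call pattern.
    fasta = ">a\nACDEFGHIKL\n>b\nMNPQRSTVW-\n"
    parts = "WAG, p1 = 1-5\nWAG, p2 = 6-10\n"
    for name, missing in missing_states(
        read_fasta(fasta), read_partitions(parts)
    ).items():
        print(name, "missing:", "".join(missing) or "none")
```

Partitions that report missing residues are the candidates for merging; which of them it makes sense to merge is still an open question.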

@jianshu93

jianshu93 commented Jan 3, 2024

Hello @iimog,

If we rely on iqtree2 (we can use partitioning with iqtree2), then it should be fine, judging from some simple testing. But again, iqtree2 is not very fast for just dozens of genomes (seems to be 2-5 times faster than RAxML). fasttree, on the other hand, does not support partitioning, so we have to concatenate all alignments without using a partitioning file as input. Overall, the phylogeny from fasttree is consistent with RAxML, but there are small differences, especially when we want precise results. Not sure what to do. But I think providing an option for users is a good idea; they can use either method at their own risk. I vote for iqtree2 + fasttree.
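
For reference, a partitioned iqtree2 call could be assembled roughly as below. This is a sketch under assumptions, not what bcgTree does: the helper name `build_iqtree_command` and the file names are hypothetical, and the options used are iqtree2's `-s` (alignment), `-p` (partition file), `-B` (ultrafast bootstrap replicates, minimum 1000) and `-T` (threads). Ultrafast bootstrap is not the same procedure as RAxML's rapid bootstrap (`-f a -N 100`), so support values would not be directly comparable.

```python
def build_iqtree_command(alignment, partition_file, threads=4, bootstraps=1000):
    """Return an argument list for a partitioned iqtree2 run.

    Assumption: the partition file is in a format iqtree2 accepts
    (NEXUS or RAxML-style).
    """
    if bootstraps < 1000:
        raise ValueError("iqtree2 ultrafast bootstrap needs at least 1000 replicates")
    return [
        "iqtree2",
        "-s", alignment,        # concatenated alignment
        "-p", partition_file,   # one partition per protein
        "-B", str(bootstraps),  # ultrafast bootstrap replicates
        "-T", str(threads),     # number of threads
    ]


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call.
    cmd = build_iqtree_command("full_alignment.concat.fa",
                               "full_alignment.concat.partition")
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```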

Thanks,

Jianshu

@jianshu93

Hello @iimog,

Let me know how you create the .jar file if possible; I am new to Java (e.g., I have no idea how to create a jar file). I will try to replace raxml with iqtree2 in the perl script. By the way, is there a way to also include the binary dependencies in the jar file? It seems impossible after a quick search.

Thanks,

Jianshu

@iimog
Member Author

iimog commented Jan 4, 2024

Hi @jianshu93,

I'm creating the jar file by opening the bcgTreeGUI folder as project in Eclipse. Running the BcgTree.java file automatically compiles the project. In order to build the jar file, I open the build.xml and click "Run tool".

As jar files are (zip) containers, it should be easy to add files to them. The hard part is probably to make bcgTree use the files from inside the jar.
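
Since a jar is a zip archive, adding a file can be sketched with any zip tool. The snippet below uses Python's standard `zipfile` module; the helper names and the example paths are hypothetical. It only covers the easy part (getting the file into the archive). A binary stored this way cannot be executed in place, so bcgTree would still have to extract it to disk at runtime, and modifying a signed jar would invalidate its signature.

```python
import zipfile


def add_file_to_jar(jar_path, file_path, arcname):
    """Append file_path to the existing jar under the internal path arcname."""
    # Mode "a" opens the existing archive and appends a new entry.
    with zipfile.ZipFile(jar_path, "a") as jar:
        jar.write(file_path, arcname)


def list_jar(jar_path):
    """Return the entry names stored in the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return jar.namelist()


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the call.
    add_file_to_jar("bcgTree.jar", "bin/iqtree2", "bin/iqtree2")
    print(list_jar("bcgTree.jar"))
```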

Sure, go ahead and make changes to the perl script. We are happy to take pull requests for any of the open issues. I'm quite busy, so I would not be able to make any major changes in the next couple of weeks. But I'm happy to spend some time reviewing. 😃
