what did you intend to do with drug_name.split("+")] #132
Comments
I used the re split with and it worked. |
The "+" sign is meant for indicating drug combinations; this can be done with chemCPA as demonstrated here. I will make this clearer in a future PR |
Yes, I do understand the intent of the '+' in the code. My point is that the code for splitting by '+' will fail for this dataset, because some drug names already contain '+', as in (+)-3-(1-propyl-piperidin-3-yl)-phenol |
Were you able to run manual_seml_sweep.py? I keep getting random errors regarding the data. I'm trying with sciplex_complete_middle_subset.h5ad and slincs_full_smiles_sciplex_genes.h5ad. |
This is not a repo you can just download and run. There are a hundred and
one little bugs, mismatches, and discrepancies throughout. But I worked through
the errors one by one and got it running in the end.
|
Apparently the drug names with the funny + were eliminated during preprocessing, if you were able to run through the code in the preprocessing folder. For us, that is not possible due to unposted input files. |
Hi @bhomass, why were you not able to remove those '+' signs from the drug names? What do you mean by unposted input files? |
Yes, I did remove the drugs with + in the names once I realized that is how
you handled these.
I have posted a few times about all the missing input files that show up in the code
but are not in the download links.
Let me walk through all the notebook code in the preprocessing folder one
more time and make a list of all the input files that are missing.
|
in data.py, there is this statement
It led to the code bombing in the following line
Upon examination of the drug names, only 4 of them contain '+':
(+)-3-(1-propyl-piperidin-3-yl)-phenol
(+|-)-7-hydroxy-2-(N,N-di-n-propylamino)tetralin
flurbiprofen-(+|-)
atenolol-(+|-)
I assume the sensible thing to do would be to eliminate (+) or (+|-) and the trailing or preceding -.
But [drugs_names_unique.add(i) for i in d.split("+")] would not do that. It would simply leave fragments like '(' as possible drug names.
Could someone confirm whether my interpretation is correct?
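To make the difference concrete, here is a minimal sketch that runs a plain split("+") over the four names listed above and compares the result with stripping the (+) / (+|-) marker and its adjacent hyphen. The helper names naive_unique_names and strip_rotation_marker are made up for illustration and are not taken from data.py. Note that (+) and (+|-) distinguish an enantiomer from a racemate, so stripping them discards chemical information; this only illustrates the interpretation described above.

```python
import re

# The four drug names from the dataset that contain '+'.
names = [
    "(+)-3-(1-propyl-piperidin-3-yl)-phenol",
    "(+|-)-7-hydroxy-2-(N,N-di-n-propylamino)tetralin",
    "flurbiprofen-(+|-)",
    "atenolol-(+|-)",
]


def naive_unique_names(drug_names):
    """Mimic the split-on-'+' logic: every fragment becomes a 'drug name'."""
    unique = set()
    for d in drug_names:
        unique.update(d.split("+"))
    return unique


# Hypothetical helper: matches "(+)" or "(+|-)" plus one optional hyphen
# directly before or after it.
_MARKER = re.compile(r"-?\(\+(?:\|-)?\)-?")


def strip_rotation_marker(drug_name):
    """Remove the (+) / (+|-) marker and its adjacent hyphen."""
    return _MARKER.sub("", drug_name)


fragments = naive_unique_names(names)
cleaned = [strip_rotation_marker(n) for n in names]

print(sorted(fragments))  # fragments such as '(' and '|-)' show up here
print(cleaned)
```

None of the four original names survives the naive split, which is why the lookup on the following line fails.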
The comment for drug_names_to_once_canon_smiles() says
#This function will need to be rewritten to handle datasets with combinations
but I don't get what is meant by "combinations". Are there some drugs that use '+' to combine multiple formulas, and is that why you are doing split('+')? If so, the (+) cases should be exempted from the split processing, but I don't see that mechanism in place.
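One possible way to keep '+' as a combination separator while leaving the parenthesized markers alone is a regex split with a negative lookbehind. This is only a sketch, not the repository's implementation: split_combination is a hypothetical helper, "drugA+drugB" is an invented combination name, and the rule (a '+' directly after '(' is never a separator) is an assumption that happens to cover the four names found in this dataset.

```python
import re

# Assumption: a '+' that directly follows '(' belongs to a marker such as
# "(+)" or "(+|-)" and must not be treated as a combination separator.
COMBO_SEP = re.compile(r"(?<!\()\+")


def split_combination(drug_name):
    """Split a drug name on '+' separators, ignoring '+' inside '(+...'."""
    return [part.strip() for part in COMBO_SEP.split(drug_name)]


print(split_combination("(+)-3-(1-propyl-piperidin-3-yl)-phenol"))
print(split_combination("atenolol-(+|-)"))
print(split_combination("drugA+drugB"))  # hypothetical combination
```

A name written as "(-|+)" would still be split by this pattern, so the lookbehind would need extending if such names appear in other datasets.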