This repository has been archived by the owner on May 25, 2024. It is now read-only.

install google-nucleus error #43

Open
one-matrix opened this issue May 10, 2023 · 14 comments

Comments

@one-matrix

When I pip install Nucleus, it throws the error below. I have tried every version from 0.5.6 to the latest and get the same error.

python 3.9
tensorflow 2.9.1
ubuntu 20

pip install --user google-nucleus

`
import pkg_resources
This package does not support wheel creation.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for google-nucleus
Running setup.py clean for google-nucleus
Failed to build google-nucleus
ERROR: Could not build wheels for google-nucleus, which is required to install pyproject.toml-based projects

`

@danielecook
Contributor

@one-matrix which version of pip are you using?

pip --version
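
When reporting a build failure like this, it can help to post the interpreter and packaging-tool versions together. The following is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`); the function name `describe_environment` and the default package list are illustrative and not part of Nucleus or pip.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages=("pip", "setuptools", "wheel", "tensorflow")):
    """Return a dict of interpreter and package versions for a bug report."""
    info = {
        "python": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            # Reads installed metadata only; it does not import the package.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter that pip installs into, otherwise the reported versions describe a different environment.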

@one-matrix
Author

@danielecook It is pip 23.1.2. I also tried Python 3.7, which did not work. I don't know how to solve it.

@pgrosu

pgrosu commented May 11, 2023

@one-matrix I think you need Python 3.8 based on the following:

Classifier: Programming Language :: Python :: 3.8

@one-matrix
Author

Snipaste_2023-05-11_13-44-39
Snipaste_2023-05-11_13-46-05
I used Python 3.8 locally and got the same error. You can reproduce it in Colab as shown in the screenshots.

@pgrosu

pgrosu commented May 11, 2023

@one-matrix I see you're trying to install google-nucleus 0.5.6, but now the current version is 0.6.0:

https://pypi.org/project/google-nucleus/

Also the error you're seeing is saying that there is a newer version of tensorflow than the one you've specified that you can pick from: 2.8.0rc0, 2.8.0rc1, 2.8.0, etc.

Just to isolate the errors, I would first start with trying to pip install version 0.6.0 of google-nucleus and then see what is the next error afterwards that you get.

@one-matrix
Author

@pgrosu Hi pgrosu,
Maybe you can open the link below and try "pip install google-nucleus" there. I want to fix this error:
"ERROR: Could not build wheels for google-nucleus, which is required to install pyproject.toml-based projects"
Version 0.6.0 has the same error as 0.5.6. The link below is also the official one.
https://colab.research.google.com/github/google/nucleus/blob/master/nucleus/examples/dna_sequencing_error_correction.ipynb

@pgrosu

pgrosu commented May 11, 2023

@one-matrix If you replace it with the following it should work:

!pip download google-nucleus 
!tar xzf google_nucleus-0.6.0.tar.gz
%cd google_nucleus-0.6.0/
%rm -rf /usr/local/lib/python3.10/dist-packages/google_nucleus.egg-info
!python setup.py clean
!python setup.py install 
!pip install -q tensorflow==2.9.1

In the Python code you would also need to add the following right after the line "import numpy as np":

import collections.abc
collections.MutableMapping = collections.abc.MutableMapping
collections.MutableSequence = collections.abc.MutableSequence
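
The shim above is needed because Python 3.10 removed the abstract base class aliases from the top-level `collections` module; they now live only in `collections.abc`. A slightly more defensive sketch, which patches only the names that are actually missing (the loop variable `_name` is just illustrative), could look like this:

```python
import collections
import collections.abc

# Restore the aliases that Python 3.10 removed from the top-level
# `collections` module, but only when they are actually missing, so the
# same cell is harmless on older interpreters.
for _name in ("MutableMapping", "MutableSequence"):
    if not hasattr(collections, _name):
        setattr(collections, _name, getattr(collections.abc, _name))
```

This has to run before any import that still refers to `collections.MutableMapping` or `collections.MutableSequence`.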

But there are other issues with the code. A bit swamped now, but will try to look at it when I get a bit of free time -- though happy if others jump in as well.

~p

@pgrosu

pgrosu commented May 12, 2023

Hi @one-matrix,

So here's the simplest way I was able to get it to run.

  1. Replace the whole !pip install ... with the following code and execute it:
!wget https://bootstrap.pypa.io/get-pip.py
!python3.8 get-pip.py
!python3.8 -m pip install --upgrade pip
!python3.8 -m pip download google-nucleus 
!tar xzf google_nucleus-0.6.0.tar.gz
%cd google_nucleus-0.6.0/
!python3.8 setup.py clean
!python3.8 setup.py install 
!python3.8 -m pip install -r ./nucleus/pip_package/egg_files/requires.txt
!python3.8 -m pip install -q tensorflow==2.8.0
!python3.8 -m pip install protobuf==3.20.1

You will see as shown below that it installs Google Nucleus successfully:

image
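
To confirm that the pinned versions from the commands above are the ones that actually ended up installed, a small check along these lines may help. It is a sketch only: `EXPECTED` and `check_pins` are hypothetical names, and the script has to be executed by the python3.8 interpreter used for the install, not by the notebook kernel.

```python
from importlib import metadata

# Pins taken from the install commands in this comment.
EXPECTED = {"tensorflow": "2.8.0", "protobuf": "3.20.1"}


def check_pins(expected, lookup=metadata.version):
    """Return {package: (wanted, found)} for every package that does not match."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed for this interpreter
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(EXPECTED)
    print(problems if problems else "all pins match")
```

An empty result means both packages are present at the pinned versions.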

  2. Now to run the code. For that you will just need to wrap it by adding a couple of extra lines at the beginning and end of each section, as I got it to run under Python 3.8. The following is the procedure on how to do this:
import subprocess

nucleus_code = """

  ...YOUR NUCLEUS CODE...

"""

nucleus_result = subprocess.run(['python3.8'], input=nucleus_code, capture_output=True, encoding='UTF-8')
print(nucleus_result.stdout)

Basically leave the code as is and add the top and bottom part. Below is a screenshot showing that it ran successfully:

image
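
The wrapper above prints only stdout, so a failing cell can look like it produced nothing. A minimal sketch of a reusable helper that also surfaces stderr on failure is shown here; the name `run_with_interpreter` is hypothetical, and it assumes the chosen interpreter (python3.8 by default, as in the steps above) is on the PATH.

```python
import subprocess
import sys


def run_with_interpreter(code, interpreter="python3.8"):
    """Feed `code` to `interpreter` on stdin and return the CompletedProcess."""
    result = subprocess.run(
        [interpreter],
        input=code,
        capture_output=True,
        encoding="UTF-8",
    )
    if result.stdout:
        print(result.stdout, end="")
    if result.returncode != 0:
        # Surface the traceback instead of silently dropping it.
        print(result.stderr, file=sys.stderr, end="")
    return result
```

Usage would be `run_with_interpreter(nucleus_code)` for each notebook section. Note that every call starts a fresh interpreter, so variables do not carry over between sections.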

Hope it makes sense, and feel free to let me know if you have any other questions.

Hope it helps,
Paul

@pgrosu

pgrosu commented May 12, 2023

@one-matrix One more small thing, you will need to move tensorflow imports above the nucleus imports to not get an error. Below is the code:

import subprocess

nucleus_code = """

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import random

import numpy as np

# Import TensorFlow before Nucleus (moved up from below the Nucleus imports).
import tensorflow as tf
from tensorflow.keras import layers

from nucleus.io import fasta
from nucleus.io import sam
from nucleus.io import vcf
from nucleus.io.genomics_writer import TFRecordWriter
from nucleus.protos import reads_pb2
from nucleus.util import cigar
from nucleus.util import ranges
from nucleus.util import utils

# Original position of the TensorFlow imports, now disabled:
#import tensorflow as tf
#from tensorflow.keras import layers

"""

nucleus_result = subprocess.run(['python3.8'], input=nucleus_code, capture_output=True, encoding='UTF-8')
print(nucleus_result.stderr)

Below is the standard error output showing a clean run:

image

@one-matrix
Author

@pgrosu Thank you very much, Paul. It is a little strange to use, but it works.
Nucleus is important and useful. What is the plan for Nucleus in the future? I notice that it has not been updated for a long time. It has many similarities with pysam (pysam-developers/pysam).

@pgrosu

pgrosu commented May 13, 2023

@one-matrix Well, with one difference from pysam: Nucleus can create TensorFlow records of your data, opening the world of machine learning to genomics. Basically think of different sets of genomic variations as collections of varying language dialects. Then you can transform those into "dialect" models, such as "disease" dialects in a clinical setting. You can even ask a larger question, such as: given a library of books written in different dialects, what might have been the original book that started it all -- that would be your consensus dialect. All these models can then even help you fill in missing data. But that's only the beginning, as there are so many ways to go from there. Regarding the roadmap for Nucleus, that would be something the Google folks would know better than I do, as I'm just dropping by at times helping out here and there.

@one-matrix
Author

@pgrosu I agree with you very much. There are too many genomic data formats at present, such as SAM/BAM/VCF/BCF, and a lot of time goes into data preprocessing. TFRecords, as an intermediate data storage format, make it convenient to develop AI models later. Thank you for your contributions to it.

@Tharindu-Nirmal

Tharindu-Nirmal commented Apr 30, 2024

run_error
@pgrosu I followed your advice and tried to run the official notebook from Google. However, trying with Python 3.8 gave distutils errors, so I continued with Python 3.10 (the default for Colab). The run() call in the final cell then throws an error:
RuntimeError: PythonNext() argument read is not valid: Dynamic cast failed

Hope someone could help me run the notebook.

@pgrosu

pgrosu commented May 1, 2024

Hi @Tharindu-Nirmal,

It's possible to do, but really complex to install and properly configure given the new version of the CoLab environment. So the current Google CoLab now runs on Ubuntu 22.04 with Python 3.10.12, which is not what 0.6.0 of Nucleus is built on (Ubuntu 20.04 with Python 3.8). Basically a new version of Nucleus would need to be updated with that Ubuntu/Python version environment in mind. The easier alternative is to install Docker within the CoLab, and then to pull the Nucleus image using the following steps via the CoLab commands shown below:

!apt-get update
!apt-get install ca-certificates curl
!install -m 0755 -d /etc/apt/keyrings
!curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
!chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
!echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
!apt-get update

!apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
!dockerd -b none --iptables=0 -l warn &

!docker pull yijun/nucleus:py3

I have not tried this specific image, and I think this is an older version of the Dockerized version of Nucleus. At least it will provide something to try it out. If you find a newer one, feel free to post it here so others can try it out.
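
Before choosing between a direct install and the Docker route, a quick preflight check can show which one is feasible in the current runtime. This is a sketch under the assumption stated in this thread that Nucleus 0.6.0 targets Python 3.8; `preflight` is a hypothetical helper, not part of Nucleus or Docker.

```python
import shutil
import sys

# Assumption from this thread: the Nucleus 0.6.0 release targets Python 3.8.
NUCLEUS_PYTHON = (3, 8)


def preflight(version_info=sys.version_info, target=NUCLEUS_PYTHON):
    """Report whether the interpreter matches and whether docker is on PATH."""
    return {
        "python_matches": tuple(version_info[:2]) == tuple(target),
        "docker_on_path": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    print(preflight())
```

If `python_matches` is false and `docker_on_path` is true, the Docker image is probably the less painful option.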

Hope it helps,
Paul
