diff --git a/FAQ.md b/FAQ.md
index a5dd97b91..8ab8aafc8 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -167,8 +167,23 @@ In case you get peer to peer related errors on non-homogeneous GPU systems, set
 export NCCL_P2P_LEVEL=LOC
 ```
 
+### Other models
+
+One can choose any Hugging Face model by just passing the name after `--base_model=`, but a `prompt_type` is also required if we don't already support the model.
+E.g. for Vicuna models, a typical prompt type is used, and we already apply it automatically for specific models,
+but if you pass `--prompt_type=instruct_vicuna` with any other Vicuna model, we'll use it, assuming that is the correct prompt type.
+See [prompter](prompter.py) for the models that are currently supported in this automatic way; the same dictionary shows which prompt types are supported.
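+
+For example, a minimal invocation (the model name below is hypothetical, purely to illustrate the flags):
+```bash
+# hypothetical Vicuna-style model, used only to show --prompt_type
+python generate.py --base_model=some-org/some-vicuna-13b --prompt_type=instruct_vicuna
+```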
+
 ### Offline Mode:
 
+Note, when running `generate.py` and asking your first question, it will download the model(s), which for the 6.9B model takes about 15 minutes per 3 PyTorch bin files if you have a 10MB/s download speed.
+
 1) Download model and tokenizer of choice
 
 ```python
@@ -223,92 +238,23 @@ templates/frontend/share.html
 HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 python generate.py --base_model='h2oai/h2ogpt-oasst1-512-12b'
 ```
 
-### LangChain Usage:
+### Isolated LangChain Usage:
 
 See [tests/test_langchain_simple.py](tests/test_langchain_simple.py)
 
+### ValueError: ...offload....
 
-### MACOS
-
-* Install [Rust](https://www.geeksforgeeks.org/how-to-install-rust-in-macos/)
-```bash
-curl –proto ‘=https’ –tlsv1.2 -sSf https://sh.rustup.rs | sh
 ```
-Enter new shell and test: `rustc --version`
-
-* Mac Running Intel
-When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '-march=native'_ during pip install.
-If so set your archflags during pip install. eg: _ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt_
-
-### C++ Compiler
-If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++ compiler on your computer.
-
-### For Windows 10/11
-To install a C++ compiler on Windows 10/11, follow these steps:
-
-1. Install Visual Studio 2022.
-2. Make sure the following components are selected:
-   * Universal Windows Platform development
-   * C++ CMake tools for Windows
-3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
-4. Run the installer and select the `gcc` component.
-
-### ENV installation
-
-* Install, e.g. for MACOS: [Miniconda](https://docs.conda.io/en/latest/miniconda.html#macos-installers)
-
-* Enter new shell and should also see `(base)` in prompt
-
-* Create new env:
-```bash
-conda create -n h2ogpt -y
-conda activate h2ogpt
-conda install -y mamba -c conda-forge  # for speed
-mamba install python=3.10 -c conda-forge -y
-```
-Should see `(h2ogpt)` in shell prompt.
-
-* Test python:
-```bash
-python --version
-```
-should say 3.10.xx
-```bash
-python -c 'import os, sys ; print("hello world")'
-```
-should print `hello world`.
-
-* Clone and pip install as usual:
-```
-bash
-git clone https://github.com/h2oai/h2ogpt.git
-cd h2ogpt
-pip install -r requirements.txt
-```
-
-* For non-cuda support, edit requirements_optional_langchain.txt and switch to `faiss_cpu`.
-
-* Install langchain dependencies if want to use langchain:
-```bash
-pip install -r requirements_optional_langchain.txt
-```
-and fill `user_path` path with documents to be scanned recursively.
-
-* Run:
-```bash
-python generate.py --load_8bit=True --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b --langchain_mode=MyData --user_path=user_path --score_model=None
-```
-It will download the model, which takes about 15 minutes per 3 pytorch bin files if have 10MB/s download.
-One can choose any huggingface model, just pass the name after `--base_model=`, but a prompt_type is required if we don't already have support.
-E.g. for vicuna models, a typical prompt_type is used and we support that already automatically for specific models,
-but if you pass `--prompt_type=instruct_vicuna` with any other vicuna model, we'll use it assuming that is the correct prompt type.
-See models that are currently supported in this automatic way, and the same dictionary shows which prompt types are supported: [prompter](prompter.py).
-
-* Potential Errors:
-```
-ValueError: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers
+The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers
 the weights in this format.
 ```
 
+If you see this error, then you either have insufficient GPU memory or insufficient CPU memory. E.g. for the 6.9B model, one needs a minimum of 27GB free memory.
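+
+One mitigation worth trying (a sketch, not a guaranteed fix) is 8-bit loading, which reduces GPU memory use relative to 16-bit:
+```bash
+# same 6.9B model, loaded in 8-bit to lower GPU memory use
+python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b --load_8bit=True
+```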
+
 ### Larger models require more GPU memory
 
diff --git a/INSTALL.md b/INSTALL.md
index 2b5c94c40..52326498f 100644
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -1,11 +1,10 @@
-## h2oGPT Installation
+## h2oGPT Installation Help
 
 Follow these instructions to get a working Python environment on a Linux system.
 
-### Native Installation for Training/Fine-Tuning of h2oGPT on Linux GPU Servers
-
-#### Install Python environment
+### Install Python environment
 
+For Ubuntu, use the Linux-x86_64 installer as below; for MACOS, use [Miniconda](https://docs.conda.io/en/latest/miniconda.html#macos-installers).
 ```bash
 wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
 bash ./Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
@@ -17,19 +16,33 @@ conda install mamba -n base -c conda-forge
 conda install python=3.10 -y
 conda update -n base -c defaults conda
 ```
-
-#### Install Python packages
-
+Enter a new shell; you should see `(base)` in the prompt. Then create a new env:
+```bash
+conda create -n h2ogpt -y
+conda activate h2ogpt
+conda install -y mamba -c conda-forge  # for speed
+mamba install python=3.10 -c conda-forge -y
+```
+You should see `(h2ogpt)` in the shell prompt. Test your Python:
+```bash
+python --version
+```
+should say 3.10.xx, and:
+```bash
+python -c 'import os, sys ; print("hello world")'
+```
+should print `hello world`. Then clone:
 ```bash
 git clone https://github.com/h2oai/h2ogpt.git
 cd h2ogpt
-pip install -r requirements.txt
 ```
+Then go back to the [README](README.md) for package installation and use of `generate.py`.
 
-#### Install CUDA 12.1 [install cuda coolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local)
+### Installing CUDA Toolkit
 
-E.g. for Ubuntu 20.04, select Ubuntu, Version 20.04, Installer Type "deb (local)", and you should get the following commands:
+E.g. for CUDA 12.1, [install the CUDA toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local).
+E.g. for Ubuntu 20.04, select Ubuntu, Version 20.04, Installer Type "deb (local)", and you should get the following commands:
 ```bash
 wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
 sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
@@ -55,11 +68,10 @@ Then reboot the machine, to get everything sync'ed up on restart.
 sudo reboot
 ```
 
-#### Compile bitsandbytes for fast 8-bit training [BitsandBytes Source](https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md)
-
-This is only required if have different CUDA than built into bitsandbytes pypi package,
-which includes CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 12.0, 12.1. Here we compile for 12.1.
+### Compile bitsandbytes
 
+For fast 4-bit and 8-bit training, one needs bitsandbytes. [Compiling bitsandbytes](https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md) is only required if you have a different CUDA version than those built into the bitsandbytes pypi package,
+which includes CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 12.0, and 12.1. Here we compile for 12.1 as an example.
 ```bash
 git clone http://github.com/TimDettmers/bitsandbytes.git
 cd bitsandbytes
@@ -69,7 +81,7 @@ CUDA_VERSION=121 python setup.py install
 cd ..
 ```
 
-#### Install nvidia GPU manager if have multiple A100/H100s.
+### Install NVIDIA GPU manager if you have multiple A100/H100s
 ```bash
 sudo apt-key del 7fa2af80
 distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
@@ -83,7 +95,7 @@ dcgmi discovery -l
 ```
 See [GPU Manager](https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/getting-started.html)
 
-#### Install and run Fabric Manager if have multiple A100/100s
+### Install and run Fabric Manager if you have multiple A100/H100s
 
 ```bash
 sudo apt-get install cuda-drivers-fabricmanager
@@ -120,7 +132,5 @@ Then No for symlink change, say continue (not abort), accept license, keep only
 
 If cuda 11.7 is not your base installation, then when doing pip install -r requirements.txt do instead:
 ```bash
-CUDA_HOME=/usr/local/cuda-11.7 pip install -r requirements_optional.txt
+CUDA_HOME=/usr/local/cuda-11.7 pip install -r requirements_optional_flashattention.txt
 ```
-
-Now you're ready to go back to [data prep and fine-tuning](FINETUNE.md)!
diff --git a/README.md b/README.md
index 4ceb6e249..21ccfea11 100644
--- a/README.md
+++ b/README.md
@@ -99,6 +99,18 @@ python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b --load_8bit=Tr
 ```
 For more ways to ingest on CLI and control see [LangChain Readme](README_LangChain.md).
 
+For 4-bit support, the latest dev versions of transformers, accelerate, and peft are required; they can be installed by running:
+```bash
+pip uninstall peft transformers accelerate -y
+pip install -r requirements_optional_4bit.txt
+```
+where the uninstall is required in case, e.g., peft was previously installed from GitHub. Then, when running generate, pass `--load_4bit=True`, as in the example below.
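+
+For example, a 4-bit run might look like the following (a sketch; the model is the same 6.9B one used above):
+```bash
+python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b --load_4bit=True
+```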
+
 Any other instruct-tuned base models can be used, including non-h2oGPT ones.  [Larger models require more GPU memory](FAQ.md#larger-models-require-more-gpu-memory).
 
 #### CPU
@@ -136,6 +148,30 @@ For no langchain support (still uses LangChain package as model wrapper), run as
 python generate.py --base_model=gptj --score_model=None
 ```
 
+### MACOS
+
+All instructions are the same as for GPU or CPU installation, except first install [Rust](https://www.geeksforgeeks.org/how-to-install-rust-in-macos/):
+```bash
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+```
+Enter a new shell and test: `rustc --version`
+
+When running a Mac with Intel hardware (not M1), you may run into `clang: error: the clang compiler does not support '-march=native'` during pip install.
+If so, set your archflags during pip install, e.g.: `ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt`
+
+If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++ compiler on your computer.
+
+### For Windows 10/11
+
+All instructions are the same as for GPU or CPU installation, except you also need a C++ compiler, which you can set up as follows:
+
+1. Install Visual Studio 2022.
+2. Make sure the following components are selected:
+   * Universal Windows Platform development
+   * C++ CMake tools for Windows
+3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
+4. Run the installer and select the `gcc` component.
+
 ### CLI chat
 
 The CLI can be used instead of gradio by running for some base model, e.g.:
diff --git a/README_LangChain.md b/README_LangChain.md
index 9f0a3dc28..533113211 100644
--- a/README_LangChain.md
+++ b/README_LangChain.md
@@ -22,6 +22,20 @@ python generate.py --base_model=h2oai/h2ogpt-oasst1-512-12b --load_8bit=True --l
 ```
 See below for additional instructions to add support for some file types.
 
+To support a GPU FAISS database, run:
+```bash
+grep -v '#\|peft' requirements.txt > req_constraints.txt
+pip install -r requirements_optional_faiss.txt -c req_constraints.txt
+```
+or, if you have no GPUs, you can still use FAISS: comment out the faiss-gpu line in `requirements_optional_faiss.txt` and uncomment the faiss-cpu line, as sketched below.
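+
+For example, a minimal sketch of that swap (assuming the default contents of `requirements_optional_faiss.txt`, where the pinned faiss-gpu line is active):
+```bash
+# uncomment faiss-cpu, then comment out the faiss-gpu pin
+sed -i 's/^#faiss-cpu/faiss-cpu/; s/^faiss-gpu/#faiss-gpu/' requirements_optional_faiss.txt
+pip install -r requirements_optional_faiss.txt -c req_constraints.txt
+```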
+
 ## Supported Datatypes
 
 Open-source data types are supported, .msg is not supported due to GPL-3 requirement.  Other meta types support other types inside them.  Special support for some behaviors is provided by the UI itself.
diff --git a/gpt_langchain.py b/gpt_langchain.py
index d49b64bc8..272ceee3e 100644
--- a/gpt_langchain.py
+++ b/gpt_langchain.py
@@ -37,7 +37,6 @@ EverNoteLoader, UnstructuredEmailLoader, UnstructuredODTLoader, UnstructuredPowerPointLoader, \
     UnstructuredEPubLoader, UnstructuredImageLoader, UnstructuredRTFLoader, ArxivLoader
 from langchain.text_splitter import RecursiveCharacterTextSplitter
-from langchain.vectorstores import FAISS
 from langchain.chains.question_answering import load_qa_chain
 from langchain.docstore.document import Document
 from langchain import PromptTemplate
@@ -53,6 +52,7 @@ def get_db(sources, use_openai_embedding=False, db_type='faiss', persist_directo
     # Create vector database
     if db_type == 'faiss':
+        from langchain.vectorstores import FAISS  # lazy import, so FAISS stays an optional dependency
         db = FAISS.from_documents(sources, embedding)
     elif db_type == 'chroma':
         collection_name = langchain_mode.replace(' ', '_')
diff --git a/requirements_optional_4bit.txt b/requirements_optional_4bit.txt
new file mode 100644
index 000000000..240ee3c4a
--- /dev/null
+++ b/requirements_optional_4bit.txt
@@ -0,0 +1,6 @@
+# dev versions required for now for 4-bit training
+git+https://github.com/huggingface/accelerate.git@0226f750257b3bf2cadc4f189f9eef0c764a0467
+git+https://github.com/huggingface/peft.git@3714aa2fff158fdfa637b2b65952580801d890b2
+git+https://github.com/huggingface/transformers.git@f67dac97bdc63874f2288546b3fa87e69d2ea1c8
+# optional:
+#xformers==0.0.20
diff --git a/requirements_optional_faiss.txt b/requirements_optional_faiss.txt
new file mode 100644
index 000000000..2d6db8df2
--- /dev/null
+++ b/requirements_optional_faiss.txt
@@ -0,0 +1,3 @@
+# choose one:
+#faiss-cpu
+faiss-gpu==1.7.2
diff --git a/requirements_optional.txt b/requirements_optional_flashattention.txt
similarity index 100%
rename from requirements_optional.txt
rename to requirements_optional_flashattention.txt
diff --git a/requirements_optional_langchain.txt b/requirements_optional_langchain.txt
index 9a460828d..3eb75ca6e 100644
--- a/requirements_optional_langchain.txt
+++ b/requirements_optional_langchain.txt
@@ -7,9 +7,6 @@ pypdf==3.8.1
 tiktoken==0.3.3
 # avoid textract, requires old six
 #textract==1.6.5
-# choose:
-#faiss-cpu
-faiss-gpu==1.7.2
 
 # for HF embeddings
 sentence_transformers==2.2.2
diff --git a/requirements_optional_training.txt b/requirements_optional_training.txt
index eb37c6194..95e303b61 100644
--- a/requirements_optional_training.txt
+++ b/requirements_optional_training.txt
@@ -1,5 +1 @@
 xformers==0.0.20
-# dev required for now for 4-bit training
-git+https://github.com/huggingface/accelerate.git@0226f750257b3bf2cadc4f189f9eef0c764a0467
-git+https://github.com/huggingface/peft.git@3714aa2fff158fdfa637b2b65952580801d890b2
-git+https://github.com/huggingface/transformers.git@f67dac97bdc63874f2288546b3fa87e69d2ea1c8
diff --git a/tests/test_langchain_units.py b/tests/test_langchain_units.py
index 131ef406a..dd1cec937 100644
--- a/tests/test_langchain_units.py
+++ b/tests/test_langchain_units.py
@@ -4,7 +4,7 @@ import pytest
 
 from tests.utils import wrap_test_forked
-from utils import zip_data, download_simple, get_ngpus_vis, get_mem_gpus
+from utils import zip_data, download_simple, get_ngpus_vis, get_mem_gpus, have_faiss
 
 have_openai_key = os.environ.get('OPENAI_API_KEY') is not None
 
@@ -136,6 +136,7 @@ def test_qa_daidocs_db_chunk_hf():
     check_ret(ret)
 
 
+@pytest.mark.skipif(not have_faiss, reason="requires FAISS")
 @wrap_test_forked
 def test_qa_daidocs_db_chunk_hf_faiss():
     from gpt_langchain import _run_qa_db
diff --git a/utils.py b/utils.py
index e1c7add6c..181d08afe 100644
--- a/utils.py
+++ b/utils.py
@@ -800,3 +800,19 @@ def get_kwargs(func, exclude_names=None, **kwargs):
     assert not missing_kwargs, "Missing %s" % missing_kwargs
     kwargs = {k: v for k, v in kwargs.items() if k in func_names}
     return kwargs
+
+
+import pkg_resources
+
+# True if any FAISS flavor (plain, GPU, or CPU build) is installed
+have_faiss = False
+for _dist in ('faiss', 'faiss_gpu', 'faiss_cpu'):
+    try:
+        pkg_resources.get_distribution(_dist)
+        have_faiss = True
+        break
+    except pkg_resources.DistributionNotFound:
+        pass
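+
+# Example check from a shell (a sketch, assuming you run from the h2ogpt repo root):
+#   python -c 'from utils import have_faiss; print(have_faiss)'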