Separate out FAISS from requirements #184

Merged · 3 commits · May 28, 2023

Changes from all commits
92 changes: 13 additions & 79 deletions FAQ.md
@@ -167,8 +167,17 @@ In case you get peer-to-peer related errors on non-homogeneous GPU systems, set:
```bash
export NCCL_P2P_LEVEL=LOC
```

### Other models

One can choose any Hugging Face model: just pass the name after `--base_model=`, but a `prompt_type` is required if we don't already support the model.
E.g. for Vicuna models a typical prompt type is used, and we already support that automatically for specific models,
but if you pass `--prompt_type=instruct_vicuna` with any other Vicuna model, we will use it, assuming that is the correct prompt type.
The models currently supported in this automatic way are listed in [prompter](prompter.py); the same dictionary shows which prompt types are supported.
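For example, to force that prompt type for a Vicuna variant that is not auto-detected (the model name here is hypothetical):
```bash
python generate.py --base_model=someuser/vicuna-13b-custom --prompt_type=instruct_vicuna
```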

### Offline Mode:

Note, when running `generate.py` and asking your first question, it will download the model(s), which for the 6.9B model takes about 15 minutes for its 3 PyTorch bin files at a 10MB/s download speed.

1) Download model and tokenizer of choice

```python
...
```

@@ -223,92 +232,17 @@ templates/frontend/share.html

```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 python generate.py --base_model='h2oai/h2ogpt-oasst1-512-12b'
```
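The step 1 code is collapsed in this diff view; a minimal sketch using the standard transformers API (model name as used above; the actual collapsed code may differ):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'h2oai/h2ogpt-oasst1-512-12b'
# Downloads weights and tokenizer files into the local HF cache,
# so later runs work with HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```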

### Isolated LangChain Usage:

See [tests/test_langchain_simple.py](tests/test_langchain_simple.py)

### ValueError: ...offload....

### MACOS

* Install [Rust](https://www.geeksforgeeks.org/how-to-install-rust-in-macos/)
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
Enter new shell and test: `rustc --version`

* Mac Running Intel
When running a Mac with Intel hardware (not M1), you may run into `clang: error: the clang compiler does not support '-march=native'` during pip install.
If so, set your archflags during pip install, e.g.: `ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt`

### C++ Compiler
If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++ compiler on your computer.

### For Windows 10/11
To install a C++ compiler on Windows 10/11, follow these steps:

1. Install Visual Studio 2022.
2. Make sure the following components are selected:
* Universal Windows Platform development
* C++ CMake tools for Windows
3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
4. Run the installer and select the `gcc` component.

### ENV installation

* Install, e.g. for MACOS: [Miniconda](https://docs.conda.io/en/latest/miniconda.html#macos-installers)

* Enter a new shell and you should see `(base)` in the prompt

* Create new env:
```bash
conda create -n h2ogpt -y
conda activate h2ogpt
conda install -y mamba -c conda-forge # for speed
mamba install python=3.10 -c conda-forge -y
```
Should see `(h2ogpt)` in shell prompt.

* Test python:
```bash
python --version
```
should say 3.10.xx
```bash
python -c 'import os, sys ; print("hello world")'
```
should print `hello world`.

* Clone and pip install as usual:
```bash
git clone https://github.com/h2oai/h2ogpt.git
cd h2ogpt
pip install -r requirements.txt
```

* For non-CUDA support, edit requirements_optional_langchain.txt and switch to `faiss-cpu`.

* Install LangChain dependencies if you want to use LangChain:
```bash
pip install -r requirements_optional_langchain.txt
```
and fill the `user_path` directory with documents to be scanned recursively, e.g. as in the sketch below.
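For example (paths purely illustrative):
```bash
mkdir -p user_path
cp ~/Documents/manual.pdf ~/Documents/notes.txt user_path/
```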

* Run:
```bash
python generate.py --load_8bit=True --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b --langchain_mode=MyData --user_path=user_path --score_model=None
```

* Potential Errors:
```
ValueError: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers the weights in this format.
```

If you see this error, then you either have insufficient GPU memory or insufficient CPU memory. E.g. the 6.9B model needs a minimum of about 27GB free memory (roughly 6.9B parameters × 4 bytes for fp32 weights).
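A minimal sketch of the suggested fix at the transformers level (assuming the standard `from_pretrained` API; the folder name is arbitrary):
```python
from transformers import AutoModelForCausalLM

# Give accelerate a place on disk for weights that don't fit in GPU/CPU memory
model = AutoModelForCausalLM.from_pretrained(
    'h2oai/h2ogpt-oig-oasst1-512-6_9b',
    device_map='auto',
    offload_folder='offload',
)
```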

### Larger models require more GPU memory
48 changes: 29 additions & 19 deletions INSTALL.md
@@ -1,11 +1,10 @@
## h2oGPT Installation Help

Follow these instructions to get a working Python environment on a Linux system.

### Native Installation for Training/Fine-Tuning of h2oGPT on Linux GPU Servers

### Install Python environment

For Ubuntu, use Linux-x86_64 as below; for macOS, use [Miniconda](https://docs.conda.io/en/latest/miniconda.html#macos-installers).
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
bash ./Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
```

@@ -17,19 +16,33 @@

```bash
conda install mamba -n base -c conda-forge
conda install python=3.10 -y
conda update -n base -c defaults conda
```

#### Install Python packages

Enter a new shell and you should see `(base)` in the prompt. Then create a new env:
```bash
conda create -n h2ogpt -y
conda activate h2ogpt
conda install -y mamba -c conda-forge # for speed
mamba install python=3.10 -c conda-forge -y
```
You should see `(h2ogpt)` in the shell prompt. Test your python:
```bash
python --version
```
should say 3.10.xx and:
```bash
python -c 'import os, sys ; print("hello world")'
```
should print `hello world`. Then clone:
```bash
git clone https://github.com/h2oai/h2ogpt.git
cd h2ogpt
pip install -r requirements.txt
```
Then go back to [README](README.md) for package installation and use of `generate.py`.

### Installing CUDA Toolkit

E.g. CUDA 12.1: [install CUDA toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local)

E.g. for Ubuntu 20.04, select Ubuntu, Version 20.04, Installer Type "deb (local)", and you should get the following commands:
```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
```

@@ -55,11 +68,10 @@

Then reboot the machine, to get everything synced up on restart:
```bash
sudo reboot
```

### Compile bitsandbytes

For fast 4-bit and 8-bit training, one needs bitsandbytes. [Compiling bitsandbytes](https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md) is only required if you have a different CUDA version than those built into the bitsandbytes PyPI package, which includes CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 12.0, and 12.1. Here we compile for 12.1 as an example:
```bash
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
```

@@ -69,7 +81,7 @@

```bash
CUDA_VERSION=121 python setup.py install
cd ..
```

### Install NVIDIA GPU manager if you have multiple A100/H100s
```bash
sudo apt-key del 7fa2af80
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
```

@@ -83,7 +95,7 @@

```bash
dcgmi discovery -l
```
See [GPU Manager](https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/getting-started.html)

### Install and run Fabric Manager if you have multiple A100/H100s
### Install and run Fabric Manager if have multiple A100/100s

```bash
sudo apt-get install cuda-drivers-fabricmanager
```

@@ -120,7 +132,5 @@ Then No for symlink change, say continue (not abort), accept license, keep only

If CUDA 11.7 is not your base installation, then when doing `pip install -r requirements.txt`, do instead:
```bash
CUDA_HOME=/usr/local/cuda-11.7 pip install -r requirements_optional.txt
CUDA_HOME=/usr/local/cuda-11.7 pip install -r requirements_optional_flashattention.txt
```

Now you're ready to go back to [data prep and fine-tuning](FINETUNE.md)!
31 changes: 31 additions & 0 deletions README.md
@@ -99,6 +99,13 @@

```bash
python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b --load_8bit=True
```
For more ways to ingest on CLI and control see [LangChain Readme](README_LangChain.md).

For 4-bit support, the latest dev versions of transformers, accelerate, and peft are required, which can be installed by running:
```bash
pip uninstall peft transformers accelerate -y
pip install -r requirements_optional_4bit.txt
```
where the uninstall is required in case, e.g., peft was previously installed from GitHub. Then, when running generate, pass `--load_4bit=True`, e.g.:
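A usage sketch, with the same base model as in the quick start above:
```bash
python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b --load_4bit=True
```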

Any other instruct-tuned base models can be used, including non-h2oGPT ones. [Larger models require more GPU memory](FAQ.md#larger-models-require-more-gpu-memory).

#### CPU
@@ -136,6 +143,30 @@

For no langchain support (still uses LangChain package as model wrapper), run as:
```bash
python generate.py --base_model=gptj --score_model=None
```

### MACOS

All instructions are the same as for GPU or CPU installation, except first install [Rust](https://www.geeksforgeeks.org/how-to-install-rust-in-macos/):
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
Enter new shell and test: `rustc --version`

When running a Mac with Intel hardware (not M1), you may run into `clang: error: the clang compiler does not support '-march=native'` during pip install.
If so, set your archflags during pip install, e.g.: `ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt`

If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++ compiler on your computer.

#### For Windows 10/11

All instructions are the same as for GPU or CPU installation, except you also need a C++ compiler, installed as follows:

1. Install Visual Studio 2022.
2. Make sure the following components are selected:
* Universal Windows Platform development
* C++ CMake tools for Windows
3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
4. Run the installer and select the `gcc` component.

### CLI chat

The CLI can be used instead of gradio by running with some base model, e.g.:
7 changes: 7 additions & 0 deletions README_LangChain.md
@@ -22,6 +22,13 @@ python generate.py --base_model=h2oai/h2ogpt-oasst1-512-12b --load_8bit=True --l…
See below for additional instructions to add support for some file types.

To support GPU FAISS database, run:
```bash
grep -v '#\|peft' requirements.txt > req_constraints.txt
pip install -r requirements_optional_faiss.txt -c req_constraints.txt
```
or if you have no GPUs, you can still use FAISS: comment out the `faiss-gpu` line and uncomment the `faiss-cpu` line in `requirements_optional_faiss.txt` before installing, e.g. as sketched below.
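For example, a sketch of the CPU switch (assumes GNU `sed`; editing the file by hand works just as well):
```bash
# Comment out faiss-gpu and uncomment faiss-cpu in requirements_optional_faiss.txt
sed -i 's/^faiss-gpu/#faiss-gpu/; s/^#faiss-cpu/faiss-cpu/' requirements_optional_faiss.txt
pip install -r requirements_optional_faiss.txt -c req_constraints.txt
```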

## Supported Datatypes

Open-source data types are supported; .msg is not supported due to its GPL-3 requirement. Other meta types support other types inside them. Special support for some behaviors is provided by the UI itself.
2 changes: 1 addition & 1 deletion gpt_langchain.py
@@ -37,7 +37,6 @@
EverNoteLoader, UnstructuredEmailLoader, UnstructuredODTLoader, UnstructuredPowerPointLoader, \
UnstructuredEPubLoader, UnstructuredImageLoader, UnstructuredRTFLoader, ArxivLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.question_answering import load_qa_chain
from langchain.docstore.document import Document
from langchain import PromptTemplate
@@ -53,6 +52,7 @@ def get_db(sources, use_openai_embedding=False, db_type='faiss', persist_directory…

# Create vector database
if db_type == 'faiss':
from langchain.vectorstores import FAISS
db = FAISS.from_documents(sources, embedding)
elif db_type == 'chroma':
collection_name = langchain_mode.replace(' ', '_')
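The effect of this change is a lazy import: FAISS is only imported when `db_type == 'faiss'`, so neither faiss-gpu nor faiss-cpu is required unless actually used. A simplified standalone sketch of the pattern (names abbreviated from the real `get_db`):
```python
def get_vector_db(sources, embedding, db_type='faiss'):
    if db_type == 'faiss':
        # Imported here, not at module level, so FAISS stays an optional dependency
        from langchain.vectorstores import FAISS
        return FAISS.from_documents(sources, embedding)
    raise ValueError(f'unsupported db_type: {db_type}')
```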
6 changes: 6 additions & 0 deletions requirements_optional_4bit.txt
@@ -0,0 +1,6 @@
# dev required for now for 4-bit training
git+https://github.com/huggingface/accelerate.git@0226f750257b3bf2cadc4f189f9eef0c764a0467
git+https://github.com/huggingface/peft.git@3714aa2fff158fdfa637b2b65952580801d890b2
git+https://github.com/huggingface/transformers.git@f67dac97bdc63874f2288546b3fa87e69d2ea1c8
#optional:
#xformers==0.0.20
3 changes: 3 additions & 0 deletions requirements_optional_faiss.txt
@@ -0,0 +1,3 @@
# choose:
#faiss-cpu
faiss-gpu==1.7.2
File renamed without changes.
3 changes: 0 additions & 3 deletions requirements_optional_langchain.txt
@@ -7,9 +7,6 @@ pypdf==3.8.1
tiktoken==0.3.3
# avoid textract, requires old six
#textract==1.6.5
# choose:
#faiss-cpu
faiss-gpu==1.7.2

# for HF embeddings
sentence_transformers==2.2.2
4 changes: 0 additions & 4 deletions requirements_optional_training.txt
@@ -1,5 +1 @@
xformers==0.0.20
# dev required for now for 4-bit training
git+https://github.com/huggingface/accelerate.git@0226f750257b3bf2cadc4f189f9eef0c764a0467
git+https://github.com/huggingface/peft.git@3714aa2fff158fdfa637b2b65952580801d890b2
git+https://github.com/huggingface/transformers.git@f67dac97bdc63874f2288546b3fa87e69d2ea1c8
3 changes: 2 additions & 1 deletion tests/test_langchain_units.py
@@ -4,7 +4,7 @@

import pytest
from tests.utils import wrap_test_forked
from utils import zip_data, download_simple, get_ngpus_vis, get_mem_gpus, have_faiss

have_openai_key = os.environ.get('OPENAI_API_KEY') is not None

@@ -136,6 +136,7 @@ def test_qa_daidocs_db_chunk_hf():
check_ret(ret)


@pytest.mark.skipif(not have_faiss, reason="requires FAISS")
@wrap_test_forked
def test_qa_daidocs_db_chunk_hf_faiss():
from gpt_langchain import _run_qa_db
20 changes: 20 additions & 0 deletions utils.py
@@ -800,3 +800,23 @@ def get_kwargs(func, exclude_names=None, **kwargs):
assert not missing_kwargs, "Missing %s" % missing_kwargs
kwargs = {k: v for k, v in kwargs.items() if k in func_names}
return kwargs


import pkg_resources
have_faiss = False

try:
assert pkg_resources.get_distribution('faiss') is not None
have_faiss = True
except (pkg_resources.DistributionNotFound, AssertionError):
pass
try:
assert pkg_resources.get_distribution('faiss_gpu') is not None
have_faiss = True
except (pkg_resources.DistributionNotFound, AssertionError):
pass
try:
assert pkg_resources.get_distribution('faiss_cpu') is not None
have_faiss = True
except (pkg_resources.DistributionNotFound, AssertionError):
pass
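Downstream code can then gate FAISS-only paths on this flag, as the new test skip above does. A hypothetical usage sketch:
```python
from utils import have_faiss

if have_faiss:
    from langchain.vectorstores import FAISS  # safe: some faiss distribution is installed
else:
    print('FAISS not installed; see requirements_optional_faiss.txt')
```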