diff --git a/FAQ.md b/FAQ.md
index a5dd97b91..8ab8aafc8 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -167,8 +167,23 @@ In case you get peer to peer related errors on non-homogeneous GPU systems, set
 export NCCL_P2P_LEVEL=LOC
 ```
 
+### Other models
+
+One can choose any Hugging Face model by just passing the name after `--base_model=`, but a `prompt_type` is also required if we don't already support the model.
+E.g. for Vicuna models, a typical prompt type is used, and we already apply it automatically for specific models,
+but if you pass `--prompt_type=instruct_vicuna` with any other Vicuna model, we'll use it, assuming that is the correct prompt type.
+See [prompter](prompter.py) for the models that are currently supported in this automatic way; the same dictionary shows which prompt types are supported.
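+
+For example, a minimal invocation (the model name below is hypothetical, purely to illustrate the flags):
+```bash
+# hypothetical Vicuna-style model, used only to show --prompt_type
+python generate.py --base_model=some-org/some-vicuna-13b --prompt_type=instruct_vicuna
+```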
+
 ### Offline Mode:
 
+Note, when running `generate.py` and asking your first question, it will download the model(s), which for the 6.9B model takes about 15 minutes per 3 PyTorch bin files if you have a 10MB/s download speed.
+
 1) Download model and tokenizer of choice
 
 ```python
@@ -223,92 +238,23 @@ templates/frontend/share.html
 HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 python generate.py --base_model='h2oai/h2ogpt-oasst1-512-12b'
 ```
 
-### LangChain Usage:
+### Isolated LangChain Usage:
 
 See [tests/test_langchain_simple.py](tests/test_langchain_simple.py)
 
+### ValueError: ...offload....
 
-### MACOS
-
-* Install [Rust](https://www.geeksforgeeks.org/how-to-install-rust-in-macos/)
-```bash
-curl –proto ‘=https’ –tlsv1.2 -sSf https://sh.rustup.rs | sh
 ```
-Enter new shell and test: `rustc --version`
-
-* Mac Running Intel
-When running a Mac with Intel hardware (not M1), you may run into _clang: error: the clang compiler does not support '-march=native'_ during pip install.
-If so set your archflags during pip install. eg: _ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt_
-
-### C++ Compiler
-If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++ compiler on your computer.
-
-### For Windows 10/11
-To install a C++ compiler on Windows 10/11, follow these steps:
-
-1. Install Visual Studio 2022.
-2. Make sure the following components are selected:
-   * Universal Windows Platform development
-   * C++ CMake tools for Windows
-3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
-4. Run the installer and select the `gcc` component.
-
-### ENV installation
-
-* Install, e.g. for MACOS: [Miniconda](https://docs.conda.io/en/latest/miniconda.html#macos-installers)
-
-* Enter new shell and should also see `(base)` in prompt
-
-* Create new env:
-```bash
-conda create -n h2ogpt -y
-conda activate h2ogpt
-conda install -y mamba -c conda-forge  # for speed
-mamba install python=3.10 -c conda-forge -y
-```
-Should see `(h2ogpt)` in shell prompt.
-
-* Test python:
-```bash
-python --version
-```
-should say 3.10.xx
-```bash
-python -c 'import os, sys ; print("hello world")'
-```
-should print `hello world`.
-
-* Clone and pip install as usual:
-```
-bash
-git clone https://github.com/h2oai/h2ogpt.git
-cd h2ogpt
-pip install -r requirements.txt
-```
-
-* For non-cuda support, edit requirements_optional_langchain.txt and switch to `faiss_cpu`.
-
-* Install langchain dependencies if want to use langchain:
-```bash
-pip install -r requirements_optional_langchain.txt
-```
-and fill `user_path` path with documents to be scanned recursively.
-
-* Run:
-```bash
-python generate.py --load_8bit=True --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b --langchain_mode=MyData --user_path=user_path --score_model=None
-```
-It will download the model, which takes about 15 minutes per 3 pytorch bin files if have 10MB/s download.
-One can choose any huggingface model, just pass the name after `--base_model=`, but a prompt_type is required if we don't already have support.
-E.g. for vicuna models, a typical prompt_type is used and we support that already automatically for specific models,
-but if you pass `--prompt_type=instruct_vicuna` with any other vicuna model, we'll use it assuming that is the correct prompt type.
-See models that are currently supported in this automatic way, and the same dictionary shows which prompt types are supported: [prompter](prompter.py).
-
-* Potential Errors:
-```
-ValueError: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers
+The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers
 the weights in this format.
 ```
 
+If you see this error, then you either have insufficient GPU memory or insufficient CPU memory. E.g. for the 6.9B model, one needs a minimum of 27GB free memory.
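+
+One mitigation worth trying (a sketch, not a guaranteed fix) is 8-bit loading, which reduces GPU memory use relative to 16-bit:
+```bash
+# same 6.9B model, loaded in 8-bit to lower GPU memory use
+python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b --load_8bit=True
+```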
+
 ### Larger models require more GPU memory
 
diff --git a/INSTALL.md b/INSTALL.md
index 2b5c94c40..52326498f 100644
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -1,11 +1,10 @@
-## h2oGPT Installation
+## h2oGPT Installation Help
 
 Follow these instructions to get a working Python environment on a Linux system.
 
-### Native Installation for Training/Fine-Tuning of h2oGPT on Linux GPU Servers
-
-#### Install Python environment
+### Install Python environment
 
+For Ubuntu, use the Linux-x86_64 installer as below; for MACOS, use [Miniconda](https://docs.conda.io/en/latest/miniconda.html#macos-installers).
 ```bash
 wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
 bash ./Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
@@ -17,19 +16,33 @@ conda install mamba -n base -c conda-forge
 conda install python=3.10 -y
 conda update -n base -c defaults conda
 ```
-
-#### Install Python packages
-
+Enter a new shell; you should see `(base)` in the prompt. Then create a new env:
+```bash
+conda create -n h2ogpt -y
+conda activate h2ogpt
+conda install -y mamba -c conda-forge  # for speed
+mamba install python=3.10 -c conda-forge -y
+```
+You should see `(h2ogpt)` in the shell prompt. Test your Python:
+```bash
+python --version
+```
+should say 3.10.xx, and:
+```bash
+python -c 'import os, sys ; print("hello world")'
+```
+should print `hello world`. Then clone:
 ```bash
 git clone https://github.com/h2oai/h2ogpt.git
 cd h2ogpt
-pip install -r requirements.txt
 ```
+Then go back to the [README](README.md) for package installation and use of `generate.py`.
 
-#### Install CUDA 12.1 [install cuda coolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local)
+### Installing CUDA Toolkit
 
-E.g. for Ubuntu 20.04, select Ubuntu, Version 20.04, Installer Type "deb (local)", and you should get the following commands:
+E.g. for CUDA 12.1, [install the CUDA toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local).
+E.g. for Ubuntu 20.04, select Ubuntu, Version 20.04, Installer Type "deb (local)", and you should get the following commands:
 ```bash
 wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
 sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
@@ -55,11 +68,10 @@ Then reboot the machine, to get everything sync'ed up on restart.
 sudo reboot
 ```
 
-#### Compile bitsandbytes for fast 8-bit training [BitsandBytes Source](https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md)
-
-This is only required if have different CUDA than built into bitsandbytes pypi package,
-which includes CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 12.0, 12.1. Here we compile for 12.1.
+### Compile bitsandbytes
 
+For fast 4-bit and 8-bit training, one needs bitsandbytes. [Compiling bitsandbytes](https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md) is only required if you have a different CUDA version than those built into the bitsandbytes pypi package,
+which includes CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 12.0, and 12.1. Here we compile for 12.1 as an example.
 ```bash
 git clone http://github.com/TimDettmers/bitsandbytes.git
 cd bitsandbytes
@@ -69,7 +81,7 @@ CUDA_VERSION=121 python setup.py install
 cd ..
 ```
 
-#### Install nvidia GPU manager if have multiple A100/H100s.
+### Install NVIDIA GPU manager if you have multiple A100/H100s
 ```bash
 sudo apt-key del 7fa2af80
 distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
@@ -83,7 +95,7 @@ dcgmi discovery -l
 ```
 See [GPU Manager](https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/getting-started.html)
 
-#### Install and run Fabric Manager if have multiple A100/100s
+### Install and run Fabric Manager if you have multiple A100/H100s
 
 ```bash
 sudo apt-get install cuda-drivers-fabricmanager
@@ -120,7 +132,5 @@ Then No for symlink change, say continue (not abort), accept license, keep only
 
 If cuda 11.7 is not your base installation, then when doing pip install -r requirements.txt do instead:
 ```bash
-CUDA_HOME=/usr/local/cuda-11.7 pip install -r requirements_optional.txt
+CUDA_HOME=/usr/local/cuda-11.7 pip install -r requirements_optional_flashattention.txt
 ```
-
-Now you're ready to go back to [data prep and fine-tuning](FINETUNE.md)!
diff --git a/README.md b/README.md
index 4ceb6e249..21ccfea11 100644
--- a/README.md
+++ b/README.md
@@ -99,6 +99,18 @@ python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b --load_8bit=Tr
 ```
 For more ways to ingest on CLI and control see [LangChain Readme](README_LangChain.md).
 
+For 4-bit support, the latest dev versions of transformers, accelerate, and peft are required; they can be installed by running:
+```bash
+pip uninstall peft transformers accelerate -y
+pip install -r requirements_optional_4bit.txt
+```
+where the uninstall is required in case, e.g., peft was previously installed from GitHub. Then, when running generate, pass `--load_4bit=True`, as in the example below.
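+
+For example, a 4-bit run might look like the following (a sketch; the model is the same 6.9B one used above):
+```bash
+python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-512-6_9b --load_4bit=True
+```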
+
 Any other instruct-tuned base models can be used, including non-h2oGPT ones.  [Larger models require more GPU memory](FAQ.md#larger-models-require-more-gpu-memory).
 
 #### CPU
@@ -136,6 +148,30 @@ For no langchain support (still uses LangChain package as model wrapper), run as
 python generate.py --base_model=gptj --score_model=None
 ```
 
+### MACOS
+
+All instructions are the same as for GPU or CPU installation, except first install [Rust](https://www.geeksforgeeks.org/how-to-install-rust-in-macos/):
+```bash
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+```
+Enter a new shell and test: `rustc --version`
+
+When running a Mac with Intel hardware (not M1), you may run into `clang: error: the clang compiler does not support '-march=native'` during pip install.
+If so, set your archflags during pip install, e.g.: `ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt`
+
+If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++ compiler on your computer.
+
+### For Windows 10/11
+
+All instructions are the same as for GPU or CPU installation, except you also need a C++ compiler, which you can set up as follows:
+
+1. Install Visual Studio 2022.
+2. Make sure the following components are selected:
+   * Universal Windows Platform development
+   * C++ CMake tools for Windows
+3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
+4. Run the installer and select the `gcc` component.
+
 ### CLI chat
 
 The CLI can be used instead of gradio by running for some base model, e.g.:
diff --git a/README_LangChain.md b/README_LangChain.md
index 9f0a3dc28..533113211 100644
--- a/README_LangChain.md
+++ b/README_LangChain.md
@@ -22,6 +22,20 @@ python generate.py --base_model=h2oai/h2ogpt-oasst1-512-12b --load_8bit=True --l
 ```
 See below for additional instructions to add support for some file types.
 
+To support a GPU FAISS database, run:
+```bash
+grep -v '#\|peft' requirements.txt > req_constraints.txt
+pip install -r requirements_optional_faiss.txt -c req_constraints.txt
+```
+or, if you have no GPUs, you can still use FAISS: comment out the faiss-gpu line in `requirements_optional_faiss.txt` and uncomment the faiss-cpu line, as sketched below.
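+
+For example, a minimal sketch of that swap (assuming the default contents of `requirements_optional_faiss.txt`, where the pinned faiss-gpu line is active):
+```bash
+# uncomment faiss-cpu, then comment out the faiss-gpu pin
+sed -i 's/^#faiss-cpu/faiss-cpu/; s/^faiss-gpu/#faiss-gpu/' requirements_optional_faiss.txt
+pip install -r requirements_optional_faiss.txt -c req_constraints.txt
+```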
+
 ## Supported Datatypes
 
 Open-source data types are supported, .msg is not supported due to GPL-3 requirement.  Other meta types support other types inside them.  Special support for some behaviors is provided by the UI itself.
diff --git a/gpt_langchain.py b/gpt_langchain.py
index d49b64bc8..272ceee3e 100644
--- a/gpt_langchain.py
+++ b/gpt_langchain.py
@@ -37,7 +37,6 @@ EverNoteLoader, UnstructuredEmailLoader, UnstructuredODTLoader, UnstructuredPowerPointLoader, \
     UnstructuredEPubLoader, UnstructuredImageLoader, UnstructuredRTFLoader, ArxivLoader
 from langchain.text_splitter import RecursiveCharacterTextSplitter
-from langchain.vectorstores import FAISS
 from langchain.chains.question_answering import load_qa_chain
 from langchain.docstore.document import Document
 from langchain import PromptTemplate
@@ -53,6 +52,7 @@ def get_db(sources, use_openai_embedding=False, db_type='faiss', persist_directo
     # Create vector database
     if db_type == 'faiss':
+        from langchain.vectorstores import FAISS  # lazy import, so FAISS stays an optional dependency
         db = FAISS.from_documents(sources, embedding)
     elif db_type == 'chroma':
         collection_name = langchain_mode.replace(' ', '_')
diff --git a/requirements_optional_4bit.txt b/requirements_optional_4bit.txt
new file mode 100644
index 000000000..240ee3c4a
--- /dev/null
+++ b/requirements_optional_4bit.txt
@@ -0,0 +1,6 @@
+# dev versions required for now for 4-bit training
+git+https://github.com/huggingface/accelerate.git@0226f750257b3bf2cadc4f189f9eef0c764a0467
+git+https://github.com/huggingface/peft.git@3714aa2fff158fdfa637b2b65952580801d890b2
+git+https://github.com/huggingface/transformers.git@f67dac97bdc63874f2288546b3fa87e69d2ea1c8
+# optional:
+#xformers==0.0.20
diff --git a/requirements_optional_faiss.txt b/requirements_optional_faiss.txt
new file mode 100644
index 000000000..2d6db8df2
--- /dev/null
+++ b/requirements_optional_faiss.txt
@@ -0,0 +1,3 @@
+# choose one:
+#faiss-cpu
+faiss-gpu==1.7.2
diff --git a/requirements_optional.txt b/requirements_optional_flashattention.txt
similarity index 100%
rename from requirements_optional.txt
rename to requirements_optional_flashattention.txt
diff --git a/requirements_optional_langchain.txt b/requirements_optional_langchain.txt
index 9a460828d..3eb75ca6e 100644
--- a/requirements_optional_langchain.txt
+++ b/requirements_optional_langchain.txt
@@ -7,9 +7,6 @@ pypdf==3.8.1
 tiktoken==0.3.3
 # avoid textract, requires old six
 #textract==1.6.5
-# choose:
-#faiss-cpu
-faiss-gpu==1.7.2
 
 # for HF embeddings
 sentence_transformers==2.2.2
diff --git a/requirements_optional_training.txt b/requirements_optional_training.txt
index eb37c6194..95e303b61 100644
--- a/requirements_optional_training.txt
+++ b/requirements_optional_training.txt
@@ -1,5 +1 @@
 xformers==0.0.20
-# dev required for now for 4-bit training
-git+https://github.com/huggingface/accelerate.git@0226f750257b3bf2cadc4f189f9eef0c764a0467
-git+https://github.com/huggingface/peft.git@3714aa2fff158fdfa637b2b65952580801d890b2
-git+https://github.com/huggingface/transformers.git@f67dac97bdc63874f2288546b3fa87e69d2ea1c8
diff --git a/tests/test_langchain_units.py b/tests/test_langchain_units.py
index 131ef406a..dd1cec937 100644
--- a/tests/test_langchain_units.py
+++ b/tests/test_langchain_units.py
@@ -4,7 +4,7 @@ import pytest
 
 from tests.utils import wrap_test_forked
-from utils import zip_data, download_simple, get_ngpus_vis, get_mem_gpus
+from utils import zip_data, download_simple, get_ngpus_vis, get_mem_gpus, have_faiss
 
 have_openai_key = os.environ.get('OPENAI_API_KEY') is not None
 
@@ -136,6 +136,7 @@ def test_qa_daidocs_db_chunk_hf():
     check_ret(ret)
 
 
+@pytest.mark.skipif(not have_faiss, reason="requires FAISS")
 @wrap_test_forked
 def test_qa_daidocs_db_chunk_hf_faiss():
     from gpt_langchain import _run_qa_db
diff --git a/utils.py b/utils.py
index e1c7add6c..181d08afe 100644
--- a/utils.py
+++ b/utils.py
@@ -800,3 +800,19 @@ def get_kwargs(func, exclude_names=None, **kwargs):
     assert not missing_kwargs, "Missing %s" % missing_kwargs
     kwargs = {k: v for k, v in kwargs.items() if k in func_names}
     return kwargs
+
+
+import pkg_resources
+
+# True if any FAISS flavor (plain, GPU, or CPU build) is installed
+have_faiss = False
+for _dist in ('faiss', 'faiss_gpu', 'faiss_cpu'):
+    try:
+        pkg_resources.get_distribution(_dist)
+        have_faiss = True
+        break
+    except pkg_resources.DistributionNotFound:
+        pass
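+
+# Example check from a shell (a sketch, assuming you run from the h2ogpt repo root):
+#   python -c 'from utils import have_faiss; print(have_faiss)'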