## Offline Mode

Note: when running generate.py and asking your first question, it will download the model(s). For the 6.9B model, this takes about 15 minutes per 3 PyTorch bin files at a 10MB/s download speed.

If all data has already been put into ~/.cache by HF transformers, then the following steps (those related to downloading HF models) are not required.

  1. Download the model and tokenizer of choice:

    from transformers import AutoTokenizer, AutoModelForCausalLM
    model_name = 'h2oai/h2ogpt-oasst1-512-12b'
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.save_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.save_pretrained(model_name)
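
    Once saved, the files live in a local directory named after the model, so they can be loaded back with no network access. A minimal sketch of that offline load (local_files_only is a standard transformers argument; the path assumes the save_pretrained() calls above were run from the same working directory):

    # Load back from the local save directory created above, with no network access
    from transformers import AutoTokenizer, AutoModelForCausalLM
    model_name = 'h2oai/h2ogpt-oasst1-512-12b'  # doubles as the local directory path
    model = AutoModelForCausalLM.from_pretrained(model_name, local_files_only=True)
    tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)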
  2. Download the reward model, unless you pass --score_model='None' to generate.py:

    # Download and save the reward model
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    reward_model = 'OpenAssistant/reward-model-deberta-v3-large-v2'
    model = AutoModelForSequenceClassification.from_pretrained(reward_model)
    model.save_pretrained(reward_model)
    tokenizer = AutoTokenizer.from_pretrained(reward_model)
    tokenizer.save_pretrained(reward_model)
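
    As an illustration of how this reward model scores a response (a sketch based on the model card's usage pattern, not h2oGPT's internal code; the question and answer strings are made up):

    # Score a (question, answer) pair; a higher logit means a better-rated response
    import torch
    question = "Explain nuclear fusion like I am five."
    answer = "Fusion is when two small atoms stick together and release energy."
    inputs = tokenizer(question, answer, return_tensors='pt')
    with torch.no_grad():
        score = model(**inputs).logits[0].item()
    print(score)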
  3. For LangChain support, download the embedding model:

    from langchain.embeddings import HuggingFaceEmbeddings
    hf_embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
    model_kwargs = {'device': 'cpu'}
    embedding = HuggingFaceEmbeddings(model_name=hf_embedding_model, model_kwargs=model_kwargs)
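
    A quick way to confirm the embedding model is cached and working (an illustrative check, not part of h2oGPT):

    # Embed a test string; all-MiniLM-L6-v2 produces 384-dimensional vectors
    vec = embedding.embed_query("test sentence")
    print(len(vec))  # expect 384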
    1. For the HF inference server and OpenAI, this downloads the tokenizers used for the Hugging Face text generation inference server and for gpt-3.5-turbo:

    import tiktoken
    encoding = tiktoken.get_encoding("cl100k_base")
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
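
    Once cached, tiktoken works offline as well; for example (illustrative):

    # Count tokens for a prompt using the cached encoding
    n_tokens = len(encoding.encode("How many tokens am I?"))
    print(n_tokens)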
  4. Run generate.py with transformers in offline mode:

    HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 python generate.py --base_model='h2oai/h2ogpt-oasst1-512-12b' --gradio_offline_level=2 --share=False

    Code that involves uploads outside user control is always disabled: Hugging Face telemetry, Gradio telemetry, and Chroma posthog.

    The additional option --gradio_offline_level=2 switches to local fonts to avoid downloading Google Fonts. Downloading fonts is less intrusive than uploading data, but avoiding it is still required in the air-gapped case. The local fonts don't look as nice as Google Fonts, but they ensure fully offline behavior.

    If the front-end can still access the internet but the backend should not, one can use --gradio_offline_level=1 for slightly better-looking fonts.

    Note that Gradio attempts to download iframeResizer.contentWindow.min.js, but it works fine without this file, so a simple firewall block is sufficient. For more details, see AUTOMATIC1111/stable-diffusion-webui#10324.
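
    The same offline switches can also be set from Python before transformers is imported, e.g. in a custom launcher (a sketch; HF_DATASETS_OFFLINE and TRANSFORMERS_OFFLINE are the same variables used in the command above):

    # Enforce offline mode programmatically before any Hugging Face imports
    import os
    os.environ["HF_DATASETS_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    import transformers  # will now avoid network calls to the Hub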

  1. Disable access or port

    To ensure nobody can access your Gradio server, disable the port via the firewall. If that is a hassle, one can instead enable authentication by adding the following to the CLI when running python generate.py:

    --auth=[('jon','password')]
    

    with no spaces. Run python generate.py --help for more details.

  2. To fully disable Chroma telemetry, which the documented options still do not disable, run:

    sp=`python -c 'import site; print(site.getsitepackages()[0])'`
    sed -i 's/posthog\.capture/return\n            posthog.capture/' $sp/chromadb/telemetry/posthog.py

    or the equivalent for Windows/Mac. Alternatively, edit the file manually so that the capture function just returns immediately.
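
    A runtime alternative to editing site-packages is to monkey-patch the same function from Python before Chroma is used (a sketch; it assumes the chromadb.telemetry.posthog module layout implied by the sed path above):

    # Hypothetical runtime patch: neuter Chroma's posthog capture without editing files
    import chromadb.telemetry.posthog as chroma_posthog
    chroma_posthog.Posthog.capture = lambda self, *args, **kwargs: None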

  3. To avoid h2oGPT monitoring which UI elements are clicked, set the environment variable H2OGPT_ENABLE_HEAP_ANALYTICS=False or pass --enable-heap-analytics=False to generate.py. Note that no data or user inputs are included; only raw Svelte UI element IDs are collected, nothing from user inputs or data.
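
    For example, the environment-variable route can be set in a small Python wrapper before launching (a sketch using the variable named above; child processes inherit the parent's environment):

    # Disable heap analytics via the environment before launching generate.py
    import os, subprocess
    os.environ["H2OGPT_ENABLE_HEAP_ANALYTICS"] = "False"
    subprocess.run(["python", "generate.py", "--base_model=h2oai/h2ogpt-oasst1-512-12b"])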