Merge pull request #34 from ai-nikolai/fix/aarch64_mac
Installation Fixes - arm-based Mac (M1, M2, M3); faiss-cpu; Werkzeug (Flask), spacy
ysymyth authored Aug 28, 2024
2 parents a557a20 + d149d0d commit 17dbef5
Showing 5 changed files with 183 additions and 1 deletion.
3 changes: 3 additions & 0 deletions .gitignore
@@ -13,3 +13,6 @@ search_engine/indexes*
search_engine/resources*
transfer/flagged
user_session_logs/


*_err_*.log
82 changes: 82 additions & 0 deletions README_INSTALL_ARM-MAC.md
@@ -0,0 +1,82 @@
# Getting started on arm-based Mac

This README supports installation on arm-based Macs (M1, M2, M3).


## 🚀 Setup on arm-based Mac
Our code is implemented in Python. To set up, do the following:
1. Install [Python 3.8.13](https://www.python.org/downloads/release/python-3813/)
2. Install [Java](https://www.java.com/en/download/)
3. Download the source code:
```sh
> git clone https:/princeton-nlp/webshop.git webshop
```
4. Create a virtual environment using [Anaconda](https://anaconda.org/anaconda/python) and activate it
```sh
> conda create -n webshop python=3.8.13
> conda activate webshop
```
5. Install requirements into the `webshop` virtual environment via the `setup_arm.sh` script
```sh
> ./setup_arm.sh [-d small|all]
```
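
Which script to run depends on the machine. The following is a minimal sketch of that choice, assuming Python's standard `platform` module reports `Darwin` and `arm64` on an arm-based Mac. `pick_setup_script` is a hypothetical helper for illustration and is not part of this repository.

```python
import platform


def pick_setup_script(system=None, machine=None):
    """Return the setup script name for the given (or current) platform.

    Hypothetical helper: the repository does not ship this function.
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    # arm-based Macs report system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "setup_arm.sh"
    return "setup.sh"


if __name__ == "__main__":
    print(pick_setup_script())
```

Note that an x86_64 build of Python running under Rosetta may report `x86_64` even on an arm-based Mac, so treat the result as a hint rather than a guarantee.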




## Default Installation (Failures):
1. `pip3 install -r requirements.txt`

Fails at:
- tokenizers
- nmslib
- lightgbm
- transformers==4.19.2
- PyYAML==6.0.0

Also fails because the wrong versions get installed:
- Werkzeug==2.2.2 (Flask needs this version; it does not work with 3.0.0 [https://stackoverflow.com/questions/77213053/why-did-flask-start-failing-with-importerror-cannot-import-name-url-quote-fr])
- numpy==1.24.4 (needed instead of numpy 1.22 [https://stackoverflow.com/questions/33859531/runtimeerror-module-compiled-against-api-version-a-but-this-version-of-numpy-is])
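
To see whether an environment is affected by the version problems above, the installed versions can be compared with the expected ones. This is a minimal sketch using the standard-library `importlib.metadata` (available from Python 3.8). `EXPECTED` and `find_mismatches` are hypothetical names chosen for illustration.

```python
from importlib import metadata

# Versions taken from the list above; adjust them if the pins change.
EXPECTED = {"Werkzeug": "2.2.2", "numpy": "1.24.4"}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (wanted, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means both packages are installed at the expected versions.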

**tokenizers fix**:
`pip3 install tokenizers`

**nmslib fix**:
`pip3 install Cython`
`CFLAGS="-mavx -DWARN(a)=(a)" pip3 install nmslib`
[https:/nmslib/nmslib/issues/476]

**lightgbm fix**:
[https:/microsoft/LightGBM/issues/5328]
`brew install libomp`
`pip3 install lightgbm`


**transformers fix**:
`pip3 install transformers==4.23.1` works

**PyYAML fix**:
`pip3 install PyYAML==6.0.1` works


## Running setup.sh
Fails at:
- `python -m spacy download en_core_web_lg`

**Spacy Fix**:
Interestingly,

- `pip install -U 'spacy[apple]'`

does **NOT** work.

Installing spacy with conda first (`conda install spacy`) does work.

Remove `spacy==3.3.0` from `requirements.txt`.
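
Once spacy is installed, code that needs a model can fall back to a smaller one when the large model is not available. This is a minimal sketch, assuming that `spacy.load` raises `OSError` for a model that is not installed. `load_first_available` is a hypothetical helper and is not part of this repository.

```python
def load_first_available(names, loader=None):
    """Try each model name in order and return (name, loaded_model).

    `loader` defaults to spacy.load; it is a parameter so the function
    can be exercised without spacy installed.
    """
    if loader is None:
        import spacy  # imported lazily; requires spacy to be installed
        loader = spacy.load
    for name in names:
        try:
            return name, loader(name)
        except OSError:
            # Assumption: a missing model package surfaces as OSError.
            continue
    raise OSError(f"none of these spacy models are installed: {list(names)}")
```

For example, `name, nlp = load_first_available(["en_core_web_lg", "en_core_web_sm"])` would prefer the large model and use the small one only if the large one cannot be loaded.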




<!-- env_name:env_conda_webshop -->
23 changes: 23 additions & 0 deletions requirements_arm.txt
@@ -0,0 +1,23 @@
beautifulsoup4==4.11.1
cleantext==1.1.4
env==0.1.0
Flask==2.1.2
gdown
gradio
gym==0.24.0
numpy==1.24.4
pandas==1.4.2
pyserini==0.17.0
pytest
PyYAML==6.0.1
rank_bm25==0.2.2
requests==2.27.1
requests_mock
rich==12.4.4
scikit_learn==1.1.1
selenium==4.2.0
thefuzz==0.19.0
torch==1.11.0
tqdm==4.64.0
train==0.0.5
transformers==4.23.1
74 changes: 74 additions & 0 deletions setup_arm.sh
@@ -0,0 +1,74 @@
#!/bin/bash

# Displays information on how to use script
helpFunction()
{
echo "Usage: $0 [-d small|all]"
echo -e "\t-d small|all - Specify whether to download entire dataset (all) or just 1000 (small)"
exit 1 # Exit script after printing help
}

# Get values of command line flags
while getopts d: flag
do
case "${flag}" in
d) data=${OPTARG};;
esac
done

if [ -z "$data" ]; then
echo "[ERROR]: Missing -d flag"
helpFunction
fi

# Install Python Dependencies
conda install spacy
pip install -r requirements_arm.txt;

# Install Environment Dependencies via `conda`
conda install -c pytorch faiss-cpu;
conda install -c conda-forge openjdk=11;

# Download dataset into `data` folder via `gdown` command
mkdir -p data;
cd data;
if [ "$data" == "small" ]; then
gdown https://drive.google.com/uc?id=1EgHdxQ_YxqIQlvvq5iKlCrkEKR6-j0Ib; # items_shuffle_1000 - product scraped info
gdown https://drive.google.com/uc?id=1IduG0xl544V_A_jv3tHXC0kyFi7PnyBu; # items_ins_v2_1000 - product attributes
elif [ "$data" == "all" ]; then
gdown https://drive.google.com/uc?id=1A2whVgOO0euk5O13n2iYDM0bQRkkRduB; # items_shuffle
gdown https://drive.google.com/uc?id=1s2j6NgHljiZzQNL3veZaAiyW_qDEgBNi; # items_ins_v2
else
echo "[ERROR]: argument for '-d' flag not recognized" # single quotes: backticks inside double quotes would trigger command substitution
helpFunction
fi
gdown https://drive.google.com/uc?id=14Kb5SPBk_jfdLZ_CDBNitW98QLDlKR5O # items_human_ins
cd ..

# Download spaCy small NLP model
python -m spacy download en_core_web_sm

# Build search engine index
cd search_engine
mkdir -p resources resources_100 resources_1k resources_100k
python convert_product_file_format.py # convert items.json => required doc format
mkdir -p indexes
./run_indexing.sh
cd ..

# Create logging folder + samples of log data
get_human_trajs () {
PYCMD=$(cat <<EOF
import gdown
url="https://drive.google.com/drive/u/1/folders/16H7LZe2otq4qGnKw_Ic1dkt-o3U9Zsto"
gdown.download_folder(url, quiet=True, remaining_ok=True)
EOF
)
python -c "$PYCMD"
}
mkdir -p user_session_logs/
cd user_session_logs/
echo "Downloading 50 example human trajectories..."
get_human_trajs
echo "Downloading example trajectories complete"
cd ..
2 changes: 1 addition & 1 deletion web_agent_site/engine/goal.py
@@ -9,7 +9,7 @@
from thefuzz import fuzz
from web_agent_site.engine.normalize import normalize_color

-nlp = spacy.load("en_core_web_lg")
+nlp = spacy.load("en_core_web_sm")

PRICE_RANGE = [10.0 * i for i in range(1, 100)]
