Merge branch 'main' into DEV-1971-remove-warnings
marcobottaro authored Oct 14, 2024
2 parents 286e19e + 23f27b8 commit 2d16a4d
Showing 55 changed files with 1,325 additions and 431 deletions.
5 changes: 5 additions & 0 deletions .changeset/clever-mirrors-carry.md
@@ -0,0 +1,5 @@
---
"nextjs-website": patch
---

fix reference in anchorEl for chatbot
5 changes: 5 additions & 0 deletions .changeset/fuzzy-bags-yawn.md
@@ -0,0 +1,5 @@
---
"nextjs-website": patch
---

Remove subtitle from banner link codec
5 changes: 5 additions & 0 deletions .changeset/fuzzy-buttons-sing.md
@@ -0,0 +1,5 @@
---
"strapi-cms": minor
---

Remove subtitle attribute from bannerLink component
5 changes: 5 additions & 0 deletions .changeset/little-colts-impress.md
@@ -0,0 +1,5 @@
---
"nextjs-website": patch
---

Fix margins and text size of guides and tutorials
5 changes: 5 additions & 0 deletions .changeset/long-camels-sell.md
@@ -0,0 +1,5 @@
---
"chatbot": minor
---

"Add Presidio to detect and mask PII entities"
5 changes: 5 additions & 0 deletions .changeset/orange-beans-drop.md
@@ -0,0 +1,5 @@
---
"nextjs-website": minor
---

Update url mapping for documentation
5 changes: 5 additions & 0 deletions .changeset/rare-chefs-pump.md
@@ -0,0 +1,5 @@
---
"nextjs-website": patch
---

Fix tutorial's in page menu
5 changes: 5 additions & 0 deletions .changeset/tasty-balloons-joke.md
@@ -0,0 +1,5 @@
---
"nextjs-website": minor
---

Fix the header in the guide page
7 changes: 5 additions & 2 deletions apps/chatbot/.env.example
@@ -4,12 +4,15 @@ PYTHONPATH=app-path
LOG_LEVEL=DEBUG
CHB_AWS_ACCESS_KEY_ID=...
CHB_AWS_SECRET_ACCESS_KEY=...
CHB_AWS_DEFAULT_REGION=eu-west-3
CHB_AWS_S3_BUCKET=...
CHB_AWS_DEFAULT_REGION=eu-south-1
CHB_AWS_BEDROCK_REGION=eu-west-3
CHB_AWS_GUARDRAIL_ID=...
CHB_AWS_GUARDRAIL_VERSION=...
CHB_REDIS_URL=...
CHB_WEBSITE_URL=...
CHB_REDIS_INDEX_NAME=...
CHB_LLAMAINDEX_INDEX_ID=...
CHB_DOCUMENTATION_DIR=...
CHB_GOOGLE_API_KEY=...
CHB_PROVIDER=...
CHB_MODEL_ID=...
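
As a minimal sketch of how an application could consume the variables listed in `.env.example`, the helper below collects every `CHB_`-prefixed variable and fails early when a required one is missing. The function name `load_chatbot_settings` and the choice of which variables count as required are assumptions made for illustration; they are not taken from the repository.

```python
import os

# Hypothetical helper, not part of the repository.
# Which variables are treated as required is an assumption of this sketch.
REQUIRED_VARS = (
    "CHB_AWS_DEFAULT_REGION",
    "CHB_AWS_BEDROCK_REGION",
    "CHB_PROVIDER",
    "CHB_MODEL_ID",
)


def load_chatbot_settings(environ=None, required=REQUIRED_VARS):
    """Return all CHB_-prefixed variables as a dict, checking required keys."""
    env = os.environ if environ is None else environ
    # Keep only the chatbot's own variables.
    settings = {key: value for key, value in env.items() if key.startswith("CHB_")}
    # A variable that is absent or empty counts as missing.
    missing = [name for name in required if not settings.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return settings
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to exercise in tests without touching the real environment.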
40 changes: 11 additions & 29 deletions apps/chatbot/README.md
@@ -1,10 +1,15 @@
# PagoPA Chatbot

This folder contains all the details to build a RAG using the documentation provided in [`PagoPA Developer Portal`](https://developer.pagopa.it/). The retriver chosen is the `Auto Merging Retriver` one and it was implemented using [`llama-index`](https://docs.llamaindex.ai/en/stable/). Check out `src/modules/retriever.py`.
This folder contains all the details to build a RAG using the documentation provided in [`PagoPA Developer Portal`](https://developer.pagopa.it/).

This chatbot uses [`AWS Bedrock`](https://aws.amazon.com/bedrock/) as provider, so be sure to have installed [`aws-cli`](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and stored your credential in `~/.aws/credentials`.
This chatbot uses [Google](https://ai.google.dev/) or [AWS Bedrock](https://aws.amazon.com/bedrock/) as provider.
Even though the provider is the Google one, we stored its API key in AWS. So, be sure to have installed [aws-cli](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) and stored your credential in `~/.aws/credentials`.

All the parameters and prompts used to build the Retrieval-Augmented Generation (RAG) are available in `config`.
The Retrieval-Augmented Generation (RAG) was implemented using [llama-index](https://docs.llamaindex.ai/en/stable/). All the parameters and prompts used are stored in `config`.

## Environment Variables

Create a `.env` file inside this folder and store the environment variables listed in `.env.example`.

## Virtual environment

@@ -27,40 +32,17 @@ The working directory is `/developer-portal/apps/chatbot`. So, to set the `PYTHO

In this way, `PYTHONPATH` points to where the Python packages and modules are, not where your checkouts are.

## File for Environment Variables

Create a `.env` file inside the folder and write to the file the following environment variables:

CHB_AWS_ACCESS_KEY_ID=...
CHB_AWS_SECRET_ACCESS_KEY=...
CHB_AWS_DEFAULT_REGION=...
CHB_AWS_S3_BUCKET=...
CHB_AWS_GUARDRAIL_ID=...
CHB_AWS_GUARDRAIL_VERSION=...
CHB_REDIS_URL=...
CHB_REDIS_INDEX_NAME=...
CHB_WEBSITE_URL=...
CHB_GOOGLE_API_KEY=...
CHB_PROVIDER=...
CHB_MODEL_ID=...
CHB_MODEL_TEMPERATURE=...
CHB_MODEL_MAXTOKENS=...
CHB_EMBED_MODEL_ID=...
CHB_ENGINE_SIMILARITY_TOPK=...
CHB_ENGINE_SIMILARITY_CUTOFF=...
CHB_ENGINE_USE_ASYNC=...
CHB_ENGINE_USE_STREAMING=...

## Knowledge vector database
## Knowledge index vector database

To reach the remote redis instance, it is necessary to open a tunnel:

```
./scripts/redis-tunnel.sh
```

Verify that the HTML files that compose the Developer Portal documentation exist in a directory. Otherwise create the documentation. Once you have the documentation directory ready, put its path in `params` and, in the end, create the vector index doing:

```
python src/modules/create_vector_index.py --params config/params.yaml
```

52 changes: 50 additions & 2 deletions apps/chatbot/config/params.yaml
@@ -5,9 +5,57 @@ vector_index:
  path: index
  chunk_sizes: [2816, 704, 176]
  chunk_overlap: 20
  use_redis: True
  use_s3: False

engine:
  response_mode: compact
  verbose: False

config_presidio:
  nlp_engine_name: spacy
  models:
    -
      lang_code: en
      model_name: en_core_web_md
    -
      lang_code: it
      model_name: it_core_news_md
    # -
    #   lang_code: de
    #   model_name: de_core_news_md
    # -
    #   lang_code: es
    #   model_name: es_core_news_md
    # -
    #   lang_code: fr
    #   model_name: fr_core_news_md
  ner_model_configuration:
    labels_to_ignore:
      - ORDINAL
      - QUANTITY
      - ORGANIZATION
      - ORG
      - LANGUAGE
      - PRODUCT
      - MONEY
      - PERCENT
      - O
      - CARDINAL
      - EVENT
      - WORK_OF_ART
      - LAW
      - MISC
    model_to_presidio_entity_mapping:
      PER: PERSON
      PERSON: PERSON
      LOC: LOCATION
      LOCATION: LOCATION
      GPE: LOCATION
      ORG: ORGANIZATION
      DATE: DATE_TIME
      TIME: DATE_TIME
      NORP: NRP
    low_confidence_score_multiplier: 0.4
    low_score_entity_names:
      - ORGANIZATION
      - ORG
    default_score: 0.8
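
To make the `ner_model_configuration` fields more concrete, here is a plain-Python reading of what they express: labels in `labels_to_ignore` are dropped, the remaining labels are translated through `model_to_presidio_entity_mapping`, a missing score falls back to `default_score`, and entities named in `low_score_entity_names` have their score scaled by `low_confidence_score_multiplier`. This is an illustrative sketch only, not Presidio's implementation; the function name `map_ner_labels`, the `(label, score)` input shape, and the order of the steps are assumptions. The dictionary is a trimmed copy of the values in the diff.

```python
# Illustrative only: NOT Presidio's code. Names and step order are assumptions.
NER_CONFIG = {
    "labels_to_ignore": ["ORDINAL", "QUANTITY", "ORGANIZATION", "ORG", "CARDINAL"],
    "model_to_presidio_entity_mapping": {
        "PER": "PERSON",
        "LOC": "LOCATION",
        "GPE": "LOCATION",
        "ORG": "ORGANIZATION",
        "DATE": "DATE_TIME",
        "NORP": "NRP",
    },
    "low_confidence_score_multiplier": 0.4,
    "low_score_entity_names": ["ORGANIZATION", "ORG"],
    "default_score": 0.8,
}


def map_ner_labels(spans, config=NER_CONFIG):
    """Turn (model_label, score_or_None) pairs into (entity, score) pairs."""
    ignored = set(config["labels_to_ignore"])
    mapping = config["model_to_presidio_entity_mapping"]
    low_score = set(config["low_score_entity_names"])
    results = []
    for label, score in spans:
        if label in ignored:
            continue  # the label is filtered out entirely
        entity = mapping.get(label, label)  # unmapped labels pass through
        if score is None:
            score = config["default_score"]  # the model gave no confidence
        if label in low_score or entity in low_score:
            score *= config["low_confidence_score_multiplier"]
        results.append((entity, score))
    return results
```

With the values above, `map_ner_labels([("PER", None), ("GPE", None), ("ORG", None), ("DATE", None)])` drops the `ORG` span, because that label is ignored, and returns `PERSON`, `LOCATION` and `DATE_TIME` entities, each with the default score of 0.8.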
2 changes: 1 addition & 1 deletion apps/chatbot/config/prompts.yaml
@@ -1,6 +1,6 @@
qa_prompt_str: |
  You are an customer services chatbot.
  Your name is Discovery and your duty is to assist the user with the PagoPA DevPortal documentation!
  Your name is Discovery and your duty is to assist the user with the PagoPA DevPortal documentation, homepage: https://dev.developer.pagopa.it!
  --------------------
  Context information:
  {context_str}
2 changes: 2 additions & 0 deletions apps/chatbot/docker/app.Dockerfile
@@ -14,4 +14,6 @@ RUN poetry install

COPY . ${LAMBDA_TASK_ROOT}
RUN python ./scripts/nltk_download.py
RUN python ./scripts/spacy_download.py

CMD ["src.app.main.handler"]
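
The contents of `scripts/spacy_download.py` are not part of this diff. As a rough sketch of what such a build step might do, assuming it reads the model names from the `config_presidio` section of `config/params.yaml`, the functions below list the configured spaCy models and download each one with `spacy.cli.download`. The function names and the idea of driving the download from the config are assumptions, not the repository's actual script.

```python
def model_names(presidio_config):
    """Return the spaCy model names listed under `models`, in order."""
    return [entry["model_name"] for entry in presidio_config.get("models", [])]


def download_models(presidio_config):
    """Download every configured model (requires spaCy and network access)."""
    # Imported lazily so model_names() can be used without spaCy installed.
    from spacy.cli import download

    for name in model_names(presidio_config):
        download(name)


# Example input mirroring the two uncommented models in params.yaml.
EXAMPLE_CONFIG = {
    "nlp_engine_name": "spacy",
    "models": [
        {"lang_code": "en", "model_name": "en_core_web_md"},
        {"lang_code": "it", "model_name": "it_core_news_md"},
    ],
}
```

Running the download at image build time, as the `RUN` line does, means the models are already present when the Lambda handler starts.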