[CAI-118] Presidio #1183

Merged
merged 43 commits into main from presidio/chatbot/cai-118
Oct 11, 2024

Conversation

Collaborator

@mdciri mdciri commented Oct 7, 2024

List of Changes

Add Presidio to the chatbot module to mask personally identifiable information (PII) entities.

Motivation and Context

This lets us store a user's conversation with all PII masked, for privacy reasons.

How Has This Been Tested?

Jupyter notebook

Screenshots (if appropriate):

Types of changes

  • Chore (nothing changes from a user perspective)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@mdciri mdciri requested a review from a team as a code owner October 7, 2024 08:21

changeset-bot bot commented Oct 7, 2024

🦋 Changeset detected

Latest commit: 028b827

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
chatbot Minor

@mdciri mdciri changed the title Presidio/chatbot/cai 118 [CAI-118] Presidio Oct 7, 2024
@@ -22,7 +22,7 @@
 assert PROVIDER in ["aws", "google"]


-GOOGLE_PARAM_NAME = os.getenv("CHB_GOOGLE_API_KEY")
+GOOGLE_PARAM_NAME = os.getenv("GOOGLE_PARAM_NAME")
Contributor

The environment variable was previously named CHB_GOOGLE_API_KEY to comply with the naming standard we use for the other variables.

Suggested change
-GOOGLE_PARAM_NAME = os.getenv("GOOGLE_PARAM_NAME")
+GOOGLE_PARAM_NAME = os.getenv("CHB_GOOGLE_API_KEY")

Collaborator Author

@mdciri mdciri Oct 7, 2024

OK. Anyway, we need a better way to share the env variables.

Collaborator

We have the .env.example file.

Collaborator Author

I mean a shared env file with all the variables filled in.
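
One lightweight way to catch a mismatched variable name like the one discussed in this thread is to compare the running environment against the keys listed in .env.example. The sketch below is an assumption, not part of this PR: the helpers `parse_env_keys` and `missing_env_vars` and the variable `CHB_PROVIDER` are hypothetical, and only simple `KEY=value` lines are handled.

```python
import os


def parse_env_keys(env_text: str) -> list[str]:
    """Return the variable names declared in dotenv-style text (KEY=value lines)."""
    keys = []
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional leading "export ".
        if line.startswith("export "):
            line = line[len("export "):]
        if "=" in line:
            keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_env_vars(env_text: str, environ=None) -> list[str]:
    """List the keys from env_text that are unset or empty in the environment."""
    environ = os.environ if environ is None else environ
    return [key for key in parse_env_keys(env_text) if not environ.get(key)]


# Hypothetical contents of .env.example (names are illustrative only).
example = """
# chatbot settings
CHB_PROVIDER=
CHB_GOOGLE_API_KEY=
"""
print(missing_env_vars(example, environ={"CHB_PROVIDER": "google"}))
```

Running such a check at startup would fail fast when a variable is renamed in code but not in a developer's local environment. It does not solve sharing the filled-in values themselves.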

self.nlp_engine = nlp_engine
self.analyzer = AnalyzerEngine(
    nlp_engine = self.nlp_engine,
    supported_languages = ["it", "en", "es", "fr", "de"],
Contributor

Is it actually necessary to support languages other than Italian?

Collaborator Author

@mdciri mdciri Oct 7, 2024

Yes. There is no longer a language block, so a user can write in any language. Presidio masks PII entities only if the language is one of its supported languages (see the detect_pii method). So I assumed that a typical PagoPA user could write in one of the main European languages.
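
The behaviour described in this reply, masking only when the message language is among the configured ones, can be sketched as follows. This is a stand-in written with the standard library only: a simple regex email matcher replaces Presidio's analyzer, the language code is passed in rather than detected, and `mask_pii` and `SUPPORTED_LANGUAGES` are hypothetical names, not the PR's `detect_pii` implementation.

```python
import re

# Languages taken from the snippet under review.
SUPPORTED_LANGUAGES = ("it", "en", "es", "fr", "de")

# Deliberately simple email pattern; a stand-in for a real PII recognizer.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_pii(text: str, lang: str) -> str:
    """Mask email addresses, but only when lang is a supported language."""
    if lang not in SUPPORTED_LANGUAGES:
        # Unsupported language: nothing is masked, mirroring the reply above.
        return text
    return EMAIL_RE.sub("<EMAIL_ADDRESS>", text)


print(mask_pii("Scrivimi a mario.rossi@example.com", "it"))
print(mask_pii("Escreva para mario.rossi@example.com", "pt"))
```

The second call shows the trade-off behind the language list: text in a language outside the list passes through unmasked, so a broader list reduces the chance of storing unmasked PII.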

try:
    lang_list = detect_langs(text)
    for i in range(len(lang_list)-1, -1, -1):
        if lang_list[i].lang not in ["it", "en", "es", "fr", "de"]:
Contributor

Ditto.

Collaborator Author

See previous comment.
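
The loop in the snippet above walks the detection results from the last index to the first, which is the usual pattern when items may be deleted in place: removing an element does not shift the indices still to be visited. The excerpt is cut off before the loop body, so the deletion below is an assumption, and `Candidate` is a hypothetical stand-in for the objects returned by `detect_langs` (which expose `lang` and `prob`).

```python
from dataclasses import dataclass

SUPPORTED = ["it", "en", "es", "fr", "de"]


@dataclass
class Candidate:
    """Stand-in for a language detection result."""
    lang: str
    prob: float


def keep_supported(candidates: list[Candidate]) -> list[Candidate]:
    """Drop unsupported languages in place, iterating backwards."""
    for i in range(len(candidates) - 1, -1, -1):
        if candidates[i].lang not in SUPPORTED:
            # Safe: only indices above i shift, and those were already visited.
            del candidates[i]
    return candidates


results = [Candidate("pt", 0.57), Candidate("it", 0.29), Candidate("ca", 0.14)]
print([c.lang for c in keep_supported(results)])
```

A list comprehension that builds a new filtered list would avoid index bookkeeping altogether and is often easier to read.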

Contributor

Branch is not up to date with base branch

@christian-calabrese it seems this pull request is not up to date with the base branch.
Please proceed with a merge or rebase to solve this.

Contributor

@christian-calabrese christian-calabrese left a comment

LGTM

Contributor

github-actions bot commented Oct 11, 2024

Jira Pull Request Link

This pull request refers to the following Jira issue: CAI-118

@christian-calabrese christian-calabrese merged commit cee4135 into main Oct 11, 2024
13 checks passed
@christian-calabrese christian-calabrese deleted the presidio/chatbot/cai-118 branch October 11, 2024 12:31
3 participants