An 💬NLP chatbot answering questions about science

IzicTemi/ScienceBot

ScienceBot

An 💬NLP chatbot answering questions about science.

Pre-requisites

  • Install poetry
pip install poetry

Approach

When a question is asked, TF-IDF (“Term Frequency-Inverse Document Frequency”) is used to search a database of science facts and score each document against the question. TF-IDF weights each word by how often it appears in a document and how rare it is across the corpus, so distinctive words count for more than common ones. It is widely used in information retrieval and text mining.
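To make the scoring concrete, here is a minimal standard-library sketch of TF-IDF ranking with cosine similarity. It is an illustration only, not the code in modelling.py: the function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `rank`), the whitespace tokenizer, the smoothed IDF formula, and the three sample facts are all assumptions made for this example.

```python
import math
import string
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in words if w]


def build_idf(docs):
    # Smoothed inverse document frequency: rarer terms get larger weights.
    n = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    # Sparse vector {term: tf * idf}; terms outside the corpus are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: tf * idf[t] for t, tf in counts.items()}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(question, docs):
    # Score every document against the question, best match first.
    idf = build_idf(docs)
    query = tfidf_vector(question, idf)
    scored = [(cosine(query, tfidf_vector(d, idf)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample facts, not taken from the SciQ dataset.
docs = [
    "Mitochondria produce energy for the cell.",
    "Water boils at 100 degrees Celsius at sea level.",
    "Photosynthesis converts light energy into chemical energy.",
]
ranked = rank("What produces energy in the cell?", docs)
for score, doc in ranked:
    print(round(score, 3), doc)
```

The document sharing the most distinctive terms with the question ranks first, and a document with no shared terms scores zero.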

The SciQ Dataset is used as the document store (database). It contains 13,679 crowdsourced science exam questions about Physics, Chemistry and Biology, among others.

When a question is asked, the bot computes the query vector and retrieves the relevant context from the document store if the similarity score exceeds a threshold of 0.5. If no score reaches the threshold, i.e. no relevant context is found, the summary of the top Wikipedia page relating to the question is used as context instead.
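The threshold-and-fallback logic described above might look roughly like the following sketch. The function `select_context`, the source labels, and the `wiki_stub` fallback are hypothetical names for this example; the real bot fetches a Wikipedia summary at that point, which is replaced here by a stub so the snippet runs offline.

```python
def select_context(question, scored_docs, threshold=0.5, fallback=None):
    """Pick the context to answer from.

    scored_docs is a list of (similarity, text) pairs. The best document
    is used only if its score is strictly above the threshold; otherwise
    the fallback callable (e.g. a Wikipedia summary lookup) is consulted.
    Returns a (context, source) tuple.
    """
    if scored_docs:
        best_score, best_text = max(scored_docs, key=lambda pair: pair[0])
        if best_score > threshold:
            return best_text, "document_store"
    if fallback is not None:
        return fallback(question), "wikipedia"
    return None, "none"


def wiki_stub(question):
    # Stand-in for a real Wikipedia summary lookup (assumption).
    return "Summary text for: " + question


hit = select_context("q", [(0.82, "A"), (0.31, "B")], fallback=wiki_stub)
miss = select_context("q", [(0.5, "A")], fallback=wiki_stub)
print(hit)
print(miss)
```

Note that a score of exactly 0.5 falls through to the fallback, matching the "> threshold" wording above.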

A pretrained DistilBERT model, fine-tuned on the SQuAD dataset for question answering, is used to extract the answer from the context. This answer is then returned to the user.
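As a rough sketch of the extraction step, the Hugging Face `transformers` question-answering pipeline can be wrapped as below. The checkpoint name `distilbert-base-cased-distilled-squad` is an assumption (the text above does not say which DistilBERT checkpoint the project loads), and `extract_answer` and its `qa` parameter are hypothetical names introduced for this example.

```python
def extract_answer(question, context, qa=None):
    """Return the answer span that the QA model extracts from context.

    qa is any callable accepting question= and context= keywords and
    returning a dict with an "answer" key. If omitted, a DistilBERT
    pipeline is loaded (assumed checkpoint, downloaded on first use).
    """
    if qa is None:
        # Imported lazily so the function can be exercised without
        # transformers installed when a custom qa callable is supplied.
        from transformers import pipeline

        qa = pipeline(
            "question-answering",
            model="distilbert-base-cased-distilled-squad",  # assumption
        )
    result = qa(question=question, context=context)
    return result["answer"]


# Example call (loads the model, so it is left commented out here):
# extract_answer(
#     "What produces energy in the cell?",
#     "Mitochondria produce energy for the cell.",
# )
```

In a long-running service the pipeline would normally be created once at startup and passed in as `qa`, rather than rebuilt on each call.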

See modelling.py for the implementation of this approach.

Steps

  1. Create a virtual environment and install dependencies
poetry install
  2. Activate the virtual environment
poetry shell
  3. Download the spaCy English language model
python -m spacy download en_core_web_sm
  4. Optional - Run modelling.py to build the document store. I've saved one to the artifacts directory.
python modelling.py
  5. Start ScienceBot
python ignite.py

The bot should be running on port 5000

  6. Test ScienceBot by sending a POST request to http://localhost:5000/app
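For the final step, a POST request can be sent with the Python standard library, as in the sketch below. The request format is not documented here, so the JSON body and the `question` field name are assumptions; check ignite.py for the format the endpoint actually expects. `build_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

APP_URL = "http://localhost:5000/app"


def build_request(question, url=APP_URL):
    # Assumption: the endpoint accepts JSON with a "question" field.
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, url=APP_URL, timeout=30):
    # Send the request and return the raw response body as text.
    with urllib.request.urlopen(build_request(question, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# With the bot running on port 5000:
# print(ask("What is the powerhouse of the cell?"))
```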
