ScienceBot

An 💬NLP chatbot answering questions about science.

Pre-requisites

Install poetry

pip install poetry

Approach

When a question is asked, TF-IDF (Term Frequency-Inverse Document Frequency) is used to search a database of science facts and score each document against the question. TF-IDF weights each word by how often it appears in a document and how rare it is across the corpus, so the weight reflects how important the word is to that document. The technique is widely used in information retrieval and text mining.
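The scoring idea can be sketched in plain Python. This is a minimal illustration, not the code in modelling.py: the function names, the whitespace tokenizer, the smoothed IDF formula and the use of cosine similarity are all assumptions made for the example.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def build_idf(docs_tokens):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Weight each known term by (term frequency) * (inverse document frequency)."""
    counts = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_documents(question, documents):
    """Return one similarity score per document, in the same order."""
    docs_tokens = [tokenize(d) for d in documents]
    idf = build_idf(docs_tokens)
    doc_vectors = [tfidf_vector(tokens, idf) for tokens in docs_tokens]
    question_vector = tfidf_vector(tokenize(question), idf)
    return [cosine(question_vector, v) for v in doc_vectors]


if __name__ == "__main__":
    facts = [
        "Mitochondria produce energy for the cell.",
        "Water boils at 100 degrees Celsius at sea level.",
        "Plants absorb sunlight during photosynthesis.",
    ]
    print(score_documents("What produces energy in the cell?", facts))
```

Words that appear in the question but not in any document are ignored, so a question with no overlapping vocabulary scores 0 against every document.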

The SciQ Dataset is used as the document store (database). It contains 13,679 crowdsourced science exam questions about Physics, Chemistry and Biology, among others.

When a question is asked, the bot computes the query vector and retrieves the relevant context from the document store if the similarity score is greater than a threshold of 0.5. If the score does not reach the threshold, i.e. no relevant context is found, the summary of the top Wikipedia page relating to the question is used as context instead.
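The threshold-and-fallback logic might look roughly like the sketch below. The names select_context and fallback are hypothetical, and fallback stands in for whatever the bot uses to fetch a Wikipedia summary; only the 0.5 threshold and the strict "greater than" comparison come from the description above.

```python
SIMILARITY_THRESHOLD = 0.5  # value stated in this README


def select_context(question, documents, scores, fallback, threshold=SIMILARITY_THRESHOLD):
    """Return the best-scoring document, or the fallback context if none beats the threshold.

    `fallback` is a hypothetical callable taking the question and returning
    a context string (for example, a Wikipedia summary lookup).
    """
    if scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] > threshold:
            return documents[best]
    return fallback(question)


if __name__ == "__main__":
    docs = ["Fact about cells.", "Fact about water."]

    def placeholder_summary(question):
        # Placeholder only; a real fallback would query Wikipedia.
        return "summary for: " + question

    print(select_context("What is a cell?", docs, [0.8, 0.1], placeholder_summary))
    print(select_context("What is a quasar?", docs, [0.2, 0.1], placeholder_summary))
```

Passing the fallback in as a callable keeps the retrieval decision separate from the network lookup, which makes the decision easy to test offline.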

A pretrained model, DistilBERT fine-tuned on the SQuAD dataset for question answering, is used to extract the answer from the context. This answer is then returned to the user.
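As an illustration of the extraction step, here is a sketch that uses the Hugging Face transformers question-answering pipeline. The checkpoint name distilbert-base-cased-distilled-squad is an assumption (the README does not say which DistilBERT checkpoint the bot loads), and the helper names are hypothetical.

```python
def load_qa_pipeline(model_name="distilbert-base-cased-distilled-squad"):
    """Load a question-answering pipeline; the model name is an assumed checkpoint."""
    # Imported lazily: requires the `transformers` package and a backend such as PyTorch.
    from transformers import pipeline

    return pipeline("question-answering", model=model_name)


def extract_answer(question, context, qa=None):
    """Return the answer span that the QA model extracts from `context`.

    `qa` can be any callable with the pipeline's calling convention, which
    allows a preloaded pipeline (or a stub in tests) to be passed in.
    """
    if qa is None:
        qa = load_qa_pipeline()
    result = qa(question=question, context=context)
    return result["answer"]


if __name__ == "__main__":
    context = "Mitochondria are organelles that produce most of the cell's energy."
    print(extract_answer("What produces most of the cell's energy?", context))
```

Loading the pipeline once and passing it in as qa avoids reloading the model for every question.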

The approach is implemented in modelling.py

Steps

Create virtual environment and install dependencies

poetry install

Activate virtual environment

poetry shell

Download the spaCy English language model

python -m spacy download en_core_web_sm

Optional: run modelling.py to build the document store. I've already saved one to the artifacts directory.

python modelling.py

Start ScienceBot

python ignite.py

The bot should be running on port 5000

Test ScienceBot by sending a POST request to http://localhost:5000/app
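A request could be sent from Python as sketched below, using only the standard library. The JSON body and its "question" key are assumptions, since the README does not document the payload the /app endpoint expects; check ignite.py or main.py for the actual field names.

```python
import json
import urllib.request

BOT_URL = "http://localhost:5000/app"  # endpoint given in this README


def build_request(question, url=BOT_URL):
    """Build a POST request; the {"question": ...} payload shape is an assumption."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, url=BOT_URL, timeout=30):
    """Send the question to a running bot and return the raw response text."""
    with urllib.request.urlopen(build_request(question, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Requires the bot to be running locally (python ignite.py).
    print(ask("What is photosynthesis?"))
```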


IzicTemi/ScienceBot
