Sight Sense: Smart DoorBell using ML and Computer Vision

Created by Alois Orozco, Daniel Grimut, Dom Doesburg, Augusto Moto Pinheiro

What is Sight Sense:

For ConuHacks 8, Quebec's largest Hackathon of the past decade, our team created an exceptional tool to increase home security with the help of ML and Computer Vision. SightSence utilizes multiple machine learning models to identify faces and uses mathematical concepts such as the Kalman Filter, IoU (intersect over Union), and the EAR (eye aspect ratio) to perform lighting-fast human authentications without the use of infra-red cameras.

How we did it:

Our team used a pre-trained computer vision model and fed it multiple images of faces. We used Python and OpenCV to work with the model, and enhance it.

Why we did it:

Our team cares passionately about making the world a better place. Through our years working together on academic projects, we have discussed our motives behind pursuing Computer Science and we all strongly believed in the great opportunities that software development has to make people's lives better, and safer.

Features:

Face Detection

The AI is able to detect faces, and associate unique IDs to each - these IDs make it easy for the admin to select a detected individual to authenticate.

To maintain and track IDs was a challenge, as the AI processes and annotates each frame individually. Additionally, we found it inefficient to load a new model to track the faces - we wanted to learn to do this stuff ourselves. So, we used the Kalman Filter to track the faces and IoU to associate the Kalman Predictions to the detected faces at each new frame.

Kalman Filter is a relatively complex mathematical algorithm, which we certainly had trouble understanding. The filter makes these predictions by considering the object's velocity in 3D space, while also accounting for noise—essentially the variations and uncertainties in parameters like velocity, position, and acceleration. We used the Intersection over Union (IoU) to find the closest match of the newly detected faces, against the Kalman predictions, which allowed us to track the same face, and maintain its ID throughout its existence on the screen. This predictive capability enhances the accuracy and reliability of object-tracking systems and allows us to determine if a detected person disappears from the frame - at which point we halt any authentication process started for that target. In the following image, you can see the Kalman prediction of the center of the new bounding box (blue circle) against the actual position of the face's bounding box (red circle):

Face Meshing

We used an open-source Face Mesh model from Google to map the facial landmarks on the detected face. This is done after the target has been identified and selected by ID, allowing us to begin the human authentication process (more on it in the next section). It is worth noting, that for efficiency, we isolated the part of the frame that contained the face and only passed that to the model, adjusting the new coordinate system accordingly after the face landmark detection, to display the landmarks on the screen.

Human Authentication

We annihilated the if the target is a human by generating a sequence of blinks the person has to perform, holding each blink for 'n' number of seconds. Using the facial landmarks obtained from our Facial Detection Module, we detected each blink using the Eye-Aspect Ratio (EAR) we learned about in the following paper https://peerj.com/articles/cs-943/.

Concurency Control

We utilized asyncio with socketio to create a Python server to host the models and app. Using Locks and Semaphores, we made our app thread-safe allowing multiple users to join the broadcast, but only allowing one (for now) to perform an authentication.

Future Plans

We plan the following for the near future:

Deploy Website
Port to mobile app
Re-write our frontend in TS (why? Because we can 😎)
Demo Video - so we can flex all the other cool features that we did not outline here
More cool AI stuff we don't want to spoil (although this repo is public so you probably can figure it out through our code if you want 👀)

Tech Stack

Python, asyncio, socketio
OpenCV, YOLOv8, Keras, Tensorflow
JS, React

Name		Name	Last commit message	Last commit date
Latest commit History 108 Commits
backend		backend
frontend		frontend
old		old
.DS_Store		.DS_Store
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sight Sense: Smart DoorBell using ML and Computer Vision

What is Sight Sense:

How we did it:

Why we did it:

Features:

Face Detection

Face Meshing

Human Authentication

Concurency Control

Future Plans

Tech Stack

About

Releases

Packages

Contributors 4

Languages

aloisorozco/Sight-Sense

Folders and files

Latest commit

History

Repository files navigation

Sight Sense: Smart DoorBell using ML and Computer Vision

What is Sight Sense:

How we did it:

Why we did it:

Features:

Face Detection

Face Meshing

Human Authentication

Concurency Control

Future Plans

Tech Stack

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages