
Create a privacy preserving version of Shazam using Concrete and/or Concrete ML #79

Closed
zaccherinij opened this issue Sep 28, 2023 · 2 comments
Labels: 💰 Awarded (This project is now completed and awarded) · 🎯 Bounty (This bounty is currently open) · 📁 Concrete ML (library targeted: Concrete ML)


@zaccherinij
Collaborator

zaccherinij commented Sep 28, 2023

Winners

🥇 1st place: A submission by Iamayushanand
🥈 2nd place: A submission by GoktuEk


Overview

Shazam is a popular application that instantly identifies the music being played from a short recording made with a mobile phone. Although it is more than 20 years old, it is still one of the most widely used music recognition apps today.
Description

Shazam’s algorithm is well known: the company published a paper describing its inner workings in 2003, An Industrial-Strength Audio Search Algorithm, by Avery Li-Chun Wang. However, the algorithm requires users to send queries to Shazam’s servers, which search for matches in a central database. This means that Shazam can collect some of its users’ private preferences. We believe FHE could prevent this! We challenge you to build a music recognition algorithm like Shazam’s using FHE!
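Wang's paper describes combinatorial hashing: peaks of the spectrogram (the "constellation map") are paired, each pair is hashed as (anchor frequency, target frequency, time delta), and a song matches when many hashes agree on the same time offset between query and song. The clear-text sketch below illustrates that idea in plain Python, assuming peak extraction from the spectrogram has already been done. `FAN_OUT`, `MAX_DT`, `fingerprint`, `build_index` and `match` are hypothetical names, and the toy peaks are made up.

```python
from collections import Counter, defaultdict

# Hypothetical parameters bounding the "target zone" of each anchor peak.
FAN_OUT = 5   # max number of target peaks paired with one anchor
MAX_DT = 64   # max time gap (in spectrogram frames) between anchor and target


def fingerprint(peaks):
    """Turn (time, freq) spectrogram peaks into (hash, anchor_time) pairs.

    Each hash is the tuple (anchor_freq, target_freq, time_delta), so it does
    not depend on where the excerpt starts in the song.
    """
    peaks = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue  # skip peaks in the same time frame
            if dt > MAX_DT or paired >= FAN_OUT:
                break  # peaks are time-sorted, so the target zone is exhausted
            hashes.append(((f1, f2, dt), t1))
            paired += 1
    return hashes


def build_index(songs):
    """Map each hash to the (song_id, anchor_time) pairs where it occurs."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for h, t in fingerprint(peaks):
            index[h].append((song_id, t))
    return index


def match(index, query_peaks):
    """Rank songs by their largest number of hash hits sharing one time offset."""
    votes = Counter()
    for h, t_query in fingerprint(query_peaks):
        for song_id, t_song in index.get(h, []):
            votes[(song_id, t_song - t_query)] += 1
    best = {}
    for (song_id, _offset), count in votes.items():
        best[song_id] = max(best.get(song_id, 0), count)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


songs = {
    "song_a": [(0, 10), (3, 20), (5, 15), (9, 30), (12, 25)],
    "song_b": [(0, 40), (2, 35), (7, 50), (8, 45), (11, 60)],
}
index = build_index(songs)
# Toy query: the last four peaks of song_a, shifted 100 frames later in time.
query = [(t + 100, f) for t, f in songs["song_a"][1:]]
print(match(index, query)[0])  # → ('song_a', 6)
```

Counting votes per (song, offset) rather than per song is what makes the match robust: random hash collisions rarely share a single consistent time offset.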

How to participate?

1️⃣ Register here.
2️⃣ When ready, submit your code here.
🗓️ Submission deadline: December 17, 2023.

What we expect

We expect you to provide:

  • A model that takes a raw audio sample of at most 20 seconds as input and identifies the song’s artist and title.
    • To qualify for the maximum prize, the FHE application should work on raw encrypted audio and give a song identification number as output. Furthermore, the model should report when a song is not known.
    • Partial prizes will be awarded if only some parts of the pipeline run in FHE while others run in the clear (possible parts: feature extraction, matching algorithm, unknown song rejection, song index identification). The same applies if the application does not report unknown songs.
  • An evaluation of the FHE model’s performance using FHE (only partial prizes are awarded if the algorithm works with FHE simulation but can only partially run in actual FHE)
  • An evaluation of the floating-point equivalent (non-FHE) model’s performance for comparison
  • A tutorial explaining how you built the project
  • Clean, documented code, as well as a straightforward README.md file explaining how to install the project and how to run and evaluate the models
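The last pipeline stages (unknown song rejection and song index identification) can be pictured as a small decision step on top of any matching algorithm, sketched below in the clear in plain Python. Rejecting a query when its best matching score falls below a threshold is only one possible approach, assumed here for illustration; `identify`, `UNKNOWN` and the threshold value are hypothetical.

```python
UNKNOWN = -1  # hypothetical sentinel meaning "song not in the database"


def identify(scores, threshold):
    """Return (song_index_or_UNKNOWN, ranked_indices) from per-song match scores.

    scores: dict mapping song index -> matching score (higher is better).
    threshold: hypothetical minimum best score needed to accept a match.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if not ranked or scores[ranked[0]] < threshold:
        return UNKNOWN, ranked
    return ranked[0], ranked


print(identify({0: 3, 1: 12, 2: 5}, threshold=8))  # → (1, [1, 2, 0])
print(identify({0: 3, 1: 4}, threshold=8))         # → (-1, [1, 0])
```

Returning the full ranking alongside the decision keeps the top-3 evaluation possible even when the top-1 answer is rejected as unknown.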

The performance of both models should be evaluated with the top-1 and top-3 accuracy metrics for known songs: a query song is considered successfully retrieved if it is the first song returned in the result list (top-1), or if it is among the first 3 songs returned (top-3). These metrics should be measured with 10-fold cross validation over the chosen dataset. The test set should be made of the training set’s audio files. A separate list of songs unknown to the application should also be kept, and the accuracy with which these songs are reported as unknown should be given.
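The two accuracy metrics and the unknown-song report can be computed as in the following sketch, assuming each query yields a ranked list of song indices and that a query rejected as unknown is predicted as a hypothetical `-1` label. Function names and toy inputs are illustrative only.

```python
def top_k_accuracy(ranked_lists, true_ids, k):
    """Fraction of queries whose true song appears in the first k results."""
    hits = sum(1 for ranked, true_id in zip(ranked_lists, true_ids)
               if true_id in ranked[:k])
    return hits / len(true_ids)


def unknown_detection_accuracy(predictions, unknown_label=-1):
    """Fraction of queries from unknown songs that were reported as unknown."""
    return sum(1 for p in predictions if p == unknown_label) / len(predictions)


# Toy results for four queries of known songs (made-up values).
ranked_lists = [[3, 1, 7], [2, 3, 5], [8, 4, 6], [1, 0, 9]]
true_ids = [3, 5, 6, 2]
print(top_k_accuracy(ranked_lists, true_ids, 1))    # → 0.25
print(top_k_accuracy(ranked_lists, true_ids, 3))    # → 0.75
# Toy predictions for four queries of songs absent from the database.
print(unknown_detection_accuracy([-1, -1, 4, -1]))  # → 0.75
```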

To obtain the full reward, we expect both models to reach high accuracy, even though the FHE model’s score may be slightly lower than its floating-point equivalent due to quantization. Additionally, the FHE execution time for a single sample should be as low as possible, to keep the application realistic.
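To see why quantization can lower the FHE model's score, the sketch below applies uniform (affine) quantization to a few made-up feature values and measures the rounding error, which shrinks as the bit width grows. This is a generic illustration under that assumption, not Concrete ML's exact quantization scheme; `quantize` and `dequantize` are hypothetical helpers.

```python
def quantize(values, n_bits):
    """Map floats to integers in [0, 2**n_bits - 1] with a uniform step."""
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((v - lo) / scale) for v in values], scale, lo


def dequantize(q_values, scale, lo):
    """Map the integers back to approximate float values."""
    return [lo + q * scale for q in q_values]


features = [0.12, -0.53, 0.98, 0.31, -0.07]  # made-up feature values
for n_bits in (2, 4, 8):
    q, scale, lo = quantize(features, n_bits)
    restored = dequantize(q, scale, lo)
    max_err = max(abs(a - b) for a, b in zip(features, restored))
    # The rounding error is bounded by half a quantization step.
    print(f"{n_bits} bits: max error {max_err:.4f} (bound {scale / 2:.4f})")
```

Low bit widths keep FHE circuits cheaper to evaluate, so the accuracy gap between the FHE and floating-point models is largely a trade-off against execution time.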

Implementation guide

While it is not mandatory to use them, the FMA: A Dataset For Music Analysis repository provides several datasets of MP3-encoded 30 s audio clips, along with additional files containing their metadata. The smallest one has 8000 tracks, but starting with a subset of it might be easier. Alternatively, you could propose a similar dataset. The awarded prize will depend on the dataset’s size.

While Concrete ML should provide the models needed for this bounty, some modifications to its source code might be required. Some parts might also be implemented directly with Concrete Python.

Reward

🥇Best submission: up to €10,000.

To be considered the best submission, a solution must be efficient and effective, and demonstrate a deep understanding of the core problem. Beyond technical correctness, it should come with clean code, clear explanations and complete documentation.

🥈Second-best submission: up to €3,500.

To be considered the second-best submission, a solution should be both efficient and effective. The code should be neat and readable; while its documentation may be less exhaustive than that of the best submission, it should cover the key aspects of the solution.

🥉Third-best submission: up to €1,500.

The third-best submission presents a solution that effectively tackles the challenge, even if it leaves room for improvement in efficiency or depth of understanding. Documentation should be present and cover the essential components of the solution.

Reward amounts are decided based on code quality, model accuracy scores and speed on an AWS m6i.metal server. When multiple solutions of comparable scope are submitted, they are compared based on their accuracy metrics and computation times.


Submission

Apply directly to this bounty by opening an application here.

Questions?

Do you have a specific question about this bounty? Join the live conversation on the FHE.org discord server here. You can also send us an email at: [email protected]

@zaccherinij zaccherinij added 🎯 Bounty This bounty is currently open 📁 TFHE-rs library targeted: TFHE-rs labels Sep 28, 2023
@zaccherinij zaccherinij changed the title Create a privacy preserving Shazam using FHE and Concrete ML Create a Privacy Preserving Version of Shazam Using FHE and Concrete ML Sep 28, 2023
@aquint-zama aquint-zama added this to the Season 4 milestone Sep 28, 2023
@zaccherinij zaccherinij added 📁 Concrete ML library targeted: Concrete ML and removed 📁 TFHE-rs library targeted: TFHE-rs labels Sep 28, 2023
@zaccherinij zaccherinij changed the title Create a Privacy Preserving Version of Shazam Using FHE and Concrete ML Create a privacy preserving version of Shazam using FHE and Concrete ML Oct 2, 2023
@zaccherinij zaccherinij changed the title Create a privacy preserving version of Shazam using FHE and Concrete ML Create a privacy preserving version of Shazam using Concrete ML Oct 3, 2023
@aquint-zama aquint-zama pinned this issue Oct 9, 2023
@aquint-zama aquint-zama changed the title Create a privacy preserving version of Shazam using Concrete ML Create a privacy preserving version of Shazam using Concrete and/or Concrete ML Nov 2, 2023
@iamayushanand

These metrics should be measured with 10-fold cross validation over the chosen dataset

If I have 10 songs and I train my model on 9 of them, I would likely get "unrecognised" on the 10th song during cross validation because it isn't in the training set. Can you clarify what is meant by the 10 fold cv?

@zaccherinij zaccherinij added 💰 Awarded This project is now completed and awarded and removed 🎯 Bounty This bounty is currently open labels Feb 9, 2024
@zaccherinij zaccherinij unpinned this issue Feb 9, 2024
@zaccherinij zaccherinij added the 🎯 Bounty This bounty is currently open label Feb 12, 2024
@aquint-zama
Collaborator

Winners

🥇 1st place: A submission by Iamayushanand
🥈 2nd place: A submission by GoktuEk

Projects
Status: Awarded Contributions

3 participants