Semantic Sommeliers: Audio Processing and Story Detection System

Semantic Sommeliers is an audio processing system for transcription, story detection, and instruction alignment within audio files. Using models such as Whisper and WhisperX, it transcribes audio, identifies stories, and synchronizes instructions with the session data.

Features

Audio Transcription: Leverages OpenAI's Whisper and custom WhisperX models for accurate speech-to-text capabilities.
Story Detection: Identifies and timestamps stories within audio sessions using semantic similarity analysis.
Instruction Synchronization: Aligns instructional audio files with session data, using cross-correlation to find exact timings.
Dynamic Configuration: Allows for varied audio processing settings through external configuration.
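
The cross-correlation idea behind instruction synchronization can be illustrated with a minimal sketch. This is not the project's implementation: it assumes both signals are plain lists of samples at the same sample rate, it uses a brute-force search in pure Python, and the function name `find_offset` is hypothetical. A real pipeline would more likely use an FFT-based correlation from NumPy or SciPy.

```python
def find_offset(session, instruction):
    """Return the sample index in `session` where `instruction` aligns best.

    Brute-force cross-correlation: slide `instruction` across `session`
    and keep the lag with the highest dot product.
    """
    if not instruction or len(instruction) > len(session):
        raise ValueError("instruction must be non-empty and no longer than session")

    best_lag, best_score = 0, float("-inf")
    for lag in range(len(session) - len(instruction) + 1):
        window = session[lag:lag + len(instruction)]
        score = sum(s * i for s, i in zip(window, instruction))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy example: the instruction clip is embedded in silence starting at sample 5.
instruction = [0.0, 1.0, -1.0, 0.5]
session = [0.0] * 5 + instruction + [0.0] * 6
offset = find_offset(session, instruction)

sample_rate = 16000  # assumed value, used only to convert samples to seconds
print(offset, offset / sample_rate)
```

The score here is an unnormalized dot product, so loud regions of a real recording can outweigh a true match. Normalizing each window by its energy is a common refinement.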

Prerequisites

Before you begin, ensure you have the following installed:

Python 3.10 or later
Poetry Python Package Manager
FFmpeg for audio processing
pip
Conda (optional)
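
A quick way to check these prerequisites is a short script like the one below. It is only a sketch: the function name `check_prerequisites` is hypothetical, and it assumes that each tool, when installed, is reachable on your PATH under its usual executable name.

```python
import shutil
import sys


def check_prerequisites():
    """Report whether each prerequisite is available on this machine."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "poetry": shutil.which("poetry") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "conda (optional)": shutil.which("conda") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```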

Installation

To set up the Semantic Sommeliers system on your local machine:

Using Poetry

Clone the repository:

git clone https://github.com/your-username/semantic_sommeliers.git
cd semantic_sommeliers

Install Poetry:

Follow the instructions on the Poetry website to install Poetry.

Create the virtual environment with Python 3.10:
```
poetry env use python3.10
```

Install dependencies:
```
poetry install
```

Activate the virtual environment:

source $(poetry env info --path)/bin/activate

Install PyTorch with GPU support (CUDA 11.7 build):

pip install torch==2.0.1+cu117 -f https://download.pytorch.org/whl/torch_stable.html

Using Conda

Clone the Repository:

git clone https://github.com/your-username/semantic_sommeliers.git
cd semantic_sommeliers

Set up the Python environment (using Conda):

conda create -n semantic_sommeliers python=3.10
conda activate semantic_sommeliers

Install dependencies:
```
pip install -r requirements.txt
```

Usage

Running Individual Experiments

To run individual experiments with a specific session:

python main.py --session_name [session_filename.wav] --transcript_tool [whisper|whisperx] [optional parameters]

Optional parameters (their default values are defined in config.py):

--new_sample_rate: Sample rate for audio processing (default is set in config.py)
--highcut: Highcut frequency for filtering (default is set in config.py)
--lowcut: Lowcut frequency for filtering (default is set in config.py)
--normalization: Enable or disable volume normalization (default is set in config.py)
--filtering: Enable or disable filtering (default is set in config.py)
--seconds_threshold, --story_absolute_peak_height, etc.: Other thresholds and heights as specified in config.py
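
The relationship between command-line parameters and config.py defaults can be sketched with argparse. This is an illustration, not the project's main.py: the `config` object below is a stand-in with placeholder values, the argument types are assumptions, and treating `--session_name` and `--transcript_tool` as required is also an assumption. Boolean flags such as `--normalization` are left out because the source does not say how they are passed.

```python
import argparse
from types import SimpleNamespace

# Stand-in for config.py; these values are placeholders, not the project's defaults.
config = SimpleNamespace(
    new_sample_rate=16000,
    lowcut=100.0,
    highcut=4000.0,
)


def build_parser(defaults=config):
    """Build a parser whose optional flags fall back to the config values."""
    parser = argparse.ArgumentParser(description="Process one session file.")
    parser.add_argument("--session_name", required=True)
    parser.add_argument("--transcript_tool", choices=["whisper", "whisperx"], required=True)
    parser.add_argument("--new_sample_rate", type=int, default=defaults.new_sample_rate)
    parser.add_argument("--lowcut", type=float, default=defaults.lowcut)
    parser.add_argument("--highcut", type=float, default=defaults.highcut)
    return parser


# Only --highcut is overridden; the other values come from the config stand-in.
args = build_parser().parse_args(
    ["--session_name", "session.wav", "--transcript_tool", "whisper", "--highcut", "3000"]
)
print(args.highcut, args.new_sample_rate, args.lowcut)
```

Binding each default to the config object keeps a single source of truth: a value changes only when its flag is given explicitly.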

Running Batch Experiments

To automatically process all session files located in your data/sessions directory, run the batch_run.py script. This script reads all .wav files in the sessions directory and processes them using the default settings specified in config.py:

python batch_run.py --audio_list path/to/audio_list.txt --error_log path/to/error_log.txt
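
The flags in the command above suggest a loop over a list of audio files, with failures written to a log. A minimal sketch of that pattern follows. It assumes the audio list is a text file with one path per line, which the source does not specify, and `run_batch`, `process_session`, and `fake_process` are hypothetical names rather than the project's functions.

```python
import os
import tempfile
import traceback


def run_batch(audio_list_path, error_log_path, process_session):
    """Call `process_session` on every path listed in `audio_list_path`.

    Failures are appended to `error_log_path` so that one bad file does
    not stop the rest of the batch. Returns the paths that succeeded.
    """
    with open(audio_list_path, encoding="utf-8") as handle:
        sessions = [line.strip() for line in handle if line.strip()]

    succeeded = []
    for session in sessions:
        try:
            process_session(session)
            succeeded.append(session)
        except Exception:
            with open(error_log_path, "a", encoding="utf-8") as log:
                log.write(f"{session}\n{traceback.format_exc()}\n")
    return succeeded


# Demo with a fake processor that rejects one file.
def fake_process(path):
    if "bad" in path:
        raise ValueError(f"cannot process {path}")


with tempfile.TemporaryDirectory() as tmp:
    audio_list = os.path.join(tmp, "audio_list.txt")
    error_log = os.path.join(tmp, "error_log.txt")
    with open(audio_list, "w", encoding="utf-8") as handle:
        handle.write("a.wav\nbad.wav\nb.wav\n")
    done = run_batch(audio_list, error_log, fake_process)
    with open(error_log, encoding="utf-8") as handle:
        logged = handle.read()

print(done)
```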

Configuration

Modify 'config.py' to change the default settings used by the scripts. These settings include audio processing parameters such as sample rate, filter settings, normalization, and detection thresholds. Changes to 'config.py' affect both individual and batch processing unless a parameter is explicitly overridden on the command line.

Files and Directories

'main.py' : Main script for running individual experiments.
'batch_run.py' : Wrapper script for running experiments in batch mode.
'utility/utility.py' : Contains all utility functions for audio loading, transcription, and other core functionalities.
'utils/general_util.py' : Contains utility functions for audio loading, transcription, and other core functionalities.
'utils/audio_util.py' : Contains functions specific to audio processing tasks.
'utils/text_util.py' : Contains functions specific to text processing tasks.
'config.py' : Configuration file for setting default parameters.

Contributing

Contributions to improve Semantic Sommeliers are welcome. Please follow the existing code style and add unit tests for any new or changed functionality.

License

Distributed under the GNU Lesser General Public License v2.1 (LGPL 2.1). See LICENSE for more information.
