LaCour! Generation

Companion code to the arXiv preprint presenting the LaCour! corpus. If you are looking for the dataset, please visit LaCour! Corpus.

Note

This repo is still a work in progress and will be updated in the coming days!

Installation

For installation with Miniconda:

conda create -n lacour-generation python=3.9
conda activate lacour-generation
git clone https://github.com/trusthlt/lacour-generation.git
cd lacour-generation
pip install -r requirements.txt

Running the scraper

Producing the hearing transcripts and associated documents is divided into several steps. The code for all scrapers is located in scrape.

1. Download video files and video information by running scrape_webcast_videos.py, which produces all_webcasts_{date}.json
2. Find associated files in HUDOC and download them by running scrape_case_html_matching_webcast.py
3. Find related press releases in HUDOC and download them by running scrape_press_releases.py
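
The order of the steps above can be sketched as a small driver script. This is only an illustration: the script names come from this README, but running them without arguments from the repository root is an assumption, and `build_commands` and `run_steps` are hypothetical helpers that are not part of the repository. Check each script for its actual options before running it.

```python
import subprocess
import sys
from pathlib import Path

# Script names are taken from the README. Calling them without arguments
# is an assumption; the real scripts may expect options.
SCRAPE_STEPS = [
    "scrape_webcast_videos.py",
    "scrape_case_html_matching_webcast.py",
    "scrape_press_releases.py",
]


def build_commands(scrape_dir="scrape", skip_videos=False):
    """Return one argv list per scraping step, in the README's order."""
    steps = SCRAPE_STEPS[1:] if skip_videos else SCRAPE_STEPS
    return [[sys.executable, str(Path(scrape_dir) / name)] for name in steps]


def run_steps(commands, dry_run=True):
    """Print each step and, unless dry_run is set, run it and stop on failure."""
    for argv in commands:
        print("step:", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run only: prints the commands without executing them.
    run_steps(build_commands(skip_videos=True), dry_run=True)
```

Setting `skip_videos=True` leaves out the video scraper, which is useful while that step is unavailable.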

Downloading the videos

Warning

Due to changes to the webcast website, the scraper for videos no longer works. You can instead skip the first step and download the last scraped file, all_webcasts.json.
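
Once all_webcasts.json is downloaded, a quick look at its top-level shape can help before passing it to the later steps. The sketch below makes no assumption about the file's schema, which this README does not describe; `load_webcasts` and `describe` are hypothetical helper names, and the file is assumed to sit in the current directory.

```python
import json
from pathlib import Path


def load_webcasts(path="all_webcasts.json"):
    """Load the scraped webcast metadata as plain Python objects."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def describe(data):
    """Summarise the top-level shape without assuming a schema."""
    if isinstance(data, dict):
        return f"dict with {len(data)} keys, e.g. {sorted(data)[:3]}"
    if isinstance(data, list):
        return f"list with {len(data)} entries"
    return f"scalar of type {type(data).__name__}"


if __name__ == "__main__":
    path = Path("all_webcasts.json")  # assumed location: current directory
    if path.exists():
        print(describe(load_webcasts(path)))
    else:
        print(f"{path} not found; download it first")
```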

Transcribing the videos

Transcribing a video into a hearing transcript requires several steps, with one manual annotation step. The code for transcription is located in transcribe.

1. Diarize the video by running diarize.py. This requires a Hugging Face token to access the pyannote diarization models
2. Generate a speaker schedule by running generate_speaker_schedule.py, which clusters the diarization output. This produces one text file with a speaker schedule per hearing webcast
3. (MANUAL) Annotate the speaker schedule with the correct tags
4. Generate a transcript by passing the annotated speaker schedule together with the video to transcribe_segmented_whisper.py
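
To give an idea of what turning diarization output into a speaker schedule can involve, here is a minimal sketch that merges consecutive turns by the same speaker. It is not the logic of generate_speaker_schedule.py: the `Turn` structure, the `max_gap` threshold, and the output layout are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    speaker: str  # diarization label, e.g. "SPEAKER_00"


def merge_turns(turns, max_gap=1.0):
    """Merge consecutive same-speaker turns separated by at most max_gap seconds."""
    merged = []
    for turn in sorted(turns, key=lambda t: t.start):
        last = merged[-1] if merged else None
        if last and last.speaker == turn.speaker and turn.start - last.end <= max_gap:
            # Extend the previous block instead of starting a new one.
            last.end = max(last.end, turn.end)
        else:
            # Copy the turn so the input list is left unchanged.
            merged.append(Turn(turn.start, turn.end, turn.speaker))
    return merged


def format_schedule(turns):
    """Render one line per merged turn, ready for manual tagging."""
    return "\n".join(f"{t.start:8.1f} {t.end:8.1f} {t.speaker}" for t in turns)


if __name__ == "__main__":
    example = [
        Turn(0.0, 5.0, "SPEAKER_00"),
        Turn(5.5, 9.0, "SPEAKER_00"),
        Turn(9.2, 12.0, "SPEAKER_01"),
    ]
    print(format_schedule(merge_turns(example)))
```

Merging short same-speaker fragments keeps the schedule compact, which reduces the number of lines that need a tag in the manual annotation step.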
