Releases: danijel3/ASRforNLP

Sample models

23 Sep 15:42
019b086

Models for Kaldi. These models were trained on Polish Parliament data.

Included are:

  • Phonetisaurus G2P model
  • online chain TDNN acoustic model

Sample data

23 Sep 11:41
019b086

This package contains sample audio and text for testing ASR. The data comes from old sessions of the Polish Parliament.

Kaldi binaries

23 Sep 08:25
019b086

These are binaries for Kaldi that work with Google Colab.

The binaries were built from commit 070437fca7d3e09b405882ae1bfc373ae0f57ad1 of the kaldi-asr/kaldi repository.

The package also includes:

  • openfst binaries
  • phonetisaurus-g2pfst binary from Phonetisaurus G2P
  • ngram and ngram-count binaries from SRILM

The binaries were created using the nvidia-docker image nvidia/cuda:11.4.2-cudnn8-devel-ubuntu18.04.

The code to load them in Colab is as follows:

!wget https://github.com/danijel3/ASRforNLP/releases/download/v1.0/kaldi.tar.xz

!tar xvf kaldi.tar.xz -C / > /dev/null
%rm kaldi.tar.xz

!for f in $(find /opt/kaldi -name '*.so*') ; do ln -sf "$f" /usr/local/lib/$(basename "$f") ; done
!for f in $(find /opt/kaldi/src -not -name '*.so*' -type f -executable) ; do ln -s "$f" /usr/local/bin/$(basename "$f") ; done
!for f in $(find /opt/kaldi/tools -not -name '*.so*' -type f -executable) ; do ln -s "$f" /usr/local/bin/$(basename "$f") ; done

!ldconfig
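
For readers who prefer to do the linking step from a Python cell, the two executable-linking loops above can be approximated as follows. This is only a sketch, not part of the release: the function name `link_executables` and its `overwrite` parameter are hypothetical, and it assumes the archive has already been unpacked under /opt/kaldi.

```python
import fnmatch
import os
import tempfile
from pathlib import Path


def link_executables(src_root, dest_dir, overwrite=False):
    """Symlink executable regular files under src_root into dest_dir.

    Hypothetical helper mirroring
    `find SRC -not -name '*.so*' -type f -executable` followed by `ln -s`.
    Returns the sorted list of file names that were linked.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    linked = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            path = Path(dirpath) / name
            if fnmatch.fnmatch(name, "*.so*"):
                continue  # shared libraries are handled separately
            if path.is_symlink() or not path.is_file():
                continue  # like `-type f`, skip symlinks and special files
            if not os.access(path, os.X_OK):
                continue  # like `-executable`
            target = dest / name
            if target.is_symlink() or target.exists():
                if not overwrite:
                    continue  # keep whatever is already installed
                target.unlink()
            target.symlink_to(path.resolve())
            linked.append(name)
    return sorted(linked)
```

In Colab this would be called as `link_executables("/opt/kaldi/src", "/usr/local/bin")` and again for `/opt/kaldi/tools`. By default an existing entry in the destination is left alone, whereas plain `ln -s` would report an error for it; pass `overwrite=True` to get behaviour closer to `ln -sf`.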