STT-WER-Python

Utilities for

  • Transcribing a set of audio files with Speech to Text (STT)
  • Analyzing the error rate of the STT transcription against a known-good transcription

More documentation

This readme describes the tools in depth. For more information on use cases and methodology, please see the following articles:

You may also find useful:

  • TTS-Python - companion tooling for IBM Text to Speech

Installation

Requires a Python 3.x installation.

All of the watson-stt-wer-python dependencies are installed at once with pip:

pip install -r requirements.txt
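
If you prefer to keep the project's dependencies isolated, a standard (optional) approach is to install them inside a virtual environment first, for example:

# optional: create and activate a virtual environment, then install as above
python3 -m venv .venv
source .venv/bin/activate        # on Windows use: .venv\Scripts\activate
pip install -r requirements.txt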

Note: If you receive an SSL certificate error (CERTIFICATE_VERIFY_FAILED) when running the Python scripts, try the following commands to tell Python to use the system certificate store.

Windows

pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org python-certifi-win32

MacOS

Open a terminal and change to the location of your Python installation, then run Install Certificates.command, for example:

cd "/Applications/Python 3.6"
./"Install Certificates.command"

Setup

Create a copy of config.ini.sample. You'll modify this file in subsequent steps.

cp config.ini.sample config.ini

Each sub-section below describes the configuration parameters it needs.

Transcription

Uses the IBM Watson Speech to Text service to transcribe a folder full of audio files and creates a CSV with the transcriptions.

Setup

Update the parameters in your config.ini file.

Required configuration parameters:

  • apikey - API key for your Speech to Text instance
  • service_url - Reference URL for your Speech to Text instance
  • base_model_name - Base model for Speech to Text transcription

Optional configuration parameters:

  • language_model_id - Language model customization ID (comment out to use base model)
  • acoustic_model_id - Acoustic model customization ID (comment out to use base model)
  • grammar_name - Grammar name (comment out to use base model)
  • stt_transcriptions_file - Output file for Speech to Text transcriptions
  • audio_file_folder - Input directory containing your audio files
  • reference_transcriptions_file - Reference file for manually transcribed audio files ("labeled data" or "ground truth"). If present, it will be merged into stt_transcriptions_file as a "Reference" column
  • stemming - If True, pre-processing stems words with Porter stemmer. Stemming will treat singular/plural of a word as equivalent, rather than a word error.
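
For orientation, a minimal config.ini for transcription might look like the sketch below. The [Transcriptions] section name matches the sample shown later in this readme; the [SpeechToText] section name and the example values are assumptions, so keep the exact section and key names from config.ini.sample.

; minimal sketch only -- keep the exact section and key names from config.ini.sample
; [SpeechToText] is an assumed section name; check the sample file
[SpeechToText]
apikey=<your Speech to Text API key>
service_url=https://api.us-south.speech-to-text.watson.cloud.ibm.com
base_model_name=en-US_Telephony
; language_model_id=<customization id>   (optional: leave commented out to use the base model)

[Transcriptions]
audio_file_folder=./audios
stt_transcriptions_file=./stt_transcriptions.csv
reference_transcriptions_file=./reference_transcriptions.csv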

Execution

Assuming your configuration is in config.ini, transcribe all the audio files in the folder specified by the audio_file_folder parameter with the following command:

python transcribe.py config.ini

Output

Transcriptions are stored in the CSV file specified by the stt_transcriptions_file parameter, in a format like the one below:

Audio File    Transcription
file1.wav     The quick brown fox
file2.wav     jumped over the lazy dog

A third column, "Reference", is included with the reference transcription if a reference_transcriptions_file is configured as a source.

Analysis

A simple Python utility to approximate the Word Error Rate (WER), Match Error Rate (MER), Word Information Lost (WIL), and Word Information Preserved (WIP) of one or more transcripts.

Setup

Your config file must set the reference_transcriptions_file and stt_transcriptions_file properties.

  • Reference file (reference_transcriptions_file) is a CSV file with at least the columns Audio File Name and Reference. The Reference is the actual transcription of the audio file (also known as the "ground truth" or "labeled data"). NOTE: Make sure the audio file name includes the full path (e.g. ./audio1.wav)
  • Hypothesis file (stt_transcriptions_file) is a CSV file with at least the columns Audio File Name and Hypothesis. The Hypothesis is the transcription of the audio file by the Speech to Text engine. The transcribe.py script can create this file.
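
As an illustration only (the paths and transcript text below are made up), the two CSV files might look like:

Audio File Name,Reference
./audios/file1.wav,the quick brown fox
./audios/file2.wav,jumped over the lazy dog

Audio File Name,Hypothesis
./audios/file1.wav,the quick brown fox
./audios/file2.wav,jumped over the lazy fog

Here the single-word difference in file2.wav ("fog" vs. "dog") is the kind of substitution error the analysis would report.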

Execution

python analyze.py config.ini

Experiment

Use the experiment.py script to execute a series of transcription/analysis experiments where the configuration settings may change for each experiment. This option requires customization for the specific configurations to be tested; make the changes in the run_all_experiments function.

python experiment.py config.ini
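
If you prefer not to modify the script, a rough shell-level equivalent is to loop over per-experiment config files. The directory layout in this sketch is assumed (it mirrors the sample setup described later in this readme):

# hedged sketch: run transcription and analysis for every experiment config
for cfg in ClientName-data/experiments/*/*/config.ini; do
    python transcribe.py "$cfg"
    python analyze.py "$cfg"
done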

Results

The script creates two output files, at the locations specified by the details_file and summary_file properties.

  • Details (details_file) is a CSV file with rows for each audio sample, including reference and hypothesis transcription and specific transcription errors
  • Summary (summary_file) is a JSON file with metrics for total transcriptions and overall word and sentence error rates.

Metrics (Definitions)

  • WER (word error rate), commonly used in ASR assessment, measures the cost of restoring the output word sequence to the original input sequence.
  • MER (match error rate) is the proportion of I/O word matches which are errors.
  • WIL (word information lost) is a simple approximation to the proportion of word information lost which overcomes the problems associated with the RIL (relative information lost) measure that was proposed half a century ago.

Background on supporting library

Package page for the Python module JIWER: https://pypi.org/project/jiwer/

It computes the minimum-edit distance between the ground-truth sentence and the hypothesis sentence of a speech-to-text API. The minimum-edit distance is calculated using the C-based Python module python-Levenshtein.
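
As a minimal illustration (not part of this repo, and assuming a recent jiwer release that exposes the wer, mer, wil, and wip helpers), the metrics above can be computed directly for a single sentence pair:

# standalone sketch using jiwer; the sentences are made up
import jiwer

reference = "the quick brown fox jumped over the lazy dog"
hypothesis = "the quick brown fox jumps over the lazy"

# WER = (S + D + I) / N, where S, D, I count the substitutions, deletions and
# insertions in the minimum-edit alignment and N is the reference word count
print("WER:", jiwer.wer(reference, hypothesis))
print("MER:", jiwer.mer(reference, hypothesis))
print("WIL:", jiwer.wil(reference, hypothesis))
print("WIP:", jiwer.wip(reference, hypothesis))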

Model training

The models.py script has wrappers for many model-related tasks including creating models, updating training contents, getting model details, and training models.

Setup

Update the parameters in your config.ini file.

Required configuration parameters:

  • apikey - API key for your Speech to Text instance
  • service_url - Reference URL for your Speech to Text instance
  • base_model_name - Base model for Speech to Text transcription

Execution

For general help, execute:

python models.py

The script requires a type (one of base_model, custom_model, corpus, word, grammar) and an operation (one of list, get, create, update, delete). The script optionally takes a config file as an argument with -c config_file_name_goes_here; otherwise it uses a default file of config.ini, which contains the connection details for your Speech to Text instance. Depending on the specified operation, the script also accepts a name, description, and file for an associated resource. For instance, new custom models should have a name and description, and a corpus should have a name and an associated file.

Examples

List all base models:

python models.py -o list -t base_model

List all custom models:

python models.py -o list -t custom_model

Create a custom model:

python models.py -o create -t custom_model -n "model1" -d "my first model"

Add a corpus file to a custom model (the custom model's customization_id is stored in config.ini.model1; corpus1.txt contains the corpus contents):

python models.py -c config.ini.model1 -o create -n "corpus1" -f "corpus1.txt" -t corpus

List all corpora for a custom model (the custom model's customization_id is stored in config.ini.model1):

python models.py -c config.ini.model1 -o list -t corpus

Train a custom model (the custom model's customization_id is stored in config.ini.model1):

python models.py -c config.ini.model1 -o update -t custom_model

Note that some parameter combinations are not valid. All supported operations wrap the SDK methods documented at https://cloud.ibm.com/apidocs/speech-to-text.

Sample setup for organizing multiple experiments

These instructions describe a directory structure for organizing the input and output files of experiments across multiple models. Creating a new directory structure is recommended for each new model being experimented with and tested. A sample MemberID model is shown.

  1. Start from the root of the WER tool directory, cd WATSON-STT-WER-PYTHON
  2. Create project directory, mkdir -p <project name>
    1. e.g. mkdir -p ClientName-data
  3. Create audio directory, mkdir -p <project name>/audios/<audio type>
    1. e.g. mkdir -p ClientName-data/audios/audio.memberID
    2. copy/upload audio files to directory
      1. e.g. cp /temp/audio/*.wav ClientName-data/audios/audio.memberID
  4. Create reference transcriptions directory, mkdir -p <project name>/reference_transcriptions
    1. e.g. mkdir -p ClientName-data/reference_transcriptions
    2. copy/upload transcription file to directory
      1. e.g. cp /temp/transcriptions/reference_transcription_memberID.csv ClientName-data/reference_transcriptions
  5. Create experiments directory, mkdir -p <project name>/experiments/<model description base>/<model detail>
    1. e.g. mkdir -p ClientName-data/experiments/telephony_base/MemberID/
  6. Copy sample config file over to directory
    1. e.g. cp config.ini.sample ClientName-data/experiments/telephony_base/MemberID/config.ini
    2. Edit the config file to match your new directory structure
      base_model_name=en-US_Telephony
      .
      .
      .
      [Transcriptions]
      reference_transcriptions_file=./ClientName-data/reference_transcriptions/reference_transcription_memberID.csv
      stt_transcriptions_file=./ClientName-data/experiments/telephony_base/MemberID/stt_transcription.csv
      audio_file_folder=./ClientName-data/audios/audio.memberID
      
      [ErrorRateOutput]
      details_file=./ClientName-data/experiments/telephony_base/MemberID/wer_detailsMemberID.csv
      summary_file=./ClientName-data/experiments/telephony_base/MemberID/wer_summaryMemberID.json
      word_accuracy_file=./ClientName-data/experiments/telephony_base/MemberID/wer_word_accuracyMemberID.csv
      stt_transcriptions_file=./ClientName-data/experiments/telephony_base/MemberID/stt_transcription.csv
      
  7. Transcribe using the new config file, python transcribe.py ClientName-data/experiments/telephony_base/MemberID/config.ini
  8. Analyze using the new config file, python analyze.py ClientName-data/experiments/telephony_base/MemberID/config.ini
  9. Repeat the previous steps for each new experiment
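
For convenience, the steps above can be collected into a single shell session (using the same ClientName-data/MemberID example names; adjust the paths to your project):

# consolidated recap of steps 1-9 using the example names from above
cd WATSON-STT-WER-PYTHON
mkdir -p ClientName-data/audios/audio.memberID
cp /temp/audio/*.wav ClientName-data/audios/audio.memberID
mkdir -p ClientName-data/reference_transcriptions
cp /temp/transcriptions/reference_transcription_memberID.csv ClientName-data/reference_transcriptions
mkdir -p ClientName-data/experiments/telephony_base/MemberID/
cp config.ini.sample ClientName-data/experiments/telephony_base/MemberID/config.ini
# edit the copied config.ini as shown in step 6, then:
python transcribe.py ClientName-data/experiments/telephony_base/MemberID/config.ini
python analyze.py ClientName-data/experiments/telephony_base/MemberID/config.ini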
