
Summarize a video or audio file using Watson

In this code pattern, we will create a web application that summarizes video and audio files using IBM Watson Speech to Text and open-source text summarization models. The application is built on the Python Flask framework: it extracts audio from an uploaded video, detects long pauses and splits the audio into chunks, transcribes the chunks with Watson Speech to Text (optionally using custom language and acoustic models), and then generates a speaker-diarized transcript and summaries with Gensim, GPT-2, XLNet, and KeyBERT.

When you have completed this code pattern, you will understand how to:

  • Extract audio from a video file and split it into chunks based on long pauses.
  • Transcribe audio using Watson Speech to Text, including custom language and acoustic models.
  • Produce a speaker-diarized transcript of the audio.
  • Summarize transcripts using the Gensim, GPT-2, XLNet, and KeyBERT summarizers.

architecture
  1. The user uploads a video or audio file through the Flask web application.
  2. The application extracts the audio track, detects long pauses, and splits the audio into chunks.
  3. Watson Speech to Text transcribes the chunks using the selected language and acoustic models.
  4. The Gensim, GPT-2, XLNet, and KeyBERT summarizers generate summaries of the transcript.
  5. The speaker-diarized output, summary, and transcript are displayed in the browser.

Watch the Video

[Video]

Steps

1. Clone the repo

Clone the video-summarizer-using-watson repo locally. In a terminal, run:

git clone https://github.com/IBM/video-summarizer-using-watson.git

Application directory structure

The application is built on the Python Flask framework.

  • The directory structure is as follows:

      .
      ├── Dockerfile
      ├── LICENSE
      ├── Notebooks
      │   ├── IBM Watson Speech to Text custom model training.ipynb
      │   └── Summarize.ipynb
      ├── Procfile
      ├── README.md
      ├── apis
      │   ├── __init__.py
      │   ├── summarizer.py
      │   ├── videoUtils.py
      │   └── watsonSpeechToText.py
      ├── app.py
      ├── deploy.yaml
      ├── manifest.yml
      ├── requirements.txt
      ├── static
      │   ├── audios
      │   ├── chunks
      │   ├── credentials
      │   │   └── speechtotext.json
      │   ├── css
      │   │   └── style.css
      │   ├── images
      │   ├── js
      │   │   └── script.js
      │   ├── transcripts
      │   └── videos
      │       └── wc.png
      └── templates
          └── index.html
      
  • apis/ contains the API endpoints.

    • /api/v1.0/uploadVideo: This API is used to upload the video file, extract audio from it, detect long pauses in the audio, and split the audio into chunks (see the sketch at the end of this section).
    • /api/v1.0/transcribe/<string:model>: This API is used to transcribe the audio files using Watson Speech to Text.
    • /api/v1.0/summarize: This API is used to summarize the text using the GPT-2, Gensim, and XLNet summarizers.
  • static/ contains the following static files.

    • credentials/ contains the credentials for Watson Speech to Text.
    • videos/ contains the uploaded video files.
    • audios/ contains the extracted audio files.
    • transcripts/ contains the transcribed text files.
    • chunks/ contains the audio chunks.
    • css/ contains the CSS files.
    • js/ contains the JavaScript files.
  • templates/ contains the HTML templates.

  • app.py is the main application file that runs the Flask server.

  • Dockerfile is the Dockerfile to build the Docker image.

  • requirements.txt is the list of requirements for the application.

  • deploy.yaml is the deployment configuration file.
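
The extraction and chunking logic described above lives in apis/videoUtils.py. As a rough, hedged sketch of what that pipeline can look like (not the repo's actual code; the use of moviepy and pydub, the paths, and the silence thresholds are all assumptions):

    # Hypothetical sketch of the upload pipeline; see apis/videoUtils.py for the real code.
    from moviepy.editor import VideoFileClip
    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    def extract_and_chunk(video_path):
        # Extract the audio track from the uploaded video.
        clip = VideoFileClip(video_path)
        audio_path = "static/audios/audio.wav"
        clip.audio.write_audiofile(audio_path)

        # Detect long pauses and split the audio into chunks.
        audio = AudioSegment.from_wav(audio_path)
        chunks = split_on_silence(
            audio,
            min_silence_len=1500,            # pauses >= 1.5 s split the audio (assumed value)
            silence_thresh=audio.dBFS - 16,  # threshold relative to average loudness (assumed)
        )
        for i, chunk in enumerate(chunks):
            chunk.export("static/chunks/chunk{}.wav".format(i), format="wav")
        return len(chunks)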

2. Create Watson Services

2.1. Create Watson Speech to Text service on IBM Cloud

  • Log in to IBM Cloud, create a Watson Speech to Text service, and click Create.

  • In the Speech to Text dashboard, click Service Credentials.

  • Click New credential and add a service credential.

  • Copy the credentials.

2.2. Add Watson Speech to Text credentials to the application

  • Add the Watson Speech to Text credentials in the static/credentials/speechtotext.json file.

    {
        "apikey": "xxxx",
        "iam_apikey_description": "xxxx",
        "iam_apikey_name": "xxxx",
        "iam_role_crn": "xxxx",
        "iam_serviceid_crn": "xxxx",
        "url": "xxxx"
    }
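
As a minimal sketch of how the application can authenticate with these credentials (assuming the ibm-watson Python SDK; the actual wiring lives in apis/watsonSpeechToText.py and may differ):

    import json

    from ibm_watson import SpeechToTextV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    # Load the service credentials saved in the previous step.
    with open("static/credentials/speechtotext.json") as f:
        creds = json.load(f)

    # Authenticate against Watson Speech to Text with the IAM API key.
    authenticator = IAMAuthenticator(creds["apikey"])
    speech_to_text = SpeechToTextV1(authenticator=authenticator)
    speech_to_text.set_service_url(creds["url"])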

3. Run the Application

You can choose to run the application locally, deploy it on Red Hat OpenShift, or deploy it on IBM Public Cloud Foundry:

  • Locally
  • Red Hat OpenShift
  • IBM Public Cloud Foundry
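
For the local option, a typical Flask workflow is shown below (these commands are an assumption; follow the repo's instructions if they differ):

    pip install -r requirements.txt
    python app.py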

4. Analyze the Application


  • Select the Watson Speech to Text language and acoustic model.

    A custom language model is built to recognize out-of-vocabulary words in the audio. Learn more

    A custom acoustic model is built to recognize the accent of the speaker in the audio. Learn more

    NOTE: A Standard account is required to train a custom Speech to Text model. There are three plans: Lite (free), Standard, and Premium (paid). For more information, visit https://cloud.ibm.com/catalog/services/speech-to-text.

    You can refer to the IBM Watson Speech to Text custom model training.ipynb notebook to learn in detail how to build and train custom Watson Speech to Text models; a short SDK sketch appears at the end of this section.

  • Click on Submit.

  • Processing takes approximately as long as the video itself to produce the Speaker Diarized Output, Summary, and Transcript.

  • You can view the Speaker Diarized Output.

Speaker diarization is the process of partitioning an audio stream into segments according to who is speaking. Learn more
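
With the Watson Speech to Text API, diarization is requested by setting the speaker_labels parameter on a recognition request. A hedged sketch, reusing the speech_to_text client from the credentials example (the chunk path and model name are placeholders):

    # Request speaker labels so Watson tags each utterance with a speaker id.
    with open("static/chunks/chunk0.wav", "rb") as audio_file:
        result = speech_to_text.recognize(
            audio=audio_file,
            content_type="audio/wav",
            model="en-US_BroadbandModel",
            speaker_labels=True,  # enables speaker diarization
        ).get_result()

    # result["speaker_labels"] pairs word timestamps with speaker ids,
    # which can be joined with the transcript to build the diarized output.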

  • You can view the summaries from Gensim, GPT-2, XLNet, and KeyBERT.

  • You can also view the transcript.
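
For orientation, creating and training a custom language model with the ibm-watson SDK looks roughly like the sketch below; the model name, corpus file, and description are placeholders, and the notebook referenced above is the authoritative walkthrough.

    # Create a custom language model on top of a base model.
    model = speech_to_text.create_language_model(
        "my-custom-model",
        "en-US_BroadbandModel",
        description="Recognizes domain-specific, out-of-vocabulary terms",
    ).get_result()
    customization_id = model["customization_id"]

    # Feed it a corpus of domain text, then train.
    with open("corpus.txt", "rb") as corpus_file:
        speech_to_text.add_corpus(customization_id, "corpus1", corpus_file)
    speech_to_text.train_language_model(customization_id)

    # After training completes, pass the id to recognize():
    #   speech_to_text.recognize(..., language_customization_id=customization_id)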

Summary

In this code pattern, you built a Flask application that extracts audio from a video, transcribes it with Watson Speech to Text (optionally with custom language and acoustic models), and produces a speaker-diarized transcript along with summaries from Gensim, GPT-2, XLNet, and KeyBERT.

License

This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.

Apache License FAQ

