GitHub - IBM/video-summarizer-using-watson: Summarize Video and Audio files usin...
source link: https://github.com/IBM/video-summarizer-using-watson
Summarize a video or audio file using Watson
In this code pattern, we will [create something] using [technologies] and [components]. [Explain briefly how things work]. [Give acknowledgements to others if necessary]
When you have completed this code pattern, you will understand how to:
- [goal 1]
- [goal 2]
- [goal 3]
- [goal 4]
- Step 1.
- Step 2.
- Step 3.
- Step 4.
- Step 5.
Watch the Video
Steps
1. Clone the repo
Clone the `video-summarizer-using-watson` repo locally. In a terminal, run:
git clone https://github.com/IBM/video-summarizer-using-watson.git
Application Directory structure
The application is built on the Python Flask framework.
- The directory structure is as follows:

      .
      ├── Dockerfile
      ├── LICENSE
      ├── Notebooks
      │   ├── IBM Watson Speech to Text custom model training.ipynb
      │   └── Summarize.ipynb
      ├── Procfile
      ├── README.md
      ├── apis
      │   ├── __init__.py
      │   ├── summarizer.py
      │   ├── videoUtils.py
      │   └── watsonSpeechToText.py
      ├── app.py
      ├── deploy.yaml
      ├── manifest.yml
      ├── requirements.txt
      ├── static
      │   ├── audios
      │   ├── chunks
      │   ├── credentials
      │   │   └── speechtotext.json
      │   ├── css
      │   │   └── style.css
      │   ├── images
      │   ├── js
      │   │   └── script.js
      │   ├── transcripts
      │   └── videos
      │       └── wc.png
      └── templates
          └── index.html
- `apis/` contains the API endpoints.
  - `/api/v1.0/uploadVideo`: uploads the video file, extracts the audio from it, detects long pauses in the audio, and splits the audio into chunks.
  - `/api/v1.0/transcribe/<string:model>`: transcribes the audio files using Watson Speech to Text.
  - `/api/v1.0/summarize`: summarizes the text using the GPT-2, Gensim, and XLNet summarizers.
- `static/` contains the following static files.
  - `credentials/` contains the credentials for Watson Speech to Text.
  - `videos/` contains the uploaded video files.
  - `audios/` contains the extracted audio files.
  - `transcripts/` contains the transcribed text files.
  - `chunks/` contains the audio chunks.
  - `css/` contains the CSS files.
  - `js/` contains the JavaScript files.
- `templates/` contains the HTML templates.
- `app.py` is the main application file to run the Flask server.
- `Dockerfile` is the Dockerfile to build the Docker image.
- `requirements.txt` is the list of requirements for the application.
- `deploy.yaml` is the deployment configuration file.
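
The `uploadVideo` endpoint is described as detecting long pauses and splitting the audio into chunks. The repo's own logic in `apis/videoUtils.py` is not shown here, so the following is only a minimal sketch of the general idea on a plain list of amplitude samples. The function name `split_on_pauses` and the parameters `silence_threshold` and `min_pause_seconds` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def split_on_pauses(
    samples: List[float],
    sample_rate: int,
    silence_threshold: float = 0.02,  # hypothetical amplitude cutoff for "quiet"
    min_pause_seconds: float = 1.0,   # hypothetical length of a "long pause"
) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index ranges of chunks separated by long pauses.

    The end index is exclusive, so samples[start:end] is the chunk.
    """
    min_pause = int(min_pause_seconds * sample_rate)
    chunks = []
    chunk_start = None  # index where the current chunk began, if any
    silent_run = 0      # consecutive quiet samples seen inside the current chunk

    for i, value in enumerate(samples):
        if abs(value) >= silence_threshold:
            if chunk_start is None:
                chunk_start = i
            silent_run = 0
        elif chunk_start is not None:
            silent_run += 1
            if silent_run >= min_pause:
                # The pause is long enough: close the chunk where the pause began.
                chunks.append((chunk_start, i - silent_run + 1))
                chunk_start = None
                silent_run = 0

    if chunk_start is not None:
        # Close the last chunk, trimming any trailing quiet samples.
        chunks.append((chunk_start, len(samples) - silent_run))
    return chunks
```

Pauses shorter than `min_pause_seconds` stay inside a chunk, so brief hesitations do not fragment a sentence. A real implementation would read the samples from the extracted audio file rather than from a Python list.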
2. Create Watson Services
2.1. Create Watson Speech to Text service on IBM Cloud
- Log in to IBM Cloud, create a Watson Speech to Text service, and click `Create`.
- In the Speech to Text dashboard, click `Service credentials`.
- Click `New credential` and add a service credential.
- Copy the credentials.
2.2. Add Watson Speech to Text credentials to the application
- Add the Watson Speech to Text credentials to the `static/credentials/speechtotext.json` file.

      {
        "apikey": "xxxx",
        "iam_apikey_description": "xxxx",
        "iam_apikey_name": "xxxx",
        "iam_role_crn": "xxxx",
        "iam_serviceid_crn": "xxxx",
        "url": "xxxx"
      }
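
Before starting the server, it can help to confirm that the credentials file parses and no longer contains the `xxxx` placeholders. The sketch below is not part of the repo. The helper name `load_stt_credentials` is hypothetical, and treating `apikey` and `url` as the only required fields is an assumption.

```python
import json
from pathlib import Path

# Assumption: these two fields are the minimum needed to reach the service.
REQUIRED_KEYS = ("apikey", "url")


def load_stt_credentials(path="static/credentials/speechtotext.json"):
    """Read the credentials file and fail early if a required value is a placeholder."""
    creds = json.loads(Path(path).read_text(encoding="utf-8"))
    for key in REQUIRED_KEYS:
        value = creds.get(key, "")
        if not value or value == "xxxx":
            raise ValueError(f"'{key}' is missing or still a placeholder in {path}")
    return creds
```

Calling `load_stt_credentials()` from the repo root returns the parsed dictionary, or raises `ValueError` naming the first field that still needs to be filled in.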
3. Run the Application
You can run the application locally, deploy it on Red Hat OpenShift, or deploy it on IBM Public Cloud Foundry.
4. Analyze the Application
- Upload any video or audio file (.mp4/.mov or .mp3/.wav). You can use the datasets provided in the repo: `data/earnings-call-2019.mp4` or `data/earnings-call-Q-and-A.mp4`.
About the Dataset
- Select the Watson Speech to Text language and acoustic model.

  A custom language model is built to recognize out-of-vocabulary words in the audio.

  A custom acoustic model is built to recognize the speaker's accent in the audio.

  NOTE: A Standard account is required to train a custom Speech to Text model. There are three plans: Lite (free), and Standard and Premium (paid). For more information, visit https://cloud.ibm.com/catalog/services/speech-to-text

  You can refer to the `IBM Watson Speech to Text custom model training.ipynb` notebook to learn in detail how to build and train custom Watson Speech to Text models.

- Processing the speaker diarized output, summary, and transcript takes approximately as long as the duration of the video.

- You can view the speaker diarized output.

  Speaker diarization is the process of identifying which speaker said what in an audio recording that contains multiple speakers.
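
To make the diarized output concrete, here is a minimal sketch of turning a recognition response into speaker turns. It assumes the response carries word `timestamps` (as `[word, start, end]` triples) and a `speaker_labels` list with `from`, `to`, and `speaker` fields, which is how the Watson Speech to Text response is laid out as far as I know when speaker labels are requested. The function name `speaker_turns` is hypothetical, and this is not the repo's implementation.

```python
def speaker_turns(response):
    """Group consecutive words by speaker, returning (speaker, text) pairs."""
    # Map each word's start time to the speaker label reported for it.
    speaker_at = {
        label["from"]: label["speaker"]
        for label in response.get("speaker_labels", [])
    }

    turns = []  # list of (speaker, [words]) in spoken order
    for result in response.get("results", []):
        best = result["alternatives"][0]
        for word, start, _end in best.get("timestamps", []):
            speaker = speaker_at.get(start)  # None if no label matches this word
            if turns and turns[-1][0] == speaker:
                turns[-1][1].append(word)
            else:
                turns.append((speaker, [word]))

    return [(speaker, " ".join(words)) for speaker, words in turns]
```

Matching on the exact start time keeps the sketch short. A more tolerant version would match each word to the label whose `from`/`to` interval contains it.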
Summary
License
This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.