
Summarize a video or audio file using Watson

In this code pattern, we will create a web application that summarizes video and audio files using IBM Watson Speech to Text and open-source text summarization models. The application is built on the Python Flask framework: it extracts audio from an uploaded video, detects long pauses and splits the audio into chunks, transcribes the chunks with Watson Speech to Text (optionally using custom language and acoustic models), and then generates a speaker-diarized transcript and summaries with Gensim, GPT-2, XLNet, and KeyBERT.

When you have completed this code pattern, you will understand how to:

  • Extract audio from a video file and split it into chunks based on long pauses.
  • Transcribe audio using Watson Speech to Text, including custom language and acoustic models.
  • Produce a speaker-diarized transcript of the audio.
  • Summarize transcripts using the Gensim, GPT-2, XLNet, and KeyBERT summarizers.

architecture
  1. The user uploads a video or audio file through the Flask web application.
  2. The application extracts the audio track, detects long pauses, and splits the audio into chunks.
  3. Watson Speech to Text transcribes the chunks using the selected language and acoustic models.
  4. The Gensim, GPT-2, XLNet, and KeyBERT summarizers generate summaries of the transcript.
  5. The speaker-diarized output, summary, and transcript are displayed in the browser.

Watch the Video

[Video]

Steps

1. Clone the repo

Clone the video-summarizer-using-watson repo locally. In a terminal, run:

git clone https://github.com/IBM/video-summarizer-using-watson.git

Application directory structure

The application is built on the Python Flask framework.

  • The directory structure is as follows:

      .
      ├── Dockerfile
      ├── LICENSE
      ├── Notebooks
      │   ├── IBM Watson Speech to Text custom model training.ipynb
      │   └── Summarize.ipynb
      ├── Procfile
      ├── README.md
      ├── apis
      │   ├── __init__.py
      │   ├── summarizer.py
      │   ├── videoUtils.py
      │   └── watsonSpeechToText.py
      ├── app.py
      ├── deploy.yaml
      ├── manifest.yml
      ├── requirements.txt
      ├── static
      │   ├── audios
      │   ├── chunks
      │   ├── credentials
      │   │   └── speechtotext.json
      │   ├── css
      │   │   └── style.css
      │   ├── images
      │   ├── js
      │   │   └── script.js
      │   ├── transcripts
      │   └── videos
      │       └── wc.png
      └── templates
          └── index.html
      
  • apis/ contains the API endpoints.

    • /api/v1.0/uploadVideo: This API is used to upload the video file, extract audio from it, detect long pauses in the audio, and split the audio into chunks (see the sketch at the end of this section).
    • /api/v1.0/transcribe/<string:model>: This API is used to transcribe the audio files using Watson Speech to Text.
    • /api/v1.0/summarize: This API is used to summarize the text using the GPT-2, Gensim, and XLNet summarizers.
  • static/ contains the following static files.

    • credentials/ contains the credentials for Watson Speech to Text.
    • videos/ contains the uploaded video files.
    • audios/ contains the extracted audio files.
    • transcripts/ contains the transcribed text files.
    • chunks/ contains the audio chunks.
    • css/ contains the CSS files.
    • js/ contains the JavaScript files.
  • templates/ contains the HTML templates.

  • app.py is the main application file that runs the Flask server.

  • Dockerfile is the Dockerfile to build the Docker image.

  • requirements.txt is the list of requirements for the application.

  • deploy.yaml is the deployment configuration file.
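
The extraction and chunking logic described above lives in apis/videoUtils.py. As a rough, hedged sketch of what that pipeline can look like (not the repo's actual code; the use of moviepy and pydub, the paths, and the silence thresholds are all assumptions):

    # Hypothetical sketch of the upload pipeline; see apis/videoUtils.py for the real code.
    from moviepy.editor import VideoFileClip
    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    def extract_and_chunk(video_path):
        # Extract the audio track from the uploaded video.
        clip = VideoFileClip(video_path)
        audio_path = "static/audios/audio.wav"
        clip.audio.write_audiofile(audio_path)

        # Detect long pauses and split the audio into chunks.
        audio = AudioSegment.from_wav(audio_path)
        chunks = split_on_silence(
            audio,
            min_silence_len=1500,            # pauses >= 1.5 s split the audio (assumed value)
            silence_thresh=audio.dBFS - 16,  # threshold relative to average loudness (assumed)
        )
        for i, chunk in enumerate(chunks):
            chunk.export("static/chunks/chunk{}.wav".format(i), format="wav")
        return len(chunks)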

2. Create Watson Services

2.1. Create Watson Speech to Text service on IBM Cloud

  • Log in to IBM Cloud, create a Watson Speech to Text service, and click Create.

  • In the Speech to Text dashboard, click Service Credentials.

  • Click New credential and add a service credential.

  • Copy the credentials.

2.2. Add Watson Speech to Text credentials to the application

  • Add the Watson Speech to Text credentials in the static/credentials/speechtotext.json file.

    {
        "apikey": "xxxx",
        "iam_apikey_description": "xxxx",
        "iam_apikey_name": "xxxx",
        "iam_role_crn": "xxxx",
        "iam_serviceid_crn": "xxxx",
        "url": "xxxx"
    }
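
As a minimal sketch of how the application can authenticate with these credentials (assuming the ibm-watson Python SDK; the actual wiring lives in apis/watsonSpeechToText.py and may differ):

    import json

    from ibm_watson import SpeechToTextV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    # Load the service credentials saved in the previous step.
    with open("static/credentials/speechtotext.json") as f:
        creds = json.load(f)

    # Authenticate against Watson Speech to Text with the IAM API key.
    authenticator = IAMAuthenticator(creds["apikey"])
    speech_to_text = SpeechToTextV1(authenticator=authenticator)
    speech_to_text.set_service_url(creds["url"])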

3. Run the Application

You can choose to run the application locally, deploy it on Red Hat OpenShift, or deploy it on IBM Public Cloud Foundry:

  • Locally
  • Red Hat OpenShift
  • IBM Public Cloud Foundry
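
For the local option, a typical Flask workflow is shown below (these commands are an assumption; follow the repo's instructions if they differ):

    pip install -r requirements.txt
    python app.py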

4. Analyze the Application


  • Select the Watson Speech to Text language and acoustic model.

    A custom language model is built to recognize out-of-vocabulary words in the audio. Learn more

    A custom acoustic model is built to recognize the accent of the speaker in the audio. Learn more

    NOTE: A Standard account is required to train a custom Speech to Text model. There are three plans: Lite (free), Standard, and Premium (paid). For more information, visit https://cloud.ibm.com/catalog/services/speech-to-text.

    You can refer to the IBM Watson Speech to Text custom model training.ipynb notebook to learn in detail how to build and train custom Watson Speech to Text models; a short SDK sketch appears at the end of this section.

  • Click on Submit.

  • Processing takes approximately as long as the video itself to produce the Speaker Diarized Output, Summary, and Transcript.

  • You can view the Speaker Diarized Output.

Speaker diarization is the process of partitioning an audio stream into segments according to who is speaking. Learn more
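
With the Watson Speech to Text API, diarization is requested by setting the speaker_labels parameter on a recognition request. A hedged sketch, reusing the speech_to_text client from the credentials example (the chunk path and model name are placeholders):

    # Request speaker labels so Watson tags each utterance with a speaker id.
    with open("static/chunks/chunk0.wav", "rb") as audio_file:
        result = speech_to_text.recognize(
            audio=audio_file,
            content_type="audio/wav",
            model="en-US_BroadbandModel",
            speaker_labels=True,  # enables speaker diarization
        ).get_result()

    # result["speaker_labels"] pairs word timestamps with speaker ids,
    # which can be joined with the transcript to build the diarized output.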

  • You can view the summaries from Gensim, GPT-2, XLNet, and KeyBERT.

  • You can also view the transcript.
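
For orientation, creating and training a custom language model with the ibm-watson SDK looks roughly like the sketch below; the model name, corpus file, and description are placeholders, and the notebook referenced above is the authoritative walkthrough.

    # Create a custom language model on top of a base model.
    model = speech_to_text.create_language_model(
        "my-custom-model",
        "en-US_BroadbandModel",
        description="Recognizes domain-specific, out-of-vocabulary terms",
    ).get_result()
    customization_id = model["customization_id"]

    # Feed it a corpus of domain text, then train.
    with open("corpus.txt", "rb") as corpus_file:
        speech_to_text.add_corpus(customization_id, "corpus1", corpus_file)
    speech_to_text.train_language_model(customization_id)

    # After training completes, pass the id to recognize():
    #   speech_to_text.recognize(..., language_customization_id=customization_id)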

Summary

In this code pattern, you built a Flask application that extracts audio from a video, transcribes it with Watson Speech to Text (optionally with custom language and acoustic models), and produces a speaker-diarized transcript along with summaries from Gensim, GPT-2, XLNet, and KeyBERT.

License

This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.

Apache License FAQ

