GitHub - IBM/vira-dialog-act-classification: Dialog-Act classifier of the VIRA c...
source link: https://github.com/IBM/vira-dialog-act-classification
vira-dialog-act-classification
Scope
The purpose of this project is to train and serve a dialog-act classification language model for the VIRA chatbot.
Usage
This repo should be used in the following scenarios, in this order:
Adding new dialog-acts
Adding new dialog-acts is normally done on a maintainer's personal computer using a CSV editor.
If the repo does not exist on your computer, clone it using the command:
git clone https://github.com/IBM/vira-dialog-act-classification.git
To add new dialog-acts:
- Make sure you are using the latest version of the repo by running the command:
git pull
- Edit the CSV files under dialog-act_dataset using a CSV editor.
- Commit your changes to the repo. It is recommended to create a Pull Request for your changes, as described under Maintenance.
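Before committing, it can help to sanity-check the edited files. The following minimal Python sketch is not part of the repo; it assumes each file under dialog-act_dataset is a UTF-8 CSV with a header row and that no cell should be left empty. The function names are hypothetical, and the checks should be adjusted to the actual column layout.

```python
import csv
from pathlib import Path


def check_csv(path):
    """Return a list of problems found in one CSV file (assumes a header row)."""
    problems = []
    seen = set()
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return [f"{path}: file is empty"]
        for row in reader:
            if not row:
                continue  # skip blank lines
            line = reader.line_num  # physical line on which this record ends
            if len(row) != len(header):
                problems.append(f"{path}:{line}: expected {len(header)} fields, got {len(row)}")
            if any(not cell.strip() for cell in row):
                problems.append(f"{path}:{line}: empty cell")
            key = tuple(cell.strip() for cell in row)
            if key in seen:
                problems.append(f"{path}:{line}: duplicate row")
            seen.add(key)
    return problems


def check_dataset(root="dialog-act_dataset"):
    """Check every *.csv file under `root` (recursively) and collect all problems."""
    problems = []
    for path in sorted(Path(root).rglob("*.csv")):
        problems.extend(check_csv(path))
    return problems
```

Running `print("\n".join(check_dataset()))` from the repo root before `git commit` lists any suspicious rows; an empty result means the assumed checks passed.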
Packaging the repository for deployment
This step is required only when there are changes to the Python code. It can be executed from the computer used for adding new dialog-acts.
Pre-requisites:
- If the repo does not exist on your computer, clone it as shown in the previous section.
- Make sure that Docker Desktop is installed on your computer.
- Create a repository on Docker Hub, as explained in the Docker Hub documentation.
To package the repo for deployment:
- Build the image, tagging it with your Docker Hub repository:
docker build . -t <hub-user>/<repo-name>:vira-dialog-act-classifier
- Push the image:
docker push <hub-user>/<repo-name>:vira-dialog-act-classifier
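The build and push steps must refer to the same image reference, otherwise `docker push` cannot find the image to upload. A minimal Python sketch that derives both commands from a single reference; `hub_user` and `repo_name` stand for the `<hub-user>` and `<repo-name>` placeholders, and `package`, `image_ref` and `dry_run` are hypothetical names, not part of the repo.

```python
import subprocess

IMAGE_TAG = "vira-dialog-act-classifier"


def image_ref(hub_user, repo_name, tag=IMAGE_TAG):
    """Build the <hub-user>/<repo-name>:<tag> reference shared by build and push."""
    return f"{hub_user}/{repo_name}:{tag}"


def package(hub_user, repo_name, tag=IMAGE_TAG, context=".", dry_run=True):
    """Return the docker build/push commands; execute them only when dry_run is False."""
    ref = image_ref(hub_user, repo_name, tag)
    commands = [
        ["docker", "build", context, "-t", ref],  # build from the repo root
        ["docker", "push", ref],                  # push exactly the tag just built
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing command
    return commands
```

Calling `package("<hub-user>", "<repo-name>")` only prints-ready lists the commands; passing `dry_run=False` runs them, which requires being logged in to Docker Hub (`docker login`).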
Training a new dialog-act classification model
In many cases, training a new model can be done on the same computer that was used for adding new dialog-acts. However, it is also possible to use a separate computer, preferably one with a GPU, for faster execution.
If this repo does not exist on the computer used for training, clone it using the command:
git clone https://github.com/IBM/vira-dialog-act-classification.git
Then set up the Python environment:
- Make sure you have Python 3.7+ installed.
- Open a shell and change to the repo root directory.
- Create a new Python virtual environment:
python -m venv venv
- Activate the virtual environment using:
source venv/bin/activate
- Install the dependencies using:
pip install -r requirements.txt
- Deactivate the virtual environment by running:
deactivate
- Register at HuggingFace and obtain your authentication token from the tokens page.
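The environment steps above can also be scripted with Python's standard `venv` module. A minimal sketch, assuming it is run from the repo root; `setup_env` and `env_python` are hypothetical helpers. Instead of activating the environment (activation only affects an interactive shell), it calls pip through the environment's own interpreter, which has the same effect as `pip install -r requirements.txt` inside the activated venv.

```python
import os
import subprocess
import sys
import venv
from pathlib import Path

MIN_PYTHON = (3, 7)  # the README requires Python 3.7+


def env_python(env_dir):
    """Path of the interpreter inside a virtual environment."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def setup_env(env_dir="venv", requirements="requirements.txt", install=True):
    """Create a venv (like `python -m venv venv`) and optionally install requirements."""
    if sys.version_info < MIN_PYTHON:
        raise RuntimeError(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required")
    # symlinks on POSIX mirror the default behaviour of `python -m venv`
    venv.create(env_dir, with_pip=install, symlinks=(os.name != "nt"))
    python = env_python(env_dir)
    if install:
        subprocess.run([str(python), "-m", "pip", "install", "-r", requirements], check=True)
    return python
```

With `install=False` the sketch only creates the environment, which is useful for checking that the interpreter meets the version requirement.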
To train a new model:
- Make sure you are using the latest version of the repo by running the command:
git pull
- Activate the virtual environment using:
source venv/bin/activate
- Run the trainer script and wait until it finishes:
python trainer.py
- Upload the new model and the dataset to the HuggingFace hub using the command:
python upload.py <your_auth_token>
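For readers who want to see what such an upload involves, here is a minimal sketch using the `huggingface_hub` client library. It is not the repo's upload.py: the `--model-dir`, `--model-repo`, `--dataset-dir` and `--dataset-repo` options are hypothetical, and the target repositories are assumed to already exist on the hub.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; the auth token comes first, as in `python upload.py <your_auth_token>`."""
    parser = argparse.ArgumentParser(description="Upload a trained model and dataset (sketch)")
    parser.add_argument("auth_token", help="token from the HuggingFace tokens page")
    parser.add_argument("--model-dir", required=True, help="local folder written by the trainer (hypothetical)")
    parser.add_argument("--model-repo", required=True, help="hub model repo id, e.g. <hf-user>/<model-name>")
    parser.add_argument("--dataset-dir", default="dialog-act_dataset", help="local dataset folder")
    parser.add_argument("--dataset-repo", required=True, help="hub dataset repo id")
    return parser.parse_args(argv)


def upload(args):
    """Push the model folder and the dataset folder to their hub repositories."""
    from huggingface_hub import HfApi  # imported here so parse_args works without the package

    api = HfApi()
    api.upload_folder(repo_id=args.model_repo, folder_path=args.model_dir,
                      repo_type="model", token=args.auth_token)
    api.upload_folder(repo_id=args.dataset_repo, folder_path=args.dataset_dir,
                      repo_type="dataset", token=args.auth_token)
```

A script built on this sketch would end with `upload(parse_args())`; keeping argument parsing separate from the network calls makes the command-line handling easy to test on its own.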
Deploying a dialog-act classification model
Deployment is normally done on a remote server that is publicly available on the web and supports containerized services, for example a Kubernetes cluster. For testing purposes, however, it is possible to deploy on a personal computer. Hardware with a GPU is recommended but not mandatory.
To deploy the model on a remote server:
- Configure the platform used for containerized services to run the docker image
<hub-user>/<repo-name>:vira-dialog-act-classifier
- Verify that the service is running by opening a browser at the URL
https://<server-ip>/health
To deploy the model on a personal computer:
- Make sure you have Docker Desktop installed
- Run:
docker run -p 8000:8000 <hub-user>/<repo-name>:vira-dialog-act-classifier
- Verify that the service is running by opening a browser at the URL
https://127.0.0.1:8000/health
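The browser check can be automated, for example in a deployment script that waits for the container to finish starting. A minimal sketch using only the Python standard library; `wait_for_health` and its `timeout`/`interval` parameters are hypothetical, and it assumes the service answers HTTP 200 on `/health` once it is ready.

```python
import http.client
import time
import urllib.request


def wait_for_health(url, timeout=60.0, interval=2.0):
    """Poll `url` until it returns HTTP 200; give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError/HTTPError are OSError subclasses: refused connection,
            # TLS failure, non-2xx status, or a timeout while the service starts
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Pass it the same health URL you would open in the browser; it returns `True` as soon as the service responds and `False` if it never does within the timeout.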
Maintenance
Pull requests are very welcome! Make sure your patches are well tested. Ideally, create a topic branch for every separate change you make. For example:
- Create your feature branch:
git checkout -b my-new-feature
- Commit your changes:
git commit -am 'Added some feature'
- Push to the branch:
git push origin my-new-feature
- Create a new Pull Request and ask another person to review and merge it.
License
All source files must include a Copyright and License header. The SPDX license header is preferred because it can be easily scanned.
See the LICENSE file in the repository for the full license text.
#
# Copyright 2020- IBM Inc. All rights reserved
# SPDX-License-Identifier: Apache-2.0
#
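Because the SPDX line has a fixed form, checking for it is easy to automate. A minimal Python sketch that lists source files whose first lines lack the expected identifier; the `*.py` pattern, the 10-line search window, the skipped `venv` folder and the function names are assumptions, not repo policy.

```python
import re
from pathlib import Path

SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(\S+)")


def spdx_id(path, max_lines=10):
    """Return the SPDX identifier found in the first `max_lines` lines, or None."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for _, line in zip(range(max_lines), f):
            match = SPDX_RE.search(line)
            if match:
                return match.group(1)
    return None


def missing_headers(root=".", pattern="*.py", expected="Apache-2.0"):
    """List files under `root` matching `pattern` whose SPDX identifier is not `expected`."""
    root = Path(root)
    bad = []
    for path in sorted(root.rglob(pattern)):
        if "venv" in path.relative_to(root).parts:
            continue  # skip the local virtual environment
        if spdx_id(path) != expected:
            bad.append(path)
    return bad
```

Files with a misspelled identifier (for example `Apache2.0` instead of the valid SPDX identifier `Apache-2.0`) are reported as well, since the comparison is exact.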
More Information
More information can be found in these files:
Notes
If you have any questions or issues, you can open a new issue in the repository's GitHub issue tracker.