

GitHub - stanford-crfm/mistral: Mistral: A strong, northwesterly wind: Framework...
source link: https://github.com/stanford-crfm/mistral

Mistral
Mistral: A strong and cool northwesterly wind that builds as it moves, bringing good health and clear skies.
A framework for transparent and accessible large-scale language model training, built with Hugging Face. It includes tools and helpful scripts for incorporating new pre-training datasets, various schemes for single-node and distributed training (including on cloud providers like GCP), and, importantly, scripts for evaluation.
Visit our Read the Docs for the full documentation.
A Propulsion Endeavor
Community
Mistral is built to facilitate transparent and accessible training. To do our best to reach this goal, we will hold community meetings twice a month, where we'll give updates on where we are and what we're working on and, more importantly, hear from you about how we can help and possibly work together.
We would love for folks from academia, other community efforts, as well as those in industry to join - all are welcome. The first meeting will be on Monday, August 30th at 4 PM PT.
We'll post the future dates (and times - which we hope to move around through the day to maximally engage folks in varied timezones) after the first meeting!
Quickstart
Installation
The dependencies for Mistral can be installed using Conda. Note that the provided environment assumes that CUDA 11.0 is installed. You may need to adjust the environment YAML file depending on your setup.
git clone https://github.com/stanford-crfm/mistral.git
cd mistral
conda env create -f environments/environment-gpu.yaml  # Choose CUDA kernel based on the hardware!
If you are training on the CPU only, run conda env create -f environments/environment-cpu.yaml instead.
Training GPT-2 Micro
Prerequisites
First, make sure to update conf/tutorial-gpt2-micro.yaml with the directories where you want to store the Hugging Face cache and model runs.
# Artifacts & Caching
artifacts:
    cache_dir: /path/to/artifacts
    run_dir: /path/to/runs
Next, make sure that /path/to/mistral is on your PYTHONPATH.
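For scripts or notebooks where setting the PYTHONPATH environment variable is inconvenient, the same effect can be approximated at runtime. The sketch below is only an illustration: `ensure_on_path` is a hypothetical helper, not part of Mistral, and it modifies `sys.path` for the current process instead of setting PYTHONPATH itself.

```python
import sys
from pathlib import Path


def ensure_on_path(repo_root: str) -> bool:
    """Put repo_root on sys.path if it is missing.

    Hypothetical helper (not part of Mistral). Returns True when the
    directory was added and False when it was already present.
    """
    resolved = str(Path(repo_root).expanduser().resolve())
    if resolved in sys.path:
        return False
    # Insert at the front so the repository's `src` package wins over
    # any other package with the same name.
    sys.path.insert(0, resolved)
    return True
```

Calling `ensure_on_path("/path/to/mistral")` before importing from `src` would then play the role of the PYTHONPATH entry for that one process.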
Single-node single-GPU training
For single-node single-GPU training, run:
conda activate mistral
cd mistral
CUDA_VISIBLE_DEVICES=0 python train.py --config conf/tutorial-gpt2-micro.yaml --nnodes 1 --nproc_per_node 1 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 2 --run_id tutorial-gpt2-micro
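The command above overrides nested config values with dotted flags such as --training_arguments.fp16. When launching many runs from a script, it can help to build those flags from a dictionary. The following is a minimal sketch that assumes only the flag pattern visible in the command above; `to_cli_flags` is a hypothetical helper, not part of Mistral.

```python
from typing import Any, Dict, List


def to_cli_flags(overrides: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten a nested dict into dotted `--a.b value` flags (hypothetical helper)."""
    flags: List[str] = []
    for key, value in overrides.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so {"a": {"b": 1}} becomes "--a.b 1".
            flags.extend(to_cli_flags(value, name))
        elif isinstance(value, bool):
            # The command above spells booleans in lowercase.
            flags.extend([f"--{name}", "true" if value else "false"])
        else:
            flags.extend([f"--{name}", str(value)])
    return flags


overrides = {
    "config": "conf/tutorial-gpt2-micro.yaml",
    "nnodes": 1,
    "nproc_per_node": 1,
    "training_arguments": {"fp16": True, "per_device_train_batch_size": 2},
    "run_id": "tutorial-gpt2-micro",
}
command = ["python", "train.py"] + to_cli_flags(overrides)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` with `CUDA_VISIBLE_DEVICES` set in the environment.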
Multi-node multi-GPU training with DeepSpeed
Modify /job/hostfile in the following way:
<Hostname of first machine> slots=<Number of GPUs>
<Hostname of second machine> slots=<Number of GPUs>
...
<Hostname of the nth machine> slots=<Number of GPUs>
Below is an example hostfile where we train on machine1 and machine2 with 8 GPUs each:
machine1 slots=8
machine2 slots=8
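For larger clusters it may be easier to generate the hostfile than to write it by hand. The sketch below writes and reads the `<hostname> slots=<n>` layout shown above; both function names are hypothetical, and the parser assumes that blank lines and lines starting with `#` can be skipped.

```python
from typing import Dict


def format_hostfile(slots_by_host: Dict[str, int]) -> str:
    """Render one `<hostname> slots=<n>` line per machine."""
    lines = [f"{host} slots={slots}" for host, slots in slots_by_host.items()]
    return "\n".join(lines) + "\n"


def parse_hostfile(text: str) -> Dict[str, int]:
    """Read hostfile text back into a {hostname: gpu_count} mapping."""
    hosts: Dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank and comment lines carry no entries
        host, slot_spec = line.split()
        key, _, count = slot_spec.partition("=")
        if key != "slots":
            raise ValueError(f"unexpected hostfile entry: {raw!r}")
        hosts[host] = int(count)
    return hosts


text = format_hostfile({"machine1": 8, "machine2": 8})
total_gpus = sum(parse_hostfile(text).values())
print(text, end="")
print("total GPUs:", total_gpus)
```

Summing the slot counts gives the total number of GPUs, which is a quick sanity check against the node and GPU counts passed to the launcher.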
To start distributed training, run:
conda activate mistral
cd mistral
deepspeed --num_gpus 8 --num_nodes 2 --master_addr machine1 train.py --config conf/tutorial-gpt2-micro.yaml --nnodes 2 --nproc_per_node 8 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 4 --training_arguments.deepspeed conf/deepspeed/z1-conf.json --run_id tutorial-gpt2-micro-multi-node > tutorial-gpt2-micro-multi-node.out 2> tutorial-gpt2-micro-multi-node.err
Note: You may need to adjust your batch size depending on the capacity of your GPUs.
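When changing the per-device batch size, it helps to keep track of the resulting global batch size. The sketch below multiplies out the values from the commands above under the usual data-parallel assumption that every GPU processes its own batch each step. The `grad_accum_steps` parameter is an assumption that does not appear in the commands above and defaults to 1.

```python
def global_batch_size(
    per_device: int,
    gpus_per_node: int,
    num_nodes: int,
    grad_accum_steps: int = 1,
) -> int:
    """Examples consumed per optimizer step under plain data parallelism."""
    return per_device * gpus_per_node * num_nodes * grad_accum_steps


# Single-node, single-GPU command: batch size 2 on 1 GPU.
single = global_batch_size(per_device=2, gpus_per_node=1, num_nodes=1)

# Multi-node command: batch size 4 on 2 nodes with 8 GPUs each.
multi = global_batch_size(per_device=4, gpus_per_node=8, num_nodes=2)

print(single, multi)
```

If a GPU runs out of memory, halving `per_device` while doubling `grad_accum_steps` keeps this product unchanged.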
If you are interested in training a model on Google Cloud, check out our Google Cloud + Kubernetes Tutorial.
Using the model
Model checkpoints will be stored in the directory specified by artifacts.run_dir. An example checkpoint might be in /path/to/runs/tutorial-gpt2-micro/checkpoint-1000.
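To pick up the most recent checkpoint of a run without hard-coding the step, one option is to scan the run directory. This is a minimal sketch that assumes every checkpoint lives in a subdirectory named `checkpoint-<step>`, as in the example path above; `latest_checkpoint` is a hypothetical helper, not part of Mistral.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme, taken from the example path above.
CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(run_path: str) -> Optional[Path]:
    """Return the checkpoint directory with the highest step, or None."""
    best_step = -1
    best_dir: Optional[Path] = None
    for child in Path(run_path).iterdir():
        match = CHECKPOINT_PATTERN.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            # Compare steps numerically so checkpoint-1000 beats checkpoint-900.
            best_step = int(match.group(1))
            best_dir = child
    return best_dir
```

For example, `latest_checkpoint("/path/to/runs/tutorial-gpt2-micro")` would return the highest-numbered checkpoint directory of that run, or `None` if the run has not saved one yet.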
Mistral stores model checkpoints in the Hugging Face format, so models can be loaded and used in the same manner as if one had trained the model with Hugging Face.
For instance, to generate text with Transformers (you will need to clone the transformers repo):
conda activate mistral
cd transformers/examples/text-generation
python run_generation.py --model_type=gpt2 --model_name_or_path=/path/to/runs/tutorial-gpt2-micro/checkpoint-1000
Or to load the model in Python code (make sure /path/to/mistral is in your PYTHONPATH):
from src.models.mistral_gpt2 import MistralGPT2LMHeadModel

model = MistralGPT2LMHeadModel.from_pretrained("/path/to/runs/tutorial-gpt2-micro/checkpoint-1000")
Resources
The Propulsion team has trained 5 GPT-2 Medium models and 5 GPT-2 Small models on the OpenWebText corpus, as found in Hugging Face Datasets.
Checkpoints can be loaded as Hugging Face models. For each model, we provide checkpoints at 100k, 200k, 300k and 400k steps.
We have also stored over 600 checkpoints for each model, subject to the following checkpoint schedule:
- Every 10 Steps, for the first 0 - 100 Steps.
- Every 50 Steps, from 100 - 2000 Steps.
- Every 100 Steps, from 2000 - 20,000 Steps.
- Every 1000 Steps, from 20,000 - 400,000 Steps.
This comes out to 610 checkpoints per run, taking up ~22TB for all 10 models (making it pretty expensive to host!). If you are interested in acquiring these additional checkpoints, please file an issue or contact Laurel (lorr1) and Sidd (skaramcheti) at their @cs.stanford.edu email addresses, and we'll be happy to figure out a cost-effective solution to sharing them.
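The schedule above can be written as a single predicate on the step number, which is handy when checking whether a particular step should have a stored checkpoint. This is a sketch, not the code Mistral uses. How step 0 and the range boundaries are counted is an assumption here (step 0 is excluded), so a total derived from this sketch may differ by a checkpoint or two from the figure quoted above.

```python
def is_checkpoint_step(step: int, max_steps: int = 400_000) -> bool:
    """Return True if `step` falls on the checkpoint schedule listed above."""
    if step <= 0 or step > max_steps:
        return False  # assumption: no checkpoint at step 0
    if step <= 100:
        return step % 10 == 0
    if step <= 2_000:
        return step % 50 == 0
    if step <= 20_000:
        return step % 100 == 0
    return step % 1_000 == 0


for step in (10, 15, 150, 2_100, 20_500, 400_000):
    print(step, is_checkpoint_step(step))
```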
GPT-2 Medium
Run          Type          Checkpoint  Size  Link
Arwen        GPT-2 Medium  400000      4.9G  download
Arwen        GPT-2 Medium  300000      4.9G  download
Arwen        GPT-2 Medium  200000      4.9G  download
Arwen        GPT-2 Medium  100000      4.9G  download
Beren        GPT-2 Medium  400000      4.9G  download
Beren        GPT-2 Medium  300000      4.9G  download
Beren        GPT-2 Medium  200000      4.9G  download
Beren        GPT-2 Medium  100000      4.9G  download
Celebrimbor  GPT-2 Medium  400000      4.9G  download
Celebrimbor  GPT-2 Medium  300000      4.9G  download
Celebrimbor  GPT-2 Medium  200000      4.9G  download
Celebrimbor  GPT-2 Medium  100000      4.9G  download
Durin        GPT-2 Medium  400000      4.9G  download
Durin        GPT-2 Medium  300000      4.9G  download
Durin        GPT-2 Medium  200000      4.9G  download
Durin        GPT-2 Medium  100000      4.9G  download
Eowyn        GPT-2 Medium  400000      4.9G  download
Eowyn        GPT-2 Medium  300000      4.9G  download
Eowyn        GPT-2 Medium  200000      4.9G  download
Eowyn        GPT-2 Medium  100000      4.9G  download
GPT-2 Small
Run          Type          Checkpoint  Size  Link
Alias        GPT-2 Small   400000      1.8G  download
Alias        GPT-2 Small   300000      1.8G  download
Alias        GPT-2 Small   200000      1.8G  download
Alias        GPT-2 Small   100000      1.8G  download
Battlestar   GPT-2 Small   400000      1.8G  download
Battlestar   GPT-2 Small   300000      1.8G  download
Battlestar   GPT-2 Small   200000      1.8G  download
Battlestar   GPT-2 Small   100000      1.8G  download
Caprica      GPT-2 Small   400000      1.8G  download
Caprica      GPT-2 Small   300000      1.8G  download
Caprica      GPT-2 Small   200000      1.8G  download
Caprica      GPT-2 Small   100000      1.8G  download
Darkmatter   GPT-2 Small   400000      1.8G  download
Darkmatter   GPT-2 Small   300000      1.8G  download
Darkmatter   GPT-2 Small   200000      1.8G  download
Darkmatter   GPT-2 Small   100000      1.8G  download
Expanse      GPT-2 Small   400000      1.8G  download
Expanse      GPT-2 Small   300000      1.8G  download
Expanse      GPT-2 Small   200000      1.8G  download
Expanse      GPT-2 Small   100000      1.8G  download
Issues
To ask questions, report issues, or request features, please use the GitHub Issue Tracker. Before creating a new issue, please make sure to search for existing issues that may solve your problem.
Contributing
Please see the following page for information on contributing.