Mistral

Mistral: A strong and cool northwesterly wind that builds as it moves, bringing good health and clear skies.

A framework for transparent and accessible large-scale language model training, built with Hugging Face. It includes tools and scripts for incorporating new pre-training datasets, several schemes for single-node and distributed training (including on cloud providers such as GCP), and, importantly, scripts for evaluation.

Visit our Read the Docs for the full documentation.

A Propulsion Endeavor

Community

Mistral is built to facilitate transparent and accessible training. To work toward this goal, we will hold community meetings twice a month, where we'll give updates on where we are and what we're working on and, more importantly, hear from you about how we can help and possibly work together.

We would love for folks from academia, other community efforts, as well as those in industry to join - all are welcome. The first meeting will be on Monday, August 30th at 4 PM PT.

We'll post the future dates (and times - which we hope to move around through the day to maximally engage folks in varied timezones) after the first meeting!

Quickstart

Installation

The dependencies for Mistral can be installed using Conda. Note that the provided environment assumes that CUDA 11.0 is installed. You may need to adjust the environment YAML file depending on your setup.

git clone https://github.com/stanford-crfm/mistral.git
cd mistral
conda env create -f environments/environment-gpu.yaml  # Choose CUDA kernel based on the hardware!

If you are training on the CPU only, run conda env create -f environments/environment-cpu.yaml instead.

Training GPT-2 Micro

Prerequisites

First, update conf/tutorial-gpt2-micro.yaml with the directories where you want to store the Hugging Face cache and the model runs.

# Artifacts & Caching
artifacts:
    cache_dir: /path/to/artifacts
    run_dir: /path/to/runs

Next, make sure that /path/to/mistral is on your PYTHONPATH.
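If you are working from a script or notebook and would rather not export PYTHONPATH in your shell, the same effect can be approximated from Python. This is a minimal sketch: ensure_on_path is a hypothetical helper (not part of Mistral), and /path/to/mistral is the placeholder used throughout this README.

```python
import sys
from pathlib import Path


def ensure_on_path(repo_dir: str) -> str:
    """Prepend repo_dir to sys.path if it is missing; return the resolved path."""
    resolved = str(Path(repo_dir).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Placeholder path from this README; substitute the location of your clone.
repo = ensure_on_path("/path/to/mistral")
print(repo in sys.path)
```

Note that this only affects the current Python process. Processes started by a launcher such as deepspeed will not inherit it, so exporting PYTHONPATH in the shell is still the safer choice for the training commands.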

Single-node single-GPU training

For single-node single-GPU training, run:

conda activate mistral
cd mistral
CUDA_VISIBLE_DEVICES=0 python train.py --config conf/tutorial-gpt2-micro.yaml --nnodes 1 --nproc_per_node 1 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 2 --run_id tutorial-gpt2-micro
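The command above combines a base config with dotted-key overrides such as --training_arguments.fp16. If you launch several runs from a script, it can help to assemble that argument list programmatically. The sketch below is one possible way to do so; build_train_command is a hypothetical helper, not something Mistral provides, and it only reproduces the flags shown in the command above.

```python
import shlex


def build_train_command(config, run_id, nnodes=1, nproc_per_node=1, overrides=None):
    """Return the argv list for a train.py call with dotted-key overrides."""
    argv = [
        "python", "train.py",
        "--config", config,
        "--nnodes", str(nnodes),
        "--nproc_per_node", str(nproc_per_node),
    ]
    for key, value in (overrides or {}).items():
        # Booleans are lowercased to match the `true` used in the README command.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        argv += [f"--{key}", text]
    argv += ["--run_id", run_id]
    return argv


argv = build_train_command(
    "conf/tutorial-gpt2-micro.yaml",
    "tutorial-gpt2-micro",
    overrides={
        "training_arguments.fp16": True,
        "training_arguments.per_device_train_batch_size": 2,
    },
)
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run together with an environment that sets CUDA_VISIBLE_DEVICES, which keeps the GPU selection separate from the training arguments.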

Multi-node multi-GPU training with DeepSpeed

Modify /job/hostfile in the following way:

<Hostname of first machine> slots=<Number of GPUs>
<Hostname of second machine> slots=<Number of GPUs>
...
<Hostname of the nth machine> slots=<Number of GPUs>

Below is an example hostfile where we train on machine1 and machine2 with 8 GPUs each:

machine1 slots=8
machine2 slots=8
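For larger clusters, writing the hostfile by hand is error-prone. As a rough sketch, the format described above can be generated from a mapping of hostnames to GPU counts. render_hostfile is a hypothetical helper introduced here for illustration only.

```python
def render_hostfile(slots_by_host):
    """Render a hostfile body: one '<hostname> slots=<n>' line per machine."""
    lines = []
    for host, slots in slots_by_host.items():
        if slots < 1:
            raise ValueError(f"{host}: slots must be at least 1, got {slots}")
        lines.append(f"{host} slots={slots}")
    return "\n".join(lines) + "\n"


# Reproduces the two-machine example above.
hostfile = render_hostfile({"machine1": 8, "machine2": 8})
print(hostfile, end="")
```

Writing the returned string to /job/hostfile gives the same file as the example above.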

To start distributed training, run:

conda activate mistral
cd mistral
deepspeed --num_gpus 8 --num_nodes 2 --master_addr machine1 train.py --config conf/tutorial-gpt2-micro.yaml --nnodes 2 --nproc_per_node 8 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 4 --training_arguments.deepspeed conf/deepspeed/z1-conf.json --run_id tutorial-gpt2-micro-multi-node > tutorial-gpt2-micro-multi-node.out 2> tutorial-gpt2-micro-multi-node.err

Note: You may need to adjust your batch size depending on the capacity of your GPUs.
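When you lower the per-device batch size to fit smaller GPUs, the number of examples processed per optimizer step shrinks as well. The sketch below shows the arithmetic, assuming the usual data-parallel relationship (nodes × GPUs per node × per-device batch size × gradient accumulation steps). Gradient accumulation corresponds to the gradient_accumulation_steps field of Hugging Face TrainingArguments; whether and how the tutorial config sets it is not shown here, so treat that parameter as an assumption.

```python
def global_batch_size(nnodes, nproc_per_node, per_device_batch_size, grad_accum_steps=1):
    """Examples consumed per optimizer step across all devices."""
    return nnodes * nproc_per_node * per_device_batch_size * grad_accum_steps


def grad_accum_for_target(target, nnodes, nproc_per_node, per_device_batch_size):
    """Accumulation steps needed to reach `target` examples per optimizer step."""
    per_pass = nnodes * nproc_per_node * per_device_batch_size
    if target % per_pass != 0:
        raise ValueError(f"target {target} is not a multiple of {per_pass}")
    return target // per_pass


# The multi-node command above: 2 nodes x 8 GPUs x batch size 4.
print(global_batch_size(2, 8, 4))
# Halving the per-device batch size while keeping the same global batch size.
print(grad_accum_for_target(64, 2, 8, 2))
```

With the settings from the multi-node command, this gives 64 examples per step; dropping the per-device batch size to 2 would need 2 accumulation steps to stay at 64.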

If you are interested in training a model on Google Cloud, check out our Google Cloud + Kubernetes Tutorial.

Using the model

Model checkpoints are stored in the directory specified by artifacts.run_dir. For example, a checkpoint might be at /path/to/runs/tutorial-gpt2-micro/checkpoint-1000.
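To pick up the most recent checkpoint of a run without hard-coding the step number, you can scan the run directory. This sketch assumes only the checkpoint-<step> naming shown in the example path above; list_checkpoints and latest_checkpoint are hypothetical helpers, not part of Mistral.

```python
import re
from pathlib import Path

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def list_checkpoints(run_path):
    """Return (step, path) pairs for checkpoint-<step> directories, sorted by step."""
    found = []
    for child in Path(run_path).iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), child))
    return sorted(found, key=lambda pair: pair[0])


def latest_checkpoint(run_path):
    """Return the path of the highest-step checkpoint, or None if there is none."""
    checkpoints = list_checkpoints(run_path)
    return checkpoints[-1][1] if checkpoints else None
```

Sorting on the parsed integer matters: a plain string sort would place checkpoint-1000 before checkpoint-200. A typical call would be latest_checkpoint("/path/to/runs/tutorial-gpt2-micro").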

Mistral stores model checkpoints in the Hugging Face format, so models can be loaded and used in the same manner as if one had trained the model with Hugging Face.

For instance, to generate text with Transformers (you will need to clone the transformers repo):

conda activate mistral
cd transformers/examples/text-generation
python run_generation.py --model_type=gpt2 --model_name_or_path=/path/to/runs/tutorial-gpt2-micro/checkpoint-1000

Or to load the model in Python code (make sure /path/to/mistral is in your PYTHONPATH):

from src.models.mistral_gpt2 import MistralGPT2LMHeadModel

model = MistralGPT2LMHeadModel.from_pretrained("/path/to/runs/tutorial-gpt2-micro/checkpoint-1000")

Resources

The Propulsion team has trained 5 GPT-2 Medium models and 5 GPT-2 Small models on the OpenWebText corpus, as found in Hugging Face Datasets.

Checkpoints can be loaded as Hugging Face models. For each model, we provide checkpoints at 100k, 200k, 300k and 400k steps.

We have also stored over 600 checkpoints for each model, subject to the following checkpoint schedule:

Every 10 Steps, for the first 0 - 100 Steps.
Every 50 Steps, from 100 - 2000 Steps.
Every 100 Steps, from 2000 - 20,000 Steps.
Every 1000 Steps, from 20,000 - 400,000 Steps.

This comes out to 610 checkpoints per run, taking up ~22TB for all 10 models, which makes them pretty expensive to host! If you are interested in acquiring these additional checkpoints, please file an issue or contact Laurel (lorr1) and Sidd (skaramcheti) at their @cs.stanford.edu email addresses, and we'll be happy to figure out a cost-effective way to share them.
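If you need to know whether a particular step was saved, the schedule above can be enumerated directly. The sketch below is an approximation: it assumes each phase saves at multiples of its interval, excluding the phase's starting step and including its final step. The exact boundary handling used for the released runs is not stated here, and under this assumption the sketch yields 608 distinct steps rather than the 610 quoted above, so the real schedule presumably also saves at a couple of boundary steps that this sketch does not try to guess.

```python
def checkpoint_steps():
    """Enumerate the save steps implied by the four-phase schedule above."""
    # (start_exclusive, end_inclusive, interval) for each phase.
    phases = [
        (0, 100, 10),
        (100, 2000, 50),
        (2000, 20000, 100),
        (20000, 400000, 1000),
    ]
    steps = []
    for start, end, interval in phases:
        steps.extend(range(start + interval, end + 1, interval))
    return steps


steps = checkpoint_steps()
print(len(steps))       # 608 under the boundary assumption described above
print(100000 in steps)  # True: the 100k checkpoint is one of the saved steps
```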

GPT-2 Medium

| Run | Type | Checkpoint | Size | Link |
| --- | --- | --- | --- | --- |
| Arwen | GPT-2 Medium | 400000 | 4.9G | download |
| Arwen | GPT-2 Medium | 300000 | 4.9G | download |
| Arwen | GPT-2 Medium | 200000 | 4.9G | download |
| Arwen | GPT-2 Medium | 100000 | 4.9G | download |
| Beren | GPT-2 Medium | 400000 | 4.9G | download |
| Beren | GPT-2 Medium | 300000 | 4.9G | download |
| Beren | GPT-2 Medium | 200000 | 4.9G | download |
| Beren | GPT-2 Medium | 100000 | 4.9G | download |
| Celebrimbor | GPT-2 Medium | 400000 | 4.9G | download |
| Celebrimbor | GPT-2 Medium | 300000 | 4.9G | download |
| Celebrimbor | GPT-2 Medium | 200000 | 4.9G | download |
| Celebrimbor | GPT-2 Medium | 100000 | 4.9G | download |
| Durin | GPT-2 Medium | 400000 | 4.9G | download |
| Durin | GPT-2 Medium | 300000 | 4.9G | download |
| Durin | GPT-2 Medium | 200000 | 4.9G | download |
| Durin | GPT-2 Medium | 100000 | 4.9G | download |
| Eowyn | GPT-2 Medium | 400000 | 4.9G | download |
| Eowyn | GPT-2 Medium | 300000 | 4.9G | download |
| Eowyn | GPT-2 Medium | 200000 | 4.9G | download |
| Eowyn | GPT-2 Medium | 100000 | 4.9G | download |

GPT-2 Small

| Run | Type | Checkpoint | Size | Link |
| --- | --- | --- | --- | --- |
| Alias | GPT-2 Small | 400000 | 1.8G | download |
| Alias | GPT-2 Small | 300000 | 1.8G | download |
| Alias | GPT-2 Small | 200000 | 1.8G | download |
| Alias | GPT-2 Small | 100000 | 1.8G | download |
| Battlestar | GPT-2 Small | 400000 | 1.8G | download |
| Battlestar | GPT-2 Small | 300000 | 1.8G | download |
| Battlestar | GPT-2 Small | 200000 | 1.8G | download |
| Battlestar | GPT-2 Small | 100000 | 1.8G | download |
| Caprica | GPT-2 Small | 400000 | 1.8G | download |
| Caprica | GPT-2 Small | 300000 | 1.8G | download |
| Caprica | GPT-2 Small | 200000 | 1.8G | download |
| Caprica | GPT-2 Small | 100000 | 1.8G | download |
| Darkmatter | GPT-2 Small | 400000 | 1.8G | download |
| Darkmatter | GPT-2 Small | 300000 | 1.8G | download |
| Darkmatter | GPT-2 Small | 200000 | 1.8G | download |
| Darkmatter | GPT-2 Small | 100000 | 1.8G | download |
| Expanse | GPT-2 Small | 400000 | 1.8G | download |
| Expanse | GPT-2 Small | 300000 | 1.8G | download |
| Expanse | GPT-2 Small | 200000 | 1.8G | download |
| Expanse | GPT-2 Small | 100000 | 1.8G | download |

Issues

To ask questions, report issues, or request features, please use the GitHub Issue Tracker. Before creating a new issue, please make sure to search for existing issues that may solve your problem.

Contributing

Please see the following page for information on contributing.
