transformer
TensorFlow implementation of Attention Is All You Need. (2017. 6)
Requirements
- Python 3.6
- TensorFlow 1.8
- hb-config (Singleton Config)
- nltk (tokenizer and BLEU score)
- tqdm (progress bar)
- Slack Incoming Webhook URL
Project Structure
Project initialized with hb-base
.
├── config                # Config files (.yml, .json) used with hb-config
├── data                  # dataset path
├── notebooks             # Prototyping with numpy or tf.InteractiveSession
├── transformer           # transformer architecture graphs (from input to logits)
│   ├── __init__.py       # Graph logic
│   ├── attention.py      # Attention (multi-head, scaled dot-product, etc.; sketched below)
│   ├── encoder.py        # Encoder logic
│   ├── decoder.py        # Decoder logic
│   └── layer.py          # Layers (FFN)
├── data_loader.py        # raw_data -> processed_data -> generate_batch (using Dataset)
├── hook.py               # training or test hooks (e.g. print_variables)
├── main.py               # define experiment_fn
└── model.py              # define EstimatorSpec
Reference: hb-config, Dataset, experiment_fn, EstimatorSpec
- Train and evaluate with 'WMT German-English (2016)' dataset
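For orientation, the core of attention.py is the paper's scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. Below is a minimal TF 1.x sketch; the function name and the 1/0 mask convention are illustrative, not the repo's exact code.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(QK^T / sqrt(d_k)) V, as in Vaswani et al. (2017)."""
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    # [batch, heads, query_len, key_len]
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)
    if mask is not None:
        # Push masked (0) positions toward -inf so softmax gives them ~0 weight.
        scores += (1.0 - mask) * -1e9
    weights = tf.nn.softmax(scores)   # attention distribution over the keys
    return tf.matmul(weights, v)
```

Multi-head attention then just projects Q, K, V into num_heads subspaces (linear_key_dim / linear_value_dim in the config below), applies this per head, and concatenates.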
Config
Controls the entire experimental environment.
Example: check-tiny.yml
```yml
data:
  base_path: 'data/'
  raw_data_path: 'tiny_kor_eng'
  processed_path: 'tiny_processed_data'
  word_threshold: 1

  PAD_ID: 0
  UNK_ID: 1
  START_ID: 2
  EOS_ID: 3

model:
  batch_size: 4
  num_layers: 2
  model_dim: 32
  num_heads: 4
  linear_key_dim: 20
  linear_value_dim: 24
  ffn_dim: 30
  dropout: 0.2

train:
  learning_rate: 0.0001
  optimizer: 'Adam'       # one of 'Adagrad', 'Adam', 'Ftrl', 'Momentum', 'RMSProp', 'SGD'

  train_steps: 15000
  model_dir: 'logs/check_tiny'

  save_checkpoints_steps: 1000
  check_hook_n_iter: 100
  min_eval_frequency: 100

  print_verbose: True
  debug: False

slack:
  webhook_url: ""         # after training, notify you via Slack webhook
```
- debug mode: uses tfdbg
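For reference, hb-config exposes the chosen YAML file as a singleton whose sections become attributes. A minimal sketch using the keys above, based on hb-config's documented API:

```python
from hbconfig import Config

Config("config/check-tiny")        # load config/check-tiny.yml once, globally

print(Config.model.model_dim)      # 32
print(Config.train.train_steps)    # 15000
print(Config.data.PAD_ID)          # 0
```

This is why every script below only needs a --config flag: each module reads the same singleton instead of passing parameters around.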
check-tiny is a dataset of about 30 sentences translated from Korean into English. (I recommend reading it :) )
Usage
Install requirements.
pip install -r requirements.txt
Then, pre-process raw data.
python data_loader.py --config check-tiny
Finally, start training and evaluating the model.
python main.py --config check-tiny --mode train_and_evaluate
Or, you can use the IWSLT'15 English-Vietnamese dataset.
sh prepare-iwslt15.en-vi.sh # download dataset
python data_loader.py --config iwslt15-en-vi # preprocessing
python main.py --config iwslt15-en-vi --mode train_and_evaluate # start training
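In both pipelines, data_loader.py turns the processed id sequences into padded batches via tf.data. A rough, self-contained sketch of that batching step; the function and feature names are assumptions, not the repo's exact code:

```python
import tensorflow as tf

def make_input_fn(enc_ids, dec_ids, batch_size=4, pad_id=0):
    """Pad-and-batch parallel token-id sequences with tf.data (illustrative)."""
    def input_fn():
        dataset = tf.data.Dataset.from_generator(
            lambda: zip(enc_ids, dec_ids),
            output_types=(tf.int32, tf.int32),
            output_shapes=([None], [None]))
        dataset = dataset.shuffle(1000).repeat()
        # Pad every batch to its longest sequence with PAD_ID (0 in the config).
        dataset = dataset.padded_batch(
            batch_size,
            padded_shapes=([None], [None]),
            padding_values=(tf.constant(pad_id, tf.int32),) * 2)
        enc, dec = dataset.make_one_shot_iterator().get_next()
        return {"enc_inputs": enc, "dec_inputs": dec}, dec
    return input_fn
```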
Predict
After training, you can test the model.
- command
python predict.py --config {config} --src {src_sentence}
- example
$ python predict.py --config check-tiny --src "안녕하세요. 반갑습니다."
------------------------------------
Source: 안녕하세요. 반갑습니다.
> Result: Hello . I'm glad to see you . <\s> vectors . <\s> Hello locations . <\s> will . <\s> . <\s> you . <\s>
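Roughly, predict.py has to map the source sentence to token ids, run the trained Estimator, and join the output tokens until EOS_ID. An illustrative sketch only; the feature/output keys and vocab helpers here are assumptions, not the repo's actual code:

```python
import numpy as np
import tensorflow as tf

def translate(estimator, src_sentence, word2id, id2word, unk_id=1, eos_id=3):
    # Unknown tokens fall back to UNK_ID (1 in the config).
    ids = [word2id.get(tok, unk_id) for tok in src_sentence.split()]
    input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"enc_inputs": np.array([ids], dtype=np.int32)}, shuffle=False)
    output = next(estimator.predict(input_fn))   # one prediction per source
    tokens = []
    for i in output["predictions"]:              # output key is an assumption
        if i == eos_id:                          # stop at end-of-sentence id
            break
        tokens.append(id2word[i])
    return " ".join(tokens)
```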
Experiment modes
✅ : Working. ◻ : Not tested yet.
- evaluate : Evaluate on the evaluation data.
- extend_train_hooks : Extends the hooks for training.
- reset_export_strategies : Resets the export strategies with the new_export_strategies.
- run_std_server : Starts a TensorFlow server and joins the serving thread.
- test : Tests training, evaluating and exporting the estimator for a single step.
- train : Fit the estimator using the training data.
- train_and_evaluate : Interleaves training and evaluation.
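These modes are the methods of tf.contrib.learn.Experiment: main.py's experiment_fn builds one, and learn_runner dispatches --mode to the matching method. A condensed, runnable toy of that wiring; the model and inputs are stand-ins, not the repo's transformer:

```python
import tensorflow as tf
from tensorflow.contrib.learn import Experiment, RunConfig
from tensorflow.contrib.learn.python.learn import learn_runner

def input_fn():  # toy stand-in for data_loader.py's batches
    return {"x": tf.random_normal([8, 4])}, tf.zeros([8, 1])

def model_fn(features, labels, mode):  # toy stand-in for model.py
    logits = tf.layers.dense(features["x"], 1)
    loss = tf.losses.mean_squared_error(labels, logits)
    train_op = tf.train.AdamOptimizer(1e-4).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def experiment_fn(run_config, hparams):
    estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
    return Experiment(estimator,
                      train_input_fn=input_fn, eval_input_fn=input_fn,
                      train_steps=10, min_eval_frequency=5)

# schedule corresponds to --mode (train, evaluate, train_and_evaluate, ...)
learn_runner.run(experiment_fn,
                 run_config=RunConfig(model_dir="logs/toy"),
                 schedule="train_and_evaluate")
```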
Tensorboard
tensorboard --logdir logs
- check-tiny example
Reference
- hb-research/notes - Attention Is All You Need
- Paper - Attention Is All You Need (2017. 6) by Ashish Vaswani et al. (Google Brain)
- tensor2tensor - A library for generalized sequence-to-sequence models (official code)
Author
Dongjun Lee ([email protected])