GitHub - feedly/transfer-nlp: NLP library designed for flexible research and dev...

source link: https://github.com/feedly/transfer-nlp


Welcome to the Transfer NLP library, a framework built on top of PyTorch whose goal is to progressively achieve 2 kinds of Transfer:

  • easy transfer of code: the framework should be modular enough so that you don't have to re-write everything each time you experiment with a new architecture / a new kind of task
  • easy transfer learning: the framework should be able to easily interact with pre-trained models and manipulate them in order to fine-tune some of their parts.

You can have an overview of the high-level API in this Colab Notebook, which shows how to use the framework on several examples. All examples in the notebook embed in-cell Tensorboard training monitoring!

Set up your environment

mkvirtualenv transfernlp
workon transfernlp

git clone https://github.com/feedly/transfer-nlp.git
cd transfer-nlp
pip install -r requirements.txt

The library is available on PyPI, but installing it via pip install transfer-nlp is not recommended yet.

Documentation

API documentation and an overview of the library can be found here

High-Level usage of the library

You can have a look at the Colab Notebook to get a simple sense of the library usage.

A basic usage is:

# Set up the experiment
config_file = [dict config, or str/Path to a JSON config file]
experiment = ExperimentConfig.from_json(experiment=config_file)

# Launch the training session
experiment['trainer'].train()

# Use the predictor for inference
input_json = {"inputs": [Some Examples]}
output_json = experiment['predictor'].json_to_json(input_json=input_json)

You can use this code with all existing experiments in experiments/.

How to experiment with the library?

For reproducible research and easy ablation studies, the library enforces the use of configuration files for experiments.

In Transfer-NLP, an experiment config file contains all the information needed to entirely define the experiment. This is where you insert the names of the different components your experiment will use. Transfer-NLP makes use of the Inversion of Control pattern, which allows you to define any kind of class you might need; the ExperimentConfig.from_json method will create a dictionary and instantiate your objects accordingly.

To use your own classes inside Transfer-NLP, you need to register them with the @register_plugin decorator. Instead of using a different registry for each kind of component (models, data loaders, vectorizers, optimizers, ...), a single registry is used here, in order to allow complete customization.
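To make the single-registry idea concrete, here is a minimal, self-contained sketch of the Inversion of Control pattern described above. This is an illustrative re-implementation, not transfer-nlp's actual code: the build helper and the MyModel class are hypothetical, and in the library you would import @register_plugin and use ExperimentConfig instead.

```python
# Minimal sketch of the single-registry IoC pattern (illustrative, NOT the
# library's actual implementation).

REGISTRY = {}  # one registry shared by all component kinds

def register_plugin(cls):
    """Register a class under its name so configs can refer to it by _name."""
    REGISTRY[cls.__name__] = cls
    return cls

def build(config):
    """Recursively instantiate any dict that carries a _name key."""
    if isinstance(config, dict):
        params = {k: build(v) for k, v in config.items() if k != "_name"}
        if "_name" in config:
            return REGISTRY[config["_name"]](**params)  # complex config
        return params
    if isinstance(config, list):
        return [build(v) for v in config]  # simple list
    return config  # simple parameter, returned as-is

@register_plugin
class MyModel:  # hypothetical user-defined component
    def __init__(self, embedding_dim, layers_dropout):
        self.embedding_dim = embedding_dim
        self.layers_dropout = layers_dropout

experiment = build({
    "model": {"_name": "MyModel",
              "embedding_dim": 100,
              "layers_dropout": [0.1, 0.2, 0.3]}
})
print(experiment["model"].embedding_dim)  # -> 100
```

A single registry means a config entry can name any registered class, whatever its role in the experiment, which is what makes every component swappable from the config file alone.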

Currently, the config file logic has 3 kinds of components:

  • simple parameters: parameters whose values you know in advance:
{"initial_learning_rate": 0.01,
"embedding_dim": 100,...}
  • simple lists: similar to simple parameters, but as a list:
{"layers_dropout": [0.1, 0.2, 0.3], ...}
  • Complex configs: this is where the library instantiates your objects. Such a config needs the _name of the object's class (which you must @register_plugin) and some parameters. If your class has default parameters and your config file doesn't contain them, objects are instantiated with their defaults; otherwise, the parameters have to be present in the config file. Sometimes, initialization parameters are not available before launching the experiment, e.g. your Model object might need a vocabulary size as init input. The config file logic makes it easy to deal with this while keeping the library code very general. You can have a look at the experiments for examples: surnames.py, news.py or cbow.py. The corresponding json files in experiments will show you examples of how to get started.
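Putting the three kinds of components together, a config file might look like the following sketch (the class name MyClassifier and the exact keys are illustrative, not taken from the library's examples):

```json
{
  "initial_learning_rate": 0.01,
  "embedding_dim": 100,
  "layers_dropout": [0.1, 0.2, 0.3],
  "model": {
    "_name": "MyClassifier",
    "embedding_dim": 100
  }
}
```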

Usage Pipeline

The goal of the config file is to load a Trainer and run the experiment from it. We provide a BasicTrainer in transfer_nlp.plugins.trainers. This basic trainer takes a model and some data as input and runs a whole training pipeline. We make use of the PyTorch-Ignite library to monitor events during training (logging metrics, manipulating learning rates, checkpointing models, etc.). Tensorboard logs are also included as an option: specify a tensorboard_logs simple parameter (a path) in the config file, then run tensorboard --logdir=path/to/logs in a terminal and you can monitor your experiment while it's training. Tensorboard comes with very nice utilities to keep track of the norms of your model weights, histograms, distributions, embedding visualizations, and more.

Slack integration

While experimenting with your own models / data, training might take some time. To get notified when your training finishes or crashes, we recommend the simple knockknock library by the folks at HuggingFace, which adds a simple decorator to your running function to notify you via Slack, e-mail, etc.

Some objectives to reach:

  • Unit-test everything
  • Include examples using state of the art pre-trained models
  • Include linguistic properties to models
  • Experiment with RL for sequential tasks
  • Include probing tasks to try to understand the properties that are learned by the models

Acknowledgment

The library was inspired by "Natural Language Processing with PyTorch" by Delip Rao and Brian McMahan. The experiments in experiments/, the Vocabulary building block, and the embeddings nearest-neighbors utilities are taken or adapted from the code provided in the book.

