AutoML Reading Note 2

source link: https://xijunlee.github.io/2018/12/09/meta-learning/

This post is about meta learning in AutoML, corresponding to Chapter 2 of the book AutoML. Note that all references can be found in the book.

What is meta learning?

Challenges of meta learning

How to achieve meta learning?

Learning from model evaluation

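  1. Relative landmarks: a first measure for task similarity considers the relative (pairwise) performance differences RL_{a,b,j} = P_{a,j} - P_{b,j} between two configurations θ_a and θ_b on a particular task t_j. Two tasks t_j and t_new can be considered similar if their relative landmarks correlate across configurations.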
  2. Surrogate models: a more flexible way to transfer information is to build a surrogate model s_j(θ_i) = P_{i,j} for every prior task t_j, trained on all available evaluations P. One can then define task similarity in terms of the error between s_j(θ_i) and P_{i,new}: if the surrogate model for t_j generates accurate predictions for t_new, then the two tasks are intrinsically similar. This is usually done in combination with Bayesian optimization to determine the next configuration θ to evaluate.
  3. Warm-started multi-task learning: another approach to relate prior tasks t_j is to learn a joint task representation using P. In [114], task-specific Bayesian linear regression [20] surrogate models s_j(θ_i) are trained and combined in a feedforward neural network NN(θ_i), which learns a joint task representation that can accurately predict P_{i,new}. The surrogate models are pre-trained on OpenML meta-data to provide a warm start for optimizing NN(θ_i) in a multi-task learning setting. Earlier work on multi-task learning [165] assumed that we already have a set of 'similar' source tasks t_j. It transfers information between these t_j and t_new by building a joint GP model for Bayesian optimization that learns and exploits the exact relationship between the tasks.
  4. Multi-armed bandits: yet another way to find the source tasks t_j most related to t_new. Each prior task t_j is treated as one arm, and the (stochastic) reward for pulling an arm is defined in terms of the prediction error of a GP-based Bayesian optimizer that models the prior evaluations of t_j as noisy measurements and combines them with the existing evaluations on t_new.
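The similarity measures in items 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, assuming a small hypothetical meta-data matrix P (rows are configurations θ_i, here a single hyperparameter such as log10 of a learning rate; columns are three prior tasks t_j) and three evaluations already made on t_new. Correlating the relative-landmark vectors is one simple way to compare them, and the quadratic surrogate is a stand-in for the probabilistic surrogates (e.g. Gaussian processes) used in practice; the Bayesian-optimization loop itself is not shown. All numbers and helper names are made up for illustration.

```python
from itertools import combinations

import numpy as np

# Hypothetical meta-data P[i, j]: performance of configuration theta_i on prior task t_j.
# Each theta_i is a single hyperparameter value (e.g. log10 learning rate). Numbers are made up.
thetas = np.array([-4.0, -3.0, -2.0, -1.0, 0.0])
P = np.array([
    # t_0   t_1   t_2
    [0.70, 0.35, 0.95],  # theta = -4
    [0.85, 0.60, 0.90],  # theta = -3
    [0.90, 0.75, 0.75],  # theta = -2
    [0.85, 0.80, 0.50],  # theta = -1
    [0.70, 0.75, 0.15],  # theta =  0
])

# Configurations already evaluated on t_new (row indices into thetas) and their P_{i,new}.
new_idx = np.array([0, 2, 4])
P_new = np.array([0.68, 0.88, 0.72])


def relative_landmarks(perf):
    """RL_{a,b} = perf[a] - perf[b] for every pair a < b of evaluated configurations."""
    return np.array([perf[a] - perf[b] for a, b in combinations(range(len(perf)), 2)])


def rl_similarity(j):
    """Correlation between the relative landmarks of t_j and t_new (higher = more similar)."""
    return np.corrcoef(relative_landmarks(P[new_idx, j]), relative_landmarks(P_new))[0, 1]


def surrogate_error(j, degree=2):
    """Fit surrogate s_j on all evaluations of t_j, then measure its error on t_new."""
    s_j = np.poly1d(np.polyfit(thetas, P[:, j], degree))  # quadratic stand-in for a GP
    return np.mean((s_j(thetas[new_idx]) - P_new) ** 2)


n_tasks = P.shape[1]
rl_scores = [rl_similarity(j) for j in range(n_tasks)]
surrogate_errors = [surrogate_error(j) for j in range(n_tasks)]
print("relative-landmark correlation per task:", np.round(rl_scores, 3))
print("surrogate MSE per task:", np.round(surrogate_errors, 4))
print("most similar (RL):", int(np.argmax(rl_scores)),
      "| most similar (surrogate):", int(np.argmin(surrogate_errors)))
```

Both measures pick t_0 on this toy data. The surrogate version has the practical advantage that s_j can also be queried at configurations that were never run on t_j, which is what lets it plug into Bayesian optimization.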

Learning from task properties

Self-play reinforcement learning approach: AlphaD3M [38] uses a self-play reinforcement learning approach in which the current state is represented by the current pipeline, and actions include the addition, deletion, or replacement of pipeline components. A Monte Carlo Tree Search (MCTS) generates pipelines, which are evaluated to train a recurrent neural network (LSTM) that can predict pipeline performance, in turn producing the action probabilities for the MCTS in the next round. The state description also includes meta-features of the current task, allowing the neural network to learn across tasks.
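To make the state/action formulation concrete, here is a minimal Python sketch of the pipeline-editing action space only: a state is the current pipeline (a tuple of component names), and each action adds, deletes, or replaces one component. The component vocabulary and the max_len cap are hypothetical; the task meta-features that AlphaD3M also includes in the state, the MCTS, and the LSTM that scores pipelines are not modeled here.

```python
from typing import List, Tuple

# Hypothetical component vocabulary; AlphaD3M's real primitive set is not described in this note.
COMPONENTS = ("imputer", "scaler", "pca", "random_forest", "svm")

Pipeline = Tuple[str, ...]


def successors(pipeline: Pipeline, max_len: int = 4) -> List[Tuple[str, Pipeline]]:
    """Enumerate (action, next_state) pairs reachable by one add, delete, or replace edit."""
    moves: List[Tuple[str, Pipeline]] = []
    # Add: insert any component at any position, unless the pipeline is already max_len long.
    if len(pipeline) < max_len:
        for pos in range(len(pipeline) + 1):
            for comp in COMPONENTS:
                moves.append((f"add {comp} at {pos}", pipeline[:pos] + (comp,) + pipeline[pos:]))
    # Delete: drop the component at any position.
    for pos in range(len(pipeline)):
        moves.append((f"delete at {pos}", pipeline[:pos] + pipeline[pos + 1:]))
    # Replace: swap the component at any position for a different one.
    for pos, old in enumerate(pipeline):
        for comp in COMPONENTS:
            if comp != old:
                moves.append((f"replace {old} with {comp} at {pos}",
                              pipeline[:pos] + (comp,) + pipeline[pos + 1:]))
    return moves


start: Pipeline = ("imputer", "svm")
for action, nxt in successors(start)[:3]:
    print(action, "->", nxt)
print(len(successors(start)), "candidate moves from", start)
```

In a full system, an MCTS would expand nodes with these successors and use the network's predicted action probabilities and performance estimates to decide which edits to explore.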

Learning from prior models

Transfer learning

In transfer learning, we take models trained on one or more source tasks, and use them as starting points for creating a model on a similar target task. This can be done by forcing the target model to be structurally or otherwise similar to the source models.

Neural networks are exceptionally suitable for transfer learning because both the structure and the model parameters of the source models can be used as a good initialization for the target model, yielding a pre-trained model that can be further fine-tuned using the available training data on the new task.
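A minimal sketch of the pre-train-then-fine-tune idea, assuming two hypothetical related regression tasks whose true weights differ only slightly. A linear model trained by gradient descent stands in for a neural network here (the same initialization logic applies to a network's weights); the task construction, data sizes and step counts are illustrative choices, not taken from the book.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Hypothetical related tasks: the target weights are a small perturbation of the source weights.
w_source = rng.normal(scale=2.0, size=d)
w_target = w_source + rng.normal(scale=0.1, size=d)


def make_data(w, n):
    X = rng.normal(size=(n, d))
    return X, X @ w + rng.normal(scale=0.1, size=n)


def train(w0, X, y, steps, lr):
    """Plain gradient descent on mean squared error, starting from w0."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * 2.0 / len(y) * X.T @ (X @ w - y)
    return w


def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))


# 1) Pre-train on a large source dataset.
Xs, ys = make_data(w_source, n=1000)
w_pre = train(np.zeros(d), Xs, ys, steps=500, lr=0.05)

# 2) Fine-tune on a small target dataset with a short training budget,
#    and compare against training from scratch with the same budget.
Xt, yt = make_data(w_target, n=10)
Xtest, ytest = make_data(w_target, n=1000)
w_finetuned = train(w_pre, Xt, yt, steps=20, lr=0.05)
w_scratch = train(np.zeros(d), Xt, yt, steps=20, lr=0.05)

print("target test MSE, fine-tuned:  ", round(mse(w_finetuned, Xtest, ytest), 4))
print("target test MSE, from scratch:", round(mse(w_scratch, Xtest, ytest), 4))
```

With only 10 target examples and 20 gradient steps, the model that starts from the source weights should reach a much lower target test error; the benefit relies on the tasks being related, which is exactly the "similar target task" assumption above.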

Meta learning & few-shot learning

Ravi and Larochelle ("Optimization as a Model for Few-Shot Learning") start from the observation that the parameter update of a base learner resembles the update of the cell state in an LSTM. They therefore propose an LSTM-based meta-learner that is trained to optimize a learner neural-network classifier. The meta-learner captures both short-term knowledge within a task and long-term knowledge common among all the tasks. By using an objective that directly captures an optimization algorithm's ability to generalize well after only a fixed number of updates, the meta-learner is trained to make the learner classifier converge quickly to a good solution on each task. Additionally, the formulation of the meta-learner allows it to learn a task-common initialization for the learner classifier, which captures fundamental knowledge shared among all the tasks.
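The LSTM analogy can be written out explicitly. In the sketch below, \mathcal{L}_t is the learner's loss at update step t and \alpha_t its learning rate; the notation follows Ravi and Larochelle's formulation, while the layout is ours.

```latex
\begin{align*}
\theta_t &= \theta_{t-1} - \alpha_t \nabla_{\theta_{t-1}} \mathcal{L}_t
  && \text{(gradient descent on the learner)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
  && \text{(LSTM cell-state update)}
\end{align*}
% Choosing c_{t-1} = \theta_{t-1}, f_t = 1, i_t = \alpha_t, \tilde{c}_t = -\nabla_{\theta_{t-1}} \mathcal{L}_t
% makes the two updates identical. The meta-learner instead learns the gates:
\begin{align*}
i_t &= \sigma\!\left(W_I \cdot [\nabla_{\theta_{t-1}} \mathcal{L}_t,\ \mathcal{L}_t,\ \theta_{t-1},\ i_{t-1}] + b_I\right) \\
f_t &= \sigma\!\left(W_F \cdot [\nabla_{\theta_{t-1}} \mathcal{L}_t,\ \mathcal{L}_t,\ \theta_{t-1},\ f_{t-1}] + b_F\right)
\end{align*}
```

A learned i_t acts as an adaptive, per-parameter learning rate, a learned f_t below 1 shrinks the learner's parameters, and learning the initial cell state c_0 is what provides the task-common initialization described in the paragraph.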

Model-Agnostic Meta-Learning (MAML) learns an initialization of a model's parameters from which a new task can be learned quickly, with only a small number of gradient steps, while avoiding the overfitting that can occur when training on a small dataset.

In the MAML diagram, θ is the model's parameters and the bold black line is the meta-learning phase. When we have, for example, 3 different new tasks 1, 2 and 3, a gradient step is taken for each task (the gray lines). The learned θ lies close to the optimal parameters of all three tasks, which makes it a good initialization that can quickly adapt to different new tasks: only a small change in θ is needed to substantially reduce the loss of any of these tasks.
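A minimal numerical sketch of MAML on a toy problem, assuming a hypothetical family of 1-D linear-regression tasks y = a*x + b (the task distribution, the step sizes ALPHA and BETA, and all helper names are illustrative choices, not from the book). Because the loss is quadratic, the second-order term of the MAML meta-gradient can be computed exactly from the Hessian instead of through automatic differentiation. Each task gets one inner gradient step (the gray lines in the diagram), and the outer loop moves θ toward parameters from which that single step works well on every task.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHA = 0.1  # inner-loop (per-task adaptation) step size
BETA = 0.1   # outer-loop (meta-update) step size


def sample_task():
    """Hypothetical task family: 1-D linear regression y = a*x + b with task-specific (a, b)."""
    return np.array([rng.uniform(1.0, 3.0), rng.uniform(-1.0, 1.0)])


def sample_data(task, n=20):
    x = rng.uniform(-1.0, 1.0, size=n)
    X = np.stack([x, np.ones(n)], axis=1)  # features [x, 1], so theta = (slope, intercept)
    return X, X @ task


def loss(theta, X, y):
    return float(np.mean((X @ theta - y) ** 2))


def grad(theta, X, y):
    return 2.0 / len(y) * X.T @ (X @ theta - y)


def adapt(theta, X, y):
    """One inner gradient step on a task's support set: theta' = theta - ALPHA * grad."""
    return theta - ALPHA * grad(theta, X, y)


def maml(steps=1000, meta_batch=10):
    theta = np.zeros(2)
    for _ in range(steps):
        meta_grad = np.zeros(2)
        for _ in range(meta_batch):
            task = sample_task()
            Xs, ys = sample_data(task)  # support set: used for the inner step
            Xq, yq = sample_data(task)  # query set: scores the adapted parameters
            theta_prime = adapt(theta, Xs, ys)
            # Exact MAML gradient including the second-order term: for a squared loss the
            # Hessian is 2/n * X^T X, so d theta'/d theta = I - ALPHA * Hessian.
            hessian = 2.0 / len(ys) * (Xs.T @ Xs)
            jac = np.eye(2) - ALPHA * hessian
            meta_grad += jac.T @ grad(theta_prime, Xq, yq)
        theta = theta - BETA * meta_grad / meta_batch
    return theta


def adapted_loss(theta0, n_tasks=200):
    """Average query loss on fresh tasks after one inner step from theta0."""
    total = 0.0
    for _ in range(n_tasks):
        task = sample_task()
        Xs, ys = sample_data(task)
        Xq, yq = sample_data(task)
        total += loss(adapt(theta0, Xs, ys), Xq, yq)
    return total / n_tasks


theta_meta = maml()
print("meta-learned initialization:", np.round(theta_meta, 2))
print("loss after one step, MAML init:", round(adapted_loss(theta_meta), 4))
print("loss after one step, zero init:", round(adapted_loss(np.zeros(2)), 4))
```

For this particular task family the meta-learned θ should settle near the average task parameters (a, b) = (2, 0), from which a single gradient step gets closer to any task's optimum than a single step from zero. With nonlinear models the initialization is less easy to interpret, and the second-order term is usually obtained by automatic differentiation or dropped (first-order MAML).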

