Self-Supervised Learning

Photo by Kevin Gent on Unsplash

Self-supervision is in the air. Explaining the difference between self-, un-, weakly-, semi-, distantly-, and fully-supervised learning (and of course, RL) just got 100 times tougher. :) Nevertheless, we're gonna try.

To encode an object (a word, sentence, image, video, audio, …) into a general-enough representation (blobs of numbers), you try to set up learning tasks between parts of it or different views of it (the self).

Given one part (input) of the object,

can you predict / generate the other part (output)?

For example, given the sentence context around a word, can you (learn to) predict the missing word (skip-grams, BERT)? Or, modify the view of an object and predict what changed (rotate an image and predict the rotation angle). Because you are simply playing around with the object itself, these are free-lunch tasks: no external labels needed.
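To make that concrete, here is a minimal sketch (not from the original post) of the rotation pretext task: the transformation itself produces the labels, so no annotation is involved. The array shapes and the helper name `make_rotation_examples` are illustrative.

```python
# Sketch of a rotation pretext task: rotate each image by a random multiple
# of 90 degrees and use the rotation index (0-3) as a free, self-generated label.
import numpy as np

def make_rotation_examples(images):
    """images: (N, H, W, C) array -> (rotated images, rotation labels in {0, 1, 2, 3})."""
    labels = np.random.randint(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, labels)])
    return rotated, labels

images = np.random.rand(8, 32, 32, 3)          # stand-in for real, unlabeled images
x_pretext, y_pretext = make_rotation_examples(images)
```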

But now that you have (plenty of) auto-generated input-output examples, go ahead and use every hammer from your supervised-learning toolkit to learn a great (universal?) representation for the object.

By trying to predict the self-output from the self-input, you end up learning about the intrinsic properties / semantics of the object, which would otherwise have taken a ton of labeled examples to learn.
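As a hedged sketch of what that looks like in practice (assuming PyTorch; the architecture and hyper-parameters are placeholders): train on the self-generated pairs with an ordinary classification loss, then keep the encoder and discard the pretext head.

```python
# Hedged sketch: train on the self-generated (input, label) pairs with ordinary
# supervised machinery, then keep the encoder and throw away the pretext head.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(32 * 32 * 3, 128), nn.ReLU())   # representation learner
pretext_head = nn.Linear(128, 4)                                  # predicts one of 4 rotations
model = nn.Sequential(encoder, pretext_head)

x = torch.rand(8, 32 * 32 * 3)            # stand-in for flattened rotated images
y = torch.randint(0, 4, (8,))             # stand-in for the free rotation labels

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(10):                       # plain supervised training loop
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

features = encoder(x).detach()            # the learned representation for downstream tasks
```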


Self-supervision losses have been the silent heroes of representation learning across multiple domains for a while now (as auto-encoders, word embedders, auxiliary losses, …). A very nice slide deck here. Now, with the ImageNet moment for NLP (ELMo, BERT and others), I guess they've made it on their own: the missing piece in the supervision spectrum that everyone (including AGI) has been waiting for.
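An auto-encoder is the simplest example of such a loss: the input doubles as the target, so the reconstruction error is self-supervision by construction. A minimal sketch, with arbitrary sizes:

```python
# Auto-encoder as self-supervision: the input doubles as the target, so the
# reconstruction loss needs no external labels.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),        # encoder: compress to a 64-d representation
    nn.Linear(64, 784),                   # decoder: reconstruct the original input
)

x = torch.rand(16, 784)                   # stand-in for flattened images
loss = nn.functional.mse_loss(autoencoder(x), x)   # self-supervised reconstruction error
```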

Understandably, there is a flurry of research activity around newer self-supervision tricks, getting SoTA with fewer examples, and mixing various kinds of supervision (hello NeurIPS!). So far, self-supervised methods have mostly tried to relate the components of an object, taking one part as input and predicting the other. Let's see how creative the community gets when playing around with the new hammer.

Also, I’m very curious who claims they were the first to do it :)

PS: if you are looking for someone to 'supervise' you (weakly, fully, remotely or even co-supervise) to solve some very interesting text, vision and speech problems, feel free to ping me at [email protected]!

