


Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning
Overview | Abstract | Installation | Examples | Citation
Overview
Hi, good to see you here!
Thanks for checking out the code for Non-Parametric Transformers (NPTs).
This codebase will allow you to reproduce experiments from the paper as well as use NPTs for your own research.
Abstract
We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms. However, unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models. We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points.
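To make the core idea concrete, here is a minimal, illustrative sketch (in PyTorch; not code from this repository) of attention applied across the datapoint axis: every row of a dataset attends to every other row, so a prediction can depend on the whole input set. The class name, shapes, and hyperparameters below are ours for illustration only; the actual NPT additionally alternates attention between datapoints and attributes and trains with masking objectives.

import torch
import torch.nn as nn

class DatapointAttention(nn.Module):
    # Illustrative only: multi-head self-attention over the datapoint axis.
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (n_datapoints, dim), one embedding per row of the dataset.
        h = x.unsqueeze(0)                    # (1, n, dim): the datapoint axis
                                              # plays the role of the sequence axis.
        out, _ = self.attn(h, h, h)           # each row attends to all other rows.
        return self.norm(x + out.squeeze(0))  # residual connection + layer norm.

# Toy usage: a "dataset" of 128 datapoints with 64-dimensional embeddings.
x = torch.randn(128, 64)
layer = DatapointAttention(dim=64)
print(layer(x).shape)  # torch.Size([128, 64])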
Installation
Set up and activate the Python environment by executing
conda env create -f environment.yml
conda activate npt
For now, we recommend installing CUDA <= 10.2; see the linked issue for known problems with CUDA >= 11.0.
If you are running this on a system without a GPU, use the above with environment_no_gpu.yml instead.
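After activating the environment, a quick sanity check (our suggestion, not part of the repository; it assumes the environment installs PyTorch, which the CUDA note above implies) is to confirm the build and GPU visibility:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
With environment_no_gpu.yml, the final value being False is expected; with the GPU environment, torch.version.cuda should report the CUDA toolkit your build targets (e.g. 10.2 if you followed the recommendation above).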
Examples
We now give some basic examples of running NPT.
NPT downloads all supported datasets automatically, so you don't need to worry about that.
We use wandb to log experimental results.
Wandb allows us to conveniently track run progress online.
If you do not want wandb enabled, you can run wandb off in the shell where you execute NPT.
For example, run the following to explore NPT with the default configuration on the Breast Cancer dataset:
python run.py --data_set breast-cancer
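An alternative (our suggestion, not from the repository's own instructions, and dependent on your wandb version) is to disable logging for a single run via the WANDB_MODE environment variable:
WANDB_MODE=disabled python run.py --data_set breast-cancer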
Another example: a run on the poker-hand dataset may look like this:
python run.py --data_set poker-hand \
--exp_batch_size 4096 \
--exp_print_every_nth_forward 100
You can find all possible config arguments and their descriptions in NPT/configs.py or by running python run.py --help.
In scripts/ we provide the run commands and hyperparameter configurations for the experiments presented in the paper.
We hope you enjoy using the code, and please feel free to reach out with any questions.
Citation
If you find this code helpful for your work, please cite our paper as
@article{kossen2021self,
  title={Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning},
  author={Kossen, Jannik and Band, Neil and Gomez, Aidan N. and Lyle, Clare and Rainforth, Tom and Gal, Yarin},
  journal={arXiv:2106.02584},
  year={2021}
}