Autocompletion with deep learning

TL;DR: TabNine is an autocompleter that helps you write code faster. We’re adding a deep learning model which significantly improves suggestion quality. You can see videos below, and you can sign up for it here.

There has been a lot of hype about deep learning in the past few years. Neural networks are state-of-the-art in many academic domains, and they have been deployed in production for tasks such as autonomous driving, speech synthesis, and adding dog ears to human faces. Yet developer tools have been slow to benefit from these advances. To use a surprisingly common idiom among software blogs, the cobbler’s children have no shoes.

TabNine hopes to change this. Below are demo videos of Deep TabNine completing Java, Python, C++, and Haskell.

About Deep TabNine

Deep TabNine is trained on around 2 million files from GitHub. During training, its goal is to predict each token given the tokens that come before it. To achieve this goal, it learns complex behaviour, such as type inference in dynamically typed languages.
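
To make the training objective concrete, here is a minimal sketch of next-token prediction over a toy model. The sizes and the embedding-plus-linear stack are assumptions standing in for the real Transformer; this is not TabNine’s code.

    # Minimal sketch of the next-token objective (toy sizes; the embedding +
    # linear layers stand in for the real Transformer).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, d_model, seq_len = 1000, 64, 16
    embed = nn.Embedding(vocab_size, d_model)
    lm_head = nn.Linear(d_model, vocab_size)

    tokens = torch.randint(0, vocab_size, (1, seq_len))  # one tokenized source file
    hidden = embed(tokens)                               # would be the Transformer's output
    logits = lm_head(hidden)                             # next-token scores at each position

    # Position i is trained to predict token i + 1.
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, vocab_size),
        tokens[:, 1:].reshape(-1),
    )
    print(f"cross-entropy loss: {loss.item():.3f}")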

Deep TabNine can use subtle clues that are difficult for traditional tools to access. For example, the return type of app.get_user() is assumed to be an object with setter methods, while the return type of app.get_users() is assumed to be a list.
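
As a rough illustration of that naming cue (the App and User classes below are invented stand-ins, not the code from the original example), note that the snippet has no type annotations: the singular versus plural method name is the only hint.

    # Invented stand-ins to illustrate the naming cue; not TabNine's training data.
    class User:
        def __init__(self):
            self.name = "unnamed"

        def set_name(self, name):
            self.name = name

    class App:
        def get_user(self):
            return User()

        def get_users(self):
            return [User(), User()]

    app = App()
    user = app.get_user()
    user.set_name("Ada")        # after `user.`, setter methods are plausible completions
    for u in app.get_users():   # the plural result is treated as something iterable
        print(u.name)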

Deep TabNine is based on GPT-2, which uses the Transformer network architecture. This architecture was first developed to solve problems in natural language processing. Although modeling code and modeling natural language might appear to be unrelated tasks, modeling code requires understanding English in some unexpected ways. For example, we can make the model negate words with an if/else statement.
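
The prompt for that kind of example looks roughly like the snippet below (the flag name and strings are invented for illustration): completing the else branch requires knowing that “invalid” is the negation of “valid”.

    # Invented example: given the `if` branch, the natural completion of the
    # `else` branch is the negated word.
    response_is_valid = False   # stand-in flag

    if response_is_valid:
        status = "valid"
    else:
        status = "invalid"      # the kind of completion the model can suggest
    print(status)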

The model also uses documentation written in natural language to infer function names, parameters, and return types.
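
For instance, a name and docstring alone can be enough context to suggest a body; the function below is a hypothetical example, not the one from the original post.

    def euclidean_distance(p1, p2):
        """Return the Euclidean distance between two 2-D points."""
        # Given only the name and docstring, a body like this is a natural completion.
        return ((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2) ** 0.5

    print(euclidean_distance((0, 0), (3, 4)))  # 5.0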

In the past, many users said they wished TabNine came with pre-existing knowledge, instead of looking only at the user’s current project. Pre-existing knowledge is especially useful when the project is small or a new library is being added to it. Deep TabNine helps address this issue; for example, it knows that when a class extends React.Component, its constructor usually takes a single argument called props, and it often assigns this.state in its body.

Deep TabNine can even do the impossible and remember C++ variadic forwarding syntax.

Using Deep TabNine

Deep TabNine requires a lot of computing power: running the model on a laptop would not deliver the low latency that TabNine’s users have come to expect. So we are offering a service that will allow you to use TabNine’s servers for GPU-accelerated autocompletion. It’s called TabNine Cloud, it’s currently in beta, and you can sign up for it here.

We understand that many users want to keep their code on their own machine for privacy reasons. We’re taking the following steps to address this use case:

  • For individual developers, we are working on a reduced-size model which can run on a laptop with reasonable latency. If you are interested in this option, we’d appreciate it if you took a moment to fill out this short survey about your system.

  • For enterprises, we will offer the option to license the model from us and run it on your own hardware. We can also train a custom model for you which understands the unique patterns and style within your codebase. If this sounds interesting to you, we would love to hear more about your use case at [email protected].

If you choose to use TabNine Cloud, we take the following steps to reduce the risk of data breach:

  1. TabNine Cloud will always be opt-in and we will never enable it without explicitly asking for your permission first.
  2. We do not store or log your code after your query is fulfilled.
  3. Your connection to TabNine servers is encrypted with TLS.
  4. There is a setting which lets you use TabNine Cloud for whitelisted directories only.

TabNine Cloud is currently in beta, and scaling it up presents some unique challenges since queries are computationally demanding (over 10 billion floating point operations) yet they must be fulfilled with low latency. To ensure high service quality, we are releasing it gradually. You can request access here. Customers of TabNine will be the first to receive access.
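
For a rough sense of scale, the back-of-the-envelope below compares the pure compute time for a ~10 GFLOP query on a laptop CPU versus a data-center GPU. The sustained-throughput figures are assumptions chosen for illustration, not measurements from TabNine.

    # Back-of-the-envelope: compute time for one ~10 GFLOP query.
    # The throughput numbers are rough assumptions, not figures from the post.
    flops_per_query = 10e9

    hardware = {
        "laptop CPU (~50 GFLOP/s sustained, assumed)": 50e9,
        "data-center GPU (~10 TFLOP/s sustained, assumed)": 10e12,
    }

    for name, flops_per_sec in hardware.items():
        ms = flops_per_query / flops_per_sec * 1e3
        print(f"{name}: ~{ms:.1f} ms of compute per query")
    # -> roughly 200 ms on the CPU versus 1 ms on the GPU, before network latency.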

Frequently asked questions

This deep learning stuff is cool but I’m skeptical that it can improve over my existing autocompleter which actually parses the code.

You can use both! TabNine integrates with any autocompleter that implements the Language Server Protocol. TabNine will use your existing autocompleter when it provides suggestions and use Deep TabNine otherwise.
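
One way such a fallback could be wired up is sketched below; this is an assumption about how the two sources might be combined, not TabNine’s actual implementation.

    # A sketch of a fallback policy (an assumption, not TabNine's real logic):
    # prefer the language server's completions when it returns any,
    # otherwise fall back to the deep model's predictions.
    def combined_suggestions(lsp_results, deep_results):
        return lsp_results if lsp_results else deep_results

    print(combined_suggestions(["get_user()", "get_users()"], ["get_name()"]))
    print(combined_suggestions([], ["self.state = {"]))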

What latency can I expect?

You can look at the videos (1, 2, 3, 4) for an idea of the latency. They haven’t been edited or sped up.

What languages are supported?

Deep TabNine supports Python, JavaScript, Java, C++, C, PHP, Go, C#, Ruby, Objective-C, Rust, Swift, TypeScript, Haskell, OCaml, Scala, Kotlin, Perl, SQL, HTML, CSS, and Bash.

Is there any change to the existing $49/$99 license?

There is no change.

If you have a permanent license, then when we launch TabNine Cloud, you can apply the full purchase price of your license as a discount toward your TabNine Cloud subscription. Of course, you can also decline to use TabNine Cloud with no reduction of functionality.

Software licenses

Only code with one of the following licenses is included in the training data:

  • MIT
  • Unlicense
  • Apache 2.0
  • BSD 2-clause
  • BSD 3-clause

Licenses are determined per-repository by Licensee.

Acknowledgements

Thanks to everyone who gave feedback on this blog post, and thanks to OpenAI for open-sourcing GPT-2.

