Autocompletion with deep learning

TL;DR: TabNine is an autocompleter that helps you write code faster. We’re adding a deep learning model which significantly improves suggestion quality. You can see videos below, and you can sign up for it here.

There has been a lot of hype about deep learning in the past few years. Neural networks are state-of-the-art in many academic domains, and they have been deployed in production for tasks such as autonomous driving, speech synthesis, and adding dog ears to human faces. Yet developer tools have been slow to benefit from these advances. To use a surprisingly common idiom among software blogs, the cobbler’s children have no shoes.

TabNine hopes to change this. Below are demo videos of Deep TabNine completing Java, Python, C++, and Haskell.

About Deep TabNine

Deep TabNine is trained on around 2 million files from GitHub. During training, its goal is to predict each token given the tokens that come before it. To achieve this goal, it learns complex behaviour, such as type inference in dynamically typed languages.
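
To make the training objective concrete, here is a minimal sketch of next-token prediction over a toy model. The sizes and the embedding-plus-linear stack are assumptions standing in for the real Transformer; this is not TabNine’s code.

    # Minimal sketch of the next-token objective (toy sizes; the embedding +
    # linear layers stand in for the real Transformer).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, d_model, seq_len = 1000, 64, 16
    embed = nn.Embedding(vocab_size, d_model)
    lm_head = nn.Linear(d_model, vocab_size)

    tokens = torch.randint(0, vocab_size, (1, seq_len))  # one tokenized source file
    hidden = embed(tokens)                               # would be the Transformer's output
    logits = lm_head(hidden)                             # next-token scores at each position

    # Position i is trained to predict token i + 1.
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, vocab_size),
        tokens[:, 1:].reshape(-1),
    )
    print(f"cross-entropy loss: {loss.item():.3f}")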

Deep TabNine can use subtle clues that are difficult for traditional tools to access. For example, the return type of app.get_user() is assumed to be an object with setter methods, while the return type of app.get_users() is assumed to be a list.
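
As a rough illustration of that naming cue (the App and User classes below are invented stand-ins, not the code from the original example), note that the snippet has no type annotations: the singular versus plural method name is the only hint.

    # Invented stand-ins to illustrate the naming cue; not TabNine's training data.
    class User:
        def __init__(self):
            self.name = "unnamed"

        def set_name(self, name):
            self.name = name

    class App:
        def get_user(self):
            return User()

        def get_users(self):
            return [User(), User()]

    app = App()
    user = app.get_user()
    user.set_name("Ada")        # after `user.`, setter methods are plausible completions
    for u in app.get_users():   # the plural result is treated as something iterable
        print(u.name)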

Deep TabNine is based on GPT-2, which uses the Transformer network architecture. This architecture was first developed to solve problems in natural language processing. Although modeling code and modeling natural language might appear to be unrelated tasks, modeling code requires understanding English in some unexpected ways. For example, we can make the model negate words with an if/else statement.
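
The prompt for that kind of example looks roughly like the snippet below (the flag name and strings are invented for illustration): completing the else branch requires knowing that “invalid” is the negation of “valid”.

    # Invented example: given the `if` branch, the natural completion of the
    # `else` branch is the negated word.
    response_is_valid = False   # stand-in flag

    if response_is_valid:
        status = "valid"
    else:
        status = "invalid"      # the kind of completion the model can suggest
    print(status)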

The model also uses documentation written in natural language to infer function names, parameters, and return types.
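
For instance, a name and docstring alone can be enough context to suggest a body; the function below is a hypothetical example, not the one from the original post.

    def euclidean_distance(p1, p2):
        """Return the Euclidean distance between two 2-D points."""
        # Given only the name and docstring, a body like this is a natural completion.
        return ((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2) ** 0.5

    print(euclidean_distance((0, 0), (3, 4)))  # 5.0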

In the past, many users said they wished TabNine came with pre-existing knowledge, instead of looking only at the user’s current project. Pre-existing knowledge is especially useful when the project is small or a new library is being added to it. Deep TabNine helps address this issue; for example, it knows that when a class extends React.Component, its constructor usually takes a single argument called props, and it often assigns this.state in its body.

Deep TabNine can even do the impossible and remember C++ variadic forwarding syntax.

Using Deep TabNine

Deep TabNine requires a lot of computing power: running the model on a laptop would not deliver the low latency that TabNine’s users have come to expect. So we are offering a service that will allow you to use TabNine’s servers for GPU-accelerated autocompletion. It’s called TabNine Cloud, it’s currently in beta, and you can sign up for it here.

We understand that many users want to keep their code on their own machine for privacy reasons. We’re taking the following steps to address this use case:

  • For individual developers, we are working on a reduced-size model which can run on a laptop with reasonable latency. If you are interested in this option, we’d appreciate it if you took a moment to fill out this short survey about your system.

  • For enterprises, we will offer the option to license the model from us and run it on your own hardware. We can also train a custom model for you which understands the unique patterns and style within your codebase. If this sounds interesting to you, we would love to hear more about your use case at [email protected].

If you choose to use TabNine Cloud, we take the following steps to reduce the risk of data breach:

  1. TabNine Cloud will always be opt-in and we will never enable it without explicitly asking for your permission first.
  2. We do not store or log your code after your query is fulfilled.
  3. Your connection to TabNine servers is encrypted with TLS.
  4. There is a setting which lets you use TabNine Cloud for whitelisted directories only.

TabNine Cloud is currently in beta, and scaling it up presents some unique challenges since queries are computationally demanding (over 10 billion floating point operations) yet they must be fulfilled with low latency. To ensure high service quality, we are releasing it gradually. You can request access here. Customers of TabNine will be the first to receive access.
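
For a rough sense of scale, the back-of-the-envelope below compares the pure compute time for a ~10 GFLOP query on a laptop CPU versus a data-center GPU. The sustained-throughput figures are assumptions chosen for illustration, not measurements from TabNine.

    # Back-of-the-envelope: compute time for one ~10 GFLOP query.
    # The throughput numbers are rough assumptions, not figures from the post.
    flops_per_query = 10e9

    hardware = {
        "laptop CPU (~50 GFLOP/s sustained, assumed)": 50e9,
        "data-center GPU (~10 TFLOP/s sustained, assumed)": 10e12,
    }

    for name, flops_per_sec in hardware.items():
        ms = flops_per_query / flops_per_sec * 1e3
        print(f"{name}: ~{ms:.1f} ms of compute per query")
    # -> roughly 200 ms on the CPU versus 1 ms on the GPU, before network latency.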

Frequently asked questions

This deep learning stuff is cool but I’m skeptical that it can improve over my existing autocompleter which actually parses the code.

You can use both! TabNine integrates with any autocompleter that implements the Language Server Protocol. TabNine will use your existing autocompleter when it provides suggestions and use Deep TabNine otherwise.
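
One way such a fallback could be wired up is sketched below; this is an assumption about how the two sources might be combined, not TabNine’s actual implementation.

    # A sketch of a fallback policy (an assumption, not TabNine's real logic):
    # prefer the language server's completions when it returns any,
    # otherwise fall back to the deep model's predictions.
    def combined_suggestions(lsp_results, deep_results):
        return lsp_results if lsp_results else deep_results

    print(combined_suggestions(["get_user()", "get_users()"], ["get_name()"]))
    print(combined_suggestions([], ["self.state = {"]))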

What latency can I expect?

You can look at the videos (1, 2, 3, 4) for an idea of the latency. They haven’t been edited or sped up.

What languages are supported?

Deep TabNine supports Python, JavaScript, Java, C++, C, PHP, Go, C#, Ruby, Objective-C, Rust, Swift, TypeScript, Haskell, OCaml, Scala, Kotlin, Perl, SQL, HTML, CSS, and Bash.

Is there any change to the existing $49/$99 license?

There is no change.

If you have a permanent license, then when we launch TabNine Cloud, you can apply the full purchase price of your license as a discount toward your TabNine Cloud subscription. Of course, you can also decline to use TabNine Cloud with no reduction of functionality.

Software licenses

Only code with one of the following licenses is included in the training data:

  • MIT
  • Unlicense
  • Apache 2.0
  • BSD 2-clause
  • BSD 3-clause

Licenses are determined per-repository by Licensee.

Acknowledgements

Thanks to everyone who gave feedback on this blog post, and thanks to OpenAI for open-sourcing GPT-2.

