BlazingSQL is Now Open Source

 4 years ago
source link: https://www.tuicool.com/articles/bymAzaV
All of It.

Aug 5 · 3 min read

BlazingSQL, the GPU-accelerated SQL engine of the RAPIDS ecosystem, is now 100% open-source licensed under Apache 2.0!

Check out the code on our GitHub page.

BlazingSQL is not a database, which is why we changed our name from BlazingDB to BlazingSQL. It is a SQL engine that processes (almost) any data you want.

Working within RAPIDS has been game-changing. There are now over 100 developers contributing to our community. Most of these developers come from enterprise and their contributions add valuable features to BlazingSQL, like support for more file formats.

As RAPIDS adoption continues to explode, open-sourcing BlazingSQL accelerates our development cycle, gets our product in the hands of more users, and aligns our licensing and messaging with the greater RAPIDS.ai ecosystem.

“NVIDIA and the RAPIDS ecosystem are delighted that BlazingSQL is open-sourcing their SQL engine built on RAPIDS,” said Josh Patterson, GM of data science at NVIDIA. “By leveraging Apache Arrow on GPUs and integrating with Dask, BlazingSQL will extend open-source functionality, and drive the next wave of interoperability in the accelerated data science ecosystem.”

We went all-in on RAPIDS before it had a name. Now, open-sourcing is the culmination of a strategy by NVIDIA and BlazingSQL.

NVIDIA stepped up to ensure RAPIDS would solve customer problems at scale. BlazingSQL, in addition to contributing heavily to the RAPIDS ecosystem, will focus on the services and support agreements necessary to make RAPIDS + BlazingSQL deployments successful and accessible to all.

Customer Challenges

When we talk about the challenges our customers face around their analytics pipelines, we hear the same complaints over and over: processing data at scale is expensive, slow, and incredibly complex.

  • Expensive — Customers cluster thousands of servers together for data science at scale. BlazingSQL + RAPIDS requires a small fraction of the infrastructure to run at an equivalent scale.
  • Slow — Workloads and queries can take hours or days on large data sets. BlazingSQL + RAPIDS provides GPU-accelerated results in seconds, allowing data scientists to quickly iterate over new models.
  • Complex — Workloads are prototyped at small scale and then rebuilt for distributed systems. BlazingSQL + RAPIDS enables users to write code once and dynamically change the scale of distribution with a single line of code.

BlazingSQL addresses these customer concerns not only with an incredibly fast, distributed GPU SQL engine, but also a zealous focus on simplicity.

With a few lines of code, BlazingSQL can query your raw data wherever it resides and interoperate with your existing analytics stack and RAPIDS.

The Future of Analytics

RAPIDS is the next-generation analytics ecosystem. SQL forms a fundamental pillar of every major analytics ecosystem to date, and BlazingSQL is the SQL standard for RAPIDS.

For this reason, we are fully integrated with the greater RAPIDS team and contribute heavily to cuDF. BlazingSQL is built entirely on top of cuDF and cuIO. New features pushed to these projects directly impact BlazingSQL features and performance, and because BlazingSQL runs on GPU DataFrames (GDFs), it is 100% interoperable with all of RAPIDS.

To be very clear: if you are a user of RAPIDS, or are considering RAPIDS (which you honestly should), you need to check out BlazingSQL and add it to your stack. BlazingSQL offers RAPIDS users countless benefits, including but not limited to:

  • Reduce code complexity: SQL is easy and can replace dozens to hundreds of cuDF function calls with a single statement.
  • Connect to data lakes: never sync another database; BlazingSQL can query raw files in your cloud or networked filesystem.
  • Make RAPIDS faster: advanced SQL optimizers help the RAPIDS stack run smarter, not just harder.
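
To make the first benefit in the list above concrete, here is a minimal sketch of the general idea that one SQL statement can stand in for several hand-written filter, group, and aggregate steps. It is not BlazingSQL or cuDF code: it uses Python's built-in sqlite3 module as a CPU stand-in so it runs anywhere, and the sample data, table name, and function names are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (city, fare) pairs standing in for a raw trips file.
TRIPS = [
    ("NYC", 12.5),
    ("NYC", 7.5),
    ("SF", 20.0),
    ("SF", 10.0),
    ("LA", 3.0),
]


def average_fare_by_hand(rows, min_fare):
    """Filter, group, aggregate, and sort with explicit Python steps."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for city, fare in rows:
        if fare >= min_fare:      # filter
            totals[city] += fare  # group and sum
            counts[city] += 1     # group and count
    averages = [(city, totals[city] / counts[city]) for city in totals]
    # Sort by average fare, highest first.
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


def average_fare_with_sql(rows, min_fare):
    """Express the same filter, group, aggregate, and sort as one SQL statement."""
    conn = sqlite3.connect(":memory:")  # in-memory SQLite, not a GPU engine
    try:
        conn.execute("CREATE TABLE trips (city TEXT, fare REAL)")
        conn.executemany("INSERT INTO trips VALUES (?, ?)", rows)
        query = """
            SELECT city, AVG(fare) AS avg_fare
            FROM trips
            WHERE fare >= ?
            GROUP BY city
            ORDER BY avg_fare DESC
        """
        return conn.execute(query, (min_fare,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(average_fare_by_hand(TRIPS, 5.0))
    print(average_fare_with_sql(TRIPS, 5.0))
```

Both functions return the same rows. The difference is that the SQL version states what result is wanted and leaves the execution plan to the engine, which is the property the list above attributes to running SQL on top of cuDF.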

“Open-sourcing redefines what’s possible, and now partners, like NVIDIA, are contributing code to the BlazingSQL codebase to provide customers with holistic data science solutions,” said Felipe Aramburu, CTO.

Time to Roll Up Your Sleeves

In case it isn’t abundantly clear, this is an open-source project. The only thing left to do is try BlazingSQL out, work with it, BREAK it (because you will), and maybe even help fix it.

You can get started easily, and on free GPUs, through our Google Colab demos. You can also install on any device of your choosing through our Docker Hub container, or, if you really want the guts, you can build from the source code on GitHub.

