A lightweight, GPU accelerated, SQL engine built on RAPIDS

A lightweight, GPU accelerated, SQL engine built on the RAPIDS.ai ecosystem.

BlazingSQL is a GPU accelerated SQL engine built on top of the RAPIDS ecosystem. RAPIDS is based on the Apache Arrow columnar memory format, and cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.

BlazingSQL is a SQL interface for cuDF, with various features to support large scale data science workflows and enterprise datasets.

Query Data Stored Externally - a single line of code can register remote storage solutions, such as Amazon S3.
Simple SQL - incredibly easy to use: run a SQL query and the results come back as GPU DataFrames (GDFs).
Interoperable - GDFs are immediately accessible to any RAPIDS library for data science workloads.

Check out our 5-min quick start notebook using BlazingSQL.

Getting Started

Please reference our docs to find out how to install BlazingSQL.

Querying a CSV file in Amazon S3 with BlazingSQL:

from blazingsql import BlazingContext
bc = BlazingContext()

bc.s3('dir_name', bucket_name='bucket_name', access_key_id='access_key', secret_key='secret_key')

# Create Table from CSV
bc.create_table('taxi', '/dir_name/taxi.csv')

# Query
result = bc.sql('SELECT count(*) FROM taxi GROUP BY year(key)').get()
result_gdf = result.columns

# Print GDF
print(result_gdf)
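
For readers without a GPU at hand, the shape of the result of that query can be previewed on the CPU. The sketch below is not BlazingSQL: it is a stand-in using Python's built-in sqlite3 module, with a hypothetical three-row taxi table and a made-up fare column. SQLite has no year() function, so strftime('%Y', ...) is assumed here to play the same role.

```python
import sqlite3

# CPU-only stand-in for illustration; this does not use BlazingSQL or a GPU.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: only the "key" timestamp column appears in the README query.
conn.execute("CREATE TABLE taxi (key TEXT, fare REAL)")
sample_rows = [
    ("2014-03-01 10:00:00", 7.5),
    ("2014-07-15 18:30:00", 12.0),
    ("2015-01-02 09:15:00", 5.0),
]
conn.executemany("INSERT INTO taxi VALUES (?, ?)", sample_rows)

# Same idea as 'SELECT count(*) FROM taxi GROUP BY year(key)',
# with strftime('%Y', key) substituted for year(key).
query = """
    SELECT strftime('%Y', key) AS trip_year, count(*) AS trips
    FROM taxi
    GROUP BY strftime('%Y', key)
    ORDER BY trip_year
"""
counts = conn.execute(query).fetchall()
print(counts)  # [('2014', 2), ('2015', 1)]
```

The query yields one row per year, each holding the number of trips recorded in that year. The year label is included here only to make the output readable; the README query selects the count alone.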

Examples

Getting Started Guide - Google Colab
Netflow Demo - Google Colab
Taxi cuML Linear Regression - Google Colab

Documentation

You can find our full documentation at the following site

Build/Install from Source

See build instructions.

Contributing

Have questions or feedback? Post a new GitHub issue.

Please see our guide for contributing to BlazingSQL.

Contact

Feel free to join our Slack chat room: RAPIDS Slack Channel

You may also email us at [email protected] or find out more details on the BlazingSQL site

License

Apache License 2.0

RAPIDS AI - Open GPU Data Science

The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

Apache Arrow on GPU

The GPU version of Apache Arrow is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow is supported.
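
The idea behind a columnar format can be illustrated with a small CPU-only sketch. It uses only the Python standard library, and the table and field names are hypothetical. It is a conceptual illustration, not Arrow's actual memory layout (which also defines validity bitmaps, offsets, and metadata) and not cuDF's implementation.

```python
from array import array

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"trip_id": 1, "fare": 7.5},
    {"trip_id": 2, "fare": 12.0},
    {"trip_id": 3, "fare": 5.0},
]

# Column-oriented layout: one contiguous, typed buffer per field.
columns = {
    "trip_id": array("q", (r["trip_id"] for r in rows)),  # 64-bit signed ints
    "fare": array("d", (r["fare"] for r in rows)),        # 64-bit floats
}

# An aggregation over one field reads a single buffer and skips the others.
total_fare = sum(columns["fare"])

# A memoryview exposes the same buffer to another consumer without copying it.
fare_view = memoryview(columns["fare"])

print(total_fare)        # 24.5
print(fare_view.nbytes)  # 24 (three 8-byte floats)
```

Sharing a buffer through a view rather than copying it is, loosely, the kind of interchange the paragraph above describes, with the buffers living in GPU memory in the real system.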
