# LocustDB
An experimental analytics database aiming to set a new standard for query performance on commodity hardware. See How to Analyze Billions of Records per Second on a Single Desktop PC for an overview of current capabilities.
## How to use
- Install Rust
- Clone the repository

      git clone https://github.com/cswinter/LocustDB.git
      cd LocustDB

- Run the repl!

      RUSTFLAGS="-Ccodegen-units=1" CARGO_INCREMENTAL=0 cargo +nightly run --release --bin repl -- test_data/nyc-taxi.csv.gz
Instead of `test_data/nyc-taxi.csv.gz`, you can also pass a path to any other `.csv` or gzipped `.csv.gz` file. The first line of the file needs to contain the names for each column. The datatypes for each column will be derived automatically, but things might break for columns that contain a mixture of numbers/strings/empty entries.
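For example, a minimal input file meeting these requirements could be created and gzipped as follows (the file name and column names here are made up for illustration; the commented-out repl invocations assume the build steps above):

```shell
# Create a minimal CSV whose first line names the columns (names are illustrative).
cat > example.csv <<'EOF'
passenger_count,fare_amount,pickup_borough
1,12.5,Manhattan
2,7.0,Brooklyn
EOF

# A gzipped copy is equally acceptable as input.
gzip -kf example.csv

# Either file could then be passed to the repl, e.g.:
#   cargo +nightly run --release --bin repl -- example.csv
#   cargo +nightly run --release --bin repl -- example.csv.gz

# Confirm the header line survives in both files.
head -n 1 example.csv
gunzip -c example.csv.gz | head -n 1
```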
You can pass the magic strings `nyc100m` or `nyc` to load the first 5 files (100m records) or the full 1.46 billion taxi rides dataset, which you will need to download first. For the full dataset, you will need about 120GB of disk space and 60GB of RAM.
## Running tests or benchmarks

    cargo +nightly test
    RUSTFLAGS="-Ccodegen-units=1" CARGO_INCREMENTAL=0 cargo +nightly bench
## Goals

A vision for LocustDB.

### Fast

Query performance for analytics workloads is best-in-class on commodity hardware, both for data cached in memory and for data read from disk.
### Cost-efficient

LocustDB automatically achieves spectacular compression ratios, has minimal indexing overhead, and requires fewer machines to store the same amount of data than any other system. The trade-off between performance and storage efficiency is configurable.
### Low latency

New data is available for queries within seconds.
### Scalable

LocustDB scales seamlessly from a single machine to large clusters.
### Flexible and easy to use

LocustDB should be usable with minimal configuration or schema setup as:
- a highly available distributed analytics system continuously ingesting data and executing queries
- a commandline tool/repl for loading and analysing data from CSV files
- an embedded database/query engine included in other Rust programs via cargo
## Non-goals

Until LocustDB is production ready, these are distractions at best, if not wholly incompatible with the main goals.
### Strong consistency and durability guarantees
- small amounts of data may be lost during ingestion
- when a node is unavailable, queries may return incomplete results
- results returned by queries may not represent a consistent snapshot
### High QPS
LocustDB does not efficiently execute queries inserting or operating on small amounts of data.
### Full SQL support
- All data is append only and can only be deleted/expired in bulk.
- LocustDB does not support queries that cannot be evaluated independently by each node (large joins, complex subqueries, precise set sizes, precise top n).
### Support for cost-inefficient or specialised hardware
LocustDB does not run on GPUs.