6

Scylla Developer Hackathon: Rust Driver

 3 years ago
source link: https://www.scylladb.com/2021/02/17/scylla-developer-hackathon-rust-driver/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

800x400-blog-hack-rust-driver.png

Scylla’s internal developer conference and hackathon this past year was a lot of fun and very productive. One of the projects we put our efforts into was a new shard-aware Rust driver. We’d like to share with you how far we’ve already gotten, and where we want to take the project next.

Motivation

The CQL protocol, used both by Scylla and Cassandra, already has some drivers for the Rust programming language on the market – cdrs, cassandra-rs and others. Still, we have rigorous expectations towards the drivers, and in particular we really wanted the following:

  • asynchronicity
  • support for prepared statements
  • token-aware routing
  • shard-aware routing (Scylla-specific optimization)
  • paging support

Also, it would be nice to have the driver written in pure Rust, without having to use any unsafe code. Since none of the existing drivers fulfilled our strict expectations, it was clear what we have to do — write our own async CQL driver in pure Rust! And the ScyllaDB hackathon was the perfect opportunity to do just that.

After intensive work during the hackathon, we completed the first version of the scylla-rust-driver and made it open-source available here:

The Team

Here’s our hackathon team, discussing crucial design decisions for the new most popular CQL driver for Rust!

Rust driver hackathon team members beginning with Piotr Sarna in the upper left and clockwise: Pekka Enberg, Piotr Dulikowski, and Kamil Braun.

Design and API example

Our driver exposes a set of asynchronous methods which can be used to establish a CQL session, send all kinds of CQL queries, receive and parse results into native Rust types, and much more. At the core of our driver, there’s a class which represents a CQL session. After establishing a connection to the cluster, the aforementioned session can be used to execute all kinds of requests.

Here’s what you can currently do with the API:

let session = Session::connect("localhost:9042", None).await?; session.refresh_topology().await?;

// Creating a keyspace session .query( "CREATE KEYSPACE IF NOT EXISTS ks WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 1}", &[] ) .await?;

// Creating a test table session .query( "CREATE TABLE IF NOT EXISTS ks.t (a int, b int, c text, primary key (a, b))", &[], ) .await?;

// Performing a non-prepared, raw statement session .query( "INSERT INTO ks.t (a, b, c) VALUES (?, ?, ?)", &scylla::values!(3, 4, "def"), ) .await?;

// Preparing a statement let prepared = session .prepare("INSERT INTO ks.t (a, b, c) VALUES (?, 7, ?)") .await?; // Executing a prepared statement session .execute(&prepared, &scylla::values!(42_i32, "I'm prepared!")) .await?;

// Getting result rows and parsing them as tuples if let Some(rows) = session.query("SELECT a, b, c FROM ks.t", &[]).await? { for row in rows.into_typed::<(i32, i32, String)>() { let (a, b, c) = row?; println!("a, b, c: {}, {}, {}", a, b, c); } }

// ... or as custom structs that derive from FromRow #[derive(Debug, FromRow)] struct RowData { a: i32, b: Option<i32>, c: String, }

if let Some(rows) = session.query("SELECT a, b, c FROM ks.t", &[]).await? { for row_data in rows.into_typed::<RowData>() { let row_data = row_data?; println!("row_data: {:?}", row_data); } }

// ... or simply as untyped rows if let Some(rows) = session.query("SELECT a, b, c FROM ks.t", &[]).await? { for row in rows { let a = row.columns[0].as_ref().unwrap().as_int().unwrap(); let b = row.columns[1].as_ref().unwrap().as_int().unwrap(); let c = row.columns[2].as_ref().unwrap().as_text().unwrap(); println!("a, b, c: {}, {}, {}", a, b, c); } }

API features include:

  • connect to a cluster
  • refresh cluster topology information
    • how many nodes are in the cluster
    • how many nodes are up
    • which nodes are responsible for which data partitions
  • perform a raw CQL query
    • not paged or with custom page size
  • prepare a CQL statement
  • execute a prepared statement
    • not paged or with custom page size

To see more comprehensive examples, take a look at https://github.com/scylladb/scylla-rust-driver/tree/main/examples

A snapshot of the documentation is available here:

https://psarna.github.io/scylla-rust-driver-docs/scylla/

Implementing the driver with Rust async/await and Tokio

tokio-logo.jpgRust language already has built-in support for asynchronous programming through the async/await mechanism. Additionally, we decided to base the driver on the Tokio framework, which provides an asynchronous runtime for Rust along with many useful features.

Our first step was to implement connection pools used to connect to Scylla/Cassandra clusters and to ensure the driver can handle both unprepared and prepared CQL statements. In order to provide that, we meticulously followed the CQL v4 protocol specification and implemented the initial request types: STARTUP, QUERY, PREPARE and EXECUTE.

Having such a solid footing, we split the work to also provide proper paging and the ability to fetch topology information from the cluster. The latter was needed to make our driver token-aware and shard-aware.

Token awareness allows the driver to route the request straight to the right coordinator node which owns the particular partition, which avoids inter-node communication and generally lowers the overhead. Shard awareness is one step further and is only supported when using the driver to connect to Scylla. The idea is that the request ends up not only on the right node, but also on the right CPU core, thus avoiding inter-core communication and minimizing the overhead even further. Read more about Scylla shard awareness and its positive effect on performance in a great blog which described this optimization for a Python driver.

Interlude: fixing murmur3 by implementing it with bugs

Wait, what? That’s right, during the hackathon we ended up needing to rewrite a murmur3 hashing algorithm with bugs in order to stay fully compatible with Apache Cassandra!

When performing token awareness tests, I noticed that around 30% of all requests ended up on a wrong node. That shouldn’t happen, so we quickly started an investigation. We meticulously checked:

  1. That the topology information fetched from Scylla is indeed correct,
    and consistent with the output of `nodetool describering`
  2. That the token computations return correct results on the first 100 keys,
    which makes it highly unlikely that token computation is to blame

… and we shouldn’t have stopped at checking just 100 keys! It turns out that the first failure happened after we rerun the test for the first 10,000 keys. Further investigation showed that a similar problem occurred for our Golang friends: https://github.com/gocql/gocql/issues/1033.

In short, Cassandra’s murmur3 implementation handwritten in Java operates on signed integers, while the original algorithm used unsigned ones. That creates some subtle differences when shifting the values, which in turn translates to around 30% of tokens being calculated inconsistently with the Cassandra way.

We had no choice but to stop using a comfy crate from crates.io which provided us with a neat murmur3 algorithm implementation and instead we spent the whole night rewriting the algorithm by hand, bugs included™!

Results

We ran two simple benchmarks to see how scylla-rust-driver compares to other existing drivers.

The benchmark’s goal was to send 10 million prepared statements as fast as possible, given a fixed concurrency of 1,024. The usage of token-aware and shard-aware routing was allowed, if supported by the driver. All drivers were compared against the same 3-node Scylla cluster, each node having 2 shards. We compared against GoCQL (enhanced by us with shard awareness) and cdrs.

gocql scylla-rust-driver cdrs Writes real 0m59.658s
user 15m21.846s
sys 2m32.438s real 0m18.310s
user 1m44.170s
sys 0m36.318s real 12m34.761s
user 2m14.253s
sys 6m21.757s Reads real 1m6.276s
user 17m35.803s
sys 2m46.497s real 0m23.928s
user 1m52.654s
sys 0m41.791s real 12m50.929s
user 3m7.008s
sys 6m43.048s Mixed
(reads and writes) real 1m3.409s
user 17m23.127s
sys 2m29.905s real 0m19.715s
user 1m51.372s
sys 0m35.209s real 13m28.133s
user 2m44.918s
sys 6m42.705s

Output of Linux’ time command for processing 10 million prepared statements with a fixed concurrency of 1,024 using different drivers.

Source code of all benchmarks:

Future plans

logo-Uniwersytet-Warszawski.png

The future of our project is very bright. As a matter of fact, it’s already scheduled for another year of hands-on work! A team of four talented students from the University of Warsaw will continue developing the driver from where we left off, as part of the ZPP program. This is the second time Scylla is proudly taking part in the program as a mentor. Here are the other projects from last year:

We also have an official roadmap, updated version of which can always be found in our repository (https://github.com/scylladb/scylla-rust-driver):

Done:

  • driver-side metrics
    • number of sent requests
    • latency percentiles of sent requests
    • number of errors
  • handling topology changes and presenting them to the user
  • CQL batch statements
  • custom error types
    • robust handling of various errors (e.g. repreparing statements)

In progress:

  • CQL authentication support
  • TLS support
  • configurable load balancing algorithms
  • configurable retry policies
  • query builders

Backlog:

  • CQL tracing
  • [additional] performance benchmarks against other drivers
    • gocql
  • handling events pushed by the server
  • speculative execution
  • expanding the documentation
  • more correctness tests
  • more integration tests – preferably using Scylla’s ccm framework and Python
  • preparing the work to be published on crates.io

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK