gRPC load balancing in Rust
source link: https://truelayer.com/blog/grpc-load-balancing-in-rust
In the first post in our open source series, we share our solution for gRPC client-side load balancing, using ginepro.
We want to give back to the wider developer community. Each post in our open source series walks you through a challenge faced by TrueLayer's engineering teams – the code we wrote to solve the issue is released under an OSS license.

In this post, we'll explain how we tackled the challenge of gRPC load balancing in Rust. Our solution was to release ginepro, a new gRPC channel implementation for tonic. ginepro (github) provides client-side gRPC load balancing by enriching tonic's channel with periodic service discovery.

The background
TrueLayer has recently started adopting Rust as a backend language. As we deploy more and more services to production, we have to constantly improve our applications so that they can handle more load and achieve the required reliability SLAs.

Load balancing gRPC requests has been a challenge: we do not use a service mesh, and there was no gRPC client in the Rust ecosystem that satisfied all our requirements. To bridge the gap we built ginepro – an add-on to tonic's Channel which provides service discovery to perform client-side look-aside gRPC load balancing.

```rust
// Using the `LoadBalancedChannel`.
use ginepro::LoadBalancedChannel;
use ginepro::pb::tester_client::TesterClient;

// Build a load-balanced channel given a service name and a port.
let load_balanced_channel = LoadBalancedChannel::builder(
        ("my_hostname", 5000)
    )
    .await
    .expect("Failed to initialise the DNS resolver.")
    .channel();

// Initialise a new gRPC client for the `Tester` service
// using the load-balanced channel as transport.
let grpc_client = TesterClient::new(load_balanced_channel);
```
```rust
// Using a plain `tonic::Channel` (no load balancing).
use tonic::transport::Endpoint;
use ginepro::pb::tester_client::TesterClient;

let channel = Endpoint::from_static("http://my_hostname:5000")
    .connect()
    .await?;
let grpc_client = TesterClient::new(channel);
```
The problem
gRPC uses the HTTP/2 protocol to multiplex requests and responses over a single TCP connection. This makes gRPC more efficient: you only pay the cost of establishing a connection once, and you make better use of the capacity of the underlying transport.

Multiplexing, though, has a few implications when it comes to load balancing.

HTTP/2 load balancing

HTTP/2 connections are persistent: a direct connection between a client (or a load balancer) and a specific server should remain open as long as possible. We do not open a new connection to a server every time we want to make a request.

Load balancing is therefore done on a per-request basis: for every request the client chooses a server and issues the request over an existing connection.

But what happens if load balancing is moved out of the client?

Clients then maintain a connection to a load balancer, and all requests go through that single connection. Traditional network load balancers, however, are unable to tell application requests apart. Since network load balancers function at the fourth layer of the OSI stack (the transport layer), they can only reason about TCP and UDP connections. They are therefore only able to forward the traffic from one client to one fixed server (remember that connections are persistent!).

![Diagram](https://truelayer.com/static/bb1207d1161d83ff53372af98f38ac8c/53236/load-balancing-in-rust-2.png)
![Diagram](https://truelayer.com/static/b6bbf80c7a6cd20909ea772cc4b81e6b/53236/load-balancing-in-rust-3.png)
![Diagram](https://truelayer.com/static/c133ca5cc89d86b0762d4352f3f7b5d1/53236/load-balancing-in-rust-4.png)
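The per-request balancing described above can be reduced to a simple picker: connections to every server stay open, and each request selects the next server in round-robin order. The sketch below is purely illustrative (the `RoundRobin` type and server list are hypothetical, not part of ginepro or tonic):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// A minimal sketch of per-request client-side load balancing
// (hypothetical type): the client keeps HTTP/2 connections to all
// servers open and picks a server per request, not per connection.
struct RoundRobin {
    servers: Vec<String>,
    next: AtomicUsize,
}

impl RoundRobin {
    fn pick(&self) -> &str {
        // Atomically advance the counter so concurrent requests
        // spread evenly across servers.
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.servers.len();
        &self.servers[i]
    }
}

fn main() {
    let lb = RoundRobin {
        servers: vec!["10.0.0.1:5000".into(), "10.0.0.2:5000".into()],
        next: AtomicUsize::new(0),
    };
    // Successive requests alternate between the two servers.
    assert_eq!(lb.pick(), "10.0.0.1:5000");
    assert_eq!(lb.pick(), "10.0.0.2:5000");
    assert_eq!(lb.pick(), "10.0.0.1:5000");
}
```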
An overview of gRPC load-balancing approaches
There are a few approaches we could take to avoid the scenario we just described:

- The servers periodically force the client to reconnect
- The client periodically performs service discovery
- We introduce an application load balancer
![Diagram](https://truelayer.com/static/186d54bcff0e6a506cfb26918fb6655b/53236/load-balancing-in-rust-5.png)
- Look-aside service: a service that tells the client which server to call
- A completely separate service where load balancing, health checks, load reporting and service discovery are completely transparent to the application (eg Envoy).

With a look-aside load balancer, the client has to:

- maintain a connection to the look-aside process (what server should I call?)
- establish and maintain open connections to all healthy server backends.
![Diagram](https://truelayer.com/static/c9d330da9f2f27b116645e0e29a8f08d/53236/load-balancing-in-rust-6.png)
- Service mesh: a dedicated infrastructure layer that controls service-to-service communication (eg Istio and Linkerd), deployed as a sidecar.
![Diagram](https://truelayer.com/static/22c74d7d88f5aef89c79623a3db2667d/53236/load-balancing-in-rust-7.png)
- Service proxy: a single standalone service that all clients connect to, configured for each gRPC service it proxies.
![Diagram](https://truelayer.com/static/66eae2ac58f7978b134bbae569da94b9/53236/load-balancing-in-rust-9.png)
- Container sidecar proxy: a sidecar proxy is deployed alongside every client, each configured to proxy to the same gRPC service.
![Diagram](https://truelayer.com/static/a5d71583784aae2fc424521170940e87/53236/load-balancing-in-rust-8.png)
These proxy-based approaches come with downsides:

- More moving parts in the hot path, impacting the latency of your system.
- Both a service mesh and standalone proxies add a lot of complexity to your setup, with novel failure modes. They need to be set up, monitored and maintained.
TrueLayer’s approach
TrueLayer leverages gRPC to have strongly-typed contracts between applications written in various programming languages (C#, Rust, TypeScript, Python). We currently do not run a service mesh in our Kubernetes clusters, therefore we do not get gRPC load balancing out of the box.

Historical precedent, C#: use an Envoy sidecar

Most of our early gRPC servers and clients were written in C#. There we used the sidecar approach – a manually-configured Envoy proxy. With an Envoy sidecar you get a production-hardened solution with a considerable community around it. It was the fastest way to get gRPC load balancing working at that point in time.

Standalone sidecar proxies, as we discussed, increase the overall complexity of the system: it is another component to configure, operate and understand. In particular, configuration management scales poorly as the number of services increases, while testing and reproducing failure modes locally or in CI is fairly hard.

Rust opportunities

What about Rust? What does the gRPC landscape look like? Is client-side load balancing viable?

Let's look at Rust's most popular gRPC crates:

- grpc-rs by TiKV – implements load balancing but no way of updating service IPs;
- grpc-rust – does not implement load balancing;
- tonic – implements load balancing and has building blocks for updating endpoints dynamically.
```rust
use tonic::transport::{Channel, Endpoint};
use tower::discover::Change;

// Create a `Channel` whose set of endpoints can be updated at runtime.
// This returns the channel and the sender half of a multi-producer,
// single-consumer channel of endpoint changes.
let (channel, sender) = Channel::balance_channel(1024);

// Add a new `Endpoint`.
sender
    .send(Change::Insert(
        "localhost:8080",
        Endpoint::from_static("http://localhost:8080"),
    ))
    .await?;

// Remove the `Endpoint` from the list of healthy targets.
sender.send(Change::Remove("localhost:8080")).await?;
```
```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Interface that provides functionality to
/// acquire a list of IPs given a valid host name.
#[async_trait::async_trait]
pub trait LookupService {
    /// Return a list of unique `SocketAddr`s associated with the provided
    /// `ServiceDefinition`, which contains the `hostname` and `port` of the service.
    /// If no IP addresses were resolved, an empty `HashSet` is returned.
    async fn resolve_service_endpoints(
        &self,
        definition: &ServiceDefinition,
    ) -> Result<HashSet<SocketAddr>, anyhow::Error>;
}
```
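For illustration, here is what a DNS-backed implementation of this lookup boils down to, sketched with the standard library's blocking `ToSocketAddrs` (ginepro resolves asynchronously via its `LookupService` trait; the function name and synchronous style here are our own):

```rust
use std::collections::HashSet;
use std::net::{SocketAddr, ToSocketAddrs};

// Sketch only: resolve a hostname/port pair to the set of socket
// addresses currently behind it. An empty set means the name did
// not resolve to any IP address.
fn resolve_endpoints(hostname: &str, port: u16) -> HashSet<SocketAddr> {
    (hostname, port)
        .to_socket_addrs()
        .map(|addrs| addrs.collect())
        .unwrap_or_default()
}

fn main() {
    // For a Kubernetes headless service, this kind of lookup returns
    // one address per healthy pod.
    for addr in resolve_endpoints("localhost", 5000) {
        println!("resolved: {addr}");
    }
}
```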
```rust
loop {
    let discovered_endpoints = self
        .lookup_service
        .resolve_service_endpoints(service_definition)
        .await;
    let changeset = self.create_changeset(&discovered_endpoints).await;
    // Report the changeset to `tonic` to update the list of available server IPs.
    self.report_and_commit(changeset, endpoints).await?;
    // Wait until the next interval.
    tokio::time::sleep(self.probe_interval).await;
}
```
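The `create_changeset` step above diffs the freshly discovered endpoints against the set the channel already knows about. A minimal sketch of that diff, with a hypothetical `Change` enum mirroring tower's (the real method lives inside ginepro):

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

// Hypothetical change type, mirroring `tower::discover::Change`.
#[derive(Debug, PartialEq)]
enum Change {
    Insert(SocketAddr),
    Remove(SocketAddr),
}

// Sketch of the changeset computation: endpoints that appeared since
// the last probe become inserts, endpoints that disappeared become removes.
fn create_changeset(
    discovered: &HashSet<SocketAddr>,
    known: &HashSet<SocketAddr>,
) -> Vec<Change> {
    let inserts = discovered.difference(known).map(|a| Change::Insert(*a));
    let removes = known.difference(discovered).map(|a| Change::Remove(*a));
    inserts.chain(removes).collect()
}

fn main() {
    let known: HashSet<SocketAddr> =
        ["10.0.0.1:5000".parse().unwrap(), "10.0.0.2:5000".parse().unwrap()]
            .into_iter()
            .collect();
    let discovered: HashSet<SocketAddr> =
        ["10.0.0.2:5000".parse().unwrap(), "10.0.0.3:5000".parse().unwrap()]
            .into_iter()
            .collect();
    let changes = create_changeset(&discovered, &known);
    // One server appeared (10.0.0.3) and one disappeared (10.0.0.1).
    assert_eq!(changes.len(), 2);
    assert!(changes.contains(&Change::Insert("10.0.0.3:5000".parse().unwrap())));
    assert!(changes.contains(&Change::Remove("10.0.0.1:5000".parse().unwrap())));
}
```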
Summary
We were able to test ginepro extensively in CI prior to its deployment – the benefit of a client-side solution written in the same stack as the service! Testing uncovered a few bugs in tonic (around transport and TLS) – we upstreamed patches as a result (1 and 2).

ginepro was deployed in production five months ago, across several gRPC clients. We haven't experienced any issues related to gRPC load balancing (yet).

There is a catch: it only works for our Rust services. It might not be the final chapter of TrueLayer's own saga for gRPC load balancing. Does the future hold a service mesh? We shall see. Nonetheless, there is value in the solution – that's why we are opening it up to the Rust ecosystem as a whole. We hope other developers can build on top of our work and push forward the state of the gRPC stack within the Rust ecosystem.

ginepro is only the beginning of our open source journey – the next issue will cover the machinery we built to extend reqwest with support for middlewares.

We're hiring! If you're passionate about Rust and building great products, take a look at our job opportunities.