RPC Frameworks: gRPC vs Thrift vs RPyC for python
source link: https://www.tuicool.com/articles/hit/NzQjQrU
Introduction
The year was 2015. I was writing a bunch of ML training scripts as well as several production scripts. They all needed financial data. Data was spread across multiple tables and multiple datastores. Intraday market data was stored in a Cassandra cluster, while the daily/monthly data was in a MySQL database. Similarly, different types of securities (Future, Option, Stock etc.) were stored in different locations.
So, I decided to make a data library that I can use in my scripts. This data library turned out to be quite popular with my team. It had all the things we needed at that point:
- A single interface for all data types - Futures, Stocks, ETFs, Currencies, Indexes, and Funds from different exchanges.
- Easy to use interface.
- Flexible in terms of the data intervals supported. It worked flawlessly for intraday, daily and monthly time periods.
- It could be used for both live ingestion/consumption as well as historical data requirements.
- It was easy to support a new type of data - for example, macroeconomic indicators.
However, it had some fatal flaws that I could not foresee at that point. Over time, the number of production scripts relying on this library grew rapidly, and the library ran its database queries directly. That caused two problems:
- Changing anything in the database would break existing production processes. So, there was no way of changing the database without incurring downtime.
- Additionally, rapidly increasing production processes caused a strain on the database. Because the database access was finely ingrained into the rest of the codebase, it was not possible to optimize or load balance properly.
About a year ago, I was asked if we should convert that library to a service. I brushed it away - not realizing the problems I was going to face in the next year. To be fair, I didn’t fully understand services or microservices at that point, and that made me skeptical of their use for something like data fetching. I was still convinced that flexibility and rapid changes would only come from having that code as a library.
But, I finally started taking another look at services a few days ago. I looked at gRPC, Thrift and RPyC over the past few days. I am summarizing my initial findings in this post. Because I mostly use python for everything, I am approaching these frameworks from that point of view.
You can find the code for the subsequent examples in this repo.
gRPC
gRPC uses Protocol Buffers for serialization and deserialization. It was developed by Google, which released it as open source software while rewriting its internal framework called Stubby. At the moment, several companies including Netflix and Square are using this framework to implement their services.
Let’s jump directly into the simplest example.
We will use the same toy example for all 3 frameworks:
- We will define a service called Time.
- It implements a single RPC call called GetTime.
- GetTime doesn’t take any argument and returns the current server time in string format.
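Before any framework is involved, the contract above boils down to one plain python function. This is only a local sketch of that contract; the name get_time is my own choice here, and each framework below wraps the same time.ctime() call in its own way.

```python
import time


def get_time() -> str:
    """Return the current time of the machine running this code, as a string."""
    # time.ctime() formats the local time like 'Sun Jun 20 23:21:05 1993'.
    return time.ctime()


if __name__ == '__main__':
    print('Local call returned: {}'.format(get_time()))
```

Everything that follows is about making this call cross a process boundary.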
Simple gRPC Example
Create a time.proto Protocol Buffers file describing our service.

    syntax = "proto3";

    package time;

    service Time {
      rpc GetTime (TimeRequest) returns (TimeReply) {}
    }

    // Empty Request Message
    message TimeRequest {
    }

    // The response message containing the time
    message TimeReply {
      string message = 1;
    }
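In short, the file declares a Time service with one RPC, GetTime, which takes an empty TimeRequest message and returns a TimeReply message holding a single string field.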
Now, use the above protobuf file to generate the python files time_pb2.py and time_pb2_grpc.py. We will use them for both our server and client code. Here’s the command to do so (you will need the grpcio-tools python package):

    python -m grpc_tools.protoc --python_out=. --grpc_python_out=. time.proto
Create the server script server.py.

    import time
    from concurrent import futures

    import grpc

    import time_pb2
    import time_pb2_grpc

    _ONE_DAY_IN_SECONDS = 60 * 60 * 24


    class Timer(time_pb2_grpc.TimeServicer):
        def GetTime(self, request, context):
            return time_pb2.TimeReply(message=time.ctime())


    def serve():
        server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
        time_pb2_grpc.add_TimeServicer_to_server(Timer(), server)
        server.add_insecure_port('[::]:50051')
        server.start()
        try:
            while True:
                time.sleep(_ONE_DAY_IN_SECONDS)
        except KeyboardInterrupt:
            server.stop(0)


    if __name__ == '__main__':
        serve()
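In the server, Timer subclasses the generated TimeServicer and implements GetTime. The serve() function registers it with a gRPC server backed by a thread pool, listens on port 50051, and then sleeps until it is interrupted.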
Add client code to the client.py file.

    import grpc

    import time_pb2
    import time_pb2_grpc


    def run():
        channel = grpc.insecure_channel('localhost:50051')
        stub = time_pb2_grpc.TimeStub(channel)
        response = stub.GetTime(time_pb2.TimeRequest())
        print('Client received: {}'.format(response.message))


    if __name__ == '__main__':
        run()
The client opens an insecure channel to the server, wraps it in the generated TimeStub, and then calls GetTime as if it were a local method.
More Details
gRPC uses HTTP/2 for client-server communication. Every RPC call is a separate stream in the same TCP/IP connection.
gRPC supports 4 different types of RPCs:
- Unary RPC - a single request followed by a single response from the server. Our Time service example uses Unary RPC.
    rpc GetTime (TimeRequest) returns (TimeReply) {}
- Server Streaming RPC - the client sends a request and gets a stream to read from.
    rpc GetTime (TimeRequest) returns (stream TimeReply) {}
- Client Streaming RPC - the client writes a sequence of messages and gets a single response.
    rpc GetTime (stream TimeRequest) returns (TimeReply) {}
- Bidirectional Streaming RPC - both sides send a sequence of messages using a read-write stream.
    rpc GetTime (stream TimeRequest) returns (stream TimeReply) {}
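The four call shapes listed above can be mimicked in plain python with ordinary functions and generators. This is only an analogy with no gRPC import, and every function name in it is hypothetical. As far as I know, gRPC's python handlers follow the same pattern: a streaming response is produced by a generator, and a streaming request arrives as an iterator.

```python
from typing import Iterable, Iterator


def unary(request: str) -> str:
    """One request in, one response out."""
    return 'time for {}'.format(request)


def server_streaming(request: str) -> Iterator[str]:
    """One request in, a stream of responses out."""
    for tick in range(3):
        yield '{}: tick {}'.format(request, tick)


def client_streaming(requests: Iterable[str]) -> str:
    """A stream of requests in, one response out."""
    count = sum(1 for _ in requests)
    return 'received {} requests'.format(count)


def bidirectional(requests: Iterable[str]) -> Iterator[str]:
    """A stream of requests in, a stream of responses out."""
    for request in requests:
        yield 'ack {}'.format(request)


if __name__ == '__main__':
    print(unary('a'))
    print(list(server_streaming('a')))
    print(client_streaming(iter(['a', 'b'])))
    print(list(bidirectional(['a', 'b'])))
```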
gRPC comes with built-in timeout functionality: a client can attach a deadline to a call, and the call fails if no response arrives in time. This is quite handy in practice, because many applications require a response within a certain time interval.
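To show what a per-call deadline means for the caller, here is a standard-library sketch that uses a worker thread in place of a network call. It is an analogy under my own assumptions, not gRPC code: call_with_deadline, fast_get_time and slow_get_time are hypothetical names. In gRPC's python API, the equivalent is passing a timeout keyword (in seconds) to the stub call, for example stub.GetTime(request, timeout=1.0).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as DeadlineExceeded


def call_with_deadline(fn, seconds):
    """Run fn() in a worker thread; return its result, or None if the deadline passes."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=seconds)
    except DeadlineExceeded:
        # The caller stops waiting; the worker itself is not cancelled.
        return None
    finally:
        pool.shutdown(wait=False)


def fast_get_time():
    return time.ctime()


def slow_get_time():
    time.sleep(0.5)  # stands in for a slow server
    return time.ctime()


if __name__ == '__main__':
    print('fast call:', call_with_deadline(fast_get_time, 1.0))
    print('slow call:', call_with_deadline(slow_get_time, 0.05))
```

Note that giving up on the client side does not stop the work on the other side by itself, which is worth keeping in mind for expensive calls.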
Pros and Cons
Pros:
- Multiple Language Support for both servers and clients.
- It uses HTTP/2 by default for connections.
- Abundant documentation.
- This project is actively supported by Google and others.
Cons:
- Less flexibility (especially compared to rpyc).
Links:
- Official Website and Tutorial - https://grpc.io/docs/guides/
- gRPC Concepts.
Thrift
Thrift is quite popular at Facebook and in the Hadoop/Java services world. It was created at Facebook, which later open sourced it, and it is now an Apache project.
Simple thrift Example
Create a time_service.thrift file describing the interface using the Thrift Interface Description Language (IDL).

    service TimeService {
        string get_time()
    }
Run the following command to generate python code. It will create a gen-py directory. We will use it to build the server and client scripts.

    thrift -r --gen py time_service.thrift
Write the following server code in server.py.

    import sys
    import time

    from thrift.protocol import TBinaryProtocol
    from thrift.server import TServer
    from thrift.transport import TSocket, TTransport

    sys.path.append('gen-py')
    from time_service import TimeService


    class TimeHandler:
        def __init__(self):
            self.log = {}

        def get_time(self):
            return time.ctime()


    if __name__ == '__main__':
        handler = TimeHandler()
        processor = TimeService.Processor(handler)
        transport = TSocket.TServerSocket(host='127.0.0.1', port=9090)
        tfactory = TTransport.TBufferedTransportFactory()
        pfactory = TBinaryProtocol.TBinaryProtocolFactory()

        server = TServer.TSimpleServer(processor, transport, tfactory, pfactory)

        print('Starting the server...')
        server.serve()
        print('done.')
Write the following code in client.py.

    import sys

    from thrift import Thrift
    from thrift.protocol import TBinaryProtocol
    from thrift.transport import TSocket, TTransport

    sys.path.append('gen-py')
    from time_service import TimeService


    def main():
        # Make socket
        transport = TSocket.TSocket('localhost', 9090)

        # Buffering is critical. Raw sockets are very slow
        transport = TTransport.TBufferedTransport(transport)

        # Wrap in a protocol
        protocol = TBinaryProtocol.TBinaryProtocol(transport)

        # Create a client to use the protocol encoder
        client = TimeService.Client(protocol)

        # Connect!
        transport.open()

        ts = client.get_time()
        print('Client Received {}'.format(ts))

        # Close!
        transport.close()


    if __name__ == '__main__':
        try:
            main()
        except Thrift.TException as tx:
            print('%s' % tx.message)
Simple thriftPy Example
thriftPy seems to be more popular than the default python support. It also solves some common issues with the default python support - this includes a more pythonic approach to creating server and client code. For example, check out the following server and client code:
Server code

    import time

    import thriftpy
    from thriftpy.rpc import make_server


    class Dispatcher(object):
        def get_time(self):
            return time.ctime()


    time_thrift = thriftpy.load('time_service.thrift', module_name='time_thrift')
    server = make_server(time_thrift.TimeService, Dispatcher(), '127.0.0.1', 6000)
    server.serve()

Client code

    import thriftpy
    from thriftpy.rpc import make_client

    time_thrift = thriftpy.load('time_service.thrift', module_name='time_thrift')
    client = make_client(time_thrift.TimeService, '127.0.0.1', 6000)
    print(client.get_time())
Pros and Cons
Pros:
- Thrift supports the container types list, set and map, as well as constants. Protocol Buffers covers lists (repeated fields) and maps, but it has no set type and no constants. However, rpyc supports all python and python library types - you can even send a numpy array in an RPC call.
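Cons:
- The generated python code lives in the gen-py directory, so the scripts have to add it to the import path with sys.path.append('gen-py').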
RPyC
RPyC is a pure python RPC framework. It does not support multiple languages. If your entire codebase is in python, this could be an easy and flexible framework for you.
Simple rpyc Example
server.py

    import time

    from rpyc import Service
    from rpyc.utils.server import ThreadedServer


    class TimeService(Service):
        def exposed_get_time(self):
            return time.ctime()


    if __name__ == '__main__':
        s = ThreadedServer(TimeService, port=18871)
        s.start()
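The only RPyC-specific convention in the server is the exposed_ prefix: methods whose names start with exposed_ can be called by clients, which refer to them without the prefix.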
client.py

    import rpyc

    conn = rpyc.connect('localhost', 18871)
    print('Time is {}'.format(conn.root.get_time()))
The client connects to the server and calls the exposed method through conn.root.
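The exposed_ naming convention used by the server can be illustrated with a small plain-python proxy. This sketch is my own illustration of the idea and not how RPyC is implemented: RootProxy and internal_helper are hypothetical names, and no network is involved.

```python
import time


class TimeService:
    def exposed_get_time(self):
        return time.ctime()

    def internal_helper(self):
        # No exposed_ prefix, so the proxy below refuses to hand it out.
        return 'not reachable through the proxy'


class RootProxy:
    """Resolve attribute `name` to the service's `exposed_<name>` method only."""

    def __init__(self, service):
        self._service = service

    def __getattr__(self, name):
        target = getattr(self._service, 'exposed_' + name, None)
        if target is None:
            raise AttributeError('{!r} is not exposed'.format(name))
        return target


if __name__ == '__main__':
    root = RootProxy(TimeService())
    print('Time is {}'.format(root.get_time()))
    try:
        root.internal_helper()
    except AttributeError as err:
        print('Refused: {}'.format(err))
```

The prefix acts as an allow-list: anything the service author did not mark explicitly stays private.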
Pros and Cons
Pros:
- Probably the easiest to get started. No need to understand Protocol Buffers or Thrift syntax.
- Extremely flexible. No need to formally use IDL (Interface Definition Language) to define the client-server interfaces. Simply start implementing your code - it embraces python’s Duck Typing.
Cons:
- Lack of multiple client languages.
- Lack of formally defined service interface can potentially cause maintenance issues if the codebase becomes large enough.
gRPC vs Thrift vs RPyC comparison matrix
Let me summarize my experience with each framework. The matrix compares gRPC, Thrift and RPyC on six criteria: getting started, documentation, language support, maintenance, streaming, and whether the framework can work without an IDL. For language support, gRPC and Thrift both cover C++, Python and others, while RPyC is Python only.
My preferences are:
- I would personally prefer to use RPyC if python is the only language I am going to use.
- I would prefer to use gRPC if I needed robustness, reliability, and scalability from my services.
- The best thing about Thrift is that it supports so many languages. If that’s what you’re targeting, go for Thrift.
Other important things to note:
- I did not compare speed. This might be the most relevant criterion for some.
- I do not have experience with very large services. I am not the right person to comment on maintainability of each framework. However, this is an important criterion to decide which RPC framework to choose.
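For anyone who wants to fill the speed gap themselves, a small timing harness is enough to get started. This is a minimal sketch under my own assumptions: time_calls and summarize are hypothetical helpers, and the demo times a local time.ctime call only so that the block runs without any server.

```python
import statistics
import time


def time_calls(call, repeats=100):
    """Call `call()` `repeats` times and return the per-call latencies in seconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to a couple of headline numbers."""
    return {'median': statistics.median(latencies), 'max': max(latencies)}


if __name__ == '__main__':
    # Local stand-in; swap in a remote call to measure a framework.
    print(summarize(time_calls(time.ctime, repeats=1000)))
```

To measure one of the frameworks, pass a zero-argument callable that wraps the client call from the examples above, such as lambda: conn.root.get_time() for RPyC, and keep the server and network setup identical across runs.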
You can find the code for the above examples in this repo.