

A Basic understanding of Kafka Connect
source link: https://blog.knoldus.com/understanding-of-kafka-connect/

Let us discuss Kafka Connect and some of its fundamentals. Before starting, you should have a basic knowledge of Kafka, or you can go through this document.
Apache Kafka is a distributed, resilient, fault-tolerant platform and a well-known name in the world of Big Data. It is one of the most widely used distributed streaming platforms. Kafka is not just a messaging queue but a full-fledged event streaming platform.
It is a framework for storing, reading and analyzing streaming data: a durable, publish-subscribe messaging system that exchanges data between processes, applications, and servers.
Table of contents
- What is Kafka Connect
- Architecture of Kafka Connect
- Connectors and tasks
- Sources and sinks
- Workers
- Standalone vs distributed mode
- Features
- Alternatives
- Conclusion
What is Kafka Connect?
Apache Kafka is a distributed streaming platform, and Kafka Connect is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems, using so-called connectors. Kafka Connect is only used to copy streamed data between Kafka and those systems, so its scope is deliberately narrow. It can run as a single standalone process for development and testing, or as a distributed, scalable service supporting an entire organization.
Kafka Connect makes it much easier to connect Kafka to other systems, without having to write all the glue code yourself.
Common Kafka use cases:
Architecture of Kafka Connect
The key points of the Kafka Connect architecture are:
- Kafka Connect runs as its own cluster, separate from the Kafka brokers.
- Each worker runs one or more connector tasks.
- A cluster can have multiple workers, and workers run only within that cluster.
- Tasks are automatically rebalanced across the remaining workers if a worker fails.
- Tasks act as producers or consumers depending on the type of connector: source tasks produce to Kafka, and sink tasks consume from it.
- A Kafka Connect cluster can have multiple connectors loaded at the same time.
Connectors and Tasks
Connectors are responsible for managing the tasks that will run. They decide how the data is split across tasks and provide each task with the specific configuration it needs to do its job.
Tasks are responsible for getting data into and out of Kafka. They get their context from the worker. Once initialized, they are started with a set of configuration properties derived from the connector's configuration. Once started, a source task polls the external system and returns a list of records, which the worker then sends to a Kafka broker.
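The way a connector divides work among its tasks can be sketched roughly as follows. This is a plain-Python illustration of the idea, not the Kafka Connect Java API: the function names `group_partitions` and `task_configs`, and the `tables` and `connection.url` settings, are hypothetical and chosen only for this example.

```python
from typing import Dict, List


def group_partitions(elements: List[str], num_groups: int) -> List[List[str]]:
    """Split elements into num_groups contiguous groups of near-equal size."""
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    base, extra = divmod(len(elements), num_groups)
    groups: List[List[str]] = []
    start = 0
    for i in range(num_groups):
        size = base + (1 if i < extra else 0)  # the first `extra` groups get one more
        groups.append(elements[start:start + size])
        start += size
    return groups


def task_configs(connector_config: Dict[str, str], max_tasks: int) -> List[Dict[str, str]]:
    """Return one configuration per task, each covering a slice of the tables."""
    tables = [t.strip() for t in connector_config["tables"].split(",") if t.strip()]
    if not tables:
        return []
    num_tasks = min(max_tasks, len(tables))  # never create more tasks than tables
    configs = []
    for group in group_partitions(tables, num_tasks):
        task_config = dict(connector_config)     # shared settings go to every task
        task_config["tables"] = ",".join(group)  # each task gets its own slice
        configs.append(task_config)
    return configs


# Hypothetical connector configuration: three tables shared by at most two tasks.
example = {"connection.url": "jdbc:postgresql://db.example/app",
           "tables": "users,orders,payments"}
for config in task_configs(example, max_tasks=2):
    print(config["tables"])
```

With three tables and a limit of two tasks, one task handles two tables and the other handles one. A real connector makes the same kind of decision when the framework asks it for its task configurations.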
Sources and Sinks
Kafka Connect focuses on streaming data to and from Kafka. Connectors are classified by the direction in which the data moves:
Source connector – Ingests entire databases and streams table updates to Kafka topics. A source connector can also collect metrics from all your application servers and store these in Kafka topics, making the data available for stream processing with low latency.
Sink connector – Delivers data from Kafka topics into secondary indexes such as Elasticsearch, or batch systems such as Hadoop for offline analysis.
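The two directions can be illustrated with a small in-memory simulation. Nothing here talks to Kafka: a plain list stands in for a topic and a dictionary stands in for a search index, and the names `run_source` and `run_sink` are made up for this sketch.

```python
from typing import Dict, List

Record = Dict[str, str]


def run_source(external_rows: List[Record], topic: List[Record]) -> int:
    """Source direction: copy rows from an external system into a topic."""
    topic.extend(external_rows)
    return len(external_rows)


def run_sink(topic: List[Record], index: Dict[str, Record]) -> int:
    """Sink direction: copy records from a topic into an external index."""
    for record in topic:
        index[record["id"]] = record  # upsert by id, as a search index might
    return len(topic)


database_rows = [{"id": "1", "name": "alice"}, {"id": "2", "name": "bob"}]
topic: List[Record] = []              # stand-in for a Kafka topic
search_index: Dict[str, Record] = {}  # stand-in for a system such as Elasticsearch

run_source(database_rows, topic)  # external system -> Kafka
run_sink(topic, search_index)     # Kafka -> external system
print(sorted(search_index))
```

The source side only ever writes to the topic and the sink side only ever reads from it, which is the distinction the two connector types capture.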
Workers
Tasks are executed by Kafka Connect workers.
- A worker is a single Java process.
- Workers run connectors (each connector is a class inside a jar file).
- A worker can run in standalone mode or distributed mode.
- If a worker crashes, a rebalance occurs (the heartbeat mechanism of the Kafka consumer group protocol is applied here).
- If a worker joins a Connect cluster, the other workers notice and connectors or tasks are assigned to the new worker in order to balance the cluster. To join a cluster, a worker must have the same group.id property as the other workers.
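The effect of a rebalance can be sketched with a simple round-robin assignment. This is a deliberate simplification: in a real cluster the assignment is coordinated through Kafka's group membership protocol, and the worker and task names below are invented for the example.

```python
from typing import Dict, List


def assign(work_units: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Distribute work units over the live workers in round-robin order."""
    if not workers:
        raise ValueError("at least one worker is required")
    ordered = sorted(workers)  # a deterministic order keeps the result repeatable
    assignment: Dict[str, List[str]] = {worker: [] for worker in ordered}
    for i, unit in enumerate(work_units):
        assignment[ordered[i % len(ordered)]].append(unit)
    return assignment


tasks = ["jdbc-source-0", "jdbc-source-1", "es-sink-0", "es-sink-1"]

# Three workers share the same (hypothetical) group.id and split the tasks.
before = assign(tasks, ["worker-a", "worker-b", "worker-c"])

# worker-c crashes: the remaining members take over its tasks.
after = assign(tasks, ["worker-a", "worker-b"])

print(before)
print(after)
```

Every task is still assigned to some worker after the crash, which is the property that makes distributed mode fault tolerant.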
Standalone vs Distributed Mode
Standalone
- A single process runs both connectors and tasks.
- Configuration uses .properties files.
- Very easy to get started with; useful for development and testing.
- Not fault tolerant, not scalable, and hard to monitor.
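As a rough illustration of the .properties-based configuration, the sketch below renders a connector configuration as `key=value` lines. The settings mirror the file source example that ships with Kafka and assume the file connector plugin is available to the worker; the helper `to_properties` is hypothetical and does no escaping of special characters.

```python
from typing import Dict


def to_properties(config: Dict[str, str]) -> str:
    """Render a flat config as key=value lines (no escaping; sketch only)."""
    return "".join(f"{key}={value}\n" for key, value in config.items())


connector = {
    "name": "local-file-source",
    "connector.class": "FileStreamSource",
    "tasks.max": "1",
    "file": "test.txt",       # file to read lines from
    "topic": "connect-test",  # topic the lines are written to
}

print(to_properties(connector), end="")
```

In standalone mode, a file like this is passed to `bin/connect-standalone.sh` after the worker's own properties file.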
Distributed
- Multiple workers run connectors and tasks.
- Connectors are configured through a REST API.
- Easy to scale and fault tolerant (tasks are rebalanced if a worker dies).
- Useful for production deployments of connectors.
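In distributed mode, a connector is typically created by sending a JSON document with a `name` and a `config` object to the worker's `/connectors` endpoint. The sketch below only builds such a request with the Python standard library and does not send it. The address `http://localhost:8083` assumes a worker running locally on the default REST port, and the connector settings are illustrative.

```python
import json
import urllib.request


def build_create_request(base_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(
    "http://localhost:8083",  # assumed worker address
    "local-file-sink",
    {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "tasks.max": "1",
        "file": "/tmp/output.txt",
        "topics": "connect-test",
    },
)
print(request.get_method(), request.full_url)
# To actually submit it, a running worker is needed:
# urllib.request.urlopen(request)
```

Because the configuration lives in the cluster rather than in a local file, any worker can receive the request and the connector survives the loss of an individual worker.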
Features
Kafka Connect features include:
- Common framework for Kafka connectors – makes connector deployment easy.
- REST interface – connectors can be managed through a REST API.
- Automatic offset management – Kafka Connect handles the offset commit process, which saves us the trouble of implementing this error-prone part of connector development manually.
- Distributed and standalone modes – scale up to a large, centrally managed service supporting an entire organization, or scale down to development, testing, and small production deployments.
- Distributed and scalable by default – it builds upon Kafka's existing group management protocol; to scale up a Kafka Connect cluster, we can add more workers.
- Streaming/batch integration – together with Kafka's existing capabilities, Kafka Connect is an ideal solution for bridging streaming and batch data systems.
- Transformations – these allow us to make simple and lightweight modifications to individual messages.
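The idea behind transformations can be sketched as a chain of small functions, each applied to one record at a time. This is a conceptual model in plain Python rather than the Connect transformation interface; `insert_static_field`, `rename_field`, and `apply_chain` are names invented for this example.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Transform = Callable[[Record], Record]


def insert_static_field(field: str, value: object) -> Transform:
    """Return a transform that adds a constant field to every record."""
    def apply(record: Record) -> Record:
        updated = dict(record)  # copy so the input record is not mutated
        updated[field] = value
        return updated
    return apply


def rename_field(old: str, new: str) -> Transform:
    """Return a transform that renames a field when it is present."""
    def apply(record: Record) -> Record:
        updated = dict(record)
        if old in updated:
            updated[new] = updated.pop(old)
        return updated
    return apply


def apply_chain(record: Record, transforms: List[Transform]) -> Record:
    """Apply each transform in order to a single record."""
    for transform in transforms:
        record = transform(record)
    return record


chain = [insert_static_field("data_source", "orders-db"), rename_field("ts", "timestamp")]
print(apply_chain({"id": 7, "ts": 1700000000}, chain))
```

Each step sees only one message and returns a modified copy, which is why transformations suit lightweight changes rather than joins or aggregations across messages.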
Alternatives
If you don't want to use Kafka Connect to integrate Kafka with your other applications and databases, you can write your own code using the Producer and Consumer APIs, or use the Streams API.
Or you could use an integration framework that supports Kafka, such as Apache Camel or Spring Integration.
Conclusion
In this blog, we have learned the basics of Kafka Connect, such as its features, use cases, and architecture. In the next blog, we will see how to set up and launch a Kafka connector.