
Stream Landing Kafka Data to Object Storage using Terraform

source link: https://dzone.com/articles/stream-landing-kafka-data-to-object-storage-using

Learn how to archive your Event Streams Kafka data to Object Storage using SQL Query. This process, called stream landing, can be set up using the Terraform ...

You can archive Event Streams Kafka data to IBM Cloud Object Storage for long-term storage, or to gain insight through interactive queries or big data analytics. One way to do this is through the Event Streams UI, where you select topics and link them to Cloud Object Storage buckets; the data is then streamed automatically and securely by the fully managed IBM Cloud SQL Query service. All data is stored in Parquet format, making it easy to manage and process. See "Streaming to Cloud Object Storage by using SQL Query" for more information.

In this post, you will set up the Cloud Object Storage stream landing using Terraform.

What is Terraform?

Terraform is an open-source "Infrastructure as Code" tool created by HashiCorp.

Terraform is declarative: developers use a high-level configuration language called HCL (HashiCorp Configuration Language) to describe the desired "end-state" cloud or on-premises infrastructure for running an application. Terraform then generates a plan for reaching that end-state and executes the plan to provision the infrastructure.

[Figure: Streaming to Cloud Object Storage by using SQL Query.]

Let's get started

If you have Terraform set up on your machine, follow the steps below:

  1. Open a terminal or command prompt on your machine, clone the GitHub repository and move to the directory:
    git clone https://github.com/IBM-Cloud/stream-landing-terraform
    cd stream-landing-terraform
  2. Create the local.env file from the template file provided in the repo and update the environment variables accordingly. Once updated, source the file:
    cp template.local.env local.env
    source local.env
  3. You can now run the individual Terraform commands to provision the required IBM Cloud services:
    terraform init 
    terraform plan 
    terraform apply
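Terraform reads any environment variable named TF_VAR_<name> as the value of the input variable <name>, which is one common reason for sourcing an env file before running the commands above. The following is a minimal sketch of what such a file might contain. The variable names and values are hypothetical; template.local.env in the repository is the authoritative list. The helper function list_tf_vars is also an illustration, not part of the repository.

```shell
# Hypothetical local.env contents. The real variable names come from
# template.local.env in the repository, not from this sketch.
export TF_VAR_ibmcloud_api_key="<your-api-key>"   # assumed variable name
export TF_VAR_region="us-south"                   # assumed variable name

# Print the names (not the values) of all Terraform input variables
# that are currently exported in this shell, so secrets are not echoed.
list_tf_vars() {
  env | grep '^TF_VAR_' | cut -d= -f1 | sort
}
```

After sourcing the file, calling list_tf_vars is a quick way to confirm that the variables are visible to Terraform before you run terraform plan.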

Use the IBM Cloud Schematics UI

Alternatively, you can use the IBM Cloud Schematics UI, which requires no installation on your machine:

  1. Navigate to Schematics Workspaces on IBM Cloud and click on Create workspace.
  2. Under the Specify Template section, provide https://github.com/IBM-Cloud/stream-landing-terraform under GitHub or GitLab repository URL.
  3. Select terraform_v0.14 as the Terraform version and click Next.
  4. Provide the workspace name - stream-landing - and choose a resource group and location.
  5. Click Next and then click Create.
  6. You should see the Terraform variables section. Fill in the variables as per your requirement by clicking the action menu next to each of the variables.
  7. Scroll to the top of the page, where you can Generate plan (terraform plan) and Apply plan (terraform apply).
  8. Click Apply plan and follow the progress under the Log. Generating a plan first is optional.

To understand more about Terraform and IBM Cloud Schematics, check this blog post: "Provision Multiple Instances in a VPC Using Schematics." In short, you can run any Terraform script simply by pointing to the Git repository that contains the scripts.

This is what the Terraform scripts do:

  1. Create a new resource group and provision resources under the group.
  2. Create a Key Protect service with a root key.
  3. Provision an Event Streams service with a topic.
  4. Provision a Cloud Object Storage service with a bucket.
  5. Provision a SQL Query service for stream landing.
  6. Set up the permissions and authorizations required for stream landing.
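
If you ran Terraform locally, one way to check the list above against what was actually created is to print the resources recorded in the Terraform state. This is a minimal sketch that assumes you are in the cloned repository directory after a successful terraform apply; the function name show_provisioned is an illustration.

```shell
# Print every resource address tracked in the local Terraform state.
# Returns 1 with a hint if the terraform CLI is not available.
show_provisioned() {
  if ! command -v terraform >/dev/null 2>&1; then
    echo "terraform not found on PATH" >&2
    return 1
  fi
  terraform state list
}
```

When you are done testing, terraform destroy removes the resources tracked in that state.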

Test stream landing

To produce messages to the Event Streams service, you can use tools like kcat (formerly Kafkacat) or the Event Streams sample producer.
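
As a rough sketch of the kcat route, the functions below generate a few JSON test messages and pipe them to kcat in producer mode. The broker address, topic name, and API key are placeholders you would take from your Event Streams service credentials, and the message shape is invented for illustration. The sketch assumes the SASL settings Event Streams usually expects (user name "token" with an API key as the password) and uses JSON on the assumption that this is the format the landing job reads; check both against your service before relying on them.

```shell
# Placeholders: replace with values from your Event Streams credentials.
BROKERS="${BROKERS:-broker-0.example:9093}"        # assumed placeholder
TOPIC="${TOPIC:-stream-landing-topic}"             # assumed placeholder
API_KEY="${API_KEY:-<event-streams-api-key>}"      # assumed placeholder

# Emit N small JSON test messages, one per line.
generate_messages() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    printf '{"id": %d, "source": "stream-landing-test"}\n' "$i"
    i=$((i + 1))
  done
}

# Send N test messages to the topic. kcat -P reads stdin and
# produces one Kafka message per input line.
produce_messages() {
  generate_messages "$1" | kcat -P \
    -b "$BROKERS" -t "$TOPIC" \
    -X security.protocol=SASL_SSL \
    -X sasl.mechanisms=PLAIN \
    -X sasl.username=token \
    -X sasl.password="$API_KEY"
}
```

For example, produce_messages 100 would send one hundred messages; it can take a while before the corresponding Parquet objects show up in the bucket.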

  1. Verify that the specified prefix in IBM Cloud Object Storage is filled with Parquet objects by navigating to the Object Storage service under your resources.
  2. Check the status of all streaming jobs in the SQL Query UI.
  3. Alternatively, use the REST API of SQL Query to get the list and the details of running stream landing jobs.
  4. In the Event Streams UI, you also get information about the active stream landing jobs per topic. Using Event Streams, you can view and stop the landing configuration.

If you have any queries, feel free to reach out to me on Twitter or on LinkedIn.

