
GitHub - bloomberg/datalake-query-ingester

source link: https://github.com/bloomberg/datalake-query-ingester

Datalake Query Ingester

A Python microservice that receives datalake query metadata via POST requests to /add_query_ingest and publishes it to a Kafka topic.
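The request-to-Kafka flow can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the service's actual handler. The producer is a hypothetical callable standing in for a real Kafka client. The "queryId" field used as the message key, the topic name, and the returned status codes are also illustrative assumptions.

```python
import json
from typing import Callable, Optional

# Hypothetical stand-in for a Kafka producer's send call: (topic, key, value) -> None.
# A real deployment would wrap an actual Kafka client library here.
ProduceFn = Callable[[str, Optional[bytes], bytes], None]


def handle_add_query_ingest(body: bytes, topic: str, produce: ProduceFn) -> int:
    """Forward one POSTed metadata document to Kafka and return an HTTP status code."""
    try:
        event = json.loads(body)
    except json.JSONDecodeError:
        return 400  # reject bodies that are not valid JSON
    if not isinstance(event, dict):
        return 400  # assumption: each request carries a single JSON object

    # Assumption: key messages by a "queryId" field when present, so that all
    # events for the same query land on the same partition.
    query_id = event.get("queryId")
    key = str(query_id).encode("utf-8") if query_id is not None else None

    # Re-serialize compactly so the consumer receives consistent JSON.
    value = json.dumps(event, separators=(",", ":")).encode("utf-8")
    produce(topic, key, value)
    return 202  # illustrative "accepted" status


# Usage with an in-memory list in place of Kafka (topic name is hypothetical):
sent = []
status = handle_add_query_ingest(
    b'{"queryId": "q1", "state": "FINISHED"}',
    "datalake-queries",
    lambda t, k, v: sent.append((t, k, v)),
)
```

Keeping the producer behind a plain callable makes the forwarding logic testable without a running broker.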

Rationale

This service produces Kafka messages from datalake query events. Those messages are then consumed by datalake-query-pg-consumer.

This is part of a datalake query metadata ingestion and analysis pipeline. You can find more about that here.

Quick Start

To run the service locally along with the supporting services used for testing, run docker-compose up datalakequeryingester. Similarly, to run the tests, run docker-compose run tests.

Building

To build this project, run docker build -f Dockerfile -t datalakequeryingester:<your tag> . from the repository root. The trailing . sets the build context.

Alternatively, run docker-compose build, which creates the image datalakequeryingester:build. Then run docker image tag datalakequeryingester:build datalakequeryingester:<your tag> to retag it.

Installation

This service is meant to be used with Trino, and it models data based on Trino's query metrics. It has been tested with Trino 363. Backward or forward compatibility with other Trino versions is not guaranteed.

The service is meant to be deployed on Kubernetes (k8s). Configuration is passed through the following environment variables:

  • KAFKA_BROKERS
  • DATALAKEQUERYINGESTER_KAFKA_TOPIC
  • DATALAKEQUERYINGESTER_KAFKA_GROUP_ID
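The following is a minimal sketch of reading these variables at startup and failing fast when one is missing. The variable names come from the list above. The IngesterConfig and load_config names are hypothetical, and the comma-separated host:port format for KAFKA_BROKERS is an assumption (it is the usual Kafka bootstrap-server convention). Check docker-compose.yaml for the format the service actually expects.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping

# The three variables documented above.
REQUIRED = (
    "KAFKA_BROKERS",
    "DATALAKEQUERYINGESTER_KAFKA_TOPIC",
    "DATALAKEQUERYINGESTER_KAFKA_GROUP_ID",
)


@dataclass(frozen=True)
class IngesterConfig:  # hypothetical container, not part of the project
    kafka_brokers: List[str]
    kafka_topic: str
    kafka_group_id: str


def load_config(env: Mapping[str, str] = os.environ) -> IngesterConfig:
    """Read the documented environment variables and raise if any is missing or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    # Assumption: KAFKA_BROKERS is a comma-separated list such as "kafka1:9092,kafka2:9092".
    brokers = [b.strip() for b in env["KAFKA_BROKERS"].split(",") if b.strip()]
    return IngesterConfig(
        kafka_brokers=brokers,
        kafka_topic=env["DATALAKEQUERYINGESTER_KAFKA_TOPIC"],
        kafka_group_id=env["DATALAKEQUERYINGESTER_KAFKA_GROUP_ID"],
    )
```

Passing the environment as a mapping parameter lets tests supply a plain dict instead of mutating os.environ.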

An example configuration can be found under the datalakequeryingester service in docker-compose.yaml.

The service supports authentication through an API key passed in the X-API-Key header. By default, this check does nothing. To implement a custom authentication/authorization mechanism, add it to _apik.py.
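The following shows one way a custom check could compare the X-API-Key header against a configured key. It is only a sketch. The is_authorized function and its signature are hypothetical and do not reflect the actual interface of _apik.py. Where the key comes from (for example, a Kubernetes secret exposed as an environment variable) is also an assumption.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_key: Optional[str]) -> bool:
    """Hypothetical X-API-Key check; not the project's _apik.py interface."""
    if expected_key is None:
        # Mirror the documented default: with no key configured, allow every request.
        return True
    # HTTP header names are case-insensitive, so match "X-API-Key" regardless of case.
    provided = next((v for k, v in headers.items() if k.lower() == "x-api-key"), None)
    if provided is None:
        return False
    # Constant-time comparison avoids leaking information about the key through timing.
    return hmac.compare_digest(provided.encode("utf-8"), expected_key.encode("utf-8"))
```

hmac.compare_digest is used instead of == so that the comparison time does not depend on how many leading characters match.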

Contributions

We heart contributions.

Have you had a good experience with this project? Why not share some love and contribute code, or just let us know about any issues you had with it?

We welcome issue reports here; be sure to choose the proper issue template for your issue, so that we can be sure you're providing the necessary information.

Before sending a Pull Request, please make sure you read our Contribution Guidelines.

License

Please read the LICENSE file.

Code of Conduct

This project has adopted a Code of Conduct. If you have any concerns about the Code, or behavior which you have experienced in the project, please contact us at [email protected].

Security Vulnerability Reporting

If you believe you have identified a security vulnerability in this project, please send an email to the project team at [email protected], detailing the suspected issue and any methods you've found to reproduce it.

Please do NOT open an issue in the GitHub repository, as we'd prefer to keep vulnerability reports private until we've had an opportunity to review and address them.