GitHub - dagster-io/dagster: Dagster is an open-source system for building data...

 4 years ago
source link: https://github.com/dagster-io/dagster
Introduction

Dagster is a system for building modern data applications.

Dagster combines an elegant programming model with beautiful tools so that infrastructure engineers, data engineers, and data scientists can collaborate seamlessly to process and produce the trusted, reliable data needed in today's world.

Install

To get started:

pip install dagster dagit


This installs two modules:

  • dagster | The core programming model and abstraction stack; stateless, single-node, single-process and multi-process execution engines; and a CLI tool for driving those engines.
  • dagit | A UI and rich development environment for Dagster, including a DAG browser, a type-aware config editor, and a streaming execution interface.

Learn

Next, jump right into our tutorial, or read our complete documentation. If you're actively using Dagster or have questions about getting started, we'd love to hear from you; come join our Slack!

Contributing

For details on contributing or running the project for development, check out our contributing guide.

Integrations

Dagster works with the tools and systems that you're already using with your data, including:

Integration | Dagster Library | Description

  • Apache Airflow | dagster-airflow | Allows Dagster pipelines to be scheduled and executed, either containerized or uncontainerized, as Apache Airflow DAGs.
  • Apache Spark | dagster-spark · dagster-pyspark | Libraries for interacting with Apache Spark and Pyspark.
  • Dask | dagster-dask | Provides a Dagster integration with Dask / Dask.Distributed.
  • DataDog | dagster-datadog | Provides a Dagster resource for publishing metrics to DataDog.
  • Jupyter / Papermill | dagstermill | Built on the papermill library, dagstermill is meant for integrating productionized Jupyter notebooks into dagster pipelines.
  • PagerDuty | dagster-pagerduty | A library for creating PagerDuty alerts from Dagster workflows.
  • Snowflake | dagster-snowflake | A library for interacting with the Snowflake Data Warehouse.

Cloud Providers

  • AWS | dagster-aws | A library for interacting with Amazon Web Services. Provides integrations with S3, EMR, and (coming soon!) Redshift.
  • GCP | dagster-gcp | A library for interacting with Google Cloud Platform. Provides integrations with BigQuery and Cloud Dataproc.

This list is growing as we are actively building more integrations, and we welcome contributions!

Example Projects

The examples folder contains several projects that demonstrate how to use Dagster, including:

  1. examples/airline-demo: A substantial demo project illustrating how these tools can be used together to manage a realistic data pipeline.
  2. examples/event-pipeline-demo: An example illustrating a typical web event processing pipeline with S3, Scala Spark, and Snowflake.
