Data Discovery Tools - Qubole Workbench | Qubole

source link: https://www.qubole.com/tech-blog/data-discovery-tools-qubole-workbench/
Data Discovery Tools
May 28, 2021 by Ravi Luhadiya and Shefali Aggarwal Updated February 12th, 2022

It is common knowledge that data lakes offer the right architecture to support multiple use cases and tools, but they can be operationally complex to implement and manage. Qubole provides an open data lake platform that enhances and greatly simplifies data lake projects. Our customers and users operate petabyte-scale data lake footprints, where data analytics challenges are amplified. For users doing data discovery and ad hoc analytics, the right dataset can be hard to find, hard to get access to, and hard to prepare and mine for insights. Nor is a data warehouse an obvious alternative, because the data isn't available there and the choice of tooling is limited.

In this blog post, we introduce Qubole's Workbench, an application that facilitates self-service data discovery and reduces time-to-insight by allowing users to discover datasets, write queries, analyze results, and share insights. Workbench also provides debugging aids while users work with data processing technologies, such as surfacing exact error messages and linking to troubleshooting guides. Workbench is generally available as of Qubole release 59 and is covered in the Qubole documentation. While many tools are available for querying and visualization, Workbench is unique in that it lets users easily connect to and explore datasets in the data lake using multiple big data engines, and debug in a self-service way.

Qubole Workbench is offered as part of Qubole’s open data lake platform. Admins can easily roll out Workbench to their users and express fine-grained application and data access controls. Qubole uses a SaaS delivery model to provide frequent updates and patches.

Diagram 1: Self-service process for data discovery

Workbench includes the following features:

  • Discover
    • Connect to the Hive metastore and other third-party data sources.
    • Search and browse schemas using the table explorer widget.
    • Preview data, table info, and statistics.

Image 1: Key information aids in the discovery stage

  • Query
    • Compose SQL and NoSQL queries using the composer.
    • Choose from any of the supported analytical engines such as Presto, Spark, Hive, and more.
    • Use collections to organize your work as you iteratively build towards the final query.
    • View, search, and reuse query history.
    • Schedule queries and put workflows into production.

Image 2: Compose SQL and NoSQL queries

  • Analyze
    • Download results as CSV or TSV.
    • Download large results (known to work for tens of terabytes) as a single file.
    • Share query and table permalinks.

Image 5: Download results as CSV or TSV

  • Debug
    • Check live cluster health prior to submitting queries.
    • See exact error message and troubleshooting links.
    • Get query-specific tips.

Image 6: Check live cluster health

Image 7: Troubleshoot using exact error messages in the status tab

By interacting with our customers and beta user community, we have a front-row seat to the real problems users encounter when performing data discovery on petabyte-scale datasets. We are thankful to many of them for their valuable feedback, direction, and active participation in the beta program. Workbench is used today by data practitioners, both analysts and data scientists, working in areas such as customer micro-segmentation, gaming analytics, fraud detection, and digital ad operations. Key benefits for customers are greater productivity, faster time to value, and the ability to support more end users.

Below are resources to help you learn more about Qubole Workbench:

You can also sign up for a free trial of Qubole here – https://www.qubole.com/free-trial/.