Interactive Query Service Amazon Athena Introduces New Engine

source link: https://www.infoq.com/news/2022/10/amazon-athena-engine-3/?itm_source=infoq&itm_medium=popular_widget&itm_campaign=popular_content_list&itm_content=
Oct 30, 2022 2 min read

AWS recently announced version 3 of the engine for Amazon Athena, the serverless interactive service to query S3 data using standard SQL. The cloud provider claims that the new engine improves performance and supports new use cases thanks to over 50 new SQL functions and 30 new analytics features.

Most of the improvements in Athena engine version 3 come from the open-source Trino and PrestoDB projects, with AWS speeding up the integration of enhancements and bug fixes from the community. Blayze Stefaniak, senior solutions architect at AWS, and colleagues write:

One of the most exciting aspects of engine version 3 is its new continuous integration approach to open source software management that will improve currency with the Trino and PrestoDB projects. This approach enables Athena to deliver increased performance and new features at an even faster pace.

Among other new features, Athena now supports T-Digest functions for rank-based statistics and new geospatial functions. It also adds MATCH_RECOGNIZE for row pattern matching, which helps identify data patterns in applications such as fraud detection and sensor data analysis.

SELECT m.id AS row_id, m.match, m.val, m.label
FROM (VALUES (1, 90), (2, 80), (3, 70), (4, 70)) t(id, value)
MATCH_RECOGNIZE (
    ORDER BY id
    MEASURES
        match_number() AS match,
        RUNNING LAST(value) AS val,
        classifier() AS label
    ALL ROWS FOR EACH MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (() | A)
    DEFINE A AS true
) AS m;

Source: https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference-0003.html
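The T-Digest functions mentioned above can be tried in a similar self-contained way. The following is a minimal sketch, not taken from the AWS announcement: it assumes the Trino functions tdigest_agg and value_at_quantile are available in engine version 3 as they are in Trino, and the inline values and the column name latency_ms are made up for illustration.

```sql
-- Approximate median and 90th percentile using a T-Digest sketch.
-- The inline data and the column name latency_ms are hypothetical.
SELECT
    value_at_quantile(tdigest_agg(CAST(latency_ms AS double)), 0.5) AS approx_median,
    value_at_quantile(tdigest_agg(CAST(latency_ms AS double)), 0.9) AS approx_p90
FROM (VALUES 12, 15, 11, 240, 14, 13) AS t(latency_ms);
```

Because a T-Digest is an approximate summary, the returned quantiles are estimates rather than exact order statistics, which is the usual trade-off for computing rank-based statistics over large data sets.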

The cloud provider released a guide on how to upgrade the query engine and a document highlighting key differences between version 2 and version 3.

According to AWS, the new engine improves query execution by reducing the amount of data scanned. It also improves the performance of joins involving comparisons with the <, <=, >, and >= operators, of queries that contain JOIN, UNION, UNNEST, or GROUP BY clauses, and of queries using the IN predicate. Stefaniak and colleagues add:

We performed benchmark testing on engine version 3 using TPC-DS benchmark queries at 3 TB scale, and observed 20% query performance improvement when compared to the latest release of engine version 2.

Source: https://aws.amazon.com/blogs/big-data/upgrade-to-athena-engine-version-3-to-increase-query-performance-and-access-more-analytics-features/

Not everyone agrees: Michael Wittig, founder at cloudonaut.io, reports a 10% decrease in performance. AWS acknowledges that a subset of use cases might be negatively affected, writing:

Many queries run faster on Athena engine version 3, but some query plans can differ from Athena engine version 2. As a result, some queries can differ in latency or cost.

Among the limitations, the Trino and Presto connectors are not supported, and neither is fault-tolerant execution (Trino Tardigrade). The new query engine is available in all regions supporting Athena, excluding the China regions.

About the Author

Renato Losio

Renato has many years of experience as a software engineer, tech lead and cloud services specialist in Italy, UK, Portugal and Germany. He lives in Berlin and works remotely as principal cloud architect. Cloud services and relational databases are his main working interests. He is an AWS Data Hero.
