
Apache Spark Developers Have Voted to Include Cypher in Spark 3.0 [Update]

source link: https://www.tuicool.com/articles/hit/ERfIJrV

By Alastair Green, Query Languages Standards & Research Lead, Neo4j | February 15, 2019

Reading time: 4 minutes

The community vote by Apache Spark contributors has just closed, and the results are positive. Thank you to everyone who participated when we asked for your votes and feedback.


As part of the preparations for a forthcoming Apache Spark 3.0 release, the Spark development community has just completed a positive vote for a Spark Project Improvement Proposal (SPIP) to add property graphs based on DataFrames to Spark. Based on the achievements of the ongoing Cypher for Apache Spark project, Spark 3.0 users will be able to use the well-established Cypher graph query language for graph query processing, as well as having access to graph algorithms stemming from the GraphFrames project.
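To make "property graphs based on DataFrames" concrete, here is a minimal sketch in plain Python. It is not the Spark API and not the SPIP's design: the table layout, the column names and the `match_pattern` helper are all assumptions made for illustration. It stores a graph as a table of nodes and a table of relationships, then answers a Cypher-style pattern such as `(a:Person)-[:KNOWS]->(b:Person)` by joining the two tables.

```python
# Hypothetical tabular encoding of a property graph: one table of nodes and
# one table of relationships, each row a plain dict. This is NOT the Spark API.
nodes = [
    {"id": 1, "label": "Person", "name": "Alice"},
    {"id": 2, "label": "Person", "name": "Bob"},
    {"id": 3, "label": "City", "name": "Berlin"},
]
relationships = [
    {"source": 1, "target": 2, "type": "KNOWS"},
    {"source": 1, "target": 3, "type": "LIVES_IN"},
]


def match_pattern(nodes, relationships, src_label, rel_type, dst_label):
    """Return (source, target) node pairs matching the pattern
    (a:src_label)-[:rel_type]->(b:dst_label) by joining the two tables."""
    by_id = {n["id"]: n for n in nodes}  # index the node table by its key
    matches = []
    for rel in relationships:
        if rel["type"] != rel_type:
            continue
        a, b = by_id.get(rel["source"]), by_id.get(rel["target"])
        if a and b and a["label"] == src_label and b["label"] == dst_label:
            matches.append((a, b))
    return matches


pairs = match_pattern(nodes, relationships, "Person", "KNOWS", "Person")
print([(a["name"], b["name"]) for a, b in pairs])
```

In a DataFrame setting the same idea would presumably be expressed as joins between a node DataFrame and a relationship DataFrame, with a Cypher engine planning those joins from the pattern.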


This is a great step forward for a standardized approach to graph analytics – including querying and algorithms – in an extremely widely used data science and data integration platform. The vote reflects much patient and detailed work from many groups, and it’s great to see collaboration by many contributors to bring additional graph capability to such a large open source project.

Cypher and Plans for GQL

Cypher continues to gain new implementations in research and industry. Besides its ease of use and strong graph-specific feature set, Cypher is attractive to vendors and users because the openCypher community and implementing vendors strongly support the plan to create a single standard declarative query language called GQL (Graph Query Language). GQL will draw heavily on the ASCII-art, pattern-based representation of sub-graphs pioneered by Cypher and extended in Oracle’s PGQL and LDBC’s G-CORE research language.

The goal is that GQL will be a formal international standard, specified and maintained by the ISO working group that also manages the SQL standard (WG3).

The WG3 committee met last month in Brisbane, where it discussed and encouraged further work on shaping a proposal to initiate the GQL project. The new project should start formally in the second half of this year. Proposals from Neo4j, Oracle and TigerGraph on the content and scope of GQL were discussed at the meeting.

Property Graph & RDF Standards Specialists Will Meet at W3C Workshop

Supporters of GQL – including implementers of Cypher, PGQL and GSQL – are joining experts from the RDF world at a forthcoming W3C workshop on graph data management standards in Berlin early in March.

The over-subscribed W3C workshop will bring together 100 RDF, labelled property graph and SQL standards specialists to figure out the best ways of creating bridges between these disparate but related data models and languages. The goal is to benefit users who increasingly want to create effective graph-aware applications which fit well with existing data technologies.

An openCypher Implementers Meeting (oCIM) Will Follow

The fifth openCypher Implementers Meeting (oCIM) will also be taking place – at the same venue in Berlin – immediately after the W3C workshop.

oCIM participants will be discussing language improvement requests and proposals. These include the ability to carry out Cypher queries that project new graphs – and to incorporate those queries in parameterized views – as well as designs for domain-specific property graph types and relational-to-graph mappings.

Both these key features were first implemented in Cypher for Apache Spark, and they have also been discussed in previous implementers’ meetings. (The graph types and SQL source mappings are also reflected in Neo4j proposals for the forthcoming Property Graph Querying extension to SQL, which is seen as a read-only subset of the planned GQL language.)

The theme of creating a managed and orderly transition from Cypher to GQL is an overarching concern and opportunity for the openCypher community. With my Neo4j hat on, I can say that our company takes the need to avoid disruption to existing customers and their applications extremely seriously.

So, while we are big backers of GQL, we are strong advocates of carefully preserving working and familiar features from the “input” languages that are contributing to the future GQL specification. From a product perspective, we see Cypher as having a long future life while the industry defines – and then standardizes on – the GQL language over the coming years. For more information, contact [email protected].

GQL Community: The Property Graph Schema Working Group Will Also Meet Face to Face

openCypher advocates, designers and implementers from several companies are active in a broader, emerging GQL community that has already spawned informal working groups to analyze existing graph query languages and to discuss the scope and designs for stronger property graph schema.

There is a strongly felt need for property graph schema/typing and a high level of interest in how to apply flexible or partial schema. The Property Graph Schema Working Group is also meeting face to face after the W3C workshop in Berlin. By then, the workshop will have offered an opportunity to correlate the property graph view against W3C recommendations like OWL and SHACL, which overlap in their concerns.

It’s great to see this level of activity with so many contributors on so many fronts: the push for standardization reflects continuing growth in all aspects of the graph database software and services market.



About the Author

Alastair Green, Query Languages Standards & Research Lead, Neo4j


Alastair Green leads Neo4j’s work on graph query language development and standards, and he is part of the team making the Cypher language available in Apache Spark. He has a background in enterprise data integration and transaction processing product design and deployment.

He brings a strong mix of consulting, architecture, and product skills to the Neo4j team. He is Neo4j’s product manager for the Cypher language and a member of the Neo4j Cypher Language Group (CLG).

His career in IT began in software development and evolved into pre-sales and post-sales, then into various architect, consulting and business roles, and eventually into founding and running a startup specialized in distributed transaction management. For the last eight years, Alastair has worked in senior data-related product management and enterprise architecture positions in financial services: first at Barclays, and then at RBS, where he was the head of Design Architecture for the Risk Solutions group.
