
[Translated] Is Hadoop's dominance slipping? A look at six years of upheaval in Big Data

 4 years ago
source link: https://mp.weixin.qq.com/s/3_TKYjt0VN2lGwRe6RFr9Q


Source | https://blog.marouni.fr/bidata-trends-analysis/

Author | Abbass Marouni

I’ve been a loyal follower of the Data Eng Weekly newsletter (formerly Hadoop Weekly) for the past 6 years. The newsletter is a great source for everything related to Big data and data engineering in general, with a wide selection of technical articles along with product announcements and industry news.

For this year’s holiday side project I decided to analyze Data Eng’s archives, which go back to January 2013, to look at Big data trends and changes over the past 6 years.

So I crawled and cleaned over 290 weekly issues (well, Python did!), keeping article snippets from the technical, news and releases sections only. Next, I ran some basic natural language processing followed by some basic filtering to produce the keyword mentions and all of the plots that follow.
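The article does not publish its processing code, so the following is only a minimal sketch of what counting keyword mentions per year could look like. The keyword list, the alias table, the regex tokenizer and the `(year, text)` input shape are all assumptions made for illustration, not the author's actual pipeline.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword list; the article does not say which one it used.
KEYWORDS = {"hadoop", "spark", "kafka", "kubernetes", "flink"}

# Hypothetical aliases folded into a canonical keyword.
ALIASES = {"k8s": "kubernetes"}

# Very basic tokenizer: runs of lowercase letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def count_mentions(snippets):
    """Count keyword mentions per year.

    `snippets` is an iterable of (year, text) pairs, an assumed shape
    for the cleaned newsletter snippets.
    """
    counts = defaultdict(Counter)
    for year, text in snippets:
        for token in TOKEN_RE.findall(text.lower()):
            token = ALIASES.get(token, token)
            if token in KEYWORDS:
                counts[year][token] += 1
    return counts


# Made-up snippets, only to exercise the function.
sample = [
    (2013, "Hadoop 2.0 ships with YARN; more Hadoop news inside."),
    (2016, "Kafka Streams vs. Spark Streaming on K8s"),
]
mentions = count_mentions(sample)
```

A real run would need a richer keyword list and probably stop-word filtering, which is presumably what the "basic filtering" step refers to.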

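To sum up the long English passage above: the data for this article comes from Data Eng Weekly, a major source of content on Big data and data engineering that covers a very wide range of technical articles, product announcements and industry news. The author collected 290 issues and kept the article snippets related to technology, news and release announcements.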

The English below is simple enough that you should all be able to follow it... so I won't translate it... bye...

Major trends over the last seven years


Hadoop vs. Spark


Observations: We see the steady decline of Hadoop since 2013 and the moment Spark overtook Hadoop (especially MapReduce).
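One way to pin down "the moment" one keyword overtakes another is to scan per-period mention counts for the first period where the challenger leads. This is a hedged sketch: `first_crossover` is a hypothetical helper and the numbers are made up, not taken from the article's plots.

```python
def first_crossover(leader, challenger):
    """Return the first period in which `challenger` has strictly more
    mentions than `leader`, or None if that never happens.

    Both arguments map a period (e.g. a year) to a mention count;
    a missing period is treated as zero mentions.
    """
    for period in sorted(set(leader) | set(challenger)):
        if challenger.get(period, 0) > leader.get(period, 0):
            return period
    return None


# Made-up counts for illustration only.
hadoop = {2013: 9, 2014: 7, 2015: 4}
spark = {2013: 1, 2014: 5, 2015: 6}
crossover = first_crossover(hadoop, spark)
```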

Hadoop vs. Kafka


Observations: The rise of Kafka as the main building block in all Big data stacks.

Hadoop vs. Kubernetes


Observations: The rise of Kubernetes is an interesting one. Even though Data Eng Weekly is not a DevOps newsletter, it bears witness to the overall hype around Kubernetes in all domains, starting from the beginning of 2017.

Yearly top keywords

Here I’m simply plotting the top 10 keywords by total number of mentions in a given year.
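As a rough illustration of the ranking behind these plots, the top keywords for a year can be read straight off a per-year tally with `collections.Counter.most_common`. The `top_keywords` helper and the tiny counts below are assumptions for illustration; they are not the article's data.

```python
from collections import Counter


def top_keywords(mentions_by_year, year, n=10):
    """Return the `n` most mentioned keywords for `year` as
    (keyword, count) pairs, highest count first."""
    return Counter(mentions_by_year.get(year, {})).most_common(n)


# Made-up tallies, only to show the call.
toy_mentions = {
    2013: {"hadoop": 5, "hdfs": 3, "yarn": 2},
    2015: {"spark": 6, "hadoop": 4, "kafka": 3},
}
top_2013 = top_keywords(toy_mentions, 2013, n=2)
```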

2013 : Hadoop’s golden year !


Observations: All of the original Hadoop projects are here: HDFS, YARN, MR, PIG, … along with the 2 major distributions, CDH & HDP, and nothing else!

2014 : The rise of Spark !


Observations: Hadoop in general continued its dominance, but Spark, which made its debut with its first version this year, was the hottest topic of 2014. We also got the first glimpse of Kafka!

2015 : Here comes Kafka !


Observations: Spark takes over the first spot from Hadoop, and Kafka makes it into the top 3. Most of the old regime projects (HDFS, YARN, MR, PIG, …) didn’t make it to the top 10.

2016 : Streaming is on fire !


Observations: 2016 was the year of streaming. Kafka took second place from Hadoop, with Spark (Streaming) continuing its dominance.

2017 : Stream everything !


Observations: The same lineup as 2016 with some Flink thrown in.

2018 : Back to basics !  


Observations: Kubernetes makes its debut and we’re back to basics, trying to figure out how to manage (K8s), schedule (Airflow) and run (Spark, Kafka, storage, …) our streams.

2019 : …    


Observations: It’s still too early to draw any conclusions about 2019, but it looks like the year when K8s & co. go mainstream in production!
