GitHub - rajasekarv/native_spark: A new arguably faster implementation of Apache...

5 years ago

source link: https://github.com/rajasekarv/native_spark


native_spark

A new, arguably faster, implementation of Apache Spark written from scratch in Rust. Work in progress (WIP).

Just install Cap'n Proto and you are good to go. The code has been tested only on Linux and requires a nightly Rust toolchain. It has been tested with version 1.39 only; specialization has breaking changes from version to version, so use 1.39 only for now.

Use this command: cargo +nightly-2019-09-11 build --release

Refer to make_rdd.rs and the other examples in the example code to get the basic idea.

When running in distributed mode, every machine needs a hosts.conf in the format shown in the config folder, and all machines must be reachable over SSH from the master. The master port can be configured in hosts.conf, and port 10500 must be free on the executors. Ports 5000-6000 are reserved for the shuffle manager; this will be handled internally soon.
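Before starting in distributed mode, it can help to confirm that the required ports are actually free on each executor. The following is a minimal sketch using only the Rust standard library; port_is_free and busy_ports are hypothetical helper names, not part of native_spark, and a successful bind is only a point-in-time check.

```rust
use std::net::TcpListener;

/// Hypothetical helper: returns true if the port can currently be bound on all
/// IPv4 interfaces. The listener is dropped immediately, so this is only a
/// point-in-time check; another process could still take the port afterwards.
pub fn port_is_free(port: u16) -> bool {
    TcpListener::bind(("0.0.0.0", port)).is_ok()
}

/// Hypothetical helper: returns the ports from `ports` that are already in use.
pub fn busy_ports<I: IntoIterator<Item = u16>>(ports: I) -> Vec<u16> {
    ports.into_iter().filter(|&p| !port_is_free(p)).collect()
}
```

For example, busy_ports(std::iter::once(10500).chain(5000..=6000)) returns whichever of the ports mentioned above are already taken on that machine.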

Since file readers are not implemented yet, you have to read files manually for now (for example, reading directly from S3, or working around it for local files by distributing copies of all files to every machine and creating an RDD from the list of file names).
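The file-name workaround comes down to each machine reading its own local copy of a file and returning the lines as owned data. Below is a minimal sketch of that per-file step using only std::fs. read_lines and read_all are hypothetical helper names, and wiring read_lines into an RDD transformation depends on native_spark's API, which is not shown here.

```rust
use std::fs;
use std::io;

/// Hypothetical helper: reads one local file and returns its lines as owned Strings.
/// With the file-name workaround, the same path must exist on every machine.
pub fn read_lines(path: &str) -> io::Result<Vec<String>> {
    let contents = fs::read_to_string(path)?;
    Ok(contents.lines().map(String::from).collect())
}

/// Hypothetical helper: reads every file in `paths` and concatenates their lines,
/// the same shape as a flat_map over a list of file names.
pub fn read_all(paths: &[String]) -> io::Result<Vec<String>> {
    let mut out = Vec::new();
    for p in paths {
        out.extend(read_lines(p)?);
    }
    Ok(out)
}
```

Returning Vec<String> rather than borrowed slices keeps the result owned, which fits the owned-data limitation of the current implementation.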

Ctrl-C handling and panic handling are not implemented yet, so if something goes wrong at runtime, the executors won't be shut down automatically and you have to kill the processes manually.

One limitation of the current implementation is that the input and return types of all closures, as well as all input to make_rdd, must be owned data.
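The owned-data rule is the kind of constraint Rust expresses with 'static bounds. As an illustration only, run_detached below is a hypothetical stand-in (not native_spark's make_rdd) that runs a closure over its input on another thread. Its Send + 'static bounds accept Vec<String> but reject a Vec<&str> that borrows from a local String. native_spark's actual trait bounds are an assumption here and may be stricter (for example, also requiring serializable types).

```rust
use std::thread;

/// Hypothetical stand-in for an API like make_rdd + map: data and closure must be
/// Send + 'static because the work is moved to another thread. Borrowed data
/// pointing into a local variable cannot satisfy 'static.
pub fn run_detached<T, U, F>(data: Vec<T>, f: F) -> Vec<U>
where
    T: Send + 'static,
    U: Send + 'static,
    F: Fn(T) -> U + Send + 'static,
{
    // Move the data and closure into a worker thread and collect owned results.
    thread::spawn(move || data.into_iter().map(f).collect::<Vec<U>>())
        .join()
        .expect("worker thread panicked")
}
```

Calling run_detached(text.split(' ').collect::<Vec<&str>>(), ...) with a local text: String fails to compile because the &str items borrow from text; converting them with .map(String::from) first makes the call valid.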

Configuration

You can specify the local IP address using the environment variable SPARK_LOCAL_IP.
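A minimal sketch of reading and validating SPARK_LOCAL_IP with the standard library. The 127.0.0.1 fallback and the function names are assumptions for illustration; native_spark's own handling of the variable may differ.

```rust
use std::env;
use std::net::{IpAddr, Ipv4Addr};

/// Parses an optional SPARK_LOCAL_IP value. Falling back to 127.0.0.1 when the
/// variable is unset is an assumption of this sketch, not documented behavior.
pub fn parse_local_ip(value: Option<&str>) -> Result<IpAddr, String> {
    match value {
        None => Ok(IpAddr::V4(Ipv4Addr::LOCALHOST)),
        Some(s) => s
            .trim()
            .parse::<IpAddr>()
            .map_err(|e| format!("invalid SPARK_LOCAL_IP {:?}: {}", s, e)),
    }
}

/// Reads SPARK_LOCAL_IP from the process environment and parses it.
pub fn local_ip() -> Result<IpAddr, String> {
    let value = env::var("SPARK_LOCAL_IP").ok();
    parse_local_ip(value.as_deref())
}
```

With SPARK_LOCAL_IP=192.168.1.20 set in the environment, local_ip() returns that address, while an unparsable value is reported as an error instead of being silently ignored.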

ToDo:

Error handling (priority)

RDD

Most of these, except the file reader and writer, are trivial to implement.

map
flat_map
filter
step_by
take_sample
union
glom
cartesian
group_by
reduceby
pipe
map_partitions
for_each
collect
reduce
fold
aggregate
take
first
save_as_text_file (can save only as a text file in the executors' local file system)
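To illustrate the partition-level semantics behind operations such as aggregate, here is an in-memory sketch following Apache Spark's documented behavior: seq_op folds each partition starting from zero, then comb_op merges the per-partition results, again starting from zero. Partitions are modeled as a slice of Vecs; the function name and signature are hypothetical and are not native_spark's code.

```rust
/// Hypothetical in-memory model of RDD.aggregate over partitions.
/// `zero` is used once per partition and once more in the combine step,
/// so it should be a neutral element for both operations.
pub fn aggregate<T, U, S, C>(partitions: &[Vec<T>], zero: U, seq_op: S, comb_op: C) -> U
where
    U: Clone,
    S: Fn(U, &T) -> U,
    C: Fn(U, U) -> U,
{
    // Per-partition step (in a distributed engine this would run on executors).
    let partials: Vec<U> = partitions
        .iter()
        .map(|p| p.iter().fold(zero.clone(), |acc, x| seq_op(acc, x)))
        .collect();
    // Combine step (in a distributed engine this would run on the driver).
    partials.into_iter().fold(zero, |acc, u| comb_op(acc, u))
}
```

For example, computing a (sum, count) pair over partitions [[1, 2], [3, 4, 5]] with a (0, 0) zero value yields (15, 5), from which a mean can be derived without collecting all elements.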

Config Files

Replace hard-coded values
