
The Whole Data Science World in Your Hands

 4 years ago
source link: https://www.tuicool.com/articles/NFnqiaU

Testing MatrixDS capabilities on different languages and tools. If you work with data you have to check this out.

Image by Héizel Vázquez

I’ve been looking for years for a platform where I can run my data science projects without the pain of installations and filling my computer with dozens of different tools and environments.

Luckily I found that MatrixDS has all of that and more for free! In this article I’ll be testing almost all the tools they have so you don’t have to.

The project is public on the platform; you can see it here:

If you want to test it out, all you have to do is forklift it and that’s it.

Testing Python things


Jupyter Notebook


My favorite programming language at the moment is Python. There are lots of great tools and features that can help you when working with this language, and one of the most popular is Jupyter Notebook. To launch a notebook in MatrixDS, do this:

  1. Go to the Tools tab in the platform.
  2. Click on the (+) button on the right-hand side.
  3. Choose Python 3 (or 2) with Jupyter Notebook.
  4. Choose a name for the tool and set the number of cores and RAM.
  5. When the notebook is created and started, just open it.
  6. Have fun programming ;)
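
Once the notebook opens, a quick first cell can confirm what the tool ships with. The sketch below is an assumption about a sensible first step, not part of the original project: `check_packages` is a hypothetical helper, and the package names are simply the Python libraries mentioned in this article, which may or may not be preinstalled.

```python
import importlib.util
import platform


def check_packages(names):
    """Map each top-level package name to True if it can be imported here."""
    # find_spec returns None when a top-level package cannot be located.
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python", platform.python_version())
# Assumption: these are just the libraries this article mentions.
for name, available in check_packages(["pysnooper", "fklearn", "pandas"]).items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

Anything reported as missing can usually be installed from inside the notebook with `pip`, assuming the tool allows it.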

Inside the notebook you are free to do whatever you want. I created a simple Python notebook to test PySnooper so you can try it.

The notebook is embedded as a gist in the original article, and you can also find it in the MatrixDS project.
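
For readers who have not used PySnooper, here is a minimal sketch of the kind of test such a notebook might contain. It is not the notebook from the MatrixDS project: the function `number_to_bits` is an illustrative example, and the fallback to a no-op decorator is an assumption added so the cell still runs where PySnooper is not installed.

```python
# Sketch only: not the notebook from the MatrixDS project.
try:
    import pysnooper
    snoop = pysnooper.snoop()  # traces executed lines and variable changes to stderr
except ImportError:
    # Assumption: fall back to a do-nothing decorator if PySnooper is missing.
    def snoop(func):
        return func


@snoop
def number_to_bits(number):
    """Return the binary digits of a non-negative integer, most significant first."""
    if number:
        bits = []
        while number:
            number, remainder = divmod(number, 2)
            bits.insert(0, remainder)
        return bits
    return [0]


print(number_to_bits(6))  # prints [1, 1, 0]
```

With PySnooper installed, the trace of each line and variable change should show up in the cell output alongside the result.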

Jupyter Lab


JupyterLab is the next-generation web-based user interface for Project Jupyter. It’s like Jupyter Notebooks on steroids.

To launch JupyterLab in MatrixDS, do this:

  1. Go to the Tools tab in the platform.
  2. Click on the (+) button on the right-hand side.
  3. Choose Python 3 with JupyterLab.
  4. Choose a name for the tool and set the number of cores and RAM.
  5. When the tool is created and started, just open it.
  6. Have more fun :)

I created a simple Python notebook in the JupyterLab instance so you can try it.

If you’ve been following me so far this is what you should be seeing:


Oh, by the way, if you want to know how to use git with MatrixDS, check this article:

The test notebook I created tries out fklearn, a new library for functional machine learning. The notebook is embedded as a gist in the original article, and you can also find it in the MatrixDS project.
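
To give a feel for what "functional machine learning" means here, the following is a plain-Python sketch of the pattern, not fklearn's API and not the notebook from the project. As far as fklearn's documentation describes it, a learner is a function that takes training data and returns a prediction function, the scored training data, and a log. Everything below (`simple_linear_learner`, the `Row` type, the toy data) is hypothetical and uses only the standard library.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
PredictFn = Callable[[List[Row]], List[Row]]


def simple_linear_learner(
    rows: List[Row], feature: str, target: str
) -> Tuple[PredictFn, List[Row], Dict[str, float]]:
    """Fit y = slope * x + intercept by least squares (hypothetical helper, not fklearn)."""
    n = len(rows)
    mean_x = sum(r[feature] for r in rows) / n
    mean_y = sum(r[target] for r in rows) / n
    cov_xy = sum((r[feature] - mean_x) * (r[target] - mean_y) for r in rows)
    var_x = sum((r[feature] - mean_x) ** 2 for r in rows)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x

    def predict(new_rows: List[Row]) -> List[Row]:
        # Build new rows instead of mutating the input: no hidden state.
        return [{**r, "prediction": slope * r[feature] + intercept} for r in new_rows]

    log = {"slope": slope, "intercept": intercept, "n_rows": n}
    return predict, predict(rows), log


# Toy data following y = 2x + 1.
train = [{"x": 1.0, "y": 3.0}, {"x": 2.0, "y": 5.0}, {"x": 3.0, "y": 7.0}]
predict_fn, scored_train, log = simple_linear_learner(train, feature="x", target="y")
new_scored = predict_fn([{"x": 4.0}])
print(log)
print(new_scored)
```

The point of the design is that the fitted model is just a function closed over its parameters, so it can be passed around, composed, and applied to new data without any hidden state.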

Testing R things

https://www.computerworld.com/video/series/8563/do-more-with-r

I started my data science career with R. It’s a great tool for data analysis, data cleaning, plotting and much more. I think that right now the machine learning side is better in Python, but to be a successful data scientist you need to know them both.

To launch RStudio in MatrixDS do this:

  1. Go to the Tools tab in the platform.
  2. Click on the (+) button on the right-hand side.
  3. Choose R 3.5 with RStudio.
  4. Choose a name for the tool and set the number of cores and RAM.
  5. When the tool is created and started, just open it.
  6. Have R fun :)

The test R environment I created tries out a new library called g2r, which creates interactive visualizations using g2.

Btw! I had to do this before running g2r:

sudo su 
apt-get install libv8-dev

So, normally this is what you do for getting a plot with ggplot2:

library(ggplot2)
ggplot(iris, aes(Petal.Length, Petal.Width, color = Species)) +
  geom_point() +
  facet_wrap(.~Species)

And you will get:


Not so bad, but what about bringing interactivity to that? With g2r it’s very easy. This is the code for doing that:

library(g2r)
g2(iris, asp(Petal.Length, Petal.Width, color = Species)) %>% 
  fig_point() %>%
  plane_wrap(planes(Species))

And you’ll get:


In the code you change:

aes -> asp
geom_point() -> fig_point()
facet_wrap(.~Species) -> plane_wrap(planes(Species))

I’m still wondering why they didn’t use the same API, but it’s a very cool project. There are more examples you can check out here:

This is all the code:


Testing Julia things


When I was studying for my master’s in Physics (about 2 years ago) I really thought that Julia was going to revolutionize the scientific programming world. Don’t get me wrong, it’s doing an amazing job, but I think new advancements in Python have left the project in second place for many things.

For testing the Julia capabilities of MatrixDS I wanted to take a look at the data libraries of the language. And you can see that below.

To launch a Julia Notebook in MatrixDS do this:

  1. Go to the Tools tab in the platform.
  2. Click on the (+) button on the right-hand side.
  3. Choose Julia 1.1.0 with JupyterLab.
  4. Choose a name for the tool and set the number of cores and RAM.
  5. When the tool is created and started, just open it.
  6. Let’s Julia :) (sounds weird)

When you launch it, you can open or create any Python or Julia notebook:


This is the notebook I created for testing Julia capabilities for data science:


Here I tested some libraries like DataFrames, Gadfly, Queryverse, Vega for plotting and more :)

So as you can see, for me this is the most complete platform for doing data science in the cloud. It needs minimal configuration, and you can even install your own tools with Docker.

There’s much more to cover and things to do with the platform, and I’ll be doing that in other articles. If you want to be in touch with me follow me here:

