
Where is the data?

 4 years ago
source link: https://towardsdatascience.com/where-is-the-data-798faccb4e29?gi=8ce46123d408

One of the hardest things when you work with a new dataset is discovering which features matter most for predicting your target, and finding new sources of information that can improve both your understanding of the data and your models.

In this article, I’m going to show you how to do that without any programming skills. That may sound weird right now, but bear with me. In future articles, I’ll explore programming libraries that can help you do the same thing and see which approach gives better results.

We are going to do this with an example: the House Sales in King County, Seattle, USA dataset. You can find all the information about the data here:

The idea of the dataset is to predict the price of a house from its features. Before going to the place where I’ll show you how to do the data enrichment, let’s load the data in Python to get some information about it. Below you can see a simple notebook that does that:
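The embedded notebook does not come through in this copy, so here is a minimal sketch of that kind of first look, not the original notebook. It assumes pandas is installed and that the CSV has a `price` target and a `date` column formatted like `20140502T000000`. The three-row sample, the column subset, and the helper names `load_house_data` and `basic_summary` are hypothetical and used only for illustration.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real CSV file.
# Column names and the date format are assumptions about the dataset.
SAMPLE_CSV = """id,date,price,bedrooms,bathrooms,sqft_living,zipcode
1,20140502T000000,200000,2,1.0,900,98001
2,20141115T000000,500000,4,2.5,2400,98002
3,20150320T000000,350000,3,2.0,1600,98003
"""


def load_house_data(source):
    """Read the CSV and parse the date column into datetimes."""
    df = pd.read_csv(source)
    df["date"] = pd.to_datetime(df["date"], format="%Y%m%dT%H%M%S")
    return df


def basic_summary(df, target="price"):
    """Collect a few basic facts about the data; this is not a full EDA."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing_values": int(df.isna().sum().sum()),
        "mean_price": float(df[target].mean()),
    }


# With the real file you would pass its path instead of the in-memory sample.
df = load_house_data(io.StringIO(SAMPLE_CSV))
summary = basic_summary(df)
print(summary)

# Rough first ranking of numeric features by absolute correlation with the target.
ranking = (
    df.select_dtypes("number")
    .corr()["price"]
    .drop("price")
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

A correlation ranking like this only hints at which features matter. It captures linear relationships between columns you already have, which is exactly the limitation that data enrichment tries to address.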

It’s important to note that I’m not doing a full EDA for the dataset; this is just to get some basic information. What I’m about to show also does not replace the data science process you have to follow. It is just a way to enrich your data and get more information about it.

Ok, now it’s time.

The way we are going to do this is with a system called Explorium. I discovered this software a while ago, and I’ve been using it ever since. They describe their product as:

Explorium is driving a new paradigm in the world of data science — one where companies can build models on the data they need, not the data they have. Discover the only end to end data science platform that focuses on superior data for machine learning.

So it’s an end-to-end platform where you can build and deploy models, but we will explore that in other articles. You can ask for a demo to replicate what I’m doing in this article here:

I’m going to do a step-by-step tutorial. If you have any questions, please let me know in the comment section below :).

Creating a project:

The first thing you have to do when you have access to the app is to create a project.

