
 3 years ago
source link: https://diegosalinas-47084.medium.com/the-best-data-science-course-is-this-compilation-of-medium-top-articles-f1c0111543f8

Top Data Science Articles on Medium You Must Keep at Hand

Source: https://unsplash.com/photos/V4BS2agsRYI

If you are a beginner in data science, there is a good chance that you are an avid reader of publications like Medium and Towards Data Science.

In fact, when I started reading books and taking courses online, I quickly realized that most of them simply cannot cover topics as in-depth as these huge publications, which have put together lots of quality articles over the years.

Now I am in my third position as a data scientist and, guess what, I still read a lot of articles online. I would even go as far as to say that the more you know, the more questions you have, and no matter how complex the problem, there’s always someone who has written about it.

That’s why, after losing track of so many important articles I’ve read over the years, I decided to select the most relevant ones for aspiring data scientists and build a curriculum that covers the most important topics you need to learn, saving you the hassle of searching this vast ocean of information over and over.

The other reason to do this is that I know how it feels when you buy a course online, read a book about Python for Data Science, and then get rejected again and again in your interviews.

The hard truth is that learning basic Python, Pandas and NumPy won’t get you a job. Learning the concepts of regression, SVM and random forests, and making a forecast with linear regression, will also fail to get you a job. Hell, even a nice project portfolio will be useless if you don’t know how to solve a real problem in a reasonable amount of time.

Online courses have only superficial information about the topics I’ve just mentioned. Everybody assumes you know them. Once I got hired I realized what you really need to know.

Writing fast code without for loops, loading several GB of images into a Pandas DataFrame, or creating a pipeline to evaluate your models on the fly: these are the things that separate beginners from experienced data scientists.

Now that I know those kinds of things, I see a gap between online courses and job interviews that I want to help you close. OK, I know some top companies ask very tricky questions that you cannot prepare for beforehand. But the aspiring data scientist should not worry about that, because a good chunk of topics come up over and over again in interviews and coding assignments, and if you know them, you will finally land your first job.

So if you know nothing about Python, go and read a general book or take a course about it, because the materials I have compiled here touch only briefly on the basics. I also assume that you know some Pandas and NumPy and have played with a few projects. My intention here is to focus on topics that will move you from beginner to intermediate.

Finally, of course, this is a compilation of articles. If you have questions after reading them, keep digging and look for more info on the topics. The point is that here you have a summary of the important stuff that keeps coming up in interviews, so you can come back whenever you need to refresh some concepts.

Cool, so grab your coffee, popcorn, or whatever you are into. This is going to get serious.

Introduction

The first thing a data scientist needs to know is how to deal with data, and by far the most popular language for doing this is Python. That’s why the first chapter will take you through topics like data cleaning and preparation, handling different types of datasets, loading data into data frames with Pandas, and writing efficient vectorized code with NumPy. This is perhaps the most important skill of a data scientist, or at least the one you will use most of the time in your daily job, so I hope you enjoy it.
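To make "vectorized code" concrete, here is a minimal sketch that standardizes a column of numbers twice: once with explicit Python loops and once with whole-array NumPy operations. The function names and the synthetic sample are made up for illustration and do not come from any of the selected articles.

```python
import numpy as np


def zscore_loop(values):
    """Standardize values with explicit Python loops (the slow baseline)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5
    return [(v - mean) / std for v in values]


def zscore_vectorized(values):
    """Same computation written as whole-array NumPy operations."""
    arr = np.asarray(values, dtype=float)
    # arr.std() uses the population formula by default, matching the loop above
    return (arr - arr.mean()) / arr.std()


# Synthetic data, purely illustrative
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=50.0, scale=10.0, size=10_000)

loop_result = zscore_loop(sample.tolist())
vec_result = zscore_vectorized(sample)

# Both versions should agree up to floating point error
print(np.allclose(loop_result, vec_result))
```

Both versions compute the same thing. The vectorized one pushes the loop into compiled NumPy code, which is usually where the speedup on large arrays comes from.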

Selected Articles:

Project: Building a Poker Simulator

Data Visualization

The second most commonly used skill in data science is data visualization. Business people are very often not interested in how you tweaked some formula in your algorithm. They hire data scientists to make sense of business data, and that involves making beautiful plots that are easy to understand during a presentation. So Matplotlib is your friend here. Take your time to practice with these articles, because I can promise you that a simple scatter plot is not enough.

Selected Articles:

Project: Customer Profiling, Clustering, and Recommendations

Model Selection and Evaluation

Junior data scientists are enthusiastic about cool models using cutting-edge technologies and making super accurate predictions. However, the truth is it takes 1% of your time to learn the basics of a model, and the other 99% of your time you are comparing results and fine-tuning parameters. Therefore you must develop an efficient workflow that allows you to build a data pipeline to test your models, tune parameters, and come up with the best decision. That’s what you are going to learn here.
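As a rough illustration of such a workflow, the sketch below chains scaling and a model into a scikit-learn `Pipeline` and lets `GridSearchCV` try a few parameter combinations with cross-validation. The choice of an SVM, the parameter grid, and the Iris data are my assumptions for the example, not a recommendation from the selected articles.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out a test set that the parameter search never sees
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing and model in one object, so scaling is refit inside each CV fold
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", SVC()),
])

# Illustrative grid; the "model__" prefix routes parameters to the SVC step
param_grid = {
    "model__C": [0.1, 1, 10],
    "model__kernel": ["linear", "rbf"],
}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline means it is fitted only on the training part of each fold, which avoids leaking information from the validation data into the preprocessing.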

Project: The Iris Dataset

Solving Classification Problems

This is pretty straightforward. Classification is a super common problem in supervised learning, and there are a lot of algorithms that you can use to solve it. Here you will learn the most relevant ones and how to use them.
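One simple way to compare several classifiers is to score each of them with the same cross-validation setup. The sketch below does that with scikit-learn; the three candidate models and the Wine dataset bundled with scikit-learn are assumptions picked for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Illustrative candidates; distance- and gradient-based models get scaled inputs
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "k_nearest_neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean accuracy over 5 stratified folds for each candidate
mean_scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

# Print candidates from best to worst mean accuracy
for name, score in sorted(mean_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

On a dataset this small the differences between models can be within noise, so treat the ranking as a starting point rather than a verdict.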

Project: Choosing the Best Classification Algorithm

Solving Regression Problems

The other most common supervised learning problem is regression: predicting a numeric value, as you do when forecasting. There are many ways of implementing a regression in data science, and here you will learn them.
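As a minimal sketch of the basic regression workflow, the example below fits a linear model on a training split and reports the error on held-out data. The data is synthetic, generated from a known line (slope 3, intercept 2) plus noise, so that the fitted coefficients can be sanity-checked; none of it comes from the Kaggle competition.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 2 plus Gaussian noise (illustrative assumption)
rng = np.random.default_rng(seed=0)
X = rng.uniform(0.0, 10.0, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# Root mean squared error on data the model has not seen
rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))

print(model.coef_[0], model.intercept_, rmse)
```

The fitted slope and intercept should land close to the values used to generate the data, and the RMSE should be close to the noise level.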

Project: Forecasting House Prices in a Kaggle Competition

Deep Learning

I know, many people were looking forward to jumping here. Well, the truth is that in many cases you don’t need to implement a neural network. But if you have a lot of data available, if it is unstructured, or if you are dealing with perceptual data like images, text, or sound, then there is a good chance you need to use one.

Project: Deep Dream with Tensorflow

If you made it to the end of this course, congratulations! You have covered most of the topics that you will encounter in a real data science job. Of course, there are thousands of articles that would also deserve to be here, but the point was to handpick the most relevant information and make it accessible so you can keep it at hand.

Last but not least, don’t forget to like this post and follow me on Medium if you found it helpful. There will be more coming up :)

Happy coding!
