10 Concepts Every Data Scientist Should Know

The concepts that are likely to be encountered at an interview.

Jul 22 · 10 min read


Photo by Tyler Casey on Unsplash

Data science is a broad field. If it were a recipe, the main ingredients would be linear algebra, statistics, software, analytical skills, and storytelling, all seasoned with some domain knowledge. The amount of each ingredient changes according to the task you are working on.

Whatever you do as a data scientist, there are some terms and concepts you should definitely be familiar with. In this post, I will cover 10 of these concepts. Please note that this post is by no means a comprehensive list of the topics you need to know. However, knowing the following concepts will add value to your skillset and help you in your journey to learn more.

Let’s start.

1. Central Limit Theorem

We first need to introduce the normal (Gaussian) distribution for the central limit theorem to make sense. The normal distribution is a probability distribution that looks like a bell:

[Figure: the bell-shaped curve of a normal distribution]

The x-axis represents the values and the y-axis represents the probability density, that is, how likely values near each point are to be observed. The normal distribution is often used to model random variables whose distributions are unknown, which is why it is widely used in many fields, including the natural and social sciences. What justifies this use is the central limit theorem (CLT).

According to the CLT, as the size of the samples we draw from a distribution grows, the distribution of the sample averages tends towards a normal distribution regardless of the shape of the population distribution (as long as the population has a finite variance).

Consider a case where we need to learn about the heights of all 20-year-old people in a country. It is almost impossible, and certainly not practical, to collect this data for everyone. So, we take samples of 20-year-old people across the country and calculate the average height of the people in each sample. According to the CLT, as the samples get larger, the sampling distribution of the average height gets close to a normal distribution.
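A quick simulation can make this concrete. The sketch below is not from the original post: it assumes a hypothetical, strongly skewed population (an exponential distribution with mean 1) in place of real height data, and the helper names `draw_exponential`, `sample_means`, and `skewness` are made up for illustration. It uses only Python's standard library.

```python
import random
import statistics


def draw_exponential(rng):
    """Draw one value from the assumed population: exponential with mean 1."""
    return rng.expovariate(1.0)


def sample_means(draw, sample_size, num_samples, seed=0):
    """Take `num_samples` samples of `sample_size` values each and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


def skewness(values):
    """Sample skewness; close to 0 for a symmetric shape such as the normal curve."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / std) ** 3 for v in values)


# Compare the distribution of sample means for growing sample sizes.
for n in (1, 5, 30, 200):
    means = sample_means(draw_exponential, sample_size=n, num_samples=5000)
    print(
        f"n={n:>3}  mean of means={statistics.fmean(means):.3f}  "
        f"std of means={statistics.stdev(means):.3f}  "
        f"skewness={skewness(means):.3f}"
    )
```

If the CLT behaves as described, the printed skewness should shrink toward 0 as the sample size n grows, while the mean of the sample means stays near the population mean of 1 and their spread shrinks roughly in proportion to 1/sqrt(n).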

Why is it so important to have a normal distribution? A normal distribution is fully described by its mean and standard deviation, both of which are easy to calculate. And if we know the mean and standard deviation of a normal distribution, we can compute pretty much everything about it.
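As a small illustration of what "pretty much everything" can mean, the sketch below uses `statistics.NormalDist` from Python's standard library. The mean of 170 cm and standard deviation of 8 cm are made-up numbers chosen for the example, not figures from the original post or from real data.

```python
from statistics import NormalDist

# Assumed (hypothetical) parameters: mean height 170 cm, standard deviation 8 cm.
heights = NormalDist(mu=170, sigma=8)

# Share of people within one standard deviation of the mean (162 cm to 178 cm).
within_one_sd = heights.cdf(178) - heights.cdf(162)

# Share of people taller than 186 cm, i.e. two standard deviations above the mean.
taller_than_186 = 1 - heights.cdf(186)

# Height below which 95% of people fall (the 95th percentile).
p95 = heights.inv_cdf(0.95)

print(f"within one sd: {within_one_sd:.3f}")    # about 0.683
print(f"taller than 186: {taller_than_186:.3f}")  # about 0.023
print(f"95th percentile: {p95:.1f} cm")          # about 183.2 cm
```

Only the two parameters were needed to answer all three questions, which is what makes the normal distribution so convenient to work with.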
