
Roadmap to Machine Learning: Key Concepts Explained

Credit: Unsplash

What if our memory were a storage device? How much easier the learning process would be. But the reality is that to become an excellent professional at something, you need to walk a thorny path. You learn, you forget, you make mistakes, you learn again, absorb new things, and thus you form a picture in your head of everything you have learned. Practice makes perfect, but sometimes that practice takes more time than you expected.

Machine learning is exactly such a case. Very often people simply get lost somewhere in the middle of learning and lose the motivation to move on: there are so many concepts that need to be systematized. Today I want to do that for you and present all the core concepts of machine learning, to help you form a picture of this field faster. This article will suit both those who are just starting to learn and those who are already using machine learning in practice.

Quick overview:

#1 Motivation

#2 Categories

#3 Types of problems

#4 Kind

#5 Performance Analysis

#6 Approaches (Algorithms)

#7 Tuning

Without further ado, let’s jump right in!

Machine Learning Concepts

Each of these concepts leads to other smaller derivative concepts. Here I try to give the shortest and simplest definition for each of the terms:

#1 Motivation.

Motivation matters in machine learning because it shapes the process by which models are compared to data. There are two approaches to motivation in machine learning:

  • Prediction. Non-linear models that help predict the response variable from the inputs, without explaining how each individual input affects the prediction. For example, a prediction-oriented perspective is best suited to answering the question: is my car over- or undervalued? Applying inference here would make the models much less interpretable.
  • Inference. Linear models that quantify how each individual input affects the prediction. For example, inference gives you a precise answer to: how much would my car cost if it could be driven without a roof in place? Because inference is made by comparing predictions from the model, these linear models are easier to understand than their non-linear counterparts.
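
To make the contrast concrete, here is a minimal sketch, assuming scikit-learn and NumPy as tools (the article names neither, and the car-price data is made up for illustration): the linear model exposes one readable coefficient per input (inference), while the non-linear random forest predicts without telling you how each input contributed (prediction).

```python
# Minimal sketch: inference (interpretable linear model) vs prediction
# (flexible non-linear model). Library and data are assumed for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))                         # e.g. car age, mileage
y = 30 - 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 1, 200)  # hypothetical price

# Inference: each coefficient tells us how one input moves the prediction.
linear = LinearRegression().fit(X, y)
print("effect of each input:", linear.coef_)

# Prediction: often more flexible, but no per-input coefficients to read.
forest = RandomForestRegressor(random_state=0).fit(X, y)
print("predicted price:", forest.predict([[3.0, 5.0]]))
```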

#2 Categories.

As with any method, there are different ways to train machine learning algorithms, each with its own advantages and disadvantages. Here they are:

  • Supervised. A task-driven approach in which the computer is presented with example inputs and their desired outputs, given by a “teacher”, and the goal is to learn a general rule that maps inputs to outputs.
  • Unsupervised. A data-driven approach whose goal is to learn more about the data by modeling its underlying structure or distribution. It can serve either as an end in itself (discovering hidden patterns in data) or as a means towards an end (feature learning).
  • Reinforcement Learning. This category is based on learning from errors that trains algorithms using a system of reward and punishment.
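
A minimal sketch of the first two categories side by side, again assuming scikit-learn as the tool: the supervised learner is given desired outputs by a "teacher", while the unsupervised one sees only raw inputs.

```python
# Minimal sketch of supervised vs unsupervised learning
# (scikit-learn is an assumed library choice, not prescribed by the article).
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X = [[1.0], [2.0], [8.0], [9.0]]

# Supervised: inputs come with desired outputs given by a "teacher".
y = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.5], [8.5]]))     # learned rule maps inputs to outputs

# Unsupervised: no labels; the algorithm models structure in the data itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                      # discovered grouping, no teacher
```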

#3 Types of problems.

Going deeper into the machine learning categories, there are five main types of problems:

  • Regression. A supervised type of problem where we need to predict a continuous response value. Regression fits the data and gives an answer for every mapped feature point: if the prediction value tends to be continuous, the problem falls under regression. For example: given the area name, the size of the land, etc. as features, predict the expected cost of the land.
  • Classification. A supervised problem whose main aim is to separate the data. If the prediction value tends to be a category like yes/no, positive/negative, etc., it falls under classification-type problems in machine learning. For example: given a sentence, predict whether it is a negative or positive review.
  • Clustering. An unsupervised problem where we group similar things together into a given number of clusters; no answers are given for the points. Example: given the values 3, 4, 8, 9 and two clusters, the ML system might divide the set into cluster 1: {3, 4} and cluster 2: {8, 9}.
  • Density Estimation. It is the construction of an estimate, based on observed data, of an unobservable underlying probability density function. Finds the distribution of inputs in some space.
  • Dimensionality Reduction. Simplifies inputs by mapping them into a lower-dimensional space.
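
Two of these problem types in a few lines each, as a sketch (scikit-learn and NumPy are assumed tool choices, and the land-price numbers are made up for illustration):

```python
# Sketch of two problem types (scikit-learn assumed, toy data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Regression: predict a continuous value (e.g. land cost from size).
sizes = np.array([[500], [750], [1000], [1250]])     # hypothetical land sizes
costs = np.array([50_000, 74_000, 101_000, 123_000])  # hypothetical costs
reg = LinearRegression().fit(sizes, costs)
print(reg.predict([[900]]))            # continuous response value

# Dimensionality reduction: map 3-D inputs into a 2-D space.
X = np.random.default_rng(0).normal(size=(100, 3))
X2 = PCA(n_components=2).fit_transform(X)
print(X2.shape)                        # (100, 2)
```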

#4 Kind.

A machine learning algorithm can be classified as either parametric or non-parametric:

  • Parametric — has a fixed number of parameters, and fitting is done in two steps:

Step 1: Making an assumption about the functional form or shape of our function (f), i.e.: f is linear, thus we will select a linear model.

Step 2: Selecting a procedure to fit or train our model. This means estimating the Beta parameters in the linear function. A common approach is (ordinary) least squares, amongst others.

  • Non-Parametric — uses a flexible number of parameters, and the number of parameters often grows as it learns from more data. Since these methods do not reduce the problem of estimating f to a small number of parameters, a large number of observations is required in order to obtain an accurate estimate for f. An example would be the thin-plate spline model.
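
The two parametric steps fit in a few lines. Here is a sketch, assuming NumPy as the tool (the article names ordinary least squares but no library): step 1 assumes f is linear, step 2 estimates the Beta parameters by least squares.

```python
# Parametric fit: Step 1 assumes f is linear; Step 2 estimates the Beta
# parameters by ordinary least squares (NumPy is an assumed tool choice).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 50)   # true Betas: 2.0 and 3.0

# Design matrix [1, x] so the intercept is estimated too.
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print("estimated Betas:", beta)            # close to [2.0, 3.0]
```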

#5 Performance Analysis.

Performance analysis is the process of evaluating how well a trained model performs. It is carried out using measures such as the following:

  • Confusion Matrix — a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known.
  • Accuracy. The fraction of correct predictions; not reliable on its own, since it is skewed when the data set is imbalanced (that is, when the number of samples in different classes varies greatly).
  • F1 score — another measure of a test’s accuracy, calculated as the harmonic mean of: 1) Precision — out of all the examples the classifier labeled as positive, what fraction were correct? 2) Recall — out of all the positive examples there were, what fraction did the classifier pick up?
  • ROC Curve — Receiver Operating Characteristic: the True Positive Rate (Recall/Sensitivity) plotted against the False Positive Rate (1 − Specificity).
  • Bias–variance tradeoff — the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa.
  • Mean Squared Error (MSE) — measures the average of the squares of the errors or deviations — that is, the difference between the estimator and what is estimated.
  • Error Rate. The proportion of mistakes made when we apply our estimated model function to the training observations in a classification setting.
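
Most of these measures are a single call in scikit-learn (an assumed tool choice; the article defines the metrics, not an API). A minimal sketch on hand-made predictions:

```python
# Minimal sketch of the listed measures (scikit-learn assumed).
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             mean_squared_error, roc_auc_score)

y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]
scores = [0.1, 0.2, 0.6, 0.9, 0.8, 0.4, 0.7, 0.3]  # predicted probabilities

print(confusion_matrix(y_true, y_pred))   # rows: true class, cols: predicted
print(accuracy_score(y_true, y_pred))     # fraction of correct predictions
print(f1_score(y_true, y_pred))           # harmonic mean of precision/recall
print(roc_auc_score(y_true, scores))      # area under the ROC curve

# MSE applies to continuous predictions (regression), e.g.:
print(mean_squared_error([2.0, 3.0], [2.5, 2.8]))  # average squared error
```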

#6 Approaches (Algorithms)

Brace yourself, here’s the most interesting part! Here is how you can put machine learning in practice:

  • Decision tree learning — constructed via an algorithmic approach that identifies ways to split a data set based on different conditions (a short sketch follows this list).
  • Association rule learning — a rule-based machine learning and data mining technique that finds important relations between variables or features in a data set.
  • Artificial neural networks — an information processing model that is inspired by the way biological nervous systems, such as the brain, process information.
  • Deep learning — has networks capable of learning unsupervised from data that is unstructured or unlabeled. It teaches a computer to filter inputs through layers to learn how to predict and classify information.
  • Inductive logic programming — uses logic programming as a uniform representation for examples, background knowledge, and hypotheses.
  • Support vector machines — analyze data used for classification and regression analysis.
  • Clustering — the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
  • Bayesian networks — a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph.
  • Reinforcement learning — learns by interacting with its environment.
  • Feature learning — allows a system to discover, from raw data, the representations needed for feature detection or classification.
  • Similarity and metric learning — learns a similarity function that measures how similar or related two objects are.
  • Sparse dictionary learning — aims at finding a sparse representation of the input data in the form of a linear combination of basic elements.
  • Genetic algorithms — a metaheuristic inspired by the process of natural selection.
  • Rule-based machine learning — identifies, learns, or evolves a set of relational rules that collectively represent the knowledge captured by the system.
  • Learning classifier systems — combines a discovery component with a learning component.
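
As a taste of the first approach, here is a minimal decision tree sketch, assuming scikit-learn; the printed rules are exactly the split conditions the definition above describes.

```python
# Decision tree sketch (scikit-learn assumed): the tree learns split
# conditions that separate the classes in the data set.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The learned splits are human-readable conditions on the features.
print(export_text(tree))
print(tree.predict(X[:3]))
```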

#7 Tuning

Tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. Here are the components of it:

Cross-validation — a technique used to assess how the results of a statistical analysis will generalize to an independent data set. One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set).

Methods: Leave-p-out cross-validation, Leave-one-out cross-validation, k-fold cross-validation, Holdout method, and Repeated random sub-sampling validation.
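
Of the listed methods, k-fold is the most common in practice; a minimal sketch, assuming scikit-learn:

```python
# k-fold cross-validation sketch (scikit-learn assumed): the data is split
# into k complementary folds; each fold serves once as the validation set.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation
print(scores)          # one accuracy score per held-out fold
print(scores.mean())   # estimate of generalization performance
```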

Hyperparameters — parameters whose values are used to control the learning process. By contrast, the values of other parameters (typically node weights) are derived via training. Hyperparameters can be optimized using:

1) Grid Search. The traditional way, which is simply an exhaustive searching through a manually specified subset of the hyperparameter space of a learning algorithm.

2) Random Search. It simply samples parameter settings a fixed number of times and has been found to be more effective in high-dimensional spaces than exhaustive search.

3) Gradient-based optimization. For specific learning algorithms, it is possible to compute the gradient with respect to hyperparameters and then optimize the hyperparameters using gradient descent.
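
A sketch of the first two optimization methods, assuming scikit-learn (and SciPy for the sampling distribution); the model and parameter ranges are made up for illustration:

```python
# Grid search vs random search over hyperparameters (scikit-learn assumed).
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 1) Grid search: exhaustive over a manually specified subset.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=3)
print(grid.fit(X, y).best_params_)

# 2) Random search: samples settings a fixed number of times (n_iter).
rand = RandomizedSearchCV(SVC(), {"C": loguniform(0.01, 100)},
                          n_iter=10, cv=3, random_state=0)
print(rand.fit(X, y).best_params_)
```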

Early Stopping (Regularization) — early stopping rules provide guidance on how many iterations can be run before the learner begins to overfit, and stop the algorithm at that point.

Overfitting. It happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.

Underfitting. The case where the model has “not learned enough” from the training data, resulting in low generalization and unreliable predictions.
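
Both failure modes show up when you vary model flexibility. A sketch, assuming scikit-learn: a degree-1 polynomial underfits the curved data, while a degree-15 polynomial chases the noise, so its training error keeps falling while its test error rises.

```python
# Under- vs overfitting sketch (scikit-learn assumed): compare training
# and test error for polynomials of increasing degree.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)
X_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel()

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    print(degree,
          mean_squared_error(y, model.predict(X)),            # training error
          mean_squared_error(y_test, model.predict(X_test)))  # test error
```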

Bootstrap. It is any test or metric that uses random sampling with replacement and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates.
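
A minimal bootstrap sketch, assuming NumPy: resampling with replacement attaches a confidence interval to a simple sample estimate, here the mean.

```python
# Bootstrap sketch (NumPy assumed): resample with replacement to attach a
# confidence interval to a sample estimate (here, the mean).
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=100)

boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
              for _ in range(2000)]
print(np.percentile(boot_means, [2.5, 97.5]))  # ~95% confidence interval
```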

Bagging. It is an ensemble method that trains many models (often decision trees) on bootstrap samples of the data and combines their predictions.
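
And a bagging sketch, assuming scikit-learn: each tree sees a different bootstrap sample, and their predictions are combined.

```python
# Bagging sketch (scikit-learn assumed): many trees, each trained on a
# bootstrap sample, vote on the final prediction.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        random_state=0)
print(cross_val_score(bag, X, y, cv=5).mean())
```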

………………………………………….…

To learn more about AI, ML & Data Science, feel free to subscribe to my Instagram and Medium blog. You are also welcome to visit my LinkedIn.

If you are keen on SAP, welcome to my SAP S4 Hana channel on Telegram, with useful material and inspiration to grow. There is also my chat, where you can discuss everything SAP.

Let’s grow together!

