
The Classic Machine Learning Workflow

source link: https://codecondo.com/the-classic-machine-learning-workflow/


Any machine learning project involves many detailed processes, each with its own level of complexity. To make this easier, there is a basic workflow that teams working on ML projects follow, and this article walks through what that workflow looks like. Data collection, pre-processing, dataset generation, model training and refinement, evaluation, and deployment to production are all everyday activities. Some aspects of the machine learning operations cycle, such as model and feature selection, can be automated, but not all of them.

Even though these phases are widely accepted as industry standards, there is still room for refinement. When designing a machine learning solution, you must first frame the project before settling on a suitable strategy. Moreover, rather than trying to fit the model into a predetermined plan, it helps to create a scalable approach: one that allows you to start small and grow into a production-ready solution.

The Four Major Stages

The machine learning workflow is divided into four major stages: Define, Prototype, Production, and Measure. It should be stressed that the Prototype stage is an iterative process in and of itself.

Stage 1: Define

The first phase focuses on the business problem that has to be solved. It is about sitting down with the team and looking at the problem through a zoomed-out lens. Once a shared understanding of the initial problem has been established, the project expectations and deliverables are determined. The solution’s overall shape (e.g., a real-time application) and model type become evident, while the specifics are left intentionally vague; in any case, this will generate implementation ideas. Design Thinking can serve as a useful workshop approach here.

During this phase, the following guiding questions are addressed:

  • What are the implications of the model’s findings?
  • Are there any other options or manual methods?
  • What can we expect from a viable solution?

By the end of this phase, the relevant problem is assumed to be well understood, and the framework for the solution has taken shape.

Stage 2: Prototype

Proof of concept is the focus of the second phase. Data collection, analytical preparation, and the selection of a first model are the main activities involved. Data professionals have broad access and practically unlimited freedom to experiment with the architecture of the model and the database. Iteratively, new influencing factors are extracted from the data, and their effect on the model is assessed. The working environment is sometimes referred to as a data lab because of its distinct independence.

At the end of this round of experimentation, a usable model prototype should emerge. The prototype is workable, but it is not yet the perfect answer to the problem. Extensive optimizations beyond the fourth decimal place are only relevant in a few cases when deciding whether a model is suitable for practical application. Answering the following questions is a significant focus of this phase:

  • Is the quantity and quality of data adequate?
  • Is it possible to solve the problem with a machine learning model?
  • Is it necessary to replace the current procedure?

Because the model will be improved iteratively in the future, you may go on to the next phase as soon as the model design meets the requirements.
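To make this stage more concrete, here is a minimal prototype sketch using pandas and scikit-learn. It assumes a tabular churn-style dataset; the file name, feature columns, and target column are placeholders rather than part of the original workflow:

```python
# Minimal prototype sketch (assumes pandas and scikit-learn are installed;
# "customer_data.csv" and the column names are placeholders for your own data).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Data collection: load the raw data gathered during the Prototype phase.
df = pd.read_csv("customer_data.csv")

# Analytical preparation: pick candidate features and the target to predict.
X = df[["age", "income", "num_purchases"]]
y = df["churned"]

# Hold out a test set so the prototype can be evaluated honestly.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# First model: a simple, robust baseline rather than a heavily tuned one.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Assess whether the quantity and quality of data are adequate for the task.
predictions = model.predict(X_test)
print(accuracy_score(y_test, predictions))
print(classification_report(y_test, predictions))
```

The point of such a sketch is not the particular algorithm but having an honest, repeatable evaluation that tells you whether the problem can be solved with a machine learning model at all.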

Stage 3: Production

During the transition to the Production phase, the decision on whether to apply the machine learning approach is made. The model can be used in a production context if it proves feasible. If the prototypes cannot show that they are helpful, you should halt the machine learning project at this point; costs already incurred are sunk and should not influence the decision. The intermediate results, on the other hand, are not entirely useless. To avoid making an incorrect decision, it can be advantageous to bring in a third party.

Before going live, the solution must overcome further organizational and cultural hurdles, and effort should be made at this stage to increase user acceptance. In terms of technical implementation, deployment benefits from a machine learning pipeline that is as continuous and integrated as possible between the data lab and the runtime environment. Depending on the design of the pipeline, the use of cutting-edge techniques may be limited, because the most recent model types are not universally supported. As part of the automation, on-demand data transformation is added to the pipeline.
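One common way to keep the data lab and the runtime environment aligned is to package the data transformation and the model into a single pipeline object that is deployed as one artifact. The sketch below uses scikit-learn's Pipeline and joblib for this; the file names and column names are assumptions carried over from the prototype sketch above:

```python
# Sketch of an integrated pipeline: preprocessing and the model travel together,
# so production applies exactly the same transformations as the data lab.
# File name and column names are placeholders.
import pandas as pd
import joblib
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("customer_data.csv")
X = df[["age", "income", "num_purchases"]]
y = df["churned"]

pipeline = Pipeline([
    ("scaler", StandardScaler()),          # on-demand data transformation
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])
pipeline.fit(X, y)

# Persist the whole pipeline as one deployable artifact.
joblib.dump(pipeline, "churn_pipeline_v1.joblib")
```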

This project phase answers the following questions, among others:

  • Is the user base enthusiastic about the model?
  • Which system architecture is the best fit for your needs? (On-premise, cloud, or cluster)
  • In what format should the findings be published? (e.g., via an API, a database, etc.)

After the production deployment, the model is ready for use with real data. The introduction of a new model is frequently accompanied by a shift in responsibilities, and difficulties may arise during the handover, particularly around creative modelling methodologies.
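If the team chooses to publish results via an API (one of the options above), the deployment can be as small as a single service that loads the persisted pipeline and answers prediction requests. The following Flask sketch is purely illustrative; the endpoint, port, feature names, and artifact path are assumptions:

```python
# Minimal serving sketch with Flask (a hypothetical setup; the endpoint,
# feature names, and artifact path are placeholders).
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
pipeline = joblib.load("churn_pipeline_v1.joblib")  # artifact from the pipeline step

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"age": 42, "income": 55000, "num_purchases": 3}.
    payload = request.get_json()
    features = pd.DataFrame([payload], columns=["age", "income", "num_purchases"])
    prediction = pipeline.predict(features)
    return jsonify({"churned": int(prediction[0])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

A client would then POST a JSON document with the expected feature values to /predict and receive the model's answer as JSON; publishing to a database instead would simply replace the HTTP layer with a scheduled write.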

Stage 4: Measure

The final phase ensures that the project delivers long-term value to the company. To accomplish this, the model’s performance is monitored during regular operation. The environment keeps changing, and such changes degrade the quality of the model’s output; this may show up as shifting market shares or changing trends. As a result, the results should be evaluated critically at regular intervals and, where possible, compared against a benchmark. Factors such as the model’s commercial value and how quickly it has to deliver results dictate the length of those intervals.
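One simple way to make this regular check concrete is to score the deployed pipeline on freshly labelled data at a fixed interval and compare the result against the benchmark established during the Prototype phase. In the sketch below, the feedback file and the baseline value of 0.85 are placeholders:

```python
# Monitoring sketch: re-evaluate the deployed pipeline on newly labelled data
# and compare against the benchmark from the prototype (values are placeholders).
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.85  # benchmark agreed on during the Prototype phase

pipeline = joblib.load("churn_pipeline_v1.joblib")
recent = pd.read_csv("recent_labelled_data.csv")   # feedback collected in production

X_recent = recent[["age", "income", "num_purchases"]]
y_recent = recent["churned"]

current_accuracy = accuracy_score(y_recent, pipeline.predict(X_recent))
print(f"current accuracy: {current_accuracy:.3f}")

if current_accuracy < BASELINE_ACCURACY:
    # Trends or market shifts may have degraded the model; trigger a review
    # or retraining instead of silently accepting worse results.
    print("Performance below baseline - review assumptions and consider retraining.")
```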

The assumptions underlying the model must also be reviewed. If they are still valid, retraining the model with new data may deliver a performance boost; more significant changes call for either a structural revision of the model or the addition of a new data source. In general, the following questions are addressed at this phase:

  • Is the result’s quality still acceptable?
  • Are the model’s basic assumptions accurate?
  • Have the requirements changed?

To continue improving the model, it is necessary to version it. This is especially important when a model’s outputs must be reproducible at any time due to legal constraints, such as in the case of a credit inquiry.
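Short of a full model registry, a lightweight way to achieve reproducible versions is to store every retrained pipeline under a timestamped directory together with the metadata needed to trace its results. This is only a sketch; the directory layout and metadata fields are assumptions:

```python
# Versioning sketch: persist each retrained pipeline with a timestamp and
# the metadata needed to reproduce its results later (paths are placeholders).
import json
from datetime import datetime, timezone
from pathlib import Path

import joblib

def save_model_version(pipeline, training_data_path, metrics, model_dir="models"):
    """Store the pipeline plus a small metadata file describing this version."""
    version = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    out_dir = Path(model_dir) / version
    out_dir.mkdir(parents=True, exist_ok=True)

    joblib.dump(pipeline, out_dir / "pipeline.joblib")
    metadata = {
        "version": version,
        "training_data": training_data_path,
        "metrics": metrics,
    }
    (out_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return version
```

In a regulated setting such as the credit-inquiry example, the metadata can be extended with the exact feature list and a reference to the training data snapshot, so that any past decision can be reconstructed.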


