
Huawei’s MindSpore: A new competitor for TensorFlow and PyTorch?

source link: https://towardsdatascience.com/huaweis-mindspore-a-new-competitor-for-tensorflow-and-pytorch-d319deff2aec?gi=9edbf41ddd23

[Image. Source: MindSpore]

Huawei announced that its TensorFlow- and PyTorch-style MindSpore deep learning framework is now open source. This post covers its most relevant characteristics.

Huawei has just announced that its MindSpore framework for developing artificial intelligence applications is now open source and available on GitHub and Gitee. MindSpore is another deep learning framework for training neural network models, similar to TensorFlow or PyTorch, designed to be used from the Edge to the Cloud, and supporting both GPUs and, of course, Huawei's Ascend processors.

MindSpore was first introduced in August of last year, when Huawei announced the official launch of its Ascend processor, stating at the time that "in a typical training session based on ResNet-50, the combination of Ascend 910 and MindSpore is about two times faster at training AI models than other mainstream training cards using TensorFlow".

It is true that many frameworks have come out in recent years, and perhaps MindSpore is just one more of the bunch, not even remotely able to compete with TensorFlow (backed by Google) and PyTorch (backed by Facebook). But when I read the news a few days ago that MindSpore was becoming open source, the same feeling of excitement ran through me as in November 2015, when I read about TensorFlow for the first time. Just like then, I cannot say exactly why, but something tells me that MindSpore may become a third contender.

And that is why, this Easter weekend in which we all find ourselves in the same #StayAtHome situation, I have started to look a little more deeply into this framework. The available technical documentation is still scarce, disorganized and full of errors, but that has not prevented me from getting an idea of what they are proposing. Below I share my first findings.

System architecture

The MindSpore website describes the framework as structured in three main layers: Frontend Expression, Graph Engine, and Backend Runtime. The following figure shows a visual scheme:

[Figure: MindSpore system architecture, showing the Frontend Expression, Graph Engine, and Backend Runtime layers. Source: MindSpore]

The first layer of MindSpore offers a Python API for programmers. Since Python is the de facto lingua franca of our community, MindSpore could hardly do otherwise if it wants to compete with PyTorch and TensorFlow. This API lets programmers manage models (training, inference, etc.) and process data. The first layer also includes support for an intermediate representation of the code (MindSpore IR), on which many of the optimizations performed for parallelization and automatic differentiation (GHLO) will be based.
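To make the idea of an intermediate representation concrete, here is a toy "tracer" in plain Python that records the operations a function performs as a list of IR instructions. This is purely illustrative and is not MindSpore IR; all names here are invented for the sketch.

```python
# Illustrative only: a toy tracer that lowers a Python function to a tiny
# list-of-instructions IR. Real framework IRs (including MindSpore IR)
# are far richer; this just shows the general idea of capturing Python
# code as a graph that can later be optimized or differentiated.

class Tracer:
    def __init__(self, name):
        self.name = name    # symbolic variable name
        self.ops = []       # recorded IR instructions

    def _emit(self, op, operand):
        self.ops.append((op, self.name, operand))
        return self

    def __mul__(self, operand):
        return self._emit("mul", operand)

    def __add__(self, operand):
        return self._emit("add", operand)

def f(x):
    return x * 3 + 1

x = Tracer("x")
f(x)                # running f on the tracer records its operations
print(x.ops)        # [('mul', 'x', 3), ('add', 'x', 1)]
```

Once a function exists in such a form, the framework can analyze and rewrite it (fusing operations, scheduling them across devices) instead of merely executing the Python line by line.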

Below it is the Graph Engine layer, which offers the functionality needed to create and perform the automatic differentiation of the execution graph. On the website I read that MindSpore has opted for an automatic differentiation model different from PyTorch's (which creates a dynamic execution graph) and from TensorFlow's (which initially adopted a more efficient static execution graph, but now also offers a dynamic execution graph while still allowing the static graph version via the @tf.function decorator of its low-level API). MindSpore's choice consists of converting the original code into an intermediate code format (MindSpore IR), which makes it possible to take advantage of both models (more detail can be found in the Automatic Differentiation section of the MindSpore website). I have not checked the code on GitHub nor been able to evaluate it, but the approach they propose seems to make sense.
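Whatever the graph is built from (a dynamic trace, a static graph, or an IR), the core of reverse-mode automatic differentiation is the same: walk the graph backwards, multiplying local gradients. Here is a minimal sketch in plain Python, with invented names; it is not how any of these frameworks is actually implemented.

```python
# Minimal reverse-mode automatic differentiation sketch. Each Var records
# its parents together with the local gradient of the op that produced it;
# backward() propagates gradients from the output to the inputs. This
# naive traversal is correct for the simple expression below; a real
# implementation processes nodes in topological order.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # (parent_var, local_gradient) pairs
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   parents=[(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value,
                   parents=[(self, 1.0), (other, 1.0)])

def backward(out):
    out.grad = 1.0
    stack = [out]
    while stack:
        node = stack.pop()
        for parent, local in node.parents:
            parent.grad += local * node.grad
            stack.append(parent)

x = Var(2.0)
y = Var(3.0)
z = x * y + x            # z = x*y + x
backward(z)
print(x.grad, y.grad)    # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```

PyTorch builds this parent structure on the fly at run time; a static-graph or IR-based approach can instead derive the backward pass ahead of time and optimize it before execution.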

The last layer is made up of all the libraries and runtimes needed to support the different hardware architectures on which the code will run. I have not found information about this layer on the web (and I have not gone through the code on GitHub), but as far as I can understand, it will be a backend very close to those of the other frameworks, perhaps with Huawei peculiarities, for example libraries such as HCCL (Huawei Collective Communication Library), the equivalent of NVIDIA's NCCL (NVIDIA Collective Communication Library).

MindSpore installation and my first neural network

As often happens when a new framework appears, the information in the installation section is scarce, binaries are not provided for all platforms, and you have to cross your fingers and hope the solution turns up in the Issues section on GitHub/Gitee. In my case, on a MacBook Pro (Intel Core i7), I was able to install the Docker image they offer on Docker Hub by following the instructions on this issues page on Gitee.

With this Docker image, and a little time, I was able to program my first neural network using the MindSpore API: a LeNet to classify MNIST digits, based on example code provided in the MindSpore tutorial. In a subsequent post I will walk through this code in more detail to show that the MindSpore API actually borrows a lot of syntax from the PyTorch API and from TensorFlow's Keras API. It is therefore easy to get started with for anyone who is active in neural network programming.
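The structural resemblance is easy to see even without any framework installed. The sketch below mocks the "define a network as a class" pattern in plain Python: the `Cell`/`construct` pair mirrors MindSpore's `nn.Cell` and `construct` method, which play the same roles as PyTorch's `nn.Module` and `forward`. The layer here is a scalar stub invented for the illustration, not a real API.

```python
# Toy mock of the class-based network pattern shared by MindSpore
# (nn.Cell / construct), PyTorch (nn.Module / forward) and Keras
# subclassing. No real framework is used: Dense is a scalar stub, so the
# structural resemblance is the only point being made.

class Cell:                      # stand-in for mindspore.nn.Cell
    def __call__(self, x):
        return self.construct(x)

class Dense(Cell):               # stub layer: y = w*x + b (scalars only)
    def __init__(self, w, b):
        self.w, self.b = w, b
    def construct(self, x):
        return self.w * x + self.b

class TinyNet(Cell):
    def __init__(self):
        self.fc1 = Dense(2.0, 1.0)
        self.fc2 = Dense(3.0, 0.0)
    def construct(self, x):      # MindSpore's name for PyTorch's forward()
        return self.fc2(self.fc1(x))

net = TinyNet()
print(net(1.0))                  # (1*2 + 1) * 3 = 9.0
```

Anyone who has subclassed `nn.Module` in PyTorch will recognize this shape immediately, which is exactly why the learning curve is gentle.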

Support for the visualization of the training process

According to the MindSpore tutorial (although I was not able to install and use it), they have MindInsight to generate visualizations somewhat reminiscent of TensorFlow's TensorBoard. Take a look at some of the screenshots they show on their website:

[Figures: MindInsight visualization screenshots. Source: MindSpore]

According to the manual, MindSpore currently uses a Callback mechanism (reminiscent of how it is done in Keras) to record, in a log file during the training process, all the model parameters and hyperparameters we want, as well as the computation graph once compilation of the neural network to intermediate code has finished.
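The callback pattern itself is simple: the training loop invokes user-supplied hooks at fixed points, and each callback records whatever it wants. Here is a minimal sketch in plain Python with invented names; it is not MindSpore's actual Callback API.

```python
# Minimal sketch of a Keras-style callback mechanism. The training loop
# calls a hook at the end of each step; the callback appends whatever it
# wants to an in-memory log (a real one would write to a log file that a
# tool like MindInsight or TensorBoard later visualizes).

class LossLogger:
    def __init__(self):
        self.log = []
    def on_step_end(self, step, loss):
        self.log.append(f"step={step} loss={loss:.3f}")

def train(steps, callbacks):
    loss = 1.0
    for step in range(steps):
        loss *= 0.5                       # pretend the model is learning
        for cb in callbacks:
            cb.on_step_end(step, loss)    # hook point, Keras-style

logger = LossLogger()
train(3, [logger])
print(logger.log)
# ['step=0 loss=0.500', 'step=1 loss=0.250', 'step=2 loss=0.125']
```

The appeal of the pattern is that the training loop stays generic: logging, checkpointing and early stopping are all just additional callbacks.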

Parallelism

Since I installed a CPU-only version of MindSpore (through Docker), I have not been able to test the parallelization it performs, and the information on this subject on the website is currently scarce.

For now I can mention that their tutorial talks about two parallelization modes (DATA_PARALLEL and AUTO_PARALLEL) and presents example code that trains a ResNet-50 with the CIFAR dataset on an Ascend 910 processor (which I have not been able to verify). I imagine that DATA_PARALLEL refers to the strategy commonly known as data parallelism, which consists of dividing the training data into several subsets, each of which runs on its own replica of the same model but on a different processing unit.
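The data-parallelism idea can be sketched in a few lines of plain Python: split the batch into shards, compute a per-shard gradient on each "device" (here, ordinary function calls), then average the gradients, as an all-reduce step would. The toy model and all names are invented for the illustration.

```python
# Sketch of data parallelism: each "device" gets a shard of the batch,
# computes a local gradient on its own model replica, and the gradients
# are then averaged (the role of an all-reduce across real devices).

def shard(batch, num_devices):
    """Split a batch into num_devices near-equal contiguous shards."""
    k, r = divmod(len(batch), num_devices)
    shards, start = [], 0
    for i in range(num_devices):
        end = start + k + (1 if i < r else 0)
        shards.append(batch[start:end])
        start = end
    return shards

def local_gradient(shard_data, w):
    # toy model y = w*x fitted to targets y = 2*x; gradient of mean
    # squared error with respect to w on this shard
    return sum(2 * (w * x - 2 * x) * x for x in shard_data) / len(shard_data)

batch = [1.0, 2.0, 3.0, 4.0]
grads = [local_gradient(s, w=0.0) for s in shard(batch, 2)]
avg = sum(grads) / len(grads)    # the "all-reduce" averaging step
print(avg)                       # same gradient every replica applies
```

Because every replica applies the same averaged gradient, all copies of the model stay synchronized after each step, which is what makes this strategy so easy to scale.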

As far as I understood, the Graph Engine layer provides support for parallelizing code, and specifically for AUTO_PARALLEL parallelism. AUTO_PARALLEL mode automatically optimizes parallelization by combining the data-parallelism strategy (discussed above) with the model-parallelism strategy, in which the model is divided into different parts and each part is executed in parallel on different processing units. This automatic mode selects the parallelization strategy that offers the best benefits, as described in the Automatic Parallel section of the MindSpore website (although they do not detail how the evaluation and decision are made).
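Model parallelism, the other half of what AUTO_PARALLEL is said to combine, can be sketched just as simply: instead of replicating the whole model, its layers are assigned to different devices and the activations flow between them. The placement dictionary and all names below are invented for the illustration; a real framework moves tensors between accelerators.

```python
# Sketch of model parallelism: the layers of ONE model are placed on
# different "devices" (here just dictionary entries) and activations are
# passed from one to the next, in contrast to data parallelism, where
# each device holds a full replica of the model.

def layer_a(x): return x * 2.0       # conceptually on "device:0"
def layer_b(x): return x + 1.0       # conceptually on "device:1"

placement = {"device:0": layer_a, "device:1": layer_b}

def forward(x):
    # activations travel device:0 -> device:1
    for dev in ("device:0", "device:1"):
        x = placement[dev](x)
    return x

print(forward(3.0))                  # 3*2 + 1 = 7.0
```

An automatic mode like the one MindSpore describes would have to decide, per operator, whether replicating it (data parallel) or splitting it (model parallel) is cheaper, given the communication the choice implies; the website does not yet explain how that cost evaluation is made.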

We will have to give the technical team time to expand the documentation before we can learn more about the automatic parallelization strategy. But it is clear that this strategy is crucial: it is where they must, and can, compete with TensorFlow and PyTorch by obtaining significantly higher performance on Huawei processors.

Planned roadmap and how to contribute

It is evident that there is a lot of work to be done, and for now they have organized the ideas they have in mind for the next year in an extensive roadmap shared on this page, although they state that priorities will be adjusted according to user feedback. At the moment, these are the main lines:

  1. Support more models (pending classic models, GAN, RNN, Transformers, Reinforcement Learning models, probabilistic programming, AutoML, etc.).
  2. Expand the API and libraries to improve usability and the programming experience (more operators, more optimizers, more loss functions, etc.).
  3. Comprehensive support for the Huawei Ascend processor and optimization of its performance (optimizing compilation, improving the use of resources, etc.).
  4. Evolution of the software stack and optimizations of the computational graph (improving the intermediate representation IR, adding more optimization opportunities, etc.).
  5. Support for more programming languages (not just Python).
  6. Improved distributed training, with optimizations of automatic scheduling, data distribution, etc.
  7. Improvements to the MindInsight tool to make it easier for the programmer to "debug" and tune hyperparameters during the training process.
  8. Progress on inference functionality for devices at the Edge (security, support for models from other frameworks through the ONNX standard, etc.).

On the Community page, you can see that MindSpore has partners beyond Huawei and China, such as the University of Edinburgh, Imperial College London, the University of Münster (Germany) and Paris-Saclay University. They say they will follow an open governance model and invite the entire community to contribute to both the code and the documentation.

Conclusion

After a quick first look, the design and implementation decisions (parallelism and automatic differentiation, for example) seem right, and they may create opportunities for improvements and optimizations that achieve better performance than the frameworks they want to beat. But there is still a huge amount of work ahead to catch up with PyTorch and TensorFlow and, above all, to build a community around it. Not easy! Still, we all know that with the backing of a big player in the sector such as Huawei, anything is possible. Or was it evident three years ago, when the first version of PyTorch (Facebook) came out, that it would soon be hot on the heels of TensorFlow (Google)?

I hope this post has been useful; at the very least, I found it very interesting to write. See you in the next post, where I will tell you more about how I programmed my first neural network in MindSpore. An entertaining experience while we are all in this #StayAtHome situation.

