Why I re-implemented numpy in golang?

source link: http://praveenpenumaka.github.io/posts/numpygo

June 7, 2020

As part of my day job, I am working on a service that makes predictions using a machine learning classification algorithm (XGBoost) trained on historical data. More details here. We used a golang service for orchestration and experimentation, and a python service for serving XGBoost. To be honest, when we started out, I had practical experience with neither XGBoost nor golang.

Around the same time, while browsing /r/MachineLearning, I stumbled on a project by Erik called ML from scratch, which inspired me to start implementing a few of the ML algorithms to understand their internal workings. I decided to write the code in golang instead of python, both as a learning opportunity and as an excuse to write a lot of golang code. I started the project “ML-From-Scratch-In-Go”, which is a golang clone of Erik’s project. I quickly realized how much array manipulation is needed, and that most of it is handled by numpy in the python implementation.

NumPy is an elegant piece of code. NumPy’s website describes it as providing a powerful N-dimensional array object. For performance, NumPy’s core is implemented in C rather than python, and it has a lot of functionality, a lot. Many concepts are abstracted away from the end user, such as the internal representation, broadcasting, and dimension reduction. I found that there were libraries that tried to replicate numpy in golang (numgo and others), either by binding C/C++ code to golang or by re-implementing it. My main concern was: how much of numpy is covered by these semi-managed libraries? How much do I need? I did not know. The last thing I wanted was to get stuck midway because a particular library did not support some weird edge case.
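To make those abstractions a bit more concrete, here is a minimal sketch in Go of the three ideas mentioned above: an internal representation (a flat buffer plus shape and strides), NumPy's broadcasting rule for shapes, and a simple dimension reduction. The names `NDArray`, `NewNDArray`, `At`, `BroadcastShapes` and `SumAxis0` are hypothetical and chosen only for illustration. This is not the numpygo API, and it assumes row-major layout and `float64` data only.

```go
package main

import "fmt"

// NDArray is a hypothetical minimal N-dimensional array: one flat buffer
// plus the shape and the strides needed to interpret it.
type NDArray struct {
	Data    []float64
	Shape   []int
	Strides []int // elements (not bytes) to skip per step along each axis
}

// NewNDArray wraps data in row-major (C-order) layout.
func NewNDArray(data []float64, shape ...int) (*NDArray, error) {
	size := 1
	for _, d := range shape {
		if d < 0 {
			return nil, fmt.Errorf("negative dimension %d", d)
		}
		size *= d
	}
	if size != len(data) {
		return nil, fmt.Errorf("shape %v needs %d elements, got %d", shape, size, len(data))
	}
	// The last axis is contiguous; each earlier axis skips over all later ones.
	strides := make([]int, len(shape))
	step := 1
	for i := len(shape) - 1; i >= 0; i-- {
		strides[i] = step
		step *= shape[i]
	}
	return &NDArray{Data: data, Shape: shape, Strides: strides}, nil
}

// At maps a multi-dimensional index to an offset in the flat buffer.
// It panics if the number of indices does not match the number of axes.
func (a *NDArray) At(idx ...int) float64 {
	if len(idx) != len(a.Shape) {
		panic("index rank does not match array rank")
	}
	offset := 0
	for axis, i := range idx {
		offset += i * a.Strides[axis]
	}
	return a.Data[offset]
}

// BroadcastShapes applies NumPy's broadcasting rule: shapes are aligned from
// the right, and two dimensions are compatible if they are equal or one is 1.
func BroadcastShapes(a, b []int) ([]int, error) {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	out := make([]int, n)
	for i := 1; i <= n; i++ {
		da, db := 1, 1 // missing leading dimensions behave like 1
		if i <= len(a) {
			da = a[len(a)-i]
		}
		if i <= len(b) {
			db = b[len(b)-i]
		}
		switch {
		case da == db || db == 1:
			out[n-i] = da
		case da == 1:
			out[n-i] = db
		default:
			return nil, fmt.Errorf("shapes %v and %v are not broadcastable", a, b)
		}
	}
	return out, nil
}

// SumAxis0 reduces a 2-D array along axis 0, giving one sum per column.
// It assumes the receiver is 2-D; a general version would take an axis argument.
func (a *NDArray) SumAxis0() []float64 {
	out := make([]float64, a.Shape[1])
	for i := 0; i < a.Shape[0]; i++ {
		for j := 0; j < a.Shape[1]; j++ {
			out[j] += a.At(i, j)
		}
	}
	return out
}
```

With this sketch, a 2x3 array built from the values 1 to 6 gets strides `[3, 1]`, and broadcasting shape `[2, 3]` against `[3]` yields `[2, 3]`, while `[2, 3]` against `[2]` is rejected. Even this toy version hints at how many cases a real library has to cover: views, non-contiguous strides, other dtypes, and reductions over arbitrary axes.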

I started a library project to separate out array manipulation in golang; you know, separation of concerns and all. That was the start of the fun. Here are the things I had to go through:

  1. 80% of my time was spent on numpygo rather than on the actual ML algorithm implementation.
  2. I had to understand a lot of the internal abstractions used in numpy.
  3. I rewrote the code at least twice to get the right architecture in place. Even to this day, it still doesn’t feel right.
  4. It has only the basic test cases required for my implementation.
  5. It is certainly not optimized for performance.
  6. Given my golang knowledge, it is probably not the most idiomatic way of writing the code either.

It took 4 months, alongside my full-time job, to finish the basic required functionality of numpy along with the xgboost implementation. I ended up writing what I assume is about 20% of numpy. Here is the code on github.

Do I regret not picking some other library? Hell no. I had a lot of fun writing the code. I learned things about both numpy and golang that no other project could have taught me. I still feel I need to learn more about both.

I am planning to extend the “ML from scratch” project to implement neural networks, which should add more test cases and so make the numpygo library more general. If someone wants to improve the project, feel free to open a PR on github.

More on XGBoost in later posts.
