Data

PyTorch-based HyperLearn Statsmodels aims to implement a faster and leaner GPU Sklearn

3 min read

HyperLearn is a Statsmodel, a result of the collaboration of languages such as PyTorch, NoGil Numba, Numpy, Pandas, Scipy & LAPACK, and has similarities to Scikit Learn.

This project started last month by Daniel Hanchen and still has some unstable packages. He aims to make Linear Regression, Ridge, PCA, LDA/QDA faster, which then flows onto other algorithms being faster.

This Statsmodels combo incorporates novel algorithms to make it 50% more faster and enables it to use 50% lesser RAM along with a leaner GPU Sklearn.

HyperLearn also has an embedded statistical inference measures, and can be called similar to a Scikit Learn’s syntax (model.confidence_interval_)

HyperLearn’s Speed/ Memory comparison

There is a  50%+ improvement on Quadratic Discriminant Analysis (similar improvements for other models) as can be seen below:

Source: GitHub

Time(s) is Fit + Predict. RAM(mb) = max( RAM(Fit), RAM(Predict) )

Key Methodologies and Aims of the HyperLearn project

#1 Parallel For Loops

  • Hyperlearn for loops will include Memory Sharing and Memory Management
  • CUDA Parallelism will be made possible through PyTorch & Numba

#2 50%+ faster and leaner

  • Matrix operations that have been improved include  Matrix Multiplication Ordering, Element Wise Matrix Multiplication reducing complexity to O(n^2) from O(n^3), reducing Matrix Operations to Einstein Notation and Evaluating one-time Matrix Operations in succession to reduce RAM overhead.
  • Applying QR Decomposition and then SVD(Singular Value decomposition) might be faster in some cases.
  • Utilise the structure of the matrix to compute faster inverse
  • Computing SVD(X) and then getting pinv(X) is sometimes faster than pure pinv(X)

#3 Statsmodels is sometimes slow

  • Confidence, Prediction Intervals, Hypothesis Tests & Goodness of Fit tests for linear models are optimized.
  • Using Einstein Notation & Hadamard Products where possible.
  • Computing only what is necessary to compute (Diagonal of matrix only)
  • Fixing the flaws of Statsmodels on notation, speed, memory issues and storage of variables.

#4 Deep Learning Drop In Modules with PyTorch

  • Using PyTorch to create Scikit-Learn like drop in replacements.

#5 20%+ Less Code along with Cleaner Clearer Code

  • Using Decorators & Functions wherever possible.
  • Intuitive Middle Level Function names like (isTensor, isIterable).
  • Handles Parallelism easily through hyperlearn.multiprocessing

#6 Accessing Old and Exciting New Algorithms

  • Matrix Completion algorithms – Non Negative Least Squares, NNMF
  • Batch Similarity Latent Dirichelt Allocation (BS-LDA)
  • Correlation Regression and many more!

       
Daniel further went on to publish some prelim algorithm timing results on a range of algos from MKL Scipy, PyTorch, MKL Numpy, HyperLearn’s methods + Numba JIT compiled algorithms

Here are his key findings on the HyperLearn statsmodel:

  1. HyperLearn’s Pseudoinverse has no speed improvement
  2. HyperLearn’s PCA will have over 200% improvement in speed boost.
  3. HyperLearn’s Linear Solvers will be over 1 times faster i.e  it will show a 100% improvement in speed

You can find all the details of the test on reddit.com

For more insights on HyperLearn, check out the release notes on Github.

Read Next

A new geometric deep learning extension library for PyTorch releases!

NVIDIA leads the AI hardware race. But which of its GPUs should you use for deep learning?

Introduction to Sklearn

Melisha Dsouza

Share
Published by
Melisha Dsouza

Recent Posts

Top life hacks for prepping for your IT certification exam

I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…

3 years ago

Learn Transformers for Natural Language Processing with Denis Rothman

Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…

3 years ago

Learning Essential Linux Commands for Navigating the Shell Effectively

Once we learn how to deploy an Ubuntu server, how to manage users, and how…

3 years ago

Clean Coding in Python with Mariano Anaya

Key-takeaways:   Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…

3 years ago

Exploring Forms in Angular – types, benefits and differences   

While developing a web application, or setting dynamic pages and meta tags we need to deal with…

3 years ago

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Software architecture is one of the most discussed topics in the software industry today, and…

3 years ago