HyperLearn is a statistical modelling library in the vein of Statsmodels, built on libraries such as PyTorch, NoGil Numba, Numpy, Pandas, Scipy & LAPACK, and has similarities to Scikit Learn.
The project was started last month by Daniel Hanchen and still has some unstable packages. He aims to make Linear Regression, Ridge, PCA and LDA/QDA faster, which in turn makes other algorithms faster as well.
This combination of Statsmodels and a leaner GPU Sklearn incorporates novel algorithms to run 50% faster and use 50% less RAM.
HyperLearn also has embedded statistical inference measures, which can be called with syntax similar to Scikit Learn’s (model.confidence_interval_).
HyperLearn’s Speed/Memory comparison
There is a 50%+ improvement on Quadratic Discriminant Analysis, with similar improvements for other models.
In the comparison, Time(s) is Fit + Predict and RAM(mb) = max( RAM(Fit), RAM(Predict) ).
Key Methodologies and Aims of the HyperLearn project
#1 Parallel For Loops
- HyperLearn’s for loops will include Memory Sharing and Memory Management
- CUDA Parallelism will be made possible through PyTorch & Numba
#2 50%+ faster and leaner
- Improved matrix operations include Matrix Multiplication Ordering, Element Wise Matrix Multiplication (reducing complexity from O(n^3) to O(n^2)), reducing Matrix Operations to Einstein Notation, and evaluating one-time Matrix Operations in succession to reduce RAM overhead.
- Applying QR Decomposition and then SVD (Singular Value Decomposition) might be faster in some cases.
- Utilising the structure of the matrix to compute the inverse faster
- Computing SVD(X) and then getting pinv(X) is sometimes faster than pure pinv(X)
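Three of the ideas in this list can be illustrated with plain NumPy. The sketch below is not HyperLearn's code: the function names `pinv_via_svd` and `svd_via_qr`, the matrix sizes and the `rcond` cutoff are assumptions chosen for illustration, and whether each variant is actually faster depends on the matrix shape and the BLAS/LAPACK build.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Multiplication ordering: both lines compute A B v, but the first
#    builds an n x n product while the second only builds vectors.
n = 300
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
v = rng.standard_normal(n)
slow = (A @ B) @ v   # matrix-matrix product first: O(n^3) work
fast = A @ (B @ v)   # two matrix-vector products: O(n^2) work


# 2) Pseudoinverse from an explicit SVD: pinv(X) = V diag(1/s) U^T.
def pinv_via_svd(X, rcond=1e-12):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_inv = np.zeros_like(s)
    keep = s > rcond * s.max()      # drop numerically zero singular values
    s_inv[keep] = 1.0 / s[keep]
    return (Vt.T * s_inv) @ U.T     # scale the columns of V, then apply U^T


# 3) QR first, then SVD of the small triangular factor (tall matrices).
def svd_via_qr(X):
    Q, R = np.linalg.qr(X)          # reduced QR: Q is n x p, R is p x p
    Ur, s, Vt = np.linalg.svd(R)    # SVD of the p x p factor only
    return Q @ Ur, s, Vt            # X = (Q Ur) diag(s) Vt


X = rng.standard_normal((200, 5))
X_pinv = pinv_via_svd(X)
U, s, Vt = svd_via_qr(X)
```

In each case the result matches the direct computation up to floating-point rounding; only the order and size of the intermediate operations change.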
#3 Statsmodels is sometimes slow
- Confidence, Prediction Intervals, Hypothesis Tests & Goodness of Fit tests for linear models are optimized.
- Using Einstein Notation & Hadamard Products where possible.
- Computing only what is necessary (for example, only the diagonal of a matrix)
- Fixing the flaws of Statsmodels on notation, speed, memory issues and storage of variables.
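As an illustration of "diagonal only" combined with Einstein notation and Hadamard products, the sketch below computes the leverage values of a linear model, i.e. the diagonal of the hat matrix X (X^T X)^-1 X^T, which enter confidence and prediction intervals. It uses only NumPy; the variable names and data sizes are assumptions, and this is not taken from HyperLearn's source.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 4
X = rng.standard_normal((n, p))

XtX_inv = np.linalg.inv(X.T @ X)   # p x p, cheap when p is small

# Naive: build the full n x n hat matrix, then throw away everything
# except its diagonal. Needs O(n^2) memory.
leverage_full = np.diag(X @ XtX_inv @ X.T)

# Diagonal only, in Einstein notation: h_i = sum_jk X_ij M_jk X_ik.
leverage_einsum = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# The same quantity as a Hadamard (element-wise) product plus a row sum.
leverage_hadamard = ((X @ XtX_inv) * X).sum(axis=1)
```

The last two versions never allocate the n x n matrix, so memory grows with n * p rather than n^2.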
#4 Deep Learning Drop In Modules with PyTorch
- Using PyTorch to create Scikit-Learn like drop in replacements.
#5 20%+ Less Code along with Cleaner Clearer Code
- Using Decorators & Functions wherever possible.
- Intuitive Middle Level Function names such as isTensor and isIterable.
- Handling Parallelism easily through hyperlearn.multiprocessing
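To give a feel for the decorator and helper-function style described above, here is a minimal sketch using only the standard library. The name `isIterable` appears in the project's description, but the body shown here, along with the `as_list` decorator and the `total` function, is hypothetical and may differ from what HyperLearn actually does.

```python
from collections.abc import Iterable
from functools import wraps


def isIterable(x):
    """Hypothetical helper: True for lists, tuples, etc., but not strings."""
    return isinstance(x, Iterable) and not isinstance(x, (str, bytes))


def as_list(func):
    """Hypothetical decorator: wrap a lone scalar argument in a list."""
    @wraps(func)
    def wrapper(x, *args, **kwargs):
        if not isIterable(x):
            x = [x]
        return func(x, *args, **kwargs)
    return wrapper


@as_list
def total(values):
    # The body can assume it always receives an iterable.
    return sum(values)
```

With the input check moved into a decorator, each function body stays short, which is one way such a code base could cut repeated boilerplate.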
#6 Accessing Old and Exciting New Algorithms
- Matrix Completion algorithms – Non Negative Least Squares, NNMF
- Batch Similarity Latent Dirichlet Allocation (BS-LDA)
- Correlation Regression and many more!
Daniel went on to publish some preliminary timing results comparing a range of algorithms from MKL Scipy, PyTorch and MKL Numpy with HyperLearn’s own methods and Numba JIT compiled algorithms.
Here are his key findings on HyperLearn:
- HyperLearn’s Pseudoinverse has no speed improvement
- HyperLearn’s PCA will be over 200% faster.
- HyperLearn’s Linear Solvers will show over a 100% improvement in speed, i.e. they will run more than twice as fast.
You can find all the details of the test on reddit.com.
For more insights on HyperLearn, check out the release notes on GitHub.