GitHub Octoverse: top machine learning packages, languages, and projects of 2018

The top tools and languages used in machine learning in 2018 were revealed in GitHub's The State of the Octoverse: Machine Learning report. Overall, TensorFlow was one of the projects with the most contributions, which is not surprising considering its age and popularity. Python was the third most popular language on GitHub, after JavaScript and Java.

The insights come from contribution data for the whole of 2018. A contribution is pushing code, opening a pull request, opening an issue, commenting, or any other related activity. The data covers all public repositories and any private repositories that have opted in to the dependency graph.

Top languages used for machine learning on GitHub

Languages are ranked by the primary language of each repository tagged with machine-learning. Python is at the top, followed by C++. JavaScript and Java both make the top five. What’s interesting is the growth of Julia, a relatively new language, which has taken the sixth spot. R, popular for data analytics, also shows up thanks to its wide range of libraries.

  1. Python
  2. C++
  3. JavaScript
  4. Java
  5. C#
  6. Julia
  7. Shell
  8. R
  9. TypeScript
  10. Scala
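
The ranking method described above (count each tagged repository once, under its primary language) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `repos` sample, its repository names, and the `rank_languages` helper are hypothetical and are not GitHub's data or code.

```python
from collections import Counter

# Hypothetical sample: (repository, primary language, topics).
# Real input would come from GitHub's repository metadata.
repos = [
    ("org/a", "Python", {"machine-learning"}),
    ("org/b", "C++", {"machine-learning", "vision"}),
    ("org/c", "Python", {"machine-learning"}),
    ("org/d", "JavaScript", {"web"}),  # not tagged, so it is ignored
]


def rank_languages(repos, topic="machine-learning"):
    """Rank primary languages by how many tagged repositories use them."""
    counts = Counter(lang for _, lang, topics in repos if topic in topics)
    # most_common() sorts by count, highest first.
    return [lang for lang, _ in counts.most_common()]


ranking = rank_languages(repos)
print(ranking)
```

Because only the primary language is counted, a repository that mixes Python and C++ contributes to one language alone, which may partly explain why glue languages such as Shell still appear in the list.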

Top machine learning and data science packages

Projects tagged with data science or machine learning that import Python packages were considered. NumPy, which is used for mathematical operations, is used in 74% of the projects. This is not surprising, as it is a supporting package for scikit-learn, among others. SciPy, pandas, and matplotlib are each used in 40% or more of the projects. scikit-learn, a collection of many algorithms, is used in 38% of the projects. TensorFlow is used in 24% of the projects; although it is popular, its use cases are narrower.

  1. numpy (74%)
  2. scipy (47%)
  3. pandas (41%)
  4. matplotlib (40%)
  5. scikit-learn (38%)
  6. six (31%)
  7. tensorflow (24%)
  8. requests (23%)
  9. python-dateutil (22%)
  10. pytz (21%)
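
A percentage like "numpy (74%)" is the share of projects that import the package at least once. The sketch below shows one way such a share could be computed from source files, using Python's standard `ast` module. It is an assumption about the method, not GitHub's actual pipeline (the report draws on the dependency graph), and the `projects` sample and both helper functions are hypothetical.

```python
import ast
from collections import Counter


def imported_packages(source):
    """Return the top-level package names imported by a Python source string."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x".
            packages.add(node.module.split(".")[0])
    return packages


def usage_share(projects):
    """Map each package to the percentage of projects that import it."""
    counts = Counter()
    for sources in projects.values():
        seen = set()
        for source in sources:
            seen |= imported_packages(source)
        counts.update(seen)  # each project counts a package at most once
    return {pkg: 100 * n / len(projects) for pkg, n in counts.items()}


# Hypothetical sample: project name -> list of source file contents.
projects = {
    "proj-a": ["import numpy as np\nfrom scipy import stats\n"],
    "proj-b": ["import numpy\nimport pandas as pd\n"],
    "proj-c": ["import requests\n"],
    "proj-d": ["import numpy.linalg\n", "from . import helpers\n"],
}
shares = usage_share(projects)
print(shares["numpy"])
```

Note that import names do not always match package names: scikit-learn is imported as `sklearn` and python-dateutil as `dateutil`, so a real analysis would need a mapping between the two.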

Machine learning projects with most contributions

TensorFlow had the most contributions, followed by scikit-learn. Julia again seems to be garnering interest, ranking fourth in this list.

  1. tensorflow/tensorflow
  2. scikit-learn/scikit-learn
  3. explosion/spaCy
  4. JuliaLang/julia
  5. CMU-Perceptual-Computing-Lab/openpose
  6. tensorflow/serving
  7. thtrieu/darkflow
  8. ageitgey/face_recognition
  9. RasaHQ/rasa_nlu
  10. tesseract-ocr/tesseract


Prasad Ramesh

Data science enthusiast. Cycling, music, food, movies. Likes FPS and strategy games.
