The top tools and languages used in machine learning in 2018 were revealed in GitHub's The State of the Octoverse: Machine Learning report. TensorFlow was among the projects with the most contributions, which is not surprising given its age and popularity. Python ranked third among the most popular languages on GitHub overall, behind JavaScript and Java.

The insights are drawn from contribution data covering all of 2018. Contributions include pushing code, opening pull requests or issues, commenting, and other related activities. The data covers all public repositories, plus any private repositories that opted in to the dependency graph.

Top languages used for machine learning on GitHub

Languages are ranked by the primary language of repositories tagged with machine-learning. Python is at the top, followed by C++. JavaScript and Java also make the top five. What's interesting is the growth of Julia, which has taken sixth place despite being a relatively new language. R, popular for data analytics, also appears on the list thanks to its wide range of libraries.

  1. Python
  2. C++
  3. JavaScript
  4. Java
  5. C#
  6. Julia
  7. Shell
  8. R
  9. TypeScript
  10. Scala

Top machine learning and data science packages

This ranking considers Python packages imported by projects tagged with data science or machine learning. NumPy, which is used for mathematical operations, appears in 74% of these projects. This is not surprising, as it is a supporting package for scikit-learn, among others. SciPy, pandas, and matplotlib are each used in 40% or more of the projects. scikit-learn, a collection of many algorithms, is used in 38% of the projects. TensorFlow is used in 24% of the projects: although it is popular, its use cases are narrower.

  1. numpy (74%)
  2. scipy (47%)
  3. pandas (41%)
  4. matplotlib (40%)
  5. scikit-learn (38%)
  6. six (31%)
  7. tensorflow (24%)
  8. requests (23%)
  9. python-dateutil (22%)
  10. pytz (21%)

Machine learning projects with most contributions

TensorFlow had the most contributions, followed by scikit-learn. Julia again shows growing interest, ranking fourth on this list.

  1. tensorflow/tensorflow
  2. scikit-learn/scikit-learn
  3. explosion/spaCy
  4. JuliaLang/julia
  5. CMU-Perceptual-Computing-Lab/openpose
  6. tensorflow/serving
  7. thtrieu/darkflow
  8. ageitgey/face_recognition
  9. RasaHQ/rasa_nlu
  10. tesseract-ocr/tesseract