Contribution data covering all of 2018 yielded several insights. A contribution is pushing code, opening a pull request, opening an issue, commenting, or any other related activity. The data covers all public repositories, plus any private repositories that have opted in to the dependency graph.
Top languages used for machine learning on GitHub
Top machine learning and data science packages
Projects tagged with data science or machine learning that import Python packages were considered. NumPy, which is used for mathematical operations, appears in 74% of the projects. This is not surprising, as it is a supporting package for scikit-learn, among others. SciPy, pandas, and matplotlib are each used in 40% or more of the projects. scikit-learn, a collection of many algorithms, is used in 38% of the projects. TensorFlow is used in 24% of the projects; although it is popular, its use cases are narrow.
- numpy (74%)
- scipy (47%)
- pandas (41%)
- matplotlib (40%)
- scikit-learn (38%)
- six (31%)
- tensorflow (24%)
- requests (23%)
- python-dateutil (22%)
- pytz (21%)
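A share like the ones listed above can be read as the fraction of considered projects that import a given package at least once. The sketch below illustrates that calculation on toy data. The function name `package_usage`, the project names, and the import sets are all hypothetical, and this is only an assumed reading of the methodology, not the procedure GitHub actually used.

```python
from collections import Counter


def package_usage(projects):
    """Return {package: percent of projects importing it}, highest share first.

    `projects` maps a project name to an iterable of imported package names.
    """
    total = len(projects)
    if total == 0:
        return {}
    counts = Counter()
    for imports in projects.values():
        # Count each package at most once per project.
        counts.update(set(imports))
    # Sort by descending count, then alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {pkg: round(100 * n / total) for pkg, n in ranked}


# Hypothetical toy data, not taken from the GitHub report.
sample_projects = {
    "project-a": {"numpy", "scipy", "pandas"},
    "project-b": {"numpy", "tensorflow"},
    "project-c": {"numpy", "scikit-learn", "scipy"},
    "project-d": {"requests"},
}

usage = package_usage(sample_projects)
for pkg, pct in usage.items():
    print(f"- {pkg} ({pct}%)")
```

Because a project can import several packages, the shares are independent of one another and do not need to sum to 100%, which is also true of the list above.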
Machine learning projects with most contributions
TensorFlow had the most contributions, followed by scikit-learn. Julia again seems to be garnering interest, ranking fourth in this list.