Python is one of the best programming languages for machine learning, quickly coming to rival R’s dominance in academia and research. But why is Python so popular in the machine learning world? Why is Python good for AI?
As part of his book Python Interviews, Mike Driscoll spoke to five Python experts and machine learning community figures about why the language is so popular.
Programming is a social activity – Python’s community has acknowledged this best
Glyph Lefkowitz (@glyph), founder of Twisted, a Python network programming framework, and recipient of the PSF’s Community Service Award in 2017
AI is a bit of a catch-all term that tends to mean whatever the most advanced areas in current computer science research are.
There was a time when the basic graph-traversal stuff that we take for granted was considered AI. At that time, Lisp was the big AI language, just because it was higher-level than average and easier for researchers to do quick prototypes with. I think Python has largely replaced it in the general sense because, in addition to being similarly high-level, it has an excellent third-party library ecosystem, and a great integration story for operating system facilities.
Lispers will object, so I should make it clear that I’m not making a precise statement about Python’s position in a hierarchy of expressiveness, just saying that both Python and Lisp are in the same class of language, with things like garbage collection, memory safety, modules, namespaces and high-level data structures.
In the more specific sense of machine learning, which is what most people mean when they say AI these days, I think there are more specific answers. The existence of NumPy and its accompanying ecosystem allows for a very research-friendly mix of high-level stuff with very high-performance number-crunching. Machine learning is nothing if not very intense number-crunching.
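To make the point about mixing high-level code with high-performance number-crunching concrete, here is a minimal sketch comparing a pure-Python loop with the equivalent NumPy expression for mean squared error. The function names and synthetic data are illustrative assumptions, not part of the interview; the vectorized version is usually much faster because the loop runs in compiled code, though exact timings vary by machine.

```python
import numpy as np

def mse_loop(y_true, y_pred):
    """Mean squared error computed with a plain Python loop."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        total += (t - p) ** 2
    return total / len(y_true)

def mse_numpy(y_true, y_pred):
    """The same formula, vectorized: NumPy runs the loop in compiled code."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic data: predictions are the targets plus a little Gaussian noise.
rng = np.random.default_rng(0)
y_true = rng.normal(size=100_000)
y_pred = y_true + rng.normal(scale=0.1, size=100_000)

print(mse_loop(y_true.tolist(), y_pred.tolist()))
print(mse_numpy(y_true, y_pred))
```

Both calls compute the same quantity; they may differ only in the last few digits because of floating-point summation order.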
“…Statisticians, astronomers, biologists, and business analysts have become Python programmers and have improved the tooling.”
Machine learning is a particularly integration-heavy discipline: any AI or machine learning system is going to need to ingest large amounts of data from real-world sources, either as training data or as system input. Python’s broad library ecosystem means that it is often well positioned to access and transform that data.
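A minimal sketch of the "access and transform" step, using only the standard library. The CSV contents, column names, and the choice of min-max scaling are hypothetical stand-ins for whatever real-world source and preprocessing a given system needs.

```python
import csv
import io

# Hypothetical raw data standing in for a file, database export, or API response.
RAW_CSV = """sensor_id,temperature_c,humidity_pct
a1,21.5,40
a2,,38
a3,25.0,55
"""

def load_rows(text):
    """Parse CSV text into dicts, skipping rows that have any empty field."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        if all((value or "").strip() for value in row.values()):
            yield row

def to_features(rows, columns):
    """Convert the selected columns to floats: one feature vector per row."""
    return [[float(row[c]) for c in columns] for row in rows]

def min_max_scale(matrix):
    """Rescale each column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*matrix))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]

features = to_features(load_rows(RAW_CSV), ["temperature_c", "humidity_pct"])
print(features)                 # [[21.5, 40.0], [25.0, 55.0]]
print(min_max_scale(features))  # [[0.0, 0.0], [1.0, 1.0]]
```

The row with a missing temperature is dropped here; a real pipeline might impute it instead, which is exactly the kind of choice a library like pandas makes easier.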
Python allows users to focus on real problems
Python is very easy to understand for scientists, who are often not trained in computer science. It removes many of the complexities you would otherwise have to deal with when driving the external libraries you need to perform research.
After Numeric (now NumPy) got things started, the addition of IPython Notebooks (now Jupyter Notebooks), matplotlib, and many other tools that make things even more intuitive has allowed scientists to think mainly about solutions to problems and not so much about the technology needed to drive those solutions.
As in other areas, Python is an ideal integration language, which binds technologies together with ease. Python allows users to focus on the real problems, rather than spending time on implementation details. Apart from making things easier for the user, Python also shines as an ideal glue platform for the people who develop the low-level integrations with external libraries. This is mainly because Python is very accessible through a clean and very complete C API.
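Writing a full extension module against the C API is beyond a short example, but the same glue idea can be sketched with ctypes, a standard-library module for calling functions in shared C libraries directly from Python. This sketch assumes a POSIX system (Linux or macOS) where the C standard library can be located; strlen is chosen only because it is available everywhere, not because it is typical of machine learning code.

```python
import ctypes
import ctypes.util

# Locate the C standard library. On Linux this usually resolves to "libc.so.6".
# If the lookup returns None, CDLL(None) falls back to the symbols already
# loaded into the Python process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly:
#   size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"machine learning"))  # 16
```

Libraries such as NumPy take the heavier route of compiled extension modules, which is what lets them hand large arrays to C code without per-call conversion overhead.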
Python is really easy to use for math and stats-oriented people
I think there are two main reasons, which are very related. The first reason is that Python is super easy to read and learn.
I would argue that most people working in machine learning and AI want to focus on trying out their ideas in the most convenient way possible. The focus is on research and applications, and programming is just a tool to get you there. The more comfortable a programming language is to learn, the lower the entry barrier is for more math and stats-oriented people.
Python is also super readable, which helps with keeping up to date with the status quo in machine learning and AI, for example, when reading through code implementations of algorithms and ideas. Trying new ideas in AI and machine learning often requires implementing relatively sophisticated algorithms, and the more transparent the language, the easier it is to debug.
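As an illustration of how closely readable Python can track the math it implements, here is a minimal sketch of batch gradient descent for linear regression on mean squared error. The learning rate, epoch count, and synthetic noise-free data are arbitrary assumptions chosen so the example converges.

```python
import numpy as np

def fit_linear_gd(X, y, lr=0.1, epochs=500):
    """Fit y ≈ X @ w + b by batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        error = X @ w + b - y                   # residuals, shape (n_samples,)
        grad_w = 2.0 / n_samples * X.T @ error  # d(MSE)/dw
        grad_b = 2.0 / n_samples * error.sum()  # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, float(b)

# Synthetic, noise-free data generated from known weights and intercept.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 0.5
w, b = fit_linear_gd(X, y)
print(np.round(w, 3), round(b, 3))
```

Each line of the loop corresponds to one step of the derivation, which is what makes a bug such as a wrong gradient sign easy to spot.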
The second main reason is that, while Python is a very accessible language itself, we have a lot of great libraries on top of it that make our work easier. Nobody wants to spend their time reimplementing basic algorithms from scratch (except in the context of studying machine learning and AI). The large number of existing Python libraries helps us focus on more exciting things than reinventing the wheel.
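In that spirit, the sketch below fits a linear model with an intercept by handing the work to NumPy's least-squares solver, `np.linalg.lstsq`, instead of writing an optimizer by hand. The synthetic data, true weights, and noise level are assumptions for illustration.

```python
import numpy as np

# Synthetic data: known weights plus a small amount of Gaussian noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -0.7, 2.0])
y = X @ true_w + rng.normal(scale=0.01, size=100)

# Append a column of ones so the intercept is estimated alongside the weights.
X_design = np.column_stack([X, np.ones(len(X))])

# One call to a tested, LAPACK-backed solver replaces a hand-written optimizer.
coef, residuals, rank, singular_values = np.linalg.lstsq(X_design, y, rcond=None)
w, b = coef[:-1], float(coef[-1])
print(w.round(2), round(b, 2))
```

Passing `rcond=None` selects NumPy's current default cutoff for small singular values and avoids a warning in older versions.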
Python is also an excellent wrapper language for working with more efficient C/C++ implementations of algorithms and CUDA/cuDNN, which is why existing machine learning and deep learning libraries run efficiently in Python. This is also super important for working in the fields of machine learning and AI.
To summarize, I would say that Python is a great language that lets researchers and practitioners focus on machine learning and AI and provides less of a distraction than other languages.
Python has so many features that are attractive for scientific computing
The most important and immediate reason is that the NumPy and SciPy libraries enable projects such as scikit-learn, which is currently almost a de facto standard tool for machine learning.
The reason why NumPy, SciPy, scikit-learn, and so many other libraries were created in the first place is because Python has some features that make it very attractive for scientific computing. Python has a simple and consistent syntax which makes programming more accessible to people who are not software engineers.
Another reason is operator overloading, which enables code that is readable and concise. Then there’s Python’s buffer protocol (PEP 3118), which is a standard for external libraries to interoperate efficiently with Python when processing array-like data structures. Finally, Python benefits from a rich ecosystem of libraries for scientific computing, which attracts more scientists and creates a virtuous cycle.
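A minimal sketch of the two language features mentioned above. `Vec` is a hypothetical class whose overloaded operators let `p + 2 * v` read like the formula it computes; the `memoryview` over an `array.array` uses the buffer protocol to share the array's memory without copying. NumPy arrays implement the same protocol, which is one of the ways array-processing libraries exchange data efficiently.

```python
from array import array

class Vec:
    """A tiny 2-D vector; operator overloading lets math read like math."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        return Vec(self.x * scalar, self.y * scalar)

    __rmul__ = __mul__  # supports 2 * v as well as v * 2

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vec({self.x}, {self.y})"

# Reads like the formula p + t * v instead of nested function calls.
p, v = Vec(1.0, 2.0), Vec(0.5, -1.0)
print(p + 2 * v)  # Vec(2.0, 0.0)

# Buffer protocol: memoryview exposes the array's underlying memory directly.
data = array("d", [1.0, 2.0, 3.0])
view = memoryview(data)
view[0] = 10.0                               # writes straight into data's buffer
print(data[0], view.format, view.itemsize)  # 10.0 d 8
```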
Python is good for AI because it is strict and consistent
What we’re doing in that field is developing our math and algorithms. We’re putting the algorithms that we definitely want to keep and optimize into libraries such as scikit-learn. Then we’re continuing to iterate and share notes on how we organize and think about the data.
A high-level scripting language is ideal for AI and machine learning, because we can quickly move things around and try again. The code that we create spends most of its lines on representing the actual math and data structures, not on boilerplate.
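To show what code that spends its lines on math rather than boilerplate can look like, here is a minimal sketch of the softmax function and the cross-entropy loss in NumPy; each function body is close to a direct transcription of its formula. The example logits and labels are made up.

```python
import numpy as np

def softmax(z):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j), computed along the last axis."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class labels."""
    rows = np.arange(len(labels))
    return float(-np.log(probs[rows, labels]).mean())

logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]])
probs = softmax(logits)
print(probs.sum(axis=1))  # each row sums to 1
print(cross_entropy(probs, np.array([0, 1])))
```

Subtracting the row maximum does not change the result mathematically, but it keeps `np.exp` from overflowing on large logits.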
A scripting language like Python is even better, because it is strict and consistent. Everyone can understand each other’s Python code much better than they could in some other language that has confusing and inconsistent programming paradigms.
The availability of tools like IPython notebook has made it possible to iterate and share our math and algorithms on a whole new level. Python emphasizes the core of the work that we’re trying to do and completely minimizes everything else about how we give the computer instructions, which is how it should be. Automate whatever you don’t need to be thinking about.