In a study published in Nature Machine Intelligence, researchers discovered that in some cases of machine learning it cannot be proved whether the system actually ‘learned’ something or solved the problem. They explore machine learning learnability.

## Axioms leading to axioms in arithmetic models

We already know that machine learning systems, and AI systems in general are black boxes. You feed the system some data, you get some output or a trained system that performs some tasks but you don’t know how the system arrived at a particular solution. Now we have a published study from Ben-Davis et al that shows learnability in machine learning is undecidable.

In the 1930s, Austrian logician Kurt Gödel showed that a set of axioms forming an arithmetic model lead to more axioms. In the following decades it was demonstrated that the continuum hypothesis can neither be proved nor refuted using standard mathematical axioms. The hypothesis states that no set of objects is larger in size than integers or smaller in size than real numbers.

## What does this have to do with machine learning?

In machine learning, algorithms are designed to improve performance of certain actions with the data they are trained on. Some problems like facial recognition or recommendation engines cannot be created with regular linear programming. These are problems that can be solved today by machine learning. Machine learning learnability can be defined. A system can be considered learnable if the machine learning model can perform as the best predictor in a family of functions. This needs to be achieved under some reasonable constraints.

Typically learnability in a model can be explained by analysing dimensions. But this new research shows that this is not always the case. A learning model introduced in the paper is the focus of the research: estimating the maximum (EMX) which is similar to PAC learning. The authors of the paper discover a family of functions whose learnability in EMX cannot be proved with standard mathematics.

## What is the EMX problem?

As described in the paper, the EMX problem is: “Let X be some domain set, and let F be a family of functions from X to {0, 1} (we often think of each function f∈F as a subset of X and vice versa). Given a sample S of elements drawn i.i.d. from some unknown distribution P over X, the EMX problem is about finding a function f ∈ F that approximately maximizes the expectation EP(f) with respect to P.”

In the paper, the authors present an example problem—displaying specific ads to the most frequent visitors of a website. The catch is, which visitors will visit the website is unknown. Now the EMX problem is formed as a question—what is a learner’s ability to find a function whose expected value is the largest. They show a relation between machine learning and data compression. If training data labelled by a function can be compressed, then the family from which the function originates has low complexity. Such a function is considered learnable.

## Monotone compression

Algorithms can be used to compress data. A new one called monotone compression is introduced. They show that this compression is suitable to describe the learnability of function families in the EMX problem. A weak monotone compression is associated with the cardinality of particular infinite sets. The authors use the interval [0, 1] which contains real numbers.

The results show that the finite subsets in the interval [0, 1] have monotone compression and are therefore considered learnable in EMX. But, this applied only if the continuum hypothesis is true which stands to be unprovable to date.

## The problem is how you define learnability

In the concluding points, the paper points out an interesting perspective as to why current machine learning models get off easy without any questions about learnability. Or do they? The problem lies in how learnability is defined—as functions or as algorithms?. Current standard definitions focus on the theoretical aspect without considering computational implications. This approach in viewing learnability levies a high cost when more general types of learning is involved.

You can read the research paper by Shai Ben-David and others about learnability being undecidable at the Nature journal website. 