3 min read

Yesterday, the team at PyTorch announced the availability of PyTorch Hub which is a simple API and workflow that offers the basic building blocks to improve machine learning research reproducibility.

Reproducibility plays an important role in research as it is an essential requirement for a lot of fields related to research including the ones based on machine learning techniques. But most of the machine learning based research publications are either not reproducible or are too difficult to reproduce.

With the increase in the number of research publications, tens of thousands of papers being hosted on arXiv and submissions to conferences, research reproducibility has now become even more important. Though most of the publications are accompanied by code and trained models that are useful but still it is difficult for users to figure out for most of the steps, themselves.

PyTorch Hub consists of a pre-trained model repository that is designed to facilitate research reproducibility and also to enable new research. It provides built-in support for Colab, integration with Papers With Code and also contains a set of models including classification and segmentation, transformers, generative, etc. By adding a simple hubconf.py file, it supports the publication of pre-trained models to a GitHub repository, which provides a list of models that are to be supported and a list of dependencies that are required for running the models.

For example, one can check out the torchvision, huggingface-bert and gan-model-zoo repositories. Considering the case of torchvision hubconf.py: In torchvision repository, each of the model files can function and can be executed independently. These model files don’t require any package except for PyTorch and they don’t need separate entry-points.

A hubconf.py can help users to send a pull request based on the template mentioned on the GitHub page.

The official blog post reads, “Our goal is to curate high-quality, easily-reproducible, maximally-beneficial models for research reproducibility. Hence, we may work with you to refine your pull request and in some cases reject some low-quality models to be published. Once we accept your pull request, your model will soon appear on Pytorch hub webpage for all users to explore.”

PyTorch Hub allows users to explore available models, load a model as well as understand the kind of methods available for any given model. Below mentioned are few of the examples:

Explore available entrypoints:

With the help of torch.hub.list() API, users can now list all available entrypoints in a repo.  PyTorch Hub also allows auxillary entrypoints apart from pretrained models such as bertTokenizer for preprocessing in the BERT models and making the user workflow smoother.

Load a model:

With the help of torch.hub.load() API, users can load a model entrypoint. This API can also provide useful information about instantiating the model.

Most of the users are happy about this news as they think it will be useful for them. A user commented on HackerNews, “I love that the tooling for ML experimentation is becoming more mature. Keeping track of hyperparameters, training/validation/test experiment test set manifests, code state, etc is both extremely crucial and extremely necessary.”

Another user commented, “This will also make things easier for people writing algorithms on top of one of the base models.”

To know more about this news, check out PyTorch’s blog post.

Read Next

Sherin Thomas explains how to build a pipeline in PyTorch for deep learning workflows

F8 PyTorch announcements: PyTorch 1.1 releases with new AI tools, open sourcing BoTorch and Ax, and more

Horovod: an open-source distributed training framework by Uber for TensorFlow, Keras, PyTorch, and MXNet