News

PyTorch announces the availability of PyTorch Hub for improving machine learning research reproducibility


Yesterday, the team at PyTorch announced the availability of PyTorch Hub, a simple API and workflow that offers the basic building blocks for improving machine learning research reproducibility.

Reproducibility is an essential requirement in many research fields, including those based on machine learning techniques. Yet most machine learning research publications are either not reproducible or are difficult to reproduce.

With the growing volume of research output, including the tens of thousands of papers hosted on arXiv and submitted to conferences, research reproducibility has become even more important. Although many publications are accompanied by useful code and trained models, users are still often left to figure out many of the steps themselves.

PyTorch Hub consists of a pre-trained model repository designed to facilitate research reproducibility and to enable new research. It provides built-in support for Colab, integration with Papers With Code, and a set of models covering classification, segmentation, transformers, generative models, and more. By adding a simple hubconf.py file, owners can publish pre-trained models from a GitHub repository; the file enumerates the models to be supported and the dependencies required to run them.

For example, one can check out the torchvision, huggingface-bert and gan-model-zoo repositories. Consider the case of the torchvision hubconf.py: in the torchvision repository, each of the model files can function and be executed independently. These model files require no package other than PyTorch, and they don't need separate entry-points.
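As a rough illustration, a minimal hubconf.py might look like the sketch below. The entrypoint name and the imported function are assumptions chosen for illustration, not taken from the announcement, and the actual torchvision file may differ.

# hubconf.py -- a minimal, hypothetical example of publishing a model to PyTorch Hub

# Packages (besides PyTorch itself) that torch.hub should check for before loading
dependencies = ['torch']

from torchvision.models.resnet import resnet18 as _resnet18


def resnet18(pretrained=False, **kwargs):
    """ResNet-18 entrypoint.

    pretrained (bool): if True, load weights pre-trained on ImageNet
    """
    # Each entrypoint is simply a callable that returns an instantiated model
    return _resnet18(pretrained=pretrained, **kwargs)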

With a hubconf.py in place, users can send a pull request based on the template provided on the GitHub page.

The official blog post reads, “Our goal is to curate high-quality, easily-reproducible, maximally-beneficial models for research reproducibility. Hence, we may work with you to refine your pull request and in some cases reject some low-quality models to be published. Once we accept your pull request, your model will soon appear on Pytorch hub webpage for all users to explore.”

PyTorch Hub allows users to explore available models, load a model, and understand the kinds of methods available for any given model. Below are a few examples:

Explore available entrypoints:

With the torch.hub.list() API, users can now list all available entrypoints in a repo. Apart from pretrained models, PyTorch Hub also allows auxiliary entrypoints, such as bertTokenizer for preprocessing in the BERT models, which makes the user workflow smoother.
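A minimal sketch of how this might be used follows; the repository name comes from the torchvision example above, and the printed output is illustrative rather than exact.

import torch

# List the entrypoints exposed by a repository's hubconf.py
entrypoints = torch.hub.list('pytorch/vision')
print(entrypoints)  # e.g. ['alexnet', 'resnet18', 'resnet50', ...]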

Load a model:

With the torch.hub.load() API, users can load a model entrypoint. The API can also surface useful information about how to instantiate the model.
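For instance, loading a pretrained model from the repository used above might look like this sketch; the entrypoint name and keyword argument are assumptions based on common PyTorch Hub usage rather than on the announcement itself.

import torch

# Load a pretrained ResNet-18 from the pytorch/vision repository
model = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
model.eval()  # switch to inference mode before evaluating inputs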

Most users are happy about this news, as they expect it to be useful. A user commented on Hacker News, “I love that the tooling for ML experimentation is becoming more mature. Keeping track of hyperparameters, training/validation/test experiment test set manifests, code state, etc is both extremely crucial and extremely necessary.”

Another user commented, “This will also make things easier for people writing algorithms on top of one of the base models.”

To learn more, check out PyTorch’s blog post.

Read Next

Sherin Thomas explains how to build pipeline in PyTorch for deep learning workflows

F8 PyTorch announcements: PyTorch 1.1 releases with new AI tools, open sourcing BoTorch and Ax, and more

Horovod: an open-source distributed training framework by Uber for TensorFlow, Keras, PyTorch, and MXNet


Amrata Joshi
