Google has launched Dataset Search, a search engine for finding datasets on the internet. It will be a companion of sorts to Google Scholar, the company’s popular search engine for academic studies and reports. Google Dataset Search lets users search datasets across thousands of repositories on the Web, whether they are hosted on a publisher’s site, in a digital library, or on an author’s personal web page.

Google’s Dataset Search scrapes government databases, public sources, digital libraries, and personal websites to track down the datasets. It also supports multiple languages and will add support for even more soon. The initial release of Dataset Search will cover the environmental and social sciences, government data, and datasets from news organizations like ProPublica. It may soon expand to include more sources.

Google has published guidelines that help dataset providers describe their data so that Google can better understand the content of their pages. Datasets published with schema.org markup, or with equivalent vocabularies described by the W3C, will be picked up by the search engine. Google also mentioned that Dataset Search will improve as long as data publishers are willing to provide good metadata. If publishers use the open standards to describe their data, more users will find the data that they are looking for.
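To make the idea of describing a dataset with schema.org markup more concrete, here is a minimal Python sketch that builds a `Dataset` description as JSON-LD and wraps it in the script element commonly used to embed such metadata in a web page. The helper names (`dataset_jsonld`, `to_script_tag`) and every value in the example (the dataset name, URLs, and license) are hypothetical and chosen only for illustration. The set of properties shown is an assumption about what a reasonable description might contain, not Google's required list.

```python
import json


def dataset_jsonld(name, description, url, keywords, license_url, download_url):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
        "license": license_url,
        # One downloadable file; assumed here to be a CSV.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "text/csv",
                "contentUrl": download_url,
            }
        ],
    }


def to_script_tag(metadata):
    """Wrap the metadata in a script element for embedding in an HTML page."""
    body = json.dumps(metadata, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical example values, not a real dataset.
example = dataset_jsonld(
    name="Daily coastal water temperatures",
    description="Daily mean water temperature readings from example buoys.",
    url="https://example.org/datasets/coastal-temps",
    keywords=["ocean", "temperature", "buoy"],
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    download_url="https://example.org/datasets/coastal-temps.csv",
)

if __name__ == "__main__":
    print(to_script_tag(example))
```

A publisher would place the printed script element in the HTML of the page that describes the dataset, so that a crawler reading the page can find the metadata alongside the human-readable description.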

Natasha Noy, a research scientist at Google AI who helped create Dataset Search, says that “the aim is to unify the tens of thousands of different repositories for datasets online. We want to make that data discoverable, but keep it where it is.”

Ed Kearns, Chief Data Officer at NOAA, is a strong supporter of the project and helped NOAA make many of its datasets searchable in the tool. “This type of search has long been the dream for many researchers in the open data and science communities,” he said.

Try out Google’s new Dataset Search here.

Read Next

25 Datasets for Deep Learning in IoT.

Datasets and deep learning methodologies to extend image-based applications to videos.

Google-Landmarks, a novel dataset for instance-level image recognition.

