Home Web Development Can an Open Web Index break Google’s stranglehold over the search engine...

Can an Open Web Index break Google’s stranglehold over the search engine market?

April 22, 2019 - 1:26 pm

2079

3 min read

Earlier this month, Dirk Lewandowski, Professor of Information Research & Information Retrieval at Hamburg University of Applied Sciences, Germany, published a proposal for building an index of the Web. His proposal aims to separate the infrastructure part of search engine from the services part.

Search engines are our way to the web, which makes them an integral part of the Web’s infrastructure. While there are a significant number of search engines present in the market, there are only a few relevant search engines that have their own index, for example, Google, Bing, Yandex and Baidu. Other search engines that pull results from these search engines, for instance, Yahoo, cannot really be considered search engines in the true sense.

The US search engine market is split between Google and Bing with roughly two thirds to one-third, respectively, In most European countries, Google covers the 90% of the market share. Highlighting the implications of Google’s dominance in the current search engine market, the report reads, “As this situation has been stable over at least the last few years, there have been discussions about how much power Google has over what users get to see from the Web, as well as about anti-competitive business practices, most notably in the context of the European Commission’s competitive investigation into the search giant.”

The proposal aims to bring plurality in the search engine market, not only in terms of the numbers of search engine providers but also in the number of search results users get to see when using search engines.

The idea is to implement the “missing part of the Web’s infrastructure” called searchable index. The author proposes to separate the infrastructure part of the search engine from services part. This will allow multitude of services, whether existing as search engines or otherwise to be run on a shared infrastructure.

The following figure shows how the public infrastructure crawls the web for indexing its content and provides an interface to the services that are built on top of the index. The indexing stage is split into basic indexing and advanced indexing. Basic indexing is responsible for providing the data in a form that services built on top of the index can easily and rapidly process the data. Though services are allowed to do their further indexing to prepare the documents, the open infrastructure also provides some advanced indexing. This provides additional information to the indexed documents, for example, semantic annotations.

This advanced indexing requires an extensive infrastructure for data mining and processing. Services will be able to decide for themselves to what extent they want to rely on the pre-processing infrastructure provided by the Open Web Index. A common design principle can be adopted is allowing services a maximum of flexibility.

Credits: arXiv

Many users are supporting this idea. One Redditor said, “I have been wanting this for years…If you look at the original Yahoo Page when Yahoo first started out it attempted to solve this problem.I believe this index could be regionally or language based.”

Some others do believe that implementing an open web index will come with its own challenges. “One of the challenges of creating a “web index” is first creating indexes of each website. “Crawling” to discover every page of a website, as well as all links to external sites, is labour-intensive and relatively inefficient. Part of that is because there is no 100% reliable way to know, before we begin accessing a website, each and every URL for each and every page of the site. There are inconsistent efforts such “site index” pages or the “sitemap” protocol (introduced by Google), but we cannot rely on all websites to create a comprehensive list of pages and to share it,” adds another Redditor.

To read more in detail, check out the paper titled: The Web is missing an essential part of infrastructure: an Open Web Index.

Top 6 Cybersecurity Books from Packt to Accelerate Your Career

Your Quick Introduction to Extended Events in Analysis Services from Blog…

Logging the history of my past SQL Saturday presentations from Blog…

Storage savings with Table Compression from Blog Posts – SQLServerCentral

Daily Coping 31 Dec 2020 from Blog Posts – SQLServerCentral

Learning Essential Linux Commands for Navigating the Shell Effectively

Exploring the Strategy Behavioral Design Pattern in Node.js

How to integrate a Medium editor in Angular 8

Implementing memory management with Golang’s garbage collector

How to create sales analysis app in Qlik Sense using DAR…

Can an Open Web Index break Google’s stranglehold over the search engine market?

Read Next

Must Read in Web Dev

Learning Essential Linux Commands for Navigating the Shell Effectively

Exploring Forms in Angular – types, benefits and differences

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9...

Interviews

Learn Transformers for Natural Language Processing with Denis Rothman

Clean Coding in Python with Mariano Anaya

Bringing AI to the B2B world: Catching up with Sidetrade CTO Mark Sheldon [Interview]

On Adobe InDesign 2020, graphic designing industry direction and more: Iman Ahmed, an Adobe Certified Partner and Instructor [Interview]

Is DevOps experiencing an identity crisis? [Interview]

MobilePro

datapro

Programming

Subscribe to our newsletter