Google Search — How A Master’s Thesis Became An Idea Worth $70 Billion

Written by soundarya | Published 2018/11/21
Tech Story Tags: google | google-search | google-origin-story | google-thesis | google-founders-stanford


Many of you may know that Google Search began as a Master’s thesis that Larry Page and Sergey Brin worked on back in 1996, one that revolutionized the way people looked at search engines. What most people do not know is that their initial idea was not to rank websites but to rank annotations on websites.

“ One idea Page presented to Winograd, a collaboration with Brin, seemed more promising than the others: creating a system where people could make annotations and comments on websites. But the more Page thought about annotation, the messier it got. How would you figure out who gets to comment or whose comment would be the one you’d see first? For that, he says, “We needed a rating system.”

Having a human being determine the ratings was out of the question. First, it was inherently impractical. Further, humans were unreliable. Only algorithms — well drawn, efficiently executed, and based on sound data — could deliver unbiased results. Page realized that such data already existed and no one else was really using it. He asked Brin, “Why don’t we use the links on the web to do that?”

— In The Plex by Steven Levy

Note: All the following quotes, unless otherwise mentioned, are also from ‘In The Plex’.

And that is how Google, a name that began as a misspelling of ‘googol’, was born!

They worked on this idea for a year before naming it “BackRub”, a search engine that took advantage of the links coming into a website so that every web page was given a ‘rank’. Brin, acting as the mathematical counterpart, produced the data mining system that scanned web pages, and Page produced the algorithm that used that information to rank the pages (named PageRank; the “Page” comes from his name and not from a web “page”, a slight but much deserved vanity). The original search engine wouldn’t have worked had either one of those components been missing.
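To make the idea of ranking pages by their incoming links concrete, here is a minimal sketch of PageRank computed by repeated redistribution of rank along links. It is an illustration, not Google's production code: the function name `pagerank`, the toy link graph, and the iteration count are my own assumptions, and the damping factor of 0.85 is the value commonly cited for the original algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links.

    links maps each page to the list of pages it links out to.
    Returns a dict of page -> rank, with ranks summing to about 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" has the most incoming links, "d" has none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

In this toy graph the page with the most incoming links ends up with the highest rank, and the page nobody links to ends up with the lowest.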

And now, almost 20 years later, Google still uses PageRank as one of the 200 signals it evaluates to give a web page a score. Other signals include keyword usage, domain age, domain history, country TLD extension, TF-IDF and so on.

How Google Went From The Picture On Left To The Picture On Right

Evolution of Google from 1998 to 2018

Below is the timeline of how Google went from a search engine that covered 25 million pages and listed the top 10 links to one that indexes over 1.8 billion pages and returns over a billion results filled with text, images, video and a melange of other content.

Timeline of Google Search (I could find data only up to 2013)

However, the journey was neither easy nor quick. After I began reading ‘In the Plex’ by Steven Levy, I understood just how massive a problem Google tackled, and how it made the lives of over 2 billion people easier. Right now, if you type ‘Houston Baker’ into Google, you get links about an American scholar who goes by that name (go ahead, try it). About 18 years ago, that was not the case: the nascent algorithm did not understand the difference between names and keywords. Below I outline just 3 of the more than 1,000 problems that Google overcame.

1. Index Faster, Damn It!

How Crawling and Indexing Works

Let’s assume, god forbid, that a massive earthquake struck Chennai, India today. Your first instinct is to go to Google to learn what happened and how many casualties there were. But what if, instead of displaying the news, Google showed the best places to visit? What a sacrilege.

“ But as the web kept growing, Google added more machines — by the end of 1999, there were eighty machines involved in the crawl (out of a total of almost three thousand Google computers at that time) — and the likelihood that something would break increased dramatically. Especially since Google made a point of buying what its engineers referred to as “el cheapo” equipment”.

By 2000, the web was growing like a weed. Billions of new pages were added every year to the corpus. But what good is that if the engine cannot crawl them? Understanding the severity, the engineers built a system that implemented “checkpointing,” a way for the index to hold its place if a calamity befell a server or hard disk. But the new system went further — it used a different way to handle a cluster of disks, more akin to the parallel-processing style of computing than the “sharding” technique Google had used.

Sanjay Ghemawat, with his team, rebuilt the entire file system.
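The checkpointing idea described above can be sketched in a few lines: record how far a job has progressed, so that a restart after a failure resumes from that point instead of starting over. This is only a single-machine illustration under my own assumptions; the function name `process_with_checkpoint`, the JSON checkpoint file, the example URLs, and the `fail_at` switch used to simulate a crash are all hypothetical and say nothing about how Google's system was actually built.

```python
import json
import os
import tempfile


def process_with_checkpoint(urls, checkpoint_path, handle, fail_at=None):
    """Process urls in order, saving progress after each one.

    If checkpoint_path already exists, work resumes from the saved position.
    fail_at simulates a machine failure just before that index is processed.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(urls)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated machine failure")
        handle(urls[i])
        # Write to a temporary file and rename it, so a crash in the middle
        # of a write cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
        os.replace(tmp_path, checkpoint_path)


urls = [f"https://example.com/page{i}" for i in range(5)]
crawled = []
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "crawl.ckpt")
    try:
        process_with_checkpoint(urls, path, crawled.append, fail_at=3)
    except RuntimeError:
        pass  # the "machine" died after three pages
    # The second run picks up at page 3 instead of starting from page 0.
    process_with_checkpoint(urls, path, crawled.append)
```

One simplification to keep in mind: if a crash happened after a page was handled but before the checkpoint was written, that page would be handled again on restart, so the handler should tolerate repeats.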

2. Audrey Fino — Why Can’t I See My Name?

We’ve all been there. We love typing our names into search engines with the curiosity of a kid opening a Christmas gift. But how can an algorithm that is based on backlinks and keywords differentiate between a name and anything else?

Haven’t we all been there?

“ One unsuccessful search became a legend: Sometime in 2001, Amit Singhal learned of poor results when people typed the name “audrey fino” into the search box. Google kept returning Italian sites praising Audrey Hepburn. (Fino means fine in Italian.) “We realized that this is actually a person’s name,” Singhal says. “But we didn’t have the smarts in the system.”

Even back then, over 8% of all searches were for names. So how would you devise new signals to more skillfully identify names from queries and dig them out of the web corpus? Singhal and his colleagues began where they almost always did: with data. To improve search, Google licensed the White Pages, allowing it to use all the information contained in hundreds of thick newsprint-based tomes where the content consisted of nothing but names (and addresses and phone numbers).

Google’s search engine sucked up the names and analyzed them until it had an understanding of what a name was and how to recognize it in the system.
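As a rough illustration of how a list of known names can help a system recognize a name in a query, here is a minimal sketch. It assumes a tiny made-up directory and a simple rule (a two-word query whose first word is a known first name and whose second word is a known last name). The function names and the rule are my own; the real system analyzed far more data in ways the book does not spell out.

```python
def build_name_index(directory_entries):
    """Collect known first and last names from 'First Last' style entries."""
    first_names, last_names = set(), set()
    for entry in directory_entries:
        parts = entry.lower().split()
        if len(parts) >= 2:
            first_names.add(parts[0])
            last_names.add(parts[-1])
    return first_names, last_names


def looks_like_person_name(query, first_names, last_names):
    """Return True if a two-word query matches a known first and last name."""
    tokens = query.lower().split()
    return (
        len(tokens) == 2
        and tokens[0] in first_names
        and tokens[1] in last_names
    )


# Hypothetical directory sample, standing in for the licensed White Pages.
directory = ["Audrey Fino", "Amit Singhal", "Houston Baker"]
first_names, last_names = build_name_index(directory)

is_name = looks_like_person_name("audrey fino", first_names, last_names)
is_not_name = looks_like_person_name("hot dog", first_names, last_names)
```

A query flagged this way could then be treated as one unit, rather than as the word ‘audrey’ plus the Italian word ‘fino’.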

3. Hot dogs — Boiling Puppies or Sausage Sandwich?

I couldn’t resist!

Does a hot dog signify a boiling puppy or a sausage sandwich? That’s a stupid question. Even a (non-psychopathic) 5-year-old would know the meaning. Google was smart enough to understand that ‘hot’ meant ‘boiling’ and ‘dog’ meant ‘puppy’. But it didn’t do well with context and combinations.

“ Google’s synonym system understood that a dog was similar to a puppy and that boiling water was hot. But it also concluded that a hot dog was the same as a boiling puppy. The problem was fixed in late 2002 by a breakthrough based on philosopher Ludwig Wittgenstein’s theories about how words are defined by context.”

This meant that, by crawling and indexing billions of pages, Google could learn which words tended to appear near each other. ‘Hot dog’ often appeared with ‘bread’, ‘sausage’ and ‘baseball’. After millions of trials and clicks, it finally understood what to do with such queries.
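The idea of learning meaning from neighboring words can be sketched by counting which words show up near a phrase. The snippet below is a toy under my own assumptions: the function name `context_counts`, the window of four words on each side, and the three sample sentences are invented for illustration, and a real system would also filter out filler words such as ‘the’ and ‘a’.

```python
from collections import Counter


def context_counts(documents, phrase, window=4):
    """Count the words that appear within `window` words of `phrase`."""
    target = phrase.lower().split()
    size = len(target)
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        for i in range(len(tokens) - size + 1):
            if tokens[i:i + size] == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + size:i + size + window]
                counts.update(left + right)
    return counts


# Hypothetical mini corpus.
docs = [
    "grilled hot dog with sausage and bread",
    "a hot dog at the baseball game",
    "the hot dog bun is bread around a sausage",
]
neighbors = context_counts(docs, "hot dog")
```

Here ‘bread’, ‘sausage’ and ‘baseball’ all show up as neighbors of ‘hot dog’, while ‘puppy’ never does, which is the kind of evidence that separates the snack from the boiling pet.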

To make the engine smarter, Singhal realized he had to master the art of ‘bigram breakage’. A bigram (or digram) is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. He had to teach the system that ‘New York’ is one entity, and that ‘New York Times’ is a different entity in its own right.
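Here is a small sketch of both ideas: listing the bigrams in a query, and deciding where to break a query so that ‘New York Times’ stays whole instead of being split into ‘New York’ plus ‘Times’. The greedy longest-match rule and the hard-coded entity set are my own simplifications for illustration, not the method Singhal's team used.

```python
def bigrams(tokens):
    """Return every pair of adjacent tokens."""
    return list(zip(tokens, tokens[1:]))


def segment(query, entities, max_len=3):
    """Split a query into known entities, preferring the longest match.

    Any word that is not part of a known entity is kept as a single token.
    """
    tokens = query.lower().split()
    segments = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shrink down to one word.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + size])
            if size == 1 or candidate in entities:
                segments.append(candidate)
                i += size
                break
    return segments


# Hypothetical entity list; a real one would be learned from data.
ENTITIES = {"new york", "new york times"}

pairs = bigrams(["new", "york", "times"])
newspaper = segment("new york times subscription", ENTITIES)
city = segment("hotels in new york", ENTITIES)
```

With this rule, the first query keeps ‘new york times’ together as one segment, while the second query keeps ‘new york’ together and leaves ‘hotels’ and ‘in’ as single words.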

How Google Search Will Look In Future — My Idea

Google has seen a tremendous shift in the way it indexes websites, ranks them and displays information. Here are my three ideas for the search engine:

1. One Stop Destination

My two cents is that Google Search has shifted from being a search engine to becoming a one-stop destination for finding answers. Here are some stats on the percentage of people who click on the various links in the search results.

Stats on % of people who visit different pages on Google

It is said that 67% of people click on the top 5 links, and 95% click on the top 10 links. That leaves a mere 5% chance for the links on all the remaining results pages to get clicked. This suggests that people want immediate gratification, and they can get that from the main Search Engine Results Page (SERP) on Google.

Knowing this, I envision a future where people don’t need to click on any link and can instead expand each result and perform actions on the main page.

Simple prototype of how it could look, with an ‘expand’ option present

Simple (and shabby) prototype of what happens after clicking on ‘expand’

2. Contextual Excellence

People have begun searching for complex queries, and they expect Google to provide them the answers. With its resources and talent surrounding Natural Language Processing, I envision a future where Google can answer even queries such as the following.

Current results from Google Search

How it could look in the future

3. Just Think, Don’t Type

Imagine a world where you do not need to type anymore. What a relief. All you have to do is think, and the device in front of you understands that. Sounds too far-fetched? I think not. Researchers have been working on this technology for more than 5 years now, and there are some very convincing prototypes like this and this to show that they are making progress.

How AlterEgo works (like magic)

I envision a future where Google commercializes these devices, turning them into something non-intrusive (and not ridiculous-looking) so that people can buy them and control their Google Search, or even their phone, with them.

*******************************************************************

Google might have begun as a search engine that displays links. However, 20 years have seen it grow tremendously and expand its reach into other fields such as autonomous cars, disease prevention, cyber-security, smart home devices and more. And I only see it growing further, and making our lives easier than we could ever imagine.

******************************************************************



Written by soundarya | A 22 year old Product Manager, Bookworm, and Writer.
Published by HackerNoon on 2018/11/21