Searching for a Better Search Engine
Why do we need search engines at all? Why don't we just search the entire internet ourselves for what we are trying to find? I always thought the reason we didn't was that it would take us so much more time than it would take a computer to do it for us, and that's why we assigned this routine work to a search engine.
I was already searching legal databases using primitive search engines back in the 1980s when I was in law school. At the time, the search algorithm was not a big, proprietary secret. It was access to the database that was paid for by subscription. The search algorithm was spelled right out for us.
Here is how I understood the explanations:
- If one search term was submitted with no special symbols, all documents containing that term would be listed in the search results, either chronologically ordered or alphabetically ordered, depending on the default of that particular search engine or a default that the user selected.
- If two or more search terms were submitted in a sequence (without quotes or other symbols), such as Vacuum County or crede vaccam or Nabal Cabeza de Vaca, then the search engine would first return all those documents that had the exact words in that exact sequence with nothing between them, followed by all other documents where the search terms appeared in that order, but possibly with other intervening terms. (These would be ordered so that the ones with the fewest intervening terms appeared first on the list.) After that, all documents that had all the terms in whatever order would be returned. Last of all would be listed the documents that had at least one of the search terms.
- If two or more search terms were submitted between quotes, like this "Vacuum County" or "crede vaccam" or "Nabal Cabeza de Vaca", then only those documents that contained the exact search terms in exactly that sequence, without intervening items, would be returned.
- If one of the search terms was submitted with a plus sign in front of it, such as +vaccam, then no document would be returned that did not contain it.
- If one of the search terms was submitted with a minus sign in front of it, such as -cleaner, then no document would be returned that did contain it.
- There was no magic formula and no secret algorithm, and the whole purpose of cluing us into these rules was the better to help us find what we were looking for.
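The rules above are simple enough to sketch in a few lines of Python. This is only a rough illustration of the rules as described here, not the code of any actual legal database. The function names (`parse_query`, `rank_key`, `search`) and the sample documents are made up for the example, and ties are broken alphabetically by title, which is one of the two defaults mentioned above.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

def parse_query(query):
    """Split a query into quoted phrases, +required, -excluded and plain terms."""
    phrases, required, excluded, plain = [], [], [], []
    for phrase, word in re.findall(r'"([^"]+)"|([+-]?\w+)', query):
        if phrase:
            phrases.append(tokenize(phrase))
        elif word.startswith("+"):
            required.append(word[1:].lower())
        elif word.startswith("-"):
            excluded.append(word[1:].lower())
        else:
            plain.append(word.lower())
    return phrases, required, excluded, plain

def contains_phrase(tokens, phrase):
    """True if the words of `phrase` appear adjacent and in order."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def min_gaps_in_order(tokens, terms):
    """Fewest intervening words when `terms` appear in order, or None."""
    best = None
    for start, tok in enumerate(tokens):
        if tok != terms[0]:
            continue
        pos = start
        try:
            for term in terms[1:]:
                pos = tokens.index(term, pos + 1)
        except ValueError:
            continue  # the remaining terms do not follow this start position
        gaps = pos - start + 1 - len(terms)
        if best is None or gaps < best:
            best = gaps
    return best

def rank_key(tokens, plain):
    """Lower sorts first: exact sequence, in order with gaps, all terms, any term."""
    if not plain:
        return (0, 0)
    gaps = min_gaps_in_order(tokens, plain)
    if gaps is not None:
        return (0, 0) if gaps == 0 else (1, gaps)
    present = set(tokens)
    if all(t in present for t in plain):
        return (2, 0)
    if any(t in present for t in plain):
        return (3, 0)
    return None  # no plain term matched at all

def search(query, docs):
    """docs maps a title to its text; returns the matching titles in ranked order."""
    phrases, required, excluded, plain = parse_query(query)
    results = []
    for title, text in docs.items():
        tokens = tokenize(text)
        present = set(tokens)
        if any(t in present for t in excluded):
            continue
        if not all(t in present for t in required):
            continue
        if not all(contains_phrase(tokens, p) for p in phrases):
            continue
        # How the rules combine is not spelled out above; this sketch also
        # requires at least one plain term to match when plain terms are given.
        key = rank_key(tokens, plain)
        if key is not None:
            results.append((key, title))
    return [title for _, title in sorted(results)]

# Made-up sample documents, for illustration only.
docs = {
    "novel": "Welcome to Vacuum County, Texas",
    "shop": "North County Vacuum cleaner sales",
    "gap": "vacuum repair in Orange County",
    "one": "county fair schedule",
}
print(search("Vacuum County", docs))
print(search('"Vacuum County"', docs))
print(search("Vacuum County -cleaner", docs))
```

With these sample documents, the unquoted search lists the page with the exact sequence first, then the page with the words in order but separated, then the page with both words in the wrong order, and last the page with only one of the words. Nothing in it depends on who wrote the page or how popular it is.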
Nobody supposed at the time that the documents we were searching for would have a partisan interest in being found first, or that the people who had written the documents would bribe the search engine into tricking us to overlook competing documents. The purpose was really to find the best possible match for the exact search term entered. If you typed in Vacuum you got documents that had the term Vacuum. If you typed in vaccam you got documents that contained the word vaccam. It was inconceivable that somebody could type in Nabal and the search engine would decide all by itself and without consulting us that a document with the word naval was a better match.
Those were the days!
What Google finds when we search for Vacuum County
Vacuum County is my second novel. I finished writing it in 1993, and then I spent a couple of years trying to interest an agent or a publisher in the manuscript. And then I figured, why not just let people read it for free? After all, there's this newfangled thing called the internet, and anybody can find anything just by looking it up on Altavista. All they have to do is put in a couple of search terms, and voila! -- the closest match will be returned. And since I'm the only person who has ever written a novel called Vacuum County about a man named Nabal Cabeza de Vaca, and since the phrase crede vaccam is exceedingly rare, anybody who even just by accident juxtaposes those words will get a glimpse of my novel. Sure, I'd love to have been able to sell it, but more importantly I want it to be read. So there it sat, and pretty much nobody has read it for the past fifteen years.
For all you novelists out there, here's some information that might come in handy. The worst thing that can happen to your novel is not that someone will steal it. The worst possible thing that can happen is that nobody will ever read it.
But that's okay, right? People didn't read it, because they weren't interested. That's fair enough. Every time they looked up "Vacuum County" and found my book, they quickly left the site. Is that what happened? Or somewhere in the past ten years, when Google became the "best search engine in the world", might it not be possible that my search results didn't even come up, because the new algorithm made sure that they wouldn't?
Google lets us make little videos of search results. See what happened when I searched for Vacuum County.
Vacuum County is hard to find if you don't know what it is
Without the quotes, the sequence Vacuum County yielded a top result that didn't even have the words in that order: "North County Vacuum". This was followed by lots of other information about vacuum cleaners in which the word county appeared somewhere in the text. Wouldn't my pages that had the words in the correct order get priority in an old-fashioned search?
When the words were placed within quotes, the top search item was "Wholesale Vacuum County buy Vaccum County lots." Now what is that? It doesn't even make sense. If you click on it at aliex.press.com > Wholesale Product you will find there is no "Vacuum County" there. It's some kind of scam: whatever you happen to be looking for, they insert the words you used into the search result.
This is followed by my own hub "The Problem with Genre" and a CreateSpace blog of mine that briefly mentioned Vacuum County, and then by more listings involving vacuum cleaners. Should I be happy that my hub and my blog surfaced at all? Fine, I'm happy, but don't you think that the novel is a better match?
For the search term Nabal Cabeza de Vaca, the top returns were about Alvar Nunez Cabeza de Vaca, and the word Nabal doesn't appear anywhere in those documents. The word naval is in bold in the Google listing, leading me to believe that Google overlooked pages that had the word Nabal in them in favor of pages with the word naval. Is that what the best search engine in the world does? Even a decent librarian wouldn't do that.
But now see what happens when we try to look for the phrase crede vaccam. The top two results substitute creed for crede. Meanwhile Google tries to tempt the searcher to go look at more vacuum cleaners by asking "Did you mean crede vacuum?"
If you agree with Google and say "yeah, that's it, I believe in the vacuum, not the cow", here's what you'd see: "Joe Crede is a f****ng vacuum", followed by more information about vacuum cleaners.
What's in a typo?
Why the Google Algorithm Allows Popularity to Affect Page Rank
What was the idea behind the Google algorithm? The idea was to save time on searches by prioritizing based on popularity. This makes a certain amount of sense if you are looking for a sequence that comes up very, very often. If somebody is looking for the phrase vacuum cleaner, then because it is such a common phrase, maybe it would make sense to allow the strength of the linking to the site to play a part in deciding what search item appears first. But even here, you wouldn't reverse the order of the words. You wouldn't put a site that had the sequence cleaner vacuum above one that had vacuum cleaner, no matter how popular the cleaner vacuum site was. You'd go with the sequence the searcher gave you first.
When a search sequence is rare, then there is no competition, and no reason to look at popularity. If on the entire web there are only three documents with the sequence crede vaccam, those documents should rank first in a search for crede vaccam. If they don't, then something is wrong with the search engine. It's as simple as that.
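Here is a minimal sketch of that priority in Python: match quality decides the order first, and popularity only breaks ties among pages that match equally well. The `Page` record, the `inbound_links` count used as the popularity signal, and the sample pages are all assumptions made up for illustration. This is not how Google or any other search engine actually computes its rankings.

```python
import re
from typing import NamedTuple

class Page(NamedTuple):
    title: str
    text: str
    inbound_links: int  # hypothetical stand-in for "popularity"

def words(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

def has_sequence(tokens, terms):
    """True if `terms` appear adjacent and in the searcher's order."""
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))

def rank(query, pages):
    """Order pages by match quality first; popularity only breaks ties."""
    terms = words(query)
    scored = []
    for page in pages:
        tokens = words(page.text)
        if has_sequence(tokens, terms):
            tier = 0  # exact sequence, as the searcher typed it
        elif set(terms) <= set(tokens):
            tier = 1  # all the words, but not in that sequence
        else:
            continue  # pages missing a search term are left out entirely
        # Negative link count, so more popular pages sort first within a tier.
        scored.append((tier, -page.inbound_links, page.title))
    return [title for _, _, title in sorted(scored)]

# Made-up sample pages, for illustration only.
pages = [
    Page("popular-reversed", "best cleaner vacuum deals", 10000),
    Page("exact-small", "a vacuum cleaner review", 10),
    Page("exact-big", "vacuum cleaner store", 500),
]
print(rank("vacuum cleaner", pages))
```

In this sketch the page with ten thousand links still lands below both pages that have the words in the order the searcher typed them. For a rare sequence, only the few pages that contain it are returned at all, so popularity never comes into play.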
Competing Search Engines: Yahoo
If I look up Vacuum County on http://www.yahoo.com, my top two results today are:
Now I don't know why they chose those particular chapters, or why they went into vacuum cleaner wholesalers immediately after those two entries. This is not an unqualified endorsement of Yahoo. I think they've been bribed, too. But they're a lot more decent about it, don't you think?
If I look up crede vaccam on Yahoo today, the first thing that comes up is something about vacuum cleaners, but then the second and third entries are two chapters from Vacuum County that contain the sequence crede vaccam.
If I look up Nabal Cabeza de Vaca on Yahoo today, I get two chapters from Vacuum County, followed by these two sites:
Películas gratis de Nabal | Filmografia Nabal | Cartelera ... con la filmografía de Nabal Presentamos tráilers de cine gratis online para ... Cabeza de Vaca | 1990; Fugitivos Rebeldes | 1954; La mujer milagro | 1931; Calígula | 1979 - pejino.com/cine/nabal
Personaje bíblico | cristianismo | Nabal | Laredo Cantabria 25:14 Pero uno de los criados dio aviso a Abigail mujer de Nabal, diciendo: He ... Cabeza de Vaca | 1990; Fugitivos Rebeldes | 1954; La mujer milagro | 1931; Calígula | 1979 - pejino.com/pelicula/cristianismo/nabal
The sites in Spanish actually contain all the words in the sequence Nabal Cabeza de Vaca, though not in that order. Yahoo can help the searcher identify the Biblical character Nabal, on whom my novel is based. The fact that in Spanish Cabeza de Vaca isn't just a name, but also three independent words, allows searchers to identify the semantic relationship between the name of the famous explorer and the occupation of the Biblical character Nabal. So even though I think Yahoo violated the ordinary rules of priority in search, I can't feel very upset about it, because they contribute a better understanding of the background of my novel to anyone who might care to know what it is really about.
While we're thinking about those Spanish listings, don't you think it's interesting that all the top Google listings about Alvar Nunez Cabeza de Vaca weren't even in Spanish? How are those the best results, even if I were looking for the famous explorer? Wouldn't his own book Naufragios y Comentarios be a better, more primary source?
Competing Search Engines: DuckDuckGo
At duckduckgo.com, the first result for Vacuum County is chapter twenty of my novel. The rest of the results are vacuum cleaner sites. For crede vaccam, at duckduckgo.com, the top result is chapter eighteen of my novel, followed by documents in Latin that contain both words, though not in that order or not without intervening words. Nabal Cabeza de Vaca at DuckDuckGo yields chapter nine of my novel, followed by the site in Spanish about the Biblical character, followed by sites that contain long lists of names.
So? DuckDuckGo is less corrupt than Google, but not as generous as Yahoo.
Competing Search Engines: Bing
At Bing, Vacuum County yields chapters twenty-seven and eleven of my novel as the two top results, followed by vacuum cleaner listings. Crede vaccam on Bing gets us a vacuum cleaner listing in the top spot, followed by two of my chapters, followed by more vacuum cleaners. Nabal Cabeza de Vaca at Bing gets us chapters twenty-seven and seven of my novel, followed by the Spanish language biblical listing on Nabal, followed by the list of names, followed by Spanish texts.
I'd say Bing is not as good as Yahoo, but possibly equal to DuckDuckGo in the value of the results, though by no means identical.
Vacuum County is now available on Amazon
Is it paranoid to conclude that Google is corrupt?
There are many opinions about the recent algorithm change. Some people are angry with Google, and others think this is just a settling down period. Some even say that the bad results are getting top billing in order to find the "bad guys" and punish them.
Me? I don't think there are any bad guys among the listings. The listings are inanimate. They are just information. Information is neither black hat nor white hat. It is what it is. The readers get to decide what they want to read. It should be up to the search engines to arrange the pages according to comprehensible rules. The algorithm should not be a proprietary secret. It should be known to all -- especially the people who are searching, so that they can know what terms to input in order to get the best results for them.
Google claims its algorithm exists to help the searcher find the best results. But the best results are different depending on who you are. In fact, how heavily search results are weighted by popularity should be something that a searcher can select for himself, and each searcher should be able to use his own private algorithm, the better to help him find what he is looking for. If Google really cared about us, that is what they would let us do.
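To make that concrete, here is a rough Python sketch of what a searcher-controlled setting might look like. Everything in it is hypothetical: the `popularity_weight` knob, the use of inbound link counts as the popularity measure, and the simple linear blend of the two scores are assumptions chosen for illustration, not a description of any real search engine.

```python
import re

def words(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

def match_score(terms, tokens):
    """Fraction of the search terms found in the page, from 0.0 to 1.0."""
    present = set(tokens)
    return sum(t in present for t in terms) / len(terms)

def rank(query, pages, popularity_weight=0.0):
    """Rank pages with a popularity weight chosen by the searcher.

    pages maps a title to (text, inbound_links).
    popularity_weight runs from 0.0 (ignore popularity) to 1.0 (popularity only).
    """
    terms = words(query)
    if not terms:
        return []
    # Avoid dividing by zero when no page has any links.
    max_links = max((links for _, links in pages.values()), default=0) or 1
    scored = []
    for title, (text, links) in pages.items():
        match = match_score(terms, words(text))
        if match == 0:
            continue  # a page with none of the terms is never listed
        popularity = links / max_links
        score = (1 - popularity_weight) * match + popularity_weight * popularity
        scored.append((-score, title))  # negative, so the best score sorts first
    return [title for _, title in sorted(scored)]

# Made-up sample pages, for illustration only.
pages = {
    "novel": ("Vacuum County chapter one", 3),
    "shop": ("vacuum cleaner sale", 1000),
}
print(rank("Vacuum County", pages, popularity_weight=0.0))
print(rank("Vacuum County", pages, popularity_weight=0.9))
```

With the weight at zero, the page containing both search terms comes first. With the weight turned up to 0.9, the heavily linked vacuum cleaner page overtakes it. The point is only that the choice would belong to the searcher.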
When someone asserts that Google wouldn't dare give slanted search results, for fear of losing its market share, I have to laugh. They've been doing it for years. All the major search engines are doing it, to a greater or lesser extent. They do it because they don't get paid by the searchers. They get paid by advertisers. The algorithm is all about exactly how many vacuum cleaner sales sites get higher priority in a search for Vacuum County, sites that would never have made the list in the first place under a simple Boolean search.
How to get around this? Write your own search engine. You won't get rich doing it, because nobody will pay you. But if the results are slanted, they'll be slanted to your bias and nobody else's!
© 2011 Aya Katz