LSI: The New High In CyberSearch?
Latent Semantic Indexing, the new buzzword in ‘Search’, has been quietly and insidiously spreading its tentacles through the great big Google www world. Webmasters still can’t decide whether it counts towards Page Rank or not. Here’s what could be good about it from the other side: for the millions who see that Google Search Bar as their entry into a world of information.
What exactly is LSI?
It is, pure and simple, yet another information retrieval system. However, it goes beyond the plain keyword matching that SEO (Search Engine Optimisation) is built around, in that it gives you more than just the millions of pages that contain the word you typed into the Search Bar. Have you ever been totally frustrated when you wanted information on a particular subject and had to dig through hundreds of pages to find it? What LSI aims to do is make search results more relevant. Google’s spiders crawl and index the web as before, but with LSI the indexed pages are also analysed for words that are semantically similar to your keyword. So what you get are pages that are more relevant to the subject you are interested in, rather than just the ones that contain your ‘search’ word. While you may not notice much of a difference yet when you search for simple terms, you most definitely will when you are looking for something more complex.
For webmasters, however, it is not yet clear how this affects Page Rank. SEO and all the other complex factors that go into getting your website onto Page 1 are too important to discard, and it might be quite a while before LSI kicks in and becomes the yardstick. Most of the webmasters I know and work with might look for a middle ground that combines both, but they are not about to throw out all their SEO manoeuvrings just yet. They can’t afford to, not if they are playing the Page Rank game in earnest.
A magic wand it isn’t
If you think LSI will take you into some realm of artificial intelligence, banish the thought. It is, and was designed to be, a mathematical technique. However, given the way it works, one can be forgiven for thinking otherwise. As one expert puts it, it takes the whole search operation from a common accountant mentality to a new level of matrix algebra. It is a powerful algorithm that computes similarity values and ranks the results accordingly, so what you get is page indexing that goes beyond merely searching for a term: there is an analysis stage that happens before the search even begins.
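To make the ‘matrix algebra’ a little more concrete, here is a minimal Python sketch of the textbook LSI recipe: build a term-by-document count matrix, keep only its top k singular vectors (a truncated SVD), and compare the query with each document in that reduced ‘concept’ space. It is not Google’s implementation. The toy corpus, the choice of k = 2 and all function names are assumptions made for illustration, and plain power iteration stands in for a proper SVD library so that the example needs nothing beyond the standard library.

```python
import math

# Toy corpus (made up for illustration): three music documents, two finance ones.
DOCS = [
    "song lyrics music",
    "song music chart",
    "lyrics music mp3",
    "stock market shares",
    "market shares investor",
]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx and ny else 0.0


def build_matrix(docs):
    """Term-by-document count matrix A: one row per term, one column per document."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w]][j] += 1.0
    return vocab, A


def top_left_singular_vectors(A, k, iters=1000):
    """Top-k left singular vectors of A, found by power iteration on A^T A.

    A library SVD routine would normally do this; power iteration with
    deflation keeps the sketch free of third-party dependencies.
    """
    n = len(A[0])
    cols = [[term_row[j] for term_row in A] for j in range(n)]
    G = [[dot(cols[p], cols[q]) for q in range(n)] for p in range(n)]  # A^T A
    U = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-uniform start
        for _ in range(iters):
            w = [dot(g_row, v) for g_row in G]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(g_row, v) for g_row in G])  # eigenvalue = sigma squared
        if lam < 1e-12:
            break  # no further non-zero singular values
        sigma = math.sqrt(lam)
        U.append([dot(term_row, v) / sigma for term_row in A])  # u = A v / sigma
        # Deflate G so the next pass converges to the next-largest component.
        G = [[G[p][q] - lam * v[p] * v[q] for q in range(n)] for p in range(n)]
    return U


def scores(docs, term, k):
    """(keyword cosine, latent-space cosine) of a one-word query against each doc."""
    vocab, A = build_matrix(docs)
    U = top_left_singular_vectors(A, k)
    query = [1.0 if w == term else 0.0 for w in vocab]
    q_latent = [dot(u, query) for u in U]
    result = []
    for j in range(len(docs)):
        column = [term_row[j] for term_row in A]
        d_latent = [dot(u, column) for u in U]
        result.append((cosine(query, column), cosine(q_latent, d_latent)))
    return result


if __name__ == "__main__":
    for doc, (kw, lsi) in zip(DOCS, scores(DOCS, "song", k=2)):
        print(f"{doc:<24} keyword={kw:.2f}  lsi={lsi:.2f}")
```

On this toy corpus, plain keyword matching gives the third document (‘lyrics music mp3’) a score of zero because it never uses the word ‘song’. In the two-dimensional latent space it scores about as highly as the two documents that do, while the finance documents stay close to zero.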
Adding another dimension
It’s like moving from 2-D to 3-D. It retrieves documents based on similar content, and those similarities are determined by the content of all the relevant pages. So the work that sometimes took you hours, ploughing through numerous permutations and combinations of search words or phrases, is done behind the scenes of that search bar and the results are presented to you. It correlates semantically similar words across thousands or even millions of related documents and comes up with a set of content words that are likely to be relevant.
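The idea of correlating words across many documents can be illustrated with something much simpler than full LSI: represent each word by the list of documents it appears in, and treat words with similar lists as related. The sketch below assumes a made-up five-document corpus and a hypothetical `related_terms` helper. Real LSI performs this kind of comparison in the reduced space produced by the SVD, not on raw counts as done here.

```python
import math
from collections import Counter

# Toy corpus (made up for illustration).
DOCS = [
    "song lyrics music",
    "song music video",
    "song lyrics chords",
    "music mp3 download",
    "stock market report",
]


def related_terms(docs, target, top_n=3):
    """Rank other terms by how closely their spread across documents matches target's.

    Each term is represented by its vector of per-document counts; terms that
    keep turning up in the same documents as the target get a high cosine score.
    """
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*counts))

    def profile(term):
        return [c[term] for c in counts]  # Counter returns 0 for absent terms

    t = profile(target)
    t_norm = math.sqrt(sum(x * x for x in t))
    scored = []
    for term in vocab:
        if term == target:
            continue
        v = profile(term)
        v_norm = math.sqrt(sum(x * x for x in v))
        if t_norm and v_norm:
            sim = sum(a * b for a, b in zip(t, v)) / (t_norm * v_norm)
            scored.append((round(sim, 3), term))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties alphabetical
    return scored[:top_n]


print(related_terms(DOCS, "song"))
# [(0.816, 'lyrics'), (0.667, 'music'), (0.577, 'chords')]
```

‘Lyrics’ comes out on top because it shares two of the three ‘song’ documents and appears nowhere else; the finance words score zero because they never appear alongside ‘song’.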
When Google bought Applied Semantics in 2003, it was a foregone conclusion that it would only be a matter of time before the company’s CIRCA technology was put to use in information retrieval. CIRCA extracts and organises information in a way that almost mimics human thought. What it has done for cyber search is to move beyond keywords to keyword themes.
How do you get access to this new dimension of searching? It’s easy. Look at the little-used key to the left of the ‘1’ on a US-layout keyboard. That squiggly symbol on top is called a ‘tilde’, and it is the magic key that gets you there. All you need to do is put that symbol in front of your search word, like so: ~song. Try it both ways and see the difference. The first time, search without it, and what you get are pages that contain the word ‘song’. Then add the tilde in front. See the difference? This time you might find pages listed that don’t contain the word ‘song’ at all, but instead contain words such as ‘music’, ‘lyrics’ or ‘MP3’. (Look at the words in bold in the results and you’ll see which keywords are being picked up.)
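For readers curious about what a tilde could mean to a search engine, here is a toy Python sketch of query expansion: a plain word must appear exactly as typed, while `~word` also accepts a list of related words. The `RELATED` table and the `parse_query` and `search` helpers are hypothetical, and the sketch says nothing about how Google actually processed the operator; a real engine would derive the related words from its own data rather than from a hand-written table.

```python
# Hypothetical expansion table, written by hand for illustration only.
RELATED = {
    "song": {"music", "lyrics", "mp3"},
}


def parse_query(query):
    """Turn a query into groups of alternatives, one group per query word.

    A plain word must appear as typed; '~word' also accepts the word's related terms.
    """
    groups = []
    for token in query.lower().split():
        if token.startswith("~") and len(token) > 1:
            word = token[1:]
            groups.append({word} | RELATED.get(word, set()))
        else:
            groups.append({token})
    return groups


def search(docs, query):
    """Return documents matching every group (AND across words, OR within a group)."""
    groups = parse_query(query)
    return [d for d in docs if groups and all(g & set(d.lower().split()) for g in groups)]


DOCS = [
    "top ten song downloads",
    "free mp3 downloads",
    "guitar lyrics and chords",
    "stock market news",
]

print(search(DOCS, "song"))   # ['top ten song downloads']
print(search(DOCS, "~song"))  # ['top ten song downloads', 'free mp3 downloads', 'guitar lyrics and chords']
```

Treating each `~word` as a group of alternatives, rather than simply OR-ing every term together, keeps multi-word queries behaving sensibly: every word the user typed still has to be matched by something.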
Bringing back the joy of writing
What does this mean for someone who writes online? HOPE. At present, probably nothing more. However, as more and more people search in a more focused way, even if your sites are way down as far as Page Rank goes, chances are that readers doing an LSI search will find and read you. And what is most welcome is that you don’t need to stuff all those keywords into the copy. As long as the relevant words and phrases occur naturally, those invisible spiders will find them and present your pages to the person who is looking for them. So Content might escape the clutches of SEO and remain king, making it easier to search and easier and more satisfying to write. Will the webmasters welcome it? That’s something we’ll have to wait and see.