The way search engines find things is by sending out little automated programs, called "bots," that simulate someone browsing the web.
Every time a bot encounters a link, it follows it to see where it leads. In that way it finds new pages. (And that's why posting on a big site like Hubpages ensures new content will be found quickly; to a bot, Hubpages is a dense web of links.) Navigating links to find new pages is called CRAWLING, because another early web term for these bots was "spiders," which "crawled" the web.
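To make the idea concrete, here's a toy crawler in Python. This is a minimal sketch of the follow-every-link idea, not how any real search engine does it (real crawlers also obey robots.txt, rate-limit their requests, and run at enormous scale), and the start URL is just a placeholder.

```python
# A toy crawler: fetch a page, collect its links, and repeat.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=10):
    seen, queue = set(), deque([start_url])
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue  # dead or unreadable link; move on
        parser = LinkCollector()
        parser.feed(html)
        for link in parser.links:
            queue.append(urljoin(url, link))  # resolve relative links
    return seen

# crawl("https://example.com")  # placeholder URL
```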
When a bot encounters a new page, it normally sits down and examines the whole page, collecting all the text and data it can read and sending that back to the search engine's data center. At that point the page has been INDEXED. The search engine then checks all the pages it has indexed, and performs complex calculations to decide which pages are the most relevant for each search query.
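Here's an equally toy sketch of indexing: an "inverted index" that maps each word to the pages containing it, with a crude one-point-per-matching-word ranking. Real engines store far more per page and rank with vastly more elaborate signals; the sample pages here are made up for illustration.

```python
# A toy inverted index: map each word to the pages containing it.
from collections import defaultdict

def build_index(pages):
    """pages: {url: page_text} -> {word: set of urls}"""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Score each page by how many query words it contains."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url in index.get(word, ()):
            scores[url] += 1  # one point per matching query word
    return sorted(scores, key=scores.get, reverse=True)

pages = {
    "site/a": "how search engines crawl and index the web",
    "site/b": "recipes for apple pie",
}
index = build_index(pages)
print(search(index, "search the web"))  # -> ['site/a']
```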
As you can imagine, all this is only possible because computers are now incredibly fast and have a lot of storage.
If you don't want a search engine indexing your website, you can embed a hidden "noindex" tag to tell it not to index particular pages, or even the whole site. Since Google upranks/downranks Hubpages as a whole depending on the quality of the content its bots discover on the site, Hubpages has implemented the "Quality Assessment Process": newly published hubs are initially set to "noindex," and that tag is removed once human quality assessors have had a chance to check a hub and make sure it's not too spotty in terms of content and writing quality.
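For the curious, that "noindex" code is usually a meta tag in the page's HTML head, `<meta name="robots" content="noindex">` (it can also be sent as an X-Robots-Tag HTTP header). Here's a small sketch of how an obedient bot might check for it; the class name is made up for illustration.

```python
# Checking a page for the standard robots "noindex" directive.
# The tag itself looks like: <meta name="robots" content="noindex">
from html.parser import HTMLParser

class NoindexChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False
    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if (d.get("name") or "").lower() == "robots" and \
               "noindex" in (d.get("content") or "").lower():
                self.noindex = True

checker = NoindexChecker()
checker.feed('<html><head><meta name="robots" content="noindex"></head></html>')
print(checker.noindex)  # True -> an obedient bot skips indexing this page
```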