The first step in learning SEO is to understand how search engines work. Search engines return results in a process that includes crawling, indexing and relevancy ranking.
My journey in learning SEO hath begun. After conducting extensive online research, I’ve quickly discovered that there are thousands upon thousands of articles, resources and discussions revolving around one thing: better optimizing your website for search engines.
I quickly got lost in a whirlwind of information and ideas, so I decided to boil everything down to its essence. I would first learn how search engines work. Makes sense, right? How can you help your online business’s website climb the rankings if you don’t know the basics of how search engines operate?
This is what I’ve learned. If you don’t remember anything from this post, make sure you at least retain the following information.
Search engines work by performing these three tasks:
- Crawling – search engines deploy software (spiders) to scan through the content and links of your website
- Indexing – the spiders then transfer the content of your website to a giant database
- Relevancy ranking – when a search is performed, search engines tap this database to determine which website delivers the most relevant information
My first thought was “well, relevancy ranking is obviously the most important step to good SEO,” but it quickly becomes clear that this step can’t take place before the others are completed. Try thinking of a search engine as an actual spider. First, the spider crawls around looking for a place to cast a web. Once it finds a location of interest, it begins to spin that web. And when the web is built, it has a big net in which it can catch food and live.
Same idea for your website – of course you want to be included in this web of content for others to see, but if your location hasn’t been found (crawled) and included in the net (indexed), you’re not going to be brain food for anyone (ranked). For the first time, you want to be caught in a spider web and eaten.
Now we’ll start looking more in-depth into each step. Note that the SEO vocabulary words used along the way are defined in the glossary at the bottom of the post.
Step One: Crawling
If you don’t like the idea of spiders crawling around your website, think of a butterfly fluttering through your content. Either way, the main takeaway is that a crawler is a crafty piece of software. Its main job is to run around the web downloading content and following the links within that content. As it goes, this software sends its findings to a database for future reference.
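To make the "download content, follow links" idea concrete, here is a minimal sketch of a crawler. It is a toy under loud assumptions: the three-page site in `PAGES` is made up, it lives in memory so nothing touches the network, and the names `LinkExtractor` and `crawl` are mine, not anything a real search engine exposes.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical three-page site, held in memory so the sketch needs no network.
PAGES = {
    "/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/">Home</a>',
    "/blog": '<a href="/about">About</a> <a href="/missing">Old post</a>',
}


def crawl(start, fetch):
    """Breadth-first crawl: download a page, store it, queue its links."""
    store = {}               # stands in for the search engine's database
    queue = deque([start])
    seen = {start}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:     # dead link: nothing to download or follow
            continue
        store[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


crawled = crawl("/", PAGES.get)
print(sorted(crawled))
```

Notice that `/missing` never makes it into the store: the crawler followed the link, found nothing there, and moved on.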
So let’s pretend you have a shiny new website that you’ve just deployed. Typically a spider will come to you by finding a link on another site that leads to your website. But if you’re a new website without any external links, your best bet is to submit a sitemap to the search engines. This will alert crawlers that something is new and they’ll come to you. If you don’t submit a sitemap, your site will eventually be found, but it will take longer and crawlers may not find all of your pages.
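For the curious, a sitemap is commonly an XML file that follows the sitemaps.org protocol: a `urlset` element containing one `url` entry, with a `loc` address, per page. Here is a rough sketch of generating one. The `build_sitemap` helper and the example.com addresses are placeholders of mine, and a real sitemap can carry optional extras that I've left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML as a string, with one <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder addresses for a hypothetical new site.
sitemap_xml = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/products",
])
print(sitemap_xml)
```

You would save the result as a file on your site and then point the search engines at it.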
Other good ways to increase the impact of crawling include:
- Have good link structure. Place your most important pages very close to the home page in terms of navigation. In other words, don’t bury your key content deep into your website – spiders may not go that far.
- Add new content (a lot). You can do this through a blog or updating your pages, but the more you update your site with new (and quality) content, the more spiders will come back.
- Start link building. Again, spiders like to extend their nets via other links, so if you have links coming in from other quality sites, you can increase your crawl rate.
- Clean up your URLs. Spiders don’t like URLs with a bunch of weird symbols in them. Make sure you utilize search engine friendly URLs to prevent crawler frustration.
- Check for broken links. Make sure all of your links are working, especially internal links. If a crawler is going through your site and hits a dead end, consider Spidey squashed and unable to crawl any further.
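A few of the tips above can be checked with a quick script. This is only a sketch: the `SITE` link graph is invented, and my rule for what counts as a "weird" URL (anything beyond letters, digits, slashes, hyphens, underscores and dots) is an assumption, not an official standard.

```python
import re
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
SITE = {
    "/": ["/products", "/blog"],
    "/products": ["/products/leashes"],
    "/blog": ["/blog/post?id=7&ref=%7Bx%7D", "/old-page"],
    "/products/leashes": [],
    "/blog/post?id=7&ref=%7Bx%7D": [],
}


def click_depths(site, home="/"):
    """Breadth-first search: clicks needed to reach each page from home."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in site.get(page, []):
            if target in site and target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def broken_links(site):
    """Internal links that point at a page the site does not have."""
    return sorted((page, target)
                  for page, links in site.items()
                  for target in links
                  if target not in site)


def messy_urls(site):
    """URLs with characters outside letters, digits, '/', '-', '_' and '.'."""
    return sorted(url for url in site if re.search(r"[^A-Za-z0-9/_.\-]", url))


depths = click_depths(SITE)
dead = broken_links(SITE)
messy = messy_urls(SITE)
print(depths)
print(dead)
print(messy)
```

Pages with a high click depth are the ones buried deep in your navigation, the dead links are where Spidey gets squashed, and the messy URLs are candidates for a cleanup.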
Step Two: Indexing
Think of indexing like you would a big research journal. You know, those paper things we used in school before Wikipedia? Most of these books have a topical index at the very end. So instead of looking through every single page, you can find a topic of interest, and then find what pages are most pertinent to your needs.
The same is true with indexing, which is the process by which search engines store content downloaded by spiders/crawlers. And just like the index of a book, the point of search engine indexing is to make your searches faster. So instead of scanning a billion webpages each time you search, the system only has to run through an already-created database.
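The book-index comparison maps nicely onto a data structure usually called an inverted index: for every word, keep a list of the pages that contain it. Below is a bare-bones sketch. The pages in `DOCS` are made up, and real search engines store far more than this (word positions, for one), so treat it as an illustration only.

```python
import re
from collections import defaultdict

# Hypothetical crawled content: URL -> page text.
DOCS = {
    "/leashes": "Strong dog leashes for large dogs",
    "/bowls": "Ceramic dog bowls and cat bowls",
    "/blog": "How search engines work",
}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


index = build_index(DOCS)
print(sorted(index["dog"]))
print(sorted(index["search"]))
```

Looking up "dog" is now a single dictionary lookup instead of a read through every page, which is the whole point of building the index ahead of time.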
Here are some other interesting facts about indexing:
- Each time a crawler finds new information on your site, it will update the index with the new findings.
- Simple words like “the,” “is” and “of” (often called stop words) have traditionally been ignored or given very little weight. This explains why they’re not too helpful when you’re conducting a search of your own.
- Just because your website has been crawled doesn’t necessarily mean it’s been indexed. The indexing will come, but there may be a delay. The problem with this delay is that until content is indexed, it can’t be accessed by users of search engines.
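Here is a small sketch of the first two points working together: re-indexing a page when its content changes, while skipping the simple words. The `STOP_WORDS` list is a tiny sample I picked for illustration, and `reindex` is a hypothetical helper, not how any particular search engine does it.

```python
import re

STOP_WORDS = {"the", "is", "of", "a", "and"}   # tiny illustrative list


def index_terms(text):
    """Lowercased words of a page, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS}


def reindex(index, url, new_text):
    """Replace a page's old index entries with ones for its current text."""
    for urls in index.values():
        urls.discard(url)                  # drop stale entries for this page
    for word in index_terms(new_text):
        index.setdefault(word, set()).add(url)
    # Remove words that no longer point at any page.
    for word in [w for w, urls in index.items() if not urls]:
        del index[word]


index = {}
reindex(index, "/about", "The history of the company")
reindex(index, "/about", "The history of the team")   # crawler saw an update
print(sorted(index))
```

After the second pass, "company" is gone, "team" has taken its place, and "the" and "of" never made it in at all.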
Finally, one very important part of the indexing process leads into the final step – search engines are also determining how relevant the content is to the various words and phrases used by searchers.
Step Three: Relevancy Ranking
I’m not going to lie – this is where things get a bit more complicated. There are so many factors in how search engines determine page rank that my next post will be fully dedicated to this step. Part of the reason things get a bit cloudy here is that each search engine uses its own algorithm to decide which pages are best connected to a certain search term. So even if you’re selling dog supplies and have the best website in the world, you’re out of luck if Google, Yahoo or Bing doesn’t deem you dog-supply-worthy.
Some of the variables considered in search algorithms include the usage of keywords, the number of links coming to a site and the strength of the site in general. Each of these variables has factors within itself.
What I can tell you, however, is a basic overview of how the process works. Once a search query is entered, the algorithm goes to work by scanning through the database to come up with a smaller number of pertinent pages (think of the book’s index). Then, it takes this subset and applies another calculation to decide which of these webpages are the most helpful. It then relays its result via a rank and lists these ranked results on a search engine results page (SERP). Amazingly, all of this happens in less than a second.
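The two-stage process described above (narrow down, then rank) can be sketched in a few lines. Fair warning: the scoring formula here, keyword count plus a small bonus per inbound link, is completely made up for illustration. Real algorithms are secret and vastly more involved, and the pages in `PAGES` are hypothetical.

```python
import re

# Hypothetical indexed pages: text plus a count of inbound links.
PAGES = {
    "/leashes": {"text": "dog leashes and dog collars", "inbound_links": 12},
    "/bowls": {"text": "dog bowls", "inbound_links": 3},
    "/cats": {"text": "cat toys and cat trees", "inbound_links": 40},
}


def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, pages):
    """Return the URLs matching the query, most relevant first."""
    terms = words(query)
    # Stage 1: narrow down to pages that contain every query term.
    # (A real engine would look these up in its index, not scan every page.)
    candidates = [url for url, page in pages.items()
                  if all(t in words(page["text"]) for t in terms)]

    # Stage 2: score each candidate with the made-up formula.
    def score(url):
        page_words = words(pages[url]["text"])
        keyword_hits = sum(page_words.count(t) for t in terms)
        return keyword_hits + 0.1 * pages[url]["inbound_links"]

    return sorted(candidates, key=score, reverse=True)


results = search("dog", PAGES)
print(results)
```

The cat page has the most inbound links by far, but it never shows up for "dog" because it didn't survive the first stage. Popularity only helps once a page is relevant to the query.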
I hope this explanation of how search engines work clears up some of the mystery behind how Google spits back the results it does. I’ve learned that each step of the search engine process is of equal SEO importance, even though relevancy ranking is the most complicated. I’m also beginning to learn that the practice of SEO is both an art and a science, mostly because major players in the search engine industry guard their algorithms better than Fort Knox.
But then again, Google probably has enough money to buy Fort Knox.
Join me next time for a deeper look into page ranking. And check out the glossary below!
-Matt Winn, Marketing Associate
SEO Glossary – Lesson One: How Search Engines Work
- Spiders: Also known as crawlers or robots, a search engine spider is a piece of software that scans and downloads content across the Internet. Part of the crawling process.
- External link: A link on a website that links to a webpage on a completely different website.
- Sitemap: A list of pages on a website to help users and crawlers navigate through it.
- Link structure: How links are organized within a website – has to do with site navigation.
- Link building: The practice of obtaining more external links pointing towards a certain website.
- Crawl rate: The frequency with which spiders/crawlers/robots crawl a website.
- Search engine friendly URLs: URLs that can easily be crawled by search engine spiders/robots. These URLs have minimal symbols or extraneous characters.
- Internal link: A link on a website that links to another page within the same website.
- Page rank: The name for the degree of relevancy a search engine assigns to a particular website based on a search query.
- Algorithm: A set of rules, standards and mathematical equations search engines use to determine relevancy of web content.
- Search query: A word or phrase entered by a search engine user to find specific information. For example “how search engines work” is a search query.
- Search engine results page (SERP): The ranked list of webpages returned to a user after submitting a search query to a search engine.
Learn SEO One Step at a Time Series:
Step One: An Important Introduction
Step Two: How Search Engines Work
Step Three: How Search Engines Rank Pages
Step Four: An Introduction to Keywords
Step Five: Keyword Research
Step Six: The Long Tail of SEO
Step Seven: Building a SEO Friendly Site
Step Eight: Link Building Basics
Step Nine: Basic SEO Measurement/Conclusion