How Search Engines Index Websites
The term "search engine" is often used loosely to describe both crawler-based search engines and human-edited web directories. However, these two kinds of search systems gather their listings in very different ways.
Human-edited directories such as Dmoz.org rely on people to submit sites and populate the directory's categories. Crawler-based search engines such as Google use special software called "bots" or "spiders" to "crawl" the Internet, looking for webpages to add to their listings.
Directories contain information about websites, whereas crawler-based search engines gather information from individual webpages. They don't necessarily grab everything on each webpage, but they take a significant amount of it and apply complex algorithms to index it.
Parts of a Search Engine
All crawler-based search engines consist of three parts: a spider that finds and reads webpages, a database that stores a copy of the webpages the spider has gathered, and search engine software that decides which results are shown for a search and in what order.
First, a spider visits a webpage, reads it and then follows its links to other pages within the website. Everything the spider gathers is stored in the database, which contains a copy of every webpage the spider has found. The spider returns to websites regularly to look for changes and updates the database accordingly.
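To make the crawl-and-store cycle concrete, here is a minimal Python sketch. It is not how any real search engine is built. The in-memory FAKE_WEB dictionary and the fetch() function are hypothetical stand-ins for real HTTP requests, and the "database" is just a dictionary mapping each URL to its stored copy. The spider starts at one page, extracts links with Python's built-in HTMLParser, and follows only those that stay on the same website.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical in-memory "web" standing in for real HTTP fetches.
FAKE_WEB = {
    "http://example.com/": '<a href="/about">About</a> <a href="/products">Products</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/products": '<a href="/products/widget">Widget</a> <a href="http://other.org/">Partner</a>',
    "http://example.com/products/widget": "<p>Our best widget.</p>",
    "http://example.com/orphan": "<p>Nobody links here.</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Hypothetical stand-in for an HTTP GET; returns None for unknown URLs."""
    return FAKE_WEB.get(url)


def crawl(start_url):
    """Breadth-first crawl that stays on the start URL's host."""
    host = urlsplit(start_url).netloc
    database = {}                # url -> stored copy of the page
    queue = deque([start_url])
    seen = {start_url}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        database[url] = html     # keep a copy of the page
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)    # resolve relative links like "/about"
            if urlsplit(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return database


if __name__ == "__main__":
    for url in sorted(crawl("http://example.com/")):
        print(url)
```

Running it stores the home page, the About page, the Products page and the Widget page. The page at /orphan is never stored because no other page links to it. The partner link to other.org is skipped because this sketch only crawls within one website.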
The search engine software sifts through the millions of webpages stored in the database, identifying the body text, links and other content on each page. It does this so it can find the pages that match a search and rank them in order of relevancy. Page titles, meta descriptions and various other elements help determine where a page ranks.
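The matching-and-ranking step can be illustrated with an inverted index, which is a lookup table from each word to the pages that contain it. The sketch below is a toy and not Google's algorithm. Its scoring rule, where a title match is worth TITLE_WEIGHT = 3 body matches, is a hypothetical assumption chosen only to show how a page title can influence ranking.

```python
import re
from collections import defaultdict

TITLE_WEIGHT = 3  # hypothetical: a title match counts three times a body match
BODY_WEIGHT = 1


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> {url: weighted occurrence count}."""
    index = defaultdict(lambda: defaultdict(int))
    for url, page in pages.items():
        for word in tokenize(page["title"]):
            index[word][url] += TITLE_WEIGHT
        for word in tokenize(page["body"]):
            index[word][url] += BODY_WEIGHT
    return index


def search(index, query):
    """Return matching URLs, highest score first (ties broken by URL)."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for url, weight in index.get(word, {}).items():
            scores[url] += weight
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    # Hypothetical pages used only for illustration.
    pages = {
        "/widgets": {"title": "Blue Widgets", "body": "We sell widgets in every colour."},
        "/about": {"title": "About Us", "body": "We have sold blue paint since 1990."},
        "/blog": {"title": "Company Blog", "body": "News about widgets and gadgets."},
    }
    print(search(build_index(pages), "blue widgets"))
```

With these sample pages, a search for "blue widgets" puts /widgets first because both words appear in its title. The other two pages each match one word once in their body text and so tie further down.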
One advantage of having spiders revisit your site is that you can change your webpages to try to rank higher, then check whether the changes worked after the spider's next visit. This is what search engine optimization is all about.
Although every search engine is made up of these three parts, each one evaluates webpages differently and has its own biases. This explains why the same search phrase yields different results on different search engines. Google, for example, uses over 150 criteria to evaluate a webpage.
Some pages, however, are excluded from the database, either by policy or because the spiders cannot access them. Examples include Flash pages and webpages whose URLs contain special characters such as question marks (?) or ampersands (&). Pages like these make a website less search-engine friendly.
The Importance of Linking
If a webpage is never linked to from any other webpage, the spiders will never find it. A brand new webpage can get indexed only if another page links to it (on the same website or elsewhere) or if its URL is submitted to the search engine companies as a request to be included in their index.
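One way to hand URLs to search engines directly is an XML sitemap that follows the sitemaps.org protocol and lists the pages you want indexed. Submission methods vary by search engine. The sketch below builds a minimal sitemap with Python's standard xml.etree.ElementTree module. The example URLs are hypothetical, and a real sitemap may also carry optional tags such as lastmod.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap.xml document listing each URL in `urls`."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes special characters such as & automatically.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages, including a new one that nothing links to yet.
    pages = [
        "http://example.com/",
        "http://example.com/new-page",
        "http://example.com/search?q=seo&page=2",
    ]
    print(build_sitemap(pages))
```

ElementTree writes the & in the last URL as &amp;, which the sitemap protocol requires for URLs that contain ampersands.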
It's important to know that the links that count toward search rankings are the ones pointing to your website (inbound links).
Online Resources about Search Engines
Articles about search engines abound on the Internet. One of the more interesting is the Search Engines Tutorial (www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html) from the UC Berkeley Online Library.
The tutorial looks at the features of the major search engines (Google, Yahoo, Ask.com) and includes a document summarizing what makes a good search engine.
Danny Sullivan, former editor of SearchEngineWatch.com, wrote a series of detailed articles explaining how search engines work. Though a little outdated, they are still relevant, and they also explain how to make your webpages more search-engine friendly.
How Search Engines Index Websites - To learn more about this author, visit Pasquale Spadafora's Website.