FAQ: Improve Your Search Engine Rating
Rik Nilsson

Abstract

Presented as a FAQ list, this article describes which search engines scan public web sites with automated indexing software, when and how they do it, and lists some criteria for effectively improving your relevance ranking in the indices that result from such scans.

How Do Search Engines Work?

Search engines match the keywords you submit against entries in keyed indexes, which they build from URLs submitted by users or by actually scanning the Web for sites. Most of the details of individual search engine technologies are proprietary knowledge. In general, though, the auto-indexing engines (sometimes called "robots" because they operate autonomously) start at your doorway ("home page" or "index page") and, by analyzing your anchor tags (links), make a list of the pages in your web directory. They dig into those pages and look for their links, too, building a linked picture of your site structure. They record some of the content, page titles and image names on those pages. The various engines ("spiders" or "crawlers") use different text-searching techniques and differ widely in their complexity. Some of the more sophisticated spiders cross-index your pages with pages on other sites that you link to and that link to you. The more sites of related content that link to you, the higher your relevance should be.

What Is "Relevancy"?

"Relevance", or ranking, is the search engine's calculation of how accurately or importantly your page relates to the keywords that index it, within its apparent category. Most search services perform some relevancy judgment and list the results of a search in order of decreasing relevancy, or ranking.

How Can I Check My Ranking?

You can check your ranking with Google's toolbar for Internet Explorer, or use "Prog" [http://www.webmasterbrain.com/prog/], which returns Google results with ranking on a scale of zero to ten. Just enter your site's URL and click "Search".

What Are the Important Search Engines?

The undisputed king of Web-crawlers (search engines that use active robotic indexing) is Google.com. Not only does Google directly hold over 35% of the market, but it also supplies most of the search results for Yahoo!, AOL, Netscape, Earthlink, AT&T and many others. Adding all that participation yields a total market share of 75% or better. About, cNet, MSN, and RoadRunner all use LookSmart and other paid listing engines. MSN will soon implement an automated crawler, no doubt to target a portion of Google's market share, and Yahoo! is apparently soon to acquire Overture.

When Will My Site Be Spidered?

Google's advanced indexing engines actually operate daily on higher rated and frequently changing sites, but exactly when your site is included in the scan is probably determined mostly by your previous ratings. You may be scanned as often as daily by "sniffers" looking for site changes, then once or twice a month by indexing crawlers. As of this writing, Google indexes the entire Web at least once a month. So, if you suspect that you've been blackballed for antisocial design (hidden text, excess keywording, structural errors, etc.), be sure to recant and update your doorway and other pages. Then go to http://www.google.com/webmasters/ and submit your URL. Google may then list you to be spidered the next time around.

Are "Meta" Tags Important?

No. So many sites that originally appeared on the Web over-used these features that now at least the top five search engines (Google, Yahoo!, MSN, AOL, Overture) for the most part ignore them, except to respect the "robots" meta, which stops them from spidering a site. Keyword metas are only useful for intranet sites, if you are providing search capabilities on an internal network. On a public site, more than 10 keywords in a row, or keywords in all caps, may frighten smarter search engines into giving you a "spammer" rating. Commas are not required.
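
As a minimal sketch of the one meta that crawlers are described above as still respecting, a "robots" meta goes in the page head. The content values shown are the commonly documented ones; which of your pages you would mark this way is an assumption made only for illustration.

```html
<!-- Placed inside <head> of a page you do NOT want spidered (hypothetical page). -->
<!-- "noindex" asks robots not to index this page; -->
<!-- "nofollow" asks them not to follow the links on it. -->
<meta name="robots" content="noindex, nofollow">
```

Leave the tag off pages you do want indexed, since indexing a page and following its links is what robots do by default.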

If you feel compelled to use some metas on your public site, the maximum meta content I would recommend is:

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
    <meta name="description" content="terse description of your site">

The first meta listed above will more or less ensure that special characters (trademark, copyright, registered, etc.) in your document are readable in browsers using standard Roman alphabetic characters. The "description" meta may appear as the first line in some search engine results. Keep it short, and use some of the same words that are in your page title.

Google made it big because they pretty often skip over the meta and other tags and collect keywords from the actual content on your site that is visible to the user-agent (browser), thereby giving their search customers more relevant retrievals. Google's algorithms are smart enough to determine whether the content is really visible (not "white-on-white" or in "hidden" DIVs or HTML comments, for example). The first "clear text" on a page containing the searcher's keywords will generally appear in the search result paragraph for that page.

Every other "View > Page Source" I do, I see the obsolete "MSSmartTagsPreventParsing" meta-inhibitor, which tells you something about the Web designer: Microsoft stopped using that technology years ago.

Go to Google.com now and search on the words clean elegant web design. If you see one, click on the first listing for digital.watersgulch.com and compare the search result with what you see in a "View > Page Source" of the web page. Here it is as of October 14, 2003: (screenshot, 80Kb)

(That was October; by December I had disappeared, and had to add keyword "programing" to bring me to the top again.)

Is It OK To Use SSI Or Scripted Dynamic Content?

Yes, but... One site I did has dynamic PHP scripts that generate page titles and content on the fly depending on what my site visitors are looking for. With their original crawler engine, Google relentlessly explored all the possible combinations (over 6000!) to fully index the site. The same is true of sites using SSI (server-side includes), Java server pages (JSP), and Microsoft's Active Server Pages (ASP) technology.

Make sure all the intra-site links resolve to complete, well-formed URLs that name the actual page, whether you write them as relative or absolute links. For example, use "http://www.mysite.net/index.php", not "http://mysite.net/". If your site is scripted, make sure query URLs are also fully qualified, with the actual script page name preceding the query: "http://www.mysite.com/index.php?show=yes&product=MB33102", for example.

Google's new (post-June 2003) crawler also checks page title uniqueness, so it is still possible to use single-page dynamic sites as long as the displayed title is different for each page. For example, consider this dynamic URL:

    http://www.mysite.com/index.php?page_title=CD%20PLAYERS&product=MB33102

On the "index.php" page we put PHP code in the head that picks up the page title from the URL query data and fills in the <TITLE> tag. Google's spider sees the new title as a unique content container, and your page count has miraculously increased by one, plus all of its links have been added to the matrix.

Here's the "But...": It appears that only the higher ranking sites are spidered anywhere near as thoroughly as the pre-June 2003 crawler used to do. It is easy to see why this new behavior was introduced: it cuts the number of pages spidered by several orders of magnitude. The downside is that many mom-n-pop ecommerce sites have literally dropped to zero ranking.
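
To make the dynamic-title idea concrete, here is a sketch of a link to the dynamic URL discussed above and of the markup the script might return for it. The link text, and the assumption that the script copies page_title verbatim into the title, are illustrative and not taken from a real site.

```html
<!-- A link elsewhere on the site: fully qualified, with the script page -->
<!-- name (index.php) preceding the query. Inside an HTML attribute the -->
<!-- "&" separator is written as "&amp;". -->
<a href="http://www.mysite.com/index.php?page_title=CD%20PLAYERS&amp;product=MB33102">CD Players</a>

<!-- Head of the page the server might send back for that URL -->
<!-- (assumed output of the PHP code, not the PHP code itself): -->
<head>
  <title>CD PLAYERS</title>
</head>
```

Each distinct page_title value then yields a page with its own title, which is what the uniqueness check described above would look for.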

Bottom Line: How Can I Get A Better Rating?

Publish relevant content delivered in correct HTML structure. That is, use header and paragraph tags along with CSS styling to build your pages. Use complete, keyworded URLs and a unique page <title> for every page on your site.

Choose headers (<h1>, <h2>, etc.) that are meaningful, that is, headers that reflect the key content in the page or paragraphs you wish to have noticed. If the first header of the page is "Weekly Special:" and not "Electronics Accessories", for example, you may end up as the highest-ranked weekly special page instead of appearing in searches for electronics.

Give each page on your site a unique title (in the head <title> tag) and link to it using that title as the link text. It takes thought and work.

Put good, representative and descriptive visible text early in your entry page. Use words that indicate what the theme of your site is, but don't overdo it and make the content unreadable.

Although it seems well known that Google ignores "meta" tags, several of my sites appear in Google search results with the page title and a description taken word for word from the title and meta description content... apparently because they share some of the same words. Enough said.

This sounds dumb and repetitive, but make absolutely sure your pages are correctly built using standard HTML structure. If any basic structural element is missing or out of order in a complex page, you may not be spidered at all.
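
A minimal sketch of the kind of standard structure meant here follows. The title and header wording reuse the "Electronics Accessories" example from this section; the body text is placeholder content, and the exact set of elements is an assumption rather than a definitive checklist.

```html
<html>
<head>
  <!-- The title sits inside the head section; keep it to a few words. -->
  <title>Electronics Accessories</title>
  <meta name="description" content="terse description of your site">
</head>
<body>
  <!-- The first header names the key content of the page. -->
  <h1>Electronics Accessories</h1>
  <p>Visible, descriptive text about what the site offers goes here.</p>

  <!-- Secondary topics get lower-level headers. -->
  <h2>Weekly Special</h2>
  <p>Details of this week's offer go here.</p>
</body>
</html>
```

The point of the sketch is the ordering: head before body, title inside head, and headers and paragraphs carrying the visible content.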

Use CSS according to standards to render the simple structure described above in an attractive way, instead of the old 20th-century workaround of using tables to organize content.

It's OK to put a few words about your site in the title. More than that and you risk the "spammer" detector's wrath. Also notice that the title is within the head section.

If you have a "document type" tag to enable use of validation services, make sure it is the right type and is correctly formed and placed for your page. It should be all on one line, and there should be a blank line between the doctype and the opening <html> tag. (The declaration appears all on one line in the actual Web page.)

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Use a "description" meta tag, and put it right under the title tag.

Include a "site map" page in your design, so there is a complete, linked list of all the active static pages you want spidered, all in one place.

Use keywords in the links from your main index page to other portions of your site. For example, if you are a retailer with a database-driven site, use links with the names of your popular items or categories on buttons or tabs programmed to do specific database queries in those categories. I know that Google, in particular, will pick up on these. Google places high importance on the text and URLs in anchor tags and the "src" attributes of tags.

Avoid using "bridge" pages, that is, pages that automatically redirect to another page. These are identified primarily by the "refresh" meta. This technique is commonly used to add extra pages to a listing, and some search engine spiders ignore such pages because they fill up the index with useless entries. You may get away with server-side scripting redirects using PHP or ".htaccess" methods, since the crawlers can't see your PHP code. Google's crawler is pretty smart, though...
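
The site map and keyworded links recommended above could be sketched as follows. The category names, query parameter and page addresses are hypothetical, chosen only to show link text that matches the page it points to.

```html
<!-- Hypothetical site map page: one linked list of every page to be spidered. -->
<h1>Site Map</h1>
<ul>
  <!-- Link text repeats the title of the target page. -->
  <li><a href="http://www.mysite.com/index.php">Electronics Accessories</a></li>
  <!-- Category links run a specific database query (parameter name assumed). -->
  <li><a href="http://www.mysite.com/index.php?category=cd-players">CD Players</a></li>
  <li><a href="http://www.mysite.com/index.php?category=headphones">Headphones</a></li>
  <li><a href="http://www.mysite.com/contact.html">Contact Us</a></li>
</ul>
```

The same keyworded links can be reused on the main index page as buttons or tabs, so the robot meets them at the doorway as well as on the site map.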

If you want to appear in the 25% of searches not covered by Google, then pay your $300 per year to list with one of LookSmart's clients. Doing this can actually feed back into your Google rating, because it indicates a higher popularity and may add legitimacy to your Web presence.

List your URL with Dmoz.org (free) and Yahoo! ($300/yr).

Visit Google often and search for keywords that make your site show up in their lists; that improves your "link-ability" rating.

Get other high-ranking sites of like kind to link to you, if you respect them. Join web-rings of like sites, and post your site to theme-related newsgroups and mailing lists, if your server can take the load of the resulting spam you are likely to attract!

© 2003 Rik Nilsson, All rights reserved.
Richard H. Nilsson/Sept 30, 2003