By Sumantra Roy
Some people believe that they should create different
pages for different search engines, each page optimized
for one keyword and for one search engine. Now, while
I don't recommend that people create different pages for
different search engines, if you do decide to create such
pages, there is one issue that you need to be aware of.
These pages, although optimized for different search
engines, often turn out to be pretty similar to each other.
The search engines now have the ability to detect when
a site has created such similar-looking pages, and they
penalize or even ban such sites.
To keep your site from being penalized for spamming,
you need to stop each search engine's spider from
crawling pages that are not meant for it; for example,
you need to stop AltaVista from spidering pages meant
for Google, and vice versa. The best way to do that is
to use a robots.txt file.
You should create the robots.txt file with a plain-text
editor such as Windows Notepad. Don't use a word processor,
which may save the file in its own format or add formatting
that spiders cannot read.
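Here is the basic syntax of a record in the robots.txt file:
User-Agent: [Spider Name]
Disallow: [File Name]
The file name is written as a path from the root of your
site, starting with a slash, and the spider skips every
URL whose path begins with that value.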
For instance, to tell AltaVista's spider, Scooter, not
to spider the file named myfile1.html residing in the
root directory of the server, you would write
User-Agent: Scooter
Disallow: /myfile1.html
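To see how a spider reads this record, you can feed it to Python's standard-library robots.txt parser, urllib.robotparser. This is only an illustration: www.example.com stands in for your own domain, and Python's parser is not the code that AltaVista or any other search engine actually ran.

```python
from urllib.robotparser import RobotFileParser

# The Scooter record from the example, one line per list entry.
rules = [
    "User-Agent: Scooter",
    "Disallow: /myfile1.html",
]

parser = RobotFileParser()
parser.parse(rules)

# Scooter is barred from /myfile1.html but may fetch other pages.
print(parser.can_fetch("Scooter", "http://www.example.com/myfile1.html"))    # False
print(parser.can_fetch("Scooter", "http://www.example.com/index.html"))      # True
# A spider with no matching record is not restricted at all.
print(parser.can_fetch("Googlebot", "http://www.example.com/myfile1.html"))  # True
```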
To tell Google's spider, called Googlebot, not to spider
the files myfile2.html and myfile3.html, you would write
User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html
You can, of course, put several records in the same
robots.txt file, separating them with a blank line. For
example, to tell AltaVista not to spider the file named
myfile1.html, and to tell Google not to spider the files
myfile2.html and myfile3.html, you would write
User-Agent: Scooter
Disallow: /myfile1.html

User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html
If you want to prevent all robots from spidering the
file named myfile4.html, you can use the * wildcard character
in the User-Agent line, i.e. you would write
User-Agent: *
Disallow: /myfile4.html
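However, the original robots.txt standard does not support
the wildcard character in the Disallow line. Googlebot and
some other spiders now accept it there, but not every spider
does, so it is safest not to rely on it.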
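A record whose User-Agent is * applies to every spider, whatever its name. The sketch below checks this with Python's standard-library urllib.robotparser; the name "SomeOtherBot" is made up, and www.example.com stands in for your domain.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# A single record whose User-Agent is the * wildcard.
parser.parse([
    "User-Agent: *",
    "Disallow: /myfile4.html",
])

page = "http://www.example.com/myfile4.html"
# "SomeOtherBot" is a made-up name: the * record applies to it too.
for spider in ["Scooter", "Googlebot", "Slurp", "SomeOtherBot"]:
    print(spider, parser.can_fetch(spider, page))  # prints False for each spider

# Pages not listed in the record stay open to everyone.
print(parser.can_fetch("Scooter", "http://www.example.com/index.html"))  # True
```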
Once you have created the robots.txt file, upload it to
the root directory of your domain. Uploading it to a
sub-directory won't work, because spiders only look for
the file at the root (for example, at
http://www.example.com/robots.txt).
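A spider works out where robots.txt lives from the scheme and host name alone, which is why a copy in a sub-directory is never read. The Python sketch below uses a hypothetical helper, robots_txt_url, to show the single URL a spider would request for any page on your site; www.example.com is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the URL a spider checks for robots.txt: always the root
    of the page's host, no matter which directory the page is in."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/travel/tourism-in-australia-go.html"))
# http://www.example.com/robots.txt
```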
Now we come to how the robots.txt file can be used to
prevent your site from being penalized for spamming if
you are creating different pages for different search
engines. You need to stop each search engine from
spidering pages that are not meant for it.
For simplicity, let's assume that you are targeting only
two keywords: "tourism in Australia" and "travel
to Australia". Also, let's assume that you are targeting
only three of the major search engines: AltaVista, HotBot
and Google.
Now, suppose you have used the following convention
for naming the files: each file name consists of the
words of the page's target keyword, in lower case and
separated by hyphens, followed by a hyphen and the first
two letters of the name of the search engine for which
the page is optimized.
Hence, the files for AltaVista are
tourism-in-australia-al.html
travel-to-australia-al.html
The files for HotBot are
tourism-in-australia-ho.html
travel-to-australia-ho.html
The files for Google are
tourism-in-australia-go.html
travel-to-australia-go.html
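The naming convention is mechanical enough to express in a few lines of Python. In this sketch, page_name is a hypothetical helper, and lower-casing the keyword is inferred from the example file names.

```python
def page_name(keyword: str, engine: str) -> str:
    """Join the keyword's words with hyphens, then append a hyphen and the
    first two letters of the search engine's name, all in lower case."""
    words = keyword.lower().split()
    return "-".join(words + [engine[:2].lower()]) + ".html"


keywords = ["tourism in Australia", "travel to Australia"]
engines = ["AltaVista", "HotBot", "Google"]

for engine in engines:
    for keyword in keywords:
        print(page_name(keyword, engine))
# tourism-in-australia-al.html
# travel-to-australia-al.html
# tourism-in-australia-ho.html
# travel-to-australia-ho.html
# tourism-in-australia-go.html
# travel-to-australia-go.html
```

Two letters are enough here only because AltaVista, HotBot and Google start with different letters; engines whose names share their first two letters would need a longer suffix.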
As I noted earlier, AltaVista's spider is called Scooter
and Google's spider is called Googlebot.
HotBot uses Inktomi, and Inktomi's spider is called Slurp.
Using this knowledge, here's what the robots.txt file
should contain:
User-Agent: Scooter
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Slurp
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Googlebot
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
When you put the above lines in the robots.txt file,
you instruct each search engine not to spider the files
meant for the other search engines.
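Because every file name follows the same convention, the records don't have to be typed by hand. The Python sketch below generates them from two lists; the SPIDERS dictionary (file-name suffix to spider name, taken from this article) and the function build_robots_txt are illustrative assumptions, not part of any real tool. Its output contains the same records as the file above, separated by blank lines.

```python
# Hypothetical mapping from the file-name suffix of each search engine's
# pages to the name of that engine's spider, as given in this article.
SPIDERS = {"al": "Scooter", "ho": "Slurp", "go": "Googlebot"}
# The keyword part of each file name (words joined by hyphens).
KEYWORDS = ["tourism-in-australia", "travel-to-australia"]


def build_robots_txt(spiders: dict, keywords: list) -> str:
    """Return robots.txt text with one record per spider that disallows
    every page whose suffix belongs to another search engine."""
    records = []
    for suffix, spider in spiders.items():
        lines = [f"User-Agent: {spider}"]
        for other_suffix in spiders:
            if other_suffix == suffix:
                continue  # a spider may crawl its own pages
            for keyword in keywords:
                lines.append(f"Disallow: /{keyword}-{other_suffix}.html")
        records.append("\n".join(lines))
    # Separate the records with a blank line.
    return "\n\n".join(records) + "\n"


print(build_robots_txt(SPIDERS, KEYWORDS), end="")
```

Adding a search engine then means adding one entry to SPIDERS, and every record is updated consistently, which reduces the chance of typing errors.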
When you have finished creating the robots.txt file,
double-check it for errors. A small error can have
disastrous consequences: a search engine may spider files
that are not meant for it, in which case it may penalize
your site for spamming, or it may not spider any files at
all, in which case you won't get top rankings in that
search engine.
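One way to double-check the file is to run it through a robots.txt parser and confirm that each spider can reach its own pages and nothing else. The sketch below uses Python's standard-library urllib.robotparser, which follows the original robots.txt standard; real search-engine spiders may parse files somewhat differently. The function find_problems, the OWN_SUFFIX mapping and www.example.com are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the article, with blank lines between records.
ROBOTS_TXT = """\
User-Agent: Scooter
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Slurp
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Googlebot
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
"""

# Which file-name suffix belongs to which spider (from the naming convention).
OWN_SUFFIX = {"Scooter": "al", "Slurp": "ho", "Googlebot": "go"}
KEYWORDS = ["tourism-in-australia", "travel-to-australia"]


def find_problems(robots_txt: str) -> list:
    """Report every page a spider may fetch but should not,
    or should fetch but is blocked from."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for spider, own in OWN_SUFFIX.items():
        for keyword in KEYWORDS:
            for suffix in OWN_SUFFIX.values():
                page = f"/{keyword}-{suffix}.html"
                allowed = parser.can_fetch(spider, "http://www.example.com" + page)
                # A spider should reach its own pages and nothing else.
                if allowed != (suffix == own):
                    verb = "may fetch" if allowed else "is blocked from"
                    problems.append(f"{spider} {verb} {page}")
    return problems


print(find_problems(ROBOTS_TXT) or "No problems found")
```

If, for example, the line "Disallow: /tourism-in-australia-go.html" is accidentally left out of Scooter's record, find_problems reports "Scooter may fetch /tourism-in-australia-go.html".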
Article by Sumantra Roy. Sumantra is one of the most
respected search engine positioning specialists on the
Internet. To have Sumantra's company place your site at
the top of the search engines, visit
his site.