Archive for July, 2007

Managing your Robots.txt File

A robots.txt file is a plain text file that tells search engine spiders which parts of your site they may visit. It only works for spiders that choose to follow its instructions.

Major search engines such as Google, Yahoo and MSN all read the robots.txt file and obey the instructions placed in it.

What kind of instructions can you place in a robots.txt file?  Well, for one, you can allow or disallow search engines from spidering your site.

Other uses for a robots.txt file include:

  • disallowing search engine bots from spidering certain sections of your site such as private files

  • setting the pace of how fast a robot can spider your site (this can help reduce bandwidth usage)

  • completely banning certain engines from spidering your site while allowing others to spider all or part of it.
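To make those three uses concrete, here is a rough sketch using Python's standard urllib.robotparser module. The robots.txt content, the /private/ path, the BadBot name and the 10 second delay are all made up for illustration. Keep in mind that Crawl-delay is an extension rather than part of the original robots.txt standard, so not every engine honours it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the paths, the "BadBot" name and the
# 10 second delay are illustrative and not taken from any real site.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Disallow: /private/

User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pace setting: the delay (in seconds) requested from the Slurp spider.
slurp_delay = parser.crawl_delay("Slurp")

# Private section: Slurp is asked to stay out of /private/.
slurp_private = parser.can_fetch("Slurp", "/private/report.html")

# Complete ban: BadBot is asked to stay away from everything.
badbot_home = parser.can_fetch("BadBot", "/index.html")

# Everyone else falls under the "*" rules and may fetch the homepage.
other_home = parser.can_fetch("SomeOtherBot", "/index.html")

print(slurp_delay, slurp_private, badbot_home, other_home)
```

The parser only tells you what the file asks for. Whether a spider actually respects it is up to the spider.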

How do you set up a robots.txt file for your site?

Easy. Create a new text file named robots.txt and upload it to the root of your site (where your homepage index file resides).

Placing this in your robots.txt file:

User-agent: *
Disallow: /

instructs all compliant spiders not to index anything on your site.

While placing this in your robots.txt file:

User-agent: *
Disallow:

allows all spiders to index your site.

You can also place this in your robots.txt file:

User-agent: *
Disallow: /tmp
Disallow: /logs

and it will instruct all compliant spiders not to spider the specified folders.

You can also be specific and place this in the robots.txt file:

User-agent: Googlebot
Disallow: /tmp
Disallow: /logs

and Google alone will not spider the specified folders.
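If you want to check your rules before uploading them, a small sketch like the one below can help. It uses Python's standard urllib.robotparser module; the is_allowed helper and the test paths are my own names, chosen only for this example.

```python
from urllib.robotparser import RobotFileParser

BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"
GOOGLE_ONLY = "User-agent: Googlebot\nDisallow: /tmp\nDisallow: /logs\n"


def is_allowed(robots_txt, user_agent, path):
    """Hypothetical helper: would a compliant spider fetch this path?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# "Disallow: /" shuts out every compliant spider.
blocked = is_allowed(BLOCK_ALL, "Googlebot", "/index.html")

# An empty "Disallow:" lets every spider in.
open_site = is_allowed(ALLOW_ALL, "Googlebot", "/index.html")

# Rules addressed to Googlebot apply to Googlebot only.
google_tmp = is_allowed(GOOGLE_ONLY, "Googlebot", "/tmp/cache.html")
google_home = is_allowed(GOOGLE_ONLY, "Googlebot", "/index.html")
slurp_tmp = is_allowed(GOOGLE_ONLY, "Slurp", "/tmp/cache.html")

print(blocked, open_site, google_tmp, google_home, slurp_tmp)
```

One thing worth knowing: Disallow values are matched as prefixes, so "Disallow: /tmp" also covers paths such as /tmpfiles/ or /tmp.html. Add a trailing slash if you only mean the folder.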

Here is a list of the main search engines and their user agents:

AltaVista: Scooter
Infoseek: Infoseek
Hotbot: Slurp
AOL: Slurp
Excite: ArchitextSpider
Google: Googlebot
Goto: Slurp
Lycos: Lycos
MSN: Slurp
Netscape: Googlebot
NorthernLight: Gulliver
WebCrawler: ArchitextSpider
Iwon: Slurp
Fast: Fast
DirectHit: Grabber
Yahoo Web Pages: Googlebot
Looksmart Web Pages: Slurp

To sum up, using a robots.txt file is yet another important tool a webmaster has at their disposal to manage the activities of search engines.


Custom Search Business Edition

A while back I blogged about the Google custom search engine and how I experimented with it.  Well, today Google officially announced that they’re launching Custom Search Business Edition.

The CSE Business Edition allows organizations to create a search engine and search results that are tailored to their industry, including full control over how search results are presented and integrated with the main website.

Results can be obtained through XML, and there is an option to turn off ads and have further control over branding.

Most importantly, CSBE provides options for email and phone support from Google.

All this for just $100 per year for searching up to 5,000 pages…

Good way to diversify your revenue, Google. I think it’s a fantastic service, and I’m speaking from experience!


Google’s Unavailable_After Meta Tag

Google is planning to introduce a new meta tag to deal with time-sensitive information on websites.  The new tag, dubbed “unavailable_after”, was announced by Dan Crow, director of crawl systems at Google, at a recent SEMNE conference.

The “unavailable_after” tag will allow webmasters to tell Google when a particular page on their site will no longer be available.  For instance, if there is a time-sensitive special offer on a site that expires on a certain date, a site owner will be able to use the unavailable_after tag to let Google know when to stop indexing that page.

I think this is actually a very good idea and long overdue. Well done, G!
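Since the tag has only just been announced, the exact syntax may still change. As a rough sketch, and assuming it ends up as a googlebot meta tag whose content is "unavailable_after:" followed by a date, a page template could generate it like this in Python. The unavailable_after_tag function, the date layout and the example date are my assumptions, so check Google's own documentation before relying on them.

```python
from datetime import datetime, timezone


def unavailable_after_tag(expires):
    """Hypothetical helper: build the meta tag for a page expiring at `expires`.

    The tag name and the date layout are assumptions, not a confirmed format.
    """
    # Normalise to UTC so the printed time zone label is always correct.
    stamp = expires.astimezone(timezone.utc).strftime("%d-%b-%Y %H:%M:%S GMT")
    return '<meta name="googlebot" content="unavailable_after: %s">' % stamp


# Illustrative expiry date for a special offer.
offer_ends = datetime(2007, 8, 25, 15, 0, 0, tzinfo=timezone.utc)
tag = unavailable_after_tag(offer_ends)
print(tag)
```

The generated tag would go in the head section of the page that carries the offer.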


APIs for SEO

API stands for Application Programming Interface, which in a nutshell is a developer-friendly interface to the backend and databases of a website.  For instance, the Amazon API is an interface that webmasters can use to access the massive Amazon database of products.

Now for the gravy… How can you use APIs for SEO?  Simple… Content!  Fresh Content… and plenty of it!

The trick here, though, is not to rely on a single API, because most likely many other webmasters have beaten you to it and the result will appear as duplicate content to the search engines.  The key is to use a mashup of various APIs and to create a unique combination that has not been used before.

Here’s a great link to a comprehensive list of APIs to get you started.  Have fun!


Google sued by Ozzy Gov

The Australian Government is suing Google for misleading users by not distinguishing between paid and natural search results.

Specifically, the lawsuit is being brought by the Australian Competition and Consumer Commission against Trading Post Australia Pty Ltd, Google Inc, Google Ireland Limited and Google Australia Pty Ltd.  The ACCC alleges, in an announcement on its website, that Google misled and deceived users in relation to sponsored links that appeared on the Google website.

The issue began in 2005 when Google listed two car dealerships from the New South Wales city of Newcastle as sponsored links.

The basis of the lawsuit is that the listings linked to the website of a rival to the dealerships (the Trading Post), which competes with them for car sales.

If this lawsuit is successful, it would become a benchmark for future lawsuits by other governments and companies, not only against Google but also against the other major search engines, Yahoo and MSN.
