Nutch Search Engine Finally Working!
If you recall, a few weeks back I posted about building your own Google. Well, I finally did it: my very own Google-style search engine is up and running.
It was not an easy job at all, and it was very frustrating at times, but very rewarding indeed!
There are still some issues that need to be ironed out. For instance, the cache links (among a few others) give an error message when you click on them. All in all, though, these are minor issues compared to the hurdles I jumped over to get this engine going.
I managed to spider the cnn.com website (only a few pages, as an experiment) and feed the results of the crawl into my search engine. Try searching for "weather" on CNN using my search engine and check out the results.
I will be experimenting further with Nutch, including deeper and multiple crawls, as well as fixing the odd bug or two that currently exists.
I will also start reading up on the Nutch technology to better understand it and to get a better feel for its potential.
Hopefully, in the next few months I will begin creating new websites that cater to vertical search, and see where that takes me.
I’ll keep you all posted. In the meantime, if you have any ideas or questions, please feel free to post a comment or two.

omar said,
September 5, 2007 @ 11:14 am
An update on this…
I had to take the engine down as it was using up too many resources and crashing my server!