
Mailing List Archive: Lucene: Java-Dev

Fetcher Web Crawler: Technical Overview

 

 



cmad at lanlab

May 6, 2002, 11:14 AM

Post #1 of 2
Fetcher Web Crawler: Technical Overview

Hi,

First, I want to thank you for putting the web crawler into the Lucene
sandbox. I'm looking forward to the future developments.

I have put together a technical overview of the Fetcher web crawler.
For those of you who are interested in it, this is probably a good starting point.

I hope you understand my English. I'm open to any comments concerning style
and grammar ;-)
I can imagine that some of Andrew's ideas could very well be included in a
future version as well. (http://www.trilug.org/~acoliver/luceneplan.html)

I will put this document in CVS as soon as possible.

Clemens


PS By the way, it's a Word document. Any preferences within the Jakarta
project...?
Attachments: The Fetcher Web Crawler.pdf (108 KB)


acoliver at apache

May 6, 2002, 12:38 PM

Post #2 of 2
Re: Fetcher Web Crawler: Technical Overview

That's cool. What I'd like to do is come up with an overall
architecture for crawlers, content handlers, etc., and make them
relatively pluggable.

-Andy

--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe [at] jakarta>
For additional commands, e-mail: <mailto:lucene-dev-help [at] jakarta>
