
Not necessarily related to python Web Crawlers

 

 



disappearedng at gmail

Jul 5, 2008, 1:31 AM

Not necessarily related to python Web Crawlers

Hi
Does anyone here have a good recommendation for an open source crawler
that I could get my hands on? It doesn't have to be Python-based. I am
interested in learning how crawling works. I think Python-based
crawlers will ensure a high degree of flexibility, but at the same time
I am torn between looking for open source crawlers in Python vs. C++,
because the latter is much more efficient (or so I heard; I will be
crawling on very cheap hardware).

I am definitely open to suggestions.

Thx
--
http://mail.python.org/mailman/listinfo/python-list


circularfunc at yahoo

Jul 5, 2008, 2:07 AM

Re: Not necessarily related to python Web Crawlers

Just crawling is super easy. It's indexing and searching that is hard.
Just start at yahoo.com, scrape out all the links, and then for every
site you reach, visit every link on it.
I wrote a crawler in 15 lines of code, but all it did was
visit the sites; it didn't index them or anything.
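
To make the idea concrete, here is a minimal sketch of that kind of crawler, using only the Python standard library. It is not the 15-line crawler mentioned above (that code was not posted); the names `LinkParser`, `extract_links`, `fetch_page` and `crawl`, the `max_pages` cap and the 10-second timeout are all assumptions made for illustration. It starts at one URL, scrapes out the links, and visits each new link breadth-first without indexing anything.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links found in html, resolved against base_url."""
    parser = LinkParser()
    parser.feed(html)
    # Resolve relative links and drop "#fragment" parts so pages are not revisited.
    absolute = (urldefrag(urljoin(base_url, href)).url for href in parser.links)
    return [url for url in absolute if url.startswith(("http://", "https://"))]


def fetch_page(url):
    """Download a page as text; returns "" when the URL cannot be opened."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def crawl(start_url, fetch=fetch_page, max_pages=50):
    """Breadth-first crawl from start_url; returns the URLs in visit order."""
    queue = deque([start_url])
    seen = {start_url}  # every URL ever queued, so nothing is queued twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in extract_links(url, fetch(url)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("http://www.yahoo.com/", max_pages=20)` would visit up to 20 pages. The `fetch` parameter lets you swap in a fake downloader while experimenting. A real crawl would also need rate limiting and robots.txt handling, which this sketch leaves out.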

You could probably write a faster one in C++, but if you are new to it,
doing it in Python will let you experiment and learn faster.

Some links:
http://infolab.stanford.edu/~backrub/google.html
http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html



http://www.example-code.com/python/pythonspider.asp
http://www.example-code.com/python/spider_simpleCrawler.asp


tamim.shahriar at gmail

Jul 6, 2008, 3:32 AM

Re: Not necessarily related to python Web Crawlers


You can check my Python blog. There are some tips and code on
crawlers.
http://love-python.blogspot.com/

regards,
Subeen
http://love-python.blogspot.com/
