Mailing List Archive: Lucene: General

Help me to understanding how does DataBase work in Search engine.

 

 


khongkhi02 at mail

Dec 27, 2008, 1:34 PM

Post #1 of 2

Hello everyone,

I have a search engine project. My part is:
1. Create a database for the search engine
2. Indexing and searching

First, my friend runs a crawler to save web pages from the internet, then I
use these web pages to create a database in Oracle (I want to design the
database so that searching is optimal).
Second, I use Hibernate to connect the database and Java.
Then I index and search with Lucene.

This is only the approach I have read about and understood from the internet
(my English is very bad, so I'm afraid I may have misunderstood).
My friend told me we need to index all the web pages and then save all the
indexed files in the database.
So I don't know which is right.

Can anyone tell me the right way? Or if there is another way, please tell me.
Thanks

--
View this message in context: http://www.nabble.com/Help-me-to-understanding-how-does-DataBase-work-in-Search-engine.-tp21187891p21187891.html
Sent from the Lucene - General mailing list archive at Nabble.com.


hossman_lucene at fucit

Jan 3, 2009, 4:07 PM

Post #2 of 2
Re: Help me to understanding how does DataBase work in Search engine.

: I have a search engine project. My part is:
: 1. Create a database for the search engine
: 2. Indexing and searching

: My friend told me we need to index all the web pages and then save all
: the indexed files in the database.
: So I don't know which is right.

databases can build internal "indexes" on tables to make certain
queries faster ... so if you have a database of webpages you can build an
index on something like a "size" field to make searching for pages by size
faster.
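
As a rough illustration of that kind of index, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for Oracle here, and the "pages" table, its "url" and "size" columns, and the index name are all made up for the example.

```python
import sqlite3

# In-memory SQLite database; table, column and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO pages (url, size) VALUES (?, ?)",
    [
        ("http://example.com/a", 1200),
        ("http://example.com/b", 54000),
        ("http://example.com/c", 980),
    ],
)

# Build an internal index on the "size" column.
conn.execute("CREATE INDEX idx_pages_size ON pages (size)")

# Search for pages by size; the database can use the index for this.
query = "SELECT url FROM pages WHERE size > ? ORDER BY size"
large_pages = [row[0] for row in conn.execute(query, (1000,))]

# Ask SQLite how it plans to run the query (last column is the plan text).
plan = " ".join(
    str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + query, (1000,))
)
```

Printing "plan" shows whether the database chose the index for the size filter. Note that an index like this speeds up lookups on the indexed column only; it does nothing for searching words inside the page text.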

some databases have a feature called a "fulltext" index that can be built
on text columns to make searching for words faster than doing simple "LIKE"
queries. This can work in some use cases, but these database "fulltext"
indexes tend to be very limiting and not easy to customize.
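
To make the difference concrete, here is a toy sketch in Python of the two approaches: scanning every document (roughly what a "LIKE" query does) versus looking a word up in an inverted index (the basic idea behind fulltext indexes and Lucene). The documents and function names are invented for the example, and real indexes do far more (analysis, scoring, on-disk storage).

```python
import re
from collections import defaultdict

# Hypothetical mini-corpus: document id -> page text.
docs = {
    1: "Lucene is a search library",
    2: "Oracle is a relational database",
    3: "Search engines index web pages",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def like_scan(docs, word):
    """Roughly what LIKE '%word%' does: test the text of every document."""
    return sorted(doc_id for doc_id, text in docs.items() if word in text.lower())


def build_inverted_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def lookup(index, word):
    """Answer a one-word query with a single dictionary lookup."""
    return sorted(index.get(word.lower(), set()))


index = build_inverted_index(docs)
```

With this data, lookup(index, "search") and like_scan(docs, "search") both find documents 1 and 3, but the scan has to read every document while the lookup does not. The two are not equivalent: the scan matches substrings (so "data" matches "database"), while this index only matches whole words.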

based on what you've described, a couple of Lucene subprojects might be
useful to you...

http://lucene.apache.org/nutch/
Nutch is specifically designed to crawl and index webpages.

http://lucene.apache.org/solr/
Solr is a search "application" that lets you index/query content using
any language over HTTP. It comes with a DataImportHandler plugin that
lets you automatically index databases using configuration to describe how
to fetch the logical contents of each "document".
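
Since Solr is queried over plain HTTP, a client mostly just has to build a URL. Here is a small Python sketch of that step only. The host, port and "/solr/select" path are assumptions that depend on your installation and Solr version, and the "title" field is hypothetical; "q", "rows" and "wt" are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Assumed endpoint: adjust host, port and path to match your own Solr setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_solr_query_url(query, rows=10):
    """Build a GET URL for a Solr search request returning JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


# "title" is a hypothetical field name from an imagined schema.
url = build_solr_query_url("title:lucene", rows=5)
```

The resulting URL can be fetched with any HTTP client in any language, which is the point of running search as a separate application.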

http://lucene.apache.org/java/
Lucene-Java is the underlying search library used in both Nutch and Solr,
if you want to custom-build your own search logic you can use this library
directly. As you mentioned, there is also a Hibernate project for
integrating with Lucene.

if you have follow-up questions about any of those 3 subprojects, please
consult the specific user mailing list for the project that you are
interested in.


-Hoss
