
Mailing List Archive: Lucene: Java-User

XML parsing using Lucene in Java

 

 



fayyazuddin at gmail

Nov 18, 2007, 7:43 PM

Post #1 of 3 (471 views)
XML parsing using Lucene in Java

Dear Fellow Lucene Developers:

I am a Java/JSP developer and have started learning Lucene in order to
build a search engine for some books that I have in XML format. The XML
document is quite large, and I would like to give users searching these
books results that are as accurate as possible. My question is: which XML
parser do you recommend, SAX or Digester? Is there a difference? Does one
parser give better results than the other? What about performance?

Any help that you can provide is greatly appreciated. I look forward to
hearing from you soon.

Take care.
Sincerely;
Fayyaz

--
View this message in context: http://www.nabble.com/XML-parsing-using-Lucene-in-Java-tf4833124.html#a13827336
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


kchhabra at akamai

Nov 18, 2007, 8:00 PM

Post #2 of 3 (449 views)
RE: XML parsing using Lucene in Java

Check out "http://www.ibm.com/developerworks/web/library/j-lucene/".
Although the page does not compare SAX and Digester, it was enough to
convince me to use Digester.

Regards,
kapilChhabra



catalinmititelu at yahoo

Nov 19, 2007, 12:31 AM

Post #3 of 3 (449 views)
Re: XML parsing using Lucene in Java

Hi Fayyaz,
I recommend using SAX or, possibly, a custom parser for large XML files. It should be faster than Digester. The main difference is that Digester has to hold the objects it builds from the XML document in memory, whereas with SAX you can parse the document and add its content to the Lucene index on the fly. In addition, with Digester the content is processed twice: once to turn the XML into Digester-built objects, and again when you read those objects to add their content to the Lucene index.
Digester is very good for small documents, and for cases where you don't want to deal with the details of XML parsing yourself.
A custom parser may be the best solution if you want the best performance. That is the solution I chose.

Regards,
Catalin
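
The on-the-fly SAX approach described above can be sketched with the SAX parser that ships with the JDK. This is only a minimal sketch under stated assumptions, not code from anyone in this thread: the element names (chapter, title, body) and the StreamingBookIndexer class are hypothetical, and the sink callback stands in for whatever adds a document to the Lucene index (for example, code that builds a Lucene Document and calls IndexWriter.addDocument). It also assumes each field element contains plain text only, with no nested markup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Streams records out of a large XML file, one at a time, without keeping the whole file in memory. */
final class StreamingBookIndexer {

    // Hypothetical record element; adjust to the real schema of the books.
    private static final String RECORD = "chapter";

    /**
     * Parses the stream and hands each completed record to the sink.
     * The sink is a stand-in for the code that adds a document to the Lucene index.
     */
    static void parse(InputStream in, Consumer<Map<String, String>> sink) {
        DefaultHandler handler = new DefaultHandler() {
            private Map<String, String> current;                 // fields of the record being read
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (RECORD.equals(qName)) {
                    current = new LinkedHashMap<>();             // a new record begins
                }
                text.setLength(0);                               // start collecting this element's text
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);                  // may be called several times per text node
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (RECORD.equals(qName)) {
                    sink.accept(current);                        // index the record as soon as it is complete
                    current = null;                              // and let it be garbage collected
                } else if (current != null) {
                    current.put(qName, text.toString().trim());  // child element becomes a field
                }
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("XML parsing failed", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling StreamingBookIndexer.parse(inputStream, record -> ...) invokes the callback once per chapter, so only one record is held in memory at a time; that is the property that makes this style attractive for a very large file.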

