Mailing List Archive: Lucene: General

Lucene.NET Integration


cbkprice at gmail

May 26, 2009, 1:50 PM

Post #1 of 11
Lucene.NET Integration

Good afternoon,

We are trying to integrate Lucene into our web search functionality.

Indexing:

Dim LuceneIndex As New Lucene.Net.Index.IndexWriter("C:\SearchIndex", _
    New Lucene.Net.Analysis.Standard.StandardAnalyzer(), True)

'For each webpage:
Dim LuceneDoc As New Lucene.Net.Documents.Document()

'v_WebPageContent contains keywords, description, etc.
LuceneDoc.Add(Lucene.Net.Documents.Field.Text("content", v_WebPageContent))
LuceneDoc.Add(Lucene.Net.Documents.Field.Text("url", v_WebPageURL))
LuceneDoc.Add(Lucene.Net.Documents.Field.Text("title", v_WebPageTitle))

LuceneIndex.AddDocument(LuceneDoc)
'(end of per-page loop)

'Close the writer once every page has been added, so buffered documents
'are actually written to the index.
LuceneIndex.Close()

Searching:

Dim searcher As New Lucene.Net.Search.IndexSearcher("C:\SearchIndex")

Dim query As Lucene.Net.Search.Query
query = Lucene.Net.QueryParsers.QueryParser.Parse(mySearchQuery, "content", _
    New Lucene.Net.Analysis.Standard.StandardAnalyzer())

Dim hits As Lucene.Net.Search.Hits
hits = searcher.Search(query)

'Loop through hits, and display as web page.

Results:

I've verified that my Lucene documents are being created, and that the
"content" field is being populated. I believe the issue lies in the
Searching.

My results are erratic:

1. A keyword on one page will hit, while the same keyword on another page
will not.
2. "abcd efgh" will return 100 hits, but "efgh" will return 50 hits. The
expected result is that "efgh" will return at least 100 hits.

--

Thank you in advance.
--
View this message in context: http://www.nabble.com/Lucene.NET-Integration-tp23731090p23731090.html
Sent from the Lucene - General mailing list archive at Nabble.com.


ted.dunning at gmail

May 26, 2009, 1:54 PM

Post #2 of 11
Re: Lucene.NET Integration

I think you might want to adopt the Google convention of enclosing the queries
you are talking about in [ ] rather than "".

Are you saying that the query [abcd efgh] returns 100 hits and [efgh] returns
50? If so, this is expected behavior, since the default operator is OR.

Or are you saying that the query ["abcd efgh"] (i.e. a phrase query) returns
more hits than ["efgh"] (which is equivalent to [efgh])?
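Here is a toy Python model of the OR point. It is not Lucene code. Documents are sets of lowercase words, and a multi-word query is combined with either OR (the parser's default, per Ted) or AND. The document ids and contents are made up. Under OR, every document that matches [efgh] also matches [abcd efgh], so the two-word query can never return fewer hits than either word on its own.

```python
# Toy model of how a multi-word query is combined; not Lucene's implementation.
DOCS = {
    "page1": "abcd efgh ijkl",
    "page2": "efgh mnop",
    "page3": "abcd qrst",
    "page4": "uvwx",
}

def terms(text):
    """Lowercase and split on whitespace (a crude stand-in for an analyzer)."""
    return set(text.lower().split())

def search(query, operator="OR"):
    """Sorted ids of documents containing any (OR) or every (AND) query term."""
    wanted = terms(query)
    hits = []
    for doc_id, text in DOCS.items():
        present = terms(text)
        if operator == "AND":
            matched = wanted <= present
        else:
            matched = bool(wanted & present)
        if matched:
            hits.append(doc_id)
    return sorted(hits)

print(search("abcd efgh"))         # ['page1', 'page2', 'page3']
print(search("efgh"))              # ['page1', 'page2']
print(search("abcd efgh", "AND"))  # ['page1']
```

The query syntax itself can also require each word: the + prefix, as in [+abcd +efgh], marks a term as required.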

On Tue, May 26, 2009 at 1:50 PM, KingKory <cbkprice [at] gmail> wrote:

> 2. "abcd efgh" will return 100 hits, but "efgh" will return 50 hits. The
> expected result is that "efgh" will return at least 100 hits.
>



--
Ted Dunning, CTO
DeepDyve


cbkprice at gmail

May 26, 2009, 2:00 PM

Post #3 of 11
Re: Lucene.NET Integration

Ted Dunning wrote:
>
> Are you saying that the query [abcd efgh] returns 100 and [efgh] return 50?
> If so, this is expected behavior since the default operator is OR.
>

Hello Ted, and thank you for your reply. I've quoted what I was trying to
convey.

Are you saying that [abcd efgh] is read by Lucene as [abcd] OR [efgh]? Space
characters are read as an OR operator?

If so, how do I construct my query to submit the entire string [abcd efgh]
as a single term?

Thanks.
--
View this message in context: http://www.nabble.com/Lucene.NET-Integration-tp23731090p23731284.html
Sent from the Lucene - General mailing list archive at Nabble.com.


ted.dunning at gmail

May 26, 2009, 2:02 PM

Post #4 of 11
Re: Lucene.NET Integration

Yes.

If you want a term with a space in the middle, you need to have a special
purpose analyzer and likely won't be able to use the normal query parser.
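One way to picture a "special purpose" analyzer is with a toy Python model. The two functions below are hypothetical stand-ins, not Lucene classes. A whitespace tokenizer splits [abcd efgh] into two terms. A keyword-style analyzer instead emits the whole field value as one term, so it is the only one of the two that can store, and therefore match, a term with a space in the middle.

```python
# Toy "analyzers": plain functions from field text to a list of terms.
# Hypothetical stand-ins for illustration, not Lucene classes.

def whitespace_analyzer(text):
    """Lowercase, then split on whitespace: one term per word."""
    return text.lower().split()

def keyword_analyzer(text):
    """Lowercase, then emit the entire value as a single term, spaces included."""
    return [text.lower()]

def contains_term(field_text, term, analyzer):
    """True if `term` is exactly one of the terms the analyzer produced."""
    return term in analyzer(field_text)

title = "ABCD EFGH"
print(whitespace_analyzer(title))                              # ['abcd', 'efgh']
print(keyword_analyzer(title))                                 # ['abcd efgh']
print(contains_term(title, "abcd efgh", whitespace_analyzer))  # False
print(contains_term(title, "abcd efgh", keyword_analyzer))     # True
print(contains_term("abcd efgh ijkl", "abcd efgh", keyword_analyzer))  # False
```

The last line shows the trade-off. A single-term field matches only its exact full value, which suits a field like url but not free text like content.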

On Tue, May 26, 2009 at 2:00 PM, KingKory <cbkprice [at] gmail> wrote:

> Are you saying that [abcd efgh] is read by Lucene as [abcd] OR [efgh]?
> Space characters are read as an OR operator?
>


cbkprice at gmail

May 26, 2009, 3:05 PM

Post #5 of 11
Re: Lucene.NET Integration

Ted Dunning wrote:
>
> Yes.
>
> If you want a term with a space in the middle, you need to have a special
> purpose analyzer and likely won't be able to use the normal query parser.
>
Ted, thanks again.

If I used something like this to instantiate my parser:

Dim searcher As New Lucene.Net.Search.IndexSearcher("C:\SearchIndex")

'Dim query As Lucene.Net.Search.Query
'query = Lucene.Net.QueryParsers.QueryParser.Parse(mySearchQuery, "content",
'        New Lucene.Net.Analysis.Standard.StandardAnalyzer)
Dim query As New Lucene.Net.Search.FuzzyQuery( _
    New Lucene.Net.Index.Term("content", "~" & mySearchQuery & "~"), 0.35, 0)

Dim hits As Lucene.Net.Search.Hits
hits = searcher.Search(query)

'Loop through hits, and display as web page.

Would it be more successful?

Browsing through a few Lucene tutorials, I found that one major pitfall is
using different Analyzers to index and to search. With FuzzyQuery, we are not
able to specify an Analyzer.

However, every IndexWriter constructor requires an Analyzer parameter, so I
believe I would be falling into this pitfall.
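The pitfall can be shown with a toy Python model. The two analyzer functions are hypothetical stand-ins, not Lucene classes. The index holds whatever terms the indexing analyzer produced, so a query only hits when the query side produces those same terms. In this model, the index lowercases and drops punctuation, and a query that skips those steps misses documents that plainly contain the word.

```python
import re

def index_analyzer(text):
    """Lowercase, then keep runs of letters/digits (punctuation is dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def raw_analyzer(text):
    """Split on whitespace only: no lowercasing, no punctuation stripping."""
    return text.split()

def build_index(docs, analyzer):
    """Inverted index: map each term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query, analyzer):
    """OR together the posting sets of every term the analyzer yields."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return sorted(hits)

DOCS = {"a": "Lucene.NET integration", "b": "Searching with LUCENE", "c": "Other stuff"}
INDEX = build_index(DOCS, index_analyzer)
print(search(INDEX, "Lucene", index_analyzer))  # ['a', 'b']  (same analyzer)
print(search(INDEX, "Lucene", raw_analyzer))    # []          (mismatched analyzer)
```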

Will you please elaborate on the "Special Purpose Analyzer"? Are these
included with the standard Lucene API?

Thanks again,
--
View this message in context: http://www.nabble.com/Lucene.NET-Integration-tp23731090p23732277.html
Sent from the Lucene - General mailing list archive at Nabble.com.


ted.dunning at gmail

May 26, 2009, 4:25 PM

Post #6 of 11
Re: Lucene.NET Integration

I don't think so.

First, fuzzy queries don't work that way. The normal query parser will
accept ~ as a suffix operator to indicate that a term is fuzzy.

Second, you *really* should be using the same analyzer for your query
parsing as for your indexing.

Third, I don't have a clue what you are doing with the fuzzy query. Part
of that is just the Visual Basic syntax, but part of it is the code itself.
You should instantiate a query parser and then use it to parse your query.
You should not have to instantiate the fuzzy query directly. Also, it seems
that you have declared your query as a query, but then you are not
instantiating a query parser. Generally, you need the parser to form the
query.
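Ted's first point is that the query parser takes ~ as a suffix on a term. The query syntax page linked later in the thread gives roam~ and roam~0.8 as examples. Below is a toy Python model of what such a term matches; it is not Lucene's FuzzyQuery. Candidate terms are kept when a normalized edit-distance similarity reaches a threshold. The normalization used here (1 minus the distance divided by the shorter term's length) is an assumption that only resembles Lucene's. The second call mirrors the FuzzyQuery code above, which wraps the text in literal ~ characters: the tildes are compared as ordinary characters and cost edits of their own.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the shorter string."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - edit_distance(a, b) / shorter

def fuzzy_terms(term, vocabulary, min_similarity=0.5):
    """Sorted vocabulary terms whose similarity to `term` reaches the threshold."""
    return sorted(t for t in vocabulary if similarity(term, t) >= min_similarity)

VOCAB = {"roam", "foam", "roams", "room", "ream", "rome"}
print(fuzzy_terms("roam", VOCAB, 0.75))    # ['foam', 'ream', 'roam', 'roams', 'room']
print(fuzzy_terms("~roam~", VOCAB, 0.75))  # []
```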

It is not unusual for this to require some fancy footwork since few real
applications exactly match what the query parser does. The footwork often
consists of rewriting the query as parsed into something different. For
instance, you might change default field terms into references to both title
and to body text or you might have versions of the body text that are both
stemmed and not stemmed and want to query both. Another area where
fanciness can be required is for cases where you have different analyzers
for different fields.
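One of the rewrites Ted mentions sends each default-field term to both the title and the body text. Below is a Python sketch of it on a toy query representation made of nested tuples; that representation is an assumption, not Lucene's Query classes. The field names title, content, and url follow the indexing code in the first post. Each bare term becomes an OR of the same term in every target field, and terms that already name a field are left alone.

```python
DEFAULT_FIELD = None  # marks terms the user typed without a field prefix

def expand_default_field(query, fields=("title", "content")):
    """Rewrite bare terms into an OR over `fields`; recurse into boolean nodes."""
    kind = query[0]
    if kind == "term":
        _, field, text = query
        if field is DEFAULT_FIELD:
            return ("or", [("term", f, text) for f in fields])
        return query
    # Boolean node: ("and" | "or", [children])
    return (kind, [expand_default_field(child, fields) for child in query[1]])

def to_syntax(query):
    """Render the toy query in Lucene-style syntax for inspection."""
    kind = query[0]
    if kind == "term":
        _, field, text = query
        return text if field is DEFAULT_FIELD else f"{field}:{text}"
    joiner = " AND " if kind == "and" else " OR "
    return "(" + joiner.join(to_syntax(child) for child in query[1]) + ")"

parsed = ("and", [("term", DEFAULT_FIELD, "lucene"), ("term", "url", "nabble")])
print(to_syntax(expand_default_field(parsed)))
# ((title:lucene OR content:lucene) AND url:nabble)
```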




--
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)


cbkprice at gmail

May 26, 2009, 4:45 PM

Post #7 of 11
Re: Lucene.NET Integration

Ted Dunning wrote:
>
> It is not unusual for this to require some fancy footwork since few real
> applications exactly match what the query parser does.
>
Thanks Ted.

I guess the question becomes: Is there any resource out there that describes
the syntax used for the Lucene StandardAnalyzer?

I have no problem footworkin' these search terms into whatever form
necessary.

Thanks for your help.
--
View this message in context: http://www.nabble.com/Lucene.NET-Integration-tp23731090p23733463.html
Sent from the Lucene - General mailing list archive at Nabble.com.


ted.dunning at gmail

May 26, 2009, 5:04 PM

Post #8 of 11
Re: Lucene.NET Integration

http://lucene.apache.org/java/2_3_2/queryparsersyntax.html

You may be able to succeed by escaping the space with a backslash.
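Ted's backslash suggestion can be sketched as a hypothetical Python helper; it is not a Lucene or Lucene.NET function. The query syntax page above lists the characters that must be escaped to be taken literally: + - && || ! ( ) { } [ ] ^ " ~ * ? : \. The helper escapes each of those, plus spaces when asked. As Ted suggests, that may let the parser treat abcd\ efgh as one term rather than two. This covers only the string handling; what the analyzer then does with that term is a separate matter.

```python
# Characters from the query parser syntax page. '&' and '|' are escaped one at
# a time, which also covers the two-character operators && and ||.
SPECIAL = set('+-&|!(){}[]^"~*?:\\')

def escape_query(text, escape_spaces=False):
    """Backslash-escape query-syntax characters, and optionally spaces."""
    out = []
    for ch in text:
        if ch in SPECIAL or (escape_spaces and ch == " "):
            out.append("\\")
        out.append(ch)
    return "".join(out)

print(escape_query("abcd efgh", escape_spaces=True))  # abcd\ efgh
print(escape_query("c++ (beta)"))                     # c\+\+ \(beta\)
```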



cbkprice at gmail

May 27, 2009, 8:52 AM

Post #9 of 11
Re: Lucene.NET Integration

Ted Dunning wrote:
>
> http://lucene.apache.org/java/2_3_2/queryparsersyntax.html
>
> You may be able to succeed by escaping the space with a backslash.
>
Thanks Ted.

So to summarize, if I created my parser like this:

Dim searcher As New Lucene.Net.Search.IndexSearcher("C:\SearchIndex")

Dim query As Lucene.Net.Search.Query
query = Lucene.Net.QueryParsers.QueryParser.Parse(mySearchQuery, "content", _
    New Lucene.Net.Analysis.Standard.StandardAnalyzer())

Dim hits As Lucene.Net.Search.Hits
hits = searcher.Search(query)

Using the same Analyzer I used to create the index, I should be able to
format mySearchQuery in such a way to perform valid searches on the index?

I went through the link you provided. Will you give me an example of how I
would need to format [abcd efgh] to search the index (using the integration
above) for abcd efgh instead of [abcd] OR [efgh]?

Thanks again.
--
View this message in context: http://www.nabble.com/Lucene.NET-Integration-tp23731090p23745328.html
Sent from the Lucene - General mailing list archive at Nabble.com.


ted.dunning at gmail

May 27, 2009, 8:57 AM

Post #10 of 11
Re: Lucene.NET Integration

I really can't help you with the details of .NET, nor with going over
example code. Get the Lucene book, and check out all the tutorials on the web.

Consider dumping .NET if you have trouble translating all of the available
Java examples.

Also, the [ ] notation is outside the query syntax. It is used to indicate
the region in which you should interpret the text as a query.

On Wed, May 27, 2009 at 8:52 AM, KingKory <cbkprice [at] gmail> wrote:

>
> ...
> Using the same Analyzer I used to create the index, I should be able to
> format mySearchQuery in such a way to perform valid searches on the index?
>
> I went through the link you provided, will you give me an example of how I
> would need to format [abcd efgh] to search the index (using the integration
> above) for abcd efgh instead of [abcd] OR [efgh]?
>
>
>


cbkprice at gmail

May 27, 2009, 9:07 AM

Post #11 of 11
Re: Lucene.NET Integration

Ted Dunning wrote:
>
> I really can't help you with the details of .NET, nor with going over
> example code. Get the Lucene book, and check out all the tutorials on the web.
>
> Consider dumping .NET if you have trouble translating all of the available
> Java examples.
>
> Also, the [ ] notation is outside the query syntax. It is used to indicate
> the region in which you should interpret the text as a query.
>
Great, thanks for all your help Ted.

I believe I've got it up and going.

After wrapping the mySearchQuery parameter in Chr(34) characters (double
quotes), it seems to have the desired effect.
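A toy Python model shows what the surrounding quotes change; it is not Lucene code, and the helper names are hypothetical. A quoted query is read as a phrase, which matches only documents where the words sit next to each other in that order. An unquoted query matches any document containing either word. The sketch records word positions per document, wraps the user's text in double quotes (what Chr(34) produces in VB), and checks for adjacent positions.

```python
def positions(text):
    """Map each lowercase whitespace-separated term to the positions it occurs at."""
    pos = {}
    for i, term in enumerate(text.lower().split()):
        pos.setdefault(term, []).append(i)
    return pos

def phrase_match(doc_text, phrase_terms):
    """True if phrase_terms occur consecutively, in order, somewhere in doc_text."""
    pos = positions(doc_text)
    if not phrase_terms or phrase_terms[0] not in pos:
        return False
    return any(
        all(start + k in pos.get(t, []) for k, t in enumerate(phrase_terms))
        for start in pos[phrase_terms[0]]
    )

def run_query(docs, query):
    """Quoted query -> phrase match; otherwise OR of the individual terms."""
    if len(query) >= 2 and query[0] == query[-1] == '"':
        terms = query[1:-1].lower().split()
        return sorted(d for d, text in docs.items() if phrase_match(text, terms))
    terms = set(query.lower().split())
    return sorted(d for d, text in docs.items() if terms & set(positions(text)))

DOCS = {"p1": "abcd efgh ijkl", "p2": "efgh abcd", "p3": "abcd only"}
user_text = "abcd efgh"
print(run_query(DOCS, user_text))              # ['p1', 'p2', 'p3']
print(run_query(DOCS, '"' + user_text + '"'))  # ['p1']
```

If the user's own text can contain a double quote, it would need escaping before being wrapped, because an embedded quote would end the phrase early. This sketch does not handle that case.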

Thanks again, this question can be considered closed.

--
View this message in context: http://www.nabble.com/Lucene.NET-Integration-tp23731090p23745638.html
Sent from the Lucene - General mailing list archive at Nabble.com.
