
Mailing List Archive: Lucene: General

Preliminary, fundamental question about the demo

 

 



dvergnaud at yahoo

Sep 8, 2008, 1:16 AM

Post #1 of 3 (322 views)
Preliminary, fundamental question about the demo

Hi,

I just started with Lucene today, and the first thing I did was try out the
small demo. I followed the instructions in "Getting started - Building and
Installing the Basic Demo" by the letter -- I downloaded the JAR files
(2.3.2), unpacked and launched the indexer on the src directory -- worked
fine, indexed all java files in the directory and its subdirectories. I
didn't try to search for a swearword, but I did try to search for "vector".
The fact that I got only one result whereas the demo says I should get a
bunch of them isn't really the problem. The problem is that I got only one
result although the word "vector" appears in TWO documents:
src/demo/org/apache/lucene/demo/html/HTMLParser.java
src/demo/org/apache/lucene/demo/SearchFiles.java
(I checked that with grep)

When I enter my query, I get a very clear answer:
Enter query:
vector
Searching for: vector
1 total matching documents
1. src/demo/org/apache/lucene/demo/SearchFiles.java

grep's version:
[silenos:apache/lucene/demo] veda> pwd
/home/veda/lucene/lucene-2.3.2/src/demo/org/apache/lucene/demo
[silenos:apache/lucene/demo] veda> grep -i vector * */*
SearchFiles.java: * are all identical, then single norm vector may be shared. */
html/HTMLParser.java: private java.util.Vector jj_expentries = new java.util.Vector();
[silenos:apache/lucene/demo] veda>


So my question is a very easy one: what happened? Is there a special
processing for java files, like for HTML documents, which leaves comments
out? Is that a bug only in the "demo" part of this small program (this would
be surprising, as other queries seem to be working fine)? Is there actually
a way I can check the content of my index -- what files were actually
indexed, or search for a file in particular? A bit like a field search, but
with the URI of the file itself (though I think I read this is
implementation-dependent, that means one could do it programmatically, but
it's not in the demo, right?)?

Anyway, thx for your answers. I hope there is a good one to this question,
cos I'd feel rather deceived if a search engine so obviously ignores some
results...

David
--
View this message in context: http://www.nabble.com/Preliminary%2C-fundamental-question-about-the-demo-tp19367781p19367781.html
Sent from the Lucene - General mailing list archive at Nabble.com.


dvergnaud at yahoo

Sep 8, 2008, 1:21 AM

Post #2 of 3 (305 views)
Re: Preliminary, fundamental question about the demo [In reply to]

ok, my mistake. apparently the dot '.' is not considered a separator, so
documents containing "java.util.Vector" will *not* be matched by a search
for "vector". quite surprising if you ask me, but well, this can most
probably be changed...
D


hossman_lucene at fucit

Sep 10, 2008, 2:11 PM

Post #3 of 3 (282 views)
Re: Preliminary, fundamental question about the demo [In reply to]

Hello,

Two things you should know:

1) this is the general[at]lucene list -- it's the starting point for people
with questions about the entire Lucene project where they really have no
idea where to get started. You seem to be asking about the Lucene-Java
demo code, so i'm assuming you are interested in writing java code that
uses the Lucene search library to build your own applications. In that
case, your best bet for future assistance is the java-user[at]lucene mailing
list. (if i'm wrong, and you are more interested in using applications
already built with the Lucene library such as Solr or Nutch, or in using
the .Net port of the library, these subprojects all have their own
mailing lists as well)...

http://lucene.apache.org/mail.html

2) regarding this comment...

: ok, my mistake. apparently the dot '.' is not considered a separator, so
: documents containing "java.util.Vector" will *not* be matched by a search
: for "vector". quite surprising if you ask me, but well, this can most
: probably be changed...

That is a specific behavior of the "Analyzer" used when analyzing the
text; it most certainly can be changed, and there is a wide variety of
Analyzers available that come with Lucene (particularly in the analysis
contrib package)
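
To make the effect concrete, here is a minimal sketch in plain Java. It does not use Lucene at all and is not the code of any Lucene Analyzer; the class DotTokenDemo and both method names are made up for illustration. It only imitates the two behaviours being discussed: one tokenizer that leaves a dotted name such as "java.util.Vector" as a single term, and one that treats '.' as a separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: imitates two tokenization policies,
// it is NOT the implementation of any Lucene Analyzer.
class DotTokenDemo {

    /** Splits on whitespace only, so a dotted name stays one term. */
    public static List<String> keepDottedNames(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Trim punctuation at the edges, but keep dots inside the token.
            String term = part.replaceAll("^[^a-z0-9]+|[^a-z0-9]+$", "");
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    /** Splits on every non-alphanumeric character, so '.' separates terms. */
    public static List<String> splitOnNonLetters(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // The line from HTMLParser.java that grep found.
        String line = "private java.util.Vector jj_expentries = new java.util.Vector();";

        System.out.println("dotted names kept: " + keepDottedNames(line));
        System.out.println("  matches 'vector': " + keepDottedNames(line).contains("vector"));

        System.out.println("split on '.':      " + splitOnNonLetters(line));
        System.out.println("  matches 'vector': " + splitOnNonLetters(line).contains("vector"));
    }
}
```

With the first policy the only term containing "vector" is the whole "java.util.vector", so a query for "vector" finds nothing in that line; with the second policy "vector" is a term of its own and the query matches. In real Lucene code the equivalent choice is made by picking the Analyzer, and the same Analyzer should be used for indexing and for parsing queries.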

The other oddity to arise from what you are seeing is that recent
versions of Lucene have reduced the usage of the Vector class quite a bit,
but the tutorial still uses it as an example. i'll commit a quick fix for
that.




-Hoss
