
Mailing List Archive: Lucene: General

Extreme Beginner Needs Lucene Help

 

 



brenkelly+lucene at gmail

Sep 29, 2011, 12:31 AM

Post #1 of 4
Extreme Beginner Needs Lucene Help

I'm trying to write a Lucene program to index a large .txt file.

Really, it should be extremely basic, I just want to learn how to use Lucene
but I'm getting all sorts of strange errors when I try simple lines of code.

Here's an example:

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
boolean recreateIndexIfExists = true;
IndexWriter indexWriter = new IndexWriter("/indexDirectory", analyzer,
recreateIndexIfExists);

From what I read in the documentation and the official Lucene book, there
should be nothing wrong with that. The error I get most often is "the
IndexWriter(String, StandardAnalyzer, boolean) constructor is not defined."


So I'm going to guess the problem lies elsewhere.

Here's a confession: I've never used any external jars when writing Java
programs before, nor have I ever needed to edit my PATH or CLASSPATH
variables. So I'm going to run through a list of things I've done to try to
"connect" Lucene; hopefully that will rule out some suggestions you might
have. But first:

- I am using Eclipse, though I also tried to run the Indexer.java program
downloaded from Manning's site in TextPad, and the problem persisted with
the same flavor of error.
- I am on Windows.
- I am using Java version 1.7.
- I am using Lucene version 3.4.0.

Here are the things I've done so far:

1) Added the Lucene core and lucene demo jars to my system classpath. (While
I'm reasonably certain I did this correctly, I'm at the point where I just
feel like I've made a really stupid mistake and don't want to leave anything
out, so here is what my CLASSPATH looks like now):

.;C:\Program Files (x86)\Java\jre6\lib\ext\QTJava.zip;C:\Users\Nathan\Documents\School\Lucene\lucene-3.4.0\lucene-core-3.4.0.jar;C:\Users\Nathan\Documents\School\Lucene\lucene-3.4.0\contrib\demo\lucene-demo-3.4.0.jar;

(unrelated question: I'm using java 1.7 in my PATH variable. I don't know
why that reference to jre6 is still there but I didn't want to remove it.
The same folder in jre7 doesn't contain QTJava.zip so I figured it couldn't
be harmful. Anyone know what I should do there?)

2) (Somewhat related to the unrelated question:) I added both of the .jars
to the \lib\ext folders of both the jre6 folder referenced in my path and
the jre7 folder that presumably should be. I read somewhere this is
essentially the same as adding to the CLASSPATH and I did so when I was
having trouble figuring out just wtf I was doing (which I still am).

3) I configured the Build Path for my java project to include all relevant
jars (the core and demo jars as well as their javadoc jars)

4) I have import statements out the wazoo. I mention this because it took me
a while to figure out that I needed them because the book doesn't include
them nor does it even mention their existence. Remember, this is my first
time working with external code like this. I'm sure I seem fully retarded
but I guess I thought that referencing the jars gave me everything I needed
out of the box.

5) I found this http://jacobian.web.id/2010/08/09/how-to-use-lucene-part-1/
website and copied and pasted the code to see if it would work and it works
flawlessly. It's simply not useful because it creates the index in memory,
manually adds documents, and prints results. Since my biggest problem right
now is the index directory and referencing the file to be indexed, that
flawless program just doesn't help me.
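
With jars listed in CLASSPATH, copied into two lib\ext folders, and added to the Eclipse build path, it can be hard to tell which copy the JVM actually picks up. The following is a minimal diagnostic sketch that uses only the standard library. The class name ClasspathCheck and its two helper methods are made up for illustration, and the default class name simply assumes you are looking for Lucene's IndexWriter.

```java
import java.io.File;
import java.security.CodeSource;

public class ClasspathCheck {

    /** Returns true if the named class can be located by this JVM. */
    public static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            return false;
        }
    }

    /** Returns the jar or directory a class was loaded from, or a note for JDK classes. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(JDK runtime, no code source)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        // Which Java is really running this program?
        System.out.println("java.version = " + System.getProperty("java.version"));

        // Every entry the JVM was started with, one per line
        String classPath = System.getProperty("java.class.path");
        for (String entry : classPath.split(File.pathSeparator)) {
            System.out.println("classpath entry: " + entry);
        }

        // Class to look for; pass a different fully qualified name as the first argument
        String name = args.length > 0 ? args[0] : "org.apache.lucene.index.IndexWriter";
        if (isClassAvailable(name)) {
            System.out.println(name + " loaded from " + locationOf(Class.forName(name)));
        } else {
            System.out.println(name + " is NOT visible to this JVM");
        }
    }
}
```

Run it from the same Eclipse project, or from a command prompt with the same CLASSPATH, that shows the errors. If the reported location is a jar in lib\ext rather than the one on the build path, the extra copies are worth removing.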

So I'm really just at a loss. I've found several other websites with some
sample "getting started" programs but they all give me various amounts of
similarly confusing errors.

This
http://www.avajava.com/tutorials/lessons/how-do-i-use-lucene-to-index-and-search-text-files.html?page=1
site was particularly useful as it had pics of what my project should look
like in Eclipse.

Thank you for any help you can offer. I'm sorry if it hurts your head to see
something as stupid as this. I promise my head is hurting right now, too.




Oh and lastly, since I don't seem to have any shame, here's another probably
newbish question:

When Eclipse tells me that something is 'deprecated' (e.g. The Field
Version.LUCENE_CURRENT is deprecated), what does that mean? It's just a
warning and Eclipse suggests that I just Suppress Deprecation Warnings as a
quick fix but I'd rather know what's up. Some of the sample code I tested
had a StandardAnalyzer with nothing being passed and those gave me errors. I
had to add the Version.LUCENE_CURRENT to fix them. Additionally, the java
documentation for Lucene is riddled with deprecation this and deprecation
that. If I had to guess, I would say it's like doing something a quick and
dirty way that's kind of frowned upon.

Thanks again



--
View this message in context: http://lucene.472066.n3.nabble.com/Extreme-Beginner-Needs-Lucene-Help-tp3378560p3378560.html
Sent from the Lucene - General mailing list archive at Nabble.com.


Jason.Sendros at VerizonWireless

Sep 29, 2011, 7:16 AM

Post #2 of 4
RE: Extreme Beginner Needs Lucene Help

Well you certainly provided a lot of information for us to work with
here! I suggest picking up Lucene In Action since it's a great
introduction to Lucene and will resolve a lot of your beginner
questions. It's a fairly quick read and a great reference as well.

1) Could you please provide the exact errors, exceptions, and stack
traces occurring here, along with any code associated with them?

2) Lucene and Java 7 don't play nice together yet. Try Java 6 for now,
even if this is not what is causing your headaches.
http://lucene.apache.org/#28+July+2011+-+WARNING%3A+Index+corruption+and+crashes+in+Apache+Lucene+Core+%2F+Apache+Solr+with+Java+7

3) Deprecated (with regard to ANY code you see in the future) means the
developers discourage using that code and will probably remove it in an
upcoming release. Typically, this means there are safer, more efficient,
or otherwise improved ways of completing the same task you're trying to
complete. Rarely, it means the feature you're trying to use is being
removed entirely. Deprecated code still compiles and runs, but there is
no guarantee that it will keep working after an upgrade. Try to use
nondeprecated code whenever you can.

4) To use Lucene in your project in Eclipse, right-click on your
project, go to Properties -> Java Build Path -> Libraries -> Add
External JARs.

Jason
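
To make the deprecation point concrete, here is a small self-contained sketch of what a deprecated method and its replacement look like in plain Java. Greeter and DeprecationDemo are hypothetical names invented for this example and have nothing to do with Lucene's own classes.

```java
/** Hypothetical library class with an old entry point and its replacement. */
class Greeter {

    /** @deprecated use {@link #greet(String, String)} and pass the greeting explicitly. */
    @Deprecated
    static String greet(String name) {
        // The old method is kept so existing callers still compile; it delegates to the new one.
        return greet("Hello", name);
    }

    /** Preferred replacement. */
    static String greet(String greeting, String name) {
        return greeting + ", " + name + "!";
    }
}

public class DeprecationDemo {
    public static void main(String[] args) {
        // Compiles and runs, but javac reports a deprecation note (details with
        // -Xlint:deprecation) and Eclipse strikes the call through.
        System.out.println(Greeter.greet("Lucene"));

        // The replacement produces the same result without any warning.
        System.out.println(Greeter.greet("Hello", "Lucene"));
    }
}
```

Both calls print "Hello, Lucene!". The warning is only a signal that the first form may disappear in a later release, which is why suppressing it is a weaker fix than switching to the replacement.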



brenkelly+lucene at gmail

Sep 29, 2011, 11:01 AM

Post #3 of 4
RE: Extreme Beginner Needs Lucene Help

First, thanks for your reply. A few quick notes:

- I have the book Lucene in Action. Additionally, I have all the source code
for the book. They don't really address my problems.
- I already configured the build path in Eclipse.
- I was pretty sure that Lucene 3.4.0 addressed many of the Java 7 issues.

Thanks for the explanation on deprecation, very informative.

Here is the Indexer.java program given in the source for Lucene in Action. I
edited this to run from a JAVA IDE instead of command line (but that only
meant changing 2 files and removing the argument accepting). I'll mark
errors with *****!!!******errors******!!!*****




import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

import java.io.File;
import java.io.IOException;
import java.io.FileReader;
import java.util.Date;

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Indexer {

    public static void main(String[] args) throws Exception {

        File indexDir = new File("indexDirectory");
        File dataDir = new File("filestobeindexed");

        long start = new Date().getTime();
        int numIndexed = index(indexDir, dataDir);
        long end = new Date().getTime();

        System.out.println("Indexing " + numIndexed + " files took "
            + (end - start) + " milliseconds");
    }

    public static int index(File indexDir, File dataDir)
            throws IOException {

        if (!dataDir.exists() || !dataDir.isDirectory()) {
            throw new IOException(dataDir
                + " does not exist or is not a directory");
        }

        IndexWriter writer = new IndexWriter("/index",
            new StandardAnalyzer(Version.LUCENE_CURRENT), true,
            IndexWriter.MaxFieldLength.LIMITED);

*****!!!******The constructor IndexWriter(String, StandardAnalyzer, boolean,
IndexWriter.MaxFieldLength) is undefined, MaxFieldLength.LIMITED is
deprecated, MaxFieldLength is deprecated, Version cannot be resolved to a
variable.******!!!*****

        writer.setUseCompoundFile(false);

        indexDirectory(writer, dataDir);

        int numIndexed = writer.docCount();

*****!!!******The method docCount() is undefined for the type
IndexWriter******!!!*****

        writer.optimize();
        writer.close();
        return numIndexed;
    }

    private static void indexDirectory(IndexWriter writer, File dir)
            throws IOException {

        File[] files = dir.listFiles();

        for (int i = 0; i < files.length; i++) {
            File f = files[i];
            if (f.isDirectory()) {
                indexDirectory(writer, f); // recurse
            } else if (f.getName().endsWith(".txt")) {
                indexFile(writer, f);
            }
        }
    }

    private static void indexFile(IndexWriter writer, File f)
            throws IOException {

        if (f.isHidden() || !f.exists() || !f.canRead()) {
            return;
        }

        System.out.println("Indexing " + f.getCanonicalPath());

        Document doc = new Document();
        doc.add(Field.Text("contents", new FileReader(f)));

*****!!!******The method Text(String, FileReader) is undefined for the type
Field******!!!*****

        doc.add(Field.Keyword("filename", f.getCanonicalPath()));

*****!!!******The method Keyword(String, String) is undefined for the type
Field******!!!*****

        writer.addDocument(doc);
    }
}
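
Errors of the form "constructor ... is undefined" or "method ... is undefined" mean the class was found but does not offer that signature, so they point at an API difference rather than a classpath problem. One way to see what a class really offers is to list its public constructors with reflection. This is only a sketch: ApiLister and publicConstructors are hypothetical names, and java.io.FileReader stands in as the default so the program runs without any extra jars.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ApiLister {

    /** Returns the public constructor signatures of a class, sorted for stable output. */
    public static List<String> publicConstructors(Class<?> type) {
        List<String> signatures = new ArrayList<String>();
        for (Constructor<?> c : type.getConstructors()) {
            StringBuilder sb = new StringBuilder(type.getSimpleName()).append('(');
            Class<?>[] params = c.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(params[i].getSimpleName());
            }
            signatures.add(sb.append(')').toString());
        }
        Collections.sort(signatures);
        return signatures;
    }

    public static void main(String[] args) throws Exception {
        // Pass a fully qualified class name as the first argument to inspect another class
        String name = args.length > 0 ? args[0] : "java.io.FileReader";
        for (String signature : publicConstructors(Class.forName(name))) {
            System.out.println(signature);
        }
    }
}
```

Running it with org.apache.lucene.index.IndexWriter as the argument, with the Lucene jar on the classpath, lists the constructors that jar provides, which can then be compared against the ones the tutorial code calls.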

--
View this message in context: http://lucene.472066.n3.nabble.com/Extreme-Beginner-Needs-Lucene-Help-tp3378560p3380039.html
Sent from the Lucene - General mailing list archive at Nabble.com.


Jason.Sendros at VerizonWireless

Sep 29, 2011, 11:18 AM

Post #4 of 4
RE: Extreme Beginner Needs Lucene Help

Looks like you're trying to write code using old Lucene syntax with the
newest available jar.

Your options are to learn new Lucene:
http://lucene.apache.org/java/3_4_0/api/all/overview-summary.html
Or to use an older version of Lucene that supports the things you're
trying to do: http://archive.apache.org/dist/lucene/java/2.9.4/

Try sticking with Java 6 for now. You will avoid plenty of headaches!

I might suggest using the older version for now since it seems your
tutorials and learning guides are using these older versions. Once you
learn enough about Lucene, you can migrate your code to a newer version
of Lucene.

Jason
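
For anyone who takes the "learn new Lucene" route, here is a rough sketch of how the Indexer above might look against the 3.4 API. It is not the book's code: the class name Indexer34 is made up, the directory names are the ones from the post, and it assumes lucene-core-3.4.0.jar is on the build path. The calls that replace the undefined ones are IndexWriterConfig plus FSDirectory.open (instead of the String-path constructor), the Field constructors (instead of Field.Text and Field.Keyword), and numDocs() (instead of docCount()).

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer34 {

    public static void main(String[] args) throws IOException {
        File indexDir = new File("indexDirectory");   // where the index is written
        File dataDir = new File("filestobeindexed");  // folder holding the .txt files
        System.out.println("Indexed " + index(indexDir, dataDir) + " files");
    }

    public static int index(File indexDir, File dataDir) throws IOException {
        if (!dataDir.isDirectory()) {
            throw new IOException(dataDir + " does not exist or is not a directory");
        }
        // The index location is a Directory object, not a String path
        Directory dir = FSDirectory.open(indexDir);
        // Pin a concrete version instead of the deprecated Version.LUCENE_CURRENT
        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_34, new StandardAnalyzer(Version.LUCENE_34));
        // CREATE replaces any existing index, like the old boolean "create" flag set to true
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

        IndexWriter writer = new IndexWriter(dir, config);
        try {
            indexDirectory(writer, dataDir);
            return writer.numDocs();
        } finally {
            writer.close();
        }
    }

    private static void indexDirectory(IndexWriter writer, File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            return; // not a directory, or it could not be read
        }
        for (File f : files) {
            if (f.isDirectory()) {
                indexDirectory(writer, f); // recurse into subfolders
            } else if (f.getName().endsWith(".txt")) {
                indexFile(writer, f);
            }
        }
    }

    private static void indexFile(IndexWriter writer, File f) throws IOException {
        if (f.isHidden() || !f.canRead()) {
            return;
        }
        Document doc = new Document();
        // Tokenized and indexed from the reader, not stored
        doc.add(new Field("contents", new FileReader(f)));
        // Stored and indexed as a single untokenized term
        doc.add(new Field("filename", f.getCanonicalPath(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
    }
}
```

Using Field.Store.YES with Field.Index.NOT_ANALYZED for the filename is my reading of what the old Field.Keyword did; adjust it if the field should be searched differently.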


