
Mailing List Archive: Lucene: Java-User

Words Frequency Problem

 

 



iamaslamok at yahoo

Aug 17, 2006, 11:13 PM

Post #1 of 6
Words Frequency Problem

Dear All,
I am new to Lucene. I am searching for the word "circle" in my indexed documents. The search returns 4 matching documents (Hits). Now I want to know how many occurrences there are in each document, i.e. the frequency of the word in each result document. Please give me suggestions.

Thanks...




DORONC at il

Aug 17, 2006, 11:25 PM

Post #2 of 6
Re: Words Frequency Problem [In reply to]

See
http://www.nabble.com/Accessing-%22term-frequency-information%22-for-documents-tf1964461.html#a5390696


- Doron



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


iamaslamok at yahoo

Aug 17, 2006, 11:41 PM

Post #3 of 6
Re: Words Frequency Problem [In reply to]

Thanks Doron,

My code is below. Please tell me where to add or modify it to use TermFreqVector.

IBasicResultSet result = new BasicResultSetImpl(false);
try
{
    Searcher searcher = new IndexSearcher(indexPath);
    Query query = QueryParser.parse(searchedText, TextContentIndexer.CONTENT_TEXT, analyzer);

    Hits hits = searcher.search(query);
    int noOfHits = hits.length();

    for (int i = 0; i < noOfHits; i++)
    {
        Document doc = hits.doc(i);

        String uri = doc.get(TextContentIndexer.URI_FIELD);
        IBasicQuery q = factory.getQuery();
        String scope = q.getSearchToken().getSlideContext().getSlidePath(q.getScope().getHref());
        // Keep only hits whose URI lies inside the requested scope
        if (uri.startsWith(scope)) {
            RequestedResource resource = createResource(uri);
            result.add(resource);
        }
    }
}


I have another question: if I search for "circle red color", I think it uses "OR" by default. If I want to search for the exact phrase "red color", how can I write the query?
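
For reference, in Lucene's query-parser syntax a phrase is written inside double quotes (for example "red color"), while unquoted terms are combined with OR by default. The difference in matching behaviour can be sketched in plain Java without Lucene. The class and method names below are hypothetical, and the token lists stand in for whatever an analyzer would produce; this is only an illustration of the two semantics, not how Lucene evaluates queries internally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of OR matching versus phrase matching. */
final class PhraseVsOrSketch {

    /** OR semantics: true if at least one query term occurs anywhere in the document. */
    static boolean matchesAny(List<String> docTokens, List<String> queryTerms) {
        for (String term : queryTerms) {
            if (docTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    /** Phrase semantics: true only if the terms occur adjacent to each other and in order. */
    static boolean matchesPhrase(List<String> docTokens, List<String> phrase) {
        return !phrase.isEmpty() && Collections.indexOfSubList(docTokens, phrase) >= 0;
    }

    public static void main(String[] args) {
        List<String> doc1 = Arrays.asList("a", "red", "color", "circle");
        List<String> doc2 = Arrays.asList("color", "of", "the", "red", "circle");
        List<String> query = Arrays.asList("red", "color");

        System.out.println("doc1 OR: " + matchesAny(doc1, query)
                + ", phrase: " + matchesPhrase(doc1, query));
        System.out.println("doc2 OR: " + matchesAny(doc2, query)
                + ", phrase: " + matchesPhrase(doc2, query));
    }
}
```

Both documents satisfy the OR query, but only the first contains "red" immediately followed by "color", so only it satisfies the phrase.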



soeren.pekrul at gmx

Aug 17, 2006, 11:46 PM

Post #4 of 6
Re: Words Frequency Problem [In reply to]

aslam bari wrote:
> I am searching for a word "circle" in my indexed document list. It gives me total document found 4 i.e. Hits. But now i want to get how many occurances are there in each document i.e. frequency of words in result document.

Hello Aslam,

You should store the term vector in the index as well:

doc.add(new Field("field name", "field value", Field.Store.YES,
        Field.Index.TOKENIZED, Field.TermVector.YES));

"A term vector is a list of the document's terms and their number of
occurrences in that document."

The IndexReader allows you to access the term vector of a document:

TermFreqVector IndexReader.getTermFreqVector(int docNumber, String field)
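
To make the idea of a term vector concrete, here is a minimal plain-Java model of what such a vector holds for one document: a sorted list of distinct terms and a parallel array of counts. It loosely mirrors the shape of Lucene's TermFreqVector (which exposes getTerms(), getTermFrequencies() and indexOf(String)), but the class name, the crude tokenizer and the frequencyOf helper are assumptions made for illustration, not Lucene code.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical plain-Java model of a per-document term frequency vector. */
final class TermVectorSketch {
    private final String[] terms; // distinct terms, sorted
    private final int[] freqs;    // freqs[i] = occurrences of terms[i]

    TermVectorSketch(String text) {
        // Crude analysis step (assumption): lower-case, split on non-letter/non-digit runs.
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        terms = counts.keySet().toArray(new String[0]);
        freqs = new int[terms.length];
        int i = 0;
        for (int count : counts.values()) {
            freqs[i++] = count;
        }
    }

    /** Returns how often the term occurs in the document, or 0 if it is absent. */
    int frequencyOf(String term) {
        int idx = Arrays.binarySearch(terms, term.toLowerCase());
        return idx >= 0 ? freqs[idx] : 0;
    }

    public static void main(String[] args) {
        TermVectorSketch vector = new TermVectorSketch("A red circle inside a blue circle.");
        System.out.println("circle occurs " + vector.frequencyOf("circle") + " time(s)");
    }
}
```

With a real index, the same lookup would presumably be done per hit by passing the hit's document number and the content field name to getTermFreqVector and then locating the term in the returned vector.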

I hope it helps.

Sören



iamaslamok at yahoo

Aug 18, 2006, 12:28 AM

Post #5 of 6
Re: Words Frequency Problem [In reply to]

Hi Sören,
Thanks a lot for the help.
As you suggested, I wrote the code, but I find that Field does not contain Store, Index, YES, etc. It only contains Field.Keyword, Field.Text, etc. Am I missing something? My code is below.

indexWriter = new IndexWriter(indexpath, analyzer, false);
// Create the document
Document doc = new Document();
doc.add(Field.Keyword(URI_FIELD, uri.toString()));
doc.add(Field.Text(CONTENT_TEXT, readContent(revisionDescriptor, revisionContent)));

// This line gives an error:
/* doc.add(new Field("frequency", "field value", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES)); */

if (revisionContent != null && revisionDescriptor != null) {
    List extractor = ExtractorManager.getInstance().getContentExtractors(uri.getNamespace().getName(), (NodeRevisionDescriptors)null, revisionDescriptor);
    for (int i = 0, l = extractor.size(); i < l; i++) {
        Reader reader = ((ContentExtractor)extractor.get(i)).extract(new ByteArrayInputStream(revisionContent.getContentBytes()));
        doc.add(Field.Text(CONTENT_TEXT, reader));
    }
}
indexWriter.addDocument(doc);
indexWriter.optimize();



soeren.pekrul at gmx

Aug 18, 2006, 3:14 AM

Post #6 of 6
Re: Words Frequency Problem [In reply to]

aslam bari wrote:
> As you suggest me, i written the code, but i wonder Field does not contain Store, Index YES etc. It only contains Field.Keyword, Field.Text etc. Am i missing something.

It sounds to me like you are using an older version. I use lucene-2.0.0, which
should be the latest stable release. Sorry, I don't know how to store the
term vector in earlier versions.
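
Independent of the Lucene version, the number asked for in this thread is the per-document count stored in a term's postings. To my knowledge, Lucene exposes postings through IndexReader.termDocs(Term), where TermDocs.doc() and TermDocs.freq() give the document number and the count, and this does not depend on stored term vectors. The sketch below models that structure in plain Java; the class name, the tokenizer and the sample texts are assumptions for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical miniature inverted index: term -> (document number -> occurrences). */
final class PostingsSketch {
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int nextDocNumber = 0;

    /** Indexes the text and returns the document number assigned to it. */
    int addDocument(String text) {
        int docNumber = nextDocNumber++;
        // Crude analysis step (assumption): lower-case, split on non-letter/non-digit runs.
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, k -> new TreeMap<>())
                    .merge(docNumber, 1, Integer::sum);
        }
        return docNumber;
    }

    /** Returns document number -> occurrences for every document containing the term. */
    Map<Integer, Integer> frequencies(String term) {
        Map<Integer, Integer> perDoc = postings.get(term.toLowerCase());
        if (perDoc == null) {
            return Collections.<Integer, Integer>emptyMap();
        }
        return perDoc;
    }

    public static void main(String[] args) {
        PostingsSketch index = new PostingsSketch();
        index.addDocument("circle circle square");
        index.addDocument("triangle only");
        index.addDocument("A circle.");
        for (Map.Entry<Integer, Integer> e : index.frequencies("circle").entrySet()) {
            System.out.println("doc " + e.getKey() + ": " + e.getValue() + " occurrence(s)");
        }
    }
}
```

The map returned for "circle" contains one entry per matching document, which corresponds to the list of hits together with the frequency the original question asks for.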
