
Mailing List Archive: Lucene: Java-User

questions about DocValues in 4.0 alpha

 

 



fancyerii at gmail

Aug 6, 2012, 2:34 AM

Post #1 of 3 (284 views)
questions about DocValues in 4.0 alpha

Hi everyone,
In Lucene 4.0 alpha I found that DocValues are available and gave
them a try. I am following the slides at
http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
I have two questions.
1. Are DocValues updatable yet?

2. How can I get the docBase of an AtomicReader?
In a Collector it is easy to get the docBase, but I need to read
DocValues after scoring. I found
AtomicReader.getTopReaderContext().docBaseInParent
and subReader.getTopReaderContext().docBase, but neither gives the
correct value.
So I have to iterate through all subReaders and use maxDoc()
to find the right subReader for a docID. Is there a better way to find
the AtomicReader that a docID belongs to? My test code:
    // Build a small index: two documents, a commit, then a third document.
    File d=new File("./testIndex");
    IndexWriterConfig cfg=new IndexWriterConfig(Version.LUCENE_40,
            new WhitespaceAnalyzer(Version.LUCENE_40));
    cfg.setOpenMode(OpenMode.CREATE);
    Directory dir=FSDirectory.open(d);
    IndexWriter writer=new IndexWriter(dir,cfg);
    FieldType titleFieldType=new FieldType();
    titleFieldType.setStored(true);
    titleFieldType.setIndexed(true);
    titleFieldType.setTokenized(true);
    titleFieldType.setOmitNorms(true);

    Document doc=new Document();
    Field f=new Field("title","a b c",titleFieldType);
    doc.add(f);

    FloatDocValuesField dvf=new FloatDocValuesField("pagerank", 0.8f);
    doc.add(dvf);

    writer.addDocument(doc);

    doc=new Document();
    doc.add(new Field("title","b d",titleFieldType));
    dvf=new FloatDocValuesField("pagerank", 0.5f);
    doc.add(dvf);
    writer.addDocument(doc);

    writer.commit();

    doc=new Document();
    doc.add(new Field("title","a c",titleFieldType));
    dvf=new FloatDocValuesField("pagerank", 0.5f);
    doc.add(dvf);
    writer.addDocument(doc);

    // Open a near-real-time reader and search for the term "a".
    DirectoryReader reader=DirectoryReader.open(writer, true);
    IndexSearcher searcher=new IndexSearcher(reader);
    Query q=new TermQuery(new Term("title","a"));
    TopDocs topDocs=searcher.search(q, 10);
    Set<String> fieldsNeedLoaded=new HashSet<String>(1);
    fieldsNeedLoaded.add("title");
    @SuppressWarnings("unchecked")
    List<AtomicReader> subReaders=(List<AtomicReader>)
            reader.getSequentialSubReaders();
    // One DocValues source per sub-reader, in sub-reader order.
    Source[] sources=new Source[subReaders.size()];
    int idx=0;
    for(AtomicReader subReader:subReaders){
        sources[idx++]=subReader.docValues("pagerank").getSource();
    }

    // Iterate over the returned hits only: totalHits can be larger than
    // scoreDocs.length when more than 10 documents match.
    for(int i=0;i<topDocs.scoreDocs.length;i++){
        int docId=topDocs.scoreDocs[i].doc;
        float score=topDocs.scoreDocs[i].score;
        //get title
        Document document=searcher.document(docId, fieldsNeedLoaded);
        System.out.println("title: " +document.get("title")+" score: "+score);
        idx=-1;
        // docBase is accumulated by hand from maxDoc() of the preceding sub-readers.
        int docBase=0;
        for(AtomicReader subReader:subReaders){
            idx++;
            //int docBase=subReader.getTopReaderContext().docBaseInParent;

            int realDoc=docId-docBase;
            if(realDoc>=0&&realDoc<subReader.maxDoc()){
                double pagerank=sources[idx].getFloat(realDoc);
                System.out.println(pagerank);
                break;
            }
            docBase+=subReader.maxDoc();
        }
    }
} // closes the enclosing method (declaration not shown)

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


uwe at thetaphi

Aug 6, 2012, 2:47 AM

Post #2 of 3 (271 views)
Re: questions about DocValues in 4.0 alpha [In reply to]

You have to call getTopReaderContext() on the DirectoryReader; you can then loop over the leaves easily using leaves(). All docBases are then relative to the DirectoryReader. If you get the top reader context from the AtomicReader itself, it is only relative to itself, which does not help.

getSequentialSubReaders might get protected before release anyway.
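
The lookup described above (each leaf has a docBase relative to the top-level reader, and a hit's docID has to be mapped to one leaf) can be sketched without any Lucene classes. This is only an illustration under stated assumptions: the class LeafLookup and the methods docBases and leafIndex are hypothetical names, and the maxDoc values {2, 1} assume the test index above ends up as one segment with two documents and one with a single document. Lucene's ReaderUtil class has a subIndex helper that appears to do a comparable lookup, but its exact signature in 4.0 alpha is not confirmed here.

```java
import java.util.Arrays;

// Plain-Java sketch; no Lucene classes are used. All names are hypothetical.
class LeafLookup {

    // docBases[i] = sum of maxDoc over all leaves before leaf i.
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = base;
            base += maxDocs[i];
        }
        return bases;
    }

    // Returns the index of the leaf that contains the top-level docId.
    // The caller must ensure docId is below the total maxDoc.
    static int leafIndex(int[] docBases, int docId) {
        if (docId < 0) {
            throw new IllegalArgumentException("docId must be >= 0");
        }
        int pos = Arrays.binarySearch(docBases, docId);
        if (pos >= 0) {
            // Exact hit on a docBase. Empty leaves share their base with the
            // next leaf, so move forward to the last leaf with this base.
            while (pos + 1 < docBases.length && docBases[pos + 1] == docId) {
                pos++;
            }
            return pos;
        }
        // Not found: pos == -(insertionPoint) - 1, and the containing leaf
        // is the one just before the insertion point.
        return -pos - 2;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {2, 1}); // assumed segment sizes
        for (int docId : new int[] {0, 2}) {      // the two hits for title:a
            int leaf = leafIndex(bases, docId);
            int localDoc = docId - bases[leaf];   // doc ID inside that leaf
            System.out.println("docId " + docId + " -> leaf " + leaf
                    + ", local doc " + localDoc);
        }
    }
}
```

With real readers, the docBase of each leaf would come from the leaf contexts of the DirectoryReader instead of being summed by hand, and the local doc ID would be passed to that leaf's DocValues source.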

Uwe




--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de


simon.willnauer at gmail

Aug 6, 2012, 4:40 AM

Post #3 of 3 (274 views)
Re: questions about DocValues in 4.0 alpha [In reply to]

hey,

On Mon, Aug 6, 2012 at 11:34 AM, Li Li <fancyerii [at] gmail> wrote:
> hi everyone,
> in lucene 4.0 alpha, I found the DocValues are available and gave
> it a try. I am following the slides in
> http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
> I have got 2 questions.
> 1. is DocValues updatable now?

No, not yet.

simon


