
Mailing List Archive: Lucene: Java-Dev

[jira] [Commented] (SOLR-3642) Count is inconsistent between facet and stats

 

 



jira at apache

Jul 18, 2012, 5:39 PM

Post #1 of 4 (69 views)
[jira] [Commented] (SOLR-3642) Count is inconsistent between facet and stats

[ https://issues.apache.org/jira/browse/SOLR-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417906#comment-13417906 ]

Yandong Yao commented on SOLR-3642:
-----------------------------------

You are right. The relevant code is below:

SchemaField fsf = searcher.getSchema().getField(facetField);
FieldType facetFieldType = fsf.getType();

// Reject tokenized or multi-valued field types before touching the FieldCache.
if (facetFieldType.isTokenized() || facetFieldType.isMultiValued()) {
  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
      "Stats can only facet on single-valued fields, not: " + facetField
      + "[" + facetFieldType + "]");
}
try {
  facetTermsIndex = FieldCache.DEFAULT.getTermsIndex(searcher.getAtomicReader(), facetField);
}
// (catch clause not shown in this excerpt)

It looks like this condition does not catch a field that is declared multiValued on the field itself rather than on its type. It should be:

if (fsf.multiValued() || facetFieldType.isTokenized() || facetFieldType.isMultiValued())
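
To illustrate why the extra fsf.multiValued() test matters: in the schema quoted in the report, multiValued="true" is set on the 'cat' field while its type is plain "string", so a check that only inspects the type would let the field through. The sketch below models that distinction with small stand-in classes. TypeInfo and FieldInfo are hypothetical names used only for this example; they are not Solr's FieldType and SchemaField, and the real classes may behave differently.

```java
public class MultiValuedCheckSketch {

    // Stand-in for a field type definition (hypothetical, not Solr's FieldType).
    static final class TypeInfo {
        final boolean tokenized;
        final boolean multiValued;

        TypeInfo(boolean tokenized, boolean multiValued) {
            this.tokenized = tokenized;
            this.multiValued = multiValued;
        }
    }

    // Stand-in for a schema field (hypothetical, not Solr's SchemaField).
    // The field carries its own multiValued flag, independent of its type.
    static final class FieldInfo {
        final String name;
        final TypeInfo type;
        final boolean multiValued;

        FieldInfo(String name, TypeInfo type, boolean multiValued) {
            this.name = name;
            this.type = type;
            this.multiValued = multiValued;
        }
    }

    // Mirrors the shape of the original condition: only the type is inspected.
    static boolean rejectedByTypeOnlyCheck(FieldInfo f) {
        return f.type.tokenized || f.type.multiValued;
    }

    // Mirrors the shape of the proposed condition: the field's own flag is inspected too.
    static boolean rejectedByProposedCheck(FieldInfo f) {
        return f.multiValued || f.type.tokenized || f.type.multiValued;
    }

    public static void main(String[] args) {
        // Models <field name="cat" type="string" ... multiValued="true"/>
        TypeInfo stringType = new TypeInfo(false, false);
        FieldInfo cat = new FieldInfo("cat", stringType, true);

        System.out.println("type-only check rejects cat: " + rejectedByTypeOnlyCheck(cat));
        System.out.println("proposed check rejects cat:  " + rejectedByProposedCheck(cat));
    }
}
```

In this model the type-only check accepts 'cat' and the proposed check rejects it, which is the behaviour the suggested condition is after.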


> Count is inconsistent between facet and stats
> ---------------------------------------------
>
> Key: SOLR-3642
> URL: https://issues.apache.org/jira/browse/SOLR-3642
> Project: Solr
> Issue Type: Bug
> Components: SearchComponents - other
> Affects Versions: 4.0-ALPHA
> Environment: 4.0 alpha on macos 10.6
> Reporter: Yandong Yao
>
> Steps to reproduce:
> 1) Download apache-solr-4.0.0-ALPHA
> 2) cd example; java -jar start.jar
> 3) cd exampledocs; ./post.sh *.xml
> 4) Use statsComponent to get the stats info for field 'popularity' based on facet 'cat'. And the 'count' for 'electronics' is 3
> http://localhost:8983/solr/collection1/select?q=cat:electronics&wt=json&rows=0&stats=true&stats.field=popularity&stats.facet=cat
> {
>   stats_fields:
>   {
>     popularity:
>     {
>       min: 0,
>       max: 10,
>       count: 14,
>       missing: 0,
>       sum: 75,
>       sumOfSquares: 503,
>       mean: 5.357142857142857,
>       stddev: 2.7902892835178013,
>       facets:
>       {
>         cat:
>         {
>           music:
>           {
>             min: 10,
>             max: 10,
>             count: 1,
>             missing: 0,
>             sum: 10,
>             sumOfSquares: 100,
>             mean: 10,
>             stddev: 0
>           },
>           monitor:
>           {
>             min: 6,
>             max: 6,
>             count: 2,
>             missing: 0,
>             sum: 12,
>             sumOfSquares: 72,
>             mean: 6,
>             stddev: 0
>           },
>           hard drive:
>           {
>             min: 6,
>             max: 6,
>             count: 2,
>             missing: 0,
>             sum: 12,
>             sumOfSquares: 72,
>             mean: 6,
>             stddev: 0
>           },
>           scanner:
>           {
>             min: 6,
>             max: 6,
>             count: 1,
>             missing: 0,
>             sum: 6,
>             sumOfSquares: 36,
>             mean: 6,
>             stddev: 0
>           },
>           memory:
>           {
>             min: 0,
>             max: 7,
>             count: 3,
>             missing: 0,
>             sum: 12,
>             sumOfSquares: 74,
>             mean: 4,
>             stddev: 3.605551275463989
>           },
>           graphics card:
>           {
>             min: 7,
>             max: 7,
>             count: 2,
>             missing: 0,
>             sum: 14,
>             sumOfSquares: 98,
>             mean: 7,
>             stddev: 0
>           },
>           electronics:
>           {
>             min: 1,
>             max: 7,
>             count: 3,
>             missing: 0,
>             sum: 9,
>             sumOfSquares: 51,
>             mean: 3,
>             stddev: 3.4641016151377544
>           }
>         }
>       }
>     }
>   }
> }
> 5) Facet on 'cat' and the count is 14. http://localhost:8983/solr/collection1/select?q=cat:electronics&wt=json&rows=0&facet=true&facet.field=cat
> {
>   cat:
>   [
>     "electronics", 14,
>     "memory", 3,
>     "connector", 2,
>     "graphics card", 2,
>     "hard drive", 2,
>     "monitor", 2,
>     "camera", 1,
>     "copier", 1,
>     "multifunction printer", 1,
>     "music", 1,
>     "printer", 1,
>     "scanner", 1,
>     "currency", 0,
>     "search", 0,
>     "software", 0
>   ]
> },
> So from StatsComponent the count for the 'electronics' cat is 3, while FacetComponent reports 14 for 'electronics'. Is this a bug?
> Following is the field definition for 'cat'.
> <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
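
One way to read these numbers: the per-facet counts in the stats response add up to 14 (1 + 2 + 2 + 1 + 3 + 2 + 3), the same as the overall count, which is consistent with each matching document being counted under exactly one 'cat' value instead of under every value it carries. The sketch below reproduces that effect on a made-up three-document data set. The documents, and the choice of which single value survives per document (the greatest in sorted order), are assumptions for illustration only and are not taken from Solr or Lucene code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetCountSketch {

    // Counts a document once under every value it carries,
    // which is what a facet on a multi-valued field is expected to do.
    static Map<String, Integer> countAllValues(List<List<String>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> values : docs) {
            for (String value : values) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Counts a document under a single value only.
    // ASSUMPTION: the surviving value is the greatest in sorted order;
    // the thread does not say which value a single-valued index would keep.
    static Map<String, Integer> countOneValuePerDoc(List<List<String>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> values : docs) {
            if (values.isEmpty()) {
                continue; // nothing to count for a document without values
            }
            String kept = Collections.max(values);
            counts.merge(kept, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up documents, each listing its 'cat' values.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("electronics", "memory"),
                Arrays.asList("electronics", "hard drive"),
                Arrays.asList("electronics"));

        System.out.println("every value counted: " + countAllValues(docs));
        System.out.println("one value per doc:   " + countOneValuePerDoc(docs));
    }
}
```

On this toy input 'electronics' is counted three times by the first method but only once by the second, while the second method's counts still sum to the number of documents. That is the same pattern as in the report above.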

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe [at] lucene
For additional commands, e-mail: dev-help [at] lucene


jira at apache

Jul 19, 2012, 7:49 PM

Post #2 of 4 (67 views)
[jira] [Commented] (SOLR-3642) Count is inconsistent between facet and stats [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418890#comment-13418890 ]

Yandong Yao commented on SOLR-3642:
-----------------------------------

Hi Hoss,

Thanks for the quick commit. One further question: if I would like to implement stats with a facet field that is multi-valued, could you please provide some guidance?

Currently StatsComponent doesn't support a multi-valued facet field because it uses FieldCache, which doesn't support multi-valued fields. Are there any alternatives?

If it is possible, I would like to create a JIRA issue for it and try to work on it.

Thanks!

Regards,
Yandong
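
As a rough sketch of what stats faceting over a multi-valued field would need to compute, the accumulator below adds a document's numeric value to the running statistics of every facet value the document carries. All class and method names are hypothetical, and nothing here uses FieldCache or any Solr API. The standard deviation uses the sample form sqrt((n*sumOfSquares - sum^2) / (n*(n-1))), which matches the figures in the report (for 'memory', n=3, sum=12 and sumOfSquares=74 give sqrt(13), about 3.6056). The example values 0, 5 and 7 are chosen to be consistent with the reported 'memory' sums; the actual documents are not shown in the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultiValuedStatsSketch {

    // Running statistics for one facet value (hypothetical helper class).
    static final class Stats {
        long count;
        double sum;
        double sumOfSquares;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        void add(double v) {
            count++;
            sum += v;
            sumOfSquares += v * v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }

        double mean() {
            return count == 0 ? 0.0 : sum / count;
        }

        // Sample standard deviation; defined as 0 for fewer than two values.
        double stddev() {
            if (count <= 1) {
                return 0.0;
            }
            double numerator = count * sumOfSquares - sum * sum;
            // Guard against a tiny negative numerator caused by rounding.
            return Math.sqrt(Math.max(0.0, numerator) / (count * (count - 1.0)));
        }
    }

    // facetValues.get(i) lists the facet values of document i;
    // statValues[i] is that document's numeric value.
    static Map<String, Stats> statsByFacet(List<List<String>> facetValues, double[] statValues) {
        Map<String, Stats> result = new TreeMap<>();
        for (int doc = 0; doc < statValues.length; doc++) {
            // A multi-valued document contributes to each of its facet values.
            for (String facet : facetValues.get(doc)) {
                result.computeIfAbsent(facet, k -> new Stats()).add(statValues[doc]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up documents that all carry two 'cat' values.
        List<List<String>> cats = Arrays.asList(
                Arrays.asList("electronics", "memory"),
                Arrays.asList("electronics", "memory"),
                Arrays.asList("electronics", "memory"));
        double[] popularity = {0, 5, 7};

        for (Map.Entry<String, Stats> e : statsByFacet(cats, popularity).entrySet()) {
            Stats s = e.getValue();
            System.out.println(e.getKey() + ": count=" + s.count + " sum=" + s.sum
                    + " mean=" + s.mean() + " stddev=" + s.stddev());
        }
    }
}
```

With this approach both 'electronics' and 'memory' see all three documents, so the per-facet counts no longer sum to the number of documents, which is expected for a multi-valued facet.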

> Count is inconsistent between facet and stats
> ---------------------------------------------
>
> Key: SOLR-3642
> URL: https://issues.apache.org/jira/browse/SOLR-3642
> Project: Solr
> Issue Type: Bug
> Components: SearchComponents - other
> Affects Versions: 4.0-ALPHA
> Environment: 4.0 alpha on macos 10.6
> Reporter: Yandong Yao
> Assignee: Hoss Man
> Fix For: 4.0, 5.0
>
> Attachments: SOLR-3642.patch
>
>


jira at apache

Jul 24, 2012, 3:15 PM

Post #3 of 4 (68 views)
[jira] [Commented] (SOLR-3642) Count is inconsistent between facet and stats [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421824#comment-13421824 ]

Hoss Man commented on SOLR-3642:
--------------------------------

Yandong: the issue I linked this one to (SOLR-1782) is open precisely to try and address this problem. There is an (old) patch there that I honestly have not had time to look at, but you may want to take a look and see if it can be brought up to date and polished up to work and have good tests.

(IIRC: the reason I never really dug into it before was that the way StatsComponent deals with stats.facet in general struck me as kind of kludgy and hard to understand, and I couldn't see a clean way to make it work well with both multivalued fields and arbitrary field types.)




jira at apache

Jul 30, 2012, 4:58 PM

Post #4 of 4 (49 views)
[jira] [Commented] (SOLR-3642) Count is inconsistent between facet and stats [In reply to]

[ https://issues.apache.org/jira/browse/SOLR-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425402#comment-13425402 ]

Yandong Yao commented on SOLR-3642:
-----------------------------------

Hi Hoss,

Thanks a lot. I will look at the patch on SOLR-1782 and try to apply it to trunk.

Regards,
Yandong

