Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-Dev

unit test speed for TestCustomScoreQuery

 

 

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded


erickerickson at gmail

Oct 29, 2009, 12:10 PM

Post #1 of 8 (427 views)
Permalink
unit test speed for TestCustomScoreQuery

OK, I'm actually getting of my duff and trying to do something. And I'm off
today feeling ill, so I have a bit of time. So the rational place to start
is to get all the code and run the unit tests, and I've even got them
running in IntelliJ as well as the ant task. Will wonders never cease.

The unit tests in TestCustomScoreQuery take on the order of 80 seconds on my
machine. Before I go off the deep end I wanted to run what I've found past
y'all to see if it makes sense.

Virtually all the time is taken up in the method logResult on the call to
QueryUtils.check(q, s). But the logResult method is called from
verifyResults method in a loop like:

for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
call logResult for 5 different queries.
}
But the queries don't change that I can see...

Why couldn't we just call QueryUtils.check once for each of the 5 queries
outside the for loop? Doing so reduces the time it takes for the test from
~80 seconds to 9 seconds.

If there're no objections, I'll make a patch. Which folks will probably have
to be patient with me since it'll be the first one I've produced for this
project.....

While I'm at it, what are we thinking about using JUnit4? Looking *briefly*
at the code, this actually seems like it'd be difficult. We'd have to deal
with the LuceneTestCase superclass...

Best
Erick


markrmiller at gmail

Oct 29, 2009, 12:12 PM

Post #2 of 8 (405 views)
Permalink
Re: unit test speed for TestCustomScoreQuery [In reply to]

+1 - I made a similar observation a while back and started an issue to
address Junit test speeds:

https://issues.apache.org/jira/browse/LUCENE-1844

Erick Erickson wrote:
> OK, I'm actually getting of my duff and trying to do something. And
> I'm off today feeling ill, so I have a bit of time. So the rational
> place to start is to get all the code and run the unit tests, and I've
> even got them running in IntelliJ as well as the ant task. Will
> wonders never cease.
>
> The unit tests in TestCustomScoreQuery take on the order of 80 seconds
> on my machine. Before I go off the deep end I wanted to run what I've
> found past y'all to see if it makes sense.
>
> Virtually all the time is taken up in the method logResult on the call
> to QueryUtils.check(q, s). But the logResult method is called from
> verifyResults method in a loop like:
>
> for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
> call logResult for 5 different queries.
> }
> But the queries don't change that I can see...
>
> Why couldn't we just call QueryUtils.check once for each of the 5
> queries outside the for loop? Doing so reduces the time it takes for
> the test from ~80 seconds to 9 seconds.
>
> If there're no objections, I'll make a patch. Which folks will
> probably have to be patient with me since it'll be the first one I've
> produced for this project.....
>
> While I'm at it, what are we thinking about using JUnit4? Looking
> *briefly* at the code, this actually seems like it'd be difficult.
> We'd have to deal with the LuceneTestCase superclass...
>
> Best
> Erick


--
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


rcmuir at gmail

Oct 29, 2009, 12:16 PM

Post #3 of 8 (404 views)
Permalink
Re: unit test speed for TestCustomScoreQuery [In reply to]

maybe we should link this issue to that as well:
https://issues.apache.org/jira/browse/LUCENE-1786

except that one is also a problem because it downloads zip files from the
internet, and if for some reason this fails, the test just passes...

one of these days maybe get off my duff and look at that one... (or if you
want another one to look at, have at it!)

On Thu, Oct 29, 2009 at 3:12 PM, Mark Miller <markrmiller [at] gmail> wrote:

> +1 - I made a similar observation a while back and started an issue to
> address Junit test speeds:
>
> https://issues.apache.org/jira/browse/LUCENE-1844
>
> Erick Erickson wrote:
> > OK, I'm actually getting of my duff and trying to do something. And
> > I'm off today feeling ill, so I have a bit of time. So the rational
> > place to start is to get all the code and run the unit tests, and I've
> > even got them running in IntelliJ as well as the ant task. Will
> > wonders never cease.
> >
> > The unit tests in TestCustomScoreQuery take on the order of 80 seconds
> > on my machine. Before I go off the deep end I wanted to run what I've
> > found past y'all to see if it makes sense.
> >
> > Virtually all the time is taken up in the method logResult on the call
> > to QueryUtils.check(q, s). But the logResult method is called from
> > verifyResults method in a loop like:
> >
> > for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
> > call logResult for 5 different queries.
> > }
> > But the queries don't change that I can see...
> >
> > Why couldn't we just call QueryUtils.check once for each of the 5
> > queries outside the for loop? Doing so reduces the time it takes for
> > the test from ~80 seconds to 9 seconds.
> >
> > If there're no objections, I'll make a patch. Which folks will
> > probably have to be patient with me since it'll be the first one I've
> > produced for this project.....
> >
> > While I'm at it, what are we thinking about using JUnit4? Looking
> > *briefly* at the code, this actually seems like it'd be difficult.
> > We'd have to deal with the LuceneTestCase superclass...
> >
> > Best
> > Erick
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
> For additional commands, e-mail: java-dev-help [at] lucene
>
>


--
Robert Muir
rcmuir [at] gmail


erickerickson at gmail

Oct 29, 2009, 12:43 PM

Post #4 of 8 (405 views)
Permalink
Re: unit test speed for TestCustomScoreQuery [In reply to]

Yep, I remember reading something about that, which is what
prodded me to take a look after I ran the tests and saw how
long they took....

several more questions:
1> Why call QueryUtils.check in the first place? I admit when I
looked that method over, my eyes glazed. I try to take things
in bite-sized chunks and that one stretched my jaw. Can anyone
give me a two-sentence statement of what is that method trying
to do and when it's desirable/necessary to call it? Because it looks
*very* expensive. If we just don't call it at all, we're down to < 1
sec.
I can imagine someone thinking "check, that sounds like a good thing
to call" without appreciating the expense. I wonder if this is the
*first* thing to check when looking at slow tests....
2> About reformatting. When I'm working, I'll often just check in a
reformatted file w/o any code changes to make comparisons
easier. I grabbed the codestyle.xml file from the SOLR site
but using that reformatted a bunch of stuff (in IntelliJ) that
confuse the code changes (to the TestCustomScoreQuery
file only). Just check it all in anyway? (really, submit a patch)?
Or do two separate patches? Or back out and not reformat
that file?
3> I'm assuming that all unit tests are also Java5, so I added
some generics notations. Is this acceptable?

Thanks
Erick

On Thu, Oct 29, 2009 at 3:12 PM, Mark Miller <markrmiller [at] gmail> wrote:

> +1 - I made a similar observation a while back and started an issue to
> address Junit test speeds:
>
> https://issues.apache.org/jira/browse/LUCENE-1844
>
> Erick Erickson wrote:
> > OK, I'm actually getting of my duff and trying to do something. And
> > I'm off today feeling ill, so I have a bit of time. So the rational
> > place to start is to get all the code and run the unit tests, and I've
> > even got them running in IntelliJ as well as the ant task. Will
> > wonders never cease.
> >
> > The unit tests in TestCustomScoreQuery take on the order of 80 seconds
> > on my machine. Before I go off the deep end I wanted to run what I've
> > found past y'all to see if it makes sense.
> >
> > Virtually all the time is taken up in the method logResult on the call
> > to QueryUtils.check(q, s). But the logResult method is called from
> > verifyResults method in a loop like:
> >
> > for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
> > call logResult for 5 different queries.
> > }
> > But the queries don't change that I can see...
> >
> > Why couldn't we just call QueryUtils.check once for each of the 5
> > queries outside the for loop? Doing so reduces the time it takes for
> > the test from ~80 seconds to 9 seconds.
> >
> > If there're no objections, I'll make a patch. Which folks will
> > probably have to be patient with me since it'll be the first one I've
> > produced for this project.....
> >
> > While I'm at it, what are we thinking about using JUnit4? Looking
> > *briefly* at the code, this actually seems like it'd be difficult.
> > We'd have to deal with the LuceneTestCase superclass...
> >
> > Best
> > Erick
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
> For additional commands, e-mail: java-dev-help [at] lucene
>
>


erickerickson at gmail

Oct 29, 2009, 5:48 PM

Post #5 of 8 (388 views)
Permalink
Re: unit test speed for TestCustomScoreQuery [In reply to]

I *finally* looked at the Jira and lo! you specifically identified the
check method.... Someday I'll learn to look first, *then* explore.
Siiigghhh.

Anyway, looking at things a bit more, I'm more confident that
TestCustomScoreQuery wouldn't lose anything by moving the
query check outside of the loop. The queries don't change and
we're not re-querying....

TestBooleanShouldMatch is trickier. It produces 1,000 queries
that have widely varying forms. I don't see a clean way to test
one of each "form". We could decide that, say, the first 50
were enough. Or we shouldn't be running check on randomly-
generated queries. Or if we find a query that's actually a bug
we should test for it explicitly. Or... But I'm not comfortable
changing the check in the random test without some
consensus.

I'll give folks a chance to chime in between now and Monday
and generate a patch for one or both depending on the response.

Erick

On Thu, Oct 29, 2009 at 3:12 PM, Mark Miller <markrmiller [at] gmail> wrote:

> +1 - I made a similar observation a while back and started an issue to
> address Junit test speeds:
>
> https://issues.apache.org/jira/browse/LUCENE-1844
>
> Erick Erickson wrote:
> > OK, I'm actually getting of my duff and trying to do something. And
> > I'm off today feeling ill, so I have a bit of time. So the rational
> > place to start is to get all the code and run the unit tests, and I've
> > even got them running in IntelliJ as well as the ant task. Will
> > wonders never cease.
> >
> > The unit tests in TestCustomScoreQuery take on the order of 80 seconds
> > on my machine. Before I go off the deep end I wanted to run what I've
> > found past y'all to see if it makes sense.
> >
> > Virtually all the time is taken up in the method logResult on the call
> > to QueryUtils.check(q, s). But the logResult method is called from
> > verifyResults method in a loop like:
> >
> > for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
> > call logResult for 5 different queries.
> > }
> > But the queries don't change that I can see...
> >
> > Why couldn't we just call QueryUtils.check once for each of the 5
> > queries outside the for loop? Doing so reduces the time it takes for
> > the test from ~80 seconds to 9 seconds.
> >
> > If there're no objections, I'll make a patch. Which folks will
> > probably have to be patient with me since it'll be the first one I've
> > produced for this project.....
> >
> > While I'm at it, what are we thinking about using JUnit4? Looking
> > *briefly* at the code, this actually seems like it'd be difficult.
> > We'd have to deal with the LuceneTestCase superclass...
> >
> > Best
> > Erick
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
> For additional commands, e-mail: java-dev-help [at] lucene
>
>


markrmiller at gmail

Oct 29, 2009, 6:05 PM

Post #6 of 8 (395 views)
Permalink
Re: unit test speed for TestCustomScoreQuery [In reply to]

Erick Erickson wrote:
> I *finally* looked at the Jira and lo! you specifically identified the
> check method.... Someday I'll learn to look first, *then* explore.
> Siiigghhh.

No worries ;) So many half baked JIRA issues out there, it happens to
everyone - no way around it.
This way worked just as well I think - and sometimes coming at it fresh
gives a different insight anyway.
>
> Anyway, looking at things a bit more, I'm more confident that
> TestCustomScoreQuery wouldn't lose anything by moving the
> query check outside of the loop. The queries don't change and
> we're not re-querying....
>
> TestBooleanShouldMatch is trickier. It produces 1,000 queries
> that have widely varying forms. I don't see a clean way to test
> one of each "form". We could decide that, say, the first 50
> were enough. Or we shouldn't be running check on randomly-
> generated queries. Or if we find a query that's actually a bug
> we should test for it explicitly. Or... But I'm not comfortable
> changing the check in the random test without some
> consensus.
>
> I'll give folks a chance to chime in between now and Monday
> and generate a patch for one or both depending on the response.
>
> Erick
>
> On Thu, Oct 29, 2009 at 3:12 PM, Mark Miller <markrmiller [at] gmail
> <mailto:markrmiller [at] gmail>> wrote:
>
> +1 - I made a similar observation a while back and started an issue to
> address Junit test speeds:
>
> https://issues.apache.org/jira/browse/LUCENE-1844
>
> Erick Erickson wrote:
> > OK, I'm actually getting of my duff and trying to do something. And
> > I'm off today feeling ill, so I have a bit of time. So the rational
> > place to start is to get all the code and run the unit tests,
> and I've
> > even got them running in IntelliJ as well as the ant task. Will
> > wonders never cease.
> >
> > The unit tests in TestCustomScoreQuery take on the order of 80
> seconds
> > on my machine. Before I go off the deep end I wanted to run what
> I've
> > found past y'all to see if it makes sense.
> >
> > Virtually all the time is taken up in the method logResult on
> the call
> > to QueryUtils.check(q, s). But the logResult method is called from
> > verifyResults method in a loop like:
> >
> > for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
> > call logResult for 5 different queries.
> > }
> > But the queries don't change that I can see...
> >
> > Why couldn't we just call QueryUtils.check once for each of the 5
> > queries outside the for loop? Doing so reduces the time it takes for
> > the test from ~80 seconds to 9 seconds.
> >
> > If there're no objections, I'll make a patch. Which folks will
> > probably have to be patient with me since it'll be the first one
> I've
> > produced for this project.....
> >
> > While I'm at it, what are we thinking about using JUnit4? Looking
> > *briefly* at the code, this actually seems like it'd be difficult.
> > We'd have to deal with the LuceneTestCase superclass...
> >
> > Best
> > Erick
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
> <mailto:java-dev-unsubscribe [at] lucene>
> For additional commands, e-mail: java-dev-help [at] lucene
> <mailto:java-dev-help [at] lucene>
>
>


--
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


erickerickson at gmail

Nov 4, 2009, 3:01 PM

Post #7 of 8 (329 views)
Permalink
Re: unit test speed for TestCustomScoreQuery [In reply to]

I'll be attaching a new patch to JIRA-1844 after all the tests run, a bit
later this
evening. The two files I changed move/remove calls to
CheckUtils.check(query);

I arbitrarily limited the random query test in TestBooleanShouldMatch
to only checking the first 100, on the theory that we were at diminishing
returns after 100, whoever commits this feel free to disagree.

But a couple of process questions.

1> I almost reflexively reformat files I'm working on. Which makes diffing
them really a pain. But the "how to contribute" page can be read as
this
is OK as long as one is working on the file. What's preferred? Separate
checkins for reformatting-only? And how is that accomplished
2> I cleaned up the warnings from IntelliJs inspections, I assume that's
reasonable.
3> Did some upgrading-to-java-5 modifications....

Let me know what I should do differently next time, or what needs to be
changed.

Thanks
Erick

On Thu, Oct 29, 2009 at 8:05 PM, Mark Miller <markrmiller [at] gmail> wrote:

> Erick Erickson wrote:
> > I *finally* looked at the Jira and lo! you specifically identified the
> > check method.... Someday I'll learn to look first, *then* explore.
> > Siiigghhh.
>
> No worries ;) So many half baked JIRA issues out there, it happens to
> everyone - no way around it.
> This way worked just as well I think - and sometimes coming at it fresh
> gives a different insight anyway.
> >
> > Anyway, looking at things a bit more, I'm more confident that
> > TestCustomScoreQuery wouldn't lose anything by moving the
> > query check outside of the loop. The queries don't change and
> > we're not re-querying....
> >
> > TestBooleanShouldMatch is trickier. It produces 1,000 queries
> > that have widely varying forms. I don't see a clean way to test
> > one of each "form". We could decide that, say, the first 50
> > were enough. Or we shouldn't be running check on randomly-
> > generated queries. Or if we find a query that's actually a bug
> > we should test for it explicitly. Or... But I'm not comfortable
> > changing the check in the random test without some
> > consensus.
> >
> > I'll give folks a chance to chime in between now and Monday
> > and generate a patch for one or both depending on the response.
> >
> > Erick
> >
> > On Thu, Oct 29, 2009 at 3:12 PM, Mark Miller <markrmiller [at] gmail
> > <mailto:markrmiller [at] gmail>> wrote:
> >
> > +1 - I made a similar observation a while back and started an issue
> to
> > address Junit test speeds:
> >
> > https://issues.apache.org/jira/browse/LUCENE-1844
> >
> > Erick Erickson wrote:
> > > OK, I'm actually getting of my duff and trying to do something. And
> > > I'm off today feeling ill, so I have a bit of time. So the rational
> > > place to start is to get all the code and run the unit tests,
> > and I've
> > > even got them running in IntelliJ as well as the ant task. Will
> > > wonders never cease.
> > >
> > > The unit tests in TestCustomScoreQuery take on the order of 80
> > seconds
> > > on my machine. Before I go off the deep end I wanted to run what
> > I've
> > > found past y'all to see if it makes sense.
> > >
> > > Virtually all the time is taken up in the method logResult on
> > the call
> > > to QueryUtils.check(q, s). But the logResult method is called from
> > > verifyResults method in a loop like:
> > >
> > > for (Iterator it = h1.keySet().iterator(); it.hasNext();) {
> > > call logResult for 5 different queries.
> > > }
> > > But the queries don't change that I can see...
> > >
> > > Why couldn't we just call QueryUtils.check once for each of the 5
> > > queries outside the for loop? Doing so reduces the time it takes
> for
> > > the test from ~80 seconds to 9 seconds.
> > >
> > > If there're no objections, I'll make a patch. Which folks will
> > > probably have to be patient with me since it'll be the first one
> > I've
> > > produced for this project.....
> > >
> > > While I'm at it, what are we thinking about using JUnit4? Looking
> > > *briefly* at the code, this actually seems like it'd be difficult.
> > > We'd have to deal with the LuceneTestCase superclass...
> > >
> > > Best
> > > Erick
> >
> >
> > --
> > - Mark
> >
> > http://www.lucidimagination.com
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
> > <mailto:java-dev-unsubscribe [at] lucene>
> > For additional commands, e-mail: java-dev-help [at] lucene
> > <mailto:java-dev-help [at] lucene>
> >
> >
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
> For additional commands, e-mail: java-dev-help [at] lucene
>
>


lucene at mikemccandless

Nov 5, 2009, 1:24 AM

Post #8 of 8 (339 views)
Permalink
Re: unit test speed for TestCustomScoreQuery [In reply to]

On Wed, Nov 4, 2009 at 6:01 PM, Erick Erickson <erickerickson [at] gmail> wrote:

> But a couple of process questions.
>
> 1> I almost reflexively reformat files I'm working on. Which makes
> diffing them really a pain. But the "how to contribute" page
> can be read as this is OK as long as one is working on the
> file. What's preferred? Separate checkins for
> reformatting-only? And how is that accomplished
> 2> I cleaned up the warnings from IntelliJs inspections, I assume
> that's reasonable.
> 3> Did some upgrading-to-java-5 modifications....

Good questions!

I think you should in fact clean-as-you-go, and include it in the
patch, as you've done....

I've actually flip-flopped on this: I used to feel we should try hard
to NOT do formatting, cleanups, etc., while making a "real" change,
but in practice I think that's too severe and it'll mean
far less code cleaning (which all source code sorely needs) will
happen. So now I think clean-as-you-go is preferred.

One useful command (on unix's) is something like this:

svn diff --diff-cmd diff -x -w

It generates a diff that ignores whitespace differences. It's not
something you can feed back to patch, but, it's great for seeing
mostly "real" diffs when you have whitespace-only formatting
differences.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.