
Mailing List Archive: Lucene: Java-Dev

"Advanced" query language

 

 



markharw00d at yahoo

Dec 2, 2005, 7:03 AM

Post #1 of 83 (5840 views)
Permalink
"Advanced" query language

There seems to be a growing gap between Lucene
functionality and the query language offered by
QueryParser (eg no support for regex queries, span
queries, "more like this", filter queries,
minNumShouldMatch etc etc).

Closing this gap is hard when:
a) The availability of Javacc+Lucene skills is a
bottleneck
b) The syntax of the query language makes it difficult
to add new features eg rapidly running out of "special
characters"

I don't think extending the existing query
parser/language is necessarily useful and I see it
being used purely to support the classic "simple
search engine" syntax.

Unfortunately the fall-back position for applications
which require more complex queries is to "just write
some Java code to instantiate the Query objects
programmatically." This is OK but I think there is
value in having an advanced search syntax capable of
supporting the latest Lucene features and expressed in
XML. It's worth considering why it's useful to have a
String-representable form for queries:
1) Queries can be stored eg in audit logs or "saved
queries" used for tasks like auto-categorization
2) Clients built in languages other than Java can
issue queries to a Lucene server
3) I can decouple a request from the code that
implements the query when distributing software e.g my
applet may not want Lucene dragging down to the client

Currently we cannot easily do the above for any
"complex" queries because they are not easily
persisted (yes, we could serialize Query objects but
that seems messy and does not solve points 2 and 3).

We can potentially use XML in the same way ANT does
i.e. a declarative way of invoking an extensible list
of Java-implemented features. A query interpreter is
used to instantiate the configured Java Query objects
and populates them with settings from the XML in a
generic fashion (using reflection) eg:
....
<MoreLikeThis minNumberShouldMatch="3"
maxQueryTerms="30">
<text>
Lorem ipsum dolor sit amet, consectetuer
adipiscing
elit. Morbi eget ante blandit quam faucibus
posuere. Vivamus
porta, elit fringilla venenatis consequat, neque
lectus
gravida dolor, sed cursus nunc elit non lorem.
Nullam congue
orci id eros. Nunc aliquet posuere enim.
</text>
</MoreLikeThis>
</BooleanClause>
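The reflection-based wiring described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Lucene code: `MoreLikeThisSettings` is a hypothetical stand-in bean, and the rule that an attribute such as `maxQueryTerms` maps to a setter named `setMaxQueryTerms` is an assumed convention.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Hypothetical stand-in bean; NOT a real Lucene class.
class MoreLikeThisSettings {
    private int minNumberShouldMatch;
    private int maxQueryTerms;
    public void setMinNumberShouldMatch(int v) { minNumberShouldMatch = v; }
    public void setMaxQueryTerms(int v) { maxQueryTerms = v; }
    public int getMinNumberShouldMatch() { return minNumberShouldMatch; }
    public int getMaxQueryTerms() { return maxQueryTerms; }
}

class AttributeWiring {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Copies every XML attribute onto the bean through the matching one-argument setter.
    static void applyAttributes(Element element, Object bean) {
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            String name = attr.getName();
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            boolean applied = false;
            for (Method m : bean.getClass().getMethods()) {
                if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                    try {
                        m.invoke(bean, convert(attr.getValue(), m.getParameterTypes()[0]));
                    } catch (ReflectiveOperationException e) {
                        throw new IllegalStateException("Cannot call " + setter, e);
                    }
                    applied = true;
                    break;
                }
            }
            if (!applied) {
                throw new IllegalArgumentException("No setter for attribute: " + name);
            }
        }
    }

    // Converts the attribute text to the setter's parameter type (a few types only).
    static Object convert(String value, Class<?> type) {
        if (type == int.class || type == Integer.class) return Integer.valueOf(value);
        if (type == float.class || type == Float.class) return Float.valueOf(value);
        if (type == boolean.class || type == Boolean.class) return Boolean.valueOf(value);
        if (type == String.class) return value;
        throw new IllegalArgumentException("Unsupported parameter type: " + type);
    }
}
```

Rejecting attributes that have no matching setter is a design choice made here so that a misspelled setting in a stored query fails loudly instead of being silently ignored.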

Do people feel this would be a worthwhile endeavour?
I'm not sure if enough people feel pain around the
points 1-3 outlined above to make it worth pursuing.


Cheers
Mark





---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


yseeley at gmail

Dec 2, 2005, 12:42 PM

Post #2 of 83 (5769 views)
Permalink
Re: "Advanced" query language [In reply to]

> It's worth considering why it's useful to have a
> String-representable form for queries:

Absolutely. A quickly parseable string representation for queries is
essential in so many contexts, for the reasons you brought out. Think
what SQL does for the database.

-Yonik


paul.elschot at xs4all

Dec 2, 2005, 2:50 PM

Post #3 of 83 (5757 views)
Permalink
Re: "Advanced" query language [In reply to]

On Friday 02 December 2005 16:03, mark harwood wrote:
> There seems to be a growing gap between Lucene
> functionality and the query language offered by
> QueryParser (eg no support for regex queries, span
> queries, "more like this", filter queries,
> minNumShouldMatch etc etc).
>
> Closing this gap is hard when:
> a) The availability of Javacc+Lucene skills is a
> bottleneck
> b) The syntax of the query language makes it difficult
> to add new features eg rapidly running out of "special
> characters"
>
> I don't think extending the existing query
> parser/language is necessarily useful and I see it
> being used purely to support the classic "simple
> search engine" syntax.
>
> Unfortunately the fall-back position for applications
> which require more complex queries is to "just write
> some Java code to instantiate the Query objects
> programmatically." This is OK but I think there is
> value in having an advanced search syntax capable of
> supporting the latest Lucene features and expressed in
> XML. It's worth considering why it's useful to have a
> String-representable form for queries:
> 1) Queries can be stored eg in audit logs or "saved
> queries" used for tasks like auto-categorization
> 2) Clients built in languages other than Java can
> issue queries to a Lucene server
> 3) I can decouple a request from the code that
> implements the query when distributing software e.g my
> applet may not want Lucene dragging down to the client
>
> Currently we cannot easily do the above for any
> "complex" queries because they are not easily
> persisted (yes, we could serialize Query objects but
> that seems messy and does not solve points 2 and 3).
>
> We can potentially use XML in the same way ANT does
> i.e. a declarative way of invoking an extensible list
> of Java-implemented features. A query interpreter is
> used to instantiate the configured Java Query objects
> and populates them with settings from the XML in a
> generic fashion (using reflection) eg:
> ....
> <MoreLikeThis minNumberShouldMatch="3"
> maxQueryTerms="30">
> <text>
> Lorem ipsum dolor sit amet, consectetuer
> adipiscing
> elit. Morbi eget ante blandit quam faucibus
> posuere. Vivamus
> porta, elit fringilla venenatis consequat, neque
> lectus
> gravida dolor, sed cursus nunc elit non lorem.
> Nullam congue
> orci id eros. Nunc aliquet posuere enim.
> </text>
> </MoreLikeThis>
> </BooleanClause>

Quidquid id est ...
Do we have a Latin analyzer?

>
> Do people feel this would be a worthwhile endeavour?
> I'm not sure if enough people feel pain around the
> points 1-3 outlined above to make it worth pursuing.

There are at least two more issues:

Some queries can be nested inside others, and some
nesting combinations can not be searched. For example it is
not possible to have a BooleanQuery inside a PhraseQuery.
How to deal with these?

XML is not readable/writable by most of the humans who could
make good use of the extra power in the gap left open
by the default query language. See also this:
http://ciir.cs.umass.edu/irdemo/inqinfo/inqueryhelp.html
Do you want to decouple (as above) at the human interface?


There is also the contrib/surround query language.
This language avoids using special characters by using prefix
operators. Adding prefix operators like this is straightforward:

moreLikeThis(3, 30, termList(Lorem ipsum dolor sit amet))

for practical use, this could be simplified to:

mlt(3, 30, (Lorem ipsum dolor sit amet))

Such additions are a bit of work, but the query possibilities of Lucene
do not change that fast.
Adding infix operators (operators placed between their arguments)
is a bit more involved.

Regards,
Paul Elschot




rengels at ix

Dec 2, 2005, 3:03 PM

Post #4 of 83 (5787 views)
Permalink
Re: "Advanced" query language [In reply to]

I don't see the value in this. Whatever is generating the XML could just as easily create/instantiate the query objects.

I would much rather see the query parser migrated to an internal parser (that would be easier to maintain), and develop a syntax that allowed easier use of the most common/powerful features.



markharw00d at yahoo

Dec 2, 2005, 5:42 PM

Post #5 of 83 (5787 views)
Permalink
Re: "Advanced" query language [In reply to]

>What ever is generating the xml could just as easily create/instantiate the query objects.
>
>
Yes, it is easier using the existing Java objects to construct queries
but they are inappropriate when you consider the scenarios 1 to 3 I
outlined earlier (query persistence, support for clients written in
other languages or remote clients that don't want/need Lucene
implementation classes locally). I find the separation of service
request from service implementation is generally useful and using Query
objects for both feels a bit dodgy.

I was not necessarily considering this proposal as a query language
which power end-users would normally type. I saw this more as a syntax
which would typically be generated by application code and could be
conveniently persisted and/or serialized between machines. For example,
I've a nice applet based GUI query builder for constructing boolean
logic (see "Modern Information Retrieval" p285) which accumulates the
user's query clauses, constructs a regular Lucene query string and sends
it serverside. Unfortunately, using this approach I cannot offer the
full range of Lucene query capabilities because they can't all be
expressed in the existing query syntax. Yes, I could create Query objects
in the client applet and serialize them to the server but how do I save
them in an audit log or database of "saved queries"?
I suspect there are many other Lucene applications which are essentially
fronted by forms that are interpreted by application code to produce the
query strings. In these scenarios where app code creates the query
requests the "end-user-friendliness" of the query syntax may be a lesser
priority compared to the required flexibility to represent all types of
Lucene Queries - hence the XML suggestion, although I'm not married to that.

Yonik was right to highlight the parallel with SQL here.







erik at ehatchersolutions

Dec 2, 2005, 6:03 PM

Post #6 of 83 (5766 views)
Permalink
Re: "Advanced" query language [In reply to]

On Dec 2, 2005, at 10:03 AM, mark harwood wrote:
> There seems to be a growing gap between Lucene
> functionality and the query language offered by
> QueryParser (eg no support for regex queries, span
> queries, "more like this", filter queries,
> minNumShouldMatch etc etc).

At least with a couple of these it would be sensible to subclass
QueryParser and override some getters to create other types of
queries. For example, if you need ordered sloppy phrase queries you
could create a SpanNearQuery instead of a PhraseQuery. Likewise with
RegexQuery instead of WildcardQuery.

Question - since when is "more like this" a Query? Should it be?

Your points below are well taken though....

> Closing this gap is hard when:
> a) The availability of Javacc+Lucene skills is a
> bottleneck

job security?! :)

I've been doing a lot of JavaCC work this year, and it has been a
humbling learning curve, and I barely feel capable with it.

One interesting project I just came across is JParsec:
http://jparsec.codehaus.org - perhaps this could be a much simpler way than
using JavaCC.

> b) The syntax of the query language makes it difficult
> to add new features eg rapidly running out of "special
> characters"

This is the biggest issue of all. What do humans want to type in in
order to achieve sophisticated queries?

Apple has it pretty nicely implemented with additive builders (such
as with Finder, Mail rules, and smart playlists in iTunes) but they
don't support nested expressions rather only "all" or "any" of the
criteria.

> I don't think extending the existing query
> parser/language is necessarily useful and I see it
> being used purely to support the classic "simple
> search engine" syntax.

I concur. Tacking more into QueryParser is not going to make most
users happy. I think there may be too many bells and whistles in it
already.

> Unfortunately the fall-back position for applications
> which require more complex queries is to "just write
> some Java code to instantiate the Query objects
> programmatically."

I've not found a generalization of how queries are entered into the
system across the applications I've worked on, though. Every query
interface has been custom.

> This is OK but I think there is
> value in having an advanced search syntax capable of
> supporting the latest Lucene features and expressed in
> XML. It's worth considering why it's useful to have a
> String-representable form for queries:
> 1) Queries can be stored eg in audit logs or "saved
> queries" used for tasks like auto-categorization
> 2) Clients built in languages other than Java can
> issue queries to a Lucene server
> 3) I can decouple a request from the code that
> implements the query when distributing software e.g my
> applet may not want Lucene dragging down to the client

This is an interesting proposal, and one that has a lot of merit in
how you've explained it.

> We can potentially use XML in the same way ANT does
> i.e. a declarative way of invoking an extensible list
> of Java-implemented features.

I've told many developers that the answer to almost all Java
questions lies within the source code to Ant :)

> A query interpreter is
> used to instantiate the configured Java Query objects
> and populates them with settings from the XML in a
> generic fashion (using reflection) eg:
> ....
> <MoreLikeThis minNumberShouldMatch="3"
> maxQueryTerms="30">

We're back to MoreLikeThis - it's not currently a Query subclass.
How do you envision this sort of thing fitting in if it's not a Query?

> Do people feel this would be a worthwhile endeavour?

I think a way to get a query to/from XML is a good one. Perhaps the
XML serialization feature of JDK 1.4 (or is it 1.5?) is sufficient
for this? Maybe not though - and there are plenty of handy helpers
from just doing raw reflection tricks like Ant, to using something
like Digester or Castor. I wouldn't recommend reinventing the XML
de/serialization aspect of this.

> I'm not sure if enough people feel pain around the
> points 1-3 outlined above to make it worth pursuing.

I don't see where I would use this capability just yet, but I do see
it as useful in the contexts you provided.

I'd also be interested in effort towards an Apple-like query builder.

Erik




yseeley at gmail

Dec 2, 2005, 6:57 PM

Post #7 of 83 (5768 views)
Permalink
Re: "Advanced" query language [In reply to]

Just as a clarification, human-readable strings for queries are
essential for how we do things at CNET.

In addition to Mark's comments:
- standard logging mechanisms such as the access log of an app server
are readable
- easily human typable one-off queries during development and for
troubleshooting + support are essential.
- the speed at which a query can be parsed is important... in some
systems, it's part of the transfer syntax from client to server and is
an integral part of the system (again, analogy to SQL).

That doesn't mean I fully support the XML idea, nor am I ready to
abandon the current query syntax. I have contemplated XML in the past
as a way to support templating of queries... a way for a user to say,
when someone queries field "x", expand this to this type of
arbitrarily complex query involving fields a,s,d,f. There might be a
place for both LXQ (Lucene XML Query?) and the current query syntax.

My (very long) todo list has support for DisjunctionMax and
minNrShouldMatch on it, and I have worked in JavaCC in the past (an
ASN.1 compiler, circa 1998). No timeline promises though. Also need
to look closer at Paul's surround query language... I looked very
briefly, but not enough to "get" it.

It would be nice to resolve/fix the whole "JavaCC using an exception
for flow control" issue too.

-Yonik



lucenelist2005 at danielnaber

Dec 3, 2005, 5:17 AM

Post #8 of 83 (5746 views)
Permalink
Re: "Advanced" query language [In reply to]

On Saturday 03 December 2005 03:57, Yonik Seeley wrote:

> It would be nice to resolve/fix the whole "JavaCC using an exception
> for flow control" issue too.

Did anybody have a look yet at javacc 4.0beta1, does it maybe fix that
problem?

Regards
Daniel

--
http://www.danielnaber.de



erik at ehatchersolutions

Dec 3, 2005, 7:47 AM

Post #9 of 83 (5759 views)
Permalink
Re: "Advanced" query language [In reply to]

Rest assured that human-readable query expressions aren't going away
at all. I don't think Mark even implied that. The idea is to have a
way to communicate a query electronically in a precise way that
avoids parser syntax and the awkwardness this could have with
analysis. This seems reasonable.

I can't imagine humans typing <BooleanQuery><TermQuery field="title"
term="foobar"/><WildcardQuery field="body" expression="*foo*"/></

Erik




markharw00d at yahoo

Dec 3, 2005, 10:00 AM

Post #10 of 83 (5775 views)
Permalink
Re: "Advanced" query language [In reply to]

Erik Hatcher wrote:

> Rest assured that human-readable query expressions aren't going away
> at all. I don't think Mark even implied that.


That's right. The proposal is *not* to replace what is already there -
QueryParser will always have a useful role to play supporting the
"Google-like" query syntax familiar to millions.
I'd just like to see another full-featured query representation for the
reasons already outlined.

Picking up on some points raised:

Re: MoreLikeThis queries.
Yes, they can be usefully wrapped as queries (see attached simple
example). In fact it was my attempts at bastardising QueryParser to
support them that brought home its limitations. I ended up with a
subclass hack that (mis)used the field name to parse a query string
"like:123" where 123 was a doc id. With the QueryParser syntax I was not
able to pass other parameters which MoreLikeThis could usefully use to
control the behaviour of this query type eg choice of fieldname(s) used,
max number of terms generated, minNumberShouldTerms to match etc etc.
This is not unusual, each query type has potentially multiple optional
parameters that tweak its behaviour. If I don't have a query language
that names the parameters explicitly (say, XML) I end up having to
define what looks like a function with a long list of parameters: "like
(123,,,4,,,)". Ack.

Here's a pseudo-code example that throws together some of the more
obscure parts of Lucene not represented in the existing QueryParser as
an illustration of how this could look in a more wide-reaching parser.
Imagine the user has selected an example doc #44 as something they are
interested in, on the subject of "hockey" but they prefer to see
documents that don't talk about ice hockey:

<BoostingQuery>
  <MatchQuery>
    <MoreLikeThisQuery percentTermsToMatch="0.25f" docId="44">
      <CompareField name="contents"/>
      <CompareField name="title"/>
    </MoreLikeThisQuery>
  </MatchQuery>
  <DowngradeQuery demoteValue="0.5">
    <SimpleQuery defaultField="contents">
      <queryText>"ice hockey" OR puck OR rink</queryText>
    </SimpleQuery>
  </DowngradeQuery>
</BoostingQuery>

BoostingQuery is a class that can use a second query to demote the
results of a first query if it matches (see here:
http://wiki.apache.org/jakarta-lucene/CommunityContributions)
For this and other forms of query to be able to plug into new parser the
Query objects just need to adhere to bean conventions to be
automatically wired in an ANT/Spring like way using reflection.
For example, the implementation of BoostingQuery would need to have
getter/setter properties for "matchQuery" and "downgradeQuery".
Note in this example that the existing QueryParser syntax is usefully
used in "SimpleQuery" to avoid making the XML too verbose.
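One way the bean-convention wiring of nested queries might look is sketched below. It is a rough illustration only: `ToyTermQuery` and `ToyBoostingQuery` are hypothetical stand-ins (not Lucene classes), the element-name-to-class registry is an assumption, and each "role" element such as `<MatchQuery>` is assumed to wrap exactly one query element.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-ins for query beans; these are NOT Lucene classes.
class ToyTermQuery {
    String field;
    String term;
    public void setField(String f) { field = f; }
    public void setTerm(String t) { term = t; }
}

class ToyBoostingQuery {
    Object matchQuery;
    Object downgradeQuery;
    public void setMatchQuery(Object q) { matchQuery = q; }
    public void setDowngradeQuery(Object q) { downgradeQuery = q; }
}

class BeanTreeBuilder {
    // Maps an XML element name to the bean class that implements it.
    private final Map<String, Class<?>> types = new HashMap<>();

    void register(String tag, Class<?> type) { types.put(tag, type); }

    Object build(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            return build(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot build query from XML", e);
        }
    }

    private Object build(Element el) throws Exception {
        Class<?> type = types.get(el.getTagName());
        if (type == null) throw new IllegalArgumentException("Unknown element: " + el.getTagName());
        Object bean = type.getDeclaredConstructor().newInstance();
        // Attributes become String properties.
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            set(bean, a.getNodeName(), String.class, a.getNodeValue());
        }
        // Each child element names a role; the element nested inside it is the sub-query.
        for (Node role = el.getFirstChild(); role != null; role = role.getNextSibling()) {
            if (!(role instanceof Element)) continue; // skip whitespace text nodes
            Element value = firstChildElement((Element) role);
            if (value == null) throw new IllegalArgumentException("Empty role: " + role.getNodeName());
            set(bean, role.getNodeName(), Object.class, build(value));
        }
        return bean;
    }

    private static Element firstChildElement(Element parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) return (Element) n;
        }
        return null;
    }

    // Calls set<Property>(value) on the bean, following bean naming conventions.
    private static void set(Object bean, String property, Class<?> paramType, Object value)
            throws Exception {
        String setter = "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        Method m = bean.getClass().getMethod(setter, paramType);
        m.invoke(bean, value);
    }
}
```

A new query type would then plug in with a single `register` call, provided its class exposes the expected setters.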

There's much detail to be added in how this would work in practice but I
thought I'd post it here to show the general shape of one possible
direction.
Attachments: MoreLikeThisQuery.java (1.81 KB)


lucene-list at lucenedotnet

Dec 3, 2005, 12:18 PM

Post #11 of 83 (5760 views)
Permalink
RE: "Advanced" query language [In reply to]

Hi,

> From: Erik Hatcher [mailto:erik [at] ehatchersolutions]

> > <MoreLikeThis minNumberShouldMatch="3"
> > maxQueryTerms="30">
>
> We're back to MoreLikeThis - it's not currently a Query subclass.
> How do you envision this sort of thing fitting in if it's not a Query?

But the MoreLikeThis class produces a Query. It's similar to Google's "define:"
search.
I think Google handles such queries and then redirects the search somewhere else.
QueryParser could handle such searches too and use alternative logic to
create the Query.

For example, we can extend QueryParser with special (syntax) handlers
which will create the Query.

Something like this:
------
class LikeHandler {}
LikeHandler likeHandler = new LikeHandler(...);
String queryString = "like:(red quick fox)";
Query q = QueryParser.parse(queryString, analyzer, likeHandler);
------

QueryParser scans the input, finds the special command (like:) and then looks up the
handler for this command.
If the handler exists, the QP calls it to create the Query.

This approach does have disadvantages.
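The dispatch step could be sketched along these lines. The `CommandHandler` interface and `CommandDispatcher` class are hypothetical names invented for this illustration (QueryParser offers no such hook), and a plain String stands in for the Query a real handler would build.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical handler interface; a real handler would return a Query object.
interface CommandHandler {
    String handle(String body);
}

class CommandDispatcher {
    // Matches "command:(body)", for example "like:(red quick fox)".
    private static final Pattern COMMAND = Pattern.compile("(\\w+):\\((.*)\\)");

    private final Map<String, CommandHandler> handlers = new HashMap<>();

    void register(String command, CommandHandler handler) {
        handlers.put(command, handler);
    }

    // Routes the query to a registered handler, or falls back to default parsing.
    String parse(String queryString) {
        Matcher m = COMMAND.matcher(queryString.trim());
        if (m.matches()) {
            CommandHandler handler = handlers.get(m.group(1));
            if (handler != null) {
                return handler.handle(m.group(2));
            }
        }
        // No handler registered: treat the input as an ordinary query string.
        return "default(" + queryString + ")";
    }
}
```

One likely drawback shows up in the fallback branch: `name:(...)` is also how the existing syntax groups terms under a field, so a command name can collide with a field name.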

Pasha Bizhan





lucene-list at lucenedotnet

Dec 3, 2005, 12:37 PM

Post #12 of 83 (5764 views)
Permalink
RE: "Advanced" query language [In reply to]

Hi,

> From: markharw00d [mailto:markharw00d [at] yahoo]
> Re: MoreLikeThis queries.
> Yes, they can be usefully wrapped as queries (see attached simple
> example). In fact it was my attempts at bastardising QueryParser to
> support them that brought home it's limitations. I ended up with a
> subclass hack that (mis)used the field name to parse a query string
> "like:123" where 123 was a doc id. With the QueryParser
> syntax I was not able to pass other parameters which MoreLikeThis could
> usefully use to control the behaviour of this query type eg choice of
> fieldname(s) used, max number of terms generated, minNumberShouldTerms to
> match etc etc.

With the _current_ QP syntax.

Referring to my previous message about syntax handlers, you would be able to
pass the parameters to the handler.

String query = "like(param1, param2,...): (bla-bla-bla)";

The syntax of the parameters isn't significant to QP. QP does not need to know
anything about the parameters' syntax.

String query = "like(percentTermsToMatch=\"0.25f\", docId=\"44\", ...): ...";
Or
String query = "like(0.25f,44): ...";
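Since the parser would hand the parameter text to the handler untouched, the handler has to split it itself. A minimal sketch of that step follows, assuming a comma-separated list in which `name=value` entries are named and anything else is positional; `HandlerParams` is a hypothetical helper, and values containing commas or quotes are deliberately not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HandlerParams {
    // Splits parameter text such as "percentTermsToMatch=0.25f,docId=44" into a map.
    // Positional entries are keyed by their zero-based position in the list.
    static Map<String, String> parse(String paramText) {
        Map<String, String> params = new LinkedHashMap<>();
        if (paramText.trim().isEmpty()) {
            return params;
        }
        // Limit -1 keeps trailing empty entries so positions stay stable.
        String[] parts = paramText.split(",", -1);
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i].trim();
            int eq = part.indexOf('=');
            if (eq >= 0) {
                params.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            } else if (!part.isEmpty()) {
                params.put(String.valueOf(i), part);
            }
        }
        return params;
    }
}
```

With named entries the order no longer matters, whereas a positional list like `123,,,4,,,` only makes sense to someone who remembers what the fourth slot means.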


> This is not unusual, each query type has potentially multiple
> optional
> parameters that tweak it's behaviour. If I don't have a query
> language
> that names the parameters explicitly (say, XML) I end up having to
> define what looks like a function with a long list of
> parameters: "like
> (123,,,4,,,)". Ack.

Exactly.

Pasha Bizhan




paul.elschot at xs4all

Dec 3, 2005, 3:12 PM

Post #13 of 83 (5760 views)
Permalink
Re: "Advanced" query language [In reply to]

On Saturday 03 December 2005 19:00, markharw00d wrote:
> Erik Hatcher wrote:
>
...
> parameters that tweak it's behaviour. If I don't have a query language
> that names the parameters explicitly (say, XML) I end up having to
> define what looks like a function with a long list of parameters: "like
> (123,,,4,,,)". Ack.
>

Indeed, this is a disadvantage of the "function call" syntax.
Human input of the arguments of more complex queries
is best supported by a GUI, much like existing SQL query
front ends.
Would it be possible to provide such a GUI automatically
(by introspection) given a set of Query classes whose objects
can be mixed to form a query?

Regards,
Paul Elschot




yseeley at gmail

Dec 3, 2005, 8:17 PM

Post #14 of 83 (5775 views)
Permalink
Re: "Advanced" query language [In reply to]

On 12/3/05, Paul Elschot <paul.elschot [at] xs4all> wrote:
> Indeed, this is a disadvantage of the "function call" syntax.

It depends on the language. Take Python for example:

>>> def foo(a,b): print a,b
>>> foo(1,2)
1 2
>>> foo(a=1,b=2)
1 2
>>> foo(b=2,a=1)
1 2
>>>


-Yonik


markharw00d at yahoo

Dec 4, 2005, 1:03 AM

Post #15 of 83 (5752 views)
Permalink
Re: "Advanced" query language [In reply to]

Paul Elschot wrote:

>Would it be possible to provide such a GUI automatically
>(by introspection) given a set of Query classes of which objects
>can be mixed to form a query?
>
>

Certainly possible - I've seen app servers with automatic GUI test
clients which can introspect an EJB interface and let you construct
instances of the data objects that need to be passed. As generic tools
they can be clunky to use so it's definitely a developer-level tool,
(Luke ?) not an end-user level tool. I wonder if it's worth considering
when developers have IDEs with decent autocomplete/integrated Javadoc
hints.

If you were to provide an end user-friendly generic client I suspect
you'd need metadata about not just the Query objects but also the
documents in the index e.g to offer drop-down lists of values for
certain fields in the GUI. Again, possible, but you'd have to ask
yourself if it would just be simpler to code a custom GUI for your users
in each case.







paul.elschot at xs4all

Dec 4, 2005, 3:52 AM

Post #16 of 83 (5761 views)
Permalink
Re: "Advanced" query language [In reply to]

On Sunday 04 December 2005 05:17, Yonik Seeley wrote:
> On 12/3/05, Paul Elschot <paul.elschot [at] xs4all> wrote:
> > Indeed, this is a disadvantage of the "function call" syntax.
>
> It depends on the language. Take Python for example:
>
> >>> def foo(a,b): print a,b
> >>> foo(1,2)
> 1 2
> >>> foo(a=1,b=2)
> 1 2
> >>> foo(b=2,a=1)
> 1 2
> >>>

I tried rewriting the XML query in exactly this way, with a
few property=.. constructs:

boostingQuery(
matchQuery=moreLikeThis(
percentTermsToMatch="0.25",
docId="44",
compareField("contents"),
compareField("title")),
downGradeQuery=simpleQuery("contents")
....
etc.

But then I concluded that a GUI would be better for human input.
Nonetheless, this syntax is simpler than XML, so it might
be more acceptable than XML for human input.
When the property=... syntax is optional (as it is in Python),
and when meaningful abbreviations for the longNames above can be found,
it might actually be feasible.

The problem is that query language operators form queries and have
properties and subqueries with possibly different roles.
The subqueries cause the need for nesting and the properties and roles
cause the need for the property=... syntax.
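
As an illustration of how nested calls with property=... arguments could be read into a tree, here is a minimal sketch that borrows Python's own parser through the standard `ast` module. The output shape (`op`, `args`, `props`) and the function names `parse_query` and `_convert` are assumptions made for this example, not part of Lucene. Python requires positional arguments to come before keyword arguments, so the compareField values are written first here.

```python
import ast


def parse_query(text):
    """Parse function-call style query text into nested dicts.

    Hypothetical output shape: {"op": name, "args": [...], "props": {...}}.
    """
    tree = ast.parse(text, mode="eval")
    return _convert(tree.body)


def _convert(node):
    # A call such as name(a, b, key=value) becomes one query node.
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return {
            "op": node.func.id,
            "args": [_convert(a) for a in node.args],
            "props": {kw.arg: _convert(kw.value) for kw in node.keywords},
        }
    # String and number literals are kept as plain values.
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError("unsupported syntax in query")


q = parse_query(
    'boostingQuery('
    'matchQuery=moreLikeThis("contents", "title", docId="44"),'
    'downGradeQuery=simpleQuery("contents"))'
)
```

The subqueries end up under `props` keyed by their role (matchQuery, downGradeQuery), which is the nesting-plus-roles structure described above.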

XML already has the property=... syntax, and there are good GUIs available
for manually creating nested XML constructs.
Also I think we can safely assume that the users that can benefit from
more complex query facilities will be able to provide queries in XML.

I don't know XML that well. Does it have a facility to allow different roles
for nested constructs?

That only leaves the longNames in the examples above, but these
can be avoided by allowing short forms.

So I think using XML for an advanced query language is a good idea when:
- short forms are provided for the most common names to be used,
much like <a href="...">...</a>, <p> and <h3>...</h3> in HTML, and
- it has an easy to use facility to allow different roles for
nested constructs.

Regards,
Paul Elschot




erik at ehatchersolutions

Dec 4, 2005, 6:26 AM

Post #17 of 83 (5759 views)
Permalink
Re: "Advanced" query language [In reply to]

On Dec 4, 2005, at 6:52 AM, Paul Elschot wrote:
> I tried rewriting the XML query in exactly this way, with a
> few property=.. constructs:
>
> boostingQuery(
> matchQuery=moreLikeThis(
> percentTermsToMatch="0.25",
> docId="44",
> compareField("contents"),
> compareField("title")),
> downGradeQuery=simpleQuery("contents")
> ....
> etc.
>
> But then I concluded that a GUI would be better for human input.
> Nonetheless, this syntax is simpler than XML, so it might
> be more acceptable than XML for human input.

I cannot at all fathom a use case where anything like this would be
human enterable. I realize, Paul, that you're after a human-
enterable syntax that can create sophisticated queries, but XML
certainly is not appropriate, or even a short-cut of XML (see YAML -
http://www.yaml.org/). It's a shame there isn't (that I can find) a
decent YAML parser in Java.

Almost all users want to enter "words separated by spaces", and very
little else. QueryParser succeeds fine for this purpose.

I think we should focus on the machine-to-machine use case of
communicating a Query in this discussion.

> The problem is that query language operators form queries and have
> properties and subqueries with possibly different roles.
> The subqueries cause the need for nesting and the properties and roles
> cause the need for the property=... syntax.

> I don't know XML that well. Does it have a facility to allow
> different roles
> for nested constructs?

I'm not following what you mean by different roles. Could you
provide an example.

Erik




paul.elschot at xs4all

Dec 4, 2005, 8:02 AM

Post #18 of 83 (5763 views)
Permalink
Re: "Advanced" query language [In reply to]

On Sunday 04 December 2005 15:26, Erik Hatcher wrote:
>
> On Dec 4, 2005, at 6:52 AM, Paul Elschot wrote:
> > I tried rewriting the XML query in exactly this way, with a
> > few property=.. constructs:
> >
> > boostingQuery(
> > matchQuery=moreLikeThis(
> > percentTermsToMatch="0.25",
> > docId="44",
> > compareField("contents"),
> > compareField("title")),
> > downGradeQuery=simpleQuery("contents")
> > ....
> > etc.
> >
> > But then I concluded that a GUI would be better for human input.
> > Nonetheless, this syntax is simpler than XML, so it might
> > be more acceptable than XML for human input.
>
> I cannot at all fathom a use case where anything like this would be
> human enterable. I realize, Paul, that you're after a human-
> enterable syntax that can create sophisticated queries, but XML
> certainly is not appropriate, or even a short-cut of XML (see YAML -
> http://www.yaml.org/). It's a shame there isn't (that I can find) a
> decent YAML parser in Java.

Are there XML editors that can limit their output to a given stylesheet?
In that case one only needs to predefine a style sheet for queries.

> Almost all users want to enter "words separated by spaces", and very
> little else. QueryParser succeeds fine for this purpose.

Those are not the users that I'm thinking of.

> I think we should focus on the machine-to-machine use case of
> communicating a Query in this discussion.

That's ok, but when a few simple constraints are enough to
make it useful for humans that need the extra query power
enough to be willing to enter more syntax, then why not?

> > The problem is that query language operators form queries and have
> > properties and subqueries with possibly different roles.
> > The subqueries cause the need for nesting and the properties and roles
> > cause the need for the property=... syntax.
>
> > I don't know XML that well. Does it have a facility to allow
> > different roles
> > for nested constructs?
>
> I'm not following what you mean by different roles. Could you
> provide an example.

For example the clauses in a boolean query can have these roles:
required, optional, and excluded.
Thinking about it, this would probably map to something like:

<BooleanQuery>
<BooleanClause role="required">
<SomeSubQuery/>
</BooleanClause>
<!-- more clauses -->
</BooleanQuery>

Is it possible in XML to predefine rc so that <rc>...</rc> means:
<BooleanClause role="required">...</BooleanClause> ?

Regards,
Paul Elschot




erik at ehatchersolutions

Dec 4, 2005, 8:59 AM

Post #19 of 83 (5783 views)
Permalink
Re: "Advanced" query language [In reply to]

On Dec 4, 2005, at 11:02 AM, Paul Elschot wrote:
> Are there XML editors that can limit their output to a given
> stylesheet?
> In that case one only needs to predefine a style sheet for queries.

Yes, there are many sophisticated XML editors. I'm not quite sure
where you're going with this though.

>> Almost all users want to enter "words separated by spaces", and very
>> little else. QueryParser succeeds fine for this purpose.
>
> Those are not the users that I'm thinking of.

Those are some highly specialized users :)

>> I think we should focus on the machine-to-machine use case of
>> communicating a Query in this discussion.
>
> That's ok, but when a few simple constraints are enough to
> make it useful for humans that need the extra query power
> enough to be willing to enter more syntax, then why not?

I agree with the sentiment, truly. And I'm quite open to QueryParser
itself being expanded to support more sophisticated queries if the
additional syntax still allows the more common simpler TermQuery/
PhraseQuery/BooleanQuery cases.

I just don't think it's very practical to come up with such a syntax and
reach any kind of consensus on it across the majority of Lucene
users. QueryParser is surely embedded in many applications, where it
exposes querying capabilities that the application developers may not
be aware of. Even field selection is questionable in
the general sense.

In short, QueryParser is a double-edged sword - powerful, but perhaps
too powerful. Simple in one sense, but too complicated when digging
deeper. I could almost be bold enough to claim that each application
should build this kind of parsing in a custom way.

>>> I don't know XML that well. Does it have a facility to allow
>>> different roles
>>> for nested constructs?
>>
>> I'm not following what you mean by different roles. Could you
>> provide an example.
>
> For example the clauses in a boolean query can have these roles:
> required, optional, and excluded.
> Thinking about it, this would probably map to something like:
>
> <BooleanQuery>
> <BooleanClause role="required">
> <SomeSubQuery/>
> </BooleanClause>
> <!-- more clauses -->
> </BooleanQuery>
>
> Is it possible in XML to predefine rc so that <rc>...</rc> means:
> <BooleanClause role="required">...</BooleanClause> ?

Only via DTD/XSD could this be done, but that is way overkill and too
complicated for our purposes. For the general XML case, make it
verbose specifying everything with no shortcuts like this. The flags
should be spelled out explicitly for <BooleanClause>. If someone
wants a shortcut XML syntax, that is where XSLT could come in, possibly.

"role" seems too generic here. I recommend something like
occur="must/should/mustnot" which maps to the API more precisely.
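
To make the "verbose core, shortcuts as a separate step" idea concrete, here is a minimal sketch of a pre-processing pass that expands short element names into the explicit form. It uses Python's standard `xml.etree.ElementTree` in place of XSLT. The short forms `rc`, `oc` and `xc` and the function name `expand_short_forms` are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical short forms; neither the names nor the mapping come from Lucene.
SHORT_FORMS = {
    "rc": ("BooleanClause", {"occur": "must"}),
    "oc": ("BooleanClause", {"occur": "should"}),
    "xc": ("BooleanClause", {"occur": "mustnot"}),
}


def expand_short_forms(xml_text):
    """Rewrite short-form elements into explicit <BooleanClause occur="...">."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in SHORT_FORMS:
            tag, attrs = SHORT_FORMS[elem.tag]
            elem.tag = tag              # rename in place; children are untouched
            elem.attrib.update(attrs)   # add the explicit occur flag
    return ET.tostring(root, encoding="unicode")


verbose = expand_short_forms(
    "<BooleanQuery><rc><SomeSubQuery/></rc></BooleanQuery>"
)
```

With this split, the query interpreter only ever sees the verbose form, and anyone who wants shortcuts can maintain their own mapping.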

Erik




markharw00d at yahoo

Dec 4, 2005, 1:32 PM

Post #20 of 83 (5745 views)
Permalink
Re: "Advanced" query language [In reply to]

I think I'm with Erik on this - I generally don't see end users keen to
type anything other than "words with spaces" as queries.
I do see them commonly using GUI forms with multiple inputs and behind
the scenes application code assembling the query - the same way just
about every web app in the world has forms that create SQL on the user's
behalf.
Like SQL, I do see this proposed new query syntax as a language for
developers.

Aside from the debate over choice of query syntax we would also need to
consider the impact such a language has on the query objects it
instantiates.
I like the Spring/Ant approach which uses reflection to wire up beans
generically because this allows new objects to be plugged in to the
framework without having to rewrite the parser.
This "generic wirer" approach requires the wirable objects to obey
JavaBean conventions (zero arg constructor and public getters/setters
for properties). Many existing Lucene Query objects have their mandatory
properties passed into their constructors and so would not directly fit
into such a framework. I can see that changing existing query classes to
provide a no-arg constructor would be a contentious move because it
would make it possible for developers using them directly to mistakenly
instantiate Query objects without passing mandatory parameters. Perhaps
in these cases it would be better to preserve the existing class and
provide a "parser wrapper bean" used purely to integrate the existing
Query class with the new parser framework.
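
A minimal sketch of the "parser wrapper bean" idea, written in Python for brevity rather than as Lucene code: the query class keeps its mandatory constructor arguments, while a separate bean with a no-arg constructor is populated generically from XML attributes. `TermQuery` here is a stand-in class, and `TermQueryBean`, `BEANS`, `wire` and the `field`/`value` attribute names are all hypothetical.

```python
import xml.etree.ElementTree as ET


class TermQuery:
    """Stand-in for a query class with mandatory constructor arguments."""

    def __init__(self, field, value):
        self.field = field
        self.value = value


class TermQueryBean:
    """Hypothetical wrapper bean: no-arg constructor plus settable properties."""

    def __init__(self):
        self.field = None
        self.value = None

    def build(self):
        # Mandatory parameters are checked here, not in the generic wirer.
        if self.field is None or self.value is None:
            raise ValueError("TermQuery requires both field and value")
        return TermQuery(self.field, self.value)


# Registry of element name -> bean class; new beans plug in without parser changes.
BEANS = {"TermQuery": TermQueryBean}


def wire(elem):
    """Instantiate the bean for an element and set its properties by name."""
    bean = BEANS[elem.tag]()
    for name, value in elem.attrib.items():
        if not hasattr(bean, name):
            raise ValueError(f"unknown property {name!r} on <{elem.tag}>")
        setattr(bean, name, value)
    return bean.build()


query = wire(ET.fromstring('<TermQuery field="title" value="lucene"/>'))
```

The existing class is left unchanged, so code that constructs it directly still cannot omit a mandatory parameter.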



Cheers,
Mark





paul.elschot at xs4all

Dec 4, 2005, 2:02 PM

Post #21 of 83 (5762 views)
Permalink
Re: "Advanced" query language [In reply to]

On Sunday 04 December 2005 22:32, markharw00d wrote:
> I think I'm with Erik on this - I generally don't see end users keen to
> type anything other than "words with spaces" as queries.

I think/hope that XSL allows a simplified front end that would fit
my needs.

> I do see them commonly using GUI forms with multiple inputs and behind
> the scenes application code assembling the query - the same way just
> about every web app in the world has forms that create SQL on the user's
> behalf.
> Like SQL, I do see this proposed new query syntax as a language for
> developers.
>
> Aside from the debate over choice of query syntax we would also need to
> consider the impact such a language has on the query objects it
> instantiates.

For the surround language I made a layer of classes between the
parser and Lucene in the org.apache.lucene.queryParser.surround.query
package. This layer exists mainly because not all the operators in
the surround language directly match to Lucene classes. Also,
term expansion of truncations is refactored more than in Lucene
to allow for a maximum on the total number of expanded terms,
regardless of the query structure.
I don't know whether such a layer would be needed for an xml
based parser.

> I like the Spring/Ant approach which uses reflection to wire up beans
> generically because this allows new objects to be plugged in to the
> framework without having to rewrite the parser.
> This "generic wirer" approach requires the wirable objects to obey
> JavaBean conventions (zero arg constructor and public getters/setters
> for properties). Many existing Lucene Query objects have their mandatory
> properties passed into their constructors and so would not directly fit
> into such a framework. I can see that changing existing query classes to
> provide a no-arg constructor would be a contentious move because it
> would make it possible for developers using them directly to mistakenly
> instantiate Query objects without passing mandatory parameters. Perhaps
> in these cases it would be better to preserve the existing class and
> provide a "parser wrapper bean" used purely to integrate the existing
> Query class with the new parser framework.

That sounds like some good reasons for a layer between the parser
and Lucene.

Regards,
Paul Elschot




andy.hind at alfresco

Dec 5, 2005, 2:46 AM

Post #22 of 83 (5765 views)
Permalink
RE: "Advanced" query language [In reply to]

Hi

So far I have seen no mention of XPath like queries.

Not that I am a huge fan here, but it would give a standard query
language and standard parser (jaxen saxpath maybe). The disadvantage is
wrapping stuff up as functions, as already discussed. Adding functions
is OK.

E.g.

//*[termQuery(@my:title, 'foobar') and wildCardQuery(@my:body, '*foo*')]

And you could always have one function that does a simple parse on the
second argument.

I think it would be good to keep track of JSR 170 (as it contains
content and meta-data related query stuff) and the follow up JSR 283.
I am not sure if you are involved in any way with the latter.

There are some ideas I would like to see hidden in advanced queries:

1) Already mentioned, support for queries that need two independent
passes of the index followed by an intersection on doc id.


This may be one too far:

2) Backend support for queries that do a join. Intersection based on
fields with some kind of optimization. Yes, you can index the
denormalised document, but you then have to keep track of the document
fields you have denormalised. Yes, you can do two passes but you have to
extract all the document information. Maybe this is too far into the
database zone and query optimization issues.

Regards

Andy





mamcxyz at gmail

Dec 5, 2005, 7:13 AM

Post #23 of 83 (5756 views)
Permalink
Re: "Advanced" query language [In reply to]

From my work porting Lucene to Delphi, I think that having an LQL (Lucene Query
Language) is a valuable idea, but I don't think XML is expressive enough
for this...

Anyway, I think that going the way of SQL or OCL could be better. A syntax
like this is clearer:

QUERY title,date,size, content WHERE (title LIKE 'foo*' OR size>=0)

and, as in OCL, functions can be injected for better flexibility...

Or maybe embed a scripting engine? That could be more useful, and it could
also be used to easily extend things other than the query language.


--
Mario Alejandro Montoya
MCP
www.solucionesvulcano.com
Get your dynamic Web site!


yseeley at gmail

Dec 5, 2005, 7:38 AM

Post #24 of 83 (5751 views)
Permalink
Re: "Advanced" query language [In reply to]

On 12/5/05, Mario Alejandro M. <mamcxyz [at] gmail> wrote:
> Or maybe embed a scripting engine? That could be more useful, and it could
> also be used to easily extend things other than the query language.

I looked into this a year ago... most scripting languages have an
emphasis on script execution speed, not script parsing speed (which is
what we would need). The scripting languages I tried were horribly
slow at parsing a small script. The only one that could parse at a
reasonable speed was rhino (javascript) in interp mode.

-Yonik


jch at scalix

Dec 5, 2005, 8:56 AM

Post #25 of 83 (5751 views)
Permalink
Re: "Advanced" query language [In reply to]

Yonik Seeley wrote:

>I looked into this a year ago... most scripting languages have an
>emphasis on script execution speed, not script parsing speed (which is
>what we would need). The scripting languages I tried were horribly
>slow at parsing a small script. The only one that could parse at a
>reasonable speed was rhino (javascript) in interp mode.
>
>
I've always found the Lisp syntax very easy to parse. In this case,
it's just prefix notation with the name of the operator first in the list, e.g.
(and "eggs" "oranges"). There are wrinkles for named and optional
parameters, but the basic syntax is a doddle.
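
For illustration, a minimal sketch of such a prefix-syntax reader in Python. The function name `parse_sexpr` is hypothetical, and the sketch makes simplifying assumptions: quoted strings cannot contain escaped quotes, and quoted strings and bare symbols both come back as plain Python strings.

```python
import re

# Tokens: parentheses, double-quoted strings, or bare symbols.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_sexpr(text):
    """Parse one parenthesised prefix expression into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume the ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        if tok.startswith('"'):
            return tok[1:-1]  # strip the surrounding quotes
        return tok

    expr = read()
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return expr


tree = parse_sexpr('(or (and "eggs" "oranges") "milk")')
# tree == ["or", ["and", "eggs", "oranges"], "milk"]
```

The operator is always the first element of each list, so a query builder can dispatch on it and recurse into the remaining elements.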

jch

