
wolfgang.hoschek at mac
Jan 4, 2006, 12:53 PM
Post #4 of 4
If you'd consider using a MemoryIndex for this, I'd also recommend having a look at nux.xom.pool.FullTextUtil and nux.xom.pool.FullTextPool, which add smart caching for indexes, queries and results on top of a MemoryIndex. With some luck this (or some variant of it) could help speed up your use cases, at least as far as I understand them. (Both classes are part of the Nux download.)

Wolfgang.

Snippet from the javadoc:

/**
 * Thread-safe XQuery/XPath fulltext search utilities; implemented with the
 * Lucene engine and a custom high-performance adapter for
 * on-the-fly main memory indexing with smart caching for indexes, queries and results.
 * <p>
 * Complementing the standard XPath string and regular
 * expression matching functionality, Lucene has a powerful query syntax with support
 * for word stemming, fuzzy searches, similarity searches, approximate searches,
 * boolean operators, wildcards, grouping, range searches, term boosting, etc.
 * For details see the <a target="_blank"
 * href="http://lucene.apache.org/java/docs/queryparsersyntax.html">Lucene Query
 * Syntax and Examples</a>.
 * Also see {@link org.apache.lucene.index.memory.MemoryIndex}
 * and {@link PatternAnalyzer} for detailed documentation.
 * <p>
 * Example Java usage:
 * <pre>
 * Analyzer analyzer = PatternAnalyzer.DEFAULT_ANALYZER;
 * float score = FullTextUtil.match(
 *     "Readings about Salmons and other select Alaska fishing Manuals",
 *     "+salmon~ +fish* manual~",
 *     analyzer, analyzer);
 * if (score > 0.0f) {
 *     // query matches text
 * } else {
 *     // query does not match text
 * }
 * </pre>

On Jan 4, 2006, at 6:03 AM, karl wettin wrote:

> Hello list,
>
> I wrote a search agent thingy for Lucene. It was built to handle
> huge numbers of agents.
>
> Rather than running one query per agent to find out whether the new
> document is interesting or not, agent trigger queries are stored in
> an index that is queried with the tokens of a new document.
>
> Since it uses the index a bit backwards, the agent trigger queries
> are somewhat limited:
>
> At least one token in an OR or FUZZY OR per agent field must match
> the new document.
> Any NOT token in the agent must not match the new document.
>
> It is fairly easy to add more query types, but they are limited to
> single-token and non-wildcard types, since the query is created from
> the new document's tokens.
>
> Agents are clustered by their required fields, and each cluster
> is stored in its own index. When a new document is sent to the
> AgentManager it creates one query per possible cluster. I'm not
> sure this actually speeds things up, it's just a gut feeling.
>
> Example agents in pseudo trigger query language:
>
> Possible agent:
>
> AND (OR ("category","media"))
> AND (OR ("name", "hotel") OR ("name","rowanda"))
> AND (NOT("name", "paradise"))
>
> Impossible agent:
>
> AND (OR ("category","media"))
> AND (("name", "hotel") AND ("name","rowanda"))
> AND (NOT("name", "paradise"))
>
> In effect the agents can't trigger on AND queries of the same field.
>
> One could of course run a more complex query against the new document
> after the agent triggers, use some classifier or whatever, if speed
> is not a big deal. The agent triggers could then be built from the
> original query. I probably won't implement such a thing myself.
>
> Should I post the code to the sandbox when I've tested it? Are
> there any restrictions on the code if I do that?
>
> --
> karl
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
> For additional commands, e-mail: java-dev-help [at] lucene
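
For readers following the thread, the trigger rule karl describes (at least one OR token per agent field must match the new document, and no NOT token may match) can be sketched in plain Java. This is only an illustration of the matching semantics, not karl's code and not a Lucene API: the names AgentTriggerSketch, Agent, or, not, triggers and tokens are all made up for this example. It checks one agent at a time, which is exactly the per-agent work his inverted index is meant to avoid, and it ignores FUZZY OR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the trigger rule from the quoted post (no Lucene involved). */
final class AgentTriggerSketch {

    /** One agent: per-field OR tokens and per-field NOT tokens. */
    static final class Agent {
        final Map<String, Set<String>> anyOf = new HashMap<>();
        final Map<String, Set<String>> noneOf = new HashMap<>();

        /** At least one of these tokens must occur in the given field. */
        Agent or(String field, String... tokens) {
            anyOf.computeIfAbsent(field, f -> new HashSet<>()).addAll(Arrays.asList(tokens));
            return this;
        }

        /** None of these tokens may occur in the given field. */
        Agent not(String field, String... tokens) {
            noneOf.computeIfAbsent(field, f -> new HashSet<>()).addAll(Arrays.asList(tokens));
            return this;
        }
    }

    /** Returns true if the document (field name to token set) triggers the agent. */
    static boolean triggers(Agent agent, Map<String, Set<String>> doc) {
        // Every field with OR tokens needs at least one hit in the document.
        for (Map.Entry<String, Set<String>> e : agent.anyOf.entrySet()) {
            Set<String> docTokens = doc.getOrDefault(e.getKey(), Collections.emptySet());
            if (Collections.disjoint(e.getValue(), docTokens)) {
                return false;
            }
        }
        // A single NOT token found in the document vetoes the agent.
        for (Map.Entry<String, Set<String>> e : agent.noneOf.entrySet()) {
            Set<String> docTokens = doc.getOrDefault(e.getKey(), Collections.emptySet());
            if (!Collections.disjoint(e.getValue(), docTokens)) {
                return false;
            }
        }
        return true;
    }

    /** Convenience helper to build a mutable token set. */
    static Set<String> tokens(String... t) {
        return new HashSet<>(Arrays.asList(t));
    }

    public static void main(String[] args) {
        // The "possible agent" from the quoted post.
        Agent agent = new Agent()
                .or("category", "media")
                .or("name", "hotel", "rowanda")
                .not("name", "paradise");

        Map<String, Set<String>> doc = new HashMap<>();
        doc.put("category", tokens("media"));
        doc.put("name", tokens("hotel", "rowanda"));
        System.out.println(triggers(agent, doc)); // prints "true"

        doc.get("name").add("paradise");
        System.out.println(triggers(agent, doc)); // prints "false"
    }
}
```

Because each field holds only an "any of" set and a "none of" set, the "impossible agent" from the post (hotel AND rowanda in the same field) has no representation here, which mirrors the limitation karl points out.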