
janisrocks007 at yahoo
Jan 1, 2009, 9:56 PM
Post #1 of 2
Need help on a Lucene problem
Hi there,

I am working on a web-based job search application using Lucene. A user on my site can search for jobs within a 100-mile radius of, say, "Boston, MA" or any other location. I also need to show the search results sorted by relevance (i.e. the score returned by Lucene) in descending order.

I'm using a third-party API to fetch all the cities within a given radius of a city. This API returns around 864 cities within a 100-mile radius of "Boston, MA".

I build the city/state Lucene query using the following logic, which is part of my "BuildNearestCitiesQuery" method. Here nearestCities is a hashtable returned by the above API; it contains 864 cities with CityName as key and StateCode as value. finalQuery is a Lucene BooleanQuery object that contains the other search criteria entered by the user, such as skills, keywords, etc.

/*code*/
foreach (string city in nearestCities.Keys)
{
    BooleanQuery tempFinalQuery = finalQuery; // (unused in this snippet)
    cityStateQuery = new BooleanQuery();
    queryCity = queryParserCity.Parse(city);
    queryState = queryParserState.Parse(((string[])nearestCities[city])[1]);
    cityStateQuery.Add(queryCity, BooleanClause.Occur.MUST);  // MUST is like an AND
    cityStateQuery.Add(queryState, BooleanClause.Occur.MUST);
    // Add one clause per city inside the loop, so every city is OR'ed in
    // (adding after the loop would keep only the last city).
    nearestCityQuery.Add(cityStateQuery, BooleanClause.Occur.SHOULD); // SHOULD is like an OR
}
finalQuery.Add(nearestCityQuery, BooleanClause.Occur.MUST);
/*code*/

I then pass the finalQuery object to Lucene's Search method to get all the jobs within the 100-mile radius:

searcher.Search(finalQuery, collector);

I found that the BuildNearestCitiesQuery method takes a whopping 29 seconds on average to execute, which is obviously unacceptable for a website. I also found that the statements involving "Parse" take considerably longer to execute than the other statements.

The jobs for a given location are dynamic, in the sense that a city could have 2 jobs (meeting a particular search criteria) today but zero jobs for the same search criteria 3 days later. So I cannot use any caching here.

Is there any way I can optimize this logic, or for that matter my whole approach/algorithm for finding all jobs within 100 miles using Lucene?

FYI, here is what my indexing in Lucene looks like:

doc.Add(new Field("jobId", job.JobID.ToString().Trim(), Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add(new Field("title", job.JobTitle.Trim(), Field.Store.YES, Field.Index.TOKENIZED));
doc.Add(new Field("description", job.JobDescription.Trim(), Field.Store.NO, Field.Index.TOKENIZED));
doc.Add(new Field("city", job.City.Trim(), Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("state", job.StateCode.Trim(), Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("citystate", job.City.Trim() + ", " + job.StateCode.Trim(), Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("datePosted", jobPostedDateTime, Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add(new Field("company", job.HiringCoName.Trim(), Field.Store.YES, Field.Index.TOKENIZED));
doc.Add(new Field("jobType", job.JobTypeID.ToString(), Field.Store.NO, Field.Index.UN_TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("sector", job.SectorID.ToString(), Field.Store.NO, Field.Index.UN_TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("showAllJobs", "yy", Field.Store.NO, Field.Index.UN_TOKENIZED));

Thanks a ton for reading! I would really appreciate your help on this.

Janis

--
View this message in context: http://www.nabble.com/Need-help-on-a-Lucene-problem-tp21248342p21248342.html
Sent from the Lucene - General mailing list archive at Nabble.com.
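
One direction that may be worth trying, sketched here under assumptions rather than as a confirmed fix: the index above already has an un-tokenized "citystate" field in the form "City, ST", so each nearby city could be matched with a TermQuery built directly from that string instead of two QueryParser.Parse calls per city. The names NearestCitiesQuerySketch and CityStateKey are hypothetical, the city-to-state dictionary is a simplification of the hashtable described in the post, and the calls assume the Lucene.Net 2.x API that the posted code appears to use. Exact matching only works if the radius API returns city names with the same spelling and casing as the indexed values.

```csharp
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class NearestCitiesQuerySketch
{
    // Builds the exact "City, ST" string that the indexing code stores
    // in the un-tokenized "citystate" field.
    public static string CityStateKey(string city, string stateCode)
    {
        return city.Trim() + ", " + stateCode.Trim();
    }

    // Hypothetical replacement for the parsing loop: one TermQuery per city,
    // OR'ed together via SHOULD clauses, with no QueryParser involved.
    // Assumption: nearestCities maps city name -> state code.
    public static BooleanQuery BuildNearestCitiesQuery(IDictionary<string, string> nearestCities)
    {
        BooleanQuery nearestCityQuery = new BooleanQuery();
        foreach (KeyValuePair<string, string> entry in nearestCities)
        {
            Term term = new Term("citystate", CityStateKey(entry.Key, entry.Value));
            nearestCityQuery.Add(new TermQuery(term), BooleanClause.Occur.SHOULD);
        }
        return nearestCityQuery;
    }
}
```

The result could then be attached the same way as in the post, for example finalQuery.Add(NearestCitiesQuerySketch.BuildNearestCitiesQuery(cities), BooleanClause.Occur.MUST). With 864 cities the query stays under BooleanQuery's default limit of 1024 clauses.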