
Mailing List Archive: Lucene: Java-User

Lucene search fails for japanese characters in URL

 

 



anand.sarwade at corp

Sep 17, 2008, 7:41 AM

Post #1 of 8
Lucene search fails for japanese characters in URL

Hi,

I am facing the problem below. Please help me with it.

I have integrated the CJK Analyzer for Japanese characters. I am able to save
Japanese double-byte characters in a MySQL database in UTF-8 format without
issues, and I can see that the data is getting indexed. But when I search for
Japanese characters that were indexed, using the URL below, I get empty results.

http://xml.demo.myaol.jp:8082/portal/gallery-search?first=1&max=100&cap=言語

I noticed that the URL above gets converted to the following URL, in which the
Japanese characters appear percent-encoded (URL-encoded):

http://xml.demo.myaol.jp:8082/portal/gallery-search?first=1&max=100&cap=%E8%A8%80%E8%AA%9E

This does not match the existing Lucene index, so the search returns
empty results. How do I make Lucene search work for Japanese words in URLs?
Is there any way to convert such encoded characters back to Japanese words?
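The "converted" URL is not a different search string: `%E8%A8%80%E8%AA%9E` is simply the UTF-8 bytes of 言語, percent-encoded for transport. The minimal Java sketch below (assuming Java 10+ for the `Charset` overloads of `URLDecoder`/`URLEncoder`; the class name `CapParamDecoding` is hypothetical) shows that decoding those bytes as UTF-8 gives back 言語, while decoding them as ISO-8859-1, which Tomcat versions before 8 use for the URI when no `URIEncoding` is configured, yields six unrelated Latin-1 characters that cannot match anything in the index.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CapParamDecoding {
    // The value of the "cap" parameter exactly as it travels in the URL.
    static final String RAW = "%E8%A8%80%E8%AA%9E";

    // Decode the %XX escapes, interpreting the bytes as UTF-8 (what the browser used).
    static String decodeAsUtf8(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    // Decode the same escapes as ISO-8859-1, as a container does when it is
    // not told which charset the URI uses.
    static String decodeAsLatin1(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String good = decodeAsUtf8(RAW);
        String bad = decodeAsLatin1(RAW);
        // U+8A00 U+8A9E is 言語: the two characters that were indexed.
        System.out.println(good.equals("\u8a00\u8a9e")); // true
        // One Latin-1 character per byte: no index term will match these.
        System.out.println(bad.length()); // 6
        // Encoding 言語 as UTF-8 reproduces exactly what the browser sent.
        System.out.println(URLEncoder.encode(good, StandardCharsets.UTF_8).equals(RAW)); // true
    }
}
```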

Any help or suggestions in this regard would be highly appreciated.

Thanks in advance.

Regards,
Anand

--
View this message in context: http://www.nabble.com/Lucene-search-fails-for-japanese-characters-in-URL-tp19533647p19533647.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


jimi.hullegard at mogul

Sep 17, 2008, 7:49 AM

Post #2 of 8
RE: Lucene search fails for japanese characters in URL [In reply to]

What web server are you using? With Tomcat, for example, this can be caused by the URIEncoding setting in server.xml.

http://tomcat.apache.org/tomcat-5.5-doc/config/http.html

/Jimi

mogul | jimi hullegård | system developer | hudiksvallsgatan 4, 113 30 stockholm sweden | +46 8 506 66 172 | +46 765 27 19 55 | jimi.hullegard [at] mogul | www.mogul.com
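The fix adopted in this thread is to set `URIEncoding="UTF-8"` on the HTTP `<Connector>` element in Tomcat's server.xml. Where the container configuration cannot be changed, a commonly used workaround is to undo the ISO-8859-1 decoding in application code. The sketch below assumes the container really did decode the parameter as ISO-8859-1; `repairLatin1Decoded` is a hypothetical helper name, not part of Tomcat or the Servlet API.

```java
import java.nio.charset.StandardCharsets;

public class ParamRepair {
    /**
     * Hypothetical helper: recover a UTF-8 query parameter that the container
     * decoded as ISO-8859-1. ISO-8859-1 maps each byte 0x00-0xFF to exactly one
     * char, so getBytes(ISO_8859_1) returns the original bytes unchanged, and
     * they can then be decoded correctly as UTF-8.
     */
    static String repairLatin1Decoded(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What request.getParameter("cap") would return for %E8%A8%80%E8%AA%9E
        // if the container decoded the URI as ISO-8859-1.
        String misdecoded = new String(
                new byte[] {(byte) 0xE8, (byte) 0xA8, (byte) 0x80,
                            (byte) 0xE8, (byte) 0xAA, (byte) 0x9E},
                StandardCharsets.ISO_8859_1);
        String repaired = repairLatin1Decoded(misdecoded);
        System.out.println(repaired.equals("\u8a00\u8a9e")); // true
    }
}
```

Because ISO-8859-1 maps every byte value to exactly one character, this round trip is lossless. It must not be applied once `URIEncoding="UTF-8"` is in place, though, or it will garble values that were already decoded correctly.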




anand.sarwade at corp

Sep 17, 2008, 8:12 AM

Post #3 of 8
RE: Lucene search fails for japanese characters in URL [In reply to]

Hello Jimi,

Thanks a lot for your valuable suggestion.

I am using Tomcat 5. Following your suggestion, I checked server.xml and
found that no URIEncoding was set.
I have now set it, and to my great relief :-) I can see the Lucene results in
my browser for Japanese strings, with the request parameters now in UTF-8.

Thanks again for your help.

Regards,
Anand.



--
View this message in context: http://www.nabble.com/Lucene-search-fails-for-japanese-characters-in-URL-tp19533647p19534342.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




yeshuangming at gmail

Sep 17, 2008, 5:43 PM

Post #4 of 8
Re: Lucene search fails for japanese characters in URL [In reply to]

You must trace the string at each step!
The important steps are reading the string from MySQL and getting the parameter
in the servlet. Please check: do you get the right string at each point?
Chinese has the same problem too.
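A minimal sketch of such tracing, assuming a hypothetical `describe` helper: printing each code point as `U+XXXX` makes the value comparable at every step (read from MySQL, returned by `request.getParameter`, passed to the query parser) regardless of the console or log-file encoding, which can otherwise make a correct string look broken or a broken one look correct.

```java
public class StringTrace {
    /**
     * Hypothetical debugging helper: render each code point of a string as
     * U+XXXX, separated by spaces. The output is pure ASCII, so it survives
     * any log or console encoding unchanged.
     */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String fromDb = "\u8a00\u8a9e"; // 言語, as it should arrive everywhere
        // The same bytes mis-decoded as ISO-8859-1 (what a misconfigured container returns).
        String fromRequest = "\u00e8\u00a8\u0080\u00e8\u00aa\u009e";
        System.out.println("db:      " + describe(fromDb));      // db:      U+8A00 U+8A9E
        System.out.println("request: " + describe(fromRequest)); // request: U+00E8 U+00A8 U+0080 U+00E8 U+00AA U+009E
    }
}
```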



--
Sorry for my English!! 明
Please help me correct my English expression and syntax errors.


anand.sarwade at corp

Sep 17, 2008, 11:57 PM

Post #5 of 8
Re: Lucene search fails for japanese characters in URL [In reply to]

Hi,

I do get the same string from MySQL and in the servlet request. I could
observe the actual string in Eclipse while debugging. It is stored in UTF-8
format, so it is retrieved exactly as stored.

Please let me know if I am not clear.



--
View this message in context: http://www.nabble.com/Lucene-search-fails-for-japanese-characters-in-URL-tp19533647p19547081.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




yeshuangming at gmail

Sep 18, 2008, 12:36 AM

Post #6 of 8
Re: Lucene search fails for japanese characters in URL [In reply to]

Also, you can use the Luke tool to see what is actually in the index.
Check what is in the Query you pass to IndexSearcher.search(), and what the
defaultOperator of the QueryParser is.

Can you get hits by setting up a simple standalone IndexSearcher, not through Tomcat?
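Luke shows the real index terms. As a rough mental model of what to look for: Lucene's CJKAnalyzer indexes runs of CJK characters as overlapping two-character tokens (bigrams), so 言語 becomes the single term 言語. The Java sketch below is a simplified, hypothetical approximation of that idea, not Lucene's CJKTokenizer code: its `isCjk` script test, its handling of a lone CJK character, and its treatment of non-CJK text are all assumptions. It illustrates that the query text must contain the same CJK characters as the indexed text to produce matching terms, and that a mis-decoded parameter produces none.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // Simplified (assumed) test for "CJK" characters: Han, Hiragana, Katakana scripts.
    static boolean isCjk(int cp) {
        Character.UnicodeScript sc = Character.UnicodeScript.of(cp);
        return sc == Character.UnicodeScript.HAN
                || sc == Character.UnicodeScript.HIRAGANA
                || sc == Character.UnicodeScript.KATAKANA;
    }

    /**
     * Approximation of overlapping-bigram tokenization: each run of CJK
     * characters becomes overlapping 2-character tokens (a lone CJK character
     * is kept as a single token); runs of other letters/digits become one
     * lower-cased token; everything else separates tokens.
     */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            int start = i;
            if (isCjk(cps[i])) {
                while (i < cps.length && isCjk(cps[i])) i++;
                if (i - start == 1) {
                    tokens.add(new String(cps, start, 1));
                } else {
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2));
                    }
                }
            } else if (Character.isLetterOrDigit(cps[i])) {
                while (i < cps.length && !isCjk(cps[i]) && Character.isLetterOrDigit(cps[i])) i++;
                tokens.add(new String(cps, start, i - start).toLowerCase());
            } else {
                i++; // whitespace, punctuation, control characters: token boundary
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // 言語 -> [言語]
        System.out.println(tokenize("\u8a00\u8a9e").equals(List.of("\u8a00\u8a9e"))); // true
        // 日本語 -> [日本, 本語]
        System.out.println(tokenize("\u65e5\u672c\u8a9e")
                .equals(List.of("\u65e5\u672c", "\u672c\u8a9e"))); // true
    }
}
```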





anand.sarwade at corp

Sep 18, 2008, 3:22 AM

Post #7 of 8
Re: Lucene search fails for japanese characters in URL [In reply to]

This Luke tool seems pretty cool. I have installed it, and it's very easy to
find out what is indexed and what is being stored. Thanks for this info.

I have tried it in Tomcat and things work fine without issues. The default
operator is OR in my case. I haven't tried setting up a standalone
IndexSearcher, but I believe it should work. Please let me know if you see any
issues.



--
View this message in context: http://www.nabble.com/Lucene-search-fails-for-japanese-characters-in-URL-tp19533647p19549854.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




yeshuangming at gmail

Sep 18, 2008, 5:53 PM

Post #8 of 8
Re: Lucene search fails for japanese characters in URL [In reply to]

I still suggest that you set up and test a standalone IndexSearcher, even though
you believe it should work.
If it works and Tomcat gets the right parameter, then, sorry, I don't know what
the problem is.



