
erik at ehatchersolutions
Jun 20, 2007, 3:22 AM
Post #7 of 9
Re: Can't render html entities when adding documents
Thiago,

I'll have to look later this week or over the weekend, if I get a chance, but how did acts_as_solr create the XML passed to Solr? I think you used my original hack for that communication, which used REXML, right? solr-ruby now supports both REXML and libxml2, and I've found that libxml2 does things properly whereas REXML was screwing things up.

I suspect we can come up with a simple test case that shows where things are wacky. If you can submit one of those, I'll be glad to look into this as soon as I can (this weekend at the earliest).

Erik

On Jun 20, 2007, at 2:06 AM, Thiago Jackiw wrote:

> Replying to my own post: I just tried Solr 1.2 with the two
> previous versions of acts_as_solr and it worked great, so I'm pretty
> sure this is a solr-ruby issue. I'll do some more testing with the way
> solr-ruby adds documents to Solr.
>
> --
> Thiago Jackiw
> acts_as_solr => http://acts-as-solr.railsfreaks.com
>
> On 6/19/07, Thiago Jackiw <tjackiw[at]gmail.com> wrote:
>> What's interesting is that in the previous versions of acts_as_solr
>> (without solr-ruby) the HTML entities were getting indexed fine
>> without passing through ERB's html_escape method. That's what I did
>> as a quick fix before starting this thread.
>>
>> Did anything change in Solr 1.2 with regard to XML parsing? I guess
>> I should try the previous version of the acts_as_solr plugin with
>> Solr 1.2 to see if I get the same error.
>>
>> --
>> Thiago Jackiw
>> acts_as_solr => http://acts-as-solr.railsfreaks.com
>>
>> On 6/19/07, Aaron Suggs <aaron[at]ktheory.com> wrote:
>> > I was getting the same XmlPullParserException from Solr while
>> > using solr-ruby to index HTML.
>> >
>> > I solved things by running text through the html_escape() method
>> > in ERB::Util before submitting to Solr.
>> >
>> > In the console, the following generates the XmlPullParserException
>> > in Solr, which manifests itself as a Net::HTTPFatalError in
>> > solr-ruby:
>> >
>> >   Solr::Connection.new('http://localhost:8083/solr', :autocommit => :on).
>> >     add(:id => 1, :value_t => '&nbsp;')
>> >   Net::HTTPFatalError: 500...XmlPullParserException...
>> >
>> > But escaping the characters with html_escape (aliased as the h()
>> > method by default) works like a charm:
>> >
>> >   include ERB::Util
>> >   Solr::Connection.new('http://localhost:8083/solr', :autocommit => :on).
>> >     add(:id => 1, :value_t => h('&nbsp;'))
>> >   => true
>> >
>> > Subsequently, searching for strings like 'nbsp' returns hits on
>> > those escaped entities, which may or may not be what you want:
>> >
>> >   Solr::Connection.new(SOLR_URL, :autocommit => :on).query('value_t:nbsp').hits
>> >   => [{"score"=>10.771498, "id"=>1, "value_t"=>"&nbsp;"}]
>> >
>> > If you don't want searches for 'nbsp' to return all documents with
>> > escaped non-breaking spaces, the solution lies in defining some new
>> > fieldtype in solr/conf/schema.xml
>> >
>> > -Aaron Suggs
>> >
>> > On 6/19/07, Yonik Seeley <yonik[at]apache.org> wrote:
>> > > On 6/19/07, Thiago Jackiw <tjackiw[at]gmail.com> wrote:
>> > > > There's something funky with solr-ruby's XML processing when
>> > > > adding documents, but I don't really know what it is yet. It
>> > > > can't process HTML entities at all, not even an HTML blank
>> > > > space "&nbsp;":
>> > >
>> > > nbsp is not a default XML entity.
>> > > Try replacing it with &#160;
>> > >
>> > > -Yonik
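
For reference, the two workarounds discussed in this thread can be sketched in plain Ruby without a running Solr. This is only a minimal sketch and not part of solr-ruby or acts_as_solr: NAMED_TO_NUMERIC, XML_PREDEFINED, numeric_entities and escaped_literal are hypothetical names made up for illustration, and the entity table holds just a few sample entries. The only real library call is ERB::Util.html_escape.

```ruby
require 'erb'

# Hypothetical, deliberately tiny lookup table: a few named HTML entities
# that XML does not predefine, mapped to numeric character references.
NAMED_TO_NUMERIC = {
  'nbsp'   => '&#160;',
  'copy'   => '&#169;',
  'eacute' => '&#233;'
}.freeze

# The five entities that XML itself predefines; these are left untouched.
XML_PREDEFINED = %w[amp lt gt quot apos].freeze

# Option 1 (Yonik's suggestion): swap known named entities for numeric
# character references, which any XML parser accepts without a DTD.
# Names missing from the table are returned unchanged.
def numeric_entities(text)
  text.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do |match|
    name = Regexp.last_match(1)
    next match if XML_PREDEFINED.include?(name)
    NAMED_TO_NUMERIC.fetch(name, match)
  end
end

# Option 2 (Aaron's workaround): escape the ampersand itself, so the
# literal text "&nbsp;" is what gets indexed.
def escaped_literal(text)
  ERB::Util.html_escape(text)
end

puts numeric_entities('Caf&eacute;&nbsp;&amp; bar')
puts escaped_literal('&nbsp;')
```

The trade-off is the one Aaron describes: with option 2 the literal string "nbsp" ends up in the index and matches searches for it, while option 1 indexes the actual non-breaking space character.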