Re: MediaWiki + Lucene-Search2 + MWSearch extension = ZERO search results


agentdcooper at gmail

Jun 30, 2008, 9:18 AM

Post #1 of 4 (530 views)

Sorry, everyone, for the "repost", but that was my first post to
the MediaWiki-l mailing list, and I didn't fully understand that
HTML would get stripped out, so the last post was truly ugly. I could see
a lot of potential assistance being turned away by the horrid message
format, so please forgive me and allow me this "correction repost". Thanks!

To cut to the chase: I've been trying for a *very* long time now to
roll out my own MediaWiki-based website with full-text search
capabilities using the Lucene extension. I only wish I could get it to
just plain *work*! But, alas, I am at a dead end.

I've read everything I can find on the internet about MediaWiki +
Lucene-search2 + MWSearch, but no luck so far. I keep running into the same
problem: ZERO SEARCH RESULTS via the MediaWiki search engine AFTER
installing the Lucene-search2 daemon & MWSearch extension.

Prior to the installation of Lucene-search2 daemon & MWSearch extension, my
MediaWiki search worked without a hitch -- but I *need* the full-text search
capability Lucene brings along, it is *the solution I need*.

Which brings me to my problem, and my plea to the MediaWiki-l mailing list:
whenever I search with the Lucene-search2 daemon & MWSearch extension
installed, I get ZERO search results (specifically, the search "results" page
error reads 'No page text matches'). After troubleshooting the issue, I
found that with MediaWiki debugging enabled, my
/var/log/mediawiki/debug_svn_log.txt shows (summarizing here):

Fetching search data from
http://localhost:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10
Http::request: GET
http://localhost:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10
total [0] hits
OutputPage::sendCacheControl: private caching; **
Request ended normally

Follow me here: if I load the URL from the debug log above (or from the
debug log after *any* search I run now) in a web browser such as 'lynx',
I see this (or something similar):

1
1.0 0 Main_Page

Now that tells me that I am actually getting a REAL result when going to the
link in debug manually! It seems to be saying there is 1 mention of the word
I searched for, and it is on the "Main_Page" (which is correct!)
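
To make that reading of the response concrete, here is a minimal PHP sketch that parses the two-line body shown above. The layout it assumes (a hit count on the first line, then one "score namespace title" line per hit) is inferred only from this sample, not from any Lucene-search2 documentation, and parseSearchResponse() is a hypothetical helper name, not part of MWSearch.

```php
<?php
// Hypothetical helper: parses the plain-text body shown above.
// Assumed layout (inferred from the sample only):
//   line 1: total hit count
//   then one line per hit: "<score> <namespace> <title>"
function parseSearchResponse($body) {
    $lines = preg_split('/\r\n|\n|\r/', trim($body));
    $total = (int) array_shift($lines);
    $hits = array();
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        // Split into at most 3 parts so a title containing spaces stays intact.
        $parts = explode(' ', $line, 3);
        if (count($parts) < 3) {
            continue; // skip lines that do not match the assumed layout
        }
        $hits[] = array(
            'score' => (float) $parts[0],
            'namespace' => (int) $parts[1],
            'title' => $parts[2],
        );
    }
    return array('total' => $total, 'hits' => $hits);
}

$result = parseSearchResponse("1\n1.0 0 Main_Page\n");
print $result['total'] . " hit(s); first title: " . $result['hits'][0]['title'] . "\n";
```

If the daemon's real format differs from this guess, the parser will simply skip lines it does not recognise, which is itself a useful hint about where the interpretation goes wrong.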

I have put A LOT more information about my problem, including _every step_ I
took starting from my MediaWiki installation on a newly formatted hard drive
with a fresh Slackware Linux v12.1 install, along with log files and
configuration files, on this page (actually, it's the "talk" page for the
MediaWiki extension MWSearch):
http://www.mediawiki.org/wiki/Extension_talk:MWSearch#MediaWiki_SVN.2BLucene-Search2_SVN.2BMWSearch_SVN_.3D_ZERO_search_results

I hope someone out there has done this, and/or has a working setup, and
could point me in the right direction (to a website/HOWTO document, or
anything!), or would be willing to troubleshoot this issue with
me! I am NOT a Linux newbie in any sense of the word, and can use anyone's
assistance in FIXING the issue. IMHO the
problem is with the MWSearch extension itself, but it is VERY possible I did
something wrong, and I am willing to admit my fault and correct it. Someone,
ANYONE, please guide me through how to fix this issue!

(I've been troubleshooting this issue for _OVER_ 6 months now, just to
give you an idea of how much searching and troubleshooting I've given this
issue)

Feel free to email me, or post here, or even post on the MediaWiki Extension
talk:MWSearch page =
http://www.mediawiki.org/wiki/Extension_talk:MWSearch#MediaWiki_SVN.2BLucene-Search2_SVN.2BMWSearch_SVN_.3D_ZERO_search_results

Thanks for your time! peace -

agentdcooper[at]gmail.com


PS :: One last thing/FYI - the following is my current overall system setup
;;
* Slackware linux v12.1, (Linux 2.6.24.5-smp Slackware's smp-generic
kernel, unchanged)
* MediaWiki: 1.13alpha (SVN 06-25-2008)
* PHP: 5.2.6 (I used Slackware 12.1's PHP v5.2.6 update package)
* MySQL: 5.0.51b
* MediaWiki Extension(s): MWSearch SVN 06-25-2008, and Lucene-search2
SVN 06-25-2008, + I downloaded & installed mwdumper.jar into the
Lucene-search2 "lib" dir = /usr/local/search/ls2
* other tools: jre-6u6-i586-3, jdk-1_5_0_09-i586-1,
apache-ant-1.7.0-i486, rsync-3.0.2-i486-1
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l[at]lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l


tstarling at wikimedia

Jun 30, 2008, 3:02 PM

Post #2 of 4 (507 views)

agent dale cooper wrote:
> (I've been troubleshooting this issue for _OVER_ 6 months now, just to
> give you an idea of how much searching and troubleshooting I've given this
> issue)
>

It sounds like you've isolated the problem to within a couple of hundred
lines of code. Maybe you should spend less time searching the web for
someone with your exact problem, and more time reading that code.

> Follow me here: if I load the URL from the debug log above (or from the
> debug log after *any* search I run now) in a web browser such as 'lynx',
> I see this (or something similar):
>
> 1
> 1.0 0 Main_Page

Is this the same response text that MWSearch sees? If yes, where does
MWSearch go wrong in interpreting it? If no, what is different about the
way MWSearch requests pages compared to lynx? Is it timing out? You can
use tcpdump to snoop on the communication between MWSearch and the search
server. You can use telnet to generate requests manually and see how the
search daemon responds.

-- Tim Starling




agentdcooper at gmail

Jul 1, 2008, 7:34 AM

Post #3 of 4 (497 views)

On Tue, 01 Jul 2008 08:02:05 +1000, Tim Starling <tstarling[at]wikimedia.org> wrote:
> It sounds like you've isolated the problem to within a couple of
> hundred lines of code. Maybe you should spend less time searching the
> web for someone with your exact problem, and more time reading that code.
=) I'd agree with ya, if I weren't so much of a PHP newbie... I'd
consider myself more of a Perl and Bash type coder, but I definitely
understand where you are coming from with your suggestion. Luckily, I
found someone over at MediaWiki.org's MWSearch Extension_talk page who
helped me troubleshoot my issue!

>> Follow me here: if I load the URL from the debug log above (or from the
>> debug log after *any* search I run now) in a web browser such as 'lynx',
>> I see this (or something similar):
>>
>> 1
>> 1.0 0 Main_Page
>
> Is this the same response text that MWSearch sees? If yes, where does
> MWSearch go wrong in interpreting it? If no, what is different about
> the way MWSearch requests pages compared to lynx? Is it timing out?
> You can use tcpdump to snoop on the communication between MWSearch and
> the search server. You can use telnet to generate requests manually
> and see how the search daemon responds.
>
> -- Tim Starling
Here's what "Brian" from the MWSearch Extension_talk page helped identify,
summing up his last post and the results we found from some
troubleshooting:

"we can conclude from this that: 1) PHP can connect to Lucene properly
and 2) Your HTTP fetch capabilities are broken. I'm not sure what we can
do about it. The proper way is of course to fix the HTTP functions, but
I don't know how we can do that. The other option is to write a new HTTP
layer which will surely work."

<(root@/var/www/htdocs/wiki-svn06252008)> cd /var/www/htdocs/wiki-svn06252008
<(root@/var/www/htdocs/wiki-svn06252008)> php maintenance/eval.php
> $sock = fsockopen('127.0.0.1', 8123); fwrite($sock, "GET /search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10 HTTP/1.0\r\nHost: localhost\r\n\r\n"); print fread($sock, 8192);
HTTP/1.1 200 OK
Content-Type: text/plain

1
1.0 0 Main_Page


> print Http::get('http://127.0.0.1:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10');
>
> print Http::get('http://localhost:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10');
>


What's the chance some kind soul on the MediaWiki-l mailing list knows,
or can point me to, more in-depth information about MediaWiki's HTTP get
function, which may be causing the results of my queries to my
Lucene-Search2 daemon on port 8123 to get stripped out?

When using PHP to talk directly to my Lucene-Search2 daemon I get a valid
response, and everything works great: the response is displayed as
search results. The problem comes into play within my MediaWiki site
once I enable the MWSearch extension (ZERO search results), or, as seen
above, when I start up MediaWiki's PHP debug script (maintenance/eval.php)
and try to use Http::get to talk to the Lucene-Search2 daemon: I get no
response, or so it seems... but my LS2 daemon is definitely responding to
the Http::get request! It sounds like MediaWiki is the culprit and MW's
HTTP fetch function is somehow stripping the search results, as
demonstrated above. I can also get the search results from my LS2 daemon
with the "lynx" web browser, with telnet, or with PHP.
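
For comparing the two paths, it may help to reduce the raw fsockopen output to the same thing an HTTP helper would return, namely the status code and the body. The following is only a sketch: splitHttpResponse() is a hypothetical helper, and the sample string is the raw response printed in the eval.php session above.

```php
<?php
// Hypothetical helper: splits a raw HTTP response (as read from a socket)
// into status code, header lines and body.
function splitHttpResponse($raw) {
    // Headers end at the first blank line; accept CRLF or bare LF.
    $parts = preg_split('/\r?\n\r?\n/', $raw, 2);
    $headerLines = preg_split('/\r?\n/', $parts[0]);
    $statusLine = array_shift($headerLines);
    $status = 0;
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $statusLine, $m)) {
        $status = (int) $m[1];
    }
    return array(
        'status' => $status,
        'headers' => $headerLines,
        'body' => isset($parts[1]) ? $parts[1] : '',
    );
}

// Raw response copied from the fsockopen test above.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n1\n1.0 0 Main_Page\n";
$response = splitHttpResponse($raw);
print "status: " . $response['status'] . "\n";
print "body:\n" . $response['body'];
```

If the status is 200 and the body is non-empty here, while Http::get prints nothing for the same URL, the difference has to lie in how the request is made or in how a failure is reported, not in the daemon's answer.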

I really hope someone can point me in the right direction, or help a
fella' out with diagnosing the issue! Thanks for your time, peace -

agentdcooper[at]gmail.com


tstarling at wikimedia

Jul 1, 2008, 4:14 PM

Post #4 of 4 (502 views)

agent dale cooper wrote:
>> > It sounds like you've isolated the problem to within a couple of
>> > hundred lines of code. Maybe you should spend less time searching the
>> > web for someone with your exact problem, and more time reading that code.
>
> =) I'd agree with ya, if I wasn't so much of a PHP newbie... I'd
> consider myself more of a Perl and Bash type coder, but I definitely
> understand where you are coming from with your suggestion.

Just pretend it's perl, it's pretty much the same for these purposes.

> It sounds like MediaWiki is the culprit and MW's HTTP fetch
> function is somehow stripping the search results

Well, Http::get() is only 68 lines. It has two branches: one uses
file_get_contents(), which should emit errors if display_errors is on, and
the other uses curl_exec(), which has two error branches that return
false silently:

if ( curl_getinfo( $c, CURLINFO_HTTP_CODE ) != 200 ) {
if ( curl_errno( $c ) != CURLE_OK ) {

You should determine which one of these MediaWiki is using, and either
enable display_errors, or add debugging statements to the two curl error
branches.

Or, again, you could use tcpdump, which would probably determine the
problem without dealing with the source code.
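
As a rough illustration of the "add debugging statements" suggestion, the sketch below is a stand-alone script, not MediaWiki's Http::get(): it makes the same kind of cURL request and reports the two conditions the quoted branches test, rather than returning false silently. diagnoseFetch() and its timeout parameter are hypothetical names chosen for this example; it assumes the PHP curl extension is loaded, and the URL is the one from the debug log earlier in the thread.

```php
<?php
// Hypothetical stand-alone diagnostic (not MediaWiki code): performs a cURL
// GET and returns the values that the two quoted error branches check.
function diagnoseFetch($url, $timeoutSeconds = 5) {
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($c, CURLOPT_TIMEOUT, $timeoutSeconds);
    $body = curl_exec($c); // false on failure
    return array(
        'http_code' => curl_getinfo($c, CURLINFO_HTTP_CODE),
        'curl_errno' => curl_errno($c),
        'curl_error' => curl_error($c),
        'body' => $body,
    );
}

// For the file_get_contents() branch, make PHP show its warnings.
error_reporting(E_ALL);
ini_set('display_errors', '1');

$url = 'http://localhost:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10';
$report = diagnoseFetch($url);

if ($report['curl_errno'] != CURLE_OK) {
    print "cURL error " . $report['curl_errno'] . ": " . $report['curl_error'] . "\n";
} elseif ($report['http_code'] != 200) {
    print "Unexpected HTTP status: " . $report['http_code'] . "\n";
} else {
    print $report['body'];
}
```

Whichever of the three messages appears should show which branch a cURL-based fetch would take on this machine; it does not by itself say which branch MediaWiki is actually using.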

-- Tim Starling


