Mailing List Archive: Wikipedia: Wikitech

New search backend live on mediawiki.org


neverett at wikimedia

Aug 28, 2013, 11:20 AM

Post #1 of 7
New search backend live on mediawiki.org

Today we threw the big lever and turned on our new search backend at
mediawiki.org. It isn't the default yet, but it is just about ready for you
to try. Here is what we think we've improved:
1. Templates are now expanded when pages are indexed, so:
1a. You can search for text included in templates
1b. You can search for categories included in templates
2. The search engine is updated very quickly after articles change.
3. A few funky things around intitle and incategory:
3a. You can combine them with a regular query (incategory:kings peaceful)
3b. You can use prefix searches with them (incategory:norma*)
3c. You can use them anywhere in the query (roger incategory:normans)
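
A minimal sketch of how queries like the three examples above break down into keyword filters plus free text. This is a hypothetical illustration, not CirrusSearch's actual query parser: `KEYWORD`, `ParsedQuery` and `parse_query` are made-up names, and the regex assumes each `intitle:`/`incategory:` value is either one whitespace-free token or a double-quoted phrase, with a trailing `*` meaning a prefix search.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern: "intitle:" or "incategory:" followed by either a
# double-quoted phrase or a single whitespace-free token.
KEYWORD = re.compile(r'\b(intitle|incategory):("[^"]*"|\S+)')


@dataclass
class ParsedQuery:
    text: str  # free-text part of the query, keywords removed
    filters: list = field(default_factory=list)  # (keyword, value, is_prefix)


def parse_query(query: str) -> ParsedQuery:
    """Split a query into keyword filters and the remaining free text."""
    filters = []
    for match in KEYWORD.finditer(query):
        keyword, value = match.group(1), match.group(2).strip('"')
        # A trailing "*" marks a prefix search, e.g. incategory:norma*
        is_prefix = value.endswith("*")
        filters.append((keyword, value.rstrip("*"), is_prefix))
    # Keywords may appear anywhere, so cut them out and normalise spacing.
    text = " ".join(KEYWORD.sub(" ", query).split())
    return ParsedQuery(text, filters)
```

Keeping the filters separate from the free text is what lets a query combine them in any order: the search side can apply the filters and score the remaining words independently.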

What we think we've made worse and we're working on fixing:
1. Because we're expanding templates, some things that probably shouldn't
be searched are being searched. We've fixed a few of these issues but I
wouldn't be surprised if more come up. We opened Bug 53426 regarding audio
tags.
2. The relative weighting of matches is going to be different. We're
still fine-tuning this and we'd appreciate any anecdotes describing search
results that seem out of order.
3. We don't currently index headings beyond the article title in any
special way. We'll be fixing that soon. (Bug 53481)
4. Searching for file names or clusters of punctuation characters doesn't
work as well as it used to. It still works reasonably well if you surround
your query with quotes, but it isn't as good as it was. (Bugs 53013 and 52948)
5. "Did you mean" suggestions currently aren't highlighted at all and
sometimes we'll suggest things that aren't actually better. (Bugs 52286 and
52860)
6. incategory:"category with spaces" isn't working. (Bug 53415)

What we've changed that you probably don't care about:
1. Updating search in bulk is much slower than before. This is the
cost of expanding templates.
2. Search is now backed by Elasticsearch, a horizontally scalable search
backend under active development, so we're in a much better place to build
on the new solution as time goes on.

Neat stuff if you run your own MediaWiki:
CirrusSearch is much easier to install than our current search
infrastructure.

So what will you notice? Nothing! That is because, while the new search
backend (CirrusSearch) is indexing, we've left the current search
infrastructure as the default while we work on our list of bugs. You can
see the results from CirrusSearch by performing your search as normal and
adding "&srbackend=CirrusSearch" to the URL parameters.
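
The URL tweak above can be scripted. A minimal sketch, assuming you start from a search URL copied out of the browser; only the `srbackend=CirrusSearch` parameter comes from this post, while the example URL and the helper name `with_cirrus_backend` are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cirrus_backend(search_url: str) -> str:
    """Return search_url with srbackend=CirrusSearch set, replacing any old value."""
    parts = urlsplit(search_url)
    # Keep every existing parameter except a previous srbackend value.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "srbackend"
    ]
    params.append(("srbackend", "CirrusSearch"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical example: an ordinary search URL copied from the browser.
url = "https://www.mediawiki.org/w/index.php?search=incategory%3Akings+peaceful&title=Special%3ASearch"
print(with_cirrus_backend(url))
```

Rebuilding the query string instead of concatenating text means running the helper twice does not add a second `srbackend` parameter.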

If you notice any problems with CirrusSearch please file bugs directly for
it:
https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions&component=CirrusSearch

Nik Everett
_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


p.selitskas at gmail

Aug 28, 2013, 12:37 PM

Post #2 of 7
Re: New search backend live on mediawiki.org

Will it eventually become the search backend on other Wikimedia projects?

Is there source code available for Elasticsearch on Gerrit? I couldn't
find it. Stemming doesn't work at all for some languages, so searching
only finds exact matches.

--
Regards,
Павел Селіцкас/Pavel Selitskas
Wizardist @ Wikimedia projects



neverett at wikimedia

Aug 28, 2013, 12:55 PM

Post #3 of 7
Re: New search backend live on mediawiki.org

On Wed, Aug 28, 2013 at 3:37 PM, Paul Selitskas <p.selitskas [at] gmail> wrote:

> Will it eventually become the search backend on other Wikimedia projects?
>

Yes. I'm not sure when though.


> Is there source code available for Elasticsearch on Gerrit?


Our extension that interacts with Elasticsearch is called CirrusSearch and
lives in Gerrit here:
https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/CirrusSearch
Elasticsearch lives on GitHub here:
https://github.com/elasticsearch/elasticsearch


> Stemming doesn't work at all for some languages, so searching
> only finds exact matches.
>

Stemming is done based on the language of the wiki. I expect only English
stemming to work on mediawiki.org. Right now we use the default language
analysers for all the languages that Elasticsearch supports out of the box
(http://www.elasticsearch.org/guide/reference/index-modules/analysis/lang-analyzer/),
with some customizations for English. Languages without better support
get a "default" analyser that doesn't do any stemming and splits on spaces.
I expect we'll have to build some more analysers in the future.
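
A rough sketch of the per-language choice described above, written as Python that builds Elasticsearch index analysis settings. These are assumptions, not CirrusSearch's actual configuration: the language map is a small hand-picked subset, the analyser name `text` is hypothetical, the `lowercase` filter on the fallback is a guess (the post only says it splits on spaces and does no stemming), and the English customizations are not shown. Belarusian (`be`) is used as an example of a language assumed to have no built-in analyser.

```python
import json

# Illustrative subset only: wiki language code -> Elasticsearch built-in
# language analyser. The real list of supported languages is longer.
BUILTIN_ANALYZERS = {
    "en": "english",
    "de": "german",
    "fr": "french",
    "es": "spanish",
    "ru": "russian",
}


def analysis_settings(wiki_lang: str) -> dict:
    """Build index analysis settings for a wiki's content language (sketch)."""
    builtin = BUILTIN_ANALYZERS.get(wiki_lang)
    if builtin is not None:
        # Supported language: reuse Elasticsearch's stemming analyser as-is.
        analyzer = {"type": builtin}
    else:
        # Fallback: no stemming, split on whitespace; lowercasing is assumed.
        analyzer = {"type": "custom", "tokenizer": "whitespace", "filter": ["lowercase"]}
    # "text" is a hypothetical analyser name chosen for this example.
    return {"analysis": {"analyzer": {"text": analyzer}}}


print(json.dumps(analysis_settings("be"), indent=2))
```

The built-in language analysers bundle tokenizing, stop words and stemming, while the fallback only tokenizes, which is why searches on such wikis match exact word forms only.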

Nik


innocentkiller at gmail

Aug 28, 2013, 12:59 PM

Post #4 of 7
Re: New search backend live on mediawiki.org

On Wed, Aug 28, 2013 at 12:37 PM, Paul Selitskas <p.selitskas [at] gmail> wrote:

> Will it eventually become the search backend on other Wikimedia projects?
>
>
That's the plan eventually :)


> Is there source code available for Elasticsearch on Gerrit? I couldn't
> find it. Stemming doesn't work at all for some languages, so searching
> only finds exact matches.
>

No, ES is not in Gerrit. It's an upstream project; their website is
elasticsearch.org.
The CirrusSearch extension (our part of the project) is in Gerrit, though.

-Chad


sumanah at wikimedia

Aug 28, 2013, 4:11 PM

Post #5 of 7
Re: New search backend live on mediawiki.org

On 08/28/2013 02:20 PM, Nikolas Everett wrote:
> Today we threw the big lever and turned on our new search backend at
> mediawiki.org. It isn't the default yet but it is just about ready for you
> to try. Here is what we think we've improved:
> 1. Templates are now expanded when pages are indexed, so:
> 1a. You can search for text included in templates
> 1b. You can search for categories included in templates
> 2. The search engine is updated very quickly after articles change.
> 3. A few funky things around intitle and incategory:
> 3a. You can combine them with a regular query (incategory:kings peaceful)
> 3b. You can use prefix searches with them (incategory:norma*)
> 3c. You can use them anywhere in the query (roger incategory:normans)

Template expansion and category intersection search - so exciting! Thank
you, Nik and Chad, for working on this.
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation



z at mzmcbride

Aug 28, 2013, 5:08 PM

Post #6 of 7
Re: New search backend live on mediawiki.org

Chad wrote:
>No, ES is not in Gerrit. It's an upstream project; their website is
>elasticsearch.org.
>The CirrusSearch extension (our part of the project) is in Gerrit, though.

https://www.mediawiki.org/wiki/Elasticsearch

https://wikitech.wikimedia.org/wiki/Elasticsearch

I'm not sure what should be at either title (perhaps redirects), but this
is where I went to find answers. When you get a chance. :-)

MZMcBride





innocentkiller at gmail

Aug 28, 2013, 5:17 PM

Post #7 of 7
Re: New search backend live on mediawiki.org

On Wed, Aug 28, 2013 at 5:08 PM, MZMcBride <z [at] mzmcbride> wrote:

> Chad wrote:
> >No, ES is not in Gerrit. It's an upstream project; their website is
> >elasticsearch.org.
> >The CirrusSearch extension (our part of the project) is in Gerrit, though.
>
> https://www.mediawiki.org/wiki/Elasticsearch
>
> https://wikitech.wikimedia.org/wiki/Elasticsearch
>
> I'm not sure what should be at either title (perhaps redirects), but this
> is where I went to find answers. When you get a chance. :-)
>
>
Those now redirect to real places. Thanks for the reminder :)

-Chad