
Mailing List Archive: Wikipedia: Wikitech

So... status of category intersections?

 

 



aerik at thesylvans

Apr 14, 2008, 9:16 PM

Post #1 of 59 (1364 views)
So... status of category intersections?

At the risk of asking a stupid question: what is the status of category
intersections? I guess this is really directed to Brion, Tim, and anyone
capable of doing commits. Is there any interest/motivation in making it
happen? I think a lucene index is the way to go - if someone coded an
interface, could someone capable of doing it (Tim?) set up the index?

Best Regards,
Aerik


--
http://www.wikidweb.com - the Wiki Directory of the Web
http://tagthis.info - Hosted Tagging for your website!
_______________________________________________
Wikitech-l mailing list
Wikitech-l[at]lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Simetrical+wikilist at gmail

Apr 15, 2008, 7:00 AM

Post #2 of 59 (1324 views)
Re: So... status of category intersections? [In reply to]

On Tue, Apr 15, 2008 at 12:16 AM, Aerik Sylvan <aerik[at]thesylvans.com> wrote:
> At the risk of asking a stupid question: what is the status of category
> intersections? I guess this is really directed to Brion, Tim, and anyone
> capable of doing commits. Is there any interest/motivation in making it
> happen? I think a lucene index is the way to go - if someone coded an
> interface, could someone capable of doing it (Tim?) set up the index?

I'm capable of doing commits, but not setting anything up on the site.
For my part, adding category intersection functionality for core is
probably the next significant thing I'd do, given some time to spend
on development work (which may or may not be available soon). I would
add it using MySQL fulltext in core (and PostgreSQL support also
belongs in core, if someone wants to write that), but with a pluggable
backend. But someone could easily preempt me, since I have no
timetable for this.
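
A minimal sketch of what a "pluggable backend" could look like, assuming a small interface that a MySQL fulltext, PostgreSQL or Lucene implementation would each satisfy. The class and method names (IntersectionBackend, set_categories, intersect) are hypothetical and not part of MediaWiki; the in-memory backend only stands in for a real one.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class IntersectionBackend(ABC):
    """Hypothetical backend interface; not an actual MediaWiki class."""

    @abstractmethod
    def set_categories(self, page_id: int, categories: set[str]) -> None:
        """Record the full set of categories a page belongs to."""

    @abstractmethod
    def intersect(self, categories: list[str]) -> set[int]:
        """Return the ids of pages that are in every listed category."""


class InMemoryBackend(IntersectionBackend):
    """Toy backend standing in for a fulltext- or Lucene-based one."""

    def __init__(self) -> None:
        self._pages: dict[int, set[str]] = {}

    def set_categories(self, page_id: int, categories: set[str]) -> None:
        self._pages[page_id] = set(categories)

    def intersect(self, categories: list[str]) -> set[int]:
        wanted = set(categories)
        return {pid for pid, cats in self._pages.items() if wanted <= cats}


def find_pages(backend: IntersectionBackend, categories: list[str]) -> list[int]:
    """Front-end code talks only to the interface, never to a concrete backend."""
    return sorted(backend.intersect(categories))


backend = InMemoryBackend()
backend.set_categories(1, {"Actor", "Director"})
backend.set_categories(2, {"Actor"})
print(find_pages(backend, ["Actor", "Director"]))
```

The point of the split is that the interface code stays the same whichever storage engine a given site is able to run.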



jf at mormo

Apr 15, 2008, 7:08 AM

Post #3 of 59 (1321 views)
Re: So... status of category intersections? [In reply to]

On Tue, Apr 15, 2008 at 10:00:23AM -0400, Simetrical wrote:
> I'm capable of doing commits, but not setting anything up on the site.
> For my part, adding category intersection functionality for core is
> probably the next significant thing I'd do, given some time to spend
> on development work (which may or may not be available soon). I would
> add it using MySQL fulltext in core (and PostgreSQL support also
> belongs in core, if someone wants to write that), but with a pluggable
> backend.

MySQL fulltext search is only available in MyISAM. MyISAM has very poor
locking support. We can't use it for the WMF server farm.

Regards,

jens



Simetrical+wikilist at gmail

Apr 15, 2008, 7:53 AM

Post #4 of 59 (1326 views)
Re: So... status of category intersections? [In reply to]

On Tue, Apr 15, 2008 at 10:08 AM, Jens Frank <jf[at]mormo.org> wrote:
> On Tue, Apr 15, 2008 at 10:00:23AM -0400, Simetrical wrote:
> > I'm capable of doing commits, but not setting anything up on the site.
> > For my part, adding category intersection functionality for core is
> > probably the next significant thing I'd do, given some time to spend
> > on development work (which may or may not be available soon). I would
> > add it using MySQL fulltext in core (and PostgreSQL support also
> > belongs in core, if someone wants to write that), but with a pluggable
> > backend.
>
> MySQL fulltext search is only available in MyISAM. MyISAM has very poor
> locking support. We can't use it for the WMF server farm.

Yup, thus the pluggable backend. (Although Brion suggested that maybe
MyISAM could be tried on Wikimedia for this.) Something like Lucene
isn't suitable for support in core.



info at skierpage

Apr 15, 2008, 10:02 PM

Post #5 of 59 (1322 views)
Re: So... status of category intersections? [In reply to]

Aerik Sylvan wrote:
> At the risk of asking a stupid question: what is the status of category
> intersections?

At the risk of making a stupid answer: you could install the Semantic
MediaWiki extension and make queries like

{{#ask:
[[Category:Actor]]
[[Category:Director]]
}}

SMW will query for membership of subcategories (thus it'll match members
of Child actors), to a configurable depth limit. The nifty thing is
you can display other properties and categories of matching pages.
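
To illustrate the idea of intersecting categories while following subcategories down to a depth limit, here is a rough sketch. It is not SMW's implementation; the SUBCATS and MEMBERS dictionaries and the function names are made-up toy data and helpers.

```python
from collections import deque

# Hypothetical toy data: direct subcategories and direct member pages.
SUBCATS = {"Actor": ["Child actors"], "Director": [], "Child actors": []}
MEMBERS = {
    "Actor": {"Alice"},
    "Child actors": {"Bob"},
    "Director": {"Alice", "Bob", "Carol"},
}


def members_with_subcats(category, max_depth):
    """Collect pages in `category` and in its subcategories up to `max_depth` levels down."""
    pages = set()
    seen = {category}
    queue = deque([(category, 0)])
    while queue:
        cat, depth = queue.popleft()
        pages |= MEMBERS.get(cat, set())
        if depth < max_depth:
            for sub in SUBCATS.get(cat, []):
                if sub not in seen:  # guard against category loops
                    seen.add(sub)
                    queue.append((sub, depth + 1))
    return pages


def intersect(categories, max_depth=1):
    """Pages that belong to every category once subcategories are expanded."""
    sets = [members_with_subcats(c, max_depth) for c in categories]
    if not sets:
        return set()
    return set.intersection(*sets)


print(sorted(intersect(["Actor", "Director"], max_depth=1)))
```

With a depth limit of 0 only direct members count, so Bob (who is only in Child actors) drops out of the result.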

See demo (temporarily) at,
http://www.semanticweb.org/wiki/Sandbox#Category_intersections

Cheers,
--
=S Page




Simetrical+wikilist at gmail

Apr 16, 2008, 6:23 AM

Post #6 of 59 (1316 views)
Re: So... status of category intersections? [In reply to]

On Wed, Apr 16, 2008 at 1:02 AM, S Page <info[at]skierpage.com> wrote:
> At the risk of making a stupid answer: you could install the Semantic
> MediaWiki extension

Probably most of us know of SMW. The goal here appears to be to get
something enabled on Wikipedia, which rules out SMW without an
extremely large amount of review and (presumably) revision.



gerard.meijssen at gmail

Apr 16, 2008, 6:31 AM

Post #7 of 59 (1316 views)
Re: So... status of category intersections? [In reply to]

Hoi,
One bit of revision that has been scheduled before Wikimania 2008 is
changing the localisation of Semantic MediaWiki in order to have it
supported in Betawiki. Compared to the version we saw demonstrated at
Wikimania 2007, SMW has become a lot easier to use. The performance and
scalability have improved a lot, so a lot of revision has been done. This does
not mean that more review would not be welcome, it does mean that it is not
that obvious that Semantic MediaWiki should be ruled out.
Thanks,
GerardM

On Wed, Apr 16, 2008 at 3:23 PM, Simetrical
<Simetrical+wikilist[at]gmail.com> wrote:

> On Wed, Apr 16, 2008 at 1:02 AM, S Page <info[at]skierpage.com> wrote:
> > At the risk of making a stupid answer: you could install the Semantic
> > MediaWiki extension
>
> Probably most of us know of SMW. The goal here appears to be to get
> something enabled on Wikipedia, which rules out SMW without an
> extremely large amount of review and (presumably) revision.
>


mak at aifb

Apr 18, 2008, 12:05 AM

Post #8 of 59 (1303 views)
Re: So... status of category intersections? [In reply to]

Hi all,

here are my two (point five) cents as SMW developer:

(1) Yes, SMW needs to be tuned for being used in Wikipedia. It has many
settings to enable or disable features, and some features are clearly too
much for one of the world's largest sites. The default settings, for obvious
reasons, are not tuned for Wikipedia ;-)

(2) SMW consists of many independent components. Especially, its common syntax
[[property::value]] is a *tiny* (30 lines of PHP ;-) part of the system, and
can readily be replaced by anything you like (including templates). So
the "standardised templates" vs. "typed links" is really just a minor issue!

But whenever I see people discussing SMW, I see talks about syntax and query
performance. Syntax can be changed easily and queries can even be turned off,
and still SMW is useful! Here are some things that SMW provides beyond
parsing square brackets:

** Datatype parsing, partly internationalised. E.g. the system recognises
that "+1234" is the same number as "1.234,0", support for Gregorian-Julian
calendar conversion is coming, and geographical coordinates can already be
written in many ways. This is computationally cheap, but you will want that
for template-based structuring as well.

** Storage. SMW has an object oriented storage API so that the storage (DB
tables or whatever) can be changed without changing the rest of the code. It
provides internal object-models and data structures that are useful for
dealing with structured data. Why reinvent all that or handle data values as
plain strings internally?

** Export. SMW has various interfaces to directly export data to other
systems. In addition to the long-standing RDF/XML export, we now also have
iCal support, and direct connections to "semantic" datastores that can also
be hosted on different servers. This means that all data entered in the wiki
is directly written into a separate database which has its own standard query
interfaces (the SPARQL query language typically being the method of choice).
No need to use SMW's internal query engine if this is too stressful for
servers.

** Extensions. Things like SemanticForms (form input) or SemanticLayers
(embedded maps beyond Google) already use SMW APIs internally and still need
not be computationally problematic.
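
As a small illustration of the datatype parsing item above, the following sketch shows how "+1234" and "1.234,0" can normalise to the same number once the locale's separators are known. The parse_number function and its parameters are hypothetical and are not SMW's API; a real parser would handle many more formats.

```python
def parse_number(text, decimal_sep=",", group_sep="."):
    """Parse a localised number string into a float.

    Defaults assume a German-style format (1.234,0); both separators
    are parameters so other conventions can be handled the same way.
    """
    cleaned = text.strip().replace(group_sep, "")
    if decimal_sep != ".":
        cleaned = cleaned.replace(decimal_sep, ".")
    return float(cleaned)


print(parse_number("+1234") == parse_number("1.234,0"))
```

An English-style string would be parsed with the separators swapped, for example parse_number("1,234.0", decimal_sep=".", group_sep=",").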


(2.5) All apologies to the BetaWiki guys -- we really want to join as soon as
possible (and it should be possible!).

Cheers,

Markus


On Wednesday, 16 April 2008, Gerard Meijssen wrote:
> Hoi,
> One bit of revision that has been scheduled before Wikimania 2008 is
> changing the localisation of Semantic MediaWiki in order to have it
> supported in Betawiki. Compared to the version we saw demonstrated at
> Wikimania 2007 SMW has become a lot easier to use. The performance and
> scalability has improved a lot so a lot of revision has been done. This
> does not mean that more review would not be welcome, it does mean that it
> is not that obvious that Semantic MediaWiki should be ruled out.
> Thanks,
> GerardM
>
> On Wed, Apr 16, 2008 at 3:23 PM, Simetrical
> <Simetrical+wikilist[at]gmail.com> wrote:
> > On Wed, Apr 16, 2008 at 1:02 AM, S Page <info[at]skierpage.com> wrote:
> > > At the risk of making a stupid answer: you could install the Semantic
> > > MediaWiki extension
> >
> > Probably most of us know of SMW. The goal here appears to be to get
> > something enabled on Wikipedia, which rules out SMW without an
> > extremely large amount of review and (presumably) revision.
> >



--
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362 fax +49 (0)721 608 5998
mak[at]aifb.uni-karlsruhe.de www http://korrekt.org


Simetrical+wikilist at gmail

Apr 18, 2008, 6:35 AM

Post #9 of 59 (1301 views)
Re: So... status of category intersections? [In reply to]

On Fri, Apr 18, 2008 at 3:05 AM, Markus Krötzsch
<mak[at]aifb.uni-karlsruhe.de> wrote:
> But whenever I see people discussing SMW, I see talks about syntax and query
> performance. Syntax can be changed easily and queries can even be turned off,
> and still SMW is useful! Here are some things that SMW provides beyond
> parsing square brackets: [etc.]

The point is that it *does* provide so many things. This makes
reviewing it pretty difficult, so it doesn't look likely to get
enabled any time soon, according to my interpretation of statements
I've seen from Brion. Thus we look to alternatives for use on
Wikipedia, which are small and narrow and can be easily reviewed. If
SMW were split into many small modules (possibly all with a dependency
on a small central core) it might stand a better chance of ever being
considered for use on Wikimedia projects.

Besides, stuff like tag searches should probably be in the core
software, not an extension. They're a semi-expected feature in fancy
Web 2.0 software these days.


mak at aifb

Apr 18, 2008, 8:45 AM

Post #10 of 59 (1294 views)
Re: So... status of category intersections? [In reply to]

On Friday, 18 April 2008, Simetrical wrote:
> On Fri, Apr 18, 2008 at 3:05 AM, Markus Krötzsch
>
> <mak[at]aifb.uni-karlsruhe.de> wrote:
> > But whenever I see people discussing SMW, I see talks about syntax and
> > query performance. Syntax can be changed easily and queries can even be
> > turned off, and still SMW is useful! Here are some things that SMW
> > provides beyond parsing square brackets: [etc.]
>
> The point is that it *does* provide so many things. This makes
> reviewing it pretty difficult, so it doesn't look likely to get
> enabled any time soon, according to my interpretation of statements
> I've seen from Brion. Thus we look to alternatives for use on
> Wikipedia, which are small and narrow and can be easily reviewed. If
> SMW were split into many small modules (possibly all with a dependency
> on a small central core) it might stand a better chance of ever being
> considered for use on Wikimedia projects.

Great! Just let us know what you need. We can extract and bundle any feature
into a sub-piece of software, and you can decide how small you want it to be
to allow proper review (I am a very picky contribution reviewer myself, so I
feel with Brion here :). But SMW is fairly modular anyway, and I can quickly
separate most functions. The core certainly is the storage API that SMW and
many extensions refer to (the DB schema can be changed, just the store's
object API is somewhat central).

I can provide you with a more detailed overview of the components to let you
decide what you need. In any case that should be easier than rewriting things
from scratch, and it would ensure compatibility with the non-included SMW
functions (which is in our interest even if you want only a small part). So,
if Wikimedia is interested in features that we might possibly provide, then
there appears to be no reason not to challenge us before starting new
projects :-)

You might also contact Wikia, who already did tests before enabling SMW on
their machines. Maybe they have concrete complaints that we should address.

>
> Besides, stuff like tag searches should probably be in the core
> software, not an extension. They're a semi-expected feature in fancy
> Web 2.0 software these days.

I am happy with moving code to core ;-) But, seriously, even if you go for
completely new implementations, it would be great if we could discuss these
things to make all those additions at least minimally compatible. Is there
currently a core group of people at MW who are interested in that topic? Who
would be likely to develop such an in-core tagging feature anyway?


We may sometimes have trouble finding enough development time in our work
life, but we know how to set our priorities. And we have means to hire people
and to buy servers if motivated by Wikipedia requirements. So far, we have
not seen concrete requests/complaints from the Wikipedia side and have mainly
developed what our current users requested (well, not all of it ;-). Ask and
you will be answered.

Best regards,

Markus


--
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362 fax +49 (0)721 608 5998
mak[at]aifb.uni-karlsruhe.de www http://korrekt.org


Simetrical+wikilist at gmail

Apr 18, 2008, 8:55 AM

Post #11 of 59 (1290 views)
Re: So... status of category intersections? [In reply to]

On Fri, Apr 18, 2008 at 11:45 AM, Markus Krötzsch
<mak[at]aifb.uni-karlsruhe.de> wrote:
> Great! Just let us know what you need.

Well, I'm not the right person to ask. :) Brion or Tim has to review
anything you want to go live. I don't have shell access and can't
enable extensions. If you want a particular feature of SMW enabled on
Wikimedia, the right course of action is to 1) break off the code for
that specific feature into some kind of small, narrow,
easily-reviewable bundle, 2) open a bug asking for that single
specific feature to be enabled, 3) pester people until they review it,
4) fix any complaints they have, 5) goto (3). Then repeat for any
other individual features.

At least that's my impression of what would work -- again, I'm not any
authority here. One thing that's for sure is that active pestering is
usually needed to get things done at present.

> I am happy with moving code to core ;-) But, seriously, even if you go for
> completely new implementations, it would be great if we could discuss these
> things to make all those additions at least minimally compatible. Is there
> currently a core group of people at MW who are interested in that topic? Who
> would be likely to develop such an in-core tagging feature anyway?

Well, there have been fairly extensive discussions on this list about
implementation of category intersections. Software discussions are
done here.


mak at aifb

Apr 19, 2008, 1:20 AM

Post #12 of 59 (1285 views)
Re: So... status of category intersections? [In reply to]

On Friday, 18 April 2008, Simetrical wrote:
> On Fri, Apr 18, 2008 at 11:45 AM, Markus Krötzsch
>
> <mak[at]aifb.uni-karlsruhe.de> wrote:
> > Great! Just let us know what you need.
>
> Well, I'm not the right person to ask. :)

I know, but I assume the relevant people (also the ones who have requirements
for tagging/cat intersection) are also on this list ;-)

> Brion or Tim has to review
> anything you want to go live. I don't have shell access and can't
> enable extensions. If you want a particular feature of SMW enabled on
> Wikimedia, the right course of action is to 1) break off the code for
> that specific feature into some kind of small, narrow,
> easily-reviewable bundle, 2) open a bug asking for that single
> specific feature to be enabled, 3) pester people until they review it,
> 4) fix any complaints they have, 5) goto (3). Then repeat for any
> other individual features.
>
> At least that's my impression of what would work -- again, I'm not any
> authority here. One thing that's for sure is that active pestering is
> usually needed to get things done at present.

OK, thanks. We will then try to propose what parts of SMW could safely be used
on Very Large Sites already. The main insight for me here is that we should
not just extend SMW with more features, but also create lightweight versions
with fewer features! We will see what we can do.

>
> > I am happy with moving code to core ;-) But, seriously, even if you go
> > for completely new implementations, it would be great if we could discuss
> > these things to make all those additions at least minimally compatible.
> > Is there currently a core group of people at MW who are interested in
> > that topic? Who would be likely to develop such an in-core tagging
> > feature anyway?
>
> Well, there have been fairly extensive discussions on this list about
> implementation of category intersections. Software discussions are
> done here.

Yes, I see (but, alas, cannot follow all discussions going on here ...). I
will try to review the discussions on this list soon to gather requirements.
Anyway, if anyone working on tagging/cat intersections right now reads this,
I would appreciate direct feedback.

Best regards,

Markus


--
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362 fax +49 (0)721 608 5998
mak[at]aifb.uni-karlsruhe.de www http://korrekt.org


ssanbeg at ask

Apr 21, 2008, 10:12 AM

Post #13 of 59 (1271 views)
Re: So... status of category intersections? [In reply to]

On Sat, 19 Apr 2008 10:20:07 +0200, Markus Krötzsch wrote:

> On Friday, 18 April 2008, Simetrical wrote:
>> On Fri, Apr 18, 2008 at 11:45 AM, Markus Krötzsch
>>
>> <mak[at]aifb.uni-karlsruhe.de> wrote:
>> > Great! Just let us know what you need.
>>
>> Well, I'm not the right person to ask. :)
>
> I know, but I assume the relevant people (also the ones who have requirements
> for tagging/cat intersection) are also on this list ;-)
>
>> Brion or Tim has to review
>> anything you want to go live. I don't have shell access and can't
>> enable extensions. If you want a particular feature of SMW enabled on
>> Wikimedia, the right course of action is to 1) break off the code for
>> that specific feature into some kind of small, narrow,
>> easily-reviewable bundle, 2) open a bug asking for that single
>> specific feature to be enabled, 3) pester people until they review it,
>> 4) fix any complaints they have, 5) goto (3). Then repeat for any
>> other individual features.
>>
>> At least that's my impression of what would work -- again, I'm not any
>> authority here. One thing that's for sure is that active pestering is
>> usually needed to get things done at present.
>
> OK, thanks. We will then try to propose what parts of SMW could safely be used
> on Very Large Sites already. The main insight for me here is that we should
> not just extend SMW with more features, but also create lightweight versions
> with fewer features! We will see what we can do.
>

I'd guess that category intersections would be the place to start. That's
been talked about quite a bit here, and seems like the most basic feature
of SMW. Attributes are like an enhanced category/tag, which add more
functionality, and more weird quirks.

The kind of thing that would improve category intersection, such as a
real category ID in core, would also help other areas of SMW, like
assigning a type to an attribute ID rather than to the attribute and
every use of it.

-Steve





Simetrical+wikilist at gmail

Apr 21, 2008, 6:42 PM

Post #14 of 59 (1263 views)
Re: So... status of category intersections? [In reply to]

On Mon, Apr 21, 2008 at 1:12 PM, Steve Sanbeg <ssanbeg[at]ask.com> wrote:
> The kind of thing that would
> improve the category intersection, such as a real category ID in core,

We've had that for weeks now.



roan.kattouw at home

Apr 22, 2008, 6:39 AM

Post #15 of 59 (1266 views)
Re: So... status of category intersections? [In reply to]

Simetrical wrote:
>
> Well, there have been fairly extensive discussions on this list about
> implementation of category intersections. Software discussions are
> done here.
MinuteElectron wrote a pretty good implementation of category
intersection as an extension [1]. The only downsides I see are:
* It uses the LinkUpdater to gradually build the categoryintersections
table, but there's no maintenance script to build the entire table at
once. I've written one today, but haven't figured out yet how to
properly integrate this into an extension (I can't really get the path
to the maintenance dir from there; are there any other extensions with
CLI scripts around?)
* It uses nested queries to intersect three or more categories, and it's
hard for me to judge how efficient they are. More about this later.
* It doesn't have a clean API to get a category intersection sub-query
(this could be written of course, and it should if we're gonna use it)

As to the subquery thing, I'll describe how the extension fetches pages
that are in categories A, B and C (all three of them). First, it
calculates hashes for A|B, B|C and A|C (will be called hashAB, hashBC
and hashAC respectively). Then, it queries the categoryintersections
table for pages that have all three hashes, as follows:

  SELECT ci_page FROM categoryintersections
  WHERE ci_hash = 'hashAB' AND ci_page IN (
    SELECT ci_page FROM categoryintersections
    WHERE ci_hash = 'hashBC' AND ci_page IN (
      SELECT ci_page FROM categoryintersections
      WHERE ci_hash = 'hashAC'
    )
  )

I ran an EXPLAIN on it, but I can't really judge if it's bad or good,
'cause I don't know how bad those dependent subqueries are:

id  select_type         table                  type    possible_keys
1   PRIMARY             categoryintersections  ref     PRIMARY
2   DEPENDENT SUBQUERY  categoryintersections  eq_ref  PRIMARY
3   DEPENDENT SUBQUERY  categoryintersections  eq_ref  PRIMARY

id  key      key_len  ref         rows  Extra
1   PRIMARY  4        const       2     Using where; Using index
2   PRIMARY  8        const,func  1     Using where; Using index
3   PRIMARY  8        const,func  1     Using where; Using index

For clarification, the structure of the categoryintersections table is
as follows:
CREATE TABLE `categoryintersections` (
  `ci_page` int(10) unsigned NOT NULL,
  `ci_hash` int(10) unsigned NOT NULL,
  PRIMARY KEY (`ci_hash`,`ci_page`)
);
Can someone who knows more about database efficiency than I do comment
on this? Also, I'd like to suggest we merge this extension into core
(after improving it first), thoughts?
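
For anyone who wants to try the nested query without a wiki at hand, here is a rough, self-contained sketch using Python's sqlite3 module. The pair_hash function (CRC32 of the sorted pair joined with "|") is an assumption made for illustration, since the extension's actual hash function is not shown here, and SQLite is only a stand-in for MySQL, so it says nothing about MySQL's query plan.

```python
import sqlite3
import zlib
from itertools import combinations


def pair_hash(cat_a, cat_b):
    """Hypothetical hash of an unordered category pair: CRC32 of 'A|B' with sorted names."""
    first, second = sorted((cat_a, cat_b))
    return zlib.crc32(f"{first}|{second}".encode("utf-8"))


def build_table(conn, page_categories):
    """Create the table and store one row per (page, category pair)."""
    conn.execute(
        "CREATE TABLE categoryintersections ("
        " ci_page INTEGER NOT NULL,"
        " ci_hash INTEGER NOT NULL,"
        " PRIMARY KEY (ci_hash, ci_page))"
    )
    for page_id, cats in page_categories.items():
        for a, b in combinations(sorted(cats), 2):
            conn.execute(
                "INSERT OR IGNORE INTO categoryintersections (ci_page, ci_hash)"
                " VALUES (?, ?)",
                (page_id, pair_hash(a, b)),
            )


# Same shape as the nested query above, with bound parameters for the hashes.
NESTED = """
SELECT ci_page FROM categoryintersections
WHERE ci_hash = ? AND ci_page IN (
    SELECT ci_page FROM categoryintersections
    WHERE ci_hash = ? AND ci_page IN (
        SELECT ci_page FROM categoryintersections
        WHERE ci_hash = ?
    )
)
ORDER BY ci_page
"""

conn = sqlite3.connect(":memory:")
build_table(conn, {
    1: {"A", "B", "C"},
    2: {"A", "B"},
    3: {"B", "C"},
    4: {"A", "B", "C", "D"},
})
hashes = (pair_hash("A", "B"), pair_hash("B", "C"), pair_hash("A", "C"))
nested_pages = [row[0] for row in conn.execute(NESTED, hashes)]
print(nested_pages)
```

Only pages 1 and 4 carry all three pair hashes, so only they come back.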

Roan Kattouw (Catrope)



Simetrical+wikilist at gmail

Apr 22, 2008, 7:47 AM

Post #16 of 59 (1266 views)
Re: So... status of category intersections? [In reply to]

On Tue, Apr 22, 2008 at 9:39 AM, Roan Kattouw <roan.kattouw[at]home.nl> wrote:
> MinuteElectron wrote a pretty good implementation of category
> intersection as an extension [1].

You left out the reference here, but if you're talking about this,

http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CategoryIntersection/?view=log

then it was written by Magnus. I've already commented on it somewhat.

> The only downsides I see are:
> * It uses the LinkUpdater to gradually build the categoryintersections
> table, but there's no maintenance script to build the entire table at
> once. I've written one today, but haven't figured out yet how to
> properly integrate this into an extension (I can't really get the path
> to the maintenance dir from there; are there any other extensions with
> CLI scripts around?)

Mostly we just assume that we're in wikiroot/extensions/.

> As to the subquery thing, I'll describe how the extension fetches pages
> that are in categories A, B and C (all three of them). First, it
> calculates hashes for A|B, B|C and A|C (will be called hashAB, hashBC
> and hashAC respectively). Then, it queries the categoryintersections
> table for pages that have all three hashes, as follows:
>
> SELECT ci_page FROM categoryintersections
> WHERE ci_hash = 'hashAB' AND ci_page IN (
> SELECT ci_page FROM categoryintersections
> WHERE ci_hash = 'hashBC' AND ci_page IN (
> SELECT ci_page FROM categoryintersections
> WHERE ci_hash = 'hashAC'
> )
> )

As I've commented before: this query won't work on MySQL 4, so it
can't be in core (unless perhaps disabled by default, or intelligently
auto-enabled depending on SQL engine). It will also probably run very
poorly on higher versions of MySQL, since MySQL is stupid about
rewriting subqueries into joins. This should be written as a join:

SELECT ci1.ci_page FROM categoryintersections AS ci1
JOIN categoryintersections AS ci2 ON ci1.ci_page = ci2.ci_page
WHERE ci1.ci_hash = 'hashAB' AND ci2.ci_hash = 'hashBC'

Note that you don't need the third table at all; if something is in A
intersect B and in B intersect C, it's automatically in A intersect C
as well.
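
A quick way to check the two-hash join is the sketch below, run against SQLite through Python's sqlite3 module. The integer constants are placeholders standing in for real pair hashes, and SQLite is only a convenient stand-in, so this shows the logic of the join rather than MySQL's performance.

```python
import sqlite3

# Placeholder integers standing in for the real pair hashes.
HASH_AB, HASH_BC, HASH_AC = 1, 2, 3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categoryintersections ("
    " ci_page INTEGER NOT NULL,"
    " ci_hash INTEGER NOT NULL,"
    " PRIMARY KEY (ci_hash, ci_page))"
)
rows = [
    (1, HASH_AB), (1, HASH_BC), (1, HASH_AC),  # page 1 is in A, B and C
    (2, HASH_AB),                              # page 2 is in A and B only
    (3, HASH_BC),                              # page 3 is in B and C only
]
conn.executemany(
    "INSERT INTO categoryintersections (ci_page, ci_hash) VALUES (?, ?)", rows
)

# Self-join on the page id; the selected column is qualified to avoid ambiguity.
JOIN_QUERY = """
SELECT ci1.ci_page
FROM categoryintersections AS ci1
JOIN categoryintersections AS ci2 ON ci1.ci_page = ci2.ci_page
WHERE ci1.ci_hash = ? AND ci2.ci_hash = ?
ORDER BY ci1.ci_page
"""
join_pages = [row[0] for row in conn.execute(JOIN_QUERY, (HASH_AB, HASH_BC))]
print(join_pages)
```

Pages 2 and 3 each match only one of the two hashes and are filtered out by the join, while page 1 matches both without the A|C hash ever being consulted.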

In this case it's an extremely fast query, but that's because there
are only one or two rows returned for each result. In the worst case
of an empty match, or a match with fewer results than the LIMIT, it
will have to scan through both intersections in their entirety, which
may well be thousands of rows. It's also much less powerful than a
Boolean full-text search: it can't handle subtractions and will
probably have to be crippled to some small number of intersections.

> Also, I'd like to suggest we merge this extension into core
> (after improving it first), thoughts?

Well, if someone writes an interface for category intersections, it
might be reasonable to have multiple backends in core, given that the
backend will be flexible anyway. One advantage of Magnus' scheme is
that it will work in pretty much any SQL engine with no modification
(at least if rewritten to eliminate advanced features like subqueries
;) ). The alternative suggestion of a fulltext engine will work only
on MySQL 5, and possibly PostgreSQL (at least with appropriate extra
coding).



roan.kattouw at home

Apr 22, 2008, 7:59 AM

Post #17 of 59 (1263 views)
Re: So... status of category intersections? [In reply to]

Simetrical wrote:
>
>> The only downsides I see are:
>> * It uses the LinkUpdater to gradually build the categoryintersections
>> table, but there's no maintenance script to build the entire table at
>> once. I've written one today, but haven't figured out yet how to
>> properly integrate this into an extension (I can't really get the path
>> to the maintenance dir from there; are there any other extensions with
>> CLI scripts around?)
>>
>
> Mostly we just assume that we're in wikiroot/extensions/.
>
I'll do that then, but it's still somewhat creepy.
>
>> As to the subquery thing, I'll describe how the extension fetches pages
>> that are in categories A, B and C (all three of them). First, it
>> calculates hashes for A|B, B|C and A|C (will be called hashAB, hashBC
>> and hashAC respectively). Then, it queries the categoryintersections
>> table for pages that have all three hashes, as follows:
>>
>> SELECT ci_page FROM categoryintersections
>> WHERE ci_hash = 'hashAB' AND ci_page IN (
>> SELECT ci_page FROM categoryintersections
>> WHERE ci_hash = 'hashBC' AND ci_page IN (
>> SELECT ci_page FROM categoryintersections
>> WHERE ci_hash = 'hashAC'
>> )
>> )
>>
>
> As I've commented before: this query won't work on MySQL 4, so it
> can't be in core (unless perhaps disabled by default, or intelligently
> auto-enabled depending on SQL engine). It will also probably run very
> poorly on higher versions of MySQL, since MySQL is stupid about
> rewriting subqueries into joins. This should be written as a join:
>
> SELECT ci1.ci_page FROM categoryintersections AS ci1
> JOIN categoryintersections AS ci2 ON ci1.ci_page = ci2.ci_page
> WHERE ci1.ci_hash = 'hashAB' AND ci2.ci_hash = 'hashBC'
>
Didn't think of that.
> Note that you don't need the third table at all; if something is in A
> intersect B and in B intersect C, it's automatically in A intersect C
> as well.
>
We probably need to hash generation functions then: one that generates
all hashes corresponding to a certain page (AB, AC and BC), and one
which generates hashes for a query (AB and BC only in this case).
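
A minimal sketch of what those two functions might look like, in Python for
brevity. The use of MD5, the sorting of names, and the function names
pair_hash, page_hashes and query_hashes are all assumptions for illustration;
the extension's real hashing scheme may differ.

```python
import hashlib
from itertools import combinations

def pair_hash(cat_a, cat_b):
    """Hash one unordered category pair; sorting makes A|B equal to B|A."""
    first, second = sorted((cat_a, cat_b))
    return hashlib.md5(f"{first}|{second}".encode("utf-8")).hexdigest()

def page_hashes(categories):
    """All pair hashes to store for a page: AB, AC and BC for {A, B, C}."""
    cats = sorted(set(categories))
    return {pair_hash(a, b) for a, b in combinations(cats, 2)}

def query_hashes(categories):
    """Chain of adjacent pairs for a query: AB and BC for [A, B, C]."""
    cats = sorted(set(categories))
    return [pair_hash(a, b) for a, b in zip(cats, cats[1:])]

stored = page_hashes(["A", "B", "C"])
wanted = query_hashes(["A", "B", "C"])
print(len(stored), len(wanted))
```

For n categories a page stores n*(n-1)/2 hashes, while a query only needs
n-1 of them, because the chained pairs already cover every category.
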
> In this case it's an extremely fast query, but that's because there
> are only one or two rows returned for each result. In the worst case
> of an empty match, or a match with fewer results than the LIMIT, it
> will have to scan through both intersections in their entirety, which
> may well be thousands of rows. It's also much less powerful than a
> Boolean full-text search: it can't handle subtractions and will
> probably have to be crippled to some small number of intersections.
>
>
>> Also, I'd like to suggest we merge this extension into core
>> (after improving it first), thoughts?
>>
>
> Well, if someone writes an interface for category intersections, it
> might be reasonable to have multiple backends in core, given that the
> backend will be flexible anyway. One advantage of Magnus' scheme is
> that it will work in pretty much any SQL engine with no modification
> (at least if rewritten to eliminate advanced features like subqueries
> ;) ). The alternative suggestion of a fulltext engine will work only
> on MySQL 5, and possibly PostgreSQL (at least with appropriate extra
> coding).
>
>
I missed the explanation of the fulltext implementation. Something like
'Foo With_spaces Bar' and then do a fulltext search for the cats you
need? That would be more powerful, and would probably be faster for
complex intersections. I'll write an alternative to
CategoryIntersections that uses the fulltext schema and run some
benchmarks. I expect to have some results by the end of the week.
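
To make the idea concrete, here is a rough sketch in plain Python of the
per-page category string and a Boolean match over it. This is only a stand-in
for the database's fulltext engine (on MySQL the matching would be done by
MATCH ... AGAINST in Boolean mode, not by application code), and the function
names category_document and matches are made up for illustration.

```python
def category_document(categories):
    """Join a page's categories into one string, one token per category."""
    return " ".join(cat.replace(" ", "_") for cat in categories)

def matches(document, required=(), excluded=()):
    """Boolean match: every required category present, no excluded one."""
    tokens = set(document.split())
    required_tokens = {cat.replace(" ", "_") for cat in required}
    excluded_tokens = {cat.replace(" ", "_") for cat in excluded}
    return required_tokens <= tokens and not (excluded_tokens & tokens)

doc = category_document(["Foo", "With spaces", "Bar"])
print(doc)
print(matches(doc, required=["Foo", "With spaces"]))
print(matches(doc, required=["Foo"], excluded=["Bar"]))
```

The excluded argument is the "subtraction" that the hash-pair scheme cannot
express.
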

Roan Kattouw (Catrope)


Simetrical+wikilist at gmail

Apr 22, 2008, 8:26 AM

Post #18 of 59 (1257 views)
Permalink
Re: So... status of category intersections? [In reply to]

On Tue, Apr 22, 2008 at 10:59 AM, Roan Kattouw <roan.kattouw[at]home.nl> wrote:
> I missed the explanation of the fulltext implementation. Something like
> 'Foo With_spaces Bar' and then do a fulltext search for the cats you
> need? That would be more powerful, and would probably be faster for
> complex intersections. I'll write an alternative to
> CategoryIntersections that uses the fulltext schema and run some
> benchmarks. I expect to have some results by the end of the week.

Aerik Sylvan has already done an implementation of the backend using
CLucene. If a front-end could be done in core, with a pluggable
backend, that might have the best chance of getting enabled on
Wikimedia relatively quickly. MyISAM fulltext is not necessarily
going to be fast enough due to the locking.


ssanbeg at ask

Apr 22, 2008, 9:46 AM

Post #19 of 59 (1261 views)
Permalink
Re: So... status of category intersections? [In reply to]

On Mon, 21 Apr 2008 21:42:43 -0400, Simetrical wrote:

> On Mon, Apr 21, 2008 at 1:12 PM, Steve Sanbeg <ssanbeg[at]ask.com> wrote:
>> The kind of thing that would
>> improve the category intersection, such as a real category ID in core,
>
> We've had that for weeks now.

Oh, funny I missed that; although w.r.t SMW, I've been looking more
closely at the latest released version of MW & SMW.

So maybe it would make sense to develop an extension that would use the
category ID with an SMW like front end, using code broken off from both
extensions?






roan.kattouw at home

Apr 22, 2008, 11:01 AM

Post #20 of 59 (1257 views)
Permalink
Re: So... status of category intersections? [In reply to]

Steve Sanbeg wrote:
>
> So maybe it would make sense to develop an extension that would use the
> category ID with an SMW like front end, using code broken off from both
> extensions?
Wouldn't it be better just to improve SMW's category handling?

Roan Kattouw (Catrope)


gmane at kennel17

Apr 22, 2008, 2:59 PM

Post #21 of 59 (1260 views)
Permalink
Re: So... status of category intersections? [In reply to]

"Roan Kattouw" <roan.kattouw[at]home.nl> wrote in message
news:480DFD4D.8000806[at]home.nl...
> Simetrical wrote:
> >
> >> The only downsides I see are:
> >> * It uses the LinkUpdater to gradually build the categoryintersections
> >> table, but there's no maintenance script to build the entire table at
> >> once. I've written one today, but haven't figured out yet how to
> >> properly integrate this into an extension (I can't really get the path
> >> to the maintenance dir from there; are there any other extensions with
> >> CLI scripts around?)
> >>
> >
> > Mostly we just assume that we're in wikiroot/extensions/.
> >
> I'll do that then, but it's still somewhat creepy.

And, of course, it doesn't help when that's not the case, which is the
situation for us. For technical reasons, all extensions are outside the MW
source folder entirely. It would be good if MW provided a framework for
running per-extension maintenance scripts.

For example, to run the 'upgrade' script for the CategoryIntersection
extension, you could use:

~/wiki/maintenance/ $ php updateExtension.php CategoryIntersection upgrade

- Mark Clements (HappyDog)





Simetrical+wikilist at gmail

Apr 22, 2008, 5:16 PM

Post #22 of 59 (1258 views)
Permalink
Re: So... status of category intersections? [In reply to]

On Tue, Apr 22, 2008 at 5:59 PM, Mark Clements <gmane[at]kennel17.co.uk> wrote:
> And, of course, it doesn't help when that's not the case, which is the
> situation for us. For technical reasons, all extensions are outside the MW
> source folder entirely.

Symlinks work perfectly in that case (as is true for my localhost, for
instance, since it's running a checked-out version of
mediawiki/trunk/). I agree it's not great practice, though: maybe you
could try to use the current working directory? That seems even less
reliable.


roan.kattouw at home

Apr 23, 2008, 4:18 AM

Post #23 of 59 (1249 views)
Permalink
Re: So... status of category intersections? [In reply to]

Simetrical wrote:
> On Tue, Apr 22, 2008 at 5:59 PM, Mark Clements <gmane[at]kennel17.co.uk> wrote:
>
>> And, of course, it doesn't help when that's not the case, which is the
>> situation for us. For technical reasons, all extensions are outside the MW
>> source folder entirely.
>>
>
> Symlinks work perfectly in that case (as is true for my localhost, for
> instance, since it's running a checked-out version of
> mediawiki/trunk/). I agree it's not great practice, though: maybe you
> could try to use the current working directory? That seems even less
> reliable.
The point is I gotta require_once() /maintenance/commandLine.inc or
whatever it's called. Of course, creating a symlink to commandLine.inc
(or copying it around if you're on Windows) will solve that.

Roan Kattouw (Catrope)


gmane at kennel17

Apr 23, 2008, 4:40 AM

Post #24 of 59 (1244 views)
Permalink
Re: So... status of category intersections? [In reply to]

"Roan Kattouw" <roan.kattouw[at]home.nl> wrote in message
news:480F1B00.5090804[at]home.nl...
> Simetrical wrote:
> > On Tue, Apr 22, 2008 at 5:59 PM, Mark Clements <gmane[at]kennel17.co.uk> wrote:
> >
> >> And, of course, it doesn't help when that's not the case, which is the
> >> situation for us. For technical reasons, all extensions are outside the MW
> >> source folder entirely.
> >>
> >
> > Symlinks work perfectly in that case (as is true for my localhost, for
> > instance, since it's running a checked-out version of
> > mediawiki/trunk/). I agree it's not great practice, though: maybe you
> > could try to use the current working directory? That seems even less
> > reliable.
> The point is I gotta require_once() /maintenance/commandLine.inc or
> whatever it's called. Of course, creating a symlink to commandLine.inc
> (or copying it around if you're on Windows) will solve that.
>

From an extension writer's point of view, the current situation is to put in
a relative require_once() line to commandLine.inc and hope that the file is
in the expected place. You are then dependent on the user having set things
up 'correctly' on their server, and either let PHP throw whatever messages
it throws if it isn't, or add a load of checks to the system. The symlink
solution is not something that the writer can rely on. It is also not
terribly convenient if your extensions are in ~/mw_extensions to have to
create a ~/maintenance symlink, and probably a ~/AdminSettings.php symlink
as well - that's a lot of clutter in your home directory!

What would be better would be if extension writers could simply stick the
following code at the top of their maintenance scripts:

if ( !defined( 'MEDIAWIKI' ) || !$wgCommandLine ) {
    die( "Maintenance scripts should be run from your wiki's maintenance " .
        "folder, using updateExtension.php\n" );
}

updateExtension.php would do all the necessary checks and inclusions that
extensions will need (DB connection, LocalSettings, setting up paths, etc.)
so you, as an extension writer, don't have to worry about this side of
things at all.


- Mark Clements (HappyDog)





andrew at epstone

Apr 23, 2008, 6:26 AM

Post #25 of 59 (1248 views)
Permalink
Re: So... status of category intersections? [In reply to]

On Wed, Apr 23, 2008 at 9:40 PM, Mark Clements <gmane[at]kennel17.co.uk> wrote:
> From an extension writer's point of view, the current situation is to put in
> a relative require_once() line to commandLine.inc and hope that the file is
> in the expected place.

global $IP;
require_once( "$IP/maintenance/commandLine.inc" );

What am I missing?

--
Andrew Garrett
