
Mailing List Archive: Wikipedia: Wikitech

Re: [MediaWiki-CVS] SVN: [39938] trunk/phase3/includes/api/ApiQueryBase.php

 

 



innocentkiller at gmail

Aug 25, 2008, 4:38 AM

Post #1 of 7
Re: [MediaWiki-CVS] SVN: [39938] trunk/phase3/includes/api/ApiQueryBase.php

On Mon, Aug 25, 2008 at 2:50 AM, <dantman [at] svn> wrote:
> Revision: 39938
> Author: dantman
> Date: 2008-08-25 06:50:31 +0000 (Mon, 25 Aug 2008)
>
> Log Message:
> -----------
> Revert 39936 and 39935;
> This 'fix' is merely a bad workaround and creates more issues rather than simply fixing the problem.
> A) Part of the Title class is being /duplicated/ meaning more bugs are going to show up when someone improves stuff inside Title and doesn't know stuff is duplicated here.
> B) This change breaks cases as $wgCapitalLinks is now a per-namespace array, not a boolean.

No, it isn't; that patch hasn't been committed. The correct function call
(once it's committed) will be MWNamespace::isCapitalized( $index ).
FWIW: running a title through Title::newFromText() eventually runs it
through $title->secureAndSplit(), which does all kinds of fun normalization
(including forced first-letter capitalization). Personally, I'd rather
see _more_ of the logic for first-letter case sensitivity go there, rather
than have $wgCapitalLinks floating around everywhere.

As to the original bug: I say "invalid" myself. Trailing spaces and leading
spaces are _always_ trimmed. Thoughts on this?
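A minimal PHP sketch of the kind of normalization described here, to show why trailing spaces and underscores disappear. It is not MediaWiki's Title::secureAndSplit(); simpleTitleToKey() is a hypothetical, heavily simplified stand-in that collapses whitespace and underscores, trims both ends, upper-cases the first letter (ASCII only) and converts spaces to underscores.

```php
<?php
// Hypothetical, heavily simplified stand-in for title normalization.
// Not MediaWiki's Title::secureAndSplit(); capitalization is ASCII-only.
function simpleTitleToKey( string $text ): string {
    // Treat any run of whitespace and/or underscores as a single space.
    $text = preg_replace( '/[\s_]+/', ' ', $text );
    // Leading and trailing spaces are always trimmed.
    $text = trim( $text );
    // Forced first-letter capitalization (ucfirst() is byte-based).
    $text = ucfirst( $text );
    // The database key uses underscores instead of spaces.
    return str_replace( ' ', '_', $text );
}

echo simpleTitleToKey( 'test_' ), "\n";        // Test
echo simpleTitleToKey( '  foo   bar ' ), "\n"; // Foo_bar
```

With this behaviour "test_" and "test" yield the same key, "Test", so a prefix query for "Test_" also matches pages such as "TestMan".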

-Chad

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


dan_the_man at telus

Aug 25, 2008, 5:01 AM

Post #2 of 7
Re: [MediaWiki-CVS] SVN: [39938] trunk/phase3/includes/api/ApiQueryBase.php [In reply to]

The issue was with prefixes...
"Test_" was showing things like "TestMan".

I noted a way to fix that on the bug's page: append a single character
to the title, and then strip it off once you have the key. I used '.'
in this case.

substr( ...titleToKey($text.'.'), 0, -1 );

Because the '.' is appended, the title "test_" is normalized to the
db key "Test_.", and then we strip off the '.' and end up with "Test_".
It's basically a placeholder character saying "Hey, I'm sitting here
representing the rest of the title... don't strip what's beside me!",
and we get rid of it when done.

That is, unless you want to start writing complex logic for TitlePrefixes,
which will make me raise hell around here when I get back to the
TitleRewrite project and start complaining that it has become nearly
impossible to tweak the normalization process.
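The placeholder trick can be sketched against a hypothetical, simplified normalizer (simpleTitleToKey() and prefixToKeyWithPlaceholder() are illustrative names, not MediaWiki functions). The '.' survives normalization, so the underscore next to it is not trimmed; dropping the last character afterwards leaves "Test_".

```php
<?php
// Hypothetical, simplified normalizer (not MediaWiki's Title code).
function simpleTitleToKey( string $text ): string {
    $text = preg_replace( '/[\s_]+/', ' ', $text ); // collapse spaces/underscores
    $text = trim( $text );                          // trim both ends
    $text = ucfirst( $text );                       // ASCII-only first-letter caps
    return str_replace( ' ', '_', $text );          // spaces become underscores
}

// Placeholder trick: append '.', normalize, then strip the '.' again,
// so a trailing space/underscore in a prefix is protected from trimming.
function prefixToKeyWithPlaceholder( string $prefix ): string {
    return substr( simpleTitleToKey( $prefix . '.' ), 0, -1 );
}

echo prefixToKeyWithPlaceholder( 'test_' ), "\n"; // Test_
echo simpleTitleToKey( 'test_' ), "\n";           // Test
```

The placeholder only works if the normalizer leaves it untouched; '.' is the character suggested on the bug's page.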

Hmmm... on that note, perhaps rather than my array of functions, I
should create a list-based system for normalization... Then it could
actually be output by the API in a format that other languages can
make use of. (Though there's nothing wrong with creating an API module
to normalize a list of titles.)

(text replace /[\s_]+/ with " ")
(text rtrim)
(dbkey setto text)
(dbkey ltrim)
(dbkey replace " " with "_")

^_^ OK, OK... different syntax and overall idea... I just like to draft
in Lisp-inspired syntaxes...
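Read as data, the draft above is an ordered list of (field, operation, arguments) rules run by a small interpreter. The PHP sketch below is hypothetical: the rule format, the operation names and applyNormalizationRules() are invented for illustration, and it encodes only the five listed steps (so, like the draft, it does no capitalization and leaves a leading space in the text form).

```php
<?php
// Hypothetical rule list mirroring the lisp-style draft.
// Each rule: [ field, operation, argument... ].
$rules = [
    [ 'text',  'replace_regex', '/[\s_]+/', ' ' ],
    [ 'text',  'rtrim' ],
    [ 'dbkey', 'setto', 'text' ],
    [ 'dbkey', 'ltrim' ],
    [ 'dbkey', 'replace', ' ', '_' ],
];

// Apply the rules in order to a state holding 'text' and 'dbkey'.
function applyNormalizationRules( array $rules, string $input ): array {
    $state = [ 'text' => $input, 'dbkey' => '' ];
    foreach ( $rules as $rule ) {
        [ $field, $op ] = $rule;
        switch ( $op ) {
            case 'replace_regex':
                $state[$field] = preg_replace( $rule[2], $rule[3], $state[$field] );
                break;
            case 'replace':
                $state[$field] = str_replace( $rule[2], $rule[3], $state[$field] );
                break;
            case 'rtrim':
                $state[$field] = rtrim( $state[$field] );
                break;
            case 'ltrim':
                $state[$field] = ltrim( $state[$field] );
                break;
            case 'setto':
                // Copy the current value of another field.
                $state[$field] = $state[$rule[2]];
                break;
            default:
                throw new InvalidArgumentException( "Unknown operation: $op" );
        }
    }
    return $state;
}

$result = applyNormalizationRules( $rules, '  foo__bar  ' );
// $result['text'] is " foo bar", $result['dbkey'] is "foo_bar"
```

Because the rules are plain arrays, they could in principle be serialized (for example with json_encode()) and exposed through the API for other languages to reuse.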

~Daniel Friesen(Dantman, Nadir-Seen-Fire) of:
-The Nadir-Point Group (http://nadir-point.com)
--Its Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
--Games-G.P.S. (http://ggps.org)
-And Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)



brion at wikimedia

Aug 25, 2008, 10:33 AM

Post #3 of 7
Re: [MediaWiki-CVS] SVN: [39938] trunk/phase3/includes/api/ApiQueryBase.php [In reply to]


Daniel Friesen wrote:
> The issue was on prefixes...
> "Test_" was showing things like "TestMan".
>
> I noted a way to fix that on the bug's page. Append a single character
> to the title, and then strip it off once you have the key.
> Using '.' in this case.
>
> substr( ...titleToKey($text.'.'), 0, -1 );
>
> Because the . is appended to the title "test_" will be normalized to the
> db key "Test_." and then we strip off the . and end up with "Test_".
> It's basically a placeholder character saying "Hey, I'm sitting here
> representing the rest of the title... don't strip what's beside me!",
> then we get rid of it when done.

That feels a little icky to me. :)

What I might recommend is having a couple of steps to the normalization:

1) Normalization of partial titles

...these may end with / or whitespace, or otherwise not be quite a 100%
valid title; used for normalizing input to searches, prefix searches,
etc.

2) Complete title normalization

Finish that off with right-side trims, enforce length limits, etc.
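A sketch of that two-step split, assuming a simplified normalizer: normalizePartialTitle() and normalizeCompleteTitle() are hypothetical names, capitalization is ASCII-only, and the 255-byte limit is an assumption chosen for illustration. The partial step keeps trailing whitespace (as "_") and "/", and the complete step adds the right-side trim and the length check.

```php
<?php
// Hypothetical two-stage normalizer (not MediaWiki code).
const MAX_TITLE_BYTES = 255; // assumed limit, for illustration only

// Stage 1: partial titles for searches and prefix searches.
// Collapses whitespace, trims only the left side, capitalizes, and
// keeps a trailing space (as '_') or '/' so the prefix keeps its meaning.
function normalizePartialTitle( string $text ): string {
    $text = preg_replace( '/[\s_]+/', ' ', $text );
    $text = ucfirst( ltrim( $text ) );
    return str_replace( ' ', '_', $text );
}

// Stage 2: complete titles. Finish with the right-side trim and
// enforce the length limit; returns null if the result is unusable.
function normalizeCompleteTitle( string $text ): ?string {
    $key = rtrim( normalizePartialTitle( $text ), '_' );
    if ( $key === '' || strlen( $key ) > MAX_TITLE_BYTES ) {
        return null;
    }
    return $key;
}

echo normalizePartialTitle( 'test ' ), "\n";  // Test_
echo normalizeCompleteTitle( 'test ' ), "\n"; // Test
```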

-- brion



roan.kattouw at home

Aug 25, 2008, 12:41 PM

Post #4 of 7
Re: [MediaWiki-CVS] SVN: [39938] trunk/phase3/includes/api/ApiQueryBase.php [In reply to]

Brion Vibber wrote:
> What I might recommend is having a couple of steps to the normalization:
>
> 1) Normalization of partial titles
>
> ...may end with / or whitespace or otherwise not be quite 100% a valid
> title... for use in normalizing things to go into searches, prefix
> searches, etc.
>
> 2) Complete title normalization
>
> Finish that off with right-side trims, enforce length limits, etc.

Of course Brion's solution is the cleanest one and the best one in the
long term, but until someone has done that split I'm just gonna use the
hack Daniel suggested (although I'll put it *inside* the titleToKey()
function, not in the call).

Roan Kattouw (Catrope)



dan_the_man at telus

Aug 26, 2008, 12:18 AM

Post #5 of 7
Re: [MediaWiki-CVS] SVN: [39938] trunk/phase3/includes/api/ApiQueryBase.php [In reply to]

Why inside the function? That goes back to the whole issue of titles
not being normalized right. Someone asking for "...&title=foobar &..."
is going to get the dbkey "Foobar_", which won't be valid for the page.
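To illustrate the objection with a hypothetical, simplified normalizer (neither function below is MediaWiki code): if the placeholder hack lives inside the general title-to-key conversion, a full title with a stray trailing space keeps an underscore it should lose.

```php
<?php
// Hypothetical simplified normalizer (not MediaWiki's Title code).
function simpleTitleToKey( string $text ): string {
    $text = preg_replace( '/[\s_]+/', ' ', $text );
    return str_replace( ' ', '_', ucfirst( trim( $text ) ) );
}

// Placeholder hack moved inside the general-purpose converter:
// every caller now gets prefix behaviour, full titles included.
function titleToKeyWithHackInside( string $text ): string {
    return substr( simpleTitleToKey( $text . '.' ), 0, -1 );
}

echo titleToKeyWithHackInside( 'foobar ' ), "\n"; // Foobar_ (not a valid page key)
echo simpleTitleToKey( 'foobar ' ), "\n";          // Foobar
```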



roan.kattouw at home

Aug 26, 2008, 4:52 AM

Post #6 of 7
Re: [MediaWiki-CVS] SVN: [39938] trunk/phase3/includes/api/ApiQueryBase.php [In reply to]

Daniel Friesen wrote:
> Why inside of the function? That goes back to the whole issue of titles
> not being normalized right. Someone asking for "...&title=foobar &..."
> is going to get the dbkey "Foobar_" which won't be valid for the page.

Yeah, I thought of that later too. Still, it's probably a good idea to
have a separate function (like titlePartToKey() and keyToTitlePart() or
something similar) that does all the substr() magic rather than
duplicating it all over the place.
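A possible shape for that pair, under stated assumptions: the names titlePartToKey() and keyToTitlePart() come from the message above, but their bodies and the simplified stand-ins for titleToKey() and keyToTitle() are hypothetical. The substr() placeholder logic then lives in exactly one place per direction.

```php
<?php
// Hypothetical simplified stand-in for titleToKey().
function simpleTitleToKey( string $text ): string {
    $text = preg_replace( '/[\s_]+/', ' ', $text );
    return str_replace( ' ', '_', ucfirst( trim( $text ) ) );
}

// Hypothetical simplified stand-in for keyToTitle(): underscores to
// spaces, trimmed like a full title would be.
function simpleKeyToTitle( string $key ): string {
    return trim( str_replace( '_', ' ', $key ) );
}

// Prefix variants: the only places that know about the '.' placeholder.
function titlePartToKey( string $part ): string {
    return substr( simpleTitleToKey( $part . '.' ), 0, -1 );
}

function keyToTitlePart( string $keyPart ): string {
    return substr( simpleKeyToTitle( $keyPart . '.' ), 0, -1 );
}

echo titlePartToKey( 'test ' ), "|\n"; // Test_|
echo keyToTitlePart( 'Test_' ), "|\n"; // Test |
```

Full titles keep going through the plain conversions, so "foobar " still becomes "Foobar" rather than "Foobar_".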

Roan Kattouw (Catrope)



dan_the_man at telus

Aug 26, 2008, 5:16 AM

Post #7 of 7
Re: [MediaWiki-CVS] SVN: [39938] trunk/phase3/includes/api/ApiQueryBase.php [In reply to]

Ya, that part would be a good idea.
It would also make migrating to any new normalization system easier
since you only need to change that one function, and don't need to worry
about anyone calling titleToKey and keyToTitle when you have different
code for prefixes.

