
Mailing List Archive: Wikipedia: Wikitech

Case insensitive links (not just titles).

 

 



subscribe at divog

Feb 28, 2008, 10:30 AM

Post #1 of 34 (2007 views)
Case insensitive links (not just titles).

Hi,

Sorry for my English :)

What I need is case-insensitive titles. My solution was to change the
collation of the <page_title> field in the <page> table in MySQL from
<utf8_bin> to <utf8_general_ci>.

But a bigger problem with links persists. In my case, if there is an article
<Frank Dreben>, the link [[Frank Dreben]] is treated as a link to an existing
article (GoodLink), but the link [[frank dreben]] is treated as a link to a
non-existent article, so it opens the edit form for the existing article
<Frank Dreben>. What can be changed so that [[frank dreben]] is treated as
a GoodLink?

I've spent some time in Parser.php, LinkCache.php, Title.php, Linker.php and
LinkBatch.php but found nothing useful. The last thing I tried was to call
strtoupper on the title every time the link cache array is filled, in
LinkCache.php. I also tried calling strtoupper on the title every time data
is fetched from that array.

So I tried to make the titles in the cache case-insensitive, but it didn't
work, and I'm not sure why: it seems that when links are constructed (parser,
title, linker, etc.) only LinkCache methods are used.

Could anybody point me in a direction to dig in? :)
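
For readers following along: folding case when the link cache is filled and read only helps if every write path and every read path applies exactly the same fold. The sketch below illustrates that idea in Python rather than MediaWiki's PHP. The class `CaseFoldingLinkCache` and its method names are hypothetical and are not MediaWiki's actual LinkCache API.

```python
class CaseFoldingLinkCache:
    """Toy link cache whose keys are compared case-insensitively.

    Hypothetical illustration only; this is not MediaWiki's LinkCache.
    """

    def __init__(self):
        self._good = {}  # folded title -> (page id, title as stored)

    @staticmethod
    def _fold(title):
        # Fold case and unify spaces/underscores so every path agrees on the key.
        return title.replace("_", " ").strip().casefold()

    def add_good_link(self, page_id, title):
        self._good[self._fold(title)] = (page_id, title)

    def get_good_link(self, title):
        """Return (page id, stored title), or None if the page is unknown."""
        return self._good.get(self._fold(title))


cache = CaseFoldingLinkCache()
cache.add_good_link(42, "Frank Dreben")
print(cache.get_good_link("frank dreben"))  # (42, 'Frank Dreben')
print(cache.get_good_link("Frank Drebin"))  # None
```

If any code path bypasses the fold (for example, a batch lookup that fills the cache straight from a database result), the two spellings diverge again, which may be one reason a change in a single file does not appear to work.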

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


dan_the_man at telus

Feb 28, 2008, 4:43 PM

Post #2 of 34 (1962 views)
Re: Case insensitive links (not just titles). [In reply to]

From my understanding, Title::secureAndSplit() is the only place where
anything to do with the case-sensitivity of titles is located.

^_^ If you poke me in the right way you could probably get me to hunt
down everything and create a patch to MediaWiki trunk which would
introduce two new features:
* Extend the global variable to allow the options [full
case-sensitivity/full case-insensitivity/first-letter-only
case-insensitivity] while maintaining legacy support for previous
configurations.
* Add a new hook, 'TitleCaseMods' or perhaps 'TitleSecureAndSplit' (^_^
someone else should probably give me a good name for it, based on
where it ends up being placed), which would allow extensions to alter
how titles are treated. This would allow the creation of extensions
that permit things like the per-namespace case sensitivity which one
group of wikis was asking for in Bugzilla at one point.
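
The two proposed features can be sketched roughly as follows. This is Python, not MediaWiki code. The names `CaseMode`, `mode_from_legacy`, `title_hooks` and `normalize_title` are invented for illustration, and mapping a legacy boolean setting onto the new modes is an assumption about what "legacy support" would mean.

```python
from enum import Enum


class CaseMode(Enum):
    SENSITIVE = "sensitive"        # "Foo" and "foo" are different pages
    FIRST_LETTER = "first-letter"  # only the first letter is case-insensitive
    INSENSITIVE = "insensitive"    # the whole title is case-insensitive


def mode_from_legacy(flag):
    # Assumed legacy mapping: an old boolean setting picks one of two modes.
    return CaseMode.FIRST_LETTER if flag else CaseMode.SENSITIVE


# Hypothetical hook registry: each handler takes and returns a title string.
title_hooks = []


def normalize_title(title, mode=CaseMode.FIRST_LETTER):
    """Return the lookup key for a title under the given case mode."""
    key = title.strip().replace(" ", "_")
    if mode is CaseMode.FIRST_LETTER:
        key = key[:1].upper() + key[1:]
    elif mode is CaseMode.INSENSITIVE:
        key = key.casefold()
    for hook in title_hooks:  # extensions get the last word
        key = hook(key)
    return key


print(normalize_title("frank dreben"))                        # Frank_dreben
print(normalize_title("Frank Dreben", CaseMode.INSENSITIVE))  # frank_dreben
print(normalize_title("frank dreben", CaseMode.SENSITIVE))    # frank_dreben
```

A per-namespace extension would then be a hook that inspects the namespace prefix and applies its own folding rule.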

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)



Simetrical+wikilist at gmail

Feb 28, 2008, 5:22 PM

Post #3 of 34 (1960 views)
Re: Case insensitive links (not just titles). [In reply to]

On Thu, Feb 28, 2008 at 7:43 PM, DanTMan <dan_the_man [at] telus> wrote:
> From my understanding Title::secureAndSplit(); is the only place where
> anything to do with case-sensitivity of Titles is located.

Explicitly, yeah, but any associative array using title strings as
keys will automatically be case-sensitive, just because array lookups
(and string comparisons generally) are case-sensitive. I have no idea
how many of those there are scattered about.

I really want some robust and generic normalization mechanism.
Instead of distinguishing between display titles and DB keys (which is
pointless: as though we can't store spaces in the database?),
distinguish between display titles and normalized titles. Normalized
titles would be stored separately in the database and used for lookup
and uniqueness checking, as well as in URLs, and are formed by
applying a canonical function to the display title. Then titles can
have underscores in them, for instance, in the default configuration
(just they'd be normalized to underscores), and someone who wanted to
muck around a bit could use all sorts of weird conventions if they
liked just by changing the normalization function and rebuilding the
page table.
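
A minimal sketch of this idea, in Python rather than MediaWiki's PHP: the display title is stored as entered, a pluggable function derives the normalized title, and lookup and uniqueness checks use only the normalized form. `PageTable`, `default_normalize` and the dictionary standing in for the database are all assumptions made for illustration.

```python
def default_normalize(display_title):
    # Assumed default: spaces and underscores are equivalent, case is kept.
    return display_title.strip().replace(" ", "_")


class PageTable:
    """Toy stand-in for a page table with display and normalized titles."""

    def __init__(self, normalize=default_normalize):
        self.normalize = normalize
        self.rows = {}  # normalized title -> display title

    def create(self, display_title):
        key = self.normalize(display_title)
        if key in self.rows:
            raise ValueError(
                f"{display_title!r} collides with {self.rows[key]!r}")
        self.rows[key] = display_title
        return key

    def lookup(self, any_form):
        """Return the display title for any spelling with the same key."""
        return self.rows.get(self.normalize(any_form))

    def rebuild(self, normalize):
        """Swap the normalization function and re-key every row."""
        old_titles = list(self.rows.values())
        self.normalize, self.rows = normalize, {}
        for title in old_titles:
            self.create(title)


pages = PageTable()
pages.create("Underbar_Summer")
print(pages.lookup("Underbar Summer"))   # Underbar_Summer
pages.rebuild(lambda t: default_normalize(t).casefold())
print(pages.lookup("underbar summer"))   # Underbar_Summer
```

Note that a rebuild can fail: two titles that were distinct under the old function may collide under the new one, so a real migration would need a policy for resolving such conflicts.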



dan_the_man at telus

Feb 29, 2008, 4:06 PM

Post #4 of 34 (1949 views)
Re: Case insensitive links (not just titles). [In reply to]

^_^ A complete Title backend rewrite!? An Image backend rewrite is being
worked on, so why not start one for the Title class as a separate project?
We could compile a list of useful features people want in the Title system
that we currently don't have, and come up with the best way to deal
with titles.

However, I'm not a fan of storing both a normalized underscore version
of the title and an un-normalized space version of the title. I'm
thinking display title for display, and normalized title for all the
handling and other things. I think having the {{DISPLAYTITLE:}} function
store the display title inside the page table would be best. And if
we made the normalized version depend on the display title, then it
wouldn't be possible for someone to remove the requirement that the
display title needs to normalize to the actual title. Some wikis would
like to drop that requirement, and have a subtitle added when the two
don't match.

So DISPLAYTITLE and PAGETITLE stored in the database, I would think. Or
we could actually do a triple. We could decide what would be best after
considering all the possible features people might want to be able to
add to the title system, and consider various hooks to add which would
allow people to create Title-modifying extensions without hacking core.

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)



subscribe at divog

Mar 1, 2008, 2:42 AM

Post #5 of 34 (1943 views)
Re: Case insensitive links (not just titles). [In reply to]

> Explicitly, yeah, but any associative array using title strings as
> keys will automatically be case-sensitive, just because array lookups
> (and string comparisons generally) are case-sensitive. I have no idea
> how many of those there are scattered about.

Are there many such things? The only one I found was the LinkCache
class.
Parser, Linker and Title use only the methods of LinkCache when it comes to
Good|BadLinks.
Maybe there are no other cases where title strings are used as keys of an
associative array?




Simetrical+wikilist at gmail

Mar 1, 2008, 4:47 PM

Post #6 of 34 (1959 views)
Re: Case insensitive links (not just titles). [In reply to]

On Fri, Feb 29, 2008 at 7:06 PM, DanTMan <dan_the_man [at] telus> wrote:
> However, I'm not a fan of storing both a normalized underscore version
> of the title, and an un-normalized space version of the title. I'm
> thinking display title for display, and normalized title for all the
> handling and other things. I think having the {{DISPLAYTITLE:}} function
> store the display title inside of the page table would be best. And if
> we made the normalized version depend on the display title then it
> wouldn't be possible for someone to remove the requirement that the
> displaytitle needs to normalize to the actual title. Some wiki would
> like to have that not there, and have a subtitle added when they don't
> match.

First of all, DISPLAYTITLE is a hack that should be removed in favor
of just using the move function, if this gets implemented and that
becomes possible. (Thanks to Rob, it's a much better hack than what
we used to have, but it's still a hack.) The interface for adding it
makes no sense -- to change the title you should move the page.
Having your perfectly sensible new page name be mangled in terms of
capitalization and '_' => ' ' is unintuitive, and DISPLAYTITLE is not
discoverable as a mechanism for evading it. It should Just Work when
you create a page with an underscore in its name.

Its implementation is also horribly incomplete. *Everything* in the
user interface should know about the display title, and use it.
Because it's currently stored in the page text, nothing knows about it
except when the page itself is actually being displayed. The display
title *has* to be stored in its own normalized database field for
arbitrary parts of code to have access to it.

As for wikis that want the normalized title displayed in a subtitle or
something, that's something an extension can implement using hooks as
an entirely separate mechanism. It's not relevant to this discussion,
IMO, especially if no one has any examples.

On Sat, Mar 1, 2008 at 5:42 AM, <subscribe [at] divog> wrote:
> Is there many of them - such things? The only one I found was LinkCache
> class.
> Parser, Linker, Title use only methods of LinkCache, when it's about
> Good|BadLinks.
> Maybe there are no other cases of use title string as keys of associative
> array?

It could be. But the general principle is, everyone's assumed titles
are case-sensitive until now, so you're probably going to find lots of
random places where that assumption is built in in various ways.
Hopefully not an unmanageably large number, but probably more than
just one or two.



dan_the_man at telus

Mar 1, 2008, 7:49 PM

Post #7 of 34 (1947 views)
Re: Case insensitive links (not just titles). [In reply to]

Well, if you've checked any number of active wikis, you're likely to have
run into the {{Title}} hack. Last I checked, wikis like Wookieepedia and
Uncyclopedia, which are second only to the Wikimedia wikis in size, have
been using it for ages. And there are a few Bugzilla entries asking for
the functionality too. So it's not something void of examples, use, or
demand:
http://starwars.wikia.com/wiki/Star_Wars_Episode_III:_Revenge_of_the_Sith
http://starwars.wikia.com/wiki/NR-N99_Persuader-class_droid_enforcer
http://starwars.wikia.com/wiki/Acclamator_I-class_assault_ship
http://uncyclopedia.org/wiki/Communism
http://uncyclopedia.org/wiki/Game:Zork/knife
http://uncyclopedia.org/wiki/Death
https://bugzilla.wikimedia.org/show_bug.cgi?id=12998

I can go for allowing MediaWiki to handle case, space/underscore, and
extra padding issues (extra padding as in titles like _Summer, which
have valid uses <http://en.wikipedia.org/wiki/Underbar_Summer>) natively
in a title rewrite, and having an extension handle the extra cases like
wiki markup in titles (italics, bolding, and class/styling of titles),
stripping ()'s, allowing # for display, and other odd uses which would
require the use of a subtitle.
However, to reduce the complaints and negative comments, perhaps we
should actually build that extension alongside a proper title rewrite
as a proof of point that it can be done without making it an absolute
hack like it is now.
Also, it would let us compile a full list of all the possible and
already desired features for titles, and then decide which ones
MediaWiki should support natively, and which ones should only be
available with an installed extension.
Keep the code clean, but give the public the features they want.

Btw, DISPLAYTITLE did previously allow for non-matching titles and did add
the subtitle. Some wikis were actually making use of that as a feature a
while back and complained when it was /fixed/ to never allow that
whatsoever, without even letting people allow it using a config variable.

On a similar note, there's another feature which is used in some cases:
http://www.mediawiki.org/wiki/Extension:Ascii_Translit
The idea of allowing extensions to change the normalization process
would make that extension unnecessary, and allow for that kind of
functionality without making it a hack, or needing to use redirects or
double pages.

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)



Simetrical+wikilist at gmail

Mar 1, 2008, 8:35 PM

Post #8 of 34 (1956 views)
Re: Case insensitive links (not just titles). [In reply to]

On Sat, Mar 1, 2008 at 10:49 PM, DanTMan <dan_the_man [at] telus> wrote:
> Well, if you've checked any number of active wiki, you're likely to run
> into the {{Title}} hack. Last I checked wiki like Wookieepedia and
> Uncyclopedia which are only second to the Wikimedia wiki in size have
> been using it for ages.

What is that, a JavaScript hack? Looks to be. This won't interfere with it.

> However, to reduce the complaints and negative comments. Perhaps we
> should actually build that extension along-side a proper title rewrite
> as a Proof of Point, that it can be done without making it an absolute
> hack like it is.

I want to improve a certain class of functionality in certain ways.
You want to improve it even more. That's fine, but it's not what I'm
focusing on right now. I'm not as interested in the further
improvements you propose, and I don't see why you would think they
should be a requirement for implementing the smaller set of
improvements I suggested.

> On a similar note, there's another feature which is used in some cases:
> http://www.mediawiki.org/wiki/Extension:Ascii_Translit
> That idea of allowing extensions to change the normalization process
> would void out the use of that extension, and allow for that kind of
> functionality without making it a hack, or needing to use redirects or
> double pages.

That would be an immediate application for a custom normalization
function, yes, in the setup I envision. Not that I think anyone will
do it anytime soon.



dan_the_man at telus

Mar 1, 2008, 8:51 PM

Post #9 of 34 (1955 views)
Re: Case insensitive links (not just titles). [In reply to]

The smaller set of improvements you propose will likely require a large
amount of change to the MediaWiki code.
Which is more sane?
* Editing a large amount of code to make small changes, and then
finding out that further improvements can't be made without hacks and
without editing a large amount of code again.
* One group editing a large amount of code to make small changes, at the
same time that another group decides to do something similar yet
incompatible that extends the functionality in another way.
* Or one group editing a large amount of code to make small changes
while also opening up the ability to improve that further without
the use of hacks.

I noted the Title hack because, as it is, both the CSS and JS versions
are complete hacks. The DISPLAYTITLE function was created to try to
stop people from using those hacks by providing the functionality inside
MediaWiki itself. However, as you can see, people are still using the
Title hack despite the fact that DISPLAYTITLE exists. That shows that the
current implementation leaves something to be desired before people will
stop using ugly hacks on common wikis.

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)



dan_the_man at telus

Mar 1, 2008, 9:40 PM

Post #10 of 34 (1951 views)
Re: Case insensitive links (not just titles). [In reply to]

What I'm saying is this:

Titles supporting various bits of text and other such things is not a
small matter. It's a feature and an issue which has been around for a while,
and one that a large number of people are involved in.
What one developer thinks is a way to solve the issue may not be what
others think, and it may not even be the best way.

With something that has as large an involvement as this, rather than one
dev making a small change to how things work, we should get input from
many of those who are involved on what is needed and what is the best way
to go about it all.

And I'm not saying that adding the extension functionality is something
for you to do in addition. I'm saying that this could best be done by
multiple people working on different parts at the same time, and making
sure that the different parts are compatible with each other and work
cleanly, instead of someone making a big hack later. (Isn't changing a
small bit of functionality at one point, and a hack needing to be created
later, the whole reason we got into this whole big DISPLAYTITLE mess in
the first place? Repeating the past isn't good.)
I'm even fine with being the one to do the extension stuff, while
working with you to make sure both our changes work together rather than
breaking each other, or locking each other's features out and forcing
people to pick between them.

Next, I'm not saying that both things coincide. In fact, we've been
talking as if there are two types of titles, while ignoring
what's really there. There are three types of titles:
* Title key - keeps the complete normalized form. Used for uniqueness
checking, finding things, and such.
* Real title - keeps information on what padding, case, and
characters are actually in the title. Used for clean display of
the title, and this is what is normalized to create the title key.
* Display title - this is what we actually display to the user. Rather
than reflecting a bunch of technical limitations, the point is to make the
display suit the reader's eyes and deliver a name in an understandable way.
This may or may not be completely unique, and if it doesn't normalize to
the title key like the real title does, then some notification should be
added to make sure that bad links aren't created. In fact, rather than
just "Link with: Foo", we could output something like "Link with:
[[Foo|'''F'''oo]]", which considers limited parts of the display title
(only italic and bold should be considered if markup is allowed) as well
as the real title, to create proper links that can actually be used in
the best manner.

The key and display title I've been talking about are the key and the
display to the user; what you've been talking about is actually the key and
the real title of the article. We should be considering key (backend use),
real (inline display), and display (title header display) rather than
just two of the three.
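
The three-title model above could be sketched like this. It is illustrative Python, not a MediaWiki design: `PageTitle`, `make_key`, `strip_markup` and the specific normalization (trim, underscores, case folding) are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass
import re


def make_key(real_title):
    # Assumed normalization: trim, spaces to underscores, fold case.
    return real_title.strip().replace(" ", "_").casefold()


def strip_markup(display_title):
    # Drop only wiki bold/italic quotes, the markup the post suggests allowing.
    return re.sub(r"'{2,}", "", display_title)


@dataclass
class PageTitle:
    real: str     # exact padding, case and characters
    display: str  # what the reader sees in the page header

    @property
    def key(self):
        # Backend form, always derived from the real title.
        return make_key(self.real)

    def display_matches_key(self):
        """True if the display title still normalizes to the title key."""
        return make_key(strip_markup(self.display)) == self.key

    def link_hint(self):
        """Suggest wikitext that links to the page but shows the display title."""
        if self.display == self.real:
            return f"[[{self.real}]]"
        return f"[[{self.real}|{self.display}]]"


t = PageTitle(real="Foo", display="'''F'''oo")
print(t.key)                    # foo
print(t.display_matches_key())  # True
print(t.link_hint())            # [[Foo|'''F'''oo]]
```

When `display_matches_key()` is false, the page header is the place where the "Link with:" notice described above would be shown.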

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)




Simetrical+wikilist at gmail

Mar 2, 2008, 7:00 AM

Post #11 of 34 (1939 views)
Re: Case insensitive links (not just titles). [In reply to]

On Sat, Mar 1, 2008 at 11:51 PM, DanTMan <dan_the_man [at] telus> wrote:
> Which is more sane?
> * Editing a large amount of code to make small changes. And then ending
> up finding out that further improvements can't be made without hacks and
> needing to edit a large amount of code again.
> * One group editing a large amount of code to make small changes, at the
> same time that another group decides to do something similar yet
> incompatible with the other that extends the functionality in another way.
> * Or one group editing a large amount of code to make small changes at
> the same time as opening up the ability to improve that further without
> the use of hacks.

The third, which is why I never said the mechanism shouldn't be
perfectly extensible. It should be.

On Sun, Mar 2, 2008 at 12:40 AM, DanTMan <dan_the_man [at] telus> wrote:
> And I'm not saying that adding the extension functionality is something
> for you to do in addition. I'm saying that this could be best done as
> multiple people working on different parts at the same time, and making
> sure that the different parts are compatible with each other and work
> cleanly instead of someone making a big hack later (Isn't changing a
> small bit of functionality at one point and a hack needing to be created
> later the whole reason we got into this whole big DISPLAYTITLE mess in
> the first place? Repeating the past isn't good).
> I'm even fine with being the one to do the extension stuff, while
> working with you to make sure both our changes work together rather than
> breaking each other, or locking the others features out and limiting
> people to pick between.

I think you're assuming I'm actually going to do this. I doubt I am,
for the foreseeable future. I don't have the time to do much serious
hacking. I was just expressing a fond wish.

> Next, I'm not saying that both things coincide. In fact, we've been
> talking in the notion that there are two types of titles, while ignoring
> what's really there. There are three types of titles.
> * Title key - keeps the complete normalized form. Used for uniqueness
> checking, finding things, and such.
> * Real title - keeps information on what the real padding, case, and
> characters are actually inside of the title. Used in clean display of
> the title and this is what is normalized to create the title key.
> * Display title - this is what we actually display to the user, rather
> than a bunch of technical limitations, the point is to make the display
> suit the reader's eyes and deliver a name in a understandable means.
> This may or may not be completely unique, and if it doesn't normalize to
> the title key like the real title does, then some notification should be
> added to make sure that bad links aren't created. In fact, rather than
> just "Link with: Foo", we could output something like "Link with:
> [[Foo|'''F'''oo]]" which considers limited parts of the displaytitle
> (only italic and bold should be considered if markup is allowed) as well
> as the real title to create proper links that can actually be used in
> the best manner.

The second and third titles you name may or may not be required to
coincide. Permitting them not to (i.e., allowing the display title
not to normalize to the title key, and/or permitting odd things like
HTML in the display title) raises its own set of difficulties that
will require a lot more thought than the initial proposal, and go a
lot further. And I don't think they should be in core.

But I think this discussion has gotten to the point where it may as
well stop, unless someone says they're willing to write the code.
Further argument over implementation details is probably not very
productive without anyone seriously considering an implementation.



dgerard at gmail

Mar 2, 2008, 7:06 AM

Post #12 of 34 (1941 views)
Re: Case insensitive links (not just titles). [In reply to]

On 02/03/2008, Simetrical <Simetrical+wikilist [at] gmail> wrote:

> But I think this discussion has gotten to the point where it may as
> well stop, unless someone says they're willing to write the code.
> Further argument over implementation details is probably not very
> productive without anyone seriously considering an implementation.


If someone could write this thread up for mediawiki.org, that would be
most helpful for others in the future. (When I get silly requests for
our work wiki, I look on mediawiki.org first.)


- d.



dan_the_man at telus

Mar 2, 2008, 6:28 PM

Post #13 of 34 (1934 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

:/ oh, now you just poked the coding bone...
http://uploads.screenshot-program.com/upl9489088627.png
First note, NO there is no {{DISPLAYTITLE:}} inside that page.

Though this is just a partial change. Only some parts are done, and
others missing.

Now onto the actual notes:
- It may be a compatibility break for some extensions which are doing
things they aren't supposed to do and using $title->mTextform instead of
$title->getText() because when a title is initialized now with a DB Key
the Textform is left as a null string and will be grabbed on the fly in
getText when we need it. (Avoids excessive database queries) (Yes there
is internal use of getText instead of mTextform to avoid using null
titles where they shouldn't be used)
- makeTitle now accepts a third optional parameter, the realtitle of a
title. This way we can initialize both when we have them and it'll be
there for when it's needed.
- equals now accepts a second optional parameter similar to the $valid
parameter we use in User:: stuff. It defaults to 'key' but if you pass
'real' to it, it'll compare real titles instead. This is for use in
page move interfaces so that we can move titles from say [[Main Page]]
to [[Main_Page]].
- I do have the update and table sql and other stuff already in to add
the needed page_real field (Hope no-one minds I used an AFTER page_title
in the SQL patch to keep reading of the tables clean)
- However, it's not yet added for use. The stuff you see in the demo is
actually done by doing some mugging of mDbkey to initialize mTextform
when getText is called.
- I don't have the normalization stuff inside yet. A lot of str_replaces
are going to be replaced with the extensible normalization and others
removed because they're trying to backconvert where they shouldn't.
- Also, as you can see the functions for subpage names and basepage
names will need some tweaking to differentiate between the _ and _E
forms which should actually use the realtitle and titlekey forms
respectively rather than just the textform.
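
As a rough illustration of the notes above (lazy text form, an optional real title passed to makeTitle, and an equals that can compare either form), here is a minimal sketch. It is written in Python rather than MediaWiki's PHP, and every name in it (Title, make_title, get_text, the lookup callable) is a hypothetical stand-in, not the actual patch.

```python
class Title:
    """Toy model: a normalized key plus an optional 'real' title."""

    def __init__(self, dbkey, real=None, lookup=None):
        self.dbkey = dbkey      # unique, normalized form, e.g. "Main_Page"
        self._real = real       # title as typed, if already known
        self._lookup = lookup   # hypothetical callable standing in for a DB query

    @classmethod
    def make_title(cls, dbkey, real=None, lookup=None):
        # The optional second argument mirrors the proposed extra makeTitle parameter.
        return cls(dbkey, real, lookup)

    def get_text(self):
        # Resolve the display text lazily, so nothing is queried until needed.
        if self._real is None:
            found = self._lookup(self.dbkey) if self._lookup else None
            # Fall back to the old behaviour: underscores become spaces.
            self._real = found if found is not None else self.dbkey.replace("_", " ")
        return self._real

    def equals(self, other, mode="key"):
        # mode="key" compares normalized keys; mode="real" compares real titles.
        if mode == "real":
            return self.get_text() == other.get_text()
        return self.dbkey == other.dbkey


a = Title.make_title("Main_Page", "Main Page")
b = Title.make_title("Main_Page", "Main_Page")
same_key = a.equals(b)            # same page as far as the key is concerned
same_real = a.equals(b, "real")   # but the real titles differ, so a move is meaningful
```

Code that reads the text form directly instead of calling get_text would see an unset value here, which is the compatibility break mentioned in the first note.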

A note on DISPLAYTITLE:
Yes the DISPLAYTITLE is a hack, however it's widely used already. So I
won't be dropping support in the rewrite, otherwise current uses will break.
I'm going to come up with a maintenance script to populate the page_real
fields, and another which will hunt down every page with a DISPLAYTITLE
in it, and then move it to a proper title, and if possible try and
remove the DISPLAYTITLE from it if the script tells it to. (Though,
something like this can never be made to not leave cruft behind, so I'd
suggest Wikimedia wiki should do moves by hand rather than trying this
automatically. Especially since they use things like {{Lowercase}}
rather than hardcoded displaytitles).
Because DISPLAYTITLE already exists, rather than marking it as
deprecated or to be removed I'll try and make it a little less hacky,
and instead turn it into a function meant for extension of title
displays into a third type of title only meant for display when viewing
the page, not for other interface elements.
Note that extension of title displays means a few things:
* Rather than DISPLAYTITLE doing everything, it's actually merely going
to call another set of stuff meant for displaytitle stuff (Meaning that
extensions can change the displaytitle in the background without needing
DISPLAYTITLE everywhere).
* The Displaytitle, unlike how it currently is done, will never show up
inside of the Pagetitle, the realtitle is what will show up in the
pagetitle (So wikis will want to move current titles using DISPLAYTITLE
to actual realtitles to have the current stuff inside the title show up).
* The purpose of a Displaytitle will not be for minor title things like
iPod or _Summer, but will actually be meant for things like Foo #1, Lisp
instead of Lisp (Programming language), Foo/Bar/, and Miniwiki or use of
MediaWiki in an alternative use where they actually use a special title
format and then modify how it looks by perhaps using a
"directory > like > structure" and linking previous portions of the title.
The current implementation does not allow for extensions to extend what
a DISPLAYTITLE actually is. I'll make a proof-of-concept extension or two
for common use to test it out and satisfy a few people who are
complaining about the new restrictions to DISPLAYTITLE.

Oh, off topic but... No-one probably noticed it because it isn't used
anywhere inside of the code. But on Line 321 of includes/Title.php the
definition for the Title::nameOf function is missing the "public static"
that should be there. It's not used, but someone's going to get a big
shock when they try and use the function that says it's static but they
need an arbitrary instance to use it.

Well on topic...
Could Brion or someone the like give me SVN Commit access and create a
/branches/titlerewrite for this to be worked on in?

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)



Simetrical+wikilist at gmail

Mar 2, 2008, 7:04 PM

Post #14 of 34 (1929 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

On Sun, Mar 2, 2008 at 9:28 PM, DanTMan <dan_the_man [at] telus> wrote:
> Oh, off topic but... No-one probably noticed it because it isn't used
> anywhere inside of the code. But on Line 321 of includes/Title.php the
> definition for the Title::nameOf function is missing the "public static"
> that should be there. It's not used, but someone's going to get a big
> shock when they try and use the function that says it's static but they
> need an arbitrary instance to use it.

Well, actually PHP will just give an E_STRICT notice when you try to
use a non-static method statically. :)



dan_the_man at telus

Mar 5, 2008, 11:43 PM

Post #15 of 34 (1903 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

:/ And I think I found out something even worse than E_STRICT...

I have no clue who came up with the dumb idea, but all of User.php is
using getText(); instead of getDBkey();
Which is insanely stupid, because getText is supposed to output text for
display, getDBkey is supposed to output the version of the text which
should be used for unique identification.
Unfortunately... Instead of relying on functional output, all of
User.php is relying on the assumption that the display version of the
text will always be as static as the actual unique identifying key.

Practical point?
If you move [[User:Username]] to [[User:username]], because getText now
outputs "username" instead of "Username", Username now cannot login to
the wiki.

So, we have two options:
A) Hack up User.php to use getDBkey and replace _'s with spaces instead
of getText.
B) Make use of getDBkey for identification of the user and have the
update script refactor the users table to use underscores like it should
instead of spaces.
I'm in strong favor of B. If there is a place which aims for display of
a user's name we can also make use of getText, this will also have the
impressive benefit that if you move User:Username to User:_username the
software will go and display "_username" instead of "Username". So users
who like a special form of their username will actually be able to make
the interface display that instead of a normalized form with spaces.
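
To make the login problem concrete, here is a small sketch, in Python rather than PHP and with hypothetical helper names (to_dbkey, find_user_by_text, find_user_by_key). It assumes a simplified normalization (spaces to underscores, first letter capitalized) and a user table keyed by the stored name as in option B.

```python
def to_dbkey(name):
    # Simplified stand-in for title normalization:
    # spaces become underscores and the first letter is capitalized.
    key = name.strip().replace(" ", "_")
    return key[:1].upper() + key[1:]


def find_user_by_text(users, display_text):
    # Looks the user up by whatever the display text currently is.
    return users.get(display_text)


def find_user_by_key(users, display_text):
    # Looks the user up by the normalized key, whatever the display text is.
    return users.get(to_dbkey(display_text))


# Option B: the user table stores the key form, with underscores.
users = {"Some_user": 7}

# After the user page is moved to a differently cased real title,
# the display text is "some user".
by_text = find_user_by_text(users, "some user")
by_key = find_user_by_key(users, "some user")
```

The text-based lookup misses the row as soon as the display form changes, while the key-based lookup is unaffected.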

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)



Simetrical+wikilist at gmail

Mar 6, 2008, 6:47 AM

Post #16 of 34 (1904 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

On Thu, Mar 6, 2008 at 2:43 AM, DanTMan <dan_the_man [at] telus> wrote:
> So, we have two options:
> A) Hack up User.php to use getDBkey and replace _'s with spaces instead
> of getText.

In particular, of course, using some nice User method that hides the
ugly conversion in one place.

> B) Make use of getDBkey for identification of the user and have the
> update script refactor the users table to use underscores like it should
> instead of spaces.

The idea of having separate normalized/display names makes as much
sense for users as for titles, certainly. This seems like the more
logical option. It's not like we aren't going to have to be doing
rebuilding and repopulating of the page table to do this anyway, so
why not the user table too?



dan_the_man at telus

Mar 6, 2008, 9:13 AM

Post #17 of 34 (1898 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

Ok, B it is. I'll add another entry to updaters.inc when I get home and
start by first converting all uses of getText in User.php to getDBkey.
After the actual title stuff is built, we can track down all the places
which use a displayable version of the name and make them use the
displayname instead.

On another note, I guess this is my official statement on this part, but
I intend to create a new class for the normalization of titles.
The TitleNormalizer class.

It is used as an instance, and its primary purpose is its
normalize function. It's constructed with a default set of sequence
groups and sequence passes.
A few notes on that:
- Because of how it sequentially goes through things it has a nicely
defined order, to add another sequence inside of an area a new group can
even be inserted to group sequences of another type.
- The reason that the normalizer is used as an Instance, and not used
statically is for optimum extensibility. There may be cases where just
defining an extra sequence or two, or removing some won't be enough to
make a change that you want to make. To facilitate the larger
alterations to normalization someone can subclass the TitleNormalizer
with a new class which includes their major normalizations, and use a
Hook (Probably 'TitleNormalizerClass' or 'TitleNormalizerClassname'), to
have MediaWiki instantiate a different type of class.
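
One way the design described above could look, sketched in Python rather than PHP. The group names, the add_pass and insert_group helpers, and the CaseInsensitiveNormalizer subclass are all assumptions made for illustration; the post does not specify them.

```python
class TitleNormalizer:
    """Runs ordered groups of passes; each pass maps a string to a string."""

    def __init__(self):
        # Default groups and passes, applied strictly in this order.
        self.groups = [
            ("whitespace", [str.strip, lambda t: " ".join(t.split())]),
            ("key", [lambda t: t.replace(" ", "_"),
                     lambda t: t[:1].upper() + t[1:]]),
        ]

    def add_pass(self, group, fn):
        # Append a pass to an existing group.
        for name, passes in self.groups:
            if name == group:
                passes.append(fn)
                return
        raise KeyError(group)

    def insert_group(self, index, name, passes):
        # Insert a whole new group at a defined position in the sequence.
        self.groups.insert(index, (name, list(passes)))

    def normalize(self, title):
        for _name, passes in self.groups:
            for fn in passes:
                title = fn(title)
        return title


class CaseInsensitiveNormalizer(TitleNormalizer):
    """A larger alteration done by subclassing instead of adding passes."""

    def normalize(self, title):
        return super().normalize(title).lower()
```

A hook such as the proposed 'TitleNormalizerClass' would then only need to decide which of these classes gets instantiated.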

Also another important note. Currently secureAndSplit includes the
trimming of whitespace as part of its task before splitting interwiki
and namespaces out. For various reasons I will be changing that order.
Nothing will be trimmed from the title before those are split out, the
prefix splitter will be responsible for temporarily trimming whitespace
and other stuff out of the split text before trying to find out what the
prefix is. The actual trimming of whitespace will only happen after
that, and also only after the fragment is extracted too, when we know we
are actually working on the title portion only.
The current set of passes is actually quite hacky, as it basically trims
whitespace, splits interwiki, re-trims whitespace, splits fragment, then
re-trims whitespace again just to make sure that the actual title gets
its whitespace trimmed. And note that all three of those are meant for
trimming the title, not the prefix or fragment, because I know at least,
that the regex used to grab the prefix is specifically coded to ignore
extra whitespace in the namespace/interwiki in the first place. Actually
on that note, it doesn't look like there is much reason for the use of
the regex. So to cut down on that, I'm going to try using normal string
functions to pull out the prefixes and trim them off. A strpos, substr,
and trim set together is much quicker than a full blown regex pattern match.
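
The proposed ordering (split the prefix, split the fragment, and only then trim the title) might look roughly like this. It is a Python sketch under stated assumptions: the split_title name and the fixed prefixes tuple are hypothetical, and interwiki handling is left out.

```python
def split_title(raw, prefixes=("User", "Talk", "Help")):
    """Return (namespace, title, fragment); the title is trimmed only at the end."""
    namespace = ""
    rest = raw
    head, sep, tail = raw.partition(":")
    if sep:
        # Trim a temporary copy just to recognize the prefix;
        # the remaining text is left untouched at this stage.
        candidate = head.strip(" _")
        match = next((p for p in prefixes if p.lower() == candidate.lower()), None)
        if match is not None:
            namespace, rest = match, tail
    # Extract the fragment next, still without trimming anything.
    title, _, fragment = rest.partition("#")
    # Only now, with the title portion isolated, trim it once.
    return namespace, title.strip(" _"), fragment
```

When the text before the colon is not a known prefix, the colon stays part of the title, so "No prefix: here" comes back whole.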

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)



Platonides at gmail

Mar 8, 2008, 6:56 AM

Post #18 of 34 (1900 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

DanTMan wrote:
> So to cut down on that, I'm going to try using normal string
> functions to pull out the prefixes and trim them off. A strpos, substr,
> and trim set together is much quicker than a full blown regex pattern match.

Not always. Remember that the PHP code surrounding those functions is
interpreted, while the regex call is run on compiled code.
I think some of the sysadmins remarked that the use of regex *improved*
the performance.
I'm not saying you shouldn't change it to traditional means, just that
time should be checked to be sure it's not slower.




dan_the_man at telus

Mar 8, 2008, 1:41 PM

Post #19 of 34 (1901 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

Perhaps, however looking over the regex: /^(.+?)_*:_*(.*)$/S
Checking over it with my regex tool, I notice that when it encounters a
_ it ends up doubling back over it when not followed by a : . Not to
mention that for each character it needs to check that it's any
character, or an underscore/followed by any, and if that's followed by a : .
I think that using a string function to find the first : in the string
(CPU's are best at incrementation so that's nothing), and then trimming
would be faster than using the regex.

Oh, ya, also there is something to remember. With the new format of
normalization the splitting should NOT trim whitespace as the current
setup does. If that were done then it would be eliminating whitespace
from the title which someone's altered normalization may actually wish
to keep.
So an altered version of that regex to suit, would be: /^(.+?):(.*)$/S
which most definitely is nowhere near as efficient as a simple find :
and split. I'll probably use list() and explode() actually.
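
Since the speed question is easy to measure rather than argue, here is a small sketch of both approaches side by side. It is in Python, not PHP, so the timings say nothing about PHP's PCRE versus explode(); the PCRE /S modifier has no Python equivalent and is dropped; split_regex, split_plain and time_both are hypothetical names.

```python
import re
import timeit

# The pattern quoted above, minus the PCRE-only /S modifier.
PREFIX_RE = re.compile(r"^(.+?)_*:_*(.*)$")


def split_regex(text):
    m = PREFIX_RE.match(text)
    return (m.group(1), m.group(2)) if m else None


def split_plain(text):
    # Find the first colon, then trim underscores on either side of it.
    head, sep, tail = text.partition(":")
    if not sep or not head:
        return None
    return head.rstrip("_"), tail.lstrip("_")


def time_both(text="User_:_Foo_bar", number=100_000):
    # Returns (regex_seconds, plain_seconds) for `number` calls each.
    return (
        timeit.timeit(lambda: split_regex(text), number=number),
        timeit.timeit(lambda: split_plain(text), number=number),
    )
```

The two functions agree on ordinary inputs such as "User_:_Foo_bar", but not on every edge case (a prefix made up only of underscores, or a leading colon), so a real replacement would need its own tests.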

On another note, I noticed something with the normalization. While : is
the standard separator, abstracting the normalization process like this
is actually loosening the definition of what is what in a title, while
still keeping it stable. Honestly, if someone changed the methods used
to prefix things, and altered the splitting sequence, someone could
probably change MediaWiki to use something like :: as the separator
instead. If they went to even more work, they could probably introduce a
special type of Namespace to MediaWiki which could use a different kind
of prefix, or even restrict to inclusion of only certain types of pages.
(Basically, wikis like card game wikis could force their package redirects
and card ids into special namespaces dedicated to them).
Actually, in light of that, I might add another hook or two, or clean up
some of the title functions to properly abstract the prefixing to where
it should be instead of mixing it up all over the place.

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)



Simetrical+wikilist at gmail

Mar 8, 2008, 4:53 PM

Post #20 of 34 (1898 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

On Sat, Mar 8, 2008 at 9:56 AM, Platonides <Platonides [at] gmail> wrote:
> DanTMan wrote:
> > So to cut down on that, I'm going to try using normal string
> > functions to pull out the prefixes and trim them off. A strpos, substr,
> > and trim set together is much quicker than a full blown regex pattern match.
>
> Not always. Remember that the PHP code surrounding those functions is
> interpreted, while the regex call is run on compiled code.
> I think some of the sysadmins remarked that the use of regex *improved*
> the performance.
> I'm not saying you shouldn't change it to traditional means, just that
> time should be checked to be sure it's not slower.

Or you should just ignore the difference and use whichever you think
is easier to read. There's no point in micro-optimization like this,
unless you have reason to believe that the particular functions are
important to performance.



dan_the_man at telus

Mar 8, 2008, 6:34 PM

Post #21 of 34 (1897 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

Ok, new issue...

I've changed usage inside of User.php from getText and getPrefixedText
to getDBkey and getPrefixedDBkey.

However I notice an issue with the User functions themselves:
We have a getName function, and additionally for awhile we've had a
getTitleKey, but other than a single occurrence inside of Article.php,
it's widely unused.
That means that getName is used for both uniqueness testing, and display.

Obviously, because usernames are now stored in key form rather than text
form we are going to need to separate the functional use of functions
inside the User class for backend and display use.

We've got a few options here:
A) Create a new function getTitle which returns the title object which
the User matches, and make use of its functions for the standard
displays and other things. Of course this is basically the same as
getUserPage.
B) Change getName's definition to be the display form of a user's name,
and getTitleKey to be the key form of the user's name. And change the
large number of comparison functions inside of MediaWiki to use
getTitleKey instead of getName.
C) Create a new function getDisplayName, and have getName's definition
changed to the key form of the user's name, and getDisplayName as the
display form of the user's name. Deprecate the use of getTitleKey
because of its lack of use or need anymore (changing a single reference
to it). And change the uses of getName as a display value inside of
MediaWiki to use getDisplayName instead.
D) Create two new functions for the key and display forms of the user's
name. And deprecate the old functions, slowly changing use of them to
the new functions in MW to keep compatibility.

I'm probably in favor of C, as if its definition is changed to key
form, all the backend testing and stuff will still work fine, and we
then will only need to worry about changing the areas that the username
is used for display. Which won't really be a problem if we miss
anything, because if we end up missing one conversion, the system will
simply be displaying a semi-ugly form with underscores inside of it and
none of the case stuff picked by the user. And it'll still be linkable
and won't break anything in the backend.

Oh, ^_^ an interesting new ability due to switching to key form:
SELECT user_editcount FROM `user`, `page` WHERE user_name=page_title AND
page_namespace=2 AND page_id=3
In this example, page_id 3 is my user's userpage.
What does it do? Well, if you were on the userpage of a user and just
had the page ID, you could now easily grab any information from the
database on that user himself because page_title and user_name are stored
in the same format instead of different formats.
^_^ Of course, this example isn't too useful, but I'm sure someone will
find some use for the two being a match now.
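
For anyone who wants to try the query, here is a self-contained sketch using Python's sqlite3 with an in-memory database. The two tables are cut down to the columns the query touches and the row values are invented for the example; it is not the real MediaWiki schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user" (user_id INTEGER PRIMARY KEY,
                     user_name TEXT,
                     user_editcount INTEGER);
CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                   page_namespace INTEGER,
                   page_title TEXT);
-- Both names are stored in key form, so they compare equal directly.
INSERT INTO "user" VALUES (1, 'Some_user', 42);
INSERT INTO page VALUES (3, 2, 'Some_user');
""")


def editcount_for_page(conn, page_id):
    # Same join as the query above, with the page ID as a bound parameter.
    row = conn.execute(
        """SELECT user_editcount
           FROM "user" JOIN page ON user_name = page_title
           WHERE page_namespace = 2 AND page_id = ?""",
        (page_id,),
    ).fetchone()
    return row[0] if row else None
```

If user_name were still stored with spaces, the join condition would silently match nothing for any multi-word username.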


Though it looks like I'm also going to have to add some more database stuff.
* archive is going to need an ar_real field so that deleting a page
doesn't break its titling.
* rev_user_text is going to need the same conversion that user_name
underwent, and I'll also need to do some stuff inside the backend to
change how the name of the user is displayed.
* same goes for img_user_text, ipb_by_text, oi_user_text,
rc_user_text, and ar_user_text

Oh, minor off topic. But what about putting a log_user_text in at some
point. Honestly I know of a few extensions which intended to allow
certain things to be done by anons in addition to normal users, but
which broke when anons were allowed use of them because anon users were
not properly logged.

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)



Platonides at gmail

Mar 9, 2008, 7:01 AM

Post #22 of 34 (1906 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

Simetrical wrote:
> On Sat, Mar 8, 2008 at 9:56 AM, Platonides wrote:
>> DanTMan wrote:
>> > So to cut down on that, I'm going to try using normal string
>> > functions to pull out the prefixes and trim them off. A strpos, substr,
>> > and trim set together is much quicker than a full blown regex pattern match.
>>
>> Not always. Remember that the PHP code surrounding those functions is
>> interpreted, while the regex call is run on compiled code.
>> I think some of the sysadmins remarked that the use of regex *improved*
>> the performance.
>> I'm not saying you shouldn't change it to traditional means, just that
>> time should be checked to be sure it's not slower.
>
> Or you should just ignore the difference and use whichever you think
> is easier to read. There's no point in micro-optimization like this,
> unless you have reason to believe that the particular functions are
> important to performance.

Ease of reading could be a good reason to change that. I just warned
against changing for optimization reasons.




Simetrical+wikilist at gmail

Mar 12, 2008, 6:56 AM

Post #23 of 34 (1872 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

On Sat, Mar 8, 2008 at 10:34 PM, DanTMan <dan_the_man [at] telus> wrote:
> We've got a few options here:
> A) Create a new function getTitle which returns the title object which
> the User matches, and make use of its functions for the standard
> displays and other things. Of course this is basically the same as
> getUserPage.

I don't like this. There's no reason username normalization can't be
stricter than title normalization, for instance. (It had better not
be less strict, of course, if you don't want user page collisions.)
In general I don't like mixing up titles with users. Of course in
practice the backend may use title normalization for users as well,
but that fact should all be hidden in a User method, not exposed to
callers.

> B) Change getName's definition to be the display form of a user's name,
> and getTitleKey to be the key form of the user's name. And change the
> large number of comparison functions inside of MediaWiki to use
> getTitleKey instead of getName.

getTitleKey sounds like a poor name to me.

> C) Create a new function getDisplayName, and have getName's definition
> changed to the key form of the user's name, and getDisplayName as the
> display form of the user's name. Deprecate the use of getTitleKey
> because of its lack of use or need anymore (changing a single reference
> to it). And change the uses of getName as a display value inside of
> MediaWiki to use getDisplayName instead.

getName is then ambiguous: which name are you talking about?
getNormalizedName or getNameKey or something would be better.

> D) Create two new functions for the key and display forms of the user's
> name. And deprecate the old functions, slowly changing use of them to
> the new functions in MW to keep compatibility.

. . . so I would go for (D).

> I'm probably in favor of C, as if its definition is changed to key
> form, all the backend testing and stuff will still work fine, and we
> then will only need to worry about changing the areas that the username
> is used for display. Which won't really be a problem if we miss
> anything, because if we end up missing one conversion, the system will
> simply be displaying a semi-ugly form with underscores inside of it and
> none of the case stuff picked by the user. And it'll still be linkable
> and won't break anything in the backend.

Your logic is good, and applies equally to (D): alias getName to
getNormalizedName or whatever, rather than getDisplayName.

> Oh, ^_^ an interesting new ability due to switching to key form:
> SELECT user_editcount FROM `user`, `page` WHERE user_name=page_title AND
> page_namespace=2 AND page_id=3
> In this example, page_id 3 is my user's userpage.
> What does it do? Well, if you were on the userpage of a user and just
> had the page ID, you could now easily grab any information from the
> database on that user himself because page_title and user_name are stored
> in the same format instead of different formats.

Interesting. Of course, my suggestion above that we don't rely on
user and title normalization being the same would break this. I don't
know if there's any good reason to go either way here.

> Oh, minor off topic. But what about putting a log_user_text in at some
> point. Honestly I know of a few extensions which intended to allow
> certain things to be done by anons in addition to normal users, but
> which broke when anons were allowed use of them because anon users were
> not properly logged.

Yes, please! That's been to-do for ages.



dan_the_man at telus

Mar 12, 2008, 8:22 AM

Post #24 of 34 (1872 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

Mkay, D it is...
getName will be deprecated...
To go with the whole key/real namescheme I've been going with in
Title.php a new getRealName function will get the name to use for
interface display.
And to match that, getKeyName will get the name for use in uniqueness
checking and comparison, and getName will be aliased to it.
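
A minimal sketch of that arrangement, in Python rather than PHP. The method names are snake_case stand-ins for getKeyName, getRealName and getName; the deprecation warning is an assumption about how the alias might announce itself, not something stated in the post.

```python
import warnings


class User:
    def __init__(self, key_name, real_name=None):
        self._key = key_name      # normalized form, used for identity
        self._real = real_name    # display form chosen by the user, if any

    def get_key_name(self):
        # For uniqueness checks and comparisons.
        return self._key

    def get_real_name(self):
        # For interface display; falls back to the key with spaces restored.
        if self._real is not None:
            return self._real
        return self._key.replace("_", " ")

    def get_name(self):
        # Deprecated alias: old callers keep getting the key form.
        warnings.warn(
            "get_name is deprecated; use get_key_name or get_real_name",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_key_name()
```

Aliasing the old method to the key form means a missed call site only shows an underscored name instead of breaking a comparison.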

^_^ Actually, about your note on User and Title normalization not being
the same: there is no real reason for them not to be (with the exception
of the stuff that we stick in functions like isValidName)...
Why's that? A little bonus I had already theorized but never mentioned (I'm
good at grasping a lot of theory and wrapping my mind around how things
work and are supposed to work, so I come up with a lot of these).
Because of the new extensible normalization, and because all the username
stuff relies on getDBkey and directly uses getText for displaying the
username, there is a little bonus.
If you extend the normalization of Titles specifically for the
User: namespace (remember that because of the way it's set up, you can
now create per-namespace normalization), the normalization of usernames
will be directly affected by it (which is kinda why I needed to alter
User.php because of that login bug).
So, if you go and make the User: namespace completely case-insensitive
and leave other namespaces the way they are, the Usernames will suddenly
all become completely case-insensitive to match that, without altering
any normalization code for usernames.
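
To make the idea concrete, here is a minimal Python sketch of per-namespace normalization. It is not the actual patch: the registry NORMALIZERS, the function names, and the default rule (spaces to underscores, first letter uppercased) are all assumptions for illustration. Only the namespace numbers 0 (main) and 2 (User:) come from MediaWiki itself.

```python
NS_MAIN = 0
NS_USER = 2


def default_normalize(text):
    """Assumed default: spaces to underscores, first letter uppercased."""
    text = text.strip().replace(" ", "_")
    return text[:1].upper() + text[1:]


def case_insensitive_normalize(text):
    """Fold case on top of the default rule so differently-cased names collide."""
    return default_normalize(text).casefold()


# Hypothetical registry: only the User: namespace overrides the default.
NORMALIZERS = {NS_USER: case_insensitive_normalize}


def title_key(namespace, text):
    """Key form of a title, using the namespace's normalizer if one is registered."""
    return NORMALIZERS.get(namespace, default_normalize)(text)


def user_key(username):
    """Usernames have no normalizer of their own; they reuse the User: one."""
    return title_key(NS_USER, username)
```

With this setup, user_key("Frank Dreben") and user_key("frank DREBEN") produce the same key, while titles in the main namespace stay case-sensitive after the first letter.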

Btw: I have a function inside of the normalizer,
TitleNormalizer::backconvert( $title ); basically it does the normal
replacing of underscores with spaces. The point of it is that when we
don't have a page_real stored in the database (i.e. a nonexistent page),
backconvert will be used to create a temporary title for display
while the page doesn't exist. Of course, there is a hook inside of it
which lets extensions override it in case they do something like
changing the ' ' to '_' normalization to ' ' to '-' for some reason.
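
The behaviour described here could look roughly like the following Python sketch. It only illustrates the idea and is not the real TitleNormalizer: the hook list and register_backconvert_hook are hypothetical names, and the convention that a hook returns None to decline is an assumption.

```python
# Hypothetical hook registry; a hook returns a display string, or None to decline.
_backconvert_hooks = []


def register_backconvert_hook(hook):
    """Let an extension override how key-form titles are turned back into text."""
    _backconvert_hooks.append(hook)


def backconvert(key):
    """Build a temporary display title for a page that has no stored real title."""
    for hook in _backconvert_hooks:
        result = hook(key)
        if result is not None:
            return result
    # Default: undo the usual ' ' -> '_' normalization.
    return key.replace("_", " ")
```

An extension that normalizes ' ' to '-' would register a hook such as `lambda key: key.replace("-", " ")`, after which underscores in a key are left untouched.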

Heh, I guess I'll take a look at that log stuff sometime later to see
how easy it will be.

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)



Simetrical+wikilist at gmail

Mar 12, 2008, 4:06 PM

Post #25 of 34 (1871 views)
Permalink
Re: Case insensitive links (not just titles). [In reply to]

On Wed, Mar 12, 2008 at 11:22 AM, DanTMan <dan_the_man [at] telus> wrote:
> getName will be deprecated...
> To go with the whole key/real naming scheme I've been using in
> Title.php, a new getRealName function will get the name to use for
> interface display.
> And to match that, getKeyName will get the name for use in uniqueness
> checking and comparison, and getName will be aliased to it.

Are "RealName" and "KeyName" the best terms to use? We already use
"Text" and "DBKey" for titles, but I recall that confused me somewhat
for a while. I would probably have done "DisplayName" and
"NormalizedName", but that might not be ideal either. We may as well
think about this now instead of being stuck with weird names forever.

> ^_^ Actually, about your note on User and Title normalization not being
> the same: there is no real reason for them not to be (with the exception
> of the stuff that we stick in functions like isValidName)...
> Why's that? A little bonus I had already theorized but never mentioned (I'm
> good at grasping a lot of theory and wrapping my mind around how things
> work and are supposed to work, so I come up with a lot of these).
> Because of the new extensible normalization, and because all the username
> stuff relies on getDBkey and directly uses getText for displaying the
> username, there is a little bonus.
> If you extend the normalization of Titles specifically for the
> User: namespace (remember that because of the way it's set up, you can
> now create per-namespace normalization), the normalization of usernames
> will be directly affected by it (which is kinda why I needed to alter
> User.php because of that login bug).

Oh, that's very neat. It preserves a one-to-one correspondence
between usernames and User-namespace titles -- almost. Are you going
to do stuff like ban '@' and other things not allowed in usernames
from the namespace? That would make it a perfect bijection between
User pages and user names-plus-IP addresses.

> Btw: I have a function inside of the normalizer,
> TitleNormalizer::backconvert( $title ); basically it does the normal
> replacing of underscores with spaces. The point of it is that when we
> don't have a page_real stored in the database (i.e. a nonexistent page),
> backconvert will be used to create a temporary title for display
> while the page doesn't exist. Of course, there is a hook inside of it
> which lets extensions override it in case they do something like
> changing the ' ' to '_' normalization to ' ' to '-' for some reason.

Hmm, I see. When would this be a concern? Shouldn't the page_real be
generated from the URL? I guess not exactly, if link targets are
normalized. I'm thinking if the user types, I dunno, "str_repeat"
into the search box, they should get links asking them to edit
"str_repeat", not "str repeat" or any other variant. The same should
apply to ordinary wikilinks, ideally -- but on the other hand,
non-broken wikilinks should still point to prettified locations. So I
guess this would require [[has space]] to translate to
?title=Has_space (or whatever normalized form) if it exists, but
?title=has%20space&action=edit if it doesn't. Which isn't perfect.
But I don't see any other way to achieve the effect.
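
The two-way behaviour described above can be sketched in Python (not MediaWiki's PHP). The normalize rule and the existing_keys set are assumptions standing in for the real normalizer and a page-existence lookup; link_target is a made-up name.

```python
from urllib.parse import quote


def normalize(text):
    """Assumed key form: spaces to underscores, first letter uppercased."""
    text = text.replace(" ", "_")
    return text[:1].upper() + text[1:]


def link_target(link_text, existing_keys):
    """Pick the query string for a wikilink.

    Existing pages get the prettified, normalized key. Missing pages keep
    the text exactly as typed, so the edit form opens on what the user
    actually wrote.
    """
    key = normalize(link_text)
    if key in existing_keys:
        return "?title=" + quote(key)
    return "?title=" + quote(link_text) + "&action=edit"
```

So [[has space]] resolves to ?title=Has_space when the page exists and to ?title=has%20space&action=edit when it does not, and a missing "str_repeat" keeps its underscore.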

