Mailing List Archive: Wikipedia: Wikitech

Language variants

 

 



heldergeovane at gmail

Sep 9, 2009, 4:10 AM

Post #1 of 28 (1941 views)
Language variants

Hello!

I noticed at sr.wikipedia there is an option "Variant" under
"Internationalization" at the preferences. How is that different from
the 'sr', 'sr-ec' and 'sr-el' which are shown at "Language" option
(also under "Internationalization")?

I'm interested in this because there are some differences between
"Brazilian Portuguese" ('pt-br') and "Portuguese of Portugal" ('pt')
which often cause trouble for the admins at the Portuguese
projects, who need to warn users not to change the wording of
texts from one variant to the other (this happens often, mainly in
anonymous contributions), because some differences between the
variants seem [at first glance] to be typos, and users want to
"correct" them...

So, I would like to know if there is currently any feature which could
help us avoid the problem of having a divided community of users
('pt' x 'pt-br') "fighting" with each other ad infinitum... (and
avoid proposals like [1] for a new "Brazilian Wikipedia", which
IMHO will not have any good result, and is not the best way of
solving the problem...)

I found [http://strategy.wikimedia.org/w/index.php?title=Proposal_talk%3AA_Brazilian_Portuguese_Wikipedia&diff=14163&oldid=13621
a comment] about the existence of "on-the-fly translation" for some
languages (Chinese and Serbian), but I don't know how it works, or whether
it solves or improves the situation.

Before this I was also thinking of using (a possibly enhanced
version of) a procedure like this: considering that it is currently
possible to show a system message using {{int:MESSAGE}} in the
wikitext in a way that the result changes according to the user's
language, would it be possible to create new messages in the "MediaWiki:"
namespace just for defining language variants of words which often
appear in the content of the projects? For example, would it be
possible to create "MediaWiki:WORD/pt-br" and "MediaWiki:WORD/pt", and
use them (with {{int:WORD}}) instead of the actual word variant in
wikitext? This isn't likely to be the best solution, but it could be
a first step towards a solution...
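
A minimal sketch of the lookup this idea relies on, written in Python rather than wikitext. It assumes the site-language text lives on the unsuffixed page and the override on a "/pt-br" subpage; the message name "Fato", the MESSAGES dictionary and the int_message function are hypothetical, and real MediaWiki message fallback is more involved than this.

```python
# Hypothetical message store: page title -> text.
MESSAGES = {
    "MediaWiki:Fato": "facto",       # site-language ('pt') wording
    "MediaWiki:Fato/pt-br": "fato",  # override for readers who chose 'pt-br'
}


def int_message(name: str, user_lang: str, site_lang: str = "pt") -> str:
    """Model {{int:NAME}}: prefer the reader's language subpage, else the default page."""
    if user_lang != site_lang:
        override = MESSAGES.get(f"MediaWiki:{name}/{user_lang}")
        if override is not None:
            return override
    return MESSAGES[f"MediaWiki:{name}"]


print(int_message("Fato", "pt"))     # reader with 'pt' selected
print(int_message("Fato", "pt-br"))  # reader with 'pt-br' selected
```

Under this model every differing word would presumably need its own pair of pages, so the approach only stays manageable while the list of such words is short.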

Any thoughts on how the Portuguese community could improve the situation
at the pt.* projects?
(Is there any other list where I should ask about this?)

Helder

[1] http://strategy.wikimedia.org/wiki/Proposal:A_Brazilian_Portuguese_Wikipedia

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


roan.kattouw at gmail

Sep 9, 2009, 4:50 AM

Post #2 of 28 (1894 views)
Re: Language variants

2009/9/9 Helder Geovane Gomes de Lima <heldergeovane [at] gmail>:
> Hello!
>
> I noticed at sr.wikipedia there is an option "Variant" under
> "Internationalization" at the preferences. How is that different from
> the 'sr', 'sr-ec' and 'sr-el' which are shown at "Language" option
> (also under "Internationalization")?
>
> I'm interested in this because there are some differences between
> "Brazilian Portuguese" ('pt-br') and "Portuguese of Portugal" ('pt')
> which usually cause troubles for the admins at the Portuguese
> projects, who needs to warn the users not to change the wording of the
> texts from one variant to another (this usually happens, mainly from
> anonymous contributions), because some differences between the
> variants seems to be [at a first glance] a typo, and they want to
> "correct" it...
>
sr-ec and sr-el refer to the Latin and Cyrillic variants of Serbian
(not sure which is which), and AFAIK the software can convert
everything, even article text, because the conversion rules are so
simple that a computer can execute them. Basically, sr-ec and sr-el
have the same text in the same language, but in different alphabets.
(This is my understanding, which may be completely wrong; in that
case, please correct me.)

The differences between pt and pt-br are more delicate than that, and
a computer can't autoconvert between the two, because of
differences in spelling, word usage and grammar(?).

> So, I would like to know if there is currently any feature which could
> help us to avoid the problem of having a divided community of users
> ('pt' x 'pt-br') "fighting" with each other ad infinitum... (and to
> avoid proposals like that [1] of a new "Brazilian Wikipedia", which
> IMHO will not have any good result, and is not the better way of
> solving the problem...)
>
No. We already offer users the choice between having the interface in
pt or pt-br (or any other language, really), but such a choice doesn't
exist for the content.

> I found [http://strategy.wikimedia.org/w/index.php?title=Proposal_talk%3AA_Brazilian_Portuguese_Wikipedia&diff=14163&oldid=13621
> a comment] about the existence of "on-the-fly translation" for some
> languages (Chinese and Serbian), but I don't know how it works, and if
> it solves or improve the situation.
>
That's the alphabet variant thing I mentioned earlier. If the majority
of the differences between pt and pt-br can be summed up with simple
rules that a computer can handle, we might be able to work something
out. However, that's usually not the case; I don't know Portuguese, but
I do know that handling even simple differences between en-us and
en-gb is too complex already: a system that would successfully convert
'realise' to 'realize' may also try to wrongfully convert 'disguise'.
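
The risk described here can be sketched in a few lines of Python. This only illustrates a blind suffix rule; the function name naive_suffix_rule is hypothetical, and nothing below is taken from the MediaWiki code.

```python
def naive_suffix_rule(word: str) -> str:
    """Rewrite a trailing '-ise' as '-ize' with no knowledge of the word itself."""
    if word.endswith("ise"):
        return word[:-3] + "ize"
    return word


for word in ["realise", "organise", "disguise"]:
    # 'disguise' is spelled the same in both variants, yet the rule rewrites it.
    print(word, "->", naive_suffix_rule(word))
```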

> And before this I was also thinking of use (a possible enhanced
> version of) a procedure like this: considering that currently it is
> possible to show a system message using {{int:MESSAGE}} in the
> wikitext in a way that the result changes according to the user's
> language, would it be possible to create new messages at "MediaWiki:"
> Namespace just for defining language variants of words which usually
> appears at the content of the projects? For example, would it be
> possible to create "MediaWiki:WORD/pt-br" and "MediaWiki:WORD/pt", and
> use them (with {{int:WORD}}) instead of the actual word variant in
> wikitext? This isn't likely to be the better solution, but it could be
> a first step towards a solution...
>
This sounds like it could work, but only if the /langcode trick
actually works (I don't know what that depends on) and if there's a
relatively small set of words that makes a relatively big difference
(otherwise it'd be more trouble than it's worth IMO; but that's up to
the community).

Roan Kattouw (Catrope)



dgerard at gmail

Sep 9, 2009, 5:10 AM

Post #3 of 28 (1891 views)
Re: Language variants

2009/9/9 Roan Kattouw <roan.kattouw [at] gmail>:
> 2009/9/9 Helder Geovane Gomes de Lima <heldergeovane [at] gmail>:

>> So, I would like to know if there is currently any feature which could
>> help us to avoid the problem of having a divided community of users
>> ('pt' x 'pt-br') "fighting" with each other ad infinitum... (and to
>> avoid proposals like that [1] of a new "Brazilian Wikipedia", which
>> IMHO will not have any good result, and is not the better way of
>> solving the problem...)

> No. We already offer users the choice between having the interface in
> pt or pt-br (or any other language, really), but such a choice doesn't
> exist for the content.


This is a community issue. Having a single pt:wp is a win because
there's more content in one place and it avoids local-POV bias, same
as there's one en:wp rather than US-English and Commonwealth-English.

So you need a community rule.

The rule we have on en:wp is:

1. It doesn't matter.
2. Use the variant spoken in the location, if relevant.
3. Don't change articles from one to the other except per 2.
4. Try not to worry too much about it.

4. is the important step ;-) It should be simple enough to let new
users know the rule and "not to worry about which variant" :-)


- d.



tstarling at wikimedia

Sep 9, 2009, 3:50 PM

Post #4 of 28 (1893 views)
Re: Language variants

Roan Kattouw wrote:
> That's the alphabet variant thing I mentioned earlier. If the majority
> of the differences between pt and pt-br can be summed up with simple
> rules that a computer can handle, we might be able to work something
> out. However, that's usually not the case; I don't know Portugese, but
> I do know that handling even simple differences between en-us and
> en-gb is too complex already: a system that would successfully convert
> 'realise' to 'realize' may also try to wrongfully convert 'disguise'.

I don't know why you're writing this nonsense, you obviously haven't
looked at the code at all.

The language variant system that we have could easily convert between
US and UK English. In fact it already does convert between a language
pair with a far more complex relationship, that is Simplified and
Traditional Chinese.

The language conversion system is very simple, it's just a table of
translated pairs, where the longest match takes precedence. The
translation table in one direction (e.g. UK -> US) can be different to
the table in the other direction (US -> UK). You would not list "ize
-> ise", you would list every word in the dictionary with an -ize
ending that can be translated to -ise without controversy. The current
software could handle 50k pairs or so without serious performance
problems, and it could be extended and optimised to allow millions of
pairs if there was a need for that.

It's possible to handle any pair of languages which are separated only
by vocabulary, and transliteration or spelling. It's only differences
in grammar, such as word order, that would give it trouble.

-- Tim Starling
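
The table-driven approach described above can be sketched as follows. This is an illustrative Python model, not the LanguageConverter code: the function name convert, the two tables and their entries are all assumptions made for the example, and the word pairs are not meant as authoritative spelling advice.

```python
def convert(text: str, table: dict[str, str]) -> str:
    """Replace table keys found in text, preferring the longest key at each position."""
    if not table:
        return text
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so that longer entries take precedence.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            # No entry starts here: copy one character and move on.
            out.append(text[i])
            i += 1
    return "".join(out)


# One table per direction; they do not have to mirror each other.
UK_TO_US = {"realise": "realize", "colour": "color"}
US_TO_UK = {
    "realize": "realise",
    "color": "colour",
    "program": "programme",
    "programmer": "programmer",  # longer identity entry shields this word
}

print(convert("realise the colour of the disguise", UK_TO_US))
print(convert("a program by a programmer", US_TO_UK))
```

Because only whole listed words are in the table, "disguise" is left alone, and listing "programmer" as its own entry keeps the shorter "program" rule from firing inside it.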




heldergeovane at gmail

Sep 9, 2009, 5:53 PM

Post #5 of 28 (1889 views)
Re: Language variants

Nice! ;-)

Do you think tables like these
http://pt.wiktionary.org/wiki/Wikcionário:Versões_da_língua_portuguesa/Tabela
http://pt.wikipedia.org/wiki/Wikipedia:Versões_da_língua_portuguesa/tabela
could be a starting point for a similar conversion system for pt <-> pt-br?

Meanwhile, I was also trying to adapt the Template:LangSwitch from
Wikimedia Commons
(http://commons.wikimedia.org/wiki/Template:LangSwitch), in order to
be able to use the template syntax like this:
{{Language variations| pt = word 1| pt-br = word 2}}

For this, I've created two pages:
* MediaWiki:Lang, with 'pt'
* MediaWiki:Lang/pt-br, with 'pt-br'

and the template code is essentially:
{{#switch:{{int:Lang}}
|pt-br={{{pt-br|}}}
|pt
|#default={{{pt|}}}
}}

But I wasn't able to create a "default" parameter so that we could set
which of the variants is shown by default to anonymous users. It
would be good if we could use {{Language variations| default = pt-br |
pt = word 1| pt-br = word 2}} to get:
(a) word 2, for anonymous users;
(b) word 1, for logged-in users who chose 'pt' in their preferences;
(c) word 2, for logged-in users who chose 'pt-br' in their preferences.
Option (a) would be necessary if we don't want to change an
existing text from 'pt-br' to 'pt' (for anonymous users) just because
we want logged-in users to be able to choose the "content variant".

Is there any way to detect whether the reader is logged in, with something
in the style of {{#if: <what?> | foo| bar}}?
(The problem with {{int:Lang}} is that for anonymous users and for
users who chose 'pt' the result is the same: 'pt', so I can't
distinguish these two cases in the template...)

Anyway, I think it would be better to have some kind of automated
conversion system, even if it doesn't convert all cases (at least for
the words in the tables above it would be useful).

Thank you for everything,

Helder



Platonides at gmail

Sep 10, 2009, 6:43 AM

Post #6 of 28 (1881 views)
Re: Language variants

Helder Geovane Gomes de Lima wrote:
> But I wasn't able to create a param "default" in order we could set
> which of the variants will be shown by default for anonymous users. It
> would be good if we could use {{Language variations| default = pt-br |
> pt = word 1| pt-br = word 2}} to get:
> (a) word 2, for annonimous users;
> (b) word 1, for logged users which choose 'pt' in their preferences;
> (c) word 2, for logged users which choose 'pt-br' in their preferences;
> The option (a) would be necessary if we don't want to change an
> existing text from 'pt-br' to 'pt' (for anonymous users) just because
> we want the logged users to be able to choose the "content variant".

There's no difference. Anonymous users get the default language.
What you could do is have three "languages": pt (generic Portuguese,
default), pt-pt and pt-br.

> Is there any way of detect if the reader is logged in with something
> in the style {{#if: <what?> | foo| bar}}?
No.
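
The three-code suggestion above can be modelled roughly like this. It is a Python sketch of the selection logic only, assuming a generic 'pt' that anonymous readers receive; pick_variant and its parameters are hypothetical names, and an actual template would have to express the same logic in wikitext.

```python
def pick_variant(user_lang: str, words: dict[str, str], page_default: str) -> str:
    """Return the word for the reader's language, or the page default for generic 'pt'."""
    if user_lang in ("pt-pt", "pt-br"):
        return words[user_lang]
    # Generic 'pt' (what anonymous readers get) falls back to the page's own variant.
    return words[page_default]


words = {"pt-pt": "word 1", "pt-br": "word 2"}

print(pick_variant("pt", words, "pt-br"))     # anonymous reader
print(pick_variant("pt-pt", words, "pt-br"))  # reader who chose 'pt-pt'
print(pick_variant("pt-br", words, "pt-br"))  # reader who chose 'pt-br'
```

With an explicit 'pt-pt' code, the generic 'pt' no longer has to stand for both "no preference" and "Portuguese of Portugal", which was the ambiguity in the original template.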




Simetrical+wikilist at gmail

Sep 10, 2009, 10:06 AM

Post #7 of 28 (1883 views)
Re: Language variants

On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling <tstarling [at] wikimedia> wrote:
> I don't know why you're writing this nonsense, you obviously haven't
> looked at the code at all.

This paragraph is unnecessary.

> The language variant system that we have could easily convert between
> US and UK English. In fact it already does convert between a language
> pair with a far more complex relationship, that is Simplified and
> Traditional Chinese.
>
> The language conversion system is very simple, it's just a table of
> translated pairs, where the longest match takes precedence. The
> translation table in one direction (e.g. UK -> US) can be different to
> the table in the other direction (US -> UK). You would not list "ize
> -> ise", you would list every word in the dictionary with an -ize
> ending that can be translated to -ise without controversy. The current
> software could handle 50k pairs or so without serious performance
> problems, and it could be extended and optimised to allow millions of
> pairs if there was a need for that.
>
> It's possible to handle any pair of languages which are separated only
> by vocabulary, and transliteration or spelling. It's only differences
> in grammar, such as word order, that would give it trouble.

Is there any reason nobody's tried adding such support for us/uk
English? It would resolve some long-standing tension on enwiki.
Would anons have to be given one variant or the other, or would they
get untransformed text or what? Does the variant transformation apply
to the edit page as well?



tparscal at wikimedia

Sep 10, 2009, 10:39 AM

Post #8 of 28 (1885 views)
Re: Language variants

On 9/10/09 10:06 AM, Aryeh Gregor wrote:
> On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling<tstarling [at] wikimedia> wrote:
>
>> I don't know why you're writing this nonsense, you obviously haven't
>> looked at the code at all.
>>
> This paragraph is unnecessary.
>
Seriously! Please read things aloud before clicking send. You will
hopefully then be able to better detect when it's time to take a break,
eat some fruit and take it down a notch.
> Is there any reason nobody's tried adding such support for us/uk
> English? It would resolve some long-standing tension on enwiki.
> Would anons have to be given one variant or the other, or would they
> get untransformed text or what? Does the variant transformation apply
> to the edit page as well?
>
The variant system seems poorly understood by most people (including me)
which often tends to cause something (like it for instance) to also be
under-utilized...

Perhaps we need more information on what it intends to provide the user.
All I find in Google on this topic are blurbs about configuration
variables and lots of people confused as to what language variants even
are...

Is there some awesome documentation somewhere I have yet to find?

- Trevor



innocentkiller at gmail

Sep 10, 2009, 11:05 AM

Post #9 of 28 (1899 views)
Re: Language variants

On Thu, Sep 10, 2009 at 1:39 PM, Trevor Parscal <tparscal [at] wikimedia> wrote:
> The variant system seems poorly understood by most people (including me)
> which often tends to cause something (like it for instance) to also be
> under-utilized...
>
> Perhaps we need more information on what it intends to provide the user.
> All I find in Google on this topic are blurbs about configuration
> variables and lots of people confused as to what language variants even
> are...
>
> Is there some awesome documentation somewhere I have yet to find?
>
> - Trevor
>

Nope, but there's a bug asking for documentation :)

https://bugzilla.wikimedia.org/show_bug.cgi?id=19044

I certainly agree that it's completely undocumented and thus not usable
to many people. The vast majority of devs--myself included--don't even
understand how it works, much less how to use it. Maybe if we had docs,
it'd be more usable outside of the (very) small minority who do use and
maintain it.

-Chad



Simetrical+wikilist at gmail

Sep 10, 2009, 11:16 AM

Post #10 of 28 (1874 views)
Re: Language variants

On Thu, Sep 10, 2009 at 2:23 PM, Ariel T. Glenn <ariel [at] wikimedia> wrote:
> The differences between the UK and American varieties of English are not
> limited just to spelling and vocabulary.

Those account for the large majority of the more noticeable
differences, however.



ariel at wikimedia

Sep 10, 2009, 11:23 AM

Post #11 of 28 (1882 views)
Re: Language variants

The differences between the UK and American varieties of English are not
limited just to spelling and vocabulary.

Ariel

On 10-09-2009, Thu, at 13:06 -0400, Aryeh Gregor wrote:
> Is there any reason nobody's tried adding such support for us/uk
> English? It would resolve some long-standing tension on enwiki.
> Would anons have to be given one variant or the other, or would they
> get untransformed text or what? Does the variant transformation apply
> to the edit page as well?


heldergeovane at gmail

Sep 10, 2009, 11:49 AM

Post #12 of 28 (1877 views)
Re: Language variants

2009/9/10 Aryeh Gregor <Simetrical+wikilist [at] gmail>:

> On Thu, Sep 10, 2009 at 2:23 PM, Ariel T. Glenn <ariel [at] wikimedia>
> wrote:
> > The differences between the UK and American varieties of English are not
> > limited just to spelling and vocabulary.
>
> Those account for the large majority of the more noticeable
> differences, however.


I think this is also the case for Portuguese ('pt' x 'pt-br'). So, even if
the table doesn't solve every case, what it does solve is good enough...

2009/9/10 Aryeh Gregor <Simetrical+wikilist [at] gmail>:
>
> Is there any reason nobody's tried adding such support for us/uk
> English? It would resolve some long-standing tension on enwiki.
> Would anons have to be given one variant or the other, or would they
> get untransformed text or what? Does the variant transformation apply
> to the edit page as well?
>

I have the same questions...

Helder


node.ue at gmail

Sep 10, 2009, 12:39 PM

Post #13 of 28 (1876 views)
Re: Language variants

It might be possible to make it apply to the edit page as well, but in
zh.wp, sr.wp, and kk.wp currently it does not. I'm guessing (could be
wrong) that it would eat up a lot more resources.

Mark

skype: node.ue



On Thu, Sep 10, 2009 at 11:49 AM, Helder Geovane Gomes de Lima
<heldergeovane [at] gmail> wrote:
> 2009/9/10 Aryeh Gregor <Simetrical+wikilist [at] gmail>:
>> Is there any reason nobody's tried adding such support for us/uk
>> English? It would resolve some long-standing tension on enwiki.
>> Would anons have to be given one variant or the other, or would they
>> get untransformed text or what? Does the variant transformation apply
>> to the edit page as well?
>>
>
> I have the same questions...
>
> Helder


heldergeovane at gmail

Sep 10, 2009, 3:20 PM

Post #14 of 28 (1866 views)
Re: Language variants

2009/9/9 Tim Starling <tstarling [at] wikimedia>

> The language variant system that we have could easily convert between
> US and UK English. In fact it already does convert between a language
> pair with a far more complex relationship, that is Simplified and
> Traditional Chinese.
>
> The language conversion system is very simple, it's just a table of
> translated pairs, where the longest match takes precedence. The
> translation table in one direction (e.g. UK -> US) can be different to
> the table in the other direction (US -> UK). You would not list "ize
> -> ise", you would list every word in the dictionary with an -ize
> ending that can be translated to -ise without controversy. The current
> software could handle 50k pairs or so without serious performance
> problems, and it could be extended and optimised to allow millions of
> pairs if there was a need for that.


Hello again!

What would be needed in order to use pages like MediaWiki:Conversiontable/pt
and MediaWiki:Conversiontable/pt-br on the Wikimedia projects in Portuguese
for the conversion? Is it easy to have language conversion enabled?
Could we build the conversion tables gradually?

Sorry for so many questions...

Helder


roan.kattouw at gmail

Sep 10, 2009, 3:44 PM

Post #15 of 28 (1869 views)
Re: Language variants

2009/9/10 Trevor Parscal <tparscal [at] wikimedia>:
> On 9/10/09 10:06 AM, Aryeh Gregor wrote:
>> On Wed, Sep 9, 2009 at 6:50 PM, Tim Starling<tstarling [at] wikimedia> wrote:
>>
>>> I don't know why you're writing this nonsense, you obviously haven't
>>> looked at the code at all.
>>>
>> This paragraph is unnecessary.
>>
> Seriously! Please read things aloud before clicking send. You will
> hopefully then be able to better detect when it's time to take a break,
> eat some fruit and take it down a notch.
In Tim's defense: I had indeed not looked at the code at all, and what
I wrote was incorrect, so what he wrote was completely true. I also
mentioned that my understanding of the variant conversion system was
limited, and that I might be completely wrong. Turns out I was, and
Tim corrected me. It's true that he probably didn't use the most
friendly tone in the world, but I've seen much worse, so I don't
really care. Let's just drop this before it turns into a flame war;
I'd like to keep those off wikitech-l.

> The variant system seems poorly understood by most people (including me)
> which often tends to cause something (like it for instance) to also be
> under-utilized...
>
Seems I'm not the only one who had a completely wrong idea about how
variants work. We definitely need more documentation and fame for this
system, so its potential doesn't go to waste.

Roan Kattouw (Catrope)



Simetrical+wikilist at gmail

Sep 10, 2009, 4:56 PM

Post #16 of 28 (1872 views)
Re: Language variants [In reply to]

On Thu, Sep 10, 2009 at 6:44 PM, Roan Kattouw <roan.kattouw [at] gmail> wrote:
> Seems I'm not the only one who had a completely wrong idea about how
> variants work. We definitely need more documentation and fame for this
> system, so its potential doesn't go to waste.

I theoretically knew that it was just a string-replace system, but it
didn't occur to me that it would be useful for more than
transliteration. It makes sense now that Tim pointed that out. How
would it handle word breaks, though? It would just ignore them, so
color -> colour also changes uncolored -> uncoloured? What about
things like HTML id's or even attribute/property names (<span
style="color:red">)? I'm sure I could dig through the code to find
the answers to these, but actually I'm not even sure offhand where the
code *is*.



heldergeovane at gmail

Sep 10, 2009, 5:05 PM

Post #17 of 28 (1862 views)
Re: Language variants [In reply to]

Hello!

I think the code is in these files:
http://svn.wikimedia.org/doc/LanguageConverter_8php-source.html#l00018
http://svn.wikimedia.org/doc/LanguageZh_8php-source.html#l00009

and a comment at
http://svn.wikimedia.org/doc/LanguageConverter_8php-source.html#l00258
says:

/* we convert everything except:
   1. html markups (anything between < and >)
   2. html entities
   3. place holders created by the parser
*/

So, I don't think it will convert <span style="color:red">. But I'm
not sure, because I'm still learning php...

By the way, I can't understand Chinese, but (after using an on-line
translator) I think the page they have for documenting the system is
this:
http://zh.wikipedia.org/wiki/Help:%E4%B8%AD%E6%96%87%E7%BB%B4%E5%9F%BA%E7%99%BE%E7%A7%91%E7%9A%84%E7%B9%81%E7%AE%80%E5%A4%84%E7%90%86

Helder






tstarling at wikimedia

Sep 10, 2009, 5:31 PM

Post #18 of 28 (1862 views)
Re: Language variants [In reply to]

Aryeh Gregor wrote:
> On Thu, Sep 10, 2009 at 6:44 PM, Roan Kattouw <roan.kattouw [at] gmail> wrote:
>> Seems I'm not the only one who had a completely wrong idea about how
>> variants work. We definitely need more documentation and fame for this
>> system, so its potential doesn't go to waste.
>
> I theoretically knew that it was just a string-replace system, but it
> didn't occur to me that it would be useful for more than
> transliteration. It makes sense now that Tim pointed that out. How
> would it handle word breaks, though? It would just ignore them, so
> color -> colour also changes uncolored -> uncoloured?

Neither of the implementations so far has required any knowledge of
word breaks, and so it has not been implemented. In theory you could
just list every larger word that contains a smaller transformed word, e.g.

humor -> humour
humorous -> humorous

But it might be better to just add a word segmentation feature.
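
The "list every larger word" workaround can be illustrated with a minimal sketch. It assumes the replacement always prefers the longest table key that matches at the current position; that matching rule, the function name convert and the table US_TO_UK are assumptions made for illustration, not a description of LanguageConverter.php.

```python
def convert(text, table):
    """Scan left to right, replacing the longest table key found at each position."""
    keys = sorted(table, key=len, reverse=True)  # try longer keys first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No key matches here: copy one character and move on.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical one-directional table (US -> UK).
US_TO_UK = {
    "humor": "humour",
    "humorous": "humorous",  # identity entry that shields the longer word
}

print(convert("a humorous sense of humor", US_TO_UK))
# Without the identity entry, "humorous" would be rewritten too.
print(convert("humorous", {"humor": "humour"}))
```

Under this assumption the identity entry wins over the shorter "humor" rule, which is why the table has to grow with every longer word that contains a converted one. A word segmentation step would avoid that growth.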

> What about
> things like HTML id's or even attribute/property names (<span
> style="color:red">)? I'm sure I could dig through the code to find
> the answers to these, but actually I'm not even sure offhand where the
> code *is*.

languages/LanguageConverter.php. There are some rather inelegant
regexes to deal with cases like these, and they seem to work. The
converter operates at a near-HTML stage of the parser, so it's not too
hard to skip attributes.
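
A rough sketch of how skipping markup might look, assuming the input is already close to HTML: split the text on tags and entities and convert only what lies between them. The regex and the function name convert_text_only are illustrative assumptions and are not taken from LanguageConverter.php.

```python
import re

# An HTML tag (anything between < and >) or an HTML entity such as &amp; or &#160;.
SKIP = re.compile(r"(<[^>]*>|&#?\w+;)")


def convert_text_only(html, table):
    """Apply the table to text segments only, leaving tags and entities untouched."""
    parts = SKIP.split(html)
    # With one capturing group, re.split puts the matched tags and entities
    # at the odd indexes, so only the even indexes hold plain text.
    for idx in range(0, len(parts), 2):
        for src, dst in table.items():
            # Naive sequential replace; enough for a single-pair demonstration.
            parts[idx] = parts[idx].replace(src, dst)
    return "".join(parts)


sample = '<span style="color:red">color</span> &amp; more color'
print(convert_text_only(sample, {"color": "colour"}))
```

In this sketch the style attribute keeps its "color" property name because the whole tag is passed through unchanged.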

Note that the FastStringSearch extension is important for achieving
good performance, especially in Chinese.

-- Tim Starling




tstarling at wikimedia

Sep 10, 2009, 5:39 PM

Post #19 of 28 (1864 views)
Re: Language variants [In reply to]

Ariel T. Glenn wrote:
> The differences between the UK and American varieties of English are not
> limited just to spelling and vocabulary.


Note that the -{...}- structure is available in wikitext to translate
article-specific fragments of text, so you can also translate worldview:

A popular game played with a bat and ball is -{en-gb:Cricket;
en-us:Baseball}-.
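
To make the behaviour of the fragment above concrete, here is a small sketch that resolves -{...}- markup for one chosen variant. It handles only the simple "code:text; code:text" form shown in this message; the function name resolve and the fallback of leaving the fragment untouched when no rule matches are assumptions for illustration.

```python
import re

# A -{ ... }- fragment; DOTALL lets it span a line break as in the example.
MARKUP = re.compile(r"-\{(.*?)\}-", re.DOTALL)


def resolve(wikitext, variant):
    """Replace each -{...}- fragment with the text listed for the given variant."""
    def pick(match):
        rules = {}
        for rule in match.group(1).split(";"):
            code, sep, value = rule.partition(":")
            if sep:  # ignore pieces that have no "code:" prefix
                rules[code.strip()] = value.strip()
        # Assumed fallback: keep the original fragment if the variant is not listed.
        return rules.get(variant, match.group(0))

    return MARKUP.sub(pick, wikitext)


text = "A popular game played with a bat and ball is -{en-gb:Cricket;\nen-us:Baseball}-."
print(resolve(text, "en-gb"))
print(resolve(text, "en-us"))
```

Text containing a semicolon inside a rule would need a more careful parser than this split.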

-- Tim Starling




nospam at vyznev

Sep 10, 2009, 9:58 PM

Post #20 of 28 (1858 views)
Re: Language variants [In reply to]

Tim Starling wrote:
> Ariel T. Glenn wrote:
>> The differences between the UK and American varieties of English are not
>> limited just to spelling and vocabulary.
>
> Note that the -{...}- structure is available in wikitext to translate
> article-specific fragments of text, so you can also translate worldview:
>
> A popular game played with a bat and ball is -{en-gb:Cricket;
> en-us:Baseball}-.

That reminds me... some time ago, someone proposed to enable
LanguageConverter on Commons (but without any automatic conversion,
presumably) and to (ab?)use it to replace the existing autotranslation
hacks based on {{int:lang}}. Would that be in any sense feasible?

There would presumably be two major use cases: the easy one, which I do
believe the converter should handle just fine, would be to replace the
current <http://commons.wikipedia.org/wiki/Template:LangSwitch>,
generally used to autotranslate short phrases, with syntax like:

-{de:Eigene Arbeit; en:Own work; fi:Oma teos; fr:Travail personnel; etc.}-

(See <http://commons.wikipedia.org/wiki/Template:Own> for the source of
the example.)

The not-so-simple case would be replacing
<http://commons.wikipedia.org/wiki/Template:Autotranslate>, which is
used to translate entire templates, usually (though by no means
necessarily) combined with a long list of links to the various
translations so that users can easily browse them if the automatically
chosen version is no good or something. A naive implementation of that
would look something like:

-{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}}; ar: {{GFDL/ar}};
ast: {{GFDL/ast}}; be: {{GFDL/be}}; be-tarask: {{GFDL/be-tarask}}; <!--
...and so on for about 70 more languages -->}-

(Source: <http://commons.wikipedia.org/wiki/Template:GFDL>.)

I'd like to hope that there might be some better way of doing it,
though, even if I can't offhand think of what it might look like.

Still, would something like that work, even in theory, and would it be
an improvement over the way these things are currently done (which is
hacky enough itself)?

--
Ilmari Karonen



Platonides at gmail

Sep 11, 2009, 2:43 PM

Post #21 of 28 (1816 views)
Re: Language variants [In reply to]

Ilmari Karonen wrote:
>> A popular game played with a bat and ball is -{en-gb:Cricket;
>> en-us:Baseball}-.
>
> That reminds me... some time ago, someone proposed to enable
> LanguageConverter on Commons (but without any automatic conversion,
> presumably) and to (ab?)use it to replace the existing autotranslation
> hacks based on {{int:lang}}. Would that be in any sense feasible?
>
> There would presumably be two major use cases: the easy one, which I do
> believe the converter should handle just fine, would be to replace the
> current <http://commons.wikipedia.org/wiki/Template:LangSwitch>,
> generally used to autotranslate short phrases, with syntax like:
>
> -{de:Eigene Arbeit; en:Own work; fi:Oma teos; fr:Travail personnel; etc.}-
>
> (See <http://commons.wikipedia.org/wiki/Template:Own> for the source of
> the example.)

I don't think it's really a saner syntax.


> The not-so-simple case would be replacing
> <http://commons.wikipedia.org/wiki/Template:Autotranslate>, which is
> used to translate entire templates, usually (though by no means
> necessarily) combined with a long list of links to the various
> translations so that users can easily browse them if the automatically
> chosen version is no good or something. A naive implementation of that
> would look something like:
>
> -{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}}; ar: {{GFDL/ar}};
> ast: {{GFDL/ast}}; be: {{GFDL/be}}; be-tarask: {{GFDL/be-tarask}}; <!--
> ...and so on for about 70 more languages -->}-
>
> (Source: <http://commons.wikipedia.org/wiki/Template:GFDL>.)
>
> I'd like to hope that there might be some better way of doing it,
> though, even if I can't offhand think of what it might look like.
>
> Still, would something like that work, even in theory, and would it be
> an improvement over the way these things are currently done (which is
> hacky enough itself)?

I don't think so. It's terribly ugly. You would want something like
{{GFDL/{{ENABLEDVARIANT}}}} (no, such a magic word doesn't seem to exist yet).
But you would still have the problem of having people *choose* them. You
wouldn't put dozens of tabs to choose the variant. Which in fact isn't a
variant.

These are languages; the variant system is not appropriate for them.




happy-melon at live

Sep 11, 2009, 3:49 PM

Post #22 of 28 (1816 views)
Re: Language variants [In reply to]

"Platonides" <Platonides [at] gmail> wrote in message
news:h8eg97$eh0$1 [at] ger
> Ilmari Karonen wrote:
>
> I don't think it's really a saner syntax.

That's not the point. It's a *safer* syntax. Using {{int:lang}} breaks
cache integrity: if you put {{SomeTemplate/{{int:lang}}}} (or equally some
{{USERLANGUAGE}} magic word if it existed) on a page and save it, the link
that's added to the templatelinks table is the template subpage the *editor*
gets, but a viewer with a different language can get a different page. I
assume (before Tim shouts at me too, no I haven't read the code either) that
"The converter operates at a near-HTML stage of the parser" implies that
it's *way* after template expansion... are the "-{...}-" strings
stripmarked-out at that stage? Essentially, the key is that they can't
affect the transclusion structure of the rest of the page.
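
The cache-integrity concern can be shown with a toy model. It assumes, as described above, that template links are recorded once at save time using the editor's interface language, while each view resolves the subpage from the viewer's language. All names here (template_for, save, view) are hypothetical and do not correspond to MediaWiki functions.

```python
def template_for(lang):
    # Stands in for {{SomeTemplate/{{int:lang}}}}.
    return f"SomeTemplate/{lang}"


def save(page, editor_lang, templatelinks):
    # Links are recorded once, at save time, with the editor's language.
    templatelinks[page] = {template_for(editor_lang)}


def view(viewer_lang):
    # A viewer's render picks the subpage from the viewer's own language.
    return template_for(viewer_lang)


links = {}
save("Example", "en", links)
used = view("de")
# The subpage actually shown is missing from the recorded links.
print(used, used in links["Example"])
```

In this model an edit to SomeTemplate/de would not be known to affect the page, which is the inconsistency in question.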

>>
>> -{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}}; ar: {{GFDL/ar}};
>> ast: {{GFDL/ast}}; be: {{GFDL/be}}; be-tarask: {{GFDL/be-tarask}}; <!--
>> ...and so on for about 70 more languages -->}-

The above begs the question, of course, would this switch actually work?
And if it does, how does it affect the cache and linktables? More
investigation needed, methinks....

I think the obstructions to implementing en-gb/en-us conversion on enwiki
would be social rather than technical. They've just gone through six months
of hell over date autoformatting, culminating in a decision to scrap the
system entirely and hence not support users being able to choose between
American and International *date formats*. If they don't even want to
support those, getting a full language conversion supported *would* be like
herding cats...

--HM





nospam at vyznev

Sep 12, 2009, 1:05 AM

Post #23 of 28 (1807 views)
Re: Language variants [In reply to]

Happy-melon wrote:
>> Ilmari Karonen wrote:
>>>
>>> -{af: {{GFDL/af}}; als: {{GFDL/als}}; an: {{GFDL/an}}; ar: {{GFDL/ar}};
>>> ast: {{GFDL/ast}}; be: {{GFDL/be}}; be-tarask: {{GFDL/be-tarask}}; <!--
>>> ...and so on for about 70 more languages -->}-
>
> The above begs the question, of course, would this switch actually work?
> And if it does, how does it affect the cache and linktables? More
> investigation needed, methinks....

Indeed, that was what I was wondering about too. Without actually
trying it out, my guess would be that it would indeed work, but at a
cost: it'd first parse all the 75 or so subtemplates and then throw all
but one of them away.

Of course, that's what one would have to do anyway, to get full link
table consistency.

It does seem to me that it might not be *that* inefficient, *if* the
page were somehow cached in its pre-languageconverted state but after
the expensive template parsing has been done. Does such a cache
actually exist, or, if not, could one be added with reasonable ease?
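
As a sketch of the idea only: cache the page after the expensive template expansion but before conversion, then run the cheap conversion per variant. The class TwoStageRenderer, its methods and the placeholder {{thing}} are invented for this illustration and say nothing about whether MediaWiki has such a cache.

```python
class TwoStageRenderer:
    def __init__(self, tables):
        self.tables = tables    # variant code -> {source: target}
        self.expanded = {}      # page title -> text after template expansion
        self.expansions = 0     # how many times the expensive step has run

    def expand_templates(self, source):
        # Placeholder for the expensive, variant-independent parse.
        self.expansions += 1
        return source.replace("{{thing}}", "color")

    def render(self, title, source, variant):
        if title not in self.expanded:
            self.expanded[title] = self.expand_templates(source)
        text = self.expanded[title]
        # Cheap per-variant pass over the cached intermediate text.
        for src, dst in self.tables.get(variant, {}).items():
            text = text.replace(src, dst)
        return text


renderer = TwoStageRenderer({"en-gb": {"color": "colour"}})
print(renderer.render("Page", "My favorite {{thing}}", "en-us"))
print(renderer.render("Page", "My favorite {{thing}}", "en-gb"))
print(renderer.expansions)
```

Both variants are served from a single expansion here, so the cost of parsing every subtemplate would be paid once per page rather than once per variant.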

--
Ilmari Karonen



gerard.meijssen at gmail

Sep 12, 2009, 1:48 AM

Post #24 of 28 (1797 views)
Re: Language variants [In reply to]

Hoi,
If we do this for English, with pairs such as digitise and digitize, we have
to keep in mind that this deals ONLY with differences between GB and US
English. There are other varieties of English that may make this more
complicated.

Given the size of the GB and US populations it would split the cache and
effectively double the cache size. There are more languages where this would
provide serious benefits. I can easily imagine that the German, Spanish and
Portuguese communities would be interested. Then there are many of the
"other" languages that may have an interest. The first order of business is
not whether it can be done but who will implement and maintain the language
part of this.
Thanks,
GerardM



midom.lists at gmail

Sep 12, 2009, 2:01 AM

Post #25 of 28 (1799 views)
Re: Language variants [In reply to]

> Given the size of the GB and US populations it would split the cache
> and
> effectively double the cache size.

Did I just see you putting performance ahead of language support? Just
checkin'

Domas
