
Mailing List Archive: Wikipedia: Foundation

Tragical dynamics: that run for the number of articles

 

 



zvandijk at googlemail

Jun 27, 2008, 6:49 AM

Post #1 of 28 (1166 views)
Tragical dynamics: that run for the number of articles

Maybe this is not the most popular topic, but I would like to comment on
the news about the Japanese and Polish Wikipedias and their 500,000
articles each. In fact, ja.WP really has 500,000, but pl.WP does
not.
In an attempt to compare Wikipedia language editions, I clicked
the "random article" button and, with a sample of 50 clicks each, I
calculated how many articles a language edition really has, minus
all the pseudo articles.

A pseudo article is e.g.
http://pdc.wikipedia.org/wiki/Bikini
http://co.wikipedia.org/wiki/191
http://ksh.wikipedia.org/wiki/Varsseveld
http://pl.wikipedia.org/wiki/Tandil
http://vo.wikipedia.org/wiki/Poplar_Bluff

Many Wikipedias lose, by my calculation, quite a large percentage of
their articles. There is one honourable exception: Japanese Wikipedia,
which in 50 clicks showed absolutely no pseudo articles. If Japanese
Wikipedia had as lax a policy on new articles as many
others have, ja.WP would already be close to one million "articles". Pl.WP
has about 300,000 real articles, very respectable, but not what it
seems to be.

From the beginning, Wikipedians have reported the number of articles,
to have something to tell the media and to be proud of
their achievements. They rank Wikipedia language editions by the
number of articles. This has caused tragic dynamics: many
Wikipedians and Wikipedias are so obsessed with this number that they
produce rubbish articles to show off. Volapük Wikipedia, with more than
100,000 pseudo articles created by a single bot-using user, is only the
tip of the iceberg, and when someone called for closing vo.WP, vo.WP was
supported by an amazing number of users from many language editions:
così fan tutte. Wikipedians could and should use their time for more
useful article work.

It would be good if the community found a different way to compare or
to measure its successes.

Ziko





--
Ziko van Dijk
NL-Silvolde

_______________________________________________
foundation-l mailing list
foundation-l [at] lists
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l


andreengels at gmail

Jun 27, 2008, 6:59 AM

Post #2 of 28 (1148 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

On Fri, Jun 27, 2008 at 3:49 PM, Ziko van Dijk <zvandijk [at] googlemail> wrote:

> A pseudo article is e.g.
> http://pdc.wikipedia.org/wiki/Bikini
> http://co.wikipedia.org/wiki/191
> http://ksh.wikipedia.org/wiki/Varsseveld
> http://pl.wikipedia.org/wiki/Tandil
> http://vo.wikipedia.org/wiki/Poplar_Bluff

Ok, I understand numbers 2, 4 and 5 in your list. Number 1 is
presumably included for being extremely stubby, but what's the issue
with the ksh: page? Only thing I notice is that the text part hasn't
got any internal links. But to consider something like that a 'non
article' like the co: and pl: examples seems harsh in the extreme.


--
Andre Engels, andreengels [at] gmail
ICQ: 6260644 -- Skype: a_engels



zvandijk at googlemail

Jun 27, 2008, 7:24 AM

Post #3 of 28 (1150 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

http://ksh.wikipedia.org/wiki/Varsseveld
It's not Ripuarian (ksh) but Nedersaksisch (Dutch Low Saxon); the text is
taken directly from nds-nl.
Ziko




--
Ziko van Dijk
NL-Silvolde



harel.cain at gmail

Jun 27, 2008, 7:36 AM

Post #4 of 28 (1141 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

The depth criterion available here:
http://meta.wikimedia.org/wiki/List_of_wikipedias is a good starting
point. I quote: "The "Depth" column ((Edits/Articles) ×
(Non-Articles/Articles) × (Stub-ratio)) is a rough indicator of a
Wikipedia's quality, showing how frequently its articles are updated."

Note that Volapük, Polish, Ripuarian and others indeed have a very low
depth ranking.
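
As a rough illustration of the quoted formula, here is a minimal Python sketch. The function name and the sample counts are hypothetical, "non-articles" is assumed to mean total pages minus articles, and the stub-ratio is passed in as a given number because the quote does not define it.

```python
def depth(edits: int, articles: int, total_pages: int, stub_ratio: float) -> float:
    """Depth as quoted: (Edits/Articles) x (Non-Articles/Articles) x (Stub-ratio)."""
    if articles <= 0:
        raise ValueError("articles must be positive")
    # Assumption: non-articles are all pages that are not counted as articles.
    non_articles = total_pages - articles
    return (edits / articles) * (non_articles / articles) * stub_ratio


# Hypothetical wiki: 2,000,000 edits, 100,000 articles, 150,000 pages in total.
example_depth = depth(edits=2_000_000, articles=100_000, total_pages=150_000, stub_ratio=0.5)
print(example_depth)
```

A wiki filled by a bot in one pass has few edits per article and few non-article pages, so the first two factors stay small no matter how high the article count is.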


Harel

On Fri, Jun 27, 2008 at 4:49 PM, Ziko van Dijk <zvandijk [at] googlemail> wrote:

>
> It would be good if the community found a different way to compare or
> to measure its successes.


--
Quidquid latine dictum sit, altum viditur.



andrew.lih at gmail

Jun 27, 2008, 7:51 AM

Post #5 of 28 (1144 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

On Fri, Jun 27, 2008 at 9:49 PM, Ziko van Dijk <zvandijk [at] googlemail> wrote:
> Maybe this is not the most popular topic, but I would like to comment on
> the news about the Japanese and Polish Wikipedias and their 500,000
> articles each. In fact, ja.WP really has 500,000, but pl.WP does
> not.
> In an attempt to compare Wikipedia language editions, I clicked
> the "random article" button and, with a sample of 50 clicks each, I
> calculated how many articles a language edition really has, minus
> all the pseudo articles.

Yes, it's good to remind folks that "article count" is not a good
metric as it fails to take into account the cultural norms within the
language communities.

For a really startling view of what you are observing, look at the
wikistats plot: ja: (orange) has never had a "bot bump" like pl:, where
all those jagged jumps (yellow) are bot additions, meaning those
articles have very likely never been edited by humans.

http://stats.wikimedia.org/EN/PlotsPngArticlesTotal.htm#p2

-Andrew (User:Fuzheado)



zvandijk at googlemail

Jun 27, 2008, 8:14 AM

Post #6 of 28 (1144 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

Alas, judging a language edition by Wikimedia Statistics does not work.

The Indonesian, Asturian and Volapük WPs have the same "depth" (8), but
id.WP is a very good WP. How come? There are not so many edits per
article in id.WP, because it has translated a lot from English. That is a
legitimate way to create (good) articles, but it does not need a lot
of edits.

Bot activity: Indeed, "bot bumps" can often easily be detected in the stats tables.
Especially the small Wikipedias (I suppose) show (relatively) much bot
activity due to interwiki linking. On the other hand, pseudo
articles can be created by hand (let a script create the text outside WP and
then insert it "manually").

Ziko






--
Ziko van Dijk
NL-Silvolde



polimerek at gmail

Jun 27, 2008, 9:45 AM

Post #7 of 28 (1127 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

2008/6/27 Ziko van Dijk <zvandijk [at] googlemail>:
> Maybe this is not the most popular topic, but I would like to comment on
> the news about the Japanese and Polish Wikipedias and their 500,000
> articles each. In fact, ja.WP really has 500,000, but pl.WP does
> not.
> In an attempt to compare Wikipedia language editions, I clicked
> the "random article" button and, with a sample of 50 clicks each, I
> calculated how many articles a language edition really has, minus
> all the pseudo articles.
>
> A pseudo article is e.g.
> http://pdc.wikipedia.org/wiki/Bikini
> http://co.wikipedia.org/wiki/191
> http://ksh.wikipedia.org/wiki/Varsseveld
> http://pl.wikipedia.org/wiki/Tandil
> http://vo.wikipedia.org/wiki/Poplar_Bluff
>
> Many Wikipedias lose, by my calculation, quite a large percentage of
> their articles. There is one honourable exception: Japanese Wikipedia,
> which in 50 clicks showed absolutely no pseudo articles. If Japanese
> Wikipedia had as lax a policy on new articles as many
> others have, ja.WP would already be close to one million "articles". Pl.WP
> has about 300,000 real articles, very respectable, but not what it
> seems to be.
>
> From the beginning, Wikipedians have reported the number of articles,
> to have something to tell the media and to be proud of
> their achievements. They rank Wikipedia language editions by the
> number of articles. This has caused tragic dynamics: many
> Wikipedians and Wikipedias are so obsessed with this number that they
> produce rubbish articles to show off. Volapük Wikipedia, with more than
> 100,000 pseudo articles created by a single bot-using user, is only the
> tip of the iceberg, and when someone called for closing vo.WP, vo.WP was
> supported by an amazing number of users from many language editions:
> così fan tutte. Wikipedians could and should use their time for more
> useful article work.
>

Well... Bear in mind that English Wikipedia also contains quite a lot
of bot-created articles, and in fact English Wikipedia was the first
one to produce them. The others just followed the idea and started to do
it in order to artificially increase the number of articles. Polish
Wikipedia started to do it when our rank went down due to mass production of
bot-created articles in the Swedish, Italian, French and other Wikipedias.

Compare:

http://pl.wikipedia.org/wiki/Aignerville

and

http://en.wikipedia.org/wiki/Aignerville

or

http://pl.wikipedia.org/wiki/Is%C3%B2vol

and

http://it.wikipedia.org/wiki/Is%C3%B2vol

http://nl.wikipedia.org/wiki/Eksj%C3%B6_(stad)

and

http://pl.wikipedia.org/wiki/Eksj%C3%B6

http://pl.wikipedia.org/wiki/Dystrykt_Set%C3%BAbal

and

http://nn.wikipedia.org/wiki/Set%C3%BAbal

etc...

There is nothing really special about Polish Wikipedia - many others do
exactly the same, including English. We simply had more active coders who
knew how to feed bots. But - as you can see by comparing with other
Wikipedias - they sometimes did a really good job, in the sense that many
bot-created stubs in Polish Wikipedia contain more data than their
equivalents in, for example, Swedish or French Wikipedia.

http://fr.wikipedia.org/wiki/Gr%C3%B3dek

http://fr.wikipedia.org/wiki/Drzewica

http://fr.wikipedia.org/wiki/Pszczyna

http://fr.wikipedia.org/wiki/Jas%C5%82o

etc...


--
Tomek "Polimerek" Ganicz
http://pl.wikimedia.org/wiki/User:Polimerek
http://www.ganicz.pl/poli/
http://www.ptchem.lodz.pl/en/TomaszGanicz.html



zvandijk at googlemail

Jun 27, 2008, 10:09 AM

Post #8 of 28 (1135 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

Among the big Wikipedias, pl.WP has one of the lowest shares of
real articles:

WP   Articles (official)  Real articles (est.)  Quotient
EN             1,400,000             1,344,000      0.96
DE               696,000               668,160      0.96
FR               613,000               514,920      0.84
JA               466,000               466,000      1.00
IT               408,000               301,920      0.74
PL               467,000               298,880      0.64
ES               326,000               293,400      0.90
NL               404,000               274,720      0.68
SV               272,000               217,600      0.80
PT               338,000               209,560      0.62
RU               233,000               195,720      0.84
ZH               164,000               144,320      0.88
(most numbers are from Jan. 2008; en, de and pt are older; the estimates
should really be rounded)
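
The estimate in the table appears to be the official article count multiplied by the share of real articles found in the 50-click sample. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the figure of 18 pseudo articles out of 50 for pl.WP is inferred from the 0.64 quotient rather than stated in this thread.

```python
def estimate_real_articles(official_count: int, sample_size: int, pseudo_in_sample: int) -> int:
    """Scale the official article count by the share of real articles in a random sample."""
    if sample_size <= 0 or not 0 <= pseudo_in_sample <= sample_size:
        raise ValueError("need sample_size > 0 and 0 <= pseudo_in_sample <= sample_size")
    real_in_sample = sample_size - pseudo_in_sample
    return round(official_count * real_in_sample / sample_size)


# pl.WP: 467,000 official articles; 18 of 50 sampled pages assumed to be pseudo articles.
pl_estimate = estimate_real_articles(467_000, 50, 18)
# ja.WP: 466,000 official articles; no pseudo articles in the sample.
ja_estimate = estimate_real_articles(466_000, 50, 0)
print(pl_estimate, ja_estimate)
```

With only 50 clicks, every quotient is a multiple of 0.02, which is one reason the estimates should be read as rounded figures.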

Only 64% real articles in pl.WP, while the much-criticized sv.WP has 80%.
But this is not about blaming some Wikipedians; it is about finding out
how to compare WPs in a more effective way.
The average size (bytes per article) does not work either. Take the
article "Berlin" in Upper Sorbian (hsb). It has 3740 bytes. That sounds
good, but only 454 bytes (six short sentences) are actual text;
1823 bytes are for the interwikis alone. This is not manipulation,
but it shows the difficulties in reading Wikimedia statistics. Even a
"geographical stub" with infoboxes, categories and interwikis produces
a lot of bytes.
It takes a human to evaluate.
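
To put the Berlin example in proportion, here is a small Python sketch that turns the byte counts above into percentages. The function name is hypothetical, and the "other" bucket (infobox, categories, markup) is simply whatever remains after text and interwikis are subtracted.

```python
def byte_shares(total_bytes: int, parts: dict[str, int]) -> dict[str, float]:
    """Return each part's percentage of total_bytes, plus the unexplained remainder."""
    if total_bytes <= 0 or sum(parts.values()) > total_bytes:
        raise ValueError("parts must fit inside a positive total")
    shares = {name: round(100 * size / total_bytes, 1) for name, size in parts.items()}
    # Everything not listed explicitly ends up in "other".
    remainder = total_bytes - sum(parts.values())
    shares["other"] = round(100 * remainder / total_bytes, 1)
    return shares


# Byte counts from the hsb "Berlin" example: 3740 total, 454 text, 1823 interwikis.
berlin_hsb = byte_shares(3740, {"text": 454, "interwikis": 1823})
print(berlin_hsb)
```

On these figures the readable text is roughly an eighth of the page, and the interwiki links alone are close to half of it.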
Ziko




--
Ziko van Dijk
NL-Silvolde



mathias.schindler at gmail

Jun 27, 2008, 2:00 PM

Post #9 of 28 (1124 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

On Fri, Jun 27, 2008 at 4:51 PM, Andrew Lih <andrew.lih [at] gmail> wrote:

> Yes, it's good to remind folks that "article count" is not a good
> metric as it fails to take into account the cultural norms within the
> language communities.

A year ago, some admins at de.wp (including me) tried and failed to
replace the "article count" on the de.wp front page with a rather vague
statement of "hundreds of thousands of articles" along with the
counter for articles with "featured article" status. Not that it
would be impossible to game that number, but at least I felt it
was something worth focusing on and displaying to our visitors.

Mathias



polimerek at gmail

Jun 28, 2008, 12:47 AM

Post #10 of 28 (1108 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

2008/6/27 Ziko van Dijk <zvandijk [at] googlemail>:
> Among the big Wikipedias, pl.WP has one of the lowest shares of
> real articles:
>
> WP   Articles (official)  Real articles (est.)  Quotient
> EN             1,400,000             1,344,000      0.96
> DE               696,000               668,160      0.96
> FR               613,000               514,920      0.84
> JA               466,000               466,000      1.00
> IT               408,000               301,920      0.74
> PL               467,000               298,880      0.64
> ES               326,000               293,400      0.90
> NL               404,000               274,720      0.68
> SV               272,000               217,600      0.80
> PT               338,000               209,560      0.62
> RU               233,000               195,720      0.84
> ZH               164,000               144,320      0.88
> (most numbers are from Jan. 2008; en, de and pt are older; the estimates
> should really be rounded)
>


Can you explain how this evaluation was done? How do you distinguish
between "real" and other articles? In particular, I don't believe the
statistics shown for en Wikipedia. I have a feeling that there are many
more bot-created articles in en Wikipedia than your statistics show.

About a year ago I wanted to evaluate the number of bot-created
articles in Polish Wikipedia, and then evaluate how many of
them had been expanded by humans. Unfortunately this was impossible,
as the bot owners do not keep records of their bots' activity. Anyway,
we randomly checked what had happened to bot-created articles about
Polish villages and small towns, which were the very first bot
production in our Wikipedia. As I was strongly opposed several years
ago to producing bot-created articles but failed to persuade my fellow
Wikipedians, I just wanted to prove that it was indeed a bad idea.
However, the study showed that around 70% of them had been effectively
expanded by humans. Villagers added quite a lot of useful material to
these articles, like histories of their villages, pictures of
interesting buildings etc. Can you explain whether these articles are
treated as "real" or "not real" in your statistics, and why?


--
Tomek "Polimerek" Ganicz
http://pl.wikimedia.org/wiki/User:Polimerek
http://www.ganicz.pl/poli/
http://www.ptchem.lodz.pl/en/TomaszGanicz.html



andreengels at gmail

Jun 28, 2008, 1:39 AM

Post #11 of 28 (1103 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

On Sat, Jun 28, 2008 at 9:47 AM, Tomasz Ganicz <polimerek [at] gmail> wrote:

> Can you explain how this evaluation was done? How do you distinguish
> between "real" and other articles? In particular, I don't believe the
> statistics shown for en Wikipedia. I have a feeling that there are many
> more bot-created articles in en Wikipedia than your statistics show.

That is described in his first mail: He did 'random article' 50 times
and used that as a sample.


--
Andre Engels, andreengels [at] gmail
ICQ: 6260644 -- Skype: a_engels



polimerek at gmail

Jun 28, 2008, 2:20 AM

Post #12 of 28 (1100 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

2008/6/28 Andre Engels <andreengels [at] gmail>:
> On Sat, Jun 28, 2008 at 9:47 AM, Tomasz Ganicz <polimerek [at] gmail> wrote:
>
>> Can you explain how this evaluation was done? How do you distinguish
>> between "real" and other articles? In particular, I don't believe the
>> statistics shown for en Wikipedia. I have a feeling that there are many
>> more bot-created articles in en Wikipedia than your statistics show.
>
> That is described in his first mail: He did 'random article' 50 times
> and used that as a sample.
>

Well, it is not described - I mean there are no clear evaluation
criteria mentioned.
Does he speak Japanese or Polish? Is it possible to recognize "real"
and "unreal" articles without understanding them?

Compare:

http://he.wikipedia.org/wiki/%D7%9C%D7%95%D7%93%D7%96%27

Is it a "real" or an "unreal" article, and why? I have a feeling that it is
bot-created, but I am not sure about it, as I don't speak Hebrew :-)

And what about this:

http://uk.wikipedia.org/wiki/%D0%A4%D1%96%D0%B3%D1%83%D0%BB%D1%81_%D1%96_%D0%90%D0%BB%D1%96%D0%BD%D1%8C%D1%8F

It is quite long, but I am almost sure that it is bot-created and
untouched by any human, because it contains only statistical data and
sentences that look machine-generated. I don't speak
Ukrainian well but understand it a little. But it is still just my
feeling...

It is funny that this article is longer than the similar one in es-Wikipedia,
although the Spanish one was surely edited by humans :-)

http://es.wikipedia.org/wiki/F%C3%ADgols_y_Ali%C3%B1%C3%A1

and moreover - if you check all Wikipedias which contain an article about
Fígols i Alinyà, only the Spanish one looks as if it was edited by a human
(but this is just my feeling; I may be wrong).

And this:

http://ta.wikipedia.org/wiki/%E0%AE%B5%E0%AE%BE%E0%AE%B0%E0%AF%8D%E0%AE%9A%E0%AE%BE

real or not real? I really don't know, probably bot-created :-)

I think if we would like to perform a serious evaluation of "real" and
"unreal" articles, it should be based on clear criteria rather than on
"feelings", done on larger samples (at least 500 articles), and by people
who understand what they are reading.


--
Tomek "Polimerek" Ganicz
http://pl.wikimedia.org/wiki/User:Polimerek
http://www.ganicz.pl/poli/
http://www.ptchem.lodz.pl/en/TomaszGanicz.html



zvandijk at googlemail

Jun 28, 2008, 5:34 AM

Post #13 of 28 (1102 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

There is Google Translate, and the interwikis help as well. That
he.WP article about Lodz I would count as a real article, because
it has more information than a database entry (links to Holocaust-related
articles, something about the 19th century, the economy (textiles)).
Indeed, I would like to make a more scientific scheme and apply it to
a larger sample; maybe a research group will be established for this. I
believe that my method does give a reasonable picture; of course,
whether my results say "50,000" real articles or "52,000" is not
really a measurable difference.
Ziko

PS: By the way, it is fun to browse a foreign-language Wikipedia with
the help of Google Translate - not perfect, but it is interesting to see
what others write about.





--
Ziko van Dijk
NL-Silvolde



polimerek at gmail

Jun 28, 2008, 6:27 AM

Post #14 of 28 (1102 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

2008/6/28 Ziko van Dijk <zvandijk [at] googlemail>:
> There is Google Translate, and the interwikis help as well. That
> he.WP article about Lodz I would count as a real article, because
> it has more information than a database entry (links to Holocaust-related
> articles, something about the 19th century, the economy (textiles)).
> Indeed, I would like to make a more scientific scheme and apply it to
> a larger sample; maybe a research group will be established for this. I
> believe that my method does give a reasonable picture; of course,
> whether my results say "50,000" real articles or "52,000" is not
> really a measurable difference.

Sorry, but this only shows that your results are not reliable,
because they are based on your feelings and on poor-quality machine
translations, which could change your feelings in unpredictable ways. I
am afraid that the results shown in your table are just a
reflection of:
a) the quality of machine translation performed by Google - it is
better for Latin-based and Germanic languages (English, French,
Italian, German, Dutch etc.) and much worse for Slavic, Arabic and
East Asian languages.
b) your own subconscious attitude toward various nations and Wikipedias
- even if you are trying to evaluate them all fairly


Google Translate sometimes produces really funny results when
translating from Polish to English. For example:

"Przyszłość partii przyszłością narodu" (The future of the party is the
future of the nation) is translated to:

"The future of the future of the nation lot" :-)

Or:

"Byłbym spał, gdybym mógł." (I would sleep, if only I could)

is translated to:

"I would be he lay, if I only could."

http://en.wikipedia.org/wiki/Machine_translation_software_usability#Trustworthiness_and_Security

http://www.nist.gov/speech/tests/mt/2006/doc/mt06eval_official_results.html

I think that a method to distinguish between "real" and "unreal"
articles should be based on analysis of the article's history and on
formal "hard" criteria.

For example, one could set a criterion that if there are at least 4
sentences written by a human, it is a "real article".

--
Tomek "Polimerek" Ganicz
http://pl.wikimedia.org/wiki/User:Polimerek
http://www.ganicz.pl/poli/
http://www.ptchem.lodz.pl/en/TomaszGanicz.html


zvandijk at googlemail

Jun 28, 2008, 1:09 PM

Post #15 of 28 (1094 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

I have discussed my study with many people (one had similar results),
but no one was so aggressive, Tomasz.

> b) your own subconscious attitude toward various nations and Wikipedias

? Is this an accusation?

Ziko

--
Ziko van Dijk
NL-Silvolde



polimerek at gmail

Jun 28, 2008, 2:47 PM

Post #16 of 28 (1085 views)
Re: Tragical dynamics: that run for the number of articles [In reply to]

2008/6/28 Ziko van Dijk <zvandijk [at] googlemail>:
> I have discussed my study with many people (one had similar results),
> but no one was so aggressive, Tomasz.
>
>> b) your own subconscious attitude toward various nations and Wikipedias
>
> ? Is this an accusation?
>

No, I am just a scientist, so I tend to be sceptical, and I have basic
knowledge of the typical mistakes made in statistical research. The
sample was too small, there were no clear criteria for evaluating it,
and you did not test the experimental error or the reproducibility of
your method by comparing results from several experiments in which
other people applied your definition of a "real" article.

A sample of 50 articles tested by one person, who surely has his own
attitudes, is not enough to say that this or that Wikipedia is better
or worse. Everyone has their own attitudes towards one nation or
another. It is a very natural thing. And since there is no clear
definition of what is a "real" article and what is not, and Google
machine translation was used to evaluate this (which, according to the
NIST survey from 2006, was found to be OK in only around 49% of cases),
I am quite sure that your results cannot be taken seriously. You could
have a statistical error of at least around 15-20% (if not more), so
results of 0.60 or 0.80 are within the range of experimental error.
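
As a rough check on the sampling part of that error, here is a minimal
sketch. It assumes each of the 50 random clicks is an independent draw
and uses the normal approximation to the binomial; the function name is
only illustrative, and the 0.64 value is the pl.WP share quoted in this
thread. It says nothing about evaluator bias or translation quality,
which would come on top.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

n = 50  # sample size per Wikipedia, as described in the thread
for p_hat in (0.60, 0.64, 0.80):
    moe = margin_of_error(p_hat, n)
    print(f"p = {p_hat:.2f}: +/- {moe:.3f}")
```

Under these assumptions the sampling margin alone comes out at roughly
11 to 14 percentage points for a sample of 50.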

Anyway, it would be interesting to run better-planned experiments to
evaluate the quality of Wikipedia articles, but they would certainly
have to be done on a larger sample, using either some sort of "hard"
criteria or, when "soft, human-based" criteria are used, a group of at
least 10 researchers speaking different languages and having different
cultural backgrounds.

--
Tomek "Polimerek" Ganicz
http://pl.wikimedia.org/wiki/User:Polimerek
http://www.ganicz.pl/poli/
http://www.ptchem.lodz.pl/en/TomaszGanicz.html



node.ue at gmail

Jun 28, 2008, 4:18 PM

Post #17 of 28 (1093 views)
Permalink
Re: Tragical dynamics: that run for the number of articles [In reply to]

First of all, that figure is for Arabic and Chinese, which are probably
the languages Google Translate handles worst.

Second of all, Google consistently fared better than almost every
other system, a surprising feat for a very recently developed system.

Even if machine translation isn't completely accurate, it's often
enough to get an idea of the content of the page, and I know I have
learned about several topics through reading translated articles from
pl.wp.

Mark



lars at aronsson

Jun 28, 2008, 5:03 PM

Post #18 of 28 (1086 views)
Permalink
Re: Tragical dynamics: that run for the number of articles [In reply to]

Tomasz Ganicz wrote:

> And if there is no clear definition of what is "real" article
> and what is not,

Apparently it was the 500k article event that caused Ziko to bring
the topic up this time. He's frustrated (and so am I) that 500K
articles is reported as an achievement, when it is indeed doubtful
what quality these articles have. Still, I think he exaggerates
the problem.

Earlier this year, when the topic came up on meta, it was because
of which languages were featured as the top 10 on
www.wikipedia.org,
http://meta.wikimedia.org/wiki/Top_Ten_Wikipedias

Since then, the Russian Wikipedia has gained the 10th position and
Swedish ("the one with all the stubs") is down to 11th, so there
is one problem less to care about. During that discussion, I
proposed to use the size of the compressed database dump
(pages-articles.xml.bz2) as the official metric, since it both
counts the total database size (one long article counts the same
as two short ones) and it completely removes the impact of
bot-generated articles. The compressed size of the Volapük Wikipedia
is very small, because the same patterns appear in many of its
numerous articles.
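
The effect described above can be illustrated with a small sketch. It
uses Python's standard bz2 module on invented data: the stub template,
the town and district names, and the random-word "prose" are all
made up for illustration and are not taken from any real dump.

```python
import bz2
import random

def compressed_ratio(text: str) -> float:
    """Return compressed size / raw size; lower means more repetition."""
    raw = text.encode("utf-8")
    return len(bz2.compress(raw)) / len(raw)

# Invented bot-style stubs: one fixed template, only names and numbers vary.
TEMPLATE = (
    "{name} is a municipality in the district of {district}. "
    "It covers an area of {area} square kilometres and has {pop} inhabitants.\n"
)
bot_stubs = "".join(
    TEMPLATE.format(name=f"Town{i}", district=f"District{i % 20}",
                    area=10 + i % 90, pop=1000 + i)
    for i in range(2000)
)

# Crude stand-in for varied prose: random words from a 500-word vocabulary.
rng = random.Random(0)
letters = "abcdefghijklmnopqrstuvwxyz"
vocab = ["".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
         for _ in range(500)]
varied_text = " ".join(rng.choice(vocab) for _ in range(20000))

stub_ratio = compressed_ratio(bot_stubs)
varied_ratio = compressed_ratio(varied_text)
print(f"bot-style stubs: {stub_ratio:.3f}")
print(f"varied text:     {varied_ratio:.3f}")
```

The templated stubs shrink to a much smaller fraction of their raw
size than the varied text does, so thousands of near-identical
articles add little to a compressed total.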

On the talk page, there is a table where this is shown, and you
can sort by column by clicking the little boxes,
http://meta.wikimedia.org/wiki/Talk:Top_Ten_Wikipedias#What_problem_do_we_want_to_solve

I'd like to propose a quality metric: The difference in rank
between the article count and the compressed database size.

The English Wikipedia is the biggest (rank 1), whether you count
articles or compressed database size. So its quality is 0.

The Polish Wikipedia was the 4th by article count, but the 7th by
compressed database size, for a quality of 4 - 7 = -3.

The Swedish Wikipedia was (when this table was compiled) the 10th
biggest by article count, but the 12th biggest by compressed
database size, so its quality is 10 - 12 = -2.

The Russian Wikipedia was the 11th by article count, but 9th by
compressed database size, so its quality is +2. This doesn't mean
the Russian Wikipedia is better than the English one, only that it
is better than (two of) its peers of similar size.

The Volapük Wikipedia was the 15th by article count, but worse
than the 30th by compressed database size (the table is
incomplete), so its quality is worse than -15.
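
The metric proposed above can be sketched in a few lines. The ranks
are the ones quoted in this post; the function and variable names are
only illustrative. Volapük is left out because only a bound on its
size rank is known.

```python
def rank_quality(rank_by_articles: int, rank_by_size: int) -> int:
    """Rank by article count minus rank by compressed dump size."""
    return rank_by_articles - rank_by_size

# (rank by article count, rank by compressed size), as quoted in the post
ranks = {
    "English": (1, 1),
    "Polish": (4, 7),
    "Swedish": (10, 12),
    "Russian": (11, 9),
}

scores = {name: rank_quality(a, s) for name, (a, s) in ranks.items()}
for name, q in scores.items():
    print(f"{name}: {q:+d}")
```

A negative score means the edition ranks worse by compressed size
than by article count; a positive score means the opposite.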



--
Lars Aronsson (lars [at] aronsson)
Aronsson Datateknik - http://aronsson.se



dgerard at gmail

Jun 28, 2008, 5:11 PM

Post #19 of 28 (1094 views)
Permalink
Re: Tragical dynamics: that run for the number of articles [In reply to]

2008/6/29 Mark Williamson <node.ue [at] gmail>:

> Second of all, Google consistently fared better than almost every
> other system, a surprising feat for a very recently developed system.


Not that recently - it's based on SYSTRAN.


- d.



node.ue at gmail

Jun 28, 2008, 11:12 PM

Post #20 of 28 (1089 views)
Permalink
Re: Tragical dynamics: that run for the number of articles [In reply to]

No, it's not.

Mark



stephen.bain at gmail

Jun 29, 2008, 1:27 AM

Post #21 of 28 (1085 views)
Permalink
Re: Tragical dynamics: that run for the number of articles [In reply to]

On Sun, Jun 29, 2008 at 10:03 AM, Lars Aronsson <lars [at] aronsson> wrote:
>
> I'd like to propose a quality metric: The difference in rank
> between the article count and the compressed database size.

I think this is a good metric, especially because it's a relative
metric (since it's effectively comparing projects against their peers
to see how mature they are).

Someone earlier was discussing article sizes, so I hacked up a script
to graph the distribution of article sizes:

http://www.toolserver.org/~thebainer/articlesizes/

Most graphs share the same basic shape, with a roughly logarithmic
distribution once you get past the initial peak (see the English
Wikipedia graph for an example of what I mean), but some are
different, and it tends to coincide with what has already been
observed.

> The Swedish Wikipedia was (when this table was compiled) the 10th
> biggest by article count, but the 12th biggest by compressed
> database size, so its quality is 10 - 12 = -2.

Swedish Wikipedia is distributed in almost exactly the same way as
English Wikipedia, with the difference being that its average size is
less than half that of En's, at around 1900 bytes.

> The Russian Wikipedia was the 11th by article count, but 9th by
> compressed database size, so its quality is +2. This doesn't mean
> the Russian Wikipedia is better than the English one, only that it
> is better than (two of) its peers of similar size.

Not only does the Russian Wikipedia have a high average article size
(about 5500 bytes, compared with, for example, English Wikipedia at
around 4100 bytes) but its graph, which has multiple peaks, seems to
show that, unlike many other projects, it has more mature, medium-size
articles than it does stubs.

> The Volapük Wikipedia was the 15th by article count, but worse
> than the 30th by compressed database size (the table is
> incomplete), so its quality is worse than -15.

The Volapük Wikipedia has an unusual distribution, with two peaks. One
is in the usual place, just below the average size (which is low, at
just over 1000 bytes) while the other is around 2 - 2.5kb, which
corresponds to the size of all the geography stubs created by
SmeiraBot.
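
A size distribution like the ones described above could be bucketed
roughly as follows. This is only a sketch and not the script behind
the linked graphs: the article sizes are invented, and the 500-byte
bin width and the simple peak test are assumptions.

```python
from collections import Counter

def size_histogram(sizes, bin_width=500):
    """Count articles per bucket; keys are the lower edge of each bucket in bytes."""
    counts = Counter((s // bin_width) * bin_width for s in sizes)
    return dict(sorted(counts.items()))

def local_peaks(hist, bin_width=500):
    """Buckets with a strictly higher count than both neighbours (missing ones count as 0)."""
    return [
        edge for edge, count in hist.items()
        if count > hist.get(edge - bin_width, 0)
        and count > hist.get(edge + bin_width, 0)
    ]

# Invented sizes in bytes: a cluster of short stubs plus a second cluster near 2 kB.
sizes = ([300] * 5 + [700] * 40 + [1200] * 15 + [1700] * 8
         + [2200] * 30 + [2700] * 6 + [3400] * 2)

hist = size_histogram(sizes)
for edge, count in hist.items():
    print(f"{edge:5d}-{edge + 499:5d} bytes: {'#' * count}")
print("peaks at buckets starting at:", local_peaks(hist))
```

With real data the sizes would come from a database dump, and a
second peak of this kind would point to a large batch of similarly
sized articles.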

--
Stephen Bain
stephen.bain [at] gmail



polimerek at gmail

Jun 29, 2008, 2:00 AM

Post #22 of 28 (1089 views)
Permalink
Re: Tragical dynamics: that run for the number of articles [In reply to]

2008/6/29 Stephen Bain <stephen.bain [at] gmail>:
> On Sun, Jun 29, 2008 at 10:03 AM, Lars Aronsson <lars [at] aronsson> wrote:
>>
>> I'd like to propose a quality metric: The difference in rank
>> between the article count and the compressed database size.
>
> I think this is a good metric, especially because it's a relative
> metric (since it's effectively comparing projects against their peers
> to see how mature they are).
>
> Someone earlier was discussing article sizes, so I hacked up a script
> to graph the distribution of article sizes:
>
> http://www.toolserver.org/~thebainer/articlesizes/
>

Yes, but the size of the average article can easily be increased
artificially by bot creation of infoboxes, navigation templates, long
lists of categories, interwiki links, etc.

Take a look, for example, at:

http://pl.wikipedia.org/wiki/Telmisartan

The infobox is around 90% of its content. If you blame Polish Wikipedia
for allowing such articles, I would agree with you :-) The article
was created by a human, not by a bot.

A higher average article size can easily be achieved in any Wikipedia
just by creating bot-only articles with plenty of statistical data, a
huge infobox and several navigation templates. See the example I have
already shown:

http://uk.wikipedia.org/wiki/%D0%A4%D1%96%D0%B3%D1%83%D0%BB%D1%81_%D1%96_%D0%90%D0%BB%D1%96%D0%BD%D1%8C%D1%8F

It would be nice to have a tool comparing the "real" size of articles,
by which I mean counting the size of free text only, without all the
templates and other non-text stuff.
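
A very rough sketch of such a count is shown below. It is not an
existing tool: it only drops {{...}} templates (including nested ones)
and category or interlanguage links, and the list of link prefixes is
an assumption. Tables, references and HTML would need extra handling.

```python
import re

def strip_templates(wikitext: str) -> str:
    """Drop everything inside {{...}}, including nested templates."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out)

# Assumed prefixes: "Category", its Polish form, and short language codes.
LINK_NOISE = re.compile(
    r"\[\[(?:Category|Kategoria|[a-z]{2,3}(?:-[a-z]+)*):[^\]]*\]\]",
    re.IGNORECASE,
)

def free_text_size(wikitext: str) -> int:
    """UTF-8 bytes left after removing templates, category and interwiki links."""
    text = strip_templates(wikitext)
    text = LINK_NOISE.sub("", text)
    return len(text.strip().encode("utf-8"))

# Invented sample page: an infobox, one sentence, a category and an interwiki link.
sample = (
    "{{Infobox drug|name=X|data={{nested|1}}}}\n"
    "Telmisartan is a drug.\n"
    "[[Category:Drugs]]\n"
    "[[en:Telmisartan]]"
)
print(len(sample.encode("utf-8")), "bytes raw,", free_text_size(sample), "bytes of free text")
```

Comparing the raw size with the free-text size per article would show
how much of a page is infobox and navigation markup.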


--
Tomek "Polimerek" Ganicz
http://pl.wikimedia.org/wiki/User:Polimerek
http://www.ganicz.pl/poli/
http://www.ptchem.lodz.pl/en/TomaszGanicz.html



node.ue at gmail

Jun 29, 2008, 2:07 AM

Post #23 of 28 (1086 views)
Permalink
Re: Tragical dynamics: that run for the number of articles [In reply to]

Who's to say some bot-created articles can't be useful?

Some may not be very useful, but others are very helpful.

Mark



zvandijk at googlemail

Jun 29, 2008, 2:55 AM

Post #24 of 28 (1068 views)
Permalink
Re: Tragical dynamics: that run for the number of articles [In reply to]

Tomasz,
My impression is that you do not like the results because pl.WP has a
poor ratio; that is what you initially complained about. I know, and
never denied, that 50 is a small sample; I did it for 53 Wikipedias.
I do have criteria, even if I did not list them for you; you have
not read my paper. But you immediately accuse me of judging
purely on feelings and attitudes towards nations.

For example, for geo stubs I want at least two pieces of information
that are not bot-created.
http://pl.wikipedia.org/wiki/Abisynia_(powiat_bialski)
I checked the first few articles in Kategoria:Zalążki artykułów o
polskich wsiach, and the only one I'd count as real is Abramy, because
it has two pieces of historical information (1599 and 1676). I suppose
that many of the other 50,159 articles in that category are pseudo
articles (they all have that part about the administrative division of
1975-1998).
The same thing can be said about Kategoria:Zalążek artykułu o
miejscowości francuskiej with 35,066 cities in France.
Schematic planetoid articles are not real articles either, like the
14,444 in Kategoria:Planetoidy pasa głównego.

So, I can imagine why pl.WP has only 64% real articles according to my sample.

Ziko

--
Ziko van Dijk
NL-Silvolde


gerard.meijssen at gmail

Jun 29, 2008, 3:08 AM

Post #25 of 28 (1074 views)
Permalink
Re: Tragical dynamics: that run for the number of articles [In reply to]

Hoi,
When you base your statistics on numbers that have only pseudo-relevance,
the resulting statistics consequently have the same pseudo-relevance or
less. When the results of concentrating on inflating the wrong numbers
are called "tragic", as you have done, it is clear that the numbers
everybody is concentrating on are the wrong ones.

No amount of quibbling will change this fact. You can increase your sample
size as much as you like, and you can include all kinds of other factors
that have a tangential relation to the numbers considered, but it will not
make the results any better. It will not change the numbers you are basing
the argument on; they will not give meaningful results when people try to
improve the numbers.

This whole argument is based on the metric of the number of articles. More
relevant is the number of reads for a project. By and large, there is no
way in which those numbers can be manipulated in a way that can be called
detrimental to individual Wikipedias and Wikipedia in general.

The most tragic part of this whole argument is that it is based on the wrong
premises. The result of all these arguments makes no real difference. I can
safely argue that the big increase in the number of articles for the Volapuk
Wikipedia not only provided a large number of new articles; it also increased
the visibility of this language and the number of people editing in
Volapuk. It is a genuine success. The problem is that, for all kinds of
reasons, people are of the opinion that it diminishes the success of *their*
Wikipedia, and a rich variety of arguments has been used to diminish the
success of all the hard work that went into making this happen. THIS IS SAD.

When we use a different metric, particularly the number of people using a
project, there is no way that anyone can argue that the numbers are
unacceptable. It is obvious that the number of people speaking a language
will affect the relative numbers. At the same time, any and all activities
that increase the number of readers are positive for the Wikipedia and WMF
aims. For languages with few speakers, the method of gaining more readers may
be different. When all the intellectual activity is centred on these
methods, we have a positive discussion instead of the current discussion,
which will not bring us anything that I consider worthwhile.

My challenge to you all is to argue that your arguments have any relevance
except for the fact that we have always considered relevance by number of
activities... If your arguments are not convincing, you have all the
arguments why we should ditch the number of articles as the yardstick we
measure the relevance of our Wikipedia projects by...

Thanks,
GerardM
