
Mailing List Archive: Wikipedia: Wikitech

(no subject)

 

 



qli at ica

Jan 22, 2010, 10:21 PM

Post #1 of 6
(no subject)

Hi all,
I built a local wiki, and I want to set the recent changes limit options to
500|1000|5000|10000.
I changed $wgRCLinkLimits = array( 50, 100, 250, 500 );
to $wgRCLinkLimits = array( 500, 1000, 5000, 10000 ); and set 'rclimit' =>
10000.

Is this right? Or is there something more to do?

Thanks

vanessa
_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


roan.kattouw at gmail

Jan 23, 2010, 6:52 AM

Post #2 of 6
Re: (no subject)

2010/1/23 李琴 <qli [at] ica>:
> hi all,
> I built a local wiki, and I want to set the recentchange limit to
> 500|1000|5000|10000.
> I changed the $wgRCLinkLimits = array( 50, 100, 250, 500 );
> to $wgRCLinkLimits = array( 500, 1000, 5000, 10000 ); and 'rclimit'   =>
> 10000.
>
> Is this right? Or is there something more to do?
>
Looks OK. Have you tried to see if it works yet?

Roan Kattouw (Catrope)



petr.kadlec at gmail

Jan 23, 2010, 11:13 AM

Post #3 of 6
Re: (no subject)

On 23 January 2010 07:21, 李琴 <qli [at] ica> wrote:
> I changed the $wgRCLinkLimits = array( 50, 100, 250, 500 );
> to $wgRCLinkLimits = array( 500, 1000, 5000, 10000 ); and 'rclimit'   =>
> 10000.

Just to be sure: do not _change_ the line in DefaultSettings.php!
Override the setting by adding the second quoted command into your
LocalSettings.php. Otherwise, the change will be overwritten every
time you update to a newer version of MediaWiki.

-- [[cs:User:Mormegil | Petr Kadlec]]



qli at ica

Jan 28, 2010, 6:06 AM

Post #4 of 6
(no subject)

Hi all,
I have built a local wiki. Now I want to keep its data consistent with
Wikipedia, and one task is to fetch the updates from Wikipedia.
I get the URLs by parsing the RSS feed
(http://zh.wikipedia.org/w/index.php?title=Special:%E6%9C%80%E8%BF%91%E6%9B%B4%E6%94%B9&feed=rss)
and then get the full HTML content of the edit box by opening each URL
and following its 'edit this page' link.
(e.g.
http://zh.wikipedia.org/w/index.php?title=%E8%B2%A1%E7%A5%9E%E5%88%B0_(%E9%81%8A%E6%88%B2%E7%AF%80%E7%9B%AE)&diff=12199398&oldid=prev
and its edit interface is
http://zh.wikipedia.org/w/index.php?title=%E8%B2%A1%E7%A5%9E%E5%88%B0_(%E9%81%8A%E6%88%B2%E7%AF%80%E7%9B%AE)&action=edit
). However, I have run into two problems.
First, sometimes I can't open a URL taken from the RSS feed, and I don't
know why.
Is it because I visit too frequently and my IP address has been blocked,
or because the network is too slow?
If it is the former, how often may I request a page from Wikipedia?
Is there a timeout?
Second, as mentioned above,
I want to download the full HTML of the edit box content from Wikipedia.
Sometimes this works, but at other times I only get part of it. What is
the reason?

Thanks

vanessa


oscar.vives at gmail

Jan 28, 2010, 8:02 AM

Post #5 of 6
Re: (no subject)

On 28 January 2010 15:06, 李琴 <qli [at] ica> wrote:
> Hi all,
>  I have  built a LocalWiki.   Now I want the data of it to keep consistent
> with the
> Wikipedia and one work I should do is to get the data of update from
> Wikipedia.
> I get the URLs through analyzing the RSS
> (http://zh.wikipedia.org/w/index.php?title=Special:%E6%9C%80%E8%BF%91%E6%9B%B4%E6%94%B9&feed=rss)
> and get all HTML content of the edit box by analyzing
> these URLs after opening an URL and clicking the ’edit this page’.
....
> That’s because I visit it too frequently and my IP address is prohibited
> or the network is too slow?

李琴, well... that's web scraping, which is a poor technique: it is
error-prone and generates a lot of traffic.

One thing a robot must do is read and follow the
http://zh.wikipedia.org/robots.txt file (you should probably read it
too).
As a general rule of the Internet, a "rude" robot will be banned by the
site admins.

It would be a good idea to announce your bot as a bot in the User-Agent
string. Good bot behavior is reading a website at a human pace, perhaps
something like 10 requests per minute? I don't know Wikipedia's own
rules about this.
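The advice above (honor robots.txt, identify the bot, keep a human pace) can be sketched in Python as follows. This is a minimal illustration, not Wikipedia's policy: the bot name, the contact address, and the helper names `make_robot_rules` and `Throttle` are hypothetical, and the default of 10 requests per minute is only the guess made in this post.

```python
import time
import urllib.robotparser

# Hypothetical bot identity; replace with your own name and contact address.
USER_AGENT = "LocalWikiSyncBot/0.1 (contact: admin@example.org)"


def make_robot_rules(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse the text of a robots.txt file that was fetched beforehand."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, requests_per_minute: float = 10.0):
        # 10 requests per minute is an assumed pace, not an official limit.
        self.min_interval = 60.0 / requests_per_minute
        self._last = None  # time of the previous request, if any

    def wait(self, now=time.monotonic, sleep=time.sleep) -> float:
        """Sleep if the last request was too recent; return seconds slept."""
        current = now()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (current - self._last))
        if delay > 0:
            sleep(delay)
        self._last = current + delay
        return delay
```

A crawler would call `rules.can_fetch(USER_AGENT, url)` before each request and `throttle.wait()` right before sending it, passing `USER_AGENT` in the request headers.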

What you are experiencing could be automatic or manual throttling,
triggered by an abusive number of requests from your IP.

Wikipedia seems to provide full dumps of its wikis, but they may be
unusable for you, since they are gigantic :-/. Trying to rebuild Wikipedia
on your PC from a snapshot would be like summoning Cthulhu in a teapot.
But, I don't know, maybe the zh version is smaller, or your resources are
powerful enough. It seems that what you have built carries a severe
overhead (a waste of resources), and there must be better ways to do
it...



--
--
ℱin del ℳensaje.



Platonides at gmail

Jan 28, 2010, 1:18 PM

Post #6 of 6
Re: (no subject)

李琴 wrote:
> Hi all,
> I have built a LocalWiki. Now I want the data of it to keep consistent
> with the
> Wikipedia and one work I should do is to get the data of update from
> Wikipedia.
> [...]

Using the API or Special:Export you can request several pages per HTTP
request, which is nicer to the system. You should also add a maxlag
parameter.
Obviously you must set a proper User-Agent, so that if your bot causes
issues you can be contacted/banned.
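As a rough sketch of that suggestion, the Python snippet below builds (but does not send) one API request for several pages at once, with a maxlag parameter and a descriptive User-Agent. The `api.php` location is assumed from the index.php URLs quoted in this thread, and the bot name, contact address, and function name `build_batch_request` are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumed endpoint, derived from the /w/index.php URLs quoted in this thread.
API_URL = "http://zh.wikipedia.org/w/api.php"
# Hypothetical bot identity; replace with your own name and contact address.
USER_AGENT = "LocalWikiSyncBot/0.1 (contact: admin@example.org)"


def build_batch_request(titles, maxlag=5, api_url=API_URL,
                        user_agent=USER_AGENT):
    """Build one request for the current wikitext of several pages."""
    if not titles:
        raise ValueError("at least one title is required")
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content|timestamp",
        "titles": "|".join(titles),  # several pages in a single request
        "format": "json",
        "maxlag": str(maxlag),  # ask the server to refuse when replicas lag
    }
    url = api_url + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

The returned object can be passed to `urllib.request.urlopen`. When the server answers with a maxlag error, the usual approach is to wait and retry rather than to resend immediately.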

Wikimedia Foundation offers a live feed to keep the wikis up-to-date,
check <http://meta.wikimedia.org/wiki/Wikimedia_update_feed_service>


