
Mailing List Archive: Wikipedia: Wikitech
Re: Request for Comments: Cross site data access for Wikidata
 



benapetr at gmail

Apr 23, 2012, 7:09 AM


Re: Request for Comments: Cross site data access for Wikidata

I mean, in simple words:

Your idea: when the data on wikidata is changed, the new content is
pushed to all the local wikis / somewhere.

My idea: the local wikis retrieve the data from the wikidata db directly,
so there is no need to push anything on change.
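
(Roughly, in hypothetical Python -- the table name, schema and connection
are made up; the point is just that the read happens at render time, with
no push step:)

    def get_item(conn, item_id):
        # conn is a DB-API connection to the wikidata database.
        # The item is read straight from wikidata when the page is
        # rendered; nothing is pushed to, or stored on, the local wiki.
        row = conn.execute(
            "SELECT data FROM wikidata_items WHERE item_id = ?",
            (item_id,),
        ).fetchone()
        return row[0] if row else None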

On Mon, Apr 23, 2012 at 4:07 PM, Petr Bena <benapetr [at] gmail> wrote:
> I think it would be much better if the local wikis that are supposed
> to access this data had some sort of client extension which
> would allow them to render the content using the db of wikidata. That
> would be much simpler and faster.
>
> On Mon, Apr 23, 2012 at 2:45 PM, Daniel Kinzler <daniel [at] brightbyte> wrote:
>> Hi all!
>>
>> The wikidata team has been discussing how to best make data from wikidata
>> available on local wikis. Fetching the data via HTTP whenever a page is
>> re-rendered doesn't seem prudent, so we (mainly Jeroen) came up with a
>> push-based architecture.
>>
>> The proposal is at
>> <http://meta.wikimedia.org/wiki/Wikidata/Notes/Caching_investigation#Proposal:_HTTP_push_to_local_db_storage>,
>> I have copied it below too.
>>
>> Please have a look and let us know if you think this is viable, and which of the
>> two variants you deem better!
>>
>> Thanks,
>> -- daniel
>>
>> PS: Please keep the discussion on wikitech-l, so we have it all in one place.
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> == Proposal: HTTP push to local db storage ==
>>
>> * Every time an item on Wikidata is changed, an HTTP push is issued to all
>> subscribing clients (wikis)
>> ** initially, "subscriptions" are just entries in an array in the configuration.
>> ** Pushes can be done via the job queue.
>> ** pushing is done via the mediawiki API, but other protocols such as PubSub
>> Hubbub / AtomPub can easily be added to support 3rd parties.
>> ** pushes need to be authenticated, so we don't get malicious crap. Pushes
>> should be done using a special user with a special user right.
>> ** the push may contain either the full set of information for the item, or just
>> a delta (diff) + hash for integrity check (in case an update was missed).
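
(To make the push step above concrete, a rough Python sketch; the
"wikidatapush" API module, the token handling and the subscriber array
are all invented for illustration:)

    import hashlib
    import json
    import urllib.parse
    import urllib.request

    def push_item(api_url, item_id, data, token):
        # serialize deterministically so the hash is stable on both ends
        payload = json.dumps(data, sort_keys=True)
        body = urllib.parse.urlencode({
            "action": "wikidatapush",  # hypothetical API module
            "item": item_id,
            "data": payload,           # full data; could be a diff instead
            "hash": hashlib.sha1(payload.encode()).hexdigest(),  # integrity check
            "token": token,            # pushes must be authenticated
            "format": "json",
        }).encode()
        urllib.request.urlopen(api_url, body)  # POST to the client wiki

    # "subscriptions" are initially just an array in the configuration
    SUBSCRIBING_WIKIS = [
        {"api": "https://en.wikipedia.org/w/api.php", "token": "..."},
    ]
    for wiki in SUBSCRIBING_WIKIS:
        push_item(wiki["api"], "Q42",
                  {"links": {"enwiki": "Douglas Adams"}}, wiki["token"])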
>>
>> * When the client receives a push, it does two things:
>> *# write the fresh data into a local database table (the local wikidata cache)
>> *# invalidate the (parser) cache for all pages that use the respective item (for
>> now we can assume that we know this from the language links)
>> *#* if we only update language links, the page doesn't even need to be
>> re-parsed: we just update the languagelinks in the cached ParserOutput object.
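
(Again a hypothetical sketch of the client side; pages_using_item(),
update_cached_languagelinks() and invalidate_parser_cache() are invented
stand-ins for MediaWiki internals:)

    import json

    def handle_push(db, item_id, data, only_langlinks_changed=False):
        # 1. write the fresh data into the local wikidata cache table
        db.execute(
            "REPLACE INTO wikidata_cache (item_id, data, fetched_at) "
            "VALUES (?, ?, strftime('%s', 'now'))",
            (item_id, json.dumps(data)),
        )
        # 2. invalidate the (parser) cache for all pages using the item;
        #    for now, these are known from the language links
        for page in pages_using_item(db, item_id):
            if only_langlinks_changed:
                # cheap path: patch the language links in the cached
                # ParserOutput object, no re-parse needed
                update_cached_languagelinks(page, data)
            else:
                invalidate_parser_cache(page)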
>>
>> * when a page is rendered, interlanguage links and other info is taken from the
>> local wikidata cache. No queries are made to wikidata during parsing/rendering.
>>
>> * In case an update is missed, we need a mechanism that lets the client
>> request a full purge and re-fetch of all data, rather than just waiting for
>> the next push, which might very well take a very long time to happen.
>> ** There needs to be a manual option for when someone detects this. maybe
>> action=purge can be made to do this. Simple cache-invalidation however shouldn't
>> pull info from wikidata.
>> ** A time-to-live could be added to the local copy of the data, so that it is
>> refreshed by a periodic pull and does not stay stale
>> indefinitely after a failed push.
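
(A sketch of that time-to-live fallback, with the same caveat that the
table and the refetch_from_wikidata() helper are hypothetical:)

    import time

    TTL = 24 * 60 * 60  # e.g. re-pull at most once a day

    def get_cached_item(db, item_id):
        row = db.execute(
            "SELECT data, fetched_at FROM wikidata_cache WHERE item_id = ?",
            (item_id,),
        ).fetchone()
        if row is None or time.time() - row[1] > TTL:
            # missing or stale (a push may have been lost): pull a fresh
            # copy from wikidata instead of waiting for the next push
            return refetch_from_wikidata(db, item_id)  # hypothetical helper
        return row[0]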
>>
>> === Variation: shared database tables ===
>>
>> Instead of having a local wikidata cache on each wiki (which may grow big - a
>> first guesstimate of Jeroen and Reedy is up to 1TB total, for all wikis), all
>> client wikis could access the same central database table(s) managed by the
>> wikidata wiki.
>>
>> * this is similar to the way the globalusage extension tracks the usage of
>> commons images
>> * whenever a page is re-rendered, the local wiki would query the table in the
>> wikidata db. This means a cross-cluster db query whenever a page is rendered,
>> instead of a local query.
>> * the HTTP push mechanism described above would still be needed to purge the
>> parser cache when needed. But the push requests would not need to contain the
>> updated data, they may just be requests to purge the cache.
>> * the ability for full HTTP pushes (using the mediawiki API or some other
>> interface) would still be desirable for 3rd party integration.
>>
>> * This approach greatly lowers the amount of space used in the database
>> * it doesn't change the number of http requests made
>> ** it does however reduce the amount of data transferred via http (but not by
>> much, at least not compared to pushing diffs)
>> * it doesn't change the number of database requests, but it introduces
>> cross-cluster requests
>>
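
(For contrast, a hypothetical sketch of this shared-table variant: the
read becomes a cross-cluster query against the central table, and the
push degenerates into a purge-only request. Names are again invented:)

    import json
    import urllib.parse
    import urllib.request

    def get_item_links(wikidata_db, item_id):
        # cross-cluster query against the central table in the wikidata
        # db, instead of a local per-wiki cache (cf. GlobalUsage)
        row = wikidata_db.execute(
            "SELECT data FROM wikidata_items WHERE item_id = ?",
            (item_id,),
        ).fetchone()
        return json.loads(row[0])["links"] if row else {}

    def push_purge(api_url, item_id, token):
        # the push no longer carries the data; it just asks the client
        # to purge the parser cache of pages that use the item
        body = urllib.parse.urlencode({
            "action": "wikidatapurge",  # hypothetical API module
            "item": item_id,
            "token": token,
            "format": "json",
        }).encode()
        urllib.request.urlopen(api_url, body)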

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
