JS2 design (was Re: Working towards branching MediaWiki 1.16)

 

 


tstarling at wikimedia

Sep 24, 2009, 1:41 AM

Post #1 of 32
JS2 design (was Re: Working towards branching MediaWiki 1.16)

Trevor Parscal wrote:
> If you are really doing a JS2 rewrite/reorganization, would it be
> possible for some of us (especially those of us who deal almost
> exclusively with JavaScript these days) to get a chance to ask
> questions/give feedback/help in general?

I've mostly been working on analysis and planning so far. I made a few
false starts with the code and so ended up planning in a more detailed
way than I initially intended. I've discussed various issues with the
people in #mediawiki, including our resident client-side guru Splarka.

I started off working on fixing the coding style and the most glaring
errors from the JS2 branch, but I soon decided that I shouldn't be
putting so much effort into that when a lot of the code would have to
be deleted or rewritten from scratch.

I did a survey of script loaders in other applications, to get an idea
of what features would be desirable. My observations came down to the
following:

* The namespacing in Google's jsapi is very nice, with everything
being a member of a global "google" object. We would do well to
emulate it, but migrating all JS to such a scheme is beyond the scope
of the current project.

* You need to deal with CSS as well as JS. All the script loaders I
looked at did that, except ours. We have a lot of CSS objects that
need concatenation, and possibly minification.

* JS loading can be deferred until near the </body> or until the
DOMContentLoaded event. This means that empty-cache requests will
render faster. Wordpress places emphasis on this.

* Dependency tracking is useful. The idea is to request a given
module, and all dependencies of that module, such as other scripts,
will automatically be loaded first.
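
As a rough illustration of the dependency tracking idea (not code taken from any of the loaders surveyed), a minimal JavaScript sketch might look like the following. The registry contents, the "editToolbar" module and the function name resolveLoadOrder are all assumptions made up for the example.

```javascript
// Hypothetical module registry: module name -> list of dependencies.
// The entries are illustrative only.
const registry = {
  "ui.core": [],
  "ui.dialog": ["ui.core"],
  "wikibits": [],
  "editToolbar": ["ui.dialog", "wikibits"]
};

// Return the requested module plus all of its dependencies, ordered so
// that every dependency comes before the module that needs it.
function resolveLoadOrder(name, modules) {
  const order = [];
  const state = {}; // undefined = unseen, "visiting" = in progress, "done" = emitted

  function visit(current) {
    if (state[current] === "done") {
      return;
    }
    if (state[current] === "visiting") {
      throw new Error("Circular dependency involving " + current);
    }
    if (!Object.prototype.hasOwnProperty.call(modules, current)) {
      throw new Error("Unknown module " + current);
    }
    state[current] = "visiting";
    modules[current].forEach(function (dep) {
      visit(dep);
    });
    state[current] = "done";
    order.push(current);
  }

  visit(name);
  return order;
}
```

Requesting "editToolbar" here would yield its dependencies first and the module itself last. A real loader would also have to decide what to do about modules that are already on the page.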



I then looked more closely at the current state of script loading in
MediaWiki. I made the following observations:

* Most linked objects (styles and scripts) on a typical page view come
from the Skin. If the goal is performance enhancement, then working on
the skins and OutputPage has to be a priority.

* The "class" abstraction as implemented in JS2 has very little value
to PHP callers. It's just as easy to use filenames. It could be made
more useful with features such as dependency tracking, better
concatenation and CSS support. But it seems to me that the most useful
abstraction for PHP code would be for client-side modules to be
multi-file, potentially with supporting PHP code for each module.

* Central registration of all client-side resources in a global
variable would be onerous and should be avoided.

* Dynamic requests such as [[MediaWiki:Handheld.css]] have a large
impact on site performance and need to be optimised. I'm planning a
new interface, similar to action=raw, allowing these objects to be
concatenated.



The following design documents are in my user space on mediawiki.org:

<http://www.mediawiki.org/wiki/User:Tim_Starling/CSS_and_JS_caller_survey_(r56220)>
- A survey of MW functions that add CSS and JS, especially the
terribly confusing situation in Skin and OutputPage

<http://www.mediawiki.org/wiki/User:Tim_Starling/JS_load_order_issues_(r56220)>
- A breakdown of JS files by the issues that might be had in moving
them to the footer or DOMContentLoaded. I favour a conservative
approach, with wikibits.js and the site and user JS staying in the
<head>.

<http://www.mediawiki.org/wiki/User:Tim_Starling/Proposed_modularisation_of_client-side_resources>
- A proposed reorganisation of core scripts (Skin and OutputPage)
according to the MW modules they are most associated with.



The object model I'm leaning towards on the PHP side is:

* A client-side resource manager (CSRM) class. This would be
responsible for maintaining a list of client-side resources that have
been requested and need to be sent to the skin. It would also handle
caching, distribution of incoming dynamic requests, dependencies,
minification, etc. This is quite a complex job and might need to be
split up somewhat.

* A hierarchy of client-side module classes. A module object would
contain a list of files, dependencies and concatenation hints. Objects
would be instantiated by parent classes such as skins and special
pages, and added to the CSRM. Classes could be registered globally,
and then used to generate dynamic CSS and JS, such as the user
preference stylesheet.

* The module base class would be non-abstract and featureful, with a
constructor that accepts an array-based description. This allows
simple creation of modules by classes with no interest in dynamic
script generation.

* A new script loader entry point would provide an interface to
registered modules.



There are some design decisions I still have to make, which are tricky
due to performance tradeoffs:

* With concatenation, there is the question of which files to combine
and which to leave separate. I would like to have a "combine"
parameter which is a string, and files with the same combine parameter
will be combined.

* Like Wordpress, we could store minified and concatenated files in a
public cache and then link to that cache directly in the HTML.

* The cache invalidation scheme is tricky, there's not really an ideal
system. A combination of cache-breaking parameters (like Michael's
design) and short expiry times is probably the way to go. Using
cache-breaking parameters alone doesn't work because there is
referring HTML cached on both the server and client side, and
regenerating that HTML periodically would be much more expensive than
regenerating the scripts.
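
To make the "combine" parameter idea concrete, here is a minimal JavaScript sketch, assuming each file is described by an object with a path and an optional combine string. The descriptor shape and the function name groupByCombine are hypothetical; the proposal above is for the PHP side and does not specify either.

```javascript
// Group file descriptors by their "combine" string. Files that share a
// combine value end up in one group; files without one stay separate.
// The { path, combine } shape is an assumption made for this example.
function groupByCombine(files) {
  const groups = new Map();
  const separate = [];
  for (const file of files) {
    if (typeof file.combine !== "string") {
      separate.push([file.path]); // no combine key: served on its own
      continue;
    }
    if (!groups.has(file.combine)) {
      groups.set(file.combine, []);
    }
    groups.get(file.combine).push(file.path);
  }
  // Combined groups first (in first-seen order), then the standalone files.
  return Array.from(groups.values()).concat(separate);
}
```

Each inner array would correspond to one concatenated request; a string key keeps the decision about what to merge in the hands of whoever registers the files.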

Here are my notes:

* Concatenation
  * Performance problems:
    * Changing inclusions. When inclusions change, whole contents has
      to be sent again.
      * BUT people don't change skins very often.
      * So combine=all=skin should save time for most
    * Expiry times have to be synchronised. Take the minimum expiry of
      all, and force freshness check for all.
    * Makes the task of squid cache purging more difficult
    * Defeats browser concurrency

  * Performance advantages:
    * For dynamic requests:
      * Avoids MW startup time.
      * Avoids DoSing small servers with concurrent requests.
    * For all requests:
      * Reduces squid CPU
      * Removes a few RTTs for non-pipelining clients
      * Improves gzip compression ratio

  * Combine to static file idea:
    * Pros:
      * Fast to stream out, on all systems
      * Doesn't break HughesNet
    * Cons:
      * Requires splitting the request into static and dynamic
      * Need webserver config to add Expires header and gzip

With some help from Splarka, I've determined that it would be possible
to merge the requests for [[MediaWiki:Common.css]],
[[MediaWiki:Skinname.css]], [[MediaWiki:Handheld.css]] and
[[MediaWiki:Print.css]], using @media blocks for the last two, for a
significant performance win in almost all cases.
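
A minimal sketch of that merge in JavaScript, assuming the four stylesheets have already been fetched as strings. The function name mergeSiteCss and the input object shape are made up for illustration, and the media types "handheld" and "print" are inferred from the page names.

```javascript
// Merge the site stylesheets into a single response body.
// Common and skin CSS are emitted as-is; the handheld and print sheets
// are wrapped in @media blocks so they only apply to those media.
function mergeSiteCss(sheets) {
  const parts = [sheets.common, sheets.skin];
  if (sheets.handheld) {
    parts.push("@media handheld {\n" + sheets.handheld + "\n}");
  }
  if (sheets.print) {
    parts.push("@media print {\n" + sheets.print + "\n}");
  }
  // Drop empty or missing sheets before joining.
  return parts.filter(Boolean).join("\n");
}
```

One caveat with this approach: rules such as @import or @charset are not valid inside an @media block, so a wrapped stylesheet that uses them would need separate handling.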



Once the architectural issues have been fixed, the stylistic issues in
both ancient JS and the merged code will have to be dealt with, for
example:

* Poorly-named functions, classes, files, etc. There's a need for
proper namespacing and consistency in naming style.

* Poorly-written comments

* Unnecessary use of the global namespace. The jQuery style is nice,
with local functions inside an anonymous closure:

    (function () {
        function setup() {
            // ...
        }
        addOnloadHook( setup );
    })();

* Unsafe construction of HTML. This is ubiquitous in the mwEmbed
directory and there will be a huge potential for XSS, as soon as user
input is added. HTML construction with innerHTML can be replaced by
document.createElement() or its jQuery equivalent.

* The identity crisis. The whole js2 concept encourages code which is
poorly integrated with the rest of MediaWiki, and which is written
without proper study of the existing code or thought to refactoring.
It's like SkinTemplate except with a more pretentious name. I'd like
to get rid of all instances of "js2", to move its scripts into other
directories, and to remove the global variables which turn it on and
off. Also the references to MetavidWiki and the mv prefixes should be
fixed.

* Lack of modularisation. The proposed registration system makes it
possible to have extensions which are almost entirely client-side
code. A module like libClipEdit could be moved to its own extension. I
see no problem with extensions depending on other extensions, the SMW
extensions do this with no problems.
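
On the unsafe HTML construction point in the list above, a minimal sketch of the document.createElement() approach could look like this. The function name buildUserLink is hypothetical, and the document is passed in as a parameter only so the function can be exercised outside a browser.

```javascript
// Build a link from untrusted text without concatenating HTML strings.
// "doc" is a DOM Document (pass window.document in a browser).
function buildUserLink(doc, href, label) {
  const link = doc.createElement("a");
  link.setAttribute("href", href);
  // A text node is never parsed as markup, so tags in "label" stay inert.
  link.appendChild(doc.createTextNode(label));
  return link;
}
```

In a browser this would be called as buildUserLink(document, url, userText). The href would still need its own validation (for example against javascript: URLs); this only addresses markup injection through the label.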



A few ideas for cool future features also occur to me. Once we have a
system set up for generating and caching client-side resources, why not:

* Allow the user to choose a colour scheme for their wiki and
automatically generate stylesheets with the appropriate colours.

* Include images in the system. Use GD to automatically generate and
cache images with the appropriate anti-aliased background colour.

* Automatically create CSS sprites?

-- Tim Starling


_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


gtisza at gmail

Sep 24, 2009, 5:10 AM

Post #2 of 32
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16)

Tim Starling <tstarling <at> wikimedia.org> writes:

> * Unnecessary use of the global namespace. The jQuery style is nice,
> with local functions inside an anonymous closure:
>
> (function () {
>     function setup() {
>         // ...
>     }
>     addOnloadHook( setup );
> })();

This would make it impossible to overwrite the function locally on a wiki, which
is done sometimes, either because it conflicts with some local script, or for
better localization (such as changing the sorting algorithm in the
sortable-table script to handle non-ASCII characters decently). You should
rather use a global MediaWiki object, that works just as well for clearing the
global namespace, and it leaves the functions accessible.
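
A minimal sketch of what that could look like, with made-up names (wikiScripts, sortKey and sortRows are hypothetical, not existing MediaWiki functions):

```javascript
// Hypothetical namespace object: one global instead of many.
var wikiScripts = {
  // Default sort key: plain lowercase comparison.
  sortKey: function (text) {
    return text.toLowerCase();
  },
  // Sort a copy of the rows using whatever sortKey is currently installed.
  sortRows: function (rows) {
    var self = this;
    return rows.slice().sort(function (a, b) {
      var ka = self.sortKey(a);
      var kb = self.sortKey(b);
      return ka < kb ? -1 : ka > kb ? 1 : 0;
    });
  }
};

// A local wiki script can replace one function without touching the rest.
// Here accents are stripped so that non-ASCII letters sort with their base letter.
wikiScripts.sortKey = function (text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
};
```

Because sortRows looks up sortKey on the object at call time, the local override takes effect without redefining the sorting code itself, which is not possible when setup functions are hidden inside a closure.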




Platonides at gmail

Sep 24, 2009, 6:13 AM

Post #3 of 32
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16)

The JavaScript redesign should also take wiki-side JavaScript
extensions into account.

[[MediaWiki:Common.js]] importScripts [[MediaWiki:Wikiminiatlas.js]],
[[MediaWiki:niceGalleries.js]] and [[MediaWiki:buttonForRFA.js]], which
then load [[MediaWiki:buttonForRFA/lang.js]]... plus the several Gadgets
the user may have enabled.

On Wikimedia Commons I load 38 scripts located in the MediaWiki
namespace (plus gen=js).
I'm pretty sure loading all of them when they aren't in the cache slows
things down much more than the organization of the core MediaWiki
JavaScript.

Serving those files in the same request would help a lot (either by
automatically detecting calls to importScript or with a new syntax).


Finally, a dependency you may not have taken into account is that
some CSS from the shared repository should be usable by host wikis when
viewing the pages.




oscar.vives at gmail

Sep 24, 2009, 7:00 AM

Post #4 of 32
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16)

Possibly off-topic here.


I see that ImageMagick can combine several images into a single one.

A single image means a single hit to Apache, so it only has to spawn once.

On the client side, a single image can draw multiple elements with some
ninja CSS stuff (background-position?).

Would changes be needed to make this possible for MediaWiki skins?

This is "minimizing", but for graphics.

It is possibly an idea for the future, a future full of divs and CSS3 happiness.


--
End of message.



Platonides at gmail

Sep 24, 2009, 7:13 AM

Post #5 of 32
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16)

Tei wrote:
> Possibly off-topic here.
>
> I see that ImageMagick can combine several images into a single one.
>
> A single image means a single hit to Apache, so it only has to spawn once.
>
> On the client side, a single image can draw multiple elements with some
> ninja CSS stuff (background-position?).
>
> Would changes be needed to make this possible for MediaWiki skins?
>
> This is "minimizing", but for graphics.
>
> It is possibly an idea for the future, a future full of divs and CSS3 happiness.

I don't think it fits our normal image usage into the pages. Could be
tried for the images used by the skins. Although I would worry about
support for that CSS on legacy browsers.




Simetrical+wikilist at gmail

Sep 24, 2009, 7:48 AM

Post #6 of 32
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16)

On Thu, Sep 24, 2009 at 4:41 AM, Tim Starling <tstarling [at] wikimedia> wrote:
>      * Removes a few RTTs for non-pipelining clients

Do you mean to imply that there's such a thing as a pipelining client
on the real web? (Okay, okay, Opera.) This concern seems like it
outweighs all the others put together pretty handily -- especially for
script files that aren't at the end, which block page loading.

> * Automatically create CSS sprites?

That would be neat, but perhaps a bit tricky.

On Thu, Sep 24, 2009 at 9:13 AM, Platonides <Platonides [at] gmail> wrote:
> The JavaScript redesign should also take wiki-side JavaScript
> extensions into account.
>
> [[MediaWiki:Common.js]] importScripts [[MediaWiki:Wikiminiatlas.js]],
> [[MediaWiki:niceGalleries.js]] and [[MediaWiki:buttonForRFA.js]], which
> then load [[MediaWiki:buttonForRFA/lang.js]]... plus the several Gadgets
> the user may have enabled.
>
> On Wikimedia Commons I load 38 scripts located in the MediaWiki
> namespace (plus gen=js).
> I'm pretty sure loading all of them when they aren't in the cache slows
> things down much more than the organization of the core MediaWiki
> JavaScript.

Hmm, yeah. This scheme needs to support combining admin-added
JavaScript, unless we can convince everyone to just put everything in
Common.js. Maybe we could support some sort of transclusion
mechanism for JS files -- like rather than serving JS pages raw, MW
first substitutes templates (but nothing else)?

On Thu, Sep 24, 2009 at 10:00 AM, Tei <oscar.vives [at] gmail> wrote:
> I see that ImageMagick can combine several images into a single one.
>
> A single image means a single hit to Apache, so it only has to spawn once.
>
> On the client side, a single image can draw multiple elements with some
> ninja CSS stuff (background-position?).
>
> Would changes be needed to make this possible for MediaWiki skins?

This is image spriting, which Tim mentioned as a possibility. It's
not a big issue for us right now because we use so few images, and
images don't block page parsing or rendering, but it might be worth
considering eventually.

On Thu, Sep 24, 2009 at 10:13 AM, Platonides <Platonides [at] gmail> wrote:
> I don't think it fits our normal image usage into the pages. Could be
> tried for the images used by the skins. Although I would worry about
> support for that CSS on legacy browsers.

Image spriting is very well-studied and works in all browsers of
import. It's used by all the fancy high-performance sites, like
Google:

http://www.google.com/images/nav_logo7.png

It would be nice if we didn't have to go to such lengths to hack
around the fact that HTTP pipelining is broken, wouldn't it?



tstarling at wikimedia

Sep 24, 2009, 8:49 AM

Post #7 of 32
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16)

Tei wrote:
> Possibly off-topic here.
>
>
> I see that ImageMagick can combine several images into a single one.
>
> A single image means a single hit to Apache, so it only has to spawn once.
>
> On the client side, a single image can draw multiple elements with some
> ninja CSS stuff (background-position?).

People have taken to calling that the "CSS sprite" technique; I
mentioned it as a possibility in my original post.

http://www.alistapart.com/articles/sprites

I always thought the defining characteristic of a sprite was that it
moved around the screen, not that it was copied from a grid, but there
you have it.

-- Tim Starling




tstarling at wikimedia

Sep 24, 2009, 9:49 AM

Post #8 of 32
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16)

Aryeh Gregor wrote:
> On Thu, Sep 24, 2009 at 4:41 AM, Tim Starling <tstarling [at] wikimedia> wrote:
>> * Removes a few RTTs for non-pipelining clients
>
> Do you mean to imply that there's such a thing as a pipelining client
> on the real web? (Okay, okay, Opera.) This concern seems like it
> outweighs all the others put together pretty handily -- especially for
> script files that aren't at the end, which block page loading.

It's not really as simple as that. The major browsers use concurrency
as a substitute for pipelining. Instead of queueing up multiple
requests in a single TCP connection and then waiting, they queue up
multiple requests in multiple connections and then wait. The effect is
very similar in terms of RTTs.

By concatenating, you eliminate concurrency in the browser. The effect
of this could actually be to make the initial page view slower,
despite the increased TCP window size at the end of the concatenated
request. The net performance impact would depend on all sorts of
factors, but you can see that the concurrent case would be faster when
the RTT is very long, the number of objects is large, the number of
connections is equally large, and the unmerged object size is slightly
smaller than the initial TCP window.

In a default install, it's not harmful to concatenate the
[[MediaWiki:*.css]] pages regardless of network distance, because the
pages are so small that even the merged object will fit in the initial
TCP window.

There is a potential reduction in RTT count due to concatenation,
that's why I included that item on the list. But it's client-dependent
and might not exist at all in the most common case. That's why I'm
focusing on other benefits of concatenation to justify why I'm doing it.

-- Tim Starling




mdale at wikimedia

Sep 24, 2009, 12:23 PM

Post #9 of 32
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16)

~some comments inline~

Tim Starling wrote:

[snip]
> I started off working on fixing the coding style and the most glaring
> errors from the JS2 branch, but I soon decided that I shouldn't be
> putting so much effort into that when a lot of the code would have to
> be deleted or rewritten from scratch.
>

I agree there are some core components that should be separated out and
refactored. And some core pieces that you're probably focused on do need
to be removed and rewritten, as they have aged quite a bit (parts of
mv_embed.js were created in SoC 06). I did not focus on the ~best~
core loader that could have been created; I have just built on what I
already had available, which has "worked" reasonably well for the
application set that I was targeting. It's been an iterative process
which I feel is moving in the right direction, as I will outline below.

Obviously more input is helpful, and I am open to implementing most of
the changes you describe as they make sense. But exclusion and dismissal
may be less helpful... unless that is your targeted end, in which
case just say so ;)

It's normal for a third-party observer to say the whole system should be
scrapped and rewritten. Of course, when starting from scratch it is much
easier to design an ideal system and what it should/could be.

> I did a survey of script loaders in other applications, to get an idea
> of what features would be desirable. My observations came down to the
> following:
>
> * The namespacing in Google's jsapi is very nice, with everything
> being a member of a global "google" object. We would do well to
> emulate it, but migrating all JS to such a scheme is beyond the scope
> of the current project.
>

You somewhat contradict this approach by recommending against "class"
abstraction below, i.e. how will you cleanly load components and
dependencies if not by a given name?

I agree we should move things into a global object, i.e. $j, and all our
components / features should extend that object (like jQuery plugins).
That is the direction we are already going.

Dependency loading is not really beyond the scope... we are already
supporting that. If you check out the mv_jqueryBindings function in
mv_embed.js, there we have loader calls integrated into the jQuery
binding. This integrates loading the high-level application interfaces
into their interface call.

The idea is to move more and more of the structure of the application
into that system. So right now mwLoad is a global function, but it should
be refactored into the jQuery space and be called via $j.load();

> * You need to deal with CSS as well as JS. All the script loaders I
> looked at did that, except ours. We have a lot of CSS objects that
> need concatenation, and possibly minification.
>

Brion did not set that as high priority when I inquired about it, but of
course we should add in style grouping as well. It's not like I said we
should exclude that from our script-loader; it's just a matter of setting
priority, which I agree is high.

> * JS loading can be deferred until near the </body> or until the
> DOMContentLoaded event. This means that empty-cache requests will
> render faster. Wordpress places emphasis on this.
>

True. I agree that we should put the script includes at the bottom. Also,
all non-core js2 scripts are already loaded via the DOMContentLoaded ready
event. Ideally we should only provide "loaders" and maybe some small bit
of configuration for the client-side applications they provide, as
briefly described here:
http://www.mediawiki.org/wiki/JS2_Overview#How_to_structure_your_JavaScript_application

> * Dependency tracking is useful. The idea is to request a given
> module, and all dependencies of that module, such as other scripts,
> will automatically be loaded first.
>

As mentioned above, we do some dependency tracking via binding jQuery
helpers that do that setup internally on a per-application-interface level.
We could add that convention directly into the script-loader function if
desired, so that we include dependencies on a per-class level. For example,
mwLoad('ui.dialog') would know to load ui.core etc.

>
>
> I then looked more closely at the current state of script loading in
> MediaWiki. I made the following observations:
>
> * Most linked objects (styles and scripts) on a typical page view come
> from the Skin. If the goal is performance enhancement, then working on
> the skins and OutputPage has to be a priority.
>

Agreed. The script-loading was more urgent for my application task set.
But for the common case of per-page-view performance, CSS grouping has
bigger wins.

> * The "class" abstraction as implemented in JS2 has very little value
> to PHP callers. It's just as easy to use filenames.

The idea with "class" abstraction is that you don't know what script set
you have available at any given time. Maybe one script included
ui.resizable and ui.move, and now your script depends on ui.resizable,
ui.move and ui.drag... your loader call will only include ui.drag
(since the others are already defined).

This avoids re-parsing and re-including the same JavaScript file as part
of a separate group request or src include. Alternatively, you can check
against including the same script when you're just using raw src, but that
is a bit trickier when using the script-loader. Define checks and the
class/file convention are also compatible with fetching a JavaScript file
via XHR and evaluating the result (which is the way some frameworks
include JavaScript to ensure a consistent onLoaded callback)...

Which brings us to another point about class / file bindings. It lets us
test the typeof the variable it should define and then issue a callback
once we definitely know that this script is loaded and ready.
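
As a small illustration of the "only request what is not already defined" idea, here is a sketch in JavaScript. It is not the actual mwLoad implementation; the function name missingClasses and the flat name-to-definition map are assumptions made for the example.

```javascript
// Return the class names from "wanted" that are not yet defined.
// "defined" is assumed to be a flat map from class name to whatever
// the corresponding script defines once it has loaded.
function missingClasses(wanted, defined) {
  return wanted.filter(function (name) {
    return typeof defined[name] === "undefined";
  });
}
```

With ui.resizable and ui.move already defined, asking for all three names would leave only ui.drag to request. The same typeof test could be repeated after the request to decide when it is safe to fire the callback.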

The trade-off for grouping distinct class set requests is cacheability
for return visits vs script reuse vs fastest display time for an un-cached
visit vs server resource cost. Also, perhaps some scripts can always be
grouped, while other components are rarely included individually. But
that changes with application development. Combining scripts is not too
costly relative to the round trip time... and we could pre-minify.

It's "optimal" to avoid the script-loader altogether and just have a
single small core file, updated with a short expiry, that sets the version
number of each script. Then everything else could have a long expiry,
since it's tagged by version number. That would be "optimal" but a slower
first-load experience. And we still have to cache and package
localizations per language.
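
A rough sketch of the version-tagged URL idea, assuming the small core file exposes a manifest mapping each script path to a version number. The manifest shape, the "v" query parameter and the function name versionedUrl are all hypothetical.

```javascript
// Build a URL that changes whenever the script's version changes, so the
// script itself can be served with a long expiry time.
// "manifest" is assumed to map script paths to version numbers.
function versionedUrl(path, manifest) {
  if (!Object.prototype.hasOwnProperty.call(manifest, path)) {
    throw new Error("No version recorded for " + path);
  }
  return path + "?v=" + encodeURIComponent(String(manifest[path]));
}
```

Only the manifest itself then needs a short expiry; every other request is cacheable until its version number changes.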

I have not done a definitive evaluation of the trade offs and am open to
more thoughts on that front.

> It could be made
> more useful with features such as dependency tracking, better
> concatenation and CSS support. But it seems to me that the most useful
> abstraction for PHP code would be for client-side modules to be
> multi-file, potentially with supporting PHP code for each module.
>

We want to move away from PHP code dependencies for each JavaScript
module. JavaScript should just directly hit a single exposure point of
the MediaWiki API. If we have PHP code generating bits and pieces of
JavaScript everywhere, it quickly gets complicated, is difficult to
maintain, is much more resource intensive, and requires a whole new
framework to work right.

PHP's integration with the JavaScript should be minimal. PHP should
supply configuration and package in localized msgs.

> * Central registration of all client-side resources in a global
> variable would be onerous and should be avoided.
>

You can always add to the registered global. This works well by having
the php read the javascript file directly to ascertain the global list.
That way your javascript works stand alone as well as integrated with a
script-loader that provides localization and configuration.

> * Dynamic requests such as [[MediaWiki:Handheld.css]] have a large
> impact on site performance and need to be optimised. I'm planning a
> new interface, similar to action=raw, allowing these objects to be
> concatenated.
>

Sounds good ;) The present script-loader does this for JavaScript and
takes the most recent revision number of the included pages and sets the
grouped version to that. I think it has to be integrated into page
output so you can have a long expire time.

>
>
> The following design documents are in my user space on mediawiki.org:
>
> <http://www.mediawiki.org/wiki/User:Tim_Starling/CSS_and_JS_caller_survey_(r56220)>
> - A survey of MW functions that add CSS and JS, especially the
> terribly confusing situation in Skin and OutputPage
>
I did a small commit r56746 to try and start to clean that up... but it
is a mess.
> <http://www.mediawiki.org/wiki/User:Tim_Starling/JS_load_order_issues_(r56220)>
> - A breakdown of JS files by the issues that might be had in moving
> them to the footer or DOMContentLoaded. I favour a conservative
> approach, with wikibits.js and the site and user JS staying in the
> <head>.
>

A separate, somewhat related effort should be to deprecate all non-jQuery
style helpers. A lot of the functions in wikibits.js, for example, could
use jQuery functions or be refactored into a few lines of jQuery, which
may make it unnecessary to have those global function abstractions to
begin with. I am in favor of moving things to the bottom of the page.
Likewise, all new JavaScript should be compatible with being run at
DOMContentLoaded time.

> <http://www.mediawiki.org/wiki/User:Tim_Starling/Proposed_modularisation_of_client-side_resources>
> - A proposed reorganisation of core scripts (Skin and OutputPage)
> according to the MW modules they are most associated with.
>
>
>
> The object model I'm leaning towards on the PHP side is:
>
> * A client-side resource manager (CSRM) class. This would be
> responsible for maintaining a list of client-side resources that have
> been requested and need to be sent to the skin. It would also handle
> caching, distribution of incoming dynamic requests, dependencies,
> minification, etc. This is quite a complex job and might need to be
> split up somewhat.
>

That sounds cleaner than the present OutputPage and Skin.php and
associated script-loader grafting. Having a cleaner system would be
nice... but it will probably break skins and other stuff... or would you
have old API mappings for OutputPage and Skin, or change almost every
extension and break every 3rd party skin out there?

You could probably have something "working" fairly quickly; the trick is
compatibility with the broken old system. It is a core issue, and people
working on other projects have added on the functionality needed to "get
it working" with existing stuff ... If you want to clean it up, I don't
think anyone will protest as long as it does not take away features or
require major reworking of other code.

> * A hierarchy of client-side module classes. A module object would
> contain a list of files, dependencies and concatenation hints. Objects
> would be instantiated by parent classes such as skins and special
> pages, and added to the CSRM. Classes could be registered globally,
> and then used to generate dynamic CSS and JS, such as the user
> preference stylesheet.
>
The main problem of defining all the objects and hierarchy relationships
in php is that it won't work stand alone. An ideal system retains
flexibility in being able to work with the script loader or without it.
Ultimately your javascript code will dictate what class is required when
and where. If you have to go back to php to define this all the time
that won't be fun.

Additionally, how do you describe call chains that happen purely in JS?
Say you do a search to insert an image, then you decide you want to look
for video, so now we load a video clip. The server can't map out that the
client needs the native handler to be packaged with the JavaScript instead
of the cortado video handler. We have to run the detection client side,
then get the code. The server could know that if you request the cortado
handler you also need the parent video object, but it seems cleaner to
map out that dependency in JavaScript instead of on the PHP side. And if
you then want to run the code without the script-loader, it won't work at
all.

> * The module base class would be non-abstract and featureful, with a
> constructor that accepts an array-based description. This allows
> simple creation of modules by classes with no interest in dynamic
> script generation.
>

What are you planning on including in this array besides the path to the
JavaScript file? Again, it will suck for the JavaScript author to go back
into PHP and define all the dependencies instead of just listing them as
needed in the JS. Furthermore, how will this work with scripts in the
MediaWiki namespace? How will they define the classes and dependencies
they need if not in the JavaScript?

I think the PHP should read the JavaScript for this information, as is
presently done with the script-loader.

> * A new script loader entry point would provide an interface to
> registered modules.
>
The script-loader is already defined as part of the javascript loader, so
the name of the entry point does not matter so much as the calling
conventions.
>
>
> There are some design decisions I still have to make, which are tricky
> due to performance tradeoffs:
>
> * With concatenation, there is the question of which files to combine
> and which to leave separate. I would like to have a "combine"
> parameter which is a string, and files with the same combine parameter
> will be combined.
>
right... see discussion above. I think in practice ad hoc grouping via
post-page-load javascript interface requests will naturally group and
cache together common requests by nature of consistent javascript
application flow. So I don't think the concatenation "hit" will be that
substantial. Javascript grouped at the page-loading level will of course
want to try and avoid grouping something that will later be included
by itself on a separate page.
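
To make the quoted "combine" parameter concrete, here is a minimal sketch
of one way it could behave. The function name groupByCombine and the
{ path, combine } record shape are assumptions for illustration; the thread
does not specify them.

```javascript
// Group files that share the same "combine" string into one bundle.
// Files without a combine value are left as single-file bundles.
function groupByCombine(files) {
  const groups = new Map();
  const separate = [];
  for (const file of files) {
    if (file.combine === undefined) {
      separate.push([file.path]);
      continue;
    }
    if (!groups.has(file.combine)) {
      groups.set(file.combine, []);
    }
    groups.get(file.combine).push(file.path);
  }
  // Combined bundles first (in first-seen order), then the standalone files.
  return [...groups.values()].concat(separate);
}
```

Each inner array would become one request, so a file that is also needed
by itself on another page is better left without a combine value.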
> * Like Wordpress, we could store minified and concatenated files in a
> public cache and then link to that cache directly in the HTML.
>
That seems perfectly reasonable... Is the idea that this will help small
sites that don't have things behind a squid proxy? Although small sites
seem to work okay with MediaWiki pages being served via php reading
cached files.
> * The cache invalidation scheme is tricky, there's not really an ideal
> system. A combination of cache-breaking parameters (like Michael's
> design) and short expiry times is probably the way to go. Using
> cache-breaking parameters alone doesn't work because there is
> referring HTML cached on both the server and client side, and
> regenerating that HTML periodically would be much more expensive than
> regenerating the scripts.
>
An option is to write out a bit of dynamic javascript to a single
low-expiry, statically cached core script that sets the versions for
everything that could be included. But that does not work well with live
hacks (hence the checking of the file-modified date) ... If version updates
are generally highly correlated with localization updates anyway... I don't
see too much of a problem with old javascript persisting until a page is
purged and rendered with the new interface.

I don't see the benefit in hurting our cache rate to support ~new
javascript~ with ~old html~

New javascript could depend on new html, no? (Like an added configuration
variable, or a new div element?) You could add that level of complexity to
the CSRM concept ... or just tie javascript to a given html page. (This
reuses the cached javascript if the javascript has not been updated, at
the cost of re-rendering the html as is done with other updates.)
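
A cache-breaking parameter of the kind discussed here can be sketched as
follows. The parameter names (files, v), the /load.php path and the
function name versionedUrl are all made up for the example; the only idea
taken from the thread is that the version is derived from file
modification times.

```javascript
// Build a loader URL whose version parameter changes whenever any of the
// requested files changes. files: [{ name, mtime }] with mtime as a number.
function versionedUrl(baseUrl, files) {
  const version = Math.max(...files.map(f => f.mtime));
  const names = files.map(f => f.name).join(',');
  return baseUrl + '?files=' + encodeURIComponent(names) + '&v=' + version;
}
```

Because the URL is written into the page, cached html keeps pointing at
the old version until that page is purged and re-rendered, which is the
trade-off being argued over above.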


> Here are my notes:
>
> * Concatenation
> * Performance problems:
> * Changing inclusions. When inclusions change, whole contents has
> to be sent again.
> * BUT people don't change skins very often.
> * So combine=all=skin should save time for most
> * Expiry times have to be synchronised. Take the minimum expiry of
> all, and force freshness check for all.
> * Makes the task of squid cache purging more difficult
> * Defeats browser concurrency
>
> * Performance advantages:
> * For dynamic requests:
> * Avoids MW startup time.
> * Avoids DoSing small servers with concurrent requests.
> * For all requests:
> * Reduces squid CPU
> * Removes a few RTTs for non-pipelining clients
> * Improves gzip compression ratio
>
> * Combine to static file idea:
> * Pros:
> * Fast to stream out, on all systems
> * Doesn't break HughesNet
> * Cons:
> * Requires splitting the request into static and dynamic
> * Need webserver config to add Expires header and gzip
>

We could support both if we build the logic into the js as done with the
present system. The present script-loader works in both modes by feeding
the loader info from the javascript files (although it does not send the
client to cached group requests if the script-loader is off). But a
simple addition of a maintenance script could output the combined
script sets into a public dir based on loader set definitions from the js.

> With some help from Splarka, I've determined that it would be possible
> to merge the requests for [[MediaWiki:Common.css]],
> [[MediaWiki:Skinname.css]], [[MediaWiki:Handheld.css]] and
> [[MediaWiki:Print.css]], using @media blocks for the last two, for a
> significant performance win in almost all cases.
>
sounds good.
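
For illustration, merging the site stylesheets with @media blocks might
look roughly like this. mergeSiteCss and the { css, media } record shape
are assumptions, not an existing MediaWiki interface.

```javascript
// Concatenate stylesheets into one response. Sheets that only apply to a
// given medium (e.g. print, handheld) are wrapped in an @media block.
function mergeSiteCss(sheets) {
  return sheets
    .map(sheet => sheet.media
      ? '@media ' + sheet.media + ' {\n' + sheet.css + '\n}'
      : sheet.css)
    .join('\n');
}
```

One caveat with this approach: an @import rule is not valid inside an
@media block, so sheets that rely on @import would need separate handling.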
>
>
> Once the architectural issues have been fixed, the stylistic issues in
> both ancient JS and the merged code will have to be dealt with, for
> example:
>
> * Poorly-named functions, classes, files, etc. There's a need for
> proper namespacing and consistency in naming style.
>
Yeah, there is a bit of an identity crisis based on the inherited code. But
variable renaming is not too hard. Also there is a transition under
way from the old style to a more jQuery style.
> * Poorly-written comments
>
True. (no defense there) (except to say that I am dyslexic)
> * Unnecessary use of the global namespace. The jQuery style is nice,
> with local functions inside an anonymous closure:
>
> (function () {
>     function setup() {
>         ...
>     }
>     addOnloadHook( setup );
> })();
>
right as mentioned above I am moving in that direction see:
mv_jqueryBindings(); even read the comment right above:
* @@ eventually we should refactor mwCode over to jQuery style plugins
* and mv_embed.js will just handle dependency mapping and loading.
> * Unsafe construction of HTML. This is ubiquitous in the mwEmbed
> directory and there will be a huge potential for XSS, as soon as user
> input is added. HTML construction with innerHTML can be replaced by
> document.createElement() or its jQuery equivalent.
>

I build a lot of html as static strings because it's faster than
generating every element with function calls. If you can inject
arbitrary content into some javascript string then I imagine you can do
so with createElement as well. You don't gain much by escaping already
defined javascript. If you do something to get some value into someone
else's JavaScript instance then you might as well call your evilJs
directly. Perhaps I am understanding this wrong? Could you illustrate
how that would be exploited in one case but not the other?
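
One way to picture the difference: the hostile value usually arrives as
data written by another user (a page title or a description in an API
response, say), not from the attacker's own JavaScript instance. The
sketch below is a hypothetical illustration, not code from mwEmbed;
escapeHtml, unsafeCaption and safeCaption are made-up names.

```javascript
// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// String concatenation: the value is parsed as markup once it is
// assigned to innerHTML.
function unsafeCaption(title) {
  return '<span class="caption">' + title + '</span>';
}

// Escaped concatenation: the value can only ever end up as text.
function safeCaption(title) {
  return '<span class="caption">' + escapeHtml(title) + '</span>';
}

// A title that some other user could have saved on the wiki.
const hostileTitle = '<img src=x onerror="alert(1)">';
```

Assigning unsafeCaption(hostileTitle) to innerHTML creates an img element
whose onerror handler runs in the viewer's session. Putting the same value
into the page with document.createTextNode() or jQuery's .text() treats it
as plain text, which is the effect safeCaption imitates for string-built
html.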

> * The identity crisis. The whole js2 concept encourages code which is
> poorly integrated with the rest of MediaWiki, and which is written
> without proper study of the existing code or thought to refactoring.
> It's like SkinTemplate except with a more pretentious name. I'd like
> to get rid of all instances of "js2", to move its scripts into other
> directories, and to remove the global variables which turn it on and
> off. Also the references to MetavidWiki and the mv prefixes should be
> fixed.
>

Yes, being "stand alone" is a primary "feature" of the concept ... The
whole mwEmbed system can "stand alone", which will enable us to easily
share interface components with other CMSs or platforms. This enables us
to share things like the add-media-wizard with a blog that wants to insert
an asset from commons or a set of free licensed repositories. It enables
3rd parties to remotely embed video clips and do mash-ups with the timed
text and MediaWiki api calls, or just use the firefogg encoder as a
stand alone application, and/or use any edit tools we integrate for
image / audio / video manipulation.

You can compare it to the Google api thing you mentioned early on... it's
very convenient to do a single load call and get everything you need
from the google application interfaces. The api is one level of
supporting external integrations. An application-level interface for
external applications is another level that holds interesting
possibilities in my mind. But it is a fundamentally new direction for
MediaWiki.

> * Lack of modularisation. The proposed registration system makes it
> possible to have extensions which are almost entirely client-side
> code. A module like libClipEdit could be moved to its own extension. I
> see no problem with extensions depending on other extensions, the SMW
> extensions do this with no problems.
>

I am not entirely against extension based modularization and we
definitely need to support it for extensions that depend on php code.

But it's nice to be able to pull any part of the application from any
point. For example, in the add-media-wizard, for the description of assets
I will want to pull in the wikiEditor to support formatting in the
description of the imported asset. It sucks to have to check if a
component is available all the time.

Imagine the sequencer, which depends on pretty much everything in the
mwEmbed directory. For it to resolve all its dependencies across half a
dozen extensions and "versions of extensions" in different locations
will not be fun.

And of course we will have to build a separate packaging system for the
application to work as a stand alone tool.

That makes it near impossible to test any component stand alone, since it
will be dependent on the MediaWiki framework to get up and running.
Testing components stand alone has been very valuable.

A single client side code repository can help ensure consistency of
included modules, i.e. we won't have multiple versions of jquery, jquery
ui, or any other reusable component that is used across multiple
interfaces conflicting in our loading system. (Presently we have a
lot of copies of jquery and its plugins in extensions, for example.)

If this is the ultimate blocker in your mind I could restructure things
as scattered across extensions. It's not entirely painful to re-factor
that way, since everything is loaded via js script loader helpers, but the
above mentioned issues would be a bummer.

I would prefer that we have a concept of the javascript components/folders
within the mwEmbed folder being "client side modules", as distinct from
php code, so they do not need to be tied to a php code extension. Moving
directories around won't inherently improve "modularity". Perhaps we
need a way to just include portions of the javascript set?... We can
always strip folders in releases. Perhaps it should be moved to a
separate directory and only parts of it copied over at deployment time?

>
> A few ideas for cool future features also occur to me. Once we have a
> system set up for generating and caching client-side resources, why not:
>
> * Allow the user to choose a colour scheme for their wiki and
> automatically generate stylesheets with the appropriate colours.
>
> * Include images in the system. Use GD to automatically generate and
> cache images with the appropriate anti-aliased background colour.
>
> * Automatically create CSS sprites?
>

Don't forget about localization packing which was a primary motivation
for the script-loader to begin with ;)

peace,
--michael

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Simetrical+wikilist at gmail

Sep 24, 2009, 12:57 PM

Post #10 of 32 (2045 views)
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

On Thu, Sep 24, 2009 at 12:49 PM, Tim Starling <tstarling [at] wikimedia> wrote:
> It's not really as simple as that. The major browsers use concurrency
> as a substitute for pipelining. Instead of queueing up multiple
> requests in a single TCP connection and then waiting, they queue up
> multiple requests in multiple connections and then wait. The effect is
> very similar in terms of RTTs.

Except that even on a page with 30 or 40 includes, the number of
concurrent requests will typically be something like 4 or 8, so RTT
becomes a huge issue if you have lots of includes. Not to mention
that most browsers before very recently won't do concurrency at all
for scripts -- script loads block parsing, so no new requests start
when a script is still loading or executing. If you're talking about
cutting four includes down to one, then maybe the benefit would be
insignificant or even negative, but if you're talking about cutting 30
includes down to ten, AFAIK the benefit just from RTT should swamp all
other considerations. This is why Yahoo!'s #1 rule for good front-end
performance is "Minimize HTTP Requests":

http://developer.yahoo.com/performance/rules.html

> you can see that the concurrent case would be faster when
> the RTT is very long, the number of objects is large, the number of
> connections is equally large

This last point is the major failure here. If browsers really
requested everything in parallel, then we wouldn't need any of these
hacks -- not combining, not spriting. But they don't, they request
very few things in parallel.
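
As a back-of-the-envelope check on this argument, assume a deliberately
simplified model in which every include costs one round trip and the
browser keeps a fixed number of connections busy. The function name and
the model itself are assumptions for illustration; real browsers are more
complicated.

```javascript
// Number of sequential round trips needed to fetch all includes when at
// most `connections` requests can be in flight at once.
function roundTrips(includes, connections) {
  return Math.ceil(includes / connections);
}
```

With 4 connections, 30 includes need roundTrips(30, 4) = 8 sequential
round trips and 10 includes need 3, while both 4 includes and 1 include
need a single round trip. That matches the claim that going from four
includes to one gains little, but going from 30 to ten gains a lot.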

> There is a potential reduction in RTT count due to concatenation,
> that's why I included that item on the list. But it's client-dependent
> and might not exist at all in the most common case.

AFAIK this is not true in practice.



tparscal at wikimedia

Sep 24, 2009, 1:29 PM

Post #11 of 32 (2045 views)
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

On 9/24/09 1:41 AM, Tim Starling wrote:
> [...]
It's great to see that this is being paid attention to. I would agree
with you that the current implementation of JS2 is not what I see as
ideal either.

The use of "class" loading seems a little strange to me as well - I
mean, there's not really such a thing as a class in JavaScript, nor does
the class loader only load a specific JavaScript object or function, so
it's really more of a file loader - if we drop the .js from the file
names in a system where some resources are MediaWiki messages whose
names also end in .js, that's a purely aesthetic maneuver - I'm fine
either way, but let's not call it something it's not. It's a file loader.

The dependency thing is an interesting problem, but I think it could be
handled more elegantly than having to define meta-information. Just an
idea for a solution...

1. Other than jQuery and a MediaWiki jquery plugin, scripts can be
loaded on the client in any order
2. Each script after that adds code to a queuing system provided by
the MediaWiki plugin
3. Code is identified by a name and may include an optional list of
the names for any dependencies.
4. When document ready happens, the queuing system generates an order for
execution based on given dependencies.
5. Even after document ready, the queuing system can continue its
work whenever a script is added - such that if "bar" which depends
on "foo" is registered before document.ready, and then sometime
well after document.ready "foo" is run using the queuing system,
"bar" will be executed directly after because its dependency has
finally been met.

// Hypothetical code...

// Example of points 3 and 4 ($.run is provided by the MediaWiki jQuery plugin)
$.run( 'foo', function() { /* foo code */ } );
$.run( 'bar', ['foo'], function() { /* bar code */ } );
// document ready happens, foo is executed, bar is executed

// Example of point 5
$.run( 'bar', ['foo'], function() { /* bar code */ } );
// document.ready happens .. time passes
$.run( 'foo', function() { /* foo code */ } );
// foo is executed, and bar is executed directly after

I think there is a clever way to merge a solution for dynamic script
loading into this as well... But essentially this solves most problems
already.
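
The five points above can be sketched without jQuery as a small queue.
This is only one possible reading of the idea: createRunQueue, run and
ready are hypothetical names, and ready() stands in for the document
ready event.

```javascript
// A dependency-aware run queue. Code is registered under a name with an
// optional list of dependency names, and only runs once the queue is
// ready and all of its dependencies have run.
function createRunQueue() {
  const pending = [];     // registered but not yet executed
  const done = new Set(); // names that have already been executed
  let isReady = false;

  // Run everything whose dependencies are satisfied, repeating until no
  // further progress can be made.
  function flush() {
    let progressed = true;
    while (progressed) {
      progressed = false;
      for (let i = 0; i < pending.length; i++) {
        const item = pending[i];
        if (item.deps.every(dep => done.has(dep))) {
          pending.splice(i, 1);
          item.fn();
          done.add(item.name);
          progressed = true;
          break; // restart the scan, the list has changed
        }
      }
    }
  }

  // run( name, fn ) or run( name, deps, fn )
  function run(name, deps, fn) {
    if (typeof deps === 'function') {
      fn = deps;
      deps = [];
    }
    pending.push({ name, deps, fn });
    if (isReady) {
      flush();
    }
  }

  // Call once when the document is ready.
  function ready() {
    isReady = true;
    flush();
  }

  return { run, ready };
}
```

Registering "bar" with a dependency on "foo", calling ready(), and only
later registering "foo" runs foo first and bar directly after it, as in
point 5.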

Ideally dynamic script loading would never be needed, as it introduces
additional latency to user interaction, and no amount of spinner
graphics will ever replace faster interaction. Lazy script loading
however is awesome, and should be considered in these design changes.
For lazy loading, we could tell $wgOut that a script being included is
either to be included immediately, or after document.ready - in which
case a bit-o-JavaScript could be added to the page listing which files
to load - which could be acted upon after the document is ready.

Let's also try and pay attention to the issue of i18n for dynamic UI
elements. So far I've been defining a long list of messages to include
in a JSON object in my PHP code, then using them in my JavaScript code.
Michael has some magic going on in his script loader that does some
injection of message values based on their presence in the js file (not
totally clear on the details there). I think once again I would like to
see that we let messages required for use in JavaScript be defined in
JavaScript - so something like what Michael is doing seems ideal...

// Code in .js file
loadMessages( ['foo', 'bar'] );
// Code in JavaScript sent to client after magic transformations made by PHP code
loadMessages( { 'foo': 'Foo', 'bar': 'Bar' } );

Thus allowing us to define messages we want loaded in the JavaScript
space without making additional (and high-latency) calls to the server
just to get some text. Even in the case of dynamic script loading, the
messages of the incoming script just get added to the collection on the
client. I think this is similar if not identical to what Michael's code
does.
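
The transformation described above would happen in PHP on the server; the
sketch below shows the same rewrite in JavaScript purely to make the idea
concrete. injectMessages is a hypothetical name, the catalogue is a plain
object, and the pattern assumes message keys are simple quoted strings
with no embedded quotes or brackets.

```javascript
// Replace loadMessages( ['key', ...] ) calls in a script's source with
// loadMessages( { key: 'localized text', ... } ).
function injectMessages(source, catalogue) {
  const pattern = /loadMessages\(\s*(\[[^\]]*\])\s*\)/g;
  return source.replace(pattern, (match, list) => {
    // Turn the single-quoted JS array literal into JSON and parse it.
    const keys = JSON.parse(list.replace(/'/g, '"'));
    const values = {};
    for (const key of keys) {
      // Assumption: fall back to the key itself when no message exists.
      values[key] = Object.prototype.hasOwnProperty.call(catalogue, key)
        ? catalogue[key]
        : key;
    }
    return 'loadMessages( ' + JSON.stringify(values) + ' )';
  });
}
```

For the example above, injectMessages( "loadMessages( ['foo', 'bar'] );",
{ foo: 'Foo', bar: 'Bar' } ) yields a call that already carries both
texts, so no extra request is needed on the client.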

Bottom line, meta-info about things that go on in JavaScript land being
defined and dealt with in PHP land is not a good thing - and it should
be avoided. The good thing is, there are all sorts of clever ways to do so.

I'm still digesting some of the other topics being brought up - there
are so many good points - I'm sure I will have more input soon...

- Trevor


tstarling at wikimedia

Sep 24, 2009, 10:19 PM

Post #12 of 32 (2046 views)
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

>> * The namespacing in Google's jsapi is very nice, with everything
>> being a member of a global "google" object. We would do well to
>> emulate it, but migrating all JS to such a scheme is beyond the scope
>> of the current project.
>>
>
> You somewhat contradict this approach by recommending against "class"
> abstraction below.. ie how will you cleanly load components and
> dependencies if not by a given name?

By module name. Each module can contain multiple files. I don't see
any problem with allowing anonymous modules, as long as the caller is
happy with the fact that such modules can't be used in dependencies or
loaded on demand on the client side.

> I agree we should move things into a global object ie: $j and all our
> components / features should extend that object. (like jquery plugins).
> That is the direction we are already going.

I think it would be better if jQuery was called window.jQuery and
MediaWiki was called window.mw. Then we could share the jQuery
instance with JS code that's not aware of MediaWiki, and we wouldn't
need to worry about namespace conflicts between third-party jQuery
plugins and MediaWiki.

> Dependency loading is not really beyond the scope... we are already
> supporting that. If you check out the mv_jqueryBindings function in
> mv_embed.js ... here we have loader calls integrated into the jquery
> binding. This integrates loading the high level application interfaces
> into their interface call.

Your so-called dependency functions (e.g. doLoadDepMode) just seemed
to be a batch load feature, there was no actual dependency handling.
Every caller was required to list the dependencies for the classes it
was loading.

> The idea is to move more and more of the structure of the application
> into that system. so right now mwLoad is a global function but should be
> re-factored into the jquery space and be called via $j.load();

That would work well until jQuery introduced its own script-loader
plugin with the same name and some extension needed to use it.

[...]
> We could add that convention directly into the script-loader function if
> desired so that on a per class level we include dependencies. Like
> mwLoad('ui.dialog') would know to load ui.core etc.

Yes, that is what real dependency handling would do.
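
A minimal sketch of what such dependency handling could look like: the
caller names one module and the loader works out the rest. The dependency
table below is illustrative only, and resolveLoadOrder is a hypothetical
name, not part of mwLoad.

```javascript
// Illustrative dependency table: module name -> direct dependencies.
const moduleDeps = {
  'ui.core': [],
  'ui.draggable': ['ui.core'],
  'ui.resizable': ['ui.core'],
  'ui.dialog': ['ui.core', 'ui.draggable', 'ui.resizable']
};

// Return the modules that must be loaded for `name`, dependencies first.
// `loaded` holds modules the client already has, which are skipped.
function resolveLoadOrder(name, deps, loaded = new Set(), visiting = new Set(), order = []) {
  if (loaded.has(name)) {
    return order;
  }
  if (visiting.has(name)) {
    throw new Error('Circular dependency at ' + name);
  }
  if (!Object.prototype.hasOwnProperty.call(deps, name)) {
    throw new Error('Unknown module ' + name);
  }
  visiting.add(name);
  for (const dep of deps[name]) {
    resolveLoadOrder(dep, deps, loaded, visiting, order);
  }
  visiting.delete(name);
  loaded.add(name);
  order.push(name);
  return order;
}
```

With this table, asking for 'ui.dialog' on a fresh page yields ui.core,
ui.draggable, ui.resizable and then ui.dialog, and passing a set that
already contains 'ui.core' leaves it out of the request.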

>> * The "class" abstraction as implemented in JS2 has very little value
>> to PHP callers. It's just as easy to use filenames.
> The idea with "class" abstraction is that you don't know what script set
> you have available at any given time. Maybe one script included
> ui.resizable and ui.move and now your script depends on ui.resizable
> and ui.move and ui.drag... your loader call will only include ui.drag
> (since the other are already defined).

I think you're missing the point. I'm saying it doesn't provide enough
features. I want to add more, not take away some.

You can remove duplicates by filename.

[...]
> We want to move away from php code dependencies for each javascript
> module. Javascript should just directly hit a single exposure point of
> the mediawiki api. If we have php code generating bits and pieces of
> javascript everywhere it quickly gets complicated, is difficult to
> maintain, much more resource intensive, and requires a whole new
> framework to work right.
>
> Php's integration with the javascript should be minimal. php should
> supply configuration, and package in localized msgs.

I don't think it will be too complicated or resource intensive. JS
generation in PHP is very flexible and you admit that there is a role
for it. I don't think there's a problem with adding a few more
features on the PHP side.

If necessary, we can split it back out to a non-MediaWiki standalone
mode by generating some static JS.

What is your reason for saying this? Have you worked on some other
framework where integration of PHP and JavaScript has caused problems?


>> * Central registration of all client-side resources in a global
>> variable would be onerous and should be avoided.
>>
>
> You can always add to the registered global. This works well by having
> the php read the javascript file directly to ascertain the global list.
> That way your javascript works stand alone as well as integrated with a
> script-loader that provides localization and configuration.

There's a significant CPU cost to loading and parsing JS files on
every PHP request. I want to remove that behaviour. Instead, we can
list client-side files in PHP. Then from the PHP list, we can generate
static JS files in order to recover the standalone functionality.
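
As a sketch of that last step, a server-side module list could be turned
into a static JS file along these lines. In MediaWiki this would be done
in PHP; JavaScript is used here only for illustration, and
renderStaticRegistry, mwModuleRegistry and the record shape are all
hypothetical.

```javascript
// Render a module list as the text of a static JS file that defines a
// registry object for standalone (non-MediaWiki) use.
function renderStaticRegistry(modules) {
  const registry = {};
  for (const mod of modules) {
    registry[mod.name] = { files: mod.files, deps: mod.deps || [] };
  }
  return 'var mwModuleRegistry = ' + JSON.stringify(registry, null, 2) + ';\n';
}
```

The output would be written once, for example by a maintenance script, so
no JS file has to be parsed on an ordinary PHP request.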

[...]
> That sounds cleaner than the present outputPage and Skin.php and
> associated script-loader grafting. Having a cleaner system would be
> nice... but will probably break skins and other stuff... or have
> OutputPage and Skin old api mappings or change almost every extension
> and break every 3rd party skin out there?

I think I'll probably break most third-party skins, if they have PHP
code. We break them with just about every major release so there won't
be much surprise there.

On this point, I think we need:
* Easier management of non-PHP skins (i.e. CSS and images only)
* Automated CSS generation (per original post)
* Easier ways to modify the document structure, with less PHP
involved. XSLT?
* An interface in PHP that we can live with, so we don't feel obliged
to keep breaking it.

I should be able to retain compatibility with non-skin extensions, and
I won't break interfaces unnecessarily. But we're committed to an
incremental development process, rather than a sequence of rewrites,
and that means that some interfaces will get old and die within the
1.X.0 sequence.

[...]
>> * Like Wordpress, we could store minified and concatenated files in a
>> public cache and then link to that cache directly in the HTML.
>>
> That seems perfectly reasonable... Is the idea that this will help small
> sites that don't have things behind a squid proxy?

Yes, and it also benefits Wikimedia.

> Although small sites
> seem to work okay with MediaWiki pages being served via php reading
> cached files.

Have you looked at the profiling? On the Wikimedia app servers, even
the simplest MW request takes 23ms, and gen=js takes 46ms. A static
file like wikibits.js takes around 0.5ms. And that's with APC. You say
MW on small sites is OK, I think it's slow and resource-intensive.

That's not to say I'm sold on the idea of a static file cache, it
brings its own problems, which I listed.

>> * The cache invalidation scheme is tricky, there's not really an ideal
>> system. A combination of cache-breaking parameters (like Michael's
>> design) and short expiry times is probably the way to go. Using
>> cache-breaking parameters alone doesn't work because there is
>> referring HTML cached on both the server and client side, and
>> regenerating that HTML periodically would be much more expensive than
>> regenerating the scripts.
>>
> An option is to write out a bit of dynamic javascript to a single low
> expire static cached core script that sets the versions for everything
> that could be included. But that does not work well with live hacks.
> (hence the checking of filemodified date) ... If version updates are
> generally highly correlated with localization updates anyway... I don't
> see too much problem with old javascript persisting until a page is
> purged and rendered with the new interface.
>
> I don't see benefit in hurting our cache rate to support ~new
> javascript~ with ~old html~

The performance impact of refreshing a common file once every hour or
two is not large. Your code sets the expiry time to a year, and
changes the urid parameter regularly, which sounds great until you
accidentally cache some buggy JS into squid and you have no way to
reconstruct the URID parameters and thus purge the object. Then you'd
be stuck with the choice of either waiting a month for all the
referring HTML to expire, or clearing the entire squid cache.

If there's a need for the versions of the HTML and JS to match, that
should be handled rigorously, with old versions retained at the origin
server, instead of relying on squid to keep a record of every object
it's served.
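
A rough sketch of combining the two mechanisms follows. The helper names
(versionedUrl, cacheControlHeader), the "v" parameter and the one-hour
expiry are all assumptions for illustration; none of this is existing
MediaWiki code.

```javascript
// Hypothetical helper: append a cache-breaking version parameter to a URL.
function versionedUrl(path, version) {
    var sep = path.indexOf('?') === -1 ? '?' : '&';
    return path + sep + 'v=' + encodeURIComponent(version);
}

// Hypothetical helper: a short expiry, so that stale referring HTML
// cannot pin a bad script for longer than maxAgeSeconds.
function cacheControlHeader(maxAgeSeconds) {
    return 'public, max-age=' + maxAgeSeconds;
}

var url = versionedUrl('/skins/common/wikibits.js', '207');
var header = cacheControlHeader(60 * 60); // one hour, an assumed value
```

With a short expiry the URL stays predictable enough to purge by hand,
while the version parameter still forces a refresh on deployment.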

[...]
>> * Unsafe construction of HTML. This is ubiquitous in the mwEmbed
>> directory and there will be a huge potential for XSS, as soon as user
>> input is added. HTML construction with innerHTML can be replaced by
>> document.createElement() or its jQuery equivalent.
>>
>
> I build a lot of html as static strings because it's faster than
> generating every element with function calls. If you can inject
> arbitrary content into some javascript string then I imagine you can do
> so with the createElement as well. You don't gain much escaping already
> defined javascript. If you do something to get some value into someone
> else's JavaScript instance then you might as well call your evilJs
> directly. Perhaps I am understanding this wrong? Could you illustrate
> how that would be exploited in one case but not the other?

Say MediaWiki emits an input box with a properly escaped attribute
derived from user input:

    <input type="text" id="filename"
        value="&lt;iframe src=&quot;http://example.com/&quot;/&gt;"/>

Then consider JS code such as:

    dialog.innerHTML =
        "<div>" +
        document.getElementById( 'filename' ).value +
        "</div>";

This unescapes the value attribute, and puts the contents straight
into HTML. The iframe will be created. This is a security vulnerability.

The alternative style used by jQuery UI is:

    $j('<div/>')
        .text( $j('#filename')[0].value )
        .appendTo(dialog);

Or equivalently in plain DOM:

    var div = document.createElement( 'div' );
    div.appendChild( document.createTextNode(
        document.getElementById( 'filename' ).value ) );
    dialog.appendChild( div );

The single text node contains the same literal text that was in the
input box; no iframe element is created. You could think of it as
implicit escaping, since if you ask for HTML back:

    alert( dialog.innerHTML )

The browser will show you properly escaped HTML:

    <div>&lt;iframe src="http://example.com/"/&gt;</div>

In OggHandler, I found that it was necessary to use innerHTML in some
cases, because there were bugs involved with creating a Java applet
and then changing its attributes. But I made sure that all the HTML I
created was properly escaped, so that there was no possibility of
arbitrary HTML being created, either from trusted or untrusted input.

It's best to escape even trusted input, for two reasons:
* Correctness. Even trusted input can contain quotation marks.
* Ease of review. Reviewers should not have to determine which of your
inputs are trusted and which are untrusted in order to verify the
safety of the code.
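
Where innerHTML really is unavoidable, explicit escaping might look like
the minimal sketch below. The helper name escapeHtml is hypothetical (it
is not a MediaWiki or jQuery function), and the input string is the
example value from above.

```javascript
// Hypothetical helper: escape a string for use in HTML text or in a
// quoted attribute value. The ampersand must be replaced first.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// The value as read back from the input box (already unescaped).
var userValue = '<iframe src="http://example.com/"/>';

// Safe to assign to innerHTML: the markup arrives as literal text.
var html = '<div>' + escapeHtml(userValue) + '</div>';
```

A reviewer then only has to check that every interpolated value passes
through the helper, not where each value came from.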

There's more on ease of review and other security issues in my article
on the subject:

http://www.mediawiki.org/wiki/Security_for_developers

Security takes precedence over performance. There are better ways to
improve performance than to open up your code to systematic exploit by
malicious parties.

-- Tim Starling


_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Simetrical+wikilist at gmail

Sep 25, 2009, 6:48 AM

Post #13 of 32 (2042 views)
Permalink
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

On Fri, Sep 25, 2009 at 1:19 AM, Tim Starling <tstarling [at] wikimedia> wrote:
> On this point, I think we need:
> * Easier management of non-PHP skins (i.e. CSS and images only)
> * Automated CSS generation (per original post)
> * Easier ways to modify the document structure, with less PHP
> involved. XSLT?
> * An interface in PHP that we can live with, so we don't feel obliged
> to keep breaking it.

XSLT is a non-starter unless we want fatal errors (or at least the
skin completely breaking) on pages where we emit malformed XML. And
there always have been some of those, and probably always will be.
Probably even more significantly, XSLT is a programming language and a
rather obscure one. If we're going to make MediaWiki skins so hard to
make, we may as well stick with just requiring that they be in PHP.

The standard way to handle skinning in web apps, AFAICT, is to chop
the interface up into templates, and stitch them together at runtime.
Then skinners can modify the templates one by one, and on upgrade they
only have to merge changes for the templates they've changed. Which
is still a huge pain for even moderate customizations, as I can attest
from personal experience. But it has the advantage that skinners only
need to modify HTML and CSS, not PHP or XSLT or whatnot.

As it happens, most of the essential differences between skins can be
reproduced using only CSS, if you know enough CSS. I once personally
wrote, in about an hour, some CSS that made Monobook look almost
pixel-for-pixel identical to Modern, with no HTML changes. The only
problem is I didn't bother fixing IE, so it wasn't committable. I
think almost no reskin should need to change the HTML at all,
except maybe to add classes and such (which can be done in core). It
should only be necessary if you really want to change how the
interface behaves somehow (like having extra buttons), rather than
just how it looks.



tstarling at wikimedia

Sep 25, 2009, 8:41 AM

Post #14 of 32 (2049 views)
Permalink
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

Aryeh Gregor wrote:
> On Fri, Sep 25, 2009 at 1:19 AM, Tim Starling <tstarling [at] wikimedia> wrote:
>> On this point, I think we need:
>> * Easier management of non-PHP skins (i.e. CSS and images only)
>> * Automated CSS generation (per original post)
>> * Easier ways to modify the document structure, with less PHP
>> involved. XSLT?
>> * An interface in PHP that we can live with, so we don't feel obliged
>> to keep breaking it.
>
> XSLT is a non-starter unless we want fatal errors (or at least the
> skin completely breaking) on pages where we emit malformed XML. And
> there always have been some of those, and probably always will be.
> Probably even more significantly, XSLT is a programming language and a
> rather obscure one. If we're going to make MediaWiki skins so hard to
> make, we may as well stick with just requiring that they be in PHP.

I think it makes sense to provide some way to modify the DOM after the
base skin is finished making HTML. Some things can be done with CSS,
but you don't want to be making heavy use of #id:after{content:"..."}
to add in some advertising or analytics HTML. And some modifications
are quite arcane, like reordering boxes by switching them from
ordinary floats to carefully constructed absolute positioning.

You can do DOM manipulation in PHP, I just thought that using a more
restricted language might help avoid some of the migration issues that
keep coming up.

> The standard way to handle skinning in web apps, AFAICT, is to chop
> the interface up into templates, and stitch them together at runtime.
> Then skinners can modify the templates one by one, and on upgrade they
> only have to merge changes for the templates they've changed. Which
> is still a huge pain for even moderate customizations, as I can attest
> from personal experience. But it has the advantage that skinners only
> need to modify HTML and CSS, not PHP or XSLT or whatnot.

The template engine libraries are slow, and PHP with embedded HTML
(like MonoBook) leads to code which is scary from a security
perspective due to the difficulty of reviewing the many echo
statements. And it doesn't solve the problem, because you end up with
migration issues when you need to add more items to the output or
change the existing items in some fundamental way.

I mentioned that Wordpress accelerates loading by moving scripts to
the bottom of the page. I didn't mention that this only works for
properly maintained skins, since many Wordpress skins don't call the
correct footer function.

-- Tim Starling




Simetrical+wikilist at gmail

Sep 25, 2009, 10:26 AM

Post #15 of 32 (2039 views)
Permalink
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

On Fri, Sep 25, 2009 at 11:41 AM, Tim Starling <tstarling [at] wikimedia> wrote:
> I think it makes sense to provide some way to modify the DOM after the
> base skin is finished making HTML. Some things can be done with CSS,
> but you don't want to be making heavy use of #id:after{content:"..."}
> to add in some advertising or analytics HTML.

Adding content is no problem. Just provide a bunch of places where
arbitrary HTML can be injected by configuration. The particular cases
of Analytics and ads should be cross-skin anyway, and currently you'd
be best off doing them using hooks (that's how I do Analytics on my
wiki). What are use-cases for *skins* being able to alter the HTML
output, at anywhere near the level of precision provided by XSLT?

> And some modifications
> are quite arcane, like reordering boxes by switching them from
> ordinary floats to carefully constructed absolute positioning.

That's true, yes. Later versions of CSS look like they'll provide
saner ways to do things, but we're a ways off from being able to use
any of those yet. (The advanced positioning stuff in CSS3 isn't even
close to finished AFAIK, let alone widely implemented.)

> The template engine libraries are slow, and PHP with embedded HTML
> (like MonoBook) leads to code which is scary from a security
> perspective due to the difficulty of reviewing the many echo
> statements. And it doesn't solve the problem, because you end up with
> migration issues when you need to add more items to the output or
> change the existing items in some fundamental way.

I don't think there's any way to entirely avoid migration issues.
You'd have migration issues with XSLT too, the same way we have
JavaScript that breaks when we add a wrapper div or reorder some
things. The best you can do is localize the damage, so things only
break if they changed that exact bit of HTML.



ssanbeg at ask

Sep 25, 2009, 12:46 PM

Post #16 of 32 (2049 views)
Permalink
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

On Fri, 25 Sep 2009 09:48:04 -0400, Aryeh Gregor wrote:

> On Fri, Sep 25, 2009 at 1:19 AM, Tim Starling <tstarling [at] wikimedia>
> wrote:
>> On this point, I think we need:
>> * Easier management of non-PHP skins (i.e. CSS and images only)
>> * Automated CSS generation (per original post)
>> * Easier ways to modify the document structure, with less PHP
>> involved. XSLT?
>> * An interface in PHP that we can live with, so we don't feel obliged
>> to keep breaking it.
>
> XSLT is a non-starter unless we want fatal errors (or at least the skin
> completely breaking) on pages where we emit malformed XML. And there
> always have been some of those, and probably always will be. Probably even
> more significantly, XSLT is a programming language and a rather obscure
> one. If we're going to make MediaWiki skins so hard to make, we may as
> well stick with just requiring that they be in PHP.
>

I'm not sure that's entirely accurate. XSLT works on DOM trees, so
malformed XML shouldn't really apply. Of course, the standard command
line processors create this tree with a standard parser, usually an XML
parser. But in PHP, creating the DOM with a parser and transforming it
with XSLT are handled separately.





Simetrical+wikilist at gmail

Sep 25, 2009, 3:00 PM

Post #17 of 32 (2043 views)
Permalink
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

On Fri, Sep 25, 2009 at 3:46 PM, Steve Sanbeg <ssanbeg [at] ask> wrote:
> I'm not sure that's entirely accurate.  XSLT works on DOM trees, so
> malformed XML shouldn't really apply.  Of course, the standard command
> line processors create this tree with a standard parser, usually an XML
> parser.  But in PHP, creating the DOM with a parser and transforming it
> with XSLT are handled separately.

Interesting. In that case, theoretically, you could use an HTML5
parser, which is guaranteed to *always* produce a DOM even on random
garbage input (much like wikitext!). Now, who's up for writing an
HTML5 parser in PHP whose performance is acceptable? I thought not.
:P

Anyway, my other points (e.g., may as well use PHP instead if you want
that much power) still hold.



mdale at wikimedia

Sep 25, 2009, 6:55 PM

Post #18 of 32 (2034 views)
Permalink
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

thanks for the constructive response :) ... comments inline

Tim Starling wrote:
>> I agree we should move things into a global object ie: $j and all our
>> components / features should extend that object. (like jquery plugins).
>> That is the direction we are already going.
>>
>
> I think it would be better if jQuery was called window.jQuery and
> MediaWiki was called window.mw. Then we could share the jQuery
> instance with JS code that's not aware of MediaWiki, and we wouldn't
> need to worry about namespace conflicts between third-party jQuery
> plugins and MediaWiki.
>
Right, but there are benefits to connecting into the jQuery plugin system
that would not be as clean to wrap into our window.mw object. For
example, $('#textbox').wikiEditor() uses jQuery selectors for the
target, and may use other jQuery plugin conventions, like the jQuery
class alias inside (function($){ ... })(jQuery);

Although if you're not designing your tool as a jQuery plugin then yea ;) ...
but I think most of the tools should be designed as jQuery plug-ins.

>> Dependency loading is not really beyond the scope... we are already
>> supporting that. If you check out the mv_jqueryBindings function in
>> mv_embed.js ... here we have loader calls integrated into the jquery
>> binding. This integrates loading the high level application interfaces
>> into their interface call.
>>
>
> Your so-called dependency functions (e.g. doLoadDepMode) just seemed
> to be a batch load feature, there was no actual dependency handling.
> Every caller was required to list the dependencies for the classes it
> was loading.
>

I was referring to defining the dependencies in the module call ... ie
$j('target').addMediaWiz( config ) and having the addMediaWiz module map
out the dependencies in the javascript. doLoadDepMode just lets you get
around an IE bug: when inserting scripts via the DOM, you have no
guarantee that the scripts will execute in the order inserted. If you are
concatenating your scripts, doLoadDepMode would not be needed, as order
will be preserved in the concatenated file.

I like mapping out the dependencies in javascript at that module level
since it makes it easier to do custom things like read the passed in
configuration and decide which dependencies you need to fulfill. If not,
you have to define many dependency sets in php or have a much more
detailed model of your javascript inside php.

But I do understand that it will eventually result in lots of extra
javascript module definitions that the given installation may not want.
So perhaps we generate that module definition via php configuration ...
or we define the set of javascript files to include that define the
various module loaders we want with a given configuration.

This is sort of the approach taken with the wikiEditor, which has a few
thin javascript files that make calls to add modules (like add-sidebar)
to a core component (wikiEditor). That way the feature set can be
controlled by the php configuration while retaining runtime flexibility
for dependency mapping.

>> The idea is to move more and more of the structure of the application
>> into that system. so right now mwLoad is a global function but should be
>> re-factored into the jquery space and be called via $j.load();
>>
>
> That would work well until jQuery introduced its own script-loader
> plugin with the same name and some extension needed to use it.
>
>
>

That is part of the idea of centrally hosting reusable client-side
components so we control the jquery version and plugin set. So a new
version won't "come along" until it's been tested and integrated.

If the function does mediawiki-specific scriptloader load stuff then yea,
it should be called mwLoad or what not. If some other plugin or native
jquery piece comes along we can just have our plugin override it and/or
store the native as a parent (if it's of use) ... if that ever happens...


>> We could add that convention directly into the script-loader function if
>> desired so that on a per class level we include dependencies. Like
>> mwLoad('ui.dialog') would know to load ui.core etc.
>>
>
> Yes, that is what real dependency handling would do.
>

Thinking about this more ... I think it's a bad idea to put the
dependency mapping exclusively in php. It will be difficult to avoid
re-including the same things in client-side loading chains. Say you have
your suggest search system: once the user starts typing we load
jquery.suggest, and it knows via the dependency mapping stored in php
that it needs jquery.ui. It sends both ui and suggest to the client. Now
the user, in the same page instance, decides instead to edit a section.
The editTool script-loader gets called, and its dependencies also include
jquery.ui. How will the dependency-loader script-server know that the
client already has the jquery.ui dependency from the suggest tool?

In the end you need these dependencies mapped out in the JS so that the
client can intelligently request the script set it needs. In that same
example, if the dependencies were mapped out in js we could avoid
re-including jquery.ui.
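
A minimal sketch of that idea, assuming a hypothetical client-side map
and function name (dependencies, modulesToRequest) rather than anything
in the current mwEmbed code:

```javascript
// Hypothetical dependency map kept on the client. Object.create(null)
// avoids accidental hits on inherited keys such as "constructor".
var dependencies = Object.create(null);
dependencies['jquery.ui'] = [];
dependencies['jquery.suggest'] = ['jquery.ui'];
dependencies['editTool'] = ['jquery.ui'];

// Modules already requested in this page instance.
var loaded = Object.create(null);

// Return the modules still missing for `name`, dependencies first.
// Assumption: a module counts as loaded as soon as it is requested.
function modulesToRequest(name, out) {
    out = out || [];
    if (loaded[name]) {
        return out;
    }
    loaded[name] = true; // mark before recursing so cycles terminate
    var deps = dependencies[name] || [];
    for (var i = 0; i < deps.length; i++) {
        modulesToRequest(deps[i], out);
    }
    out.push(name);
    return out;
}

var first = modulesToRequest('jquery.suggest'); // user starts typing
var second = modulesToRequest('editTool');      // user then edits a section
```

Here the second request contains only editTool, because the client
already knows it asked for jquery.ui.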

Alternatively we can just put a crap load of js at the bottom of the
page to ensure php knew what could possibly be used for every possible
interface interaction chain of events... But the idea is it will be
better for page display performance not to try and predict all of that
... so it's better to store dependency mapping in javascript. I could
give a few more examples if that would be helpful.

>
>>> * The "class" abstraction as implemented in JS2 has very little value
>>> to PHP callers. It's just as easy to use filenames.
>>>
>> The idea with "class" abstraction is that you don't know what script set
>> you have available at any given time. Maybe one script included
>> ui.resizable and ui.move and now your script depends on ui.resizable
>> and ui.move and ui.drag... your loader call will only include ui.drag
>> (since the other are already defined).
>>
>
> I think you're missing the point. I'm saying it doesn't provide enough
> features. I want to add more, not take away some.
> You can remove duplicates by filename.
>

see the above example for why it will be difficult to remove duplicates
by file name if you're including dependency mappings that are not
visible to the js in your script includes.
> [...]
>
>> We want to move away from php code dependencies for each javascript
>> module. Javascript should just directly hit a single exposure point of
>> the mediawiki api. If we have php code generating bits and pieces of
>> javascript everywhere it quickly gets complicated, is difficult to
>> maintain, much more resource intensive, and requires a whole new
>> framework to work right.
>>
>> Php's integration with the javascript should be minimal. php should
>> supply configuration, and package in localized msgs.
>>
>
> I don't think it will be too complicated or resource intensive. JS
> generation in PHP is very flexible and you admit that there is a role
> for it. I don't think there's a problem with adding a few more
> features on the PHP side.
>
> If necessary, we can split it back out to a non-MediaWiki standalone
> mode by generating some static JS.
>

The nice thing about the way it's working right now is you can just turn
off the script-loader and the system continues to work ... you can build
a page that includes the js and it "works"

Having an export mode, scripts doing transformations, dependency
management output sounds complicated. I can imagine it ~sort of~
working... but it seems much easier to go the other way around.

> What is your reason for saying this? Have you worked on some other
> framework where integration of PHP and JavaScript has caused problems?
>

I am referring more to the php-javascript remoting type systems that
seem to try and capture one language's functionality inside a separate
language. There is inevitably leakage, and the complexity is rarely less
than that of a simple, clean separation of systems. (Not saying that
you're suggesting we go to that extreme, ie defining most javascript
classes and methods in php.)

... but trying to map dependencies in that space is a step in that
direction and will get complicated for application interactions that go
beyond the initial page display without adding more complexity on the
php side.

>
> There's a significant CPU cost to loading and parsing JS files on
> every PHP request. I want to remove that behaviour. Instead, we can
> list client-side files in PHP. Then from the PHP list, we can generate
> static JS files in order to recover the standalone functionality.
>
As mentioned above I think it would be easier to make the "export" thing
work the other way around. Ie instead of running a script to "export"
the static javascript, we code our javascript in a way that it works
stand alone to begin with and we "export" the information we want into
the php.

I agree that the present system of parsing the top of the javascript file
on every script-loader generation request is un-optimized. The idea is
that those script-loader generation calls happen rarely, but even so it
should be cached at any number of levels (ie checking the file
modification timestamp, writing out a php or serialized file ... or
storing it in any of the other cache levels we have available: memcached,
database, etc).

> [snip]
>
> Have you looked at the profiling? On the Wikimedia app servers, even
> the simplest MW request takes 23ms, and gen=js takes 46ms. A static
> file like wikibits.js takes around 0.5ms. And that's with APC. You say
> MW on small sites is OK, I think it's slow and resource-intensive.
>
> That's not to say I'm sold on the idea of a static file cache, it
> brings its own problems, which I listed.
>

yea... but almost all script-loader requests will be cached. It does not
need to check the DB or anything; it's just a key-file lookup (since
script-loader requests pass a request key, either it's there in cache or
it's not) ... it should be on par with the simplest MW request. Which is
substantially shorter than a round trip time for getting each script
individually, not to mention gzipping, which can't otherwise be easily
enabled for 3rd party installations.

>
> [...]
> The performance impact of refreshing a common file once every hour or
> two is not large. Your code sets the expiry time to a year, and
> changes the urid parameter regularly, which sounds great until you
> accidentally cache some buggy JS into squid and you have no way to
> reconstruct the URID parameters and thus purge the object. Then you'd
> be stuck with the choice of either waiting a month for all the
> referring HTML to expire, or clearing the entire squid cache.
>
...right... we would want to avoid lots of live hacks. But I think we
want to avoid lots of live hacks anyway. A serious javascript bug would
only affect the pages that were generated in those hours that the bug
was present, not the 30 days that you're characterizing as the lag time
of page generation.

Do you have stats on that?... it's surprising to me that pages are
re-generated that rarely... How do central notice campaigns work?

[...]
> Security takes precedence over performance. There are better ways to
> improve performance than to open up your code to systematic exploit by
> malicious parties.
>

I see.... I guess I rarely run into things being displayed that are not
A) your own input or B) run through the MediaWiki api ... But your
general point is valid. In theory this could come up. (even via api calls)

But... I think it would be just as easy if not easier to check for
"escape( val )" as for ".text( val )", which would be at the end of a
long chain of jquery calls. Or you could set variable values post DOM
insertion via .val() or .text(), also avoiding non-native dom construction.

I guess it really comes down to readability. I find tabbed html a bit
more readable than a long chain of jquery elements. But it may be that
people find the latter more readable... If so I can start heading in that
direction.
Performance wise I attached a quick test.. seems pretty fast on my
machine with a recent firefox build .. but older browsers / machines
might be slower...at any rate we should aim for speed, readability
and "security review" ;)

--michael


roan.kattouw at gmail

Sep 26, 2009, 2:05 AM

Post #19 of 32 (2041 views)
Permalink
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

2009/9/26 Michael Dale <mdale [at] wikimedia>:
> Performance wise I attached a quick test.. seems pretty fast on my machine
> with a recent firefox build .. but older browsers / machines might be
> slower...at any rate we should aim for speed, readability and
> "security review" ;)
>
This mailing list scrubs attachments.

Roan Kattouw (Catrope)



Simetrical+wikilist at gmail

Sep 27, 2009, 4:15 AM

Post #20 of 32 (2020 views)
Permalink
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

On Fri, Sep 25, 2009 at 9:55 PM, Michael Dale <mdale [at] wikimedia> wrote:
> ...right... we would want to avoid lots of live hacks. But I think we want
> to avoid lots of live hacks anyway. A serious javascript bug would only
> affect the pages that were generated in those hours that the bug was
> present, not the 30 days that you're characterizing as the lag time of
> page generation.
>
> Do you have stats on that?... its surprising to me that pages are
> re-generated that rarely... How do central notice campaigns work?

They insert the notice client-side using JavaScript. The HTML served
is thus always the same.



tstarling at wikimedia

Sep 27, 2009, 10:13 PM

Post #21 of 32 (2009 views)
Permalink
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

Michael Dale wrote:
> That is part of the idea of centrally hosting reusable client-side
> components so we control the jquery version and plugin set. So a
> new version won't "come along" until it's been tested and
> integrated.

You can't host every client-side component in the world in a
subdirectory of the MediaWiki core. Not everyone has commit access to
it. Nobody can hope to properly test every MediaWiki extension.

Most extension developers write an extension for a particular site,
and distribute their code as-is for the benefit of other users. They
have no interest in integration with the core. If they find some
jQuery plugin on the web that defines an interface that conflicts with
MediaWiki, say jQuery.load() but with different parameters, they're
not going to be impressed when you tell them that to make it work with
MediaWiki, they need to rewrite the plugin and get it tested and
integrated.

Different modules should have separate namespaces. This is a key
property of large, maintainable systems of code.
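
As a minimal sketch of what separate namespaces buy us: the object
"root" stands in for window so the snippet runs outside a browser, and
mw.scriptLoader is a hypothetical name, not an existing interface.

```javascript
// Stand-in for the browser's window object (an assumption so that
// the sketch is runnable without a DOM).
var root = {};

// A third-party plugin defines its own load() on the jQuery object.
root.jQuery = {
    load: function () {
        return 'third-party load';
    }
};

// MediaWiki code lives under its own global and never touches jQuery.load.
root.mw = root.mw || {};
root.mw.scriptLoader = {
    load: function (name) {
        return 'mw load: ' + name;
    }
};

var fromPlugin = root.jQuery.load();
var fromMw = root.mw.scriptLoader.load('ui.dialog');
```

Neither definition overrides the other, so an extension can drop in the
plugin unmodified.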

> The nice thing about the way it's working right now is you can just
> turn off the script-loader and the system continues to work ... you
> can build a page that includes the js and it "works"

The current system kind of works. It's not efficient or scalable and
it doesn't have many features.

> Having an export mode, scripts doing transformations, dependency
> management output sounds complicated. I can imagine it ~sort of~
> working... but it seems much easier to go the other way around.

Sometimes complexity is necessary in the course of achieving other
goals, such as performance, features, and ease of use for extension
developers.

> I agree that the present system of parsing the top of the JavaScript
> file on every script-loader generation request is unoptimized.
> (The idea is that those script-loader generation calls happen rarely,
> but even so the result should be cached at any number of levels, i.e.
> checking the file modification timestamp, writing out a PHP or
> serialized file, or storing it in any of the other cache levels
> we have available: memcached, database, etc.)

Actually it parses the whole of the JavaScript file, not the top, and
it does it on every request that invokes WebStart.php, not just on
mwScriptLoader.php requests. I'm talking about
jsAutoloadLocalClasses.php if that's not clear.

>> Have you looked at the profiling? On the Wikimedia app servers,
>> even the simplest MW request takes 23ms, and gen=js takes 46ms. A
>> static file like wikibits.js takes around 0.5ms. And that's with
>> APC. You say MW on small sites is OK, I think it's slow and
>> resource-intensive.
>>
>> That's not to say I'm sold on the idea of a static file cache, it
>> brings its own problems, which I listed.
>>
>
> Yeah... but almost all script-loader requests will be cached. It
> does not need to check the DB or anything; it's just a key-file
> lookup (since script-loader requests pass a request key, either it's
> there in the cache or it's not), so it should be on par with the
> simplest MW request. That is substantially shorter than the round-trip
> time for getting each script individually, not to mention gzipping,
> which can't otherwise be easily enabled for third-party installations.

I don't think that that comparison can be made so lightly. For the
server operator, CPU time is much more expensive than time spent
waiting for the network. And I'm not proposing that the client fetches
each script individually, I'm proposing that scripts be concatenated
and stored in a cache file which is then referenced directly in the HTML.
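
A minimal sketch of that idea, in Python for illustration only: the function name `build_bundle`, the `bundle-<hash>.js` naming scheme, and the `/script-cache` URL prefix are assumptions, not part of any existing MediaWiki interface.

```python
import hashlib
import os


def build_bundle(script_paths, cache_dir, url_prefix="/script-cache"):
    """Concatenate scripts into one cache file named after a hash of its
    content, and return the <script> tag that references it directly."""
    parts = []
    for path in script_paths:
        with open(path, encoding="utf-8") as f:
            parts.append(f.read())
    # The separator guards against a file that ends without a newline
    # or a trailing semicolon.
    bundle = "\n;".join(parts)
    digest = hashlib.sha1(bundle.encode("utf-8")).hexdigest()[:12]
    name = f"bundle-{digest}.js"
    target = os.path.join(cache_dir, name)
    if not os.path.exists(target):
        os.makedirs(cache_dir, exist_ok=True)
        with open(target, "w", encoding="utf-8") as f:
            f.write(bundle)
    return f'<script src="{url_prefix}/{name}"></script>'
```

Because the file name depends on the content, a changed script produces a new URL, so the web server can serve the file statically without running any application code.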

I'm aware of the gzip issue, I mentioned it in my original post.

> ...right... we would want to avoid lots of live hacks. But I think
> we want to avoid lots of live hacks anyway. A serious JavaScript
> bug would only affect the pages that were generated in those hours
> that the bug was present, not the 30 days that you're
> characterizing as the lag time of page generation.

Bugs don't only come from live hacks. Most bugs come to the site from
the developers who wrote the code in the first place, via subversion.

> Do you have stats on that?... it's surprising to me that pages are
> re-generated that rarely... How do central notice campaigns work?

$wgSquidMaxage is set to 31 days (2678400 seconds) for all wikis
except wikimediafoundation.org. It's necessary to have a very long
expiry time in order to fill the caches and achieve a high hit rate,
because Wikimedia's access pattern is very broad, with the "long tail"
dominating the request rate.
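
The figure in seconds can be checked with a line of arithmetic. The sketch below is Python for illustration; the `cache_control` helper and the choice to show only the `s-maxage` directive are assumptions, not a description of the exact header MediaWiki sends.

```python
SECONDS_PER_DAY = 24 * 60 * 60        # 86400
squid_maxage = 31 * SECONDS_PER_DAY   # 31 days expressed in seconds


def cache_control(max_age):
    """Format a lifetime for shared caches as a Cache-Control header line."""
    return f"Cache-Control: s-maxage={max_age}"
```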

The CentralNotice extension was created to overcome this problem and
display short-lived messages. Aryeh described how it works.

-- Tim Starling




tstarling at wikimedia

Sep 27, 2009, 10:21 PM

Post #22 of 32 (1998 views)
Permalink
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

Here's what I'm taking out of this thread:

* Platonides mentions the case of power-users with tens of scripts loaded via
gadgets or user JS with importScript().
* Tisza asks that core onload hooks and other functions be overridable by user JS.
* Trevor and Michael both mention i18n as an important consideration which I
have not discussed.
* Michael wants certain components in the js2 directory to be usable as
standalone client-side libraries, which operate without MediaWiki or any other
server-side application.

-- Tim Starling




mdale at wikimedia

Sep 28, 2009, 11:33 AM

Post #23 of 32 (1999 views)
Permalink
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

Tim Starling wrote:
> Michael Dale wrote:
>
>> That is part of the idea of centrally hosting reusable client-side
>> components so we control the jquery version and plugin set. So a
>> new version won't "come along" until it's been tested and
>> integrated.
>>
>
> You can't host every client-side component in the world in a
> subdirectory of the MediaWiki core. Not everyone has commit access to
> it. Nobody can hope to properly test every MediaWiki extension.
>
> Most extension developers write an extension for a particular site,
> and distribute their code as-is for the benefit of other users. They
> have no interest in integration with the core. If they find some
> jQuery plugin on the web that defines an interface that conflicts with
> MediaWiki, say jQuery.load() but with different parameters, they're
> not going to be impressed when you tell them that to make it work with
> MediaWiki, they need to rewrite the plugin and get it tested and
> integrated.
>
> Different modules should have separate namespaces. This is a key
> property of large, maintainable systems of code.
>

Right... I agree the client-side code needs to be more modularly deployable.

If designing a given component as a jQuery plug-in, then I think it
makes sense to put it in the jQuery namespace ... otherwise you won't be
able to reference jQuery things in a predictable way. Alternatively you


>> I agree that the present system of parsing the top of the JavaScript
>> file on every script-loader generation request is unoptimized.
>> (The idea is that those script-loader generation calls happen rarely,
>> but even so the result should be cached at any number of levels, i.e.
>> checking the file modification timestamp, writing out a PHP or
>> serialized file, or storing it in any of the other cache levels
>> we have available: memcached, database, etc.)
>>
>
> Actually it parses the whole of the JavaScript file, not the top, and
> it does it on every request that invokes WebStart.php, not just on
> mwScriptLoader.php requests. I'm talking about
> jsAutoloadLocalClasses.php if that's not clear.
>
Ah right... previously I had it in PHP. I wanted to avoid listing it
twice, but obviously that's a pretty costly way to do that.
It will make more sense to put it in PHP if we start splitting up
components into the extension folders and generating the path list
dynamically for a given feature set.

>>> Have you looked at the profiling? On the Wikimedia app servers,
>>> even the simplest MW request takes 23ms, and gen=js takes 46ms. A
>>> static file like wikibits.js takes around 0.5ms. And that's with
>>> APC. You say MW on small sites is OK, I think it's slow and
>>> resource-intensive.
>>>
>>> That's not to say I'm sold on the idea of a static file cache, it
>>> brings its own problems, which I listed.
>>>
>>>
>> Yeah... but almost all script-loader requests will be cached. It
>> does not need to check the DB or anything; it's just a key-file
>> lookup (since script-loader requests pass a request key, either it's
>> there in the cache or it's not), so it should be on par with the
>> simplest MW request. That is substantially shorter than the round-trip
>> time for getting each script individually, not to mention gzipping,
>> which can't otherwise be easily enabled for third-party installations.
>>
>
> I don't think that that comparison can be made so lightly. For the
> server operator, CPU time is much more expensive than time spent
> waiting for the network. And I'm not proposing that the client fetches
> each script individually, I'm proposing that scripts be concatenated
> and stored in a cache file which is then referenced directly in the HTML.
>

I understand. We could even check gzip support at page output time
and point to the gzipped cached versions (analogous to making direct
links to the /script-cache folder of the present script-loader
setup).

My main question is how will this work for dynamic groups of scripts set
post page load that are dictated by user interaction or client state?

It's not as easy to set up static combined output files to point to when
you don't know what set of scripts you will be requesting ahead of time.

> $wgSquidMaxage is set to 31 days (2678400 seconds) for all wikis
> except wikimediafoundation.org. It's necessary to have a very long
> expiry time in order to fill the caches and achieve a high hit rate,
> because Wikimedia's access pattern is very broad, with the "long tail"
> dominating the request rate.
>
Okay... so to preserve a high cache hit rate you could then have a single
static file that lists versions of JS with a low expiry and the rest
with a high expiry? Or maybe it's so cheap to serve static files that it
does not matter, and we just leave everything with a low expiry?

--michael




brion at wikimedia

Sep 28, 2009, 11:58 AM

Post #24 of 32 (1996 views)
Permalink
Re: JS2 design (was Re: Working towards branching MediaWiki 1.16) [In reply to]

On 9/27/09 4:15 AM, Aryeh Gregor wrote:
> On Fri, Sep 25, 2009 at 9:55 PM, Michael Dale<mdale [at] wikimedia> wrote:
>> ...right... we would want to avoid lots of live hacks. But I think we want
>> to avoid lots of live hacks anyway. A serious JavaScript bug would only
>> affect the pages that were generated in those hours that the bug was
>> present, not the 30 days that you're characterizing as the lag time of page
>> generation.
>>
>> Do you have stats on that?... it's surprising to me that pages are
>> re-generated that rarely... How do central notice campaigns work?
>
> They insert the notice client-side using JavaScript. The HTML served
> is thus always the same.

Yeah, it's kind of tricky to do right; but if you can keep the loader
consistent and compatible, and have predictable expirations on the JS,
such things can work pretty reliably.

-- brion



mdale at wikimedia

Sep 28, 2009, 12:21 PM

Post #25 of 32 (1984 views)
Permalink
Re: JS2 design (Read this Not Previous) [In reply to]

~ d'oh ~ Disregard the previous message; a bad keystroke sent it rather than saving it as a draft.

Tim Starling wrote:
> Michael Dale wrote:
>
>> That is part of the idea of centrally hosting reusable client-side
>> components so we control the jquery version and plugin set. So a
>> new version won't "come along" until it's been tested and
>> integrated.
>>
>
> You can't host every client-side component in the world in a
> subdirectory of the MediaWiki core. Not everyone has commit access to
> it. Nobody can hope to properly test every MediaWiki extension.
>
> Most extension developers write an extension for a particular site,
> and distribute their code as-is for the benefit of other users. They
> have no interest in integration with the core. If they find some
> jQuery plugin on the web that defines an interface that conflicts with
> MediaWiki, say jQuery.load() but with different parameters, they're
> not going to be impressed when you tell them that to make it work with
> MediaWiki, they need to rewrite the plugin and get it tested and
> integrated.
>
> Different modules should have separate namespaces. This is a key
> property of large, maintainable systems of code.
>

Right... I agree the client-side code needs to be more modularly deployable.
It's just tricky to manage all those relationships in PHP, but it appears
it will be necessary to do so...

If designing a given component as a jQuery plug-in, then I think it
makes sense to put it in the jQuery namespace ... otherwise you won't be
able to reference jQuery things locally in a no-conflict-compatible way.
Unless we create an mw wrapper of some sort, but I don't know how
necessary that is at the moment... I guess it would be slightly cleaner.


>> I agree that the present system of parsing the top of the JavaScript
>> file on every script-loader generation request is unoptimized.
>> (The idea is that those script-loader generation calls happen rarely,
>> but even so the result should be cached at any number of levels, i.e.
>> checking the file modification timestamp, writing out a PHP or
>> serialized file, or storing it in any of the other cache levels
>> we have available: memcached, database, etc.)
>>
>
> Actually it parses the whole of the JavaScript file, not the top, and
> it does it on every request that invokes WebStart.php, not just on
> mwScriptLoader.php requests. I'm talking about
> jsAutoloadLocalClasses.php if that's not clear.
>
Ah right... previously I had it in PHP. I wanted to avoid listing it
twice, but obviously that's a pretty costly way to do that.
It will make more sense to put it in PHP if we start splitting up
components into the extension folders and generating the path list
dynamically for a given feature set.

>>> Have you looked at the profiling? On the Wikimedia app servers,
>>> even the simplest MW request takes 23ms, and gen=js takes 46ms. A
>>> static file like wikibits.js takes around 0.5ms. And that's with
>>> APC. You say MW on small sites is OK, I think it's slow and
>>> resource-intensive.
>>>
>>> That's not to say I'm sold on the idea of a static file cache, it
>>> brings its own problems, which I listed.
>>>
>>>
>> Yeah... but almost all script-loader requests will be cached. It
>> does not need to check the DB or anything; it's just a key-file
>> lookup (since script-loader requests pass a request key, either it's
>> there in the cache or it's not), so it should be on par with the
>> simplest MW request. That is substantially shorter than the round-trip
>> time for getting each script individually, not to mention gzipping,
>> which can't otherwise be easily enabled for third-party installations.
>>
>
> I don't think that that comparison can be made so lightly. For the
> server operator, CPU time is much more expensive than time spent
> waiting for the network. And I'm not proposing that the client fetches
> each script individually, I'm proposing that scripts be concatenated
> and stored in a cache file which is then referenced directly in the HTML.
>

I understand. (It's analogous to making direct links to the /script-cache
folder instead of requesting the files through the script-loader entry
point.)

My main question is how will this work for dynamic groups of scripts set
post page load that are dictated by user interaction or client state?

Do we just ignore this possibility and grab any necessary module components
based on pre-defined module sets in PHP that get passed down to JavaScript?

It's not as easy to set up static combined output files to point to when
you don't know what set of scripts you will be requesting...

Hmm... if we had a predictable key format we could do a request for the
static file. If we get a 404, then we make a dynamic request to
generate the static file? Subsequent interactions would hit that
static file? That seems ugly though.
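
The static-first, generate-on-miss flow can be sketched roughly like this. It is Python for illustration only, with the file system standing in for the HTTP round trip; `bundle_key`, `fetch_bundle`, and the `generate` callback are hypothetical names, and the sorted, de-duplicated key format is just one possible "predictable key".

```python
import hashlib
import os


def bundle_key(module_names):
    """Derive the same key for the same set of modules, in any order."""
    canonical = ",".join(sorted(set(module_names)))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:12]


def fetch_bundle(module_names, cache_dir, generate):
    """Return (source, was_generated). Try the static file first; on a miss
    (the 404 case), call generate() and store the result for later requests."""
    path = os.path.join(cache_dir, bundle_key(module_names) + ".js")
    try:
        with open(path, encoding="utf-8") as f:
            return f.read(), False
    except FileNotFoundError:
        source = generate(sorted(set(module_names)))
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(source)
        return source, True
```

Only the first request for a given set of modules pays the generation cost; every later request with the same set, in any order, maps to the same file.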

> $wgSquidMaxage is set to 31 days (2678400 seconds) for all wikis
> except wikimediafoundation.org. It's necessary to have a very long
> expiry time in order to fill the caches and achieve a high hit rate,
> because Wikimedia's access pattern is very broad, with the "long tail"
> dominating the request rate.
>
Okay... so to preserve a high cache hit rate you could have a single static
file that lists versions of JS with a low expiry and the rest with a high
expiry? Or maybe it's so cheap to serve static files that it does not
matter, and we just leave everything with a low expiry?
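
One way to picture the first option, as a Python sketch under stated assumptions: the manifest path, the 300-second short expiry, the URL layout, and all function names are hypothetical; only the 2678400-second figure comes from the thread.

```python
def versioned_url(name, version, prefix="/script-cache"):
    """Embed the version in the URL so each release gets a new address."""
    return f"{prefix}/{name}.{version}.js"


def max_age_for(path, manifest_path="/script-cache/manifest.json",
                short_ttl=300, long_ttl=2678400):
    """Short expiry for the version manifest, long expiry for everything else."""
    return short_ttl if path == manifest_path else long_ttl


def resolve(manifest, name):
    """Look up the current URL for a script from the manifest mapping."""
    return versioned_url(name, manifest[name])
```

Since a versioned file never changes under its URL, it can safely keep the long expiry; only the small manifest has to be refetched often.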

--michael


