
Mailing List Archive: Wikipedia: Wikitech

Lua: parser interface



tstarling at wikimedia

May 1, 2012, 12:15 AM

Lua: parser interface

I've written up a proposed interface between the MediaWiki parser and Lua:

<https://www.mediawiki.org/wiki/Extension:Scribunto/Parser_interface_design>

In summary: the Lua function is called with a single argument, which
is an object representing the parser interface. The object is roughly
equivalent to a PPFrame.

The object would have a property called "args", which is a table with
its "index" metamethod overridden to provide lazy-initialised access
to the parser function arguments with a brief syntax:

{{#invoke:module|func|name=value}}

function p.func(frame)
    return frame.args.name --- returns "value"
end
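The lazy-initialised args behaviour described above can be sketched in plain Lua (an illustrative mock, not the actual Scribunto code; makeFrame and the expand callback are hypothetical names):

```lua
-- A frame whose args table expands each argument only on first access,
-- caching the result, via an __index metamethod.
local function makeFrame(rawArgs, expand)
    local cache = {}
    local args = setmetatable({}, {
        __index = function(t, name)
            if cache[name] == nil and rawArgs[name] ~= nil then
                cache[name] = expand(rawArgs[name]) -- expanded on first access only
            end
            return cache[name]
        end
    })
    return { args = args }
end

-- Usage: expansion runs only for arguments actually read.
local expansions = 0
local frame = makeFrame(
    { name = '{{value}}' },
    function(s) expansions = expansions + 1 return (s:gsub('[{}]', '')) end
)
assert(frame.args.name == 'value')   -- triggers one expansion
assert(frame.args.missing == nil)    -- absent argument, no expansion
assert(expansions == 1)
```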

There would be two methods for recursive preprocessing:

* preprocess() provides basic expansion of wikitext
* callTemplate() provides an API for template invocation, since I
imagine that would otherwise be a common use case for preprocess().
Using preprocess() to expand a template with arbitrary arguments would
be difficult.
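A tiny illustration of why expanding a template with arbitrary arguments via preprocess() is awkward: an argument value containing wikitext syntax characters corrupts a hand-built call string, whereas a structured call would pass it through untouched (callTemplate is just the proposed name here; the snippet only demonstrates the string-building hazard):

```lua
-- An argument value that happens to contain wikitext syntax characters:
local value = 'a|b=c'

-- Hand-building a preprocess() input mangles it: the '|' now reads as an
-- argument separator and the '=' as a name/value delimiter.
local naive = '{{tpl|name=' .. value .. '}}'
assert(naive == '{{tpl|name=a|b=c}}') -- would be parsed as name='a' plus b='c'

-- The proposed callTemplate() would instead take structured arguments:
--   frame:callTemplate{ title = 'tpl', args = { name = value } }
```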

Like a normal parser function, the Lua function returns text which is
not modified any further by the preprocessor.

Please see the wiki page for a more detailed description, including
rationale.

Any comments would be greatly appreciated.

-- Tim Starling


_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


vasilvv at gmail

May 1, 2012, 10:24 AM

Re: Lua: parser interface

Thank you for bringing these issues up for public discussion, Tim;
they are well worth it.

On Tue, May 1, 2012 at 11:15 AM, Tim Starling <tstarling [at] wikimedia> wrote:
> I've written up a proposed interface between the MediaWiki parser and Lua:
>
> <https://www.mediawiki.org/wiki/Extension:Scribunto/Parser_interface_design>
>
> In summary: the Lua function is called with a single argument, which
> is an object representing the parser interface. The object is roughly
> equivalent to a PPFrame.
>
> The object would have a property called "args", which is a table with
> its "index" metamethod overridden to provide lazy-initialised access
> to the parser function arguments with a brief syntax:
>
> {{#invoke:module|func|name=value}}
>
> function p.func(frame)
>   return frame.args.name --- returns "value"
> end

I like this part. Also, I really like the idea of giving the script a
separate parser frame instead of running it in the parent template's
frame.

I am a bit leery, though, about the part where you suggest that
name-value arguments ({{#invoke:module|func|param=value}}) should be
parsed by the engine, not the script. Don't you have to expand those
arguments in order to parse them, making any form of lazy expansion
impossible?

> There would be two methods for recursive preprocessing:
>
> * preprocess() provides basic expansion of wikitext
> * callTemplate() provides an API for template invocation, since I
> imagine that would otherwise be a common use case for preprocess().
> Using preprocess() to expand a template with arbitrary arguments would
> be difficult.
>
> Like a normal parser function, the Lua function returns text which is
> not modified any further by the preprocessor.

This is the part which I strongly oppose. Providing direct
preprocessor access to Lua scripts is a bad idea. There are two key
reasons for this:
1. The preprocessor is slow.
2. You would have to work out many very subtle issues with timeouts
and nested Lua scripts. This includes timeout subtleties caused by the
preprocessor's slowness (load a slow template and, given the small Lua
time limit, PHP will show a fatal error due to the emergency timeout;
even if you fix that, the standalone version uses ulimit, which may be
more difficult to fix).

Now, let me go through your suggested use cases and propose some alternatives:

1. As an alternative to a string literal, to include snippets of
wikitext which are intended to be editable by people who don't know
Lua.
I think it would in fact be better if you provided an interface for
getting unprocessed wikitext, or a preprocessor DOM. Preprocessed text
makes it difficult to combine human-readable and machine-readable
versions.

2. During migration, to call complex metatemplates which have not yet
been ported to Lua, or to test migrated components independently
instead of migrating all at once.
That would eventually lead to them becoming permanent. Bugzilla quips,
an authoritative reference on Wikimedia practices, says that
"temporary solutions have a terrible habit of becoming permanent,
around here". Hence I would suggest that we avoid the temptation in
the first place.

3. To provide access to miscellaneous parser functions and variables.
Now, this is a really bad idea. It makes a scary hack an official way
of doing things, and it defies the first design principle you state.
preprocess( "{{FULLPAGENAME}}" ) is not only much uglier than an
appropriate API like mw.page.name(), it is also one of the slowest
ways to do this. I have benchmarked it, and it is actually ~450 times
slower than accessing the title object directly.
Lua was (and is) meant to improve the readability of templates, not to
clutter them with stuff like articlesNum = tonumber( preprocess(
"{{NUMBEROFARTICLES:R}}" ) ).
Solution: a proper API would do the job (in fact, I am currently working on one).

4. To allow Lua to construct tag invocations, such as <ref> and <gallery>.
We could make a #tag-like function to do this, just as we do with
parser functions.

I am much more comfortable with the original return {expand =
true} idea, which causes the wikitext to be expanded in the new
Scribunto call frame.

> Please see the wiki page for a more detailed description, including
> rationale.

Thank you for writing such a detailed description.

I am a bit puzzled about the "always use named arguments scheme" part,
because it is not how the standard Lua library works.

I guess that's all my concerns for now.

Thanks,
Victor.



wicke at wikidev

May 1, 2012, 11:51 AM

Re: Lua: parser interface

On 05/01/2012 09:15 AM, Tim Starling wrote:
> In summary: the Lua function is called with a single argument, which
> is an object representing the parser interface. The object is roughly
> equivalent to a PPFrame.

+1 for the abstract frame object.

> The object would have a property called "args", which is a table with
> its "index" metamethod overridden to provide lazy-initialised access
> to the parser function arguments with a brief syntax:
>
> {{#invoke:module|func|name=value}}
>
> function p.func(frame)
> return frame.args.name --- returns "value"
> end
>
> There would be two methods for recursive preprocessing:
>
> * preprocess() provides basic expansion of wikitext

An alternative to a wikitext-specific preprocess() method and plain-text
argument values could be a conversion / expansion method on an opaque
'parser value' object:

frame.args.name.expandTo( 'text/x-mediawiki' ) --- returns "value"

This would make it possible to work with other formats apart from wikitext.

I recently added an API like this in Parsoid (the method is called 'as'
there), and liked the way that worked out for parser functions. I am
currently using the 'text/plain' type to retrieve a text expansion with
comments etc stripped, and 'tokens/x-mediawiki' for expanded tokens
(~list of tags and strings). Additional formats can be supported without
a proliferation of methods. Each value object has a reference to its
frame, and can be passed around and eventually lazily expanded
elsewhere. Expansion results can be cached inside the value object and
shared between multiple use sites (the value is associated with a single
frame after all).

The Parsoid .as method additionally takes a callback argument to support
asynchronous expansions. This might be too complex for user-friendly Lua
scripting, but could still be something worth considering in the longer
term. It could be added as a separate 'expandToAsync' method.

The conversion of wikitext or other formats to an opaque value object
could be achieved using an object constructor:

--- 'value text' is parsed lazily
ParserValue( 'text/x-mediawiki', 'value text', frame )

The frame might be the passed-in parent frame, or a custom one
constructed with args assembled from other ParserValues.

Calls to existing templates could be supported with a convenient
TemplateParserValue constructor, which does not specify how a template
call is represented internally.

TemplateParserValue( 'tpl', args ).expandTo( 'text/plain' )

Finally, a ParserValue (or a list of those) could be used for the return
type of functions to support output formats other than plain text.

Overall, I would love to keep the access to values as opaque as possible
to enable back-end optimizations and lazy expansions with sharing.
Opening a path towards content representations other than plain
(wiki-)text such as tokens, an AST or a DOM tree should be very useful
for future parser development.

Gabriel




tstarling at wikimedia

May 1, 2012, 5:21 PM

Re: Lua: parser interface

On 02/05/12 03:24, Victor Vasiliev wrote:
> I am a bit leery though about the part where you suggest that
> name-value arguments ({{#invoke:module|func|param=value}}) should be
> parsed by engine, not the script. Don't you have to expand those
> arguments in order to parse them, hence making any form of
> lazy-expanding impossible?

No, you don't have to expand the arguments in order to extract equals
signs for name/value pairs. The equals signs are already identified by
the preprocessor's parser, for the purposes of lazy expansion of
template arguments. See PPFrame::newChild() and the implementation of
the #switch parser function.
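The point can be illustrated with a much-simplified sketch: the name/value split only needs the position of the '=' in the raw argument text, so the value itself can stay unexpanded (the real preprocessor of course also has to ignore '=' inside nested constructs, which this toy code does not):

```lua
-- Raw, unexpanded argument text as the preprocessor's parser sees it:
local raw = 'caption={{heavy|meta|template}}'

-- Locate the '=' without touching the value (plain find, no patterns):
local eq = raw:find('=', 1, true)
local name  = raw:sub(1, eq - 1)
local value = raw:sub(eq + 1)

assert(name == 'caption')
assert(value == '{{heavy|meta|template}}') -- still unexpanded
```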

[...]
> This is the part which I strongly oppose. Providing direct
> preprocessor access to Lua scripts is a bad idea. There are two key
> reasons for this:
> 1. Preprocessor is slow.

We can limit the input size, or temporarily reduce the general parser
limits like post-expand include size and node count. We can also hook
into PPFrame::expand() to periodically check for a Lua timeout, if
that is necessary.

The preprocessor is slow now, it won't become slower by allowing Lua
to call it.

> 2. You would have to work out many very subtle issues with time out
> and nested Lua scripts. This includes timeout subtleties caused by the
> preprocessor slowness (load a slow template, and given the small Lua
> time limit, it will cause PHP to show a fatal error due to emergency
> timeout; even if you fix it, the standalone version uses ulimit, and
> it may be more difficult to fix).

The scenario you give in brackets will not happen. If a Lua timeout
occurs when the parser is executing, the Lua script will terminate
when the parser returns control to it. The timeout is not missed.

It doesn't matter if there are several levels of parser/Lua recursion
when a timeout occurs. LuaSandbox is able to unwind the stack efficiently.

The emergency timeout mechanism is functionally equivalent to PHP's
request timeout, so the emergency timeout can probably just be
infinite, and we can rely on the request timeout to terminate
long-running parse requests, as we do now. We could have a Lua script
time limit of a few seconds, and a request timeout of 3 minutes.

> Now, let me go through your suggested use cases and propose some alternatives:
>
> 1. As an alternative to a string literal, to include snippets of
> wikitext which are intended to be editable by people who don't know
> Lua.
> I think it would be in fact better if you provided an interface for
> getting unprocessed wikitext. Or a preprocessor DOM. Preprocessed text
> makes it is difficult to combine human-readable and machine-readable
> versions.

Maybe you are thinking of some sort of virtual wikidata system
involving extracting little snippets of text from infobox invocations
or something. I am not. I would rather use the real wikidata for that.

I am talking about including large, wikitext-formatted chunks of
content language.

> 2. During migration, to call complex metatemplates which have not yet
> been ported to Lua, or to test migrated components independently
> instead of migrating all at once.
> That would eventually lead them to becoming permanent. Bugzilla quips,
> an authoritative reference on Wikimedia practices, says that
> "temporary solutions have a terrible habit of becoming permanent,
> around here". Hence I would suggest that we avoid the temptation in
> first place.

I don't think it's morally wrong to provide a migration tool.
Migration will be a huge task, and will continue for years. People who
migrate metatemplates to Lua will need lots of tools.

> 3. To provide access to miscellaneous parser functions and variables.
> Now, this is a really bad idea. It is like making a scary hack an
> official way to do things. It actually defies the first design
> principle you state. preprocess( "{{FULLPAGENAME}}" ) is not only much
> more uglier than using appropriate API like mw.page.name(), it is also
> a one of the slowest ways to do this. I have benchmarked it, and it is
> actually ~450 times slower than accessing the title object directly.
> Lua was (and is) meant to improve the readability of templates, not to
> clutter them with stuff like articlesNum = tonumber( preprocess(
> "{{NUMBEROFARTICLES:R}}" ) ).
> Solution: proper API would do the job (actually I am currently working on it).

We can provide an API for such things at some point in the future. I
am not very keen on just merging whatever interface you are privately
working on, without any public review.

I am publishing my proposed interface before I write the code for it,
so that I can respond to the comments on it without appearing to be
too invested in any given solution. I wish that you would occasionally
do the same. Rewriting code that you've spent many hours on can be
emotionally difficult. Perhaps that's why you've made no more changes
to ustring.c despite the problems with its interface.

> 4. To allow Lua to construct tag invocations, such as <ref> and <gallery>.
> We could make a #tag-like function to do this, just as we do with
> parser functions.
>
> I feel myself much more comfortable with the original return {expand =
> true} idea, which causes the wikitext to be expanded in the new
> Scribunto call frame.

That would lead to double-expansion in cases where text derived from
input arguments needs to be concatenated with wikitext to be expanded.
Consider:

return {
    expand = true,
    text = formatHeader( frame.args.gallery_header ) .. '\n' ..
        '<gallery>' .. images .. '</gallery>' }

> I am a bit puzzled about the "always use named arguments scheme" part,
> because it is not how the standard Lua library works.

It gives flexibility for future development. That was not a core
principle driving the design of the standard Lua library.

-- Tim Starling




vasilvv at gmail

May 1, 2012, 6:28 PM

Re: Lua: parser interface

On Wed, May 2, 2012 at 4:21 AM, Tim Starling <tstarling [at] wikimedia> wrote:
> We can limit the input size, or temporarily reduce the general parser
> limits like post-expand include size and node count. We can also hook
> into PPFrame::expand() to periodically check for a Lua timeout, if
> that is necessary.
>
> The preprocessor is slow now, it won't become slower by allowing Lua
> to call it.

What I meant is that one of the goals of the Lua project is to improve
the performance of the template system, and by invoking the
preprocessor you slow it down because of parser overhead.

>> 2. You would have to work out many very subtle issues with time out
>> and nested Lua scripts. This includes timeout subtleties caused by the
>> preprocessor slowness (load a slow template, and given the small Lua
>> time limit, it will cause PHP to show a fatal error due to emergency
>> timeout; even if you fix it, the standalone version uses ulimit, and
>> it may be more difficult to fix).
>
> The scenario you give in brackets will not happen. If a Lua timeout
> occurs when the parser is executing, the Lua script will terminate
> when the parser returns control to it. The timeout is not missed.

But would the parser's running time still be counted against the normal Lua time limit?

> It doesn't matter if there are several levels of parser/Lua recursion
> when a timeout occurs. LuaSandbox is able to unwind the stack efficiently.

What I meant is that it should handle the time limit correctly and
avoid things like double-counting time because of nested scripts.

[...]

>> 1. As an alternative to a string literal, to include snippets of
>> wikitext which are intended to be editable by people who don't know
>> Lua.
>> I think it would be in fact better if you provided an interface for
>> getting unprocessed wikitext. Or a preprocessor DOM. Preprocessed text
>> makes it is difficult to combine human-readable and machine-readable
>> versions.
>
> Maybe you are thinking of some sort of virtual wikidata system
> involving extracting little snippets of text from infobox invocations
> or something. I am not. I would rather use the real wikidata for that.

I am talking about the usual situation around here where the same data
(say, the list of TFAs) is displayed in a variety of ways across the wiki.

> I am talking about including large, wikitext-formatted chunks of
> content language.

Well, then you can just dump its content into the output and tell the
parser to expand it.

>> 2. During migration, to call complex metatemplates which have not yet
>> been ported to Lua, or to test migrated components independently
>> instead of migrating all at once.
>> That would eventually lead them to becoming permanent. Bugzilla quips,
>> an authoritative reference on Wikimedia practices, says that
>> "temporary solutions have a terrible habit of becoming permanent,
>> around here". Hence I would suggest that we avoid the temptation in
>> first place.
>
> I don't think it's morally wrong to provide a migration tool.
> Migration will be a huge task, and will continue for years. People who
> migrate metatemplates to Lua will need lots of tools.

Agreed.

(though I am still skeptical about preprocess() and believe there
may be pitfalls with it that we are not currently seeing)

>> 3. To provide access to miscellaneous parser functions and variables.
>> Now, this is a really bad idea. It is like making a scary hack an
>> official way to do things. It actually defies the first design
>> principle you state. preprocess( "{{FULLPAGENAME}}" ) is not only much
>> more uglier than using appropriate API like mw.page.name(), it is also
>> a one of the slowest ways to do this. I have benchmarked it, and it is
>> actually ~450 times slower than accessing the title object directly.
>> Lua was (and is) meant to improve the readability of templates, not to
>> clutter them with stuff like articlesNum = tonumber( preprocess(
>> "{{NUMBEROFARTICLES:R}}" ) ).
>> Solution: proper API would do the job (actually I am currently working on it).
>
> We can provide an API for such things at some point in the future. I
> am not very keen on just merging whatever interface you are privately
> working on, without any public review.

Neither am I.

> I am publishing my proposed interface before I write the code for it,
> so that I can respond to the comments on it without appearing to be
> too invested in any given solution. I wish that you would occasionally
> do the same.

By "working" I meant prototyping the API with some demo functions and
writing a proposed API description for public review.

> Rewriting code that you've spent many hours on can be
> emotionally difficult. Perhaps that's why you've made no more changes
> to ustring.c despite the problems with its interface.

Work on ustring.c is on hold because of design issues with the
pure-Lua implementation. I will probably include it in an API
proposal and discuss it together with the other API issues.

>> 4. To allow Lua to construct tag invocations, such as <ref> and <gallery>.
>> We could make a #tag-like function to do this, just as we do with
>> parser functions.
>>
>> I feel myself much more comfortable with the original return {expand =
>> true} idea, which causes the wikitext to be expanded in the new
>> Scribunto call frame.
>
> That would lead to double-expansion in cases where text derived from
> input arguments need to be concatenated with wikitext to be expanded.
> Consider:
>
> return {
>   expand = true,
>   text = formatHeader( frame.args.gallery_header ) .. '\n' ..
>      '<gallery>' .. images .. '</gallery>' }

formatHeader( "{{{gallery_header}}}" )?

>> I am a bit puzzled about the "always use named arguments scheme" part,
>> because it is not how the standard Lua library works.
>
> It gives flexibility for future development. That was not a core
> principle driving the design of the standard Lua library.

Agreed.

Thanks for the detailed response,
Victor.



tstarling at wikimedia

May 1, 2012, 7:01 PM

Re: Lua: parser interface

On 02/05/12 04:51, Gabriel Wicke wrote:
> frame.args.name.expandTo( 'text/x-mediawiki' ) --- returns "value"
>
> This would make it possible to work with other formats apart from wikitext.

I can see how that would make sense when you're writing a parser, but
given the target audience for the Lua API, I think I would prefer to
provide an abbreviated interface.

How about frame.args.name as an abbreviation for
frame:getArgument('name'):expandTo( 'text/x-mediawiki' ) ?

And how about frame.plainArgs.name as an abbreviation for
frame:getArgument('name'):expandTo( 'text/plain' ) ?

> I recently added an API like this in Parsoid (the method is called 'as'
> there), and liked the way that worked out for parser functions. I am
> currently using the 'text/plain' type to retrieve a text expansion with
> comments etc stripped, and 'tokens/x-mediawiki' for expanded tokens
> (~list of tags and strings).

I know you're not really asking for a review of Parsoid and its
interfaces, but I worry about whether your use of text/plain to
indicate wikitext with comments stripped is appropriate.

In MediaWiki, PPFrame::expand() takes any combination of 5 boolean
flags, and modifies its behaviour based on which of the Parser's 4
output types is selected, so if you were to match it for flexibility,
you would need 128 MIME types.

If I were to provide Lua with a richer interface to PPFrame::expand(),
I would be inclined to support at least some of those flags via named
options, rather than rolling them up into a single string parameter.
So instead of expandTo( 'text/plain' ) we might have:

frame:getArgument('name'):expand{
    expand_args = false,
    expand_templates = false,
    respect_noinclude = false,
    strip_comments = true }

Or, if forwards-compatibility requires that we don't support so many
orthogonal options, some of the options could be rolled in together.
The preceding could perhaps be written as:

frame:getArgument('name'):expand{ plain = true }

That doesn't preclude the use of overrides:

frame:getArgument('name'):expand{
    plain = true,
    strip_comments = false }

But it does seem like a can of worms. How about providing
getArgument(), which will return an opaque ParserValue object with a
single method called expand(). This method would theoretically take
named parameters, but currently, none are defined. With no parameters,
it provides some kind of reasonable template-expanding behaviour. Then
frame.args would provide an abbreviated syntax for expand() with no
parameters.

If there is a compelling use case for "plain" expansion, then we would
have to decide what options to PPFrame::expand() are needed to support
that use case, and then we would need to decide how to map them to
parameters to ParserValue.expand().
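That minimal shape can be sketched as follows (a stand-in with a toy expander; only the names ParserValue and expand() come from the discussion, everything else is hypothetical):

```lua
local ParserValue = {}
ParserValue.__index = ParserValue

-- A value object that remembers its raw text and how to expand it.
function ParserValue.new(text, expander)
    return setmetatable({ text = text, expander = expander }, ParserValue)
end

-- expand() accepts a table of named options; none are defined yet,
-- so with no arguments it performs the default expansion.
function ParserValue:expand(options)
    options = options or {} -- reserved for future named parameters
    return self.expander(self.text, options)
end

-- Toy expander standing in for real template expansion:
local pv = ParserValue.new('{{x}}', function(s) return (s:gsub('[{}]', '')) end)
assert(pv:expand() == 'x')
```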

> The conversion of wikitext or other formats to an opaque value object
> could be achieved using an object constructor:
>
> --- 'value text' is parsed lazily
> ParserValue( 'text/x-mediawiki', 'value text', frame )
>
> The frame might be the passed-in parent frame, or a custom one
> constructed with args assembled from other ParserValues.

Yes, this is an interesting idea. But I think I would prefer the
factory to be a frame method rather than a global function. Also,
again, I am skeptical about the value of using a MIME type. How about
an interface allowing either:

frame:newParserValue( 'value text' )

or named arguments:

frame:newParserValue{
text = 'value text',
fruitiness = 'high',
}

> Calls to existing templates could be supported with a convenient
> TemplateParserValue constructor, which does not specify how a template
> call is represented internally.
>
> TemplateParserValue( 'tpl', args ).expandTo( 'text/plain' )

Yes, this is attractive, and could be done in the same way as
ParserValue objects above. But I think there is still a need for an
abbreviated interface:

frame:newTemplateParserValue{title = 'tpl', args = args}:expand()

abbreviated to:

frame:expandTemplate{title = 'tpl', args = args}

It doesn't just make the text shorter, it also reduces the number of
concepts that the user has to understand before they are able to use
the interface.

I know that adding such concepts gives greater flexibility, but an
increase in the number of concepts will steepen the learning curve,
and the terminology required to explain them risks being daunting. For
example, if someone has never programmed before, you can't expect them
to understand terms like "opaque object".

> Finally, a ParserValue (or a list of those) could be used for the return
> type of functions to support output formats other than plain text.
>
> Overall, I would love to keep the access to values as opaque as possible
> to enable back-end optimizations and lazy expansions with sharing.
> Opening a path towards content representations other than plain
> (wiki-)text such as tokens, an AST or a DOM tree should be very useful
> for future parser development.

For me, the main motivation behind providing a parallel "advanced
interface" along the lines you suggest would be to establish a
direction for future interface development.

Interfaces evolve mostly by analogy, so providing a well thought-out
"advanced interface" will influence future development even if nobody
ever uses it.

-- Tim Starling




tstarling at wikimedia

May 1, 2012, 7:23 PM

Re: Lua: parser interface

On 02/05/12 11:28, Victor Vasiliev wrote:
>> The scenario you give in brackets will not happen. If a Lua timeout
>> occurs when the parser is executing, the Lua script will terminate
>> when the parser returns control to it. The timeout is not missed.
>
> But the parser working time would still be included in normal Lua time limit?

For LuaSandbox, yes the parser time is included. For LuaStandalone the
parser time is not included in the limit, but it could be measured
using getrusage() if that were deemed important.

>> It doesn't matter if there are several levels of parser/Lua recursion
>> when a timeout occurs. LuaSandbox is able to unwind the stack efficiently.
>
> What I meant is that it should be able to handle the time limit
> correctly and avoid things like doubling time because of the nested
> scripts.

Yes, that is done correctly also. Each LuaSandbox object has a single
timer which is started and stopped at the base recursion level and
ignored at higher levels of recursion.
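The scheme can be sketched like this (illustrative Lua; LuaSandbox implements it in C, and the function names here are made up):

```lua
-- The clock is touched only when entering or leaving recursion depth 0,
-- so nested parser/Lua levels are never double-counted.
local depth, started, used = 0, nil, 0

local function enterLua(now)
    if depth == 0 then started = now end -- start timer at base level only
    depth = depth + 1
end

local function leaveLua(now)
    depth = depth - 1
    if depth == 0 then used = used + (now - started) end
end

-- A nested invocation: outer call spans t=0..10, inner call t=2..5.
enterLua(0); enterLua(2); leaveLua(5); leaveLua(10)
assert(used == 10) -- the nested span is counted once, not twice
```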

-- Tim Starling




Platonides at gmail

May 2, 2012, 3:28 AM

Re: Lua: parser interface

Is it possible to hook Lua function calls? If so, I'd make a template
expansion a "call" to a function with that name.
That was the interface I envisioned when thinking about how I'd do it
if designing the language from scratch to suit wikitext
(I drafted some code, but it never reached even a barely mature level).






vasilvv at gmail

May 2, 2012, 4:04 AM

Re: Lua: parser interface

On Wed, May 2, 2012 at 2:28 PM, Platonides <Platonides [at] gmail> wrote:
> Is it possible to hook Lua function calls? If so, I'd make a template
> expansion a "call" to a function with that name.
> That was the interface I envisioned when thinking how I'd do it if
> making the language from scratch to suit wikitext
> (I drafted some code, but didn't reach to a barely mature level).

Do you mean a situation where you have template X and a call to
function X is a transclusion of template X? I have thought about that
as well. Not all titles are legitimate Lua function names, but there
are more serious issues than that.

This is close to how the first implementation of InlineScripts was
done in 2009. As it turned out, this approach has numerous
disadvantages, chief among them performance (the overhead of calling
functions through the parser instead of directly, which becomes a real
problem when you have many function calls), the inability to return
non-string data (such as arrays), and the impossibility of exporting
multiple functions from one template. That's why I strongly believe
that all code should live in modules, and that modules should interact
only through Lua itself.

—Victor



wicke at wikidev

May 2, 2012, 5:55 AM

Re: Lua: parser interface

On 05/02/2012 04:01 AM, Tim Starling wrote:
> How about frame.args.name as an abbreviation for
> frame:getArgument('name'):expandTo( 'text/x-mediawiki' ) ?

Yep, that looks good to me. Maybe the specialized wikitext argument
variant could be called 'wikitextArgs' so that the general variant can
be used for ParserValues instead of the getArgument method?

> And how about frame.plainArgs.name as an abbreviation for
> frame:getArgument('name'):expandTo( 'text/plain' ) ?

Adding more xxxArgs methods does not seem to scale that well, and would
introduce a lot of extra method names to remember. It would also
encourage users to pick one representation when they need not care
about it, especially if they are just passing through some content.

> I know you're not really asking for a review of Parsoid and its
> interfaces, but I worry as to whether your use of text/plain to
> indicate wikitext with comments stripped is appropriate.

I am not that happy with the text/plain bit either. text/x-mediawiki
with a separate 'processing stage' component might be better, as it
would not conflate the processing stage with the format. I am using a
numerical 'rank' value to track progress internally in Parsoid, but a
string would likely be friendlier for an external API like this.
There is an example of this further down in this post.

> If I were to provide Lua with a richer interface to PPFrame::expand(),
> I would be inclined to support at least some of those flags via named
> options, rather than rolling them up into a single string parameter.
> So instead of expandTo( 'text/plain' ) we might have:
>
> frame:getArgument('name'):expand{
>     expand_args = false,
>     expand_templates = false,
>     respect_noinclude = false,
>     strip_comments = true }
>
> Or, if forwards-compatibility requires that we don't support so many
> orthogonal options, some of the options could be rolled in together.
> The preceding could perhaps be written as:
>
> frame:getArgument('name'):expand{ plain = true }
>
> That doesn't preclude the use of overrides:
>
> frame:getArgument('name'):expand{
>     plain = true,
>     strip_comments = false }
>
> But it does seem like a can of worms.

Some grepping through core and extension code left me with the
impression that there are relatively few common sets of flags used. As
an example, NO_ARGS and NO_TEMPLATES always seem to be used as a pair in
situations where just comment and noinclude (and company) handling is
needed.

If there remain use cases for fully orthogonal flags, then those could
still be supported with optional (named) arguments, of course, as you note.

> How about providing
> getArgument(), which will return an opaque ParserValue object with a
> single method called expand(). This method would theoretically take
> named parameters, but currently, none are defined. With no parameters,
> it provides some kind of reasonable template-expanding behaviour. Then
> frame.args would provide an abbreviated syntax for expand() with no
> parameters.

Yes, this looks very good to me. ParserValue with a heavily defaulted
expand() method should be a good compromise: convenient for the
currently common case, without giving up the ability to work with
other content types or to specify the processing stage through a name
or flags. So the 'plain' example could look somewhat like this:

arg:expand{ format = 'tokens/x-mediawiki',
            phase = '0.1_noComments' } --- named processing phase

>> The conversion of wikitext or other formats to an opaque value object
>> could be achieved using an object constructor:
>>
>> --- 'value text' is parsed lazily
>> ParserValue( 'text/x-mediawiki', 'value text', frame )
>>
>> The frame might be the passed-in parent frame, or a custom one
>> constructed with args assembled from other ParserValues.
>
> Yes, this is an interesting idea. But I think I would prefer the
> factory to be a frame method rather than a global function.

I'd be happy with that too. Custom child frames could still be created
using a frame:newChild method.
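
To make that idea concrete, a custom child frame might be used along
these lines. Every name here (newChild, the argument shape, the
caption parameter) is illustrative only; nothing is a finalised API.

```lua
-- Hypothetical sketch of the frame:newChild idea discussed above.
local child = frame:newChild{
    args = { caption = 'Hello' }   -- args assembled by the module
}
-- A ParserValue created against the child frame would see those args:
local pv = child:newParserValue{ text = '{{{caption}}}' }
local out = pv:expand()            -- would yield "Hello"
```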

> Also,
> again, I am skeptical about the value of using a MIME type. How about
> an interface allowing either:
>
> frame:newParserValue( 'value text' )
>
> or named arguments:
>
> frame:newParserValue{
>     text = 'value text',
>     fruitiness = 'high',
> }

+1, but I think some way to indicate the type and processing state of
the passed-in value will be needed if non-wikitext values are to be
supported. Further processing of a ParserValue depends on this
knowledge. Optional (named) arguments could be employed for this too:

frame:newParserValue{
    type = 'tokens/x-mediawiki',
    value = {
        { type = 'tag', name = 'a', attribs = { href = 'http://foo' } },
        "Some link text",
        { type = 'endtag', name = 'a' }
    }
}

Defaulting to wikitext and fully-preprocessed text for the processing
phase would be fine with me. Any kind of type identifiers (MIME or not)
could of course be used. MIME has the advantage of being somewhat known
already, but other type identifiers might have other advantages.

> frame:newTemplateParserValue{title = 'tpl', args = args}:expand()
>
> abbreviated to:
>
> frame:expandTemplate{title = 'tpl', args = args}

+1
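
Spelled out side by side, the equivalence above would look like this
(the title and argument names are placeholders, and the API is still
the proposal under discussion, not implemented code):

```lua
-- Long form: build the template ParserValue, then expand it.
local out1 = frame:newTemplateParserValue{
    title = 'tpl',
    args = { name = 'value' }
}:expand()

-- Abbreviated form: one call, same result.
local out2 = frame:expandTemplate{
    title = 'tpl',
    args = { name = 'value' }
}
```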

> It doesn't just make the text shorter, it also reduces the number of
> concepts that the user has to understand before they are able to use
> the interface.
>
> I know that adding such concepts gives greater flexibility, but an
> increase in the number of concepts will steepen the learning curve,
> and the terminology required to explain them risks being daunting. For
> example, if someone has never programmed before, you can't expect them
> to understand terms like "opaque object".

I pretty much agree. The abbreviated interface adds some complication by
providing a second way to do the same thing. This still seems to be
worth it if it helps people to get started.

Gabriel


