
Mailing List Archive: Wikipedia: Wikitech

CPRT feasibility

 

 



dnessett at yahoo

Aug 20, 2009, 10:19 AM

Post #1 of 3 (419 views)
CPRT feasibility

I am looking into the feasibility of writing a comprehensive parser regression test (CPRT). Before writing code, I thought I would try to get some idea of how well such a tool would perform and what gotchas might pop up. An easy first step is to run dumpHTML and capture some data and statistics.

I tried to run the version of dumpHTML in r54724, but it failed. So I went back to 1.14 and ran that version against a small personal wiki database I have. I did this to get an idea of what structures dumpHTML produces and also to get some performance data from which to project runtime and resource usage.

I ran dumpHTML twice using the same MW version and same database. I then diff'd the two directories produced. One would expect no differences, but that expectation is wrong. I got a bunch of diffs of the following form (I have put a newline between the two file names to shorten the line length):

diff -r HTML_Dump/articles/d/n/e/User~Dnessett_Bref_Examples_Example1_Chapter_1_4083.html
HTML_Dump2/articles/d/n/e/User~Dnessett_Bref_Examples_Example1_Chapter_1_4083.html
77,78c77,78
< Post-expand include size: 16145/2097152 bytes
< Template argument size: 12139/2097152 bytes
---
> Post-expand include size: 16235/2097152 bytes
> Template argument size: 12151/2097152 bytes

I looked at one of the HTML files to see where these differences appear. They occur in an HTML comment:

<!--
NewPP limit report
Preprocessor node count: 1891/1000000
Post-expand include size: 16145/2097152 bytes
Template argument size: 12139/2097152 bytes
Expensive parser function count: 0/100
-->
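
For comparing runs, the counters in this comment can be pulled out with a short script. The following is only a sketch in Python: the function name `parse_limit_report` is made up for illustration, and the comment layout is assumed from the single sample above rather than from any MediaWiki specification.

```python
import re

# Layout assumed from the one sample in this post, not from a MediaWiki spec.
REPORT_RE = re.compile(r"<!--\s*NewPP limit report\s*(.*?)-->", re.DOTALL)
# One counter per line: "<name>: <used>/<limit>", optionally followed by a unit.
LINE_RE = re.compile(r"^(?P<name>[^:\n]+):\s*(?P<used>\d+)/(?P<limit>\d+)", re.MULTILINE)


def parse_limit_report(html):
    """Return {counter name: (used, limit)}, or {} if no report is found."""
    match = REPORT_RE.search(html)
    if match is None:
        return {}
    return {
        line.group("name").strip(): (int(line.group("used")), int(line.group("limit")))
        for line in LINE_RE.finditer(match.group(1))
    }


sample = """<!--
NewPP limit report
Preprocessor node count: 1891/1000000
Post-expand include size: 16145/2097152 bytes
Template argument size: 12139/2097152 bytes
Expensive parser function count: 0/100
-->"""

report = parse_limit_report(sample)
for name, (used, limit) in report.items():
    print(f"{name}: {used} of {limit}")
```

Parsing both runs this way would show directly which counters changed between dumps, instead of reading them out of the diff.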

Does anyone have an idea of what this is for? Is there any way to configure MW so it isn't produced?

I will post some performance data later.

Dan




_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


agarrett at wikimedia

Aug 20, 2009, 11:50 AM

Post #2 of 3 (392 views)
Re: CPRT feasibility [In reply to]

On 20/08/2009, at 6:19 PM, dan nessett wrote:
> <!--
> NewPP limit report
> Preprocessor node count: 1891/1000000
> Post-expand include size: 16145/2097152 bytes
> Template argument size: 12139/2097152 bytes
> Expensive parser function count: 0/100
> -->
>
> Does anyone have an idea of what this is for? Is there any way to
> configure MW so it isn't produced?

As the title implies, it is a performance limit report. You can remove
it by changing the parser options passed to the parser. Look at the
ParserOptions and Parser classes.

--
Andrew Garrett
agarrett [at] wikimedia
http://werdn.us/




dnessett at yahoo

Aug 20, 2009, 3:01 PM

Post #3 of 3 (390 views)
Re: CPRT feasibility [In reply to]

--- On Thu, 8/20/09, Andrew Garrett <agarrett [at] wikimedia> wrote:

> As the title implies, it is a performance limit report. You can remove
> it by changing the parser options passed to the parser. Look at the
> ParserOptions and Parser classes.

Thanks. It appears dumpHTML has no command option to turn off this report (the parser option is mEnableLimitReport).

A question to the developer community: Is it better to change dumpHTML to accept a new option (to turn off Limit Reports), or to copy dumpHTML into a new CPRT extension and change the copy? I strongly feel that having two extensions with essentially the same functionality is bad practice. On the other hand, changing dumpHTML makes it dual-purpose, which has the potential of making it big and ugly. One compromise is to factor dumpHTML so that a core provides common functionality to two different upper layers. However, I don't know if that is acceptable practice for extensions.

A short-term fix is to pipe the output of dumpHTML through a filter that removes the Limit Report. That would allow developers to use dumpHTML (as a CPRT) fairly quickly to find and fix the known-to-fail parser bugs. The downside is that the extra pass over every page may significantly degrade performance.
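
A rough sketch of such a filter, in Python. This is not part of dumpHTML; the names `strip_limit_report` and `filter_stream` are made up for illustration, and the pattern assumes the report is always a single HTML comment beginning with "NewPP limit report", as in the sample earlier in this thread.

```python
import io
import re

# Assumption: the report is one HTML comment that starts with
# "NewPP limit report", as in the sample earlier in this thread.
LIMIT_REPORT_RE = re.compile(r"<!--\s*NewPP limit report.*?-->\n?", re.DOTALL)


def strip_limit_report(html):
    """Return html with any limit report comment removed."""
    return LIMIT_REPORT_RE.sub("", html)


def filter_stream(src, dst):
    """Copy one page from src to dst, dropping the limit report."""
    dst.write(strip_limit_report(src.read()))


# Two pages that differ only in the reported sizes, as in the diff above.
run1 = ("<p>same body</p>\n<!-- \nNewPP limit report\n"
        "Post-expand include size: 16145/2097152 bytes\n-->\n")
run2 = ("<p>same body</p>\n<!-- \nNewPP limit report\n"
        "Post-expand include size: 16235/2097152 bytes\n-->\n")

out = io.StringIO()
filter_stream(io.StringIO(run1), out)
print(strip_limit_report(run1) == strip_limit_report(run2))
```

For use in a pipe, something like `filter_stream(sys.stdin, sys.stdout)` would do, one page per invocation. It reads each page fully into memory, which is fine for single articles, but the cost of an extra process per file is exactly the performance concern mentioned above.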

Dan




