
dnessett at yahoo
Aug 20, 2009, 10:19 AM
Post #1 of 3
I am looking into the feasibility of writing a comprehensive parser regression test (CPRT). Before writing code, I thought I would try to get some idea of how well such a tool would perform and what gotchas might pop up. An easy first step is to run dumpHTML and capture some data and statistics.

I tried to run the version of dumpHTML in r54724, but it failed. So, I went back to 1.14 and ran that version against a small personal wiki database I have. I did this to get an idea of what structures dumpHTML produces and also to get some performance data with which to do projections of runtime/resource usage.

I ran dumpHTML twice using the same MW version and the same database. I then diff'd the two directories produced. One would expect no differences, but that expectation is wrong. I got a bunch of diffs of the following form (I have put a newline between the two file names to shorten the line length):

    diff -r HTML_Dump/articles/d/n/e/User~Dnessett_Bref_Examples_Example1_Chapter_1_4083.html
    HTML_Dump2/articles/d/n/e/User~Dnessett_Bref_Examples_Example1_Chapter_1_4083.html
    77,78c77,78
    < Post-expand include size: 16145/2097152 bytes
    < Template argument size: 12139/2097152 bytes
    ---
    > Post-expand include size: 16235/2097152 bytes
    > Template argument size: 12151/2097152 bytes

I looked at one of the HTML files to see where these differences appear. They occur in an HTML comment:

    <!--
    NewPP limit report
    Preprocessor node count: 1891/1000000
    Post-expand include size: 16145/2097152 bytes
    Template argument size: 12139/2097152 bytes
    Expensive parser function count: 0/100
    -->

Does anyone have an idea of what this is for? Is there any way to configure MW so it isn't produced?

I will post some performance data later.

Dan

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
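Independent of any MW configuration, one way to keep this comment from showing up as a difference is to strip it from each file before comparing. The following is only a minimal sketch in Python, assuming the comment always starts with "NewPP limit report" as in the sample above. The function names are hypothetical and are not part of MediaWiki or dumpHTML.

```python
import re

# Matches an HTML comment whose text begins with "NewPP limit report".
# The layout of the comment is an assumption based on the sample in this post.
NEWPP_COMMENT = re.compile(r"<!--\s*NewPP limit report.*?-->\s*", re.DOTALL)


def strip_limit_report(html: str) -> str:
    """Return html with any NewPP limit report comment removed."""
    return NEWPP_COMMENT.sub("", html)


def same_ignoring_limit_report(html_a: str, html_b: str) -> bool:
    """Compare two dumped pages, ignoring the limit report comment."""
    return strip_limit_report(html_a) == strip_limit_report(html_b)
```

A regression test could read each pair of files from the two dump directories and pass their contents to `same_ignoring_limit_report`, so that only differences in the rendered page itself are reported.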