Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Perl: porters

PATCH make mktables run less often

 

 

Perl porters RSS feed   Index | Next | Previous | View Threaded


public at khwilliamson

Nov 23, 2009, 3:53 PM

Post #1 of 8 (429 views)
Permalink
PATCH make mktables run less often

Attached is a patch to cut down the frequency of mktables running. It
turns out the problem was a combination of both Makefile and mktables.
I didn't realize that the files I removed were listed as dependencies in
Makefile, so it always thought it was out-of-date, and called mktables.
Part of the old code I hadn't changed caused mktables to remove all the
files when called with the -makelist option. This code is no longer
necessary, so I removed it.

I also changed Makefile to always call mktables, to let it decide if it
actually should recompile. I just commented it out for easy reversal.
The problem is that mktables depends on almost 800 files, located in the
To and lib subdirectories, and Makefile doesn't know about them all; and
they change from Unicode release to release. I'm assuming mktables.lst
was created because of the issues of trying to get them all known by
Makefile.

I did a search for the removed files and removed the only other
reference to them (in .gitignore); except I see them in Makefiles for
other platforms. I assume these get automatically generated from the
changed Makefile.SH

I also changed some typos in comments, rephrased a little of the
documentation, removed a no-longer-used subroutine, and fixed another
bug, in which it would fail to run if there was no mktables.lst at all,
added -p option to its call.

I'll work soon on a wrapper to the .t. I'm wondering though about the
generated pod. It should be put into the toc, etc. Would a zero-length
place-holder that mktables overwrites be the right solution?

Rafael Garcia-Suarez wrote:
> 2009/11/23 karl williamson <public [at] khwilliamson>:
>> There needs to be some discussion about how mktables gets called. It
>> generates all three of t/re/uniprops.t, lib/unicore/mktables.lst, and
>> pod/perluniprops.pod (besides Heavy.pl and all the tables). I meant to
>> patch Makefile to cause all three of these to be generated, but didn't
>> realize that I should really patch Makefile.SH instead, so my changes were
>> lost. But let's figure out what they should be before I submit a revision.
>
> As far as I can tell it will only generate perluniprops.pod when passed
> "-P pod", and t/re/uniprops.t when passed "-T t/re/uniprops.t".
>
>> These three files are essentially static from Unicode release to Unicode
>> release (as are all the tables, except if someone changes mktables itself to
>> generate different things.) The reason you saw mktables.lst change is
>> because it has a time stamp in it (that should have been the only
>> difference, and it was because I did a final make test, just to be sure,
>> after I had committed, but before I delivered; sorry). mktables.lst retains
>> the same format as it traditionally has had.
>
> No, I saw other changes, notably in the list of output files, since by
> default it doesn't generate the two files I just mentioned.
>
>> I was attempting to add the generation of all three of them to Makefile
>> because of what Nicolas said several months ago about wanting to generate
>> the pod file every time, as it was one less thing for the pumpking to worry
>> about when getting things out the door. I figured that we should be
>> consistent, so made that the behavior for all three. But this is a change
>> for mktables.lst. I don't know when that got generated in the past. Perhaps
>> it is something on the pumpking's list?
>
> I would favor generating everything by default, and making uniprops.t
> be, as Nicholas said, a wrapper around a generated file (so uniprops.t
> is listed in the MANIFEST and is picked up by the test harness.)
>
> I would also make -p the default (show progress of mktables during run),
> except maybe when stdout isn't a tty.
>
>> I see a couple of possibilities. One is to generate all three every time.
>> I guess that means they should come off the MANIFEST. But mktables.lst has
>> always been generated, and has been in MANIFEST.
>>
>> The other option I see is to generate them by hand when a new release of
>> Unicode is installed, and to put them in the MANIFEST. But if someone
>> changes mktables to generate different files, these would have to be
>> regenerated at that time as well, and someone might forget.
>>
>> What are people's opinions?
>>
>> Somewhat related, this new mktables takes longer than the old one to
>> execute. In part it is because it's doing a lot more things; in part it's
>> because it uses more object-oriented, inside-out-hash, functions and is
>> getting them in pure Perl. (Nicholas' suggestion from last week helped
>> quite a bit here, though). Even on my this-year's-model Linux box, it seems
>> like it takes forever. (I compile my development Perls without
>> optimization; it seems like it runs twice as fast on a 5.10 Perl that has
>> optimization, but I haven't done any benchmarking.) Adding the -p option to
>> the call to show progress helps me see how things are progressing, and
>> realize how much work it's actually doing. Should I do that in the
>> Makefile?
>>
>> I think this longer compilation time would be perfectly acceptable if it
>> didn't actually happen very often, like with the Encode tables. And there
>> is no reason to compile more frequently than the Encode tables. As I
>> mentioned, the underlying tables are static. New Unicode releases come out
>> about once a year or so. But Makefile is removing outputs before testing
>> whether it should call mktables, and of course, decides it should. I
>> haven't sat down to figure out how this is happening yet; nor have I
>> investigated when the Encode tables get cleaned; is it a normal clean, I
>> don't think so; maybe its a realclean. But I think mktables' outputs
>> shouldn't be cleaned at a lower level than Encode's. Comments?
>>
>>
>
Attachments: 0001-mktables-not-run-unless-needed.patch (8.79 KB)


rgs at consttype

Nov 24, 2009, 12:45 AM

Post #2 of 8 (396 views)
Permalink
Re: PATCH make mktables run less often [In reply to]

2009/11/24 karl williamson <public [at] khwilliamson>:
> Attached is a patch to cut down the frequency of mktables running.  It turns
> out the problem was a combination of both Makefile and mktables. I didn't
> realize that the files I removed were listed as dependencies in Makefile, so
> it always thought it was out-of-date, and called mktables. Part of the old
> code I hadn't changed caused mktables to remove all the files when called
> with the -makelist option.  This code is no longer necessary, so I removed
> it.

Thanks, I've applied it now.

> I also changed Makefile to always call mktables, to let it decide if it
> actually should recompile.  I just commented it out for easy reversal. The
> problem is that mktables depends on almost 800 files, located in the To and
> lib subdirectories, and Makefile doesn't know about them all; and they
> change from Unicode release to release.  I'm assuming mktables.lst was
> created because of the issues of trying to get them all known by Makefile.
>
> I did a search for the removed files and removed the only other reference to
> them (in .gitignore); except I see them in Makefiles for other platforms.  I
> assume these get automatically generated from the changed Makefile.SH

Well, no : they will need adjustment too. But I don't have Windows or
VMS machines or knowledge to test.

> I also changed some typos in comments, rephrased a little of the
> documentation, removed a no-longer-used subroutine, and fixed another bug,
> in which it would fail to run if there was no mktables.lst at all, added -p
> option to its call.
>
> I'll work soon on a wrapper to the .t.  I'm wondering though about the
> generated pod.  It should be put into the toc, etc.  Would a zero-length
> place-holder that mktables overwrites be the right solution?

That's a good point. I added it already in pod.lst. I don't think that
a placeholder is necessary for the pod file. I'll have a look at it.


rgs at consttype

Nov 24, 2009, 1:12 AM

Post #3 of 8 (394 views)
Permalink
Re: PATCH make mktables run less often [In reply to]

2009/11/24 Rafael Garcia-Suarez <rgs [at] consttype>:
>> I'll work soon on a wrapper to the .t.  I'm wondering though about the
>> generated pod.  It should be put into the toc, etc.  Would a zero-length
>> place-holder that mktables overwrites be the right solution?
>
> That's a good point. I added it already in pod.lst. I don't think that
> a placeholder is necessary for the pod file. I'll have a look at it.

Done now :
http://perl5.git.perl.org/perl.git/commitdiff/524ce141dcd46f87e73a300e1436937336261b19
Also rebuilt makefiles with pod/buildtoc (that rebuilds only the
pod-related part of makefiles.)


SteveHay at planit

Nov 24, 2009, 1:24 AM

Post #4 of 8 (397 views)
Permalink
RE: PATCH make mktables run less often [In reply to]

Rafael Garcia-Suarez wrote on 2009-11-24:
> 2009/11/24 karl williamson <public [at] khwilliamson>:
>> I did a search for the removed files and removed the only other
>> reference to them (in .gitignore); except I see them in Makefiles for
>> other platforms.  I assume these get automatically generated from the
>> changed Makefile.SH
>
> Well, no : they will need adjustment too. But I don't have Windows or
> VMS machines or knowledge to test.
>

Fixed for Win32 by 36ff7f95732aeec6ca1f7152b75dbedf5ce669d1


rgs at consttype

Nov 24, 2009, 1:30 AM

Post #5 of 8 (395 views)
Permalink
Re: PATCH make mktables run less often [In reply to]

2009/11/24 Steve Hay <SteveHay [at] planit>:
> Rafael Garcia-Suarez wrote on 2009-11-24:
>> 2009/11/24 karl williamson <public [at] khwilliamson>:
>>> I did a search for the removed files and removed the only other
>>> reference to them (in .gitignore); except I see them in Makefiles for
>>> other platforms.  I assume these get automatically generated from the
>>> changed Makefile.SH
>>
>> Well, no : they will need adjustment too. But I don't have Windows or
>> VMS machines or knowledge to test.
>>
>
> Fixed for Win32 by 36ff7f95732aeec6ca1f7152b75dbedf5ce669d1

More fixes for UNIX and Win32 (to be checked) :

http://perl5.git.perl.org/perl.git/commitdiff/58fa074c95937d22a584fe789986e618c9fec5ff

I suspect VMS might be broken, too.


demerphq at gmail

Dec 9, 2009, 12:27 PM

Post #6 of 8 (338 views)
Permalink
Re: PATCH make mktables run less often [In reply to]

2009/11/24 karl williamson <public [at] khwilliamson>:
> Attached is a patch to cut down the frequency of mktables running.  It turns
> out the problem was a combination of both Makefile and mktables. I didn't
> realize that the files I removed were listed as dependencies in Makefile, so
> it always thought it was out-of-date, and called mktables. Part of the old
> code I hadn't changed caused mktables to remove all the files when called
> with the -makelist option.  This code is no longer necessary, so I removed
> it.
>
> I also changed Makefile to always call mktables, to let it decide if it
> actually should recompile.  I just commented it out for easy reversal. The
> problem is that mktables depends on almost 800 files, located in the To and
> lib subdirectories, and Makefile doesn't know about them all; and they
> change from Unicode release to release.  I'm assuming mktables.lst was
> created because of the issues of trying to get them all known by Makefile.

Yes, and also because we used to basically just generate them all, and
there was no list, we just had dependencies on a few files, and for
some irritating reason mktables would run like, umm, 6 times on win32.
And so back in the day it used to regenerate everything each of those
six times. So i taught it about the list of files and made it more
intelligent so that it doing things over and over turns into a noop on
following runs. Please dont make it too slow, in particular certain
stat calls on win32 are effectively a file open on every file in the
tree. If you do do that pleasee enable sloppy stat.

cheers,
Yves


public at khwilliamson

Dec 9, 2009, 9:09 PM

Post #7 of 8 (333 views)
Permalink
Re: PATCH make mktables run less often [In reply to]

demerphq wrote:
> 2009/11/24 karl williamson <public [at] khwilliamson>:
>> Attached is a patch to cut down the frequency of mktables running. It turns
>> out the problem was a combination of both Makefile and mktables. I didn't
>> realize that the files I removed were listed as dependencies in Makefile, so
>> it always thought it was out-of-date, and called mktables. Part of the old
>> code I hadn't changed caused mktables to remove all the files when called
>> with the -makelist option. This code is no longer necessary, so I removed
>> it.
>>
>> I also changed Makefile to always call mktables, to let it decide if it
>> actually should recompile. I just commented it out for easy reversal. The
>> problem is that mktables depends on almost 800 files, located in the To and
>> lib subdirectories, and Makefile doesn't know about them all; and they
>> change from Unicode release to release. I'm assuming mktables.lst was
>> created because of the issues of trying to get them all known by Makefile.
>
> Yes, and also because we used to basically just generate them all, and
> there was no list, we just had dependencies on a few files, and for
> some irritating reason mktables would run like, umm, 6 times on win32.
> And so back in the day it used to regenerate everything each of those
> six times. So i taught it about the list of files and made it more
> intelligent so that it doing things over and over turns into a noop on
> following runs. Please dont make it too slow, in particular certain
> stat calls on win32 are effectively a file open on every file in the
> tree. If you do do that pleasee enable sloppy stat.
>
> cheers,
> Yves
>
I don't know about sloppy stat. But the patch does fix the bug in which
mktables got it wrong about needing to run or not. On my Linux/Vista
box (this year's model) it compiles in under 30 seconds, and now doesn't
redo itself unless it should.


demerphq at gmail

Dec 10, 2009, 12:48 AM

Post #8 of 8 (335 views)
Permalink
Re: PATCH make mktables run less often [In reply to]

2009/12/10 karl williamson <public [at] khwilliamson>:
> demerphq wrote:
>>
>> 2009/11/24 karl williamson <public [at] khwilliamson>:
>>>
>>> Attached is a patch to cut down the frequency of mktables running.  It
>>> turns
>>> out the problem was a combination of both Makefile and mktables. I didn't
>>> realize that the files I removed were listed as dependencies in Makefile,
>>> so
>>> it always thought it was out-of-date, and called mktables. Part of the
>>> old
>>> code I hadn't changed caused mktables to remove all the files when called
>>> with the -makelist option.  This code is no longer necessary, so I
>>> removed
>>> it.
>>>
>>> I also changed Makefile to always call mktables, to let it decide if it
>>> actually should recompile.  I just commented it out for easy reversal.
>>> The
>>> problem is that mktables depends on almost 800 files, located in the To
>>> and
>>> lib subdirectories, and Makefile doesn't know about them all; and they
>>> change from Unicode release to release.  I'm assuming mktables.lst was
>>> created because of the issues of trying to get them all known by
>>> Makefile.
>>
>> Yes, and also because we used to basically just generate them all, and
>> there was no list, we just had dependencies on a few files, and for
>> some irritating reason mktables would run like, umm, 6 times on win32.
>> And so back in the day it used to regenerate everything each of those
>> six times. So i taught it about the list of files and made it more
>> intelligent so that it doing things over and over turns into a noop on
>> following runs. Please dont make it too slow, in particular certain
>> stat calls on win32 are effectively a file open on every file in the
>> tree. If you do do that pleasee enable sloppy stat.
>>
>> cheers,
>> Yves
>>
> I don't know about sloppy stat.  But the patch does fix the bug in which
> mktables got it wrong about needing to run or not.  On my Linux/Vista box
> (this year's model) it compiles in under 30 seconds, and now doesn't redo
> itself unless it should.

Win32 doesnt support the full linux stat interface. In order to
determine the nlink (iirc) count on a file on windows the file must be
opened and then the calls made to against the open handle. This means
that a File::Find on win32 is massively slower than a File::Find on
linux, and it is much much worse if the directory being searched is a
NTFS share.

The sloppy stat variable $^WIN32_SLOPPY_STAT (iirc) will disable the
nlink count and make an instant speedup on any directory or file
scanning code on windows that uses stat as it no longer needs to open
the file to calculate a value that is actually very rarely used in the
win32 world.

Anyway, thanks for this patch! Sounds great.

cheers,
Yves




--
perl -Mre=debug -e "/just|another|perl|hacker/"

Perl porters RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.