
davem at iabyn
Aug 21, 2012, 2:45 PM
Post #8 of 9
(101 views)
Permalink
|
|
Re: Seeking guidance on Benchmarking and potentially removing swash caching
[In reply to]
|
|
On Tue, Aug 21, 2012 at 10:09:31AM -0600, Karl Williamson wrote: > On 08/20/2012 04:22 AM, Dave Mitchell wrote: > >On Sun, Aug 19, 2012 at 12:14:39PM -0600, Karl Williamson wrote: > >>I would prefer to benchmark more realistic real-world data, and > >>wonder if anyone has some ideas on such, or any other thoughts on > >>this issue. > > > >Well, one thing would be to benchmark a worst-case scenario. Pick a > >swatch that has a large inversion list: i.e. something that is maximally > >bad for binary search; then run a loop where only the same character is > >repeatedly looked up (so minimally small cache). And see whether the > >result is significantly bad. > > > >PS: If a swatch has only a small, fixed number of fields, would it be > >better to use an array rather than a hash to represent it? > > > > > I'm not understanding you here very well. > > You use the term 'swatch'. I think you meant what I called 'swash'. > The comments in the code refer to a 'swatch' as the 64-bit bit map > (8 bytes) that each value in swash hashes is. Brain fart on my part. I meant swash. > If so, then I still don't understand. My email gave the results of > using binary search with and without caching the last look-up > result, and caching gave somewhat better results, so my proposal > included caching, and so your test would just keep returning that > cached value, so would be fine. I wasn't paying close enough attention. Ignore me. > > I also don't understand the postscript. An inversion list is an > ordered array, so that would seem to fit your description. Perhaps > you meant an adaptive algorithm, which I hadn't thought of. Use the > inversion list if the number of elements is small, and use a swash > if it is larger. (The largest inversion list for an official Unicode > property is about 1600 elements, which is 11 probes to search, BTW.) > regcomp.c could decide for any given property or bracketed character > class which one to use. Possibly, also, a swash that was getting > too large and losing performance could be switched to an inversion > list. That seems probably more trouble than the possible gain. From your original email: "A swash is a hash with a few keys.". I was just randomly wondering if the key set is small and well known, whether swashes could be implemented using arrays, e.g. $swash{BITS} becomes use constant BITS => 3; $swash[BITS] But knowing almost nothing about swashes, I have no idea whether that would provide any benefit. -- I've often wanted to drown my troubles, but I can't get my wife to go swimming.
|