davem at iabyn
Aug 21, 2012, 2:45 PM
Post #8 of 9
On Tue, Aug 21, 2012 at 10:09:31AM -0600, Karl Williamson wrote:
Re: Seeking guidance on Benchmarking and potentially removing swash caching
[In reply to]
> On 08/20/2012 04:22 AM, Dave Mitchell wrote:
> >On Sun, Aug 19, 2012 at 12:14:39PM -0600, Karl Williamson wrote:
> >>I would prefer to benchmark more realistic real-world data, and
> >>wonder if anyone has some ideas on such, or any other thoughts on
> >>this issue.
> >Well, one thing would be to benchmark a worst-case scenario. Pick a
> >swatch that has a large inversion list: i.e. something that is maximally
> >bad for binary search; then run a loop where only the same character is
> >repeatedly looked up (so minimally small cache). And see whether the
> >result is significantly bad.
> >PS: If a swatch has only a small, fixed number of fields, would it be
> >better to use an array rather than a hash to represent it?
> I'm not understanding you here very well.
> You use the term 'swatch'. I think you meant what I called 'swash'.
> The comments in the code refer to a 'swatch' as the 64-bit bit map
> (8 bytes) that each value in swash hashes is.
Brain fart on my part. I meant swash.
> If so, then I still don't understand. My email gave the results of
> using binary search with and without caching the last look-up
> result, and caching gave somewhat better results, so my proposal
> included caching, and so your test would just keep returning that
> cached value, so would be fine.
I wasn't paying close enough attention. Ignore me.
> I also don't understand the postscript. An inversion list is an
> ordered array, so that would seem to fit your description. Perhaps
> you meant an adaptive algorithm, which I hadn't thought of. Use the
> inversion list if the number of elements is small, and use a swash
> if it is larger. (The largest inversion list for an official Unicode
> property is about 1600 elements, which is 11 probes to search, BTW.)
> regcomp.c could decide for any given property or bracketed character
> class which one to use. Possibly, also, a swash that was getting
> too large and losing performance could be switched to an inversion
> list. That seems probably more trouble than the possible gain.
From your original email: "A swash is a hash with a few keys.". I was just
randomly wondering if the key set is small and well known, whether swashes
could be implemented using arrays, e.g.
use constant BITS => 3;
But knowing almost nothing about swashes, I have no idea whether that
would provide any benefit.
I've often wanted to drown my troubles, but I can't get my wife to go