
Mailing List Archive: Perl: porters

RFC: pack()ing long words


david at cantrell

Aug 13, 2012, 4:41 AM

Post #1 of 14
RFC: pack()ing long words

pack() and unpack() can handle words of 1, 2, 4, and (if you built your
perl right) 8 bytes. And I use the same magic characters (although
without using pack and unpack) in Data::Hexdumper.

However, I want to extend it to support 16 byte words and, indeed, to
support any other length words. 3 byte words, for example.
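For comparison, odd word lengths can already be read with the existing templates by padding to the next supported size. The following is only a minimal sketch of that workaround for a single 3-byte word, using the standard "N" (32-bit big-endian) and "V" (32-bit little-endian) templates; the variable names are illustrative.

```perl
use strict;
use warnings;

# Three raw bytes, as they might appear in a binary file.
my $three = "\x01\x02\x03";

# Big-endian: pad on the left to 4 bytes and read as a 32-bit "N".
my $be = unpack("N", "\0" . $three);

# Little-endian: pad on the right to 4 bytes and read as a 32-bit "V".
my $le = unpack("V", $three . "\0");

printf "big-endian: 0x%06x, little-endian: 0x%06x\n", $be, $le;
```

This padding trick stops working once the word is wider than the largest integer the perl in question supports, which is where a dedicated template character would help.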

I'd like to remain as compatible as possible with the characters used in
pack()'s templates, but there's nothing there for what I want.

So, can I propose that we pick a character for this purpose and at least
define some syntax for specifying a word length, endian-ness, and repeat
count for it, even if it isn't implemented yet?

Something like this perhaps:
X5,4>

which means:
X - whatever letter we choose
5 - word length
,4 - optional repeat count
> - optional endian-ness

--
David Cantrell | Enforcer, South London Linguistic Massive

Fashion label: n: a liferaft for personalities
which lack intrinsic buoyancy


fawaka at gmail

Aug 13, 2012, 4:49 AM

Post #2 of 14
Re: RFC: pack()ing long words

On Mon, Aug 13, 2012 at 2:41 PM, David Cantrell <david [at] cantrell> wrote:
> pack() and unpack() can handle words of 1, 2, 4, and (if you built your
> perl right) 8 bytes. And I use the same magic characters (although
> without using pack and unpack) in Data::Hexdumper.
>
> However, I want to extend it to support 16 byte words and, indeed, to
> support any other length words. 3 byte words, for example.

That sounds like an excellent idea.

> I'd like to remain as compatible as possible with the characters used in
> pack()'s templates, but there's nothing there for what I want.
>
> So, can I propose that we pick a character for this purpose and at least
> define some syntax for specifying a word length, endian-ness, and repeat
> count for it, even if it isn't implemented yet?
>
> Something like this perhaps:
> X5,4>
>
> which means:
> X - whatever letter we choose
> 5 - word length
> ,4 - optional repeat count
> > - optional endian-ness

I don't like the syntax much, but I'm not sure I can think of
something better. Maybe «X{5}4»?

Leon


h.m.brand at xs4all

Aug 13, 2012, 4:51 AM

Post #3 of 14
Re: RFC: pack()ing long words

On Mon, 13 Aug 2012 12:41:02 +0100, David Cantrell
<david [at] cantrell> wrote:

> pack() and unpack() can handle words of 1, 2, 4, and (if you built your
> perl right) 8 bytes. And I use the same magic characters (although
> without using pack and unpack) in Data::Hexdumper.
>
> However, I want to extend it to support 16 byte words and, indeed, to
> support any other length words. 3 byte words, for example.
>
> I'd like to remain as compatible as possible with the characters used in
> pack()'s templates, but there's nothing there for what I want.
>
> So, can I propose that we pick a character for this purpose and at least
> define some syntax for specifying a word length, endian-ness, and repeat
> count for it, even if it isn't implemented yet?
>
> Something like this perhaps:
> X5,4>
>
> which means:
> X - whatever letter we choose
> 5 - word length
> ,4 - optional repeat count
> > - optional endian-ness

Counterintuitive in that order

l4 is 4 longs, so if the 4 in your example matches the 4 in l4, I'd
guess that

X5>4

would be more intuitive

I kinda like your approach though. What about bits? Why restrict to
multiple of 8 bits?

--
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.14 porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/


david at cantrell

Aug 13, 2012, 10:16 AM

Post #4 of 14
Re: RFC: pack()ing long words

On Mon, Aug 13, 2012 at 01:51:03PM +0200, H.Merijn Brand wrote:

> Counterintuitive in that order
>
> l4 is 4 longs, so if the 4 in your example matches the 4 in l4, I'd
> guess that
>
> X5>4
>
> would be more intuitive

But then you have a problem if someone wants native endian-ness - you
can't tell whether X54 is a single word of 54 bytes, or four words of
five bytes each.

Leon's suggestion of putting the word length in {curlies} works nicely.
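To illustrate why the braces remove the ambiguity, here is a rough sketch of a parser for a single template item in the brace style. Everything about it is an assumption: the function name `parse_word_item` is made up, any single letter is accepted as the template character, and the optional endian-ness modifier is assumed to sit between the braces and the repeat count (mirroring existing templates such as "l>4").

```perl
use strict;
use warnings;

# Parse one hypothetical template item such as "X{5}>4".
# Returns a hash ref describing it, or undef if it does not match.
sub parse_word_item {
    my ($item) = @_;
    return unless $item =~ /\A ([A-Za-z]) \{ (\d+) \} ([<>])? (\d+)? \z/x;
    return {
        letter => $1,
        length => $2,                              # word length, inside the braces
        endian => defined $3 ? $3 : 'native',      # optional < or >
        count  => defined $4 ? $4 : 1,             # optional repeat count
    };
}

for my $item ('X{5}4', 'X{54}', 'X{5}>4') {
    my $spec = parse_word_item($item);
    print "$item: $spec->{count} word(s) of length $spec->{length}, ",
          "endian-ness $spec->{endian}\n";
}
```

With the length inside the braces, "X{5}4" and "X{54}" parse differently even when no endian-ness modifier is present, whereas a bare "X54" is rejected by this sketch.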

> I kinda like your approach though. What about bits? Why restrict to
> multiple of 8 bits?

Hmmm ... and just have X{40}4 instead of X{5}4 for a five byte
(== 40 bits) word. I haven't looked at the source (and am somewhat
terrified to do so TBH) but I can see that getting a bit tricky. If you
consume just three bits with X3, does the next template thingy, and all
the ones after it, have to start and stop half way through a byte?
Yuck.

Maybe that's something to allow for in the syntax, but leave the
implementation until even later.

--
David Cantrell | Bourgeois reactionary pig

Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt


fawaka at gmail

Aug 13, 2012, 11:31 AM

Post #5 of 14
Re: RFC: pack()ing long words

On Mon, Aug 13, 2012 at 8:16 PM, David Cantrell <david [at] cantrell> wrote:
> I haven't looked at the source (and am somewhat
> terrified to do so TBH) but I can see that getting a bit tricky.

pp_pack.c is where you need to be. It's rather full of "tricky".

Leon


demerphq at gmail

Aug 13, 2012, 12:32 PM

Post #6 of 14
Re: RFC: pack()ing long words

On 13 August 2012 20:31, Leon Timmermans <fawaka [at] gmail> wrote:
> On Mon, Aug 13, 2012 at 8:16 PM, David Cantrell <david [at] cantrell> wrote:
>> I haven't looked at the source (and am somewhat
>> terrified to do so TBH) but I can see that getting a bit tricky.
>
> pp_pack.c is where you need to be. It's rather full of "tricky".

And that is the diplomatic way to put it. :-)

yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


david at cantrell

Aug 13, 2012, 3:35 PM

Post #7 of 14
Re: RFC: pack()ing long words

On 13/08/2012 20:32, demerphq wrote:
> On 13 August 2012 20:31, Leon Timmermans <fawaka [at] gmail> wrote:
>> On Mon, Aug 13, 2012 at 8:16 PM, David Cantrell <david [at] cantrell> wrote:
>>> I haven't looked at the source (and am somewhat
>>> terrified to do so TBH) but I can see that getting a bit tricky.
>>
>> pp_pack.c is where you need to be. It's rather full of "tricky".
>
> And that is the diplomatic way to put it. :-)

I shouldn't have looked, but I did. It is dark and full of terrors, and
I want my mummy.

Thankfully, all I'm asking for right now is that the syntax be defined,
so that I can go ahead and implement it in my module, and make sure I
use the same magic letter as pack() will do if pack/unpack ever sprout
this tentacle in the future. So if people agree that this is a good
thing to do, all that actually needs patching for now is the
documentation, something like ...

     x  A null byte (a.k.a ASCII NUL, "\000", chr(0))
     X  Back up a byte.
+
+    Y  NOT YET IMPLEMENTED. This syntax is reserved for a word of
+       an arbitrary number of bits. The number of bits is
+       specified as a base ten number in {braces}, eg Y{40} for
+       a forty bit (or five byte) word.
+
     @  Null-fill or truncate to absolute position, counted from the
        start of the innermost ()-group.
     .  Null-fill or truncate to absolute position specified by
        the value.

--
David Cantrell | http://www.cantrell.org.uk/david

Eye have a spelling chequer / It came with my pea sea
It planely marques four my revue / Miss Steaks eye kin knot sea.
Eye strike a quay and type a word / And weight for it to say
Weather eye am wrong oar write / It shows me strait a weigh.


craig.a.berry at gmail

Aug 13, 2012, 5:18 PM

Post #8 of 14
Re: RFC: pack()ing long words

On Mon, Aug 13, 2012 at 5:35 PM, David Cantrell <david [at] cantrell> wrote:
> On 13/08/2012 20:32, demerphq wrote:
>> On 13 August 2012 20:31, Leon Timmermans <fawaka [at] gmail> wrote:
>>> On Mon, Aug 13, 2012 at 8:16 PM, David Cantrell <david [at] cantrell> wrote:
>>>> I haven't looked at the source (and am somewhat
>>>> terrified to do so TBH) but I can see that getting a bit tricky.
>>>
>>> pp_pack.c is where you need to be. It's rather full of "tricky".
>>
>> And that is the diplomatic way to put it. :-)
>
> I shouldn't have looked, but I did. It is dark and full of terrors, and
> I want my mummy.
>
> Thankfully, all I'm asking for right now is that the syntax be defined,
> so that I can go ahead and implement it in my module, and make sure I
> use the same magic letter as pack() will do if pack/unpack ever sprout
> this tentacle in the future. So if people agree that this is a good
> thing to do, all that actually needs patching for now is the
> documentation, something like ...
>
>      x  A null byte (a.k.a ASCII NUL, "\000", chr(0))
>      X  Back up a byte.
> +
> +    Y  NOT YET IMPLEMENTED. This syntax is reserved for a word of
> +       an arbitrary number of bits. The number of bits is
> +       specified as a base ten number in {braces}, eg Y{40} for
> +       a forty bit (or five byte) word.
> +
>      @  Null-fill or truncate to absolute position, counted from the
>         start of the innermost ()-group.
>      .  Null-fill or truncate to absolute position specified by
>         the value.

pp_pack.c has its terrors, but even I can see that if you unpack an
integer type you get an IV or a UV on the stack (that's what mPUSHi
and mPUSHu do). What is it you want pushed on the stack when you
unpack a 16-byte word? It's not going to be anything that Perl can
represent as a numeric value unless you also implement
arbitrary-precision numerics. Or have I misunderstood what you're
wanting?


h.m.brand at xs4all

Aug 13, 2012, 11:39 PM

Post #9 of 14
Re: RFC: pack()ing long words

On Mon, 13 Aug 2012 19:18:42 -0500, "Craig A. Berry"
<craig.a.berry [at] gmail> wrote:

> On Mon, Aug 13, 2012 at 5:35 PM, David Cantrell <david [at] cantrell> wrote:
> > On 13/08/2012 20:32, demerphq wrote:
> >> On 13 August 2012 20:31, Leon Timmermans <fawaka [at] gmail> wrote:
> >>> On Mon, Aug 13, 2012 at 8:16 PM, David Cantrell <david [at] cantrell> wrote:
> >>>> I haven't looked at the source (and am somewhat
> >>>> terrified to do so TBH) but I can see that getting a bit tricky.
> >>>
> >>> pp_pack.c is where you need to be. It's rather full of "tricky".
> >>
> >> And that is the diplomatic way to put it. :-)
> >
> > I shouldn't have looked, but I did. It is dark and full of terrors, and
> > I want my mummy.
> >
> > Thankfully, all I'm asking for right now is that the syntax be defined,
> > so that I can go ahead and implement it in my module, and make sure I
> > use the same magic letter as pack() will do if pack/unpack ever sprout
> > this tentacle in the future. So if people agree that this is a good
> > thing to do, all that actually needs patching for now is the
> > documentation, something like ...
> >
> >      x  A null byte (a.k.a ASCII NUL, "\000", chr(0))
> >      X  Back up a byte.
> > +
> > +    Y  NOT YET IMPLEMENTED. This syntax is reserved for a word of
> > +       an arbitrary number of bits. The number of bits is
> > +       specified as a base ten number in {braces}, eg Y{40} for
> > +       a forty bit (or five byte) word.
> > +
> >      @  Null-fill or truncate to absolute position, counted from the
> >         start of the innermost ()-group.
> >      .  Null-fill or truncate to absolute position specified by
> >         the value.
>
> pp_pack.c has its terrors, but even I can see that if you unpack an
> integer type you get an IV or a UV on the stack (that's what mPUSHi
> and mPUSHu do). What is it you want pushed on the stack when you
> unpack a 16-byte word? It's not going to be anything that Perl can
> represent as a numeric value unless you also implement
> arbitrary-precision numerics. Or have I misunderstood what you're
> wanting?

Math::BigInt?

--
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.14 porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/


david at cantrell

Aug 14, 2012, 10:10 AM

Post #10 of 14
Re: RFC: pack()ing long words

On Mon, Aug 13, 2012 at 07:18:42PM -0500, Craig A. Berry wrote:

> pp_pack.c has its terrors, but even I can see that if you unpack an
> integer type you get an IV or a UV on the stack (that's what mPUSHi
> and mPUSHu do). What is it you want pushed on the stack when you
> unpack a 16-byte word?

Dunno. I guess the thing that is the closest match to an int would be a
string of bytes in the right order, so a PV.
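As a rough illustration of the "string of bytes" idea, the existing "a16" template already hands back a 16-byte word as a plain string, and the core Math::BigInt module can turn those bytes into an arbitrary-precision number if one is wanted. This is only a sketch under assumptions: the sample data is made up, and the hex round-trip is just one convenient way to do the conversion, not something pack() itself does.

```perl
use strict;
use warnings;
use Math::BigInt;

# Sixteen bytes of sample data (hypothetical input).
my $data = pack("H*", "0102030405060708090a0b0c0d0e0f10");

# "a16" returns the 16 bytes untouched, as a plain string (a PV).
my ($word) = unpack("a16", $data);

# Optionally convert the bytes to an arbitrary-precision integer.
# Big-endian: most significant byte comes first.
my $be = Math::BigInt->new("0x" . unpack("H*", $word));

# Little-endian: reverse the byte order before converting.
my $le = Math::BigInt->new("0x" . unpack("H*", scalar reverse($word)));

print "length:        ", length($word), "\n";
print "big-endian:    ", $be->as_hex, "\n";
print "little-endian: ", $le->as_hex, "\n";
```

Returning the raw bytes keeps the caller free to pick whichever big-number library, if any, suits them.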

--
David Cantrell | top google result for "topless karaoke murders"

I know that you believe you understand what you think you wrote, but
I'm not sure you realize that what you wrote is not what you meant.


bulk88 at hotmail

Aug 15, 2012, 1:01 AM

Post #11 of 14
RE: RFC: pack()ing long words

----------------------------------------
> Date: Tue, 14 Aug 2012 18:10:02 +0100
> From: david [at] cantrell
> To: perl5-porters [at] perl
> Subject: Re: RFC: pack()ing long words
>
> On Mon, Aug 13, 2012 at 07:18:42PM -0500, Craig A. Berry wrote:
>
> > pp_pack.c has its terrors, but even I can see that if you unpack an
> > integer type you get an IV or a UV on the stack (that's what mPUSHi
> > and mPUSHu do). What is it you want pushed on the stack when you
> > unpack a 16-byte word?
>
> Dunno. I guess the thing that is the closest match to an int would be a
> string of bytes in the right order, so a PV.

A packed string (a PV holding raw binary, not ASCII digits) is the best
way, or as others would say the method of last resort, to pack/unpack
ints of any word size. If a sufficiently capable big-number library is
loaded into the script, then return/take big-number objects instead.

I integrated Math::Int64
(http://search.cpan.org/~salva/Math-Int64-0.26/lib/Math/Int64.pm) into
my XS library, so Math::Int64 objects are accepted and returned, or
8-byte PV strings otherwise. The size is checked to make sure the scalar
is exactly 8 characters long, for sanity reasons.


perl.p5p at rjbs

Aug 15, 2012, 6:46 PM

Post #12 of 14
Re: RFC: pack()ing long words

* David Cantrell <david [at] cantrell> [2012-08-13T13:16:09]
> Leon's suggestion of putting the word length in {curlies} works nicely.
>
> > I kinda like your approach though. What about bits? Why restrict to
> > multiple of 8 bits?
>
> Hmmm ... and just have X{40}4 instead of X{5}4 for a five byte
> (== 40 bits) word.

I liked his suggestion as well. I'd probably suggest r, as it's an
rbitrary-length word.

--
rjbs


ikegami at adaelis

Aug 16, 2012, 6:20 AM

Post #13 of 14
Re: RFC: pack()ing long words

On Mon, Aug 13, 2012 at 7:41 AM, David Cantrell <david [at] cantrell> wrote:

> pack() and unpack() can handle words of 1, 2, 4, and (if you built your
> perl right) 8 bytes. And I use the same magic characters (although
> without using pack and unpack) in Data::Hexdumper.
>
> However, I want to extend it to support 16 byte words and, indeed, to
> support any other length words. 3 byte words, for example.
>
> I'd like to remain as compatible as possible with the characters used in
> pack()'s templates, but there's nothing there for what I want.
>
> So, can I propose that we pick a character for this purpose and at least
> define some syntax for specifying a word length, endian-ness, and repeat
> count for it, even if it isn't implemented yet?
>
> Something like this perhaps:
> X5,4>
>
> which means:
> X - whatever letter we choose
> 5 - word length
> ,4 - optional repeat count
> > - optional endian-ness
>

We already have (...)4 for repeating.


sfandino at yahoo

Aug 27, 2012, 3:34 AM

Post #14 of 14
Re: RFC: pack()ing long words

On 08/13/2012 01:51 PM, H.Merijn Brand wrote:
> On Mon, 13 Aug 2012 12:41:02 +0100, David Cantrell
> <david [at] cantrell> wrote:
>
>> pack() and unpack() can handle words of 1, 2, 4, and (if you built your
>> perl right) 8 bytes. And I use the same magic characters (although
>> without using pack and unpack) in Data::Hexdumper.
>>
>> However, I want to extend it to support 16 byte words and, indeed, to
>> support any other length words. 3 byte words, for example.
>>
>> I'd like to remain as compatible as possible with the characters used in
>> pack()'s templates, but there's nothing there for what I want.
>>
>> So, can I propose that we pick a character for this purpose and at least
>> define some syntax for specifying a word length, endian-ness, and repeat
>> count for it, even if it isn't implemented yet?
>>
>> Something like this perhaps:
>> X5,4>
>>
>> which means:
>> X - whatever letter we choose
>> 5 - word length
>> ,4 - optional repeat count
>> > - optional endian-ness
>
> Counterintuitive in that order
>
> l4 is 4 longs, so if the 4 in your example matches the 4 in l4, I'd
> guess that
>
> X5>4
>
> would be more intuitive
>
> I kinda like your approach though. What about bits? Why restrict to
> multiple of 8 bits?
>

But then, to make it really interesting, pack/unpack would have to
handle all the templates at arbitrary bit offsets.

For instance:

unpack("X{23}CX{9}", $data);
# should extract
# - a bitstring from bits 0 to 22
# - an unsigned char from bits 23 to 30
# - a bitstring from bits 31 to 39

Or at least, it should be possible to do that for X templates:

unpack("X{23}X{8}X{9}", $data);
# should extract
# - a bitstring from bits 0 to 22
# - a bitstring from bits 23 to 30
# - a bitstring from bits 31 to 39

Also, when considering bit strings and bit offsets, "endianness" may be
interpreted in several ways. Where do you place the byte boundaries? At
the byte boundaries in the string being unpacked? Every 8 bits on the
sub-bitstring? Starting from the left or from the right?
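
For what it is worth, fields at arbitrary bit offsets can be approximated today by going through the existing "B*" template. The sketch below picks just one of the possible interpretations, counting bits from the most significant bit of the first byte; the helper name `bit_field` and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Extract $len bits starting at bit $offset from $data, counting bits
# from the most significant bit of the first byte (an assumption).
# The field is returned as a string of "0" and "1" characters.
sub bit_field {
    my ($data, $offset, $len) = @_;
    my $bits = unpack("B*", $data);    # whole buffer as a bit string
    return substr($bits, $offset, $len);
}

my $data = "\xFF\xFF\xFE\x01\xFF";     # 40 bits of sample data

my $first  = bit_field($data,  0, 23); # bits 0 to 22
my $second = bit_field($data, 23,  8); # bits 23 to 30
my $third  = bit_field($data, 31,  9); # bits 31 to 39

# oct() understands the "0b" prefix, which is enough for fields that
# fit in a native integer.
printf "%s = %d\n", $_, oct("0b$_") for $first, $second, $third;
```

Converting the whole buffer to a bit string on every call is wasteful, but it keeps the bit numbering explicit, which is exactly the part that a real template would have to pin down.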
