Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Perl: porters

bareword sub lookups

 

 

Perl porters RSS feed   Index | Next | Previous | View Threaded


zefram at fysh

Oct 26, 2009, 6:29 PM

Post #1 of 9 (402 views)
Permalink
bareword sub lookups

Attached is a patch that changes how the tokeniser looks up subroutines,
when they're referenced by a bareword, for prototype and const-sub
purposes. Formerly, it has looked up bareword subs directly in the
package, which is contrary to the way the generated op tree looks up
the sub, via an rv2cv op. The patch makes the tokeniser generate the
rv2cv op earlier, and dig around in that.

The motivation for this is to allow modules to hook the rv2cv op
creation, to affect the name->subroutine lookup process. Currently,
such hooking affects op execution as intended, but everything goes wrong
with a bareword ref where the tokeniser looks at some unrelated CV,
or a blank space, in the package. With the patch in place, an rv2cv
hook correctly affects the tokeniser and therefore the prototype-based
aspects of parsing.

The patch also changes ck_subr (which applies the argument context and
checking parts of prototype behaviour) to handle subs referenced by an
RV const op inside the rv2cv, where formerly it would only handle a gv
op inside the rv2cv. This is to support the most likely kind of modified
rv2cv op.

I have a module that performs this sort of rv2cv hooking: Lexical::Var.
One of the things it can do is lexically-scoped functions (which may be
imported from elsewhere). For example, in the code

sub croak { print "ribbit!" }
use Lexical::Var '&croak' => \&Carp::croak;
&croak("fail!");

the subroutine reference in the call starts out in the form
rv2cv(const("croak")). In the absence of Lexical::Var, ck_rvconst would
have turned it into rv2cv(gv(*main::croak)), which would refer to the
main::croak() function. But Lexical::Var hooks that process and turns
it into rv2cv(const(\&Carp::croak)), referencing Carp::croak().

Currently Lexical::Var can't handle bareword function calls, because
of the tokeniser problem. It can only handle sigil-prefixed calls.
This seriously damages the usability of anything that works this way.
Lexical-Var-0.002 contains some inactive code and tests for bareword
function calls, to match this core patch.

The patch has one drawback that I'm aware of. Creating the rv2cv op
forces a ref stored in the package, as a space-saving representation of
a constant sub, to be upgraded to a full GV and CV. The preexisting code
carefully avoided forcing creation of a CV in this case. This causes two
test in t/op/gv.t, which are specifically looking for this behaviour,
to fail, so the patch marks them as TODO. It's not at all simple to
recover the space optimisation in the presence of the rv2cv-based logic.
The inlining of constant subs does still work, however.

As far as I know, nothing else gets broken by the patch. All other
tests pass. My greatest suspicion of unintended breakage is around the
intuit_method() function, which I don't understand sufficiently to be
confident about.

I'd really like to have package-independent lexical subroutine importation
fully working in 5.12.

-zefram
Attachments: rv2cv_patch (7.71 KB)


nick at ccl4

Oct 27, 2009, 3:07 AM

Post #2 of 9 (385 views)
Permalink
Re: bareword sub lookups [In reply to]

On Tue, Oct 27, 2009 at 01:29:40AM +0000, Zefram wrote:

> The patch has one drawback that I'm aware of. Creating the rv2cv op
> forces a ref stored in the package, as a space-saving representation of
> a constant sub, to be upgraded to a full GV and CV. The preexisting code
> carefully avoided forcing creation of a CV in this case. This causes two
> test in t/op/gv.t, which are specifically looking for this behaviour,
> to fail, so the patch marks them as TODO. It's not at all simple to
> recover the space optimisation in the presence of the rv2cv-based logic.
> The inlining of constant subs does still work, however.
>
> As far as I know, nothing else gets broken by the patch. All other
> tests pass. My greatest suspicion of unintended breakage is around the
> intuit_method() function, which I don't understand sufficiently to be
> confident about.
>
> I'd really like to have package-independent lexical subroutine importation
> fully working in 5.12.

I'd really like *not* to loose that space saving though. That's a general
memory performance regression, to allow a specialist new feature.

I don't understand the tokeniser enough to quickly figure out a fix, and I'm
unlikely to have time to in the next few months. Heck, I don't understand the
tokeniser and op generation enough to figure out if this patch is safe, in
that it doesn't have unintended side effects.

> diff --git a/t/op/gv.t b/t/op/gv.t
> index e04c2ca..f920113 100644
> --- a/t/op/gv.t
> +++ b/t/op/gv.t
> @@ -329,8 +329,11 @@ $::{oonk} = \"Value";
> is (ref $::{ga_shloip}, 'SCALAR', "Export of proxy constant as is");
> is (ref $::{oonk}, 'SCALAR', "Export doesn't affect original");
> is (eval 'ga_shloip', "Value", "Constant has correct value");
> -is (ref $::{ga_shloip}, 'SCALAR',
> - "Inlining of constant doesn't change represenatation");
> +{
> + local $TODO = "ref-as-glob broken by rv2cv-based subroutine lookup";
> + is (ref $::{ga_shloip}, 'SCALAR',
> + "Inlining of constant doesn't change represenatation");
> +}
>
> delete $::{ga_shloip};
>
> @@ -433,7 +436,10 @@ sub non_dangling {
> non_dangling();
> is (ref $::{oonk}, 'SCALAR', "Export doesn't affect original");
> is (eval 'zap', "Value", "Constant has correct value");
> -is (ref $::{zap}, 'SCALAR', "Exported target is also a PCS");
> +{
> + local $TODO = "ref-as-glob broken by rv2cv-based subroutine lookup";
> + is (ref $::{zap}, 'SCALAR', "Exported target is also a PCS");
> +}

Nicholas Clark


zefram at fysh

Oct 27, 2009, 3:32 AM

Post #3 of 9 (385 views)
Permalink
Re: bareword sub lookups [In reply to]

Nicholas Clark wrote:
>I'd really like *not* to loose that space saving though. That's a general
>memory performance regression,

Hmm. There might be a solution involving creating an ephemeral CV during
the parsing process, which gets thrown away again once the inlining
is done. I'm thinking that the base rv2cv checker could examine the
package it's targeting and convert a ref-as-glob into a temporary const
CV, without affecting what's stored in the package. There's a nastiness
around taking persistent refs to the const sub, where we *do* want to
upgrade in the package, which might be resolved by a weak ref somewhere.

In case it wasn't clear: the space optimisation remains intact for
constants that are not used. Most of the Fcntl zoo will remain compact,
for example. The optimisation is only broken for constants that are
inlinably used.

>to allow a specialist new feature.

I'm hoping that the new feature will be routinely used, not specialist,
because it's significantly superior to the way function importation is
currently managed.

By the way, Lexical::Var offers another way to do constant inlining.
Use of a scalar lexical var will be compiled to a const op if the scalar
is SvREADONLY:

$ perl -MO=Concise -lwe 'use Lexical::Var "\$pi" => \3.14159265358979323846; print $pi'
6 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 34 -e:1) v:%,{ ->3
5 <@> print vK ->6
3 <0> pushmark s ->4
4 <$> const[NV 3.14159265358979] s ->5
-e syntax OK

-zefram


nick at ccl4

Oct 27, 2009, 3:46 AM

Post #4 of 9 (386 views)
Permalink
Re: bareword sub lookups [In reply to]

On Tue, Oct 27, 2009 at 10:32:00AM +0000, Zefram wrote:
> Nicholas Clark wrote:
> >I'd really like *not* to loose that space saving though. That's a general
> >memory performance regression,
>
> Hmm. There might be a solution involving creating an ephemeral CV during
> the parsing process, which gets thrown away again once the inlining
> is done. I'm thinking that the base rv2cv checker could examine the
> package it's targeting and convert a ref-as-glob into a temporary const
> CV, without affecting what's stored in the package. There's a nastiness
> around taking persistent refs to the const sub, where we *do* want to
> upgrade in the package, which might be resolved by a weak ref somewhere.

Changing constant subs is already already a default warning. If I understand
it correctly, the current implementation will grab the value of the constant
at compile time, so any subsequent change will not be noticed. Hence if I
understand correctly what you're suggesting is no worse from that point of
view - subsequent changes to a constant subroutine would not be noticed.


I was also wondering if it's possible to put the pointer to the SV* that is
the reference in the symbol table (sitting in the place of a typeglob) into
the const OP that your patch generates, and having something later on "notice"
that and cope with converting it to a real subroutine at that point.
That "later on" might be the base rv2cv checker.

I might be misunderstanding something here.

I've not checked the implementation, but from memory, when a proxy constant
subroutine is upgraded, the SV that is the reference is passed to sv_upgrade()
and becomes the GV, so that it is the same SV with the same address. Does that
help?

> In case it wasn't clear: the space optimisation remains intact for
> constants that are not used. Most of the Fcntl zoo will remain compact,
> for example. The optimisation is only broken for constants that are
> inlinably used.

It's still about .5k per subroutine created, IIRC.

> >to allow a specialist new feature.
>
> I'm hoping that the new feature will be routinely used, not specialist,
> because it's significantly superior to the way function importation is
> currently managed.

Which I guess means that it would take about 5 years to reach wide scale
adoption.

Nicholas Clark


zefram at fysh

Oct 27, 2009, 4:14 AM

Post #5 of 9 (389 views)
Permalink
Re: bareword sub lookups [In reply to]

Nicholas Clark wrote:
> what you're suggesting is no worse from that point of
>view - subsequent changes to a constant subroutine would not be noticed.

Oh yes, it's exactly the same semantics there.

People get rather hung up on compile-time function resolution as being a
problem. Not only do we have constant subs inlined, but with prototypes
affecting compilation the boat has long since sailed on being able to
freely replace subs at runtime. Inlining is a feature.

>I was also wondering if it's possible to put the pointer to the SV* that is
>the reference in the symbol table (sitting in the place of a typeglob) into
>the const OP that your patch generates, and having something later on "notice"
>that and cope with converting it to a real subroutine at that point.

I had some half-baked notion along those lines, early on in developing
the patch. Maybe the RV could stand in as GV ior CV in the op tree, in
much the same way that a GV currently can stand in for an RV referencing
a CV in rv2cv and entersub ops. Needs some careful analysis to get it
right, but in the op and pp code, which I find much less hateful than
the tokeniser.

>I've not checked the implementation, but from memory, when a proxy constant
>subroutine is upgraded, the SV that is the reference is passed to sv_upgrade()
>and becomes the GV, so that it is the same SV with the same address. Does that
>help?

I didn't notice that. It might well be useful, if the RV is referenced
directly from ops. Neat way to handle referential equality.

>It's still about .5k per subroutine created, IIRC.

Bloody hell, I had no idea it was that big. Which structs are you
counting here? *rummage in perl source*

struct gv 16
struct xpvgv 28
struct gp 44
struct cv 16
struct xpvcv 64

That's 148 octets, plus malloc overhead, maybe 200. I don't see how you
get to 500. Did I miss something big? Or are you perhaps thinking of
how the structs were before the great slimming in 5.9?

>Which I guess means that it would take about 5 years to reach wide scale
>adoption.

Yes, undoubtedly.

-zefram


nick at ccl4

Oct 27, 2009, 6:29 AM

Post #6 of 9 (388 views)
Permalink
Re: bareword sub lookups [In reply to]

On Tue, Oct 27, 2009 at 11:14:57AM +0000, Zefram wrote:
> Nicholas Clark wrote:

> >It's still about .5k per subroutine created, IIRC.
>
> Bloody hell, I had no idea it was that big. Which structs are you
> counting here? *rummage in perl source*
>
> struct gv 16
> struct xpvgv 28
> struct gp 44
> struct cv 16
> struct xpvcv 64
>
> That's 148 octets, plus malloc overhead, maybe 200. I don't see how you
> get to 500. Did I miss something big? Or are you perhaps thinking of
> how the structs were before the great slimming in 5.9?

Probably before the diet. Plus all globs had magic then, which was at least
another 24 bytes. I think I was actually also counting the cost of the hash
entry in the symbol table.

Nicholas Clark


zefram at fysh

Oct 28, 2009, 1:47 AM

Post #7 of 9 (370 views)
Permalink
Re: bareword sub lookups [In reply to]

Think I've worked out what to do about globs and ops. It's complicated
by the fact that a glob knows its own name, so a ref or other small
structure can't be upgraded to a glob if we're pointed directly at the
ref. So the intermediate thing in the op tree has to be the glob's name.

New op type "namegv". It's an SV op, carrying a fully-qualified
symbol name. If executed, it'll look up and return the glob of that
name, implicitly creating the glob or upgrading a sub-glob structure
to a first-class glob if necessary. However, it should pretty much
never actually execute, because namegv ops should always be converted
to something else in the op building process.

If a refgen op is applied to a namegv, it upgrades the namegv to a gv op,
creating or upgrading the glob. Same for anything else that actually
needs a glob when it operates on a glob op.

If an rv2cv op is applied to a namegv, it stays as it is initially.
Anything seeking a const sub can see the namegv and do a package lookup
that doesn't create a glob. An argumentless entersub applied to such a
const sub resolves to a const op. Other cases of entersub, or refgen,
or other things applied to an rv2cv makes the namegv upgrade.

This scheme is extensible. Other ops can change to delay upgrading namegv
ops, in the hope of avoiding glob generation altogether. This lets us
introduce more sub-glob representations in the package, with localised
effect on glob-handling ops.

Care is required to avoid namegv ops hanging around. With the name
as an SV, a group of namegv ops for the same name would quickly become
bigger than the glob that they're replacing. I wonder about interning
the name SVs in the stash, for names that don't yet have a glob, but we
already have a meaning for a PV stored directly in the stash.

-zefram


zefram at fysh

Nov 6, 2009, 5:20 PM

Post #8 of 9 (326 views)
Permalink
Re: bareword sub lookups [In reply to]

Nicholas and I have discussed the patch that I posted 11 days ago
regarding overridability of bareword sub lookups. We're both in two minds
regarding the real value of the part of the const-sub space optimisation
that the patch lost. I've worked out a feasible way to regain that
optimisation: downgrade a GV to the placeholder format, when a GV op is
being discarded and the substitution is safe. This takes only a small
amount of code, and is much simpler and safer than the op-based schemes
that I described earlier. Nicholas said he'd be satisfied with this
sort of arrangement.

The attached patch is the resulting revised version of the bareword
sub patch. (For clarity: Nicholas has not yet reviewed this code.)
It incorporates the original patch (allowing rv2cv op hookers to control
prototype processing), the GV-downgrading addition, and a mention in
perldelta. All core tests pass, including in particular the unmodified
form of the t/op/gv.t tests for the const-sub space optimisation.

-zefram
Attachments: d0 (12.4 KB)


rgs at consttype

Nov 8, 2009, 6:35 AM

Post #9 of 9 (317 views)
Permalink
Re: bareword sub lookups [In reply to]

2009/11/7 Zefram <zefram [at] fysh>:
> Nicholas and I have discussed the patch that I posted 11 days ago
> regarding overridability of bareword sub lookups.  We're both in two minds
> regarding the real value of the part of the const-sub space optimisation
> that the patch lost.  I've worked out a feasible way to regain that
> optimisation: downgrade a GV to the placeholder format, when a GV op is
> being discarded and the substitution is safe.  This takes only a small
> amount of code, and is much simpler and safer than the op-based schemes
> that I described earlier.  Nicholas said he'd be satisfied with this
> sort of arrangement.
>
> The attached patch is the resulting revised version of the bareword
> sub patch.  (For clarity: Nicholas has not yet reviewed this code.)
> It incorporates the original patch (allowing rv2cv op hookers to control
> prototype processing), the GV-downgrading addition, and a mention in
> perldelta.  All core tests pass, including in particular the unmodified
> form of the t/op/gv.t tests for the const-sub space optimisation.

Thanks, applied to bleadperl as f7461760003db2ce68155c97ea6c1658e96fcd27.

Perl porters RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.