Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: kinosearch: discuss

Orderly global destruction

 

 

kinosearch discuss RSS feed   Index | Next | Previous | View Threaded


marvin at rectangular

Feb 20, 2008, 6:43 PM

Post #1 of 9 (1344 views)
Permalink
Orderly global destruction

Greets,

This post will be of interest only to a few individuals, but I'm
sending it to the list so that it ends up in the public record, and
because it is relevant to the RT issue #32689 "KinoSearch::Simple and
DESTROY". It arises out of a phone conversation between Peter Karman
and I last night.

When you call Devel::Peek::Dump($obj) on a "blessed reference"
returned by the Perl function bless(), you see two SV structs: an
outer RV, and the inner object SV:

$ perl -MDevel::Peek -e 'my $blank; Dump bless \$blank, "Foo"'
SV = RV(0x81a418) at 0x800d80
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x800e34
SV = PVMG(0x80e440) at 0x800e34
REFCNT = 2
FLAGS = (PADBUSY,PADMY,OBJECT)
IV = 0
NV = 0
PV = 0
STASH = 0x800f0c "Foo"

KinoSearch stores only the inner SV, discarding the outer. It does
this by first incrementing the refcount of the inner object, then
decrementing the refcount of the RV wrapper, triggering the RV's
reclamation without killing off the inner object. It's kind of
pointless to create and then immediately destroy the outer RV, but we
have to do things that way because there are no public Perl C API
functions that allow for the direct creation of an inner object.

The relevant KS functions are prototyped in trunk/c_src/KinoSearch/
Obj.h and implemented in trunk/perl/xs/KinoSearch/Obj.c, but the
following function illustrates what the code would look like if it
were collected into one place and unrolled for clarity. We'll use
InvIndex as an example class.

typedef struct InvIndex {
VirtualTable *_;
SV *inner_perl_object;
Schema *schema;
Folder *folder;
} InvIndex;

InvIndex*
InvIndex_new(Schema *schema, Folder *folder)
{
kino_InvIndex *self =
(kino_InvIndex*)malloc(sizeof(kino_InvIndex));
SV *perl_obj_rv = newSV(0); /* soon to be an RV */

/* Turn perl_obj_rv into an RV and create an inner Perl object,
* which right now only the RV can "see".
*/
sv_setref_pv(perl_obj_rv, "KinoSearch::InvIndex", self);

/* Copy a pointer to the inner Perl object into the KS obj. */
self->inner_perl_obj = SvRV(perl_obj_ref);

/* Increment the refcount of the inner object, so that it won't
* go away when we decrement the RV.
*/
SvREFCNT_inc(self->inner_perl_obj);

/* Decrement the outer RV. Its refcount falls to 0, and Perl
* frees it. Now [self] holds the sole refcount keeping
* the inner Perl object alive.
*/
SvREFCNT_dec(perl_obj_rv);

/* Assign the vtable pointer for the InvIndex class. */
self->_ = (VirtualTable*)&INVINDEX;

/* Increment the refcounts of schema and folder and copy
pointers
* into the InvIndex object.
*/
self->schema = KINO_REFCOUNT_INC(schema);
self->folder = KINO_REFCOUNT_INC(folder);

return self;
}

There is an esoteric but important reason for doing things this way.

During Perl global destruction, all objects pointed to by RVs are
reclaimed in a first sweep. Then later, Perl goes and sweeps over all
existing SVs, repeatedly decrementing their refcounts until every last
one gets DESTROYed.

It is essential that all composite KS objects get DESTROYed during the
*first* sweep.

Consider this snippet, which does nothing but create an InvIndex object:

use MySchema;
my $invindex = MySchema->open('/path/to/invindex');

Under the current implementation, at the end of that script there are
three Perl objects in existence, BUT ONLY ONE RV: $invindex, visible
from Perl space. The first sweep will catch this one RV, decrementing
its refcount, and trigger an orderly cascade of detructor calls.

However, if KS were to keep RVs around instead of only the inner
object, at the end of that script there would be 4 RVs in existence:
one in Perl space, and three in C space. Because the second phase of
global destruction is unordered, it's entirely possible that either
the Schema or the Folder object will get DESTROYed before the
InvIndex. If that happens, we'll have a nasty double-free problem:

void
InvIndex_destroy(InvIndex *self)
{
REFCOUNT_DEC(self->schema); /* BAD if self->schema already
freed */
REFCOUNT_DEC(self->folder);
FREE_OBJ(self);
}

Fortunately, that can't happen the way things are now.

Congratulations if you've fought your way through this entire
message. If the reasoning seems complicated, well, it sort of is --
but that's XS for you. Now you know why there is absolutely no XS any
more in trunk/c_src, where most of KS is implemented. :)

(Footnote: On the off chance that someone who reads this message wants
to browse the relevant portions of the Perl source code, check out
Perl_destruct in perl.c, and sv_clean_objs/sv_clean_all in sv.c.)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

Excerpt from a comment in sv.c:
--------------------------------------------------------------------------

The function visit() scans the SV arenas list, and calls a specified
function for each SV it finds which is still live - ie which has an
SvTYPE
other than all 1's, and a non-zero SvREFCNT. visit() is used by the
following functions (specified as [function that calls visit()] /
[function
called by visit() for each SV]):

sv_report_used() / do_report_used()
dump all remaining SVs (debugging aid)

sv_clean_objs() / do_clean_objs(),do_clean_named_objs()
Attempt to free all objects pointed to by RVs,
and, unless DISABLE_DESTRUCTOR_KLUDGE is
defined,
try to do the same for all objects indirectly
referenced by typeglobs too. Called once from
perl_destruct(), prior to calling
sv_clean_all()
below.

sv_clean_all() / do_clean_all()
SvREFCNT_dec(sv) each remaining SV, possibly
triggering an sv_free(). It also sets the
SVf_BREAK flag on the SV to indicate that the
refcnt has been artificially lowered, and thus
stopping sv_free() from giving spurious
warnings
about SVs which unexpectedly have a refcnt
of zero. called repeatedly from
perl_destruct()
until there are no SVs left.





_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch


peter at peknet

Feb 20, 2008, 6:55 PM

Post #2 of 9 (1285 views)
Permalink
Re: Orderly global destruction [In reply to]

Marvin Humphrey wrote on 2/20/08 8:43 PM:
> Greets,
>
> This post will be of interest only to a few individuals, but I'm sending
> it to the list so that it ends up in the public record, and because it
> is relevant to the RT issue #32689 "KinoSearch::Simple and DESTROY". It
> arises out of a phone conversation between Peter Karman and I last night.
>

Thanks, Marvin. A thorough and well-commented post on the subject. And glad that
you clarified why the make-a-perl-RV-then-free-it step is necessary in the
initial object creation. I did not know the details of Perl's SV cleanup, and
can now see quite clearly why storing only the inner SV in C space is necessary.

--
Peter Karman . http://peknet.com/ . peter [at] peknet

_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch


nate at verse

Feb 20, 2008, 8:04 PM

Post #3 of 9 (1290 views)
Permalink
Re: Orderly global destruction [In reply to]

On 2/20/08, Marvin Humphrey <marvin [at] rectangular> wrote:
> It's kind of
> pointless to create and then immediately destroy the outer RV, but we
> have to do things that way because there are no public Perl C API
> functions that allow for the direct creation of an inner object.

It's been a while since I've done this, but isn't the inner object
just a SV with the IV field pointing to the C object? If so, could
you create it directly? I think the exact type of SV shouldn't matter
much, as everything (I think ) can store an IV.

A little quick web searching seems to show that current idiom is
something like sv_set_iv(sv, PTR2IV(ptr)). Is the problem that PTR2IV
isn't public? If so, I'd guess that would have to be just an
oversight, as traditionally it was just a (IV) cast.

The solution you described seems like it would work fine, but maybe
the direct creation approach would sidestep the need for it.

Nathan Kurz
nate [at] verse (not very good at lurking)

_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch


marvin at rectangular

Feb 21, 2008, 12:18 PM

Post #4 of 9 (1273 views)
Permalink
Re: Orderly global destruction [In reply to]

On Feb 20, 2008, at 8:04 PM, Nathan Kurz wrote:

> On 2/20/08, Marvin Humphrey <marvin [at] rectangular> wrote:
>> It's kind of
>> pointless to create and then immediately destroy the outer RV, but we
>> have to do things that way because there are no public Perl C API
>> functions that allow for the direct creation of an inner object.
>
> It's been a while since I've done this, but isn't the inner object
> just a SV with the IV field pointing to the C object? If so, could
> you create it directly? I think the exact type of SV shouldn't matter
> much, as everything (I think ) can store an IV.
>
> A little quick web searching seems to show that current idiom is
> something like sv_set_iv(sv, PTR2IV(ptr)). Is the problem that PTR2IV
> isn't public? If so, I'd guess that would have to be just an
> oversight, as traditionally it was just a (IV) cast.

PTR2IV is documented in perlguts. It's not in perlapi, but it should
be. Using it is safe.

However, it's not that simple. These are the changes to trunk/perl/xs/
KinoSearch/Obj.c that it would really take to switch to direct creation:

+ /* Code cribbed from Perl_sv_bless, in sv.c. */
+ stash = gv_stashpvn(self->_->class_name, self->_->class_name_len,
TRUE);
+ inner_obj = newSV(0);
+ SvOBJECT_on(inner_obj);
+ PL_sv_objcount++;
+ SvUPGRADE(inner_obj, SVt_PVMG);
+ SvSTASH_set(inner_obj, (HV*)SvREFCNT_inc(stash));
+ sv_setiv(inner_obj, PTR2IV(self));
+ self->ref.native = inner_obj;

As the comment indicates, the code was derived from Perl_sv_bless in
sv.c, which gets called via pp_bless in pp.c. It was a PITA to figure
out what was needed and what wasn't, which is partly why I hadn't
gotten to this before. Perl_sv_bless has barely changed from 5.8.3 to
5.10.0, so I think that at least that code would work smoothly across
all Perl versions supported by KS.

However, these are non-public API invocations:

* SvOBJECT_on is undocumented, but it just sets a flag.
* I have no idea what PL_sv_objcount is for, so I'm cargo-culting
there.

We also have to modify kino_Doc_to_native, in trunk/perl/xs/KinoSearch/
Doc.c:

+ HV *stash
+ = gv_stashpvn(self->_->class_name, self->_->class_name_len,
true);
+ Gv_AMupdate(stash);
+

What that does is turn on overloading. I think it only needs to be
called once per class, but we have to keep invoking it in case of
subclassing.

Gv_AMupdate is undocumented. I extracted it from the Gv_AMG macro,
also undocumented, used by sv_bless:

if (Gv_AMG(stash))
SvAMAGIC_on(sv);
else
(void)SvAMAGIC_off(sv);

Gv_AMG is defined in sv.h:

#define Gv_AMG(stash) (PL_amagic_generation && Gv_AMupdate(stash))

From my testing, it looks like as soon as there's any class that
turns on overloading, PL_amagic_generation turns positive and stays
positive, causing Gv_AMupdate(stash) to be invoked with each call to
sv_bless().

I should also mention that KS currently uses the macro SvAMAGIC_on in
Doc.c, even though it's undocumented. That was what it took to get
overloading working for Doc under Perl 5.8.x.

> The solution you described seems like it would work fine, but maybe
> the direct creation approach would sidestep the need for it.

The complexity resides in the concept. We're only talking about a few
lines of code, but writing them takes a fairly deep understanding of
the Perl code base.

The disappointing thing is that after all the work that it took to
disentangle sv_bless and suss out the direct creation approach, I see
absolutely no difference in the benchmarks. Creating and destroying
that extra RV just doesn't seem to matter.

So, we could commit the attached patch implementing direct creation,
and the conceptual basis of kino_Obj_create would marginally less
wacky, but to make that happen we have to use a bunch of undocumented
Perl internal functions. I appreciate the nudge that it took to get
me to fully explore this path, but it looks like a dead end.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Attachments: no_rv.diff (1.88 KB)


marvin at rectangular

Feb 21, 2008, 1:26 PM

Post #5 of 9 (1291 views)
Permalink
Re: Orderly global destruction [In reply to]

On Feb 21, 2008, at 12:18 PM, Marvin Humphrey wrote:

> The disappointing thing is that after all the work that it took to
> disentangle sv_bless and suss out the direct creation approach, I
> see absolutely no difference in the benchmarks. Creating and
> destroying that extra RV just doesn't seem to matter.

I was able to expose a difference by switching Token to use
KinoSearch::Obj as a base class rather than
KinoSearch::Util::FastObj. With that change, the indexing benchmark
improved by around 5% using direct creation: 0.86 secs vs. 0.90 secs
for 1000 docs. So at least we know it makes a difference if you're
creating huge numbers of objects.

However, with Token using FastObj, we're at 0.59 secs already.

I actually think we can speed indexing up some more by allocating
Tokens from a MemPool belonging to the TokenBatch. That would cut
calls to malloc() way way down.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch


henka at cityweb

Feb 22, 2008, 12:22 AM

Post #6 of 9 (1279 views)
Permalink
Re: Orderly global destruction [In reply to]

> On Feb 21, 2008, at 12:18 PM, Marvin Humphrey wrote:
> With that change, the indexing benchmark
> improved by around 5% using direct creation: 0.86 secs vs. 0.90 secs
> for 1000 docs. So at least we know it makes a difference if you're
> creating huge numbers of objects.
>
> However, with Token using FastObj, we're at 0.59 secs already.

I'm presuming here that "...indexing benchmark improved by..." means
exactly that.

>From 0.90 to 0.59... that's what, a ~34% improvement?

Wow. From an end-user perspective (which is what I am), and ignoring
inefficient wrapper code (which is what I have - sorry Marvin, I know my
code irritates the crap out of you ;), that's a dramatic improvement.

*Especially* when talking about indexing a TB or more. Instead of several
weeks to index, this could cut it down to a couple of weeks...

Good show, keep it up!

Regards
Henry


_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch


nate at verse

Feb 24, 2008, 9:20 PM

Post #7 of 9 (1287 views)
Permalink
Re: Orderly global destruction [In reply to]

On 2/21/08, Marvin Humphrey <marvin [at] rectangular> wrote:
> So, we could commit the attached patch implementing direct creation,
> and the conceptual basis of kino_Obj_create would marginally less
> wacky, but to make that happen we have to use a bunch of undocumented
> Perl internal functions. I appreciate the nudge that it took to get
> me to fully explore this path, but it looks like a dead end.

Great work on figuring out all the details to make that work!

I guess my instinct would be to apply the patch for the slight benefit
in directness and performance, but I agree that the benefit is
minimal. Still, I think it's a benefit. I guess my view is that the
kino_Obj C code is working at just about the same level as the Perl
internals, and thus has the right to do the job directly.

But I can see why you might prefer the better documented approach.

Thanks for looking into it,

Nathan Kurz
nate [at] verse

_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch


marvin at rectangular

Feb 26, 2008, 11:50 PM

Post #8 of 9 (1267 views)
Permalink
Re: Orderly global destruction [In reply to]

On Feb 24, 2008, at 9:20 PM, Nathan Kurz wrote:

> I guess my instinct would be to apply the patch for the slight benefit
> in directness and performance, but I agree that the benefit is
> minimal. Still, I think it's a benefit.

OK, fair enough -- I'm swayed. It was tough to explain the "destroy
the object in order to save it" code flow. This is more coherent.

The XS isn't any harder to understand than the other XS functions --
even the documented ones, you wind up having to look up all the time.
GV_AMupdate etc. are now documented in some expanded comments, and the
full rationale is preserved in these mailing list archives.

Committed as r3064.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch


marvin at rectangular

Feb 28, 2008, 12:46 PM

Post #9 of 9 (1267 views)
Permalink
Re: Orderly global destruction [In reply to]

On Feb 24, 2008, at 9:20 PM, Nathan Kurz wrote:
>
>> I guess my instinct would be to apply the patch for the slight
>> benefit
>> in directness and performance, but I agree that the benefit is
>> minimal. Still, I think it's a benefit.


A footnote: direct creation had the side effect of wrecking Doc's
overloading in Perl 5.10.

In 5.8, the SVf_AMAGIC flag is held by the outer ref, and in 5.10,
it's held by the inner object. However, turning it on is accomplished
via the outer ref in both cases.

Usage:

SvAMAGIC_on(outer_ref);

Macro def in 5.8.8 (from sv.c):

#define SvAMAGIC_on(sv) (SvFLAGS(sv) |= SVf_AMAGIC)

Macro def in 5.10.0:

# define SvAMAGIC_on(sv) ({ SV * const kloink = sv; \
assert(SvROK(kloink)); \
SvFLAGS(SvRV(kloink)) |= SVf_AMAGIC; \
})

Under the previous "destroy the object in order to save it" algo, the
act of creating the outer ref in 5.10 turned on the SVf_AMAGIC flag on
the inner object. But with direct creating, that wasn't happening
anymore.

The solution was to modify kino_Doc_to_native, which creates an outer
ref, to always invoke SvAMAGIC_on(sv). Previously, it had been set up
to only turn on for 5.8.x.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


Modified: trunk/perl/xs/KinoSearch/Doc.c
===================================================================
--- trunk/perl/xs/KinoSearch/Doc.c 2008-02-28 14:05:42 UTC (rev 3081)
+++ trunk/perl/xs/KinoSearch/Doc.c 2008-02-28 20:54:47 UTC (rev 3082)
@@ -130,12 +130,8 @@
*/
stash = gv_stashpvn(self->_->class_name, self->_->class_name_len,
true);
Gv_AMupdate(stash);
-
-#if PERL_VERSION < 9
- /* In perl 5.8, the OVERLOAD magic flag is carried in the ref,
while in
- * 5.10, it is carried in the inner object. */
SvAMAGIC_on(perl_obj);
-#endif
+
return perl_obj;
}


_______________________________________________
KinoSearch mailing list
KinoSearch [at] rectangular
http://www.rectangular.com/mailman/listinfo/kinosearch

kinosearch discuss RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.