Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Dev

Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

 

 

Python dev RSS feed   Index | Next | Previous | View Threaded


eliben at gmail

Aug 10, 2013, 5:12 PM

Post #1 of 22 (56 views)
Permalink
Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

Hello,

Recently as part of the effort of untangling the tests of ElementTree and
general code improvements (e.g. http://bugs.python.org/issue15651), I ran
into something strange about PEP 3121-compliant modules. I'll demonstrate
with csv, just as an example.

PEP 3121 mandates this function to look up the module-specific state in the
current sub-interpreter:

PyObject* PyState_FindModule(struct PyModuleDef*);

This appears to make the following assumption: a given sub-interpreter only
imports any C extension *once*. If it happens more than once, the
assumption breaks in troubling ways. In normal code, it should never happen
more than once because of the caching in sys.modules; However, many of our
tests monkey-patch sys.modules (mainly by calling
test.support.import_fresh_module) and hell breaks use. Here's a simple
example:

----
import sys

csv = __import__('csv')
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

print(csv.list_dialects())
# ==> ['unixpwd', 'excel-tab', 'excel', 'unix']

del sys.modules['csv'] # FUN
del sys.modules['_csv']
some_other_csv = __import__('csv')

print(csv.list_dialects())
# ==> ['excel-tab', 'excel', 'unix']
----

Note how doing some sys.modules acrobatics and re-importing suddenly
changes the internal state of a previously imported module. This happens
because:

1. The first import of 'csv' (which then imports `_csv) creates
module-specific state on the heap and associates it with the current
sub-interpreter. The list of dialects, amongst other things, is in that
state.
2. The 'del's wipe 'csv' and '_csv' from the cache.
3. The second import of 'csv' also creates/initializes a new '_csv' module
because it's not in sys.modules. This *replaces* the per-sub-interpreter
cached version of the module's state with the clean state of a new module

So essentially, while PEP 3121 moves state from C-file globals to
per-module state, the state is still global, and this fact can be exposed
from pure Python code.

The above is a toy example. Here's a more serious case I ran into with ET,
but once again is demonstrated with 'csv' for simplicity:

----

import io
from test.support import import_fresh_module

import csv

csv_other = import_fresh_module('csv', fresh=['_csv', 'csv'])

f = io.StringIO('foo\x00,bar\nbaz,42')
reader = csv.reader(f)

try:
for row in reader:
print(row)
except csv.Error as e:
print('Caught csv.error', e)
except Exception as e:
print('Caught Exception', e)
----

In the above, the reader throws 'csv.Error' (because of the NULL byte) but
the exception clause does not catch it where expected, because it's a
different exception class called `csv.Error`, due to the same problem
demonstrated above (if the seemingly innocent import_fresh_module is
removed, all is good).

Any ideas/suggestion regarding this are welcome. This is quite an esoteric
problem, but I believe it's serious. PEP 3121 is not used much (yet), but
recently there was talk again about committing some of the patches created
for converting Modules/*.c extensions to it during a GSoC project. I
believe that we should understand the implications first. There can be a
number of solutions; including modifying the PEP 3121 implementation
machinery to really create/keep state "per module" and not just "per kind
of module in a single sub-interpreter".

Eli


ncoghlan at gmail

Aug 10, 2013, 5:47 PM

Post #2 of 22 (54 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

In a similar vein, Antoine recently noted that the fact the per-module
state isn't a real PyObject creates a variety of interesting lifecycle
management challenges.

I'm not seeing an easy solution, either, except to automatically skip
reinitialization when the module has already been imported.


eliben at gmail

Aug 10, 2013, 6:06 PM

Post #3 of 22 (54 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

n Sat, Aug 10, 2013 at 5:47 PM, Nick Coghlan <ncoghlan [at] gmail> wrote:

> In a similar vein, Antoine recently noted that the fact the per-module
> state isn't a real PyObject creates a variety of interesting lifecycle
> management challenges.
>
> I'm not seeing an easy solution, either, except to automatically skip
> reinitialization when the module has already been imported.
>
This solution has problems. For example, in the case of ET it would
preclude testing what happens when pyexpat is disabled (remember we were
discussing this...). This is because there would be no real way to create
new instances of such modules (they would all cache themselves in the init
function - similarly to what ET now does in trunk, because otherwise some
of its global-dependent crazy tests fail).

A more radical solution would be to *really* have multiple instances of
state per sub-interpreter. Well, they already exist -- it's
PyState_FindModule which is the problematic one because it only remembers
the last one. But I see that it's only being used by extension modules
themselves, to efficiently find modules they belong to. It feels a bit like
a hack that was made to avoid rewriting lots of code, because in general a
module's objects *can* know which module instance they came from. E.g. it
can be saved as a private field in classes exported by the module.

So a more radical approach would be:

PyState_FindModule can be deprecated, but still exist and be documented to
return the state the *last* module created in this sub-interpreter. stdlib
extension modules that actually use this mechanism can be rewritten to just
remember the module for real, and not rely on PyState_FindModule to fetch
it from a global cache. I don't think this would be hard, and it would make
the good intention of PEP 3121 more real - actual intependent state per
module instance.

Eli


ncoghlan at gmail

Aug 10, 2013, 11:58 PM

Post #4 of 22 (47 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On 10 Aug 2013 21:06, "Eli Bendersky" <eliben [at] gmail> wrote:
>
> n Sat, Aug 10, 2013 at 5:47 PM, Nick Coghlan <ncoghlan [at] gmail> wrote:
>>
>> In a similar vein, Antoine recently noted that the fact the per-module
state isn't a real PyObject creates a variety of interesting lifecycle
management challenges.
>>
>> I'm not seeing an easy solution, either, except to automatically skip
reinitialization when the module has already been imported.
>
> This solution has problems. For example, in the case of ET it would
preclude testing what happens when pyexpat is disabled (remember we were
discussing this...). This is because there would be no real way to create
new instances of such modules (they would all cache themselves in the init
function - similarly to what ET now does in trunk, because otherwise some
of its global-dependent crazy tests fail).

Right, it would still be broken, just in a less horrible way.

>
> A more radical solution would be to *really* have multiple instances of
state per sub-interpreter. Well, they already exist -- it's
PyState_FindModule which is the problematic one because it only remembers
the last one. But I see that it's only being used by extension modules
themselves, to efficiently find modules they belong to. It feels a bit like
a hack that was made to avoid rewriting lots of code, because in general a
module's objects *can* know which module instance they came from. E.g. it
can be saved as a private field in classes exported by the module.
>
> So a more radical approach would be:
>
> PyState_FindModule can be deprecated, but still exist and be documented
to return the state the *last* module created in this sub-interpreter.
stdlib extension modules that actually use this mechanism can be rewritten
to just remember the module for real, and not rely on PyState_FindModule to
fetch it from a global cache. I don't think this would be hard, and it
would make the good intention of PEP 3121 more real - actual intependent
state per module instance.

Sounds promising to me. I suspect handling exported functions will prove to
be tricky, though - they may need to be redesigned to behave more like
"module methods".

>
> Eli
>
>
>
>
>
>
>


solipsis at pitrou

Aug 11, 2013, 2:58 AM

Post #5 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On Sat, 10 Aug 2013 18:06:02 -0700
Eli Bendersky <eliben [at] gmail> wrote:
> This solution has problems. For example, in the case of ET it would
> preclude testing what happens when pyexpat is disabled (remember we were
> discussing this...). This is because there would be no real way to create
> new instances of such modules (they would all cache themselves in the init
> function - similarly to what ET now does in trunk, because otherwise some
> of its global-dependent crazy tests fail).
>
> A more radical solution would be to *really* have multiple instances of
> state per sub-interpreter. Well, they already exist -- it's
> PyState_FindModule which is the problematic one because it only remembers
> the last one.

I'm not sure I understand your diagnosis. modules_per_index (and
PyState_FindModule) is per-interpreter so we already have a
per-interpreter state here. Something else must be interferring.

Note that module state is just a field attached to the module object
("void *md_state" in PyModuleObject). It's really the extension modules
which are per-interpreter, which is a good thing.

Regards

Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Aug 11, 2013, 3:33 AM

Post #6 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

Hi Eli,

On Sat, 10 Aug 2013 17:12:53 -0700
Eli Bendersky <eliben [at] gmail> wrote:
>
> Note how doing some sys.modules acrobatics and re-importing suddenly
> changes the internal state of a previously imported module. This happens
> because:
>
> 1. The first import of 'csv' (which then imports `_csv) creates
> module-specific state on the heap and associates it with the current
> sub-interpreter. The list of dialects, amongst other things, is in that
> state.
> 2. The 'del's wipe 'csv' and '_csv' from the cache.
> 3. The second import of 'csv' also creates/initializes a new '_csv' module
> because it's not in sys.modules. This *replaces* the per-sub-interpreter
> cached version of the module's state with the clean state of a new module

I would say this is pretty much expected. The converse would be a bug
IMO (but perhaps Martin disagrees). PEP 3121's stated goal is not only
subinterpreter support:

"Extension module initialization currently has a few deficiencies.
There is no cleanup for modules, the entry point name might give
naming conflicts, the entry functions don't follow the usual calling
convention, and multiple interpreters are not supported well."

Re-initializing state when importing a module anew makes extension
modules more like pure Python modules, which is a good thing.


I think the piece of interpretation you offered yesterday on IRC may be
the right explanation for the ET shenanigans:

"Maybe the bug is that ParseError is kept in per-module state, and
also exported from the module?"

PEP 3121 doesn't offer any guidelines for using its API, and its
example shows PyObject* fields in a module state.

I'm starting to think that it might be a bad use of PEP 3121. PyObjects
can, and therefore should be stored in the extension module dict where
they will participate in normal resource management (i.e. garbage
collection). If they are in the module dict, then they shouldn't be
held alive by the module state too, otherwise the (currently tricky)
lifetime management of extension modules can produce oddities.


So, the PEP 3121 "module state" pointer (the optional opaque void*
thing) should only be used to hold non-PyObjects. PyObjects should go
to the module dict, like they do in normal Python modules. Now, the
reason our PEP 3121 extension modules abuse the module state pointer to
keep PyObjects is two-fold:

1. it's surprisingly easier (it's actually a one-liner if you don't
handle errors - a rather bad thing, but all PEP 3121 extension modules
currently don't handle a NULL return from PyState_FindModule...)

2. it protects the module from any module dict monkeypatching. It's not
important if you are using a generic API on the PyObject, but it is if
the PyObject is really a custom C type with well-defined fields.

Those two issues can be addressed if we offer an API for it. How about:

PyObject *PyState_GetModuleAttr(struct PyModuleDef *def,
const char *name,
PyObject *restrict_type)

*def* is a pointer to the module definition.
*name* is the attribute to look up on the module dict.
*restrict_type*, if non-NULL, is a type object the looked up attribute
must be an instance of.

Lookup an attribute in the current interpreter's extension module
instance for the module definition *def*.
Returns a *new* reference (!), or NULL if an error occurred.
An error can be:
- no such module exists for the current interpreter (ImportError?
RuntimeError? SystemError?)
- no such attribute exists in the module dict (AttributeError)
- the attribute doesn't conform to *restrict_type* (TypeError)

So code can be written like:

PyObject *dialects = PyState_GetModuleAttr(
&_csvmodule, "dialects", &PyDict_Type);
if (dialects == NULL)
return NULL;

Regards

Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Aug 11, 2013, 3:37 AM

Post #7 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On Sun, 11 Aug 2013 12:33:16 +0200
Antoine Pitrou <solipsis [at] pitrou> wrote:
> So, the PEP 3121 "module state" pointer (the optional opaque void*
> thing) should only be used to hold non-PyObjects. PyObjects should go
> to the module dict, like they do in normal Python modules. Now, the
> reason our PEP 3121 extension modules abuse the module state pointer to
> keep PyObjects is two-fold:
>
> 1. it's surprisingly easier (it's actually a one-liner if you don't
> handle errors - a rather bad thing, but all PEP 3121 extension modules
> currently don't handle a NULL return from PyState_FindModule...)
>
> 2. it protects the module from any module dict monkeypatching. It's not
> important if you are using a generic API on the PyObject, but it is if
> the PyObject is really a custom C type with well-defined fields.

I overlooked a third reason which is performance. But, those lookups
are generally not performance-critical.

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


ncoghlan at gmail

Aug 11, 2013, 4:04 AM

Post #8 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On 11 August 2013 06:33, Antoine Pitrou <solipsis [at] pitrou> wrote:
> So code can be written like:
>
> PyObject *dialects = PyState_GetModuleAttr(
> &_csvmodule, "dialects", &PyDict_Type);
> if (dialects == NULL)
> return NULL;

This sounds like a good near term solution to me.

Longer term, I think there may be value in providing a richer
extension module initialisation API that lets extension modules be
represented as module *subclasses* in sys.modules, since that would
get us to a position where it is possible to have *multiple* instances
of an extension module in the *same* subinterpreter by holding on to
external references after removing them from sys.modules (which is
what we do in the test suite for pure Python modules). Enabling that
also ties into the question of passing info to the extension module
about how it is being loaded (e.g. as a submodule of a larger
package), as well as allowing extension modules to cleanly handle
reload(). However, that's dependent on the ModuleSpec idea we're
currently thrashing out on import-sig (and should be able to bring to
python-dev soon), and I think getting that integrated at all will be
ambitious enough for 3.4 - using it to improve extension module
handling would then be a project for 3.5.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Aug 11, 2013, 4:48 AM

Post #9 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On Sun, 11 Aug 2013 07:04:40 -0400
Nick Coghlan <ncoghlan [at] gmail> wrote:
> On 11 August 2013 06:33, Antoine Pitrou <solipsis [at] pitrou> wrote:
> > So code can be written like:
> >
> > PyObject *dialects = PyState_GetModuleAttr(
> > &_csvmodule, "dialects", &PyDict_Type);
> > if (dialects == NULL)
> > return NULL;
>
> This sounds like a good near term solution to me.
>
> Longer term, I think there may be value in providing a richer
> extension module initialisation API that lets extension modules be
> represented as module *subclasses* in sys.modules, since that would
> get us to a position where it is possible to have *multiple* instances
> of an extension module in the *same* subinterpreter by holding on to
> external references after removing them from sys.modules (which is
> what we do in the test suite for pure Python modules).

Either that, or add a "struct PyMemberDef *m_members" field to
PyModuleDef, to enable looking up stuff in the m_state using regular
attribute lookup.

Unfortunately, doing so would probably break the ABI. Also, allowing
for module subclasses is probably more flexible in the long term. We
just need to devise a convenience API for that (perhaps by allowing to
create both the subclass *and* instantiate it in a single call).

> However, that's dependent on the ModuleSpec idea we're
> currently thrashing out on import-sig (and should be able to bring to
> python-dev soon), and I think getting that integrated at all will be
> ambitious enough for 3.4 - using it to improve extension module
> handling would then be a project for 3.5.

Sounds reasonable.

Regards

Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan_ml at behnel

Aug 11, 2013, 5:01 AM

Post #10 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

Antoine Pitrou, 11.08.2013 12:33:
> On Sat, 10 Aug 2013 17:12:53 -0700 Eli Bendersky wrote:
>> Note how doing some sys.modules acrobatics and re-importing suddenly
>> changes the internal state of a previously imported module. This happens
>> because:
>>
>> 1. The first import of 'csv' (which then imports `_csv) creates
>> module-specific state on the heap and associates it with the current
>> sub-interpreter. The list of dialects, amongst other things, is in that
>> state.
>> 2. The 'del's wipe 'csv' and '_csv' from the cache.
>> 3. The second import of 'csv' also creates/initializes a new '_csv' module
>> because it's not in sys.modules. This *replaces* the per-sub-interpreter
>> cached version of the module's state with the clean state of a new module
>
> I would say this is pretty much expected. The converse would be a bug
> IMO (but perhaps Martin disagrees). PEP 3121's stated goal is not only
> subinterpreter support:
>
> "Extension module initialization currently has a few deficiencies.
> There is no cleanup for modules, the entry point name might give
> naming conflicts, the entry functions don't follow the usual calling
> convention, and multiple interpreters are not supported well."
>
> Re-initializing state when importing a module anew makes extension
> modules more like pure Python modules, which is a good thing.

It's the same as defining a type or function in a loop, or inside of a
closure. The whole point of reimporting is that you get a new module.

However, it should not change the content of the old module, just create a
new one.


> So, the PEP 3121 "module state" pointer (the optional opaque void*
> thing) should only be used to hold non-PyObjects. PyObjects should go
> to the module dict, like they do in normal Python modules. Now, the
> reason our PEP 3121 extension modules abuse the module state pointer to
> keep PyObjects is two-fold:
>
> 1. it's surprisingly easier (it's actually a one-liner if you don't
> handle errors - a rather bad thing, but all PEP 3121 extension modules
> currently don't handle a NULL return from PyState_FindModule...)
>
> 2. it protects the module from any module dict monkeypatching. It's not
> important if you are using a generic API on the PyObject, but it is if
> the PyObject is really a custom C type with well-defined fields.

Yes, it's a major safety problem if you can crash the interpreter by
assigning None to a module attribute.


> Those two issues can be addressed if we offer an API for it. How about:
>
> PyObject *PyState_GetModuleAttr(struct PyModuleDef *def,
> const char *name,
> PyObject *restrict_type)
>
> *def* is a pointer to the module definition.
> *name* is the attribute to look up on the module dict.
> *restrict_type*, if non-NULL, is a type object the looked up attribute
> must be an instance of.
>
> Lookup an attribute in the current interpreter's extension module
> instance for the module definition *def*.
> Returns a *new* reference (!), or NULL if an error occurred.
> An error can be:
> - no such module exists for the current interpreter (ImportError?
> RuntimeError? SystemError?)
> - no such attribute exists in the module dict (AttributeError)
> - the attribute doesn't conform to *restrict_type* (TypeError)
>
> So code can be written like:
>
> PyObject *dialects = PyState_GetModuleAttr(
> &_csvmodule, "dialects", &PyDict_Type);
> if (dialects == NULL)
> return NULL;

At least for Cython it's unlikely that it'll ever use this. It's just way
too much overhead for looking up a global name. Plus, not all global names
are visible in the module dict, e.g. it's common to have types that are
only used internally to keep some kind of state. Those would still have to
live in the internal per-module state.

ISTM that this is not a proper solution for the problem, because it only
covers the simple use cases. Rather, I'd prefer making the handling of
names in the per-module instance state safer.

Essentially, with PEP 3121, modules are just one form of an extension type.
So what's wrong with giving them normal extension type fields? Functions
are essentially methods of the module, global types are just inner classes.
Both should keep the module alive (on the one side) and be tied to it (on
the other side). If you reimport a module, you'd get a new set of
everything, and the old module would just linger in the background until
the last reference to it dies.

In other words, I don't see why modules should be any special.

Stefan


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan_ml at behnel

Aug 11, 2013, 5:16 AM

Post #11 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

Antoine Pitrou, 11.08.2013 13:48:
> On Sun, 11 Aug 2013 07:04:40 -0400 Nick Coghlan wrote:
>> On 11 August 2013 06:33, Antoine Pitrou wrote:
>>> So code can be written like:
>>>
>>> PyObject *dialects = PyState_GetModuleAttr(
>>> &_csvmodule, "dialects", &PyDict_Type);
>>> if (dialects == NULL)
>>> return NULL;
>>
>> This sounds like a good near term solution to me.
>>
>> Longer term, I think there may be value in providing a richer
>> extension module initialisation API that lets extension modules be
>> represented as module *subclasses* in sys.modules, since that would
>> get us to a position where it is possible to have *multiple* instances
>> of an extension module in the *same* subinterpreter by holding on to
>> external references after removing them from sys.modules (which is
>> what we do in the test suite for pure Python modules).
>
> Either that, or add a "struct PyMemberDef *m_members" field to
> PyModuleDef, to enable looking up stuff in the m_state using regular
> attribute lookup.

Hmm, yes, it's unfortunate that the module state isn't just a public part
of the object struct.


> Unfortunately, doing so would probably break the ABI. Also, allowing
> for module subclasses is probably more flexible in the long term.

+1000


> We
> just need to devise a convenience API for that (perhaps by allowing to
> create both the subclass *and* instantiate it in a single call).

Right. This conflicts somewhat with the simplified module creation. If the
module loader passed the readily instantiated module instance into the
module init function, then module subtypes don't fit into this scheme anymore.

One more reason why modules shouldn't be special. Essentially, we need an
m_new() and m_init() for them. And the lifetime of the module type would
have to be linked to the (sub-)interpreter, whereas the lifetime of the
module instance would be determined by whoever uses the module and/or
decides to unload/reload it.

Stefan


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Aug 11, 2013, 5:32 AM

Post #12 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On Sun, 11 Aug 2013 14:16:10 +0200
Stefan Behnel <stefan_ml [at] behnel> wrote:
>
> > We
> > just need to devise a convenience API for that (perhaps by allowing to
> > create both the subclass *and* instantiate it in a single call).
>
> Right. This conflicts somewhat with the simplified module creation. If the
> module loader passed the readily instantiated module instance into the
> module init function, then module subtypes don't fit into this scheme anymore.
>
> One more reason why modules shouldn't be special. Essentially, we need an
> m_new() and m_init() for them. And the lifetime of the module type would
> have to be linked to the (sub-)interpreter, whereas the lifetime of the
> module instance would be determined by whoever uses the module and/or
> decides to unload/reload it.

It may be simpler if the only strong reference to the module type is in
the module instance itself. Successive module initializations would get
different types, but that shouldn't be a problem in practice.

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan_ml at behnel

Aug 11, 2013, 5:48 AM

Post #13 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

Antoine Pitrou, 11.08.2013 14:32:
> On Sun, 11 Aug 2013 14:16:10 +0200 Stefan Behnel wrote:
>>> We
>>> just need to devise a convenience API for that (perhaps by allowing to
>>> create both the subclass *and* instantiate it in a single call).
>>
>> Right. This conflicts somewhat with the simplified module creation. If the
>> module loader passed the readily instantiated module instance into the
>> module init function, then module subtypes don't fit into this scheme anymore.
>>
>> One more reason why modules shouldn't be special. Essentially, we need an
>> m_new() and m_init() for them. And the lifetime of the module type would
>> have to be linked to the (sub-)interpreter, whereas the lifetime of the
>> module instance would be determined by whoever uses the module and/or
>> decides to unload/reload it.
>
> It may be simpler if the only strong reference to the module type is in
> the module instance itself. Successive module initializations would get
> different types, but that shouldn't be a problem in practice.

Agreed. Then the module instance would just be the only instance of a new
type that gets created each time the module initialised. Even if module
subtypes were to become common place once they are generally supported
(because they're the easiest way to store per-module state efficiently),
module reinitialisation should be rare enough to just buy them with a new
type for each. The size of the complete module state+dict will almost
always outweigh the size of the one additional type by factors.

Stefan


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan_ml at behnel

Aug 11, 2013, 5:53 AM

Post #14 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

Stefan Behnel, 11.08.2013 14:48:
> Antoine Pitrou, 11.08.2013 14:32:
>> On Sun, 11 Aug 2013 14:16:10 +0200 Stefan Behnel wrote:
>>>> We
>>>> just need to devise a convenience API for that (perhaps by allowing to
>>>> create both the subclass *and* instantiate it in a single call).
>>>
>>> Right. This conflicts somewhat with the simplified module creation. If the
>>> module loader passed the readily instantiated module instance into the
>>> module init function, then module subtypes don't fit into this scheme anymore.
>>>
>>> One more reason why modules shouldn't be special. Essentially, we need an
>>> m_new() and m_init() for them. And the lifetime of the module type would
>>> have to be linked to the (sub-)interpreter, whereas the lifetime of the
>>> module instance would be determined by whoever uses the module and/or
>>> decides to unload/reload it.
>>
>> It may be simpler if the only strong reference to the module type is in
>> the module instance itself. Successive module initializations would get
>> different types, but that shouldn't be a problem in practice.
>
> Agreed. Then the module instance would just be the only instance of a new
> type that gets created each time the module initialised. Even if module
> subtypes were to become common place once they are generally supported
> (because they're the easiest way to store per-module state efficiently),
> module reinitialisation should be rare enough to just buy them with a new
> type for each. The size of the complete module state+dict will almost
> always outweigh the size of the one additional type by factors.

BTW, this already suggests a simple module initialisation interface. The
extension module would expose a function that returns a module type, and
the loader/importer would then simply instantiate that. Nothing else is needed.

Stefan


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan_ml at behnel

Aug 11, 2013, 5:58 AM

Post #15 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

Stefan Behnel, 11.08.2013 14:53:
> Stefan Behnel, 11.08.2013 14:48:
>> Antoine Pitrou, 11.08.2013 14:32:
>>> On Sun, 11 Aug 2013 14:16:10 +0200 Stefan Behnel wrote:
>>>>> We
>>>>> just need to devise a convenience API for that (perhaps by allowing to
>>>>> create both the subclass *and* instantiate it in a single call).
>>>>
>>>> Right. This conflicts somewhat with the simplified module creation. If the
>>>> module loader passed the readily instantiated module instance into the
>>>> module init function, then module subtypes don't fit into this scheme anymore.
>>>>
>>>> One more reason why modules shouldn't be special. Essentially, we need an
>>>> m_new() and m_init() for them. And the lifetime of the module type would
>>>> have to be linked to the (sub-)interpreter, whereas the lifetime of the
>>>> module instance would be determined by whoever uses the module and/or
>>>> decides to unload/reload it.
>>>
>>> It may be simpler if the only strong reference to the module type is in
>>> the module instance itself. Successive module initializations would get
>>> different types, but that shouldn't be a problem in practice.
>>
>> Agreed. Then the module instance would just be the only instance of a new
>> type that gets created each time the module initialised. Even if module
>> subtypes were to become common place once they are generally supported
>> (because they're the easiest way to store per-module state efficiently),
>> module reinitialisation should be rare enough to just buy them with a new
>> type for each. The size of the complete module state+dict will almost
>> always outweigh the size of the one additional type by factors.
>
> BTW, this already suggests a simple module initialisation interface. The
> extension module would expose a function that returns a module type, and
> the loader/importer would then simply instantiate that. Nothing else is needed.

Actually, strike the word "module type" and replace it with "type". Is
there really a reason why Python needs a module type at all? I mean, you
can stick arbitrary objects in sys.modules, so why not allow arbitrary
types to be returned by the module creation function?

Stefan


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


eliben at gmail

Aug 11, 2013, 6:03 AM

Post #16 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On Sun, Aug 11, 2013 at 2:58 AM, Antoine Pitrou <solipsis [at] pitrou> wrote:

> On Sat, 10 Aug 2013 18:06:02 -0700
> Eli Bendersky <eliben [at] gmail> wrote:
> > This solution has problems. For example, in the case of ET it would
> > preclude testing what happens when pyexpat is disabled (remember we were
> > discussing this...). This is because there would be no real way to create
> > new instances of such modules (they would all cache themselves in the
> init
> > function - similarly to what ET now does in trunk, because otherwise some
> > of its global-dependent crazy tests fail).
> >
> > A more radical solution would be to *really* have multiple instances of
> > state per sub-interpreter. Well, they already exist -- it's
> > PyState_FindModule which is the problematic one because it only remembers
> > the last one.
>
> I'm not sure I understand your diagnosis. modules_per_index (and
> PyState_FindModule) is per-interpreter so we already have a
> per-interpreter state here. Something else must be interferring.
>
>
Yes, it's per interpreter, but only one per interpreter is remembered in
state->modules_by_index. What I'm trying to say is that currently two
different instances of PyModuleObject *within the same interpterer* share
the state if they get to it through PyState_FindModule, because they share
the same PyModuleDef, and stat->modules_by_index keeps only one module per
PyModuleDef.


> Note that module state is just a field attached to the module object
> ("void *md_state" in PyModuleObject). It's really the extension modules
> which are per-interpreter, which is a good thing.
>


ncoghlan at gmail

Aug 11, 2013, 6:19 AM

Post #17 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On 11 Aug 2013 09:02, "Stefan Behnel" <stefan_ml [at] behnel> wrote:
>
> Stefan Behnel, 11.08.2013 14:53:
> > Stefan Behnel, 11.08.2013 14:48:
> >> Antoine Pitrou, 11.08.2013 14:32:
> >>> On Sun, 11 Aug 2013 14:16:10 +0200 Stefan Behnel wrote:
> >>>>> We
> >>>>> just need to devise a convenience API for that (perhaps by allowing
to
> >>>>> create both the subclass *and* instantiate it in a single call).
> >>>>
> >>>> Right. This conflicts somewhat with the simplified module creation.
If the
> >>>> module loader passed the readily instantiated module instance into
the
> >>>> module init function, then module subtypes don't fit into this
scheme anymore.
> >>>>
> >>>> One more reason why modules shouldn't be special. Essentially, we
need an
> >>>> m_new() and m_init() for them. And the lifetime of the module type
would
> >>>> have to be linked to the (sub-)interpreter, whereas the lifetime of
the
> >>>> module instance would be determined by whoever uses the module and/or
> >>>> decides to unload/reload it.
> >>>
> >>> It may be simpler if the only strong reference to the module type is
in
> >>> the module instance itself. Successive module initializations would
get
> >>> different types, but that shouldn't be a problem in practice.
> >>
> >> Agreed. Then the module instance would just be the only instance of a
new
> >> type that gets created each time the module initialised. Even if module
> >> subtypes were to become common place once they are generally supported
> >> (because they're the easiest way to store per-module state
efficiently),
> >> module reinitialisation should be rare enough to just buy them with a
new
> >> type for each. The size of the complete module state+dict will almost
> >> always outweigh the size of the one additional type by factors.
> >
> > BTW, this already suggests a simple module initialisation interface. The
> > extension module would expose a function that returns a module type, and
> > the loader/importer would then simply instantiate that. Nothing else is
needed.
>
> Actually, strike the word "module type" and replace it with "type". Is
> there really a reason why Python needs a module type at all? I mean, you
> can stick arbitrary objects in sys.modules, so why not allow arbitrary
> types to be returned by the module creation function?

That's exactly what I have in mind, but the way extension module imports
currently work means we can't easily do it just yet. Fortunately, importlib
means we now have some hope of fixing that :)

Cheers,
Nick.
>
> Stefan
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev [at] python
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
http://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com


eliben at gmail

Aug 11, 2013, 6:26 AM

Post #18 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On Sun, Aug 11, 2013 at 3:33 AM, Antoine Pitrou <solipsis [at] pitrou> wrote:

>
> Hi Eli,
>
> On Sat, 10 Aug 2013 17:12:53 -0700
> Eli Bendersky <eliben [at] gmail> wrote:
> >
> > Note how doing some sys.modules acrobatics and re-importing suddenly
> > changes the internal state of a previously imported module. This happens
> > because:
> >
> > 1. The first import of 'csv' (which then imports `_csv) creates
> > module-specific state on the heap and associates it with the current
> > sub-interpreter. The list of dialects, amongst other things, is in that
> > state.
> > 2. The 'del's wipe 'csv' and '_csv' from the cache.
> > 3. The second import of 'csv' also creates/initializes a new '_csv'
> module
> > because it's not in sys.modules. This *replaces* the per-sub-interpreter
> > cached version of the module's state with the clean state of a new module
>
> I would say this is pretty much expected.


I'm struggling to see how it's expected. The two imported csv modules are
different (i.e. different id() of members), and yet some state is shared
between them. I think the root reason for it is that "PyModuleDev
_csvmodule" is uniqued per interpreter, not per module instance.

Even if dialects were not a PyObject, this would still be problematic,
don't you think? And note that here, unlike the ET.ParseError case, I don't
think the problem is exporting internal per-module state as a module
attribute. The following two are un-reconcilable, IMHO:

1. Wanting to have two instances of the same module in the same interpterer.
2. Using a global shared PyModuleDef between all instances of the same
module in the same interpterer.



> The converse would be a bug
> IMO (but perhaps Martin disagrees). PEP 3121's stated goal is not only
> subinterpreter support:
>
> "Extension module initialization currently has a few deficiencies.
> There is no cleanup for modules, the entry point name might give
> naming conflicts, the entry functions don't follow the usual calling
> convention, and multiple interpreters are not supported well."
>
> Re-initializing state when importing a module anew makes extension
> modules more like pure Python modules, which is a good thing.
>
>
> I think the piece of interpretation you offered yesterday on IRC may be
> the right explanation for the ET shenanigans:
>
> "Maybe the bug is that ParseError is kept in per-module state, and
> also exported from the module?"
>
> PEP 3121 doesn't offer any guidelines for using its API, and its
> example shows PyObject* fields in a module state.
>
> I'm starting to think that it might be a bad use of PEP 3121. PyObjects
> can, and therefore should be stored in the extension module dict where
> they will participate in normal resource management (i.e. garbage
> collection). If they are in the module dict, then they shouldn't be
> held alive by the module state too, otherwise the (currently tricky)
> lifetime management of extension modules can produce oddities.
>
>
> So, the PEP 3121 "module state" pointer (the optional opaque void*
> thing) should only be used to hold non-PyObjects. PyObjects should go
> to the module dict, like they do in normal Python modules. Now, the
> reason our PEP 3121 extension modules abuse the module state pointer to
> keep PyObjects is two-fold:
>
> 1. it's surprisingly easier (it's actually a one-liner if you don't
> handle errors - a rather bad thing, but all PEP 3121 extension modules
> currently don't handle a NULL return from PyState_FindModule...)
>
> 2. it protects the module from any module dict monkeypatching. It's not
> important if you are using a generic API on the PyObject, but it is if
> the PyObject is really a custom C type with well-defined fields.
>
> Those two issues can be addressed if we offer an API for it. How about:
>
> PyObject *PyState_GetModuleAttr(struct PyModuleDef *def,
> const char *name,
> PyObject *restrict_type)
>
> *def* is a pointer to the module definition.
> *name* is the attribute to look up on the module dict.
> *restrict_type*, if non-NULL, is a type object the looked up attribute
> must be an instance of.
>
> Lookup an attribute in the current interpreter's extension module
> instance for the module definition *def*.
> Returns a *new* reference (!), or NULL if an error occurred.
> An error can be:
> - no such module exists for the current interpreter (ImportError?
> RuntimeError? SystemError?)
> - no such attribute exists in the module dict (AttributeError)
> - the attribute doesn't conform to *restrict_type* (TypeError)
>
> So code can be written like:
>
> PyObject *dialects = PyState_GetModuleAttr(
> &_csvmodule, "dialects", &PyDict_Type);
> if (dialects == NULL)
> return NULL;
>


solipsis at pitrou

Aug 11, 2013, 6:40 AM

Post #19 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On Sun, 11 Aug 2013 06:26:55 -0700
Eli Bendersky <eliben [at] gmail> wrote:
> On Sun, Aug 11, 2013 at 3:33 AM, Antoine Pitrou <solipsis [at] pitrou> wrote:
>
> >
> > Hi Eli,
> >
> > On Sat, 10 Aug 2013 17:12:53 -0700
> > Eli Bendersky <eliben [at] gmail> wrote:
> > >
> > > Note how doing some sys.modules acrobatics and re-importing suddenly
> > > changes the internal state of a previously imported module. This happens
> > > because:
> > >
> > > 1. The first import of 'csv' (which then imports `_csv) creates
> > > module-specific state on the heap and associates it with the current
> > > sub-interpreter. The list of dialects, amongst other things, is in that
> > > state.
> > > 2. The 'del's wipe 'csv' and '_csv' from the cache.
> > > 3. The second import of 'csv' also creates/initializes a new '_csv'
> > module
> > > because it's not in sys.modules. This *replaces* the per-sub-interpreter
> > > cached version of the module's state with the clean state of a new module
> >
> > I would say this is pretty much expected.
>
> I'm struggling to see how it's expected. The two imported csv modules are
> different (i.e. different id() of members), and yet some state is shared
> between them.

There are two csv modules, but there are not two _csv modules.
Extension modules are currently immortal until the end of the
interpreter:

>>> csv = __import__('csv')
>>> wcsv = weakref.ref(csv)
>>> w_csv = weakref.ref(sys.modules['_csv'])
>>> del sys.modules['csv']
>>> del sys.modules['_csv']
>>> del csv
>>> gc.collect()
50
>>> wcsv()
>>> w_csv()
<module '_csv' from
'/home/antoine/cpython/default/build/lib.linux-x86_64-3.4-pydebug/_csv.cpython-34dm.so'>


So, "sharing" a state is pretty much expected, since you are
re-initializating an existing module.
(but the module does get re-initialized, which is the point of PEP 3121)

> 1. Wanting to have two instances of the same module in the same interpterer.

It could be nice, but really, that's not a common use case. And it's
impossible for extension modules, currently.

Regards

Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


eliben at gmail

Aug 11, 2013, 8:49 AM

Post #20 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On Sun, Aug 11, 2013 at 6:40 AM, Antoine Pitrou <solipsis [at] pitrou> wrote:

> On Sun, 11 Aug 2013 06:26:55 -0700
> Eli Bendersky <eliben [at] gmail> wrote:
> > On Sun, Aug 11, 2013 at 3:33 AM, Antoine Pitrou <solipsis [at] pitrou>
> wrote:
> >
> > >
> > > Hi Eli,
> > >
> > > On Sat, 10 Aug 2013 17:12:53 -0700
> > > Eli Bendersky <eliben [at] gmail> wrote:
> > > >
> > > > Note how doing some sys.modules acrobatics and re-importing suddenly
> > > > changes the internal state of a previously imported module. This
> happens
> > > > because:
> > > >
> > > > 1. The first import of 'csv' (which then imports `_csv) creates
> > > > module-specific state on the heap and associates it with the current
> > > > sub-interpreter. The list of dialects, amongst other things, is in
> that
> > > > state.
> > > > 2. The 'del's wipe 'csv' and '_csv' from the cache.
> > > > 3. The second import of 'csv' also creates/initializes a new '_csv'
> > > module
> > > > because it's not in sys.modules. This *replaces* the
> per-sub-interpreter
> > > > cached version of the module's state with the clean state of a new
> module
> > >
> > > I would say this is pretty much expected.
> >
> > I'm struggling to see how it's expected. The two imported csv modules are
> > different (i.e. different id() of members), and yet some state is shared
> > between them.
>
> There are two csv modules, but there are not two _csv modules.
> Extension modules are currently immortal until the end of the
> interpreter:
>
> >>> csv = __import__('csv')
> >>> wcsv = weakref.ref(csv)
> >>> w_csv = weakref.ref(sys.modules['_csv'])
> >>> del sys.modules['csv']
> >>> del sys.modules['_csv']
> >>> del csv
> >>> gc.collect()
> 50
> >>> wcsv()
> >>> w_csv()
> <module '_csv' from
> '/home/antoine/cpython/default/build/lib.linux-x86_64-3.4-pydebug/_
> csv.cpython-34dm.so'>
>
>
> So, "sharing" a state is pretty much expected, since you are
> re-initializating an existing module.
> (but the module does get re-initialized, which is the point of PEP 3121)
>

Yes, you're right - this is an oversight on my behalf. Indeed, the
extensions dict in import.c keeps it alive once loaded, and only ever gets
cleaned up in Py_Finalize.

Eli


solipsis at pitrou

Aug 11, 2013, 8:56 AM

Post #21 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On Sun, 11 Aug 2013 08:49:56 -0700
Eli Bendersky <eliben [at] gmail> wrote:

> On Sun, Aug 11, 2013 at 6:40 AM, Antoine Pitrou <solipsis [at] pitrou> wrote:
>
> > On Sun, 11 Aug 2013 06:26:55 -0700
> > Eli Bendersky <eliben [at] gmail> wrote:
> > > On Sun, Aug 11, 2013 at 3:33 AM, Antoine Pitrou <solipsis [at] pitrou>
> > wrote:
> > >
> > > >
> > > > Hi Eli,
> > > >
> > > > On Sat, 10 Aug 2013 17:12:53 -0700
> > > > Eli Bendersky <eliben [at] gmail> wrote:
> > > > >
> > > > > Note how doing some sys.modules acrobatics and re-importing suddenly
> > > > > changes the internal state of a previously imported module. This
> > happens
> > > > > because:
> > > > >
> > > > > 1. The first import of 'csv' (which then imports `_csv) creates
> > > > > module-specific state on the heap and associates it with the current
> > > > > sub-interpreter. The list of dialects, amongst other things, is in
> > that
> > > > > state.
> > > > > 2. The 'del's wipe 'csv' and '_csv' from the cache.
> > > > > 3. The second import of 'csv' also creates/initializes a new '_csv'
> > > > module
> > > > > because it's not in sys.modules. This *replaces* the
> > per-sub-interpreter
> > > > > cached version of the module's state with the clean state of a new
> > module
> > > >
> > > > I would say this is pretty much expected.
> > >
> > > I'm struggling to see how it's expected. The two imported csv modules are
> > > different (i.e. different id() of members), and yet some state is shared
> > > between them.
> >
> > There are two csv modules, but there are not two _csv modules.
> > Extension modules are currently immortal until the end of the
> > interpreter:
> >
> > >>> csv = __import__('csv')
> > >>> wcsv = weakref.ref(csv)
> > >>> w_csv = weakref.ref(sys.modules['_csv'])
> > >>> del sys.modules['csv']
> > >>> del sys.modules['_csv']
> > >>> del csv
> > >>> gc.collect()
> > 50
> > >>> wcsv()
> > >>> w_csv()
> > <module '_csv' from
> > '/home/antoine/cpython/default/build/lib.linux-x86_64-3.4-pydebug/_
> > csv.cpython-34dm.so'>
> >
> >
> > So, "sharing" a state is pretty much expected, since you are
> > re-initializating an existing module.
> > (but the module does get re-initialized, which is the point of PEP 3121)
> >
>
> Yes, you're right - this is an oversight on my behalf. Indeed, the
> extensions dict in import.c keeps it alive once loaded, and only ever gets
> cleaned up in Py_Finalize.

It's not the extensions dict in import.c, it's modules_by_index in the
interpreter state.
(otherwise it wouldn't be per-interpreter)

The extensions dict holds the module *definition* (the struct
PyModuleDef), not the module instance.

Regards

Antoine.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


eliben at gmail

Aug 11, 2013, 9:07 AM

Post #22 of 22 (45 views)
Permalink
Re: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others) [In reply to]

On Sun, Aug 11, 2013 at 8:56 AM, Antoine Pitrou <solipsis [at] pitrou> wrote:

> On Sun, 11 Aug 2013 08:49:56 -0700
> Eli Bendersky <eliben [at] gmail> wrote:
>
> > On Sun, Aug 11, 2013 at 6:40 AM, Antoine Pitrou <solipsis [at] pitrou>
> wrote:
> >
> > > On Sun, 11 Aug 2013 06:26:55 -0700
> > > Eli Bendersky <eliben [at] gmail> wrote:
> > > > On Sun, Aug 11, 2013 at 3:33 AM, Antoine Pitrou <solipsis [at] pitrou
> >
> > > wrote:
> > > >
> > > > >
> > > > > Hi Eli,
> > > > >
> > > > > On Sat, 10 Aug 2013 17:12:53 -0700
> > > > > Eli Bendersky <eliben [at] gmail> wrote:
> > > > > >
> > > > > > Note how doing some sys.modules acrobatics and re-importing
> suddenly
> > > > > > changes the internal state of a previously imported module. This
> > > happens
> > > > > > because:
> > > > > >
> > > > > > 1. The first import of 'csv' (which then imports `_csv) creates
> > > > > > module-specific state on the heap and associates it with the
> current
> > > > > > sub-interpreter. The list of dialects, amongst other things, is
> in
> > > that
> > > > > > state.
> > > > > > 2. The 'del's wipe 'csv' and '_csv' from the cache.
> > > > > > 3. The second import of 'csv' also creates/initializes a new
> '_csv'
> > > > > module
> > > > > > because it's not in sys.modules. This *replaces* the
> > > per-sub-interpreter
> > > > > > cached version of the module's state with the clean state of a
> new
> > > module
> > > > >
> > > > > I would say this is pretty much expected.
> > > >
> > > > I'm struggling to see how it's expected. The two imported csv
> modules are
> > > > different (i.e. different id() of members), and yet some state is
> shared
> > > > between them.
> > >
> > > There are two csv modules, but there are not two _csv modules.
> > > Extension modules are currently immortal until the end of the
> > > interpreter:
> > >
> > > >>> csv = __import__('csv')
> > > >>> wcsv = weakref.ref(csv)
> > > >>> w_csv = weakref.ref(sys.modules['_csv'])
> > > >>> del sys.modules['csv']
> > > >>> del sys.modules['_csv']
> > > >>> del csv
> > > >>> gc.collect()
> > > 50
> > > >>> wcsv()
> > > >>> w_csv()
> > > <module '_csv' from
> > > '/home/antoine/cpython/default/build/lib.linux-x86_64-3.4-pydebug/_
> > > csv.cpython-34dm.so'>
> > >
> > >
> > > So, "sharing" a state is pretty much expected, since you are
> > > re-initializating an existing module.
> > > (but the module does get re-initialized, which is the point of PEP
> 3121)
> > >
> >
> > Yes, you're right - this is an oversight on my behalf. Indeed, the
> > extensions dict in import.c keeps it alive once loaded, and only ever
> gets
> > cleaned up in Py_Finalize.
>
> It's not the extensions dict in import.c, it's modules_by_index in the
> interpreter state.
> (otherwise it wouldn't be per-interpreter)
>
> The extensions dict holds the module *definition* (the struct
> PyModuleDef), not the module instance.
>

Thanks for the clarification.

Eli

Python dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.