
Mailing List Archive: Python: Dev

Make extension module initialisation more like Python module initialisation

 

 



stefan_ml at behnel

Nov 8, 2012, 4:47 AM

Post #1 of 14
Make extension module initialisation more like Python module initialisation

Hi,

I suspect that this will be put into a proper PEP at some point, but I'd
like to bring this up for discussion first. This came out of issues 13429
and 16392.

http://bugs.python.org/issue13429

http://bugs.python.org/issue16392

Stefan


The problem
===========

Python modules and extension modules are not set up in the same way.
For Python modules, the module is created and set up first, then the module
code is executed. For extensions, i.e. shared libraries, the module init
function is executed straight away and does both the creation and the
initialisation. This means that it knows neither the __file__ it is being
loaded from nor its package (i.e. its fully qualified module name, FQMN).
This hinders relative imports and resource loading. In Py3, the module is
also not added to sys.modules at that point, which means that a (potentially
transitive) re-import of the module will really try to import it again and
thus run into an infinite loop when it executes the module init function
again. And without the FQMN, it's not trivial to correctly add the module
to sys.modules either.

We specifically run into this for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of a FQMN and correct
file path hinders the compilation of __init__.py modules, i.e. packages,
especially when relative imports are being used at module init time.

The proposal
============

I propose to split the extension module initialisation into two steps in
Python 3.4, in a backwards compatible way.

Step 1: The current module init function can be reduced to just creating
the module instance and returning it (and potentially doing some simple C
level setup). Optionally, after creating the module (and this is the new
part), the module init code can register a C callback function that will be
called after setting up the module.

Step 2: The shared library importer receives the module instance from the
module init function, adds __file__, __path__, __package__ and friends to
the module dict, and then checks for the callback. If non-NULL, it calls it
to continue the module initialisation by user code.
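
To make the two steps concrete, here is a minimal sketch of the intended
control flow. It deliberately uses stand-in types instead of the real CPython
structs; "Module", "load_extension" and "spam_exec" are hypothetical names
chosen for illustration, not part of the proposal.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in types for illustration only; these are NOT the CPython structs. */
typedef struct Module Module;
typedef int (*init_callback)(Module *module);

struct Module {
    const char *name;
    const char *file;        /* models __file__ */
    const char *package;     /* models __package__ */
    init_callback callback;  /* set by the init function, may be NULL */
    int initialised;
};

/* Step 2 user code: runs only after the importer has filled in the metadata. */
static int spam_exec(Module *module)
{
    if (module->file == NULL || module->package == NULL)
        return -1;           /* metadata missing: the protocol was violated */
    module->initialised = 1;
    return 0;
}

static Module spam_module;

/* Step 1: the init function only creates the module and registers the callback. */
static Module *init_spam(void)
{
    memset(&spam_module, 0, sizeof spam_module);
    spam_module.name = "spam";
    spam_module.callback = spam_exec;
    return &spam_module;
}

/* Hypothetical importer: create, add metadata, then run the callback if set. */
static Module *load_extension(Module *(*init)(void),
                              const char *file, const char *package)
{
    Module *module = init();
    if (module == NULL)
        return NULL;
    module->file = file;
    module->package = package;
    if (module->callback != NULL && module->callback(module) < 0)
        return NULL;
    return module;
}
```

The point of the split is visible in "spam_exec": by the time it runs, the
file path and package are already available, which is not the case inside
the init function itself.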

The callback
============

The callback is defined as follows::

    typedef int (*PyModule_init_callback)(PyObject* the_module,
                                          PyModuleInitContext* context);

"PyModuleInitContext" is a struct that is meant mostly for making the
callback more future proof by allowing additional parameters to be passed
in. For now, I can see a use case for the following fields::

    struct PyModuleInitContext {
        char* module_name;
        char* qualified_module_name;
    };

Both names are encoded in UTF-8. As for the file path, I consider it best
to retrieve it from the module's __file__ attribute as a Python string
object to reduce filename encoding problems.

Note that this struct argument is not strictly required, but given that
this proposal would have been much simpler if the module init function had
accepted such an argument in the first place, I consider it a good idea not
to let this chance pass by again.

The registration of the callback uses a new C-API function::

    int PyModule_SetInitFunction(PyObject* module,
                                 PyModule_init_callback callback);

The function name uses "Set" instead of "Register" to make it clear that
there is only one such function per module.
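
As a rough illustration of how the pieces above could fit together, the
following sketch mocks the proposed declarations in plain C. None of this
exists in a released CPython: "MockModule" stands in for the module object,
"spam_init" is a made-up callback, and the "const" qualifiers and the "fqmn"
buffer are assumptions added for the example.

```c
#include <stddef.h>
#include <string.h>

/* Mock of the proposed declarations; not part of any released CPython. */
typedef struct PyModuleInitContext {
    const char *module_name;            /* UTF-8, e.g. "spam" */
    const char *qualified_module_name;  /* UTF-8, e.g. "pkg.spam" */
} PyModuleInitContext;

typedef struct MockModule MockModule;   /* stands in for PyObject* */
typedef int (*PyModule_init_callback)(MockModule *the_module,
                                      PyModuleInitContext *context);

struct MockModule {
    PyModule_init_callback init_callback;  /* the new field in the module struct */
    char fqmn[64];                         /* example state set by the callback */
};

/* "Set", not "Register": a second call replaces the first callback. */
static int PyModule_SetInitFunction(MockModule *module,
                                    PyModule_init_callback callback)
{
    if (module == NULL)
        return -1;
    module->init_callback = callback;
    return 0;
}

/* Example callback: records the fully qualified name it was given. */
static int spam_init(MockModule *the_module, PyModuleInitContext *context)
{
    size_t len = strlen(context->qualified_module_name);
    if (len >= sizeof the_module->fqmn)
        return -1;
    memcpy(the_module->fqmn, context->qualified_module_name, len + 1);
    return 0;
}
```

Because the callback receives the context struct, further fields could be
added later without changing the callback signature.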

An alternative would be a new module creation function "PyModule_Create3()"
that takes the callback as third argument, in addition to what
"PyModule_Create2()" accepts. This would require users to explicitly pass
in the (second) version argument, which might be considered only a minor issue.

Implementation
==============

The implementation requires local changes to the extension module importer
and a new C-API function. In order to store the callback, it should use a
new field in the module object struct.

Open questions
==============

It is not clear how extensions should be handled that register more than
one module in their module init function, e.g. compiled packages. One
possibility would be to leave the setup to the user, who would have to know
all FQMNs anyway in this case, although not the import file path.
Alternatively, the import machinery could use a stack to remember for which
modules a callback was registered during the last init function call, set
up all of them and then call their callbacks. It's not clear if this meets
the intention of the user.

Alternatives
============

1) It would be possible to make extension modules optionally export another
symbol, e.g. "PyInit2_modulename", that the shared library loader would
call in addition to the required function "PyInit_modulename". This would
remove the need for a new API that registers the above callback. The
drawback is that it also makes it easier to write broken code because a
Python version or implementation that does not support this second symbol
would simply not call it, without error. The new C-API function would let
the build fail instead if it is not supported.

2) The callback could be made available as a Python function in the module
dict, thus also removing the need for an explicit registration API.
However, this approach would add overhead to both sides, the importer code
and the user provided module init code, as it would require additional
dictionary handling and the implementation of a one-time Python function in
user code. It would also suffer from the problem that missing support in
the runtime would pass silently.

3) The callback could be registered statically in the PyModuleDef struct by
adding a new field. This is not trivial to do in a backwards compatible way
because the struct would grow longer without explicit initialisation by
existing user code. Extending PyModuleDef_HEAD_INIT might be possible but
would still break at least binary compatibility.

4) Pass a new context argument into the module init function that contains
all information necessary to properly and completely set up the module at
creation time. This would be much simpler and cleaner than the solution
proposed above. However, it will not be possible before Python 4 as it
breaks backwards compatibility with all existing extension modules at
both the source and binary level.

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


mal at egenix

Nov 8, 2012, 5:01 AM

Post #2 of 14
Re: Make extension module initialisation more like Python module initialisation

On 08.11.2012 13:47, Stefan Behnel wrote:
> Alternatives
> ============
> ...
> 3) The callback could be registered statically in the PyModuleDef struct by
> adding a new field. This is not trivial to do in a backwards compatible way
> because the struct would grow longer without explicit initialisation by
> existing user code. Extending PyModuleDef_HEAD_INIT might be possible but
> would still break at least binary compatibility.

I think the above is a cleaner approach than the callback mechanism.
There's no problem in adding new slots to the end of the PyModuleDef struct
- we've been doing that for years in many other structs :-)

All you have to do is bump the Python API version number.

(Martin's PEP http://www.python.org/dev/peps/pep-3121/ has the details)

--
Marc-Andre Lemburg
eGenix.com



stefan_ml at behnel

Nov 8, 2012, 5:20 AM

Post #3 of 14
Re: Make extension module initialisation more like Python module initialisation

M.-A. Lemburg, 08.11.2012 14:01:
> On 08.11.2012 13:47, Stefan Behnel wrote:
>> Alternatives
>> ============
>> ...
>> 3) The callback could be registered statically in the PyModuleDef struct by
>> adding a new field. This is not trivial to do in a backwards compatible way
>> because the struct would grow longer without explicit initialisation by
>> existing user code. Extending PyModuleDef_HEAD_INIT might be possible but
>> would still break at least binary compatibility.
>
> I think the above is the cleaner approach than the callback mechanism.

Oh, definitely.


> There's no problem in adding new slots to the end of the PyModuleDef struct
> - we've been doing that for years in many other structs :-)
>
> All you have to do is bump the Python API version number.
>
> (Martin's PEP http://www.python.org/dev/peps/pep-3121/ has the details)

The difference is that this specific struct is provided by user code and
(typically) initialised statically. There is no guarantee that user code
that does not expect the additional field will initialise it to 0. Failing
that, I don't see how we could trust its value in any way.

Stefan




stefan_ml at behnel

Nov 8, 2012, 5:51 AM

Post #4 of 14
Re: Make extension module initialisation more like Python module initialisation

Stefan Behnel, 08.11.2012 14:20:
> M.-A. Lemburg, 08.11.2012 14:01:
>> On 08.11.2012 13:47, Stefan Behnel wrote:
>>> Alternatives
>>> ============
>>> ...
>>> 3) The callback could be registered statically in the PyModuleDef struct by
>>> adding a new field. This is not trivial to do in a backwards compatible way
>>> because the struct would grow longer without explicit initialisation by
>>> existing user code. Extending PyModuleDef_HEAD_INIT might be possible but
>>> would still break at least binary compatibility.
>>
>> I think the above is the cleaner approach than the callback mechanism.
>
> Oh, definitely.
>
>
>> There's no problem in adding new slots to the end of the PyModuleDef struct
>> - we've been doing that for years in many other structs :-)
>>
>> All you have to do is bump the Python API version number.
>>
>> (Martin's PEP http://www.python.org/dev/peps/pep-3121/ has the details)
>
> The difference is that this specific struct is provided by user code and
> (typically) initialised statically. There is no guarantee that user code
> that does not expect the additional field will initialise it to 0. Failing
> that, I don't see how we could trust its value in any way.

Hmm - you're actually right. In C, uninitialised fields in a static struct
are set to 0 automatically. Same case as the type structs. That makes your
objection perfectly valid. I'll rewrite and shorten the proposal.
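
The C rule in question can be shown with a small, self-contained sketch.
"ModuleDef" here is a simplified, hypothetical struct and not the real
PyModuleDef; the "exec" field plays the role of a slot appended later.

```c
#include <stddef.h>

/* Hypothetical, simplified definition; not the real PyModuleDef. */
typedef struct ModuleDef {
    const char *name;
    long size;
    int (*exec)(void *module);  /* field added later, at the end of the struct */
} ModuleDef;

/* "Old" user code: its initialiser names only the first two fields.
 * For objects with static storage duration, C initialises the remaining
 * members as if they were assigned zero, so exec is a null pointer here. */
static ModuleDef old_style_def = { "spam", -1 };

/* An importer can therefore treat a null slot as "not provided". */
static int has_exec_slot(const ModuleDef *def)
{
    return def->exec != NULL;
}
```

This covers user code that is recompiled against the longer struct. Binaries
built against the shorter struct are a separate matter, which is where the
API version bump comes in.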

Thanks!

Stefan




brett at python

Nov 8, 2012, 6:41 AM

Post #5 of 14
Re: Make extension module initialisation more like Python module initialisation

On Thu, Nov 8, 2012 at 7:47 AM, Stefan Behnel <stefan_ml [at] behnel> wrote:

> Hi,
>
> I suspect that this will be put into a proper PEP at some point, but I'd
> like to bring this up for discussion first. This came out of issues 13429
> and 16392.
>
> http://bugs.python.org/issue13429
>
> http://bugs.python.org/issue16392
>
> Stefan
>
>
> The problem
> ===========
>
> Python modules and extension modules are not being set up in the same way.
> For Python modules, the module is created and set up first, then the module
> code is being executed. For extensions, i.e. shared libraries, the module
> init function is executed straight away and does both the creation and
> initialisation. This means that it knows neither the __file__ it is being
> loaded from nor its package (i.e. its FQMN). This hinders relative imports
> and resource loading. In Py3, it's also not being added to sys.modules,
> which means that a (potentially transitive) re-import of the module will
> really try to reimport it and thus run into an infinite loop when it
> executes the module init function again. And without the FQMN, it's not
> trivial to correctly add the module to sys.modules either.
>
> We specifically run into this for Cython generated modules, for which it's
> not uncommon that the module init code has the same level of complexity as
> that of any 'regular' Python module. Also, the lack of a FQMN and correct
> file path hinders the compilation of __init__.py modules, i.e. packages,
> especially when relative imports are being used at module init time.
>

Or to put it another way, importlib doesn't give you a nice class to
inherit from which will handle all of the little details of creating a
blank module (or fetching from sys.modules if you are reloading), setting
__file__, __cached__, __package__, __name__, __loader__, and (optionally)
__path__ for you, and then cleaning up if something goes wrong. It's a pain
to do all of this yourself and to get all the details right (i.e. there's a
reason that @importlib.util.module_for_loader exists).


>
> The proposal
> ============
>
> I propose to split the extension module initialisation into two steps in
> Python 3.4, in a backwards compatible way.
>
> Step 1: The current module init function can be reduced to just creating
> the module instance and returning it (and potentially doing some simple C
> level setup). Optionally, after creating the module (and this is the new
> part), the module init code can register a C callback function that will be
> called after setting up the module.
>

Why even bother with the module creation? Why can't Python do that as well
and then call the callback?


>
> Step 2: The shared library importer receives the module instance from the
> module init function, adds __file__, __path__, __package__ and friends to
> the module dict, and then checks for the callback. If non-NULL, it calls it
> to continue the module initialisation by user code.


> Alternatives
> ============
>
> 1) It would be possible to make extension modules optionally export another
> symbol, e.g. "PyInit2_modulename", that the shared library loader would
> call in addition to the required function "PyInit_modulename". This would
> remove the need for a new API that registers the above callback. The
> drawback is that it also makes it easier to write broken code because a
> Python version or implementation that does not support this second symbol
> would simply not call it, without error. The new C-API function would let
> the build fail instead if it is not supported.
>

An alternative to the alternative is that if the PyInit2 function exists,
it's called instead of the PyInit function, and the PyInit function then
becomes nothing more than a single-line function call (or whatever the
absolute bare minimum is) into some helper that calls the PyInit2 function
properly for backwards ABI compatibility (i.e. passes in whatever details
are lost by the indirection in the function call). That provides an eventual
upgrade path of dropping PyInit and moving over to PyInit2.
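
A rough sketch of that idea, with a struct of function pointers standing in
for the symbol lookup in a shared library. All names ("Exports", "load",
"spam_init2") are hypothetical, and what exactly the shim would pass along
is left open above; the default used here is only an assumption.

```c
#include <stddef.h>

/* Stand-in module type for illustration; not the CPython module object. */
typedef struct Module {
    const char *qualified_name;
} Module;

/* What a loaded shared library might export (models the symbol lookup). */
typedef struct Exports {
    Module *(*init)(void);                         /* models PyInit_<name> */
    Module *(*init2)(const char *qualified_name);  /* models PyInit2_<name>, may be NULL */
} Exports;

static Module the_module;

/* New-style entry point: receives details the old signature cannot carry. */
static Module *spam_init2(const char *qualified_name)
{
    the_module.qualified_name = qualified_name;
    return &the_module;
}

/* Old entry point reduced to a one-line shim for older loaders. */
static Module *spam_init(void)
{
    return spam_init2("spam");  /* assumption: no package context available here */
}

/* Hypothetical loader: prefer the new symbol, fall back to the old one. */
static Module *load(const Exports *lib, const char *qualified_name)
{
    if (lib->init2 != NULL)
        return lib->init2(qualified_name);
    return lib->init();
}
```

A loader that knows about the second symbol gets the full name through,
while an older loader still ends up with a working, if less informed, module.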

-Brett




stefan_ml at behnel

Nov 8, 2012, 7:00 AM

Post #6 of 14
Re: Make extension module initialisation more like Python module initialisation

Hi Brett,

thanks for the feedback.

Brett Cannon, 08.11.2012 15:41:
> On Thu, Nov 8, 2012 at 7:47 AM, Stefan Behnel wrote:
>> I propose to split the extension module initialisation into two steps in
>> Python 3.4, in a backwards compatible way.
>>
>> Step 1: The current module init function can be reduced to just creating
>> the module instance and returning it (and potentially doing some simple C
>> level setup). Optionally, after creating the module (and this is the new
>> part), the module init code can register a C callback function that will be
>> called after setting up the module.
>
> Why even bother with the module creation? Why can't Python do that as well
> and then call the callback?
>
>
>> Step 2: The shared library importer receives the module instance from the
>> module init function, adds __file__, __path__, __package__ and friends to
>> the module dict, and then checks for the callback. If non-NULL, it calls it
>> to continue the module initialisation by user code.
> [...]
> An alternative to the alternative is that if the PyInit2 function exists
> it's called instead of the PyInit function, and then the PyInit
> function is nothing more than a single line function call (or whatever the
> absolute bare minimum is) into some helper that calls the PyInit2 call
> properly for backwards ABI compatibility (i.e. passes in whatever details
> are lost by the indirection in function call). That provides an eventual
> upgrade path of dropping PyInit and moving over to PyInit2.

In that case, you'd have to export the PyModuleDef descriptor as well,
because that's what tells CPython how the module behaves and what to do
with it to set it up properly (e.g. allocate module state space on the heap).

In fact, if the module init function became a field in the descriptor, it
would be enough (taking backwards compatibility aside) if *only* the
descriptor was exported and used by the module loader.

With the caveat that this might kill some less common but not necessarily
illegitimate use cases that do more than just creating and initialising a
single module...

Stefan




brett at python

Nov 8, 2012, 7:06 AM

Post #7 of 14
Re: Make extension module initialisation more like Python module initialisation

On Thu, Nov 8, 2012 at 10:00 AM, Stefan Behnel <stefan_ml [at] behnel> wrote:

> In that case, you'd have to export the PyModuleDef descriptor as well,
> because that's what tells CPython how the module behaves and what to do
> with it to set it up properly (e.g. allocate module state space on the
> heap).
>

True.


>
> In fact, if the module init function became a field in the descriptor, it
> would be enough (taking backwards compatibility aside) if *only* the
> descriptor was exported and used by the module loader.
>
>
Also true.


> With the caveat that this might kill some less common but not necessarily
> illegitimate use cases that do more than just creating and initialising a
> single module...
>

You mean creating another module in the init function? That's fine, but
that should be a call to __import__ anyway and that should handle things
properly. Else you are circumventing the import system and you can do
everything from scratch. I don't see why this would stop you from doing
anything you want, it just simplifies the common case.


stefan_ml at behnel

Nov 8, 2012, 7:30 AM

Post #8 of 14
Re: Make extension module initialisation more like Python module initialisation

Brett Cannon, 08.11.2012 16:06:
> On Thu, Nov 8, 2012 at 10:00 AM, Stefan Behnel <stefan_ml [at] behnel> wrote:
>> In that case, you'd have to export the PyModuleDef descriptor as well,
>> because that's what tells CPython how the module behaves and what to do
>> with it to set it up properly (e.g. allocate module state space on the
>> heap).
>
> True.
>
>> In fact, if the module init function became a field in the descriptor, it
>> would be enough (taking backwards compatibility aside) if *only* the
>> descriptor was exported and used by the module loader.
>
> Also true.
>
>> With the caveat that this might kill some less common but not necessarily
>> illegitimate use cases that do more than just creating and initialising a
>> single module...
>
> You mean creating another module in the init function? That's fine, but
> that should be a call to __import__ anyway and that should handle things
> properly.

Ok.


> Else you are circumventing the import system and you can do
> everything from scratch.

I guess I'd be ok with putting that burden on users in this case.


> I don't see why this would stop you from doing
> anything you want, it just simplifies the common case.

The only problematic case I see here would be a module that calculates the
size of its state space at init time, e.g. based on some platform specifics
or environment parameters, anything from the platform specific size of some
data type to the runtime configured number of OpenMP threads.

Exporting only the descriptor would make the PyModuleDef a compile-time
static thing. I'm not sure whether that's currently required.
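
To make the concern concrete, here is a minimal Python sketch (not C-API
code) of a state size that is only known at init time. The function name
state_size and the "one long double per thread" layout are assumptions made
up for illustration; OMP_NUM_THREADS is the usual OpenMP environment
variable.

```python
import ctypes
import os


def state_size(environ=os.environ):
    """Return the bytes of state a hypothetical module would need.

    Both factors are only known at run time: the size of a C long double
    depends on the platform, and the thread count depends on the environment.
    """
    n_threads = int(environ.get("OMP_NUM_THREADS", os.cpu_count() or 1))
    return ctypes.sizeof(ctypes.c_longdouble) * n_threads


# Same code, different environments, different state sizes.
print(state_size({"OMP_NUM_THREADS": "1"}))
print(state_size({"OMP_NUM_THREADS": "8"}))
```

A statically initialised PyModuleDef would have to fix its m_size field
before any of these values are known.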

Stefan


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan_ml at behnel

Aug 5, 2013, 10:02 PM

Post #9 of 14 (48 views)
Permalink
Re: Make extension module initialisation more like Python module initialisation [In reply to]

Hi,

let me revive and summarize this old thread.

Stefan Behnel, 08.11.2012 13:47:
> I suspect that this will be put into a proper PEP at some point, but I'd
> like to bring this up for discussion first. This came out of issues 13429
> and 16392.
>
> http://bugs.python.org/issue13429
>
> http://bugs.python.org/issue16392
>
>
> The problem
> ===========
>
> Python modules and extension modules are not being set up in the same way.
> For Python modules, the module is created and set up first, then the module
> code is being executed. For extensions, i.e. shared libraries, the module
> init function is executed straight away and does both the creation and
> initialisation. This means that it knows neither the __file__ it is being
> loaded from nor its package (i.e. its FQMN). This hinders relative imports
> and resource loading. In Py3, it's also not being added to sys.modules,
> which means that a (potentially transitive) re-import of the module will
> really try to reimport it and thus run into an infinite loop when it
> executes the module init function again. And without the FQMN, it's not
> trivial to correctly add the module to sys.modules either.
>
> We specifically run into this for Cython generated modules, for which it's
> not uncommon that the module init code has the same level of complexity as
> that of any 'regular' Python module. Also, the lack of a FQMN and correct
> file path hinders the compilation of __init__.py modules, i.e. packages,
> especially when relative imports are being used at module init time.

The outcome of this discussion was that the extension module import
protocol needs to change in order to provide all necessary information to
the module init function.

Brett Cannon proposed to move the module object creation into the extension
module importer, i.e. outside of the user provided module init function.
CPython would then load the extension module, create and initialise the
module object (set __file__, __name__, etc.) and pass it into the module
init function.
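
As a rough illustration of that flow, here is a pure-Python sketch of what
such an importer could do. It is a model only, not CPython's loader: the
names load_like_importer and demo_init, the module name and the file path
are all made up, and a Python callable stands in for the C init function.

```python
import importlib
import sys
import types


def load_like_importer(fqmn, path, init):
    """Create and register the module first, then run the init function."""
    module = types.ModuleType(fqmn)
    module.__file__ = path
    module.__package__ = fqmn.rpartition(".")[0]
    sys.modules[fqmn] = module  # registered before any user code runs
    try:
        init(module)
    except BaseException:
        sys.modules.pop(fqmn, None)  # do not leave a broken module behind
        raise
    return module


calls = []


def demo_init(module):
    """Stand-in for a C init function; it can already see its own metadata."""
    calls.append(module.__file__)
    # A (transitive) re-import finds the registered module, so the init
    # function is not entered a second time.
    assert importlib.import_module(module.__name__) is module


mod = load_like_importer(
    "demo_pkg.demo_ext", "/tmp/demo_pkg/demo_ext.so", demo_init
)
print(mod.__package__, calls)
```

Because the module is in sys.modules before the init function runs, the
re-import inside demo_init returns the same object instead of looping.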

I proposed to make the PyModuleDef struct the new entry point instead of
just a generic C function, as that would give the module importer all
necessary information about the module to create the module object. The
only missing bit is the entry point for the new module init function.

Nick Coghlan objected to the proposal of simply extending PyModuleDef with
an initialiser function, as the struct is part of the stable ABI.

Alternatives I see:

1) Expose a struct that points to the extension module's PyModuleDef struct
and the init function and expose that struct instead.

2) Expose both the PyModuleDef and the init function as public symbols.

3) Provide a public C function as entry point that returns both a
PyModuleDef pointer and a module init function pointer.

4) Change the m_init function pointer in PyModuleDef_Base from func(void)
to func(PyObject*) iff the PyModuleDef struct is exposed as a public symbol.

5) Duplicate PyModuleDef and adapt the new one as in 4).

Alternatives 1) and 2) only differ marginally by the number of public
symbols being exposed. 3) has the advantage of supporting more advanced
setups, e.g. heap allocation for the PyModuleDef struct. 4) is a hack and
has the disadvantage that the signature of the module init function cannot
be stored across reinitialisations (PyModuleDef has no "flags" or "state"
field to remember it). 5) would fix that, i.e. we could add a proper
pointer to the new module init function as well as a flags field for future
extensions. A similar effect could be achieved by carefully designing the
struct in 1).

I think 1-3 are all reasonable ways to do this, although I don't think 3)
will be necessary. 5) would be a clean fix, but has the disadvantage of
duplicating an entire struct just to change one field in it.

I'm currently leaning towards 1), with a struct that points to PyModuleDef,
module init function and a flags field for future extensions. I understand
that this would need to become part of the stable ABI, so explicit
extensibility is important to keep up backwards compatibility.
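
To visualise alternative 1), here is a rough ctypes model of such a struct
and of how an importer might consume it. Everything in it is hypothetical:
the struct name ModuleEntry, its field names, the int (*)(PyObject *) init
signature and the flag check are assumptions for illustration, and nothing
like this exists in CPython.

```python
import ctypes
import types

# Hypothetical signature of the new init function: int init(PyObject *module)
InitFunc = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.py_object)


class ModuleEntry(ctypes.Structure):
    """Hypothetical exported struct: PyModuleDef pointer, init, flags."""
    _fields_ = [
        ("module_def", ctypes.c_void_p),  # would point to the PyModuleDef
        ("init", InitFunc),               # new-style init, gets the module
        ("flags", ctypes.c_uint),         # reserved for future extensions
    ]


KNOWN_FLAGS = 0  # this sketch defines no flags yet


def run_entry(entry, module):
    """What an importer might do with the struct (illustrative only)."""
    if entry.flags & ~KNOWN_FLAGS:
        raise ImportError("entry struct uses flags this loader does not know")
    if entry.init and entry.init(module) != 0:
        raise ImportError("module init function reported failure")
    return module


def _init(module):
    module.answer = 42
    return 0


init_cb = InitFunc(_init)  # keep a reference so the callback stays alive
entry = ModuleEntry(None, init_cb, 0)
mod = run_entry(entry, types.ModuleType("demo"))
print(mod.answer)
```

The explicit flags field is what would let an older loader reject, rather
than misread, a struct that relies on features added later.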

Opinions?

Stefan




ncoghlan at gmail

Aug 5, 2013, 10:35 PM

Post #10 of 14 (48 views)
Permalink
Re: Make extension module initialisation more like Python module initialisation [In reply to]

On 6 August 2013 15:02, Stefan Behnel <stefan_ml [at] behnel> wrote:
> Alternatives I see:
>
> 1) Expose a struct that points to the extension module's PyModuleDef struct
> and the init function and expose that struct instead.
>
> 2) Expose both the PyModuleDef and the init function as public symbols.
>
> 3) Provide a public C function as entry point that returns both a
> PyModuleDef pointer and a module init function pointer.
>
> 4) Change the m_init function pointer in PyModuleDef_Base from func(void)
> to func(PyObject*) iff the PyModuleDef struct is exposed as a public symbol.
>
> 5) Duplicate PyModuleDef and adapt the new one as in 4).
>
> Alternatives 1) and 2) only differ marginally by the number of public
> symbols being exposed. 3) has the advantage of supporting more advanced
> setups, e.g. heap allocation for the PyModuleDef struct. 4) is a hack and
> has the disadvantage that the signature of the module init function cannot
> be stored across reinitialisations (PyModuleDef has no "flags" or "state"
> field to remember it). 5) would fix that, i.e. we could add a proper
> pointer to the new module init function as well as a flags field for future
> extensions. A similar effect could be achieved by carefully designing the
> struct in 1).
>
> I think 1-3 are all reasonable ways to do this, although I don't think 3)
> will be necessary. 5) would be a clean fix, but has the disadvantage of
> duplicating an entire struct just to change one field in it.
>
> I'm currently leaning towards 1), with a struct that points to PyModuleDef,
> module init function and a flags field for future extensions. I understand
> that this would need to become part of the stable ABI, so explicit
> extensibility is important to keep up backwards compatibility.
>
> Opinions?

I believe a better option would be to migrate module creation over to
a dynamic PyModule_Slot and PyModule_Spec approach in the stable ABI,
similar to the one that was defined for types in PEP 384.

A related topic is that over on import-sig, we're currently tinkering
with the idea of changing the way *Python* module imports happen to
include a separate "ImportSpec" object (exact name TBC). The spec
would contain preliminary info on all of the things that the import
system can figure out *without* actually importing the module. That
list includes all the special attributes that are currently set on
modules:

__loader__
__name__
__package__
__path__
__file__
__cached__

(Note that the attributes on the spec *may not* be the same as those
in the module's own namespace - for example, __name__ and
__spec__.name would differ in a module executed with -m, and __path__
and __spec__.path would end up differing in packages that directly
manipulated their __path__ attribute during __init__ execution)
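
For readers on Python 3.5 or later: this idea eventually shipped as
importlib.machinery.ModuleSpec (PEP 451, Python 3.4), so the split between
"what the import system knows" and "what the module namespace says" can be
tried directly. The sketch below uses the standard json package only as a
convenient example.

```python
import importlib.util

# Everything on the spec is known before any module code runs.
spec = importlib.util.find_spec("json")
print(spec.name, spec.origin, spec.submodule_search_locations)

# Build a module object from the spec without executing it (Python 3.5+).
module = importlib.util.module_from_spec(spec)

# The module's own namespace can later drift away from the spec,
# much like __name__ does for a module run with -m.
module.__name__ = "__main__"
print(module.__name__, module.__spec__.name)
```

The spec keeps the name the import system resolved, while the module
attribute reflects whatever was assigned afterwards.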

The intent is to clean up some of the ad hoc hackery that was needed
to make PEP 420 work, and reduce the amount of duplicated
functionality needed in loader implementations.

If you wanted to reboot this thread on import-sig, that would probably
be a good thing :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia


stefan_ml at behnel

Aug 5, 2013, 11:03 PM

Post #11 of 14 (48 views)
Permalink
Re: Make extension module initialisation more like Python module initialisation [In reply to]

Nick Coghlan, 06.08.2013 07:35:
> If you wanted to reboot this thread on import-sig, that would probably
> be a good thing :)

Sigh. Yet another list to know about and temporarily follow...

The import-sig list doesn't seem to be mirrored on Gmane yet. Also, it
claims to be dead w.r.t. Py3.4:

"""
The intent is that this SIG will be re-retired after Python 3.3 is released.
"""

-> http://www.python.org/community/sigs/current/import-sig/

"""
Resurrected for landing PEP 382 in Python 3.3.
"""

-> http://mail.python.org/mailman/listinfo/import-sig

Seriously, wouldn't python-dev be just fine for this? It's not like the
import system is going to be rewritten for each minor release from now on.

Stefan




ncoghlan at gmail

Aug 6, 2013, 12:09 AM

Post #12 of 14 (48 views)
Permalink
Re: Make extension module initialisation more like Python module initialisation [In reply to]

On 6 August 2013 16:03, Stefan Behnel <stefan_ml [at] behnel> wrote:
> Seriously, wouldn't python-dev be just fine for this? It's not like the
> import system is going to be rewritten for each minor release from now on.

We currently use it whenever we're doing a deep dive into import
system arcana, so python-dev only needs to worry about the question
once it's a clearly viable proposal. I think the other thread will be
quite relevant to the topic you're interested in, since we hadn't even
considered extension modules yet.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia


rymg19 at gmail

Aug 6, 2013, 8:02 AM

Post #13 of 14 (44 views)
Permalink
Re: Make extension module initialisation more like Python module initialisation [In reply to]

Nice idea, but some of those may break 3rd party libraries like Boost.Python that have their own equivalent of the Python/C API. Even SWIG might experience trouble in one or two of those.



stefan_ml at behnel

Aug 6, 2013, 8:59 AM

Post #14 of 14 (44 views)
Permalink
Re: Make extension module initialisation more like Python module initialisation [In reply to]

Ryan, 06.08.2013 17:02:
> Nice idea, but some of those may break 3rd party libraries like
> Boost.Python that have their own equivalent of the Python/C API. Even SWIG
> might experience trouble in one or two of those.

The idea is that this will be an alternative way of initialising a module
that CPython will only use if an extension module exports the corresponding
symbol. So it won't break existing code, neither source code nor binaries.
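
The opt-in lookup could be pictured roughly as follows. This is a Python
model only: lib stands in for a shared library's symbol table, PyInit_<name>
is the existing entry point, and the PyInit2_<name> symbol (borrowing the
name used earlier in this thread) is hypothetical, as is load_extension.

```python
import types


def load_extension(lib, name):
    """Prefer the new entry point if it is exported, else use the old one."""
    new_init = getattr(lib, "PyInit2_" + name, None)  # hypothetical symbol
    if new_init is not None:
        module = types.ModuleType(name)  # the importer creates the module
        new_init(module)
        return module
    return getattr(lib, "PyInit_" + name)()  # existing protocol, unchanged


def _legacy_init():
    return types.ModuleType("spam")


# An existing extension exports only the classic symbol ...
old_lib = types.SimpleNamespace(PyInit_spam=_legacy_init)
# ... while an updated one exports both.
new_lib = types.SimpleNamespace(
    PyInit_spam=_legacy_init,
    PyInit2_spam=lambda module: setattr(module, "new_style", True),
)

print(hasattr(load_extension(old_lib, "spam"), "new_style"))
print(load_extension(new_lib, "spam").new_style)
```

Libraries that never export the new symbol keep going through the old
branch, which is why wrappers like Boost.Python or SWIG would be unaffected
until they choose to opt in.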

Stefan

