
Subinterpreters for Python

source link: https://lwn.net/SubscriberLink/820424/172e6da006687167/


By Jake Edge

May 13, 2020

A project that has been floating around in the Python world for a number of years is now working its way toward inclusion into the language—or not. "Subinterpreters", which are separate Python interpreters that can currently be created via the C API for extensions, are seen by some as a way to get a more Go-like concurrency model for Python. The first step toward that goal is to expose that API in the standard library. But there are questions about whether subinterpreters are actually a desirable feature for Python at all, as well as whether the hoped-for concurrency improvements will materialize.

PEP 554

Eric Snow's PEP 554 ("Multiple Interpreters in the Stdlib") would expose the existing "subinterpreter" support from the C API in the standard library. That would allow Python programs to use multiple separate interpreters; the PEP also proposes to add a way to share some data types between the instances. The eventual goal is to allow those subinterpreters to run in parallel, but the implementation is not there yet.

In particular, giving each subinterpreter its own global interpreter lock (GIL) is not (yet) on the table. The GIL prevents multiple threads from executing Python bytecode at the same time. It exists mainly because the CPython memory-management code and garbage collector are not thread-safe. But the existence of the GIL has meant that other features, C-based extensions for example, depend on it for proper functioning. There have been efforts to remove the GIL from Python along the way, including the Gilectomy project. Subinterpreters are seen by some as another way of addressing the "GIL problem".
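The effect of the GIL on threaded code can be observed with a rough timing sketch. The names below (busy_sum, run_sequential, run_threaded) are made up for illustration, and the numbers depend on the machine and the interpreter build; on a standard CPython build, the threaded run of a CPU-bound pure-Python loop usually takes about as long as the sequential one.

```python
import threading
import time


def busy_sum(n):
    """CPU-bound pure-Python loop; the running thread holds the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(n, workers=2):
    """Run the loop `workers` times, one after another."""
    start = time.perf_counter()
    results = [busy_sum(n) for _ in range(workers)]
    return results, time.perf_counter() - start


def run_threaded(n, workers=2):
    """Run the same loops in `workers` threads and wait for all of them."""
    results = [None] * workers

    def work(slot):
        results[slot] = busy_sum(n)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000
    _, seq_time = run_sequential(n)
    _, thr_time = run_threaded(n)
    print(f"sequential: {seq_time:.3f}s")
    print(f"threaded:   {thr_time:.3f}s")
```

Both runs compute the same results; only the elapsed times are of interest, and they should be read as a rough indication rather than a benchmark.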

The PEP proposes adding an interpreters module to the standard library that will allow the creation of subinterpreters as follows:

interp = interpreters.create()

Interpreters can then run code passed as a string to the run() method. Data is not shared between these interpreters unless it is done explicitly by using "channels" created this way:

recv, send = interpreters.create_channel()

As might be guessed, simple objects (e.g. bytes, strings, integers) can then be sent and received using the send() and recv() methods of the corresponding channel objects.
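The shape of a channel, a receive-only end paired with a send-only end, can be tried out today with an analogy from the standard library. The sketch below uses multiprocessing.Pipe(duplex=False) between two threads; it is not the PEP's API or mechanism (objects here are pickled), and the worker and demo functions, along with the use of None as an end-of-stream marker, are assumptions of this example.

```python
import threading
from multiprocessing import Pipe


def worker(recv, send_back):
    """Collect simple objects until None arrives, then send the list back."""
    received = []
    while True:
        item = recv.recv()   # blocks until something has been sent
        if item is None:     # None is this sketch's end-of-stream marker
            break
        received.append(item)
    send_back.send(received)


def demo():
    # duplex=False returns a (receive-only, send-only) pair of connections
    recv, send = Pipe(duplex=False)
    reply_recv, reply_send = Pipe(duplex=False)

    t = threading.Thread(target=worker, args=(recv, reply_send))
    t.start()

    for obj in (b"spam", "eggs", 42):
        send.send(obj)
    send.send(None)

    result = reply_recv.recv()
    t.join()

    for conn in (recv, send, reply_recv, reply_send):
        conn.close()
    return result


if __name__ == "__main__":
    print(demo())
```

Nothing is visible to the worker except what is explicitly sent over the pipe, which is the property the PEP's channels are meant to provide between interpreters.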

The run() method blocks until the subinterpreter completes, though it can be executed in a separate thread, as an example from the PEP that uses the threading module shows:

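import threading
import interpreters  # the module proposed by PEP 554

interp = interpreters.create()

def run():
    # run() blocks this thread until the subinterpreter finishes
    interp.run('print("during")')

t = threading.Thread(target=run)
print('before')
t.start()
print('after')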

Because the GIL is shared between all of the interpreters, however, the concurrency gains are minimal. In the most recent revisions, the PEP tries to make it clear that exposing the feature from the C API is worth doing regardless of what happens with the GIL:

To avoid any confusion up front: This PEP is unrelated to any efforts to stop sharing the GIL between subinterpreters. At most this proposal will allow users to take advantage of any results of work on the GIL. The position here is that exposing subinterpreters to Python code is worth doing, even if they still share the GIL.

PEP 554 has been around since 2017, but Snow thinks it is getting ready for "pronouncement" (a decision to accept or reject it) now. While he believes there is value to exposing the interface in its own right, the PEP has had trouble separating itself from the ongoing GIL work; PEP 554 could perhaps be added to Python 3.9, though the GIL changes are not complete. In mid-April, Snow posed a question to the python-dev mailing list, wondering if it made sense to hold off on the PEP until 3.10 because there is no per-interpreter GIL.

Many folks have conflated PEP 554 with having a per-interpreter GIL. In fact, I was careful to avoid any mention of parallelism or the GIL in the PEP. Nonetheless some are expecting that when PEP 554 lands we will reach multi-core nirvana.

While PEP 554 might be accepted and the implementation ready in time for 3.9, the separate effort toward a per-interpreter GIL is unlikely to be sufficiently done in time. That will likely happen in the next couple months (for 3.10).

So...would it be sufficiently problematic for users if we land PEP 554 in 3.9 without per-interpreter GIL?

His main concern is that users will be confused and frustrated by encountering subinterpreters with a shared GIL, which will have lots of limitations; that might lead them to not reconsider the feature when those limitations are lifted for 3.10. He listed four options for proceeding: merging it without the GIL changes, the same but marking it as a "provisional" module, not merging until the GIL changes are ready, and the same but adding a 3.9-only subinterpreters module to the Python Package Index (PyPI). He was in favor of the first or the second option.

C extensions

But others are concerned that adding subinterpreter support to the standard library will put additional burdens onto the developers of C-based extensions. Those extensions sometimes use global variables, which do not play well with subinterpreters—whether they are created via the existing C API or the proposed standard library interpreters module. That means that using subinterpreters could lead to strange, hard-to-find problems when combined with extensions.

CPython core developer Nathaniel Smith, who is also a core developer of the C-based extension NumPy, was particularly unhappy with the proposal:

I think you've been downplaying the impact of subinterpreter support on the existing extension ecosystem. All features have a cost, which is why PEPs always require substantial rationales and undergo intense scrutiny. But subinterpreters are especially expensive. Most features only affect a small group of modules (e.g. async/await affected twisted and tornado, but 99% of existing libraries didn't care); OTOH subinterpreters require updates to every C extension module. And if we start telling users that subinterpreters are a supported way to run arbitrary Python code, then we've effectively limited extension authors options to "update to support subinterpreters" or "explain to users why they aren't writing a proper Python module", which is an intense amount of pressure; for most features maintainers have the option of saying "well, that isn't relevant to me", but with subinterpreter support that option's been removed.

NumPy core developer Sebastian Berg chimed in as well. He suggested that it could take up to a solid year of work to support subinterpreters in NumPy. He also said that the proposal to raise an exception when subinterpreters import extensions that are not subinterpreter-ready is helpful, though it likely will still lead to bugs being filed against the extensions. The PEP proposes to raise ImportError for any extension that does not support PEP 489 ("Multi-phase extension module initialization"); multi-phase initialization eliminates the problems with global state variables for the extensions by moving them into their own module-specific dictionary object.
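The difference between process-wide state and per-module state can be illustrated in pure Python. This is only an analogy: real extensions do this in C, and the two loader functions below are hypothetical stand-ins, with a plain dictionary playing the role of a C global variable.

```python
import types

# Stand-in for a C global variable: one copy for the whole process,
# however many times the "extension" is loaded.
_process_wide = {"counter": 0}


def load_with_global_state(name):
    """Hypothetical loader: every module object shares _process_wide."""
    mod = types.ModuleType(name)

    def bump():
        _process_wide["counter"] += 1
        return _process_wide["counter"]

    mod.bump = bump
    return mod


def load_with_module_state(name):
    """Hypothetical loader: the counter lives in the module's own namespace."""
    mod = types.ModuleType(name)
    mod.counter = 0

    def bump():
        mod.counter += 1
        return mod.counter

    mod.bump = bump
    return mod
```

Two modules created by load_with_global_state() see each other's increments, which is the kind of cross-interpreter leak the extension developers are worried about; two modules created by load_with_module_state() each count independently.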

Both Smith and Berg are skeptical of the existing C-level subinterpreter support. Berg said: "I believe you must consider subinterpreters basically a non-feature at this time. It has neither users nor reasonable ecosystem support", while Smith said that he might write a PEP to propose that subinterpreters be completely eliminated from Python. Snow replied to Berg that there are existing users, however:

That's not to say that alone justifies exposing the C-API, of course. :)

Benefits?

Beyond the concerns about extensions, though, Smith is not convinced of the benefits for concurrency that could eventually come from subinterpreter support. PEP 554 is careful not to directly connect the interpreters module with the eventual plan to stop sharing the GIL between subinterpreters, though it is clearly the eventual goal for some. Smith is skeptical of that plan as well:

In talks and informal conversations, you paint a beautiful picture of all the wonderful things subinterpreters will do. Lots of people are excited by these wonderful things. I tried really hard to be excited too. (In fact I spent a few weeks trying to work out a subinterpreter-style proposal myself way back before you started working on this!) But the problem is, whenever I look more closely at the exciting benefits, I end up convincing myself that they're a mirage, and either they don't work at all (e.g. quickly sharing arbitrary objects between interpreters), or else end up being effectively a more complex, fragile version of things that already exist.

Berg concurred to a certain extent. He said that there is a need for a wider vision, beyond the PEP's smaller goals, to explain what the plans are for subinterpreters so that a fuller picture can be considered. Snow agreed that there was a need for better documentation, an informational PEP or other justification document, though that has not appeared as yet. Ultimately, the decision on the PEP rests with Antoine Pitrou, who is the delegate for the PEP. He is generally favorably inclined toward it:

Mostly, I hope that by making the subinterpreters functionality available to pure Python programmers (while it was formally an advanced and arcane part of the C API), we will spur of bunch of interesting third-party experimentations, including possibilities that we on python-dev have not thought about.

He had some concrete suggestions on things to improve in the API and suggested that the feature be added provisionally (effectively option two in Snow's original message). He also explicitly solicited more feedback. Mark Shannon reviewed the PEP and said that he was in favor of the idea, but that it did not make sense to add the module to the standard library without showing that it would be beneficial for parallelism:

My main objection is that without per-[subinterpreter] GILs (SILs?) PEP 554 provides no value over threading or multi-processing. Multi-processing provides true parallelism and threads provide shared memory concurrency.

If per-[subinterpreter] GILs are possible then, and only then, sub-interpreters will provide true parallelism and (limited) shared memory concurrency.

The problem is that we don't know whether we can implement per-[subinterpreter] GILs without too large a negative performance impact. I think we can, but we can't say so for certain.
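The two existing options that Shannon contrasts can be put side by side in a small sketch. Threads see the same objects, while a separate process only sees what is explicitly sent to it. To keep the example independent of the multiprocessing start method, it launches the child with subprocess and sys.executable as a stand-in for multiprocessing; the function names and the JSON-over-stdin protocol are assumptions of this example.

```python
import json
import subprocess
import sys
import threading

# Code run by the child process: it works on its own copy of the data.
CHILD_CODE = (
    "import json, sys\n"
    "data = json.load(sys.stdin)\n"
    "data.append(4)  # changes the child's copy only\n"
    "json.dump(data, sys.stdout)\n"
)


def threads_share_memory():
    """Three threads append to one list that all of them can see."""
    shared = []
    lock = threading.Lock()

    def work(tag):
        with lock:
            shared.append(tag)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared)


def process_is_isolated():
    """Send a list to a child process; the parent's list stays unchanged."""
    data = [1, 2, 3]
    done = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=json.dumps(data),
        capture_output=True,
        text=True,
        check=True,
    )
    return data, json.loads(done.stdout)
```

threads_share_memory() returns the tags written by all three threads, while process_is_isolated() returns the parent's untouched list alongside the modified copy that the child sent back.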

Snow disagreed, not surprisingly, but Shannon put together a table comparing different existing approaches to concurrency in Python with PEP 554 and an "ideal" communicating sequential processes (CSP) model. Go's concurrency model is roughly based around CSP; adding it to Python has also been tried along the way. Shannon said:

There are lot of question marks in the PEP 544 column. The PEP needs to address those.

As it stands, multiprocessing a better fit for CSP than PEP 554.

IMO, sub-interpreters only become a useful option for concurrency if they allow true parallelism and are not much more expensive than threads.

Snow sees concurrency as something of a side issue, but he is thinking of taking up the suggestion by Berg and others to more fully document the complete plan:

I really want to keep discussion focused on the proposed API in the PEP. Honestly I'm considering taking up the recommendation to add a new PEP about making subinterpreters official. I never meant for that to be more than a minor point for PEP 554.

There was plenty of other discussion, but Snow eventually deferred the PEP until the 3.10 time frame:

FYI, after consulting with the steering council I've decided to change the target release to 3.10, when we expect to have per-interpreter GIL landed. That will help maximize the impact of the module and avoid any confusion. I'm undecided on releasing a 3.9-only module on PyPI. If I do it will only be for folks to try it out early and I probably won't advertise it much.

It is an interesting feature and one that numerous core developers think could really help the performance of Python programs on multiple cores. But, without the GIL changes, it is difficult to know for sure whether it will be a substantial win. As Smith put it: "[...] the new concurrency model in PEP 554 has never actually been used, and it isn't even clear whether it's useful at all. Designing useful concurrency models is *stupidly* hard." We will have to wait to see if subinterpreters can clear that hurdle.
