Python 3.11 Preview: TOML and tomllib

source link: https://realpython.com/python311-tomllib/

TOML and tomllib – Real Python

Python 3.11 Beta

A new version of Python is released in October each year. The code is developed and tested over a seventeen-month period before the release date. New features are implemented during the alpha phase. For Python 3.11, seven alpha releases were made between October 2021 and April 2022.

The first beta release of Python 3.11 happened in the early hours of May 8, 2022. Each such pre-release is coordinated by a release manager—currently Pablo Galindo Salgado—and ties together hundreds of commits from Python’s core developers and other volunteers.

This release also marked the feature freeze for the new version. In other words, no new features will be added to Python 3.11 that aren’t already present in Python 3.11.0b1. Instead, the time between the feature freeze and the planned release date of October 3, 2022, is used to test and solidify the code.

About once a month during the beta phase, Python’s core developers release a new beta version to continue showing off the new features, testing them, and getting early feedback. Currently, the latest beta version of Python 3.11 is 3.11.0b3, released on June 1, 2022.

Note: This tutorial uses the third beta version of Python 3.11. You might experience small differences if you use a later version. However, tomllib builds on a mature library, and you can expect that what you learn in this tutorial will stay the same through the beta phase and in the final release of Python 3.11.

If you’re maintaining your own Python package, then the beta phase is an important period when you should start testing your package with the new version. Together with the community, the core developers want to find and fix as many bugs as possible before the final release.

Cool New Features

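Some of the highlights of Python 3.11 include:

  • Exception groups, which allow programs to raise and handle multiple exceptions at the same time
  • Task groups, which improve how you run asynchronous code
  • Enhanced error messages that point to the exact location of an error in a traceback
  • Performance optimizations that make Python 3.11 faster than previous versions
  • Static typing features, like the Self type and variadic generics, that let you annotate your code more precisely
  • TOML support, which lets you parse TOML documents with the standard library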

There’s a lot to look forward to in Python 3.11! You can already read about the enhanced error messages and task and exception groups in earlier Python 3.11 preview articles.

In this tutorial, you’ll focus on how you can use the new tomllib library to read and parse TOML files. You’ll also get a short peek at some of the typing improvements that’ll be shipping with Python 3.11.

Installation

To play with the code examples in this tutorial, you’ll need to install a version of Python 3.11 onto your system. In this subsection, you’ll learn about a few different ways to do this: using Docker, using pyenv, or installing from source. Pick the one that works best for you and your system.

Note: Beta versions are previews of upcoming features. While most features will work well, you shouldn’t depend on any Python 3.11 beta version in production or anywhere else where potential bugs will have serious consequences.

If you have access to Docker on your system, then you can download the latest version of Python 3.11 by pulling and running the python:3.11-rc-slim Docker image:

$ docker pull python:3.11-rc-slim
3.11-rc-slim: Pulling from library/python
[...]
docker.io/library/python:3.11-rc-slim

$ docker run -it --rm python:3.11-rc-slim

This drops you into a Python 3.11 REPL. Check out Run Python Versions in Docker for more information about working with Python through Docker, including how to run scripts.

The pyenv tool is great for managing different versions of Python on your system, and you can use it to install Python 3.11 beta if you like. It comes in two versions: pyenv for Linux and macOS, and pyenv-win for Windows. The following steps use pyenv-win on Windows.

First, update your pyenv installation:

PS> pyenv update
:: [Info] ::  Mirror: https://www.python.org/ftp/python
[...]

Doing an update ensures that you can install the latest version of Python. You could also update pyenv manually.

Use pyenv install --list to check which versions of Python 3.11 are available. Then, install the latest one:

PS> pyenv install 3.11.0b3
[...]

The installation may take a few minutes. Once your new beta version is installed, then you can create a virtual environment where you can play with it:

PS> pyenv local 3.11.0b3
PS> python --version
Python 3.11.0b3

PS> python -m venv venv
PS> venv\Scripts\Activate.ps1

You use pyenv local to activate your Python 3.11 version, and then set up the virtual environment with python -m venv.

You can also install Python from one of the pre-release versions available on python.org. Choose the latest pre-release and scroll down to the Files section at the bottom of the page. Download and install the file corresponding to your system. See Python 3 Installation & Setup Guide for more information.

Most of the examples in this tutorial rely on new features, so you should run them with your Python 3.11 executable. Exactly how you run the executable depends on how you installed it. If you need help, then have a look at the relevant tutorial on Docker, pyenv, virtual environments, or installing from source.
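If you're unsure which interpreter a given python command starts, a quick check can help. The following is a small sketch. It relies only on sys.version_info and importlib.util.find_spec() from the standard library, and it reports whether tomllib can be imported:

```python
import importlib.util
import sys

# Format the running interpreter's version, for example "3.11.0"
version = ".".join(str(part) for part in sys.version_info[:3])

# tomllib ships with the standard library from Python 3.11 onward
has_tomllib = importlib.util.find_spec("tomllib") is not None

print(f"Python {version}, tomllib available: {has_tomllib}")
```

If this reports False, you're running an older interpreter and the tomllib examples below won't work as written.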

tomllib TOML Parser in Python 3.11

Python is a mature language. The first public version of Python was released in 1991, more than thirty years ago. A lot of Python’s distinct features, including explicit exception handling, the reliance on whitespace, and rich data structures like lists and dictionaries, were present even in the early days.

One feature lacking in the first versions of Python, though, was a convenient way to share community packages and modules. That’s not so surprising. In fact, Python was invented at about the same time as the World Wide Web. At the end of 1991, only twelve web servers existed worldwide, and none of them were dedicated to distributing Python code.

Over time, both Python and the Internet got more popular. Several initiatives aimed to allow sharing of Python code. These features evolved organically and led to Python’s somewhat chaotic relationship to packaging.

This has been addressed through several packaging PEPs (Python Enhancement Proposals) over the last couple of decades, and the situation has improved considerably for both library maintainers and end users.

One challenge was that building packages relied on executing a setup.py file, but there was no mechanism for knowing which dependencies that file relied on. This created a kind of chicken-and-egg problem where you’d need to run setup.py to discover how you can run setup.py.

In practice, pip—Python’s package manager—assumed that it should use Setuptools to build packages and that Setuptools is available on your computer. This made it harder to use alternative build systems like Flit and Poetry.

To resolve the situation, PEP 518 introduced the pyproject.toml configuration file, which specifies Python project build dependencies. PEP 518 was accepted in 2016. At the time, TOML was still a fairly new format and there was no built-in support for parsing TOML in Python or its standard library.

As the TOML format has matured and the use of the pyproject.toml file has settled in, Python 3.11 adds support for parsing TOML files. In this section, you’ll learn more about what the TOML format is, how you can use the new tomllib to parse TOML documents, and why tomllib doesn’t support writing TOML files.

Learn Basic TOML

Tom Preston-Werner first announced Tom’s Obvious, Minimal Language, commonly known as TOML, and released version 0.1.0 of its specification in 2013. From the beginning, the aim of TOML has been to provide a “minimal configuration file format that’s easy to read due to obvious semantics,” as the TOML specification puts it. The stable version 1.0.0 of the TOML specification was released in January 2021.

A TOML file is a UTF-8 encoded, case-sensitive text file. The main building blocks in TOML are key-value pairs, where the key is separated from the value by an equal sign (=):

version = 3.11

In this minimal TOML document, version is a key with the corresponding value 3.11. Values have types in TOML. 3.11 is interpreted as a floating-point number. Other basic types that you may take advantage of are strings, Booleans, integer numbers, and dates:

version = 3.11
release_manager = "Pablo Galindo Salgado"
is_beta = true
beta_release = 3
release_date = 2022-06-01

This example shows most of these types. The syntax is similar to Python’s syntax, except for having lowercase Booleans and a special date literal. In their basic form, TOML key-value pairs resemble Python variable assignments, so they should look familiar. For more details on these and other similarities, check out the TOML Documentation.

At its core, a TOML document is a collection of key-value pairs. You can add some structure to these pairs by wrapping them in arrays and tables. An array is a list of values, similar to a Python list. A table is a nested collection of key-value pairs, similar to a Python dict.

You use square brackets to wrap the elements of an array. A table is initiated by starting with a [key] line naming the table:

[python]
version = 3.11
release_manager = "Pablo Galindo Salgado"
is_beta = true
beta_release = 3
release_date = 2022-06-01
peps = [657, 654, 678, 680, 673, 675, 646, 659]

[toml]
version = 1.0
release_date = 2021-01-12

This TOML document can be represented as follows in Python:

{
    "python": {
        "version": 3.11,
        "release_manager": "Pablo Galindo Salgado",
        "is_beta": True,
        "beta_release": 3,
        "release_date": datetime.date(2022, 6, 1),
        "peps": [657, 654, 678, 680, 673, 675, 646, 659],
    },
    "toml": {
        "version": 1.0,
        "release_date": datetime.date(2021, 1, 12),
    },
}

The [python] table in TOML is represented in Python by a "python" key in the dictionary. That key points to a nested dictionary containing all the key-value pairs in the table. TOML tables can be arbitrarily nested, and a TOML document can contain several TOML tables.

This wraps up your short introduction to TOML syntax. Although TOML by design has a fairly minimal syntax, there are some details that you haven’t covered here. To dive deeper, check out the TOML specification.

In addition to its syntax, you should consider how you interpret values in a TOML file. TOML documents are usually used for configuration. Ultimately, some other application uses the information from a TOML document. That application therefore has some expectation about the content of the TOML file. The implication of this is that a TOML document can have two different kinds of errors:

  1. Syntax error: The TOML document isn’t valid TOML. The TOML parser usually catches this.
  2. Schema error: The TOML document is valid TOML, but its structure isn’t what the application expects. The application itself must handle this.

The TOML specification doesn’t currently include a schema language that can be used to validate the structure of TOML documents, although several proposals exist. Such a schema would check that a given TOML document includes the correct tables, keys, and value types for a given use case.

As an example of an informal schema, PEP 517 and PEP 518 say that a pyproject.toml file should define the build-system table, which must include the keys requires and build-backend. Furthermore, the value of requires must be an array of strings, while the value of build-backend must be a string. The following is an example of a TOML document fulfilling this schema:

# pyproject.toml

[build-system]
requires = ["setuptools>=61.0.0", "wheel"]
build-backend = "setuptools.build_meta"

This example follows the requirements of PEP 517 and PEP 518. A TOML parser only checks that the document is valid TOML, though. Validating the structure of the build-system table is typically done by the build front-end.

Note: If you want to learn more about building your own packages in Python, check out How to Publish an Open-Source Python Package to PyPI.

You can check this validation yourself. Create the following erroneous pyproject.toml file:

# pyproject.toml

[build-system]
requires = "setuptools>=61.0.0"
backend = "setuptools.build_meta"

This is valid TOML, so the file can be read by any TOML parser. However, it’s not a valid build-system table according to the requirements in the PEPs. To confirm this, install build, which is a PEP 517 compliant build front-end, and perform a build based on your pyproject.toml file:

(venv) $ python -m pip install build
(venv) $ python -m build
ERROR Failed to validate `build-system` in pyproject.toml:
      `requires` must be an array of strings

The error message points out that requires must be an array of strings, as specified in PEP 518. Play with other versions of your pyproject.toml file and note which other validations build does for you. You may need to implement similar validations in your own applications.
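A validation like this can be sketched in a few lines of Python. The function below works on the dictionary that a TOML parser returns. It's a simplified, hypothetical check of the informal schema described above, not the code that build or pip actually uses:

```python
def validate_build_system(pyproject):
    """Return a list of problems with the build-system table, empty if none.

    A simplified sketch of the informal PEP 517/518 schema described above,
    not the actual validation code used by build or pip.
    """
    build_system = pyproject.get("build-system")
    if not isinstance(build_system, dict):
        return ["`build-system` must be a table"]

    problems = []
    requires = build_system.get("requires")
    if not (
        isinstance(requires, list)
        and all(isinstance(item, str) for item in requires)
    ):
        problems.append("`requires` must be an array of strings")
    if not isinstance(build_system.get("build-backend"), str):
        problems.append("`build-backend` must be a string")
    return problems

# The dictionary a TOML parser returns for the erroneous file above
erroneous = {
    "build-system": {
        "requires": "setuptools>=61.0.0",
        "backend": "setuptools.build_meta",
    }
}
for problem in validate_build_system(erroneous):
    print(problem)
```

Unlike build, this sketch collects every problem instead of stopping at the first one. For the erroneous file, it reports both the string-valued requires and the missing build-backend key.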

So far, you’ve seen a few examples of TOML documents, but you haven’t explored how you can use them in your own projects. In the next subsection, you’ll learn how you can use the new tomllib module in the standard library to read and parse TOML files in Python 3.11.

Read TOML With tomllib

Python 3.11 comes with a new module in the standard library named tomllib. You can use tomllib to read and parse any TOML v1.0 compliant document. In this subsection, you’ll learn how you can load TOML directly from files and from strings that contain TOML documents.

PEP 680 describes tomllib and some of the process that led to TOML support being added to the standard library. Two deciding factors for the inclusion of tomllib in Python 3.11 were the central role that pyproject.toml plays in the Python packaging ecosystem and the TOML specification’s reaching version 1.0 in early 2021.

The implementation of tomllib is more or less lifted straight from tomli by Taneli Hukkinen, who’s also one of the co-authors of PEP 680.

The tomllib module is quite simple. Apart from the TOMLDecodeError exception that it raises for invalid TOML, it contains only two functions:

  1. load() reads TOML documents from files.
  2. loads() reads TOML documents from strings.

You’ll first see how you can use tomllib to read the following pyproject.toml file, which is a simplified version of the same file in the tomli project:

# pyproject.toml

[build-system]
requires = ["flit_core>=3.2.0,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "tomli"
version = "2.0.1"  # DO NOT EDIT THIS LINE MANUALLY. LET bump2version DO IT
description = "A lil' TOML parser"
requires-python = ">=3.7"
readme = "README.md"
keywords = ["toml"]

    [project.urls]
    "Homepage" = "https://github.com/hukkin/tomli"
    "PyPI" = "https://pypi.org/project/tomli"

Copy this document and save it in a file named pyproject.toml on your local file system. You can now start a REPL session in order to explore Python 3.11’s TOML support:

>>> import tomllib
>>> with open("pyproject.toml", mode="rb") as fp:
...     tomllib.load(fp)
...
{'build-system': {'requires': ['flit_core>=3.2.0,<4'],
                  'build-backend': 'flit_core.buildapi'},
 'project': {'name': 'tomli',
             'version': '2.0.1',
             'description': "A lil' TOML parser",
             'requires-python': '>=3.7',
             'readme': 'README.md',
             'keywords': ['toml'],
             'urls': {'Homepage': 'https://github.com/hukkin/tomli',
                      'PyPI': 'https://pypi.org/project/tomli'}}}

You use load() to read and parse a TOML file by passing a file object to the function. Note that the file object must be a binary stream. One way to ensure this is to use open() with mode="rb", where the b indicates binary mode.

Note: According to PEP 680, the file must be opened in binary mode so that tomllib can ensure that the UTF-8 encoding is handled correctly on all systems.

Compare the original TOML document with the resulting Python data structure. The document is represented by a Python dictionary where all the keys are strings, and different tables in TOML are represented as nested dictionaries. Observe that the comment about version in the original file is ignored and not part of the result.

You can use loads() to load a TOML document that’s already represented in a string. The following example parses the example from the previous subsection:

>>> import tomllib
>>> document = """
... [python]
... version = 3.11
... release_manager = "Pablo Galindo Salgado"
... is_beta = true
... beta_release = 3
... release_date = 2022-06-01
... peps = [657, 654, 678, 680, 673, 675, 646, 659]
...
... [toml]
... version = 1.0
... release_date = 2021-01-12
... """

>>> tomllib.loads(document)
{'python': {'version': 3.11,
            'release_manager': 'Pablo Galindo Salgado',
            'is_beta': True,
            'beta_release': 3,
            'release_date': datetime.date(2022, 6, 1),
            'peps': [657, 654, 678, 680, 673, 675, 646, 659]},
 'toml': {'version': 1.0,
          'release_date': datetime.date(2021, 1, 12)}}

Similarly to load(), loads() returns a dictionary. In general, the representation is based on basic Python types: str, float, int, bool, as well as dictionaries, lists, and datetime objects. The tomllib documentation includes a conversion table that shows how TOML types are represented in Python.

If you prefer, then you can use loads() to read TOML from files by combining it with pathlib:

>>> import pathlib
>>> import tomllib

>>> path = pathlib.Path("pyproject.toml")
>>> with path.open(mode="rb") as fp:
...     from_load = tomllib.load(fp)
...
>>> from_loads = tomllib.loads(path.read_text(encoding="utf-8"))

>>> from_load == from_loads
True

In this example, you load pyproject.toml using both load() and loads(). You then confirm that the Python representation is the same regardless of how you load the file.

Both load() and loads() accept one optional parameter: parse_float. This allows you to take control over how floating-point numbers are parsed and represented in Python. By default, they’re parsed and stored as float objects, which in most Python implementations are 64-bit with about 16 decimal digits of precision.

One alternative, if you need to work with more precise numbers, is to use decimal.Decimal instead:

>>> import tomllib
>>> from decimal import Decimal
>>> document = """
... small = 0.12345678901234567890
... large = 9999.12345678901234567890
... """

>>> tomllib.loads(document)
{'small': 0.12345678901234568,
 'large': 9999.123456789011}

>>> tomllib.loads(document, parse_float=Decimal)
{'small': Decimal('0.12345678901234567890'),
 'large': Decimal('9999.12345678901234567890')}

Here you load a TOML document with two key-value pairs. By default, you lose a bit of precision when using load() or loads(). By using the Decimal class, you keep the precision in your input.

As noted, the tomllib module is adapted from the popular tomli module. If you want to use TOML and tomllib on codebases that need to support older versions of Python, then you can fall back on tomli. To do so, add the following line in your requirements file:

tomli >= 1.1.0 ; python_version < "3.11"

This will install tomli when used on Python versions before 3.11. In your source code, you can then use tomllib or tomli as appropriate with the following import:

try:
    import tomllib
except ModuleNotFoundError:
    import tomli as tomllib

This code will import tomllib on Python 3.11 and later. If tomllib isn’t available, then tomli is imported and aliased to the tomllib name.

You’ve seen how to use tomllib to read TOML documents. You may wonder how you can write TOML files. It turns out that you can’t write TOML with tomllib. Read on to learn why, and to see some of the alternatives.

Write TOML

Similar existing libraries like json and pickle include both load() and dump() functions, where the latter is used to write data. The dump() function, as well as the corresponding dumps(), is deliberately left out of tomllib.

According to PEP 680 and the discussion around it, this has been done for a handful of reasons:

  • The main motivation for including tomllib in the standard library is to be able to read TOML files used in the ecosystem.

  • The TOML format is designed to be a human-friendly configuration format, so many TOML files are written manually.

  • The TOML format isn’t designed to be a data serialization format like JSON or pickle, so being fully consistent with the json and pickle APIs isn’t necessary.

  • TOML documents may contain comments and formatting that should be preserved when written to file. This isn’t compatible with representing TOML as basic Python types.

  • There are different opinions about how to lay out and format TOML files.

  • None of the core developers expressed interest in maintaining a write API for tomllib.

Once something is added to the standard library, it becomes hard to change or remove because someone’s relying on it. This is a good thing, as it means that Python stays mostly backward compatible: few Python programs that run on Python 3.10 will stop working on Python 3.11.

Another consequence is that the core team is conservative about adding new features. Support for writing TOML documents can be added later if it becomes clear that there’s a real demand for it.

This doesn’t leave you empty-handed, though. There are several third-party TOML writers available. The tomllib documentation mentions two packages:

  • tomli-w is, as the name implies, a sibling of tomli that can write TOML documents. It’s a simple module without many options to control the output.
  • tomlkit is a powerful package for working with TOML documents, and it supports both reading and writing. It preserves comments, indentation, and other whitespace. TOML Kit is developed for and used by Poetry.

Depending on your use case, one of those packages will probably fulfill your TOML writing needs.

If you don’t want to add an external dependency just to write a TOML file, then you can also try to roll your own writer. The following code shows an incomplete TOML writer. It doesn’t support all the features of TOML v1.0, but it supports enough to write the pyproject.toml example that you saw earlier:

# tomllib_w.py

from datetime import date

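def dumps(toml_dict, table=""):
    """Convert a dictionary to a TOML document string."""
    document = []
    tables = []
    for key, value in toml_dict.items():
        match value:
            case dict():
                # Write tables after plain key-value pairs: a pair that
                # follows a [table] header would otherwise land inside it
                table_key = f"{table}.{key}" if table else key
                tables.append(
                    f"\n[{table_key}]\n{dumps(value, table=table_key)}"
                )
            case _:
                document.append(f"{key} = {_dumps_value(value)}")
    return "\n".join(document + tables)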

def _dumps_value(value):
    match value:
        case bool():
            return "true" if value else "false"
        case float() | int():
            return str(value)
        case str():
            return f'"{value}"'
        case date():
            return value.isoformat()
        case list():
            return f"[{', '.join(_dumps_value(v) for v in value)}]"
        case _:
            raise TypeError(
                f"{type(value).__name__} {value!r} is not supported"
            )

The dumps() function accepts a dictionary representing a TOML document. It converts the dictionary to a string by looping over the key-value pairs in the dictionary. You’ll have a closer look at the details soon. First, you should check that the code works. Open a REPL and import dumps():

>>> from tomllib_w import dumps
>>> print(dumps({"version": 3.11, "module": "tomllib_w", "stdlib": False}))
version = 3.11
module = "tomllib_w"
stdlib = false

You write a simple dictionary with different types of values. They’re correctly written as TOML types: numbers are plain, strings are surrounded by double quotes, and Booleans are lowercase.

Look back at the code. Most of the serialization to TOML types happens in the helper function, _dumps_value(). It uses structural pattern matching to construct different kinds of TOML strings based on the type of value.

The main dumps() function works with dictionaries. It loops over each key-value pair. If the value is another dictionary, then it constructs a TOML table by adding a table header and then calling itself recursively to handle the key-value pairs inside of the table. If the value isn’t a dictionary, then _dumps_value() is used to correctly convert the key-value pair to TOML.

As noted, this writer doesn’t support the full TOML specification. For example, it doesn’t support all date and time types that are available in TOML, or nested structures like inline or array tables. There are also some edge cases in string handling that aren’t supported. However, it’s enough for many applications.

You can, for example, try to load and then dump the pyproject.toml file that you worked with earlier:

>>> import tomllib
>>> from tomllib_w import dumps
>>> with open("pyproject.toml", mode="rb") as fp:
...     pyproject = tomllib.load(fp)
...
>>> print(dumps(pyproject))

[build-system]
requires = ["flit_core>=3.2.0,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "tomli"
version = "2.0.1"
description = "A lil' TOML parser"
requires-python = ">=3.7"
readme = "README.md"
keywords = ["toml"]

[project.urls]
Homepage = "https://github.com/hukkin/tomli"
PyPI = "https://pypi.org/project/tomli"

Here, you first read pyproject.toml with tomllib. Then you use your own tomllib_w module to write the TOML document back to the console.

You may expand on tomllib_w if you need better support for writing TOML documents. However, in most cases you should rely on one of the existing packages, like tomli-w or tomlkit, instead.

While you’re not getting support for writing TOML files in Python 3.11, the included TOML parser will be useful for many projects. Going forward, you can use TOML for your configuration files, knowing that you’ll have first-class support for reading them in Python.

