Package namespacing for Python library collection

 3 years ago
source link: https://sourcediving.com/package-namespacing-for-python-library-collection-82088c8be400

A practical guide on how to manage a collection of code snippets as a single, easy-to-maintain library collection. I will utilise Python package namespacing, managed in a Git mono-repository.

Are you, your team or organisation suffering from constant copying of fragments of code among projects? Do they diverge quickly? Are you already finding an increasing need to reuse snippets of code in multiple places as your project/team/business grows? Then this article is for you. I will primarily focus on a mono-repo library collection but I will present other technologies as well. Hopefully, it will make choosing the most appropriate one for your situation easier.

A library collection (hereafter "library") is a number of code pieces, wrapped in packages (hereafter "sub-packages"), that are individually distributable and can be reused in multiple projects.

TL;DR: If you’re here just for an example, have a look at an example Git repository.

Typical candidates for sub-packaging:

  • Constants, schemas, company policies
  • Utility functions extending libraries
  • Development tools

Motivation

The ultimate drive behind “librarification” is lowering the maintenance cost, which is affected by several closely related properties.

Breaking changes

A breaking change occurs when you make a backward-incompatible change, such as removing or renaming a function, a function parameter or a package, or changing a function's behaviour. In Python, packages are versioned using PEP 440 and frequently comply with Semantic Versioning. It is advisable to use the same for your own library.

To keep the maintenance cost as low as possible, you want to reduce the number of breaking changes.

Dependencies

The more code your library accumulates, the more likely it is for breaking changes to occur. But not every breaking change affects the whole library. Let’s say your library looks like this:

company_utils
├── __init__.py
├── constants.py # only constants without any dependencies
├── logging_utils.py # depends only on the Python logging library
└── flask_utils.py # several utility functions used with the Flask web framework; depends on importing it

  • constants.py contains only constants without any dependencies.
  • logging_utils.py depends only on the Python logging library.
  • flask_utils.py contains several utility functions used with the Flask web framework and depends on importing it.

If you need to remove an obsolete constant from company_utils.constants, you need to bump the major number of your library version, such as 1.2.4 -> 2.0.0. This tells the user of the library: “hey, you need to check what has changed and modify your code”. However, there is no need to modify the code if you don't use company_utils.constants. Maybe you just use company_utils.logging_utils and the change is not breaking for you.

The example above illustrates how unnecessary breaking changes increase the maintenance cost.
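The version bump described above can be sketched in a few lines. This is only an illustration of the Semantic Versioning rule for plain MAJOR.MINOR.PATCH strings; the function name bump and the change labels are assumptions made for this example, not part of any library.

```python
def bump(version: str, change: str) -> str:
    """Return the next semantic version for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":  # backward-incompatible, e.g. removing a constant
        return f"{major + 1}.0.0"
    if change == "minor":  # backward-compatible new feature
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # backward-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


# Removing an obsolete constant forces a major bump for the whole library,
# even for users who only import logging_utils.
print(bump("1.2.4", "major"))  # 2.0.0
```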

Domain separation

Multiple repositories

Building on top of the previous section, it may seem trivial to just split the different domains into separate packages hosted in individual repositories. However, this increases the maintenance cost again.

Now you need to maintain all development tooling in multiple repositories. This may include CI configuration, linter settings, documentation, build scripts etc. The actual code can be as small as a single file. Therefore, the maintenance cost of keeping multiple repositories up to date will likely outweigh any benefit gained from the split.

Extras

If multiple packages in individual repositories are not the answer, what about everything in a single repository? Popular packaging and distribution tools, such as setuptools, Pipenv and Poetry, allow declaring “extras”: optional features with their own dependencies. You could treat your library as a package and the sub-packages as extras.

You would install such a library as:

pip install "company_utils[logging_utils,constants]==2.0.0"

The library no longer brings a number of unused dependencies. However, this approach still has many drawbacks:

  • The entire library uses a single version
  • Code is distributed even when not used
  • import company_utils.flask_utils will not show any errors in your IDE but will fail on execution because the flask dependency is not installed
  • Nothing prevents cross sub-package dependencies
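To make the trade-off concrete, here is a minimal sketch of how extras map to dependencies. The dictionary has the shape that setuptools accepts as setup(extras_require=...); the extras names follow the example above, while the dependency lists and the helper dependencies_for are assumptions for illustration.

```python
# Hypothetical mapping of extras to their dependencies, in the shape
# setuptools accepts as setup(extras_require=...).
EXTRAS_REQUIRE = {
    "constants": [],
    "logging_utils": [],
    "flask_utils": ["flask"],
}


def dependencies_for(extras):
    """Return the sorted set of dependencies pulled in by the selected extras."""
    selected = set()
    for extra in extras:
        selected.update(EXTRAS_REQUIRE[extra])
    return sorted(selected)


# pip install "company_utils[logging_utils,constants]" pulls in no Flask ...
print(dependencies_for(["logging_utils", "constants"]))  # []
# ... but flask_utils.py is still part of the installed package.
print(dependencies_for(["flask_utils"]))  # ['flask']
```

Only the dependency set changes with the selected extras; the code that is shipped and the single version number stay the same.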

Package namespacing

Another option is to use package namespacing, namely the native/implicit namespace packages as defined in PEP 420. Both setuptools and Poetry support package namespacing.

The documentation is vague on how namespacing helps and how to use it for multiple sub-packages. As it turns out, package namespacing is not designed to work in a single repository out of the box. Attempting to do so results in a mono-repo. The key missing information is that each namespaced package needs its own build script that must live outside of the package. This is tricky in a mono-repo because you cannot easily have multiple setup.py / pyproject.toml files in the same folder.
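The following sketch shows what implicit namespace packages give you at import time. It builds two throw-away directories, each standing in for one independently installed distribution, and imports both sub-packages under the shared company_utils name. The directory names dist_a and dist_b and the NAME attribute are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two independent "distributions", each contributing one sub-package to the
# same namespace. Neither of them contains company_utils/__init__.py.
root = Path(tempfile.mkdtemp())
for dist, sub_package in [("dist_a", "constants"), ("dist_b", "logging_utils")]:
    package_dir = root / dist / "company_utils" / sub_package
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text(f'NAME = "{sub_package}"\n')
    sys.path.insert(0, str(root / dist))

importlib.invalidate_caches()

import company_utils  # noqa: E402
from company_utils import constants, logging_utils  # noqa: E402

print(constants.NAME, logging_utils.NAME)
# The namespace spans both directories because no __init__.py claims it.
print(list(company_utils.__path__))
```

If either directory contained company_utils/__init__.py, that directory would become a regular package and hide the other sub-package, which is why the namespace folder must stay free of __init__.py.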

File structure examples

setuptools variant, alternative 1:

setup-constants.py      # Each setup-*.py must explicitly
setup-flask_utils.py    # include one sub-package
setup-logging_utils.py
company_utils/          # No __init__.py here.
├── constants/          # Sub-packages have __init__.py.
|   ├── __init__.py
|   └── constants.py
├── flask_utils/
|   ├── __init__.py
|   └── flask_utils.py
└── logging_utils/
    ├── __init__.py
    └── logging_utils.py

setuptools variant, alternative 2:

company_utils.constants/
├── setup.py # All setup.py differ only in the package name
└── src/
    └── company_utils/
        └── constants/
            ├── __init__.py
            └── constants.py
company_utils.flask_utils/
├── setup.py
└── src/
    └── company_utils/
        └── flask_utils/
            ├── __init__.py
            └── flask_utils.py
company_utils.logging_utils/
├── setup.py
└── src/
    └── company_utils/
        └── logging_utils/
            ├── __init__.py
            └── logging_utils.py

poetry variant:

company_utils.constants/
├── pyproject.toml
└── src/
    └── constants/
        ├── __init__.py
        └── constants.py
company_utils.flask_utils/
├── pyproject.toml
└── src/
    └── flask_utils/
        ├── __init__.py
        └── flask_utils.py
company_utils.logging_utils/
├── pyproject.toml
└── src/
    └── logging_utils/
        ├── __init__.py
        └── logging_utils.py

You can see that having multiple setup or pyproject files is ugly and increases maintenance cost by introducing duplication. A better solution is suggested in the next chapter.

Low maintenance namespacing solution

It’s time to tie together information from the previous chapters. We are aiming for a solution with constant maintenance cost, independent of the number of sub-packages. The resulting solution allows versioning and distribution of its sub-packages independently. Package namespacing provides an easy way to find them and import them.

To summarize the approach, we will replace duplication with iteration.

Build tools

For building the sub-packages, we will use setuptools as it offers higher flexibility. setup.py is just a Python script. We will parametrize it to get rid of the need for multiple files. Additionally, we will capture each sub-package's requirements in a requirements.txt file. We will also keep the version of each sub-package in __version__ of each src/<NAMESPACE>/<SUB-PACKAGE>/__init__.py file.

File structure:

setup.py
src/
└── company_utils/          # No __init__.py here.
    ├── constants/          # Sub-packages have __init__.py.
    |   ├── __init__.py
    |   ├── constants.py
    |   └── requirements.txt
    ├── flask_utils/
    |   ├── __init__.py
    |   ├── flask_utils.py
    |   └── requirements.txt
    └── logging_utils/
        ├── __init__.py
        ├── logging_utils.py
        └── requirements.txt

See example setup.py and requirements.txt files.
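For readers without access to the linked files, here is one way a parametrized setup.py could look. It is a sketch under assumptions, not the example repository's actual script: the SUBPACKAGE environment variable and the helper names are hypothetical, while setup and find_namespace_packages are real setuptools functions.

```python
import os
import re
from pathlib import Path

NAMESPACE = "company_utils"
VERSION_PATTERN = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def read_version(init_file: Path) -> str:
    """Extract the value of __version__ from a sub-package __init__.py."""
    match = VERSION_PATTERN.search(init_file.read_text())
    if match is None:
        raise RuntimeError(f"__version__ not found in {init_file}")
    return match.group(1)


def read_requirements(requirements_file: Path) -> list:
    """Return the non-empty, non-comment lines of a requirements.txt file."""
    lines = (line.strip() for line in requirements_file.read_text().splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def setup_kwargs(sub_package: str, src_root: Path = Path("src")) -> dict:
    """Collect the setup() arguments that differ between sub-packages."""
    package_dir = src_root / NAMESPACE / sub_package
    return {
        "name": f"{NAMESPACE}.{sub_package}",
        "version": read_version(package_dir / "__init__.py"),
        "install_requires": read_requirements(package_dir / "requirements.txt"),
    }


def main() -> None:
    """Build the one sub-package named by the SUBPACKAGE environment variable."""
    from setuptools import find_namespace_packages, setup

    sub_package = os.environ["SUBPACKAGE"]  # hypothetical parameter name
    setup(
        package_dir={"": "src"},
        packages=find_namespace_packages(
            where="src", include=[f"{NAMESPACE}.{sub_package}*"]
        ),
        **setup_kwargs(sub_package),
    )
```

In a real setup.py you would call main() at the bottom of the file. The version is read with a regular expression rather than by importing the sub-package, so the build does not need the sub-package's dependencies installed.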

We will also need something to build all packages as setup.py builds only one at a time. Personally, I like to automate tasks with PyInvoke. But any Python/Bash/other script will do as well. See an example build task.

With this setup, we can build all packages at once:

company_utils.constants-1.0.0-py3-none-any.whl
company_utils.flask_utils-1.0.0-py3-none-any.whl
company_utils.logging_utils-1.0.0-py3-none-any.whl

and push them to a package registry (PyPI, PackageCloud, etc.).
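The build-all step can be sketched in plain Python instead of PyInvoke. The sketch assumes that setup.py picks the sub-package from a SUBPACKAGE environment variable, which is a hypothetical convention for this example; discover_subpackages and build_all are likewise illustrative names.

```python
import os
import subprocess
import sys
from pathlib import Path


def discover_subpackages(src_root: Path, namespace: str = "company_utils") -> list:
    """Return names of directories under the namespace that contain __init__.py."""
    namespace_dir = src_root / namespace
    return sorted(
        path.name
        for path in namespace_dir.iterdir()
        if (path / "__init__.py").is_file()
    )


def build_all(src_root: Path = Path("src"), dry_run: bool = False) -> list:
    """Run setup.py once per sub-package and return the names that were processed."""
    names = discover_subpackages(src_root)
    for name in names:
        # SUBPACKAGE is a hypothetical variable that setup.py is assumed to read.
        env = {**os.environ, "SUBPACKAGE": name}
        command = [sys.executable, "setup.py", "bdist_wheel"]
        if dry_run:
            print(f"SUBPACKAGE={name}", " ".join(command))
        else:
            subprocess.run(command, env=env, check=True)
    return names
```

Calling build_all(dry_run=True) from the repository root prints the commands without running them. Adding a new sub-package requires no change to the build code, which is the "iteration instead of duplication" idea in practice.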

Local development

You may have noticed that having many requirements.txt files doesn't make local development developer-friendly. How are you going to install all those requirements to avoid import errors? And how will you keep them up to date?

Let us add another automation task to install all the src/<NAMESPACE>/<SUB-PACKAGE>/requirements.txt files. This task will install either all available dependencies or the dependencies of a selected sub-package. The task calls pipenv; I recommend using Pipenv or Poetry to manage your development dependencies and Python virtual environment.

What would this look like in practice? You would use pipenv sync -d or poetry install for dev dependencies, and pipenv run inv install_subpackage_dependencies or poetry run inv install_subpackage_dependencies for sub-package dependencies.
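The core of such an install task could look roughly like the sketch below. It only collects the requirements files and assembles a single pip command; the function names are assumptions for illustration, and the real install_subpackage_dependencies task may work differently.

```python
import sys
from pathlib import Path
from typing import Optional


def requirement_files(
    src_root: Path, namespace: str = "company_utils", name: Optional[str] = None
) -> list:
    """Return requirements.txt paths for one sub-package, or for all if name is None."""
    pattern = f"{name or '*'}/requirements.txt"
    return sorted((src_root / namespace).glob(pattern))


def pip_command(files: list) -> list:
    """Build one pip invocation that installs every given requirements file."""
    command = [sys.executable, "-m", "pip", "install"]
    for path in files:
        command += ["-r", str(path)]
    return command
```

To run it, pass the result to subprocess.run(pip_command(files), check=True) after checking that files is not empty. Using sys.executable keeps the installation inside whichever virtual environment is running the task.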

Continuous integration

Another problem you may have noticed is that installing all sub-package dependencies will prevent tests from discovering imports of dependencies from other sub-packages. For example, if you import flask in company_utils.constants, it will work locally but fail when the library is installed. Continuous integration (CI) comes to the rescue! The “cross-import” scenario should be rare. Therefore, you can leave it to fail in a CI pipeline instead and keep a lot of complexity out of the local development environment. CI will be the quality gate.

Hopefully, the CI solution of your choice allows parametrization of jobs (such as CircleCI Matrix Jobs). Each parameter in this case will be the name of a sub-package. Since you want to target specific sub-packages, it is also a good idea to split your tests into folders named by the sub-package. Then the pipeline could look like:

install pipenv
pipenv clean # run if you cache dependencies
pipenv install --dev --deploy # makes sure Pipfile.lock is up to date
pipenv run inv install_subpackage_dependencies --name ${sub_package}
# Only a single sub-package's dependencies are now present
# Run any tests you like

Note that there is a slight overhead in the matrix job on checking out the source code and figuring out if a pipeline for each library and Python version combination needs to run. If you have a large number of libraries, you could benefit from running all sub-packages in a loop of a single pipeline and just swapping dependencies with the install_subpackage_dependencies task. You will lose the isolation but gain speed in having only one setup step.
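The single-pipeline variant can be sketched as a list of commands generated per sub-package. The pipenv and inv commands are taken from the pipeline above; running the tests with pytest from a tests/<sub-package> folder is an assumption, as is the helper name pipeline_commands.

```python
def pipeline_commands(sub_packages):
    """Return, in order, the shell commands a single CI job would run."""
    commands = ["pipenv install --dev --deploy"]  # one shared setup step
    for name in sub_packages:
        commands += [
            # Remove packages not in Pipfile.lock, i.e. the previous
            # sub-package's dependencies.
            "pipenv clean",
            f"pipenv run inv install_subpackage_dependencies --name {name}",
            # pytest and the tests/<name> layout are assumptions.
            f"pipenv run pytest tests/{name}",
        ]
    return commands


for command in pipeline_commands(["constants", "flask_utils"]):
    print(command)
```

The pipenv clean step between iterations is what swaps the dependencies, so each sub-package's tests still run without the dependencies of the others.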

Versioning, semantic releases

As mentioned before, the version number is kept in __version__ of each src/<NAMESPACE>/<SUB-PACKAGE>/__init__.py file. If you are wondering how to keep a changelog or automate versioning with semantic releases, I will describe it in a future blog post. For now, you can have a look at these resources for inspiration:

When not to choose this approach?

This article suggests a middle ground between individual repositories that are difficult to maintain for tiny libraries and a single large library that contains a lot of unnecessary dependencies. If your library is large or in most cases requires all of its dependencies, I would suggest a traditional single library in a single repository approach. Adding new libraries is easy, so it may be tempting to treat a namespaced library collection as a Golden Hammer.

Summary

Maintaining a collection of libraries can save a lot of development time. However, due to the lack of direct support in commonly used build tools, it also has a small upfront cost of developing your own tasks around it. Hopefully, this article has helped you to see if the investment is worth the potential gains, or even to implement a similar solution on your own.

All the examples above have a working example in this Git repository.

