Package namespacing for Python library collection
source link: https://sourcediving.com/package-namespacing-for-python-library-collection-82088c8be400
A practical guide on how to manage a collection of code snippets as a single, easy to maintain library collection. I will utilise Python package namespacing, managed in a Git mono-repository.
Are you, your team or organisation suffering from constant copying of fragments of code among projects? Do they diverge quickly? Are you already finding an increasing need to reuse snippets of code in multiple places as your project/team/business grows? Then this article is for you. I will primarily focus on a mono-repo library collection but I will present other technologies as well. Hopefully, it will make choosing the most appropriate one for your situation easier.
A library collection (hereafter "library") is a set of code pieces wrapped in packages (hereafter "sub-packages") that are distributable individually and can be reused in multiple projects.
TL;DR: If you’re here just for an example, have a look at an example Git repository.
Typical candidates for sub-packaging:
- Constants, schemas, company policies
- Utility functions extending libraries
- Development tools
Motivation
The ultimate drive behind “librarification” is lowering the maintenance cost, which is affected by several closely related properties.
Breaking changes
A breaking change occurs when you make a backward-incompatible change, such as removing or renaming a function, a function parameter or a package, or changing a function's behaviour. In Python, packages are versioned using PEP 440 and frequently comply with Semantic Versioning. It is advisable to use the same for your own library.
To keep the maintenance cost as low as possible, you want to reduce the number of breaking changes.
Dependencies
The more code your library accumulates, the more likely it is for breaking changes to occur. But not every breaking change affects the whole library. Let’s say your library looks like this:
company_utils
├── __init__.py
├── constants.py      # only constants without any dependencies
├── logging_utils.py  # depends only on the Python logging library
└── flask_utils.py    # several utility functions used with the Flask
                      # web framework and depends on importing it
- constants.py contains only constants without any dependencies.
- logging_utils.py depends only on the Python logging library.
- flask_utils.py contains several utility functions used with the Flask web framework and depends on importing it.
If you need to remove an obsolete constant from company_utils.constants, you need to bump the major number of your library version, such as 1.2.4 -> 2.0.0. This will notify the user of the library: “hey, you need to check what has changed and modify your code”. However, there is no need to modify the code if you don't use company_utils.constants. Maybe you just use company_utils.logging_utils and the change is not breaking for you.
In the example above I tried to illustrate how unnecessary breaking changes increase the maintenance cost.
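As a minimal sketch of the signal a consumer receives, the snippet below compares two version strings and reports whether the upgrade must be treated as breaking. The helper names parse_version and is_breaking_upgrade are made up for illustration, and only plain MAJOR.MINOR.PATCH strings are handled.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_breaking_upgrade(current: str, candidate: str) -> bool:
    """Under Semantic Versioning, a higher major number signals a breaking change."""
    return parse_version(candidate)[0] > parse_version(current)[0]


# Removing a constant forces 1.2.4 -> 2.0.0 for the whole library,
# so every consumer sees a breaking upgrade, even one that only uses logging_utils.
print(is_breaking_upgrade("1.2.4", "2.0.0"))  # True
print(is_breaking_upgrade("1.2.4", "1.3.0"))  # False
```

The check cannot tell which part of the library changed, which is exactly why a single version for unrelated code causes unnecessary review work.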
Domain separation
Multiple repositories
Building on top of the previous section, it may seem trivial to just split the different domains into separate packages hosted in individual repositories. However, this increases the maintenance cost again.
Now you need to maintain all development tooling in multiple repositories. This may include CI configuration, linter settings, documentation, build scripts etc. The actual code can be as small as a single file. Therefore, the maintenance cost on keeping multiple repositories up to date will likely outweigh any benefit gained from the split.
Extras
If multiple packages in individual repositories are not the answer, what about everything in a single repository? Popular packaging and distribution tools, such as setuptools, Pipenv and Poetry, allow declaring “extras”: optional features with their own dependencies. You could treat your library as a package and the sub-packages as extras.
You would install such a library as:
pip install "company_utils[logging_utils,constants]==2.0.0"
The library no longer brings a number of unused dependencies. However, this approach still has several drawbacks:
- The entire library uses a single version
- Code is distributed even when not used
- import company_utils.flask_utils will not show any errors in your IDE but will fail on execution because the flask dependency is not installed
- Nothing prevents cross sub-package dependencies
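For illustration, the extras approach could be declared roughly as follows with setuptools. This is a sketch under assumptions: the dependency lists and the dependencies_for helper are invented for the example, and the setup() call is left as a comment so the snippet can run on its own.

```python
# Hypothetical metadata for the single-package layout that uses extras.
# The dependency lists are assumptions made for illustration.
EXTRAS_REQUIRE = {
    "constants": [],
    "logging_utils": [],
    "flask_utils": ["flask"],
}

SETUP_KWARGS = dict(
    name="company_utils",
    version="2.0.0",  # one version for the entire library
    packages=["company_utils"],  # all code ships, whatever extras are chosen
    extras_require=EXTRAS_REQUIRE,
)


def dependencies_for(selected_extras):
    """List the third-party requirements pulled in by the chosen extras."""
    return sorted({dep for extra in selected_extras for dep in EXTRAS_REQUIRE[extra]})


print(dependencies_for(["logging_utils", "constants"]))  # []
print(dependencies_for(["flask_utils"]))  # ['flask']

# In a real setup.py this would end with:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

Note how the sketch mirrors the drawbacks listed above: the version is shared, and flask_utils.py is packaged even when the flask extra is not requested.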
Package namespacing
Another option is to use package namespacing, namely the native/implicit namespace packages as defined in PEP 420. Both setuptools and Poetry support package namespacing.
The documentation is vague on how namespacing helps and how to use it for multiple sub-packages. As it turns out, package namespacing is not designed to work in a single repository out of the box. Attempting to do so results in a mono-repo. The key missing information is that each namespaced package needs its own build script that must live outside of the package. This is tricky in a mono-repo because you cannot easily have multiple setup.py / pyproject.toml files in the same folder.
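To make the mechanism concrete, here is a small self-contained demonstration of implicit namespace packages. It creates two throwaway "distributions" in a temporary directory, each contributing one sub-package to company_utils, and imports both. The make_distribution helper and the NAME attribute are made up for this demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, sub_package: str) -> Path:
    """Create <root>/<sub_package>_dist/company_utils/<sub_package>/__init__.py."""
    dist_root = root / f"{sub_package}_dist"
    package_dir = dist_root / "company_utils" / sub_package
    package_dir.mkdir(parents=True)
    # Only the sub-package gets an __init__.py; company_utils/ itself has none.
    (package_dir / "__init__.py").write_text(f"NAME = {sub_package!r}\n")
    return dist_root


with tempfile.TemporaryDirectory() as tmp:
    # Each distribution lives in its own directory on sys.path,
    # much like separately installed wheels would.
    for sub_package in ("constants", "logging_utils"):
        sys.path.insert(0, str(make_distribution(Path(tmp), sub_package)))
    importlib.invalidate_caches()

    constants = importlib.import_module("company_utils.constants")
    logging_utils = importlib.import_module("company_utils.logging_utils")
    print(constants.NAME, logging_utils.NAME)  # constants logging_utils
```

Because company_utils/ has no __init__.py in either location, Python merges both directories into one namespace, so independently built sub-packages can share the company_utils prefix.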
File structure examples
setuptools variant, alternative 1:
setup-constants.py      # Each setup-*.py must explicitly
setup-flask_utils.py    # include one sub-package
setup-logging_utils.py
company_utils/          # No __init__.py here.
├── constants/          # Sub-packages have __init__.py.
│   ├── __init__.py
│   └── constants.py
├── flask_utils/
│   ├── __init__.py
│   └── flask_utils.py
└── logging_utils/
    ├── __init__.py
    └── logging_utils.py
setuptools variant, alternative 2:
company_utils.constants/
├── setup.py            # All setup.py differ only in the package name
└── src/
    └── company_utils/
        └── constants/
            ├── __init__.py
            └── constants.py
company_utils.flask_utils/
├── setup.py
└── src/
    └── company_utils/
        └── flask_utils/
            ├── __init__.py
            └── flask_utils.py
company_utils.logging_utils/
├── setup.py
└── src/
    └── company_utils/
        └── logging_utils/
            ├── __init__.py
            └── logging_utils.py
poetry variant:
company_utils.constants/
├── pyproject.toml
└── src/
    └── constants/
        ├── __init__.py
        └── constants.py
company_utils.flask_utils/
├── pyproject.toml
└── src/
    └── flask_utils/
        ├── __init__.py
        └── flask_utils.py
company_utils.logging_utils/
├── pyproject.toml
└── src/
    └── logging_utils/
        ├── __init__.py
        └── logging_utils.py
You can see that having multiple setup or pyproject files is ugly and increases maintenance cost by introducing duplication. A better solution is suggested in the next chapter.
Low maintenance namespacing solution
It’s time to tie together information from the previous chapters. We are aiming for a solution with constant maintenance cost, independent of the number of sub-packages. The resulting solution allows versioning and distribution of its sub-packages independently. Package namespacing provides an easy way to find them and import them.
To summarize the approach, we will replace duplication with iteration.
Build tools
For building the sub-packages, we will use setuptools as it offers higher flexibility: setup.py is just a Python script. We will parametrize it to get rid of the need for multiple files. Additionally, we will capture each sub-package's requirements in a requirements.txt file. We will also keep the version of each sub-package in __version__ of each src/<NAMESPACE>/<SUB-PACKAGE>/__init__.py file.
File structure:
setup.py
src/
└── company_utils/          # No __init__.py here.
    ├── constants/          # Sub-packages have __init__.py.
    │   ├── __init__.py
    │   ├── constants.py
    │   └── requirements.txt
    ├── flask_utils/
    │   ├── __init__.py
    │   ├── flask_utils.py
    │   └── requirements.txt
    └── logging_utils/
        ├── __init__.py
        ├── logging_utils.py
        └── requirements.txt
See example setup.py and requirements.txt files.
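The linked example is not reproduced here, but a parametrized setup.py could look roughly like the sketch below. It is an illustration under assumptions, not the author's script: the SUB_PACKAGE environment variable and the helper names read_version, read_requirements and setup_kwargs are hypothetical.

```python
import re
from pathlib import Path
from typing import List

NAMESPACE = "company_utils"


def read_version(package_dir: Path) -> str:
    """Extract __version__ from the sub-package's __init__.py without importing it."""
    text = (package_dir / "__init__.py").read_text()
    match = re.search(r"""^__version__\s*=\s*["']([^"']+)["']""", text, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"No __version__ found in {package_dir}")
    return match.group(1)


def read_requirements(package_dir: Path) -> List[str]:
    """Return the non-empty, non-comment lines of the sub-package's requirements.txt."""
    lines = (package_dir / "requirements.txt").read_text().splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


def setup_kwargs(sub_package: str, src_dir: Path = Path("src")) -> dict:
    """Build the setuptools.setup() arguments for exactly one sub-package."""
    package_dir = src_dir / NAMESPACE / sub_package
    return dict(
        name=f"{NAMESPACE}.{sub_package}",
        version=read_version(package_dir),
        package_dir={"": str(src_dir)},
        packages=[f"{NAMESPACE}.{sub_package}"],  # only this sub-package is included
        install_requires=read_requirements(package_dir),
    )


# In a real setup.py (SUB_PACKAGE is a hypothetical environment variable):
#     import os
#     from setuptools import setup
#     setup(**setup_kwargs(os.environ["SUB_PACKAGE"]))
```

Reading the version with a regular expression rather than importing the package avoids needing the sub-package's dependencies at build time.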
We will also need something to build all packages, as setup.py builds only one at a time. Personally, I like to automate tasks with PyInvoke, but any Python/Bash/other script will do as well. See an example build task.
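As a plain-Python alternative to the linked task, the loop could be sketched as follows. It assumes that setup.py selects the sub-package from a hypothetical SUB_PACKAGE environment variable and that wheels are built with the bdist_wheel command; the function names are made up for the example.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List


def discover_sub_packages(namespace_dir: Path) -> List[str]:
    """Every directory with an __init__.py under the namespace is a sub-package."""
    return sorted(
        path.name for path in namespace_dir.iterdir() if (path / "__init__.py").is_file()
    )


def build_all(namespace_dir: Path = Path("src/company_utils"), run=subprocess.run) -> List[str]:
    """Run setup.py once per sub-package and return the names that were built."""
    built = []
    for name in discover_sub_packages(namespace_dir):
        # SUB_PACKAGE is an assumed convention for telling setup.py what to build.
        env = {**os.environ, "SUB_PACKAGE": name}
        run([sys.executable, "setup.py", "bdist_wheel"], env=env, check=True)
        built.append(name)
    return built
```

Calling build_all() from the repository root would build one wheel per sub-package. Adding a new sub-package requires no change to the script, which is the "iteration instead of duplication" idea. It may be worth clearing the build/ directory between iterations so that files from one sub-package do not end up in the next wheel.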
With this setup, we can build all packages at once:
company_utils.constants-1.0.0-py3-none-any.whl
company_utils.flask_utils-1.0.0-py3-none-any.whl
company_utils.logging_utils-1.0.0-py3-none-any.whl
and push them to a package registry (PyPI, PackageCloud, etc.).
Local development
You may have noticed that having many requirements.txt files doesn't make local development developer-friendly. How are you going to install all those requirements to avoid import errors? And how will you keep them up to date?
Let us add another automation task to install all the src/<NAMESPACE>/<SUB-PACKAGE>/requirements.txt files. This task will install either all available dependencies or the dependencies of a selected sub-package. You can also see that pipenv is being called. I recommend using Pipenv or Poetry to manage your development dependencies and Python virtual environment.
What would this look like in practice? You would use pipenv sync -d or poetry install for dev dependencies and pipenv run inv install_subpackage_dependencies or poetry run inv install_subpackage_dependencies for sub-package dependencies.
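The core of such a task might look like the sketch below. It is written as a plain function rather than a PyInvoke task, and it assumes the requirements are installed by calling pip inside the Pipenv environment; the real task in the example repository may differ.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def requirement_files(namespace_dir: Path, name: Optional[str] = None) -> List[Path]:
    """Find requirements.txt of one sub-package, or of all of them when name is None."""
    pattern = f"{name}/requirements.txt" if name else "*/requirements.txt"
    return sorted(namespace_dir.glob(pattern))


def install_subpackage_dependencies(
    name: Optional[str] = None,
    namespace_dir: Path = Path("src/company_utils"),
    run=subprocess.run,
) -> List[Path]:
    """Install the selected requirements into the Pipenv-managed environment."""
    files = requirement_files(namespace_dir, name)
    for requirements in files:
        # Assumption: pip is invoked through "pipenv run" so it targets the virtualenv.
        run(["pipenv", "run", "pip", "install", "-r", str(requirements)], check=True)
    return files
```

Passing name="flask_utils" would install only that sub-package's requirements, while the default installs everything, which is the convenient mode for local development.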
Continuous integration
Another problem you may have noticed is that installing all sub-package dependencies prevents tests from detecting imports of dependencies that belong to other sub-packages. For example, if you import flask in company_utils.constants, it will work locally but fail once the library is installed on its own. Continuous integration (CI) comes to the rescue! The “cross-import” scenario should be rare. Therefore, you can leave it to fail in a CI pipeline instead and keep a lot of complexity out of the local development environment. CI will be the quality gate.
Hopefully, the CI solution of your choice allows parametrization of jobs (such as CircleCI Matrix Jobs). Each parameter in this case will be the name of a sub-package. Since you want to target specific sub-packages, it is also a good idea to split your tests into folders named after the sub-packages. Then the pipeline could look like:
install pipenv
pipenv clean # run if you cache dependencies
pipenv install --dev --deploy # makes sure Pipfile.lock is up to date
pipenv run inv install_subpackage_dependencies --name ${sub_package}
# Only a single sub-package's dependencies are now present
# Run any tests you like
Note that there is a slight overhead in the matrix job on checking out the source code and figuring out if a pipeline for each library and Python version combination needs to run. If you have a large number of libraries, you could benefit from running all sub-packages in a loop of a single pipeline and just swapping dependencies with the install_subpackage_dependencies task. You will lose the isolation but gain speed in having only one setup step.
Versioning, semantic releases
As mentioned before, the version number is kept in __version__ of each src/<NAMESPACE>/<SUB-PACKAGE>/__init__.py file. If you are wondering how to keep a changelog or automate versioning with semantic releases, I will describe it in a future blog post. For now, you can have a look at these resources for inspiration:
- Lerna: A JavaScript tool for managing projects with multiple packages
- This example change log of a mono-repo project using Lerna
When not to choose this approach?
This article suggests a middle ground between individual repositories that are difficult to maintain for tiny libraries and a single large library that contains a lot of unnecessary dependencies. If your library is large or in most cases requires all of its dependencies, I would suggest a traditional single library in a single repository approach. Adding new libraries is easy, so it may be tempting to treat a namespaced library collection as a Golden Hammer.
Summary
Maintaining a collection of libraries can save a lot of development time. However, because commonly used build tools lack direct support for it, it also has a small upfront cost of developing your own tasks around it. Hopefully, this article has helped you to see whether the investment is worth the potential gains, or even to implement a similar solution on your own.
All the examples above have a working example in this Git repository.