
A Standard and Complete CI/CD Pipeline for Python Projects

Learn to create robust CI/CD pipelines for your Python projects

Have you ever spent ages tinkering with CI/CD tools rather than working on the code for your Python project? I sure have! There were times Poetry couldn't install dependencies due to virtual environment issues. Other times, the dependencies just wouldn't cache for some other reason.

On top of that, some CI/CD tools are difficult to debug due to obscure error messages. Hence, I'm sharing the GitHub Actions workflow which I use with most of my Python projects. It works right out-of-the-box without any tinkering & sets you on the right path to publishing your project. The workflow is minimal yet doesn't compromise on the major CI/CD principles required for maintaining optimal coding standards. Keeping it minimal also means you're free to build upon it for further changes & improvements.

That said, here's what you get with this workflow, out-of-the-box without any changes:

  • Linting & code formatting with pylint, Black & isort on all PRs & pushes to the remote repository.
  • Running integrated test suites for catching any breaking changes before merging the PR.
  • Caching dependencies for faster workflow execution times.
  • Uploading coverage reports to CodeCov for tracking test coverage over time.

So, as you can see, the workflow doesn't do much besides ensuring the bare minimum CI/CD principles are taken care of. And, best of all, you can build upon it, as you'll soon see.

About the Workflow

Python's package management scene isn't praiseworthy. Coupled with those packaging issues & the need for virtualenvs, setting up CI/CD tools is quite complicated as well (on GitHub Actions at least). So, I scoured the Internet to come up with the most optimal CI/CD setup for Python projects. While Poetry is a great CLI tool for local development, out-of-the-box it doesn't work well with CI/CD platforms. With Poetry, you can manage local virtualenvs as easily as publishing your project to PyPI right from your terminal!

But that's manual labour, and as developers we commit often & push to remote repositories at regular intervals. Repeated manual tasks are subject to mistakes, thus increasing the chances of a bug or breaking change creeping into the project. I wanted to resolve this issue without spending too much time setting up CI/CD tools. The goal was to make the setup as simple & minimal as possible, yet it should still meet the modern standards of CI/CD principles.

In other words, the setup should be able to perform linting and/or formatting tasks, run the test suites, generate coverage reports & upload those reports to CodeCov. Those were the minimum tasks the setup had to handle, with the principles of minimalism kept in mind.

I also assume most projects are hosted on GitHub, so the setup works ONLY with GitHub Actions. In case you're looking to use other CI/CD platforms like Travis CI or CircleCI, you might want to look elsewhere.

You can copy the code snippet shared below into an aptly named <NAME-OF-THE-WORKFLOW>.yml under the .github/workflows directory of your project. For example, I usually name the file something like test_suite.yml. GitHub identifies your workflow files from there automatically. Once you push your commits to the remote repository, the workflow will run. You can access it at https://github.com/<GITHUB-USERNAME>/<PROJECT-NAME>/actions?query=workflow%3A%22Test+Suite%22.
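
For instance, assuming the file is named test_suite.yml, the directory layout would look like this:

<PROJECT-NAME>/
└── .github/
    └── workflows/
        └── test_suite.yml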

That said, here's the code snippet for the CI/CD pipeline. Feel free to copy+paste it. 😉



name: Test Suite

on: [pull_request, push]

jobs:
  linter:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        # NOTE: the exact version tags were garbled in the source; the major tags below are reconstructions, pin whichever releases you prefer
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4

      - name: Load cache (if exists)
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip
          restore-keys: ${{ runner.os }}-pip

      - name: Install Black, Pylint & isort
        run: python -m pip install black pylint isort

      - name: Run linters
        run: |
          # "alokka" is this project's package name; substitute your own
          pylint alokka
          # --check makes the formatters fail the job on violations instead of silently rewriting files
          black --check --diff .
          isort --check-only .

  test:
    needs: linter
    strategy:
      fail-fast: true
      matrix:
        os: ["ubuntu-latest", "macos-latest", "windows-latest"]
        python-version: ["3.8", "3.9", "3.10", "3.11"]
    defaults:
      run:
        # bash as the default shell so `source` works on the Windows runners too
        shell: bash
    runs-on: ${{ matrix.os }}
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Set up Python v${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true

      - name: Load Cached Virtualenv
        id: cached-venv
        uses: actions/cache@v3
        with:
          path: .venv
          # the Python version is part of the key so matrix jobs don't restore an incompatible venv
          key: venv-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('**/poetry.lock') }}

      - name: Install Dependencies
        # reinstall only when the cached .venv wasn't restored
        if: steps.cached-venv.outputs.cache-hit != 'true'
        run: poetry install --no-interaction --no-root -vvv

      - name: Run Tests
        run: |
          # activate the in-project venv; $VENV is provided by the install-poetry action
          source $VENV
          pytest -vvv --cov-report xml --cov=./

      - name: Upload Coverage
        uses: codecov/codecov-action@v3
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          file: coverage.xml
          fail_ci_if_error: true

Brief Overview of What the Workflow Does

If you're impatient like me & would like to skim through the article, here's what you should know:

  • The workflow executes on PR & push events. As in, when someone opens a PR, the Test Suite workflow will run. The same happens when you push your local commits to the remote repository.
  • The workflow consists of two jobs: linter & test. The latter is dependent on the former, so if linter fails, execution of test will be skipped.
  • linter runs on an Ubuntu VM & installs pylint, Black & isort for linting & formatting the code. The pip cache is saved & restored to decrease execution times.
  • test runs on Ubuntu, MacOS & Windows VMs with Python versions 3.8, 3.9, 3.10 & 3.11, i.e. one job per OS & Python combination. Do note, these matrix jobs run in parallel, although with fail-fast enabled, a single failure cancels the remaining ones.
  • The test job also caches the virtualenv stored under the .venv directory, reinstalling dependencies only when the cache misses. It then runs the test suite with pytest, which generates a coverage.xml report to be uploaded to CodeCov.

So, as you can see, even though the workflow is kept as minimal as possible, it still accomplishes a lot. In fact, most of these tasks are indispensable for maintaining the minimum quality standards for your projects.

Anyway, with that brief overview out of the way, let's take a deeper look into what each piece of the workflow was written for. The next section describes it in as much detail as possible.

In-depth Explanation of the Workflow

Right at the top of the file is the name: Test Suite key-value pair. It sets the name of the workflow, which GitHub shows in its web UI. The succeeding line, on: [pull_request, push], describes the events that should trigger the workflow.
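
As an aside, the on: key also accepts a filtered form if triggering on every push is too noisy. Here's a sketch; the branch name is an assumption, adjust it to your default branch:

on:
  push:
    branches: ["main"] # only pushes to main trigger the workflow
  pull_request:
    branches: ["main"]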

The jobs: section describes the different jobs, which run in parallel (not necessarily, more on that later). Being a minimalist, this workflow describes two jobs: linter & test. The names of the jobs are intentionally self-descriptive. As mentioned in the previous section, linter performs the linting actions when some code is pushed to the repository or a PR is created, while the test job runs the array of tests on that code.

That said, each job has to be assigned an operating system, which is done with the runs-on: key. While jobs run in parallel by default, they can be made dependent on one another. Hence, a job can also be skipped if a job it depends on failed earlier for some reason.

Now for the interesting part. The steps: key describes the sequence of actions & commands to execute. The linter job executes a git checkout first, then sets up an appropriate version of Python in the succeeding step.

The next couple of steps involve caching dependencies for decreased workflow execution time. The actions/cache GitHub Action restores the dependencies if they've been cached earlier, identifying the correct cache with a unique key.
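
One caveat: the static ${{ runner.os }}-pip key above never invalidates. A common refinement hashes a dependency manifest into the key, so a fresh cache is built whenever the dependencies change; here's a sketch, assuming a hypothetical requirements.txt:

- name: Load cache (if exists)
  uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    # hashFiles() changes the key (& hence rebuilds the cache) whenever the manifest changes
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    restore-keys: ${{ runner.os }}-pip-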

If a cache was restored, pip finds the previously downloaded wheels there & installing Black, pylint & isort becomes much faster; otherwise the packages are downloaded afresh.

The final step of the linter job executes the aforementioned linting & formatting tools. Pylint, Black & isort have sensible defaults, hence they run with hardly any additional arguments; the --check flags just make the formatters report violations instead of rewriting files in CI. You could replace Pylint with Flake8, but I feel the latter needs some configuration to make the most out of it, especially for an open-source project.
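
For reference, if you do swap in Flake8, its configuration can live in a .flake8 file at the project root. Here's a minimal sketch; the values are illustrative, chosen to play well with Black:

[flake8]
# match Black's line length & skip the slice-spacing rule Black's output violates
max-line-length = 88
extend-ignore = E203
exclude = .venv,.git,__pycache__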

And finally, we come to the test job. This job mirrors the previous linter job to an extent, as you'll see soon enough.

Right off the bat, using the needs: key, the job is declared dependent on the completion of the linter job. Thus, test won't run in parallel with linter, nor will it execute at all if linter fails. The fail-fast: true pair additionally cancels the remaining matrix jobs as soon as one of them fails.

In addition to the above strategy, this job is set to run on multiple OS platforms with multiple versions of Python. This is set with the matrix: key, which has os & python-version as its keys. Each accepts an array, of the OSes & the Python versions respectively.
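
The matrix expands to the full cross product, i.e. 3 OSes × 4 Python versions = 12 jobs here. If some combinations aren't worth the runner minutes, GitHub Actions lets you trim them with an exclude list; a sketch with an arbitrarily chosen pair:

strategy:
  fail-fast: true
  matrix:
    os: ["ubuntu-latest", "macos-latest", "windows-latest"]
    python-version: ["3.8", "3.9", "3.10", "3.11"]
    exclude:
      # drop this one combination; the other 11 jobs still run
      - os: "windows-latest"
        python-version: "3.8"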

The next few lines set bash as the default shell for every run step. This matters because the venv activation later on relies on source, a bash built-in that isn't available in Windows' default shell.

And, as mentioned earlier, each job has to be assigned an OS to run on with the runs-on keyword. The runs-on key for the test job accepts a variable which iterates through each of the values set under matrix.os, thus allowing the workflow to run multiple instances of the ensuing steps for different OSes & Python versions!

The next two steps are pretty similar to how linter starts its execution, but with a caveat: based on the values set under matrix.python-version, each OS instance is set up with the Python version of its matrix combination.

Now, instead of installing the tools with pip as was the case in linter, the workflow installs Poetry using the [snok/install-poetry](https://github.com/snok/install-poetry) Action. It configures Poetry to set up virtualenvs inside the project directory, which can then easily be cached in the next step.

The Cache action here caches the whole virtualenv instead of individual dependencies, with a key derived from the OS, the Python version & a hash of poetry.lock. Hence, Poetry installs the dependencies only if the cached .venv wasn't restored.

Following that, the .venv is activated & pytest runs the test suite. The arguments passed to pytest ensure maximum verbosity for debugging & report the coverage output as an .xml file in the root directory. The generated coverage.xml is then uploaded to CodeCov using the codecov/codecov-action Action.

The CodeCov Action accepts an API token, which you'll have to copy & pass in as a Secret environment variable. The CodeCov token can be found at https://codecov.io/gh/<GITHUB-USERNAME>/<PROJECT-NAME> (for projects hosted on GitHub). And finally, the CodeCov Action is set to fail the workflow if the upload errors out.

The workflow in its full glory isn't as minimal as it sounds. Its complexity comes from the fact that production-grade software should be thoroughly tested & formatted to established standards, especially if your project is open-source. Even then, there's still a lot of room for further improvements & changes, and the next section looks into how you can build upon this workflow.

Room for Further Improvements

As mentioned countless times already, the pipeline is kept minimalistic with an intention: keep room for further changes and/or improvements.

There are a ton more changes/improvements that can be made as per one's requirements. Some such improvements that I can think of off the top of my head are:

  • Enable a release event wherein the package is tested, formatted, linted, built & then uploaded to PyPI with Poetry (see the sketch after this list).
  • Considering scalability, the linters & code formatters could run in parallel instead of sequentially.
  • Tag & update a CHANGELOG.md file upon release.
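
To illustrate the first idea, here's a rough sketch of what such a release workflow could look like. It's an assumption rather than tested code, & PYPI_TOKEN is a hypothetical repository secret holding a PyPI API token:

name: Release

on:
  release:
    types: [published] # runs whenever a GitHub Release is published

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install Poetry
        uses: snok/install-poetry@v1

      - name: Build & publish to PyPI
        # PYPI_TOKEN is a hypothetical secret; generate an API token on PyPI & add it under the repository's secrets
        run: |
          poetry config pypi-token.pypi ${{ secrets.PYPI_TOKEN }}
          poetry build
          poetry publish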

And many more. The possibilities are endless & only limited by the project & individual maintainer's requirements.

But all said & done, the code shared here should suffice for most open-source Python projects on GitHub.

