Bug #915: please help!

 4 years ago
source link: https://nedbatchelder.com/blog/202001/bug_915_please_help.html

I just released coverage.py 5.0.3, with two bug fixes. There was another bug I really wanted to fix, but it has stumped me. I’m hoping someone can figure it out.

Bug #915 describes a disk I/O failure. Thanks to some help from Travis support, Chris Caron has provided instructions for reproducing it in Docker, and they work: I can generate disk I/O errors at will. What I can’t figure out is what coverage.py is doing wrong that causes the errors.

To reproduce it, start a Travis-based Docker image:

cid=$(docker run -dti --privileged=true --entrypoint=/sbin/init -v /sys/fs/cgroup:/sys/fs/cgroup:ro travisci/ci-sardonyx:packer-1542104228-d128723)
docker exec -it $cid /bin/bash

Then in the container, run these commands:

su - travis
git clone --depth=1 --branch=nedbat/debug-915 https://github.com/nedbat/apprise-api.git
cd apprise-api
source ~/virtualenv/python3.6/bin/activate
pip install tox
tox -e bad,good

This will run two tox environments, called good and bad. Bad will fail with a disk I/O error; good will succeed. The difference is that bad uses the pytest-cov plugin and good does not. Two detailed debug logs will be created: debug-good.txt and debug-bad.txt. They show what operations were executed in the SqliteDb class in coverage.py.
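One way to start on the two logs is to diff them and look for the first point where the operations diverge. The sketch below uses only the standard library's difflib. The helper name diff_logs is made up for illustration, and it assumes the logs are plain text with one operation per line. If the lines carry process ids or timestamps, the diff will be noisy and those fields would need to be stripped first.

```python
import difflib


def diff_logs(good_lines, bad_lines):
    """Return unified-diff lines showing where the two logs differ.

    Both arguments are lists of lines without trailing newlines.
    An empty list means the logs are identical.
    """
    return list(
        difflib.unified_diff(
            good_lines,
            bad_lines,
            fromfile="debug-good.txt",
            tofile="debug-bad.txt",
            lineterm="",
        )
    )


# Typical use, from the directory that holds the two logs:
#   from pathlib import Path
#   good = Path("debug-good.txt").read_text().splitlines()
#   bad = Path("debug-bad.txt").read_text().splitlines()
#   print("\n".join(diff_logs(good, bad)))
```

Lines prefixed with "-" appear only in the good log, and lines prefixed with "+" appear only in the bad one.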

The Big Questions: Why does bad fail? What is it doing at the SQLite level that causes the failure? And most importantly, what can I change in coverage.py to prevent the failure?

Some observations and questions:

  • If I change the last line of the steps to “tox -e good,bad” (that is, run the environments in the other order) then the error doesn’t happen. I don’t understand why that would make a difference.
  • I’ve tried adding time.sleep calls to slow the pace of database access, but maybe not in enough places? And if that does fix it, what’s the right way to productize the change?
  • I’ve tried using the detailed debug log to create a small Python program that in theory accesses the SQLite database in exactly the same way, but I haven’t managed to create the error that way. What aspect of access am I overlooking?
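For anyone who wants to try the same replay approach, here is a minimal sketch of the shape such a program could take, using only the standard sqlite3 module. It is not coverage.py's code. The table name demo, the helper names, and the choice to open a fresh connection for every operation are all assumptions made for illustration. The real sequence of statements would have to come from debug-bad.txt.

```python
import os
import sqlite3
import tempfile


def replay(db_path, operations):
    """Run each (sql, params) pair on its own connection.

    Opening and closing a connection per operation is an assumption,
    chosen only to exercise repeated open/close cycles on the file.
    """
    results = []
    for sql, params in operations:
        con = sqlite3.connect(db_path)
        try:
            with con:  # commit on success, roll back on error
                results.append(con.execute(sql, params).fetchall())
        finally:
            con.close()
    return results


def demo():
    """Replay a tiny, hypothetical operation list against a temp file."""
    operations = [
        # Hypothetical schema, not coverage.py's.
        ("create table demo (id integer primary key, name text)", ()),
        ("insert into demo (name) values (?)", ("a.py",)),
        ("select name from demo", ()),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        rows = replay(os.path.join(tmp, "replay.db"), operations)
    return rows[-1]  # rows returned by the final select
```

Calling demo() on a healthy file system should return the single inserted row. Running the same script inside the container, with the operation list replaced by what the debug log shows, is one way to test whether the statements alone can trigger the error.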

If you come up with answers to any of these questions, I will reward you somehow. I am also eager to chat if that would help you solve the mysteries. I can be reached on Twitter, as nedbat on IRC, or in Slack. Please get in touch if you have any ideas. Thanks.

