Cache Rules Everything Around Me: Finding secrets in Python bytecode
source link: https://blog.jse.li/posts/pyc/
When you import a Python file for the first time, the Python interpreter will compile it and cache the resulting bytecode in a .pyc file so that subsequent imports don’t have to deal with the overhead of parsing or compiling again.
It’s also common practice for Python projects to store configuration, keys, and passwords (collectively referred to as “secrets”) in a gitignored Python file called secrets.py, config.py, or settings.py, which other parts of the project import. This provides a nice separation between secrets and source code that gets checked in, and for the most part, this kind of setup works well. And because it reuses the language’s import mechanism, these projects don’t have to fuss around with file I/O or formats like JSON.
But for the same reason that this pattern is fast and convenient, it is also potentially insecure. Because it reuses the language’s import mechanism, which has a habit of creating and caching .pyc files, those secrets also live in the compiled bytecode! Some initial research using the GitHub API reveals that thousands of GitHub repositories contain secrets hidden inside their bytecode.
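A quick way to check this claim is to compile a throwaway file and look for the secret in the resulting bytes. This is only a sketch: it uses py_compile as a stand-in for a real import, and the helper name compile_secret_file and the temporary directory are assumptions made for illustration.

```python
import py_compile
import tempfile
from pathlib import Path


def compile_secret_file(source_text: str) -> Path:
    """Write source_text to a temporary secrets.py and byte-compile it."""
    workdir = Path(tempfile.mkdtemp())
    source = workdir / "secrets.py"
    source.write_text(source_text)
    # Write the bytecode to an explicit path so the example does not
    # depend on where this interpreter keeps its cache directory.
    target = workdir / "secrets.pyc"
    py_compile.compile(str(source), cfile=str(target), doraise=True)
    return target


pyc_path = compile_secret_file('SECRET_KEY = "Green eggs and ham"\n')
raw = pyc_path.read_bytes()

# The string constant is stored as plain bytes inside the compiled file.
print(b"Green eggs and ham" in raw)
```

Deleting or gitignoring secrets.py does nothing about this second copy.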
Existing tools for finding secrets (my favorite is trufflehog) in repositories skip over binary files like .pyc files, instead only scanning plain text files such as source code or configuration files.
A crash-course on cached source
Earlier versions of Python stored these files next to the original source files, but beginning with Python 3.2, these files all live in a folder called __pycache__ in the same directory as the imported module. For our purposes, this doesn’t matter.
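To see where a given source file’s cache would land, you can ask the standard library. The path myproject/secrets.py below is a hypothetical example, and the exact file name depends on the interpreter running the snippet.

```python
import importlib.util

# Hypothetical source path; the file does not need to exist.
source_path = "myproject/secrets.py"

# Returns the cache location this interpreter would use for that source file.
cache_path = importlib.util.cache_from_source(source_path)
print(cache_path)
```

On a typical CPython install this prints a path inside myproject/__pycache__/ whose file name carries an interpreter tag, such as the cpython-38 seen later in this post.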
Suppose we had a Python file containing this secret password:
SECRET_KEY = "Green eggs and ham"
The bytecode corresponding to that line of code looks like this:
0 LOAD_CONST  1 ('Green eggs and ham')
2 STORE_FAST  0 (SECRET_KEY)
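You can produce a similar listing with the standard dis module. This is a minimal sketch, not necessarily how the listing above was generated. Opcodes and operand indices vary between Python versions, and between module-level code (which stores names with STORE_NAME) and function bodies (which use STORE_FAST).

```python
import dis

source = 'SECRET_KEY = "Green eggs and ham"\n'

# Compile the source the same way an import would, without touching disk.
code = compile(source, "secrets.py", "exec")

# Print a human-readable disassembly of the module's bytecode.
dis.dis(code)

# The constant and the variable name are stored verbatim on the code object.
print(code.co_consts)
print(code.co_names)
```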
Note that the string and variable name are reproduced in their entirety! Further, it turns out that Python bytecode often contains enough information to recover the original structure of the file. Tools like uncompyle6 can translate .pyc files back into their original forms.
$ uncompyle6 secrets.cpython-38.pyc
# uncompyle6 version 3.6.7
# Python bytecode 3.8 (3413)
# Decompiled from: Python 3.8.2 (default, Apr 8 2020, 14:31:25)
# [GCC 9.3.0]
# Embedded file name: secrets.py
# Compiled at: 2020-05-12 17:16:29
# Size of source mod 2**32: 34 bytes
SECRET_KEY = 'Green eggs and ham'
# okay decompiling secrets.cpython-38.pyc
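If you only want the string constants rather than a full decompilation, the standard library is enough. The sketch below rests on several assumptions: the .pyc files use the 16-byte header of Python 3.7 and later, and they were compiled by the same Python version that runs the script (marshal’s format is version-specific). The function names are made up for this example. marshal is not meant for untrusted input, so only point this at files you are comfortable loading.

```python
import marshal
import types
from pathlib import Path

# Assumption: Python 3.7+ layout, where a 16-byte header precedes the code object.
HEADER_SIZE = 16


def load_code(pyc_path):
    """Unmarshal the code object stored after the .pyc header."""
    data = Path(pyc_path).read_bytes()
    return marshal.loads(data[HEADER_SIZE:])


def iter_strings(const):
    """Yield every string constant reachable from a code object or constant."""
    if isinstance(const, str):
        yield const
    elif isinstance(const, types.CodeType):
        # Nested functions and classes are stored as code objects in co_consts.
        for inner in const.co_consts:
            yield from iter_strings(inner)
    elif isinstance(const, (tuple, frozenset)):
        for inner in const:
            yield from iter_strings(inner)


def scan_tree(root):
    """Yield (path, strings) for each .pyc under root that can be loaded."""
    for pyc_path in sorted(Path(root).rglob("*.pyc")):
        try:
            code = load_code(pyc_path)
        except (ValueError, EOFError, TypeError):
            # Likely written by a different Python version, or corrupt; skip it.
            continue
        yield pyc_path, list(iter_strings(code))


if __name__ == "__main__":
    for path, strings in scan_tree("."):
        print(path, strings)
```

Unlike the decompiler output above, this gives no variable names or structure, just the raw strings, which is often all a secret scanner needs.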
Caching out
To investigate just how widespread this problem was, I wrote a short script to search GitHub for .pyc files and decompile them to look for secrets. I alerted any organizations that had secrets I discovered this way.
import base64
import io
import os
import tempfile

import uncompyle6
from github import Github

GITHUB_KEY = os.environ.get("GITHUB_KEY")
g = Github(GITHUB_KEY)

# Search GitHub for committed bytecode files named secrets.pyc
items = g.search_code("filename:secrets.pyc")
for item in items:
    print(f"DECOMPILING REPO https://github.com/{item.repository.full_name}")
    print(f"OWNER TYPE: {item.repository.owner.type}")
    try:
        # The API returns file contents base64-encoded
        contents = base64.b64decode(item.content)
        with tempfile.NamedTemporaryFile(suffix=".pyc") as f:
            f.write(contents)
            f.flush()  # make sure the bytes are on disk before decompiling by name
            out = io.StringIO()
            uncompyle6.decompile_file(f.name, out)
            out.seek(0)
            print(out.read())
    except Exception as e:
        print(e)
        print(f"COULD NOT DECOMPILE REPO https://github.com/{item.repository.full_name}")
        continue
    print("\n\n\n")
Try this out yourself!
This post comes with a small capture-the-flag style lab for you to try out this technique yourself.
You can find it at https://github.com/veggiedefender/pyc-secret-lab/
Takeaways
Cached bytecode is a low-level internal performance optimization, which is the kind of thing Python was supposed to free us from having to think about! The contents of .pyc files are inscrutable without special tools like a disassembler or decompiler. When these files are buried inside __pycache__ (the double underscores signal “keep out; internal use only”), they’re easy to overlook. Avoiding this requires pretty advanced knowledge of Git and Python internals.
Actionable things you can do:
- Look through your repositories for loose .pyc files, and delete them
- If you have .pyc files and they contain secrets, then revoke and rotate your secrets
- Use a standard gitignore to prevent checking in .pyc files
- Use JSON files or environment variables for configuration
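As a rough illustration of the last item, here is one way to read a setting from an environment variable with a JSON file as a fallback. The function name load_secret, the default file name config.json, and the lookup order are all assumptions for the sake of the sketch. The JSON file would still need to be gitignored, but since it is never imported, no bytecode copy of it is created.

```python
import json
import os
from pathlib import Path


def load_secret(name, config_path="config.json"):
    """Return a setting from the environment, falling back to a JSON file.

    Returns None if the setting is found in neither place.
    """
    value = os.environ.get(name)
    if value is not None:
        return value
    path = Path(config_path)
    if path.is_file():
        # The file is read as data, not imported, so nothing is compiled or cached.
        return json.loads(path.read_text()).get(name)
    return None


if __name__ == "__main__":
    print(load_secret("SECRET_KEY") is not None)
```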