openSUSE Factory Achieves Bit-By-Bit Reproducible Builds - Slashdot
source link: https://linux.slashdot.org/story/24/04/19/2148247/opensuse-factory-achieves-bit-by-bit-reproducible-builds
openSUSE Factory Achieves Bit-By-Bit Reproducible Builds
What does the build process do that doesn't produce the exact same bits every time?
-
I'm guessing it has to do with "sans the likes of embedded signatures".
-
Compile-time timestamps were a problem for me in the 2000s and, apparently, still are. I recall having to filter out timestamps to get custom fingerprints. Parallel compilation and data structures that don't guarantee ordering add to the problem, and it wouldn't surprise me if some builds started throwing UIDs into the mix.
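To make the timestamp, ordering, and UID points concrete, here is a minimal Python sketch of packing build outputs into a tar archive whose bytes depend only on the inputs. The function name `build_archive` and the file names are hypothetical. `SOURCE_DATE_EPOCH` is the environment variable convention from the Reproducible Builds project. This is an illustration, not a description of openSUSE's actual tooling.

```python
import io
import os
import tarfile


def build_archive(files, epoch=None):
    """Pack {name: bytes} into a tar archive whose bytes depend only on the inputs."""
    if epoch is None:
        # SOURCE_DATE_EPOCH: a fixed build time supplied by the caller, not the clock.
        epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):        # fixed order, whatever order workers finished in
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = epoch            # no wall-clock timestamp
            info.uid = info.gid = 0       # no builder UID/GID
            info.uname = info.gname = ""  # no builder user or group name
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()
```

Two calls with the same contents and the same `epoch` should return identical bytes, even if the dictionary was filled in a different order.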
-
Many older projects embed things like timestamps, hashes, or even a one-up counter to help align binaries with source control for debugging purposes and to give some sort of guaranteed serialization and uniqueness. Many people consider putting the serial build number in the *.*._ slot of the resulting library a best practice. Every place where this was done has to be stamped out.
-
Or perhaps just omitted.
Hashes, PGP/GPG signatures, Git, etc. always cover the ENTIRE file. How about we add a special mode where something like "version:[0-9.A-Za-z]{1,20}" -- if I've got my RE right -- is ignored during the hash?
This would allow a hash to be carried out on 99.99% of the file to vouch for its contents but still allow some uniqueness.
I also doubt that you could fit a malware attack into ~20 bytes of ASCII, but that would still need to be addressed.
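A rough Python sketch of that "ignore the version field" idea, assuming the pattern was meant as a bytes regex with a `{1,20}` repetition. The name `masked_digest` and the placeholder choice are hypothetical, and no existing hash or signature tool is claimed to work this way.

```python
import hashlib
import re

# The proposed pattern, written as a bytes regex (an assumption about the intent).
VERSION_FIELD = re.compile(rb"version:[0-9.A-Za-z]{1,20}")


def masked_digest(data: bytes) -> str:
    """SHA-256 of data with every version field replaced by a fixed placeholder."""
    masked = VERSION_FIELD.sub(b"version:", data)
    return hashlib.sha256(masked).hexdigest()
```

Two files that differ only inside the matched field get the same digest, while a change anywhere else still changes it. The masked bytes are exactly the part the digest no longer vouches for, which is the ~20-byte concern raised in the comment.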
-
1. Timestamps
2. Full paths to the source code location (which helps a debugger find the source code)
3. Many binary formats have unused space, e.g. bytes 1, 2 and 3 of the header are used and byte 4 is unused. If the space was allocated by malloc() and not zero-initialized, it will contain whatever junk was left there before
4. Sometimes an artifact records whether the other artifacts it uses were built from source or fetched from a build cache because they'd been built previously
5. Metadata about the build. I mentioned timestamps. Also host name, env vars, etc. They're all helpful for diagnosing and reproducing build issues.
They're all irritating things put in by well-intentioned people for helpful reasons before it was widely understood that determinism is crucial in build systems.
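Items 2 and 3 can be sketched in a few lines of Python. The 4-byte header layout and both function names are hypothetical, chosen only to mirror the "bytes 1, 2, 3 used, byte 4 unused" example. The point is that padding is written explicitly as zero and that recorded paths are relative to the build root.

```python
import posixpath
import struct

# Hypothetical header: three used bytes followed by one explicit pad byte.
HEADER = struct.Struct("<3Bx")  # "x" always packs as a zero byte, never leftover junk


def pack_header(major, minor, flags):
    """Serialize the header; the unused fourth byte is guaranteed to be zero."""
    return HEADER.pack(major, minor, flags)


def recorded_source_path(source_file, build_root):
    """Record a path relative to the build root instead of the absolute checkout path."""
    return posixpath.relpath(source_file, build_root)
```

With this, a checkout under /home/alice/proj and one under /build/tmp123 would both record `src/main.c`, so the location of the build directory no longer leaks into the output.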
-
Can you explain why determinism is crucial? Not all systems are the same, nor should they be. I can see an argument that timestamps and source paths should remain embedded. From a security point of view, you won't be able to simply hash the binary file directly; you'll have to know what the file format is so that you can mask out the bits. But so what? Security practices should adapt to developers' needs, not the other way around.