
openSUSE Factory Achieves Bit-By-Bit Reproducible Builds - Slashdot

 4 weeks ago
source link: https://linux.slashdot.org/story/24/04/19/2148247/opensuse-factory-achieves-bit-by-bit-reproducible-builds


Michael Larabel reports via Phoronix: While Fedora 41 in late 2024 is aiming to have more reproducible package builds, openSUSE Factory has already achieved a significant milestone in bit-by-bit reproducible builds. Since last month openSUSE Factory has been producing bit-by-bit reproducible builds sans the likes of embedded signatures. OpenSUSE Tumbleweed packages for that rolling-release distribution are being verified for bit-by-bit reproducible builds. SUSE/openSUSE is still verifying all packages are yielding reproducible builds but so far it's looking like 95% or more of packages are working out. You can learn more via the openSUSE blog.

What does the build process do that doesn't produce the exact same bits every time?

  • Re:

    I'm guessing it has to do with "sans the likes of embedded signatures".

  • Re:

    Compile-time timestamps were a problem for me in the 00s and, apparently, still are. I recall having to filter out timestamps to compute custom fingerprints. Parallel compilation and data structures that don't guarantee ordering add to the problem, and it wouldn't surprise me if some builds started throwing UIDs into the mix.
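The ordering problem mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical `manifest_bytes` helper that serializes a name-to-content mapping; neither the function nor the byte layout comes from any real build tool. Emitting entries in sorted order makes the output independent of the order in which parallel jobs happened to finish.

```python
def manifest_bytes(entries: dict[str, bytes]) -> bytes:
    """Serialize a {name: content} mapping in sorted-name order.

    Hypothetical layout: name, NUL separator, 8-byte big-endian length, content.
    """
    out = bytearray()
    for name in sorted(entries):  # fixed order, whatever the insertion order was
        data = entries[name]
        out += name.encode("utf-8") + b"\0"
        out += len(data).to_bytes(8, "big")
        out += data
    return bytes(out)


# Two "builds" that finished their jobs in a different order.
first = {"b.o": b"\x02", "a.o": b"\x01"}
second = {"a.o": b"\x01", "b.o": b"\x02"}
assert manifest_bytes(first) == manifest_bytes(second)
```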

  • Re:

    Many older projects embed things like time stamps or hashes or even a one-up counter to help align binaries with source control for debugging purposes and to give some sort of guaranteed serialization and uniqueness. Many people consider putting the serial build number in the *.*._ slot of the resulting library a best practice. Every place where this was done has to be stamped out.
    • Re:

      Or perhaps just omitted.

      Hashes, PGP/GPG signatures, Git, etc. always cover the ENTIRE file. How about we add a special mode where something like "version:[0-9.A-Za-z]{1,20}" -- if I've got my RE right -- is ignored during the hash.

      This would allow a hash to be computed over 99.99% of the file to vouch for its contents but still allow some uniqueness.

      I also doubt that you could fit a malware attack into ~20 bytes of ASCII, but that still needs to be addressed.
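The "ignore the version field while hashing" idea above could look roughly like the following sketch. The `masked_sha256` function is hypothetical and is not a mode of any existing hashing or signing tool; the pattern is the one proposed in the comment. Each match is replaced with a fixed placeholder before hashing, so the hash covers everything except the version string.

```python
import hashlib
import re

# Pattern from the comment above: "version:" followed by 1 to 20 characters
# from [0-9.A-Za-z]. Treating this as the marker format is an assumption.
VERSION_FIELD = re.compile(rb"version:[0-9.A-Za-z]{1,20}")


def masked_sha256(data: bytes) -> str:
    """Hash data with every version field replaced by a fixed placeholder."""
    # Caveat: the match is greedy, so alphanumeric bytes directly after the
    # version string (up to the 20-character limit) are masked as well.
    masked = VERSION_FIELD.sub(b"version:", data)
    return hashlib.sha256(masked).hexdigest()


build_1 = b"header version:1.0.1\npayload"
build_2 = b"header version:2.7.33b\npayload"
assert masked_sha256(build_1) == masked_sha256(build_2)
```

A real scheme would also have to pin down where in the file the field may appear; otherwise the masked region is a place an attacker can write to freely, which is the concern raised above.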

  • 1. Timestamps
    2. Full paths to the source code location (which help a debugger find the source code)
    3. Many binary formats have unused space, e.g. bytes 1, 2 and 3 of the header are used and byte 4 is unused. If the space was allocated by malloc() and not zero-initialized, it will contain whatever junk was left there before.
    4. Sometimes an artifact records whether the other artifacts it uses were built from source or fetched from a build cache because they had been built previously.
    5. Metadata about the build. I mentioned timestamps; also host name, env vars, etc. They're all helpful for diagnosing and reproducing build issues.

    They're all irritating things put in by well-intentioned people for helpful reasons before it was widely understood that determinism is crucial in build systems.
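Two items from the list above, timestamps and uninitialized padding, can be sketched briefly. This is a minimal illustration, not how any particular packaging tool works. `SOURCE_DATE_EPOCH` is the environment variable defined by the reproducible-builds.org specification; `build_timestamp`, `pack_header`, and the 12-byte header layout are hypothetical.

```python
import os
import struct
import time


def build_timestamp() -> int:
    """Prefer SOURCE_DATE_EPOCH so rebuilds embed the same timestamp."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    # Falling back to the wall clock keeps the build working, but that
    # fallback is exactly what makes the output non-reproducible.
    return int(value) if value is not None else int(time.time())


def pack_header(major: int, minor: int, patch: int, timestamp: int) -> bytes:
    """Pack a hypothetical header: 3 used bytes, 1 padding byte, 8-byte time."""
    # The "x" format code writes an explicit zero byte when packing, so the
    # unused fourth byte never carries leftover memory contents.
    return struct.pack(">BBBxQ", major, minor, patch, timestamp)


os.environ["SOURCE_DATE_EPOCH"] = "1700000000"  # normally set by the build system
header = pack_header(1, 2, 3, build_timestamp())
assert header[3] == 0
assert header == pack_header(1, 2, 3, build_timestamp())
```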

    • Re:

      Can you explain why determinism is crucial? Not all systems are the same, nor should they be. I can see an argument that timestamps and source paths should remain embedded. From a security point of view, you won't be able to simply hash the binary file directly; you'll have to know what the file format is so that you can mask out the bits. But so what? Security practices should adapt to developer needs, not the other way around.
