The right thing for the wrong reasons: FLOSS doesn't imply security


While source code is critical for user autonomy, it isn't required to evaluate software security or understand run-time behavior



Background

I find it quite easy to handle views different from my own. I feel more troubled when I see people agree with me for the wrong reasons. It’s no secret that I’m a passionate supporter of software freedom: I’ve written two posts about how Free, Libre, and Open-Source software (FLOSS) is necessary but insufficient to preserve user autonomy:

- WhatsApp and the domestication of users: the phenomenon of a class of predatory business models I call “user domestication”, and defense measures: FLOSS, open platforms, and simplicity.
- Keeping platforms open: how open platforms can lose their openness, and what measures can prevent this. The Web, XMPP, email, and Matrix are examples that highlight both sides of the issue.

After two posts spanning over 5000 words, I need to add some nuance.

Introduction

One of the biggest parts of the Free and Open Source Software definitions is the freedom to study a program and modify it; in other words, access to editable source code. I agree that such access is essential; however, far too many people support source availability for the wrong reasons. One such reason is that source code is necessary to have any degree of transparency into how a piece of software operates, and is therefore necessary to determine if it is at all secure or trustworthy. Although security through obscurity is certainly not a robust measure, this claim has two issues:

- Source code describes what a program is designed to do; it is unnecessary and insufficient to determine if what it actually does aligns with its intended design.
- Vulnerability discovery doesn’t require source code.

I’d like to expand on these issues, focusing primarily on compiled binaries. Bear in mind that I do not think that source availability is useless from a security perspective (it certainly makes audits easier), and I do think that source availability is required for user freedom. I’m arguing only that source unavailability doesn’t imply insecurity, and source availability doesn’t imply security. It’s possible (and often preferable) to perform security analysis on binaries, without necessarily having source code. In fact, vulnerability discovery doesn’t typically rely upon source code analysis.

I’ll update this post occasionally as I learn more on the subject. If you like it, check back in a month or two to see if it has something new.

PS: this stance is not absolute; I concede to several good counter-arguments in a dedicated section!

How security fixes work

I don’t think anyone seriously claims that software’s security instantly improves the second its source code is published. The argument I’m responding to is that source code is necessary to understand what a program does and how (in)secure it is, and without it we can’t know for sure. Assuming a re-write that fundamentally changes a program’s architecture is not an option (note 1), software security typically improves by fixing vulnerabilities via something resembling this process:

1. Someone discovers a vulnerability
2. Developers are informed of the vulnerability
3. Developers reproduce the issue and understand what caused it
4. Developers patch the software to fix the vulnerability

Source code is typically helpful (sometimes essential) for Step 3. If someone has completed Step 3, they will require source code to proceed to Step 4. Source code isn’t necessary for Steps 1 and 2; these steps rely upon understanding how a program misbehaves. For that, we use reverse engineering and/or fuzzing.
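To make the fuzzing half of that concrete, here’s a rough sketch of binary-only fuzzing. The tool choice (AFL++ with QEMU mode), the target path, the seed input, and the directory names are all placeholders of my own, not something from a real project; the commands assume an AFL++ installation built with QEMU mode support.

    # Seed corpus: any small, roughly valid input the target accepts.
    mkdir -p corpus findings
    echo 'sample input' > corpus/seed.txt

    # -Q selects AFL++'s QEMU mode (binary-only instrumentation, no recompilation);
    # @@ is replaced by the path of each generated input file.
    afl-fuzz -Q -i corpus -o findings -- ./closed-source-app @@

    # Crashing inputs are saved under findings/ for triage with a debugger
    # or a syscall tracer.

Every crash found this way is a completed Step 1, reached without reading a single line of the target’s source.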
Reverse engineering

Understanding how a program is designed is not the same as understanding what a program does. A reasonable level of one type of understanding does not imply the other. Source code (note 2) is essential to describe a program’s high-level, human-comprehensible design; it represents a contract that outlines how a developer expects a program to behave. A compiler or interpreter (note 3) must then translate it into machine instructions. But source code isn’t always easy to map directly to machine instructions, because it is part of a complex system:

- Compilers (sometimes even interpreters) can apply optimizations and hardening measures that are difficult to reason about. This is especially true for JIT compilers that leverage run-time information.
- The operating system itself may be poorly understood by the developers, and run a program in a way that contradicts a developer’s expectations.
- Toolchains, interpreters, and operating systems can have bugs that impact program execution.
- Different compilers and compiler flags can offer different security guarantees and mitigations.
- Source code can be deceptive by featuring sneaky obfuscation techniques, sometimes unintentionally. Confusing naming patterns, re-definitions, and vulnerabilities masquerading as innocent bugs have all been well documented: look up “hypocrite commits” or the Underhanded C Contest for examples.
- All of the above points apply to each dependency and the underlying operating system, which can impact a program’s behavior.

Furthermore, all programmers are flawed mortals who don’t always fully understand source code. Everyone who’s done a non-trivial amount of programming is familiar with the feeling of encountering a bug during run-time for which the cause is impossible to find…until they notice it staring them in the face on Line 12. Think of all the bugs that aren’t so easily noticed. Reading the source code, compiling, and passing tests isn’t sufficient to show us a program’s final behavior. The only way to know what a program does when you run it is to…run it. (note 4)

Special builds

Almost all programmers are fully aware of their limited ability, which is why most already employ techniques to analyze run-time behavior that don’t depend on source code. For example, developers of several compiled languages (note 5) can build binaries with sanitizers to detect undefined behavior, races, uninitialized reads, and other issues that human eyes may have missed when reading source code. While source code is necessary to build these binaries, it isn’t necessary to run them and observe failures. Distributing binaries with sanitizers and debug information to testers is a valid way to collect data about a program’s potential security issues.

Dynamic analysis

It’s hard to figure out which syscalls and files a large program needs by reading its source, especially when certain libraries (e.g. the libc implementation/version) can vary. A syscall tracer like strace(1) (note 6) makes the process trivial.

A personal example: the understanding I gained from strace was necessary for me to write my bubblewrap scripts. These scripts use bubblewrap(1) to sandbox programs with the minimum permissions possible. Analyzing every relevant program and library’s source code would have taken me months, while strace gave me everything I needed to know in an afternoon: analyzing the strace output told me exactly which syscalls to allow and which files to grant access to, without even having to know what language the program was written in.

I generated the initial version of the syscall allow-lists with the following command (note 7):

    strace name-of-program program-args 2>&1 \
        | rg '^([a-z_]*)\(.*' --replace '$1' \
        | sort | uniq
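To show where that information ends up: this is not one of my actual scripts, just a minimal sketch with placeholder paths and a placeholder program name, but a bubblewrap invocation built from such a trace might look something like this:

    # Expose only what the trace showed the program touching; anything not
    # explicitly bound into the sandbox simply doesn't exist there.
    bwrap \
        --ro-bind /usr /usr \
        --symlink usr/bin /bin \
        --symlink usr/lib64 /lib64 \
        --ro-bind /etc /etc \
        --dev /dev \
        --proc /proc \
        --tmpfs /tmp \
        --unshare-all \
        --die-with-parent \
        name-of-program program-args
    # The syscall allow-list generated above would be enforced separately,
    # e.g. by handing bubblewrap a compiled seccomp filter.

The point is that the trace output maps almost directly onto a sandbox policy.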
This also extends to determining how programs utilize the network: packet sniffers like Wireshark can determine when a program connects to the network, and where it connects.

These methods are not flawless. Syscall tracers are only designed to shed light on how a program interacts with the kernel. Kernel interactions tell us plenty (it’s sometimes all we need), but they don’t give the whole story. Furthermore, packet inspection can be made a bit painful by transit encryption (note 8); tracing a program’s execution alongside packet inspection can offer clarity, but this is not easy.

For more information, we turn to core dumps, also known as memory dumps. Core dumps share the state of a program during execution or upon crashing, giving us greater visibility into exactly what data a program is processing. Builds containing debugging symbols (e.g. DWARF) have more detailed core dumps. Vendors that release daily snapshots of pre-release builds typically include some symbols to give testers more detail concerning the causes of crashes. Web browsers are a common example: Chromium dev snapshots, Chrome Canary, Firefox Nightly, WebKit Canary builds, etc. all include debug symbols. Until 2019, Minecraft: Bedrock Edition included debug symbols which were used heavily by the modding community. (note 9)

Dynamic analysis example: Zoom

In 2020, Zoom Video Communications came under scrutiny for marketing its “Zoom” software as a secure, end-to-end encrypted solution for video conferencing. Zoom’s documentation claimed that it used “AES-256” encryption. Without source code, did we have to take the docs at their word?

The Citizen Lab didn’t. On 2020-04-03, it published Move Fast and Roll Your Own Crypto (application/pdf), revealing critical flaws in Zoom’s encryption. It utilized Wireshark and mitmproxy to analyze networking activity, and inspected core dumps to learn about Zoom’s encryption implementation. The Citizen Lab’s researchers found that Zoom actually used an incredibly flawed implementation of a weak version of AES-128 (ECB mode), and easily bypassed it.

Syscall tracing, packet sniffing, and core dumps are great, but they rely on manual execution which might not hit all the desired code paths. Fortunately, there are other forms of analysis available.

Binary analysis

Tracing execution and inspecting memory dumps can be considered forms of reverse engineering, but they only offer a surface-level view of what’s going on. Reverse engineering gets much more interesting when we analyze a binary artifact.

Static binary analysis is a powerful way to inspect a program’s underlying design. Decompilation (especially when supplemented with debug symbols) can re-construct a binary’s assembly or source code. Symbol names may look incomprehensible in stripped binaries, and comments will be missing; what’s left is more than enough to decipher control flow and uncover how a program processes data. This process can be tedious, especially if a program uses certain forms of binary obfuscation. The goal doesn’t have to be a complete understanding of a program’s design (incredibly difficult without source code); it’s typically to answer a specific question, fill in a gap left by tracing/fuzzing, or find a well-known property.
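As a rough illustration of what a first pass can look like in practice, using nothing fancier than standard binutils tools (./some-binary is a placeholder for any stripped, closed-source executable):

    BIN=./some-binary          # placeholder: substitute the binary under analysis

    readelf -h "$BIN"          # ELF header: target architecture and entry point
    readelf -d "$BIN"          # dynamic section: which shared libraries it links against
    nm -D "$BIN"               # imported/exported dynamic symbols that survive stripping
    strings -n 8 "$BIN"        # embedded URLs, file paths, and error strings
    objdump -d "$BIN" | less   # full disassembly: raw material for control-flow analysis

Even this much can answer concrete questions, such as which crypto library a “proprietary encryption” product actually links against.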
When developers publish documentation on the security architecture of their closed-source software, reverse engineering tools like decompilers are exactly what you need to verify their honesty (or lack thereof). Decompilers are seldom used alone in this context. Instead, they’re typically a component of reverse engineering frameworks that also sport memory analysis, debugging tools, scripting, and s...