"Just compile it yourself!" and other misguided security suggestions

June 9, 2021 by Noah

On forums like r/privacy, people often discuss the role of open source software in privacy and end-to-end encrypted messaging applications. The general consensus is that a privacy-focused app must be open source so that people can get their eyes on the source code, audit it for security vulnerabilities, and verify it does what it says on the tin, with no secret government backdoors built in that would undermine the security and reveal people's private chats.

This is all well and good: if the source code is not open, you can't verify the code isn't doing something sneaky like uploading your encryption keys to the service provider. But open source alone isn't a silver bullet that guarantees the security of an app:

  • Just because the code is readable and somebody could audit it for bugs doesn't mean that anybody actually does. Some vendors hire security firms to deliberately audit their code, but for random small projects that haven't been formally audited, "open source != automatically secure" -- though it's still better than closed source, where nobody can audit the code at all.
  • Just because the source code is available doesn't mean the program you download from the App Store was built from exactly that code. Google Chrome, for example, is built on top of the open source Chromium browser, but Google injects a few proprietary services and features first; the Chrome binary released by Google has features not found in the Chromium source code. So-called "reproducible builds" can help with this, and I'll cover them below, but reproducible builds do not come "for free."

In this post I'll address a few tired refrains I often hear on r/privacy about this topic, and why it's never quite that simple.

"Just compile it yourself from the open source code!"

Take the Signal messenger app, for example: it's an end-to-end encrypted messenger, and the client apps are open source (though the server-side code is not). You can audit Signal's app source and verify that it encrypts messages locally on your device and handles the encryption keys properly, so even though the server side isn't open source, you can verify that the client app is encrypting and that the service provider couldn't get access to your chats if they wanted to.

Signal releases their app on the mobile app stores, and when you download it from there, you're downloading a pre-built binary that the company compiled from the source code. With binary releases, there's always a chance that the binary has something 'extra' added that isn't in the published source.

So, to be the most security-conscious, you might download the Signal source code, read through it yourself to audit its security, and then compile your own Android .apk package from the source code sitting in front of you. Then you know exactly what code your app is running.

All doable in theory, but there are many problems in practice:

  • Many users don't know how to compile software from source code.
  • And even if they did, keeping on top of updates becomes a chore: you always need to re-download the latest source and re-compile it yourself.
  • If you use 20 different open source Android apps and you're compiling all of them from source all the time, this problem magnifies massively.

I'll give you a practical example from my life:

In GNU/Linux, most software is open source and compiling it yourself is typically not difficult at all. It's almost always a matter of getting the source and running "./configure; make; make install", whether we're talking about Firefox or the Apache web server.
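For readers who haven't done this before, the classic "autotools dance" looks something like the following (the tarball name here is hypothetical -- the pattern is what matters):

```shell
# Unpack the source code (package name is made up for illustration)
tar xzf someprogram-1.0.tar.gz
cd someprogram-1.0

./configure          # probe your system and set up install paths
make                 # compile the program
sudo make install    # copy the built files into place (/usr/local by default)
```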

One time, I had a web server running a RedHat-based Linux distribution, and I needed to build my own Apache web server from source. Apache has a feature called "suexec" where, if a website includes PHP or Perl CGI scripts, Apache runs each script as the user who owns the file, so any files created by the script are owned by that user and the permissions all stay sane.

For example: /home/kirsle/www is where the user kirsle keeps their website, and they've installed WordPress or something there. It's better if WordPress's PHP scripts run as kirsle, so that config files and uploads are owned by kirsle and land in kirsle's home directory. The default is for all PHP scripts to run as the Apache user, which doesn't have write permission to kirsle's home folder, and any files it does create are owned by Apache rather than kirsle.

So I wanted Apache's suexec root to be /home instead of the default /var/www. The catch is that this is a compile-time option: you must rebuild Apache from source and tell it the suexec root at that time. The vendor-provided Apache binary shipped by RedHat was built with a suexec root of /var/www, so I could not have user websites under /home with their build of Apache.

So, I compiled it myself.
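From memory, the build looked roughly like this; --enable-suexec and --with-suexec-docroot are flags in Apache httpd 2.x's configure script, but treat the exact invocation as a sketch:

```shell
# Inside the unpacked Apache httpd source tree.
# --with-suexec-docroot sets the compile-time suexec root (default /var/www).
./configure --enable-suexec \
            --with-suexec-docroot=/home \
            --with-suexec-caller=apache
make
sudo make install
```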

And then a week later, RedHat pushed software updates, including a bugfix release of Apache. If I simply updated my software, I'd get RedHat's new Apache, which would break my shit because RedHat's Apache doesn't use my suexec root. Instead, I'd need to go fetch the new Apache myself, compile it myself, and repeat this every single week whenever RedHat pushed a new Apache -- and if I ever got lazy or wasn't paying attention, all my websites would break!

I eventually left the RedHat ecosystem for Debian, which had an optional package, apache2-suexec-custom, that gave Apache a config file to set the suexec root at run time, so I didn't need to re-compile it.
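With that package, the suexec root becomes a plain config file, one per calling user. A sketch of what it looked like, with the file format recalled from memory (first line is the document root, second the userdir suffix):

```shell
$ sudo apt install apache2-suexec-custom
$ cat /etc/apache2/suexec/www-data
/home
public_html
```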

Anyway: building Linux apps is easy, but you don't want to maintain your own fork -- it was hard enough with just one program. On Android, building apps is significantly more complex: you need to download the full Android Studio SDK and run a bunch of steps, and you'd still end up maintaining your own builds of your apps forever, keeping on top of updates and rebuilding them all the time.

If you have 20 open source apps on Android and you want to build them all from source on every update... are you really going to? I doubt it. You'll either end up with grossly out-of-date apps (where the upstream company has released security patches and bugfixes, but you haven't gotten around to re-compiling all your apps in a while), or else you make "app compilation" into a full-time job.

Reproducible Builds, bro!

So to address the problem that "the binary release you get from the App Store might not perfectly reflect the source code it was allegedly built from," we have Reproducible Builds, which a lot of non-technical users (who are not software developers) tout as some kind of magic silver bullet.

The idea is that, with reproducible builds, you can download the source code to a program, build it to a binary, and get exactly the same binary bit-for-bit no matter who built it. So in Signal's case: you'd download the binary Signal release from the Google Play Store and take a checksum hash of it; then you'd compile your own Signal from source and hash your binary. With Reproducible Builds, the two hashes should be perfectly identical: your copy is bit-for-bit the same as the one Signal released to the app store, giving you confidence that Signal built their app from the exact same sources you did, with no room to sneak in a change that isn't represented in the source code.

This is all good, and it does work where it's been implemented, but Reproducible Builds are by no means "free": they require significant work from a program's original developers to make its builds reproducible.

I'll give you a practical example you can test yourself:

package main

import "fmt"

func main() {
	fmt.Println("Hello world!")
}

This is a "Hello world" program in the Go programming language. Very short and simple; you'd expect this program to compile to exactly the same binary every time, but no!

# When hello.go was at /home/kirsle/Documents
$ go build -o hello hello.go
$ md5sum hello

# Let's move hello.go into a different folder, and try again
$ cp hello.go /home/kirsle/Pictures/
$ cd /home/kirsle/Pictures
$ go build -o hello hello.go
$ md5sum hello

# To be clear, both md5 sums together:
$ md5sum ~/Documents/hello ~/Pictures/hello
05f11df0b806ad18078c4498af4a9f45  /home/kirsle/Pictures/hello
db0db0a8828d0768a7862938322fc94f  /home/kirsle/Documents/hello

How could such a simple program come out to two different binaries, when both were built on the same computer by the same user with the same software environment?

Because the folder the source code is in was different! If I build the binary from ~/Documents a second time, I do get a matching binary back out:

~/Documents$ go build -o hello hello.go
~/Documents$ go build -o hello2 hello.go
~/Documents$ md5sum hello hello2
db0db0a8828d0768a7862938322fc94f  hello
db0db0a8828d0768a7862938322fc94f  hello2

I picked on the Go programming language here because one of the things Go builds into your binary is the file paths of the source code, but issues like this affect all programming languages. Sometimes the date and time of the build will affect the output; sometimes the version of your C compiler will affect the output; sometimes the fact that you built on Fedora and I built on Ubuntu will give us different outputs. By default, software is not reproducible, and you actually need to go pretty far out of your way to design software that is. Seriously: just browse the Reproducible Builds website and read their guidelines on the problems you'll encounter and how to work around them.

What you end up doing for Reproducible Builds is defining a very strict build environment, like:

  • Using a very particular Linux distribution (e.g. Ubuntu 21.04)
  • Using very specific versions of all your software (same C compiler version, same versions of all shared libraries that go into your program, etc.)
  • Using a very specific build path: e.g. all code must live in /opt/source and be built from there, since building code in a user's home directory can change the result, as seen in the Go example above.
  • Sometimes, using a very specific system clock time: if you can't remove all time-sensitive parts in your code, but the system clock is still making your builds not reproducible, you need to control for the system clock as well.
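For the clock problem specifically, the Reproducible Builds project defines a convention that many build tools now honor: the SOURCE_DATE_EPOCH environment variable, which substitutes a fixed timestamp for "now". A common sketch is pinning it to the last commit date:

```shell
# Use the last git commit's timestamp instead of the wall clock;
# build tools that honor SOURCE_DATE_EPOCH will embed this date.
export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
```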

You end up with something resembling a full-fat virtual machine environment just to pin down all the moving parts. This all requires many hours of effort on the part of the software vendor; if you're Signal Co., your whole product is about security, and you have a large staff of developers to throw at the problem, you go through this effort. But if you're a random lone developer working on a small hobby project, you don't have the time or energy to do this. Reproducible Builds are not a golden-ticket solution to this problem because Reproducible Builds do not come cheap.

And even when an app does offer reproducible builds: how do you, the ordinary consumer, verify that your build of Signal's source code matches the app store binary? The version of Signal you built out of your home directory will not be bit-for-bit identical to the Play Store release. To verify the build yourself, you'll need to reproduce the precise dev environment that Signal built theirs under, and those are hard steps to follow even if you're a developer yourself and well versed in all the technologies involved.

And you're going to do that 20 times over for the 20 open source apps you run on Android? Yeah right.

So what can I do?

Start by defining what your threat model actually is: what data of yours do you want to keep secret, and from whom do you want to keep it? If your threat model involves hiding from nation-state intelligence agencies, good luck! If they want to get you, they'll get you. But unless you're Edward Snowden, this is probably not your threat model.

Do you just want to keep advertisers out of your biz? Keep Google from reading your chats and selling you ads and propaganda? Keep your boss from reading your emails? Keep your friends from going through your phone? For most of these kinds of "ordinary" threat models, you're probably fine just downloading Signal from the app store and not worrying too much about the whole thing.

But if you have a hard requirement that, absolutely 100% for certain, your chats can NOT leak: then this would be the perfect time to compile your own apps from source code, verify them with reproducible builds, audit the source yourself to make sure the security is tight, and so on -- and only for the specific apps that need it (doing all this work for 20 different apps would be too much, imho).

I'll just tell you about the balance I take with this personally:

  • Running a Linux distribution, most of my software was built by the maintainers of my distro, from open sources, and I trust them to do a good job and to respond quickly to security concerns. Fedora, Debian, Ubuntu etc. always tend to get patches out by the next day whenever security vulnerabilities are discovered.
  • On Android, I prefer to get my apps from the F-Droid open source app store rather than Google Play; for example, Nextcloud is available on both, and I prefer the F-Droid version. This is because F-Droid builds all apps themselves: developers submit their source code to F-Droid and F-Droid's maintainers build it, in a model similar to Linux distributions, so a developer would have a harder time sneaking in dirty code when it must be there in the source for F-Droid to see.
  • And I stay mindful about which apps I'm using and their features. I use Telegram as my main messenger, despite the fact that it is not end-to-end encrypted: normal chats are encrypted between you and the server, but on the server they are readable in clear text and the company could read them if they wanted. If I want to discuss something sensitive that I want E2E encryption for, I would use a different app better tailored for that, like Signal. Telegram is "private enough" for my everyday mundane conversations but I'm always conscious that it's not "truly end-to-end secure" and that my chats could leak or be subpoena'd one day.

