---
date: 2024-05-08
unlisted: true
---
<!-- h1
h2 term done
h2 asrf done check title
h2 exmp finish xplain
h2 scur check todos
h2 srpt done todo
h3 updt todo -->
# Using curl | bash safely

> The post you're about to read is the result of the research of \[REDACTED\] foxes in a trenchcoat.
> I don't know what I'm doing.
>
> Take everything I say here with a wheelbarrow of salt.
> Do your own research.
> Don't trust *one* person's opinion with the security of your infrastructure.

In April of 2024 I wrote [a post](./old-curlpipebash.md) on Fedi explaining that using `curl | bash` was not a security risk.
I based my original argument on the fact that you ultimately have to trust the person that provides you the code.

A bit later, I discussed the same topic with two people in a Matrix channel.
They showed me an attack vector that makes using `curl | bash` potentially dangerous.
This caused me to do further research on the topic, and ultimately to write this post.

<!-- which is true, but *incomplete*. -->
<!-- But is it actually dangerous ? -->
<!-- Is the cake a lie ? -->
<!-- Well, as you could probably imagine, it turns out that the answer actually is, "it depends". -->
<!-- I'll talk about what the actual dangers of using `curl | bash` are, and how we can mitigate them. -->

> TL;DR: If you're here because you just want to download software, go for it.
> You're *probably* going to be just fine.
> However, if you're interested in information security or want to implement a `curl | bash` script, please read on.

## Terminology

- Software artifact: anything that comes out of your repository: code, shell scripts, binaries, etc.
  In this post I will mostly focus on the shell script that installs your binaries.
- Signing authority: a server that hosts the artifact's cryptographic hash or signature.
- Artifact provider: a server that serves the artifact directly to us.

## Attack surface
We can establish a simplified supply chain for a software artifact:

```text
/----------\         /--------\         /--------\
| Artifact | ------> | Server | ------> | Client |
\----------/    |    \--------/    |    \--------/
    (1)        (2)      (3)       (4)      (5)
```

A malicious actor could compromise the supply chain by attacking:

1. The machine the artifact is built on;
2. The connection between the artifact builder and the server;
3. The machine that serves the artifact to the client;
4. The connection between the server and the client;
5. The client that requests the artifact.

For the purpose of this post, however, attack vectors (1), (2) and (5) are out of scope, which leaves us with only (3) and (4).

> There's not a lot that can be leveraged then? So I'd imagine using `curl | bash` is safe *most of the time*.

Precisely. *Most of the time*.

## An example script
We'll use this script as an example for the rest of this post:
```bash
curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install
```

This command runs the Determinate Nix Installer, an alternative installer for the Nix package manager. Let's break it down a bit:

- `curl`: Call the `curl` command-line utility; this will make an HTTPS request;
- `--proto '=https'`: Only allow the HTTPS protocol (plain HTTP is refused);
- `--tlsv1.2`: Only connect to the server through a secure tunnel (TLS v1.2 or later);
- `-sSf -L`: Do not output progress updates (`-s`), still show errors (`-S`), fail on HTTP errors instead of outputting the server's error page (`-f`), and follow redirects (`-L`);
- `https://install.determinate.systems/nix`: The URL that points to an installation script;
- `|`: Pass whatever `curl` outputs on to the next command;
- `sh`: Execute whatever `curl` gets from the server;
- `-s`: Tell `sh` to read the script from its standard input;
- `-- install`: Pass `install` as an argument to the downloaded script.

We can see that the command explicitly requires `curl` to use a secure connection.
At first glance, this seems like a secure way to run the installer.
However, it doesn't check that the script you're downloading is what it should be.
If the server is compromised in some way, we could be downloading malware instead.
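
One more subtlety with the pipe itself: the shell starts `sh` right away and feeds it whatever `curl` manages to output, so a download that breaks halfway can still get its first half executed. Here's a small sketch that simulates this without touching the network. `fake_download` is a made-up stand-in for `curl`, not a real command.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a download that dies halfway:
# it emits the first line of a script, then fails the way curl would.
fake_download() {
    printf 'echo "partial script ran"\n'
    return 18   # curl's exit code for a partial transfer
}

# The shell on the right-hand side runs what it received,
# even though the left-hand side of the pipe failed.
output=$(fake_download | sh)
echo "$output"
```

This is why many installer scripts wrap their whole body in a function that only gets called on the very last line.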
## Securing our infrastructure

We can mitigate this risk with a method used by most package managers: using two different servers with different functions:

- One that hosts the artifact's cryptographic hash or signature (here called the *signing authority*);
- And another one that serves the artifact directly to us (here called the *artifact provider*).

This way, if either server is compromised, the artifact served to the client will fail verification and will therefore not be run.

We can drastically reduce the risk of getting both machines compromised at once by:

- Having them be controlled by two different entities (companies and/or persons);
- Having them be managed by two different system administrators;
- Using different data centers, network routes, domains and SSL certificates;
- Using different operating systems;
- Using different HTTP servers;
- Using different configurations.

This way, the only things we have to trust are that the artifacts uploaded to the servers are healthy, and that **both** servers are not compromised at once (which should be overwhelmingly unlikely if they are separate and different enough).

Now, our infrastructure looks like this:

```text
                    /-----------\
                    |  Signing  |
                /-> | authority | --\
/----------\    |   \-----------/   \---> /--------\
| Artifact | ---|                         | Client |
\----------/    |   /-----------\   /---> \--------/
                \-> | Artifact  | --/
                    | provider  |
                    \-----------/
```

> There are still other parameters that I won't bother bringing into the picture right now, like the SSL certificate provider and, of course, the way the servers get the artifact in the first place (which depends on how your script is written and how and where your software is built).

An example infrastructure would look like this:

- Signing authority
  - Managed by John Doe
  - Hosted by DigitalOcean (Germany)
  - OS: NixOS
  - HTTP server: Nginx
  - Domain: `determinate.systems`
- Signing authority (alternative)
  - Managed by gitea.com
  <!-- - Hosted by DigitalOcean (Germany) --> TODO
  <!-- - OS: NixOS -->
  <!-- - HTTP server: Nginx -->
  - Domain: `gitea.com`
- Artifact provider
  - Managed by Jane Poe
  - Hosted by a worldwide CDN (Hetzner) TODO
  - OS: RHEL
  - HTTP server: Apache
  - Domain: `install-determinate.systems`

> Notice the artifact is now on a different domain (`install-determinate.systems`) and not on a subdomain like it was previously (`install.determinate.systems`).
> That means that the two servers need to use two different SSL certificates.

Now, compromising this part of the supply chain has become extremely hard. The attacker will need to either:

- Have technical knowledge of NixOS, RHEL, Nginx and Apache, as well as compromise an entire CDN (TODO);
- Compromise both sysadmins' machines by hacking them or through social engineering;
- TODO
- Use several of the methods listed above.

At this point, it would be a lot more feasible to attack another part of the supply chain, which is a subject for another post.

## Implementing curl | bash safely

> You've spent so much time explaining that `curl | bash` is insecure, why would we bother making a secure version of it?

Because the alternative is to package your software for every distro and package manager under the sun, a task that sends shivers down my spine just thinking about it.
<!-- check wording -->
Making a shell script that leverages this infrastructure isn't actually that hard.
Most of the work is around creating two resilient and independent servers.
All we have to do is check the artifact provider's response against a hash or a signature provided by the signing authority.
<!-- TODO: host scripts on the blog -->
```bash
# good script
CURL=$(curl --tlsv1.3 https://pastebin.com/raw/Tity9gDQ)
# bad script
# CURL=$(curl --tlsv1.3 https://pastebin.com/raw/xYTmzaMQ)

# SHA-256 expected for the good script (recompute it with the pipeline below if it does not match)
EXPECTED='caa42ef74ba42d3d097bfcd7c718cd22ca807c1116ce1f86b00ecce9337858d7'
# Quote "$CURL" so newlines and whitespace survive; keep only the hash column
ACTUAL=$(echo "$CURL" | sha256sum | cut -d ' ' -f 1)

if [ "$EXPECTED" = "$ACTUAL" ]; then
    echo "$CURL" | bash
else
    echo "Checksum mismatch" >&2
    exit 1
fi
```
This can be minified a bit, but it's more readable like that.
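
The script above hardcodes the expected hash. Here's a rough sketch of a variant that fetches the hash from the signing authority instead, so the two servers described earlier both get involved. Everything named here is an assumption of mine: the two `.example` URLs don't exist, `fetch`, `verify_sha256` and `install_verified` are made-up helper names, and I'm assuming the signing authority serves a file in `sha256sum` format and that GNU `sha256sum` is available on the client.

```bash
#!/usr/bin/env bash
# Hypothetical URLs: neither exists, they only illustrate the two roles.
SIGNING_AUTHORITY_URL='https://signing-authority.example/install.sh.sha256'
ARTIFACT_PROVIDER_URL='https://artifact-provider.example/install.sh'

# Fetch a URL over HTTPS only, failing on HTTP errors.
fetch() {
    curl --proto '=https' --tlsv1.2 -sSf -L "$1"
}

# Succeed only if the SHA-256 of file $1 equals the hex digest $2.
verify_sha256() {
    [ "$(sha256sum "$1" | cut -d ' ' -f 1)" = "$2" ]
}

# The body runs in a subshell, so its variables and trap stay local to it.
install_verified() (
    tmp=$(mktemp) || exit 1
    trap 'rm -f "$tmp"' EXIT

    # The hash and the artifact come from two different servers.
    expected=$(fetch "$SIGNING_AUTHORITY_URL") || exit 1
    expected=${expected%% *}   # keep the digest, drop any trailing file name
    fetch "$ARTIFACT_PROVIDER_URL" > "$tmp" || exit 1

    if verify_sha256 "$tmp" "$expected"; then
        bash "$tmp"
    else
        echo "Checksum mismatch, refusing to run" >&2
        exit 1
    fi
)

# Usage (commented out, since the URLs above are placeholders):
# install_verified
```

Saving the artifact to a temporary file before hashing it means the bytes that get verified are exactly the bytes that get run.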
### Updating the script
<!-- TODO check howto do that -->

When a new artifact is available, the artifact provider has to start hosting it.
Then, the signing authority needs to get the artifact's hash (directly from the source) and update the published script (in the Git repository or on the website).
Preferably, the artifact provider should include the artifact's version in its URL and keep hosting non-vulnerable versions. That way, the script will still work before the signing authority finishes its work, and after another update is released.
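
As a rough illustration of the two pieces involved in an update, here's a sketch with a versioned URL on the provider side and the hash computation on the signing authority side. The URL layout and the function names `artifact_url` and `artifact_digest` are assumptions of mine, not an established convention, and the digest is meant to be computed from a copy of the artifact obtained from the source rather than from the artifact provider.

```bash
#!/usr/bin/env bash
# Hypothetical URL layout: the version is part of the path,
# so older releases stay reachable after an update.
artifact_url() {
    printf 'https://artifact-provider.example/%s/install.sh\n' "$1"
}

# Print the SHA-256 digest the signing authority would publish
# for a local copy of the artifact (file path in $1).
artifact_digest() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# Usage:
#   artifact_url 1.4.0          # https://artifact-provider.example/1.4.0/install.sh
#   artifact_digest ./install.sh
```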