---
title: "Anything can be a message queue if you use it wrongly enough"
date: 2023-06-04
tags:
 - aws
 - cursed
 - tuntap
 - satire
---

<div class="warning"><xeblog-conv name="Cadey" mood="coffee"
standalone>Hi, <span id="hnwarning">readers</span>! This post is
satire. Don't treat it as something that is viable for production
workloads. By reading this post you agree to never implement or use
this accursed abomination. This article is released to the public for
educational reasons. Please do not attempt to recreate any of the
absurd acts referenced here.</xeblog-conv></div>

<xeblog-hero ai="Ligne Claire" file="nihao-xiyatu" prompt="1girl, green hair, green eyes, landscape, hoodie, backpack, space needle"></xeblog-hero>

<script>
if (document.referrer.match(/news.ycombinator.com/)) {
  document.getElementById("hnwarning").innerText = "Hacker News users";
}
</script>

You may think that the world is in a state of relative peace. Things
look like they are somewhat stable, but reality couldn't be farther
from the truth. There is an enemy out there that transcends time,
space, logic, reason, and lemon-scented moist towelettes. That enemy
is a scourge of cloud costs, and it is likely the single biggest
reason why startups die from their cloud bills while they are so
young.

The enemy is [Managed NAT
Gateway](https://aws.amazon.com/blogs/aws/new-managed-nat-network-address-translation-gateway-for-aws/).
It is a service that lets you egress traffic from a VPC to the public
internet at $0.07 per gigabyte. This is something that is probably
literally free for them to run, but it ends up eating a huge chunk of
their customers' cloud spend. Customers don't even look too deeply
into this because they just shrug it off as the cost of doing
business.

This one service has allowed companies like [The Duckbill
Group](https://www.duckbillgroup.com/) to make _millions_ by showing
companies how to not spend as much on the cloud.

However, I think I can do one better. What if there was a _better_
way? What if you could reduce that cost for your own services by up to
700%? What if you could bypass those pesky network egress costs yet
still reach your machines over normal IP packets?

<xeblog-conv name="Aoi" mood="coffee">Really, if you are trying to
avoid Managed NAT Gateway in production for egress-heavy workloads
(such as webhooks that need to come from a common IP address), you
should be using a [Tailscale](https://www.tailscale.com) [exit
node](https://tailscale.com/kb/1103/exit-nodes/) with a public
IPv4/IPv6 address attached to it. If you also attach this node to the
same VPC as your webhook egress nodes, you can basically recreate
Managed NAT Gateway at home. You also get the added benefit of 
encrypting your traffic further on the wire.<br /><br />This is the
only thing in this article that you can safely copy into your
production workloads.</xeblog-conv>

## Base facts

Before I go into more detail about how this genius creation works,
here's some things to consider:

When AWS launched originally, it had three services:

- [S3](https://en.wikipedia.org/wiki/Amazon_S3) - Object storage for
  cloud-native applications
- [SQS](https://en.wikipedia.org/wiki/Amazon_Simple_Queue_Service) - A
  message queue
- [EC2](https://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud) -
  A way to run Linux virtual machines somewhere
  
Of those foundational services, I'm going to focus the most on S3: the
Simple Storage Service. In essence, S3 is `malloc()` for the cloud.

<xeblog-conv name="Mara" mood="hacker" standalone>If you already know
what S3 is, please click [here](#postcloud) to skip this explanation.
It may be worth revisiting this if you do though!</xeblog-conv>

### The C programming language

When using the C programming language, you normally work with memory
on the stack. This memory is ephemeral: once you return from the
current function, everything in its stack frame is no longer reachable
(and will presumably be overwritten). You can do many things with
this, but stack memory alone isn't enough when data needs to outlive
the function that created it. To work around this (and reliably pass
mutable values between functions), you use the
[`malloc()`](https://www.man7.org/linux/man-pages/man3/malloc.3.html)
function. `malloc()` takes in the number of bytes you want to allocate
and returns a pointer to the region of memory that was allocated.

<xeblog-conv name="Aoi" mood="sus">Huh? That seems a bit easy for C.
Can't allocating memory fail when there's no more free memory to
allocate? How do you handle that?</xeblog-conv>
<xeblog-conv name="Mara" mood="happy">Yes, allocating memory can
fail. When it does fail it returns a null pointer and sets the
[errno](https://www.man7.org/linux/man-pages/man3/errno.3.html)
superglobal variable to the constant `ENOMEM`. From here all behavior
is implementation-defined.</xeblog-conv>
<xeblog-conv name="Aoi" mood="coffee">Isn't "implementation-defined"
code for "it'll probably crash"?</xeblog-conv>
<xeblog-conv name="Mara" mood="hacker">In many cases: yes most of the
time it will crash. Hard. Some applications are smart enough to handle
this more gracefully (IE: try to free memory or run a garbage
collection run), but in many cases it doesn't really make more sense
to do anything but crash the program.</xeblog-conv>
<xeblog-conv name="Aoi" mood="facepalm">Oh. Good. Just what I wanted
to hear.</xeblog-conv>

When you get a pointer back from `malloc()`, you can store anything in
there, as long as it's no longer than the size you asked for.

<xeblog-conv name="Numa" mood="delet" standalone>Fun fact: if you
overwrite the bounds you passed to `malloc()` and anything involved in
the memory you are writing is user input, congradtulations: you just
created a way for a user to either corrupt internal application state
or gain arbitrary code execution. A similar technique is used in
The Legend of Zelda: Ocarina of Time speedruns in order to get
arbitrary code execution via [Stale Reference
Manipulation](https://www.zeldaspeedruns.com/oot/srm/srm-overview).</xeblog-conv>

Oh, also, anything stored through that pointer you got back from
`malloc()` lives in an area of RAM called "the heap", which is
moderately slower to access than the stack.

### S3 in a nutshell

Much in the same way, S3 lets you allocate space for and submit
arbitrary bytes to the cloud, then fetch them back with an address.
It's a lot like the `malloc()` function for the cloud. You can put
bytes there and then refer to them between cloud functions.

<xeblog-conv name="Mara" mood="hacker" standalone>The bytes are stored
in the cloud, which is slightly slower to read from than it would be
to read data out of the heap.</xeblog-conv>
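
If you've never poked S3 from code before, here's roughly what that
looks like with the [AWS SDK for Go
v2](https://pkg.go.dev/github.com/aws/aws-sdk-go-v2): put some bytes
under a key, then read them back later (possibly from a completely
different program). The bucket and key names here are made up for
illustration.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// "Allocate" some bytes in the cloud.
	_, err = client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String("my-example-bucket"),
		Key:    aws.String("scratch/hello"),
		Body:   strings.NewReader("hello from the cloud heap"),
	})
	if err != nil {
		log.Fatal(err)
	}

	// "Dereference" them by key later.
	out, err := client.GetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String("my-example-bucket"),
		Key:    aws.String("scratch/hello"),
	})
	if err != nil {
		log.Fatal(err)
	}
	defer out.Body.Close()

	data, err := io.ReadAll(out.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data))
}
```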

And these arbitrary bytes can be _anything_. S3 is usually used for
hosting static assets (like all of the conversation snippet avatars
that a certain website with an orange background hates), but nothing
is stopping you from using it to host literally anything you want.
Logging things into S3 is so common it's literally a core product
offering from Amazon. Your billing history goes into S3. If you
download your tax returns from WealthSimple, it's probably downloading
the PDF files from S3. VRChat avatar uploads and downloads are done
via S3.

<xeblog-conv name="Mara" mood="happy" standalone>It's like an FTP
server but you don't have to care about running out of disk space on
the FTP server!</xeblog-conv>

### IPv6

You know what else is bytes? [IPv6
packets](https://en.wikipedia.org/wiki/IPv6_packet). When you send an
IPv6 packet to a destination on the internet, the kernel will prepare
and pack a bunch of bytes together to let the destination and
intermediate hops (such as network routers) know where the packet
comes from and where it is destined to go.

Normally, IPv6 packets are handled by the kernel and submitted to a
queue for a hardware device to send out over some link to the
Internet. This works for the majority of networks because they deal
with hardware dedicated to slinging bytes around, or in some cases
shouting them through the air (such as when you use Wi-Fi or a mobile
phone's networking card).

<xeblog-conv name="Aoi" mood="coffee">Wait, did you just say that
Wi-Fi is powered by your devices shouting at each other?</xeblog-conv>
<xeblog-conv name="Cadey" mood="aha">Yep! Wi-Fi signal strength is
measured in decibels even!</xeblog-conv>
<xeblog-conv name="Numa" mood="delet">Wrong. Wi-Fi is more accurately
_light_, not _sound_. It is much more accurate to say that the devices
are _shining_ at each other. Wi-Fi is the product of radio waves, which
are the same thing as light (but it's so low frequency that you can't
see it). Boom. Roasted.</xeblog-conv>

### The core Unix philosophy: everything is a file

<span id="postcloud"></span>
There is a way to bypass this and have software control how network
links work, and for that we need to think about Unix conceptually for
a second. In the hardcore Unix philosophical view: everything is a
file. Hard drives and storage devices are files. Process information
is viewable as files. Serial devices are files. This core philosophy
is rooted at the heart of just about everything in Unix and Linux
systems, and it makes applications a lot easier to program. The same
API can be used for writing to files, tape drives, serial ports, and
network sockets. This makes everything conceptually simpler and makes
reusing software for new purposes trivial.

<xeblog-conv name="Mara" mood="hacker" standalone>As an example of
this, consider the
[`tar`](https://man7.org/linux/man-pages/man1/tar.1.html) command. The
name `tar` stands for "Tape ARchive". It was a format that was created
for writing backups [to actual magnetic tape
drives](https://en.wikipedia.org/wiki/Tape_drive). Most commonly, it's
used to download source code from GitHub or as an interchange format
for downloading software packages (or other things that need to put
multiple files in one distributable unit).</xeblog-conv>

In Linux, you can create a
[TUN/TAP](https://en.wikipedia.org/wiki/TUN/TAP) device to let
applications control how network or datagram links work. In essence,
it lets you create a file descriptor that you can read packets from
and write packets to. As long as you get the packets to their intended
destination somehow and get any other packets that come back to the
same file descriptor, the implementation isn't relevant. This is how
OpenVPN, ZeroTier, FreeLAN, Tinc, Hamachi, WireGuard and Tailscale
work: they read packets from the kernel, encrypt them, send them to
the destination, decrypt incoming packets, and then write them back
into the kernel.
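
To make that concrete, here's a minimal sketch of opening a TUN device
and reading packets from it in Go, using the third-party
[water](https://pkg.go.dev/github.com/songgao/water) package (an
assumption on my part; the programs mentioned above may do this very
differently). It needs `CAP_NET_ADMIN` to run.

```go
package main

import (
	"log"

	"github.com/songgao/water"
)

func main() {
	cfg := water.Config{DeviceType: water.TUN}
	cfg.Name = "hoshino0" // Linux-only field; the name is illustrative

	ifce, err := water.New(cfg)
	if err != nil {
		log.Fatal(err)
	}

	buf := make([]byte, 9000)
	for {
		// Each Read hands you one raw IP packet the kernel wants to send.
		n, err := ifce.Read(buf)
		if err != nil {
			log.Fatal(err)
		}

		// A real tunnel would ship buf[:n] somewhere useful and write any
		// packets that come back with ifce.Write(...).
		log.Printf("read a %d byte packet from %s", n, ifce.Name())
	}
}
```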

### In essence

So, putting this all together:

* S3 is `malloc()` for the cloud, allowing you to share arbitrary
  sequences of bytes between consumers.
* IPv6 packets are just bytes like anything else.
* TUN devices let you have arbitrary application code control how
  packets get to network destinations.

In theory, all you'd need to do to save money on your network bills
would be to read packets from the kernel, write them to S3, and then
have another loop read packets back out of S3 and write them into the
kernel. You'd just have to wire things up in the right way.
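
In pseudocode-ish Go, the whole scheme fits in two goroutines.
Everything named here (`packetStore`, `keyFor`, `PopMine`) is a
made-up stand-in for the real pieces described later in this post, not
Hoshino's actual API:

```go
package main

import (
	"context"
	"io"
	"time"
)

// packetStore stands in for the S3 put/list/get calls.
type packetStore interface {
	Put(ctx context.Context, key string, pkt []byte) error
	PopMine(ctx context.Context) ([][]byte, error) // packets addressed to this node
}

// pump wires a TUN device to the store. Error handling is elided.
func pump(ctx context.Context, tun io.ReadWriter, store packetStore, keyFor func([]byte) string) {
	// Egress: kernel -> S3.
	go func() {
		buf := make([]byte, 9000)
		for {
			n, err := tun.Read(buf)
			if err != nil {
				return
			}
			pkt := append([]byte(nil), buf[:n]...)
			_ = store.Put(ctx, keyFor(pkt), pkt)
		}
	}()

	// Ingress: S3 -> kernel.
	go func() {
		for {
			pkts, _ := store.PopMine(ctx)
			for _, pkt := range pkts {
				_, _ = tun.Write(pkt)
			}
			time.Sleep(time.Second) // the real thing paces this with cardio (see below)
		}
	}()
}
```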

So I did just that.

Here's some of my friends' reactions to that list of facts:

- I feel like you've just told me how to build a bomb. I can't believe
  this actually works but also I don't see how it wouldn't. This is
  evil.
- It's like using a warehouse like a container ship. You've put a
  warehouse on wheels.
- I don't know what you even mean by that. That's a storage method.
  Are you using an extremely generous definition of "tunnel"?
- sto psto pstop stopstops
- We play with hypervisors and net traffic often enough that we know
  that this is something someone wouldn't have thought of.
- Wait are you planning to actually *implement and use* ipv6 over
  s3?
- We're paying good money for these shitposts :)
- Is routinely coming up with cursed ideas a requirement for working
  at tailscale?
- That is horrifying. Please stop torturing the packets. This is a
  violation of the Geneva Convention.
- Please seek professional help.

<xeblog-conv name="Cadey" mood="enby" standalone>Before any of you
ask, yes, this was the result of a drunken conversation with [Corey
Quinn](https://twitter.com/quinnypig).</xeblog-conv>

## Hoshino

Hoshino is a system for putting outgoing IPv6 packets into S3 and then
reading incoming IPv6 packets out of S3 in order to avoid the absolute
dreaded scourge of Managed NAT Gateway. It is a travesty of a tool
that does work, if only barely.

The name is a reference to the main character of the anime [Oshi no
Ko](https://en.wikipedia.org/wiki/Oshi_no_Ko), Hoshino Ai. Hoshino is
an absolute genius who works as a pop idol for the group B-Komachi.

Hoshino is a shockingly simple program. It creates a TUN device,
configures the OS networking stack so that programs can use it, and
then starts up two threads to handle reading packets from the kernel
and writing packets into the kernel.

When it starts up, it creates a new TUN device named `hoshino0` by
default, or an administrator-defined name set with a command-line
flag. This interface is only intended to forward IPv6 traffic.

Each node derives its IPv6 address from the
[`machine-id`](https://www.man7.org/linux/man-pages/man5/machine-id.5.html)
of the system it's running on. This means that you can somewhat
reliably guarantee that every node on the network has a unique address
that you can easily guess (the provided ULA /64 and then the first
half of the `machine-id` in hex). Future improvements may include
publishing these addresses into DNS via Route 53.
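
Here's a rough sketch of that derivation (using the ULA /64 that shows
up later in this post; Hoshino's exact bit-twiddling may differ):

```go
package main

import (
	"encoding/hex"
	"fmt"
	"log"
	"net/netip"
	"os"
	"strings"
)

// nodeAddr appends the first half of /etc/machine-id (8 bytes of hex) to a
// ULA /64 prefix, giving every machine a stable, guessable IPv6 address.
func nodeAddr(ula netip.Prefix) (netip.Addr, error) {
	raw, err := os.ReadFile("/etc/machine-id")
	if err != nil {
		return netip.Addr{}, err
	}
	id := strings.TrimSpace(string(raw)) // 32 hex characters
	if len(id) < 16 {
		return netip.Addr{}, fmt.Errorf("machine-id too short: %q", id)
	}
	host, err := hex.DecodeString(id[:16]) // first half: 8 bytes
	if err != nil {
		return netip.Addr{}, err
	}
	addr := ula.Addr().As16()
	copy(addr[8:], host) // lower 64 bits come from the machine-id
	return netip.AddrFrom16(addr), nil
}

func main() {
	addr, err := nodeAddr(netip.MustParsePrefix("fd5e:59b8:f71d:9a3e::/64"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(addr)
}
```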

It configures the OS networking stack with that address using a
[netlink](https://en.wikipedia.org/wiki/Netlink) socket. Netlink is a
Linux-specific socket family that allows userspace applications to
configure the network stack, communicate with the kernel, and
communicate between processes. Netlink sockets never leave the host
they are created on, and unlike Unix sockets, which are addressed by
filesystem paths, netlink sockets are addressed by process ID numbers.

In order to configure the `hoshino0` device with netlink, Hoshino does
the following things (a rough sketch of the equivalent calls follows
the list):

- Adds the node's IPv6 address to the `hoshino0` interface
- Enables the `hoshino0` interface to be used by the kernel
- Adds a route to the IPv6 subnet via the `hoshino0` interface
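
Sketched out with the
[vishvananda/netlink](https://pkg.go.dev/github.com/vishvananda/netlink)
package (my assumption; Hoshino may drive netlink some other way),
those three steps look roughly like this. It has to run with
`CAP_NET_ADMIN`, and the addresses are the example ULA from this post:

```go
package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	link, err := netlink.LinkByName("hoshino0")
	if err != nil {
		log.Fatal(err)
	}

	// 1. Add the node's IPv6 address to the interface. It's added as a
	//    /128 here so the /64 route below isn't redundant; the real code
	//    may do this differently.
	addr, err := netlink.ParseAddr("fd5e:59b8:f71d:9a3e:c05f:7f48:de53:428f/128")
	if err != nil {
		log.Fatal(err)
	}
	if err := netlink.AddrAdd(link, addr); err != nil {
		log.Fatal(err)
	}

	// 2. Bring the interface up.
	if err := netlink.LinkSetUp(link); err != nil {
		log.Fatal(err)
	}

	// 3. Route the whole ULA /64 out the interface.
	_, dst, err := net.ParseCIDR("fd5e:59b8:f71d:9a3e::/64")
	if err != nil {
		log.Fatal(err)
	}
	route := &netlink.Route{LinkIndex: link.Attrs().Index, Dst: dst}
	if err := netlink.RouteAdd(route); err != nil {
		log.Fatal(err)
	}
}
```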

Then it configures the AWS API client and kicks off both of the main
loops that handle reading packets from and writing packets to the
kernel.

When uploading packets to S3, the key for each packet is derived from
the destination IPv6 address (parsed from outgoing packets using the
handy library
[gopacket](https://pkg.go.dev/github.com/google/gopacket)) and the
packet's unique ID (a
[ULID](https://pkg.go.dev/github.com/oklog/ulid/v2) to ensure that
packets are lexicographically sortable, which will be important to
ensure in-order delivery in the other loop).
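
Something like this, give or take (the exact key layout is my guess at
what's described above, not Hoshino's verbatim code):

```go
package main

import (
	"errors"

	"github.com/google/gopacket"
	"github.com/google/gopacket/layers"
	"github.com/oklog/ulid/v2"
)

// keyFor builds an S3 object key for an outgoing packet: the destination
// address as a prefix (so the receiving node can list only its own
// traffic) plus a ULID so keys sort lexicographically in send order.
func keyFor(pkt []byte) (string, error) {
	parsed := gopacket.NewPacket(pkt, layers.LayerTypeIPv6, gopacket.Default)
	ip6, ok := parsed.Layer(layers.LayerTypeIPv6).(*layers.IPv6)
	if !ok {
		return "", errors.New("not an IPv6 packet")
	}
	return ip6.DstIP.String() + "/" + ulid.Make().String(), nil
}
```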

When packets are processed, they are added to a
[bundle](https://pkg.go.dev/within.website/x/internal/bundler) for
later processing by the kernel. This is relatively boring code and
understanding it is mostly an exercise for the reader. `bundler` is
based on the Google package
[`bundler`](https://pkg.go.dev/google.golang.org/api/support/bundler),
but modified to use generic types because the original
implementation of `bundler` predates them.

### cardio

However, the last major part of understanding the genius at play here
is the use of [cardio](https://pkg.go.dev/within.website/x/cardio).
Cardio is a Go utility that gives you a "heartbeat" for events that
should happen every so often, while letting you influence the rate
based on need. This lets you speed up the rate when there is more work
to be done (such as when packets are found in S3), and reduce the rate
when there is no more work to be done (such as when no packets are
found in S3).

<xeblog-conv name="Aoi" mood="coffee" standalone>Okay, this is also
probably something that you can use outside of this post, but I
promise there won't be any more of these!</xeblog-conv>

When using cardio, you create the heartbeat channel and signals like
this:

```go
heartbeat, slower, faster := cardio.Heartbeat(ctx, time.Minute, time.Millisecond)
```

The first argument to `cardio.Heartbeat` is a
[`context`](https://pkg.go.dev/context) that lets you cancel the
heartbeat loop. Additionally, if your application uses
[`ln`](https://xeiaso.net/blog/ln-the-natural-logger-2020-10-17)'s
[`opname`](https://pkg.go.dev/within.website/ln/opname) facility, an
[`expvar`](https://pkg.go.dev/expvar) gauge will be created and named
after that operation name.

The next two arguments are the minimum and maximum heart rate. In this
example, the heartbeat would range between once per minute and once
per millisecond.

When you signal the heart rate to speed up, it will double the rate.
When you signal the heart rate to slow down, it will halve the rate.
This lets applications spike up and gradually slow down as demand
changes, much like how the human heart will speed up with exercise and
gradually slow down as you stop exercising.

When the heart rate is too high for the amount of work there is to do
(much like tachycardia in the human heart), it will automatically back
off and signal the heart rate to slow down (much like I wish would
happen to me sometimes).
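
Putting the pieces together, the S3-polling loop might consume the
heartbeat something like this. `checkS3ForPackets` and
`deliverToKernel` are hypothetical helpers, and I'm assuming `slower`
and `faster` are plain functions as returned by the snippet above:

```go
for range heartbeat {
	pkts, err := checkS3ForPackets(ctx) // hypothetical: list + get + delete
	if err != nil {
		slower()
		continue
	}

	if len(pkts) > 0 {
		deliverToKernel(pkts) // hypothetical: write each packet to the TUN device
		faster()              // more work showed up: beat faster
	} else {
		slower() // nothing to do: calm down
	}
}
```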

This is a package that I always wanted to have exist, but never found
the need to write for myself until now.

### Terraform

Like any good recovering SRE, I used
[Terraform](https://www.terraform.io/) to automate creating
[IAM](https://aws.amazon.com/iam/) users and security policies for
each of the nodes on the Hoshino network. This also was used to create
the S3 bucket. Most of the configuration is fairly boring, but I did
run into an issue while creating the policy documents that I feel is
worth pointing out here.

I made the "create a user account and policies for that account" logic
into a Terraform module because that's how you get functions in
Terraform. It looked like this:

```hcl
data "aws_iam_policy_document" "policy" {
  statement {
    actions = [
      "s3:GetObject",
      "s3:PutObject",
      "s3:ListBucket",
    ]
    effect = "Allow"
    resources = [
      var.bucket_arn,
    ]
  }

  statement {
    actions   = ["s3:ListAllMyBuckets"]
    effect    = "Allow"
    resources = ["*"]
  }
}
```

When I tried to use it, things didn't work. I had given the user
permission to write to and read from the bucket, but I kept being told
that I didn't have permission to do either operation. This happened
because my statement only named the bucket's ARN: `s3:GetObject` and
`s3:PutObject` apply to object ARNs, so I could list the bucket but
not touch any path INSIDE the bucket. In order to fix this, I needed
to make my policy statement look like this:

```hcl
statement {
  actions = [
    "s3:GetObject",
    "s3:PutObject",
    "s3:ListBucket",
  ]
  effect = "Allow"
  resources = [
    var.bucket_arn,
    "${var.bucket_arn}/*", # allow every file in the bucket
  ]
}
```

This does let you do a few cool things, though: you can use it to
create per-node credentials in IAM that can only write to their own
part of the bucket. I can easily see how this can be used to give you
infinite flexibility in what you want to do, but good lord was it
inconvenient to find this out the hard way.

Terraform also configured the lifecycle policy for objects in the
bucket to delete them after a day.

```hcl
resource "aws_s3_bucket_lifecycle_configuration" "hoshino" {
  bucket = aws_s3_bucket.hoshino.id

  rule {
    id = "auto-expire"

    filter {}

    expiration {
      days = 1
    }

    status = "Enabled"
  }
}
```

<xeblog-conv name="Cadey" mood="coffee" standalone>If I could, I would
set this to a few hours at most, but the minimum granularity for S3
lifecycle enforcement is in days. In a loving world, this should be a
sign that I am horribly misusing the product and should stop. I did
not stop.</xeblog-conv>

### The horrifying realization that it works

Once everything was implemented and I fixed the last bugs related to
[the efforts to make Tailscale faster than kernel
wireguard](https://tailscale.com/blog/more-throughput/), I tried to
ping something. I set up two virtual machines with
[waifud](https://xeiaso.net/blog/series/waifud) and installed Hoshino.
I configured their AWS credentials and then started it up. Both
machines got IPv6 addresses and they started their loops. Nervously, I
ran a ping command:

```
xe@river-woods:~$ ping fd5e:59b8:f71d:9a3e:c05f:7f48:de53:428f
PING fd5e:59b8:f71d:9a3e:c05f:7f48:de53:428f(fd5e:59b8:f71d:9a3e:c05f:7f48:de53:428f) 56 data bytes
64 bytes from fd5e:59b8:f71d:9a3e:c05f:7f48:de53:428f: icmp_seq=1 ttl=64 time=2640 ms
64 bytes from fd5e:59b8:f71d:9a3e:c05f:7f48:de53:428f: icmp_seq=2 ttl=64 time=3630 ms
64 bytes from fd5e:59b8:f71d:9a3e:c05f:7f48:de53:428f: icmp_seq=3 ttl=64 time=2606 ms
```

It worked. I successfully managed to send ping packets over Amazon S3.
At the time, I was in an airport dealing with the aftermath of [Air
Canada's IT system falling the heck
over](https://www.cbc.ca/news/business/air-canada-outage-1.6861923),
and the sheer relief I felt was better than drugs.

<xeblog-conv name="Cadey" mood="coffee" standalone>Sometimes I wonder
if I'm an adrenaline junkie for the unique feeling that you get when
your code finally works.</xeblog-conv>

Then I tested TCP. Logically, if ping packets work, then TCP should
too. It would be slow, but nothing in theory would stop it. I decided
to test my luck and tried to open the other node's metrics page:

```
$ curl http://[fd5e:59b8:f71d:9a3e:c05f:7f48:de53:428f]:8081
# skipping expvar "cmdline" (Go type expvar.Func returning []string) with undeclared Prometheus type
go_version{version="go1.20.4"} 1
# TYPE goroutines gauge
goroutines 208
# TYPE heartbeat_hoshino.s3QueueLoop gauge
heartbeat_hoshino.s3QueueLoop 500000000
# TYPE hoshino_bytes_egressed gauge
hoshino_bytes_egressed 3648
# TYPE hoshino_bytes_ingressed gauge