diff options
| -rw-r--r-- | blog/aegis-prometheus-2021-04-05.markdown | 280 |
1 files changed, 280 insertions, 0 deletions
diff --git a/blog/aegis-prometheus-2021-04-05.markdown b/blog/aegis-prometheus-2021-04-05.markdown new file mode 100644 index 0000000..2b26e82 --- /dev/null +++ b/blog/aegis-prometheus-2021-04-05.markdown @@ -0,0 +1,280 @@ +--- +title: Prometheus and Aegis +date: 2021-04-05 +tags: + - prometheus + - o11y +--- + +# Prometheus and Aegis + +[*Last time on the christine dot website cinematic +universe:*](https://christine.website/blog/unix-domain-sockets-2021-04-01) + +*Unix sockets started to be used to grace the cluster. Things were at peace. +Then, a realization came through:* + +[What about Prometheus? Doesn't it need a direct line of fire to the service to +scrape metrics?](conversation://Mara/hmm?smol) + +*This could not do! Without observability the people of the Discord wouldn't have +a livefeed of the infrastructure falling over! This cannot stand! Look, our hero +takes action!* + +[It will soon!](conversation://Cadey/percussive-maintenance?smol) + +In order to help keep an eye on all of the services I run, I use +[Prometheus](https://prometheus.io/) for collecting metrics. For an example of +the kind of metrics I collect, see [here (1)](/metrics). In the configuration +that I have, Prometheus runs on a server in my apartment and reaches out to my +other machines to scrape metrics over the network. This worked great when I had +my major services listen over TCP, I could just point Prometheus at the backend +port over my tunnel. + +When I started using Unix sockets for hosting my services, this stopped working. +It became very clear very quickly that I needed some kind of shim. This shim +needed to do the following things: + +- Listen over the network as a HTTP server +- Connect to the unix sockets for relevant services based on the path (eg. + `/xesite` should get the metrics from `/srv/within/run/xesite.sock`) +- Do nothing else + +The Go standard library has a tool for doing reverse proxying in the standard +library: +[`net/http/httputil#ReverseProxy`](https://pkg.go.dev/net/http/httputil#ReverseProxy). +Maybe we could build something with this? + +[The documentation seems to imply it will use the network by default. Wait, +what's this `Transport` field?](conversation://Mara/hmm?smol) + +```go +type ReverseProxy struct { + // ... + + // The transport used to perform proxy requests. + // If nil, http.DefaultTransport is used. + Transport http.RoundTripper + + // ... +} +``` + +[So a transport is a <a +href="https://pkg.go.dev/net/http#RoundTripper">`RoundTripper`</a>, which is a +function that takes a request and returns a response somehow. It uses +`http.DefaultTransport` by default, which reads from the network. So at a +minimum we're gonna need: <ul><li>a `ReverseProxy`</li><li>a +`Transport`</li><li>a dialing function</li><ul>Right?](conversation://Mara/hmm?smol) + +Yep! Unix sockets can be used like normal sockets, so all you need is something +like this: + +```go +func proxyToUnixSocket(w http.ResponseWriter, r *http.Request) { + name := path.Base(r.URL.Path) + + fname := filepath.Join(*sockdir, name+".sock") + _, err := os.Stat(fname) + if os.IsNotExist(err) { + http.NotFound(w, r) + return + } + + ts := &http.Transport{ + Dial: func(_, _ string) (net.Conn, error) { + return net.Dial("unix", fname) + }, + DisableKeepAlives: true, + } + + rp := httputil.ReverseProxy{ + Director: func(req *http.Request) { + req.URL.Scheme = "http" + req.URL.Host = "aegis" + req.URL.Path = "/metrics" + req.URL.RawPath = "/metrics" + }, + Transport: ts, + } + rp.ServeHTTP(w, r) +} +``` + +[So in this handler:](conversation://Mara/hmm?smol) + +```go +name := path.Base(r.URL.Path) + +fname := filepath.Join(*sockdir, name+".sock") +_, err := os.Stat(fname) +if os.IsNotExist(err) { + http.NotFound(w, r) + return +} + +ts := &http.Transport{ + Dial: func(_, _ string) (net.Conn, error) { + return net.Dial("unix", fname) + }, + DisableKeepAlives: true, +} +``` + +[You have the socket path built from the URL path, and then you return +connections to that path ignoring what the HTTP stack thinks it should point +to?](conversation://Mara/hmm?smol) + +Yep. Then the rest is really just boilerplate: + +```go +package main + +import ( + "flag" + "log" + "net" + "net/http" + "net/http/httputil" + "os" + "path" + "path/filepath" +) + +var ( + hostport = flag.String("hostport", "[::]:31337", "TCP host:port to listen on") + sockdir = flag.String("sockdir", "./run", "directory full of unix sockets to monitor") +) + +func main() { + flag.Parse() + + log.SetFlags(0) + log.Printf("%s -> %s", *hostport, *sockdir) + + http.DefaultServeMux.HandleFunc("/", proxyToUnixSocket) + + log.Fatal(http.ListenAndServe(*hostport, nil)) +} +``` + +Now all that's needed is to build a NixOS service out of this: + +```nix +{ config, lib, pkgs, ... }: +let cfg = config.within.services.aegis; +in +with lib; { + # Mara\ this describes all of the configuration options for Aegis. + options.within.services.aegis = { + enable = mkEnableOption "Activates Aegis (unix socket prometheus proxy)"; + + # Mara\ This is the IPv6 host:port that the service should listen on. + # It's IPv6 because this is $CURRENT_YEAR. + hostport = mkOption { + type = types.str; + default = "[::1]:31337"; + description = "The host:port that aegis should listen for traffic on"; + }; + + # Mara\ This is the folder full of unix sockets. In the previous post we + # mentioned that the sockets should go somewhere like /tmp, however this + # may be a poor life decision: + # https://lobste.rs/s/fqqsct/unix_domain_sockets_for_serving_http#c_g4ljpf + sockdir = mkOption { + type = types.str; + default = "/srv/within/run"; + example = "/srv/within/run"; + description = + "The folder that aegis will read from"; + }; + }; + + # Mara\ The configuration that will arise from this module if it's enabled + config = mkIf cfg.enable { + # Mara\ Aegis has its own user account to keep things tidy. It doesn't need + # root to run so we don't give it root. + users.users.aegis = { + createHome = true; + description = "tulpa.dev/cadey/aegis"; + isSystemUser = true; + group = "within"; + home = "/srv/within/aegis"; + }; + + # Mara\ The systemd service that actually runs Aegis. + systemd.services.aegis = { + wantedBy = [ "multi-user.target" ]; + + # Mara\ These correlate to the [Service] block in the systemd unit. + serviceConfig = { + User = "aegis"; + Group = "within"; + Restart = "on-failure"; + WorkingDirectory = "/srv/within/aegis"; + RestartSec = "30s"; + }; + + # Mara\ When the service starts up, run this script. + script = let aegis = pkgs.tulpa.dev.cadey.aegis; + in '' + exec ${aegis}/bin/aegis -sockdir="${cfg.sockdir}" -hostport="${cfg.hostport}" + ''; + }; + }; +} +``` + +[Then I just flicked it on for a server of mine:](conversation://Cadey/enby?smol) + +```nix +within.services.aegis = { + enable = true; + hostport = "[fda2:d982:1da2:180d:b7a4:9c5c:989b:ba02]:43705"; + sockdir = "/srv/within/run"; +}; +``` + +[And then test it with `curl`:](conversation://Cadey/enby?smol) + +```console +$ curl http://[fda2:d982:1da2:180d:b7a4:9c5c:989b:ba02]:43705/printerfacts +# HELP printerfacts_hits Number of hits to various pages +# TYPE printerfacts_hits counter +printerfacts_hits{page="fact"} 15 +printerfacts_hits{page="index"} 23 +printerfacts_hits{page="not_found"} 17 +# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds. +# TYPE process_cpu_seconds_total counter +process_cpu_seconds_total 0.06 +# HELP process_max_fds Maximum number of open file descriptors. +# TYPE process_max_fds gauge +process_max_fds 1024 +# HELP process_open_fds Number of open file descriptors. +# TYPE process_open_fds gauge +process_open_fds 12 +# HELP process_resident_memory_bytes Resident memory size in bytes. +# TYPE process_resident_memory_bytes gauge +process_resident_memory_bytes 5296128 +# HELP process_start_time_seconds Start time of the process since unix epoch in seconds. +# TYPE process_start_time_seconds gauge +process_start_time_seconds 1617458164.36 +# HELP process_virtual_memory_bytes Virtual memory size in bytes. +# TYPE process_virtual_memory_bytes gauge +process_virtual_memory_bytes 911777792 +``` + +[And there you go! Now we can make Prometheus point to this and we can save +Christmas!](conversation://Cadey/aha?smol) + +[:D](conversation://Mara/happy?smol) + +--- + +This is another experiment in writing these kinds of posts in more of a Socratic +method. I'm trying to strike a balance with a [limited pool of +stickers](https://tulpa.dev/cadey/kadis-layouts/src/branch/master/moonlander/leader.c#L68-L84) +while I wait for more stickers/emoji to come in. [Feedback](/contact) is always welcome. + +(1): These metrics are not perfect because of the level of caching that +Cloudflare does for me. |
