From 3890085b77db7637ca9b48cb7809cf898a26ec1c Mon Sep 17 00:00:00 2001
From: Xe Iaso
Date: Sat, 31 Dec 2022 15:35:05 -0500
Subject: the nguh compiler

Signed-off-by: Xe Iaso
---
 blog/formal-grammar-of-h-2019-05-19.markdown |   2 +-
 blog/h-language-2019-06-30.markdown          |   1 +
 blog/hlang-nguh.markdown                     | 269 +++++++++++++++++++++++++++
 blog/the-origin-of-h-2015-12-14.markdown     |   1 +
 dhall/package.dhall                          |   5 +
 5 files changed, 277 insertions(+), 1 deletion(-)
 create mode 100644 blog/hlang-nguh.markdown

diff --git a/blog/formal-grammar-of-h-2019-05-19.markdown b/blog/formal-grammar-of-h-2019-05-19.markdown
index 91c27c1..2756ac2 100644
--- a/blog/formal-grammar-of-h-2019-05-19.markdown
+++ b/blog/formal-grammar-of-h-2019-05-19.markdown
@@ -1,7 +1,7 @@
 ---
 title: A Formal Grammar of h
 date: 2019-05-19
-series: conlangs
+series: h
 ---
 
 ## Introduction
diff --git a/blog/h-language-2019-06-30.markdown b/blog/h-language-2019-06-30.markdown
index 1239f77..9641ba1 100644
--- a/blog/h-language-2019-06-30.markdown
+++ b/blog/h-language-2019-06-30.markdown
@@ -1,6 +1,7 @@
 ---
 title: The h Programming Language
 date: 2019-06-30
+series: h
 tags:
  - wasm
  - release
diff --git a/blog/hlang-nguh.markdown b/blog/hlang-nguh.markdown
new file mode 100644
index 0000000..7a26eb1
--- /dev/null
+++ b/blog/hlang-nguh.markdown
@@ -0,0 +1,269 @@
+---
+title: "The Next-Generation Universal Hlang compiler"
+date: 2022-12-31
+series: h
+tags:
+ - hlang
+ - wasm
+vod:
+ twitch: https://www.twitch.tv/videos/1693936831
+ youtube: https://youtu.be/QY1O2n4tOhE
+---
+
+In a world where simple tasks have hundreds of dependencies and most of them are
+not documented, everything falls to chaos. The monolithigarchy dictates that
+your build times must be slow so that They (the dependocracy) can win over your
+hearts and minds with video games that you play during your compile times.
+One person gets mad about their string padding library being used by corporations
+without paying and then the entire internet explodes for a few days. This is
+unsustainable.
+
+hlang is the sledgehammer that will break down this complexity and deliver you a
+truly uncompromised development experience.
+
+You can't spell _sledgehammer_ without
+_h_!
+
+If none of this is making any sense, please read [the rest of the
+series](https://xeiaso.net/blog/series/h). This will hopefully help something
+make sense.
+
+If you need even more context, check [this
+page](https://pkg.go.dev/context) for more information.
+
+There was one major flaw with hlang in the past though. It was a hollow shell of
+itself and had rotted under the slings and arrows of time. The playground stopped
+working, so people could not understand the sheer might of hlang by playing with
+it.
+
+Lo, behold, a new compiler was born. In this article, I will describe the nguh
+compiler and how it revolutionizes the ways that you use hlang for both
+professional and personal uses.
+
+Wait, what, there _were_ professional users
+of hlang???
+
+Having 2 years of hlang on your resume
+will let you get hired by Google!
+
+## The Old Compiler
+
+The old compiler was a HACK. The main way it worked was by feeding the program
+source code as a string to this [Go template](https://pkg.go.dev/text/template):
+
+```
+(module
+  (import "h" "h" (func $h (param i32)))
+  (func $h_main
+    (local i32 i32 i32)
+    (local.set 0 (i32.const 10))
+    (local.set 1 (i32.const 104))
+    (local.set 2 (i32.const 39))
+    {{ range . -}}
+    {{ if eq . 32 -}}
+    (call $h (get_local 0))
+    {{ end -}}
+    {{ if eq . 104 -}}
+    (call $h (get_local 1))
+    {{ end -}}
+    {{ if eq . 39 -}}
+    (call $h (get_local 2))
+    {{ end -}}
+    {{ end -}}
+    (call $h (get_local 0))
+  )
+  (export "h" (func $h_main))
+)
+```
+
+This template worked by taking the program input _as a string_ and looping over
+each character to decide what to do.
+If it was a space, it would print a
+newline. If it was an `h`, it would print `h`. If it was a `'`, it would print a
+`'`. Anything else was ignored.
+
+However, this means that the parser was mostly ignored. And the parser spec
+compiles to 117 bytes when gzipped, which means that it can fit on a t-shirt.
+
+That's a savings of 0.8475%!
+
+Additionally, this would then use the command
+[`wat2wasm`](https://developer.mozilla.org/en-US/docs/WebAssembly/Text_format_to_wasm)
+to compile it to a WebAssembly file instead of doing it directly. This combined
+with the fact that the `get_local` instruction was renamed to `local.get` in the
+text format some time in the last 2 years means that not only was my compiler
+hacky, it didn't work anymore.
+
+Apparently that was renamed before WASM
+hit 1.0 and the legacy name was an alias they planned to remove. Guess who
+didn't get the memo!
+
+Needless to say, this could be fixed by doing a simple
+`s/get_local/local\.get/g` on the source file, but that's not fun. You know
+what's really fun? Reverse-engineering a binary file on stream and reassembling
+an identical replica in code. That's fun.
+
+## The nguh compiler
+
+On December 31st, 2022, I wrote the nguh compiler [on
+stream](https://twitch.tv/princessxen). The nguh (nguh gives u hlang or
+Next-Generation Universal Hlang compiler, whichever you prefer) compiler outputs
+WebAssembly bytecode directly instead of using `wat2wasm` as a middleman.
+
+This means that hlang has even fewer
+dependencies!
+
+nguh is supposed to be pronounced with the final sound of `-ing` and `uh`
+smashed together. It is not phonetically valid in English. It will take some
+practice to say it correctly. I'm not sorry. If you can read IPA, it's
+pronounced /ŋə/. The name comes from the YouTuber [Agma
+Schwa](https://www.youtube.com/@AgmaSchwa)'s show about conlangs named /ŋə/.
+
+To help you understand the architecture of nguh, it will be helpful to get some
+context about how WebAssembly files work.
+
+## How WebAssembly files work
+
+
+ What is WebAssembly?
+
+WebAssembly is a standard that specifies a way to run programs on arbitrary
+hardware in a sandboxed way. It is used mainly in web browsers to power things
+like YouTube's player component, Twitch stream viewing, and by developers any
+time they need to put a block of code into a website without having to rewrite
+it in JavaScript.
+
+I'm part of a slowly growing group of developers that want to run WebAssembly
+code on the server so that you can take the same `.wasm` file and run it on any
+hardware without having to have the source code and a working compiler setup.
+
+hlang is compiled to WebAssembly for no reason in particular.
+
+
+At a high level, a WebAssembly module has a bunch of sections in it. Each
+section contains information for things like what functions the module exports,
+the types of imported functions, how much memory the module needs, what should be
+in memory by default, and the function bodies for your code. Here's an annotated
+disassembly of a hlang binary:
+
+```
+0x00, 0x61, 0x73, 0x6d, // \0asm wasm magic number
+0x01, 0x00, 0x00, 0x00, // version 1
+
+0x01,                   // type section
+0x08,                   // 8 bytes long
+0x02,                   // 2 entries
+0x60, 0x01, 0x7f, 0x00, // function type 0, 1 i32 param, 0 return
+0x60, 0x00, 0x00,       // function type 1, 0 param, 0 return
+
+0x02,                   // import section
+0x07,                   // 7 bytes long
+0x01,                   // 1 entry
+0x01, 0x68,             // module h
+0x01, 0x68,             // name h
+0x00,                   // import kind: function
+0x00,                   // type index 0
+
+0x03,                   // func section
+0x02,                   // 2 bytes long
+0x01,                   // 1 function
+0x01,                   // type 1
+
+0x07,                   // export section
+0x05,                   // 5 bytes long
+0x01,                   // 1 entry
+0x01, 0x68,             // "h"
+0x00, 0x01,             // export kind: function, function 1
+
+0x0a,                   // code section
+0x1b,                   // 27 bytes long
+0x01,                   // 1 entry
+0x19,                   // 25 bytes long
+0x01,                   // 1 local declaration
+0x03, 0x7f,             // 3 i32 values - (local i32 i32 i32)
+0x41, 0x0a,             // i32.const 10 (newline)
+0x21, 0x00,             // local.set 0
+0x41, 0xe8, 0x00,       // i32.const 104 (h)
+0x21, 0x01,             // local.set 1
+0x41, 0x27,             // i32.const 39 (')
+0x21, 0x02,             // local.set 2
+0x20, 0x01,             // local.get 1 (push h)
+0x10, 0x00,             // call 0 (putchar)
+0x20, 0x00,             // local.get 0 (push newline)
+0x10, 0x00,             // call 0 (putchar)
+0x0b                    // end of function
+```
+
+At a high level, nguh just takes all the needed sections and [puts them in the
+target
+binary](https://github.com/Xe/x/blob/2fe527950512b97a544d2d59539026514ad59544/cmd/hlang/nguh/compile.go#L53).
+Most of the sections are copied verbatim from that disassembly I pasted above
+because they don't need any modification for the binary to work.
+
+The exciting part happens when the individual nodes in the hlang syntax tree get
+compiled to WebAssembly bytecode. Each node in the tree may carry a character
+to print and may have a list of child nodes. A syntax tree for hlang could look
+like this if it has one character in the program:
+
+```
+input: h
+H("h")
+```
+
+Or it could look like this if there are multiple characters in the program:
+
+```
+input: h h h
+H{
+  "h",
+  "h",
+  "h",
+}
+```
+
+This means I need something like this:
+
+```go
+// compile AST to wasm
+if len(tree.Kids) == 0 {
+	if err := compileOneNode(funcBuf, tree); err != nil {
+		return nil, err
+	}
+} else {
+	for _, node := range tree.Kids {
+		if err := compileOneNode(funcBuf, node); err != nil {
+			return nil, err
+		}
+	}
+}
+```
+
+This will either read from the root of the tree or all of the tree's children in
+order to compile the entire program. The `compileOneNode` function will turn the
+text associated with the node into the corresponding WASM bytecode (pushing the
+relevant character to the stack and then calling the `h.h` (`putchar`) function).
+
+Finally, it will generate the end of the function including a trailing newline
+and end the `.wasm` file.
+
+Fun fact: the generated binary for a
+hlang program that only prints `h` is 69 bytes.
+
+NICE!
+
+Here is a base-64 encoded hlang binary in case you find this interesting:
+
+```
+AGFzbQEAAAABCAJgAX8AYAAAAgcBAWgBaAAAAwIB
+AQcFAQFoAAEKHQEbAQN/QQohAEHoACEBQSchAiAB
+EAAgABAAAQEL
+```
+
+---
+
+If you want to play with hlang, head to its new home at
+[h.within.lgbt](https://h.within.lgbt). If you want to witness things such as
+this being created live, follow me [on Twitch](https://twitch.tv/princessxen) or
+on my VTuber business account at [@xe@vt.social](https://vt.social/@xe).
+
+Happy new year to those that
+celebrate!
diff --git a/blog/the-origin-of-h-2015-12-14.markdown b/blog/the-origin-of-h-2015-12-14.markdown
index da56849..3ede3d9 100644
--- a/blog/the-origin-of-h-2015-12-14.markdown
+++ b/blog/the-origin-of-h-2015-12-14.markdown
@@ -1,6 +1,7 @@
 ---
 title: The Origin of h
 date: 2015-12-14
+series: h
 ---
 
 NOTE: There is a [second part](https://xeiaso.net/blog/formal-grammar-of-h-2019-05-19) to this article now with a formal grammar.
diff --git a/dhall/package.dhall b/dhall/package.dhall
index fdbcc67..6a4d639 100644
--- a/dhall/package.dhall
+++ b/dhall/package.dhall
@@ -23,6 +23,11 @@ in Config::{
 , title = "Aura"
 , description = "PonyvilleFM live DJ recording bot"
 }
+ , Link::{
+ , url = "https://h.within.lgbt"
+ , title = "The h Programming Language"
+ , description = "An esoteric programming language that compiles to WebAssembly"
+ }
 , Link::{
 , url = "https://github.com/Xe/olin"
 , title = "Olin"
-- 
cgit v1.2.3