Entropy

Gotta catch it all

I've been thinking a lot about randomness lately. I'm not sure why; maybe it's a meta-analysis of my own brain. Either way.

It's worth prefacing this by saying that I'm not a cryptographer, nor do I have particularly strong math skills. I usually defer to others for those topics.

Anyways! Entropy! It's all around us! Famously, Silicon Graphics and, later, Cloudflare pointed cameras at lava lamps to gather entropy to feed into a random number generator (Cloudflare has since added various other sources). Being able to produce unpredictable streams of bytes is a pretty valuable thing, depending on your workload. But it's also kind of fun to do when your workload doesn't demand it, especially when you can work in seemingly useless implementation details!

I sat down one day and got really fixated on Linux entropy. Did you know you can feed a Linux system's entropy pool just by writing data to /dev/{u}random? Further, did you know that there's actually very little difference between urandom and random, besides their behaviour before the system is satisfied it has enough entropy? Or that they're reseeded frequently during boot using data from the entropy pool itself, and then roughly every minute once boot is finished? I highly recommend this talk from WireGuard creator Jason Donenfeld, which goes into modernising the kernel's random.c implementation and hints at some future plans. Watching it also gives a bit more context for some of the decisions I'll talk about later in this post.
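Feeding the pool from userland really is just a file write. Here's a minimal Rust sketch; the feed_pool name is my own and not taken from morerandom. One caveat worth knowing, per the random(4) man page: bytes written this way are mixed into the pool but don't increase the kernel's entropy estimate. Crediting entropy needs the RNDADDENTROPY ioctl, which requires CAP_SYS_ADMIN.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: mixes `data` into the kernel's entropy pool by writing it
/// to a random device such as /dev/urandom. The kernel stirs the bytes in but
/// does not credit them as entropy.
fn feed_pool(device: &Path, data: &[u8]) -> io::Result<()> {
    // Open for writing only; the device must already exist, so no `create(true)`.
    let mut dev = OpenOptions::new().write(true).open(device)?;
    dev.write_all(data)?;
    dev.flush()
}
```

On a real machine the call would be feed_pool(Path::new("/dev/urandom"), &bytes). On most distributions /dev/urandom is world-writable, so no root is needed just to mix data in.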

The entropy pool is fed from a number of sources, but the kernel can only draw on so many of them itself. From userland, the ability to "feed" the pool opens up a ton of possibilities. Is it really necessary? No, not really, but it makes me smile and that's generally enough. It doesn't hurt, anyways. So I sat down and started hacking together a concept: can I use WASM plugins to feed data into my system's entropy pool?

WASM was chosen because I wanted to experiment with using it as a plugin system. I've previously experimented with plugin systems in my projects, including writing various Minecraft Bukkit plugins and using a full Go interpreter in my Platypus server monitoring project. I understand the theory and some of the practical implementation details, but what is usually missing is a solid runtime that lets people use whatever language they're most comfortable with. WASM gives us a solid foundation that a lot of languages can compile to. So while this was "just" an excuse to play with WASM more, my brain was still very fixated on feeding my entropy pool. Thus, morerandom was born.

Not much searching was required to find Extism, which is purpose-built as a plugin system for other programs. The implementation on the "host" side is trivial; my one minor nit is that on the plugin side you need to use their library (as an example, I have a 1.wat plugin that just echoes 1, and it's a bit more complex than it needs to be). The initial, and current, implementation just calls a WASM binary's random() -> FnResult<String> function and returns whatever raw bytes are included in that result. Enough for a proof of concept, but then the gears started turning. I could use my microphone for entropy! My camera! But... those aren't accessible to all my machines... and exposing them to WASM plugins would be tricky and risky...
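Every entropy source, whether a WASM plugin, the microphone or the camera, boils down to "give me some bytes, or fail". A minimal sketch of that shape in plain Rust, deliberately without Extism's API: the EntropySource trait, EchoPlugin, Broken and gather are all hypothetical names, and the length prefix is my own choice rather than something morerandom is known to do.

```rust
/// Hypothetical common interface for anything that can supply raw entropy bytes.
trait EntropySource {
    fn name(&self) -> &str;
    /// Loosely mirrors the plugin-side `random() -> FnResult<String>`: bytes or an error.
    fn random(&self) -> Result<Vec<u8>, String>;
}

/// Hypothetical stand-in for a trivial plugin like 1.wat that always returns the same bytes.
struct EchoPlugin(Vec<u8>);

impl EntropySource for EchoPlugin {
    fn name(&self) -> &str {
        "echo"
    }
    fn random(&self) -> Result<Vec<u8>, String> {
        Ok(self.0.clone())
    }
}

/// Hypothetical stand-in for an unavailable source, e.g. a machine with no camera.
struct Broken;

impl EntropySource for Broken {
    fn name(&self) -> &str {
        "broken"
    }
    fn random(&self) -> Result<Vec<u8>, String> {
        Err("device not found".to_string())
    }
}

/// Concatenates output from every working source into one buffer for hashing.
/// Each chunk is prefixed with its length so that "ab" + "c" and "a" + "bc"
/// produce different inputs. Failed sources are logged to stderr and skipped.
fn gather(sources: &[Box<dyn EntropySource>]) -> Vec<u8> {
    let mut buf = Vec::new();
    for source in sources {
        match source.random() {
            Ok(bytes) => {
                buf.extend_from_slice(&(bytes.len() as u64).to_le_bytes());
                buf.extend_from_slice(&bytes);
            }
            Err(err) => eprintln!("skipping {}: {}", source.name(), err),
        }
    }
    buf
}
```

Skipping failed sources rather than aborting means a headless machine without a camera can still collect from whatever it does have.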

So the next step was adding microphone and camera support to the binary itself. That was fairly easy given that Rust libraries already existed, and I could just spit out the raw bytes. Easy! I had essentially replicated lavarand in Rust. Cool! But... but wait, the other machines in my homelab might want to feed that entropy into their own pools... how could I do that? And further, how could I stop a malicious client from extracting the raw microphone and camera bytes and potentially snooping on me?

And thus the rabbit hole deepened. To support fetching entropy over the network, I added gRPC (often read as Google Remote Procedure Call, although it's now a CNCF project rather than a Google one). For the uninitiated, gRPC is, at a high level, a way for a codebase to call a function on a remote machine. Both sides share a schema (by default written in Protocol Buffers) that the binaries use to encode and decode the arguments and results of each call, and the compact binary messages are generally smaller and faster to handle than, say, JSON over a plain HTTP API. A small client binary on a server connects to my desktop's morerandom server and calls the get_random() function the server exposes. The client gets bytes that it then prints to standard output. And we're happy, right?

Not quite. We still have the issue of extracting the raw data. For this, I took a page from Linux's random.c implementation and started looking at the ChaCha20 algorithm. I won't pretend to fully understand the algorithm, but my current understanding is that ChaCha20 is a stream cipher: you give it a key (our seed) and it expands that into a practically endless keystream, the bytes it would normally combine with plaintext to encrypt it, which in our case serves as randomness. For the seed, we use the entropy we've collected ourselves! To keep things simple, we first hash that data with the modern BLAKE3 hashing algorithm, feeding it the bytes from the activated plugins, camera and microphone, and then use the resulting hash as our seed. Since clients only ever receive ChaCha20 output, they never see the raw camera or microphone bytes.

I've essentially re-implemented /dev/{u}random, in the sense that it all ends up as a ChaCha20 stream. Furthering the similarities to the kernel's implementation, the server reseeds roughly every minute by fetching fresh data from the plugins, microphone and camera, and mixing in a few bytes from the existing ChaCha20 stream. In theory, this means that even if you guess the initial state of the server's plugins, camera and microphone, you'd have to keep controlling those inputs to stay in control of the seed. That isn't impossible, which is why I wouldn't recommend this be your only source of entropy, but practically speaking it's very difficult.
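To make the seed-to-stream idea concrete, here's a from-scratch sketch of the ChaCha20 block function as specified in RFC 8439, plus a hypothetical ChaChaStream wrapper that uses a 32-byte seed directly as the key. It's for illustration only and isn't morerandom's actual code. Assumptions: an all-zero nonce (only reasonable because every reseed produces a brand-new key), and a reseed step that hashes fresh entropy together with 32 bytes drawn from the current stream. The hash is passed in as a closure so the sketch has no dependencies; with the blake3 crate it could be |d: &[u8]| *blake3::hash(d).as_bytes().

```rust
/// One ChaCha20 quarter round on four words of the state (RFC 8439, section 2.1).
fn quarter_round(s: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize) {
    s[a] = s[a].wrapping_add(s[b]);
    s[d] = (s[d] ^ s[a]).rotate_left(16);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_left(12);
    s[a] = s[a].wrapping_add(s[b]);
    s[d] = (s[d] ^ s[a]).rotate_left(8);
    s[c] = s[c].wrapping_add(s[d]);
    s[b] = (s[b] ^ s[c]).rotate_left(7);
}

/// Reads the i-th little-endian 32-bit word from a byte slice.
fn le_word(bytes: &[u8], i: usize) -> u32 {
    u32::from_le_bytes([bytes[4 * i], bytes[4 * i + 1], bytes[4 * i + 2], bytes[4 * i + 3]])
}

/// One 64-byte keystream block from a 256-bit key, a block counter and a
/// 96-bit nonce (RFC 8439, section 2.3).
fn chacha20_block(key: &[u8; 32], counter: u32, nonce: &[u8; 12]) -> [u8; 64] {
    // Words 0-3: the constant "expand 32-byte k"; 4-11: key; 12: counter; 13-15: nonce.
    let mut state = [0u32; 16];
    state[0] = 0x6170_7865;
    state[1] = 0x3320_646e;
    state[2] = 0x7962_2d32;
    state[3] = 0x6b20_6574;
    for i in 0..8 {
        state[4 + i] = le_word(key, i);
    }
    state[12] = counter;
    for i in 0..3 {
        state[13 + i] = le_word(nonce, i);
    }
    let mut working = state;
    // 20 rounds: ten iterations of four column rounds followed by four diagonal rounds.
    for _ in 0..10 {
        quarter_round(&mut working, 0, 4, 8, 12);
        quarter_round(&mut working, 1, 5, 9, 13);
        quarter_round(&mut working, 2, 6, 10, 14);
        quarter_round(&mut working, 3, 7, 11, 15);
        quarter_round(&mut working, 0, 5, 10, 15);
        quarter_round(&mut working, 1, 6, 11, 12);
        quarter_round(&mut working, 2, 7, 8, 13);
        quarter_round(&mut working, 3, 4, 9, 14);
    }
    // Add the original state back in and serialise as little-endian words.
    let mut out = [0u8; 64];
    for i in 0..16 {
        let w = working[i].wrapping_add(state[i]);
        out[4 * i..4 * i + 4].copy_from_slice(&w.to_le_bytes());
    }
    out
}

/// Hypothetical seeded generator: the 32-byte seed is used directly as the ChaCha20 key.
struct ChaChaStream {
    key: [u8; 32],
    counter: u32,
    block: [u8; 64],
    pos: usize,
}

impl ChaChaStream {
    fn new(seed: [u8; 32]) -> Self {
        // pos = 64 forces a fresh block to be generated on the first read.
        ChaChaStream { key: seed, counter: 0, block: [0; 64], pos: 64 }
    }

    /// Fills `out` with keystream bytes, generating new blocks as needed.
    fn fill(&mut self, out: &mut [u8]) {
        for byte in out.iter_mut() {
            if self.pos == 64 {
                self.block = chacha20_block(&self.key, self.counter, &[0u8; 12]);
                // Panic rather than wrap the counter and repeat keystream.
                self.counter = self.counter.checked_add(1).expect("keystream exhausted; reseed first");
                self.pos = 0;
            }
            *byte = self.block[self.pos];
            self.pos += 1;
        }
    }

    /// Rekeys from fresh entropy plus 32 bytes of the current stream, so predicting
    /// the new key requires knowing both the fresh input and the previous state.
    fn reseed<H: Fn(&[u8]) -> [u8; 32]>(&mut self, fresh: &[u8], hash: H) {
        let mut carry = [0u8; 32];
        self.fill(&mut carry);
        let mut input = fresh.to_vec();
        input.extend_from_slice(&carry);
        *self = ChaChaStream::new(hash(&input));
    }
}
```

In anything beyond a toy I'd reach for an audited crate such as rand_chacha (ChaCha20Rng) instead of hand-rolling the cipher; the point here is just how little machinery sits between a 32-byte seed and an endless stream.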

I later added an HTTP server to demonstrate random number and boolean generation, for easy use in other projects if I want (and just to show off a bit, I suppose).
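Turning stream bytes into numbers and booleans has one classic pitfall: taking a random 64-bit value modulo the range size slightly favours small results unless the range divides 2^64 evenly. A minimal sketch of the usual fix, rejection sampling; uniform_in_range and random_bool are hypothetical names, next_u64 stands for any uniform 64-bit source such as eight bytes of ChaCha20 output, and I'm not claiming this is how morerandom's HTTP server does it.

```rust
/// Hypothetical helper: a uniformly distributed integer in `low..=high` via rejection
/// sampling. `next_u64` is any source of uniform 64-bit values (an assumption; e.g.
/// eight keystream bytes read with `u64::from_le_bytes`).
fn uniform_in_range<F: FnMut() -> u64>(mut next_u64: F, low: u64, high: u64) -> u64 {
    assert!(low <= high, "empty range");
    let span = high - low;
    if span == u64::MAX {
        return next_u64(); // the whole u64 range: every value is already uniform
    }
    let n = span + 1;
    // `limit` is a multiple of n, so x % n is uniform for every accepted x < limit.
    let limit = u64::MAX - u64::MAX % n;
    loop {
        let x = next_u64();
        if x < limit {
            return low + x % n;
        }
        // x fell in the short tail that would bias the result; draw again.
    }
}

/// Hypothetical helper: a fair coin flip from the lowest bit of a random byte.
fn random_bool(byte: u8) -> bool {
    (byte & 1) == 1
}
```

Rejections are rare for small ranges (at most n - 1 of the 2^64 possible values are thrown away), so the loop almost always finishes on the first draw.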

MoreRandom entropy flow diagram

I've iterated on the codebase a few times, cleaning it up and making it more robust, but the core entropy collection flow is pretty much the same. Alongside the server there's also a standalone binary, which does a one-off collection, hash and seed of the entropy before printing 32 bytes of data to stdout.

I pipe it through xxd when I want to look at the raw data; otherwise the output is very garbled, since most of the bytes aren't printable characters. Raw bytes are fun.

```text
# One run
00000000: 8d55 7c9f 893f f21a af3b 9436 7928 a8ed  .U|..?...;.6y(..
00000010: 3a55 da08 313d ab65 96a5 9d00 c8ff b40d  :U..1=.e........
# Another one
00000000: 2071 208f ba71 4e0e 248f da65 b701 ac8b   q ..qN.$..e....
00000010: 2c43 b572 72b4 34ca ad42 8b2e 24d2 76c1  ,C.rr.4..B..$.v.
```
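For the curious, xxd's default layout is simple: an eight-digit hex offset, 16 bytes per line in two-byte groups, then the printable ASCII characters with everything else shown as a dot. A rough approximation as a hypothetical hexdump helper in Rust; it only mimics xxd's default layout, not its options.

```rust
/// Hypothetical helper: formats bytes like xxd's default output. Offset, 16 bytes
/// per line in two-byte groups, then printable ASCII (0x20..=0x7e), other bytes as '.'.
fn hexdump(data: &[u8]) -> String {
    let mut out = String::new();
    for (row, chunk) in data.chunks(16).enumerate() {
        let mut hex = String::new();
        for (i, byte) in chunk.iter().enumerate() {
            if i > 0 && i % 2 == 0 {
                hex.push(' ');
            }
            hex.push_str(&format!("{:02x}", byte));
        }
        let ascii: String = chunk
            .iter()
            .map(|&b| if (0x20..=0x7e).contains(&b) { b as char } else { '.' })
            .collect();
        // 39 = width of a full row of hex (8 groups of 4 digits plus 7 spaces),
        // so a short final row still lines its ASCII column up with the rest.
        out.push_str(&format!("{:08x}: {:<39}  {}\n", row * 16, hex, ascii));
    }
    out
}
```

Feeding it the 32 bytes from the first run above reproduces those two lines exactly.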

With all that in place, and my homelab servers running a systemd timer that uses the morerandom client binary to fetch entropy from my desktop once a day and write it to their own entropy pools, I consider this project pretty much done! Which is quite satisfying. I think the implementation is pretty robust, but I'm sure I'll find some other things to tinker with. I'd like to expand the WASM capabilities and available functions, but I don't really have a good reason to at the moment. This project was an awesome learning experience in ChaCha20, BLAKE3 and generally getting a little too obsessive about entropy on Linux.

If you have any feedback, questions, etc., feel free to reach out on the fediverse!