Perseus Cluster Project (Random Notes)

Written by Sawyer Powell - 2024-05-20 Mon 17:41

Initial Brainstorm

I want to be able to easily interface with the system through a shell, and by virtue of that, through my programming language of choice.

I would also like to have easy Emacs integration.

I would like to have a simple way of deploying and managing dev applications, built on strong CI/CD concepts.

Something based on Nix would be interesting, so I can avoid touching Docker with a thirty-foot pole.

I would also like a way of running applications that can take advantage of the distributed system.

Ability to develop a distributed database (Bigtable-esque? I would like to write my own database that supports SQL)
Ability to develop a distributed filestore (like GFS)
Ability to run MapReduce operations on the cluster
More broadly, a strong conceptual foundation and a good API for writing and expressing distributed algorithms

It would be cool to have the hardware/network set up such that I can just hook any old computer up, go through a setup process, and have it able to contribute itself to the cluster.

Task delegation seems fundamental to this problem. I'm going to need some sort of master computer that can delegate tasks out to child computers. A few concepts here need further elaboration:

Task Structure
Delegation
Standardization

Task Structure

A task needs to be a specification of a computation that a computer needs to perform. Computation is fundamentally just transforming data. So, a task should consist of two parts:

A program specifying the computation
The input data for that computation
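
A minimal sketch of the two-part task described above, assuming a task can be modelled as an argv list plus bytes fed to stdin. The names `Task` and `run_task` are hypothetical, and this runs the task locally rather than on a cluster.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task: a program plus the input data it transforms."""
    program: list[str]  # argv of the computation to run
    input_data: bytes   # fed to the program on stdin


def run_task(task: Task) -> bytes:
    """Run the task as a local subprocess and return what it wrote to stdout."""
    result = subprocess.run(
        task.program,
        input=task.input_data,
        stdout=subprocess.PIPE,
        check=True,  # raise if the program exits with a non-zero status
    )
    return result.stdout


# Example task: the "program" upper-cases whatever arrives on stdin.
upper = Task(
    program=[sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    input_data=b"hello cluster",
)
```

Calling `run_task(upper)` pipes the input through the program, which is the same shape as a shell pipeline stage.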

This lends itself to being a sort of distributed shell (key idea)

In this model, task delegation is the easy part; what's hard is making a distributed stdout.

There will need to be a notion of a distributed stdout, which means understanding distributed filestores is fundamental (key idea)

Programs will need an (optionally) distributed way to access stdin and a default way of writing to a distributed stdout.

An initial thought is that we write this stdout to a distributed filestore.

This gives me the impression that I need to investigate how GFS works; a distributed filestore could be the first capability the cluster needs.

There needs to be a standard protocol for initializing and communicating completion of a task (key idea)

It would probably be a good idea to build this around gRPC.
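
To make the protocol idea concrete, here is a rough sketch of the two messages it seems to need: one to start a task and one to report completion. Everything here is an assumption: the message names, their fields, and the use of JSON as a stand-in encoding. With gRPC these would be protobuf messages instead.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class StartTask:
    """Hypothetical message sent from the master to a worker."""
    task_id: str
    program: list[str]
    stdin_path: str   # assumed location of the input in the filestore
    stdout_path: str  # assumed location where output should be written


@dataclass
class TaskDone:
    """Hypothetical message sent from a worker back to the master."""
    task_id: str
    exit_code: int


MESSAGE_TYPES = {"StartTask": StartTask, "TaskDone": TaskDone}


def encode(message) -> bytes:
    """Serialize a message, tagging it with its type name."""
    payload = {"type": type(message).__name__, **asdict(message)}
    return json.dumps(payload).encode("utf-8")


def decode(raw: bytes):
    """Rebuild the message object from its serialized form."""
    payload = json.loads(raw)
    kind = payload.pop("type")
    return MESSAGE_TYPES[kind](**payload)
```

The type tag is what lets one channel carry both the initialization and the completion message.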

Delegation

Using classic web protocols to actually distribute the tasks will be important.

However, within the delegation there needs to be a notion of which computer to delegate to.

In this sense, there needs to be monitoring of how busy the computers actually are, scaled by how many resources each one has.

There will likely need to be a scheduler running on the master server (key idea)
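
One simple way to express "how busy, scaled by resources" is to pick the node with the lowest ratio of running tasks to cores. This is only a sketch under that assumption; `Node`, `load`, and `pick_node` are hypothetical names, and a real scheduler would also look at memory, I/O, and node health.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a worker as seen by the scheduler."""
    name: str
    cores: int    # capacity of the machine
    running: int  # tasks currently running on it

    @property
    def load(self) -> float:
        """Busyness scaled by capacity: running tasks per core."""
        return self.running / self.cores


def pick_node(nodes: list[Node]) -> Node:
    """Return the node with the lowest load; raise if there are no nodes."""
    if not nodes:
        raise ValueError("no nodes available")
    return min(nodes, key=lambda node: node.load)


cluster = [
    Node("pi-a", cores=4, running=3),  # load 0.75
    Node("pi-b", cores=2, running=1),  # load 0.5
    Node("pi-c", cores=4, running=1),  # load 0.25
]
```

Here `pick_node(cluster)` selects `pi-c`, even though `pi-b` is running the same number of tasks, because `pi-c` has more cores to spread them over.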

Standardization

All of the computers should be running the same operating system and should have a unified, standard procedure for setting them up and connecting them to the network.

The computers could be running NixOS; this way there can be an easily scriptable way to set up and tear down dev environments (key idea)

The same computer that operates as the master/scheduler can also be responsible for routing all traffic within the network (key idea)

Networking the test setup

Cluster network overview

One Raspberry Pi will act as the router for the network. The other Pis will connect to it through a powered network switch. The routing Pi will use its Wi-Fi card to connect to the broader network for internet traffic. The user can just plug their laptop into the switch over Ethernet to interact with the Pis.

Advantages

Allows me to take the cluster pretty much anywhere that has wifi and be able to program and develop on it.

Setting up a Raspberry Pi as a simple router

Overview

When devices first connect to the network, if they are configured to do so, the first thing that they will do is send a DHCPDISCOVER message to the whole network.

In order to assign that device an IP address, there needs to be a device on that network which can respond to that message with a DHCPOFFER message.

A DHCP Server can be managed with a utility called dnsmasq

Of course, the routing Pi cannot automatically assign an IP address to itself, so the first thing we need to do is give it a static IP. Then we'll set up the DNS and DHCP services and enable IP forwarding (so the device can forward packets between its network interfaces). Finally, we'll set up NAT (network address translation) so the Pi can translate internal IP addresses into "parent-network" addresses.

Pro tips

Make sure port 53 is open on the firewall for DNS!
Make sure ports 67 and 68 are open for DHCP!
Make sure port 22 is open for SSH!

Assign a static IP

sudo ifconfig eth0 192.168.1.1 netmask 255.255.255.0

Apply IP Forwarding and set up NAT using `ufw`

Configuring the kernel for forwarding

/etc/sysctl.conf

net.ipv4.ip_forward=1

Then run

sudo sysctl -p

To apply the changes made to sysctl.conf
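
To double-check that the setting took effect, the kernel exposes the current value at `/proc/sys/net/ipv4/ip_forward` on Linux. The helper below is a small sketch that reads it; the function name `forwarding_enabled` is my own, and the `path` parameter exists only so the check can be pointed at another file.

```python
from pathlib import Path

# Linux exposes the live value of net.ipv4.ip_forward at this path.
IP_FORWARD = Path("/proc/sys/net/ipv4/ip_forward")


def forwarding_enabled(path: Path = IP_FORWARD) -> bool:
    """Return True if the file holds "1", meaning IPv4 forwarding is on."""
    return path.read_text().strip() == "1"
```

On the routing Pi, `forwarding_enabled()` should return `True` once `sudo sysctl -p` has been run.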

Configuring the ufw for forwarding & NAT

sudo ufw default allow forward

Configure the ufw rules

/etc/ufw/before.rules

*nat
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -s 192.168.1.0/24 -o wlan0 -j MASQUERADE
COMMIT

*nat applies the rules that follow to the nat table, which is responsible for network address translation.
-A POSTROUTING appends a rule to the POSTROUTING chain, where source NAT is handled.
-s 192.168.1.0/24 matches packets whose source address is on the cluster subnet.
-o wlan0 specifies the outgoing network interface. We're forwarding packets from our eth0 interface to our wlan0 interface.
-j MASQUERADE hides the private IP addresses of devices on the local network behind the Pi's IP.
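
As a way of reading the rule, the sketch below models only its match condition: a packet is masqueraded when its source is in 192.168.1.0/24 and it leaves through wlan0. The function `is_masqueraded` is purely illustrative; the real matching is done by the kernel's netfilter, not by anything like this.

```python
import ipaddress

# The subnet and interface come straight from the rule above.
CLUSTER_NET = ipaddress.ip_network("192.168.1.0/24")
OUT_IFACE = "wlan0"


def is_masqueraded(src_ip: str, out_iface: str) -> bool:
    """Return True if a packet with this source and exit interface matches the rule."""
    in_subnet = ipaddress.ip_address(src_ip) in CLUSTER_NET
    return in_subnet and out_iface == OUT_IFACE
```

For example, traffic from 192.168.1.50 leaving via wlan0 matches, while the same host talking to another Pi over eth0 does not.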

sudo ufw enable

If it isn't already enabled

sudo ufw reload

To be super certain that the changes were applied

Configure `dnsmasq`

sudo apt-get install dnsmasq

/etc/dnsmasq.conf

# Ensure that the router can function as a DNS server, working over port 53
port=53
interface=eth0
dhcp-range=192.168.1.50,192.168.1.150,24h
dhcp-authoritative

# Enable logging for DNS queries
log-queries
# Enable logging for DHCP messages
log-dhcp
# Enable some logging niceties
log-async
log-facility=/var/log/dnsmasq.log

# Optional - forward DNS queries upstream to 1.1.1.1
server=1.1.1.1

Serves DHCP requests on `eth0`.
Assigns IP addresses from 192.168.1.50 to 192.168.1.150.
The lease is valid for 24 hours (`24h`).
dhcp-authoritative tells dnsmasq that it is the only DHCP server on this network, so it takes control of all leases. Clients connecting to a network for the first time might still have a record of a lease from another network, and they'll often broadcast a request for that lease. In authoritative mode dnsmasq answers these requests and manages the leases, whether it has a record of them or not.

This avoids long timeouts when devices connect to the network anew, since they would otherwise wait a while for a DHCP server to respond to their old lease before asking for a new one.
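
As a quick sanity check on the `dhcp-range` line, the sketch below counts how many addresses the pool holds and confirms the router's static address sits outside it. It handles only the three-field `start,end,lease` form used in this config, and the helper names are my own.

```python
import ipaddress


def parse_dhcp_range(line: str) -> tuple[str, str, str]:
    """Split a 'dhcp-range=start,end,lease' line into its three fields."""
    _, _, value = line.partition("=")
    start, end, lease = value.strip().split(",")
    return start, end, lease


def pool_size(start: str, end: str) -> int:
    """Count the addresses from start to end, inclusive."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    if last < first:
        raise ValueError("range end is below range start")
    return int(last) - int(first) + 1


def in_pool(address: str, start: str, end: str) -> bool:
    """Return True if address falls inside the DHCP pool."""
    addr = ipaddress.ip_address(address)
    return ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end)


start, end, lease = parse_dhcp_range(
    "dhcp-range=192.168.1.50,192.168.1.150,24h"
)
```

With the values above, `pool_size(start, end)` is 101, and `in_pool("192.168.1.1", start, end)` is `False`, so the router's static IP cannot be handed out to a client.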

NB: the dnsmasq daemon must be running for the DHCP server to be active. (Also look into dhcpd as an alternative to dnsmasq.)

Restart/manage the `dnsmasq` daemon through systemd

sudo systemctl restart dnsmasq
