nomad:csi [2021/01/01 23:26] (current)
ben created
 +====== Overview ======
 +
 +[[https://github.com/democratic-csi/democratic-csi|democratic-csi]] is a lightweight CSI provider using OpenZFS to store persistent data, and NFS to expose it to nomad jobs.
 +
 +===== Nomad implementation =====
 +
 +  * [[https://github.com/democratic-csi/democratic-csi/blob/master/docs/nomad.md|Upstream docs]]
 +  * [[https://github.com/democratic-csi/democratic-csi/issues/40|Implementation]]
 +
 +====== Installation ======
 +
 +The CSI plugin is run as nomad jobs with:
 +
 +  * two instances in controller mode as a service job
 +  * one instance per node, running in node mode, as a system job
 +
 +The controllers are responsible for managing the volumes, and the nodes are responsible for mounting the volumes onto the nomad clients before a job that uses them starts.
 +
 +The job definitions, including the configurations, live in [[https://gitlab.sihnon.net/ben/nomad-jobs/-/tree/master/democratic-csi|nomad-jobs/democratic-csi]].
 +
 +====== Day to day tasks ======
 +
 +===== Creating a volume =====
 +
 +Nomad can't provision new volumes itself yet, so they must be created manually. This requires ''csc''. To install it, run: <code bash>
 +GO111MODULE=off go get -u github.com/rexray/gocsi/csc
 +</code>
 +
 +To create a new 100 MiB volume named ''traefik-acme'', run: <code bash>
 +~/go/bin/csc -e tcp://democratic-csi.service.consul.sihnon.net:9000 controller create-volume --req-bytes 104857600 traefik-acme
 +# "traefik-acme"  104857600       "node_attach_driver"="nfs"      "provisioner_driver"="zfs-generic-nfs"  "server"="kowlan.jellybean.sihnon.net"  "share"="/pool2/democratic/root/traefik-acme"
 +</code>
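 +
 +The ''--req-bytes'' argument takes a plain byte count, and 104857600 is ''100 * 1024 * 1024''. As a minimal sketch, a small shell helper can do the conversion; ''mib_to_bytes'' is a hypothetical name used only for illustration and is not part of ''csc''.
 +

```shell
# Hypothetical helper: convert a size in MiB to the byte count used by --req-bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 100   # prints 104857600
mib_to_bytes 200   # prints 209715200
```

 +
 +The result can be substituted directly into the command, for example ''%%--req-bytes "$(mib_to_bytes 100)"%%''.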
 +
 +===== Registering the volume with Nomad =====
 +
 +Create an HCL volume definition file with contents similar to: <code bash vol-acme.json>
 +id = "traefik-acme"
 +name = "traefik-acme"
 +type = "csi"
 +external_id = "traefik-acme"
 +plugin_id = "kowlan"
 +access_mode = "single-node-writer"
 +attachment_mode = "file-system"
 +mount_options {
 +    fs_type = "nfs"
 +    mount_flags = ["nolock"]
 +}
 +context {
 +    node_attach_driver = "nfs"
 +    provisioner_driver = "zfs-generic-nfs"
 +    server = "kowlan.jellybean.sihnon.net"
 +    share = "/pool2/democratic/root/traefik-acme"
 +}
 +</code>
 +
 +Register this volume with nomad using: <code bash>
 +nomad volume register vol-acme.json
 +</code>
 +
 +===== Making changes to a volume definition =====
 +
 +If it's necessary to change the volume definition, the volume must be deregistered and re-registered with the new options: <code bash>
 +nomad volume deregister traefik-acme
 +nomad volume register vol-acme.json
 +</code>
 +
 +The volume must not be in use when it is deregistered. If a job failed, nomad might not have recorded that the allocation is no longer using the volume, in which case the volume can be forcibly deregistered with: <code bash>
 +nomad volume deregister -force traefik-acme
 +</code>
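 +
 +The deregister and register steps can be wrapped in a small shell function so that a failing step stops the sequence. This is only a sketch: ''reregister_volume'' is a hypothetical name, and it assumes ''nomad'' is on the ''PATH'' and already pointed at the right cluster. It runs ''nomad volume status'' first so the current state of the volume is printed before anything changes.
 +

```shell
# Hypothetical helper: show the volume, deregister it, then register it again
# from the definition file. Each step only runs if the previous one succeeded,
# so a failed deregister never leads to a register attempt.
reregister_volume() {
    vol_id="$1"    # e.g. traefik-acme
    vol_file="$2"  # e.g. vol-acme.json
    nomad volume status "$vol_id" &&
        nomad volume deregister "$vol_id" &&
        nomad volume register "$vol_file"
}

# Usage (not run here):
#   reregister_volume traefik-acme vol-acme.json
```

 +
 +The function deliberately does not pass ''-force''; that remains a manual decision for the case described above.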
 +
 +===== Resizing a volume =====
 +
 +''csc'' is supposed to be able to resize a volume using: <code bash>
 +~/go/bin/csc -e tcp://democratic-csi.service.consul.sihnon.net:9000 controller expand-volume --req-bytes 209715200 traefik-acme
 +</code>
 +
 +This segfaulted for me. However, all it does is call ''zfs set refquota'' on the volume, and if this is done manually, the change in size is picked up correctly.
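 +
 +The manual resize can be sketched as follows. It assumes the dataset name is the ''share'' path from the ''create-volume'' output with the leading slash removed, which only holds if the pool uses default mountpoints; check with ''zfs list'' on the fileserver first. The script only prints the command, so it can be reviewed before being run on the fileserver.
 +

```shell
# ASSUMPTION: the dataset name is the NFS share path minus its leading slash.
share="/pool2/democratic/root/traefik-acme"
dataset="${share#/}"

# New size in bytes: 200 MiB, the same value passed to expand-volume.
new_size=$(( 200 * 1024 * 1024 ))

# Build and print the command instead of running it.
resize_cmd="zfs set refquota=${new_size} ${dataset}"
echo "$resize_cmd"
```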
 +
 +====== Notes ======
 +
 +  * The controller creates volumes by running ''zfs create'' commands via ssh on the fileserver host.
 +  * It uses a root ssh key for this, which is stored in vault and made available to the controller by nomad.
 +  * It might be possible to reduce the permissions required by creating a dedicated user account for this and delegating zfs permissions to that user (future investigation).
 +  * Mount option ''nolock'' is required because nomad checks whether statd is running before allowing the filesystem to be mounted. Because nomad itself is running inside a container here, it cannot see that statd really is running on the host. To do: check how nomad looks for statd and whether it can be exposed into the container.
 +  * The controller is configured to enable zfs snapshots by setting the ''com.sun:auto-snapshot=true'' dataset property in its config file.
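 +
 +As a starting point for the permissions investigation mentioned above, delegation with ''zfs allow'' might look like the sketch below. Everything specific in it is an assumption: the user name ''democratic-csi'' is hypothetical, the parent dataset is guessed from the share path, and the permission list is untested and may be neither sufficient nor minimal for the controller. The script only prints the command.
 +

```shell
# HYPOTHETICAL values: neither the user name nor the permission list comes
# from the setup described on this page.
csi_user="democratic-csi"
parent_dataset="pool2/democratic/root"   # ASSUMPTION: derived from the share path
perms="create,destroy,mount,snapshot,refquota,sharenfs"

# Build and print the delegation command instead of running it.
allow_cmd="zfs allow -u ${csi_user} ${perms} ${parent_dataset}"
echo "$allow_cmd"
```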
  
nomad/csi.txt · Last modified: 2021/01/01 23:26 by ben