Dreaming about distributed file systems

SeaweedFS?


If you want something that's a million times nicer than Ceph and also supports geo-replication out of the box (whilst only supporting S3) - check out Garage: https://garagehq.deuxfleurs.fr/

I had heard of Garage, but thanks for the reminder. I think what they are doing is super cool, but it is absolutely not a Ceph replacement for anything other than basic object storage (i.e. a weakly consistent object store with no read-your-own-writes guarantee), so I think calling it nicer than Ceph with no qualifiers is a bit unfair :slight_smile:. For instance, with decent networking it is completely feasible for Ceph to provide block storage devices and then run a database with separated compute/storage on top. This is a non-goal for Garage, which is fine; they are trying to build a different system.


Actually, it does implement read-after-write consistency: https://garagehq.deuxfleurs.fr/documentation/reference-manual/configuration/#top-level-configuration-options

For me, it's nicer than Ceph in that you can get it going in a matter of seconds on basic hardware, as opposed to requiring 10GbE networking.

It seems to me like a good SeaweedFS deployment needs at least 3 master servers with low latency and different failure domains to keep its promises (because you have a master with volume mappings replicated via Raft). This is before you even add any volume servers (although maybe you just add some big disks to your masters and colocate the services).

Still seems like a bunch of hassle vs. some weakly consistent p2p sync system. I guess if you enjoy this kind of thing, go for it? If you want to know why all of these distributed file systems seem to have such baroque requirements, and you have a bunch of free time and enjoy CS papers, give this a read. It doesn't require advanced mathematics, just a great deal of patience to work through the various voting scenarios.

You could always run the SeaweedFS master on a single machine and back it up regularly? If volume mappings are "stable enough" this could work, but bear in mind that any new volumes created between backups would need to be manually recovered (assuming SeaweedFS has the tools for this; I haven't read enough to answer that).
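The single-master-plus-backups idea could be as simple as periodically tarring the master's metadata directory (whatever you pass to `weed master -mdir=...`). A minimal sketch, with the caveat that the `mktemp` fallbacks and the demo file are stand-ins so it runs anywhere, not SeaweedFS specifics:

```shell
# Sketch: snapshot a SeaweedFS master's metadata directory with tar.
# On a real server, set MDIR to the directory given to `weed master -mdir=...`
# and BACKUP_DIR to durable storage; both defaults here are demo-only.
set -eu
MDIR="${MDIR:-$(mktemp -d)}"
BACKUP_DIR="${BACKUP_DIR:-$(mktemp -d)}"
echo "stand-in for master metadata" > "$MDIR/demo.idx"  # demo file only
ARCHIVE="$BACKUP_DIR/master-$(date +%F).tar.gz"
tar czf "$ARCHIVE" -C "$MDIR" .
echo "wrote $ARCHIVE"
```

In practice you would run something like this from a cron job or systemd timer and drop the demo line; it doesn't address the new-volumes-between-backups gap mentioned above.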

Shame on me for skimming! That's super cool. But Ceph doesn't require 10GbE for object replication; you only need the really nice networking gear if you want to provide an FS or block device. That said: configuring and running Ceph is horrible, and I wouldn't wish it on anyone :slight_smile:

This is mentioned under non-goals on the Goals page.

POSIX/Filesystem compatibility: we do not aim at being POSIX compatible or to emulate any 
kind of filesystem. Indeed, in a distributed environment, such synchronizations are translated 
in network messages that impose severe constraints on the deployment.

Does that mean I can't use it as a filesystem?

Correct. Garage is an object store, not an FS. You can use it as a backup target, or for systems that target object stores.


The Quobyte site has a lot of good explanations.

I now have Unison running on my system, syncing /home between my server and my work computer.

I made a systemd service that syncs when I start up my work computer. It is also possible to sync on file changes (via inotify) by changing one parameter in the config, but in my case I want to reduce network traffic, so it only syncs on startup of my work computer.
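A hedged sketch of what such a unit could look like as a plain user-level systemd service; the profile name `work` and the binary path are assumptions, and on NixOS you would typically generate this via `systemd.user.services` rather than writing the file by hand. (The inotify-based mode mentioned above corresponds to Unison's `-repeat watch` option.)

```ini
# ~/.config/systemd/user/unison-sync.service (illustrative; paths and
# profile name are assumptions, not from the original post)
[Unit]
Description=One-shot Unison sync at startup
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
# -batch: no interactive prompts; -auto: accept non-conflicting actions
ExecStart=/run/current-system/sw/bin/unison work -batch -auto

[Install]
WantedBy=default.target
```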

Since there is currently no support for configuring Unison in NixOS, I have used Unison's own config framework.

The only problem I couldn't solve within Unison is that it doesn't sync the modification times of directories. I used rsync to work around that.

I'm considering SeaweedFS to implement CSI (Container Storage Interface) for workloads in a small Nomad cluster (3-4 nodes).

My main objective is to avoid having to define tens of host volumes on my Nomad clients (configured via the NixOS module) and to decouple storage from the Nomad clients.

Since it feels a bit off to run SeaweedFS as a Nomad workload in the very cluster that relies on it, I'd prefer to run it as a host system service (via a NixOS module).

  1. Does this sound like a plan?
  2. Is SeaweedFS suitable for this purpose, or are there better alternatives?

For now I'm looking to realise this in a LAN setting, but I want to replicate the same setup later on a VPS-based Nomad cluster that is interlinked (Nomad + Consul) via a private WireGuard network. I won't be looking to have the distributed FS also store databases, only files; I think for HA/redundancy of my databases, a replica set managed by the database itself would be the correct way?

For backups I'll probably just stream/archive from the VPS to a local storage solution, i.e. basically just dumps, or maybe better, Borg backups. For database backups, likely just pg_dump.

git-annex seems like a good alternative to Syncthing


I highly recommend git-annex: I've been using it daily for more than a year. It integrates into Magit, too!
It does require the remote to also have it installed (so most forges won't work, unless there is some LFS adapter I'm not aware of), but you can also use network filesystems as remotes, just as you can with git.


You can also just use a plain old git repo on your NAS via ssh. That's what I do.
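That setup — a bare repo on the NAS used as an ordinary git remote — can be sketched like this. The repo lives in a temp dir here so the sketch runs locally; on a real NAS the remote URL would be something like `ssh://nas/path/store.git` (host name and paths are placeholders).

```shell
# Create a bare repo (this would live on the NAS) and push to it from
# a working clone.
set -eu
NAS_REPO="$(mktemp -d)/store.git"   # stand-in for the repo on the NAS
WORK="$(mktemp -d)/notes"           # stand-in for your local checkout
git init --bare "$NAS_REPO"
git init "$WORK"
git -C "$WORK" -c user.email=you@example.com -c user.name=you \
    commit --allow-empty -m "first commit"
git -C "$WORK" remote add nas "$NAS_REPO"
git -C "$WORK" push nas HEAD:refs/heads/main
```

For git-annex use you would additionally initialize the annex on each end, since (as noted above) the remote needs git-annex installed too.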

https://git-annex.branchable.com/special_remotes/git-lfs/


Thanks for the LFS doc link. It just works. No more wrestling with horrible file systems!