Dreaming about a distributed file system

I have been dreaming about setting up a distributed file system on a server and connecting it to my computers, so that I have easy access to all my project files on all of them.

This would make it easy to have and work on multiple computers. I sometimes restore old
computers to working order, and if I can make a NixOS config file that installs a system
as a distributed file system client, it will be super easy to add a new computer to the setup (a sketch of that follows the plan below).

The plan for building it is:

  • Figure out which distributed file system to use.
  • Make a NixOS configuration file for the distributed file system server.
  • Make a NixOS configuration file for the distributed file system client.
  • Test it on a server.
  • Test it on multiple clients.
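
For the client part, a minimal sketch of how it could be factored so that adding a machine is a single import, assuming a hypothetical ./dfs-client.nix (everything inside is a placeholder until the file system is chosen):

    # dfs-client.nix — hypothetical shared module; the real options depend
    # on which distributed file system is chosen.
    { config, pkgs, ... }:
    {
      # e.g. enable the chosen client service here and add a fileSystems
      # entry that mounts the shared storage.
    }

    # configuration.nix on every machine that joins the system:
    { config, pkgs, ... }:
    {
      imports = [ ./dfs-client.nix ];
    }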

Does anybody already have experience with a distributed file system like the Andrew File System or similar?

Is the plan good, or can it be made better? How?

2 Likes

I use syncthing for this sort of thing (except for git repos).
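
On NixOS that is only a few lines to get started; a minimal sketch (the user name and data directory are assumptions):

    # Run Syncthing as the given user, syncing folders under their home.
    services.syncthing = {
      enable = true;
      user = "alice";              # assumed user name
      dataDir = "/home/alice";     # where synced folders live by default
      openDefaultPorts = true;     # opens 22000/TCP+UDP and 21027/UDP
    };

Folders and devices can then be paired in the web GUI on 127.0.0.1:8384.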

4 Likes

There are options to enable GlusterFS or Ceph.
An example for replicated GlusterFS can be found here: GitHub - nh2/nixops-gluster-example: Advanced nixops deployment example: GlusterFS
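
For a hand-rolled setup without nixops, a minimal sketch of the two sides, assuming a server named gluster1 and a volume named projects (both names are made up; the volume itself is created imperatively with the gluster CLI):

    # Server: run the GlusterFS daemon.
    services.glusterfs.enable = true;
    networking.firewall.allowedTCPPorts = [ 24007 24008 49152 ];  # mgmt + first brick

    # Client: mount the volume over FUSE.
    fileSystems."/mnt/projects" = {
      device = "gluster1:/projects";
      fsType = "glusterfs";
    };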

3 Likes

This one, I assume: Syncthing - Wikipedia

Some info about the systems:

I have good experience with MooseFS on NixOS. It is integrated into NixOS via a module and easy to set up. The only consideration is that the master server holds all the metadata in RAM and thus needs to be equipped with a bit more memory.
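
For reference, a minimal sketch of the three roles; I am going from memory on the module's exact option names, so treat them as approximate, and the host name, export line, and disk path are assumptions:

    # Master — keeps all metadata in RAM, so give it extra memory.
    services.moosefs = {
      masterHost = "mfsmaster";   # assumed host name
      master = {
        enable = true;
        autoStart = true;
        exports = [ "* / rw,alldirs,maproot=0:0" ];  # assumed export line
      };
    };

    # Chunkserver — stores the actual file data.
    services.moosefs = {
      masterHost = "mfsmaster";
      chunkserver = {
        enable = true;
        hdds = [ "/var/lib/mfs/chunks" ];  # assumed data directory
      };
    };

    # Client — then mount with: mfsmount -H mfsmaster /mnt/mfs
    services.moosefs = {
      masterHost = "mfsmaster";
      client.enable = true;
    };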

Description of a “high availability file system”, from Wikipedia:
"InterMezzo is described as a “high availability file system” since a client can continue to operate even if the connection to the server is lost. During a period of disconnection, updates are logged and will be propagated when the connection is restored. Conflicts are detected and handled according to a “conflict resolution policy” (although the best policy is likely to be to avoid conflicts).

Typical applications of replication mode are:

  • A cluster of servers operating on a shared file system.
  • Computers that are not always connected to the network, such as laptops."

I tried doing this some years ago, and my impression was that Linux has been optimized out of compatibility with distributed file systems. Lots of applications and libraries use mmap instead of read and write because it does better in some benchmarks, but it breaks on network file systems.

The only OS that is still maintained and gives the true experience of a distributed file system without enterprise-grade complexity is 9FRONT. No Nix support, though.

An alternative to distributed filesystems can be a distributed file store such as https://git-annex.branchable.com/.

It’s a beast to set up and comes with plenty of footguns but it has some incredibly useful features.

1 Like

Interesting, also because I have a lot of drives and a USB docking station for them.

Did you run it on the whole file system or only on a part?

I am only planning to mount /home on the distributed file system (see the sketch below).
Temporary files written by programs should go to other directories, but of course you never know
what some crazy programmer will do.
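
In NixOS terms that is a single fileSystems entry; a sketch with NFS syntax as a stand-in until the actual file system is picked (server name and export path are made up):

    # Mount only /home from the file server; everything else stays local.
    fileSystems."/home" = {
      device = "fileserver:/export/home";  # assumed server and export
      fsType = "nfs";                      # placeholder for the chosen DFS
      options = [ "noauto" "x-systemd.automount" ];  # mount on first access
    };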

The ability to do distributed storage offline and asynchronously is one of git-annex's prime features; I can definitely recommend taking a look if that's what you're aiming to do.

I tried running a Linux root over the network, and the problems I had were in /home.

1 Like

And what network file system did you use?

Hey @Isomorph70, any update on this? I'm curious to learn whether you've decided on something, as I am currently looking into this myself.

Unfortunately I got preoccupied with other stuff, so I have not tried anything out yet, but when I get time I will return and try something.

1 Like

I have been running an IPFS server for some years. IPFS is more of a file store than a real file system, though it can serve files to different computers.
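
On NixOS the daemon is one module away; a minimal sketch (on newer nixpkgs the option set was renamed from services.ipfs to services.kubo, so check which one your channel has):

    # Run the IPFS daemon. It stores content-addressed blocks; it is not
    # a mountable POSIX file system by itself.
    services.ipfs = {
      enable = true;
      dataDir = "/var/lib/ipfs";  # assumed repo location
    };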

I do distributed systems for a living (i.e. global databases with replicated ACID transactions), and my general take on this, given what I imagine your use case is

As a single individual I would like to be able to work on the same set of files on many computers, but only ever on one computer at a time.

then I would absolutely not touch a distributed file system. Have you read the documentation for OpenAFS? Have you looked at the admin guide? It's massive. That said, it basically gives you nothing that syncthing doesn't: it has a completely weak consistency model while also requiring deep kernel integration, and it is targeted at serving tens of thousands of concurrent users. Ceph can be used to build a competitor to S3! All the choices made by the Ceph devs target people that need real-time synchronous replication with rack-aware placement to survive a data-centre-level outage with a full-time monitoring team. Of course you can install Ceph on a 3-Pi cluster for fun (you want a minimum cluster size of 3 for metadata to survive the failure of one node), but if you are actually looking to solve a problem and use the lowest-maintenance solution, then I recommend choosing something developed explicitly for that use case, i.e. syncthing, git-annex et al.

Of course, if you want to lab this stuff because you want to gain experience running a particular system, disregard everything I just wrote.

5 Likes

Alright @bme, thanks. (Syncthing here I come… 🙂)