How to make Nixpkgs more eco-friendly / use fewer resources

Hopefully content-addressed packages can lead to the holy grail of build farms…

distributed trustless builds…

I know Adam and co. have been working on Trustix, a distributed build system… imagine your builds being done on machines directly connected to renewable energy, or wherever the sun happens to be in the sky on Earth :-).

If this can somehow be linked to a way for builders (miners) to get rewarded for building derivations for others, then centralized building can become a thing of the past.

Hydra goes from 1,000 CPUs to many, many thousands; use IPFS or Hypercore to distribute said builds…

A bit Star Trek, but if it works, it would probably change the course of software building and distribution forever… I mean, you can’t expect to do a nixos-rebuild switch on Mars and fetch everything from Earth over a rather low-bandwidth and ‘slightly’ high-latency TCP connection, can you?

“Nix = $” would never be a truer statement than then.

2 Likes

Trustless distributed builds are probably a long way in the future. From my understanding, Trustix is more about cross-checking builds from independent builders. The problem is that demonstrating that two builders are independent is hard, and you need to trust someone or something to demonstrate they are independent (but it would certainly still require less trust than trusting a single Hydra provider).

However, safely distributing files is totally possible with IPFS (which means InterPlanetary File System, by the way) or certainly others, possibly including a signed “input → CA (and IPFS) hash” mapping (with IPNS possibly providing said signature). (Nobody will have a latency of more than a few seconds for a long time to come, although this may still help with load balancing and performance.)
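To make that concrete, here is a purely illustrative sketch, written as a Nix attribute set, of what such a signed “input → content address” record from a builder might look like; the field names are made up and do not correspond to any real Trustix, Nix, or IPFS format:

```nix
# Hypothetical statement published by one builder: "for these inputs I
# obtained this content-addressed output, retrievable from IPFS at this CID".
# A client would only trust an output once enough independent builders had
# published matching statements.
{
  inputDrvHash = "sha256-AAAA";     # hash identifying the build inputs
  outputCaHash = "sha256-BBBB";     # content address of the build result
  ipfsCid      = "bafybeiEXAMPLE";  # where the result can be fetched
  builder      = "builder-a.example.org";
  signature    = "ed25519:CCCC";    # over the three fields above
}
```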

Also, the centralized architecture as used today probably doesn’t prevent using certain hosts under certain conditions (if we ignore the fact that this might not even be a good idea to begin with due to fabrication cost, but I’m not knowledgeable about this). The internet is probably fast enough that transferring data to the other side of the world is totally possible and easily achievable (even though the builders still need to be trusted, sadly).

If it’s any consolation, I appreciate all of the work that you’ve done to extend the usage of Nix to CA derivations and IPFS.

I think it’s really forward-thinking and helps demonstrate the power of using functional paradigms like Merkle trees.

11 Likes

I can’t wait to see IPFS being integrated natively into Nix!

2 Likes

In the grand scheme of things that would surely have a negative impact: Instead of using our build farm, the work would be done multiple times downstream on less efficient machines.

I’m also a bit confused about which resources we are talking about: environmental impact? Build time? Bandwidth usage? These issues are interrelated, but I don’t think we can e.g. meaningfully discuss environmental impact without knowing what parts of our infrastructure have what impact exactly.

Build resource usage is also a tricky topic, because CA will probably make rebuilds faster, but I think it’s likely that the time previously spent waiting for rebuilds will then be used to schedule even more builds. Then it’d be more efficient, but the net resource usage would be similar…

Bandwidth usage is a tricky one: I think it greatly limits the accessibility of Nix/nixpkgs/NixOS outside of the West. However, as @NobbZ points out, we can’t eliminate the necessity to regularly redownload an entire system with the way Nix is designed, because we deliberately don’t allow cheating via ABI compatibility and dynamic library loading – as conventional distributions do. I’m not sure if it’s actually feasible to significantly reduce the output size of software packaged in nixpkgs – there’s probably a big element of fighting against the current of modern software development involved. The substitution mechanism is probably the area where we can have the biggest wins.

7 Likes

I mentioned a potential solution to avoiding downstream rebuilds from (internal) library changes before: build against library stubs, then relink against the real thing in a second derivation. If one wanted to not only avoid rebuilds but also minimize substitution, the second derivation should be built locally (though allowSubstitutes = false may be a bit too strong for that). Then only libraries/executables that truly changed should be refetched, as long as the stubs are unaffected.
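A minimal sketch of what that two-derivation split might look like, assuming a hypothetical libfooStub package that contains only the library’s exported symbols; the names are illustrative and this is not how nixpkgs currently works:

```nix
{ stdenv, patchelf, libfoo, libfooStub }:

let
  # Built against the stub only: its output stays identical across internal
  # libfoo changes as long as the exported symbols (the ABI) are unchanged.
  app-unlinked = stdenv.mkDerivation {
    pname = "app-unlinked";
    version = "1.0";
    src = ./.;
    buildInputs = [ libfooStub ];
  };
in
# Cheap second derivation that retargets the binary at the real libfoo.
# preferLocalBuild hints that this step should happen on the user's machine
# (allowSubstitutes = false would force it, but may be too strong).
stdenv.mkDerivation {
  pname = "app";
  version = "1.0";
  preferLocalBuild = true;
  nativeBuildInputs = [ patchelf ];
  buildCommand = ''
    install -Dm755 ${app-unlinked}/bin/app $out/bin/app
    # point the runtime loader at the real library instead of the stub
    patchelf --set-rpath ${libfoo}/lib $out/bin/app
  '';
}
```

With CA derivations, a libfoo change that leaves the stub’s symbols untouched would not invalidate app-unlinked, so only genuinely changed libraries/executables would need to be rebuilt or refetched.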

4 Likes

This should be possible to do, purely, either with vanilla CA derivations and cleverness, or with a small extension to them.

2 Likes

There are other content-addressed distributed file systems available too; just because IPFS was first doesn’t mean it’s the best choice.

In fact, given the design choices of IPFS, if it gets popular it’s going to have problems scaling to millions of nodes, due to inefficiency in content discovery/publishing… it’s just one large DHT, but maybe they will copy the idea of topics soon.

However, let the best distributed content addressed file system win.

With many solutions you can’t directly share the /nix/store; instead you have to keep a separate copy dedicated to distribution over IPFS, which leaves you with quite a lot of redundant data. :frowning:

However, I need to check the IPFS Nix extensions and see if/how they address this.

I’ve reviewed the IPFS patches for Nix; judging by their size, an extensive amount of work has certainly gone into them.

Interesting stuff!

3 Likes

IPFS was not the first. Wikipedia describes a brief history of content-addressed storage. I recall exploring Tahoe-LAFS years before IPFS became available.

OK, first to get $$$$$$ of funding… sorry for not making that clear.

Another avenue to explore would be to make it easier to share a Nix store between local machines; right now it takes a decent amount of work for the end user to set up their machines as binary caches. It also comes with some weird gotchas, so the experience is not very smooth. Fixing those problems has the potential to save bandwidth from the server and provide a better user experience for bandwidth-constrained users, and is probably much more tractable than the optimizations listed above.
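For reference, here is roughly what that setup looks like today on NixOS using the nix-serve module (a sketch; hostnames, paths, and key names are placeholders, and the signing key is generated with nix-store --generate-binary-cache-key):

```nix
# On the machine whose store should be shared:
{
  services.nix-serve = {
    enable = true;
    # nix-store --generate-binary-cache-key my-lan-cache cache-priv-key.pem cache-pub-key.pem
    secretKeyFile = "/var/cache-priv-key.pem";
  };
  networking.firewall.allowedTCPPorts = [ 5000 ];  # nix-serve's default port
}
```

```nix
# On the other machines:
{
  nix.settings = {
    substituters = [
      "http://server.local:5000"   # try the LAN machine first
      "https://cache.nixos.org"
    ];
    trusted-public-keys = [
      "my-lan-cache:<contents of cache-pub-key.pem>"
      "cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY="
    ];
  };
}
```

That is several manual steps (key handling, firewall, per-client trust) for something that could plausibly be a one-line option.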

From an eco standpoint that would probably be only a modest win, if any, but it would be very appreciated by users with bandwidth constraints.

Without more concrete bandwidth and compute data on the current infrastructure though, everybody here is just guessing.

5 Likes

You could take a look at GitHub - cid-chan/peerix: Peer2Peer Nix-Binary-Cache

Otherwise, you can set up a local cache, but it’s not very practical. Like, you need to make it the first substituter, and when it’s not reachable Nix is unhappy :confused:
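One partial mitigation for the unreachable-cache problem, assuming the local cache is already listed first in substituters (a sketch, not a complete fix):

```nix
{
  nix.settings = {
    # Stop waiting for an unreachable substituter after a few seconds
    # instead of stalling every operation.
    connect-timeout = 3;
    # If no substituter can provide a path, fall back to building locally
    # rather than failing.
    fallback = true;
  };
}
```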

edit: I wrote about peerix

3 Likes

Graham has all the answers (perhaps), but I’m not sure if he is willing to or able to share that data; however, there’s no harm in asking, if someone is willing to put time into a Nix sustainability report. Whatever it reveals, Nix is much more eco-friendly than Gentoo ever was, since a single compiled package can be used by many users via Nix’s really clever binary caching features…

I’ve done some preliminary work using the Hypercore file system over FUSE, but NGI was not too interested in funding it at the time.

I think all eyes were on IPFS projects… but maybe next year ;-).

@Ericson2314, thanks for your hard work and research into this, and @Solene for asking quite an important question.

3 Likes

This thread’s title, and its implication that people compiling software for themselves is bad for the environment, really bother me.

Decentralization is an essential feature of healthy ecosystems; monocultures inevitably collapse. Diversity costs energy, and thermodynamic beancounting papers over this.

I identify as an environmentalist, and I bristle at the movement being reduced to mere joule-tallying.

4 Likes

As someone involved in BSD development, I agree with the diversity point. It’s bad to see only the Linux/amd64 combination.

However, this doesn’t prevent us from working on a more efficient diversity :+1:t3:

5 Likes

A few weeks ago I did a highly unscientific experiment: I took the NAR-serialisations of two random Firefox store-paths (uncompressed) and checked how much would be downloaded if one was present locally and the other was provided via zsync/zchunk. The results were disappointing: The downloaded amount was about as much as the xz-compressed NAR file in each case.
Things may be better with some NAR-aware chunking (at file boundaries, handling store paths).

3 Likes

There is also an upcoming hackathon: https://sdialliance.org/landing/softawere-hackathon/

3 Likes

They use this piece of software to measure power consumption on a node, which then exposes the data via a REST API.

Maybe something like this could be used as a starting point?

1 Like

I enjoyed reading this discussion.

I wanted to add, similar to @sternenseemann, that we should probably analyze the resource usage of the Nixpkgs and NixOS ecosystem before discussing possible remedies. For example,

  • How much power does the Nixpkgs Hydra use as part of the build process?
  • What is the bandwidth usage of cache.nixos.org? How does bandwidth usage compare to “power usage”? Is there a factor for computing the CO2 equivalent?
  • How many builds happen in a distributed way, for example on personal computers?
  • Are there other parts that use resources?

We could also have a look at other parts of the Nixpkgs ecosystem such as the Nix Community builds and cache.

I am by no means a resource usage expert, but I do think the lowest-hanging fruit should be eaten first.

With respect to local builds, I am using deploy-rs, and so build all my derivations on one computer and transfer the outputs over the local network. Of course, this is not an option when the systems are owned or used by different people/entities.
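For anyone curious, this is roughly what that looks like with deploy-rs (a sketch following the flake layout from its README; hostnames, users, and systems are placeholders):

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    deploy-rs.url = "github:serokell/deploy-rs";
  };

  outputs = { self, nixpkgs, deploy-rs }: {
    nixosConfigurations.laptop = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./laptop/configuration.nix ];
    };

    # Built on the deploying machine, then copied to the target over SSH,
    # so the target never compiles anything or hits cache.nixos.org itself.
    deploy.nodes.laptop = {
      hostname = "laptop.lan";
      profiles.system = {
        user = "root";
        sshUser = "admin";
        path = deploy-rs.lib.x86_64-linux.activate.nixos
          self.nixosConfigurations.laptop;
      };
    };

    checks = builtins.mapAttrs
      (system: deployLib: deployLib.deployChecks self.deploy)
      deploy-rs.lib;
  };
}
```

Running deploy .#laptop then pushes the closure over the LAN instead of every machine downloading it separately.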

1 Like

… for which a flake is available!

1 Like