What should the Hydra output size limit be?

Currently the nixos-unstable channel has been blocked for a week, because the GNOME ISO is too big for Hydra.

A request was made on that issue to keep the discussion there focused on “fixing the immediate problem”, and whilst I think a quick way to fix the problem would be to increase the output size limit, I appreciate that this is not a good long-term solution without some discussion of what the limit should be. So: how should we determine an appropriate size for the GNOME ISO?

I assume that the current limit is 2 GiB (2147483648 bytes), but I don’t know if there’s a way to check that.

The ISO has been consistently over 2 GB (2,000,000,000 bytes) since September 2021, and has been close to that size since February 2021, a year ago. @06kellyjac posted a summary of the sizes of other distributions’ live graphical ISOs on GitHub, which I have expanded upon here:

| Distro | Size | DE/WM |
| --- | --- | --- |
| EndeavourOS 21.5 | 1.9 GiB | XFCE |
| Linux Mint 20.3 | 2.0 GiB | XFCE |
| Solus 4.3 | 2.0 GiB | GNOME |
| Linux Mint 20.3 | 2.1 GiB | Cinnamon |
| Pop!_OS 21.10 | 2.5 GiB | GNOME |
| elementary OS 6.1 | 2.5 GiB | Pantheon |
| Debian 11.2 | 2.6 GiB | GNOME |
| Manjaro (minimal) 21.1 | 2.6 GiB | XFCE |
| Manjaro (minimal) 21.2 | 2.7 GiB | GNOME |
| Ubuntu 21.10 | 2.9 GiB | GNOME |
| Manjaro 21.2 | 3.3 GiB | GNOME |

I have manually verified the size of all of these.

Based on this, I think it is unrealistic for the GNOME ISO to be significantly smaller than 2 GiB. It’s probably not impossible to keep it just below 2 GiB, but it seems likely to be a lot of work, and whenever it hits 2 GiB, nixos-unstable will be blocked until it’s made smaller, which seems undesirable.

3 GiB seems like a much more realistic limit to me. It would be good in any case to keep the image well under 4 GB (3.7 GiB), so that it fits onto 4 GB USB sticks.
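(For the arithmetic behind that figure: USB sticks are sold in decimal units, so a “4 GB” stick holds 4 × 10⁹ bytes, and 4,000,000,000 / 2³⁰ ≈ 3.73 GiB.)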

5 Likes

Note that the Hydra output limit is not specific to the GNOME ISO; it is imposed on every derivation output on Hydra. 2 GB is more than enough for 99% of all derivations, and the limit shouldn’t be raised, or other derivations will inevitably also use it up in cases that are avoidable (e.g. the debug symbols for qemu had this problem, as did ghcjs’ bundled libraries). Our binary cache is, after all, growing in size ever faster.

A good solution, in my opinion, could be, following your point, to add a feature to Hydra that allows adjusting the output limit per derivation via a meta attribute, say hydraOutputLimit. Sadly, we are always short on people working on Hydra, however.
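A purely hypothetical sketch of how that could look on the nixpkgs side (no such attribute exists today; the name hydraOutputLimit is just the suggestion above):

```nix
# Hypothetical: hydra-queue-runner would read this meta attribute and use
# it in place of the global max_output_size for this one derivation.
stdenv.mkDerivation {
  pname = "nixos-iso-gnome";
  # ...
  meta = {
    hydraOutputLimit = 3 * 1024 * 1024 * 1024; # 3 GiB, in bytes
  };
}
```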

3 Likes

my point is mostly that, absent any hydraOutputLimit attribute, there is no choice but to increase the global limit. sure, it’s not ideal, but the alternative is that nixos-unstable spends a significant amount of time blocked on a single unreliable job, and people spend a lot of effort trying to reduce its size and get frustrated and burn out. the ISO’s size was already reduced by ~100 MiB at the end of October 2021, and within a month the size improvement was lost.

it would be lovely if there was a hydraOutputLimit, just as it would be lovely if there was e.g. tooling to show the size of many jobs at once, or email maintainers when a derivation becomes larger than expected, or whatever (yes, I am aware of the reasons the feature that emails you when your derivation fails to build was turned off), but to the best of my knowledge there isn’t, so,

5 Likes

Honestly, stuff like this is why I still can’t recommend NixOS to most coworkers. The blocked channel is currently holding back fixes for critical, well-known CVEs in Chromium for many people (chromium: 98.0.4758.80 -> 98.0.4758.102 by primeos · Pull Request #160354 · NixOS/nixpkgs · GitHub). In addition, some people are unable to run home-manager (`remarshal` failed to be installed · Issue #159522 · NixOS/nixpkgs · GitHub).

Ideally the GNOME ISO would be smaller, yes, but right now it’s non-trivial to get its size down. When the channel is blocking all other changes from happening, why not apply a temporary fix now?

EDIT: Found the limit here: https://github.com/NixOS/hydra/blob/d0bc0d0edafef4201a33391eae243c0e3c6925d3/src/hydra-queue-runner/hydra-queue-runner.cc#L44. Changing that hardcoded value might be a bit much, but it seems max_output_size is also a configuration option for the hydra-queue-runner. I have never run Hydra myself, but such a configuration change should be trivial to make?
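For reference, a sketch of what that could look like on a Hydra server that is itself managed as a NixOS module (untested; services.hydra.extraConfig ends up in hydra.conf, which hydra-queue-runner reads):

```nix
services.hydra.extraConfig = ''
  # 3 GiB instead of the hardcoded 2 GiB default
  max_output_size = ${toString (3 * 1024 * 1024 * 1024)}
'';
```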

Once that is done people can work on shrinking down the ISO further without time pressure.

Another long-term solution might be to have exceptions for the hydra output size limit? (forget I mentioned this, the important part is the temporary solution for now)

7 Likes

For coworkers I’d recommend the stable release, but yes, having the unstable channel completely clogged while we bikeshed for days isn’t good.
Being pragmatic and raising the output limit, or removing it from Hydra builds temporarily until we do get the ISO smaller, makes much more sense.

5 Likes

I created a PR that raises the limit in Hydra to 3 GB: hydra: increase max_output_size from 2GB to 3GB by bobvanderlinden · Pull Request #206 · NixOS/infra · GitHub. That could serve as the short-term solution suggested above.

4 Likes

Ah, you’re right. The stable channels are chugging along regardless of nixos-unstable. Good to know, and good policy :+1: With that, I see that this is only a problem for users on unstable; not ideal, but part of the deal :sweat_smile:. Thanks for the comment.

2 Likes

Not really directly related to the topic, but I’ll still place it here.

Perhaps another (complementary) solution would be for system/package updates to be based only on what is actually needed. For example, if I run nix flake update on my configuration flake, it would perform the following steps (a rough sketch follows the list below):

  1. Fetch all Hydra evaluations newer than the git revision my flake is on.
  2. For each evaluation, starting from the newest, check whether the builds and tests for the packages specified in my configuration passed.
  3. If a sufficient Hydra evaluation is found, update to that git revision.
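To make that concrete, here is a rough Python sketch of those three steps against Hydra’s JSON API. The endpoint shapes are from memory and may differ; the jobset name and the NEEDED job set are placeholders, and a real tool would need to batch or cache the per-build requests:

```python
import requests

HYDRA = "https://hydra.nixos.org"
JOBSET = "nixos/trunk-combined"  # the jobset behind the nixos-unstable channel
JSON = {"Accept": "application/json"}
# Placeholder: the Hydra jobs this particular configuration depends on.
NEEDED = {"nixpkgs.firefox.x86_64-linux", "nixos.tests.plasma5.x86_64-linux"}

def latest_usable_rev(current_rev):
    # Step 1: fetch recent evaluations of the jobset, newest first.
    evals = requests.get(f"{HYDRA}/jobset/{JOBSET}/evals", headers=JSON).json()["evals"]
    for ev in evals:
        rev = ev["jobsetevalinputs"]["nixpkgs"]["revision"]
        if rev == current_rev:
            break  # nothing newer than what we already run
        # Step 2: did every job we care about succeed in this evaluation?
        build_ids = requests.get(f"{HYDRA}/eval/{ev['id']}", headers=JSON).json()["builds"]
        passed = set()
        for bid in build_ids:  # very chatty; a real tool would batch this
            build = requests.get(f"{HYDRA}/build/{bid}", headers=JSON).json()
            if build["job"] in NEEDED and build["finished"] and build["buildstatus"] == 0:
                passed.add(build["job"])
                if passed == NEEDED:
                    return rev  # Step 3: update to this git revision
    return None  # no suitable evaluation found
```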

I believe that this approach is better than the current system because:

  • Generally, a configuration should not require all things to pass. For example, on my system, where I run Plasma on x86_64-linux, I believe that an update shouldn’t be blocked by, say, a failed aarch64 build. Worse yet, an ISO.
  • This would remove the need for someone to explicitly follow a small channel.
  • Checking all used packages allows an update to be halted when a non-channel-blocker fails to build but the channel gets updated anyway. For example, sage sometimes fails to build, but since it isn’t a channel blocker, the channel still gets updated. The sage build (actually, I think it’s the tests) takes a long time, and if I could automatically notice beforehand that it failed to build on Hydra, I would know it doesn’t make sense to update and have the build fail on my computer too.

Shortcomings of this approach

  • Checking all Hydra evaluations between your last update and the newest evaluation may take a bit of time. I’m not sure how long, but I don’t believe it should take overly long, as it’s just metadata.
  • How do we determine which checks need to pass? I believe this is a hard question. Should it check all flake outputs? Not all flake outputs are derivations, and the path to the derivation may be non-trivial. What if I use flake-utils and forEachSystem? Should I be blocked because I haven’t specified that I only need x86_64-linux? Even worse, if it’s a NixOS flake, the set of packages depends on the modules used. The most thorough approach would be to fetch each git revision checked, evaluate it against the configuration, and then determine what needs to pass. I think that is infeasible.
  • If no suitable update can be done, what should the message be? Perhaps it should look at the newest evaluation and report which necessary builds failed there. I’m not sure if this is the best approach.
  • What about non-flake users who use the nix-channel command? One possible approach would be to save some metadata along with the channel specifying which packages are necessary for an update. Also, perhaps nixos-rebuild --upgrade could automatically check what is required by /etc/nixos/configuration.nix in addition to what is required by the NixOS channel.

Additionally, some other concerns/thoughts

  • Should channels still exist? I believe yes. This system is essentially a localized version of the channel update system. Additionally, if you don’t want to check Hydra locally, channels are a nice centralized place to see whether it’s good to update.
  • The channel system actually has a really great benefit: it motivates people to get a build working even if they don’t use it themselves. For example, this GNOME ISO issue wouldn’t be as big if barely anyone were waiting on it.
  • A real question: why are ISOs channel blockers? Not casting blame, but I suspect it simply seemed like a good idea at the time, since it was likely easier to tie the ISOs to the channel update system. I don’t know much about NixOS’ Hydra instance, but do all channels have to have the same output size limit?
  • A simpler version of this would be to have more channels. But I believe this is overkill; imagine having nixos-unstable-plasma-x86_64-linux, nixos-unstable-gnome-aarch64-linux, …

No doubt a system like this was probably thought of before, and possibly even mentioned somewhere (but I’m not going to try to search for it). Still, I believe this is a nice idea to consider as an alternative to channels, and I’m sharing it in case it wasn’t shared before.

This has probably been asked before, but what is the point of the limit?
Catching mistakes? Forcing maintainers to focus on small package sizes?
I don’t think I can find any motivation that is worth blocking the channel over.

I can’t speak for whoever set the limit, but I can certainly find some reasons. The CDN isn’t free, and in the case of ISOs the contents don’t deduplicate, so you spend new gigabytes on the ISOs alone on every evaluation (i.e. every day). I’ve seen Eelco disable Hydra builds of some huge packages (like games) with similar reasoning, though there the rebuild frequency isn’t so high.

> so you spend new gigabytes just on the ISOs on every evaluation (i.e. every day)

Storage and hosting costs are a fair point, but this seems like passing the hot potato down to the maintainers. What I mean is: if 3 GB is what the output is, I don’t really see why the build should fail. Issue a warning if you will, but the package is correct from the maintainer’s point of view.

For instance, the qtwebengine source is larger than 2 GB: that’s a fact, and there’s nothing I can do about it. I had to spend a day figuring out, by trial and error, how to compress it to fit into the limit (downloading a few gigabytes of sources every time).
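For illustration, a rough sketch of the kind of trick this amounts to (hypothetical names, not the actual nixpkgs expression): repack the already-fetched source tree with stronger compression so the single output path squeezes under the limit.

```nix
# qtwebengineSrc is assumed to be the multi-gigabyte fetched source tree;
# re-tarring it with maximum xz compression shrinks the one store path
# that Hydra measures against its output size limit.
srcRepacked = runCommand "qtwebengine-src.tar.xz" { } ''
  tar -cf - -C ${qtwebengineSrc} . | xz -9e > $out
'';
```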

How long are the evaluation artifacts kept around? Maybe the solution is to rotate old evaluations faster.

2 Likes

As far as I’m aware, nothing has been deleted from cache.nixos.org yet.

1 Like

Generally I’m a little wary of putting too many “potatoes” on the centralized infrastructure. Very few people have access to it… essentially just one significantly active person (and it’s just their free time, too).

NixOS has many cases of excessive closure sizes, both build-time and run-time. The downsides of that affect every individual user as well: you need significantly more internet bandwidth and disk space than with other distros. I do think it would be nice to improve there, and very often the individual maintainers can make a significant difference with little effort.

1 Like

A “closure” which splits its 3 or more gigs of data across several paths, which it probably shares with other closures, is acceptable in my opinion, though a single store path entry should be capped sensibly.

Considering that not everyone has the luxury of 500 Mbit/s downstream, a single NAR shouldn’t exceed 500 MiB unless you really can’t reduce the size further (because the software is huge, or because we spit out ISOs/Docker tarballs).

Remember, if you had to cancel a download of a NAR, you can’t resume it. Nix will always restart from byte 0.

Perhaps the solution is to stop building ISOs/VHDs/SD images on Hydra, and instead build the equivalent of dockerTools.streamLayeredImage: a derivation that creates a script referencing all the dependencies (properly dedup’d), and which, when run, produces the actual ISO/dd image. This script could then be run for the CDN upload to e.g. nixos.org. I really doubt anyone is substituting NixOS ISOs from cache.nixos.org.
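As a rough illustration of that idea (everything here is hypothetical: the script name, the xorriso invocation, and isoContents standing in for the assembled ISO file tree), Hydra would only store the small script, while the actual contents substitute as ordinary, deduplicated store paths:

```nix
# Only this tiny script would be a Hydra output; the ISO contents are
# pulled in as normal store path references that the binary cache
# deduplicates. Running the script writes the actual image.
writeShellScript "make-nixos-iso" ''
  ${xorriso}/bin/xorriso -as mkisofs -volid NIXOS_ISO -o "$1" ${isoContents}
''
```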

4 Likes

> For instance, the qtwebengine source is larger than 2GB: that’s a fact and there’s nothing I can do about it.

Is this qtwebengine with all of Chromium vendored in? qtwebengine from GitHub seemed to be only 15 MB.

Yes, the actual sources are in several git submodules.