Yes, if this is the route we want to go, I’d be happy to work on it. I’d probably want a couple of people to talk through the design with me before getting started. And one or more people who want to contribute could make this more of a team project and hopefully avoid some of the problems we have with other similar projects built by a single person.
The ability to collect statuses from external sources would be very helpful for unfree package sets and for checks that require special hardware (e.g. GPUs).
I’m sorry, but using GitHub actions for this sounds like an extremely bad idea to me.
This is going to cost GitHub real money, and I wouldn’t trust them to be willing to pay it. How much was the Equinix sponsorship worth again? Yes, running Actions on forks is allowed by GitHub’s TOS, and other, much smaller, projects already do this. But I’ll bet that as soon as our CI causes costs that exceed some threshold, GitHub will take action against it.
Even if GitHub works out fine for now, will it long term? With expensive sponsorships, we are one bad year for the company away from having to frantically search for an alternative again. That is exactly the situation we are in right now, so why not search for more sustainable solutions this time?
Just because we are already deeply locked into the GitHub platform doesn’t mean it is okay to take further big steps in that direction.
What is the trust and threat model for running GitHub Actions in forks?
I find it a bit disappointing that a project of this size can’t manage to stand on its own feet in terms of infrastructure, and continues to rely on companies sponsoring things.
Infinisil wants to take a look at evaluating nixpkgs in GitHub Actions to compute the number of changed paths.
Independently, we will take a look at how we can build packages.
To begin with, we will just run GitHub Actions as they are designed, triggered by a pull_request event. This is the most straightforward approach, and we have not actually validated that we cannot simply build everything fast enough without resorting to my initial strategy.
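For illustration, a plain pull_request-triggered workflow along those lines might look like the sketch below; the job name, the action versions, and the eval command are assumptions for the sake of example, not the actual implementation.

```yaml
# Hypothetical sketch only; names, action versions, and commands are
# placeholders for whatever the real workflow ends up doing.
name: eval
on:
  pull_request:

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: cachix/install-nix-action@v24
      # Placeholder step: the real tooling would diff attribute/output
      # paths between the base branch and the PR head.
      - name: Evaluate nixpkgs
        run: nix-env -f . -qaP --drv-path > eval-paths.txt
```

The point of keeping the eval step a plain `run:` command is that the same tooling can also be invoked from a local nixpkgs checkout, independent of the CI system.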
Independently of the meeting, we are also having other discussions about how to develop ofborg in the future. However, this might not happen before February, so we need an alternative solution in the meantime, if not for longer.
If you want to help migrate ofborg to new, sustainable infrastructure, be my guest. We can also evaluate both plans in parallel, so please don’t feel blocked by us. If you want to help, you can join the #infra:nixos.org Matrix channel.
The infra team is currently small and therefore has to focus on the essentials, i.e. the core build infrastructure, but with more helping hands we can also expand to bigger things. As of now, public holidays are approaching for many of us, which we also want to enjoy.
Personally, I only read your post after the meeting…
To address your points: I think GitHub Actions is actually more secure than ofborg, because it runs builds in isolated VMs. We also had to learn our lessons about insecure usage of GITHUB_TOKEN. Note as well that we decided not to build in forks (from a security standpoint this should not make a difference either way), because we think it should be possible to run everything from the NixOS org.
I don’t think development of ofborg is currently sustainable. It doesn’t receive many contributions because it’s quite hard to get a development setup for testing the stack locally. This is probably fixable, but not in the current timeframe.
Infinisil has also made good progress on optimizing evaluation (based on amjoseph’s work). This could be retrofitted into ofborg in the future. The resulting tooling can then also be run from a local nixpkgs checkout, which actually makes switching to a different CI easier than ofborg’s hard-coded GitHub integration does.
From my past experience in nix-community, GitHub also does not simply turn off resources for legitimate projects. They usually give a heads-up so that resource issues can be fixed.
What would be the cost of maintaining those machines and their bandwidth for, say, another quarter, in order to give folks a bit less stress over the holidays and give us more time for an orderly transition?
Is that something we might be able to fund via donations?
If the runs are moved to GitHub Actions but the infrastructure turns out to be insufficient, is there any reason self-hosted runners could not be used? That could be a relatively easy release valve for any pressure, should we run into either usage limits on GitHub’s free tier or simply a lack of speed due to its limitations.
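As a concrete illustration of how small that escape hatch is, moving a job onto self-hosted capacity is mostly a one-line change to its `runs-on` labels (the labels and script below are hypothetical):

```yaml
jobs:
  build:
    # Hypothetical: swap "ubuntu-latest" for self-hosted labels if
    # GitHub-hosted runners become a bottleneck; the job body is unchanged.
    runs-on: [self-hosted, linux, x86_64]
    steps:
      - uses: actions/checkout@v4
      # Placeholder script: keeping the real logic in a repo script means
      # the runner itself stays easy to swap out later.
      - run: ./ci/build.sh
```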
There is always the risk of vendor lock-in, but if essentially all the logic and scripts live in the job scripts themselves, then the runner is just invoking a few basic calls, and that can be ported to another system fairly easily in the future.
But really, in an ideal world, what would the best solution look like? If things are moving around significantly anyway, this might be the opportune time to consider the best solution and start working towards it. What are the MVP requirements, the hard requirements, and the nice-to-haves?
People are of course encouraged to check my math. Thoughts:
This is based on the assumption that they’ll cut us some slack for not committing to a full year (their pricing is 20% off the on-demand price for a full year and 50% off for a 3-year commit). Bump the price if you don’t think that’ll happen.
I checked the chip SKUs, but I could have goofed that up.
This amount is within striking distance of personal funding from some folks and corporate sponsorship from the various orgs that do business in the ecosystem. If we can avoid pissing people off, we might have a shot at making this transition a lot less painful than the crash project it currently looks like.
No idea about transit costs, which might be a lot of what we’re getting for free as well.
We had that weird crypto thing earlier this year that resulted in somewhere between $20k and $40k of usable donation money. Based on the thread, it doesn’t sound like the money has been used to pay for the binary cache yet. Even $10k of that would buy us another month to come up with a solution, and the rest of it could go towards buying hardware for a non-GitHub permanent solution (if we go with @piegames’s wish to avoid GH-hosted runners).
There were some questions in the thread about the legal paperwork required to make the funds usable, but OfBorg is explicitly named in the “what we’ll use the money for” section, so I imagine there wouldn’t be much obstacle to it:
We have a lot of unfunded NixOS projects and common goods that don’t receive any attention; we would make a list of them, e.g. Hydra/OfBorg, etc., and try to figure out whether we can fund a project out of this money.
We want to be able to self-host as much as possible, and this would be a nice fund to buy hardware.
Equinix Metal is shutting down next year anyway, by the way. I think sometime around June?
Also it makes more sense to use Hetzner for pricing.
On top of ofborg, we also need new builders for Hydra.
Long term, sure. Even longer term, we should probably just buy the hardware and rack it somewhere.
I used the Equinix pricing in case we wanted to simply negotiate a contract extension with them to buy a few more months without having to scramble around, e.g. we could just leave the existing setup in place. Hetzner or some other host afterwards of course makes sense.
To what extent could the necessary jobs be crowd-sourced?
I tried to get familiar with ofBorg before and got the impression that it definitely deserves more love. The documentation is lacking. When breakage occurs on the nixpkgs side, there seems to be little interest in fixing it because ofBorg uses old release branches. There isn’t much activity on ofBorg’s own PRs, either. With more spare time in the future, I might have a second look.
If it became easier to set up ofBorg locally, and we had a mechanism that could distribute jobs to crowd-run setups, this could become a nicely scalable solution for our CI needs.
If GitHub can run (parts of) these jobs, too, that’s great, but I think we should not lose the focus on developing towards a less limited infrastructure.
Sorry if someone has already made this point, but I think it’s worth pointing out that if this is mainly going to be based on GitHub Actions, an eventual migration to e.g. Forgejo and Forgejo runners wouldn’t be horribly costly, as they’re essentially compatible (although Forgejo explicitly does not guarantee this1). At least in my limited experience, GitHub Actions were easily ported to Forgejo.
Another argument for Actions is that they’re fairly common, which will make it easier to onboard contributors, and drive-by fixes would likely be much easier for new contributors too.
But still, all the other points you bring up are valid and strong arguments against GitHub Actions. Since people seem fairly set on going in this direction already, I hope they at least keep in mind that avoiding potential GitHub-isms when writing GitHub Actions will pay dividends in the long run when we eventually need to migrate elsewhere.
GitHub limits us to 20 concurrent jobs per organization. To work within this limit, we could run GitHub Actions in contributors’ forks instead of directly on pull requests.
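One small mitigation worth noting: a workflow-level concurrency group can cancel superseded runs on the same PR, so the limited job budget isn’t spent on stale commits (the group name below is just an illustrative choice):

```yaml
# Hypothetical: cancel outdated runs for the same PR so the small
# concurrent-job budget is not wasted on superseded commits.
concurrency:
  group: ci-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
```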
I would suggest contacting GitHub about a sponsored plan. GitHub should at least be able to provide a free Team plan to the NixOS org, which would raise the limit to 60 concurrent jobs. If they can provide a free Enterprise plan, it would be 500 concurrent jobs, which should be plenty.