Infrastructure Announcement: The future of OfBorg – Your Help Needed!

Generated from what? We have two new features that we haven’t announced yet that might work here, but we’d have to check.

We have the package attribute in the beginning of the commit title and we want to build that including passthru.tests.
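
A minimal sketch of that idea, assuming commit titles follow the usual "<attribute>: <summary>" shape. The helper names `attr_from_title` and `build_command` are hypothetical, the regular expression is only an approximation of valid attribute paths, and the command is constructed but not executed:

```python
import re
from typing import List, Optional

# Assumption: the title starts with an attribute path followed by ": ",
# e.g. "hello: 2.12 -> 2.12.1". Prefixes containing "/" (such as
# "nixos/nginx: ...") are deliberately not treated as package attributes.
TITLE_RE = re.compile(r"^(?P<attr>[A-Za-z0-9_][A-Za-z0-9_.+'-]*): \S")


def attr_from_title(title: str) -> Optional[str]:
    """Return the leading attribute path of a commit title, or None."""
    match = TITLE_RE.match(title)
    return match.group("attr") if match else None


def build_command(title: str) -> Optional[List[str]]:
    """Build a nix-build invocation for the package and its passthru.tests."""
    attr = attr_from_title(title)
    if attr is None:
        return None
    # Note: a real implementation would first need to check that the
    # attribute exists and actually defines passthru.tests.
    return ["nix-build", "-A", attr, "-A", f"{attr}.passthru.tests"]


print(build_command("hello: 2.12 -> 2.12.1"))
```

Titles such as "treewide: ..." would still match this pattern, so a real implementation would need to verify the attribute against nixpkgs before building.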

Yeah that should be okay

That’s nice. Then we can already cover aarch64-{darwin,linux} and x86_64-linux. That build load is hopefully also not as bad as evaluating nixpkgs.

Covering aarch64-darwin covers x86_64-darwin. We no longer have any native Intel Mac builders anyway.

2 Likes

Yes, if this is the route we want to go I’d be happy to work on it. I’d probably want a couple people to talk through design with me before getting started. And one or more who would want to contribute could make this more of a team project and hopefully avoid some of the problems we have with other similar projects built by one person. :slight_smile:

1 Like

The ability to collect statuses from external sources would be very helpful for unfree package sets and for checks that require special hardware (e.g. GPUs)

I’m sorry, but using GitHub actions for this sounds like an extremely bad idea to me.

  • This is going to cost GitHub real money, and I wouldn’t trust them to be willing to do that. How much was the Equinix sponsorship worth again? Yes, running actions on forks is allowed by GitHub TOS, and other –much smaller– projects already do this as well. But I’ll bet that as soon as our CI causes costs which exceed some threshold, GitHub will take action against it.
  • Even if GitHub works out fine for now, will it long term? With expensive sponsorships, we are one bad year for the company away from having to frantically search for an alternative again. This is the situation we are in right now, so why not search for more sustainable solutions this time?
  • Just because we are already deeply locked in to the GitHub platform doesn’t mean it is okay to take some big steps further in that direction.
  • What is the trust and threat modeling of running GitHub actions in forks?

I find it a bit disappointing that a project this size can’t manage to stand on its own legs in terms of infrastructure, and continues to rely on companies sponsoring stuff.

14 Likes

See the meeting notes for today’s infra meeting where we mainly discussed the CI situation: infra/docs/meeting-notes/2024-11-14.md at 7688f20babbeb27a10e4d8669fffe4b0ed00e17f · NixOS/infra · GitHub

Here is the high-level plan:

  • Infinisil wants to take a look at evaluating nixpkgs in github actions to compute the number of changed paths
  • Independently we will take a look at how we can build packages.
  • To begin with, we will just run GitHub Actions the way they are designed to be used, triggered by the pull_request event. This is the most straightforward way, and we have not actually validated whether we can simply build everything fast enough without resorting to my initial strategy.
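
To illustrate what "the number of changed paths" in the first item could mean, here is a rough sketch that compares two evaluation results. It assumes each evaluation has already been reduced to a mapping from attribute name to output store path; the function name `changed_attrs` and the store paths in the example are made up and are not taken from the actual tooling:

```python
from typing import Dict, List


def changed_attrs(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify attributes by how their output path differs between two evaluations."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    # An attribute whose output path differs would need to be rebuilt.
    changed = sorted(
        attr for attr in before.keys() & after.keys() if before[attr] != after[attr]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical evaluation results for the base branch and the pull request.
base = {"hello": "/nix/store/aaaa-hello-2.12", "jq": "/nix/store/bbbb-jq-1.7"}
pr = {"hello": "/nix/store/cccc-hello-2.12.1", "jq": "/nix/store/bbbb-jq-1.7",
      "newpkg": "/nix/store/dddd-newpkg-0.1"}

diff = changed_attrs(base, pr)
print(len(diff["changed"]) + len(diff["added"]), "paths to build")
```

Because the comparison only needs the two mappings, the same logic could run in any CI system or from a local checkout.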

Independently of the meeting, we also have other discussions about how we can develop ofborg in the future. However, this might not happen before February, so we need some alternative solution in the meantime, if not longer.

3 Likes

If you want to help migrate ofborg to a new sustainable infrastructure, be my guest. We can also evaluate both plans in parallel, so please don’t feel blocked by us. If you want to help, you can join the #infra:nixos.org Matrix channel.
The infra team is currently small and therefore has to focus on the essentials, that is, the core build infrastructure, but if we have more helping hands we can also expand to bigger things. As of now, public holidays are approaching for many of us, which we also want to enjoy.

13 Likes

This thread is being somewhat duplicated on GitHub, so please check my comment there:

I haven’t seen my concerns on sustainability and trust addressed, and couldn’t find anything in the discussion notes or on the GitHub issue either.

2 Likes

Personally I only read about your post after the meeting…
To address your points. I think github actions are actually more secure than ofborg because they run builds in isolated VMs. Also we had to learn our lessons with insecure usage of GITHUB_TOKEN. Also note that we decided not to build in forks (also from a security standpoint this should not make a difference), because we think it should be possible to run everything from the NixOS org.

I don’t think development of ofborg is currently sustainable. It doesn’t receive a lot of contributions because it’s quite hard to get a development setup for testing the stack locally. This is probably fixable but not in the current timeframe.

Infinisil also made some good progress on optimizing evaluation (based on amjoseph’s work). Those optimizations could be retrofitted into ofborg in the future. The resulting tooling can then also be run from a local nixpkgs checkout, which actually makes switching to a different CI easier than ofborg’s hard-coded GitHub integration.

From my past experience in nix-community, GitHub also does not simply turn off resources for legitimate projects. They usually give a heads-up so that resource issues can be fixed.

11 Likes

What would be the cost of maintaining those machines and bandwidth for, say, another quarter in order to give folks a bit less stress over the holidays and give us more time to do an orderly transition?

Is that something we might be able to fund via donations?

If the runs are moved into GitHub Actions but the infrastructure is found to be insufficient, is there any reason self-hosted runners could not be used? That could be a relatively easy release outlet for any pressure should we run into either usage limits from GitHub’s free coffers or just lack of speed due to the free tier limitations.

There is always the risk of vendor lock-in, but if basically all the logic and scripts are handled in the job scripts, then the runner is just invoking a couple of basic calls, and that can be pretty easily ported to another system in the future.

But really - in an ideal world - what would the best solution look like? If things are moving around significantly anyway, this might be the opportune time to consider the best solution and begin working towards that. What are the MVP requirements, the needed requirements, and the nice to haves?

1 Like

Okay, the pricing (assuming we had normal pricing):

| Machine | RAM | CPU cores | Quantity | Cost / month (USD) | Subtotal / month (USD) |
|---|---|---|---|---|---|
| c3.small.x86 | 32 GB | 16 (8 physical) | 1 | 438.00 | 438.00 |
| m3.large.x86 | 256 GB | 64 (32 physical) | 5 | 1,584.10 | 7,920.50 |
| c3.large.arm64 | 256 GB | 80 (80 physical) | 1 | 1,277.50 | 1,277.50 |
| c3.medium.x86 | 64 GB | 48 (24 physical) | 1 | 876.00 | 876.00 |
| 1-month total | | | | | 10,512.00 |
| 3-month total | | | | | 31,536.00 |
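
As a quick sanity check of the totals above, the arithmetic can be reproduced with a few lines of Python. The quantities and per-machine prices are copied from the table; nothing here accounts for discounts or transit costs:

```python
from decimal import Decimal

# (machine, quantity, cost per month in USD), copied from the table above.
machines = [
    ("c3.small.x86", 1, Decimal("438.00")),
    ("m3.large.x86", 5, Decimal("1584.10")),
    ("c3.large.arm64", 1, Decimal("1277.50")),
    ("c3.medium.x86", 1, Decimal("876.00")),
]

# Subtotal per machine type is quantity times monthly cost.
monthly_total = sum(qty * cost for _, qty, cost in machines)
three_month_total = 3 * monthly_total

print(f"1-month total: {monthly_total:,}")
print(f"3-month total: {three_month_total:,}")
```

The computed sums agree with the 1-month and 3-month totals listed in the table.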

People are of course encouraged to check my math. Thoughts:

  • This is based on the assumption that they’ll cut us some slack for not doing a full year (their pricing is 20% off the on-demand price for that and 50% off for a 3-year commit). Bump the price if you don’t think that’ll happen.
  • I checked the chip SKUs, but I could’ve goofed that up.
  • This amount is within striking distance of personal funding for some folks and corporate sponsorship from the various orgs that do business in the ecosystem. If we can avoid pissing people off we might have a shot at making this transition a lot less painful than the crash project it’s currently looking like.
  • No idea about transit costs, which might be a lot of what we’re getting for free as well.

We had that weird crypto thing earlier this year that resulted in somewhere between $20k and $40k of usable donation money. Based on the thread, it doesn’t sound like the money has been used to pay for the binary cache yet. Even $10k of that would buy us another month to come up with a solution, and the rest of it could go towards buying hardware for a non-GitHub permanent solution (if we go with @piegames’s wish to avoid GH-hosted runners).
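
For a rough sense of scale, the following sketch divides a few donation amounts by the 1-month total from the pricing table earlier in the thread. It assumes that monthly figure holds and ignores transit costs and any discount; the function name `months_covered` is just for illustration:

```python
# 1-month total in USD from the pricing table earlier in the thread.
MONTHLY_COST_USD = 10_512.00


def months_covered(donation_usd: float) -> float:
    """Number of months of the current machines a given amount would pay for."""
    return donation_usd / MONTHLY_COST_USD


for amount in (10_000, 20_000, 40_000):
    print(f"${amount:,}: {months_covered(amount):.1f} months")
```

Under those assumptions, $10k covers just under one month and the $20k to $40k range covers roughly two to four months.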

There were some questions in the thread about the legal paperwork required to make the funds usable, but OfBorg is explicitly named in the “what we’ll use the money for” section, so I imagine there wouldn’t be much obstacle to it:

  • We have a lot of unfunded NixOS projects and common goods that doesn’t receive any attention, we would make a list out of it, e.g. Hydra/OfBorg, etc. and try to figure out if we can make a project funded out of this money
  • We want to be able to self-host as much as possible, and this would be a nice fund to buy hardware.

@Infinisil @RaitoBezarius Did you figure out the status of those funds?

2 Likes

Equinix Metal is shutting down next year anyway, btw. I think something like June?
Also it makes more sense to use Hetzner for pricing.
On top of ofborg we also need new builders for hydra.

Here is an estimation hexa made:

hydra.nixos.org (~1500 EUR/mo)

  • 3x ARM64 Builder (2 normal, 1 big-parallel)

  • 4x x86-64 Builder (3 normal, 1 big-parallel)

6 Likes

Also it makes more sense to use Hetzner for pricing.

Long term, sure–even longer term, we should probably just buy the hardware and rack it somewhere.

I used the Equinix pricing in case we wanted to just negotiate a contract extension for them to buy a few more months without having to scramble around–e.g., we could just leave the existing stuff in place. Hetzner or some other host afterwards of course makes sense.

2 Likes

Would you want to work on this? I opened up an issue here: Future development of Ofborg · Issue #695 · NixOS/ofborg · GitHub
If there are enough people interested, they could organize and meet for this.

1 Like