How to unstick "unsupported system type" errors on Hydra when the build host is available?

endgame · May 5, 2023, 12:40am

I have an x86_64-linux Hydra server which successfully farms out builds to an Arm Mac Mini, until it doesn’t. It appears that because the builds are so long, the build hits max_unsupported_time despite making progress on intermediate steps, and then Hydra marks builds as failed. I have tried clearing the failed builds cache (via the Hydra admin web UI), restarting the failed jobs, and restarting the hydra-queue-runner, but I cannot seem to get new build steps to appear on the /steps page. Is there a log somewhere which can help me figure out what’s going on, or is there something else that I can kick to get things moving again?

Sandro · May 5, 2023, 6:30pm

Try increasing meta.timeout on the Hydra job. I think the default is 10 hours, and if the build takes longer than that in total, Hydra stops it.
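
For reference, a minimal sketch of what raising that limit might look like, assuming the job is an ordinary nixpkgs derivation. The job name `longBuild`, the use of `pkgs.hello` as a stand-in package, and the numeric values are placeholders and not taken from this thread. `meta.maxSilent` is a related limit (time allowed without any build output) that Hydra also reads from the job's meta, shown here only for completeness.

```nix
# Hypothetical jobset expression; substitute your own derivation for pkgs.hello.
{ pkgs ? import <nixpkgs> { } }:
{
  longBuild = pkgs.hello.overrideAttrs (old: {
    meta = (old.meta or { }) // {
      # Total wall-clock time allowed for the build, in seconds (here 24 hours).
      timeout = 86400;
      # Time allowed without any output on stdout/stderr, in seconds (here 4 hours).
      maxSilent = 14400;
    };
  });
}
```

Whether this helps depends on the build itself actually running past the limit; it would not affect time spent in evaluation or in waiting for a machine of the right system type.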

endgame · May 8, 2023, 8:28am

Thanks. I’m not sure that was it - most of the work was in evaluation and the actual build took less than 10h to stall out. One of the ops people at work said that the hydra-queue-runner might be using exponential backoff, but I haven’t had time to check this. In any case, things seem to be moving again.

Sandro · May 8, 2023, 8:32pm

Hydra caches build failures by default. I'm not sure about the backoff either.

endgame · May 8, 2023, 10:18pm

Indeed. While I was able to clear the failed build cache, that didn’t seem to be enough on its own to get things moving again. Until it recurs, I have to write it off as one of those little software mysteries.