I have an x86_64-linux Hydra server which successfully farms out builds to an Arm Mac Mini, until it doesn’t. It appears that because the builds are so long, they hit max_unsupported_time despite making progress on intermediate steps, and then Hydra marks the builds as failed. I have tried clearing the failed builds cache (via the Hydra admin web UI), restarting the failed jobs, and restarting the hydra-queue-runner, but I cannot seem to get new build steps to appear on the /steps page. Is there a log somewhere which can help me figure out what’s going on, or is there something else that I can kick to get things moving again?
Try increasing meta.timeout on the Hydra job. I think it defaults to 10 hours, and if the build takes longer than that in total, Hydra stops it.
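A minimal sketch of raising that limit for a flake-based jobset, assuming Hydra reads meta.timeout and meta.maxSilent (both in seconds) from each job's derivation. The job name myPackage, the pkgs.hello placeholder, and the 24-hour and 4-hour values are hypothetical and only for illustration; nixpkgs.lib.addMetaAttrs merges the given attributes into the derivation's existing meta.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      # The Mac Mini builder is assumed to be aarch64-darwin.
      pkgs = nixpkgs.legacyPackages.aarch64-darwin;
    in {
      # Hypothetical job: pkgs.hello stands in for the real long-running package.
      hydraJobs.myPackage = nixpkgs.lib.addMetaAttrs {
        timeout = 24 * 60 * 60;  # total build time limit, in seconds
        maxSilent = 4 * 60 * 60; # limit on time without log output, in seconds
      } pkgs.hello;
    };
}
```

If the stall is caused by the builder going quiet rather than by total run time, maxSilent is the more likely limit to matter.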
Thanks. I’m not sure that was it: most of the work was in evaluation, and the actual build stalled out in under 10 hours. One of the ops people at work said that hydra-queue-runner might be using exponential backoff, but I haven’t had time to check this. In any case, things seem to be moving again.
Hydra caches build failures by default; I’m not sure about backoff either.
Indeed. While I was able to clear the failed build cache, that didn’t seem to be enough on its own to get things moving again. Until it recurs, I have to write it off as one of those little software mysteries.