I’m trying to package a library (pytorch-lightning) whose central feature is the ability to retrieve models and datasets from the internet and run them locally. A common workflow in ML during distributed training is to save model checkpoints on a centralized server, so that training can resume if a worker fails or is killed.
The build itself is pure, but the `py.test` code understandably requires internet access; after all, the code using requests is what is being tested.
To package for nixpkgs, I could manually disable all tests that require network access (21 tests). This makes me unhappy for two reasons:
- the maintenance burden is high
- I actually want these tests to be run to verify correctness
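To illustrate the kind of manual workaround I mean, here is a minimal sketch of a `conftest.py` that skips network tests when no connection is available. It assumes the tests carry a hypothetical `network` marker (pytorch-lightning does not necessarily define one), and the probe host `example.org` is an arbitrary choice:

```python
import socket


def network_available(host="example.org", port=443, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def pytest_collection_modifyitems(config, items):
    """pytest hook: skip tests marked `network` when we appear to be offline."""
    if network_available():
        return
    import pytest  # imported lazily so the probe above works without pytest

    skip = pytest.mark.skip(reason="no network access (e.g. inside the Nix sandbox)")
    for item in items:
        # `network` is a hypothetical marker name used for illustration.
        if item.get_closest_marker("network") is not None:
            item.add_marker(skip)
```

Even with something like this in place, the tests still never run inside the sandbox, so it only reduces the bookkeeping and does not address the second point.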
It seems to me that forbidding network access during `checkPhase` is an anti-pattern. I think the intention is to prevent committers from intentionally or inadvertently introducing impurities into `/nix/store`, but it seems we are throwing the baby out with the bathwater. Many packages across nixpkgs have no tests because of this restriction, particularly web servers (e.g. gunicorn), which is concerning when you consider that some people run NixOS in production for web servers with source code modifications and no tests.
What do you think of the prospect of allowing network access during `checkPhase`? Perhaps there’s a way to do this that also maintains sandbox integrity, e.g. running `checkPhase` with read permissions but not write permissions to