Why is GitHub a preferred source over PyPI?

Hi,
I would like to package some Python tools. I’ve noticed in the documentation that fetching from the source repo is preferred over fetching from PyPI. Why is that? My intuition tells me that a PyPI-based package would stay up forever and hence would be preferred, while a GitHub repo can be deleted, moved, or renamed by a maintainer at any time.


As for availability: I think the versions we use would typically be archived at https://archive.softwareheritage.org/, so they should not disappear.

As for URL stability: since we always accompany the URL with a hash, we automatically validate that what we fetched is what we expected.
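For illustration, here is a minimal sketch of such a hash-pinned fetch in a Nix package expression. The package name, owner, version, and hash are all placeholders, not a real package:

```nix
# Hypothetical package: pname, owner, rev, and hash are placeholders.
{ lib, python3Packages, fetchFromGitHub }:

python3Packages.buildPythonPackage rec {
  pname = "some-tool";
  version = "1.2.3";

  src = fetchFromGitHub {
    owner = "example";
    repo = "some-tool";
    rev = "v${version}";
    # Fixed-output hash: the fetch fails if the content doesn't
    # match, regardless of where it was actually downloaded from.
    hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  };
}
```

Because the hash pins the content rather than the location, the URL merely tells Nix where to look first.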

Git repos being more flexible is an advantage rather than a disadvantage: you can more easily create your own fork of a dependency and adapt the Nix package to use it. I believe PyPI packages can have a ‘build step’; by building from the repo, the package actually performs this (re)build based on your updated/forked code.
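As a sketch of that fork workflow, assuming a hypothetical package `somePackage` (all owner/repo/branch/hash values are placeholders), you can swap in your fork via an override:

```nix
# Hypothetical override: point an existing package at your own fork.
# Owner, repo, rev, and hash are placeholders for illustration.
somePackage.overrideAttrs (old: {
  src = fetchFromGitHub {
    owner = "my-username";        # your fork
    repo = "some-tool";
    rev = "my-feature-branch";
    hash = "sha256-BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=";
  };
})
```

The rest of the build definition is reused unchanged; only the source is replaced.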

Finally, there is arguably a security advantage in using the presumably-easier-to-audit ‘source code’ over the PyPI package which could contain built artifacts.

In addition to what raboof said, the upstream source repository often also contains tests, which are stripped from the dist-ball uploaded to PyPI.


So if a GitHub repo disappears, the same derivation will still build without any modifications, thanks to Software Heritage?

As a first layer, the derivation output itself is usually cached; as a second layer, the sources should remain cached indefinitely.

And as a third layer, we should be able to rebuild with overridden sources from actual Git mirrors.
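A sketch of that third layer, with a hypothetical mirror URL: because the output hash is fixed, fetching the same commit from any mirror yields bit-identical sources, so the rest of the build is unaffected:

```nix
# Hypothetical: re-fetch the same pinned sources from a mirror.
# URL and hash are placeholders; the hash must match the original,
# so the fetched content is verified to be identical.
somePackage.overrideAttrs (old: {
  src = fetchgit {
    url = "https://some-mirror.example.org/some-tool.git";
    rev = "refs/tags/v1.2.3";
    hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  };
})
```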

This is harder with dist-balls living on pypi.

And what really is always a massive concern (especially since the xz incident) is binary blobs, whose origins you cannot verify.

I’m not sure it will be a simple out-of-the-box experience, but it should be possible at least.

Great, that makes sense. Thanks!