Code attribution policy

IMO, copying more than three lines of code verbatim from a change that wasn’t originally written for your project, especially without talking to its authors, should get at least a link back to the original source, and often a Co-authored-by.

This is less of a worry when code review changes get squashed into a commit, as long as the reviewer didn’t substantially change the commit and doesn’t ask for credit (which is the most common case).

I think a lot of what feels bad is when someone picks a change, doesn’t communicate anything, and then doesn’t write anything in the commit message about having done so.

In summary, what I want if someone is picking my change is:

  • if it’s essentially verbatim, leave the author tag alone and maybe add yourself as Co-authored-by, depending on how much you changed (e.g. maybe you added build system changes for make, or for something else that doesn’t exist in Lix).

    • This is slightly different for me in the case of completely unrelated projects where it’s not a change in response to a change, but simply taking code as it is.

      This may be because one particular function isn’t quite right or because of wanting to avoid a gratuitous dependency.

      An example of a case like this is where I copied an entire function from Python. I checked that the license is compatible, added a copyright header to the file as required by their license, and added a comment in the source about where in Python the function came from. The point is to document that this is foreign code, not written by me, and that it may change upstream, so it should be possible to find the original again if it ever has bugs.

  • if a substantial amount is used, add a Co-authored-by, unless it’s been rewritten to the point that the original is no longer recognizable

  • if basically nothing is used, still consider linking to the other implementation’s change if you were aware of it while writing your change (obviously if you just didn’t know, that’s fine). It makes commit history a lot more useful.

  • regardless of the above, please link to the place you got the commit from and the PR/context. A commit hash is very helpful, but if it’s cross-repo and so won’t resolve, consider adding a web link for the commits/CLs/PRs involved. This is very helpful for figuring out the context of a change when debugging it later, and that context is really hard to reconstruct after the fact.

    If it’s relevant, it can be nice to link to the other implementation’s change even if the change was completely rewritten, since this helps keep track of cross project context.
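
As a rough illustration of the list above, here is a minimal Python sketch that assembles a commit message carrying both a link to the original change and a Co-authored-by trailer. The helper name `build_commit_message`, the people, and the `example.org` URL are all made up for illustration; only the `Co-authored-by: Name <email>` trailer format is an existing convention.

```python
def build_commit_message(subject, body, source_url, co_authors=()):
    """Build a commit message that keeps a link to the original change.

    Hypothetical helper for illustration; not part of git or any real tool.
    """
    paragraphs = [
        subject,
        body.strip(),
        # A plain web link still works when a cross-repo commit hash would not resolve.
        f"Original change: {source_url}",
    ]
    # Trailers such as Co-authored-by go together in the final paragraph.
    trailers = [f"Co-authored-by: {name} <{email}>" for name, email in co_authors]
    if trailers:
        paragraphs.append("\n".join(trailers))
    return "\n\n".join(paragraphs) + "\n"


message = build_commit_message(
    subject="parser: handle empty input",
    body="Adapted from a change in another project; build files adjusted for this tree.",
    source_url="https://example.org/other-project/change",  # placeholder URL
    co_authors=[("Original Author", "original.author@example.org")],
)
print(message)
```

Within a single repository, `git cherry-pick -x` appends a "(cherry picked from commit ...)" line for you; across repositories that hash won’t resolve, which is where the web link earns its keep.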

This isn’t about hard and fast rules but rather about a culture of credit and helping people use the code base years into the future, even when there are forks. I am not super bothered if my code is taken, but removing the context is shooting yourself in the foot and is unkind to both the original author and your future self.

Again, these aren’t hard and fast rules, and nobody is perfect. The case that makes people uncomfortable is when there’s no way to dig through either commit history or GitHub PRs to find where the original code came from. (I would discourage putting this information only in the PR description or comments; it’s a big extra step to find it there.)

Longer commit messages are great, especially if they add useful information about what everyone was thinking that led to that commit being made.

I will note that including explicit acknowledgement in the source is very much the exceptional case. For most changes, including all of those that have led to my friends feeling their work has been taken without credit, the only thing that should have been done more carefully is the commit message. The case where this differs, in my view, is vendoring library functions wholesale, because then the reader may wonder why the code style doesn’t match, for instance, or may need to update the code from upstream.
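
For the vendoring case, a minimal sketch of what an in-source acknowledgement could look like. Everything here is a placeholder: the upstream project, path, commit, and license are left as angle-bracket fields to fill in, and `clamp` is a stand-in function, not the function copied from Python mentioned earlier.

```python
# --- Vendored code: not written for this project ---------------------------
# Source:  <upstream project>, <path/to/file> at commit <hash>
# License: <license name>; the copyright notice it requires is kept at the
#          top of this file.
# Reason:  copied to avoid a dependency for a single function.
# If this has bugs, check upstream first: it may already be fixed there.
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(value, high))
# --- End of vendored code ---------------------------------------------------


print(clamp(15, 0, 10))  # prints 10
```

The markers around the function also tell a reader why the style may differ from the rest of the file.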

Also, I acknowledge that GitHub does an awful job of surfacing commit messages to reviewers, and it certainly makes fixing them as a reviewer a pain, since it requires the git command line. (In Gerrit it’s one click, and the commit message is front and center for reviewers. If you’ve read Lix commit messages you’ll see the cultural impact of this difference.) So a certain amount of this is a shadow cast by our tools.

Anyhow, when in doubt, link to the code you looked at. It will help your future self and other maintainers a lot. People are a lot more ok with trying and not getting it quite right than not trying.

Personally I would not be mad at someone taking pretty much any of my contributions and resubmitting them, as long as there is some credit and ideally context, no matter how that credit is precisely arranged, no matter what the author field says. It’s most important to do your best. The problem is when there’s simply no such context in the commit metadata or the code (if appropriate).
