Tool to find duplicates / useless PRs

Hi

I made a simple tool, currently a proof of concept, that looks for PRs that have duplicates or that are no longer useful because a newer version has been merged in the meantime.

It requires gh and awk :grin:

Fetch the awk script (dup.awk, on GitHub), save it as test.awk (the name used in the command below), and run the following command from a nixpkgs repo:

gh pr list -L 2000 -s all -S "search term here" | awk -F "\t" -f test.awk

You will get some results like:

imagemagick: 7.1.0-49 -> 7.1.0-50 195211 is superseeded by 196596 (7.1.0-50 < 7.1.0-51)
[Backport release-22.05] ungoogled-chromium: 106.0.5249.103 -> 106.0.5249.119 195711 is superseeded by 194633 (106.0.5249.119 < 106.0.5249.91)
hydrus: 502a -> 502 196437 is superseeded by 196212 (502 < 502a)
esphome: 2022.9.3 -> 2022.10.0b2 194888 is superseeded by 194651 (2022.10.0b2 < 2022.9.3)
emulsion: 9.0 -> 10.2-test.8 is duplicated in  196568 196433
jujutsu: 0.4.0 -> 0.5.1 is duplicated in  196598 196528
far2l: 2.4.0 -> 2.4.1 is duplicated in  195570 195016
immudb: 1.3.2 -> 1.4.0 is duplicated in  195822 195798
asciidoctor: 2.0.17 -> 2.0.18 is duplicated in  196250 196186
deno: 1.26.1 -> 1.26.2 is duplicated in  196667 196566
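
To illustrate how the "is duplicated in" lines above can come about, here is a minimal sketch of the grouping idea. It is not the actual dup.awk: it only assumes that the first two tab-separated columns of the gh pr list output are the PR number and the title, and find_dups is a hypothetical name.

```shell
#!/bin/sh
# find_dups: read "number<TAB>title" lines on stdin and report every title
# that appears more than once, together with the PR numbers that carry it.
# Hypothetical helper, not the author's script.
find_dups() {
    awk -F '\t' '
        { ids[$2] = ids[$2] " " $1; count[$2]++ }
        END {
            for (t in count)
                if (count[t] > 1)
                    print t " is duplicated in " ids[t]
        }
    '
}

# Example input built from PR numbers and titles quoted in this post.
printf '196667\tdeno: 1.26.1 -> 1.26.2\n196566\tdeno: 1.26.1 -> 1.26.2\n196596\timagemagick: 7.1.0-50 -> 7.1.0-51\n' | find_dups
```

Grouping on the exact title is the simplest possible key; it misses duplicates whose titles differ slightly (for example a backport prefix).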

awk is a bit limited with its comparison operators: if the values are not entirely numerical, it compares them as strings, so the result is not always right. This deserves a rewrite in a more suitable language.
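
A small sketch of the comparison problem, using versions from the output above. awk_lt and version_lt are hypothetical helpers; version_lt assumes a sort that supports -V (GNU coreutils does), and it is only one possible workaround, not what the tool currently does.

```shell
#!/bin/sh
# awk_lt A B: succeeds if awk considers A < B.
# Values that do not look like plain numbers are compared as strings,
# character by character, so "10" sorts before "9".
awk_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a < b) }'
}

# version_lt A B: succeeds if A sorts strictly before B in version order.
# Assumption: sort supports -V (version sort).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

awk_lt 2022.10.0b2 2022.9.3 && echo "awk: 2022.10.0b2 < 2022.9.3"
version_lt 2022.9.3 2022.10.0b2 && echo "sort -V: 2022.9.3 < 2022.10.0b2"
```

Version sort compares each run of digits as a number, which avoids the 10-versus-9 problem, but suffixes such as "b2" or "a" still need a project-specific rule.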


It’s now able to tell if a package has multiple open update PRs.

I fixed a few bugs and added the state of the PRs.

With this short script, it can aggregate more results for a search (and pick up unrelated results along the way):

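#!/bin/sh
# Collect open and merged PRs for an empty search and for the search term
# given as "$1", then run the awk script on the de-duplicated list.

LOG=$(mktemp /tmp/dupdup.XXXXXXXXXXXXXXXXXXXXXXXX) || exit 1
# Remove the temporary file on exit, even if a command fails.
trap 'rm -f "$LOG"' EXIT
set -x

(
    # The empty search term matches PRs regardless of topic.
    for i in "" "$1"
    do
        gh pr list -L 1000 -s open -S "$i"
        gh pr list -L 1000 -s merged -S "$i"
    done
) | sort -u > "$LOG"

awk -F "\t" -f test.awk "$LOG"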