Lately, I’ve started to add tests for some graphical applications to `nixos/tests` in order to reduce manual maintenance and improve reliability. I couldn’t find any documentation or tutorials, so I just read the existing tests (the one for chromium is extensive) and fiddled around. There are some learnings and pitfalls, but I also have quite a few open questions. Maybe you can add your own recommendations, or help resolve some of the questions?
What I usually do is:
- Start the application as an unprivileged user
- Do one nontrivial thing (e.g. for an editor: minimally edit a file)
- Test whether that action had roughly the expected effect
This will hopefully already catch a number of bugs like “The program segfaults on start”. I’m not sure whether it’s a good idea to extend the test further. The more extensive the test, the more features are tested. However, the test suite gets more brittle (more random CI failures) and is harder to maintain (changes in program behaviour necessitate changes in the test suite). What’s your opinion on the best balance here?
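The three steps above can be sketched as a `testScript` fragment. This is a minimal sketch under several assumptions. The machine has an X session on display `:0` and an auto-logged-in user `alice` with home `/home/alice` (the shared modules under `nixos/tests/common/` are one way to get this). gedit stands in as a hypothetical example application. The helper `as_user` and the function `editor_smoke_test` are made-up names for illustration. The driver methods it calls (`wait_for_x`, `succeed`, `wait_for_window`, `send_chars`, `send_key`, `wait_until_succeeds`, `screenshot`) belong to the NixOS test driver.

```python
import shlex

# Hypothetical values for illustration: the test user "alice" and gedit
# as the application under test.
USER = "alice"
APP = "gedit"


def as_user(user: str, command: str, background: bool = False) -> str:
    """Build a shell command that runs `command` as an unprivileged user.

    Assumes the X server runs on display :0. With background=True, the
    program's output is redirected and the program is detached, so that
    machine.succeed() does not block on the still-running GUI program.
    """
    command = f"DISPLAY=:0 {command}"
    if background:
        command = f"{command} >/dev/null 2>&1 &"
    return f"su - {user} -c {shlex.quote(command)}"


def editor_smoke_test(machine) -> None:
    """Start the editor, make one small edit, and check that it was saved."""
    # Step 1: start the application as an unprivileged user. The file is
    # created first so that Ctrl+S saves directly instead of opening a
    # "Save As" dialogue.
    machine.wait_for_x()
    machine.succeed(as_user(USER, "touch ~/test.txt"))
    machine.succeed(as_user(USER, f"{APP} ~/test.txt", background=True))
    # The window-title regex is a guess; adjust it to the real title.
    machine.wait_for_window(APP)

    # Step 2: one nontrivial action: type some text and save the file.
    machine.send_chars("hello nixos")
    machine.send_key("ctrl-s")

    # Step 3: check the effect. wait_until_succeeds() retries the command
    # until it succeeds or times out, so no fixed sleep is needed.
    machine.wait_until_succeeds(f"grep -q 'hello nixos' /home/{USER}/test.txt")
    machine.screenshot("after_edit")
```

Inside `testScript` this would be called as `start_all()` followed by `editor_smoke_test(machine)`. Keeping the command construction in a plain function makes the quoting easy to check without booting a VM.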
- Some programs start with a splash screen or some kind of startup wizard, since they are launched in a pristine VM and thus have never been run before. Often you can (and, for sanity’s sake, should) disable that with a command-line option or with a custom config file planted in the user’s home directory.
- Instead of clicking with the mouse, you can often use `send_key("ret")` to press a button.
- Sometimes actions like clicking on something take a while to have an effect. `machine.sleep` is not the best way to wait for that effect, though, because it opens the door to race conditions: whether the waiting time is sufficient depends on CI speed. Rather, you should use something blocking to await the effect of the action, e.g.
- For more complex GUI interaction, especially with multiple windows, I guess one should really familiarise oneself with `xdotool`. (I haven’t done so myself yet.)
- During development, put plenty of calls to `machine.screenshot("DebugN")` (where N = 1, 2, … is a running index) in your test to see what state it is in.
- During development, wrap your calls to
- `wait_for_text` uses OCR to detect whether text has appeared; however, this can sometimes fail for smaller fonts. What’s a good alternative in these cases?
- If the test fails, I believe no screenshots are kept. Likewise, you can’t access the screenshots while the test is running. How do I find out what happened in my failed test (short of commenting out all failing checks to get at the screenshots)?
- The development cycle is lengthy: re-running a test takes on the order of a minute rather than a few seconds, because it boots a whole new VM. This makes development painful.
- How do I debug a failed test on CI (ofborg)? In particular, how do I debug an `aarch64` test when I’m on `x86_64`? I can’t get at the screenshots or other artifacts. The most I can do is print something, but that won’t tell me what state the currently open windows are in.
- How do I click on a particular button in a dialogue (short of counting pixels and using