Should /bin/sh in R source code be patched?

R has a couple of hardcoded calls to /bin/sh:

which are not patched in https://github.com/NixOS/nixpkgs/tree/master/pkgs/applications/science/math/R. I suspect that may cause problems on non-NixOS systems when /bin/sh (usually symlinked to bash) is old. Is that a valid concern?

What I have in mind is something like

substituteInPlace src/unix/sys-unix.c --replace "/bin/sh" "${lib.makeBinPath [ bash ]}/bash"

What problems are you expecting? POSIX sh should be very stable, and the way it is used there is both very simple and fully compliant with parts of the spec that haven’t changed in at least two decades.

Edit: In fact, there have only ever been two changes to the spec, both just updating wording: Shell Command Language. Are you worried about bugs in bash’ POSIX-compat mode?

We have a tool in stdenv for this called patchShebangs and fixing this impurity is usually prudent. Sometimes we don’t when there’s a good reason, e.g. it makes cross compilation or bootstrapping harder.

This is not about patching shebangs though, the code in question uses /bin/sh -c to let users launch subprocesses with a full shell.

The question is whether that should still be avoided. Given those invocations don’t actually end up in compiled binaries or anything, it’s probably fine to substitute, and it does indeed get rid of a small impurity for R applications running with that runtime. I guess there’s not really a reason not to, even if problems with this should be extremely uncommon.

@alexv I think if you do substitute, it would be prudent to use the --posix flag at least, since bash will normally execute in POSIX mode if called from /bin/sh. Substituting with normal bash would change the behavior, albeit subtly.

I rebuilt R with the patch but it somehow still manages to call system /usr/bin/bash (not even /bin/sh or /bin/bash) anyway every time the R system functions is used. I ran it in gdb to confirm:

% R -d gdb
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /nix/store/azpwzr53y32s4xsxidvbqw4yf75bx4jh-R-4.1.0/lib/R/bin/exec/R...(no debugging symbols found)...done.
(gdb) set follow-fork-mode child
(gdb) run
Starting program: /nix/store/azpwzr53y32s4xsxidvbqw4yf75bx4jh-R-4.1.0/lib64/R/bin/exec/R

<snip>

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Attaching after process 28960 vfork to child process 28960]
[New inferior 2 (process 28960)]
warning: File "/nix/store/saw6nkqqqfx5xm1h5cpk7gxnxmw9wk47-glibc-2.33-62/lib/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
[Detaching vfork parent process 28955 after child exec]
[Inferior 1 (process 28955) detached]

process 28960 is executing new program: /usr/bin/bash

[Attaching after process 28961 fork to child process 28961]
[New inferior 3 (process 28961)]
[Detaching after fork from parent process 28960]
[Inferior 2 (process 28960) detached]
process 28961 is executing new program: /nix/store/2plwmkx42rv8lr1h4kmifqigv6qk4zn0-which-2.21/bin/which
Missing separate debuginfos, use: debuginfo-install bash-4.2.46-35.el7_9.x86_64
[Inferior 3 (process 28961) exited normally]
(gdb)

I got curious and searched for “bin/which” in their codebase and stumbled on this: https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/src/library/base/R/unix/system.unix.R#L114-L115

I think you might be seeing some kind of configure/build-time dependency searching. I also see a lot of stuff in the configure about SHELL and CONFIG_SHELL. I assume given the @WHICH@ form above that it substitutes at least some of these, so you might want to search the source for @SHELL@ or @CONFIG_SHELL@, or perhaps drop into the build after configure and search the codebase for “/usr/bin/bash”?

Could also maybe intentionally set SHELL or CONFIG_SHELL to something dumb in the Nix expression and see if it turns up inside the program?

2 Likes

The shell is defined here https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/configure.ac#L2764 which is supposed to honor https://github.com/NixOS/nixpkgs/blob/33c9fd85e52f925e05f4a6bd69811f97e5ccdc15/pkgs/applications/science/math/R/default.nix#L76 setting. Other tools get defined here https://github.com/wch/r-source/blob/79298c499218846d14500255efd622b5021c10ec/configure.ac#L672 where I don’t see anything suspicious.