Building unsupported locales? (ja_JP.sjis)

Hey all,

I’m currently trying to run some legacy software under Wine. It uses the Japanese Shift-JIS locale, which is not POSIX compliant. As such, glibc does not officially support it.

However, many Linux distributions still let you generate such locales anyway, by adding them to the locale.gen configuration and running locale-gen. In NixOS, it seems this is pretty strictly forbidden. The following configuration:

  i18n = {
    ...
    defaultLocale = "en_US.UTF-8";
    supportedLocales = [
      "en_US.UTF-8/UTF-8"
      "ja_JP.SJIS/SJIS"
    ];
  };

…gives this error:

[...long list of supported locales...]
Error: unsupported locales detected:
ja_JP.SJIS/SJIS \
You should choose from the list above the error.
builder for '/nix/store/psjl4pzk47c5v4p0m0684vx42z9b2bzr-glibc-locales-2.27.drv' failed with exit code 1

I tried various strategies for overriding the buildPhase of glibc-locales, but it seems very hard to override this properly. Does anyone have any advice for how to approach this problem?

I took a look at this. In the glibc-locales derivation, it checks to make sure all the locales are supported.

Disabling this check appears to make it so you can add unsupported locales:

diff --git a/pkgs/development/libraries/glibc/locales.nix b/pkgs/development/libraries/glibc/locales.nix
index 0dc19197415..2b08a91aa5c 100644
--- a/pkgs/development/libraries/glibc/locales.nix
+++ b/pkgs/development/libraries/glibc/locales.nix
@@ -39,13 +39,13 @@ callPackage ./common.nix { inherit stdenv; } {
         | sort > locales-supported.txt
       comm -13 locales-supported.txt locales-to-build.txt \
         > locales-unsupported.txt
-      if [[ $(wc -c locales-unsupported.txt) != "0 locales-unsupported.txt" ]]; then
-        cat locales-supported.txt
-        echo "Error: unsupported locales detected:"
-        cat locales-unsupported.txt
-        echo "You should choose from the list above the error."
-        false
-      fi
+      # if [[ $(wc -c locales-unsupported.txt) != "0 locales-unsupported.txt" ]]; then
+      #   cat locales-supported.txt
+      #   echo "Error: unsupported locales detected:"
+      #   cat locales-unsupported.txt
+      #   echo "You should choose from the list above the error."
+      #   false
+      # fi
 
       echo SUPPORTED-LOCALES='${toString locales}' > ../glibc-2*/localedata/SUPPORTED
     '' + ''

After doing nixos-rebuild build, I ended up with a glibc-locales with support for SJIS (although I didn’t actually reboot my system to check that this was fully working…):

$ strings /nix/store/pb0mznka084rvxyp89zcii272dx62xz0-glibc-locales-2.27/lib/locale/locale-archive | grep sjis
ja_JP.sjis
japanese.sjis

I recommend you send a PR that adds an option to disable this check to the pkgs/development/libraries/glibc/locales.nix derivation, and then adds a matching option to the nixos/modules/config/i18n.nix module.

Thanks! I really need to learn how to test modifications to Nixpkgs better. Particularly, I’m not sure what to do when I want to fork a little bit off of nixpkgs temporarily.

I’ll try adding a switch, testing it out, and opening a PR for it.

BTW, are there good reasons for the glibc folks not to support that locale directly? In other words, you might also consider trying to convince upstream to bless that combination. (I know nothing about Japanese encoding conventions, though I suppose non-UTF is always legacy.)

Shift JIS is not ISO 2022 compliant, and therefore not POSIX compliant, so it will never be supported as far as I know.

It’s been a while, but there never was a fantastic solution for this here. I wound up coming back to this recently. For reference, here’s what I wound up doing:

{ config, pkgs, lib, ... }:

{
  i18n = {
    defaultLocale = "en_US.UTF-8";
    glibcLocales = (pkgs.glibcLocales.overrideAttrs (finalAttrs: previousAttrs: {
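      # Turn every `false` in the build script into a comment, which disables
      # the abort in the unsupported-locale check.
      buildPhase = builtins.replaceStrings [ "false" ] [ "# false" ] previousAttrs.buildPhase;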
    })).override {
      locales = config.i18n.supportedLocales;
      allLocales = false;
    };
    supportedLocales = [
      "en_US.UTF-8/UTF-8"
      "ja_JP.UTF-8/UTF-8"
      "ja_JP.SJIS/SHIFT_JIS"
    ];
  };
}

It’s a pretty fragile hack, but it sidesteps the problem by simply suppressing the error, and it seems to work just fine. I figured I’d come back and mention this since I realized this thread shows up in Google when searching for this problem.

Although it’s been a few years, now that I know my way around Nix better, I’d like to improve the i18n module/glibcLocales derivation to optionally allow unsupported locales to be built. It’d be a really nice improvement if I could sneak it in before NixOS 23.05 branches off…

I wish I could edit the post instead of bumping it, but I have some more context to add.

Firstly, at some point the relevant code moved from buildPhase to preBuild, so if this hack has stopped working for you, override preBuild instead of buildPhase.
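For reference, here is a minimal sketch of the same override adapted to preBuild. It assumes the unsupported-locale check still lives in preBuild and still aborts via a bare `false`; I have not tested it against any particular nixpkgs revision, so treat it as a starting point rather than a known-good config.

```nix
{ config, pkgs, lib, ... }:

{
  i18n.glibcLocales = (pkgs.glibcLocales.overrideAttrs (finalAttrs: previousAttrs: {
    # Assumption: the check is part of preBuild in your nixpkgs revision.
    # Every `false` in the script becomes a comment, so the check no longer aborts.
    preBuild = builtins.replaceStrings [ "false" ] [ "# false" ] previousAttrs.preBuild;
  })).override {
    locales = config.i18n.supportedLocales;
    allLocales = false;
  };
}
```

The rest of the i18n block (defaultLocale, supportedLocales) stays the same as in my earlier post. If evaluation fails because preBuild is missing, your revision probably still has the check in buildPhase.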

Secondly, my stated reason for doing this (running Wine software with the Shift-JIS locale) turns out to be pointless. If you select a ja_JP locale such as ja_JP.UTF-8, Wine maps it to the Shift-JIS encoding for non-Unicode APIs, so it’s a lot more convenient to just use ja_JP.UTF-8. There might still be some reason to use ja_JP.SJIS, such as older UNIX/Linux software, or existing filenames on disk that are encoded in Shift-JIS. For the most part, though, it’s pointless, and in the filename case you’d probably be better off re-encoding the filenames.

If you’re having trouble believing this, try running LANG=ja_JP wine notepad. Notice that in the “Save As” dialog, the “Encoding” is set to “ANSI/OEM Japanese Shift-JIS”. In fact, modern versions of Wine will spit out a bunch of errors if you attempt to use the native Shift-JIS encoding:

err:environ:init_unix_codepage unrecognized charset 'SHIFT_JIS'

So yes. As it turns out, you probably don’t need or even want this hack. I wound up realizing this at some point, but forgot to update this thread.
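For anyone landing here from a search, this is roughly what the simpler setup looks like. It is a sketch that assumes all you need is for Wine to pick the Japanese code page, and it uses only locales that glibc supports, so no override is required.

```nix
{ config, pkgs, lib, ... }:

{
  i18n = {
    defaultLocale = "en_US.UTF-8";
    # Only officially supported locales, so the glibcLocales check passes.
    supportedLocales = [
      "en_US.UTF-8/UTF-8"
      "ja_JP.UTF-8/UTF-8"
    ];
  };
}
```

After rebuilding, launching the program with LANG=ja_JP.UTF-8 set (for example LANG=ja_JP.UTF-8 wine notepad) should be enough for Wine to use Shift-JIS for the non-Unicode APIs, as described above.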
