How do I avoid mojibake in NixOS?

I’m using the unstable nixpkgs, and after a recent upgrade my system once again has problems with locale settings, showing mojibake instead of non-ASCII characters and reporting errors like this:

[rkb@nixos:/etc/nixos]$ sudo nixos-rebuild switch
building Nix...
building the system configuration...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "",
        LC_ALL = (unset),
        LC_MEASUREMENT = "en_SE.UTF-8",
        LC_NUMERIC = "en_SE.UTF-8",
        LC_TIME = "en_SE.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
activating the configuration...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "",
        LC_ALL = (unset),
        LC_MEASUREMENT = "en_SE.UTF-8",
        LC_NUMERIC = "en_SE.UTF-8",
        LC_TIME = "en_SE.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
setting up /etc...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "",
        LC_ALL = (unset),
        LC_MEASUREMENT = "en_SE.UTF-8",
        LC_NUMERIC = "en_SE.UTF-8",
        LC_TIME = "en_SE.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
reloading user units for sddm...
reloading user units for rkb...
setting up tmpfiles

I’d had the same problem earlier and fixed it by applying this multi-glibc-locale-paths.nix.

Now it seems the problem is that glibc has moved up to 2.31, and the locale-archive file seems to be incompatible again.

I’m not worried about several hundred MB of locale data, I just want things to work. Is there some option I can pass? Or do I have to go down the rabbit hole of fiddling with glibc?



Huh, running strace on locale gives a lot of ENOENT for en_US paths in glibc’s part of the Nix store… is that a clue?

[rkb@nixos:~/projects/nixpkgs]$ strace -e trace=file locale
execve("/run/current-system/sw/bin/locale", ["locale"], 0x7ffe7819bd90 /* 83 vars */) = 0
access("/etc/ld-nix.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/tls/haswell/x86_64", 0x7ffcd952f2b0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/tls/haswell", 0x7ffcd952f2b0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/tls/x86_64", 0x7ffcd952f2b0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/tls", 0x7ffcd952f2b0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/haswell/x86_64", 0x7ffcd952f2b0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/haswell", 0x7ffcd952f2b0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/x86_64", 0x7ffcd952f2b0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/nix/store/8kw71xqs5bz6s68dylv6y13082zn0023-glibc-locales-2.27/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/locale/en_SE.UTF-8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/locale/en_SE.utf8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/locale/en_SE/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/locale/en.UTF-8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/locale/en.utf8/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/lib/locale/en/LC_MEASUREMENT", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_SE.UTF-8
LC_TIME=en_SE.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT=en_SE.UTF-8
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
+++ exited with 0 +++
$ ls /nix/store/bpgdx6qqqzzi3szb0y3di3j3660f3wkj-glibc-2.31/share/locale/
be  bg  ca  cs  da  de  el  en_GB  eo  es  fi  fr  gl  hr  hu  ia  id  it  ja  ko  locale.alias  lt  nb  nl  pl  pt  pt_BR  ru  rw  sk  sl  sv  tr  uk  vi  zh_CN  zh_TW

Why would I have en_GB and all the rest, but not en_US?

I’m already setting supportedLocales = [ "all" ]; in configuration.nix… what else should I do?

  i18n = {
    defaultLocale = "en_US.UTF-8";
    extraLocaleSettings = {
      LC_MEASUREMENT = "en_SE.UTF-8";
      LC_NUMERIC = "en_SE.UTF-8";
      # For dates formatted like ISO8601
      LC_TIME = "en_SE.UTF-8";
    };
    supportedLocales = [ "all" ];
  };

I feel like those incomplete glibc store folders are coming from the default options in the multi-glibc-locale-paths.nix, and if I could set i18n.supportedLocales = [ "all" ]; in those derivations, they’d include all the locales…

But I’m not sure how to express that, or how / where to poke around to find out. Here’s what I’m trying so far (it doesn’t run, I’m hitting errors like cannot coerce a set to a string):

  # A random Nixpkgs revision *after* the default glibc
  # was switched to version 2.31.x.
  newerpkgsSrc = pkgs.fetchFromGitHub {
    owner = "nixos";
    repo = "nixpkgs";
    rev = "9cd98386a38891d1074fc18036b842dc4416f562";
    sha256 = "0zanfgvsnvca39c44svfzy6v0p4vl3k38kq94vyv541vcbxmcdpr";
  };

  newerPkgs = import newerpkgsSrc { };

  glibc231 = newerPkgs.glibcLocales.overrideAttrs
    (oldAttrs: rec { i18n.supportedLocales = [ "all" ]; });

in {
  environment.sessionVariables = {
    LOCALE_ARCHIVE_2_11 = "${oldpkgs.glibcLocales}/lib/locale/locale-archive";
    LOCALE_ARCHIVE_2_27 = "${newpkgs.glibcLocales}/lib/locale/locale-archive";
    LOCALE_ARCHIVE_2_31 = "${glibc231}/lib/locale/locale-archive";
  };
}
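
A possible explanation for the cannot coerce a set to a string error, offered as an untested sketch: overrideAttrs changes the attributes handed to the derivation, whereas i18n.supportedLocales is a NixOS module option, so the nested set ends up as a derivation attribute that Nix cannot turn into a string. The glibcLocales package instead takes function arguments such as allLocales, which are set with .override. The binding name glibcLocales231 is made up for illustration, and whether glibc 2.31 actually reads a variable called LOCALE_ARCHIVE_2_31 is an assumption carried over from the attempt above.

```nix
{ pkgs, ... }:

let
  # Same pinned Nixpkgs revision as in the attempt above.
  newerpkgsSrc = pkgs.fetchFromGitHub {
    owner = "nixos";
    repo = "nixpkgs";
    rev = "9cd98386a38891d1074fc18036b842dc4416f562";
    sha256 = "0zanfgvsnvca39c44svfzy6v0p4vl3k38kq94vyv541vcbxmcdpr";
  };

  newerPkgs = import newerpkgsSrc { };

  # `override` changes the arguments of the package function, and
  # `allLocales` is one of them. The name glibcLocales231 is hypothetical.
  glibcLocales231 = newerPkgs.glibcLocales.override { allLocales = true; };

in {
  environment.sessionVariables = {
    # Assumption: variable name copied from the attempt above, not verified
    # against what glibc 2.31 really looks up.
    LOCALE_ARCHIVE_2_31 = "${glibcLocales231}/lib/locale/locale-archive";
  };
}
```

This only addresses the expression error; it does not by itself show that a missing locale archive is the cause of the mojibake.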

I’m not sure whether that’s the issue: $out/share/locale/en_US doesn’t exist on glibc-2.30 either; AFAIU the supported locales are in $out/share/i18n/locale.

But I’m not sure how to express that, or how / where to poke around to find out. Here’s what I’m trying so far:

Without having tested it, this seems reasonable (after looking at the multi-glibc-locale-paths.nix repo). Does this fix the issue?


Apart from that, I can’t really reproduce it locally, I’m afraid (since I’m using 20.03 with a bunch of packages from master). Did you manage to isolate this issue in a VM that can be built using nixos-build-vms(8)?

Trying to, but the VMs take forever to start up… is there some trick to having nixos-build-vms or the ./result/bin/nixos-run-vms script use hardware acceleration?

I’ve added

  virtualisation.libvirtd.enable = true;
  boot.kernelModules = [ "kvm-intel" ];

to my configuration.nix, and I’m in the libvirtd group, but the VMs take more than ten minutes and still don’t show the desktop…

OK, stripping out the GUI stuff made the startup time tolerable (but still several minutes).

This config should demonstrate the issue: opening the /etc/some-unicode file, you ought to see the characters from the file rendered correctly, but instead you’ll see mojibake.

network.nix

{ mojibake =
{ config, pkgs, ... }:
{
  environment.etc."some-unicode" = {
    mode = "0555";
    text = ''
      # taken from https://raw.githubusercontent.com/minimaxir/big-list-of-naughty-strings/master/blns.txt

      #	Two-Byte Characters
      #
      #	Strings which contain two-byte characters: can cause rendering issues or character-length issues
      
      田中さんにあげて下さい
    '';
  };

  imports = [ # Include the results of the hardware scan.
  ];

  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  networking.hostName = "nixos"; # Define your hostname.

  networking.useDHCP = false;
  networking.interfaces.enp0s31f6.useDHCP = true;

  # Select internationalisation properties.
  i18n = {
    defaultLocale = "en_AU.UTF-8";
    extraLocaleSettings = {
      #LANGUAGE = "en_US.UTF-8";
      #LC_MEASUREMENT = "en_SE.UTF-8";
      #LC_NUMERIC = "en_SE.UTF-8";
      # For dates formatted like ISO8601
      LC_TIME = "en_SE.UTF-8";
    };
    #glibcLocales = pkgs.glibcLocales;
    #glibcLocales = pkgs.buildPackages.glibcLocales.override {
    #      allLocales = true;
    #      #locales = ["all"];
    #    };
    supportedLocales = [ "all" ];
  };

  console = {
    font = "Lat2-Terminus16";
    keyMap = "us";
  };

  time.timeZone = "Australia/Sydney";

  environment.systemPackages = with pkgs; [
    direnv
    git
    glibcLocales
    nixfmt
    ntfs3g
    stow
    tldr
    vim
    wget
  ];

  programs.mtr.enable = true;
  services.openssh.enable = true;
  sound.enable = false;
  hardware.pulseaudio.enable = false;
  hardware.bluetooth.enable = false;
  services.xserver.enable = false;
  services.xserver.layout = "us";
  services.xserver.xkbOptions = "eurosign:e";
  services.xserver.displayManager.sddm.enable = false;
  services.xserver.desktopManager.plasma5.enable = false;

  users.mutableUsers = false;
  users.users.rkb = {
    isNormalUser = true;
    password = "rkb";
    extraGroups = [ 
      "wheel" # Enable ‘sudo’ for the user.
    ];
  };

  system.stateVersion = "20.03"; # Did you read the comment?
  nixpkgs.config.allowUnfree = true;
};
}

Oh, and as for whether that fixes the issue: no, I haven’t been able to figure out how to do that; I only got expression errors.

A few notes after a brief look:

  • The locale en_SE doesn’t seem to exist (not on glibc-2.30 either)
  • You don’t have to place the full cfg from nixos-generate-config into such a VM, it’s sufficient to do { vmname = { pkgs, ... }: { /* only relevant config */ }; }.
    For instance, your example is missing the eth1 interface, which the test-driver uses to connect to QEMU (in fact, nixos-build-vms takes care of networking, booting, etc.).
  • The locale issue you’re describing seems only related to the font on the terminal. For instance, I had the same broken chars when building the VM with latest nixos-20.03 rather than master.

If it’s a desktop-related problem, please try to create a minimal setup for it. You can increase the memory size using virtualisation.memorySize = 4096; if needed. The VM should properly boot up when leaving out all the unnecessary configs from nixos-generate-config.
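
The trimmed-down network.nix described in the second note could look roughly like this. It is a sketch, not a tested reproduction: it keeps only the i18n settings under investigation and the test file from the config above, and the choice of vim as the lone extra package is just an example.

```nix
# network.nix: one VM named "mojibake" with only the relevant options.
{
  mojibake = { pkgs, ... }: {
    # Extra memory, in case a desktop gets added later.
    virtualisation.memorySize = 4096;

    # The locale settings under test, copied from the full config.
    i18n = {
      defaultLocale = "en_AU.UTF-8";
      extraLocaleSettings.LC_TIME = "en_SE.UTF-8";
      supportedLocales = [ "all" ];
    };

    # A file containing non-ASCII text to open inside the VM.
    environment.etc."some-unicode".text = ''
      田中さんにあげて下さい
    '';

    # Example editor for viewing the file (an arbitrary choice).
    environment.systemPackages = [ pkgs.vim ];
  };
}
```

Something like this would be built with nixos-build-vms network.nix and started with ./result/bin/nixos-run-vms; boot loader, hostname and network interfaces are left to the VM tooling.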


The locale en_SE doesn’t seem to exist (not on glibc-2.30 either)

Huh, I’m not sure where I saw the advice to use the Swedish locale to get dates like YYYY-MM-DD; indeed, it’s not in the list of supported locales… maybe I got myself tangled up by using KDE’s region settings?

You don’t have to place the full cfg from nixos-generate-config into such a VM,

Oh wow, that’s much faster! Thanks!

The locale issue you’re describing seems only related to the font on the terminal.

Yeah, that looks like a separate issue. OK, I think I’ve found the crux of it. With this config:

{ mojibake =

# Edit this configuration file to define what should be installed on
# your system.  Help is available in the configuration.nix(5) man page
# and in the NixOS manual (accessible by running ‘nixos-help’).

{ config, pkgs, ... }:

{
  environment.etc."some-unicode" = {
    mode = "0555";
    text = ''
      # taken from https://raw.githubusercontent.com/minimaxir/big-list-of-naughty-strings/master/blns.txt

      #	Two-Byte Characters
      #
      #	Strings which contain two-byte characters: can cause rendering issues or character-length issues

      田中さんにあげて下さい
    '';
  };
  virtualisation.memorySize = 4096;

  i18n = {
    defaultLocale = "en_AU.UTF-8";
    extraLocaleSettings = {
      # For dates formatted like ISO8601
      # https://serverfault.com/a/17184/276263
      LC_TIME =
      # This one causes "mojibake" (because it doesn't exist?)
        "en_SE.UTF-8";
      # This one works fine
        #"en_DK.UTF-8";
    };
    supportedLocales = [ "all" ];
  };

  console = {
    font = "Lat2-Terminus16";
    keyMap = "us";
  };

  services.xserver.enable = true;
  services.xserver.layout = "us";
  services.xserver.xkbOptions = "eurosign:e";
  services.xserver.displayManager.sddm.enable = true;
  services.xserver.desktopManager.plasma5.enable = true;

  programs.mtr.enable = true;

  users.mutableUsers = false;

  users.users.rkb = {
    isNormalUser = true;
    password = "rkb";
    extraGroups = [
      "wheel" # Enable ‘sudo’ for the user.
    ];
  };

  system.stateVersion = "20.03";
};
}

Selecting en_SE for LC_TIME causes programs like nano (but not cat?) to show mojibake.

Changing the nix config to en_DK allows nano to render it correctly.

Where I think I got myself in trouble, and where en_SE came from, was selecting the “Sweden - English” locale via KDE’s Region Settings:

(in that screenshot, the nix config still specifies en_DK)

It took me longer than I’d like to connect those dots… choosing certain locale options can cause programs to show gibberish. (Is this expected behaviour?)


So thanks @Ma27, for helping me find the problem!
