Towards an integrated search engine for the Nix community

Amid cries for more and better documentation, we are lost in finding information across the myriad of disparate resources. Information is outdated, information is inaccessible to search engines, we don’t know how to find information, we forget to look for it, and so forth.

I would like to ask for thoughts and input on putting together a Nix-focused search engine. (Hey, maybe we can finally use that fancy semantic data in the XML documentation! [citation needed])

A stretch feature I would like is to mark outdated information and cross-reference it to newer, correct information.
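To make the stretch feature a bit more concrete, here is a minimal sketch of how an index entry could be flagged as outdated and pointed at its replacement. Everything in it (the `Document` class, the `superseded_by` field, the function names) is hypothetical and only meant as a starting point for discussion.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Document:
    """One indexed page. All field names are hypothetical."""
    url: str
    title: str
    superseded_by: Optional[str] = None  # URL of the newer, correct page


def mark_outdated(index: Dict[str, Document], old_url: str, new_url: str) -> None:
    """Flag old_url as outdated and cross-reference it to new_url."""
    if new_url not in index:
        raise KeyError(f"replacement {new_url!r} is not indexed")
    index[old_url].superseded_by = new_url


def resolve(index: Dict[str, Document], url: str) -> Document:
    """Follow superseded_by links to the newest page, refusing to loop."""
    seen = set()
    while index[url].superseded_by is not None:
        if url in seen:
            raise ValueError(f"cycle in cross-references at {url!r}")
        seen.add(url)
        url = index[url].superseded_by
    return index[url]
```

A search frontend could then show the outdated hit with a pointer to whatever `resolve` returns, rather than hiding the old page outright.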

(Do we have any librarians?)

As usual, I don’t have time in the next few weeks, but maybe over time we can pull something together… baby steps, as always?

Initial ideas for data sources that would need to be ingested:

  • nix, nixos, nixpkgs, nixops manuals

  • nix pills

  • the old and new wiki, and …all the other variations

  • I think I saw some manual pages that weren’t part of the main manual pages

  • somehow index nixcon talks, at least transcripts

  • irc logs

  • old and new and idk mailing lists

  • discourse

  • all the codebases; nix, nixpkgs, etc

  • all the github issues and stuff

  • third party blog posts

  • all commit messages

  • as a bonus, maybe the Arch and Gentoo wikis and such? 😛

Hark, hark! Now seeking writers of search engines, all manner of scrapers, and so forth.

We’d probably want to explicitly mark information from these sorts of sources as “not NixOS-specific”, to warn people that whatever information is given there is strictly advisory and may not always apply to NixOS.
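As a rough illustration of that marking, a source registry could carry a flag per source, and the result renderer could append the warning. The `Source` class, the `nix_specific` flag, the registry keys and the wording of the warning are all assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A data source to ingest. The nix_specific flag is a hypothetical field."""
    name: str
    nix_specific: bool


# A few of the sources from the list above; the keys are invented for this sketch.
SOURCES = {
    "nixpkgs-manual": Source("nixpkgs manual", True),
    "nix-pills": Source("nix pills", True),
    "arch-wiki": Source("Arch wiki", False),
    "gentoo-wiki": Source("Gentoo wiki", False),
}


def render_hit(title: str, source_key: str) -> str:
    """Format one search hit, warning when the source is not Nix-specific."""
    source = SOURCES[source_key]
    line = f"{title} [{source.name}]"
    if not source.nix_specific:
        line += " (not NixOS-specific, advisory only)"
    return line
```

For example, `render_hit("Fonts", "arch-wiki")` would carry the warning, while a hit from the nixpkgs manual would not.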

On a small side note, Discourse suggested the thread “Handy scripts for fuzzy searching nixpkgs and nixos-options” to me.

I would be more than happy to polish those scripts up a bit more for a contribution to the repo.
I’ve continued tweaking them here and there over the months. I’ve gotten them to open source files, open a shell, or install into a user’s environment. I also added home-manager integration.

Admittedly, in their current state they’re aimed at personal use, and I expected folks to tweak them to their needs; but with a bit of work they could be repackaged for anybody to pick up and use out of the box.

A first step would be a direct link to the nixos-unstable docs on the website, without having to go through the Hydra maze.

NB: out of scope but I would love for nix search to finally find something xD

If nix search doesn’t work for you, take a look at https://github.com/LnL7/nix-darwin/issues/140.

I was thinking more in general: unfree packages, nested package sets (Haskell, Lua, etc.).

It has happened to me twice that a PR was held back because it broke something else, and then somebody pushed the same fix to master in a separate commit, because they didn’t know that a) a PR already existed and b) the fix breaks other things.

Because of this I was thinking about some kind of web service that collects information such as GitHub issues, their states, docs, etc., and shows each package’s state, its open issues, and so on. Maybe a service that crawls issues, manuals and XML docs could also connect and pretty-print such information in a way that is useful to contributors.
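Just to sketch the "open issues per package" part: assuming the issues have already been fetched as dictionaries with `number`, `title` and `state` keys, and assuming many titles follow a `pkgname: summary` pattern (not all do), grouping could look roughly like this. The function names and the title heuristic are invented for illustration; no crawling is shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def package_of(title: str) -> Optional[str]:
    """Guess the package from a 'pkgname: summary' title, else return None."""
    head, sep, _ = title.partition(":")
    head = head.strip()
    # No colon, an empty prefix, or a multi-word prefix: not a package name.
    if not sep or not head or " " in head:
        return None
    return head


def open_issues_by_package(issues: List[dict]) -> Dict[str, List[int]]:
    """Map each guessed package name to the numbers of its open issues."""
    grouped = defaultdict(list)
    for issue in issues:
        if issue["state"] != "open":
            continue
        pkg = package_of(issue["title"])
        if pkg is not None:
            grouped[pkg].append(issue["number"])
    return dict(grouped)
```

Issues whose titles don’t match the pattern are simply skipped here; a real service would need a better way to tie issues to packages.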

I would love to create such a crawler service but I’m mainly a system/network engineer and my coding knowledge is very limited…

You are the previous me! I wanted this as well.

But then it dawned on me: why should this be specific to Nix? Put any topic in place of Nix… and it turns out that Google is the best approach for searching any topic, including Nix.

Just make everything indexed by Google. Or do you want to build your own mini-Google for Nix topics?

The “catalog”-style knowledge isn’t easy to maintain; you can make it usable (Nix docs, nixos.wiki), but never ideal.

A question back to you: which queries should this tool answer best?

But then it dawned on me: why should this be specific to Nix? Put any topic in place of Nix… and it turns out that Google is the best approach for searching any topic, including Nix.

Well, for Nix we can reasonably estimate the reliability of sources; Google’s generic ranking is usually worse than a topic-specific ordering of sources.
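A toy version of what I mean by topic-specific ordering: multiply whatever text-match score the search backend gives by a per-source reliability weight. The source kinds, the weights and the `rank` function are all made up for this sketch; the real numbers would need discussion.

```python
from typing import List, Tuple

# Hypothetical reliability weights per kind of source (higher = more trusted).
RELIABILITY = {"manual": 1.0, "wiki": 0.7, "irc-log": 0.3}
DEFAULT_RELIABILITY = 0.5  # assumed weight for kinds not listed above

Hit = Tuple[str, str, float]  # (title, source kind, text-match score)


def rank(hits: List[Hit]) -> List[Hit]:
    """Order hits by text-match score scaled by source reliability."""
    def weighted(hit: Hit) -> float:
        _, kind, score = hit
        return score * RELIABILITY.get(kind, DEFAULT_RELIABILITY)
    return sorted(hits, key=weighted, reverse=True)
```

With these weights, a manual page with a moderate match outranks an IRC log line with a strong one, which is the kind of ordering a general search engine can’t know to apply.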