Towards an integrated search engine for the Nix community

Amid cries for more and better documentation, we are getting lost trying to find information across a myriad of disparate resources. Information is outdated, information is inaccessible to search engines, we don’t know how to find information, we forget to look for it, and so forth.

I would like to ask for thoughts and input on putting together a Nix-focused search engine. (Hey, maybe we can finally use that fancy semantic data in the XML documentation! [citation needed])

A stretch feature I would like is to mark outdated information, and cross-reference it to new correct information.
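Here is one way the outdated-to-new cross-referencing could work. This is a minimal Python sketch, and it assumes each indexed document carries an optional `superseded_by` link to its replacement. `Doc`, `superseded_by` and `resolve_current` are hypothetical names and are not part of any existing Nix tooling. A search result could then send readers along the chain to the newest page.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    """One indexed document (hypothetical schema, for illustration only)."""
    url: str
    title: str
    superseded_by: Optional[str] = None  # URL of the newer, correct document


def resolve_current(url: str, index: Dict[str, Doc]) -> List[str]:
    """Follow superseded_by links starting at `url` and return the chain of URLs.

    The last element is the most up-to-date document we know of. The `seen`
    set stops the walk if hand-maintained links ever form a cycle.
    """
    chain = [url]
    seen = {url}
    doc = index.get(url)
    while doc is not None and doc.superseded_by is not None:
        nxt = doc.superseded_by
        if nxt in seen:
            break  # cycle detected: stop instead of looping forever
        chain.append(nxt)
        seen.add(nxt)
        doc = index.get(nxt)
    return chain
```

The link is stored on the old document, not in a central table. That way, whoever writes a replacement only has to annotate the page it replaces.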

(Do we have any librarians?)

As usual, I don’t have time in the coming weeks, but maybe over time we can pull something together… baby steps as always?

Initial ideas for data sources that would need to be ingested:

  • nix, nixos, nixpkgs, nixops manuals

  • nix pills

  • the old and new wiki, and …all the other variations

  • I think I saw some manual pages that weren’t part of the main manual pages

  • somehow index NixCon talks, or at least their transcripts

  • irc logs

  • old and new and idk mailing lists

  • discourse

  • all the codebases: nix, nixpkgs, etc.

  • all the github issues and stuff

  • third party blog posts

  • all commit messages

  • as a bonus, maybe the arch and gentoo wikis and such? :stuck_out_tongue:

Hark, hark! Now seeking writers of search engines, all manner of scrapers, and so forth.

We’d probably want to explicitly mark information from these sorts of sources as “not NixOS-specific”, to warn people that whatever information is given there is strictly advisory and may not always apply to NixOS.
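Below is a minimal Python sketch of that labelling. It assumes every result carries a `source` identifier. The source names and the `NON_NIX_SOURCES` set are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical source identifiers. Which sources count as "general" rather
# than Nix-specific is an assumption made for this example.
NON_NIX_SOURCES = {"arch-wiki", "gentoo-wiki"}


@dataclass
class SearchResult:
    source: str
    title: str
    url: str

    @property
    def nixos_specific(self) -> bool:
        # Anything not explicitly listed as a general source counts as Nix-specific.
        return self.source not in NON_NIX_SOURCES


def render(result: SearchResult) -> str:
    """Format a result line, prefixing a warning for non-NixOS sources."""
    prefix = "" if result.nixos_specific else "[not NixOS-specific] "
    return f"{prefix}{result.title} <{result.url}>"
```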

On a small sidenote, discourse suggested Handy scripts for fuzzy searching nixpkgs and nixos-options to me.

I would be more than happy to polish those scripts up a bit more for a contribution to the repo.
I’ve continued tweaking them here and there over the months. I’ve gotten them to open source files, open a shell, or install into a user’s environment. I also added home-manager integration.

Admittedly, in their current state they’re aimed at personal use and I expected folks to tweak them to their needs, but with a bit of work they could be repackaged for anybody to pick up and use out of the box.

A first step would be a direct link to the nixos-unstable docs on the website, without having to go through the Hydra maze.

NB: out of scope but I would love for nix search to finally find something xD

If nix search doesn’t work for you, take a look at `nix search` needs an explicitly name `nixpkgs` entry in `NIX_PATH` · Issue #140 · LnL7/nix-darwin · GitHub.

I was thinking more in general: unfree packages, nested packages (haskell/lua etc.)

It has happened to me twice that a PR was held back because it broke something else, and somebody just pushed the fix to master in a separate commit because they didn’t know that a) a PR already existed and b) the fix breaks other things.

Because of this I was thinking about some kind of web service that collects information like GitHub issues, states, docs, etc. and shows information about package states, open issues for a package, etc. Maybe a service that crawls issues, manuals and XML docs could also link these together and pretty-print the information in a way that is useful to contributors.
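A crawler would not need much to flag duplicate or conflicting fixes. nixpkgs commit and PR titles conventionally start with the attribute being changed, something like `hello: 2.10 -> 2.12`. A rough Python sketch could therefore group open PR titles by that prefix. Fetching the PRs from GitHub is left out here, and the `(number, title)` input format is an assumption.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Captures the "<attr>:" prefix of titles such as "hello: 2.10 -> 2.12"
# or "nixos/nginx: add option".
PREFIX = re.compile(r"^\s*([\w.+/-]+):")


def package_of(title: str) -> Optional[str]:
    """Return the package/module prefix of a PR title, or None if absent."""
    m = PREFIX.match(title)
    return m.group(1) if m else None


def overlapping(prs: List[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Group open PR numbers by prefix, keeping only prefixes with 2+ PRs."""
    by_pkg: Dict[str, List[int]] = defaultdict(list)
    for number, title in prs:
        pkg = package_of(title)
        if pkg is not None:
            by_pkg[pkg].append(number)
    return {pkg: nums for pkg, nums in by_pkg.items() if len(nums) > 1}
```

Title prefixes are only a heuristic. Multi-package and treewide PRs won’t group cleanly, but the grouping may be enough to surface cases like the ones described above.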

I would love to create such a crawler service, but I’m mainly a system/network engineer and my coding knowledge is very limited…

you are previous me! I wanted this as well.

But then it dawned on me: why should this be specific to Nix? Put any other topic in place of Nix… and it turns out that Google is the best approach for searching any topic, including Nix.

Just make everything indexed by Google. Or do you want to build your own mini-Google for Nix topics?

The “catalog”-style knowledge isn’t easy to maintain; you can make it usable (Nix docs, nixos.wiki), but never ideal.

A question back to you: which queries should this tool answer best?

Well, for Nix we can reasonably estimate the reliability of sources; Google is usually worse than topic-specific ordering of sources.
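Topic-specific ordering might look like this rough Python sketch, which multiplies each result’s text relevance by a per-source reliability weight. The source names and weight values are illustrative guesses, not measurements.

```python
from typing import Dict, List

# Hypothetical per-source reliability weights (illustrative, not measured).
SOURCE_WEIGHT: Dict[str, float] = {
    "nixos-manual": 1.0,
    "nixpkgs-manual": 1.0,
    "wiki": 0.7,
    "discourse": 0.6,
    "irc": 0.3,
}
DEFAULT_WEIGHT = 0.5  # used for sources not listed above


def rank(results: List[dict]) -> List[dict]:
    """Sort results by relevance * source reliability, best first."""
    def score(r: dict) -> float:
        return r["relevance"] * SOURCE_WEIGHT.get(r["source"], DEFAULT_WEIGHT)
    return sorted(results, key=score, reverse=True)
```

Because this uses a product rather than a strict source order, a highly relevant IRC line can still outrank a loosely related manual section. Tuning the weights is where the Nix-specific knowledge would go.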

A very bad prototype thrown together over NixCon and the course of the week, certainly full of bugs: GitHub - deliciouslytyped/nix-metasearx-prototype: A demonstrator for a nix ecosystem search aggregator using Searx
Uses hound from @grahamc and the IRC logs from @samueldr; this may create a relatively large number of requests, so please (ab)use responsibly.

See add some nix themed engines to subtree · deliciouslytyped/nix-metasearx-prototype@dcd6a0b · GitHub for changes to searx.
The “covered” backends are GitHub, IRC, hound, and the wiki.

This example works okay because there are few results, but searches that return a large number of results are mostly flooded with IRC hits. Searx does allow toggling engines.
Basically, this needs a lot of work.
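One aggregator-side way to limit the IRC flood is to cap each engine’s contribution and let the engines take turns. This is not how Searx itself merges results, just an alternative sketched in Python. The `interleave` function and its default cap are hypothetical.

```python
from itertools import zip_longest
from typing import Dict, List


def interleave(by_engine: Dict[str, List[str]], per_engine_cap: int = 5) -> List[str]:
    """Round-robin merge of per-engine result lists.

    Each engine contributes at most `per_engine_cap` results and engines take
    turns, so one noisy backend (e.g. IRC logs) cannot fill the whole page.
    """
    capped = [results[:per_engine_cap] for results in by_engine.values()]
    merged: List[str] = []
    # zip_longest pads shorter lists with None; skip the padding.
    for row in zip_longest(*capped):
        merged.extend(r for r in row if r is not None)
    return merged
```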

Writing Searx backends is very simple for JSON ones, though there are some general (reasonable) limitations afaict(!), such as not being able to return HTML in a result, and I don’t see a way to aggregate multiple queries. But maybe that shouldn’t be Searx’s problem.
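For reference, a Searx engine is a Python module that exposes two functions. `request(query, params)` fills in `params['url']`, and `response(resp)` turns the HTTP response into a list of result dicts. Below is a minimal sketch for a JSON backend. It assumes a hypothetical endpoint at `search.example.org` that returns `{"results": [{"url": ..., "title": ..., "snippet": ...}]}`. A real backend (hound, the IRC logs, the wiki) would need its own URL and field mapping.

```python
"""Minimal Searx engine sketch for a hypothetical JSON search endpoint."""
from json import loads
from urllib.parse import urlencode

categories = ['it']
paging = True

# Hypothetical endpoint; replace with the real backend's API.
base_url = 'https://search.example.org/api'


def request(query, params):
    # Searx passes the user's query and a params dict (including 'pageno');
    # the engine only has to fill in the URL to fetch.
    params['url'] = base_url + '?' + urlencode({'q': query, 'page': params['pageno']})
    return params


def response(resp):
    # Searx passes the HTTP response; return a list of result dicts.
    # 'content' is kept as plain text, since HTML in results isn't supported.
    results = []
    for item in loads(resp.text).get('results', []):
        results.append({
            'url': item['url'],
            'title': item['title'],
            'content': item.get('snippet', ''),
        })
    return results
```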

Note to self:

What about pushing the index data into something like Algolia? Although I’m not sure their free-tier community pricing would cover the volume and demand: judging from their pricing matrix, you’re limited to indexing 10k items and 50k searches a month.

So it looks like I forgot to pin flask or something, and the debug mode patch may fail to apply for some people. I’ll probably forget to fix that.

Somewhat related GitHub - mlvzk/manix: A fast CLI documentation searcher for Nix.

I got an error while building the options cache. This fork works properly.