Equivalent of pip install apache-beam[gcp]?

It’s easy enough to get apache-beam running in a shell with a shell.nix file:

{ pkgs ? import <nixpkgs> {} }:
let
  python-with-packages = pkgs.python310.withPackages(ps: with ps; [ 
      apache-beam
      grpcio
    ]);
in python-with-packages.env

However, running Apache Beam jobs that make use of GCP (e.g. loading files from storage buckets) requires installing apache-beam with extensions.

This is done in pip with pip install apache-beam[gcp]

Reading through the docs at:
https://github.com/NixOS/nixpkgs/blob/7b08f3fd0dad6e07a1611401fb8772a2469e64ac/doc/languages-frameworks/python.section.md#optional-extra-dependencies
and trying to piece together the example in "NixOS: Installing Python packages with optional dependencies", it seems that I need to use optional-dependencies, but I don’t get any joy with that:

{ pkgs ? import <nixpkgs> {} }:
let
  python-with-packages = pkgs.python310.withPackages(ps: with ps; [ 
      apache-beam
      grpcio
    ] ++ apache-beam.optional-dependencies.gcp);
in python-with-packages.env

I get an error of:

error: attribute 'optional-dependencies' missing

The override approach, described in the Stack Overflow question "How to enable an optional build dependency using nix-shell?", doesn’t work either:

nix-shell -p "python310Packages.apache-beam.override { gcp = true; }"

Only results in a different error:

error: anonymous function at /nix/store/iffdwx6q15hf9nvgq821mci3n50czfq6-nixpkgs/nixpkgs/pkgs/development/python-modules/apache-beam/default.nix:1:1 called with unexpected argument 'gcp'

What am I doing wrong, and how could I have troubleshot this issue better?

Looking at the Nix source of the apache-beam packages available, it doesn’t look like they have an optional-dependencies attribute (hence the error).

I could be missing something though. It doesn’t really help solve your problem, but it explains the error you’re seeing.

I can’t really dig into it further at the moment, but that might help you make progress.
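One way to confirm this from the expression itself, as a minimal sketch that assumes the same nixpkgs as in the question: Nix's `or` keyword supplies a default when any attribute in a selection path is missing. With it, the shell evaluates even though apache-beam has no optional-dependencies attribute. It does not install the extras; it only turns the hard error into an empty list.

```nix
{ pkgs ? import <nixpkgs> {} }:
let
  python-with-packages = pkgs.python310.withPackages (ps: with ps; [
      apache-beam
      grpcio
    ]
    # `or []` yields an empty list when any attribute in the path is missing,
    # so evaluation no longer fails. It also means no extras are added.
    ++ (apache-beam.optional-dependencies.gcp or []));
in python-with-packages.env
```

The related `?` operator gives a yes/no answer: in `nix repl`, after loading nixpkgs, `pkgs.python310Packages.apache-beam ? optional-dependencies` should evaluate to `false` whenever the attribute is absent.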

Ah, I thought it was something that all Python packages would have, since it was in the packaging docs.

But now, looking at the commit history of nixpkgs, I can see numerous commits that add optional-dependencies to Python packages explicitly, for example:

https://github.com/NixOS/nixpkgs/commit/456cf812715ba11af7c7374171358b1cb2c7c619

However, looking at these examples, the packages that they’re passing through are other python3x Nix packages, which implies that the gcp package itself would need to exist, and I can’t see one called gcp in nixpkgs.

So, some progress on understanding, if not a solution! Thanks. :)

Yes, we need to manually transfer that definition to reference our packages. You can see what packages are behind the gcp optional here:

It is very likely we have most of those packaged up, and they just need to be wired up correctly. Feel free to chime in.

Aha! Thanks very much.

Taking that gcp section of the map, and adding it directly in, I get a successful result!

{ pkgs ? import <nixpkgs> {} }:
let
  python-with-packages = pkgs.python39.withPackages(ps: with ps; [ 
      # [gcp] optionals.
      cachetools
      google-apitools
      google-auth
      google-auth-httplib2
      google-cloud-datastore
      google-cloud-pubsub
      # google-cloud-pubsublite - Not found
      google-cloud-bigquery
      google-cloud-bigquery-storage
      google-cloud-core
      google-cloud-bigtable
      google-cloud-spanner
      google-cloud-dlp
      google-cloud-language
      google-cloud-videointelligence
      google-cloud-vision
      # google-cloud-recommendations-ai - Not found
      # End of [gcp] section.
      google-cloud-storage
      apache-beam
      grpcio
    ]);
in pkgs.mkShell {
  nativeBuildInputs = [ 
    pkgs.gcc-unwrapped.lib
    pkgs.poetry
    python-with-packages
  ];
}
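The list above could also be written so that names missing from a given nixpkgs revision are skipped automatically instead of being commented out by hand. This is only a sketch: `gcpExtraNames` and `pick` are hypothetical names introduced here, the list is abridged, and a name that exists in the package set but is marked broken would still fail later.

```nix
{ pkgs ? import <nixpkgs> {} }:
let
  # Hypothetical helper list: attribute names taken from the [gcp] extra (abridged).
  gcpExtraNames = [
    "cachetools"
    "google-apitools"
    "google-auth"
    "google-cloud-storage"
    "google-cloud-pubsublite" # skipped when the package set has no such attribute
  ];

  # Keep only the names that exist in this Python package set,
  # then look each one up to get the actual package.
  pick = ps: names:
    map (name: ps.${name})
      (builtins.filter (name: builtins.hasAttr name ps) names);

  python-with-packages = pkgs.python39.withPackages (ps:
    [ ps.apache-beam ps.grpcio ] ++ pick ps gcpExtraNames);
in pkgs.mkShell {
  nativeBuildInputs = [ python-with-packages ];
}
```

The trade-off is that a silently skipped package only shows up as an import error at runtime, so the explicit comments in the original list are arguably easier to audit.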

Running nix-shell I was able to get into a shell that contained Python etc.

Running my Beam job with python3 ./wordcount.py resulted in success, once I’d adjusted the wordcount.py example to write to a directory on my computer that exists.

The test Beam job I’m using is: https://github.com/apache/beam/blob/6083f748a02fb64323ee0a0a43a74bf06d963cd0/sdks/python/apache_beam/examples/wordcount.py

def main(argv=None, save_main_session=True):
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--input',
      dest='input',
      default='gs://dataflow-samples/shakespeare/kinglear.txt',
      help='Input file to process.')
  parser.add_argument(
      '--output',
      dest='output',
      default='/home/blah/blah/dataflow/op.txt',

Thanks for the help.
