TL;DR: Would adding TIGER/Line shapefiles, US Census data, and EPA air quality data to pkgs/data be an acceptable contribution?
I am part of the open-source Observational Health Data Sciences and Informatics (OHDSI) group. It is a group of researchers and holders of health data records that aims to standardize data formats and write study packages that can run on any standard-conforming database, so that large network studies can be done without transmitting health records across the network.
As part of that initiative, the GIS workgroup (of which I am a member) is looking to collect publicly available datasets and put them in a standard format that can be merged with patient data. Environmental data and US Census data are the first targets.
I immediately thought of Nix as a way to track data provenance and transformations, follow source data updates and versions, and track data dependencies in study packages. My initial work can be found here, with the most recent work taking place in the “svi” PR.