Camels USGS Streamflow
Load in Python
from intake import open_catalog
cat = open_catalog("https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/hydro/camels.yaml")
ds = cat["usgs_streamflow_gcp"].to_dask()
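Because this dataset sits in a requester pays bucket, the loader may need to be told which project to bill. The sketch below assumes two things this page does not confirm: that the catalog entry uses a gcsfs-backed driver accepting a `storage_options` argument, and that calling an intake entry with keyword arguments overrides that argument. `requester_pays_options` and `load_streamflow` are hypothetical helper names, and `my-billing-project` is a placeholder for your own project ID.

```python
CATALOG_URL = (
    "https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/"
    "master/intake-catalogs/hydro/camels.yaml"
)


def requester_pays_options(billing_project):
    """Hypothetical helper: gcsfs storage options that bill reads to `billing_project`.

    gcsfs treats requester_pays=True as "send `project` as the billing project".
    """
    if not billing_project:
        raise ValueError("a GCP billing project ID is required")
    return {"project": billing_project, "requester_pays": True}


def load_streamflow(billing_project):
    """Hypothetical loader: open the CAMELS entry, billing requests to your project."""
    # Imported lazily so requester_pays_options() can be used without intake installed.
    from intake import open_catalog

    cat = open_catalog(CATALOG_URL)
    # Assumption: keyword arguments passed when calling an entry override the
    # source's arguments, and the driver forwards storage_options to gcsfs.
    source = cat["usgs_streamflow_gcp"](
        storage_options=requester_pays_options(billing_project)
    )
    return source.to_dask()
```

For example, `ds = load_streamflow("my-billing-project")` followed by `ds.head()` would compute only the first partition. If the catalog already configures requester pays access, the snippet above may work unchanged once credentials are in place.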
Working with requester pays data
Several of the datasets in the cloud data catalog are stored in requester pays storage buckets. This means that a user requesting data must provide their own billing project (created and authenticated through Google Cloud Platform) to be billed for the charges associated with accessing a dataset. To set up a GCP billing project and use it for authentication in applications:
- Create a project on GCP; if this is the first time using GCP, a prompt will appear to choose a Google account to link to all GCP-related activities.
- Create a Cloud Billing account associated with the project and enable billing for the project through this account.
- Using Google Cloud IAM, add the Service Usage Consumer role to your account, which enables it to make billed requests on behalf of the project.
- From the command line, install the Google Cloud SDK; this can be done using conda:
conda install -c conda-forge google-cloud-sdk
- Initialize the gcloud command line interface, logging into the account used to create the aforementioned project and selecting it as the default project; this will allow the project to be used for requester pays access through the command line:
gcloud auth login
gcloud init
- Finally, use gcloud to establish application default credentials; this will allow the project to be used for requester pays access through applications:
gcloud auth application-default login
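Before opening the data, it can help to confirm that the setup steps took effect. A minimal, stdlib-only sketch, assuming gcloud's default locations: `gcloud` should be on `PATH` after the conda install, and `gcloud auth application-default login` writes `application_default_credentials.json` into the gcloud configuration directory (`~/.config/gcloud`, or `%APPDATA%\gcloud` on Windows, unless `CLOUDSDK_CONFIG` overrides it). If `GOOGLE_APPLICATION_CREDENTIALS` is set, that file is used instead. The function names are hypothetical.

```python
import os
import shutil
from pathlib import Path

ADC_FILENAME = "application_default_credentials.json"


def gcloud_config_dir(env=None, windows=None):
    """Directory where gcloud keeps its configuration (simplified lookup)."""
    env = os.environ if env is None else env
    windows = (os.name == "nt") if windows is None else windows
    # CLOUDSDK_CONFIG, when set, relocates the whole gcloud config directory.
    if env.get("CLOUDSDK_CONFIG"):
        return Path(env["CLOUDSDK_CONFIG"])
    if windows and env.get("APPDATA"):
        return Path(env["APPDATA"]) / "gcloud"
    return Path(env.get("HOME") or Path.home()) / ".config" / "gcloud"


def adc_path(env=None, windows=None):
    """Credentials file that application default credentials would read."""
    env = os.environ if env is None else env
    # An explicit service-account key path takes precedence over gcloud's file.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    return gcloud_config_dir(env, windows) / ADC_FILENAME


def check_setup():
    """Report whether the gcloud CLI and application default credentials are present."""
    return {
        "gcloud_on_path": shutil.which("gcloud") is not None,
        "adc_file_exists": adc_path().is_file(),
    }


if __name__ == "__main__":
    print(check_setup())
```

This only checks that files exist; if the google-auth package is installed, `google.auth.default()` performs the authoritative lookup and returns a `(credentials, project_id)` tuple.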
Dataset preview: Dask DataFrame (Dask Name: from-delayed, 3 tasks)