Is it possible to extract data (to google cloud storage) from a shared dataset (where I have only have view permissions) using the client APIs (python)?
I can do this manually using the web browser, but cannot get it to work using the APIs.
I have created a project (MyProject) and a service account for MyProject to use as credentials when creating the service using the API. This account has view permissions on a shared dataset (MySharedDataset) and write permissions on my google cloud storage bucket. If I attempt to run a job in my own project to extract data from the shared project:
job_data = {
'jobReference': {
'projectId': myProjectId,
'jobId': str(uuid.uuid4())
},
'configuration': {
'extract': {
'sourceTable': {
'projectId': sharedProjectId,
'datasetId': sharedDatasetId,
'tableId': sharedTableId,
},
'destinationUris': [cloud_storage_path],
'destinationFormat': 'AVRO'
}
}
}
I get the error:
googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/sharedProjectId/jobs?alt=json returned "Value 'myProjectId' in content does not agree with value sharedProjectId'. This can happen when a value set through a parameter is inconsistent with a value set in the request.">
Using the sharedProjectId in both the jobReference and sourceTable I get:
googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/sharedProjectId/jobs?alt=json returned "Access Denied: Job myJobId: The user myServiceAccountEmail does not have permission to run a job in project sharedProjectId">
Using myProjectId for both the job immediately comes back with a status of 'DONE' and with no errors, but nothing has been exported. My GCS bucket is empty.
If this is indeed not possible using the API, is there another method/tool that can be used to automate the extraction of data from a shared dataset?
* UPDATE *
This works fine using the API explorer running under my GA login. In my code I use the following method:
service.jobs().insert(projectId=myProjectId, body=job_data).execute()
and removed the jobReference object containing the projectId
job_data = {
'configuration': {
'extract': {
'sourceTable': {
'projectId': sharedProjectId,
'datasetId': sharedDatasetId,
'tableId': sharedTableId,
},
'destinationUris': [cloud_storage_path],
'destinationFormat': 'AVRO'
}
}
}
but this returns the error
Access Denied: Table sharedProjectId:sharedDatasetId.sharedTableId: The user 'serviceAccountEmail' does not have permission to export a table in dataset sharedProjectId:sharedDatasetId
My service account now is an owner on the shared dataset and has edit permissions on MyProject, where else do permissions need to be set or is it possible to use the python API using my GA login credentials rather than the service account?
* UPDATE *
Finally got it to work. How? Make sure the service account has permissions to view the dataset (and if you don't have access to check this yourself and someone tells you that it does, ask them to double check/send you a screenshot!)