0 votes

Our company is creating multi-tenant products and services under our own Google Cloud Platform Account/Organization. Close to 90% of the data will be managed and stored within this one account. But each of our customers has their own GCP Account/Organization, and roughly 10% of the total data will come from their side (via Cloud Storage, databases, etc.). Customers will also have their own unrelated data and services, hence the need to use separate accounts.

The data volume could be as low as 1GB per day or as high as 100GB per day, depending on the size of the customer. The data will generally be numerous large files of 100-500MB each (CSV/row-based data).

What are strategies to safely and efficiently share data between two or more GCP accounts? Is there something native within GCP that allows for this and helps manage users/permissions, or do we need to build our own APIs/services as if we were communicating with someone external to GCP?

How do you want to share the data? Is it files on Cloud Storage? A database? A BigQuery dataset? How is the data structured? Can every GCP account access all the data, or only a subset? – guillaume blaquiere
@guillaumeblaquiere - The data will most likely come from files on Cloud Storage. It will be a dump of tens or hundreds of CSV files (roughly 50-300MB each). I would like to prioritize fetching that data first. We have the authority to tell our customers where to put the data, which is a perk: we can tell them to put the files at X and then pull from X. Given these will be simple data files, my instinct was to go with Storage. – P_impl55
You can use Cloud Storage for data sharing and collaboration. You can also look at this document to shape a proper strategy for your company. – Md Daud Walizarif

2 Answers

0 votes

GCP has a Shared VPC concept (https://cloud.google.com/vpc/docs/shared-vpc) that lets you create a shared network between projects, so you can reach resources over internal IPs across projects. It isn't useful for sharing data between accounts, though; it is meant for sharing within a single organization that has multiple projects for different departments.

AFAIK, for sharing data between accounts you have to use VPC Network Peering (https://cloud.google.com/vpc/docs/vpc-peering) or go over the internet. With peering your data never leaves Google's network; it is what third parties like MongoDB use when they sell their own cloud platform that actually runs on GCP (and other cloud vendors).
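If you do go the peering route, here is a minimal sketch of creating our side of the peering with the google-cloud-compute client library. All project and network names below are placeholders, and the customer has to create the matching peering from their own project before traffic can flow.

```python
# Sketch: create one side of a VPC peering with the google-cloud-compute
# client library. Project/network names are placeholders; the customer
# must create the corresponding peering from their project as well.
from google.cloud import compute_v1

def create_peering(my_project: str, my_network: str,
                   peer_project: str, peer_network: str) -> None:
    client = compute_v1.NetworksClient()

    peering = compute_v1.NetworkPeering(
        name="peer-to-customer",
        network=f"projects/{peer_project}/global/networks/{peer_network}",
        exchange_subnet_routes=True,  # exchange subnet routes between the VPCs
    )
    request = compute_v1.NetworksAddPeeringRequest(network_peering=peering)

    operation = client.add_peering(
        project=my_project,
        network=my_network,
        networks_add_peering_request_resource=request,
    )
    operation.result()  # wait for the operation to complete

create_peering("our-project", "our-vpc", "customer-project", "customer-vpc")
```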

If your actual data is just files, though, I don't think there is much risk in going over the internet and using Cloud Storage. There are many strategies for securing this type of data transfer.
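For example, here is a rough sketch of that approach, assuming the customer has granted your service account read access (e.g. roles/storage.objectViewer) on their bucket; all bucket names are placeholders.

```python
# Sketch: pull CSV dumps from a customer-owned bucket into one of our own.
# Assumes the customer granted our service account read access
# (e.g. roles/storage.objectViewer) on their bucket.
from google.cloud import storage

def pull_customer_files(customer_bucket_name: str, our_bucket_name: str,
                        prefix: str = "exports/") -> None:
    client = storage.Client()
    customer_bucket = client.bucket(customer_bucket_name)
    our_bucket = client.bucket(our_bucket_name)

    # List the customer's dump files and copy them server-side,
    # so the data never transits through our machine.
    for blob in client.list_blobs(customer_bucket_name, prefix=prefix):
        customer_bucket.copy_blob(blob, our_bucket, new_name=blob.name)
        print(f"Copied gs://{customer_bucket_name}/{blob.name}")

pull_customer_files("customer-x-exports", "our-ingest-bucket")
```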

0 votes

Google Cloud resources and IAM roles are enough to segregate the data.

  • For Cloud Storage, create a bucket per customer. Grant the right accounts (users or service accounts) on the bucket so a customer can see one bucket, or several (in case of a merger, for example); see the sketch after this list.
  • For BigQuery, create a dataset per customer and apply the same IAM policy as above.
  • For Cloud SQL it's trickier, because it isn't bound to IAM roles. Create a database per customer and use database user rights to grant access.
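As an illustration of the first two bullets, here is a sketch of the per-customer grants done with the client libraries; the bucket, dataset, and member names are placeholders.

```python
# Sketch: grant one customer's account access to its own bucket and its own
# BigQuery dataset. Bucket, dataset, and member names are placeholders.
from google.cloud import bigquery, storage

def grant_bucket_access(bucket_name: str, member: str) -> None:
    """Give `member` (e.g. "user:alice@customer-x.com" or
    "serviceAccount:etl@customer-x-project.iam.gserviceaccount.com")
    read access to the customer's bucket."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append({
        "role": "roles/storage.objectViewer",
        "members": {member},
    })
    bucket.set_iam_policy(policy)

def grant_dataset_access(dataset_id: str, user_email: str) -> None:
    """Give a customer user read access to its dataset ("project.dataset")."""
    client = bigquery.Client()
    dataset = client.get_dataset(dataset_id)
    entries = list(dataset.access_entries)
    entries.append(bigquery.AccessEntry(
        role="READER", entity_type="userByEmail", entity_id=user_email))
    dataset.access_entries = entries
    client.update_dataset(dataset, ["access_entries"])

grant_bucket_access("customer-x-data", "user:alice@customer-x.com")
grant_dataset_access("our-project.customer_x", "alice@customer-x.com")
```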

Remember that IAM performs authentication and authorization only on GCP resources. You can't have custom authorization with IAM; if that's a requirement, you have to implement those checks yourself.

In my company, we use Firestore to store the authorizations and the user profiles. Authentication is ensured by GCP (IAP, for example), and we use the user's email as the key for the authorizations.
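A minimal sketch of that pattern, assuming a Flask service running behind IAP and a hypothetical "authorizations" Firestore collection keyed by email:

```python
# Sketch: custom authorization on top of IAP authentication.
# IAP verifies the user and forwards their identity in a request header;
# we then look up the user's permissions in a Firestore document keyed by
# email. The "authorizations" collection and its fields are assumptions.
from flask import Flask, abort, jsonify, request
from google.cloud import firestore

app = Flask(__name__)
db = firestore.Client()

def authenticated_email() -> str:
    # IAP sets this header to "accounts.google.com:user@example.com".
    # In production you should also verify the signed
    # X-Goog-IAP-JWT-Assertion header rather than trusting this value alone.
    header = request.headers.get("X-Goog-Authenticated-User-Email", "")
    if ":" not in header:
        abort(401)
    return header.split(":", 1)[1]

@app.route("/customers/<customer_id>/files")
def list_files(customer_id: str):
    email = authenticated_email()
    doc = db.collection("authorizations").document(email).get()
    if not doc.exists or customer_id not in doc.to_dict().get("customers", []):
        abort(403)  # authenticated by IAP, but not authorized for this customer
    return jsonify({"customer": customer_id, "user": email})
```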