2
votes

Question: can wildcards be used in GCS bucket names with gsutil?

I want to grab multiple files that are spread across several GCS buckets by using wildcards. However, I consistently run into errors when I use wildcards in bucket names with gsutil. I'm using wildcards like this:

gsutil ls gs://myBucket-abcd-*/log/data_*

I want to match all these file names (variations in bucket name AND in object name):

gs://myBucket-abcd-1234/log/data_foo.csv
gs://myBucket-abcd-1234/log/data_bar.csv
gs://myBucket-abcd-5678/log/data_foo.csv
gs://myBucket-abcd-5678/log/data_bar.csv

The documentation on Bucket Wildcards tells me I should be able to use wildcards in both the bucket name and the object name, but the command above always fails with "BadRequestException: 400 Invalid argument."

gsutil is otherwise working when I use no wildcards or use wildcards in the object name only. But adding a wildcard to the bucket name results in the error. Are there workarounds to make the wildcard work in bucket names, or am I misinterpreting the linked documentation?

2
Wildcards on both buckets and objects will work. I have tested this with my project. You can run gsutil with the -DD flag to get more debugging information. This issue seems to be related to the ACLs set on your objects or buckets. Make sure you have permission to view these objects and buckets. - Faizan
Wildcards should work. If "gsutil -DD ls your-wildcard..." doesn't help you understand what is wrong, please email the output of the gsutil -DD ls command to [email protected] and I'll take a look. - Travis Hobrla

2 Answers

3
votes

It turns out that the failure of bucket wildcards in this case is working as intended and is caused by how the permissions are set. Google Cloud Storage permissions can be set at both the bucket level and the project level.

Although the access token used in this case can access every individual bucket, it doesn't have reader/editor/owner access to the top-level project (which is shared across many users of the system). To expand a wildcard in a bucket name, gsutil has to list the buckets in the project, so without access to the project, wildcards cannot be used on buckets.

This can be fixed by having a project owner add the user as a reader/editor/owner to the project.

In this case, for security reasons we can't give an individual token access to all buckets in the project, but it's helpful to understand why the wildcard didn't work. Thanks all for the input, and especially Travis for the contact.
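
When project-level access can't be granted, one possible workaround is to name the buckets explicitly and keep the wildcard in the object name only, which the question reports as working. The sketch below is only an illustration: the helper names bucket_urls and list_logs are made up for this example, and the bucket names are the ones from the question.

```shell
#!/bin/sh
# Print one URL per bucket name given as an argument.
# The wildcard appears only in the object name, never in the bucket name.
bucket_urls() {
  for bucket in "$@"; do
    printf 'gs://%s/log/data_*\n' "$bucket"
  done
}

# List matching objects bucket by bucket, so gsutil never has to
# enumerate the buckets in the project. (Hypothetical helper.)
list_logs() {
  bucket_urls "$@" | while IFS= read -r url; do
    gsutil ls "$url"
  done
}
```

Calling `list_logs myBucket-abcd-1234 myBucket-abcd-5678` would then run one `gsutil ls` per bucket. The list of buckets has to be maintained by hand, which is the price of not having project access.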

1
votes

Some shells (Zsh, for example) try to expand the * and ** themselves before gsutil ever sees them, so you need to put the URL inside quotation marks, like this:

gsutil ls 'gs://myBucket-abcd-*/log/data_*'

I found this in the question: gsutil returning "no matches found"
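
A quick way to check what the shell actually hands to a command is to print the arguments instead of running gsutil. The helper show_args below is made up for this illustration; it just echoes each argument on its own line.

```shell
#!/bin/sh
# Print each argument exactly as the shell passed it in.
show_args() {
  printf '%s\n' "$@"
}

# Quoted: the pattern reaches the command unchanged.
show_args 'gs://myBucket-abcd-*/log/data_*'
# prints gs://myBucket-abcd-*/log/data_*
```

Double quotes protect the pattern from shell expansion in the same way as single quotes.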