How to execute a wildcard/RegEx search in Data Catalog (Google Cloud Platform)?

  • It would make sense to search metadata across column names and tag attributes (and their values).

The current documentation only describes very strict search behavior, e.g. tag:data_gov_template.hasPII(=true)

  • What I need is a result for "PII"; I don't care about specifying the exact template name, etc.

e.g. labels:etl

  • If I search only for etl, there is no result.

(Are metadata attributes and their values not directly searchable?)

1
According to the documentation you shared, you can use name:x, which matches all the entities that match the predicate x. So this behaviour is similar to wildcards. Does it address your question? Here is an overview of how Data Catalog works. - Alexandre Moraes
I updated my question with examples. You are right that predicate "x" is very broad (and not a controlled and precise search) - InLaw
e.g. ´column:difference.old_mode´ does not work even though it is the exact name of the column - InLaw
@AlexandreMoraes the docs sometimes say very little and are sometimes incorrect. It is interesting to hear what Google internally thinks of the current state of Data Catalog (e.g. at 22:50): youtube.com/watch?v=gCXgZ5ZkJeI - InLaw
After reading your update: for labels:etl to work, your data assets must be labeled, as explained here for BigQuery. Have you labeled the data assets you want to retrieve? labels:etl returns the data assets that have a label whose key contains etl as a substring. - Alexandre Moraes

1 Answer

From your use case, I understood that you want to search for a particular metadata attribute, such as a tag field like PII, right?

For tagged assets

If you don't care about the template name, you can use the tag:x search facet.

So if all your templates (data_gov_template, data_curator_template, data_etl_template) contain the same tag field name, has_pii, you can search using:

tag:has_pii

This returns all assets with that metadata attribute, no matter what the template name is.
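If you want to run that search from code, here is a minimal sketch. It assumes the google-cloud-datacatalog Python client is installed and credentials are configured. The helper names tag_query and search_assets and the project ID in the usage line are made up for illustration, so check the client call against the current library reference before relying on it.

```python
from typing import List


def tag_query(field_name: str) -> str:
    """Build a template-agnostic tag predicate, e.g. 'tag:has_pii'."""
    return f"tag:{field_name}"


def search_assets(project_id: str, query: str) -> List[str]:
    """Run a Data Catalog search and return the matching resource names.

    Assumption: uses the google-cloud-datacatalog client. The import is
    done lazily so tag_query() stays usable without the library installed.
    """
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()

    # Restrict the search to a single project (hypothetical scope choice).
    scope = datacatalog_v1.SearchCatalogRequest.Scope()
    scope.include_project_ids.append(project_id)

    results = client.search_catalog(scope=scope, query=query)
    return [result.relative_resource_name for result in results]
```

Usage would look like search_assets("my-project", tag_query("has_pii")), where "my-project" is a placeholder.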

For columns

You can use the column:x search facet to match a substring of the column name in the schema of the data asset. Nested columns are not supported yet.

For labels

You can use the labels:bar search facet to find data assets that have a label (with some value) whose key has bar as a substring.

You can also search on label values. So yes, metadata attributes and their values are searchable.
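For illustration, the label predicates can be assembled as plain strings. Note the assumption here: the labels.<key>:<value> form for matching on a value is how I remember the search syntax reference, so please verify it there. The function names and the "team" / "analytics" label are hypothetical.

```python
def label_key_query(key_part: str) -> str:
    """Predicate for assets whose label key contains key_part."""
    return f"labels:{key_part}"


def label_value_query(key: str, value: str, exact: bool = False) -> str:
    """Predicate for assets with label `key` matching `value`.

    Assumption: the 'labels.<key>:<value>' / 'labels.<key>=<value>' forms
    follow the Data Catalog search syntax reference; verify before use.
    """
    operator = "=" if exact else ":"
    return f"labels.{key}{operator}{value}"


print(label_key_query("etl"))
print(label_value_query("team", "analytics"))
print(label_value_query("team", "analytics", exact=True))
```

These strings are only the query text; they would be passed as the query argument of whatever search call you use.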

However, this is not regex matching: a search facet with a colon (:), like labels:bar, is a substring match, while a search facet with an equals sign (=), like type=table, is an exact match.
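To make the difference between the two operators concrete, here is a small local toy model of the rule described above. It does not call Data Catalog and is not how the service is implemented; the matches function and the sample asset are invented purely to show substring versus exact matching, and that a regex pattern is treated as literal text.

```python
import re
from typing import Dict, List


def matches(predicate: str, attributes: Dict[str, List[str]]) -> bool:
    """Toy model: 'facet:term' is a substring match, 'facet=term' is exact."""
    parsed = re.fullmatch(r"(\w+)([:=])(.+)", predicate)
    if parsed is None:
        raise ValueError(f"not a facet predicate: {predicate!r}")
    facet, operator, term = parsed.groups()
    values = attributes.get(facet, [])
    if operator == "=":
        # Exact match: the term must equal one of the values.
        return term in values
    # Substring match: the term must occur inside one of the values.
    return any(term in value for value in values)


# Hypothetical asset with one label key, one type and one column name.
asset = {"labels": ["etl_nightly"], "type": ["table"], "column": ["old_mode"]}

print(matches("labels:etl", asset))    # True: 'etl' is a substring of the key
print(matches("labels=etl", asset))    # False: the key is not exactly 'etl'
print(matches("type=table", asset))    # True: exact match
print(matches("labels:et.*", asset))   # False: the pattern is taken literally
```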