Project Structure
Category: Isolation
Platform: BigQuery
For consistency, we want a uniform structure and naming of our BigQuery projects.
The structure must fit BigQuery’s strict three-level hierarchy of project, dataset, and table.
BigQuery imposes some naming restrictions. Project IDs must be 6 to 30 characters long, may contain only lowercase letters, numbers, and hyphens, and must be globally unique: an ID cannot be in use or have been used before. Dataset and table names can be up to 1024 characters long and may contain letters, numbers, and underscores.
We agree on a set of conventions for our BigQuery projects, datasets, and tables:
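Project ID format:
<orgname>[-<env>]-data-<domain>
Elements:
orgname: name of the organization (acme in the examples)
env: optional environment segment (test in acme-test-data-checkout)
domain: name of the domain (for example search or checkout)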
Examples:
acme-data-search
acme-data-articles
acme-data-checkout
acme-data-fulfillment
acme-test-data-checkout
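The project ID format and the BigQuery restrictions above can be combined into a small helper. The following is a minimal sketch, not part of the policy itself: the function name build_project_id is hypothetical, and the rule that each segment consists only of lowercase letters and digits is an assumption made for illustration.

```python
import re
from typing import Optional

# Assumption: each segment is lowercase letters and digits, starting with a letter.
_SEGMENT = re.compile(r"[a-z][a-z0-9]*")
# Project ID: 6 to 30 characters, lowercase letters, digits, and hyphens,
# starting with a letter and not ending with a hyphen.
_PROJECT_ID = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def build_project_id(orgname: str, domain: str, env: Optional[str] = None) -> str:
    """Build '<orgname>[-<env>]-data-<domain>' and check it against the rules."""
    parts = [orgname] + ([env] if env else []) + ["data", domain]
    for part in parts:
        if not _SEGMENT.fullmatch(part):
            raise ValueError(f"invalid segment: {part!r}")
    project_id = "-".join(parts)
    if not _PROJECT_ID.fullmatch(project_id):
        raise ValueError(f"project ID violates length or character rules: {project_id!r}")
    return project_id
```

For instance, build_project_id("acme", "checkout", env="test") yields acme-test-data-checkout. Global uniqueness cannot be checked locally and would still have to be verified when the project is created.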
A BigQuery dataset equals one data product.
Examples:
searches
searches_daily_top100
inventory
shelf_warmers
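Dataset names can be checked in a similar way. This sketch assumes a lowercase snake_case convention, which is inferred from the examples above rather than stated in the policy; the function name is_valid_dataset_name is hypothetical.

```python
import re

# BigQuery restriction as described above: letters, numbers, underscores,
# at most 1024 characters.
_BQ_DATASET = re.compile(r"[A-Za-z0-9_]{1,1024}")
# Assumed team convention (inferred from the examples): lowercase snake_case
# that starts with a letter and uses single underscores between words.
_CONVENTION = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name satisfies both the platform rule and the assumed convention."""
    return bool(_BQ_DATASET.fullmatch(name)) and bool(_CONVENTION.fullmatch(name))
```

With these rules, searches_daily_top100 passes, while a name containing a hyphen or an uppercase letter is rejected.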
We use prefixes to structure the data models within a data product:
src_, e.g. src_googleanalytics__activity_search
stg_, e.g. stg_googleanalytics__activity_search
objects_, with the suffix _latest or _history, e.g. objects_users_latest
event_, e.g. event_searches
manual_, e.g. manual_country_codes
agg_, e.g. agg_searches__total_by_day
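Further naming conventions:
__ (double underscore) separates the parts of a model name, e.g. the source system from the entity in src_googleanalytics__activity_search.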
The BigQuery project structure can be set up through a self-service web app when a new data product is created.
A dbt hook can be implemented to ensure that all models use the defined prefixes.
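One possible way to enforce the prefixes is a check that runs after dbt has written its manifest (by default target/manifest.json) and fails the build when a model name lacks an approved prefix. The sketch below is an assumption about how such a check could look, not the mechanism prescribed by this policy. The function names are hypothetical, and the prefix list is taken from the examples above.

```python
import json
from pathlib import Path
from typing import List

# Prefixes taken from the examples in this policy.
ALLOWED_PREFIXES = ("src_", "stg_", "objects_", "event_", "manual_", "agg_")


def find_unprefixed_models(manifest: dict) -> List[str]:
    """Return the sorted names of dbt models that lack an allowed prefix."""
    return sorted(
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
        and not node["name"].startswith(ALLOWED_PREFIXES)
    )


def check_manifest_file(manifest_path: str = "target/manifest.json") -> int:
    """Load a manifest from disk, report offenders, and return an exit code."""
    manifest = json.loads(Path(manifest_path).read_text())
    offenders = find_unprefixed_models(manifest)
    for name in offenders:
        print(f"model without an approved prefix: {name}")
    return 1 if offenders else 0
```

In a CI pipeline, check_manifest_file() could be called after dbt parse or dbt compile, with its return value used as the process exit code.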