Sanitize your database when syncing between environments
If you wish to sanitize your database when syncing, it is enough to create a platform.sanitize.yml
file
and place it inside the application directory of the source environment whose database must be sanitized.
Tip: How to manage the platform.sanitize.yml
file
Use the scaffolding capabilities of platform to only place this file in, e.g. your production environment's application directory.
If this file is present for the sync's source environment, sanitization will be applied to the database during the synchronization process.
Sanitize File¶
- Filename:
platform.sanitize.yml
- Location: after scaffolding, should be located directly in the application directory.
- Structure (example):
mysqldump_path: <path-to-overridden-mysqldump> (only necessary on externally hosted environments)
structure_only:
- ^cache.*
- ^sessions$
sql:
table name:
column name:
expression: "NULL"
column name 2:
replace: userName
table name 2:
column name:
expression: CONCAT(column name 2, "-addendum")
column name 2:
replace: phoneNumber
mysqldump_path¶
mysqldump_path
is an optional field. By default, all our servers have the correct gdpr-mysqldump but for
externally hosted environments it will be necessary to provide this yourself. This can be added to your
projects via composer.
composer require druidfi/gdpr-mysqldump
And then add the path to the downloaded 'binary' in the platform.sanitize.yml
file. The path must be relative to
the application directory. If downloaded using composer, the path should be vendor/bin/mysqldump
.
structure_only¶
structure_only
is an optional field that determines which tables to only export the structure of, skipping
any data in those tables. Each entry is a regex, and any table that matches any of the regexes will only have its
structure synced, resulting in an empty table.
By default, for drupal projects, platform defaults to the following list:
- ^cache.*
- ^sessions$
If you define the structure_only key, that will overwrite the default, not add to it.
sql¶
The keys under this field are the tables you wish to sanitize. Inside each table you can then determine how to sanitize each column.
The columns that use the expression
key, will change the SELECT statement run during the dump. Think of it
as adding "SELECT column name as expression" to the MySQL dump command.
- You can add other column names, to overwrite the chosen column with data from the other column.
- You can set the expression to
"NULL"
to empty the column of data. - You can set the expression to a more complex statement, such as
CONCAT(uid, "@example.com")
to, for example, sanitize an email column.
The columns that use the replace
key, will be replaced after the SELECT statement has run. Each row will
be passed to Faker and the corresponding replacement generator called.
Unfortunately, there is currently no option to ensure the generated Faker data is unique, so it can not be relied on for columns that must have a unique value, such as ID, email, username, etc. Faker itself can do this, but the integration with the tool is not there yet.
A list of possible replace values can be found on the Faker documentation website. Note that it is impossible to pass arguments to the Faker formatters, so only use formatters that require none or where the default values are acceptable.
Important!
For expressions, it is necessary to correctly quote (or not quote) everything. The expressions are passed to a sql statement, so it follows those rules. If it is not quoted twice, it will be added to the statement without extra quotes and will be interpreted as a column name or special MySQL variable (e.g. NULL).
If you wish to add a string to the expression, quote it twice.
Examples:
sql:
users_field_data:
mail:
expression: CONCAT(`uid`, "@example.com") or "CONCAT(uid, \"@example.com\")" or CONCAT(uid, '@example.com')
name:
expression: uid or "uid" or "`uid`" or 'uid'
pass:
expression: "NULL" or 'NULL'
- Replace each email with the value of the UID column, concatenated with '@example.com'. e.g. '3456@example.com'
- Replace each username with the value of the UID column.
- NULL each password
sql:
users_field_data:
mail:
expression: CONCAT("uid", "@example.com") or "CONCAT(\"uid\", \"@example.com\")" or CONCAT('uid', '@example.com')
name:
expression: "\"uid\"" or "'uid'"
pass:
expression: "\"NULL\"" or "'NULL'"
- Replace each email with the string 'uid', concatenated with '@example.com': 'uid@example.com'
- Replace each username with the string 'uid'.
- Replace each password with the string 'NULL'