Skip to content
Dropsolid Help Center home page
Product documentation
DevOps Cloud

Sanitize your database when syncing between environments

This feature is still in beta!

If you want to try out the sanitization options, add the druidfi/gdpr-mysqldump package to your project and fill in the mysqldump_path.

If you wish to sanitize your database when syncing, it is enough to create a platform.sanitize.yml file and place it inside the application directory of the source environment whose database must be sanitized.

Tip: How to manage the platform.sanitize.yml file

Use the scaffolding capabilities of platform to only place this file in, e.g. your production environment's application directory.

If this file is present for the sync's source environment, sanitization will be applied to the database during the synchronization process.

Sanitize File

  • Filename: platform.sanitize.yml
  • Location: after scaffolding, should be located directly in the application directory.
  • Structure (example):
mysqldump_path: <path-to-overridden-mysqldump> (only necessary on externally hosted environments)
structure_only:
  - ^cache.*
  - ^sessions$
sql:
  table name:
    column name:
      expression: "NULL"
    column name 2:
      replace: userName
  table name 2:
    column name:
      expression: CONCAT(column name 2, "-addendum")
    column name 2:
      replace: phoneNumber

mysqldump_path

mysqldump_path is an optional field. By default, all our servers have the correct gdpr-mysqldump1 but for externally hosted environments it will be necessary to provide this yourself. This can be added to your projects via composer.

composer require druidfi/gdpr-mysqldump

And then add the path to the downloaded 'binary' in the platform.sanitize.yml file. The path must be relative to the application directory. If downloaded using composer, the path should be vendor/bin/mysqldump.

structure_only

structure_only is an optional field that determines which tables to only export the structure of, skipping any data in those tables. Each entry is a regex, and any table that matches any of the regexes will only have its structure synced, resulting in an empty table.

By default, for drupal projects, platform defaults to the following list:

  - ^cache.*
  - ^sessions$

If you define the structure_only key, that will overwrite the default, not add to it.

sql

The keys under this field are the tables you wish to sanitize. Inside each table you can then determine how to sanitize each column.

The columns that use the expression key, will change the SELECT statement run during the dump. Think of it as adding "SELECT column name as expression" to the MySQL dump command.

  • You can add other column names, to overwrite the chosen column with data from the other column.
  • You can set the expression to "NULL" to empty the column of data.
  • You can set the expression to a more complex statement, such as CONCAT(uid, "@example.com") to, for example, sanitize an email column.

The columns that use the replace key, will be replaced after the SELECT statement has run. Each row will be passed to Faker and the corresponding replacement generator called.

Unfortunately, there is currently no option to ensure the generated Faker data is unique, so it can not be relied on for columns that must have a unique value, such as ID, email, username, etc. Faker itself can do this, but the integration with the tool is not there yet.

A list of possible replace values can be found on the Faker documentation website. Note that it is impossible to pass arguments to the Faker formatters, so only use formatters that require none or where the default values are acceptable.

Important!

For expressions, it is necessary to correctly quote (or not quote) everything. The expressions are passed to a sql statement, so it follows those rules. If it is not quoted twice, it will be added to the statement without extra quotes and will be interpreted as a column name or special MySQL variable (e.g. NULL).

If you wish to add a string to the expression, quote it twice.

Examples:

 sql:
   users_field_data:
     mail:
       expression: CONCAT(`uid`, "@example.com") or "CONCAT(uid, \"@example.com\")" or CONCAT(uid, '@example.com')
     name:
       expression: uid or "uid" or "`uid`" or 'uid'
     pass:
       expression: "NULL" or 'NULL'
The above will:

  • Replace each email with the value of the UID column, concatenated with '@example.com'. e.g. '3456@example.com'
  • Replace each username with the value of the UID column.
  • NULL each password

 sql:
   users_field_data:
     mail:
       expression: CONCAT("uid", "@example.com") or "CONCAT(\"uid\", \"@example.com\")" or CONCAT('uid', '@example.com')
     name:
       expression: "\"uid\"" or "'uid'"
     pass:
       expression: "\"NULL\"" or "'NULL'"
However, the above will:

  • Replace each email with the string 'uid', concatenated with '@example.com': 'uid@example.com'
  • Replace each username with the string 'uid'.
  • Replace each password with the string 'NULL'

  1. They will be available by default once this feature has been officially released. For now, contact us so we can install the correct mysqldump on the required servers. 

Send us your question

Do you still have unanswered questions or do you want to make some adjustments to your project? Don't hesitate to ask us your question at support@dropsolid.com or at +3293950290.