background

MySQL Change Database Encoding

License: GPL v3Ruby

The Challenge of Database Encoding

In the world of data, character encoding is a foundational element, dictating how text is stored and retrieved. Historically, many MySQL databases adopted utf8 as their character set. However, as digital communication evolved, the need to support a broader range of characters—especially emojis and complex international scripts—highlighted a limitation: utf8 in MySQL only supports up to 3-byte characters. This led to the widespread adoption of utf8mb4, which provides full 4-byte Unicode support.

Migrating a live production database from utf8 to utf8mb4 (or any other encoding) presents a significant challenge. Direct ALTER TABLE operations can lock tables for extended periods, leading to unacceptable downtime for critical applications. The goal, therefore, is to achieve this migration safely, efficiently, and with minimal disruption.

Introducing mysql_change_database_encoding.rb

This Ruby tool is designed to address these challenges, providing a robust solution for changing MySQL database encoding and collation. It prioritizes minimal disruption to production systems by integrating with online schema change tools, ensuring your database remains operational throughout the migration process.

Core Capabilities

Installation

Prerequisites

Setup

git clone https://github.com/durableprogramming/mysql_change_database_encoding.git
cd mysql_change_database_encoding
bundle install

Usage: Migrating Your MySQL Database

This section details how to use mysql_change_database_encoding.rb to perform database migrations. We will cover setting up necessary environment variables, understanding command-line options, and walking through practical examples. Remember to always test your migration on a staging environment before applying it to production.

Environment Variables

The tool relies on standard MySQL environment variables for database connection details. These variables allow you to configure access without hardcoding credentials directly into your scripts, enhancing security and flexibility. If not explicitly set, default values will be used where applicable.

Command Options

These command-line options control the behavior of the migration tool:

Basic Migration

To change your database encoding and collation using direct ALTER TABLE statements (without pt-online-schema-change), you can use the following command. This approach is suitable for development environments or databases where downtime is acceptable.

MYSQL_PASSWORD=your_password MYSQL_DATABASE=your_database \
ruby mysql_change_database_encoding.rb \
--collation utf8mb4_unicode_ci --encoding utf8mb4

Online Schema Change

For production systems where zero-downtime is a requirement, enable the online schema change functionality by including the --osc flag. This will leverage pt-online-schema-change to perform the migration with minimal disruption.

MYSQL_PASSWORD=secure_password MYSQL_DATABASE=legacy_database \
ruby mysql_change_database_encoding.rb \
--collation utf8mb4_unicode_ci --encoding utf8mb4 --osc

Practical Migration Scenarios

Here are some common scenarios for using the tool:

Migrate to UTF8MB4

The most common migration path involves moving from utf8 to utf8mb4 for full Unicode support, including emojis:

MYSQL_PASSWORD=pwd MYSQL_DATABASE=myapp \
ruby mysql_change_database_encoding.rb \
--encoding utf8mb4 --collation utf8mb4_unicode_ci --osc

Update Collation Only

If your encoding is already correct, you can change only the collation without modifying the character set:

MYSQL_PASSWORD=pwd MYSQL_DATABASE=myapp \
ruby mysql_change_database_encoding.rb \
--collation utf8mb4_unicode_ci

Legacy Database Migration

Migrate older latin1 databases to a modern encoding like utf8mb4:

MYSQL_PASSWORD=pwd MYSQL_DATABASE=legacy_app \
ruby mysql_change_database_encoding.rb \
--encoding utf8mb4 --collation utf8mb4_unicode_ci --osc

Deep Dive: Online Schema Change with pt-online-schema-change

Migrating a database schema, especially for large tables in a live production environment, can be a daunting task. Traditional ALTER TABLE operations often require exclusive locks, leading to significant downtime. This is where online schema change tools become indispensable. pt-online-schema-change, part of the Percona Toolkit, is a powerful utility designed to perform schema modifications without blocking reads or writes on the original table.

When you use the --osc flag with mysql_change_database_encoding.rb, the tool intelligently integrates with pt-online-schema-change to orchestrate a zero-downtime migration. This approach is crucial for applications that demand continuous availability and cannot afford maintenance windows.

Here’s how pt-online-schema-change typically operates, minimizing disruption:

  1. Create a New Table: A new, empty table is created with the desired schema (including the new encoding and collation).
  2. Copy Data Incrementally: Data is copied from the original table to the new table in small chunks. During this process, triggers are placed on the original table to capture and apply any changes (INSERTs, UPDATEs, DELETEs) that occur while the copy is in progress.
  3. Atomic Table Swap: Once all data is copied and synchronized, the original table is atomically swapped with the new table. This is a very fast operation, typically involving a RENAME TABLE command, which minimizes the actual downtime to milliseconds.
  4. Minimize Locking: Throughout the process, pt-online-schema-change strives to acquire only very brief metadata locks, allowing your application to continue operating on the original table almost uninterrupted.

This sophisticated process ensures that your database remains responsive, even during complex encoding migrations, making it a cornerstone for production-ready database management.

Recommendations for a Smooth Migration

Migrating database encoding is a critical operation that requires careful planning and execution. Adhering to these recommendations will help ensure a smooth, safe, and successful migration, minimizing risks to your production environment.

Before Migration

  1. Backup your database: This is the most crucial step. Always create a full, verified backup of your database before initiating any encoding changes. This provides a safety net in case of unforeseen issues.
  2. Test in staging: Never run a migration directly on production without prior testing. Execute the entire migration process on a staging or development database that mirrors your production environment. This allows you to identify and resolve potential problems without impacting live data.
  3. Check disk space: Online schema changes, particularly when using pt-online-schema-change, require additional disk space as a new table is created and data is copied. Ensure you have sufficient free space to accommodate this temporary overhead.
  4. Monitor replication: If you are operating a replicated MySQL environment, be aware of potential replication lag during the migration. Monitor your replication status closely to ensure data consistency across your replicas.

During Migration

  1. Use --dry-run first: Before executing any actual changes, always run the tool with the --dry-run flag. This will output the SQL commands that would be executed, allowing you to review them and confirm they align with your expectations without modifying your database.
  2. Monitor performance: Keep a close eye on your database server’s performance metrics (CPU, I/O, connections) during the migration. While pt-online-schema-change is designed for minimal impact, large migrations can still introduce temporary load.
  3. Start with small tables: If possible, consider testing the migration on smaller, less critical tables first. This can provide valuable insights into the process and potential issues before tackling your largest or most sensitive tables.

After Migration

  1. Verify data integrity: After the migration completes, thoroughly check your data to ensure that all characters migrated correctly and that no data corruption occurred. This might involve spot-checking critical tables or running data validation scripts.
  2. Update application: Ensure your application’s database connection settings are updated to use the new character set and collation. If your application is still connecting with the old encoding, you may encounter new character corruption issues.
  3. Update backups: Create a new full backup of your database with the updated encoding. This ensures that future restores will reflect the correct character set.

System Requirements

To effectively utilize mysql_change_database_encoding.rb, ensure your environment meets the following criteria:

Contributing

Contributions to mysql_change_database_encoding.rb are welcome! We encourage you to help improve this tool. Please review our contributing guidelines before submitting pull requests to ensure a smooth collaboration process.

  1. Fork the repository: Start by forking the project to your GitHub account.
  2. Create a feature branch: Create a new branch for your feature or bug fix (e.g., git checkout -b feature/my-new-feature).
  3. Make your changes: Implement your modifications, ensuring they align with the project’s coding style.
  4. Test thoroughly: Before submitting, test your changes rigorously on a dedicated test database to prevent regressions.
  5. Commit your changes: Write clear, concise commit messages that explain the purpose of your changes.
  6. Push to your fork: Push your feature branch to your forked repository.
  7. Open a pull request: Submit a pull request to the main repository, describing your changes and their benefits.

Commercial Support

For organizations requiring specialized assistance or advanced solutions, commercial support for mysql_change_database_encoding.rb is available from Durable Programming, LLC. We offer expert services to ensure your database migrations are seamless and efficient:

Contact us at [email protected] for inquiries.

License

This tool is released under the GNU General Public License v3.0. See the LICENSE file in the repository for full details.

Copyright (c) 2018-2025 Durable Programming, LLC. All rights reserved.

Community Support

For general inquiries, bug reports, or feature requests, please utilize the GitHub Issues page. Your feedback helps us improve the tool for everyone.

Open an issue on GitHub

Acknowledgments

We extend our gratitude to the following:

GitHub Repository

View the project on GitHub

10 stars | 📅 Created March 2018 | 🔄 Updated 2025