The Challenge of Database Encoding
In the world of data, character encoding is a foundational element, dictating how text is stored and retrieved. Historically, many MySQL databases adopted utf8 as their character set. However, as digital communication evolved, the need to support a broader range of characters—especially emojis and complex international scripts—highlighted a limitation: utf8 in MySQL only supports up to 3-byte characters. This led to the widespread adoption of utf8mb4, which provides full 4-byte Unicode support.
Migrating a live production database from utf8 to utf8mb4 (or any other encoding) presents a significant challenge. Direct ALTER TABLE operations can lock tables for extended periods, leading to unacceptable downtime for critical applications. The goal, therefore, is to achieve this migration safely, efficiently, and with minimal disruption.
Introducing mysql_change_database_encoding.rb
This Ruby tool is designed to address these challenges, providing a robust solution for changing MySQL database encoding and collation. It prioritizes minimal disruption to production systems by integrating with online schema change tools, ensuring your database remains operational throughout the migration process.
Core Capabilities
- Safe Encoding Migration: Change database character sets without data loss, preserving the integrity of your existing data.
- Collation Updates: Modify collation settings across entire databases or specific tables to ensure consistent sorting and comparison behavior.
- Online Schema Change Integration: Leverage
pt-online-schema-changefor zero-downtime migrations, crucial for high-availability environments. - Configurable Fallback: Automatically falls back to direct
ALTER TABLEoperations when online schema change tools are unavailable or not desired, offering flexibility. - Table-by-Table Processing: Provides granular control over the migration process, allowing for systematic updates.
- Production-Ready Design: Engineered for live database migrations, focusing on stability and minimizing disruption.
Installation
Prerequisites
- Ruby 3.0 or higher
- MySQL database
- (Optional) pt-online-schema-change for online migrations
Setup
git clone https://github.com/durableprogramming/mysql_change_database_encoding.git
cd mysql_change_database_encoding
bundle install
Usage: Migrating Your MySQL Database
This section details how to use mysql_change_database_encoding.rb to perform database migrations. We will cover setting up necessary environment variables, understanding command-line options, and walking through practical examples. Remember to always test your migration on a staging environment before applying it to production.
Environment Variables
The tool relies on standard MySQL environment variables for database connection details. These variables allow you to configure access without hardcoding credentials directly into your scripts, enhancing security and flexibility. If not explicitly set, default values will be used where applicable.
MYSQL_PASSWORD: The database password. This is a required variable for authentication.MYSQL_DATABASE: The name of the target database you intend to migrate. This is also a required variable.MYSQL_HOST: The database host. Defaults tolocalhostif not specified.MYSQL_USER: The database user. Defaults torootif not specified.MYSQL_PORT: The database port. Defaults to3306if not specified.
Command Options
These command-line options control the behavior of the migration tool:
--encoding: Specifies the target character set for your database and tables. Common values includeutf8mb4(for full Unicode support),utf8, orlatin1. This option is crucial for ensuring proper character representation.--collation: Defines the target collation, which dictates how character strings are sorted and compared. Forutf8mb4,utf8mb4_unicode_ciis often recommended for linguistic correctness, whileutf8mb4_general_cioffers a performance trade-off. You should choose a collation that matches your application’s needs.--osc: When this flag is present, the tool will attempt to usept-online-schema-changefor the migration. This enables a zero-downtime migration strategy, which is vital for production systems. Ifpt-online-schema-changeis not found or fails, the tool will fall back to directALTER TABLEoperations.--dry-run: This powerful option allows you to preview all the SQL commands that would be executed without actually making any changes to your database. It is highly recommended to use--dry-runfirst to understand the scope of the migration and identify any potential issues.
Basic Migration
To change your database encoding and collation using direct ALTER TABLE statements (without pt-online-schema-change), you can use the following command. This approach is suitable for development environments or databases where downtime is acceptable.
MYSQL_PASSWORD=your_password MYSQL_DATABASE=your_database \
ruby mysql_change_database_encoding.rb \
--collation utf8mb4_unicode_ci --encoding utf8mb4
Online Schema Change
For production systems where zero-downtime is a requirement, enable the online schema change functionality by including the --osc flag. This will leverage pt-online-schema-change to perform the migration with minimal disruption.
MYSQL_PASSWORD=secure_password MYSQL_DATABASE=legacy_database \
ruby mysql_change_database_encoding.rb \
--collation utf8mb4_unicode_ci --encoding utf8mb4 --osc
Practical Migration Scenarios
Here are some common scenarios for using the tool:
Migrate to UTF8MB4
The most common migration path involves moving from utf8 to utf8mb4 for full Unicode support, including emojis:
MYSQL_PASSWORD=pwd MYSQL_DATABASE=myapp \
ruby mysql_change_database_encoding.rb \
--encoding utf8mb4 --collation utf8mb4_unicode_ci --osc
Update Collation Only
If your encoding is already correct, you can change only the collation without modifying the character set:
MYSQL_PASSWORD=pwd MYSQL_DATABASE=myapp \
ruby mysql_change_database_encoding.rb \
--collation utf8mb4_unicode_ci
Legacy Database Migration
Migrate older latin1 databases to a modern encoding like utf8mb4:
MYSQL_PASSWORD=pwd MYSQL_DATABASE=legacy_app \
ruby mysql_change_database_encoding.rb \
--encoding utf8mb4 --collation utf8mb4_unicode_ci --osc
Deep Dive: Online Schema Change with pt-online-schema-change
Migrating a database schema, especially for large tables in a live production environment, can be a daunting task. Traditional ALTER TABLE operations often require exclusive locks, leading to significant downtime. This is where online schema change tools become indispensable. pt-online-schema-change, part of the Percona Toolkit, is a powerful utility designed to perform schema modifications without blocking reads or writes on the original table.
When you use the --osc flag with mysql_change_database_encoding.rb, the tool intelligently integrates with pt-online-schema-change to orchestrate a zero-downtime migration. This approach is crucial for applications that demand continuous availability and cannot afford maintenance windows.
Here’s how pt-online-schema-change typically operates, minimizing disruption:
- Create a New Table: A new, empty table is created with the desired schema (including the new encoding and collation).
- Copy Data Incrementally: Data is copied from the original table to the new table in small chunks. During this process, triggers are placed on the original table to capture and apply any changes (INSERTs, UPDATEs, DELETEs) that occur while the copy is in progress.
- Atomic Table Swap: Once all data is copied and synchronized, the original table is atomically swapped with the new table. This is a very fast operation, typically involving a
RENAME TABLEcommand, which minimizes the actual downtime to milliseconds. - Minimize Locking: Throughout the process,
pt-online-schema-changestrives to acquire only very brief metadata locks, allowing your application to continue operating on the original table almost uninterrupted.
This sophisticated process ensures that your database remains responsive, even during complex encoding migrations, making it a cornerstone for production-ready database management.
Recommendations for a Smooth Migration
Migrating database encoding is a critical operation that requires careful planning and execution. Adhering to these recommendations will help ensure a smooth, safe, and successful migration, minimizing risks to your production environment.
Before Migration
- Backup your database: This is the most crucial step. Always create a full, verified backup of your database before initiating any encoding changes. This provides a safety net in case of unforeseen issues.
- Test in staging: Never run a migration directly on production without prior testing. Execute the entire migration process on a staging or development database that mirrors your production environment. This allows you to identify and resolve potential problems without impacting live data.
- Check disk space: Online schema changes, particularly when using
pt-online-schema-change, require additional disk space as a new table is created and data is copied. Ensure you have sufficient free space to accommodate this temporary overhead. - Monitor replication: If you are operating a replicated MySQL environment, be aware of potential replication lag during the migration. Monitor your replication status closely to ensure data consistency across your replicas.
During Migration
- Use
--dry-runfirst: Before executing any actual changes, always run the tool with the--dry-runflag. This will output the SQL commands that would be executed, allowing you to review them and confirm they align with your expectations without modifying your database. - Monitor performance: Keep a close eye on your database server’s performance metrics (CPU, I/O, connections) during the migration. While
pt-online-schema-changeis designed for minimal impact, large migrations can still introduce temporary load. - Start with small tables: If possible, consider testing the migration on smaller, less critical tables first. This can provide valuable insights into the process and potential issues before tackling your largest or most sensitive tables.
After Migration
- Verify data integrity: After the migration completes, thoroughly check your data to ensure that all characters migrated correctly and that no data corruption occurred. This might involve spot-checking critical tables or running data validation scripts.
- Update application: Ensure your application’s database connection settings are updated to use the new character set and collation. If your application is still connecting with the old encoding, you may encounter new character corruption issues.
- Update backups: Create a new full backup of your database with the updated encoding. This ensures that future restores will reflect the correct character set.
System Requirements
To effectively utilize mysql_change_database_encoding.rb, ensure your environment meets the following criteria:
- Ruby 3.0+: The tool is developed and tested with Ruby version 3.0 and newer. Ensure you have a compatible Ruby installation.
- MySQL 5.7+ or MariaDB 10.2+: The target database must be MySQL version 5.7 or later, or MariaDB version 10.2 or later. These versions provide the necessary features and stability for the migration process.
pt-online-schema-change(optional): For zero-downtime online migrations, thept-online-schema-changeutility from Percona Toolkit is required. While optional, its use is highly recommended for production environments. If not installed, the tool will fall back to directALTER TABLEoperations, which may incur downtime.
Contributing
Contributions to mysql_change_database_encoding.rb are welcome! We encourage you to help improve this tool. Please review our contributing guidelines before submitting pull requests to ensure a smooth collaboration process.
- Fork the repository: Start by forking the project to your GitHub account.
- Create a feature branch: Create a new branch for your feature or bug fix (e.g.,
git checkout -b feature/my-new-feature). - Make your changes: Implement your modifications, ensuring they align with the project’s coding style.
- Test thoroughly: Before submitting, test your changes rigorously on a dedicated test database to prevent regressions.
- Commit your changes: Write clear, concise commit messages that explain the purpose of your changes.
- Push to your fork: Push your feature branch to your forked repository.
- Open a pull request: Submit a pull request to the main repository, describing your changes and their benefits.
Commercial Support
For organizations requiring specialized assistance or advanced solutions, commercial support for mysql_change_database_encoding.rb is available from Durable Programming, LLC. We offer expert services to ensure your database migrations are seamless and efficient:
- Migration Planning and Consulting: Strategic guidance for complex migration scenarios.
- Custom Migration Scripts: Development of tailored scripts for unique database structures or requirements.
- Production Migration Assistance: Hands-on support for executing migrations in live production environments.
- Post-Migration Optimization: Services to fine-tune your database and application after migration for optimal performance.
Contact us at [email protected] for inquiries.
License
This tool is released under the GNU General Public License v3.0. See the LICENSE file in the repository for full details.
Copyright (c) 2018-2025 Durable Programming, LLC. All rights reserved.
Community Support
For general inquiries, bug reports, or feature requests, please utilize the GitHub Issues page. Your feedback helps us improve the tool for everyone.
Acknowledgments
We extend our gratitude to the following:
- Percona Toolkit: For providing
pt-online-schema-change, a critical component for zero-downtime migrations. - The Ruby and MySQL Communities: For their continuous innovation, support, and the foundational technologies that make this tool possible.
GitHub Repository
⭐ 10 stars | 📅 Created March 2018 | 🔄 Updated 2025

