
Converting Microsoft Access to MySQL with mdbtools

A practical guide to migrating Microsoft Access databases to MySQL using open-source tools, covering schema extraction, data validation, and production migration strategies.

Migrating data from aging systems to modern platforms presents a recurring challenge in information management. We see it in efforts to digitize ancient manuscripts or to transfer records from obsolete mainframe systems to contemporary databases. Each transition, though complex, shares a common goal: to preserve valuable information while adapting to new demands.

Today, many organizations find themselves grappling with a similar, albeit more recent, historical challenge: the continued reliance on Microsoft Access databases. These systems, often conceived decades ago, frequently house critical business data, operating long past the point where their original design limitations become apparent. When the inevitable issues arise—be it from file corruption, compatibility problems with newer operating environments, or simply outgrowing Access’s inherent constraints—the imperative to migrate to more robust database platforms becomes undeniable.

Access databases, though seemingly straightforward, present a unique set of migration challenges. Their proprietary file format resists direct conversion, and Access’s SQL dialect often diverges from modern database standards. Furthermore, business logic frequently lies embedded within forms, reports, and VBA code, often undocumented, especially when original developers have moved on. Preserving data relationships and validation rules during such a move requires meticulous planning and execution.

In this article, we explore a pragmatic approach to migrating from Access to MySQL using open-source tools. We will examine the technical considerations that shape our migration strategy, walk through systematic extraction and conversion processes, and discuss validation techniques to ensure data integrity. While our focus remains on MySQL as the target system, the principles we discuss apply broadly to other modern database platforms, offering a pathway from legacy Access systems to sustainable database infrastructure.

The migration process breaks down into several phases:

  1. Schema extraction and analysis
  2. Data export and transformation
  3. Import preparation and testing
  4. Production migration and validation

Each phase builds on the previous one, creating a methodical path from legacy Access systems to modern database infrastructure. We’ll explore these phases in detail, with specific attention to the tools and techniques that help ensure successful transitions.

Before diving into the technical details, let’s understand the fundamental differences between Access and modern database systems, and why organizations choose to migrate despite the challenges involved.

Understanding Access Database Characteristics

Access differs from modern database systems in ways that significantly impact migration planning. Strictly speaking, Access functions not purely as a data storage system, but rather as a complete application development platform. Each .mdb or .accdb file contains not only tables and data, but also queries, forms, reports, and VBA code modules—all bundled into a single file. Modern database systems, by contrast, typically separate these concerns, focusing primarily on data storage and retrieval and leaving application logic to external services.

This integrated approach profoundly influenced how organizations built applications with Access. Business logic that would typically reside in separate application layers in modern systems often finds itself embedded directly in Access forms and reports. Data validation rules, too, frequently appear within table definitions, and presentation logic often gets hard-coded into report designs. While this bundling could accelerate initial development, it invariably creates significant complexity during migration, a factor we must account for in our planning.

The proprietary file format presents another challenge. Microsoft’s Jet database engine uses a format that requires specialized tools for extraction. Open-source utilities like mdbtools can access schema and data, but they don’t handle all Access features equally well.

Data type handling in Access adds further considerations. Where MySQL provides explicit types like INTEGER or VARCHAR, Access uses more flexible categories. A “Number” field might contain integers, decimals, or floating-point values depending on the data entered. Text fields expand dynamically to accommodate content. This flexibility aided rapid development but requires careful mapping during conversion to ensure we don’t lose precision or introduce errors.
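To make the mapping concrete, the sketch below encodes one common set of conventions for translating Access field types to MySQL column types. These pairings are illustrative rather than authoritative; the right sizes and precision depend on the data actually stored in each column:

```shell
#!/bin/sh
# Illustrative Access-to-MySQL type mapping; adjust to the real data.
map_access_type() {
  case "$1" in
    "Long Integer") echo "INT" ;;
    "Integer")      echo "SMALLINT" ;;
    "Byte")         echo "TINYINT UNSIGNED" ;;
    "Single")       echo "FLOAT" ;;
    "Double")       echo "DOUBLE" ;;
    "Currency")     echo "DECIMAL(19,4)" ;;
    "Text")         echo "VARCHAR(255)" ;;
    "Memo")         echo "TEXT" ;;
    "Date/Time")    echo "DATETIME" ;;
    "Yes/No")       echo "TINYINT(1)" ;;
    *)              echo "TEXT -- unknown type, review manually" ;;
  esac
}

map_access_type "Currency"   # prints DECIMAL(19,4)
map_access_type "Yes/No"     # prints TINYINT(1)
```

Currency maps to DECIMAL rather than a floating-point type precisely to avoid the precision loss discussed above.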

Access also imposes practical limitations that often drive migration decisions. Individual database files, for instance, cannot exceed 2GB, though organizations sometimes work around this constraint by linking multiple files. Security models typically rely on Windows authentication, rather than the more granular, database-level user management common in modern systems. These fundamental differences mean migration involves more than a mere data transfer—it necessitates a rethinking of the entire application architecture.

SQL syntax variations create additional complexity. Access uses different wildcard characters, date literal formats, and function names compared to standard SQL. Queries that work in Access often need modification for MySQL compatibility.
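These differences can be mechanized only up to a point. The sketch below rewrites a few common idioms (trailing * and ? wildcards inside LIKE patterns, #-delimited date literals, Now()) into MySQL form. The patterns are deliberately naive; a blanket rewrite of * outside LIKE would mangle SELECT *, so real queries still need case-by-case review:

```shell
#!/bin/sh
# Naive sketch of translating a few Access SQL idioms to MySQL.
# Note: MySQL still expects ISO dates ('2020-01-01'); this sketch
# only swaps the #...# delimiters, not the date format itself.
translate_query() {
  printf '%s\n' "$1" | sed \
    -e "s/LIKE '\([^']*\)\*'/LIKE '\1%'/g" \
    -e "s/LIKE '\([^']*\)?'/LIKE '\1_'/g" \
    -e "s/#\([^#]*\)#/'\1'/g" \
    -e "s/Now()/NOW()/g"
}

translate_query "SELECT Name FROM Customers WHERE Name LIKE 'Sm*' AND Created > #01/01/2020#"
```

Running the example prints the query with 'Sm%' and a quoted date literal in place of the Access forms.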

Forms and reports represent perhaps the most challenging aspect of Access migration. These components frequently contain undocumented business logic, custom calculations, and validation rules implemented in VBA. Successful migration requires either rebuilding this functionality in new application layers or carefully extracting and translating the embedded logic.

Transaction handling differs between systems as well. Access often uses implicit transactions, while MySQL requires explicit transaction management. These behavioral differences can cause unexpected issues if not addressed during migration planning.

Understanding these characteristics is crucial for developing effective migration strategies. Access, we must recognize, functions as a legacy application platform, rather than solely as a data store. Migration success, therefore, hinges on identifying embedded business logic, mapping data types appropriately, and planning for architectural changes that extend beyond basic data transfer.

Schema Extraction and Conversion

The first phase of migration involves extracting the database schema—the structural definition of tables, columns, relationships, and constraints—from the Access database. This schema provides the blueprint for recreating the database structure in MySQL.

Access stores schema information in its proprietary file format, requiring specialized tools for extraction. We use the open-source mdbtools utility to access this information, though it requires careful handling of the output to ensure accuracy.

Here’s an initial script for schema extraction:

#!/bin/bash
# Schema extraction script
# Usage: ./extract-schema.sh source.mdb

if [ "$#" -ne 1 ]; then
  echo "Usage: $0 source.mdb"
  exit 1
fi

SOURCE_DB="$1"
WORK_DIR="schema_extraction_$(date +%Y%m%d_%H%M%S)"
SCHEMA_FILE="$WORK_DIR/schema.sql"

# Create working directory
mkdir -p "$WORK_DIR"

# Extract schema
echo "Extracting schema..."
mdb-schema "$SOURCE_DB" mysql > "$SCHEMA_FILE"

echo "Schema extraction complete. Review $SCHEMA_FILE"

This script, though concise, performs several crucial functions. It includes basic error checking, creates a timestamped working directory to avoid conflicts, and, most importantly, uses the mdb-schema utility to extract the schema. The ‘mysql’ argument passed to mdb-schema is vital here, as it instructs the tool to generate SQL statements specifically tailored for MySQL compatibility. We include these safeguards because schema extraction, particularly from proprietary formats, can often fail in unexpected ways, and we want to preserve our work for thorough review.

The extracted schema will require review and modification, since Access and MySQL handle several elements differently. Common issues to address in the extracted schema include:

  - AutoNumber fields, which must become AUTO_INCREMENT columns
  - Yes/No fields, typically mapped to TINYINT(1)
  - Text and Memo fields, which need appropriate VARCHAR or TEXT definitions
  - Table or column names that collide with MySQL reserved words
  - Indexes, foreign keys, and default values, which mdb-schema may not carry over completely

Of course, while mdb-schema attempts to provide MySQL-compatible output, it cannot account for all nuances or business-specific requirements. Human intervention and careful review remain indispensable.

Schema conversion requires both technical understanding and knowledge of the business context. While tools handle basic translation, human judgment ensures the converted schema properly represents the original business rules and relationships.
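As one example of such post-processing, the sketch below pins down the storage engine and character set on every table definition. It assumes each CREATE TABLE block in the mdb-schema output ends with a line containing only ");", which may not hold for every mdbtools version:

```shell
#!/bin/sh
# Append explicit engine and charset to each CREATE TABLE closer.
# Assumes mdb-schema terminates table definitions with a bare ");" line.
fix_schema() {
  sed 's/^);$/) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;/' "$1"
}

# Demonstration against a minimal schema fragment
tmp=$(mktemp)
printf 'CREATE TABLE customers (\n  id INT\n);\n' > "$tmp"
fix_schema "$tmp"
```

Declaring InnoDB explicitly matters if foreign keys will be added later, since they require an engine that enforces them.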

Take time with this phase—rushing schema conversion often creates issues that become expensive to fix during later migration steps. The foundation laid here determines the success of subsequent data migration and application integration.

The next steps involve detailed analysis of the extracted schema, making necessary adjustments, and validating that the converted structure will support your business needs in MySQL. We’ll explore those processes in subsequent articles.

In the meantime, save this script and run it against a test database. Examine the output carefully - it will give you valuable insights into the complexity of your upcoming migration project.

Data Export and Preparation

Data migration requires careful handling to ensure accuracy. While schema issues can often be corrected, data corruption may have serious business consequences. The process involves extracting data from Access, converting it to a suitable format, and preparing it for MySQL import.

We choose CSV files as an effective intermediate format for data transfer. They remain human-readable for verification and work with most database systems. However, the export process requires attention to encoding, formatting, and potential data issues that could compromise data integrity.
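A quick mechanical scan can flag files that deserve closer scrutiny. The sketch below counts lines containing non-ASCII bytes (a possible encoding mismatch) and lines containing doubled quotes (quote characters escaped inside a quoted field); how concerning the numbers are is a judgment call:

```shell
#!/bin/sh
# Scan an exported CSV for red flags before import:
#  - lines with bytes outside printable ASCII (possible encoding issues)
#  - lines with doubled quotes ("" inside a quoted field)
scan_csv() {
  file="$1"
  nonascii=$(LC_ALL=C grep -c '[^ -~]' "$file" || true)
  quoted=$(grep -c '""' "$file" || true)
  echo "$file: non-ASCII lines: $nonascii, escaped-quote lines: $quoted"
}

# Demonstration with a small synthetic file
tmp=$(mktemp)
printf 'id,name\n1,"O""Brien"\n2,café\n' > "$tmp"
scan_csv "$tmp"
```

Lines flagged here are not necessarily wrong; they are simply the rows most likely to be damaged by an incorrect import configuration.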

Here’s a script for systematic data export:

#!/bin/bash
# Data export script
# Usage: ./export-data.sh source.mdb

if [ "$#" -ne 1 ]; then
  echo "Usage: $0 source.mdb"
  exit 1
fi

SOURCE_DB="$1"
WORK_DIR="data_export_$(date +%Y%m%d_%H%M%S)"
TABLES_FILE="$WORK_DIR/tables.txt"
LOG_FILE="$WORK_DIR/export.log"

# Create working directory
mkdir -p "$WORK_DIR"

# Get list of tables
echo "Identifying tables..."
mdb-tables -1 "$SOURCE_DB" > "$TABLES_FILE"   # -1 prints one table per line for the read loop

# Export each table
while read -r table; do
  echo "Exporting $table..."
  mdb-export "$SOURCE_DB" "$table" > "$WORK_DIR/${table}.csv" 2>> "$LOG_FILE"
done < "$TABLES_FILE"

echo "Export complete. Check $LOG_FILE for any errors."

This script systematically handles data export. It first creates a timestamped working directory, ensuring each export operation has its own isolated space. It then uses mdb-tables to identify all tables within the Access database and mdb-export to extract each table into a separate CSV file. Error messages are logged for later review, a crucial step in ensuring data integrity.

The resulting CSV files require careful examination before import. Common issues include:

  - Date values in formats MySQL will not parse without conversion
  - Embedded commas, quotes, or line breaks inside text fields
  - Character encoding mismatches, especially for non-ASCII text
  - Empty strings exported where NULL was intended

Of course, automated tools can only go so far; human review of these CSV files remains an indispensable step before any import.

Review sample records from each table, particularly focusing on:

  - Fields containing dates, currency, or other formatted values
  - Long text fields that may contain delimiters or line breaks
  - Records with accented or other non-ASCII characters

This verification step helps identify potential problems before they affect the MySQL import process. The CSV files provide a stable checkpoint for data validation and transformation as needed.
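One check worth automating at this stage is that every row carries the same number of fields as the header. The awk sketch below counts raw commas, so fields that legitimately contain quoted commas will show up as false positives; treat any output as a prompt for manual review rather than a certain error:

```shell
#!/bin/sh
# Flag rows whose raw comma-separated field count differs from the header's.
check_columns() {
  awk -F',' 'NR==1 { n = NF; next }
             NF != n { printf "line %d: %d fields (expected %d)\n", NR, NF, n }' "$1"
}

# Demonstration with a deliberately broken row
tmp=$(mktemp)
printf 'id,name,city\n1,alice,berlin\n2,bob\n' > "$tmp"
check_columns "$tmp"   # prints: line 3: 2 fields (expected 3)
```

A clean run produces no output, which makes this easy to wire into a loop over every exported table.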

Data Validation

Data validation ensures migration accuracy by comparing source and target data. Automated checks verify record counts and sample content, while human review identifies subtle issues that automated processes might miss. This step protects against problems that could affect business operations for years to come.

We recommend validation at multiple points during migration. Compare record counts between Access and MySQL tables, examine sample data for formatting issues, and verify that relationships between tables remain intact. This systematic approach helps us catch issues early when they’re easier to fix.

Let’s examine a foundational validation script that helps catch common migration issues:

#!/bin/bash

# Data validation script for Access to MySQL migration
# Usage: ./validate-data.sh source.mdb target_mysql_db

if [ "$#" -ne 2 ]; then
  echo "Usage: $0 source.mdb target_mysql_db"
  exit 1
fi

SOURCE_DB="$1"
TARGET_DB="$2"
WORK_DIR="validation_$(date +%Y%m%d_%H%M%S)"
REPORT_FILE="$WORK_DIR/validation_report.txt"

# Create working directory
mkdir -p "$WORK_DIR"

# Get list of tables
echo "Identifying tables..."
mdb-tables -1 "$SOURCE_DB" > "$WORK_DIR/tables.txt"   # -1 prints one table per line

# Check each table
while read -r table; do
  echo "Validating $table..."
  # Compare record counts (the Access count subtracts the CSV header line)
  access_count=$(( $(mdb-export "$SOURCE_DB" "$table" | wc -l) - 1 ))
  mysql_count=$(mysql -N -e "SELECT COUNT(*) FROM $table" "$TARGET_DB")
  echo "Table: $table" >> "$REPORT_FILE"
  echo "Access records: $access_count" >> "$REPORT_FILE"
  echo "MySQL records: $mysql_count" >> "$REPORT_FILE"
  echo "-------------------" >> "$REPORT_FILE"
  # Sample record comparison
  echo "Sample records from $table:" >> "$REPORT_FILE"
  mdb-export "$SOURCE_DB" "$table" | head -n 5 >> "$REPORT_FILE"
  echo "MySQL samples:" >> "$REPORT_FILE"
  mysql -N -e "SELECT * FROM $table LIMIT 5" "$TARGET_DB" >> "$REPORT_FILE"
  echo "===================" >> "$REPORT_FILE"
done < "$WORK_DIR/tables.txt"

echo "Validation complete. Check $REPORT_FILE for results."

Let’s break down this script to understand exactly what it does and why each step matters.

The script begins with essential error checking, ensuring that both a source Access database and a target MySQL database name have been provided. Without both pieces of information, validation cannot proceed. The script uses clear error messages to help users understand what went wrong:

if [ "$#" -ne 2 ]; then
  echo "Usage: $0 source.mdb target_mysql_db"
  exit 1
fi

Next comes setup. The script creates a uniquely named working directory using the current timestamp. This prevents overwriting previous validation results and provides an audit trail. All validation outputs go to this directory:

SOURCE_DB="$1"
TARGET_DB="$2"
WORK_DIR="validation_$(date +%Y%m%d_%H%M%S)"
REPORT_FILE="$WORK_DIR/validation_report.txt"

mkdir -p "$WORK_DIR"

The actual validation process starts by identifying all tables in the Access database. We save this list to a file for reference:

echo "Identifying tables..."
mdb-tables -1 "$SOURCE_DB" > "$WORK_DIR/tables.txt"

The heart of the validation occurs in the main loop. For each table, we:

  1. Count records in both databases
  2. Compare these counts
  3. Extract sample records from both systems
  4. Save all this information to our report file

Here’s our code to accomplish this:

while read -r table; do
  echo "Validating $table..."
  access_count=$(( $(mdb-export "$SOURCE_DB" "$table" | wc -l) - 1 ))
  mysql_count=$(mysql -N -e "SELECT COUNT(*) FROM $table" "$TARGET_DB")

The record counting, though seemingly straightforward, deserves special attention. For the Access database, we leverage mdb-export to output the table data and pipe it to wc -l to count lines. Note that this count includes the CSV header row, and memo fields containing embedded line breaks can skew it further, so it must be reconciled before comparison. For MySQL, we execute a direct COUNT(*) query using the mysql client with the -N (skip column names) and -e (execute statement) options. If the numbers do not match, we know immediately that something went wrong during migration, signaling potential data loss or a discrepancy worth investigating.
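Because mdb-export writes a column-header row, counting records with a plain wc -l overcounts by one. A small helper that skips the header avoids the trap:

```shell
#!/bin/sh
# Count data records in a CSV, excluding the header row.
count_records() {
  # tail -n +2 emits everything from line 2 onward
  tail -n +2 "$1" | wc -l | awk '{print $1}'
}

# Demonstration with a two-record file
tmp=$(mktemp)
printf 'id,name\n1,alice\n2,bob\n' > "$tmp"
count_records "$tmp"   # prints 2
```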

The script then saves sample records from both databases:

  echo "Sample records from $table:" >> "$REPORT_FILE"
  mdb-export "$SOURCE_DB" "$table" | head -n 5 >> "$REPORT_FILE"
  echo "MySQL samples:" >> "$REPORT_FILE"
  mysql -N -e "SELECT * FROM $table LIMIT 5" "$TARGET_DB" >> "$REPORT_FILE"

These samples prove invaluable for spotting subtle conversion issues like character encoding problems or decimal errors.
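Beyond eyeballing five rows, whole tables can be compared mechanically once both sides have been dumped to CSV with matching column order. A sketch that ignores row order by sorting both dumps first:

```shell
#!/bin/sh
# Compare two table dumps row-for-row, ignoring row order.
# Assumes both sides were exported to CSV with the same column order
# and formatting; formatting differences will show up as mismatches.
compare_table_dumps() {
  a=$(mktemp); b=$(mktemp)
  sort "$1" > "$a"
  sort "$2" > "$b"
  if diff -q "$a" "$b" > /dev/null; then
    echo "match"
  else
    echo "differ"
    diff "$a" "$b" | head -n 10   # show the first few discrepancies
  fi
  rm -f "$a" "$b"
}

# Demonstration: same rows, different order
printf '1,a\n2,b\n' > /tmp/access_side.csv
printf '2,b\n1,a\n' > /tmp/mysql_side.csv
compare_table_dumps /tmp/access_side.csv /tmp/mysql_side.csv   # prints match
```

Mismatches reported here often trace back to formatting rather than lost data, which is exactly the kind of subtle issue the samples are meant to surface.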

While this script provides a good starting point, thorough validation requires additional manual checks:

  - Confirm that relationships between tables still resolve correctly
  - Spot-check date, currency, and decimal values against the source
  - Verify that non-ASCII text survived the transfer without mangling
  - Exercise key application queries against the migrated data

Remember: while automated validation is invaluable for catching certain classes of problems, human review remains absolutely essential. It is, in fact, the only way to spot more subtle issues that could profoundly impact business operations. We must, therefore, take the time to thoroughly examine the validation results before considering any migration complete.

Common pitfalls to watch for include:

  - Record counts that differ by exactly one, usually a CSV header line being counted
  - Dates silently converted to zero values when the format does not parse
  - Truncated text where a VARCHAR column was sized too small
  - Lost leading zeros in identifiers imported as numeric types

The validation process might feel tedious, but it prevents costly problems down the road. Every hour spent validating data saves many more hours fixing issues after they’ve impacted business operations. Think of it as insurance - a small investment that protects against potentially massive losses.

Finally, keep your validation results. They provide valuable documentation of the migration process and can help troubleshoot any issues that surface later. Consider them part of your system documentation, proof that the migration succeeded and the data arrived intact in its new home.

Production Migration

Production migration represents the final step of moving from Access to MySQL. This phase requires careful planning, clear communication with stakeholders, and comprehensive backup procedures. The goal is to minimize downtime while ensuring data integrity, though some disruption may be unavoidable depending on your system’s complexity.

Key considerations for production migration include:

  - Scheduling a downtime window and communicating it to stakeholders
  - Taking a full backup of the target database before any changes
  - Defining a rollback plan in case the migration fails partway
  - Verifying record counts and sample data immediately after import

Let’s examine a script that handles this critical phase:

#!/bin/bash
# Production migration script for Access to MySQL
# Usage: ./migrate-production.sh source.mdb target_mysql_db

if [ "$#" -ne 2 ]; then
  echo "Usage: $0 source.mdb target_mysql_db"
  exit 1
fi

SOURCE_DB="$1"
TARGET_DB="$2"
WORK_DIR="production_migration_$(date +%Y%m%d_%H%M%S)"
LOG_FILE="$WORK_DIR/migration.log"
BACKUP_DIR="$WORK_DIR/backup"

# Create working directories
mkdir -p "$WORK_DIR"
mkdir -p "$BACKUP_DIR"

# Log start time
echo "Migration started at $(date)" >> "$LOG_FILE"

# Backup existing MySQL database
echo "Creating backup of current MySQL database..."
mysqldump "$TARGET_DB" > "$BACKUP_DIR/${TARGET_DB}_backup.sql" 2>> "$LOG_FILE"

# Get list of tables in order of dependencies
echo "Analyzing table dependencies..."
mdb-tables -1 "$SOURCE_DB" | sort > "$WORK_DIR/tables_ordered.txt"

# Export and import each table
while read -r table; do
  echo "Processing $table..."

  # Export to CSV
  echo "Exporting $table to CSV..."
  mdb-export "$SOURCE_DB" "$table" > "$WORK_DIR/${table}.csv" 2>> "$LOG_FILE"

  # Clear existing data
  echo "Clearing existing data from $table..."
  mysql -e "DELETE FROM $table" "$TARGET_DB" 2>> "$LOG_FILE"

  # Import new data
  echo "Importing new data into $table..."
  mysql -e "LOAD DATA INFILE '$WORK_DIR/${table}.csv'
    INTO TABLE $table
    FIELDS TERMINATED BY ','
    ENCLOSED BY '\"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES" "$TARGET_DB" 2>> "$LOG_FILE"

  # Verify record count (subtracting the CSV header line)
  orig_count=$(( $(wc -l < "$WORK_DIR/${table}.csv") - 1 ))
  new_count=$(mysql -N -e "SELECT COUNT(*) FROM $table" "$TARGET_DB")
  echo "Table: $table" >> "$LOG_FILE"
  echo "Original records: $orig_count" >> "$LOG_FILE"
  echo "Imported records: $new_count" >> "$LOG_FILE"
  echo "-------------------" >> "$LOG_FILE"
done < "$WORK_DIR/tables_ordered.txt"

# Log completion
echo "Migration completed at $(date)" >> "$LOG_FILE"
echo "Migration complete. Check $LOG_FILE for details."

Let’s break down this script section by section to understand exactly what it does and why each step matters.

First, the script ensures proper usage:

if [ "$#" -ne 2 ]; then
  echo "Usage: $0 source.mdb target_mysql_db"
  exit 1
fi

Next comes the setup phase:

SOURCE_DB="$1"
TARGET_DB="$2"
WORK_DIR="production_migration_$(date +%Y%m%d_%H%M%S)"
LOG_FILE="$WORK_DIR/migration.log"
BACKUP_DIR="$WORK_DIR/backup"

mkdir -p "$WORK_DIR"
mkdir -p "$BACKUP_DIR"

As before, this section creates a uniquely named working directory using the current timestamp. This prevents any conflict with previous migration attempts and provides a clear audit trail. The script also creates a separate backup directory, a crucial safety net if something goes wrong.

The backup phase follows, a truly critical step that cannot be overstated:

echo "Creating backup of current MySQL database..."
mysqldump "$TARGET_DB" > "$BACKUP_DIR/${TARGET_DB}_backup.sql" 2>> "$LOG_FILE"

This command leverages mysqldump, a utility for creating logical backups of MySQL databases. It saves the current state of the target MySQL database before any changes are made. If problems arise during migration—and they often do in complex systems—this backup provides the means to restore the system to its previous, known-good state. Do not skip this step, regardless of how confident you feel about the migration; it is your key safeguard against data loss.

The table analysis section comes next, where we prepare the order of operations:

echo "Analyzing table dependencies..."
mdb-tables -1 "$SOURCE_DB" | sort > "$WORK_DIR/tables_ordered.txt"

This command first uses mdb-tables with the -1 option to list all tables in the Access database, ensuring each table name appears on its own line. This output is then piped to the sort command, which provides a consistent, alphabetical order for processing tables. It is important to note, however, that this approach, while effective for many scenarios, does not inherently handle complex table dependencies (e.g., foreign key constraints). For highly interconnected schemas, a more sophisticated dependency analysis might be required to prevent import errors.
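If the foreign-key relationships are known, the alphabetical sort can be upgraded to a true dependency order with the standard tsort utility, which topologically sorts "referenced referencing" pairs. The table names below are hypothetical, and mdbtools does not emit these pairs directly; they would come from inspecting the relationships in the original database:

```shell
#!/bin/sh
# Produce a load order where referenced tables come before the tables
# that reference them, using the standard tsort(1) utility.
# Each line is "referenced_table referencing_table"; the names below
# are hypothetical examples.
tsort <<'EOF'
customers orders
orders order_items
products order_items
EOF
```

The output lists each table once, with every parent appearing before its children, which is exactly the order a foreign-key-aware import needs.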

The main migration loop forms the heart of the script:

while read -r table; do
  echo "Processing $table..."
# Export to CSV
  echo "Exporting $table to CSV..."
mdb-export "$SOURCE_DB" "$table" > "$WORK_DIR/${table}.csv" 2>> "$LOG_FILE"

For each table, the script first exports data to CSV format. This intermediate step allows us to verify the data before import and provides a backup of the original data.

The data import section follows, where the actual transfer of data takes place:

	# Clear existing data
	echo "Clearing existing data from $table..."
	mysql -e "DELETE FROM $table" "$TARGET_DB" 2>> "$LOG_FILE"

	# Import new data
	echo "Importing new data into $table..."
	mysql -e "LOAD DATA INFILE '$WORK_DIR/${table}.csv'
            INTO TABLE $table
            FIELDS TERMINATED BY ','
            ENCLOSED BY '\"'
            LINES TERMINATED BY '\n'
            IGNORE 1 LINES" "$TARGET_DB" 2>> "$LOG_FILE"

This block first clears any existing data from the target table using a DELETE FROM statement executed via the mysql client, ensuring a clean slate for the incoming data. It then loads the new data from our CSV file using LOAD DATA INFILE, generally the fastest way to bulk load data into MySQL because it bypasses much of the overhead of individual INSERT statements. The FIELDS TERMINATED BY, ENCLOSED BY, and LINES TERMINATED BY clauses are crucial for correctly parsing the CSV format, and IGNORE 1 LINES skips the header row. Note that LOAD DATA INFILE reads files on the database server and is restricted by the server's secure_file_priv setting; if the CSV files live on a client machine, you may need LOAD DATA LOCAL INFILE with local_infile enabled on both client and server. Finally, the verification step:

# Verify record count
orig_count=$(wc -l < "$WORK_DIR/${table}.csv")
new_count=$(mysql -N -e "SELECT COUNT(*) FROM $table" "$TARGET_DB")

echo "Table: $table" >> "$LOG_FILE"
echo "Original records: $orig_count" >> "$LOG_FILE"
echo "Imported records: $new_count" >> "$LOG_FILE"
echo "-------------------" >> "$LOG_FILE"

This section counts records in both the source CSV and target table, logging the results. This immediate verification helps catch any data loss during import.

The script concludes with completion logging:

echo "Migration completed at $(date)" >> "$LOG_FILE"
echo "Migration complete. Check $LOG_FILE for details."

These final messages provide a clear record of when the migration finished and remind users to check the log file for details.

While this script effectively handles the technical aspects of migration, successful production migration requires more than code alone. Consider these crucial factors, which often determine the success or failure of the endeavor:

  - Schedule the migration during a low-usage window and announce it in advance
  - Rehearse the full process against a copy of production data first
  - Keep the original Access file and the mysqldump backup until the new system is proven
  - Assign clear ownership for go/no-go and rollback decisions

Remember: production migration represents the culmination of your conversion project. Take time to plan carefully, test thoroughly, and prepare for contingencies. The effort invested in preparation pays off in a smooth, successful migration that keeps your business running without disruption.

This migration approach provides a foundation for moving your data to MySQL, but remember that Access serves as both a database and an application platform. The forms, reports, and VBA code that make up the application layer will require separate consideration. We’ll explore those aspects in future articles, as they often represent the most complex part of Access modernization.

The investment in careful migration planning pays dividends over the long term. Modern database systems offer better performance, security, and scalability than Access can provide. By following these principles, you position your organization for sustainable growth rather than continued technical debt.