Sunday, 7 August 2011

Structure of the Data Directory

The MySQL data directory contains all of the databases and tables managed by the server. In general, these are organized into a tree structure that is implemented in straightforward fashion by taking advantage of the hierarchical structure of the UNIX or Windows file systems:

Each database corresponds to a directory under the data directory.

Tables within a database correspond to files in the database directory.

The exception to this hierarchical implementation of databases and tables as directories and files is that the InnoDB table handler stores all InnoDB tables from all databases within a single common tablespace. This tablespace is implemented using one or more large files that are treated as a single unified data structure within which tables and indexes are represented. The InnoDB tablespace files are stored in the data directory by default.

The data directory also may contain other files:

An option file, my.cnf.

The server's process ID (PID) file. When it starts up, the server writes its process ID to this file so that other programs can discover the value if they need to send signals to it. (This file is not used on Windows or by the embedded server.)

Status and log files that are generated by the server. These files provide important information about the server's operation and are valuable for administrators, especially when something goes wrong and you're trying to determine the cause of the problem. If some particular query kills the server, for example, you may be able to identify the offending query by examining the log files.

It's common to store files in the data directory such as the DES key file or the server's SSL certificate and key files.

How the MySQL Server Provides Access to Data
When MySQL is used in the usual client/server setup, all databases under the data directory are managed by a single entity—the MySQL server mysqld. Client programs never manipulate data directly. Instead, the server provides the sole point of contact though which databases are accessed, acting as the intermediary between client programs and the data they want to use. Figure 10.1 illustrates this architecture.

Figure 10.1. How the MySQL server controls access to the data directory.


When the server starts up, it opens any log files that you request it to maintain and then presents a network interface to the data directory by listening for various types of network connections. To access data, client programs establish a connection to the server and then communicate requests as SQL queries to perform the desired operations—for example, creating a table, selecting records, or updating records. The server performs each operation and sends back the result to the client. The server is multi-threaded and can service multiple simultaneous client connections. However, because update operations are performed one at a time, the practical effect is to serialize requests so that two clients can never change a given record at exactly the same time.

If you're running an application that uses the embedded server, a slightly different architecture applies, because there is only one "client"—that is, the application into which the server is linked. In this case, the server listens to an internal communication channel rather than to network interfaces. Nevertheless, it's still the embedded server part of the application that manages access to the data directory, and it's still necessary to coordinate query activity arriving over multiple connections if the application happens to open several connections to the server.

Under normal conditions, having the server act as the sole arbiter of database access provides assurance against the kinds of corruption that can result from multiple processes accessing the database tables at the same time. Nevertheless, administrators should be aware that there are times when the server does not have exclusive control of the data directory:

When you run multiple servers on a single data directory. Normally you run a single server to manage all databases on a host, but it's possible to run multiple servers. If each server manages its own independent data directory, there is no problem of interaction. But it's possible to start multiple servers and point them at the same data directory. In general, this is not a good idea. If you try it, you'd better make sure your system provides good file locking or the servers will not cooperate properly. You also risk having your log files become a source of confusion (rather than a source of helpful information) if you have multiple servers writing to them at the same time.

When you run table repair utilities. Programs such as myisamchk and isamchk are used for table maintenance, troubleshooting, and repair operations, and they operate directly on the files that correspond to the tables. As you might guess, because these utilities can change table contents, allowing them to operate on tables at the same time the server is doing so can cause table damage. The best way to avoid problems of this sort is to bring down the server before running any repair utilities. If that is not possible, it's very important to understand how to tell the server not to access a table while you're operating on it with a repair utility. "Database Backups, Maintenance, and Repair," for instructions on the proper use of these programs. (Another alternative is to use the CHECK TABLE and REPAIR TABLE statements, which eliminate the interaction by instructing the server itself to perform the table maintenance operations.)

How MySQL Represents Databases in the File System
Each database managed by the MySQL server has its own database directory. This exists as a subdirectory of the data directory, with the same name as the database it represents. For example, a database mydb corresponds to the database directory DATADIR/mydb. This representation allows several database-level statements to be almost trivial in their implementation.

SHOW DATABASES is essentially nothing more than a list of the names of the directories located within the data directory. Some database systems keep a master table that lists all the databases maintained, but there is no such construct in MySQL. Given the simplicity of the data directory structure, the list of databases is implicit in the contents of the data directory, and such a table would be unnecessary overhead.

CREATE DATABASE db_name creates an empty directory db_name in the data directory. Under UNIX, the directory is owned by and accessible only to the login account that is used for running the server. This means that the CREATE DATABASE operation is equivalent to executing the following shell commands on the server host while logged in under that account:

% cd DATADIR
% mkdir db_name
% chmod u=rwx,go-rwx db_name

The minimal approach of representing a new database by an empty directory contrasts with other database systems that create a number of control or system files even for an "empty" database.

The DROP DATABASE statement is implemented nearly as easily. DROP DATABASE db_name removes the db_name directory in the data directory, along with any table files contained within it. This is almost the same as executing the following commands on UNIX:

% cd DATADIR
% rm -rf db_name

or the following commands on Windows:

C:\> cd DATADIR
C:\> del /S db_name

The differences between a DROP DATABASE statement and the shell commands are as follows:

For DROP DATABASE, the server removes only files with extensions known to be used for tables. If you've created other files in the database directory, the server leaves them intact, and the directory itself is not removed. (One implication of this is that the database name continues to be displayed by SHOW DATABASES.)

InnoDB table and index contents are maintained in the InnoDB tablespace, not as files in the database directory. If a database contains InnoDB tables, you must use DROP DATABASE so that the InnoDB handler can remove the tables from the tablespace; do not remove the database directory by using an rm or del command.

How MySQL Represents Tables in the File System
MySQL supports handlers for several types of database tables: ISAM, MyISAM, MERGE, BDB, InnoDB, and HEAP. Every table in MySQL is represented on disk by at least one file, which is the .frm file that contains a description of the table's structure. For most table types, there are also other files that contain the data rows and index information. These vary according to the table type, as outlined in the following discussion. (The descriptions here focus primarily on the characteristics of the table types as they are stored on disk. For information about how these types differ in features and behavior, see Chapter 3, "MySQL SQL Syntax and Use.")

ISAM Tables
The original table type in MySQL is the ISAM type. MySQL represents each ISAM table by three files in the database directory of the database that contains the table. The files all have a basename that is the same as the table name and an extension that indicates the purpose of the file. For example, a table named mytbl is represented by three files:

mytbl.frm is the description file that stores the format (structure) of the table.

mytbl.ISD is the ISAM data file that stores the contents of the table's rows.

mytbl.ISM contains index information for any indexes the table has.

MyISAM Tables
MySQL 3.23 introduced the MyISAM table type as the successor to the ISAM type, which now is considered pretty much obsolete. Like the ISAM handler, the MyISAM handler represents each table by three files, using the extensions .frm, .MYD, and .MYI for the description, data, and index files, respectively.

MERGE Tables
A MERGE table is a logical construct. It represents a collection of identically structured MyISAM tables that are treated for query purposes as a single larger table. Within a database directory, a MERGE table is represented by its .frm file and a .MRG file that is nothing more than a list of the names of the table's constituent MyISAM tables, one name per line.

One implication of this representation is that it's possible to change the definition of a MERGE table by flushing the table cache with FLUSH TABLES and then directly editing the .MRG file to change the list of MyISAM tables named there. (I'm not sure I'd recommend actually doing this, though.)

BDB Tables
The BDB handler represents each table by two files, the .frm description file and a .db file that contains the table's data and index information.

InnoDB Tables
The preceding table types all are represented using files that each are uniquely associated with a single table. InnoDB tables are handled in a somewhat different way. The only file that corresponds directly to a given InnoDB table is the .frm table description file, which is located in the directory for the database to which the table belongs. The data and indexes for all InnoDB tables are managed together within a single unified tablespace. Typically, the tablespace itself is represented by one or more large files in the data directory. These components of the tablespace form a logically contiguous storage area equal in size to the sum of the sizes of the individual files.

HEAP Tables
HEAP tables are in-memory tables. Because the server stores a HEAP table's data and indexes in memory rather than on disk, the table is not represented in the file system at all, other than by its .frm file.

How SQL Statements Map onto Table File Operations
Every table type uses a .frm file to store the table description, so the output from SHOW TABLES db_name is the same as a listing of the basenames of the .frm files in the database directory for db_name. Some database systems maintain a registry that lists all tables contained in a database. MySQL does not because it is unnecessary; the "registry" is implicit in the structure of the data directory.

To create a table of any of the types supported by MySQL, you issue a CREATE TABLE statement that defines the table's structure. For all table types, the server creates a .frm file that contains the internal encoding of that structure. The server also creates any other files that are associated with tables of the given type. For example, it creates .MYD and .MYI data and index files for a MyISAM table or a .db data/index file for a BDB table. For InnoDB tables, the handler initializes data and index information for the table within the InnoDB tablespace. Under UNIX, the ownership and mode of any files created to represent the table are set to allow access only to the account that the server runs as.

When you issue an ALTER TABLE statement, the server re-encodes the table's .frm file to reflect the structural change indicated by the statement and modifies the contents of the data and index files likewise. This happens for CREATE INDEX and DROP INDEX as well because they are handled by the server as equivalent ALTER TABLE statements. Altering an InnoDB table causes the handler to modify the table's data and indexes within the InnoDB tablespace.

DROP TABLE is implemented by removing the files that represent the table. Dropping an InnoDB table also causes any space associated with the table in the InnoDB tablespace to be marked as free.

For some table types, you can remove a table manually by removing the files in the database directory to which the table corresponds. For example, if mydb is the current database and mytbl is an ISAM, MyISAM, BDB, or MERGE table, DROP TABLE mytbl is roughly equivalent to the following commands on UNIX:

% cd DATADIR
% rm -f mydb/mytbl.*

or to the following commands on Windows:

C:\> cd DATADIR
C:\> del mydb\mytbl.*

For an InnoDB or HEAP table, parts of the table are not represented within the file system in discrete files, so DROP TABLE does not have a file system command equivalent. For example, the .frm file is the only file uniquely associated with an InnoDB table. Removing that file will leave the table data and indexes "stranded" within the InnoDB tablespace.

Operating System Constraints on Database and Table Naming
MySQL has general rules for naming databases and tables. The rules are listed in detail in Chapter 3 but can be summarized briefly as follows:

Names can be constructed from the alphanumeric characters in the current character set, as well as the underscore and dollar characters ('_' and '$').

Names can be up to 64 characters long.

From MySQL 3.23.6 on, other characters can be used in a name by quoting the name in backticks (for example, `odd@name`). Quoting is often necessary if you use reserved words as column names.

However, because names of databases and tables correspond to names of directories and files, the operating system on which a server runs may impose additional constraints that stem from file system naming conventions:

You are limited in database and table names to the characters that are legal in filenames. This is true for every table type, because every type is represented in the file system by at least a .frm file. For example, '$' is allowed in a name by MySQL's rules, but if your operating system doesn't allow it in filenames, you can't use it in directory or table names either. In practice, this is not a concern for either UNIX or Windows. The greatest difficulty you might have is referring to names directly from the shell when performing database administration. For example, the '$' dollar sign character is special to UNIX shells. If you give a database a name such as $mydb that includes that character, any reference to the name from the command line may be interpreted by the shell as a variable reference:

% ls $mydb
mydb: Undefined variable.

If this happens, you must escape the '$' character or use quoting to suppress its special meaning:

% ls \$mydb
% ls '$mydb'

If you use quotes, use single quotes. Double quotes do not suppress variable interpretation.

A database or table name cannot include the pathname separator character, even if quoted. For example, on UNIX and Windows, pathname components are separated by '/' and '\', respectively, and neither character can be used. The reason that both are disallowed regardless of platform is to make it easier to move databases and tables from one platform to another. (For example, if you were allowed to use '/' in the name of a table on Windows, you could not move the table to UNIX.)

Although MySQL allows database and table names to be up to 64 characters long, the length of names is also bound by the length allowed by your operating system. Normally this is not a problem, although under UNIX, some older System V-ish systems may enforce a 14-character limit. In that case, the effective limit on database names is 14 characters. The limit for table names is 10 characters because names of files representing tables end with a period and an extension of up to three characters.

Case sensitivity of the underlying file system affects how you name and refer to databases and tables. If the file system is case sensitive (as is typical for UNIX), the two names mytbl and MYTBL refer to different tables. If the file system is not case sensitive (as for Windows or for HFS+ file systems under Mac OS X), mytbl and MYTBL refer to the same table. You should keep this issue in mind if you develop a database on a server that uses case sensitive filenames and there is a possibility you might move the database to a server where filenames are not case sensitive.

One way to deal with the issue of case sensitivity is to always name your databases and tables with a given lettercase. Another is to set the lower_case_table_names server variable to 1, which has two effects:

The server converts a table's name to lowercase before creating the corresponding disk files.

When the table is referenced later in a query, the server converts its name to lowercase before attempting to find the table on disk.

The result of these actions is that names are not treated as case sensitive. However, there are two caveats you should keep in mind with respect to use of the lower_case_table_names variable. First, it does not apply to database names until MySQL 4.0.2; prior to that it applies only to table names. Second, you should enable this variable before you start creating databases or tables, not after. If you create names that include uppercase characters and then set the variable, it will not have the desired effect because then you will already have names stored on disk that are not entirely lowercase. To avoid this problem, rename any databases or tables with names that contain uppercase characters to names that are entirely lowercase before enabling the variable.

Factors That Affect Maximum Table Size
Table sizes in MySQL are bounded, but sizes are limited by a combination of factors, so it is not always a simple matter to determine precisely what the bounds are. Factors that affect table size are as follows:

MySQL has its own internal limits on table sizes. These vary by table type:

For ISAM tables, the .ISD and .ISM files are limited to 4GB apiece.

For MyISAM tables, the .MYD and .MYI files also are limited to 4GB apiece by default. However, by using the AVG_ROW_LENGTH and MAX_ROWS options when you create the table, the files can be approximately 8 million terabytes apiece. (See the description for CREATE TABLE in Appendix D, "SQL Syntax Reference.")

The maximum size of a MERGE table is a function of the combined maximum sizes of its constituent MyISAM tables.

BDB table sizes are bound by the .db file size allowed by the handler. This varies according to the table page size (which is determined at server build time), but for even the smallest page size (512 bytes), the maximum file size is 2 terabytes.

For InnoDB, the maximum size of the InnoDB tablespace is 4 billion pages, where the default page size is 16KB. (MySQL can be recompiled from source to use an InnoDB page size ranging from 8KB to 64KB.) The maximum tablespace size also is the bound on the size of any individual InnoDB table.

The operating system imposes a maximum file size limit. In general, the trend has been for operating systems to relax file size constraints over time, but limits as low as 2GB are still relatively common. This size limit applies to files used to represent tables, such as the .MYD and .MYI files for a MyISAM table. It also applies to the component files that make up the InnoDB tablespace. However, the InnoDB tablespace size can easily exceed the maximum file size; just configure a tablespace that consists of multiple files, each of which is the maximum file size. Another way to circumvent the file size limit is to use raw partitions in the InnoDB tablespace, which can be done as of MySQL 3.23.41. Tablespace components that are on raw partitions can be as large as the partition itself.

For table types that represent data and indexes in separate files (such as ISAM and MyISAM), a table's size limit is reached when any of its individual files hits its own size limit. The table's indexing characteristics will affect which file this will be. For a table with no or few indexes, it is likely that the data file will reach its size limit first. For a heavily indexed table, the index file may hit the limit first.

The presence of an AUTO_INCREMENT column implicitly limits the number of rows a table may have. For example, if the column is TINYINT UNSIGNED, the maximum value it may hold is 255, so that also becomes the maximum number of rows the table may hold. Larger integer types allow more rows. More generally, placing any unique index on a table limits its row count to the maximum number of unique values in the index.

To determine the actual table size you can achieve, you must consider all applicable factors. The effective maximum table size likely will be determined by the smallest of those factors. Suppose you want to create a ISAM table. MySQL will allow the data and index files to reach 4GB. But if your operating system imposes a size limit on files of 2GB, that will be the effective limit for the table files. On the other hand, if your system has large file support, files can be bigger than 4GB and then the determining factor on table size will be MySQL's internal 4GB limit.

With respect to InnoDB tables, one point to keep in mind is that all such tables must fit within the InnoDB tablespace. If you have a single InnoDB table, it can be as large as the tablespace. But if, as is more likely, you have many InnoDB tables, they all share the same space and thus each is constrained in size not only by the size of the tablespace but also by how much of the tablespace is allocated to other tables. Any individual InnoDB table can grow as long as the tablespace is not full. Conversely, when the tablespace fills up, no InnoDB table can grow any larger until you add another component to the tablespace to make it bigger. (As of MySQL 3.23.50, you can make the last tablespace component auto-extending, so that it will grow as long as it does not exceed the file size limit of your system and disk space is available. See Chapter 11 for details on tablespace configuration.)

Implications of Data Directory Structure for System Performance
The structure of the MySQL data directory is easy to understand because it uses the hierarchical structure of the file system in such a natural way. At the same time, this structure has certain performance implications, particularly regarding operations that open the files that represent database tables.

One consequence of the data directory structure is that for table handlers that represent each table using multiple files, an open table can require multiple file descriptors, not just one. The server caches descriptors intelligently, but a busy server can easily use up lots of them while servicing many simultaneous client connections or executing complex queries that reference several tables. This can be a problem, because file descriptors are a scarce resource on many systems, particularly those that set the default per-process descriptor limit fairly low.

Another effect of representing each table by its own files is that table-opening time increases with the number of tables. Operations that open tables map onto the file-opening operations provided by the operating system and, as such, are bound by the efficiency of the system's directory-lookup routines. Normally this isn't much of an issue, but it is something to consider if you'll need large numbers of tables in a database. For example, a MyISAM table is represented by three files. If you want to have 10,000 MyISAM tables, your database directory will contain 30,000 files. With that many files, you may notice a slowdown due to the time taken by file-opening operations. (Linux ext2 and Solaris file systems are subject to this problem.) If this is cause for concern, you might want to think about using a type of file system that is highly efficient at dealing with large numbers of files. For example, ReiserFS exhibits good performance even with large numbers of small files. If that is not possible, it may be necessary to reconsider the structure of your tables in relation to the needs of your applications and reorganize your tables accordingly. Ask whether or not you really require so many tables; sometimes applications multiply tables needlessly. An application that creates a separate table per user results in many tables, all of which have identical structures. If you wanted to combine the tables into a single table, you might be able to do so by adding another column identifying the user to which each row applies. If this significantly reduces the number of tables, the application's performance improves.

As always in database design, you must consider whether this particular strategy is worthwhile for a given application. Reasons not to combine tables in the manner just described are as follows:

Increased disk space requirements. Combining tables reduces the number of tables required (decreasing table-opening times) but adds another column (increasing disk space requirements). This is a typical time versus space tradeoff, and you'd need to decide which factor is most important. If speed is paramount, you'd probably be willing to sacrifice a little extra disk space. If space is tight, it might be more acceptable to use multiple tables and live with a slight delay.

Security considerations. These may constrain your ability or desire to combine tables. One reason to use a separate table per user is to allow access to each table only to that user by means of table-level privileges. If you combine tables, data for all users will be in the same table.

MySQL has no provision for restricting access to particular rows to a given user; thus, you might not be able to combine tables without compromising access control. On the other hand, if all access to the data is controlled by your application (users never connect directly to the database), you can combine the tables and use application logic to enforce row-level access to the combined result.

Another way to create many tables without requiring so many individual files is to use InnoDB tables. The InnoDB handler associates only a .frm file uniquely with each table and stores the data and index information for all InnoDB tables together in the InnoDB tablespace. This minimizes the number of disk files needed to represent the tables, and it also substantially reduces the number of file descriptors required for open tables. InnoDB needs only one descriptor per component file of the tablespace (which is constant during the life of the server process) and briefly a descriptor for any table that it opens while it reads the table's .frm file.

MySQL Status and Log Files
In addition to database directories, the MySQL data directory contains a number of status and log files, as summarized in Table 10.1. The default location for these files is the server's data directory, and the default name for each of them is derived using the server host name, denoted as HOSTNAME in the table.

Table 10.1. MySQL Status and Log Files File Type Default Name File Contents
Process ID file HOSTNAME.pid The server process ID
General query log HOSTNAME.log Connect/disconnect events and query information
Slow-query log HOSTNAME-slow.log Text of queries that take a long time to process
Update log HOSTNAME.nnn Text of queries that modify data
Binary update log HOSTNAME-bin.nnn Binary representation of queries that modify data
Binary update log index HOSTNAME-bin.index List of current binary update log files
Error log HOSTNAME.err Startup and shutdown events and error conditions


The Process ID File
The server writes its process ID (PID) into the PID file when it starts up and removes the file when it shuts down. The PID file is the means by which a server allows itself to be found by other processes. For example, if the operating system runs the mysql.server script at system shutdown time to shut down the MySQL server, that script examines the PID file to determine which process it needs to send a termination signal to.

The MySQL Log Files
MySQL can maintain a number of different log files. Most logging is optional; you can use server startup options to enable just the logs you need and also to specify their names if you don't like the defaults. This section describes the log files briefly. For more information about the logs and the options that control the server's logging behavior, see Chapter 11.

The general log provides general information about server operation: who is connecting from where and what queries they are issuing. The update log provides query information, too, but only for queries that modify database contents. The contents of the update log are written as SQL statements that can be executed by providing them as input to the mysql client. The binary update log is similar to the update log but is represented in a more efficient binary format. The accompanying binary log index file lists which binary log files the server currently is maintaining.

Update and binary update logs are useful if you have a crash and must revert to backup files because you can repeat the updates performed after the backup was made by feeding the logs to the server. This allows you to bring your databases up to the state they were in when the crash occurred. This procedure is described in more detail in Chapter 13. The binary logs are also used if you set up replication servers because they serve as a record of the updates that must be transmitted from a master server to slave servers.

Here is a sample of the kind of information that appears in the general log as the result of a short client session that creates a table in the test database, inserts a row into the table, and then drops the table:

020727 15:00:17 1 Connect sampadm@localhost on test
2 Query CREATE TABLE mytbl (val INT)
2 Query INSERT INTO mytbl VALUES(1)
2 Query DROP TABLE mytbl
2 Quit

The general log contains columns for date and time, server thread (connection) ID, event type, and event-specific information. For any line that is missing the date and time fields, the values are the same as for the previous line that does have them. (In other words, the server logs the date and time only when they change from the previously logged date and time.)

The same session appears in the update log as follows. The statements include terminating semicolons, allowing them to be given as input to the mysql program should the updates need to be repeated for a recovery operation.

use test;
CREATE TABLE mytbl (val INT);
INSERT INTO mytbl VALUES(1);
DROP TABLE mytbl;

For the update log, an extended form of logging is available by using the --log-long-format option. Extended logging provides information about who issued each query and when. This uses more disk space, of course, but may be useful if you want to know who is doing what without trying to correlate update log contents with the connection events in the general log.

For the session just shown, extended update logging produces the following information:

# Time: 020727 15:00:17
# User@Host: sampadm[sampadm] @ localhost []
use test;
CREATE TABLE mytbl (val INT);
# User@Host: sampadm[sampadm] @ localhost []
INSERT INTO mytbl VALUES(1);
# User@Host: sampadm[sampadm] @ localhost []
DROP TABLE mytbl;

The extra information is written using lines that begin with '#' so that they are interpreted as comments if you feed the log to mysql for execution by the server.

The error log contains a record of diagnostic information produced by the server when exceptional conditions occur. It's useful if the server fails to start up or exits unexpectedly because it often will contain the reason why.

Logs can grow quite large, so it's important to make sure they don't fill up your file system. You can expire the logs periodically to keep the amount of space that use within bounds. For information on log file maintenance.

It's a good idea to make sure your log files are secure and not readable by arbitrary users because they may contain the text of queries that include sensitive information, such as passwords. For example, the following log entry displays the password for the root user; it's certainly not the kind of information you want just anyone to have access to:

020727 15:47:24 4 Query UPDATE user SET
Password=PASSWORD('secret')
WHERE user='root'

The logs are written to the data directory by default, so securing your logs is a matter of securing the data directory against being accessed through login accounts other than that of the MySQL administrator.

No comments:

Post a Comment