Friday, April 2, 2010

Internal MySQL Storage Engines for Data Warehousing

Although any MySQL engine can be used for data warehousing, the ones that specifically lend themselves
to the data warehousing arena include:
• MyISAM
• Archive
• Memory
• CSV
• Merge
• Federated


MyISAM
The MyISAM storage engine is the default engine for MySQL. It offers high-speed query/insert capability, is
non-transactional, has table level locking (although it does have a concurrent insert feature that allows
inserts to be performed without blocking queries), and has good support for indexes (B-tree, full-text, etc.)
MyISAM is a good general engine for data marts and traditional data warehouses.

Archive
The Archive storage engine has already been covered in brief detail. It compresses data by up to 80% and
therefore offers good storage savings. In addition, it supplies fast table scans for large tables (>1GB), and
offers MVCC and row-level locking. Archive is also unique in that it only allows data to be inserted and
read, but never selectively modified (i.e. no UPDATE or DELETE), which makes it good for auditing data or
other sensitive information that should not be manipulated in any way.

Memory
The Memory engine is what one would expect – an engine that keeps all data in memory at all times. Main
memory tables are especially useful in data warehouses for dimension tables that are the object of joins
and single table scans, with MySQL memory tables offering very fast response times for both full table
scans and index lookups (B-tree and hash indexes are supported).


CSV
CSV tables allow flat file data that exists in comma delimited form to be accessed via SQL from within the
MySQL server. Data in CSV format can be instantly loaded into MySQL by simply creating a table object
that mirrors the format of the CSV flat file, renaming the CSV flat file on the operating system to the name
of the MySQL CSV table object, and all the data is immediately made available to the MySQL server. It
doesn’t matter if the file as one billion or 100 billion records, the data is instantaneously available for use. In
addition, data in CSV tables can be manipulated via DML inside MySQL or edited outside of the server with
file editors (and the proper file privileges).

Merge
The Merge engine is how basic data partitioning was accomplished in versions of MySQL below 5.1. A
DBA can combine 1-n identical MyISAM tables together to form one table object that can inserted, updated,
deleted, and queried. Each underlying MyISAM table can be placed on a separate physical disk drive than
the others and can have its own indexes. In MySQL 5.1 and higher, the Merge engine is still available,
however DBA’s can also utilize normal data partitioning (horizontal partitioning of rows with range, hash,
key, list, and composite being supported).


Federated
Finally, the Federated engine allows data warehouse designers to create one logical database out of many
different physical database servers. Working in much the same was as Oracle database links or SQL
Server Linked Servers do, the federated engine provides the ability to create distributed links to other
physical database servers and reference their objects as if they existed on the initial source server.
Other internal storage engines are developed and maintained by MySQL (Falcon, Blackhole, etc.), but the
above represent the engines that lend themselves best to a data warehousing installation.

No comments: