In-Memory Databases At A Glance
As processors get faster, reducing the latency between the processor and data storage becomes more important. With memory prices falling and system RAM capacity steadily increasing, in-memory databases are gaining a lot of traction as a way to improve the performance of real-time OLTP (Online Transaction Processing) applications as well as OLAP (Online Analytical Processing) applications that process high volumes of data.
Data in an in-memory database (IMDB) system resides permanently in main physical memory, whereas in a conventional database system it resides on disk. Because data can be accessed directly from memory, IMDBs can provide faster response times than Disk Resident Databases (DRDBs). Main memory databases are also faster than disk-optimized databases because their processing algorithms are simpler and execute fewer CPU instructions. Accessing data in memory eliminates the seek time incurred when querying data on disk. In-memory data access is commonly quoted as 100 to 1,000 times faster than disk access, or more: memory access is measured in nanoseconds, whereas disk access is typically measured in milliseconds.
An obvious question is: “Can the same performance not be achieved by deploying a large cache, or by placing a traditional database on a RAM disk?” Caching disk-based data improves performance, but the benefit applies only to data retrieval; for insert and update operations, the data still has to be written to disk. Moreover, cache management functions such as cache synchronisation, lookup and eviction add significant overhead in aggregate, which makes data retrieval slower than true in-memory processing. Moving a traditional database to a RAM disk does not close the gap either: it will not perform as well as an IMDB, because its data structures and access algorithms are designed for disk-based systems and are not optimized for memory.
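The point that caching helps reads but not writes can be illustrated with a small sketch. The `WriteThroughCache` class and its counters are hypothetical names invented for this example, and a plain dictionary stands in for the disk-resident table; it is not how any particular product implements its cache.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a (simulated) disk-resident table."""

    def __init__(self, disk):
        self.disk = disk          # dict standing in for data stored on disk
        self.cache = {}           # in-memory copies of recently used rows
        self.disk_reads = 0
        self.disk_writes = 0

    def get(self, key):
        # A cache hit avoids the disk entirely.
        if key in self.cache:
            return self.cache[key]
        # A cache miss pays for one disk read, then keeps the value in memory.
        self.disk_reads += 1
        value = self.disk[key]
        self.cache[key] = value
        return value

    def put(self, key, value):
        # Every insert or update still has to reach the disk.
        self.disk_writes += 1
        self.disk[key] = value
        self.cache[key] = value


cache = WriteThroughCache({"k": 0})
for _ in range(3):
    cache.get("k")        # only the first read touches the disk
for i in range(3):
    cache.put("k", i)     # every write touches the disk
```

After this run the counters show one disk read for three lookups, but three disk writes for three updates, which is the asymmetry described above.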
Data Storage & Access
IMDB data resides in main memory, with a backup copy kept on disk. In some cases the database is very large or growing rapidly, and the entire database may not fit into memory. Most database vendors support lazy loading, where only essential data is loaded into memory at database start-up and the rest is loaded on demand.
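Lazy loading as described above might look roughly like the following sketch. The `LazyStore` class, the `loader` callback and the `DISK` dictionary are all assumptions made for illustration; real products decide what counts as "essential" and how data is paged in by their own rules.

```python
from typing import Callable, Dict, Iterable, List

Row = dict
Loader = Callable[[str], List[Row]]


class LazyStore:
    """Keeps tables in memory, loading non-essential ones on first access."""

    def __init__(self, loader: Loader, essential: Iterable[str]) -> None:
        self._loader = loader
        self._tables: Dict[str, List[Row]] = {}
        self.load_count = 0
        # Essential tables are loaded eagerly at start-up.
        for name in essential:
            self._load(name)

    def _load(self, name: str) -> None:
        self._tables[name] = self._loader(name)
        self.load_count += 1

    def table(self, name: str) -> List[Row]:
        # Any other table is pulled from the backing store on demand.
        if name not in self._tables:
            self._load(name)
        return self._tables[name]

    def resident(self) -> set:
        """Names of the tables currently held in memory."""
        return set(self._tables)


# Stand-in for the on-disk copy of the database.
DISK = {
    "accounts": [{"id": 1, "balance": 100}],
    "audit_log": [{"id": 1, "event": "login"}],
}

store = LazyStore(loader=lambda name: list(DISK[name]), essential=["accounts"])
```

Right after start-up only `accounts` is resident; the first call to `store.table("audit_log")` loads that table, and later calls reuse the in-memory copy.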
In-memory databases come in different variants depending on the data storage mechanism: an IMDB can be a row store, a column store, a NoSQL database or a NewSQL database. Most in-memory databases provide good SQL support and ship with ODBC/JDBC drivers. This enables applications to use an IMDB in place of an existing relational database without significant changes.
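To show why standard SQL support keeps application changes small, here is a minimal sketch using SQLite's in-memory mode through Python's built-in `sqlite3` module. SQLite is not one of the products discussed in this article and Python's DB-API is used here instead of ODBC/JDBC; both are stand-ins chosen only because they run without any setup. The `run_report` function and the `orders` table are invented for the example.

```python
import sqlite3


def run_report(conn):
    """Ordinary SQL; nothing here depends on where the database lives."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    cur.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
    )
    conn.commit()
    cur.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()


# ":memory:" asks SQLite to keep the whole database in RAM.
# Passing a file path instead would run the same code against disk.
conn = sqlite3.connect(":memory:")
totals = run_report(conn)
conn.close()
```

The only line that knows the database is in memory is the `connect` call, which is the kind of isolation that driver-based access is meant to provide.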
Backup & Recovery
Because the data resides in volatile memory, all stored information is lost when the device loses power or is reset. On their own, in-memory databases therefore lack support for the ‘Durability’ portion of the ACID properties, so it is extremely important to have a robust backup and recovery mechanism in place. Some of the features that in-memory databases use to ensure data durability are:
- Maintain a backup copy of the database on disk. Transaction logs are used to manage data modifications. When a transaction is committed, its details are written to transaction log files, which normally reside in flash (non-volatile) memory, and the changes are written asynchronously to the on-disk data files at regular checkpoints. After a system failure, the database manager restores its data from the latest checkpoint and then applies the logged changes to bring the database up to date.
The data should be backed up more frequently than disk-resident data. Unlike in a DRDB, all of the data may be lost when a memory board fails, and such a failure entails a time-consuming recovery process. Hence it is desirable to always have a recent backup copy available.
- Database replication: Maintain one or more duplicate copies of the data and have a fail-over mechanism that allows the system to switch to one of these standby databases when the primary database fails.
- Use non-volatile RAM that can survive a system reset or power failure.
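The transaction log and checkpoint mechanism from the first item in the list can be sketched as a toy key-value store. Everything here is an assumption made for illustration: the `DurableKV` class, the JSON file formats and the file names are invented, and the checkpoint is taken synchronously on request, whereas the text describes checkpoints written asynchronously at regular intervals.

```python
import json
import os
import tempfile


class DurableKV:
    """Toy in-memory key-value store with a transaction log and checkpoints."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "checkpoint.json")
        self.log_path = os.path.join(directory, "txn.log")
        self.data = {}
        self._recover()
        self._log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self):
        # Step 1: restore the latest checkpoint, if one exists.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        # Step 2: replay the log records written since that checkpoint.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        # Write the change to the log and force it to stable storage
        # before applying it to the in-memory data.
        self._log.write(json.dumps([key, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.data[key] = value

    def checkpoint(self):
        # Write a full snapshot, swap it in atomically, then empty the log.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        self._log.close()
        self._log = open(self.log_path, "w", encoding="utf-8")

    def close(self):
        self._log.close()


def crash_and_recover():
    with tempfile.TemporaryDirectory() as d:
        db = DurableKV(d)
        db.put("a", 1)
        db.checkpoint()      # "a" is now in the checkpoint file
        db.put("b", 2)       # "b" exists only in the transaction log
        db.close()           # simulate a crash: the in-memory state is dropped
        recovered = DurableKV(d)
        state = dict(recovered.data)
        recovered.close()
        return state


recovered_state = crash_and_recover()
```

The recovered store contains both keys: `a` comes from the checkpoint and `b` from replaying the log. Because each log record stores the full new value, replaying a record that is already reflected in the checkpoint is harmless.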
Leading IMDB Solutions
Most leading RDBMS vendors (Oracle, IBM, Microsoft, SAP, Teradata and others) have extended their platforms to support in-memory processing. SAP HANA, IBM DB2 with BLU Acceleration, Oracle TimesTen, Oracle 12c and Microsoft Hekaton are some of the leading IMDB solutions. Other competitive solutions for in-memory data processing include Aerospike (NoSQL, open source), Apache Cassandra (NoSQL), Kognitio (SQL & NoSQL) and VoltDB (NewSQL, open source). Some of these vendors claim 8 to 25 times faster reporting and analytics, and in some cases query answers that are more than 1,000 times faster.
IMDB in Action
In-memory databases are suitable for applications that require very fast data storage, access and analysis. Some of the applications that can benefit from an IMDB are:
- Real-time embedded systems such as set-top boxes, telecom equipment and consumer electronics
- Non-embedded applications that require real-time data processing, such as trading and financial services, in order to identify and explore market opportunities
- Batch processes that take hours to execute, which can be squeezed into minutes or even seconds
- Recommendation engines that deliver more personalized product recommendations to customers based on their buying patterns, preferences, order history etc.
- IoT applications that help to improve operational efficiency by providing proactive maintenance of various systems through real-time data analysis
- Real-time read/write cache for ecommerce systems
Even though IMDBs offer significant performance improvements over traditional DRDBs, they have some drawbacks that hinder their widespread adoption for OLTP. The main disadvantage is that recovery is time-consuming, because the database has to be rebuilt from the snapshot and the transaction logs. The impact of a failure can be reduced by running the IMDB in high availability mode using storage replication, but this requires the data to be replicated synchronously to another data node, which causes some performance degradation. To get the full benefit of an IMDB while ensuring data durability, you need additional hardware, which adds cost.
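The trade-off with synchronous replication can be seen in a minimal sketch. The `Node` and `ReplicatedStore` classes are hypothetical and everything runs in one process; in a real deployment each standby would sit on another machine, so every write would also pay a network round trip before it is acknowledged.

```python
class Node:
    """One database instance holding its data in memory."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class ReplicatedStore:
    """Primary plus standbys, with writes replicated synchronously."""

    def __init__(self, primary, standbys):
        self.primary = primary
        self.standbys = list(standbys)

    def put(self, key, value):
        # The write is acknowledged only after every standby has applied it.
        # This extra work per write is the source of the slowdown.
        self.primary.apply(key, value)
        for standby in self.standbys:
            standby.apply(key, value)

    def get(self, key):
        return self.primary.data[key]

    def fail_over(self):
        # Promote the first standby; it already holds every acknowledged write.
        if not self.standbys:
            raise RuntimeError("no standby available")
        self.primary = self.standbys.pop(0)
        return self.primary


store = ReplicatedStore(Node("primary"), [Node("standby-1")])
store.put("x", 1)
store.fail_over()          # the original primary is assumed to have failed
value_after_failover = store.get("x")
```

Because the standby applied the write before it was acknowledged, the value is still readable after fail-over without replaying a snapshot or log, at the price of slower writes and a second node.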
The performance benefits make IMDBs a strong candidate for OLAP applications, where business insight can be derived and decisions made in real time.
When choosing an IMDB for OLTP applications with high availability requirements, the cost of attaining high availability must be carefully weighed against the performance benefits before arriving at a decision.