ACID & OpenInsight (General)

What is ACID?

ACID simply stands for Atomicity, Consistency, Isolation and Durability which consist of a set of properties that guarantee that our database transactions are processed reliably. So that makes things a little clearer.

We can further illustrate that with an example. We all understand the basic interaction between bank accounts. When one bank account is debited, the funds are transferred through the banking system and into another account. In computing talk, this ‘single transaction’ consists of multiple changes to the system and the underlying database. The debtors account data is modified to show a debit on the account, various banking records are changed to reflect the status of the transfer of funds, bank charges are applied, send and receive confirmations are logged and finally the creditors bank account data files are updated to show the money transfer for the statement and the balance is updated, etc. In computing terms, all of these actions form ‘one single transaction’ and that transaction must be managed as a whole to maintain data and system integrity.

Enter ACID, an acronym first introduced in 1983 by Andreas Reuter and Theo Härder – If you Google ACID you will find plenty of information, but to summarise:

Atomicity - Requires that the whole of the transaction completes successfully. If any one part of the transaction fails then the whole transaction fails and the database remains unchanged. A failure can be for any reason whatsoever. It can be due to hardware, network or operating system issues, or simply a failure (bug) in the software itself.

For example, in our example above, if the receiving bank failed to receive the transferred funds, the transaction would break and system would reverse (or not commit) any data changes on the debtors’ account side, even though the computer processing may have already begun the process of debiting the account.

In short, the database remains in a condition as if the process never even started.

Consistency - Requires that the transaction remains consistent throughout its lifecycle. As far as the database is concerned, this means using validation to ensure that all defined rules are maintained (including all constraints, cascades and triggers) and that any programming errors do not result in any violation of those defined rules.

Most database application developers build various layers of validation into their systems to ensure consistency is maintained and this becomes a paradigm for good basic system design. In addition, developers often include double checks in their code to ensure consistency is maintained – much like an accountant running a trial balance or double entry book keeping.

For example, our banking system has two accounts. We know that A has a balance of 100 and B has a balance of 0 and that the system needs to complete a transaction that moves 10 units from A across to B. We simply deduct 10 units from A and add 10 units to B. We could stop there and assume that the transaction completed and consistency is maintained, but that would not satisfy the C in ACID.

As developers we need to go one step further and verify that the transaction completed and that the system is on a state where A now equals 90 and B equals 10, or more simply put, we run a check on the system (A+B=100). If A+B is not equal to 100, then we need to trap the error and not commit the data transaction to the database and leave the database in the state that it was in before we started. i.e. A=100 and B =0. Consistency is maintained.

Isolation - Isolation is the main goal of concurrency control and simply ensures that transactions are completed from beginning to end in a logical number of steps and that the system state is maintained throughout.

In our banking example, this would ensure that the various database transactions that make up the transfer of the funds across accounts are processed in a logical and coherent order. i.e. the debtor’s bank account is debited as a first step and their balance reduced, that a debit record is created to show the debit on the account and on statements, that the funds are then moved into the debtor’s banking processes, transaction log files are updated, etc. etc. If the transaction fails before the funds are passed into the creditor’s banking system, the transaction is stopped.

At this point the process remains isolated. In other words, the debtor’s bank’s side of the transaction contains a number of database changes waiting to be finalized (written to the database), but the creditor’s side of the transaction does not yet even know that there is a transaction underway – the initial debtor’s side processes are isolated from the creditor’s side of the process, or the creditor’s bank knows nothing about the transaction taking place.

In addition, Isolation needs to isolate two similar transactions that are made at the same time. For example, we have a multi-user system and two different users request to move some funds from Account A to Account B. As developers, we need to maintain isolation between the two transactions so that the first user’s transaction completes before the second users transaction begins. If the first transaction fails, the second remains unaffected but it can complete in its own right; the first being backed out, or reversed.

Durability - Durability simply means that the whole process is committed to the database once the transaction has completed and ALL data fields must be updated accordingly, even if there is a system failure for any reason, including a power outage, operating system crash, etc. In other words, the whole transaction has to complete without any loss of data and that the state of the system must remain intact. This requirement often requires the use of some form of non-volatile memory to temporarily maintain the various data changes so that they can be written once the system is back up and running again. Without the use of non-volatile memory, the data changes would be lost and the Durability element of ACID compromised.

For example, during our bank transfer the process breaks towards the end of the transaction – the debtors accounts has been debited, the logs updated and the creditors account is credited AND all of that information is held in memory awaiting a final commit to write to the database. We now want to free up the system for the user to carry on working whilst we run some validation checks before updating the database. Then the system experiences a power outage and crashes. The user rightly assumes that the data has been written away but it remains held in memory and the database is unchanged. Durability has been lost.

It therefore follows that the D in ACID really comes down to application development and design. In the example above, it would have been better to have simply held the user message until the validation checks had been completed and the data committed (written) to the database. So check that A=90, that B=10 and that C=100, where C=A+B. However, the developer still needs to tackle the requirement to update the database following an outage or other technical issue that halted the writing of the data to the database.

There are countless points of failure in any database system and most of those are beyond the developer’s control. We all include error trapping in our code, but how do you legislate for poor system management that results in the hard disk running out of space, or a network failure that results in a crash on the system or an operating system crash resulting in a reboot. As developers wishing to tick the ACID checkbox on our pre-sales checklists, we need to address these issues and the ACID paradigm helps us to do this. As a rule, disk writes should not be cached either.

Acid Support with OpenInsight.pdf

ACID & OpenInsight (General)

Created at 23 APR 2015 01:28PM

Revelation On-Line Wiki