Relational Database Concepts

ACID (Atomicity, Consistency, Isolation, Durability) Properties

In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction.

Atomicity
Atomicity refers to the ability of the DBMS to guarantee that either all of the tasks of a transaction are performed or none of them are. For example, the transfer of funds can be completed or it can fail for a multitude of reasons, but atomicity guarantees that one account won't be debited if the other is not credited.
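
The funds-transfer example can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a reference implementation: the account table, the names alice and bob, and the "no negative balance" rule enforced in transfer() are all assumptions made for the example.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
            # Hypothetical business rule, checked in the middle of the transaction.
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # the debit is rolled back
final = balances(conn)
print(ok, failed, final)
```

The second call debits alice before the check fails, yet the rollback leaves no trace of that debit, which is the all-or-nothing behavior described above.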


Consistency
The consistency property ensures that the database is in a consistent state before the transaction starts and after the transaction is over (whether successful or not).
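
One common way a DBMS keeps the database consistent is by enforcing declared integrity constraints. The sketch below assumes a hypothetical account table with a CHECK constraint and uses SQLite only as a convenient stand-in; other systems report violations through their own error types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100)")
conn.commit()

try:
    with conn:
        # This update would leave the table in a state that violates the constraint.
        conn.execute("UPDATE account SET balance = balance - 500 WHERE name = 'alice'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

(balance_after,) = conn.execute(
    "SELECT balance FROM account WHERE name = 'alice'").fetchone()
print(rejected, balance_after)
```

The failed transaction leaves the balance unchanged, so the database is consistent both before and after it.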

Isolation
Isolation refers to the ability of the DBMS to make the operations in a transaction appear isolated from all other operations. This means that no operation outside the transaction can ever see the data in an intermediate state; for example, a bank manager can see the transferred funds in one account or the other, but never in both, even if the query runs while the transfer is still being processed. More formally, isolation means the transaction history (or schedule) is serializable. This is the property most frequently relaxed for performance reasons.
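
The bank-manager example can be approximated with two connections to the same SQLite database file. This is a sketch under stated assumptions: the account table and the writer/reader names are hypothetical, and the behavior shown is that of SQLite in its default journal mode, not a statement about every DBMS or every isolation level.

```python
import os
import sqlite3
import tempfile

def read_balances(conn):
    # fetchall() runs the query to completion, so the reader's lock is released.
    return dict(conn.execute("SELECT name, balance FROM account").fetchall())

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bank.db")
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)

    writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    writer.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    writer.commit()

    # Transfer in progress: the debit is done, the credit is not, nothing is committed.
    writer.execute("UPDATE account SET balance = balance - 30 WHERE name = 'alice'")
    seen_mid_transfer = read_balances(reader)  # the last committed state

    writer.execute("UPDATE account SET balance = balance + 30 WHERE name = 'bob'")
    writer.commit()
    seen_after_commit = read_balances(reader)

    writer.close()
    reader.close()

print(seen_mid_transfer, seen_after_commit)
```

The reader never observes the state in which alice has been debited but bob has not yet been credited.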

Durability
Durability refers to the guarantee that once the user has been notified of success, the transaction will persist, and not be undone. This means it will survive system failure, and that the database system has checked the integrity constraints and won't need to abort the transaction. Many databases implement durability by writing all transactions into a log that can be played back to recreate the system state right before the failure. A transaction can only be deemed committed after it is safely in the log.
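
The log-and-replay idea can be illustrated with a toy append-only log. TinyLog, its JSON record format, and the file name redo.log are hypothetical and chosen for brevity; real database logs are considerably more involved (checkpoints, undo information, torn-write protection).

```python
import json
import os
import tempfile

class TinyLog:
    """Hypothetical append-only redo log; not how any specific DBMS stores its log."""

    def __init__(self, path):
        self.path = path

    def commit(self, txn_id, changes):
        record = json.dumps({"txn": txn_id, "changes": changes})
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to stable storage
        return True  # only now may the caller be told "committed"

    def replay(self):
        """Rebuild the state by applying every logged transaction in order."""
        state = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    state.update(json.loads(line)["changes"])
        return state

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "redo.log")
    log = TinyLog(log_path)
    log.commit(1, {"alice": 70, "bob": 80})
    log.commit(2, {"bob": 60, "carol": 20})

    # Simulate a crash: in-memory state is lost, and a fresh object replays the log.
    recovered = TinyLog(log_path).replay()

print(recovered)
```

Returning from commit() only after the fsync mirrors the rule that a transaction counts as committed only once it is safely in the log.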


Need-to-Know for the Database Developer

The Rules of the Game: Codd's Twelve Rules

Many references to the twelve rules include a thirteenth rule, or rule zero:

A relational database management system (DBMS) must manage its stored data using only its relational capabilities. This is basically a corollary or companion requirement to rule #4.

1. Information Rule: All information in the database should be represented in one and only one way, as values in a table.

2. Guaranteed Access Rule: Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to a combination of table name, primary key value, and column name.

3. Systematic Treatment of Null Values: Null values (distinct from an empty character string or a string of blank characters, and distinct from zero or any other number) are supported in the fully relational DBMS for representing missing information in a systematic way, independent of data type.

4. Dynamic Online Catalog Based on the Relational Model: The database description is represented at the logical level in the same way as ordinary data, so authorized users can apply the same relational language to its interrogation as they apply to regular data.

5. Comprehensive Data Sublanguage Rule: A relational system may support several languages and various modes of terminal use. However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all of the following:
a. data definition
b. view definition
c. data manipulation (interactive and by program)
d. integrity constraints
e. authorization
f. transaction boundaries (begin, commit, and rollback).

6. View Updating Rule: All views that are theoretically updateable are also updateable by the system.

7. High-Level Insert, Update, and Delete: The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data, but also to the insertion, update, and deletion of data.

8. Physical Data Independence: Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representation or access methods.

9. Logical Data Independence: Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.

10. Integrity Independence: Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.

11. Distribution Independence: The data manipulation sublanguage of a relational DBMS must enable application programs and terminal activities to remain logically unimpaired whether and whenever data are physically centralized or distributed.

12. Nonsubversion Rule: If a relational system has or supports a low-level (single-record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules or constraints expressed in the higher-level (multiple-records-at-a-time) relational language.
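
Two of the rules above, the systematic treatment of nulls (rule 3) and the relational catalog (rule 4), can be observed directly in a small database. The following is a minimal sketch using Python's built-in sqlite3 module; SQLite is only a convenient stand-in, and the book and author tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")

# Rule 3: NULL is not equal to '', to 0, or even to another NULL.
# Comparisons with NULL yield NULL (None in Python); IS NULL is the proper test.
null_checks = conn.execute(
    "SELECT NULL = NULL, NULL = '', NULL = 0, NULL IS NULL, '' IS NULL, 0 IS NULL"
).fetchone()

# Rule 4: the catalog is itself a table, queried with ordinary SQL.
# sqlite_master is SQLite's catalog table; other systems name theirs differently.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

print(null_checks)
print(tables)
```

The same SELECT syntax serves both ordinary data and the catalog, which is the point of rule 4.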

Relational Database Normalization

The concept of database normalization is not unique to any particular relational database management system. It can be applied to any of several implementations of relational databases, including Microsoft Access, dBase, Oracle, etc.

The benefits of normalizing your database include:
1. Avoiding repetitive entries
2. Reducing required storage space
3. Preventing the need to restructure existing tables to accommodate new data
4. Increasing the speed and flexibility of queries, sorts, and summaries

Five numbered normal forms (1NF through 5NF) are commonly described, each building on its predecessor. It is generally recommended that relational databases be normalized through at least the third normal form. In order to normalize a database, each table should have a primary key that uniquely identifies each record in that table. A primary key can consist of a single field (an ID Number field, for instance) or a combination of two or more fields that together make a unique key (called a multiple-field, or composite, primary key).

1NF
The first normal form (or 1NF) requires that the values in each column of a table are atomic. By atomic we mean that there are no sets of values within a column.
One method for bringing a table into first normal form is to separate the entities contained in the table into separate tables. For a single table of book records, for example, this would result in Book, Author, Subject and Publisher tables.
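
As a rough illustration of removing a multi-valued column, the sketch below splits a comma-separated authors field into separate Book and Author rows plus a linking table. The sample rows, the column names, and the to_first_normal_form helper are hypothetical, and only the author entity is separated here to keep the example short.

```python
# Hypothetical un-normalized rows: the "authors" column holds a set of values.
raw_books = [
    {"isbn": "111", "title": "Databases", "authors": "Ann Lee, Bo Chen"},
    {"isbn": "222", "title": "SQL Basics", "authors": "Ann Lee"},
]

def to_first_normal_form(rows):
    """Split the multi-valued column so every column holds one atomic value."""
    books, author_ids, book_authors = [], {}, []
    for row in rows:
        books.append({"isbn": row["isbn"], "title": row["title"]})
        for name in (part.strip() for part in row["authors"].split(",")):
            # Reuse the id of a known author, otherwise assign the next one.
            author_id = author_ids.setdefault(name, len(author_ids) + 1)
            book_authors.append({"isbn": row["isbn"], "author_id": author_id})
    authors = [{"author_id": i, "name": n} for n, i in author_ids.items()]
    return books, authors, book_authors

books, authors, book_authors = to_first_normal_form(raw_books)
print(books)
print(authors)
print(book_authors)
```

Each author now appears once, and the linking rows record which author wrote which book.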

2NF
The second normal form (or 2NF) requires that the table is in 1NF and that every non-key column depends on the entire primary key. In the case of a composite primary key, this means that a non-key column cannot depend on only part of the composite key.
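
A partial dependency can be spotted in sample data by checking whether part of the key alone determines a non-key column. The determines helper and the order_items rows below are hypothetical; note that sample data can only refute a dependency, never prove that it holds in general.

```python
def determines(rows, lhs, rhs):
    """True if, in this sample data, the columns `lhs` functionally determine `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same left-hand side, different right-hand side
    return True

# Hypothetical table with the composite primary key (order_id, product_id).
order_items = [
    {"order_id": 1, "product_id": "A", "quantity": 2, "product_name": "Widget"},
    {"order_id": 1, "product_id": "B", "quantity": 1, "product_name": "Gadget"},
    {"order_id": 2, "product_id": "A", "quantity": 5, "product_name": "Widget"},
]

# product_name depends on product_id alone: a partial dependency (violates 2NF).
partial = determines(order_items, ["product_id"], ["product_name"])
# quantity is not determined by product_id alone; it needs the whole key.
quantity_partial = determines(order_items, ["product_id"], ["quantity"])

# One way to reach 2NF: move the partially dependent column to its own table.
products = {row["product_id"]: row["product_name"] for row in order_items}
print(partial, quantity_partial, products)
```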

3NF
Third Normal Form (3NF) requires that all columns depend directly on the primary key. Tables violate the Third Normal Form when one column depends on another column, which in turn depends on the primary key (a transitive dependency).
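
A transitive dependency such as emp_id -> dept_id -> dept_name is typically removed by moving the dependent column into its own table. The sketch below assumes hypothetical employee and department tables and uses SQLite purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Support');
    INSERT INTO employee VALUES (1, 'Ann', 10), (2, 'Bo', 10), (3, 'Cy', 20);
""")

# dept_name is stored once, so renaming a department is a single-row change.
renamed = conn.execute(
    "UPDATE department SET dept_name = 'Revenue' WHERE dept_id = 10").rowcount

rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employee AS e JOIN department AS d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(renamed, rows)
```

Had dept_name been kept in the employee table, the same rename would have touched every employee row in that department.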

2PL (2-Phase Locking) vs 2PC (2-Phase Commit)

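2PC and 2PL are protocols used in conjunction with distributed databases, although 2PL is also used for concurrency control within a single database node. They solve different problems.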

The two-phase locking protocol (2PL) deals only with how locks are acquired and released during a transaction, whereas the two-phase commit protocol (2PC) deals with how multiple hosts decide whether a specific transaction is written (committed) or not (aborted).

2PL says that a transaction first goes through a phase in which locks are acquired (the growth phase) and then through a phase in which locks are released (the shrinking phase). Once the shrinking phase has started, no more locks can be acquired during this transaction. In a typical database system, the shrinking phase takes place at commit or abort.
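
The two-phase rule itself is small enough to sketch as bookkeeping for a single transaction. The TwoPhaseTransaction class and the resource names are hypothetical, and the sketch deliberately omits a real lock manager (lock modes, conflicts between transactions, blocking).

```python
class TwoPhaseTransaction:
    """Hypothetical bookkeeping for one transaction's locks under 2PL."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # becomes True at the first release

    def acquire(self, resource):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot acquire after releasing a lock")
        self.held.add(resource)  # growth phase

    def release(self, resource):
        self.shrinking = True  # shrinking phase begins
        self.held.discard(resource)

    def release_all(self):
        """Release every lock at once, as is usual at commit or abort."""
        self.shrinking = True
        self.held.clear()

t1 = TwoPhaseTransaction("T1")
t1.acquire("row:alice")
t1.acquire("row:bob")
t1.release("row:alice")

try:
    t1.acquire("row:carol")  # not allowed: the shrinking phase has started
    violation = False
except RuntimeError:
    violation = True
print(violation, t1.held)
```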

The essence of 2PC is that after a transaction is complete and should be committed, a vote starts. Each node which is part of the transaction is asked to "prepare to commit". The node then checks whether a local commit is possible and, if so, votes "ready to commit" (RTC) [Important: changes are not being written to the database at that point]. Once a node has signaled RTC, it must be kept in a state where the transaction is always committable. If all nodes signal RTC, the transaction master signals them to commit. If one of the nodes does not signal RTC, the transaction master signals abort to all local transactions.
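
The voting and decision steps can be sketched in a few lines. The Participant class, its can_commit flag, and the two_phase_commit function are hypothetical stand-ins; a real implementation also needs durable logging, timeouts, and recovery of in-doubt transactions, none of which is shown here.

```python
class Participant:
    """Hypothetical node; `can_commit` stands in for its local checks."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote. A "ready" vote is a promise to stay committable.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Transaction master: commit only if every participant votes ready."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    decision = all(votes)
    for p in participants:  # phase 2: broadcast the decision
        if decision:
            p.commit()
        else:
            p.abort()
    return "commit" if decision else "abort"

outcome_ok = two_phase_commit([Participant("db1"), Participant("db2")])

nodes = [Participant("db1"), Participant("db2", can_commit=False)]
outcome_fail = two_phase_commit(nodes)
states_after_abort = [p.state for p in nodes]
print(outcome_ok, outcome_fail, states_after_abort)
```

A single "no" vote is enough to abort the transaction on every node, including those that voted ready.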

If all transactions follow the 2PL principle, their interleaved execution is always serializable. However, 2PL does not guarantee that the execution is deadlock-free.

Deadlock: two (or more) transactions, each of them waiting for a resource held by the other.

Deadlock Prevention Algorithms: Wait-Die, Wound-Wait
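
Both schemes decide what happens when a transaction requests a resource held by another, based on transaction timestamps. The sketch below assumes that a smaller timestamp means an older transaction and that timestamps are distinct; the function names and return strings are hypothetical.

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits; a younger requester dies (aborts)."""
    return "wait" if requester_ts < holder_ts else "abort requester"

def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds (aborts) the holder; a younger one waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"

# T1 started at time 1 (older), T2 at time 2 (younger).
older, younger = 1, 2

wait_die_old_requests = wait_die(older, younger)
wait_die_young_requests = wait_die(younger, older)
wound_wait_old_requests = wound_wait(older, younger)
wound_wait_young_requests = wound_wait(younger, older)

print(wait_die_old_requests, wait_die_young_requests)
print(wound_wait_old_requests, wound_wait_young_requests)
```

In both schemes only the younger transaction is ever aborted, so waiting can only go in one direction by age and no cycle of waiting transactions can form.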

