A log of articles I found for later reading. Not necessarily my point of view though.

Sunday, January 11, 2009

Database 2.0 - Part II

Object-oriented databases are naturally fast. Much faster than their SQL counterparts. They are re-emerging in the world of embedded devices, video games, CAD applications - the general class of desktop applications that require persistent storage but don’t necessarily have a large server to connect to. These applications can get their storage from an ODBMS, because it doesn’t rely on inefficient relational technology that warrants large and expensive computer hardware.
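
As a concrete illustration (my own sketch, not the article's; it assumes the open-source db4o engine and invented class names), this is roughly what embedded object persistence looks like: the application stores and queries its own objects directly, with no schema, no SQL and no mapping layer in between.

    import com.db4o.Db4oEmbedded;
    import com.db4o.ObjectContainer;
    import com.db4o.ObjectSet;

    public class OdbmsSketch {

        // A plain domain object; the ODBMS persists it as-is, no schema required.
        static class Account {
            String number;
            double balance;
            Account(String number, double balance) { this.number = number; this.balance = balance; }
        }

        public static void main(String[] args) {
            // The whole "database server" is a single local file - no cluster, no DBA.
            ObjectContainer db = Db4oEmbedded.openFile(Db4oEmbedded.newConfiguration(), "accounts.db4o");
            try {
                db.store(new Account("12-3456", 100.0));   // persist the object directly

                // Query by example: zero/null fields in the prototype act as wildcards.
                ObjectSet results = db.queryByExample(new Account("12-3456", 0));
                while (results.hasNext()) {
                    Account a = (Account) results.next();
                    System.out.println(a.number + " -> " + a.balance);
                }
            } finally {
                db.close();
            }
        }
    }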

Now consider a commercial application which operates in a transactional business model. These include banks, the stock exchange, telecommunications, billing systems and so on. Basically anything that relies not only on secure storage of customer data, but also on the storage of all those activities that have led to the present state. Each such activity is a transaction, and these, too, need to be stored and retrieved from time to time. Previously, object databases weren’t quite ready, while the relational juggernauts weren’t even competing in the transactional arena. Sure, a relational database offers COMMIT and ROLLBACK, but these serve purely to demarcate a transaction so that it can be recovered should a rat chew through the 30 amp power cord (God have mercy on its soul). No relational database will allow you to query the transaction records and, in fact, there isn’t even a SQL standard that governs this sort of “transitional” data. So if you are a financial institution in need of proper auditability of your customers’ actions, you have to follow one of two paths.
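
For the sake of clarity, here is a minimal JDBC sketch of that demarcation (the connection string, table and column names are invented): COMMIT and ROLLBACK decide whether the unit of work survives a failure, but they leave behind nothing queryable about the business transaction itself.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class DemarcationSketch {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/BANK", "app", "secret");
            conn.setAutoCommit(false);                    // open a unit of work
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("UPDATE accounts SET balance = balance - 10 WHERE account_no = 'A'");
                st.executeUpdate("UPDATE accounts SET balance = balance + 10 WHERE account_no = 'B'");
                conn.commit();                            // both changes become durable together
            } catch (SQLException e) {
                conn.rollback();                          // or neither of them does
                throw e;
            } finally {
                conn.close();
            }
            // Note: nothing above leaves a record that "A paid B $10" which you could later
            // SELECT; COMMIT marks a recovery point, it does not store the transaction.
        }
    }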

The first path is simple… at first. You buy an Oracle license. Make that 10 licenses, because you will need a cluster to make a relational database write records at an acceptable rate. You then plan out a schema, as you would, but allow for a few extra tables to store the transitional information. For instance, if you wanted to store the fact that customer A deposited $10 into customer B’s account, you would need a table with at least 3 columns: 2 for the customers’ account numbers and 1 for the amount, plus some additional columns such as a date/time to log when the transaction occurred and a unique identifier. Don’t forget to index this, by both account numbers and by the transaction ID, otherwise the data is next to useless. Presto! Now you can query this table with SQL and you have yourself a makeshift transaction processor.
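
A rough sketch of that makeshift table, with Oracle-flavoured types and names of my own choosing, created over JDBC:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class TransactionTableSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                        "jdbc:oracle:thin:@//dbhost:1521/BANK", "app", "secret");
                 Statement st = conn.createStatement()) {

                // The "transitional" table: who, to whom, how much, when, plus a unique ID.
                st.execute("CREATE TABLE txn_log ("
                         + " txn_id       NUMBER(19)    PRIMARY KEY,"   // unique identifier (indexed via the PK)
                         + " from_account VARCHAR2(20)  NOT NULL,"
                         + " to_account   VARCHAR2(20)  NOT NULL,"
                         + " amount       NUMBER(15,2)  NOT NULL,"
                         + " created_at   TIMESTAMP     NOT NULL)");

                // Without these indexes the data is next to useless for auditing queries.
                st.execute("CREATE INDEX txn_log_from_idx ON txn_log (from_account)");
                st.execute("CREATE INDEX txn_log_to_idx   ON txn_log (to_account)");
            }
        }
    }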

The second path is to purchase IBM CICS, or some other dedicated transaction processor that is inherently aware of not just the current state, but also all of the transitional elements that have collectively formed that state over time. But you don’t really want to do this. CICS is conceptually outdated and is generally reserved for legacy stuff. The impedance mismatch with object-oriented languages is enormous. Building a greenfield app with CICS now would be like building an LCD TV with valves. And, on top of everything, it locks you into a proprietary OS.

So back to the first path then. Now we have to track the transitional data, as well as everything else, in separate tables. When a transaction is processed, two independent sets of tables are modified. The workload has just doubled. Remember when I mentioned 10 Oracle licenses? Let’s make that 20.
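
Continuing the earlier sketch (again with hypothetical table and column names), each transfer now has to update the stateful accounts table and insert into the transitional txn_log inside the same unit of work, which is exactly where the doubled write workload comes from:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class DoubledWorkloadSketch {

        // One business action, two independent sets of writes.
        static void transfer(Connection conn, long txnId, String from, String to, double amount)
                throws SQLException {
            conn.setAutoCommit(false);
            try {
                // 1) Current state: move the money.
                adjustBalance(conn, from, -amount);
                adjustBalance(conn, to, amount);

                // 2) Transitional data: record that the transfer happened.
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO txn_log (txn_id, from_account, to_account, amount, created_at)"
                      + " VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP)")) {
                    ps.setLong(1, txnId);
                    ps.setString(2, from);
                    ps.setString(3, to);
                    ps.setDouble(4, amount);
                    ps.executeUpdate();
                }
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }

        private static void adjustBalance(Connection conn, String account, double delta) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE accounts SET balance = balance + ? WHERE account_no = ?")) {
                ps.setDouble(1, delta);
                ps.setString(2, account);
                ps.executeUpdate();
            }
        }
    }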

But outright power isn’t everything. This is a well-known fact in motorsport; known but not always appreciated in high performance computing. What about the flexibility of data?

A little while ago I received a statement from my bank, outlining the transactions on my account for the recent month. One transaction that caught my eye was an information message, which on that occasion stated “Your interest rate is now …”. But oddly enough, there was an entry beside this message in the credit column. The amount was $0.00. An innocent entry, no question about it. But it is obvious that the bank’s transaction processor is incapable of adequately persisting transactions of different types, and so the developers have simply shoehorned an information message into a credit transaction. This is the second largest bank in Australia, by the way.

When people piggyback a transaction processor atop a relational database, they face the same problem they did before, only now they have to solve it a second time. That is because transactions, too, are rich objects with an arbitrary structure that doesn’t fit the relational model. Personally, I’ve seen solutions that use XML fragments in VARCHAR fields and BLOBs to solve this problem. My bank has solved it in the least sensible way: through denormalisation. This really is an example of a square peg in a round hole, and if you happen to be a DBA or a back-end developer, you would have experienced worse examples than this.
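
To illustrate the mismatch, here is a hedged sketch of what those workarounds tend to look like (the class names, the txn_log2 table and the serialisation scheme are all invented): a small hierarchy of transaction types gets flattened into one row each, with whatever doesn’t fit pushed into an XML or BLOB column that SQL can no longer see inside.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Timestamp;

    public class RichTransactionSketch {

        // On the object side, transactions form a rich hierarchy...
        static abstract class Transaction {
            long id;
            Timestamp when;
            abstract String type();
            abstract String detailsAsXml();   // everything that doesn't fit the flat schema
        }

        static class Deposit extends Transaction {
            String fromAccount, toAccount;
            double amount;
            String type() { return "DEPOSIT"; }
            String detailsAsXml() {
                return "<deposit from='" + fromAccount + "' to='" + toAccount + "' amount='" + amount + "'/>";
            }
        }

        static class InfoMessage extends Transaction {
            String text;                      // e.g. "Your interest rate is now ..."
            String type() { return "INFO"; }
            String detailsAsXml() { return "<info text='" + text + "'/>"; }
        }

        // ...while the relational side sees one flat row per transaction, with the
        // type-specific structure shoved into a VARCHAR (or BLOB) column.
        static void persist(Connection conn, Transaction t) throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO txn_log2 (txn_id, txn_type, created_at, details_xml) VALUES (?, ?, ?, ?)")) {
                ps.setLong(1, t.id);
                ps.setString(2, t.type());
                ps.setTimestamp(3, t.when);
                ps.setString(4, t.detailsAsXml());   // opaque to SQL: no querying by account or amount any more
                ps.executeUpdate();
            }
        }
    }

The denormalised alternative, the one visible on my statement, is simply to reuse the deposit columns and record the information message as a $0.00 credit.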

So why did I do it?

Because I know that, given a few more years of ignorance on the part of people in my position, CICS will no longer be kicking, while Oracle and Sun’s new foster child MySQL will dominate the market. Not in any better form than they are now, just with less competition thanks to the Sun / Oracle duopoly. People will be using relational databases for everything and SQL will be taught in secondary schools. People are already accustomed to ORM frameworks for stateful data, and soon (if not already) this will naturally be applied to transitional data. And so there you have it: a new-age transaction processor, sitting on top of a MySQL cluster, running on a dozen machines and using O-R mapping frameworks to alleviate the pain of object-oriented transaction processing. Unstructured data is still stored as BLOBs and the whole system barely manages 100 transactions per second. It costs about $1,000,000 to purchase and requires a room full of cooling equipment.
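
The kind of system being predicted here might look roughly like the following JPA-style entity (a guess at the pattern, not anyone’s actual code; the table and column names are invented): an O-R mapped “transaction” whose real structure hides in a BLOB that the database cannot query.

    import java.util.Date;
    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Lob;
    import javax.persistence.Table;
    import javax.persistence.Temporal;
    import javax.persistence.TemporalType;

    @Entity
    @Table(name = "txn_records")
    public class MappedTransaction {

        @Id
        @Column(name = "txn_id")
        private long txnId;

        @Column(name = "txn_type")
        private String type;

        @Temporal(TemporalType.TIMESTAMP)
        @Column(name = "created_at")
        private Date createdAt;

        // The actual structure of the transaction, serialised and opaque:
        // the ORM maps it happily, but the database cannot query inside it.
        @Lob
        @Column(name = "payload")
        private byte[] payload;

        // getters and setters omitted for brevity
    }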

 

via http://blog.gtradenet.com/?p=25