In his November 24 article Database Thaw Martin Fowler wrote about the lock-in of relational/SQL databases and some possible alternatives. I was with him until he wrote that “you can’t get [a] bigger breach of encapsulation than that,” referring to a central database shared by multiple applications. (The indented quotes below are verbatim from Fowler’s article).
For many organizations today, the primary pattern for integration is Shared Database Integration - where multiple applications are integrated by all using a common database. When you have these Integration Databases, it’s important that all these applications can easily get at this shared data - hence the all important role of SQL. The role of SQL as mostly-standard query language has been central to the dominance of databases.
That’s a fair argument, though I don’t agree that SQL has been central to the dominance of databases, or central to making shared database integration the “primary pattern” for integration. Before SQL-based databases took over in the mid-1980s, integration through shared non-relational databases was already the norm. SQL standardizes how users and application get data in and out of the database, and how database schemas are described, but it’s just a language that more or less replaced many proprietary and unstandardized languages that did the same thing.
The heating of the database space comes from the presence of alternatives to integration - in particular the rise of web services. Under various banners there’s a growing movement for applications to talk to each other by passing text (mostly XML) documents over HTTP. The web, both in internet and intranet forms, has made this integration mode even more prevalent than SQL. This is a good thing, I’ve never liked the approach of multiple applications tightly coupled through a common database - you can’t get bigger breach of encapsulation than that.
I’ll go along with the first part, about applications talking with XML over HTTP. I hesitate to call that talking “integration,” though; it’s not comparable to the kind of integration trusted to databases. I’ll talk more about that point later. What jumped out at me was the statement “you can’t get a bigger breach of encapsulation” than a shared database. That struck me as a misunderstanding of what a shared database is for, and particularly a misunderstanding of how a relational/SQL database should be used.
All modern relational databases engines, with the arguable exception of MySQL, are intended to be more than dumb backing store. Database engines are designed to encapsulate data and the valid operations on the data, and to guarantee data integrity to multiple instances of multiple applications accessing the data simultaneously. Large enterprise databases are usually implemented like that. Using those databases, programmers don’t send raw SQL SELECT and UPDATE statements to the database server, and they don’t serialize objects into dumb tables for convenience. The API offered by the database to client applications can and should be the same kind of API you’d expect in a good object-oriented library: well-defined operations on abstract objects, with the underlying data and its representation invisible. If your code is doing this:
INSERT INTO invoices (customer, date, amount) ...;
UPDATE customers SET balance = balance + amount;
and not something like this:
NEW_INVOICE(customer, date, amount);
you are doing it wrong. A shared database should not expose the invoices and customers tables. Instead there should be an API implemented with stored procedures in the database, or a common data access layer written in a non-SQL procedural language to hide the underlying data representation.
I’ll restate Fowler’s concern like this: You can’t get a bigger breach of encapsulation than spreading knowledge of the data and operations on it across multiple applications.
It isn’t the database engine, or the concept of relational databases, or SQL that is the breach of encapsulation. It’s that far too many programmers are either unaware of how to use a database to promote encapsulation, or they don’t see the problem. My own experience is that a lot of programmers learned relational databases from MySQL, which until recently didn’t support things like transactions, stored procedures, views, and triggers, which are required to make a database into something more than a convenient backing store.
The view of databases as something that promotes bad software implementation is common. Ignorance of how relational databases work and how they should be used to make application integration easier and more robust is widespread and growing, evidenced by the ongoing “alternative database” movement that trumpets every reinvention of old ideas (key/value pairs, network databases, hierarchical databases, etc.) as something new and revolutionary. Fowler continues:
If you switch your integration protocol from SQL to HTTP, it now means you can change databases from being Integration Databases to Application Databases. This change is profound. In the first step it supports a much simpler approach to object-relational mapping - such as the approach taken by Ruby on Rails. But furthermore it breaks the vice-like grip of the relational data model. If you integrate through HTTP it no longer matters how an application stores its own data, which in turn means an application can choose a data model that makes sense for its own needs.
I can’t imagine a worse breach of encapsulation than integrating multiple “application databases” over HTTP. And I can’t imagine a worse threat to data integrity. If each application can have its own copy of data, and its own rules for operating on the data, either the applications are going to have a lot of duplicated code and data (a violation of the DRY principle to say the least), or each application’s copy of the data will be possibly wrong and untrustworthy.
SQL is not comparable to XML over HTTP, for reasons that should be obvious, but apparently aren’t. SQL is a declarative language for describing legal operations on a database. XML over HTTP is a data transport mechanism. XML+HTTP can’t address the problems already solved by relational database engines: integrity, consistency, reliability, transactions, locking, and so on. All of those things that are important to real applications will either be ignored or implemented over and over again, many times incorrectly.
Fowler concludes:
I don’t think this means that relational databases will disappear - after all they are the right choice for many situations. But it does mean that now application developers should think about what the right option is for their needs. As non-relational projects grow in popularity and maturity, more and more will go for other options.
I’ll agree that relational databases are not the right choice for everything, and that there are applications that don’t need to care about data integrity or encapsulation or sharing across applications. I hope that when developers look at those “other options” they carefully consider what they might be giving up going down the non-relational path. It will help if developers making those decisions have a serious understanding of relational databases and the powerful features of modern RDBMSs, and don’t just dismiss a successful and mature set of tools as an obsolete holdover from the computing stone age.
I’m always interested in what might be happening in the world of databases, which is why I read Martin Fowler’s article in the first place, but if this is what is happening — a giant step backward in data integrity and encapsulation — I am going to keep my relational database and SQL skills sharp for the coming Rails/MySQL, XML+HTTP, “application database” apocalypse.