16 February 2009

The future of RDBMS's

Tony Bain writes over at ReadWriteWeb on the question "Is the Relational Database Doomed?" While I think relationships are essential to data, there are certainly problems with RDBMS's and how they handle data. I mentioned a bit about it in my previous post. The article by Tony Bain is a nice wrap-up on RDBMS vs. key/value stores, but there's still a lot to discuss around it. At the beginning of the article, Tony Bain describes how RDBMS's function and says:

Those tables have constraints, and relationships are defined between them.

As far as I know, what you technically define in an RDBMS is constraints on foreign keys. You then choose to think of those keys as representing relationships. But as long as a corresponding key entry exists, there's no way for an RDBMS to tell if the application (SQL statement) got the relationship right! Usually it's also quite hard to see this from the SQL code. I think Pawel Lubczonok is with me on this one:

The word relational should be replaced with slightly relational. The relations reflected are only of the most trivial nature: key lookup. All other relations are embedded in programs that read some data and write some data.

That's why you should think twice about statements like this (from the article):

The inherent constraints of a relational database ensure that data at the lowest level have integrity.

"Lowest level" is a very apt description here, in my opinion.
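To make "lowest level" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The person and book tables and their contents are made up for illustration and don't come from Bain's article. The foreign key constraint rejects a key that doesn't exist, but happily accepts a key that exists and points to the wrong person:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES person(id)
    );
    INSERT INTO person (id, name) VALUES (1, 'Alice'), (2, 'Bob');
""")

# Alice (id 1) wrote the book, but the application mixes up the ids.
# Key 2 exists, so the constraint is satisfied and the row is accepted.
conn.execute(
    "INSERT INTO book (title, author_id) VALUES (?, ?)",
    ("Alice's Memoirs", 2),
)

# A key with no corresponding entry is the only mistake the database can catch.
try:
    conn.execute(
        "INSERT INTO book (title, author_id) VALUES (?, ?)",
        ("Ghost Story", 99),
    )
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# The "relationship" the database now reports is simply whatever key was stored.
recorded_author = conn.execute(
    "SELECT p.name FROM book b JOIN person p ON p.id = b.author_id"
    " WHERE b.title = ?",
    ("Alice's Memoirs",),
).fetchone()[0]
```

Here dangling_rejected ends up True while recorded_author is Bob: the data has integrity at the key level, and nothing more.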

Bain goes on to say that the problem RDBMS's face today is that of scalability. On this point, Pawel Lubczonok wrote a response well worth thinking through:

What is being discussed is scaling to volume. yes rdbms scales badly. however there is other scaling that is much greater problem : scaling to complexity. here rdbms is hopeless, vast number of tables have to be created and this is solidified upfront.

Next, Bain goes on to describe key/value stores and compare them to RDBMS's. Commenting on the term "non-relational" for key/value stores, Lemon Obrien writes:

here's the deal: if you use a "key" in any way to access data, it's a relationship, aka Relational Database.

I don't get why people think keys are relationships. You can implement relationships in different ways, for example using keys or pointers (hello C/C++!). It's also possible to let the DBMS abstract away the details of this for you - after all, abstracting things away is what you have a DBMS for. Is everyone so obsessed with keys because they once put a lot of effort into understanding them?! In graph databases, relationships are first-class citizens of the model, so you don't actually need to know much about how they are implemented. As Bain doesn't mention graph databases or Neo4j, I'm happy to see someone else did.
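As a rough illustration of relationships as first-class citizens, here is a toy in-memory sketch in Python. The Node and Relationship classes, the relate and neighbours methods, and the sample data are all hypothetical names made up for this post. This is not the Neo4j API, just the general idea: a relationship is an object with its own type and properties, and you follow it without ever touching a key.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A thing in the graph; holds its own properties and its relationships."""
    properties: dict
    relationships: list = field(default_factory=list, repr=False)

    def relate(self, rel_type, other, **properties):
        # The relationship is a real object, registered on both ends.
        rel = Relationship(rel_type, self, other, properties)
        self.relationships.append(rel)
        other.relationships.append(rel)
        return rel

    def neighbours(self, rel_type):
        # Follow outgoing relationships of the given type; no key lookup involved.
        return [
            rel.end
            for rel in self.relationships
            if rel.type == rel_type and rel.start is self
        ]


@dataclass(eq=False)
class Relationship:
    """A typed, directed connection that can carry properties of its own."""
    type: str
    start: Node
    end: Node
    properties: dict


alice = Node({"name": "Alice"})
book = Node({"title": "Alice's Memoirs"})
wrote = alice.relate("WROTE", book, year=2009)

titles = [node.properties["title"] for node in alice.neighbours("WROTE")]
```

How the relationship is stored underneath (keys, pointers or something else) is left to the implementation, which is exactly the point.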

Andrej Koelewijn goes for the really big scale stuff, saying:

In my opinion the database implementation is getting less and less important, but the ability to view loosely coupled distributed data as consistent whole. We need to be able to treat the internet as a database.

In a blog post he takes this further and says "REST is a distributed data model". Interesting thoughts, especially if you have read Martin Fowler's post on the future of databases and integration.

My conclusion from all of this is: The future of databases is to combine different ways to store application data. Don't squeeze data into a model that isn't a good fit - at least for web-scale applications, it won't perform well. So there's a lot of fun here in learning about the new models that exist and inventing new ones!

3 comments:

Andrej said...

My response: http://www.andrejkoelewijn.com/wp/2009/02/16/your-data-wants-to-be-free/

Tony Andrews said...

A common mistake: the word "relational" has nothing to do with "relationships". It comes from the mathematical term "relation" - see http://en.wikipedia.org/wiki/Relational_database#Relations_or_Tables

Also you say "But as long as a corresponding key entry exists there's no way for a RDBMS to tell if the application (SQL statement) got the relationship right!" Well of course there isn't! No database or application can prevent you from recording untrue information, only logically inconsistent information.

Anders Nawroth said...

@Tony: regarding: "No database or application can prevent you from recording untrue information, only logically inconsistent information."

They can be more or less helpful. Compare pointers in C with references in Java. Java hides more of the implementation and thus prevents you from messing things up in some ways. I believe SQL forces you to think a lot about the implementation detail "key", when you should rather focus on your data. At the moment, SQL has defined our way of thinking about data models to such an extent that we actually have problems thinking about our data without thinking in terms of tables and keys!
