04 March 2009

Flexibility in data modeling

Martin Fowler posted an interesting article regarding Contradictory Observations in his bliki. Simply put, real-world data doesn't always fit our idea of how data models should be organized. Fowler uses the blood group of a patient as an example:

One thing the clinicians were very strong about was this need to capture contradictory information. I might have a note from the Royal Hope Hospital saying my blood type is A and another note from the Sisters of Plenitude saying my blood type is B. This would clearly be nonsense, blood types don't change. But that doesn't mean we cannot record these two bits of data. Without further investigation we don't know which one is correct.

The team solved this problem by recording observations rather than just simple attributes. The structure for the blood group problem looks something like this:

Inside the nodes you find the blood group and the hospital that made the observation. To sum up, we can say the following:

  • a patient can have multiple blood group observations
  • every observation contains metadata as well
  • there can be relationships between observations ("rejects")
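The three points above can be sketched in a few lines of plain Python. This is only an illustration, not Neo4j's API and not the structure Fowler's team actually used; the class names, field names and the "Hypothetical Lab" retest are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One recorded claim about a patient, plus metadata about its origin."""
    blood_group: str
    hospital: str  # metadata: who made the observation
    rejects: list["Observation"] = field(default_factory=list)  # "rejects" links


@dataclass
class Patient:
    name: str
    observations: list[Observation] = field(default_factory=list)

    def active_observations(self) -> list[Observation]:
        """Observations that no other observation of this patient rejects."""
        rejected = {id(r) for o in self.observations for r in o.rejects}
        return [o for o in self.observations if id(o) not in rejected]


# Two contradictory notes are both kept; neither overwrites the other.
royal_hope = Observation("A", "Royal Hope Hospital")
sisters = Observation("B", "Sisters of Plenitude")
patient = Patient("example patient", [royal_hope, sisters])

# Hypothetical follow-up: a retest that explicitly rejects one earlier note.
retest = Observation("A", "Hypothetical Lab", rejects=[sisters])
patient.observations.append(retest)

for obs in patient.active_observations():
    print(obs.hospital, obs.blood_group)
```

Note that the rejected observation is still stored on the patient; it is only filtered out when asking which observations currently stand.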

Now, this data isn't stored for its own sake, but to guide behavior. Test results end up as evidence used to arrive at conclusions that are correct, or at least likely to be correct. Our data structure now looks like this:
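In code, the step from evidence to a conclusion might look roughly like the following. This is a minimal sketch under my own assumptions: the `Evidence` and `Conclusion` names, the `rejected` flag and the simple majority rule are invented for illustration and are not taken from Fowler's article or from Neo4j.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    blood_group: str
    source: str
    rejected: bool = False  # assumption: set when another observation rejects this one


@dataclass(frozen=True)
class Conclusion:
    blood_group: Optional[str]  # None when the evidence is inconclusive
    evidence: tuple[Evidence, ...]  # the observations the conclusion rests on


def conclude(items: list[Evidence]) -> Conclusion:
    """Pick the blood group backed by most non-rejected evidence, if any."""
    usable = [e for e in items if not e.rejected]
    top = Counter(e.blood_group for e in usable).most_common(2)
    # No evidence, or a tie between the two leading groups: stay undecided.
    if not top or (len(top) == 2 and top[0][1] == top[1][1]):
        return Conclusion(None, tuple(usable))
    winner = top[0][0]
    return Conclusion(winner, tuple(e for e in usable if e.blood_group == winner))


undecided = conclude([
    Evidence("A", "Royal Hope Hospital"),
    Evidence("B", "Sisters of Plenitude"),
])
resolved = conclude([
    Evidence("A", "Royal Hope Hospital"),
    Evidence("B", "Sisters of Plenitude", rejected=True),
])
print(undecided.blood_group, resolved.blood_group)
```

The point of keeping the `evidence` tuple on the conclusion is that the reasoning stays traceable: if an observation is later rejected, it is clear which conclusions need to be revisited.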

At the end of his article Fowler concludes:

Most of the time, of course, we don't use complicated schemes like this. We mostly program in a world that we assume is consistent.

Here I would like to object a bit. What's so complicated about this, really? It should be quite straightforward to model this on a whiteboard. But how do we put this information into a database? When it comes to data storage we tend to think in terms of tables, as most DBMSs are table-based. However, there are alternatives, like graph databases. Actually, the screenshots are from the Neo4j graph database. I made them using Neoclipse, a Neo4j tool where I'm the main contributor at the moment. This is the full interface focused on the Albion Hospital node (using the trunk version of Neoclipse):

Let's get back to the more philosophical aspects for a moment. Why is it so hard for us to think in terms of a flexible graph structure? Why do we want all data structured in square tables? I think part of the problem is that we want behavior to be tied to classes of objects in a static way. That's a nice and simple model, but the question is how well it reflects the real world our applications try to mirror. Jim Coplien and others are developing some interesting thoughts on how roles that encapsulate behavior could be related to objects in a different manner. Read about the DCI architecture in the Lean Architecture book (draft version; pdf)!

4 comments:

Therese said...

Jim Coplien talked about DCI architecture at JAOO Aarhus 2008 - the presentation is online:
http://blog.jaoo.dk/2009/03/04/handling-architecture-in-the-agile-world/

Anonymous said...

Very interesting, Anders! I'm looking forward to a new demonstration of Neo4j someday.

Regards
Per

Moandji Ezana said...

I'd read Fowler's post, but hadn't thought about it in the context of Neo4J.

It feels more like decision-making than data-manipulation, so it would probably be cleaner to embed those decisions right at the node and relationship levels, rather than marshalling and unmarshalling them from dumb square tables to smarter objects, as we usually do.

Thanks for the example, it allowed me to articulate something I'd vaguely felt about Neo4J, without exactly knowing what it was.

Michael Hunger said...

The "others" also include Qi4j (http://qi4j.org), a Java-based composite-oriented programming framework that even has a Neo4j-based entity store for persisting its entities.

Michael
