This is the second installment of my "fundamentals of integration metadata" series.
Entire series:
Part 1: Delving into the concept of IT "system"
Part 2: Tracing the integration spiderweb
Part 3: Software deployment
Part 4: An integration metamodel
Now that we have discussed the basic concepts around "system" and looked at some industry standard data structures for representing systems/applications, let's turn to our first look at how to document their interconnections.
We know that systems talk to each other, typically to exchange data. Informally, the following kind of diagram is often used in whiteboard sessions:
Our last discussion of system focused on its decomposition aspects, and the need for systems to own their components. We now move into the question of non-ownership system dependencies - the interconnections between systems. In order to understand the consequences of this, a little foundational work is needed.
Graph theory: the foundation
Metadata, or IT configuration management data (I see them as synonymous), presents unique problems compared to the data that IT manages on behalf of its partners. Financial, logistics, and HR data have deep roots in paper-based history; a purchase order or hiring authorization message can be traced directly back to its roots in the forms once routed by interoffice mail to in-baskets throughout pre-electronic corporations.
When one looks at a sales journal, or a stack of purchase orders, one generally sees consistency: the data model is the same for all the information.
The data also has limited interconnections. A purchase order may reference common employee lookup tables and product tables, resulting in data models that are relatively straightforward to understand:

With metadata, everything gets much more complex. Data metadata is the most tractable; tables (or entities) have columns (or attributes) and therefore building simple data dictionaries is straightforward.
But when one moves beyond this into technical metadata (i.e. configuration management) the data starts to take on new characteristics. In mathematical terms, it becomes graph-based; that is, it looks like this:
This kind of data presents well-known problems in storage, querying, and presentation, as it requires "any to any" data models and can rapidly become complex to the point of incomprehensibility (see this Google search for interesting references).
This kind of data is not typically encountered in business-centric systems that are the successors to forms-based paper processes. It is the kind of data stored by configuration management databases and metadata repositories when they move into managing technical metadata such as interconnections between network devices, integration flows, and so forth.
Application integration
When one says that a system (or a CI) may be connected to any other system, one is calling for a data model that essentially looks like this:
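As a rough sketch of what such an "any to any" structure amounts to, consider a generic item plus a link that can join any item to any other. The names `Item`, `Link`, and `neighbors` are hypothetical, chosen only for illustration; they are not drawn from any particular CMDB product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Any configuration item: a system, server, queue, and so on (hypothetical)."""
    name: str


@dataclass(frozen=True)
class Link:
    """An untyped connection from one item to any other item (hypothetical)."""
    source: Item
    target: Item


def neighbors(item, links):
    """Return the items directly connected to `item`, in either direction."""
    found = set()
    for link in links:
        if link.source == item:
            found.add(link.target)
        elif link.target == item:
            found.add(link.source)
    return found


a, b, c = Item("A"), Item("B"), Item("C")
links = [Link(a, b), Link(b, c)]
```

Nothing in `Link` says what the connection means, which is exactly why the structure can "solve" anything and explain very little.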
These structures are a standing joke among seasoned data modelers, as one can "solve" any data modeling problem with them in theory. One has little choice but to use something like them to represent system interlinkages, but careless use can be misleading. Consider this example again:
The casual viewer may well assume that the connections between A-B and B-C have something to do with each other; i.e. the diagram is saying that the same data is flowing from A to C (it is transitive). But nothing of the sort may be true, which is why large graphs of systems can be so misleading. Systems A and B may be exchanging personnel data, while systems B and C are exchanging financial data, and the two exchanges may have no dependency or relationship to each other at all.
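The pitfall can be shown with a small sketch. It assumes the personnel and financial exchanges from the example, treats each connection as a directed edge labeled with its data topic (an assumption; the diagram itself carries no such label), and uses a hypothetical `reachable` helper. An unlabeled traversal suggests that A reaches C, while a traversal restricted to one topic does not.

```python
from collections import deque

# Each edge is (source, target, data topic); topics follow the example in the text.
edges = [
    ("A", "B", "personnel"),
    ("B", "C", "financial"),
]


def reachable(start, edges, topic=None):
    """Systems reachable from `start`; if `topic` is given, follow only edges carrying it."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for source, target, edge_topic in edges:
            if source != current or target in seen:
                continue
            if topic is None or edge_topic == topic:
                seen.add(target)
                queue.append(target)
    seen.discard(start)
    return seen


untyped = reachable("A", edges)                      # {'B', 'C'}: looks transitive
personnel_only = reachable("A", edges, "personnel")  # {'B'}: no personnel data reaches C
```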
The following information model may help to clarify things:
This model gives us much more richness. First, it reifies "system relationship" as a first-class entity, which allows us to associate attributes and other entities - there are many different kinds of relationships, which may for example be implemented in turn by infrastructure and supported by parties.
This model in particular allows us to capture the precise data semantics flowing across the integration, and would be suitable for example for a message queuing architecture. When the relationship is tied to a data structure, it becomes possible to constrain a complex graph to interesting subsets of systems that are exchanging the same data topic. Logically a query would read something like,
Find all systems
joined by relationships
where the relationships are tied to data structure A.
(Note that relationships involving data structures are only one kind of relationship; the "System Relationship" entity would be subtyped in a more robust model.)
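The logical query above could be sketched as follows, with "system relationship" reified as its own class and tied to a data structure. The class names, the `systems_exchanging` function, and the sample systems (HR, Payroll, Ledger) are all hypothetical and stand in for whatever a real repository would hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str


@dataclass(frozen=True)
class DataStructure:
    name: str


@dataclass(frozen=True)
class SystemRelationship:
    """A first-class relationship between two systems, tied to a data structure."""
    source: System
    target: System
    data_structure: DataStructure


def systems_exchanging(data_structure, relationships):
    """Find all systems joined by relationships tied to `data_structure`."""
    found = set()
    for rel in relationships:
        if rel.data_structure == data_structure:
            found.add(rel.source)
            found.add(rel.target)
    return found


# Hypothetical sample data for illustration only.
hr, payroll, ledger = System("HR"), System("Payroll"), System("Ledger")
personnel = DataStructure("Personnel record")
journal = DataStructure("Journal entry")
relationships = [
    SystemRelationship(hr, payroll, personnel),
    SystemRelationship(payroll, ledger, journal),
]

names = sorted(s.name for s in systems_exchanging(personnel, relationships))
# names == ['HR', 'Payroll']
```

Because the relationship is an object in its own right, further attributes (supporting infrastructure, responsible parties) could hang off it without changing the query.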
The need for rich information models like the above continues to present a challenge for our CMDB vendors and their simplistic information models. Until CMDBs converge with metadata repository technology we will not have the solutions we need.
Next: Fundamentals of integration metadata III: software deployment
