I was first in line for David Marco's and Michael Jennings' new book on metadata and have now spent some time with it. Let me say immediately that this book is essential for the modern metadata professional, and in my world this also includes those of you doing configuration management or IT portfolio management. All in all, there is much of value in the work; careful thought and analysis are evident, and the material makes solid efforts at grappling with modern challenges such as Web services.
I want to emphasize that my perspective is from the overall IT portfolio and service management side. I'm not going to discuss the data metadata problem (databases, data models, data dictionaries, etc.) in depth, as I consider this area essentially a solved problem (more below).
From the IT service management (ITSM) perspective, the book strongly reinforces my core argument that metadata and ITSM are converging. Substantial coverage is given to the IT portfolio side of metadata, including hardware and software management. Metadata has never been just "data about data," as this book makes quite clear.
Some of the most interesting material in the book is the set of case studies on the industry-leading metadata organizations at Allstate and RBC Financial. I recommend this volume to the ITSM community for these case studies in particular. Configuration management (the ITSM version of metadata management) has impact analysis as a key goal, and by reading the case studies the ITSM reader can begin to understand just how pivotal a metadata practice is to many types of impact analysis.
But the book stops short of fully embracing the real challenge of IT service management. While Marco and Jennings insist that a metadata repository is not merely a "data warehouse for metadata," nowhere does the book address the processes necessary to keep the IT portfolio data up to date. A database consolidating data from operational sources for correlation and analysis is a data mart or warehouse, period, and that is the extent of the vision here.
Otherwise, they would have to be discussing IT processes like configuration management, change management, release management, and the other ITSM/ITIL process areas, as well as the software development lifecycle and data modeling/management. This book does not address these areas; it is mostly a review of the complex data structures necessary to contain metadata, with some attention to the technical considerations of running a production data mart.
(However, there is a good discussion of data stewardship from a process perspective.)
The relational data structures themselves are interesting as a basis for design, but have limitations. First, as I argue here, metadata has deep inheritance, recursion, and frequent many-to-many relationships. The object-oriented OMG metamodels (which are mentioned in passing) have much richer semantics, semantics that have proven useful repeatedly in practice.
Second, there are any number of critiques one can raise regarding Marco and Jennings' specific choices in their IT portfolio management metamodel. For example, their System entity is not recursively decomposable (it can't contain subsystems). The hardware units cannot be connected many to many (e.g. a SAN array supporting multiple servers), nor are logical execution environments (OS instances) distinguished from physical hardware; thus, the metamodel supports neither clusters nor multiple virtual OS images, both important hardware requirements. (See here for a detailed discussion of the UML 2 approach to these questions, which I find concise and elegant.)
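To make the three structural requirements above concrete, here is a minimal Python sketch. It is not the book's metamodel, nor the OMG's or the DMTF's; the class names (System, HardwareUnit, ExecutionEnvironment) and the helper environments_on are my own illustrative assumptions. It shows a recursively decomposable system, many-to-many hardware connections, and logical execution environments kept separate from physical hardware.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class System:
    """A system that can contain subsystems to any depth (hypothetical)."""
    name: str
    subsystems: list[System] = field(default_factory=list)

    def all_names(self) -> list[str]:
        """Return this system's name followed by every descendant's name."""
        names = [self.name]
        for sub in self.subsystems:
            names.extend(sub.all_names())
        return names


@dataclass(eq=False)
class HardwareUnit:
    """A physical device; connections to other devices are many to many."""
    name: str
    connected_to: list[HardwareUnit] = field(default_factory=list)

    def connect(self, other: HardwareUnit) -> None:
        """Record a symmetric connection between two devices."""
        if other not in self.connected_to:
            self.connected_to.append(other)
            other.connected_to.append(self)


@dataclass(eq=False)
class ExecutionEnvironment:
    """A logical environment (OS instance), modeled apart from hardware."""
    name: str
    runs_on: list[HardwareUnit] = field(default_factory=list)


def environments_on(unit: HardwareUnit,
                    environments: list[ExecutionEnvironment]) -> list[str]:
    """Names of the logical environments hosted on a given hardware unit."""
    return [env.name for env in environments if unit in env.runs_on]


# A system decomposed three levels deep.
billing = System("Billing", [
    System("Invoicing", [System("Tax Engine")]),
    System("Payments"),
])

# One SAN array supporting multiple servers.
san = HardwareUnit("san-array-1")
server_a = HardwareUnit("server-a")
server_b = HardwareUnit("server-b")
san.connect(server_a)
san.connect(server_b)

# A cluster spans two servers; two virtual OS images share one server.
cluster = ExecutionEnvironment("db-cluster", [server_a, server_b])
vm1 = ExecutionEnvironment("vm-1", [server_a])
vm2 = ExecutionEnvironment("vm-2", [server_a])
```

Because the environment-to-hardware association is itself many to many, the same structure covers both the cluster case and the virtual image case without special handling.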
IP Address is seen as an attribute of hardware, which is incorrect; MAC address or equivalent is the hardware attribute, and the IP address is bound to the hardware address via the OS network stack. (The authors should have turned to the DMTF metamodels rather than the OMG's at this level of the stack.)
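The distinction can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the DMTF model: the names NetworkInterface, OSInstance, bind, and addresses_for are hypothetical, and the addresses come from ranges reserved for documentation. The MAC address lives on the hardware object, while the IP bindings live on the OS instance.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NetworkInterface:
    """Physical interface: the MAC address is the hardware attribute."""
    mac_address: str


@dataclass
class OSInstance:
    """Logical environment whose network stack binds IPs to interfaces."""
    name: str
    # Maps a MAC address to the IP addresses bound to it by this OS.
    bindings: dict[str, list[str]] = field(default_factory=dict)

    def bind(self, nic: NetworkInterface, ip: str) -> None:
        """Bind an IP address to an interface; raises ValueError if invalid."""
        address = str(ipaddress.ip_address(ip))
        self.bindings.setdefault(nic.mac_address, []).append(address)

    def addresses_for(self, nic: NetworkInterface) -> list[str]:
        """IP addresses this OS instance has bound to the given interface."""
        return list(self.bindings.get(nic.mac_address, []))


nic = NetworkInterface("00:00:5E:00:53:01")
web_os = OSInstance("web-01")
web_os.bind(nic, "192.0.2.10")
web_os.bind(nic, "192.0.2.11")  # one interface may carry several addresses
```

Re-addressing the host changes only the bindings held by the OS instance; the hardware record and its MAC address are untouched.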
The project hierarchy is fixed at three levels (project, phase, and task); modern project management tools support any number of levels in work breakdown structures. My ITIL expert friend did not like SLA being tied directly to system; he feels it should be tied to a logical Service entity in turn composed of Systems and/or Processes. And so on. (BTW, all of the preceding examples are real requirements I have encountered and in many cases implemented.)
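Both of these requirements are easy to illustrate. The following Python sketch is a simplification under my own assumptions: the names WorkItem, Service, SLA, and systems_covered are hypothetical, and Processes are left out for brevity. It shows a work breakdown structure of arbitrary depth and an SLA attached to a logical Service rather than directly to a System.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkItem:
    """A node in a work breakdown structure; nesting depth is not fixed."""
    name: str
    children: list[WorkItem] = field(default_factory=list)

    def depth(self) -> int:
        """Number of levels from this item down to its deepest descendant."""
        return 1 + max((child.depth() for child in self.children), default=0)


@dataclass
class System:
    name: str


@dataclass
class Service:
    """A logical service composed of one or more systems."""
    name: str
    systems: list[System] = field(default_factory=list)


@dataclass
class SLA:
    """An SLA tied to a Service, not directly to a System."""
    service: Service
    availability_target: float  # e.g. 0.999 for "three nines"


def systems_covered(sla: SLA) -> list[str]:
    """Systems reached indirectly through the SLA's service."""
    return [system.name for system in sla.service.systems]


# Five levels, two more than a fixed project/phase/task hierarchy allows.
wbs = WorkItem("Project", [
    WorkItem("Phase 1", [
        WorkItem("Task 1.1", [
            WorkItem("Subtask 1.1.1", [WorkItem("Step 1.1.1.1")]),
        ]),
    ]),
    WorkItem("Phase 2"),
])

order_service = Service("Order Entry", [System("Web Store"), System("ERP")])
sla = SLA(order_service, availability_target=0.999)
```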
In sum, the models are thought provoking but will need extensive customization for any given shop. And that's fine; there is a need for pattern literature of this nature to bootstrap the practitioner. Even OMG metamodels frequently fall short.
However, the key question remains: how is all of this documentation to be kept up to date? Scanned in from what sources? Show me the tools that effectively capture software-to-server deployments, or EAI semantics. If they exist at all, they are either coming out of the UML space, or from management frameworks/discovery tools. If UML, why should I convert from UML back to relational? If from a management framework or discovery tool (e.g. HP OpenView or Altiris), these tools increasingly have their own repositories, and again, there is the mapping problem. (I think it is only a matter of time before HP, CA or IBM position their management frameworks as also handling metadata; HP OpenView integrates with HP ServiceDesk, which is a Configuration Management Database, the ITSM version of a metadata repository.)
The broader issue of standards also remains. These are highly complex data structures, and mapping/populating them is nontrivial. On the other hand, they represent entities that are common to all large IT organizations. There is a clear business case for standards here, which is why we have the OMG, DMTF, and DCML all playing in this space. (While DCML is a recent effort, the book's omission of the DMTF work is a notable oversight.) A standard serialization format (such as the OMG's XML Metadata Interchange) would be ideal, so that the metadata sources discussed might directly support integrated repositories, rather than point-to-point transformations. (And if you think business ETL is hard, wait till you see metadata ETL!)
Smaller organizations may see no point in the overhead of supporting such standards (OMG-compliant repositories, for example, are still uncommon, although several vendors have emerged). But for larger, more heterogeneous shops, standards are essential. The complexities are just too much to manage with purely custom builds. The OMG standard metamodels in particular are done by the best information theorists practicing today; they are the products of collective debate among some of the most senior engineers at the most advanced technology organizations. This book overlaps directly in many places with OMG work (the Common Warehouse Metamodel) that is deeper and more rigorous, a duplication of effort that I find unfortunate.
However, the book makes the reasonable argument that the OMG work is inaccessible to many shops because it is purely in UML, requiring an object-relational mapping capability in order to implement. The structures in Jennings and Marco can be directly implemented in any RDBMS, which is a powerful advantage. The book's authors do state that "Where the design areas of this book and CWM overlap, great effort was taken to make them structurally compatible..." and in fact, one does see marked similarities.
The Business Transactions model is an interesting and reasonable representation of much of the EAI space, down to detailed concepts such as Message Queue. However, the maintenance question again comes to the fore. Such metadata is increasingly stored in BPM/BPA tools, using standards such as UML or BPEL. (Either that, or it's in Visio diagrams that are semantically imprecise and cannot be scanned!) Organizations seeking to extract metadata from a BPEL repository into this non-standard metamodel will be faced with some very challenging issues of mapping and integration. Why not just use the BPEL metamodel, or its definitive UML mapping, directly?
Finally, back on the subject of IT Service Management/ITIL: the concepts of Configuration Item (CI) and CMDB continue to loom large over this debate. In data terms, a CI is a master supertype, used for operational purposes of change management. Anything under change control is a CI, and all CIs have certain attributes such as type, name, and description, and can be tied to enterprise change processes (e.g. through a Change Ticket entity). I have extensively critiqued the use of the CI concept as the only metadata structure, as important entity/relationship semantics become lost.
However, there is a business case for a CI supertype, as it provides a convenient management framework. First, it allows one to align diverse IT asset types with common change and configuration management processes. Second, the CI supertype is also a convenient object upon which to impose general classification taxonomies. In turn, the existence of such an abstracted supertype starts to drive us towards notations and metamodeling approaches that are more inheritance-friendly, such as UML. The alternative is "pure CI" tools such as the ITIL suites, which often have weak or nonexistent metamodels. Marco and Jennings do not address the concept of Configuration Item in any way. Until metadata theorists do, there will continue to be a gap between metadata and ITSM, and the vendors and advocates of weakly-typed CMDBs will continue to gain momentum, to the detriment of the more precise metadata approach.
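A minimal Python sketch of this idea follows. It is an illustration under my own assumptions, not any ITIL tool's or the book's model: the names ConfigurationItem, Server, Application, ChangeTicket, and by_category are hypothetical. A CI supertype carries the common attributes and the classification tags, typed subclasses keep their own entity/relationship semantics, and a change ticket can reference any CI.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class ConfigurationItem:
    """Supertype for anything under change control."""
    name: str
    description: str = ""
    categories: set[str] = field(default_factory=set)  # taxonomy tags

    @property
    def ci_type(self) -> str:
        """The CI type is derived from the concrete subclass."""
        return type(self).__name__


@dataclass(eq=False)
class Server(ConfigurationItem):
    cpu_count: int = 1


@dataclass(eq=False)
class Application(ConfigurationItem):
    # A typed relationship that a "pure CI" model would lose.
    runs_on: list[Server] = field(default_factory=list)


@dataclass
class ChangeTicket:
    """One change process for every kind of CI."""
    ticket_id: str
    affected: list[ConfigurationItem] = field(default_factory=list)

    def affected_types(self) -> list[str]:
        return sorted({ci.ci_type for ci in self.affected})


def by_category(items: list[ConfigurationItem], category: str) -> list[str]:
    """Names of the CIs carrying a given classification tag."""
    return [ci.name for ci in items if category in ci.categories]


db_host = Server("db-host-1", categories={"production"}, cpu_count=8)
payroll = Application("Payroll", categories={"production", "finance"},
                      runs_on=[db_host])
ticket = ChangeTicket("CHG-0001", [db_host, payroll])
```

The change process and the taxonomy operate on the supertype alone, while the Application-to-Server relationship remains strongly typed, which is the semantic precision a weakly-typed CMDB gives up.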
For further reading on CMDB vs. metadata repository, see:
The rise and resurrection of enterprise metadata: repository as CMDB
Repository proliferation: time for a freeze
We shouldn't need Configuration Management black belts
A CMDB rant
For further information on integrating metadata and IT portfolio management, see the ITIL volume on Application Management (reviewed here).
For an extended debate on metadata repository vs. CMDB, see here.
Many regards,
Charlie
