Paul Kiel's Data Integration Blog
Data integration using Xml / Xslt and anything else...





  Thursday, October 11, 2007


Came across this very interesting article on Economist.com about Capturing Talent in China. In some ways it isn't news, given the global need to capture and retain talent. But it says something that flies in the face of the outsourcing trend.

For example, this seems to say that the IT jobs and HR back-office functions in the west will not all be outsourced. The article says that in China "[e]ven a junior software-engineer can expect to take home $45,000 a year." Compare this with the west. According to Salary.com, a Programmer I position in the US market has a national average of about $53,000. (Its bell curve puts the 25th-to-75th-percentile range at roughly $44,000 to $63,000.) Now of course in the IT industry, many point to India instead. This article points to Indian salaries being 75% of US ones.

What is the benefit of outsourcing technical functions to such markets?

As the global economy becomes more integrated, there is a leveling effect on wages.

So from a data integration standpoint, outsourcing will have a major impact on connecting endpoints. But data integration will not become synonymous with outsourcing.


9:23:06 AM    comment []

  Tuesday, September 25, 2007


The reality of outsourcing has really hit home.  The first generation of outsourcing has been an expensive learning curve.  Forrester - about a year ago - claimed profitability in HR outsourcing was elusive.  Now over at Spending Matters, they ask whether the rate of outsourcing is beginning to slow now that the reality is clearer.

In terms of data integration, outsourcing presents the challenge of connecting more and more companies together.  The need for open standards such as OAGIS and HR-XML increases as more data flows between more endpoints.  If outsourcing is going to continue, we need to get better at data integration so that the cost of connection does not become a drag on the bottom line.





9:04:23 AM    comment []

Came across this funny video of "Greg the Architect".  It's from the folks at Tibco and illustrates the perspective of a software architect.

This episode: "SOA this.  SOA that."

Greg the Architect


8:53:42 AM    comment []

  Thursday, September 13, 2007


The ability to create a subset schema is a perennial request I get from clients in the data integration space. Using standards developed within their industry consortia, they need to restrict the base schemas into ones that reflect the needs of their particular business scenarios. A subset schema (or "lite BOD" in the OAGIS lingo) is a schema that adds additional restrictions onto an existing schema. For example, optional elements that are not implemented can be removed by deleting them from the schema. The resulting schema (called a subset) is still conformant to the original data model, and an xml instance of that subset schema will validate against the original data model schema. A small illustration appears after the lists below.

Restrictions can include:

  • removal of optional elements
  • making optional elements required
  • removing enumerated values from lists
  • making a choice of elements into a sequence of one of the choices

It cannot include (because these are additions or extensions and not restrictions):

  • making required elements optional
  • adding additional enumeration values to a list
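To make this concrete, here is a tiny hand-made illustration of subsetting (the element and type names are hypothetical, not drawn from any particular standard). First the base declaration, then its subset:

<!-- Base schema fragment: Email is optional, Status allows three values -->
<xsd:element name="Contact">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
      <xsd:element name="Email" type="xsd:string" minOccurs="0"/>
      <xsd:element name="Status" minOccurs="0">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="Active"/>
            <xsd:enumeration value="Inactive"/>
            <xsd:enumeration value="Pending"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<!-- Subset ("lite") fragment: Email removed, Status made required with one value dropped.
     Any instance valid against this subset is still valid against the base. -->
<xsd:element name="Contact">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
      <xsd:element name="Status">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="Active"/>
            <xsd:enumeration value="Inactive"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>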

A tool called GEFEG can perform this function. I wish there were others too. I've often wanted to create one myself using XForms and Xslt, but have never had enough time or someone committed to helping me do it. Anyone care to lend a hand?

Data integration would really be enabled if we had this subsetting issue resolved.


8:25:55 AM    comment []

  Monday, September 10, 2007


Interesting thread on the XML-DEV list about Xml as a long-term archival storage medium. Years ago I was trained as an archivist, so this topic was certainly of interest. The thread begins with a query as to what happens in 2027 with respect to Xml data. In that scenario, Xml is said to have disappeared from the scene (imagine the movie promo guy saying "...in a world where Xml has vanished...").
All that would be needed to read and manage this Xml data is the ability to understand ASCII or Unicode. And in fact, that is the case.

I did some coding for the State of North Carolina's archives and records division, prototyping the conversion of governors' records into Xml as a long-term storage medium. So in archival terms (meaning 2127 instead of 2027), all that would be needed to read this material is a program that understands Unicode.

Of course there is a physical issue too. One needs to be able to read the disk/harddrive/tape or other physical format in order to read the data as well.


4:28:46 PM    comment []

  Sunday, September 09, 2007


Putting in a plug for the Global Partnering & Integration Summit at the end of the month. I'm attending and am interested to see this event. HR-XML puts on good meetings, so I expect this one will be good as well. There is a mix of speakers I've heard and those I have not. It's a great lineup. Check out the schedule.
Having the meeting in Vegas (and hopefully it will not ALL stay in Vegas) puts this in an expensive venue. I've paid less for a hotel in midtown Manhattan.
Nevertheless - please come and let's chat about data integration!


9:48:51 AM    comment []

  Thursday, September 06, 2007


I got the APC_INDEX_MISMATCH error again, which is the equivalent of the blue screen of death on my Windows Vista laptop. Of course googling found lots of information, but little of it helpful.

Here is an unhelpful message on msdn.

Some folks point to CD/DVD drive problems or RAM issues.
  • I checked the drivers on my drives and found no problem (I don't even use my DVD drive, so it isn't related to frequent use)
  • Did a memory check using the Memory Diagnostic Tool (see the solutions page linked below) and found no error
  • An old-fashioned virus scan showed nothing (even though I know it isn't a virus - but hey, that is always the first response to a problem, right?)
  • Checked the hotfixes to see if they installed correctly and there was no problem there
  • Even used the Windows Driver Verifier utility, a first for me
Here is a great page for possible solutions.

Each time, the laptop reboots without error, and there is a long gap between occurrences. So it's working fine now (although I have to hold down the power button to get it to reboot).

Gotta admit that XP Service Pack 2 was more stable than Vista. ;-(




10:11:40 AM    comment []

  Friday, August 24, 2007


I've come across a few clients now with some internal debate as to whether SOA is a good idea. Sometimes of course it isn't. But when part of an organization buys into the value proposition and another doesn't, there can be serious conflict. A couple of clients I've had have been in the throes of this debate, and the interesting part was that the management and business analysts had bought into the idea, but the IT department was dead set against it.

As it turns out, I've noticed two common reasons why IT departments resist adopting SOA. The first reason is deadlines. This is where the IT department understands the value proposition and may even be enthusiastic about its possibilities, but is either understaffed or overcommitted to current, immediate concerns. In short, it's hard to think about a better house when the one you have is on fire. The only progress that can be made here is baby steps at best.

The second reason I've found that IT departments resist SOA is that they see it as "just another rearchitecting". These folks are long-timers and have heard many pitches about new technologies that will make their lives better. They are experienced and jaded enough to view SOA as just another one of these fads. What is even more interesting is that sometimes they are using SOA-like thinking but are just not calling it SOA. For example, focusing on discrete services they offer to clients. Enforcing best practices (governance) for those service architectures. Using standardized data models. All of these are good SOA features. Because SOA did not invent these features, some people use them but don't call it SOA.

In my book, I often avoid the term SOA when talking to clients, because it can bring up assumptions or opinions about the term rather than getting to what it means in terms of practices. So I don't care whether you call it SOA; if it works for your use case, just use it.


9:42:00 AM    comment []

  Saturday, August 18, 2007


Been reading a lot recently about HR-XML Resumes and the hResume microformat, especially on the HR-XML blog (see references). I was interested in the differences between them and which use cases might benefit most from each. As it turns out, the differences were minimal in terms of semantics, at least for the most commonly used resume fields.

I then got a wild hair and created an XSLT to transform an HR-XML Resume into an html document with the hResume microformat embedded in it. (The best way to learn something is to use it for real, right?) It really wasn't hard at all. First, I went to the http://hresume.weblogswork.com/hresumecreator/ tool to create a resume for myself using hResume. I used it as a model. I made no attempt to "pretty it up" as that was not the point. I took the formatting that they used without change. Then I took an HR-XML resume of myself and began to do the transform.
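The mapping itself is mostly a matter of wrapping HR-XML values in the hResume class names. A minimal sketch of the idea (the HR-XML element names and namespace here are simplified from memory, and the hResume markup is trimmed way down - this is not the full stylesheet):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:hrxml="http://ns.hr-xml.org"
    exclude-result-prefixes="hrxml">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/hrxml:Resume">
    <div class="hresume">
      <!-- Contact information maps onto an hCard -->
      <div class="vcard">
        <span class="fn">
          <xsl:value-of select="hrxml:StructuredXMLResume/hrxml:ContactInfo/hrxml:PersonName/hrxml:FormattedName"/>
        </span>
      </div>
      <!-- The objective maps onto the hResume summary -->
      <p class="summary">
        <xsl:value-of select="hrxml:StructuredXMLResume/hrxml:Objective"/>
      </p>
    </div>
  </xsl:template>
</xsl:stylesheet>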

Here is the result of the transform.

References:
Tapping into Competencies with hResume and a “Wikipedia of Skills”
http://www.hr-xml.org/blog/?p=161

hResumes and HR-XML’s Resume
http://www.hr-xml.org/blog/?p=158

hResume Microformat
http://microformats.org/wiki/hresume

Roger Costello on hResumes and “Unanticipated Mashups”
http://www.hr-xml.org/blog/?p=156


3:13:59 PM    comment []

  Monday, August 06, 2007


A brief item in the "anything else" category. I am proud to be driving a 1989 Honda Civic with over 155k miles. It has been a reliable car and I still get folks asking me if I want to sell it. But I won't!
1) It has had very few problems
2) It still gets over 30 mpg in city driving
3) I haven't had a car payment since 1993

Now that I've sold you on the value of my little hobby, I'll share with you another bonus on this beauty. I recently had a starter problem and the car would not crank. Ordinarily this would put folks in a spot - especially if caught away from home. But because it is a manual transmission, I can start the car magically by "popping the clutch". This is the process of letting the car build some speed via a hill and gravity, then suddenly engaging second gear to turn the engine over.

So here is a link to the procedure (they call it "push start" but the traditional term is pop the clutch):

http://www.ehow.com/how_7414_push-start-car.html

After popping the clutch, I continued on my way. A brief stop a few days later at the repair shop resulted in a $40 repair. How many $40 repair jobs exist anymore? I'll keep my little Honda!


10:38:06 AM    comment []

  Friday, July 06, 2007


Last fall I wrote an article published on Xml.com with my research into profiles of Xml Schema. (URL here - xml.com/pub/a/2006/09/20/profiling-xml-schema.)

The main point of the article was analyzing what groups of folks are doing in terms of creating profiles of Xml Schema that may reflect best practices as implemented.

The article was well received, and I got a couple of client contracts as a direct result. Now the article has apparently been translated into Russian. Here is the url - citforum.ru/internet/xml/profiling/.




12:26:53 PM    comment []

  Tuesday, June 12, 2007


A lot of folks talk about the centrality of the data model. They correctly emphasize that the actual technology you use is less important than the quality of your data model. If your data model does not adequately address the business problem you have, then no amount of great technology can save you from troubles.

In a sense, a data model is like an acting script and technology like an actor. A great actor can bring a great script to life. A bad actor can do equally well at destroying it. But no kind of actor can turn a bad script into an Academy Award winner. It is the script - the screenplay - the story - the data model that has the potential to make a happy ending. Enough on that analogy.

So if I want to create the world's best data model, which should I choose as the authoritative data model source? I've been in my share of discussions about which format is better, and it sometimes resembles operating system disputes in terms of intensity of opinion.

Suppose we have a UML class diagram, an Xml Schema, a C# class, a Java class, an XMI file, and a database file of a certain business object. Which of these would contain "THE" authoritative data model? What is the difference between them? Is one better than the others? Is one more Service Oriented than another?

Look at the objects listed below (UML class diagram, Xml Schema, Java class, C# class, XMI file, and database - some edited for simplicity). Which is the authoritative source?
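(The listings themselves are not reproduced here. Purely as a hypothetical stand-in for one of them, the Xml Schema rendition of such a simple business object might be as small as the fragment below - the names are illustrative only:)

<xsd:element name="Person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="GivenName" type="xsd:string"/>
      <xsd:element name="FamilyName" type="xsd:string"/>
      <xsd:element name="BirthDate" type="xsd:date" minOccurs="0"/>
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:ID"/>
  </xsd:complexType>
</xsd:element>

The same shape maps naturally onto a UML class, a C# or Java class, or a database table, which is exactly the point.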

As it turns out the data models are the same. And not only are they the same, but they were created from the same source!

The data modeler may say the UML is the definitive source. The data integration expert may say Xml goes on the wire, so the Schema is the definitive source. The programmer may say the code does the work, so it is the definitive source. And the DBA may say that in the end it all ends up in my database, so it is the definitive source.

The true authoritative source depends on audience, usability, and needs.

First, the audience (the customer) is always right and what they want to see will usually rise to the top of the food chain. So identifying the primary or secondary audience(s) is necessary when choosing an authoritative data format. If your audience is proficient in UML and you have lots of Rational licenses, then UML may be the best fit. If however, your audience doesn't know a thing about UML and just needs to get data from point A to point B, then perhaps Xml Schema is the best format. And so on.

Second, the usability of the model will separate the "good idea" formats from the most practical ones. This is a roundabout way of saying tools matter. The audiences need to have good tool support for the data formats you are aiming for. If the tools are ubiquitous and intuitive, then you have a good candidate for an authoritative data format source.

Third, and not intended to be last, the needs assessment is key. If your need is primarily to communicate an idea or a design to a wide organization, then the visual nature of UML may be a good fit. If the primary need is to pass Xml on the wire that only techies will see and support, then Xml Schema may be the choice. If the data is internal (within the firewall), the systems are homogeneous, and simple objects need to be passed, then code is a good option. Look at what the needs of your data format are ultimately going to be and let this affect the data modeling technology used.

Beauty is in the eye of the beholder. Whichever one you are most comfortable working with is the best model of the lot. If you like UML, then go with it. If Xml Schema, then use it.

The key again is not to get overly concerned about which form the data model is in, but to create the best data model with that technology.





12:34:45 PM    comment []

  Thursday, May 17, 2007


I had an interesting problem to solve recently regarding Xml Schemas, XSLT and namespaces. The task was to use XSLT to create Xml Schema dynamically. This was something I had done before and figured I would pull upon previous work to give me a head start. The interesting part was the requirements around creating namespaces. Dynamically created namespaces can be tricky enough, especially if, like me at the time, you are not yet using XSLT 2.0. The added feature was that I needed to use pre-defined namespace prefixes on those dynamically generated namespaces.

In short, I was tasked with generating namespaces in a resulting Xml Schema, based on data dynamically created at processing time, and with namespace prefixes predefined.

The temptation is to try something like this:

<xsl:attribute name="xmlns"><xsl:value-of select="$xmlns"/></xsl:attribute>

But that doesn't work. The solution is to create a dummy attribute and copy it to the result via the namespace axis and local-name.

In the scenario here, I want to create 3 namespaces, one for the default and target, a second for "common" components that are to be imported, and a third for "custom" components that are used for extensions.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://xmlns.myexample.com/v3"
            targetNamespace="http://xmlns.myexample.com/v3"
            xmlns:common="http://xmlns.myexample.com/common/v3"
            xmlns:custom="http://xmlns.myexample.com/custom/v3"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
  <xsd:import namespace="http://xmlns.myexample.com/common/v3" schemaLocation="common.xsd"/>
  <xsd:import namespace="http://xmlns.myexample.com/custom/v3" schemaLocation="custom.xsd"/>
</xsd:schema>
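For reference, here is a minimal sketch of one variant of that trick: the prefixes are pre-declared on the stylesheet itself, and their namespace nodes are copied onto the generated schema element by selecting them from the namespace axis by local-name (the local-name of a namespace node is its prefix). The URIs and prefixes are illustrative; this is not the production stylesheet, which handled the fully dynamic case with a dummy attribute in a similar fashion.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:common="http://xmlns.myexample.com/common/v3"
    xmlns:custom="http://xmlns.myexample.com/custom/v3">

  <xsl:template match="/">
    <!-- Build xsd:schema with xsl:element so only the namespaces we explicitly add appear -->
    <xsl:element name="xsd:schema" namespace="http://www.w3.org/2001/XMLSchema">
      <xsl:attribute name="targetNamespace">http://xmlns.myexample.com/v3</xsl:attribute>
      <xsl:attribute name="elementFormDefault">qualified</xsl:attribute>
      <xsl:attribute name="attributeFormDefault">unqualified</xsl:attribute>
      <!-- The trick: copy the namespace nodes declared on xsl:stylesheet above,
           picked off the namespace axis by local-name, onto the result element -->
      <xsl:copy-of select="document('')/*/namespace::*[local-name()='common' or local-name()='custom']"/>
      <xsl:element name="xsd:import" namespace="http://www.w3.org/2001/XMLSchema">
        <xsl:attribute name="namespace">http://xmlns.myexample.com/common/v3</xsl:attribute>
        <xsl:attribute name="schemaLocation">common.xsd</xsl:attribute>
      </xsl:element>
      <xsl:element name="xsd:import" namespace="http://www.w3.org/2001/XMLSchema">
        <xsl:attribute name="namespace">http://xmlns.myexample.com/custom/v3</xsl:attribute>
        <xsl:attribute name="schemaLocation">custom.xsd</xsl:attribute>
      </xsl:element>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>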

1:01:37 PM    comment []

  Sunday, May 06, 2007


I'll preface this by saying that I love the new features in XSLT 2.0. The very first features that I used were ones that took care of some problems around grouping of nodes and finding unique ones. The xsl:for-each-group is a life saver. Previously, I'd used tool extensions such as Saxon's saxon:distinct() function to solve the problem. However, depending on the client, I may not have had the choice to use 2.0 or extensions. So I was stuck with trying to find out how to group nodes or find unique nodes using XSLT 1.0 features and without proprietary extensions.

Some solutions have been documented, most notably by Jeni Tennison, who describes the Muenchian Method for grouping nodes. And this is probably the best one.

The problem is that the Muenchian Method requires the use of xsl keys. Some processors (and Jeni notes James Clark's XT is one) don't implement keys. I am a big fan of XT for certain applications and continue to use it on occasion. So I wanted to highlight how I've done grouping, iteration, and uniqueness without keys or extensions, using XSLT 1.0 techniques.

Assumptions:

  • I assume the same scenario as in Jeni's example: records in a database that contain contacts that need to be sorted. Surnames need to be displayed only once (uniqueness), each with all matching forenames (iteration).
  • I am using XSLT 1.0 without any extensions
  • I cannot use keys.
  • I need to iterate through a unique list of nodes (there are likely duplicate values within nodes)
  • There are an unknown number of nodes to iterate through.

While this example is very simple and could be done with the preceding or following axes, a complex xml document may cause the XPath to get too confusing to track. This method keeps the XPath simple.
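For reference, here is a minimal sketch of the simple-case approach using the preceding-sibling axis (Jeni's contacts scenario; the element names are illustrative, and the write-up linked below describes the variant that keeps the XPath more manageable):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/Records">
    <!-- Keep only the first Contact for each distinct Surname value -->
    <xsl:for-each select="Contact[not(Surname = preceding-sibling::Contact/Surname)]">
      <xsl:sort select="Surname"/>
      <xsl:value-of select="Surname"/>
      <!-- Then iterate over every Contact sharing that Surname to list the forenames -->
      <xsl:for-each select="../Contact[Surname = current()/Surname]">
        <xsl:sort select="Forename"/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="Forename"/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>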

The whole story can be found here.


4:55:07 PM    comment []

  Sunday, February 18, 2007


For the last few months, I've been delving deep into the world of ebXML Core Components Technical Specification (CCTS) and the Xml Naming and Design Rules (NDR). It is a world that is becoming more and more the basis of standards development frameworks. Organizations such as OAGi have been at the forefront of its development. CIDX, HR-XML, ACORD, AIAG, UBL, and others have been involved in, are working on, or are at least examining core components as the basis for interoperability across domain standards.

As with any major development, its complexity is its greatest asset and its greatest weakness. On the latter point, I was often one who scratched his head trying to understand the acronym soup that is CCTS. I frequently felt that while the theory was good, the gap in education and awareness was the biggest hindrance to its adoption. Now that I have been deep into the specification, I feel much more enlightened. And I feel like there are some really interesting implementations in the pipeline that will help usher CCTS into a new phase as the basis for interoperability.

The very real problem with interop across domains is that either one side has to give up its well-developed and well-designed components in favor of another, OR both sides need to split the difference and agree to a common data model that neither had in full. For industries that have an established installation base, the cost of doing this is significant. So one first needs to develop the business need for making the change. Then, once the case is made at the business level, the work of finding a common framework begins. Certainly the convergence of groups such as the ones I've listed makes things easier, and it seems fairly clear where things are headed. This was not the case until relatively recently, I should add. Case in point is this cross-industry core components workshop held this past summer.

So watch for emerging support for CCTS in the future. I'd love to spend some time working on its usability, doing some educational pieces or developing training modules to make it easier to comprehend and get started with. By comparison, I look back at the great value of Roger Costello's educational efforts in advancing the complex spec that is Xml Schema. I would love to do the same for core components.


5:50:43 PM    comment []

  Thursday, December 21, 2006


Another great story from Joe McKendrick on the issue of auto generated services:

Do generated services invalidate SOA or reuse?

I say certainly not. Auto-generating services does not invalidate reuse or SOA. The secret sauce and key IP for businesses are the data/business models. If you can get to the point where the models drive everything (generate services, slice, dice, etc.), then you can still reuse because the models reflect this. The reuse can lie in the models and not just in the instantiation of the services.

Now while I use UML regularly with my clients, I am often critical of its ability to "generate everything". Some refer to this approach as Model Driven Architecture, as proposed by the OMG. The theory is great, but the reality is hard. In addition, there is more than one way to represent a model. At HR-XML, we used Xml Schema to represent the models and it was quite successful. I was in fact able to do some substantial generation of code based on these, such as for a validation web service used in their certification effort. The reuse was reflected in the models. This of course is not to say I recommend using Xml Schema for data models instead of UML. I am simply saying that there is more than one way to do it, each with its own strengths and weaknesses. (And in fact, I'm working with a client who wants to use RDF to do this same function.)

Now the idea of models driving everything is an ideal. Very few have such great models that they can drive everything regardless of what format they are represented in. So often it is the case that the instantiation of the services is where the reuse comes into play. And this is perfectly fine as well. Do what works for you. model driven architecture (lower case m, d, and a: meaning with or without UML) is one of the holy grails we are all looking toward.







8:07:44 AM    comment []

  Tuesday, November 21, 2006


I'm a bit late on this, but there are some real gems here. I found out about it from this posting. The SOA in Action virtual conference has some really good information in it. I've downloaded the powerpoints and found them very interesting.
12:03:31 PM    comment []

  Friday, November 17, 2006


Definition:

"dot oh" = version controlled terminology used as marketing speak to describe a concept

The concept may be a new one or it may simply be new to the marketing department. Most importantly, advocates want readers to believe it is new. The underlying concepts can be very meaningful, especially when one tries to compare trends (as in SOA versus Web 2.0). I just cringe at the shrink-wrapped packaging of fuzzily defined terms. So let's talk about the meat on the bones. Here is the rundown.

SOA 1.0 - client/server service oriented architecture. This is what we know today as SOA including soap/web services.

SOA 2.0 - event driven service oriented architecture. Defined as "a software architecture that defines how systems can be engineered and designed to sense and respond to events."

blogs.zdnet.com/service-oriented/

infoworld.com/article/06/05/17/

javaworld

Web 2.0 - Collaborative, fast development of applications. RSS/Atom, mashups, wikis and the like. Often uses existing technology in a new way to add value.

Web 3.0 - Semantic web. RDF and related semantically based applications.

http://blogs.zdnet.com/service-oriented/?p=753

http://www.hr-xml.org/blog/?p=98

I think these terms really have meaning when it comes to comparing trends. Comparing SOA (1.0) versus Web 2.0 is an interesting one. The idea behind these indicates two different views of how applications are to be developed. Take data integration (something I know a bit about). The traditional route of thinking is to create a data model of a business document, express it in Xml Schema, and ship it to my trading partner via services as SOAP (or message queue or ...). This reflects a certain architecture and modeling that is not trivial. In a Web 2.0 world, you could do data integration in simpler terms: use Xml and share it with trading partners via simple https, or even syndicate it via RSS. The former is basically architected, and the latter is mashed together. This reflects an interesting debate about how to develop apps and integrate systems. Each has its own strengths, and I think both will have their niches, so I think it is a false choice.

Comparing Web 2.0 with Web 3.0 is also an interesting one. Web 2.0 is intended to be quick, easy, leveraging existing apps, and collaborative in nature. The idea of a semantically based web (3.0 here) is basically the opposite of this. Many will disagree with me here, but creating a semantic web is very difficult and involves a lot of analysis, which is the antithesis of the quick mashup. Entire companies do nothing but mine business data in a semantic-web-based fashion. So here again, each of these concepts has its own niche.

As a sidebar, I am an old librarian/archivist and so am sympathetic to semantic web technologies. I looked at RDF a long time ago and have always wanted to justify using it in my work, but never had the opportunity. I even had a thought to use it at HR-XML. But I always got caught between the ideal thing to do and the reality of current demands.


9:13:11 AM    comment []

  Tuesday, November 07, 2006


Once again Joe McKendrick is spot on. The gist of this posting is that we can get past the buzzword effect and solve real-world business problems. Whether we call it proper SOA or not doesn't matter.

http://blogs.zdnet.com/service-oriented/?p=746


8:44:47 AM    comment []

  Sunday, November 05, 2006


After all the xml and xml schema tools I have tried over the years, I often come back to wanting to test each one to see which Xml Schema features it supports and which it doesn't. In order to test these, I need a test suite of features in the form of a sample xml schema and xml instance. I have tons of schemas accumulated from many different domains and sources. However, I wanted just a simple, single schema that packed in one of every schema feature that could fit in it. Searching the web did produce some test suites, such as the Test Collection and the Test Suite from the W3C. These are good, but they are spread across many files and are a bit much for my simple tool-test use case. I just wanted a really simple, single example file.

So here is an early version of this simple Xml Schema test file. I have a more detailed version of this with even more schema features as well as an xml instance conforming to it.
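(Just to give the flavor - this fragment is illustrative, not the linked file itself - a few features can be packed into a handful of declarations:)

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xsd:element name="TestRoot">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Item" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:simpleContent>
              <!-- simpleContent extension with a required attribute -->
              <xsd:extension base="xsd:string">
                <xsd:attribute name="id" type="xsd:ID" use="required"/>
              </xsd:extension>
            </xsd:simpleContent>
          </xsd:complexType>
        </xsd:element>
        <!-- a choice group -->
        <xsd:choice>
          <xsd:element name="Code" type="CodeType"/>
          <xsd:element name="Text" type="xsd:string"/>
        </xsd:choice>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- a named simple type with an enumeration facet -->
  <xsd:simpleType name="CodeType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="A"/>
      <xsd:enumeration value="B"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>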

A few other links:

UPDATE:

You can find the simple XSD here.

You can find the simple XML Instance here.

 


5:37:21 PM    comment []

  Friday, October 20, 2006


Some recent debate on SOA has come across the blogosphere with two theories:

1) SOA is a technical solution (not a business one) and the term has been hijacked by marketing hype.

2) SOA is nothing different than what has been done with the web from the beginning.

Some links:

blogs.zdnet.com/service-oriented/

redmonk.com/

Each of these perspectives has its underlying truths. I think some folks may be thinking too hard here. The fundamental problem for IT always starts with a business problem (or always should start with one). So in that sense I disagree with the first theory. If my mousetrap is working fine, why would I want to risk anything by changing it? Even if the new spiffy SOA architecture will save the world? A business problem at hand is the beginning. When the business folks say "we have a problem that needs fixing" or "we have an idea for a new service for our customers", then we get to IT. The technologists can architect the best way to solve the business problem using technology.

Now perhaps SOA has suffered from hype and been over marketed as a solution for everything. This part of the first theory I agree with. But it doesn't negate the value underneath it. It simply reflects the fact that SOA is a very loosely defined term. It can mean many things in many contexts. In the end we start with business problems, and apply architecture to best solve them. If we do this in a loosely coupled manner, we get SOA. Sound familiar?

Technology is the means to a business end and not an end in itself. Once a problem exists, then there is an opportunity perhaps not only to solve an immediate problem, but to plan for the future. Service orientation can occur during these architecting spaces in the development cycle.

Regarding the second theory: yes, the web itself is a great service oriented solution. eBay, Amazon, etc all provide services in a decentralized and decoupled way. SOA brings this same thinking inside an enterprise. I recently spent some time working on some data integration issues for a large company that had many acquisitions. With all the growth and subsidiaries, I don't know how they would even try to offer stovepipe solutions. Decoupled and service oriented is the only way possible to get anything done in the short term. They may or may not call it SOA, but it fits the bill in my book.


1:37:03 PM    comment []

  Thursday, September 21, 2006


In my original posting, I did some looking around at publicly available listings of support for Xml Schema features among code generation tools. In my follow-up research, I realized that with a bit more work it would make for a better article than a simple blog entry. Xml.com bought it and here is the full article:

xml.com/pub/a/2006/09/20/profiling-xml-schema


4:39:31 PM    comment []

  Tuesday, September 12, 2006


One of the key responsibilities I had at HR-XML was figuring out how our standard schemas would work within existing tools. As time went by, IDEs became more and more compliant with the spec. Then there was growing interest regarding code generators. Specifically, the idea is to generate stub code (or more) from the xml schema that gives you a head start on processing the xml that is valid against it. I'd done some testing and got some feedback on other tools. (An interesting list of data binding tools is here.)

I wanted to revisit this topic in the context of Xml Schema Profiling. Creating a profile of Xml Schema at one point was controversial, witness the former Xml Schema Profile work group activity at WS-I. This ended up being addressed in the W3C Xml Databinding activity now underway.

But we all know that things are rarely 100%. What I'd like to do is examine some of the documentation around what features of Xml Schema are routinely not supported in tools. Is there a sort of Xml Schema Profile consensus? Here I touch base with some of the more common ones:

Xsd.exe, Castor, XMLBeans, JAXB, CodeXS, XSDObjectGen, Systinet, webMethods, Dingo, and Xmlspy.
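To give a flavor of the constructs in question, features commonly cited as shaky in binding tools include substitution groups, mixed content, and xsd:redefine. A small illustrative fragment (not taken from any particular tool's documentation):

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- substitution group: Comment may substitute for Note -->
  <xsd:element name="Note" type="xsd:string"/>
  <xsd:element name="Comment" type="xsd:string" substitutionGroup="Note"/>

  <!-- mixed content: text interleaved with child elements -->
  <xsd:element name="Remark">
    <xsd:complexType mixed="true">
      <xsd:sequence>
        <xsd:element ref="Note" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>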

The whole story is found here.


10:54:41 AM    comment []

  Monday, September 11, 2006


Been checking out the new Xsd.exe tool that ships in VS.net 2005. While I only have the beta installed, it has already shown itself to be valuable. I've been checking this code generation tool to see if it has been improved. The last version I looked at had numerous problems and really was not worth the effort. While at HR-XML, we got so many complaints about this older version that we recommended folks find an alternative such as XsdObjectGen or CodeXS. Neither of these latter tools had the problems of the original Xsd.exe. I even exchanged some long emails on schema support in tools with the CodeXS people. I found them to be smart folks.

However, the new version of the Xsd.exe tool has indeed improved. The classes actually compile (no small feat given earlier experience) and there is better support for Xml Schema structures. I am impressed with it thus far.

This is not to say that Xsd.exe supports the entire Xml Schema spec (see the profile of the serialization class). Few tools do. Profiling Xml Schema is a subject I've blogged about in the past, and I am working on an update. Stay tuned for that.


5:19:34 PM    comment []

  Friday, September 01, 2006


Been hearing some wranglings over W3C and Consortium standards processes (see links at bottom). While I have not worked in the W3C (I've not had a company willing to pay for participation), I have worked with a Consortium standards process, namely HR-XML. It seems there are several issues:

  1. Openness. This is a common complaint of folks outside the Consortium process. And there is validity to it. The problem is that one needs to pay the electric bills too. At HR-XML, we were a globally scoped organization with an extremely small full time staff. We kept barriers as low as possible (such as having individual memberships), but ultimately there are costs that need covering. I'd be more concerned about "how" the Consortium spends its money rather than whether pay-to-play is valid to begin with.
  2. Diminishing returns. A general pattern for acceptance of standards is shooting for a good "1.1". The first version, 1.0, is the first thing that many people take note of, apart from the few who worked on it. Getting implementation feedback on 1.0 makes for a really good push toward a quality 1.1. After that, however, the law of diminishing returns kicks in. The number of problems you solve with 2.0 is often smaller than the number you solved with 1.1, and so on. The great base specs - XML, HTML, CSS, XSLT, etc. - solved many problems. But getting folks excited about working on CSS 5.0 or some such can be hard and not as compelling.
  3. Reinventing itself. Each year, Consortia need to reinvent themselves to justify to their members why they still need to exist. In essence, one has to constantly re-justify one's existence. So there is a tension between sticking to one's bread and butter and moving on to new ventures. Stray too far from your bread and butter and your membership withers. Don't venture into new territory and members wonder why they fork out fees. Here is a great article on associations.

    "Operating a professional or trade association has a number of sisyphusian characteristics. At each step of the organization's evolution, the core staff must deliver more novelty and interesting experiences in order to convey additional value. Each year, the cycle begins anew. The largest risk in the association business is that the enterprise will lose its relevance while navigating the hamster wheel of ongoing operations. The penalty for failure is mediocrity. The reward for success is rarely excellence. It's more like 'a little bit better than mediocrity'. It's a very tough grind."

  4. Decision by committee. This is used as a derogatory comment, but studies generally find that decisions made by a group are better than those done by individuals. It's the least bad mechanism.

  5. Democracy or marketshare-based influence. Another complaint is that large multinationals disproportionately influence the results. The assumption is that there should be a true democracy. Well, what kind of standards would actually get implemented if a one-person consulting firm had the same say as IBM or Microsoft? Yes, a true democracy is an ideal we hope to achieve, but there is a marketplace reality. These larger companies get disproportionate influence because the end result will fail without them. This is leverage. Lack of big multinational support does not mean standards can't succeed, but a standard would need to be really compelling to overcome this. I'm not saying it "should" be this way or that it's fair in any way. It simply is a fact and needs to be shown in the light. A successful Consortium must balance an ideal of democracy with the reality of marketshare-based influence.

It was Churchill who said democracy is the worst form of government, except for all those other forms ...

Here are links to the controversy and responses to it:

blogs.msdn.com/xmlteam

25hoursaday.com/weblog

docuverse.com/blog/

http://meyerweb.com/eric/thoughts/2006/08/14/angry-indeed/


5:29:09 PM    comment []

  Wednesday, August 30, 2006


I frequently get requests for good sites for people to get up to speed on various Xml-related technologies. In terms of usability, there are some standouts. Because I learn best with examples (annotated ones are even better), the Xml Schema tutorial at www.xfront.com is amazing. Roger Costello developed it. He helps bring a complex topic into the light. This will take you from novice to expert if you consume all the information.

A good "first place" to go is the very simple but effective W3Schools. www.w3schools.com. It is for the newbie and doesn't pontificate on any topic. Straight forward and easy to consume.

And by far the best place I have found for ongoing reference is Zvon.org, www.zvon.org. This site, while not exactly simple to use, contains a wealth of information and references on all kinds of Xml-related technologies. Lots of examples, downloadable tutorials, and quick reference sheets. Xml, Xslt, XPath, Xml Namespaces, CSS, Schematron - you name it and it's there. Whenever I get stuck and can't remember a feature I haven't used in a while, I refer to this site.


4:42:45 PM    comment []

  Tuesday, August 29, 2006


There has been quite a lot of debate around "Web 2.0 versus SOA". I've included a couple of detailed links below. Personally I think these approaches serve different functions and are complementary. Web 2.0's benefits "tend" to be outside the firewall. RSS, blogs, wikis, and mashups are great for connecting folks together collaboratively, quickly and simply. SOA, on the other hand, has benefits around how software and services are engineered. Yes, they result in connections outside the firewall, but they are big drivers in how software is developed to support services inside the firewall.

There is a great 2-part blog on this debate by Joe McKendrick (with lots of internal links to other material in this debate):

Web 2.0 or SOA? Web 2.0 and SOA? Let the Debate Begin! - Part 1

Web 2.0 or SOA? Web 2.0 and SOA? Let the Debate Begin! - Part 2


12:53:48 PM    comment []

  Sunday, August 27, 2006


I have often gotten questions regarding the use of default namespaces in Xslt. It is really an XPath issue, but it becomes an issue as folks progress from using Xml data documents (which handle default namespaces nicely) to doing transforms. They want to minimize the impact of namespaces, and so use the same logic in the Xslt as in their data documents. However logical the thinking is, a common gotcha is to try to use a default namespace in XPath statements in an Xslt template.

It is a real gotcha that trips up folks getting started with transforms. Instead of using a namespace prefix for every node of every XPath statement, they think that a default namespace could make their lives easier. It is a logical conclusion. Given this Xml snippet:

<BackgroundSearchPackage xmlns="http://ns.hr-xml.org">
  <Screenings>this is the screenings element content</Screenings>
  <AdditionalItems>this is the additional items element content</AdditionalItems>
</BackgroundSearchPackage>

Stylesheet writers think they can get the value of the child elements by using the XPath:

<xsl:value-of select="BackgroundSearchPackage/ Screenings/ AdditionalItems"/>

This of course does not work because it does not have a namespace reference. When they add in a namespace prefix, it looks like this:

<xsl:value-of select="hrxml:BackgroundSearchPackage/ hrxml:Screenings/ hrxml:AdditionalItems"/>

However, this seems awkward and unnecessarily repetitive. (An alternative would be an even more awkward syntax using "local-name()".) The logical conclusion is to define a default namespace in the Xslt, which would eliminate the repeated prefixes in all the XPaths. While quite logical, this in fact does not work.

Indeed, Xslt 1.0 writers need to get used to the prefixes in their XPaths, because they are necessary for addressing the nodes correctly. Xslt 2.0 addresses this common gotcha.
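For what it's worth, the 2.0 fix is the xpath-default-namespace attribute, which tells the processor which namespace unprefixed names in XPath expressions belong to. A minimal sketch (the select path mirrors the example above):

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://ns.hr-xml.org">

  <xsl:template match="/">
    <!-- No prefixes needed: unprefixed names in XPath now resolve to the HR-XML namespace -->
    <xsl:value-of select="BackgroundSearchPackage/Screenings/AdditionalItems"/>
  </xsl:template>
</xsl:stylesheet>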


6:06:48 PM    comment []

  Saturday, August 26, 2006


This annotated version of the Xml spec is something that I refer back to occasionally. Sometimes there is nothing as good as going back to the source. It answers questions like "why did they do that" and such.

http://www.xml.com/axml/testaxml.htm


9:44:49 PM    comment []

  Thursday, August 24, 2006


Each tool used for Xml-related development has its own tricks, shortcuts, and idiosyncrasies. One such tool is XmlSpy©. Anyone familiar with this tool in the context of Xml Schema creation knows of the three views it offers for editing data. The TextView offers the raw Xml, tags and all. The GridView shows the schema in a table-like display. This view's advantage is that it can fit a larger document into the editable screen. It takes advantage of Xml's well-formedness and shows it in nested boxes with a drop-down-like clickable format. Finally, the Xml Schema view offers a WYSIWYG graphical display that enables visual editing of schemas in a friendly format.

I say the three personalities of XmlSpy©, because each view allows one to validate the Xml Schema, and the results are not always identical. Indeed, most of the time the tool will show the same validation results in all three views. If there is an obvious error, or a simple one, then the same message is thrown regardless of the view. But on occasion, the three views differ on their results. I've gotten a few questions over the years about what should be done when faced with different responses of this tool based on the view.

A query to Altova, the company that makes it, regarding this behavior came back with some interesting results. They indicated that the validation code executed in the three views is different, although they did not say exactly how. I then asked which view is correct when presented with inconsistent results. They stated that the TextView was the most thorough, and therefore the most authoritative, validation path. We were advised to go with the TextView results and ignore the others.

I've been using XmlSpy for many years now and have come to the conclusion that the issue does not stop here. What I have found is that you should strive for all three views to agree on validation results. If even one view returns differing results on validation, then there is usually some issue to be resolved. It could be in the Xml Schema, it could be in the memory management of the tool, or it could be gremlins in the gears. Every time I have had differing results, I have either found some issue to be resolved with the schema, ended up restarting the tool and trying again, or reported a bug. In the first case, it required some work on my part to resolve; however, ultimately, all three views have agreed on validation before I was finished. So I recommend you make sure all three personalities agree with each other.


3:01:03 PM    comment []

Interoperability is a goal with which few people would disagree. In fact, it is always on the mind of those trying to integrate systems and use standards to help. However, while we can easily agree on it as a goal, its exact meaning can vary. Assumptions or corollaries about the true meaning of interoperability can affect the steps we take to achieve it.

So what does interoperability mean? Wikipedia defines interoperability as "the ability of products, systems, or business processes to work together to accomplish a common task". This strikes me as a reasonable definition at a high level. We can also define it in more detail as it applies to connecting systems together with data standards. In my view, we can break this down into three major categories or types, namely syntactic, transactional, and semantic.

Syntactic interoperability is the ability of trading partners' systems to understand each other outside any subject domain. Things such as SOAP as an enveloping technology for web services, HTTP as a transport medium, and Xml as a structure for containing data of any kind (with Xml Schema as an arbiter of that structure) are ways in which implementers can agree on syntax.

Transactional, or scenario, interoperability is an agreement on a certain setting for data transfer. Here we can model a use case that indicates a request-response pair of interactions that are expected on both ends. If we agree that "I'll send you a benefits enrollment for a subscriber" and that I'll expect a response indicating "the success or failure of that enrollment in my system", then we've agreed upon a scenario for passing data.

Semantic interoperability is the ability to agree on the meaning of the data itself. This can simply be getting your PersonName to mean the same thing as my PersonName. And even here, there are multiple levels. First, we need to literally define what a PersonName is so that we can agree. Second, we need to work on common usage. For example, I may key my system on the last names of persons, whereas you may only want the formatted version of a name so as to display it on a web page.
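A trivial illustration of that common-usage gap (element names here are hypothetical, not from a specific standard): both snippets mean "a person's name", but one is structured for keying and matching while the other is ready only for display.

<PersonName>
  <GivenName>Jane</GivenName>
  <FamilyName>Doe</FamilyName>
</PersonName>

<PersonName>
  <FormattedName>Jane Q. Doe</FormattedName>
</PersonName>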

While there are entire tomes written on this word, I wanted to at least blog on the three aspects of it that relate to data integration. Here is a brief roundup of some items on this topic.


12:55:04 PM    comment []


© Copyright 2007 Paul Kiel.
Last update: 10/11/2007; 9:27:05 AM.
