Simeon Simeonov

The Evolution of XML Protocols

XML protocols can be broadly classified into two generations. First-generation protocols are based purely on XML 1.0. Second-generation protocols take advantage of two revolutionary XML standards - XML Namespaces and XML Schema. This article analyzes the reasons why we need to make a shift to second-generation protocols, and looks at industry activity in this area.

First-Generation XML Protocols
Generally speaking, a protocol specifies in detail how business, application or network services are accessed through a set of requests and how the responses are returned. In an XML protocol, both requests and responses are encoded in XML.

There were many interesting first-generation protocol efforts. They informed the XML community of important protocol requirements and particular approaches to satisfy these requirements. Unfortunately, few of the first-generation XML protocols achieved multivendor support and broad adoption. I mentioned one of them, Web Distributed Data Exchange (WDDX), in my article on B2B computing in the last issue (XML-J, Vol. 1, issue 2). Another one worth mentioning is XML-RPC.

XML-RPC is a remote procedure call protocol. RPCs identify the target procedure to be called, pass some parameters to it and receive a response. XML-RPC uses HTTP as the underlying transport protocol, but all call and response data is in XML. Listing 1 shows an XML-RPC call example.

First-generation XML protocols share many common characteristics. Generally they use a fixed set of "envelope" elements for specifying requests and responses. XML-RPC uses methodCall, methodName and methodResponse. They also use a fixed set of elements to identify data types. In Listing 1 these are i4 and string.
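
To make the envelope and type elements concrete, here is a minimal Python sketch that parses a hand-written XML-RPC call with the standard library's xmlrpc.client module. The method name examples.getStateName and both parameter values are assumptions chosen for illustration; they are not taken from the article's listing.

```python
import xmlrpc.client

# A hand-written XML-RPC request. methodCall, methodName, params, param and
# value are the fixed envelope elements; i4 and string are fixed type elements.
# The method name and parameter values are hypothetical.
call_xml = """<?xml version="1.0"?>
<methodCall>
  <methodName>examples.getStateName</methodName>
  <params>
    <param><value><i4>41</i4></value></param>
    <param><value><string>en</string></value></param>
  </params>
</methodCall>"""

# loads() returns the decoded parameters as a tuple, plus the method name.
params, method_name = xmlrpc.client.loads(call_xml)

print(method_name)
print(params)
```

Because the element vocabulary is fixed by the protocol, a generic decoder like this one can recover the native values without knowing anything about the application.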

First-generation protocols also share common problems.

Extensibility Through Namespaces
First-generation protocols weren't very extensible. The protocol architects had to reach agreement before any change could be implemented, and the protocol version had to be incremented so that tools could distinguish new protocol versions from old ones and handle the XML appropriately. For example, when XML-RPC and WDDX added support for binary data, both protocols had to update their specifications, and the implementations for every language and platform supporting the protocols had to be updated as well.

Needless to say, the overhead of constantly revising specifications and deploying updated tools for handling the latest versions of the protocols imposes limits on the speed and scope of adoption of first-generation protocols. Don't get me wrong: protocols like XML-RPC and WDDX do very useful things and are great at what they do. But it would be practically impossible to use them as the base for the One Generic XML Protocol to End All Protocols. The main issue is that the facilities of XML 1.0 on their own offer little help in creating extensible XML protocols. XML DTDs aren't designed with this in mind.

The extensibility problem boils down to the need to decentralize the evolution of XML protocol specifications. For example, TrustMe.com wants to add security support to an existing protocol. The company will define some XML format (schema) for representing security information that it will need to mix in with the existing protocol schema. How can it do so without any ambiguity?

In early 1998 the XML community, having realized the general importance of this problem, started work on Namespaces in XML. The original W3C Note stated: "We envision applications of XML in which a document instance may contain markup defined in multiple schemas. These schemas may have been authored independently. One motivation for this is that writing good schemas is hard, so it is beneficial to reuse parts from existing, well-designed schemas....These considerations require that document constructs should have universal names whose scope extends beyond their containing document." The effort became a W3C Recommendation in early 1999.

The namespace approach is simple and elegant. While a full discussion is beyond the scope of this article, a single example will give you a sense of the power of namespaces. There are two parts to using namespaces: you first identify them via URIs, then you identify the use of the namespace via a name prefix (see Listing 2). That's all it takes to integrate the work of TrustMe.com.
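
The two-step pattern can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration: the namespace URI http://trustme.example/security, the sec prefix, and the credentials and token elements are all hypothetical stand-ins for whatever TrustMe.com might define, mixed into an XML-RPC-style message.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for TrustMe.com's security vocabulary.
SEC_URI = "http://trustme.example/security"

# Step 1: the xmlns:sec attribute identifies the namespace by its URI.
# Step 2: the sec: prefix marks every element that belongs to it.
message = """<methodCall xmlns:sec="http://trustme.example/security">
  <methodName>orders.getStatus</methodName>
  <sec:credentials>
    <sec:token>abc123</sec:token>
  </sec:credentials>
</methodCall>"""

root = ET.fromstring(message)

# Namespace-aware lookup: the prefix used here only has to map to the same URI.
token = root.find("sec:credentials/sec:token", {"sec": SEC_URI}).text

# ElementTree expands qualified names to "{uri}local", so elements of the
# original protocol (no namespace) are easy to tell apart from extensions.
protocol_elements = [child.tag for child in root if not child.tag.startswith("{")]

print(token)
print(protocol_elements)
```

A tool that knows nothing about the security namespace can still process the elements it recognizes, because the extension elements carry universal names that cannot collide with the protocol's own.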

Because of their ability to promote the decentralized evolution and reuse of XML specifications and applications, namespaces can be considered the most fundamental advance in the XML standards space since XML 1.0.

Data Types and Validation Through Schemas
Without namespaces, most first-generation XML protocols stuck to a single DTD to describe the representation of serialized data in XML. Most implemented support for simple data types (strings, numbers, booleans and date-time values) as well as structured types (arrays, associative arrays, a.k.a. structures, and tabular data). In general, first-generation XML protocols used just a few XML elements. This made building tools supporting these protocols relatively easy. Everything seemed fine until people realized they'd lost the ability to declaratively specify what the protocols were working with.

Consider a simple example about people's names and ages. We can represent this information as:

<person>
  <name>Peter Smith</name>
  <age>42</age>
</person>

Say we want to represent the data in WDDX so that it can be freely exchanged between applications written in different programming languages and running on different platforms. We can do this as:

<struct>
  <var name='name'>
    <string>Peter Smith</string>
  </var>
  <var name='age'>
    <number>42</number>
  </var>
</struct>

So far, so good. We can process this chunk of WDDX to our heart's content. But wait a minute. How do we know that the WDDX data we are processing has to do with people's names and ages? For example, what if we get:

<struct>
  <var name='name'>
    <string>Fast Server</string>
  </var>
  <var name='price'>
    <number>1500</number>
  </var>
</struct>

This chunk of XML is also valid WDDX according to the WDDX DTD. However, it's most definitely not about people. It most likely has to do with hardware products. Another way to represent it using generic XML could have been:

<product>
  <name>Fast Server</name>
  <price>1500</price>
</product>

By now you must see what the problem is. It has to do with the fact that, by choosing to express your information using the single schema of an XML protocol, you've lost the ability to automatically perform validation on the data you receive. A validating XML processor will clearly be able to distinguish between <product> and <person>. This isn't the case once the information is represented in a common data format such as WDDX.

Is this a big problem? It depends on your viewpoint. I think it makes first-generation protocols no less useful but it does mean that application developers will have to perform some custom validation; they can't count on the XML processor to do all validation for them.
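
What such custom validation might look like can be sketched in Python. The decoder below is an assumption-laden simplification: it handles only the bare struct fragments shown above with string and number values, not complete WDDX packets, and the function names decode_struct and is_person are hypothetical.

```python
import xml.etree.ElementTree as ET

PERSON = """<struct>
  <var name='name'><string>Peter Smith</string></var>
  <var name='age'><number>42</number></var>
</struct>"""

PRODUCT = """<struct>
  <var name='name'><string>Fast Server</string></var>
  <var name='price'><number>1500</number></var>
</struct>"""


def decode_struct(xml_text):
    """Decode a WDDX struct fragment into a dict (strings and numbers only)."""
    struct = ET.fromstring(xml_text)
    record = {}
    for var in struct.findall("var"):
        value = var[0]  # the single typed element inside each var
        if value.tag == "string":
            record[var.get("name")] = value.text or ""
        elif value.tag == "number":
            record[var.get("name")] = float(value.text)
        else:
            raise ValueError("unsupported WDDX type: " + value.tag)
    return record


def is_person(record):
    """Application-level check that the XML processor cannot do for us."""
    return (
        set(record) == {"name", "age"}
        and isinstance(record["name"], str)
        and isinstance(record["age"], float)
        and record["age"] > 0
        and record["age"].is_integer()
    )


# Both fragments decode without complaint; only the custom check tells them apart.
print(is_person(decode_struct(PERSON)))
print(is_person(decode_struct(PRODUCT)))
```

The generic decoder accepts both structs equally, which is exactly the point: the knowledge that a record describes a person lives in hand-written application code rather than in anything the XML processor can check.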

Is there a better way? Let's trace the root of the problem. The reason we wanted to go with something like WDDX was that we wanted to process the information about people as application data. Trouble was, there were no ready-made tools that knew that the contents of <name> should be a string and the contents of <age> should be a number. So we used a WDDX encoding, in which we put the name inside a <string> element and the age inside a <number> element. We performed a transformation in which the type of the data became expressed in XML but the semantic meaning (or origin) of the data was lost. We gained the ability to process the information with any number of programming languages but lost the ability to declaratively validate that we are dealing with information about people.

Ideally, we want to find a way to use meaningful element names such as name and age that give us readability and declarative validation, yet at the same time attach a data type of "string" to name and "positive integer" to age. XML 1.0 doesn't offer this, but XML Schemas do. For example, the following schema will fit our needs:

<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <xsd:element name="person" type="personType"/>
  <xsd:complexType name="personType">
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="age" type="xsd:positive-integer"/>
  </xsd:complexType>
</xsd:schema>

As you can see, XML Schemas are a significant improvement over DTDs. (For more information see Robert DuCharme's article "Replace DTDs - Why?" in XML-J, Vol. 1, issue 1.) First, they're expressed in XML, which makes their authoring and automated processing a lot easier. Second, they allow you to specify in much greater detail the structure of your XML. They also allow you to associate a data type - string, positive integer and many others - with element content. Last but not least, they let you specify new complex types (such as personType in the example above). Complex types are the key to combining data type information with validation information.
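
To show what combining data types with validation buys an application, the following Python sketch hand-codes the two constraints of personType. Python's standard library has no XML Schema validator, so this is only an imitation of what a schema-aware processor would do; the names PERSON_TYPE, parse_positive_integer and validate_person are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_positive_integer(text):
    """Convert text to an int and insist that it is greater than zero."""
    value = int(text)
    if value <= 0:
        raise ValueError("not a positive integer: " + text)
    return value


# Hand-coded stand-in for personType: child element names, in order,
# each paired with the converter for its data type.
PERSON_TYPE = [("name", str), ("age", parse_positive_integer)]


def validate_person(xml_text):
    """Check structure and types together; return typed values or raise ValueError."""
    element = ET.fromstring(xml_text)
    if element.tag != "person":
        raise ValueError("expected <person>, got <" + element.tag + ">")
    children = list(element)
    if [child.tag for child in children] != [name for name, _ in PERSON_TYPE]:
        raise ValueError("unexpected child elements")
    return {
        name: convert(child.text or "")
        for (name, convert), child in zip(PERSON_TYPE, children)
    }


person = validate_person("<person><name>Peter Smith</name><age>42</age></person>")
print(person)
```

Here the element names carry the meaning and the converters carry the types, so a <product> document is rejected and the age arrives as an integer rather than as text.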

The work on XML Schemas is, of course, handled by the W3C. It's been a long and arduous process, primarily because XML Schemas try to do a lot more than what's shown above. In the beginning of April the sixth draft of the specification was released. Sources that are part of the working group say the end is near. We should hope so, because XML Schemas are of great importance to XML protocols.

Second-Generation XML Protocols
Lacking support for namespaces and schemas, the architects of first-generation XML protocols kept reinventing the wheel. They all had to come up with ways to identify the XML messages used by the protocol, to serialize and deserialize data, and to manage other protocol aspects such as security and transactions. The end result was significant redundancy and repetition in most XML protocol specifications. Translation software had to be written to convert between different protocol formats.

This fragmentation raised the overall cost of adoption of XML protocols by businesses everywhere at a very inopportune time, just as B2B computing was really taking off. B2B relies on Internet application interoperability that's based on XML protocols. (For more information, see my article "XML for B2B" in XML-J, Vol. 1, issue 2.)

Something had to be done - and it had to be done quickly. The only good solution to the fragmentation problem was the development of base XML protocol standards by an independent standards body such as W3C, IETF or OASIS. With namespaces and schemas, XML technologies had matured to the point at which it was conceivable that a single specification could address all relevant issues in an elegant yet extensible manner.

Around the turn of the millennium, there was a spike in activity. By that time, Microsoft had already published the Simple Object Access Protocol (SOAP) specification. SOAP 1.0 suggested a highly abstract and extensible mechanism for XML-based distributed computing. It opened the door for standardization of other protocol facets such as security and transaction management but didn't make any specific recommendations in these areas.
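
As a rough illustration of that extensibility, the Python sketch below builds a SOAP message whose header carries a security extension from a separate namespace. It uses the envelope namespace and mustUnderstand attribute defined in the later SOAP 1.1 Note; the security and application namespace URIs, the token element and the GetLastTradePrice request are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Envelope namespace from the SOAP 1.1 Note.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespaces for a security extension and an application request.
SEC = "http://trustme.example/security"
APP = "http://example.org/stock"


def qname(namespace, local_name):
    """Build the "{uri}local" form ElementTree uses for qualified names."""
    return "{" + namespace + "}" + local_name


ET.register_namespace("SOAP-ENV", SOAP_ENV)

envelope = ET.Element(qname(SOAP_ENV, "Envelope"))

# The header is the extensibility hook: any namespace may contribute entries.
header = ET.SubElement(envelope, qname(SOAP_ENV, "Header"))
token = ET.SubElement(header, qname(SEC, "token"))
token.set(qname(SOAP_ENV, "mustUnderstand"), "1")
token.text = "abc123"

# The body carries the application-defined request.
body = ET.SubElement(envelope, qname(SOAP_ENV, "Body"))
request = ET.SubElement(body, qname(APP, "GetLastTradePrice"))
ET.SubElement(request, "symbol").text = "DIS"

soap_xml = ET.tostring(envelope, encoding="unicode")
print(soap_xml)
```

The envelope itself says nothing about security; the extension travels in the header under its own namespace, which is the kind of hook the specification leaves open.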

In January 2000 CommerceOne published an IETF draft on the requirements of XML messaging. The draft addressed issues of great importance to business messaging and thus the Electronic Business XML Initiative (ebXML) was formed. OASIS and the United Nations/CEFACT group manage ebXML. For more information read Bob Sutor's "Introducing ebXML" in the May-June issue (XML-J, Vol. 1, issue 2).

It was at this time that the W3C got involved. Right before XTech 2000 the W3C made an announcement that it was looking into starting an activity in this area: "We've been under pressure from many sources, including the advisory board, to address the threat of fragmentation of and investigate the exciting opportunities in the area of XML protocols. It makes sense to address this now because the technology is still early in its evolution and more resources for development will be available as XSL-FO and XML Schemas near completion."

At XTech 2000 there was a birds-of-a-feather (BoF) meeting on XML protocols that drew a crowd of XML VIPs. No answers were obtained but a lot of good questions were raised. Since then, Eric Prud'hommeaux of the W3C has been doing a lot of work analyzing the existing XML protocol space and organizing some of the activity on the W3C xml-dist-app mailing list.

In the beginning of May SOAP 1.1 was submitted as a Note to the W3C. The big surprise was that IBM coauthored the specification with Microsoft. Hopefully, this will limit the impact of politics on the standardization process and allow the W3C to fast-track work on XML protocols.

There is still uncertainty with regard to three important issues:

  • How much do SOAP and ebXML have in common, i.e., should the two efforts coordinate?
  • If there is an overlap in scope, who will coordinate and manage the joint process? The W3C, OASIS and UN/CEFACT all have their own political agendas and want to be associated with such an important development.
  • How detailed should the specifications get? For example, should they go into the details of how certain types of security and transactions should be handled, or should they just put in place extensibility hooks and let someone else decide on this later?

At the 9th World Wide Web (WWW9) conference in Amsterdam, there was an "XML Protocol Shakedown" panel discussion in which some of these issues were addressed. Looking at how quickly standards bodies and vendors are willing to act, there's a good chance that by the end of the year we'll be close to having a solid specification for a second-generation XML protocol that will address the requirements of a broad set of distributed computing scenarios.

What's Next?
This article has traced the evolution of XML protocols from simple first-generation solutions based on XML 1.0 to extensible second-generation protocols such as SOAP. Namespaces and XML Schemas are the key enablers of this transition. There's a lot of industry activity around XML protocols and the W3C is close to starting a working group in this space.

While B2B is the most business-press-worthy topic of the year, XML protocols are clearly where the action is in the XML technology and standards space. Since XML-Journal is dedicated to keeping you informed of important developments in the field of XML, we'll be adding a regular column - "XML in Transit" - that will focus on XML protocols.

In the next issue I'll analyze the aftermath of WWW9 and the progress of SOAP through the W3C. I'll also address the Sun/Java standards play in the XML protocols space. If you'd like me to discuss some particular topic, drop me a note at simeons@allaire.com.

XML Resources
BizTalk: BizTalk. Microsoft Corporation. See www.biztalk.org
Messaging Requirements: "Requirements for XML Messaging." IETF Draft. See www.ietf.org/internet-draft/draft-ietf-trade-xmlmsg-requirements-00.txt
Namespaces in XML: "Namespaces in XML." W3C (World Wide Web Consortium). See www.w3.org/TR/1999/REC-xml-names-19990114/
Namespaces in XML (Note): "NOTE Namespaces in XML." W3C. See www.w3.org/TR/1998/NOTE-xml-names-0119.html
SOAP 1.0: Simple Object Access Protocol (SOAP) 1.0. Microsoft Corporation. See http://msdn.microsoft.com/xml/general/soapspec-v1.asp
SOAP 1.1: Simple Object Access Protocol (SOAP) 1.1. W3C. See www.w3.org/TR/2000/NOTE-SOAP-20000508
WDDX: Web Distributed Data Exchange (WDDX). Wddx.org. See www.wddx.org
XML-RPC: See www.xmlrpc.com
XMLSchema:Datatypes: "XML Schema Part 2: Datatypes." W3C. See www.w3.org/TR/xmlschema-2/
XMLSchema:Structures: "XML Schema Part 1: Structures." W3C. See www.w3.org/TR/xmlschema-1/

Simeon Simeonov is CEO of FastIgnite, where he invests in and advises startups. He was chief architect or CTO at companies such as Allaire, Macromedia, Better Advertising and Thing Labs. He blogs at blog.simeonov.com, tweets as @simeons and lives in the Greater Boston area with his wife, son and an adopted dog named Tye.