Simeon Simeonov



CFDJ: Article

XML for B2B Integration

It wasn't long ago that computer industry pundits still thought that COM and CORBA would become the Internet business-to-business (B2B) integration infrastructure. Yet nowadays B2B integration on the Internet is done using XML on top of simple protocols such as HTTP, FTP and SMTP. The Web has won because of its simplicity, ubiquity and heterogeneity. XML has become the lingua franca of B2B because of its inherent capabilities: simplicity, extensibility and ease of processing.

In this article I'm going to discuss the role XML has to play in this business-to-business space, particularly with respect to the infrastructure technologies that enable B2B integration. I'll present three complementary views of B2B integration, each creating different requirements for XML use, then focus on two principal technology areas (XML data mapping and schema translation) that are common to all three views. Bear in mind that the B2B space is very broad and that every e-business-focused company has some B2B initiatives. Hopefully, this article can serve as a good starting point for your own foray into this exciting area.

Views of B2B Integration
There are three main views of B2B integration:

  • The business process view
  • The syndication view
  • The functional view

    The Business Process View
    The business process view focuses on the types of business processes that are being integrated, the process-specific information that needs to be exchanged and the business rules associated with the interaction.

    Consider an example business-to-consumer (B2C) interaction in which a book is bought on Amazon.com. B2B happens behind the scenes because Amazon.com outsources the delivery of the book to FedEx. Amazon's order fulfillment process has to integrate with FedEx's order-taking process. For door-to-door tracking, Amazon's order tracking system has to integrate with FedEx's package tracking system. The basic information that has to be exchanged is the customer's address and the tracking number. XML efforts in this area focus on establishing standard XML schemas for representing information pertaining to business process integration.

    The Syndication View
    The syndication process is often applied as another model for analyzing B2B integration. At its basic level, syndication on the Internet has to do with the creation, aggregation, distribution and consumption of some electronic asset. For example, Jim Davis is the creator of Garfield cartoons. The cartoons are aggregated by Uclick.com, a company related to Universal Press Syndicate. Uclick distributes the cartoon to companies such as AOL and New York Times, Inc. People everywhere view (consume) the cartoons. There are many forms of syndication on the Net. The syndication view of B2B focuses on analyzing what happens at every stage of the process.

    There are many XML standards currently in development that address aspects of the syndication process. Notable among these is the Information & Content Exchange (ICE) protocol. ICE facilitates the exchange and management of electronic assets between members of a syndication network. It defines the rules and mechanisms by which the assets are exchanged. An asset can be anything represented in XML. Unfortunately, the relative complexity of ICE and the high cost of solutions based on it have limited its market reach.

    It's useful to think of what's being syndicated in three broad categories: content, data and services. Content can be plaintext, HTML, PDF or some XML describing publishable materials. Content is produced at least partially by humans and is generally meant for human consumption, not for machine processing. Garfield's cartoon is a good example of syndicated content. Arbitrary binary content (such as images) is usually represented as base64-encoded XML content.

    Syndicated data is represented in an XML format that is machine-generated and meant for machine consumption. The XML maps to some application-level data structures, be they the results of a database query, the data of an enterprise business object or the parameters to an enterprise resource planning (ERP) system operation. A good example of syndicated data would be the information about unique customer visits to your hosted Web site. Your ISP provides the data. Ideally, it will come in not as a PDF report but in an easy-to-process XML format so that, for example, your customer profiling server can easily analyze the data.

    At a deep philosophical level the distinction between content and data disappears, but for our purposes it's an important one. We have to worry about the way in which application-level data structures are converted to and from XML. I'll take a further look at this in the next section.

    Syndication of services is very different from content and data syndication in that there isn't any single piece of content (article, picture, piece of data) that's being exchanged. To syndicate a service means to provide the right to access some remote functionality via a well-defined public API. For example, consider FedEx exposing access to its package tracking system to Amazon.com's order tracking system. There's no specific content or data being syndicated. Instead, when customers look at the status of their book orders, Amazon has to take the tracking number that FedEx provided when it shipped the book and make a request against the FedEx system. The request will be in XML. The response will be some XML specifying the delivery status of the book. It is the request-response nature of processing that distinguishes the syndication of services. By comparison, content and data syndication constitutes a one-way dump of information.
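The request-response exchange described above might be sketched as follows. This is only an illustration under stated assumptions: the element names (trackingRequest, trackingNumber, trackingResponse, status), the class TrackingExchange and the in-memory lookup table are all hypothetical, and the "remote" carrier system is simulated by a local method call rather than an HTTP request.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TrackingExchange {
    // Hypothetical stand-in for the carrier's package tracking system.
    private static final Map<String, String> CARRIER_RECORDS = Map.of("TN-0001", "in transit");

    // Retailer side: wrap the tracking number in an XML request.
    // (No escaping is done; the sketch assumes plain alphanumeric tracking numbers.)
    public static String buildRequest(String trackingNumber) {
        return "<trackingRequest><trackingNumber>" + trackingNumber
             + "</trackingNumber></trackingRequest>";
    }

    // Carrier side: read the request and answer with an XML response.
    public static String handleRequest(String requestXml) {
        String number = elementText(requestXml, "trackingNumber");
        String status = CARRIER_RECORDS.getOrDefault(number, "unknown");
        return "<trackingResponse><trackingNumber>" + number + "</trackingNumber>"
             + "<status>" + status + "</status></trackingResponse>";
    }

    // Returns the text of the first element with the given name.
    public static String elementText(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(name).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed or unexpected XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String response = handleRequest(buildRequest("TN-0001"));
        System.out.println(elementText(response, "status"));
    }
}
```

The point of the sketch is the shape of the interaction: nothing is pushed to the retailer ahead of time, and each status lookup is a fresh request answered by a fresh response.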

    The Functional View
    The third and final view of B2B integration that we have to consider focuses on the types of functionality a B2B system must offer.

    The major areas of functionality are defined by the operations that need to be performed in the business process and syndication views. For data syndication, we need to be able to map data to and from XML. For both content and data syndication, we need some form of robust business quality messaging to get the XML from one point to another. For the syndication of services, we need some mechanism to define XML-based remote APIs, specify/send requests and receive/process responses. Last but not least, for interoperability between information formats for business processes, we need some XML data conversion capabilities.

    XML messaging has already been discussed to some extent in Sandeep Nayak's article in the premier issue of XML-Journal ("XML Middleware," Vol. 1, issue 1). All current technologies for XML-based remote APIs rely on XML data mapping technologies. Accordingly, the rest of this article will focus on XML data mapping and conversion technologies.

    XML Data Mapping
    XML data mapping has to do with generating XML from application data and creating application data from XML.

    Application data covers the common datatypes developers work with every day: boolean/logical values, numbers, strings, date-time values, arrays, associative arrays (dictionaries, maps, hash tables), database recordsets and complex object types. The process of converting application data to XML is called serialization. The XML is a serialized representation of the application data. The process of generating application data from XML is called deserialization.

    The traditional approach for generating XML from application data has been to sit down and custom-code how data values become elements, attributes and element content. The traditional approach of working with XML to produce application data has been to parse it using a simple API for XML (SAX) or Document Object Model (DOM) parser. Data structures are built from the SAX events or the DOM tree using custom code. There are, however, better ways to map data to and from XML using technologies specifically built for serializing and deserializing data.
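To make the traditional approach concrete, here is a minimal sketch of hand-written mapping code using the DOM API that ships with Java. The customer element, its fields and the class name CustomerDomMapper are assumptions made up for illustration; the point is how much format-specific code has to be written by hand in both directions.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CustomerDomMapper {
    // Custom generation: every map entry becomes a child element of <customer>.
    // Keys are assumed to be valid XML element names.
    public static String toXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<customer>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            xml.append('<').append(e.getKey()).append('>')
               .append(escape(e.getValue()))
               .append("</").append(e.getKey()).append('>');
        }
        return xml.append("</customer>").toString();
    }

    // Custom parsing: walk the DOM tree and rebuild the map by hand.
    public static Map<String, String> toMap(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> customer = new LinkedHashMap<>();
        customer.put("name", "Jane Doe");
        customer.put("city", "Boston");
        String xml = toXml(customer);
        System.out.println(xml);
        System.out.println(toMap(xml));
    }
}
```

Every new document format needs another pair of methods like these, which is the maintenance burden that dedicated data mapping technologies aim to remove.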

    XML data mapping technologies have been developed in a variety of contexts. In most cases, XML data mapping is treated as a service within larger B2B XML technologies, e.g., messaging or request/response handling. (See the "XML References" at the end of this article, especially: Lightweight Distributed Objects (LDO), Schema for Object-Oriented XML (SOX), Simple Object Access Protocol (SOAP) and XML Schema Part 2: Datatypes.) One example technology can illustrate what XML data mapping is all about.

    Web Distributed Data Exchange (WDDX) is a language- and platform-neutral technology for XML data mapping. I have to come clean and confess that I created WDDX back in 1998, so my views may be somewhat partial. But the fact of the matter is that WDDX is used by tens of thousands of server installations on the Internet. It comes as part of two Web application servers: Allaire ColdFusion and PHP. WDDX also supports Java, JavaScript, Perl, Python, ASP and COM. It is a free, open source technology managed by wddx.org.

    WDDX lets developers achieve XML data mapping (a) without their knowing any XML, and (b) without their having to write any custom code for data conversion. When you use WDDX, you don't have to worry about XML at all. With one line of application code you can convert your data to XML and with one line of application code you can get data back from the XML.

    WDDX does its job by defining a single XML format for representing data. The format is specified by the WDDX DTD. Therefore the details of what format of XML to use are all taken care of. In addition, every language and platform that supports WDDX defines two platform-specific modules (see Figure 1).

    The serializer module converts data from that programming language to a chunk of XML conforming to the WDDX DTD. The deserializer module takes WDDX, parses it and creates data structures in the specific programming language. For example, the serializer/deserializer modules for JavaScript deal with JavaScript arrays while the ones for Java work with java.util.Vector objects. In your projects you simply use the serializer/deserializer modules that come as part of the WDDX Software Development Toolkit (WDDX SDK). The SDK is distributed by wddx.org.
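To give a feel for what a serializer module does, the following is a hand-rolled sketch, not the code from the WDDX SDK. The class and method names (WddxSketch, serialize) are hypothetical, and the element names (wddxPacket, header, data, struct, var, array, string, number, boolean, null) reflect my reading of the WDDX DTD; date-time values, recordsets and binary data are left out to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WddxSketch {
    // Wraps the serialized value in a WDDX-style packet envelope.
    public static String serialize(Object value) {
        StringBuilder out = new StringBuilder("<wddxPacket version='1.0'><header/><data>");
        write(value, out);
        return out.append("</data></wddxPacket>").toString();
    }

    // Chooses an element based on the runtime type of the value.
    private static void write(Object value, StringBuilder out) {
        if (value == null) {
            out.append("<null/>");
        } else if (value instanceof Boolean) {
            out.append("<boolean value='").append(value).append("'/>");
        } else if (value instanceof Number) {
            out.append("<number>").append(value).append("</number>");
        } else if (value instanceof List) {
            List<?> items = (List<?>) value;
            out.append("<array length='").append(items.size()).append("'>");
            for (Object item : items) {
                write(item, out);
            }
            out.append("</array>");
        } else if (value instanceof Map) {
            out.append("<struct>");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.append("<var name='").append(escape(String.valueOf(e.getKey()))).append("'>");
                write(e.getValue(), out);
                out.append("</var>");
            }
            out.append("</struct>");
        } else {
            // Anything else is treated as a string in this sketch.
            out.append("<string>").append(escape(value.toString())).append("</string>");
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("'", "&apos;");
    }

    public static void main(String[] args) {
        Map<String, Object> book = new LinkedHashMap<>();
        book.put("title", "XML: A Primer");
        book.put("inStock", true);
        book.put("ratings", List.of(4, 5));
        System.out.println(serialize(book));
    }
}
```

A deserializer module does the reverse: it parses the packet and rebuilds native lists, maps and scalar values, so application code never touches the XML directly.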

    WDDX is perfect for data syndication and remote B2B integration APIs because it's all about representing data as XML. For example, Moreover.com, the Web feed company, exposes all its content through a WDDX-based remote API. Access http://moreover.com/cgi-local/page?index+wddx with an XML-aware browser such as IE 5.0 and you'll get a WDDX packet with current headline news. A simplified version of the packet is shown in Listing 1. We can see from it that the data format is a recordset (tabular data) with three fields containing the URL to the full article, its headline text and the publishing source.

    WDDX is flexible enough to handle most useful datatypes, but it doesn't give you control over the generated XML. However, there are some cases where a specific XML format is required and you need to be able to map from that specific XML format to a reasonable data structure representation of it. In this case you focus on the XML and don't care so much about the data structures. Enter schema compilation tools.

    The term XML schema is broadly used to refer to any type of XML format specification, e.g., DTDs, XML Schema, XML Data, and so on. Schema compilers are tools that analyze an XML schema and generate serialization and deserialization modules specific to that schema (see Figure 2). These modules will work with data structures tuned to the schema. Let's take a fragment from a book DTD as an example:

    <!ELEMENT book (title, author, isbn)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT isbn (#PCDATA)>

    A schema compiler that works with Java can automatically define a class that represents books. A simplified version of how such a class might look is shown in Listing 2. In actual cases the class definition will probably include getter and setter methods and some other niceties that can be safely ignored for the purposes of this example.

    Consider an example XML fragment:

    <book>
      <title>XML: A Primer</title>
      <author>Simon St. Laurent</author>
      <isbn>076453310X</isbn>
    </book>

    When the deserializer module generated by the schema compiler parses the XML, it will create an instance of a Book object that will be equivalent to the one created by the following Java fragment:

    Book aBook = new Book();
    aBook.title = "XML: A Primer";
    aBook.author = "Simon St. Laurent";
    aBook.isbn = "076453310X";
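For illustration, here is a hand-written approximation of the kind of class and serialization/deserialization code a schema compiler might generate for the book DTD. It is a sketch under assumptions, not the output of any actual tool: the public fields mirror the Java fragment above, while the method names fromXml and toXml are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class Book {
    // One field per child element declared in the DTD: book (title, author, isbn).
    public String title;
    public String author;
    public String isbn;

    // Deserializer: builds a Book from a <book> element.
    public static Book fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Book book = new Book();
            book.title = childText(root, "title");
            book.author = childText(root, "author");
            book.isbn = childText(root, "isbn");
            return book;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid book document", e);
        }
    }

    // Serializer: writes the fields in the order the DTD requires.
    public String toXml() {
        return "<book><title>" + escape(title) + "</title>"
             + "<author>" + escape(author) + "</author>"
             + "<isbn>" + escape(isbn) + "</isbn></book>";
    }

    // Returns the text of the first matching child element, or null if absent.
    private static String childText(Element parent, String name) {
        NodeList matches = parent.getElementsByTagName(name);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    private static String escape(String s) {
        return s == null ? "" : s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Book parsed = Book.fromXml("<book><title>XML: A Primer</title>"
                + "<author>Simon St. Laurent</author><isbn>076453310X</isbn></book>");
        System.out.println(parsed.title + " by " + parsed.author + " (" + parsed.isbn + ")");
    }
}
```

Unlike the generic WDDX modules, this code knows the book format and nothing else, which is exactly the trade-off a schema compiler makes: control over the XML in exchange for schema-specific code.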

    Because DTDs cannot specify many interesting data types, not even numbers, schema compilers won't become widely used until the W3C XML Schema activity produces a final specification, part of which will specifically cover datatypes. Therefore, don't count on robust schema compilation technology for at least six to nine months. One of the notable efforts in this space is the XML Data Binding activity that is part of Sun's Java standardization effort. The folks at Sun, together with a bunch of XML experts, including yours truly, are working on a schema compiler for Java. No public information has been released yet so I have to keep relatively quiet on the subject.

    Schema Translation
    Schema translation refers to the conversion of XML documents from one format to another. It is also known as XML integration/conversion. Schema translation is very important in the context of B2B because the world of business is highly heterogeneous. No single organization owns or will ever own the standards that specify how the information relevant to B2B processes is going to be represented in XML. In fact, right now there are at least two broad efforts to manage all kinds of B2B XML standards. There also are hundreds of efforts focused on vertical business niches.

    The first of these broad efforts is BizTalk. This is a Microsoft initiative to drive the adoption of XML for e-commerce and application integration. The BizTalk Framework establishes guidelines for the successful use of XML for B2B. BizTalk.org is a repository of schemas related to different e-commerce segments that conform to the BizTalk Framework. BizTalk.org is also a community of companies using XML for B2B. Microsoft is working on tools that will facilitate the use of BizTalk schemas and the integration of business processes based on BizTalk.

    The second broad effort is led by the Organization for the Advancement of Structured Information Standards (OASIS), an international consortium focused on managing open standards for content and data interchange. Probably the most relevant initiative at OASIS is ebXML, the Electronic Business XML initiative, which is an effort to establish a global framework for the exchange of business data. The OASIS repository for XML schemas is xml.org. One can find a lot of interesting specifications there. For example, see IBM's submission of the Trading Partner Agreement Markup Language (tpaML).

    By now you must realize that there will be many specifications produced by different standards bodies covering similar areas of e-commerce. Since interoperability is crucial to B2B, schema translation tools must be employed to facilitate B2B processes that work with multiple standards.

    There are two main approaches to schema translation: one utilizes custom software and the other uses XSL transformation. The custom software approach relies on you parsing the incoming XML, building some data structures that represent the data specified in the XML and then generating XML in the prescribed format.

    XSL Transformations
    The second approach centers around the Extensible Stylesheet Language Transformations (XSLT) specification.

    XSLT is the standard mechanism for transforming XML from one format to another. An XSLT processor takes an input XML document and, using rules from a stylesheet (another XML document that specifies how the conversion is to be performed), generates an output XML document. There are several freeware XSLT processors. You can obtain XT from James Clark's site (www.jclark.com); James is the editor of the XSLT specification. Xalan is an implementation that was started by IBM's Alphaworks division. It has since been contributed to the open source initiative at xml.apache.org.
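As a small illustration of schema translation with XSLT, the sketch below converts a book document into a different, made-up catalog format. It uses the standard javax.xml.transform API rather than XT or Xalan directly; the target element names (item, name, creator), the id attribute and the class name BookTranslator are assumptions for the example only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BookTranslator {
    // Stylesheet: maps <book><title/><author/><isbn/></book> onto a
    // hypothetical <item id="..."><name/><creator/></item> format.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/book'>"
        + "<item id='{isbn}'>"
        + "<name><xsl:value-of select='title'/></name>"
        + "<creator><xsl:value-of select='author'/></creator>"
        + "</item>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the source document through the stylesheet and returns the result as text.
    public static String translate(String sourceXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(sourceXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Translation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(translate("<book><title>XML: A Primer</title>"
                + "<author>Simon St. Laurent</author><isbn>076453310X</isbn></book>"));
    }
}
```

All format-specific knowledge lives in the stylesheet, so supporting another target format means writing another stylesheet rather than another program.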

    Writing custom schema translation code or XSLT stylesheets for schema translation can take a lot of effort. To ease the pain, vendors are doing one of three things. Some are adding out-of-the-box translation tools for popular schemas. Others are defining metalanguages that specify at a higher level how schema translation should be done. Developers write an XML document that specifies how elements, attributes and content of the source document are mapped to elements, attributes and content in the target document. Some type of a code generator will then build the custom code or XSLT stylesheet that does the actual work (see Figure 3).

    This approach is similar to that of schema compilers. The difference is that schema compilers work with XML and application data while the schema translation tools work exclusively with XML. Finally, some vendors are adding GUI tools for schema translation. As a developer, you drag and drop elements and attributes between the source and target documents to specify how you want the translation to be done. Code generation then happens behind the scenes just as in the previous approach. Most application server vendors (Allaire, BlueStone, IBM, Microsoft, SilverStream) have announced initiatives in this area. The details are scant and quickly changing.
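The code generation idea behind the metalanguage and GUI approaches can be sketched roughly as follows. To keep the example short, the mapping is held in a Java Map instead of the XML mapping document a real tool would read, and only one-to-one renaming of child elements is handled; the class name StylesheetGenerator and its generate method are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StylesheetGenerator {
    // Builds an XSLT stylesheet that renames the root element and copies the
    // text of each mapped source child into its target child element.
    public static String generate(String sourceRoot, String targetRoot,
                                  Map<String, String> childMapping) {
        StringBuilder xsl = new StringBuilder();
        xsl.append("<xsl:stylesheet version='1.0' ")
           .append("xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>");
        xsl.append("<xsl:template match='/").append(sourceRoot).append("'>");
        xsl.append('<').append(targetRoot).append('>');
        for (Map.Entry<String, String> rule : childMapping.entrySet()) {
            // rule key = source element name, rule value = target element name
            xsl.append('<').append(rule.getValue()).append('>')
               .append("<xsl:value-of select='").append(rule.getKey()).append("'/>")
               .append("</").append(rule.getValue()).append('>');
        }
        xsl.append("</").append(targetRoot).append('>');
        xsl.append("</xsl:template></xsl:stylesheet>");
        return xsl.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("title", "name");
        mapping.put("author", "creator");
        System.out.println(generate("book", "item", mapping));
    }
}
```

The generated text can be handed to any XSLT processor. The developer maintains only the mapping, which is far smaller and easier to review than the stylesheet or custom code it stands for.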

    Tried and True
    To finish, here are some simple guidelines for developers that can help steer you in the right direction when you want to apply XML to B2B e-commerce:

  • Analyze the problem space from three different views: business process, syndication and function.
  • Choose a reliable mechanism for delivering XML. Don't forget to worry about security and transactions. (Unfortunately, a solid discussion of reliable XML transport is outside the scope of this article.)
  • If the focus is on data, choose technologies that facilitate automatic XML data mapping such as WDDX.
  • If the focus is on a particular XML format, choose schema compilation tools that can generate serialization/deserialization modules for this format.
  • If the focus is on integrating different XML formats, choose XSLT processors or other schema translation tools.
  • If the focus is on exposing remote APIs for application integration, choose data-centric technologies that can handle request/response processing, e.g., WDDX, XML-RPC, LDO and SOAP.
  • Minimize custom coding, particularly for parsing XML and generating XML from application-level data structures.
  • Keep your approach simple and don't rely heavily on draft specifications.

    Last, always bear in mind that the B2B space is growing at an amazing rate. Try to keep up with it as best you can.

    XML References

  • BizTalk: BizTalk. Microsoft Corporation. See www.biztalk.org
  • ICE: Information & Content Exchange (ICE). W3C (World Wide Web Consortium). See www.w3.org/TR/NOTE-ice
  • LDO: Lightweight Distributed Objects (LDO). Casbah.org. See www.casbah.org/LDO
  • SOAP: Simple Object Access Protocol (SOAP). Microsoft Corporation. See http://msdn.microsoft.com/xml/general/soapspec-v1.asp
  • SOX: Schema for Object-Oriented XML 2.0. W3C (World Wide Web Consortium). See www.w3.org/TR/NOTE-SOX
  • WDDX: Web Distributed Data Exchange (WDDX). Wddx.org. See www.wddx.org
  • XML Data Binding: JSR-000031 XML Data Binding Specification. Sun Microsystems. See http://java.sun.com/aboutJava/communityprocess/jsr/jsr_031_xmld.html
  • XML-RPC: XML-RPC. xmlrpc.com. See www.xmlrpc.com
  • XML Schema: Datatypes: XML Schema Part 2: Datatypes. W3C (World Wide Web Consortium). See www.w3.org/TR/xmlschema-2/
  • XML Schema: Structures: XML Schema Part 1: Structures. W3C (World Wide Web Consortium). See www.w3.org/TR/xmlschema-1/
  • XSLT: XSL Transformations (XSLT). W3C (World Wide Web Consortium). See www.w3.org/TR/xslt

    Simeon Simeonov is CEO of FastIgnite, where he invests in and advises startups. He was chief architect or CTO at companies such as Allaire, Macromedia, Better Advertising and Thing Labs. He blogs at blog.simeonov.com, tweets as @simeons and lives in the Greater Boston area with his wife, son and an adopted dog named Tye.