Simeon Simeonov

Intermediaries And More

In my last XML in Transit column (XML-J, Vol. 1, issue 5) I promised to complete my trilogy on Simple Object Access Protocol (SOAP) by addressing the aspects of the latest specification that we haven't covered yet: intermediaries, error handling, and data encoding. Forgive me for deviating slightly from that plan. After fielding several questions about large-scale SOAP systems, I've gotten the impression that many people who've looked at the SOAP specification are confused by the notion of intermediaries. Therefore, I've decided to ignore data encoding for the time being and focus on intermediaries and error handling in SOAP.

The Need for Intermediaries
SOAP intermediaries are applications that can process parts of a SOAP message as it travels from its origination point to its final destination (see Figure 1). Intermediaries can accept and forward SOAP messages. SOAP needs intermediaries for three key reasons: crossing trust domains, ensuring scalability, and providing value-added services along the SOAP message path.

Crossing trust domains is a common issue when implementing security in distributed systems. Consider the relationship between your corporate or departmental network and the Internet. Most likely your IT department put a majority of the computers on your network within a single trusted security domain. You can see your co-workers' computers as well as the IT servers and can freely exchange information between them without the need for separate logons. On the other hand, your corporate network probably treats all computers on the Internet as part of a separate security domain that's not trusted. Before an Internet request reaches your network, it needs to cross from its untrustworthy domain to the trusted domain of your network. Corporate firewalls and virtual private network (VPN) gateways are the Cerberean guards of the gates to your network's riches. Their job is to let some requests cross the trust domain boundary and deny access to others.

Because the SOAP specification doesn't currently address issues related to security, most applications exploit the built-in security features of the protocols that are used to deliver SOAP messages. For example, the HTTP specification addresses security at a number of levels. Firewalls and Web servers use the information provided by HTTP to authenticate clients and grant or deny their access to HTTP resources. Right now, people using SOAP with HTTP are passing username and password information as HTTP headers.

In general, we want to get to a point where security information can be part of a SOAP message. SOAP servers could then perform specific authentication and authorization rather than relying on other infrastructure components such as firewalls and Web servers. In addition, one could separate the issue of securing a SOAP message exchange channel - say, between your company and a supplier - from the issue of passing security information about who SOAP messages are coming from. Last but not least, you can benefit from secure SOAP communications even on top of protocols that don't provide adequate security. Think of delivering SOAP messages using e-mail (SMTP/POP/IMAP) or of high-performance SOAP message exchanges over pure sockets. I'll get deeper into these topics in an upcoming XML in Transit column that focuses on security in XML protocols.

Another important need for intermediaries arises due to the scalability requirements of distributed systems. A simplistic view of distributed systems can identify two types of entities: those that request some work to be done (clients) and those that do the work (servers). Clients send messages directly to the servers they want to communicate with. Servers, in turn, get some work done and respond. In this naïve universe there's little need for a distributed computing infrastructure. Alas, we can't use this model to build highly scalable distributed systems.

As an example, take basic e-mail, the service we've grown to depend on so much in the Net era. When I, simeons@allaire.com, send an e-mail to myfriend@london.co.uk, my e-mail client doesn't locate the mail server london.co.uk and send the message to it. Instead, my client sends the message to my e-mail server at Allaire. Based on the priority of the message and how busy the mail server is, the message will leave either by itself or in a batch of other messages. Messages are often batched to improve performance. It's likely that the message will make a few hops through different nodes on the Internet before it gets to my friend's mail server.

The lesson from this example is that highly scalable distributed systems (such as e-mail) require flexible buffering of messages and routing based on message parameters, such as origin, destination, and priority, and on the state of the system measured by parameters, such as the availability and load of its nodes as well as network traffic information. Intermediaries hidden from the eyes of the originators and final recipients of messages perform all this work behind the scenes.

Last but not least, we need intermediaries so we can provide value-added services in a distributed system. The types of services can vary significantly. Here are some common examples:

  • Securing message exchanges, particularly when transmitting messages through untrustworthy domains (e.g., using HTTP/SMTP on the Internet): You can secure SOAP messages by passing them through an intermediary that first encrypts them and then digitally signs them. On the receiving side, an intermediary will perform the inverse operations - checking the digital signature and, if it's valid, decrypting the message.
  • Providing message-tracing facilities: Tracing allows the recipient of messages to find out the exact path that the message went through, complete with detailed timings of arrivals and departures to and from intermediaries along the way. This information is indispensable for tasks such as measuring quality of service (QoS), auditing systems, and identifying scalability bottlenecks.

Intermediaries in SOAP
By now I hope you're convinced that intermediaries are an extremely important concept in distributed systems. Therefore, let's take a look at what facilities SOAP has for handling them. The three aspects to the problem are:
  1. How do we pass information to intermediaries?
    From our discussion of intermediaries you can see that most of the information they require is completely orthogonal to the information contained in message bodies. Therefore, the SOAP specification mandates that information can be passed to intermediaries only via SOAP headers.
  2. What happens to headers that are processed by intermediaries?
    The SOAP specification states, "[T]he role of a recipient of a header element is similar to that of accepting a contract in that it cannot be extended beyond the recipient." This means that (1) by default, an intermediary shouldn't forward the same header to the next application in the SOAP message path, and (2) if the intermediary forwards the same or similar header to the next application, then this constitutes a contract between the intermediary and the next application. The goal here is to reduce system complexity by requiring that contracts about the presence, absence, and content of information in SOAP messages be very narrow in scope - from the originator of that information to the first SOAP application that handles it, not beyond.

  3. How do we identify who should process what?
    Clearly, the message body is intended for the final recipient of the SOAP message. As far as the header entries are concerned, there's an elegant solution. All header elements can have the SOAP actor global attribute. The value of the attribute is a URI that identifies who should handle the header entry. The special value "http://schemas.xmlsoap.org/soap/actor/next" indicates that the header entry's recipient is the first SOAP application that processes the message. This is useful for hop-by-hop processing that's required by message tracing. Of course, omitting the actor attribute implies that the final recipient of the SOAP message should process the header entry.
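
The actor rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the header names (Trace, Route, Priority), their urn:example namespaces, and the gateway URI are all invented, and the sketch uses the standard library's ElementTree parser rather than any particular SOAP toolkit.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ACTOR_NEXT = "http://schemas.xmlsoap.org/soap/actor/next"
ACTOR_ATTR = f"{{{SOAP_ENV}}}actor"

# Hypothetical message: the header names and the gateway URI are invented.
MESSAGE = """\
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Header>
    <t:Trace xmlns:t="urn:example:trace"
             SOAP-ENV:actor="http://schemas.xmlsoap.org/soap/actor/next"/>
    <r:Route xmlns:r="urn:example:route"
             SOAP-ENV:actor="http://gateway.example.com/"/>
    <p:Priority xmlns:p="urn:example:priority">high</p:Priority>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body/>
</SOAP-ENV:Envelope>
"""

def header_entries(envelope):
    """Return all header entries, or an empty list if there is no Header."""
    header = envelope.find(f"{{{SOAP_ENV}}}Header")
    return [] if header is None else list(header)

def entries_for_intermediary(envelope, my_uri):
    """Entries addressed to this application by URI or via the 'next' actor."""
    return [e for e in header_entries(envelope)
            if e.get(ACTOR_ATTR) in (my_uri, ACTOR_NEXT)]

def entries_for_final_recipient(envelope):
    """Entries with no actor attribute are meant for the final recipient."""
    return [e for e in header_entries(envelope) if e.get(ACTOR_ATTR) is None]

envelope = ET.fromstring(MESSAGE)
for entry in entries_for_intermediary(envelope, "http://gateway.example.com/"):
    print("gateway handles:", entry.tag)
for entry in entries_for_final_recipient(envelope):
    print("left for final recipient:", entry.tag)
```

An intermediary with a different URI would still pick up the Trace entry, because the "next" actor addresses whichever SOAP application sees the message first.
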
Putting It All Together
Let's see how this comes together in the potentially realistic albeit contrived example of Big Corp.'s B2B integration project. Please keep in mind that the XML in these examples is purely fictional - currently there isn't a standardized way to handle security and routing of SOAP messages.

Big Corp. needs to integrate various applications in several of its departments with some of its partners' applications (see Figure 2). The companies agree to use SOAP messages over HTTPS (for security reasons).

Every department in Big Corp. has a server that hosts SOAP applications. These servers have their own trust domains and are sitting deep inside the corporate network invisible to the outside world. To address this issue, Big Corp. develops a partner-interface gateway SOAP application that will act as an intermediary between the partner applications sending SOAP messages and the department-level applications that will handle them. The gateway application is hosted on an application server that's visible to the partner applications. A firewall is configured to allow access to the gateway application from the partner networks only.

The gateway application validates partners' security credentials and routes messages to the appropriate departmental SOAP applications. Security information and department server locations are available from Big Corp.'s enterprise directory.

Listing 1 shows an example message with two header entries that the gateway application might receive. The first header identifies the target department as billing. The second one passes the authentication information of the message originator, partner A in this case. Both header entries are marked with mustUnderstand="1" because they're critical to the successful processing of the message. The actor attribute identifies the partner gateway application as the place to process these.
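
The listing itself isn't reproduced here, so the following Python sketch is a hypothetical stand-in for the kind of message it describes, not the original XML. The bc namespace, the gateway actor URI, the TargetDepartment element name, the child elements of AuthenticationInformation, and the message body are all assumptions made for illustration. The function shows one way a gateway might honor mustUnderstand="1": any mandatory entry addressed to it that it doesn't recognize should stop processing.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
BIGCORP = "urn:example:bigcorp"                      # hypothetical namespace
GATEWAY = "http://partners.bigcorp.example/gateway"  # hypothetical actor URI

# Hypothetical stand-in for the kind of message Listing 1 describes.
LISTING_1_SKETCH = f"""\
<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}" xmlns:bc="{BIGCORP}">
  <SOAP-ENV:Header>
    <bc:TargetDepartment SOAP-ENV:mustUnderstand="1"
                         SOAP-ENV:actor="{GATEWAY}">billing</bc:TargetDepartment>
    <bc:AuthenticationInformation SOAP-ENV:mustUnderstand="1"
                                  SOAP-ENV:actor="{GATEWAY}">
      <bc:Partner>PartnerA</bc:Partner>
      <bc:Password>not-a-real-password</bc:Password>
    </bc:AuthenticationInformation>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <bc:GetInvoiceStatus><bc:InvoiceId>12345</bc:InvoiceId></bc:GetInvoiceStatus>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

def not_understood(envelope, my_uri, understood_tags):
    """Return tags of mandatory entries addressed to my_uri that we cannot process."""
    header = envelope.find(f"{{{SOAP_ENV}}}Header")
    entries = [] if header is None else list(header)
    return [e.tag for e in entries
            if e.get(f"{{{SOAP_ENV}}}actor") == my_uri
            and e.get(f"{{{SOAP_ENV}}}mustUnderstand") == "1"
            and e.tag not in understood_tags]

envelope = ET.fromstring(LISTING_1_SKETCH)
known = {f"{{{BIGCORP}}}TargetDepartment",
         f"{{{BIGCORP}}}AuthenticationInformation"}
# An empty result means the gateway can go ahead and process the message.
print(not_understood(envelope, GATEWAY, known))
```

A non-empty result is the situation that the MustUnderstand fault code, discussed later in this column, is meant to report.
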

After processing the message, the partner gateway application might forward the message shown in Listing 2.

Note how the previous two header entries have disappeared. They were meant for the gateway application only. Having extracted the billing department's location from the enterprise directory, the gateway application forwards the message to http://billing.bigcorp.com/Billing. There's a new header entry meant for the final recipient of the message. The entry specifies the security identity of the message originator as /External/Partners/PartnerA. This identity was presumably obtained from Big Corp.'s security system following the successful authentication of partner A. The applications in the billing department use this identity to check whether partner A is authorized to perform the operation requested in the SOAP message body.
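
The gateway's forwarding step might look roughly like the sketch below. It makes several assumptions: the two dictionaries stand in for lookups against the enterprise directory and security system, the bc namespace, gateway URI, and element names (including Identity) are invented, and real credential checking is omitted entirely.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
BIGCORP = "urn:example:bigcorp"                      # hypothetical namespace
GATEWAY = "http://partners.bigcorp.example/gateway"  # hypothetical actor URI
NS = {"s": SOAP_ENV, "bc": BIGCORP}

# Hypothetical stand-ins for lookups in Big Corp.'s enterprise directory.
DEPARTMENTS = {"billing": "http://billing.bigcorp.com/Billing"}
IDENTITIES = {"PartnerA": "/External/Partners/PartnerA"}

INCOMING = f"""\
<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}" xmlns:bc="{BIGCORP}">
  <SOAP-ENV:Header>
    <bc:TargetDepartment SOAP-ENV:actor="{GATEWAY}">billing</bc:TargetDepartment>
    <bc:AuthenticationInformation SOAP-ENV:actor="{GATEWAY}">
      <bc:Partner>PartnerA</bc:Partner>
    </bc:AuthenticationInformation>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body><bc:GetInvoiceStatus/></SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

def forward(envelope):
    """Consume the gateway's header entries and return (url, envelope)."""
    header = envelope.find("s:Header", NS)
    department = header.findtext("bc:TargetDepartment", namespaces=NS)
    partner = header.findtext("bc:AuthenticationInformation/bc:Partner",
                              namespaces=NS)
    # Entries addressed to the gateway are consumed here and not passed on.
    for entry in list(header):
        if entry.get(f"{{{SOAP_ENV}}}actor") == GATEWAY:
            header.remove(entry)
    # The new entry has no actor attribute, so it targets the final recipient.
    identity = ET.SubElement(header, f"{{{BIGCORP}}}Identity")
    identity.text = IDENTITIES[partner]
    return DEPARTMENTS[department], envelope

url, forwarded = forward(ET.fromstring(INCOMING))
print(url)
print(ET.tostring(forwarded, encoding="unicode"))
```

The message body passes through untouched; only the header changes between the two hops.
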

Error Handling in SOAP
So far in our examples everything has gone according to plan. Murphy's Law guarantees that this is not how things work in the real world. What would happen, for example, if partner A failed to authenticate with the partner gateway application? How will this exceptional condition be communicated via SOAP? The answer lies in the semantics of the SOAP Fault element.

Listing 3 contains a possible reply message caused by the authentication failure.

Before we look at the XML, note that the HTTP response code is 500 Internal Server Error. The HTTP transport binding presented in the SOAP specification requires this response code for any SOAP-related error. Other protocols have their own way to report errors.

The body of the response contains a single Fault element in the SOAP envelope namespace. This is the mechanism SOAP uses to indicate that an error has occurred and to provide some diagnostic information. This element contains three child elements.

The faultcode element must be present in all cases. It's meant to provide information that can be used to identify the specific error that occurred and is not meant for human consumption. The content of the element is a string prefixed by one of the four faultcode values specified by SOAP:

  • VersionMismatch: Indicates that the namespace of the Envelope element is invalid.
  • MustUnderstand: Indicates that a required header entry was not understood.
  • Client: Indicates that the likely cause of the error lies in the content or format of the SOAP message. In other words, the client probably shouldn't resend the message without making some changes to it.
  • Server: Indicates that the message failed due to reasons other than its content or format. This leaves the door open for the same message to succeed at a later time.

A hierarchical namespace of values can be obtained by separating fault values with the dot character. In our example, Client.AuthenticationFailure is a more specific fault code than simply Client.
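
A receiver can take advantage of the dotted hierarchy by matching on the general category first. The Python helper below is a minimal sketch; the function names are my own, and Server.Busy in the usage lines is an invented code used only to exercise the function.

```python
def fault_matches(faultcode, category):
    """True if faultcode equals category or is a dotted refinement of it."""
    local = faultcode.split(":", 1)[-1]   # drop a namespace prefix, if any
    return local == category or local.startswith(category + ".")

def should_retry_unchanged(faultcode):
    """Only Server faults suggest the same message might succeed later."""
    return fault_matches(faultcode, "Server")

print(fault_matches("Client.AuthenticationFailure", "Client"))
print(should_retry_unchanged("Client.AuthenticationFailure"))
print(should_retry_unchanged("Server.Busy"))   # hypothetical fault code
```

Matching on the category plus a dot, rather than on a bare string prefix, keeps an unrelated code that merely starts with the same letters from being treated as a refinement.
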

The faultstring element contains a human-readable message identifying the cause of the fault. This message must always be present. Here we simply state that the client has failed to authenticate.

The faultactor element provides information about where in the message path the fault occurred. It must be present if the failure occurred someplace other than at the final destination of the SOAP message. The content of the element is the URI of the actor where the error occurred. In our example, we identify the partner gateway application as the failure point.

This example doesn't show how application-specific error-diagnostic information can be exchanged. SOAP provides a simple mechanism for this as well. If the fault occurred during the processing of the message body, an optional detail element can be added after faultactor. There are no restrictions on its contents. If the fault occurred during the processing of a header entry, the header entry should be returned with detailed error information contained therein. In our example we would have returned the AuthenticationInformation header with the response.
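
To make the structure of a fault reply concrete, here is a hedged Python sketch that builds one with the standard library. It is not the content of Listing 3: the gateway URI and the message text are invented, and writing the three child elements without a namespace prefix is my reading of how the SOAP 1.1 examples lay out a Fault. The optional detail element and returned header entries are left out.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("SOAP-ENV", SOAP_ENV)   # serialize with a readable prefix

def build_fault(code, message, actor=None):
    """Return (http_status, xml_text) for a SOAP fault reply over HTTP."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    fault = ET.SubElement(body, f"{{{SOAP_ENV}}}Fault")
    # Assumption: child elements are unqualified and faultcode carries a prefix.
    ET.SubElement(fault, "faultcode").text = "SOAP-ENV:" + code
    ET.SubElement(fault, "faultstring").text = message
    # faultactor is only needed when the fault did not occur at the
    # final destination of the message.
    if actor is not None:
        ET.SubElement(fault, "faultactor").text = actor
    # The HTTP binding reports any SOAP-related error as 500.
    return 500, ET.tostring(envelope, encoding="unicode")

status, xml_text = build_fault(
    "Client.AuthenticationFailure",
    "Client failed to authenticate",
    actor="http://partners.bigcorp.example/gateway",  # hypothetical gateway URI
)
print(status)
print(xml_text)
```
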

Next Steps
At this point we've covered all the fundamental aspects of SOAP. In the future I'll start looking at aspects of SOAP that are not addressed by the current specification. I've harped on them for a while now: various data encoding formats, programming language data bindings, and Web service discovery and specification, as well as the omnipresent security and transaction management.

I think data encoding and language data bindings make a good topic for the next XML in Transit. Also, I hope to have a chance to report on the activities of the W3C Working Group that's taking over the SOAP specification. This group has some excellent members. I've been very happy with the way SOAP has evolved so far and am looking forward to working with the group to help deliver a solid XML Protocol specification within a short time frame. The first meeting of the working group is about two weeks from the time of this writing. I'll keep you posted on how things develop.

Simeon Simeonov is CEO of FastIgnite, where he invests in and advises startups. He was chief architect or CTO at companies such as Allaire, Macromedia, Better Advertising and Thing Labs. He blogs at blog.simeonov.com, tweets as @simeons and lives in the Greater Boston area with his wife, son and an adopted dog named Tye.
