In the last installment of XML in Transit (XML-J, Vol. 2, issue 5), we established a framework for Web Services usage. The key roles in the framework are service providers, requesters, and brokers (see Figure 1). Moreover, the basic Web Services use workflow involves five steps (see Figure 2): providers enabling access to services and registering them with brokers, and requesters finding the right service to use from the broker service repository, binding their applications to the service, and finally, invoking the service directly from the provider.
We also looked at how binding information can be cached to improve performance and how retry-on-failure is a simple mechanism that enables robust service invocation. This month's column continues with some thoughts on creating a scalable Web Services infrastructure.
Web Services Scalability
Scalability means different things to different people. Typically, the narrow technical definition refers to "the ability of a system to handle increasing load," while the broad business definition is used along the lines of "the ability of a system to handle an increase in some factor X." Commonly looked-at factors are usage, complexity, change, and so on.
When looking at the scalability of the framework for Web Services usage, we need to maintain a middle ground. Primarily, we're interested in what will happen when Web Services becomes a commonplace reality. An obvious consequence will be an increase in the number of service providers, requesters, and brokers. For each of these roles we'll have to look at how an increase in its numbers impacts the following:
The Chicken and the Egg
Countless economists have written volumes on the process of introducing new goods and services in markets. There are two key factors at play:
This brief digression to economics is relevant to this month's topic for a simple reason: we talk about scaling Web Services usage but how are we going to get there? Is there a manic demand for Web Services that is pushing all vendors in the space to rapidly innovate and grow their offerings? Are the vendors building complex systems in a vacuum hoping that real businesses will use them someday? Or is this just the next layer in the XML hype tower?
These are all great questions and, perhaps, the topic of a future column. If you ask me, you'll get an undoubtedly biased answer. I certainly haven't spent a year writing a column on XML protocols and Web Services thinking that they are just hype. At the same time, I strongly believe that not all is going well in the Web Services industry. The greatest offense, in my opinion, is that we are trying to add too many layers to the interoperability stack without first stabilizing some of the core foundation technologies. I'm confident, however, that we can fix most of these issues in the near future.
Web Services are a fundamentally better way to do distributed computing. I know of many businesses that want to use them. I know of many vendors that are building great products. I'm not exactly sure (that is, I have several hypotheses) of the path that the industry will take to reach a state where Web Services are a commonplace infrastructure, but I know we will get there in the next few years.
Therefore, let's not ponder right now whether supply or demand will drive adoption, or whether there will first be many providers, requesters, or brokers. Instead, let's imagine a world of a pervasive Web Services presence and think about the infrastructure that we'll need to support using Web Services in this environment.
Scaling with an Increasing Number of Requesters
Other things being equal, increasing the number of service requesters will result in increased service usage. From our simple workflow model in Figure 2, we see that this will increase the frequency of bind and invoke operations, thus impacting both brokers and providers. The key is to have scalable and reliable providers and brokers. How do we do this?
(Note: The following discussion focuses on the simpler concepts of reliability and scalability rather than on more complex quality-of-service issues, such as guarantees, differential service, and so on.)
Service providers have many different mechanisms by which to increase the scalability and reliability of their offerings. Generally, there are three types of approaches:
As an industry we know how to address this issue - this is the classic high-performance application server problem. The leading application server vendors - Macromedia, BEA, IBM, Microsoft, and Sun - have been working on it for the past several years. For those of you not familiar with the space, here's a simple description of some of the options you have at your disposal. (For convenience I assume that the scalability bottleneck is application server processing, not some back-end system.)
Using bigger, better, faster hardware is at the top of the list. This is the easiest change to implement. It involves little more than software reinstallation and configuring the new system. The risk of failure is low because the complexity of the solution is low, too.
There are four main reasons why you would not be able to solve all your problems using this approach:
There are three common system architecture approaches to clustering:
Broker-Assisted Mechanisms
Brokers expose service binding information to requesters. Typically the binding link will simply point to a WSDL file on the provider's site. However, with a little bit of trickery, providers can use brokers' information intermediary roles to improve the reliability and scalability of their service offering.
The basic mechanism involves dispensing with the notion of WSDL as a static document and thinking of WSDL as a dynamic document controlled by the provider. In particular, the portion that the provider might want to change frequently is the address of the service. By returning different binding addresses for different WSDL document requests, the provider can implement a basic load-balancing scheme.
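As a rough illustration of the dynamic-WSDL idea, the following Python sketch renders a WSDL fragment whose service address rotates through a pool of servers on each request. The endpoint URLs, the service and port names, and the abbreviated WSDL template are all assumptions made for this example; a real provider would serve a complete WSDL document and might pick addresses based on actual load rather than simple rotation.

```python
import itertools

# Hypothetical pool of service endpoints (not taken from the article).
ENDPOINTS = [
    "http://node1.example.com/convert",
    "http://node2.example.com/convert",
    "http://node3.example.com/convert",
]

# Deliberately abbreviated WSDL fragment: only the part that carries the
# service address. The names used here are illustrative assumptions.
WSDL_TEMPLATE = """<service name="CurrencyConverter">
  <port name="ConverterPort" binding="tns:ConverterBinding">
    <soap:address location="{location}"/>
  </port>
</service>"""


def make_wsdl_renderer(endpoints):
    """Return a function that renders the WSDL with the next address in turn."""
    rotation = itertools.cycle(endpoints)

    def render():
        # Each request for the WSDL document gets the next endpoint.
        return WSDL_TEMPLATE.format(location=next(rotation))

    return render


if __name__ == "__main__":
    render_wsdl = make_wsdl_renderer(ENDPOINTS)
    for _ in range(4):
        print(render_wsdl())
```

Round-robin is only the simplest possible policy; the same hook could return whichever address the provider currently considers least busy.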
Reliability in the system is introduced via the retry-on-failure mechanism. When accessing a Web service, requesters will use cached binding information. If the operation fails, they go to the service broker and obtain updated binding information for that service. This model can be used to increase both the reliability and the scalability of services.
WSDL already allows multiple addresses to be provided for the same service; these can be used as backups in case of unexpected service failures. However, during scheduled updates of the back end, the top-level address can be changed to point to the server that will service requesters during the update. After retry-on-failure, requesters will bind to that server. Another retry-on-failure will switch them back to the original server after the update is completed.
Also, when a server is getting really busy and can't service requests on time, it can use retry-on-failure to bind the requester to other, less busy, servers.
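On the requester side, retry-on-failure with cached binding information might look roughly like the sketch below. The `Requester` class, the `ServiceUnavailable` exception, and the two callables that stand in for the broker lookup and the actual service call are hypothetical names introduced for illustration; they are not part of any particular toolkit.

```python
class ServiceUnavailable(Exception):
    """Raised by the (stand-in) transport when an endpoint cannot serve a call."""


class Requester:
    def __init__(self, lookup_binding, send):
        # lookup_binding(service_key) -> address; stands in for a broker query
        # that ends in fetching the provider's current WSDL.
        # send(address, payload) -> response; stands in for the real invocation.
        self._lookup_binding = lookup_binding
        self._send = send
        self._cache = {}  # service key -> cached binding address

    def invoke(self, service_key, payload, max_rebinds=1):
        # Bind once and reuse the cached address on later calls.
        if service_key not in self._cache:
            self._cache[service_key] = self._lookup_binding(service_key)

        for attempt in range(max_rebinds + 1):
            try:
                return self._send(self._cache[service_key], payload)
            except ServiceUnavailable:
                if attempt == max_rebinds:
                    raise  # out of rebind attempts; surface the failure
                # Retry-on-failure: discard the stale binding and rebind.
                self._cache[service_key] = self._lookup_binding(service_key)
```

Because the requester rebinds only after a failure, the provider can steer traffic simply by making the old address fail (or refuse requests) and returning a new address in the WSDL.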
Service-Redesign Mechanisms
It's hard to talk in depth about improving scalability and reliability of Web Services if we have the opportunity to completely redesign the way in which services are exposed. We can touch only the surface of what is possible. There are four key areas of focus:
Scaling with Increased Number of Providers
Increasing the number of Web Services providers (and, therefore, the number of available services) affects brokers and requesters in very different ways. Because brokers act as intermediaries in service discovery, they have to address the infrastructure scalability problems. At a basic level, requesters are just likely to see more entries in their Web Services search results.
Broker Networks
So far we've taken a simplistic view of brokers - they exist somewhere and both providers and users can work with them. In a world with many service providers, the job of being a broker becomes somewhat complicated for two reasons:
UDDI uses a DNS-style data replication mechanism. Periodically, a broker will inform "neighboring" brokers of any changes to its Web Services repository. These changes are propagated through the network. This is how all brokers will have information about a service provider, even if the provider registered with only one of the brokers.
Replicating information about Web Services is a great idea. However, the synchronization lag raises a problem.
Consider the example of a provider who tries to induce retry-on-failure to temporarily point requesters to a backup server. The effort will succeed only if all requesters ask for the same WSDL document when they want to rebind. This is one of the reasons why it's much better to implement dynamic WSDL on the provider side rather than modify the WSDL link in the broker repository; if the link changes, the synchronization lag can adversely affect some requesters.
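The synchronization lag can be made concrete with a toy model. The Python sketch below is not UDDI's actual replication protocol; it simply assumes that each broker pushes its records to its direct neighbors once per round, so a change travels one hop at a time. The `Broker` class, the version numbers, and the URLs are illustrative assumptions.

```python
class Broker:
    def __init__(self, name):
        self.name = name
        self.neighbors = []   # brokers this one replicates to
        self.records = {}     # service key -> (version, wsdl_url)

    def register(self, key, version, wsdl_url):
        self.records[key] = (version, wsdl_url)


def sync_round(brokers):
    """Run one replication round: every broker pushes to its direct neighbors."""
    # Snapshot first so that a change moves only one hop per round.
    snapshots = {broker.name: dict(broker.records) for broker in brokers}
    for broker in brokers:
        for neighbor in broker.neighbors:
            for key, record in snapshots[broker.name].items():
                current = neighbor.records.get(key)
                # Newer versions overwrite older ones; equal versions are kept.
                if current is None or current[0] < record[0]:
                    neighbor.records[key] = record


if __name__ == "__main__":
    a, b, c = Broker("a"), Broker("b"), Broker("c")
    a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]

    a.register("svc", 1, "http://example.com/service.wsdl")
    a.register("svc", 2, "http://backup.example.com/service.wsdl")  # link changed

    sync_round([a, b, c])
    print(c.records.get("svc"))  # broker c has not heard of the change yet
    sync_round([a, b, c])
    print(c.records.get("svc"))  # now it has
```

Between the two rounds, a requester that rebinds through broker c would still miss the new link, which is exactly why changing the WSDL content on the provider side is the safer option.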
Value-Adding Intermediaries
Try this: go to Yahoo! and do a search on "SOAP." I got 14 categories, 757 sites, 9,700 shopping items, and 100 auctions. On the first couple of pages I saw not a single piece of information on Simple Object Access Protocol. My point: it's hard to find something specific if your search engine doesn't understand your domain of interest.
Let's narrow down the domain. Try this: go to Yahoo! Shopping and search for "currency converter." I found 76 products in 42 stores with prices ranging from $0.00 (a free online currency converter) to $444.95 (a cellular phone with a currency converter). My point: even if the domain is fairly narrow, it's hard to specify what you are looking for and to choose one item from the results of a search.
Now, put yourself in the shoes of a business that is looking for a currency conversion Web service. How will you find one? More important, how will you design and build systems that may automatically search for and choose such a service for you?
So far we have thought only of brokers providing technical discovery services. Technical discovery deals with three basic types of information: (1) business-level contact information, (2) categorization (type of business, geographical area, etc.), and (3) binding information (WSDL). This information is grossly inadequate to enable manual, let alone automatic, discovery of services that satisfy real-world business requirements. Several key pieces are missing:
We are likely to see marketplaces, trust providers, rating agencies, domain-specific search engines, metabrokers and many others. Just recently, I heard of four startup plans in this area. Brace yourselves for rapid innovation.
Scaling with Increased Number of Brokers
It seems that the only way to scale with the number of Web Services providers is to increase the number of service brokers and the types of brokers. This raises the question of the impact on requesters and providers of an increasing number of service brokers. I think we can answer this question by breaking it into three parts:
The richer the service description model, the more detailed information about the service can be delivered automatically to requesters. This type of automaticity facilitates dynamic discovery and, hopefully, binding to the service. (To get to this level at a broad scale, we need domain-level schema standardization.)
The scope of the broker network refers to the number and type of service discovery partners the broker has. Service information will be replicated within the partner network to the best extent possible. It's unlikely that valuable domain-level information will leave the partner network; only basic technical discovery information is likely to reach the global Web Services brokers.
The final deciding criterion for service providers to go with a certain broker will be the broker's ability to reach potential service requesters. In addition, providers would consider factors such as the presence or absence of substitute services listed by the broker and any preferential treatment deals they can strike. The reverse analysis applies for requesters.
In addition to looking at different brokers' offerings, requesters and providers are likely to employ metabrokers (a special kind of service provider). Metabrokers will provide ratings and other types of useful information about service brokers to aid in the decision-making process.
One important point: despite the fact that it may seem as if providers could register with as many brokers as they want and that requesters could inquire with any broker, some brokers may impose exclusivity conditions and other mechanisms to provide value by restricting the free flow of information. It will be a mixed blessing.
Service Advertising
Increasing the number and types of brokers poses some problems for providers. Ideally, providers would like to be listed with all brokers that would take them based on some business criteria, for example, no cost for registration. How will they know who those brokers are? Further, how will providers know to register with brokers who come online months after their service was initially registered?
Registration provides "guaranteed discovery" within a set of brokers, but it requires the effort of choosing the brokers and registering all services with them. To make Web Services scale as the number and types of brokers increase, we need yet another mechanism for discovery - a low-cost, "best effort" approach. This is what Web Services advertising is about.
To see the difference between registration and advertising, think about going to Yahoo! and registering your Web site versus bringing your site online and hoping that the Yahoo! Web crawler will sooner or later hit your site and automatically register all pages on it.
There has been no standardization yet in the area of Web Services advertising. A couple of proposals have come out. IBM created Advertisement and Discovery of Services (ADS), while Microsoft created Discovery of Services (DISCO). Both approaches rely on the Web crawling model. Both were written pre-UDDI and are therefore somewhat technically out of place.
Let me try to describe abstractly how advertising would work.
First, we need a way to uniquely identify a Web service. We can do this as a combination of the address of a broker who has the service information and a key for the service information within that broker's repository.
Next, we need a simple mechanism to collect information about all Web Services. We can build this as a simple document hierarchy. An example follows:
<serviceList brokerURI="http://uddi.bigbroker.com">
  <service ref="ExampleService-123"/>
  <service ref="ExampleService-735"/>
  <serviceList href="http://example.com/list.xml"/>
</serviceList>
Finally, we need a way for these service lists to be discovered. There are at least two simple mechanisms that we can use:
1. A well-known file, for example, webServiceAdvertisement.xml, in the root of our Web site.
2. Specialized META tags in our HTML documents. These could be automatically generated by our application to point to a service list. This list could point to services related to the content of the Web page. For example:
<META name="webServiceAdvertisement" content="http://example.com/list.xml">
That's all it takes to enable basic Web Services advertising. Standard Web crawlers can be easily extended to provide Web Services discovery through advertising. The infrastructure is trivial and the scalability potential is significant.
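To suggest how little a crawler would need, here is a minimal Python sketch that reads a service list in the format shown earlier and picks up advertisement META tags from an HTML page. Since no advertising standard exists, the element and attribute names are simply those used in this column's example, and the function and class names are invented for illustration. Fetching documents over the network and following nested lists are left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def parse_service_list(xml_text):
    """Return ([(broker_uri, service_ref), ...], [nested_list_url, ...])."""
    root = ET.fromstring(xml_text)
    broker = root.get("brokerURI")
    # A service is identified by the broker address plus the key in its repository.
    services = [(broker, node.get("ref")) for node in root.findall("service")]
    # Nested serviceList elements point to further lists for the crawler to visit.
    nested = [node.get("href") for node in root.findall("serviceList")]
    return services, nested


class AdvertisementFinder(HTMLParser):
    """Collect service list URLs advertised through META tags."""

    def __init__(self):
        super().__init__()
        self.list_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META> arrives as "meta".
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "webServiceAdvertisement":
            content = attributes.get("content")
            if content:
                self.list_urls.append(content)


if __name__ == "__main__":
    finder = AdvertisementFinder()
    finder.feed('<META name="webServiceAdvertisement" '
                'content="http://example.com/list.xml">')
    print(finder.list_urls)
```

A crawler would feed each fetched page to `AdvertisementFinder`, also probe for the well-known file in the site root, and then pass every list it finds to `parse_service_list`.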
Tying It All Together
We can now update the simple Web Services usage workflow to include several relevant operations we discussed (see Figure 3): the tasks of providers and requesters choosing a broker, providers advertising services, and brokers choosing providers based on advertisement information. Last, but not least, we can add the retry-on-failure link between the invocation and binding operations.
In addition, Figure 3 uses color to identify the basic areas of responsibility of the providers, requesters, and brokers.
Now we can continue looking in more depth at the different roles in the model. The next installment of XML in Transit will make UDDI look like "an obvious good thing," something it would have been hard to do without establishing our reasonably realistic framework for Web Services usage in the first place.
Let me end on a small personal note. Several months ago my old company - Allaire Corporation - merged with Macromedia, Inc. I was with Allaire for five years and helped them grow from the early start-up phase to an Internet powerhouse of 600 employees and hundreds of thousands of customers.
I am now the chief architect at Macromedia. It's going to be an exciting ride and you will hopefully benefit from my exposure to some new areas of XML and Web Services. The company is very committed to Web Services. In addition to evolving the server-side assets of Allaire, Macromedia is doing exciting work in the areas of development tools (the Dreamweaver and Ultradev product line) and rich clients that can process XML and access the server (Flash and Shockwave).