By Simeon Simeonov | July 10, 2001 12:00 AM EDT
In the last installment of XML in Transit (XML-J, Vol. 2, issue 5), we established a framework for Web Services usage. The key roles in the framework are service providers, requesters, and brokers (see Figure 1). Moreover, the basic Web Services use workflow involves five steps (see Figure 2): providers enabling access to services and registering them with brokers, and requesters finding the right service to use from the broker service repository, binding their applications to the service, and finally, invoking the service directly from the provider.
We also looked at how binding information can be cached to improve performance and how retry-on-failure is a simple mechanism that enables robust service invocation. This month's column continues with some thoughts on creating a scalable Web Services infrastructure.
Web Services Scalability
Scalability means different things to different people. Typically, the narrow technical definition is "the ability of a system to handle increasing load," while the broader business definition runs along the lines of "the ability of a system to handle an increase in some factor X." Commonly considered factors are usage, complexity, change, and so on.
When looking at the scalability of the framework for Web Services usage, we need to steer a middle course between these two definitions. Primarily, we're interested in what will happen when Web Services become a commonplace reality. An obvious consequence will be an increase in the number of service providers, requesters, and brokers. For each of these roles we'll have to look at how an increase in its numbers affects the following:
- Related roles: How does an increase in the number of Web Services brokers impact providers and requesters?
- The volume of Web Services-related operations: Publish, register, find, bind, and invoke
- The complexity of the Web Services use workflow: Is the usage model really as simple as we are making it out to be?
The Chicken and the Egg
Countless economists have written volumes on the process of introducing new goods and services into markets. There are two key factors at play:
- The extent to which supply generates its own demand
- The extent to which demand generates its own supply
This brief digression into economics is relevant to this month's topic for a simple reason: we talk about scaling Web Services usage, but how are we going to get there? Is there a manic demand for Web Services that is pushing all vendors in the space to rapidly innovate and grow their offerings? Are the vendors building complex systems in a vacuum hoping that real businesses will use them someday? Or is this just the next layer in the XML hype tower?
These are all great questions and, perhaps, the topic of a future column. If you ask me, you'll get an undoubtedly biased answer. I certainly haven't spent a year writing a column on XML protocols and Web Services thinking that they are just hype. At the same time, I strongly believe that not all is going well in the Web Services industry. The greatest offense, in my opinion, is that we are trying to add too many layers to the interoperability stack without first stabilizing some of the core foundation technologies. I'm confident, however, that we can fix most of these issues in the near future.
Web Services are a fundamentally better way to do distributed computing. I know of many businesses that want to use them. I know of many vendors that are building great products. I'm not exactly sure (that is, I have several hypotheses) of the path that the industry will take to reach a state where Web Services are a commonplace infrastructure, but I know we will get there in the next few years.
Therefore, let's not ponder right now whether supply or demand will drive adoption, or whether there will first be many providers, requesters, or brokers. Instead, let's imagine a world of a pervasive Web Services presence and think about the infrastructure that we'll need to support using Web Services in this environment.
Scaling with an Increasing Number of Requesters
Other things being equal, increasing the number of service requesters will result in increased service usage. From our simple workflow model in Figure 2, we see that this will increase the frequency of bind and invoke operations, thus affecting both brokers and providers. The key is to have scalable and reliable providers and brokers. How do we do this?
(Note: The following discussion focuses on the simpler concepts of reliability and scalability rather than on more complex quality-of-service issues, such as guarantees, differential service, and so on.)
Service providers have many different mechanisms by which to increase the scalability and reliability of their offerings. Generally, there are three types of approaches:
- Transparent: Neither requesters nor providers are affected by the employed mechanisms.
- Broker-assisted: In this model, providers will use the capabilities of service brokers to improve service quality. Service requesters will not be impacted by the changes.
- Service redesign: Services are redesigned to improve reliability and scalability. In this model, service requesters may be impacted by the change and may have to modify their applications.
Before looking at each approach, it helps to identify three facets of Web Services usage:
- Address: The address of the endpoint that provides the service, for example, www.example.com/services/Insurance or web-service-123@example.com
- Transport: The physical protocol used to transmit Web Services messages, for example, HTTP, SMTP or FTP
- Interaction pattern: One-way messaging, request-response, and more
Transparent Mechanisms
Transparent scalability treats the endpoint of service provision as a black box. None of the facets of the Web Services usage described above can be changed. The question is how to increase quality of service within the black box.
As an industry we know how to address this issue - this is the classic high-performance application server problem. The leading application server vendors - Macromedia, BEA, IBM, Microsoft, and Sun - have been working on it for the past several years. For those of you not familiar with the space, here's a simple description of some of the options you have at your disposal. (For convenience I assume that the scalability bottleneck is application server processing, not some back-end system.)
Using bigger, better, faster hardware is at the top of the list. This is the easiest change to implement. It involves little more than reinstalling the software and configuring the new system. The risk of failure is low because the complexity of the solution is low, too.
There are four main reasons why you would not be able to solve all your problems using this approach:
- There is a limit to the scalability of a single server. Further, most applications' performance does not scale linearly with increases in hardware power.
- High-performance hardware gets a lot more expensive (and nonlinearly so) at the high end. For example, for the price of a single 16-CPU SPARC Solaris server, you'll be able to get many more than eight dual-CPU Intel servers.
- A single, very complex piece of hardware is essentially unreliable. Yes, you can RAID hard disks, but you can't RAID CPUs, memory, network cards, and so on.
- Any hardware or software update that requires a reboot will cause a service interruption.
The natural next step is clustering: spreading the load across multiple servers. There are three common system architecture approaches to clustering:
- OS-level clustering: This approach is applicable only to high-end operating systems that support it. Processing happens on one machine, but there is built-in failover to backup machines.
- Software load balancing: In this very flexible approach, all requests come to a piece of software (the load-balancing dispatcher) that chooses which server to execute the request on. The choice can be made based on dynamic parameters, such as the server load, the originator of the request, and so on. Note that the dispatcher does not have to be a singleton; there are models where a dispatcher lives on all servers.
- Hardware load balancing: This type of load balancing is performed by smart hardware directly on the Web Services message route. The simplest example is round-robin load balancing - messages are dispatched to servers based on their sequential DNS name or IP address, for example, www1.example.com and www2.example.com. Hardware load-balancers are typically faster than software ones, but they don't have the same dynamic configuration capabilities.
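To make the difference between static and dynamic dispatching concrete, here is a minimal Python sketch of a software load-balancing dispatcher. It offers both a round-robin policy and a dynamic "least loaded" policy. The `Server`/`Dispatcher` names and the use of in-flight request counts as the load metric are assumptions made for illustration. A real dispatcher would also forward the message and track server health.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    in_flight: int = 0  # assumed load metric: requests currently being processed


class Dispatcher:
    """Hypothetical software load-balancing dispatcher in front of a server pool."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rotation = itertools.cycle(self.servers)  # fixed order for round-robin

    def round_robin(self):
        """Static policy: hand out servers in strict rotation, ignoring load."""
        return next(self._rotation)

    def least_loaded(self):
        """Dynamic policy: pick the server with the fewest in-flight requests.
        Ties go to the earliest server in the pool."""
        return min(self.servers, key=lambda s: s.in_flight)

    def dispatch(self, policy="least_loaded"):
        """Choose a server for one request and record it as in flight."""
        if policy == "round_robin":
            server = self.round_robin()
        else:
            server = self.least_loaded()
        server.in_flight += 1
        return server

    def done(self, server):
        """Call when the server has finished processing the request."""
        server.in_flight -= 1


pool = [Server("www1.example.com"), Server("www2.example.com")]
dispatcher = Dispatcher(pool)
first = dispatcher.dispatch()   # www1.example.com: all idle, first in pool wins
second = dispatcher.dispatch()  # www2.example.com: www1 already has a request
```

The dynamic policy is what gives software dispatchers their flexibility. The choice function can look at any runtime signal, such as load or the request's originator. Round-robin needs no state beyond its position in the rotation.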
Broker-Assisted Mechanisms
Brokers expose service binding information to requesters. Typically the binding link will simply point to a WSDL file on the provider's site. However, with a little trickery, providers can use the brokers' role as information intermediaries to improve the reliability and scalability of their service offering.
The basic mechanism involves dispensing with the notion of WSDL as a static document and thinking of WSDL as a dynamic document controlled by the provider. In particular, the portion that the provider might want to change frequently is the address of the service. By returning different binding addresses for different WSDL document requests, the provider can implement a basic load-balancing scheme.
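Here is a minimal Python sketch of the idea. It assumes the provider generates its WSDL in code instead of serving a static file. The template is a trimmed, hypothetical WSDL 1.1 document: only the `service`/`port`/`soap:address` part matters here, and the binding it references is omitted. The backend URLs are placeholders. Each request for the document receives the next address in rotation.

```python
import itertools
from xml.sax.saxutils import quoteattr

# Trimmed WSDL 1.1 fragment; a real document would also carry the
# types, message, portType, and binding sections.
WSDL_TEMPLATE = """<definitions name="InsuranceService"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:tns="urn:example:insurance"
    targetNamespace="urn:example:insurance">
  <service name="InsuranceService">
    <port name="InsurancePort" binding="tns:InsuranceBinding">
      <soap:address location={location}/>
    </port>
  </service>
</definitions>"""


class DynamicWsdl:
    """Serves a WSDL document whose soap:address rotates across backends."""

    def __init__(self, backend_urls):
        self._next_url = itertools.cycle(backend_urls)

    def render(self):
        # Each WSDL request binds its caller to the next backend in turn,
        # spreading requesters across servers (basic load balancing).
        # quoteattr() escapes the URL and wraps it in quotes.
        return WSDL_TEMPLATE.format(location=quoteattr(next(self._next_url)))
```

The provider would return `render()` from whatever HTTP handler serves the WSDL URL registered with the broker. The broker's link never changes; only the document behind it does.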
Reliability in the system is introduced via the retry-on-failure mechanism. When accessing a Web service, requesters will use cached binding information. If the operation fails, they go to the service broker and obtain updated binding information for that service. This model can be used to increase both the reliability and the scalability of services.
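The requester side of this scheme fits in a few lines of Python. The `Broker` class is a hypothetical stand-in for a real registry inquiry that yields a binding address, such as a UDDI lookup. The `invoke_at` transport callable and the `ServiceUnavailable` exception are also assumptions. The client invokes through its cached binding. Only when an invocation fails does it go back to the broker, rebind, and retry, once by default (`max_rebinds=1`).

```python
class ServiceUnavailable(Exception):
    """Assumed to be raised by the transport when an endpoint cannot serve a request."""


class Broker:
    """Hypothetical stand-in for a broker's binding lookup."""

    def __init__(self, bindings):
        self.bindings = dict(bindings)  # service key -> current endpoint address
        self.lookups = 0

    def get_binding(self, service_key):
        self.lookups += 1
        return self.bindings[service_key]


class Requester:
    def __init__(self, broker, invoke_at):
        self.broker = broker
        self.invoke_at = invoke_at  # callable(address, payload) -> response
        self.cache = {}             # service key -> cached binding address

    def call(self, service_key, payload, max_rebinds=1):
        if service_key not in self.cache:
            self.cache[service_key] = self.broker.get_binding(service_key)
        for attempt in range(max_rebinds + 1):
            try:
                return self.invoke_at(self.cache[service_key], payload)
            except ServiceUnavailable:
                if attempt == max_rebinds:
                    raise
                # Retry-on-failure: drop the stale binding and ask the broker again.
                self.cache[service_key] = self.broker.get_binding(service_key)
```

In the common case the broker is never contacted, because the cached binding works. The broker absorbs extra traffic only when bindings actually go stale.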
WSDL already allows multiple addresses to be provided for the same service; these can be used as backups in case of unexpected service failures. However, during scheduled updates of the back end, the top-level address can be changed to point to the server that will service requesters during the update. After retry-on-failure, requesters will bind to that server. Another retry-on-failure will switch them back to the original server after the update is completed.
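In WSDL 1.1, multiple addresses appear as several `port` elements inside one `service`, each with its own `soap:address`. A requester can read them all and fall back in document order. The Python sketch below uses only the standard library's ElementTree. The sample document, its URLs, and the `invoke_at` callable are assumptions. It also assumes, for illustration, that the transport signals failure with `ConnectionError`.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"

SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:tns="urn:example:insurance" targetNamespace="urn:example:insurance">
  <service name="InsuranceService">
    <port name="Primary" binding="tns:InsuranceBinding">
      <soap:address location="http://www.example.com/services/Insurance"/>
    </port>
    <port name="Backup" binding="tns:InsuranceBinding">
      <soap:address location="http://backup.example.com/services/Insurance"/>
    </port>
  </service>
</definitions>"""


def service_addresses(wsdl_text, service_name):
    """Return the soap:address locations of a service's ports, in document order."""
    root = ET.fromstring(wsdl_text)
    ns = {"wsdl": WSDL_NS, "soap": SOAP_NS}
    service = root.find(f"wsdl:service[@name='{service_name}']", ns)
    if service is None:
        raise KeyError(service_name)
    return [addr.get("location")
            for addr in service.findall("wsdl:port/soap:address", ns)]


def invoke_with_backups(addresses, invoke_at, payload):
    """Try the top-level address first, then each backup, until one succeeds."""
    last_error = None
    for address in addresses:
        try:
            return invoke_at(address, payload)
        except ConnectionError as exc:  # assumed failure signal from the transport
            last_error = exc
    raise last_error or ConnectionError("no addresses available")
```

The provider controls the order of the ports, so changing which address comes first is enough to redirect requesters that rebind.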
Also, when a server is getting really busy and can't service requests on time, it can use retry-on-failure to bind the requester to other, less busy, servers.
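On the provider side, a small policy function can decide which address to publish as the top-level binding. This sketch rests on several assumptions: the `Backend` records, the `load` figure (0.0 to 1.0), the `busy_threshold` value, and the maintenance flag are all hypothetical. The provider would feed the chosen address into its dynamically generated WSDL. Requesters would then pick it up the next time retry-on-failure sends them back to the broker.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    address: str
    load: float = 0.0             # assumed utilization: 0.0 idle, 1.0 saturated
    in_maintenance: bool = False  # set during a scheduled back-end update


def choose_top_level_address(primary, backups, busy_threshold=0.8):
    """Pick the address to list first in the provider's dynamic WSDL.

    Keep the primary while it is available and not too busy; otherwise
    point new bindings at the least-loaded backup that is available.
    """
    if not primary.in_maintenance and primary.load < busy_threshold:
        return primary.address
    candidates = [b for b in backups if not b.in_maintenance]
    if not candidates:
        return primary.address  # no alternative to offer
    best = min(candidates, key=lambda b: b.load)
    if not primary.in_maintenance and best.load >= primary.load:
        return primary.address  # backups are no less busy; don't move requesters
    return best.address
```

When maintenance ends, the function returns the primary again. The next retry-on-failure then moves requesters back, as described above.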
Service-Redesign Mechanisms
When we have the opportunity to completely redesign the way in which services are exposed, the options for improving scalability and reliability are too numerous to discuss in depth; we can only scratch the surface. There are four key areas of focus:
- Choice of physical protocols: A provider can use faster protocols, such as pure sockets, or more reliable protocols, such as fully transactional RMI or IIOP.
- Choice of protocol binding: There are protocol bindings that are much more efficient than the text/xml MIME type. For example, XML can be represented in binary form and/or it can be compressed for delivery to low-bandwidth devices.
- Choice of interaction patterns: Synchronous processing (particularly RPC) is at the root of most scalability problems in distributed systems. Redesigning applications to use asynchronous, messaging-based interactions can make them radically easier to scale. A change in an interaction pattern can also involve a change in physical protocol and encoding. For example, a purchase order submission Web service can be switched from HTTP to SMTP.
- Choice of application semantics: This is the broadest category; it's practically limitless. The typical approach for layering application semantics is through the header mechanism in Web Services messages. Messages can include transaction ID headers, message correlation headers for asynchronous request-response interactions, session ID headers for maintaining shared state on the requester and provider sides, and so on.
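Here is a Python sketch of the header mechanism combined with asynchronous request-response. It builds a SOAP 1.1 envelope that carries correlation and transaction ID headers. It then matches an incoming reply to its pending request by correlation ID. The header element names, the `urn:example:headers` namespace, and the `send` callable are hypothetical. The parties to an exchange would have to agree on their own.

```python
import uuid
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
HDR = "urn:example:headers"  # hypothetical namespace for application headers


def build_message(body_xml, correlation_id, transaction_id=None):
    """Wrap an application payload in a SOAP envelope with app-level headers."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_ENV}}}Header")
    ET.SubElement(header, f"{{{HDR}}}CorrelationID").text = correlation_id
    if transaction_id is not None:
        ET.SubElement(header, f"{{{HDR}}}TransactionID").text = transaction_id
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    body.append(ET.fromstring(body_xml))
    return ET.tostring(env, encoding="unicode")


def correlation_id_of(message_xml):
    """Read the correlation header from a received envelope."""
    env = ET.fromstring(message_xml)
    return env.findtext(f"{{{SOAP_ENV}}}Header/{{{HDR}}}CorrelationID")


class AsyncRequester:
    """Sends requests without blocking; replies may arrive later, in any order."""

    def __init__(self, send):
        self.send = send   # e.g., hand the message to an SMTP or queue transport
        self.pending = {}  # correlation ID -> description of the outstanding request

    def submit(self, body_xml, description):
        cid = str(uuid.uuid4())
        self.pending[cid] = description
        self.send(build_message(body_xml, cid))
        return cid

    def on_reply(self, reply_xml):
        """Match a reply to the request it answers and retire that request."""
        return self.pending.pop(correlation_id_of(reply_xml))
```

Because the requester never blocks waiting for a response, a slow provider ties up a dictionary entry instead of a thread or connection. This is where most of the scalability benefit of asynchronous interactions comes from.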
The beauty of the Web Services usage framework is that brokers expose their functionality through Web Services - they are providers of brokerage services. Therefore, all of the discussion so far on making service providers scalable and reliable automatically applies to brokers as well.
Scaling with an Increased Number of Providers
Increasing the number of Web Services providers (and, therefore, the number of available services) affects brokers and requesters in very different ways. Because brokers act as intermediaries in service discovery, they have to address the infrastructure scalability problems. At a basic level, requesters are likely simply to see more entries in their Web Services search results.
Broker Networks
So far we've taken a simplistic view of brokers - they exist somewhere and both providers and requesters can work with them. In a world with many service providers, the job of being a broker becomes somewhat complicated for two reasons:
- No single broker can scale to meet the demands of all providers and requesters.
- It's unlikely (and inconvenient) that all providers register with all brokers.
UDDI uses a DNS-style data replication mechanism. Periodically, a broker will inform "neighboring" brokers of any changes to its Web Services repository. These changes are propagated through the network. This is how all brokers will have information about a service provider, even if the provider registered with only one of the brokers.
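Propagation through a broker network, including the lag discussed next, can be simulated in a few lines of Python. This is a toy model, not UDDI's actual replication protocol. Each broker keeps a versioned copy of every entry. In each periodic round, every broker pushes a snapshot of its entries to its neighbors, and each neighbor keeps whichever version is newer. The model assumes each service is registered at a single home broker. The topology, names, and links are made up.

```python
class BrokerNode:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.entries = {}  # service key -> (version, binding link)

    def register(self, key, link):
        """A provider registers or updates a service at this (home) broker only."""
        version = self.entries.get(key, (0, None))[0] + 1
        self.entries[key] = (version, link)

    def merge(self, entries):
        """Adopt any entry newer than the local copy; report whether anything changed."""
        changed = False
        for key, (version, link) in entries.items():
            if key not in self.entries or self.entries[key][0] < version:
                self.entries[key] = (version, link)
                changed = True
        return changed


def connect(a, b):
    a.neighbors.append(b)
    b.neighbors.append(a)


def replication_round(brokers):
    """One periodic round: each broker pushes a snapshot of its entries to its neighbors.
    Snapshots are taken first, so a change travels one hop per round."""
    snapshots = {b.name: dict(b.entries) for b in brokers}
    for b in brokers:
        for n in b.neighbors:
            n.merge(snapshots[b.name])
```

In a chain A-B-C, a change registered at A reaches B after one round and C after two. A requester that asks broker C in between still receives the old link.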
Replicating information about Web Services is a great idea. However, the synchronization lag raises a problem.
Consider the example of a provider who tries to induce retry-on-failure to temporarily point requesters to a backup server. The effort will succeed only if all requesters ask for the same WSDL document when they want to rebind. This is one of the reasons why it's much better to implement dynamic WSDL on the provider side rather than modify the WSDL link in the broker repository; if the link changes, the synchronization lag can adversely affect some requesters.
Value-Adding Intermediaries
Try this: go to Yahoo! and do a search on "SOAP." I got 14 categories, 757 sites, 9,700 shopping items, and 100 auctions. On the first couple of pages I saw not a single piece of information on the Simple Object Access Protocol. My point: it's hard to find something specific if your search engine doesn't understand your domain of interest.
Let's narrow down the domain. Try this: go to Yahoo! Shopping and search for "currency converter." I found 76 products in 42 stores with prices ranging from $0.00 (a free online currency converter) to $444.95 (a cellular phone with a currency converter). My point: even if the domain is fairly narrow, it's hard to specify what you are looking for and to choose one item from the results of a search.
Now, put yourself in the shoes of a business that is looking for a currency conversion Web service. How will you find one? More important, how will you design and build systems that may automatically search for and choose such a service for you?
So far we have thought only of brokers providing technical discovery services. Technical discovery deals with three basic types of information: (1) business-level contact information, (2) categorization (type of business, geographical area, etc.), and (3) binding information (WSDL). This information is grossly inadequate to enable manual, let alone automatic, discovery of services that satisfy real-world business requirements. Several key pieces are missing:
- Rich domain information models for relevant areas, for example, industry verticals, e-commerce metainformation (customer satisfaction, delivery options, defect rates), and so on
- The mechanisms to collect, aggregate, index, and search this information
- The brokers who will expose the above services to requesters
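Here is a Python sketch that makes the gap concrete by contrasting the three kinds of technical discovery information with the domain metadata an automated selection program would need. The first three fields mirror the list above. Everything else is hypothetical and not part of any registry standard: the price, defect rate, and rating fields and the "cheapest acceptable service" rule.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalEntry:
    """What technical discovery provides today."""
    contact: str                                  # (1) business-level contact information
    categories: set = field(default_factory=set)  # (2) categorization
    wsdl_url: str = ""                            # (3) binding information


@dataclass
class DomainEntry(TechnicalEntry):
    """Hypothetical domain metadata a vertical broker might collect."""
    price_per_call: float = 0.0
    defect_rate: float = 0.0      # fraction of failed or incorrect responses
    customer_rating: float = 0.0  # 0 to 5


def choose_service(entries, category, max_defect_rate):
    """Automatic selection; only possible because of the domain fields."""
    eligible = [e for e in entries
                if category in e.categories and e.defect_rate <= max_defect_rate]
    # Prefer the cheapest service, breaking ties by higher customer rating.
    return min(eligible, key=lambda e: (e.price_per_call, -e.customer_rating),
               default=None)
```

With `TechnicalEntry` alone, a program can narrow results by category but has no basis for choosing among the matches. That choice is exactly the part that requires human judgment today.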
We are likely to see marketplaces, trust providers, rating agencies, domain-specific search engines, metabrokers and many others. Just recently, I heard of four startup plans in this area. Brace yourselves for rapid innovation.
Scaling with an Increased Number of Brokers
It seems that the only way to scale with the number of Web Services providers is to increase the number of service brokers and the types of brokers. This raises the question of how an increasing number of service brokers affects requesters and providers. I think we can answer this question by breaking it into three parts:
- How do providers choose brokers?
- How do requesters choose brokers?
- How do brokers choose providers?
Service providers and requesters choose brokers based on three main criteria: (1) the richness of their service description models, (2) the scope of their network, and (3) the power of their reach. This is not much different from current e-commerce vendors choosing which marketplaces to join and consumers choosing which marketplace to shop from.
The richer the service description model, the more detailed the information about the service that can be delivered automatically to requesters. This type of automation facilitates dynamic discovery and, hopefully, binding to the service. (To get to this level at a broad scale, we need domain-level schema standardization.)
The scope of the broker network refers to the number and type of service discovery partners the broker has. Service information will be replicated within the partner network to the best extent possible. It's unlikely that valuable domain-level information will leave the partner network; only basic technical discovery information is likely to reach the global Web Services brokers.
The final deciding criterion for service providers to go with a certain broker will be the broker's ability to reach potential service requesters. In addition, providers would consider factors such as the presence or absence of substitute services listed by the broker and any preferential treatment deals they can strike. The reverse analysis applies for requesters.
In addition to looking at different brokers' offerings, requesters and providers are likely to employ metabrokers (a special kind of service provider). Metabrokers will provide ratings and other types of useful information about service brokers to aid in the decision-making process.
One important point: despite the fact that it may seem as if providers could register with as many brokers as they want and that requesters could inquire with any broker, some brokers may impose exclusivity conditions and other mechanisms to provide value by restricting the free flow of information. It will be a mixed blessing.
Service Advertising
Increasing the number and types of brokers poses some problems for providers. Ideally, providers would like to be listed with all brokers that would take them based on some business criteria, for example, no cost for registration. How will they know who those brokers are? Further, how will providers know to register with brokers who come online months after their service was initially registered?
Registration provides "guaranteed discovery" within a set of brokers, but it requires the effort of choosing the brokers and registering all services with them. To make Web Services scale as the number and types of brokers increase, we need yet another mechanism for discovery - a low-cost, "best effort" approach. This is what Web Services advertising is about.
To see the difference between registration and advertising, think about going to Yahoo! and registering your Web site versus bringing your site online and hoping that the Yahoo! Web crawler will sooner or later hit your site and automatically register all pages on it.
There has been no standardization yet in the area of Web Services advertising. A couple of proposals have come out. IBM created Advertisement and Discovery of Services (ADS), while Microsoft created Discovery of Services (DISCO). Both approaches rely on the Web crawling model. Both were written pre-UDDI and are therefore somewhat technically out-of-place.
Let me try to describe abstractly how advertising would work.
First, we need a way to uniquely identify a Web service. We can do this as a combination of the address of a broker who has the service information and a key for the service information within that broker's repository.
Next, we need a simple mechanism to collect information about all Web Services. We can build this as a simple document hierarchy. An example follows:
<serviceList brokerURI="http://uddi.bigbroker.com">
    <service ref="ExampleService-123"/>
    <service ref="ExampleService-735"/>
    <serviceList href="http://example.com/list.xml"/>
</serviceList>
Finally, we need a way for these service lists to be discovered. There are at least two simple mechanisms that we can use:
1. A well-known file, for example, webServiceAdvertisement.xml, in the root of our Web site.
2. Specialized META tags in our HTML documents. These could be automatically generated by our application to point to a service list. This list could point to services related to the content of the Web page. For example:
<META name="webServiceAdvertisement" content="http://example.com/list.xml">
That's all it takes to enable basic Web Services advertising. Standard Web crawlers can be easily extended to provide Web Services discovery through advertising. The infrastructure is trivial and the scalability potential is significant.
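As a rough illustration of how a crawler might be extended, the Python sketch below finds the advertisement META tag in a page. It then walks the service list hierarchy, following nested `serviceList` references and collecting each service as a (brokerURI, key) pair. It relies on the element and attribute names used in the service list example and META tag described above. It also assumes that a nested list without its own `brokerURI` inherits its parent's. The `fetch` callable is a hypothetical stand-in for real HTTP retrieval.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class AdvertisementFinder(HTMLParser):
    """Collects service-list URLs from <META name="webServiceAdvertisement"> tags."""

    def __init__(self):
        super().__init__()
        self.list_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META> arrives as "meta".
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "webServiceAdvertisement":
            self.list_urls.append(attrs.get("content"))


def advertised_lists(html):
    finder = AdvertisementFinder()
    finder.feed(html)
    return finder.list_urls


def collect_services(list_url, fetch, broker=None, seen=None):
    """Walk a service list and its nested lists; return (brokerURI, key) pairs."""
    seen = set() if seen is None else seen
    if list_url in seen:  # guard against cycles between lists
        return []
    seen.add(list_url)
    root = ET.fromstring(fetch(list_url))
    broker = root.get("brokerURI", broker)  # assumption: nested lists inherit the broker
    found = []
    for child in root:
        if child.tag == "service":
            found.append((broker, child.get("ref")))
        elif child.tag == "serviceList" and child.get("href"):
            found.extend(collect_services(child.get("href"), fetch, broker, seen))
    return found
```

Each collected pair is exactly the unique service identifier proposed earlier: a broker address plus a key into that broker's repository. An indexing broker could use it to pull the full entry.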
Tying It All Together
We can now update the simple Web Services usage workflow to include several relevant operations we discussed (see Figure 3): the tasks of providers and requesters choosing a broker, providers advertising services, and brokers choosing providers based on advertisement information. Last, but not least, we can add the retry-on-failure link between the invocation and binding operations.
In addition, Figure 3 uses color to identify the basic areas of responsibility of the providers, requesters, and brokers.
Now we can continue looking in more depth at the different roles in the model. The next installment of XML in Transit will make UDDI look like "an obvious good thing," something it would have been hard to do without establishing our reasonably realistic framework for Web Services usage in the first place.
Let me end on a small personal note. Several months ago my old company - Allaire Corporation - merged with Macromedia, Inc. I was with Allaire for five years and helped them grow from the early start-up phase to an Internet powerhouse of 600 employees and hundreds of thousands of customers.
I am now the chief architect at Macromedia. It's going to be an exciting ride and you will hopefully benefit from my exposure to some new areas of XML and Web Services. The company is very committed to Web Services. In addition to evolving the server-side assets of Allaire, Macromedia is doing exciting work in the areas of development tools (the Dreamweaver and UltraDev product line) and rich clients that can process XML and access the server (Flash and Shockwave).
Copyright © 2001 SYS-CON Media, Inc. — All Rights Reserved.
Simeon Simeonov is a technology partner of Polaris Ventures and invests primarily in Internet, mobile and enterprise technologies. Prior to joining Polaris, Sim was vice president of emerging technologies and chief architect at Macromedia (now Adobe). Earlier, Sim was a founding member and chief architect at Allaire, which went from a tiny startup to become one of New England's most successful IPOs. Sim's expertise covers the gamut from strategy definition and positioning to R&D execution to go-to-market and alliances development. He has played a key role in eight v1.0 product initiatives and eight M&A and spinout transactions. Sim's innovation and leadership have brought about category-defining products with significant market impact: the first Web application server (ColdFusion), a precursor to Web services and AJAX (WDDX), the best open-source Web services engine (Apache Axis) and the first rich Internet application platform (Flash/Flex). Sim has a track record of partnering with entrepreneurs prior to company creation.