perfectxml.com
Professional Java SOAP

This book is for all Java developers and system architects. The book is organized in three parts: Distributed Application Protocols, Sample Application, and Web Services.

Read Chapter 2: SOAP from this book


SOAP

In the first chapter, we discussed several distributed object protocols and compared them to SOAP. Although it is not possible to cover all protocols, we now have a good idea of where SOAP stands: it is a simple XML-based protocol that supports RPC and messaging, primarily over HTTP. In this chapter, we will look at SOAP in more detail.

To broaden our understanding of SOAP and how the Apache framework implements this protocol, we will follow the organization of the SOAP specification. However, this chapter is meant to be an introduction to SOAP, not a reference. For an authoritative guide, you should always refer back to the SOAP 1.1 specification.
The SOAP 1.1 specification can be found at http://www.w3c.org/TR/SOAP/, and the salient points are contained in Appendix A.
While we are looking through the SOAP specification, we will point out features that are either partially supported or not implemented in the Apache SOAP framework. The comprehensive list of what is and is not supported can be found at http://xml.apache.org/soap/features.html. The list of known interoperability issues with the Apache SOAP framework can be found at http://xml.apache.org/soap/docs/guide/interop.html.

Before getting into the specifics of SOAP, we will first have a look at the technologies that constitute its foundation.

We will then describe the anatomy of a SOAP packet:
  • The SOAP envelope

  • The SOAP header

  • The SOAP body

  • The SOAP fault

We will primarily focus on HTTP as the transport for SOAP packets.

We will then talk more specifically about the SOAP header and the SOAP envelope, which carries a request, a response, or a fault, and then we will discuss each of these in detail. In the course of this discussion, we will introduce the important topic of encoding and XML metadata.

Core Technologies

One of the design goals of SOAP is to be an open technology, from both a platform and a programming language point of view. The SOAP architects decided to meet this requirement by leveraging as many existing technologies as possible rather than inventing new ones.

For you, the SOAP developer, this means that you must be familiar with a variety of web technologies before being able to understand the SOAP specification. In no particular order, those technologies are:
  • HTTP

  • XML

  • XML Namespaces

  • XML Schemas

In addition, you must also be familiar with the prerequisites that we mentioned in the introduction to this book: basic object-oriented concepts and Java. We will introduce these core web technologies in the following sections. If you are already familiar with some or all of them, feel free to skip those sections and go directly to the SOAP section.

We will first talk about the protocol closest to the bits that travel over the wire: HTTP.

HTTP

The Hypertext Transfer Protocol (HTTP) is used to transport virtually all traffic on the World-Wide Web (WWW). HTTP is a client-server model: a client submits a request to the server, which in turn sends a response.

SOAP makes extensive use of the following HTTP features: HTTP headers (including Content-Type), POST, and HTTP return codes (2nn for success, 3nn for redirection, 4nn for client errors, and 5nn for server errors). See http://www.w3.org/Protocols/Specs.html for further details.

The HTTP protocol specifies the format of the request and the response:
  • The first line

  • Zero or more header lines

  • A Carriage Return-Linefeed (CRLF) by itself

  • An optional body

The following code snippet shows an example of an HTTP request.

GET /Authors/soap.html HTTP/1.1 
Host: www.wrox.com 
Content-Type: text/html; charset=utf-8 
Content-Length: 0


The first line of the HTTP header carries the verb (more on that later), the path (URL portion after the host name), and the version of HTTP that the client understands (if it is a request). The Content-Type defines the Multipurpose Internet Mail Extensions (MIME) type of the request. The type of the previous request is text/html, which is used for most web pages. It simply specifies that the data being transmitted is text and that the text is an HTML document.

As a SOAP developer, you will mostly deal with text/xml: the data is text and contains an XML document. As its name suggests, the Content-Length defines the number of bytes in the request. In this particular case the content length is 0, which is typical for a web page request that does not need to submit any data to the server.
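
The request layout described above (first line, header lines, an empty CRLF line, optional body) can be sketched in Java as follows. This is only an illustration: the class name HttpRequestSketch, the path /soap/demo, and the <ping/> body are assumptions made for the example and do not come from the Apache SOAP framework.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class HttpRequestSketch {
    private static final String CRLF = "\r\n";

    /** Builds the text of an HTTP/1.1 POST request that carries an XML body. */
    static String buildPost(String host, String path, String xmlBody) {
        // Content-Length counts bytes, not characters, so encode the body first.
        int length = xmlBody.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1" + CRLF            // the first line
             + "Host: " + host + CRLF                         // header lines
             + "Content-Type: text/xml; charset=utf-8" + CRLF
             + "Content-Length: " + length + CRLF
             + CRLF                                           // a CRLF by itself ends the headers
             + xmlBody;                                       // the optional body
    }

    public static void main(String[] args) {
        // "/soap/demo" and "<ping/>" are made-up values.
        System.out.print(buildPost("www.wrox.com", "/soap/demo", "<ping/>"));
    }
}
```

Computing the length from the encoded bytes rather than from String.length() keeps the header correct when the body contains non-ASCII characters.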

The Multipurpose Internet Mail Extensions (MIME) extend the format of Internet mail to allow non-ASCII information to be transmitted in e-mail headers and messages. As usual with Internet standards, MIME is used for many more applications than its intended target; for instance, you can use MIME to add non-textual information (such as JPEG pictures) to SOAP packets. See RFC 1521 at http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1521.html for more details.


If all goes well, the server response will start with HTTP/1.0 200 OK, where 1.0 is replaced by the version of HTTP that is supported by the server. This is followed by the content of the requested resource.

The following HTTP response contains a simple HTML document.

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8 
Content-Length: 25 

<html>Hello World!</html>


Note that the Content-Type header in the previous HTTP response states the encoding of the document. The value of Content-Length is the number of bytes in the body, that is, everything after the empty CRLF line. There are other header fields such as Date and Expires.
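
To make the layout of a response concrete, here is a minimal Java sketch that splits a response into its status line, headers, and body, and checks Content-Length against the body. The class HttpResponseSketch is hypothetical and deliberately naive: it assumes a complete response held in a String and treats header names as case-sensitive, which real HTTP does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser, for illustration only.
final class HttpResponseSketch {
    final String version;
    final int statusCode;
    final String reason;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    HttpResponseSketch(String raw) {
        // The first empty line (CRLF CRLF) separates the headers from the body.
        int split = raw.indexOf("\r\n\r\n");
        String head = split < 0 ? raw : raw.substring(0, split);
        body = split < 0 ? "" : raw.substring(split + 4);

        String[] lines = head.split("\r\n");
        // Status line: HTTP version, numeric return code, reason phrase.
        String[] status = lines[0].split(" ", 3);
        version = status[0];
        statusCode = Integer.parseInt(status[1]);
        reason = status.length > 2 ? status[2] : "";

        // Remaining lines are "Name: value" header entries.
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
    }

    /** True when Content-Length equals the UTF-8 byte length of the body. */
    boolean contentLengthMatches() {
        String declared = headers.get("Content-Length");
        return declared != null
            && Integer.parseInt(declared) == body.getBytes(StandardCharsets.UTF_8).length;
    }
}
```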

The Unicode Transformation Format (UTF) is an algorithmic mapping from a UNICODE character to a unique sequence of bytes (one to four bytes in the case of UTF-8). There are actually seven different forms of encoding for UNICODE characters (UTF-16, UTF-32, etc.). The major advantage of UTF-8 is that it is compact and therefore conserves precious network bandwidth. See http://www.unicode.org for more information.
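
The one-to-four-byte range of UTF-8 is easy to observe from Java. The following sketch (the class name Utf8LengthSketch is an assumption for the example) encodes four sample characters and reports how many bytes each one needs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8LengthSketch {
    /** Returns the number of bytes needed to encode the text in UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // ASCII letter: 1
        System.out.println(utf8Length("\u00e9"));       // e with acute accent: 2
        System.out.println(utf8Length("\u20ac"));       // euro sign: 3
        System.out.println(utf8Length("\uD83D\uDE00")); // U+1F600, outside the basic plane: 4
    }
}
```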

However, things do not always go smoothly. To handle the more difficult cases, HTTP defines ranges of return codes:
  • 1nn
    This status is informational. It is typically sent by the server to indicate some kind of status. For instance, 100 means that the server is willing to accept the request and the client may proceed with the rest of the request.

  • 2nn
    The request succeeded. For instance, 200 means the request was OK.

  • 3nn
    The request has been moved. This response is accompanied with the new URL, telling the user where to get the data. This is not really an error, but an indication that the document should be retrieved from an alternate location.

  • 4nn
    The request submitted by the client is in error. For instance, 401 means that the client does not have access to the resource and 404 means that the resource does not exist (presumably, the client requested the wrong URL).

  • 5nn
    The server is in error. This is usually a sign that something went wrong on the server side. For instance, as a SOAP developer, you will typically encounter a 500 error when an uncaught exception is thrown by a service.
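
The ranges above map naturally onto a small lookup. The following Java sketch shows one way a client might classify a return code; the names HttpStatusSketch and StatusClass are invented for this example.

```java
// Hypothetical classifier, for illustration only.
final class HttpStatusSketch {
    enum StatusClass { INFORMATIONAL, SUCCESS, REDIRECTION, CLIENT_ERROR, SERVER_ERROR, UNKNOWN }

    /** Maps an HTTP return code to the range (1nn to 5nn) it belongs to. */
    static StatusClass classify(int code) {
        switch (code / 100) {
            case 1:  return StatusClass.INFORMATIONAL;
            case 2:  return StatusClass.SUCCESS;
            case 3:  return StatusClass.REDIRECTION;
            case 4:  return StatusClass.CLIENT_ERROR;
            case 5:  return StatusClass.SERVER_ERROR;
            default: return StatusClass.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(404)); // CLIENT_ERROR
        System.out.println(classify(500)); // SERVER_ERROR
    }
}
```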

In the code snippet below, the response indicates that the requested URL cannot be found.

HTTP/1.1 404 Object Not Found 
Server: Microsoft-IIS/5.0 
Date: Wed, 12 Sep 2001 23:57:41 GMT 
Connection: close 
Content-Length: 3252 
Content-Type: text/html


We can also see more header entries in that response. Their meaning is obvious, except for Connection: close, which signifies that the server explicitly requests that the client (a browser, most of the time) close the HTTP connection. HTTP is a stateless protocol; unless otherwise instructed by the server, the connection is closed once the request has been satisfied. For efficiency reasons, web browsers will keep a connection open as long as they are displaying a page, since a page is usually made of multiple resources (frames, bitmaps, etc.).

Most of the time, the HTTP protocol uses TCP/IP sockets to handle the connection between the client and the server. In TCP/IP sockets, the client and the server agree on a port number to use to start the connection. Different protocols based on TCP/IP use different port numbers. The standard port for HTTP is port 80, although any port can be used, if the client and the server agree on an alternate port number.

As we discussed earlier, HTTP is used to retrieve any data (resource) from a server. The resource can be a text file as we saw earlier, but it can also be a binary file, or a remote executable. Resources are identified by Uniform Resource Locators (URLs), which define not only where to get the resource, but also how to get it. A URL starts with the protocol that is used to retrieve the data. For instance, ftp:// indicates that the data can be retrieved using the File Transfer Protocol (FTP).

With HTTP, a URL typically looks like the following:

http://server-name:port-number/file-path


The port number is assumed to be 80 when it is not present. Arguments can be added to the URL if needed. For instance, the following URLs are valid for HTTP:
  • http://www.wrox.com

  • http://myserver:1234/mystuff

  • http://myserver:1234/mysservlet?value=private
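
The parts of these URLs can be pulled apart with the standard java.net.URI class, as in the sketch below. The class UrlPartsSketch and its effectivePort helper are assumptions made for this example; the fallback to port 80 simply mirrors the HTTP default mentioned above.

```java
import java.net.URI;

// Hypothetical helper, for illustration only.
final class UrlPartsSketch {
    /** Returns the port named in the URL, or 80 (the HTTP default) when none is given. */
    static int effectivePort(URI uri) {
        // URI.getPort() returns -1 when the URL does not name a port.
        return uri.getPort() == -1 ? 80 : uri.getPort();
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://myserver:1234/mysservlet?value=private");
        System.out.println(uri.getScheme());     // http
        System.out.println(uri.getHost());       // myserver
        System.out.println(effectivePort(uri));  // 1234
        System.out.println(uri.getPath());       // /mysservlet
        System.out.println(uri.getQuery());      // value=private
    }
}
```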

To allow the client to have a meaningful dialog with the server, HTTP defines a set of methods for requesting information from a server. The principal methods are:
  • GET
    This method is typically used to retrieve a file, or trigger the execution of some code on the server. The arguments, if any, are part of the URL requested. GET is a poor choice for sensitive arguments, even when used with HTTPS (secure HTTP). HTTPS encrypts the requested path and arguments while they travel over the wire, but the full URL is still routinely recorded in places such as web server logs and browser histories. If you look at the example above, this means that value=private could end up stored in the open and therefore not be very private.

  • POST
    This method is similar to GET but the arguments for the requests are included as part of the body of the request. Because the body is encrypted in transit and is not recorded along with the URL, POST is the safer choice when used with HTTPS.

  • PUT
    This allows a client to upload a file to the server.

  • DELETE
    This allows a client to delete a file from the server. Most installations do not support PUT and DELETE because of the inherent risk in these verbs. PUT and DELETE do not play a role in SOAP.

XML

HTML documents are portable and easy to use. This ease of use was a key ingredient to the success of HTML: with a simple editor like Notepad or Vi, anybody can publish his or her own pages on the Web. The primary drawback of HTML documents is that they do not offer much in terms of content management and document structure: HTML documents do not contain semantic information, they only contain presentation information.

For instance, if one is looking for published work on Winston Churchill as opposed to published work by Winston Churchill, one is left with thousands of possible hits to sort through. You might think that a better search engine would solve the issue, but a better search engine would only try to cope with the lack of semantics in the target documents.

HTML also suffers from a lack of reusability, mostly because it does not separate data from presentation. For example, imagine that you need to show the monthly sales figures to two radically different audiences: sales representatives and company executives. Sales representatives will want the minute details of each account to see where they should make a difference. Company executives are only interested in the aggregated data to see the trend. To show different levels of details in the sales figures, you must write two completely different HTML pages.

The following document is an HTML page that might be returned as a hit in a search of work published on Winston Churchill. You will notice that the result set contains only books, but the concept could be extended to other results like articles, recordings, and so on.

<HTML>
	<HEAD>
		<TITLE>Books on Winston Churchill</TITLE>
	</HEAD>
	<BODY>
		<H1>Pre 1950</H1>
		<P>The Blitz
		<P>Breaking Enigma
		<H1>Post 1950</H1>
		<P>The Making of the Iron Lady
		<P>The Churchill Doctrine
	</BODY> 
</HTML>


There is no structure (the apparent structure merely improves readability, not analysis) and no semantics in the HTML document: the name Churchill is devoid of any meaning.

Another limitation of HTML is the fact that it is a rigid standard: developers cannot define custom tags for specific applications, such as server-side code execution or metadata definition.

A possible solution to those issues is the eXtensible Markup Language (XML). Like HTML documents, XML documents are portable and easy to use. In addition, XML pages can share a common data structure, which leads to more reusability. An XML document containing sales figures can be used to drive the content of HTML pages for sales associates, a printed annual report, and an executive summary.

The following XML document would be returned as a result for a search about Winston Churchill.

<?xml version="1.0" encoding="UTF-8"?> 
<subjects>
	<subject name="Winston Churchill">
		<category name="Pre 1950">
			<books>  
				<book title="The Blitz"/> 
				<book title="Breaking Enigma"/>
			</books>  
		</category>  

		<category name="Post 1950">
			<books> 
				<book title="The Making of the Iron Lady"/>
				<book title="The Churchill Doctrine"/> 
			</books> 
		</category>  
	</subject> 
</subjects> 


Once again, we have limited the scope to <books/>, but the returned document could include elements like <dvds/>, <articles/> and so on.

In the case of our search result in XML:
  • The structure of the document is exposed through the hierarchy of nested tags. For instance, it is clear that the books inside <category name="Post 1950"/> are distinct from the books inside <category name="Pre 1950"/>.
  • It is clear that the subject matter is Winston Churchill because of the <subject name="Winston Churchill"/> element.

Like HTML, XML defines the concepts of elements and attributes.

Elements represent a unit in the data hierarchy, and can be nested within other elements and have elements nested within them. They are part of the process of imposing order upon data values. Elements are delimited by a beginning tag and an end tag, as in <category>some value</category>. The beginning tag is <category>, the end tag is </category>, and the value is some value. An element may be empty and have no value, as in <category/>. Elements nested in other elements are called child elements of their parent elements. For instance, in the previous example, <books> is the parent element of <book>.

Attributes are merely name-value pairs contained inside the first tag of the element. For instance, in <category name="Pre 1950"/> the <category> element has an attribute whose name is name and whose value is Pre 1950.
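
From Java, elements and attributes can be read with the DOM API that ships with the JDK. The sketch below lists the book titles found under one category of a shortened copy of the search result; the class BookTitlesSketch and its titlesIn method are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
final class BookTitlesSketch {
    // Shortened copy of the search result shown earlier.
    static final String XML =
          "<subjects><subject name=\"Winston Churchill\">"
        + "<category name=\"Pre 1950\"><books>"
        + "<book title=\"The Blitz\"/><book title=\"Breaking Enigma\"/>"
        + "</books></category>"
        + "<category name=\"Post 1950\"><books>"
        + "<book title=\"The Making of the Iron Lady\"/><book title=\"The Churchill Doctrine\"/>"
        + "</books></category>"
        + "</subject></subjects>";

    /** Returns the title attribute of every book nested in the named category. */
    static List<String> titlesIn(String xml, String categoryName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> titles = new ArrayList<>();
            NodeList categories = doc.getElementsByTagName("category");
            for (int i = 0; i < categories.getLength(); i++) {
                Element category = (Element) categories.item(i);
                // The attribute value selects the parent element we care about.
                if (!categoryName.equals(category.getAttribute("name"))) {
                    continue;
                }
                // Child elements are reached through the nesting hierarchy.
                NodeList books = category.getElementsByTagName("book");
                for (int j = 0; j < books.getLength(); j++) {
                    titles.add(((Element) books.item(j)).getAttribute("title"));
                }
            }
            return titles;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titlesIn(XML, "Pre 1950"));
    }
}
```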

The first line of the document requires further explanation:

<?xml version="1.0" encoding="UTF-8"?>


XML documents normally start with the XML declaration <?xml version="1.0"?>, possibly with an encoding attribute. The declaration uses the same <?...?> syntax as what the XML specification calls a processing instruction: it indicates that the document is an XML document. In the previous example, the declaration indicates that the document is an XML document encoded using UTF-8. All XML documents are made of UNICODE characters; the encoding attribute states how those characters are stored as bytes.

We need to make one last point before we move on to a more complicated example: contrary to HTML, XML is case-sensitive. So the following documents are not equivalent:

<?xml version="1.0" encoding="UTF-8"?>
<subjects>
	<subject name="Winston Churchill"/>
</subjects>


<?xml version="1.0" encoding="UTF-8"?>
<Subjects>
	<Subject name="Winston Churchill"/>
</Subjects>


By the same token, the following document is not well-formed XML and will be rejected by any parser (the </subjects> tag does not match the <Subjects> tag):

<?xml version="1.0" encoding="UTF-8"?>
<Subjects>
	<Subject name="Winston Churchill"/>
</subjects>
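
A parser rejects such a document outright. The following Java sketch uses the SAX parser bundled with the JDK to report whether a document parses at all; the class WellFormedSketch is a hypothetical name, and the check says nothing about whether the document conforms to any schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
final class WellFormedSketch {
    /** Returns true when the text can be parsed as XML, false when the parser rejects it. */
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Raised, for example, when an end tag does not match its start tag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser could not be set up or read", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<Subjects><Subject name=\"Winston Churchill\"/></Subjects>"));
        System.out.println(isWellFormed("<Subjects><Subject name=\"Winston Churchill\"/></subjects>"));
    }
}
```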


The XML example that we have just analyzed demonstrated that it is possible to add structure and semantics to a document by defining a hierarchy of tags. We can take this example a little further by defining a more complex hierarchy. Let's assume that we want to add more tags to the document above to further classify books based on their physical aspect. To implement this classification, we have defined a <category> tag to mean hardcover or paperback, as opposed to chronological taxonomy. Clearly, we need a way to distinguish between the two classifications, or we would wind up with something like the following (ambiguous) XML document:

<?xml version="1.0" encoding="UTF-8"?>
<subjects>  
	<subject name="Winston Churchill">
		<category name="Pre 1950">  
			<books>  
				<book title="The Blitz" size="medium"> 
					<category>hardcover</category>  
					<price currency="lira" value="10345.00"/> 
					<price currency="us dollars" value="19.99"/>  
				</book>

				<book title="Breaking Enigma" size="medium">  
					<category>hardcover</category>
					<price currency="us dollars" value="49.99"/> 
				</book>
			</books> 
		</category> 

		<category name="Post 1950">  
			<books>  
				<book title="The Making of the Iron Lady" size="large">
					<category>hardcover</category>
				</book>  

				<book title="The Churchill Doctrine" size="medium">  
					<category>paperback</category>
					<price currency="us dollars" value="9.99"/>
				</book> 
			</books> 
		</category>
	</subject> 
</subjects>


Lifting that ambiguity by qualifying XML tags is the purpose of XML namespaces. Note that this kind of ambiguity is more likely to happen when more than one person defines the XML document or when XML documents are merged together.

The latest XML specification can be found at http://www.w3c.org/XML/.

XML Namespaces

Before we talk specifically about XML namespaces, we need to revisit the URL that we introduced with the HTTP protocol. URLs are in fact part of a larger taxonomy: Uniform Resource Identifiers or URIs. A URI is a string (sequence of characters) that uniquely identifies a resource. As you can see in the diagram below, URIs are divided into two sub-categories: the URL that we discussed earlier and the Uniform Resource Name or URN.


If you have done any IDL or COM development, think of URNs as human-understandable UUIDs and GUIDs.


A URN is a location-independent string representing the resource: there is no claim made as to where the resource is or how to get to it because a URN is simply a unique string. The syntax allows for global uniqueness:

"urn:" <namespace-identifier> ":" <namespace-specific-string>


The namespace identifier should be unique to your organization, for example a name based on wrox.com; RFC 2141 limits it to letters, digits, and hyphens, so a full URL such as http://www.wrox.com cannot be used verbatim. The namespace-specific string can be anything you want, so long as it is made of letters, numbers, parentheses, hyphens, etc. The exact syntax of URNs is defined in RFC 2141 and this can be downloaded from http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2141.html.

In summary, use a URL to locate a resource and use a URN to identify a resource in a location-independent fashion.
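
The difference shows up when both kinds of URI are handed to java.net.URI. In the sketch below, the URN urn:example:covers is a made-up identifier and UriKindsSketch is a hypothetical class name; the point is only that a URL carries a protocol and a host, while a URN carries nothing but a name.

```java
import java.net.URI;

// Hypothetical helper, for illustration only.
final class UriKindsSketch {
    /** Describes whether the text is a URN (a name) or a URL (a location). */
    static String describe(String text) {
        URI uri = URI.create(text);
        if ("urn".equalsIgnoreCase(uri.getScheme())) {
            // A URN has no host or path; everything after "urn:" is the name.
            return "URN, names: " + uri.getSchemeSpecificPart();
        }
        return "URL, located via " + uri.getScheme() + " at host " + uri.getHost();
    }

    public static void main(String[] args) {
        System.out.println(describe("urn:example:covers"));   // made-up URN
        System.out.println(describe("http://www.wrox.com"));
    }
}
```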

What does this have to do with XML namespaces? The short answer is that to uniquely identify XML elements, you need globally unique identifiers: URIs. More specifically, namespaces are used in XML documents so that elements with identical names can coexist in the same document. This concept is not a novelty: as Java programmers, we use package names all the time to segregate the space of class and interface names. XML namespaces are not any different: they allow different people to define identical tags with different semantics.

To declare a namespace explicitly, use the following syntax:

"xmlns:" <prefix> "=" <URI>


For instance, xmlns:cvrs="http://www.wrox.com/covers" declares a namespace that is referenced in the XML document using the prefix cvrs, as in <cvrs:category> or <Envelope cvrs:color="red">. Once again, such a URL does not necessarily point to a "live" document; it is simply a unique identifier. The namespace declaration is made on the first element that uses the namespace, after the element name in the start tag. This is somewhat counter-intuitive, since the prefix is used before it is declared, as you can see in the following code snippet:

<?xml version="1.0" encoding="UTF-8"?> 
<cvrs:category xmlns:cvrs="http://www.wrox.com/lesavon/covers">
	paperback 
</cvrs:category>


Note that explicitly declared namespaces are inherited in the sense that a prefix declared on an element can be used on that element and on all of its descendants. The prefix must still be written out, however: an element with no prefix is not placed in the namespace of its parent. The following two XML fragments look similar, but a namespace-aware parser does not treat them as equivalent:

<SOAP-ENV:Envelope
	xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
	xmlns:cvrs="http://www.wrox.com">

	<Body>
		<cvrs:MyStruct>
			<value>123</value>
		</cvrs:MyStruct>
	</Body>

</SOAP-ENV:Envelope>


<SOAP-ENV:Envelope
	xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
	xmlns:cvrs="http://www.wrox.com">

	<SOAP-ENV:Body>
		<cvrs:MyStruct>
			<cvrs:value>123</cvrs:value>
		</cvrs:MyStruct>
	</SOAP-ENV:Body>

</SOAP-ENV:Envelope>


In both documents, the top-level element is <SOAP-ENV:Envelope/> where SOAP-ENV refers to the http://schemas.xmlsoap.org/soap/envelope/ namespace, and both can use the cvrs prefix anywhere inside the envelope because the declaration is inherited. In the first document, however, <Body> and <value> carry no prefix and no default namespace is declared, so they belong to no namespace at all. Only the second document contains a true <SOAP-ENV:Body> and <cvrs:value>.

Interestingly (and potentially confusingly) enough, the following document is equivalent to the second example:

<cvrs:Envelope 
	xmlns:cvrs="http://schemas.xmlsoap.org/soap/envelope/" 
	xmlns:SOAP-ENV="http://www.wrox.com">

	<cvrs:Body>  
		<SOAP-ENV:MyStruct> 
			<SOAP-ENV:value>123</SOAP-ENV:value>
		</SOAP-ENV:MyStruct>
	</cvrs:Body> 

</cvrs:Envelope>


This is due to the fact that the actual value of the prefix is irrelevant; only the actual URI of the namespace defines the identity of the namespace.

We can also declare a default namespace with the syntax xmlns="some-uri" (no prefix in this case). All elements that are not explicitly prefixed are assumed to be part of the default namespace.
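
Both points, that the prefix is irrelevant and that a default namespace applies to unprefixed elements, can be observed with a namespace-aware DOM parser. In the sketch below, NamespaceSketch and qualifiedRoot are hypothetical names; the method returns the namespace URI and local name of the root element in the form {uri}name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
final class NamespaceSketch {
    /** Returns "{namespace-uri}local-name" for the root element of the document. */
    static String qualifiedRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            String uri = root.getNamespaceURI(); // null when the element is in no namespace
            return "{" + (uri == null ? "" : uri) + "}" + root.getLocalName();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        String soap = "http://schemas.xmlsoap.org/soap/envelope/";
        // Two different prefixes bound to the same URI yield the same qualified name.
        System.out.println(qualifiedRoot("<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + soap + "\"/>"));
        System.out.println(qualifiedRoot("<cvrs:Envelope xmlns:cvrs=\"" + soap + "\"/>"));
        // An unprefixed element takes the default namespace when one is declared.
        System.out.println(qualifiedRoot("<subjects xmlns=\"http://www.wrox.com\"/>"));
    }
}
```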

The next XML document shows our book description using namespaces. The default namespace of the document is http://www.wrox.com. Tags like <subjects> and <books> now belong to that namespace. The <cvrs:category> tag belongs to its own namespace, http://www.wrox.com/covers. Note that the unprefixed <cover> and <price> elements nested inside <cvrs:category> belong to the default namespace, and the unprefixed size attribute belongs to no namespace; they would need the cvrs prefix to join the covers namespace.

<?xml version="1.0" encoding="utf-8"?>
<subjects xmlns="http://www.wrox.com" 
	xmlns:cvrs="http://www.wrox.com/covers">  

	<subject name="Winston Churchill"> 
		<category name="Pre 1950"> 
			<books>
				<book title="The Blitz">  
					<cvrs:category>
						<cover size="medium">hardcover</cover> 
					</cvrs:category>  
				</book> 

				<book title="Breaking Enigma">
					<cvrs:category>  
						<cover size="medium">hardcover</cover>
					</cvrs:category> 
				</book> 
			</books> 
		</category>  

		<category name="Post 1950">  
			<books> 
				<book title="The Making of the Iron Lady">
					<cvrs:category> 
						<cover size="large">hardcover</cover>
					</cvrs:category>
				</book>  

				<book title="The Churchill Doctrine"> 
					<cvrs:category>
						<cover size="medium">paperback</cover> 
						<price>1999</price>  
					</cvrs:category>
				</book> 
			</books> 
		</category> 
	</subject>
</subjects>


If you would like more information on namespaces, the latest XML namespace specification is available at http://www.w3.org/TR/REC-xml-names/.

Now that we have seen how namespaces can be used in XML to render tag and attribute names less ambiguous by associating them with a globally unique identifier, we can turn our attention to XML schemas.

XML Schema

There are two major problems with the previous namespace-aware XML document. The first problem is within its structure. If you look carefully, you will see that the last book has a price tag while the others do not. Is that the intention of the designer or is it an omission? If the price is not listed, does it have a default value? This ambiguity is a source of confusion and potential bugs.

Another problem with the price is the unit. Since everything is text in an XML document, you do not know if that 1999 is a key for a lookup table, the price in Belgian francs (you need a few of those for a $), or the price in pennies. Solving these kinds of problems leads us on to XML schemas.

When XML was introduced, it came with Document Type Definitions (DTDs), which are falling out of favor for several reasons, not least of which is that DTDs have a distinct non-XML grammar. Another problem is their lack of flexibility when it comes to defining complex data types.

To give you an idea of how schemas are put together, let's build an XML schema for our sample document. Actually, we will write two XML schemas.

The XML Schemas discussed in this section can be found in the ProJavaSoap/Chapter02/ directory. The schema for http://www.wrox.com/lesavon can be found in winston.xsd and the schema for http://www.wrox.com/lesavon/covers can be found in covers.xsd.

The first schema (winston.xsd) is for the namespace http://www.wrox.com/lesavon:

<?xml version="1.0" encoding="UTF-8"?>
<!-- winston.xsd: schema for winston.xml - Henry Bequet 08/15/01 --> 
<xs:schema
	xmlns:xs="http://www.w3.org/2001/XMLSchema" 
	xmlns="http://www.wrox.com/lesavon"
	xmlns:cvrs="http://www.wrox.com/lesavon/covers" 
	targetNamespace="http://www.wrox.com/lesavon">


An XML schema is an XML document with <xs:schema/> as the root element. The default namespace for the document is http://www.wrox.com/lesavon, and this is also the target namespace for the XML schema. The xs: or xsd: prefix is usually reserved for schema definitions. The xsi: prefix is used for schema instances. The target namespace is the namespace for which the schema is defined.

We will reference elements and attributes for another namespace (http://www.wrox.com/lesavon/covers) in another schema (covers.xsd); we use the prefix cvrs for that namespace. The covers.xsd is brought into the main schema (winston.xsd) using the import statement:

	<xs:import 
			namespace="http://www.wrox.com/lesavon/covers"
			schemaLocation="covers.xsd"/>


Our top-level element is <subjects> and contains elements tagged subject.

		<!-- the tag subjects is a collection of subject tags; 
			the order is not important -->
			<xs:element name="subjects">


An element can be of complex type or of simple type. A simple element may not contain other elements or attributes. Complex elements may contain elements and attributes. To declare a complex element, you use the complexType declaration as we did for <subjects>.

				<xs:complexType>


The <xs:all> tag indicates that the subject tags can appear in any order. The use of the attribute ref in XML schemas allows the reuse of definitions. It also helps readability by preventing schemas from becoming too deeply nested.

					<xs:all>
						<xs:element ref="subject"/>
					</xs:all>
			</xs:complexType>
	</xs:element>


The <subject> element is declared as a reference to the following element:

			<!-- the tag subject is a list of category tags;
				the order is important -->
				<xs:element name="subject">
					<xs:complexType>


The <xs:sequence> tag indicates that the subelements must appear in a specific order.

						<xs:sequence>


The following declaration expresses that the <subject> element contains as many <category> elements as necessary: since minOccurs is not specified, its cardinality starts at the default of one and is unbounded.

							<xs:element ref="category" maxOccurs="unbounded"/>
						</xs:sequence>


The xs:attribute defines the attribute of the <subject> tag (name). The name attribute is mandatory in the target document because of the use="required" declaration. The definition of the attribute also defines its type: xs:string, which stands for "a string as defined in the XML schema referenced by the http://www.w3.org/2001/XMLSchema namespace", or a UNICODE string.

Other types like integer, float, and double are also supported. (For instance, if you look in the covers.xsd file, you will see that a price is a decimal.)

						<xs:attribute name="name" type="xs:string" use="required"/>
					</xs:complexType> 
				</xs:element>  
		
				<!-- a category has a name and contains one books tag -->
				<xs:element name="category">
					<xs:complexType>  
						<!-- a category contains one and only one books tag --> 
						<xs:sequence>
							<xs:element ref="books" minOccurs="1" maxOccurs="1"/> 
						</xs:sequence> 
					
						<!-- the name of a category is a string and is required --> 
						<xs:attribute name="name" type="xs:string" use="required"/>
					</xs:complexType>
				</xs:element> 

				<!-- the tag books is a list of book tags; the order is important -->
				<xs:element name="books">
					<xs:complexType>  
						<xs:sequence>  
							<xs:element ref="book" maxOccurs="unbounded"/>  
						</xs:sequence>
					</xs:complexType> 
				</xs:element>  

				<!-- the tag book has a title; the title is a string and is required -->
				<xs:element name="book">
					<xs:complexType> 
						<xs:sequence>

The external references to the file are easily handled using namespaces:

							<xs:element ref="cvrs:category" minOccurs="1" maxOccurs="1"/>


We can qualify attributes with a namespace as well. The remainder of the XML schema should contain no surprises at this point.


				<xs:element ref="cvrs:price" minOccurs="0" maxOccurs="unbounded"/> 
			</xs:sequence>  
			<xs:attribute name="title" type="xs:string" use="required"/>  
			<xs:attribute ref="cvrs:size" use="required"/>
		</xs:complexType>
	</xs:element> 
</xs:schema>


The second schema (covers.xsd) is for the namespace http://www.wrox.com/lesavon/covers.

<?xml version="1.0" encoding="UTF-8"?>
<!-- covers.xsd: schema for winston.xml (covers ns) - Henry Bequet 08/15/01 -->
<xs:schema 
	xmlns:xs="http://www.w3.org/2001/XMLSchema"
	xmlns="http://www.wrox.com/lesavon/covers"
	xmlns:cvrs="http://www.wrox.com/lesavon/covers"
	targetNamespace="http://www.wrox.com/lesavon/covers">

		<xs:attribute name="size" type="xs:string"/>
		<xs:element name="category" type="xs:string"/>
		<xs:element name="price">  
			<xs:complexType> 
				<xs:attribute name="currency" type="xs:string"/> 
				<xs:attribute name="value" type="xs:decimal"/>  
			</xs:complexType>
		</xs:element> 
</xs:schema>


The modified version of the Winston Churchill document that uses these XML schemas (winston.xml) is shown below:

<?xml version="1.0" encoding="UTF-8"?>
<!-- winston.xml: sample xml with schema - Henry Bequet 08/15/01 -->
<subjects xmlns="http://www.wrox.com/lesavon"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:cvrs="http://www.wrox.com/lesavon/covers"
	xsi:schemaLocation="http://www.wrox.com/lesavon winston.xsd
		http://www.wrox.com/lesavon/covers covers.xsd">
	<subject name="Winston Churchill">
		<category name="Pre 1950">
			<books>
				<book title="The Blitz" cvrs:size="medium">
					<cvrs:category>Hardcover</cvrs:category>
					<cvrs:price currency="lira" value="10345.00"/>
					<cvrs:price currency="us dollars" value="19.99"/> 
				</book>  
				<book title="Breaking Enigma" cvrs:size="medium">
					<cvrs:category>Hardcover</cvrs:category>  
					<cvrs:price currency="us dollars" value="49.99"/>
				</book> 
			</books>
		 </category>  

		<category name="Post 1950">
			<books>
				<book title="The Making of the Iron Lady" cvrs:size="large">
					<cvrs:category>Hardcover</cvrs:category>  
				</book>
				<book title="The Churchill Doctrine" cvrs:size="medium"> 
					<cvrs:category>Paperback</cvrs:category>  
					<cvrs:price currency="us dollars" value="9.99"/>  
				</book>
			</books>  
		</category>
	</subject> 
</subjects>


In order to test that our XML document is valid, we need an XML schema-aware parser. For instance, we can use Xerces 1.4.3, which we download from Apache (see below). The code samples that we use in this chapter are included as XML files (.xml) and XML schema definition files (.xsd) to give you a practical example to try. You can find these files in the code download, in the folder ProJavaSoap/Chapter02/.

Apache SOAP 2.2 requires a JAXP-compatible, namespace-aware XML parser such as Xerces 1.1.2 or later; the samples for this chapter, however, require Xerces 1.4.3. Before parsing the samples, make sure that xerces.jar and xercesSamples.jar are at the beginning of your classpath. You can download the 1.4.3 version of Xerces from http://xml.apache.org/dist/xerces-j/:



The file we are interested in for the Windows platform is Xerces-J-bin-1.4.3.zip. Download and expand it into a directory structure using a ZIP file utility like WinZip (http://www.winzip.com/). Your installation directory should look like the following screenshot:



The download and install procedure is similar for Linux, but you should download the Xerces-J-bin-1.4.3.tar.gz file instead of Xerces-J-bin-1.4.3.zip. To uncompress and extract the tar file, use the following commands:
	$ gzip -d Xerces-J-bin-1.4.3.tar.gz 
	$ tar -xf Xerces-J-bin-1.4.3.tar

The result will be a directory tree named xerces-1_4_3. We will show how to run the samples on Windows for the remainder of this chapter, but these instructions are easily translated to Linux.

To validate the samples, we can use the DOMWriter class that comes with the Xerces samples. The DOMWriter class reads, parses, and prints the input document. Copy the code download for this chapter on to your own hard drive, and run the following:

C:\ProJavaSoap\Chapter02>set CLASSPATH=.;C:\xerces-1_4_3\xerces.jar;C:\xerces-1_4_3\xercesSamples.jar

C:\ProJavaSoap\Chapter02>java dom.DOMWriter -f winstonNoSchema.xml 
	winstonNoSchema.xml: 
	<?xml version="1.0" encoding="UTF-8"?>
	<subjects>  
		<subject name="Winston Churchill"> 
			<category name="Pre 1950">
				<books>  
					<book size="medium" title="The Blitz">
						<category>Hardcover</category>
						<price currency="lira" value="10345.00"></price> 
						<price currency="us dollars" value="19.99"></price>
					</book>  
					<book size="medium" title="Breaking Enigma"> 
						<category>Hardcover</category>  
						<price currency="us dollars" value="49.99"></price>
					</book> 
				</books>
			</category>

			<category name="Post 1950">
				<books>  
					<book size="large" title="The Making of the IronLady">
						<category>Hardcover</category>  
					</book> 
					<book size="medium" title="The Churchill Doctrine">  
						<category>Paperback</category> 
						<price currency="us dollars" value="9.99"></price>
					</book>
				</books>  
			</category>  
		</subject> 
	</subjects>
	
C:\ProJavaSoap\Chapter02>

And the following command can be used to parse and validate the winston.xml file:

C:\ProJavaSoap\Chapter02>java dom.DOMWriter -n -v -s -f winston.xml
	winston.xml: 
<?xml version="1.0" encoding="UTF-8"?> 
<subjects 
	xmlns="http://www.wrox.com/lesavon" 
	xmlns:cvrs="http://www.wrox.com/lesavon/covers" 
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
	xsi:schemaLocation="http://www.wrox.com/lesavon winston.xsd 
		http://www.wrox.com/lesavon/covers covers.xsd">

	<subject name="Winston Churchill">
		<category name="Pre 1950">   
			<books>      
				<book cvrs:size="medium" title="The Blitz">   
					<cvrs:category>Hardcover</cvrs:category>  
					<cvrs:price currency="lira" value="10345.00"></cvrs:price>
					<cvrs:price currency="us dollars" value="19.99"></cvrs:price> 
				</book>      
				<book cvrs:size="medium" title="Breaking Enigma">     
					<cvrs:category>Hardcover</cvrs:category>     
					<cvrs:price currency="us dollars" value="49.99"></cvrs:price>
				</book> 
			</books> 
		</category> 

		<category name="Post 1950">  
			<books>  
				<book cvrs:size="large" title="The Making of the IronLady">   
					<cvrs:category>Hardcover</cvrs:category>  
				</book>       
				<book cvrs:size="medium" title="The Churchill Doctrine">   
					<cvrs:category>Paperback</cvrs:category>   
					<cvrs:price currency="us dollars" value="9.99"></cvrs:price>
				</book>
			</books>  
		</category> 
	</subject>
</subjects> 

C:\ProJavaSoap\Chapter02>


Note that if you try to validate the winstonNoSchema.xml file, you will get errors. We would expect this, since it does not include a schema:

	C:\ProJavaSoap\Chapter02>java dom.DOMWriter -n -v -s -f winstonNoSchema.xml
		winstonNoSchema.xml: 
		[Error] winstonNoSchema.xml:3:11: Element type "subjects" must be declared. 
		[Error] winstonNoSchema.xml:4:37: Element type "subject" must be declared. 
		[Error] winstonNoSchema.xml:5:31: Element type "category" must be declared. 
		[Error] winstonNoSchema.xml:6:14: Element type "books" must be declared. 
		[Error] winstonNoSchema.xml:7:47: Element type "book" must be declared. 
		[Error] winstonNoSchema.xml:8:21: Element type "category" must be declared. 
		[Error] winstonNoSchema.xml:9:52: Element type "price" must be declared.
		. . .
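
The same kind of check can be made from Java code rather than from the command line. The following is only a sketch: it uses the JAXP validation API that ships with current JDKs (javax.xml.validation) instead of the Xerces 1.4.3 DOMWriter sample, and the schema it embeds is a hypothetical, cut-down version of covers.xsd that declares only the price element.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaCheck {
    // Hypothetical, cut-down version of covers.xsd: only the <price> element.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.wrox.com/lesavon/covers'>"
        + "<xs:element name='price'><xs:complexType>"
        + "<xs:attribute name='currency' type='xs:string'/>"
        + "<xs:attribute name='value' type='xs:decimal'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    // Returns true if the document validates against XSD, false if it is
    // not well-formed or not valid (or if the schema itself cannot be loaded).
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ns = "xmlns:cvrs='http://www.wrox.com/lesavon/covers'";
        System.out.println(isValid("<cvrs:price " + ns + " currency='lira' value='10345.00'/>"));
        System.out.println(isValid("<cvrs:price " + ns + " currency='lira' value='cheap'/>"));
    }
}
```

As with winstonNoSchema.xml, an element that is not declared in the schema (for instance a price element outside the covers namespace) is reported as invalid.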

This short introduction to XML schemas is not intended to be exhaustive, since the XML schema specification amounts to hundreds of pages. The point of this short discussion is to give you enough information to be able to understand the encoding of a SOAP packet.
If you would like more information about XML Schemas, the latest XML Schema specification can be downloaded from http://www.w3.org/XML/Schema.
We will see in the next section that the main use of XML schemas in SOAP is for serialization: without XML schemas (or their equivalent), we would have no means of exchanging metadata between the client and the server. Without this common understanding of metadata, interoperability will never be a reality.

Now that we have a better understanding of the technologies used by SOAP, we can have a closer look at the SOAP specification itself. We will start with a high-level view of the SOAP packet.

SOAP

As we saw in Chapter 1, SOAP is a lightweight protocol for exchange of information in a decentralized, distributed environment. To achieve the goal of exchanging information, SOAP uses XML to encode its payload.

Let's look at an example that we will refine as we dig deeper into the specification:



Aside from the HTTP-specific data, the XML document above contains three parts specified by XML elements (the meaning of each part will be explained further in subsequent sections):
  • Envelope: <SOAP-ENV:Envelope>
    The SOAP envelope is analogous to a snail mail envelope, but without the address which is the responsibility of the transport and included in the HTTP header. The envelope specifies global settings such as the encoding.

  • Header: <SOAP-ENV:Header>
    The header is optional. If it is present, it contains header entries that define SOAP settings, such as the ultimate destination of a message (more on that later) and application-specific settings (the transaction identifier, for instance).

  • Body: <SOAP-ENV:Body>
    The body must be present and must follow the header, if any. The body contains a message, an RPC call, or a fault.
Unfortunately, software components sometimes fail, hence the necessity of having a standard placeholder for exceptional situations like a version mismatch, or a badly formed request. The SOAP fault (<SOAP-ENV:Fault>) is the placeholder for bad news.

Before we dig deeper into the specifications of the SOAP protocol, there are two points that are usually misunderstood about SOAP and that are worth clarifying:
  1. SOAP is a one-way protocol.
    SOAP does not require a response message to be sent as a response to the "request" message. Although the SOAP specification does not explicitly define how to use SOAP with a messaging protocol, such as the Simple Mail Transfer Protocol (SMTP), the architects of SOAP had that feature in mind. The Apache SOAP implementation supports SMTP as a transport.

  2. SOAP is not (exclusively) an RPC protocol.
    SOAP is usually used to trigger a remote method call, but nothing in the protocol specification forces a SOAP server to invoke a method as a result of a SOAP message being received. We will see an example of this functionality in the Messaging section, where we will describe a document-based protocol as opposed to a procedure-based protocol.
Looking at the SOAP 1.1 specification, we can see that Section 7 (Using SOAP for RPC) describes how to use SOAP for RPC. Roughly, SOAP serializes a function call as an XML structure containing:
  • A method name

  • An optional method signature

  • A list of arguments

  • An optional header
The URI of the target object is not part of the serialized data: it is the responsibility of the transport protocol to carry the URI.

HTTP Bindings

Although it is possible to carry a SOAP payload using several HTTP methods, and even protocols other than HTTP, the SOAP 1.1 specification only defines HTTP bindings for HTTP POST requests. The definition of the SOAP bindings affects three areas of HTTP: the HTTP header, the HTTP response, and the HTTP extension framework.
In an HTTP POST, the request (SOAP or other) is part of the body of the document, as opposed to an HTTP GET where the request is part of the URL.

SOAPAction HTTP Header

The main goal of the SOAPAction HTTP header field is to provide a way for servers to quickly filter SOAP requests. An example of this situation is a firewall that needs to quickly filter out requests for unauthorized services. The value of the SOAPAction header must be specified by the client; however, its format is very loose as it is defined to be a URI in RFC 2396.

It is not a good idea to "stuff" the SOAPAction with complicated URIs that would promptly be rejected or simply not understood by a firewall designed to deal with vanilla HTTP requests. In particular, it is important to ensure that we do not have CRLF in the SOAPAction. This would terminate the HTTP header prematurely and give you strange errors concerning the trailing part of your SOAPAction, which would not be a valid XML document.

The following values for a SOAPAction are legal:
  • SOAPAction: "http://www.wrox.com/leavon/ncrouter/Orders#getOrderList"

  • SOAPAction: "OrderService.java"

  • SOAPAction: "http://wwww.wroxpress.com"

  • SOAPAction: ""

  • SOAPAction:
An empty string value ("") means that the intent of the request is provided by the HTTP request URI, and no value means that there is no indication of the intent of the SOAP request.
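
As an illustration of the CRLF concern, a client could check the value before writing the header. This is a minimal sketch; the class and method names are hypothetical and are not part of the Apache SOAP API.

```java
class SoapActionHeader {
    // A value is unusable if it contains CR or LF, because either character
    // would end the HTTP header line prematurely.
    static boolean isSafe(String action) {
        return action == null
            || (action.indexOf('\r') < 0 && action.indexOf('\n') < 0);
    }

    // null produces a header with no value; anything else is sent as a
    // quoted string, matching the legal values listed above.
    static String format(String action) {
        if (!isSafe(action)) {
            throw new IllegalArgumentException("SOAPAction must not contain CR or LF");
        }
        return action == null ? "SOAPAction:" : "SOAPAction: \"" + action + "\"";
    }

    public static void main(String[] args) {
        System.out.println(format("OrderService.java"));
        System.out.println(format(""));
        System.out.println(format(null));
    }
}
```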

The intent of a SOAP request can also be indicated by the first child of the <Body> element as shown in the following example, which comes from a call to the getOrderObject() method:

<?xml version='1.0' encoding='UTF-8'?> 
<SOAP-ENV:Envelope 
	xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" 
	xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" 
	xmlns:xsd="http://www.w3.org/1999/XMLSchema"> 

	<SOAP-ENV:Body>    
		<ns1:getOrderObject 
			xmlns:ns1="urn:order"     
			SOAP-ENV:encodingStyle="http://xml.apache.org/xml-soap/literalxml"> 
			<!-- Arguments have been omitted -->    
		</ns1:getOrderObject>   
	</SOAP-ENV:Body> 
</SOAP-ENV:Envelope>

Since there is nothing in the SOAP specification requiring the SOAPAction to be in sync with the reality of the request, it might be risky to rely on it for security purposes, unless your server-side implementation forces the SOAPAction and the first child of the <SOAP-ENV:Body> tag to match. The Apache SOAP implementation does not perform any validation of the SOAPAction header entry.

SOAP 1.0 supports the mandatory SOAPMethodName that must match the first element of the <SOAP-ENV:Body>. Since SOAP 1.0 implementations must reject a SOAP request that does not have a matching value for SOAPMethodName and the first child of the <Body> element, this can be used to reliably filter calls at the firewall level. This feature was dropped in favor of HTTP Extensions, which can be used to achieve the same purpose, as we will see in the section on HTTP Extension Framework later.

Another potential use of SOAPAction is to route the SOAP request to the appropriate web service. The Apache SOAP framework does not use the SOAPAction for routing; it uses the namespace of the first child of the body element to route the SOAP request (the "trading-uri" in our example).
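
To make the idea of routing on the first child of the body concrete, here is a small sketch that extracts that element with the standard JAXP DOM parser and returns its namespace. It is not the Apache SOAP router code; BodyRouter and its methods are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class BodyRouter {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns the first element child of <SOAP-ENV:Body>, or null if there is none.
    static Element firstBodyEntry(String envelopeXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to see namespace URIs
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(envelopeXml)));
            Node body = doc.getElementsByTagNameNS(SOAP_ENV, "Body").item(0);
            if (body == null) {
                return null;
            }
            for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    return (Element) n;
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parsable SOAP envelope", e);
        }
    }

    // Routing key in the style described above: the namespace of the first body entry.
    static String routingKey(String envelopeXml) {
        Element entry = firstBodyEntry(envelopeXml);
        return entry == null ? null : entry.getNamespaceURI();
    }

    public static void main(String[] args) {
        String xml = "<SOAP-ENV:Envelope xmlns:SOAP-ENV='" + SOAP_ENV + "'><SOAP-ENV:Body>"
                + "<ns1:getOrderObject xmlns:ns1='urn:order'/>"
                + "</SOAP-ENV:Body></SOAP-ENV:Envelope>";
        System.out.println(routingKey(xml) + " " + firstBodyEntry(xml).getLocalName());
    }
}
```

A server that wanted to cross-check the SOAPAction could compare it against the same element, but that policy is left to the implementation.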

Specifically, the Apache SOAP framework will set the SOAPAction based on the second argument you pass to Call.invoke(). The statement call.invoke(url, "this is the SOAPAction") will generate the following SOAP request:

POST /realtimequotes/ncrouter HTTP/1.0 
Host: localhost 
Content-Type: text/xml; charset=utf-8 
Content-Length: 454 
SOAPAction: "this is the SOAPAction" 
. . .

We will have a closer look at the API of the Apache SOAP framework in Chapter 3, when we write the HelloWorld sample.

If the target server is running Apache SOAP, then the request will be processed normally: the Apache SOAP router ignores the SOAPAction.

Note: this could lead to interoperability issues with a SOAP client that submits a SOAP request with the SOAPAction set to the target URL and no namespace on the first child of the <SOAP-ENV:Body> element.

HTTP Response

The HTTP binding for SOAP follows the semantics of the HTTP standard when it comes to status codes. In particular, a 2xx status code means that the request has been processed successfully. For instance, an HTTP return code of 200 means OK. If an error occurs while processing the request, the server must return a 500 status code (Internal Server Error) and include a SOAP fault as part of the response. Ideally, a SOAP server should always return a text/xml content type, no matter what the error. In practice, this does not always happen, since most SOAP implementations (including Apache SOAP) are deployed as part of a web site. This means that when the SOAP request does not make it to the SOAP server implementation, we will more than likely get an HTTP error with a text/html content type, as opposed to text/xml.

For instance, if we submit a SOAP request to the wrong URL, on a servlet engine such as Tomcat, it will respond with the following error:

HTTP/1.0 404 Not Found 
Content-Type: text/html 
Content-Length: 201 
Servlet-Engine: Tomcat Web Server/3.2.3 (JSP 1.1; Servlet 2.2; Java 1.3.0; Windows 
2000 5.0 x86; java.vendor=Sun Microsystems Inc.) 
<head><title>Not Found (404)</title></head> 
<body><h1>Not Found (404)</h1> 
<b>Original request:</b> /lesavon/the-missing-router<br><br> 
<b>Not found request:</b> /lesavon/the-missing-router</body>
Unless the application server that hosts the SOAP server is customized to return SOAP errors, it will return error messages intended for a browser. In other words, your client application must be prepared to deal with an error encoded in text/html rather than text/xml.
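
One way for a client to prepare for both cases is to look at the status code and the content type before attempting to parse the response as SOAP. The sketch below shows that decision only; the class, the enum, and the exact policy (for example, treating any non-XML response as "not SOAP") are assumptions made for illustration.

```java
import java.util.Locale;

class SoapResponseCheck {
    enum Kind { SOAP_RESULT, SOAP_FAULT, NOT_SOAP }

    // Classifies an HTTP response from its status code and Content-Type header.
    static Kind classify(int status, String contentType) {
        boolean xml = contentType != null
            && contentType.trim().toLowerCase(Locale.ROOT).startsWith("text/xml");
        if (!xml) {
            return Kind.NOT_SOAP;       // e.g. an HTML error page from the web server
        }
        if (status >= 200 && status < 300) {
            return Kind.SOAP_RESULT;    // 2xx: request processed successfully
        }
        if (status == 500) {
            return Kind.SOAP_FAULT;     // 500: the body should carry a SOAP fault
        }
        return Kind.NOT_SOAP;
    }

    public static void main(String[] args) {
        System.out.println(classify(200, "text/xml; charset=utf-8"));
        System.out.println(classify(500, "text/xml"));
        System.out.println(classify(404, "text/html"));
    }
}
```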

The HTTP Extension Framework

We have seen earlier that the SOAPAction HTTP header could be used to give a description of the intent of the SOAP call to a firewall. One drawback of the SOAPAction HTTP header is that it is not a standard HTTP header (that is, it is not defined by the HTTP protocol itself), so there is a possibility of conflict with some other extension. Whenever you are working with XML and you encounter an ambiguity, namespaces are normally never far away.

The HTTP Extension Framework defines a mechanism to dynamically extend the functionality of HTTP clients and servers using namespaces. The extensions are dynamic because they do not need to be registered with some standard organization, but in practice, client and server components know about the extensions prior to runtime. The HTTP Extension Framework goes far beyond the reach of SOAP, since it can be used to extend the HTTP protocol for transmitting any type of data, not only SOAP payloads.
The HTTP Extension Framework can be downloaded from http://www.w3.org/Protocols/HTTP/ietf-http-ext.
This is how it works:
  1. Software designers agree on an extension and assign the extension a globally unique URI. For SOAP, we will use http://schemas.xmlsoap.org/soap/envelope.

  2. A client or a server that implements that extension declares its use via the URI in the HTTP header. The declaration of the URI and its associated namespace is done according to the HTTP Extension Framework (see below).

  3. The HTTP application can implement the desired behavior without any risk of conflict.
Let's look at an (extended) HTTP request, to see how HTTP extensions work in practice:

M-POST /realtimequotes/ncrouter/ HTTP/1.0 
Man: "http://schemas.xmlsoap.org/soap/envelope"; ns=144 
Host: localhost 
Content-Type: text/xml; charset="utf-8" 
Content-Length: 454 
144-SOAPAction: http://stockquoteserver/realtimequotes#getLastTradePrice 
<SOAP-ENV:Envelope xmlns. . .    
	. . .  
</SOAP-ENV:Envelope>


The first difference between the previous HTTP request and what we have seen earlier is the HTTP verb: M-POST rather than POST. The M-POST defines a mandatory HTTP request. An HTTP request is called a mandatory request if it includes at least one mandatory extension declaration. All HTTP verbs are supported (M-POST, M-GET, etc.). The example above contains the mandatory extension declaration:

	Man: "http://schemas.xmlsoap.org/soap/envelope"; ns=144
In other words, once the Man: header is present, the remainder of the HTTP header must contain at least one extension declaration for the specified URI. In our case, the mandatory extension declaration is the following header element:

	144-SOAPAction: http://stockquoteserver/realtimequotes#getLastTradePrice
As you have probably noticed, SOAPAction is prefixed by 144, the prefix declared for the URI specified in the Man: header element.
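
The relationship between the Man: declaration and the prefixed header can be sketched in a few lines of Java. This is a simplified illustration, not an implementation of the HTTP Extension Framework: it assumes a single extension declaration whose ns parameter comes last, uses case-sensitive header names, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MandatoryExtension {
    // Client side: the Man: declaration plus the SOAPAction header carrying its prefix.
    static Map<String, String> declare(String extensionUri, int prefix, String soapAction) {
        Map<String, String> headers = new LinkedHashMap<String, String>();
        headers.put("Man", "\"" + extensionUri + "\"; ns=" + prefix);
        headers.put(prefix + "-SOAPAction", soapAction);
        return headers;
    }

    // Server side: finds the SOAPAction that belongs to the given extension URI,
    // or returns null if the extension is not declared.
    static String soapActionFor(Map<String, String> headers, String extensionUri) {
        String man = headers.get("Man");
        if (man == null || !man.startsWith("\"" + extensionUri + "\"")) {
            return null;
        }
        int ns = man.indexOf("ns=");
        if (ns < 0) {
            return null;
        }
        String prefix = man.substring(ns + 3).trim();
        return headers.get(prefix + "-SOAPAction");
    }

    public static void main(String[] args) {
        Map<String, String> headers = declare("http://schemas.xmlsoap.org/soap/envelope", 144,
                "http://stockquoteserver/realtimequotes#getLastTradePrice");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            System.out.println(h.getKey() + ": " + h.getValue());
        }
    }
}
```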

The HTTP Extension Framework also defines optional headers with the Opt: header element, but they are not used as part of SOAP.

All this discussion would be pointless if it were not for the fact that the server has the right to force clients to follow an extended protocol. Let's assume that the client submits a SOAP request with a plain HTTP POST. The server can then return a 510 HTTP error code (Not Extended) to notify the requester that it must submit a mandatory HTTP request, in other words, a request with M-POST and a Man: header element. If the server still doesn't like the request, for instance because the Man: header element is not for the http://schemas.xmlsoap.org/soap/envelope URI, then the server can reject the request again.

SOAP Envelope

The SOAP envelope is the top-level element of the XML document representing the message. The example shown in the following XML document uses UTF-8 encoding, but we can also use another encoding, such as UTF-16.



The SOAP envelope must be present and it must be called Envelope (not envelope). The previous example contains three namespace declarations:
  • xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    This namespace is used for SOAP itself: all elements and attributes defined in the SOAP specification (Envelope, Body, mustUnderstand) are part of this namespace.

  • xmlns:xsd="http://www.w3.org/1999/XMLSchema"
    This namespace is used to reference elements and attributes from the master schema at the W3 website. For instance, to reference a float type, we will use <xsd:float>.

  • xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    This namespace exists to reference elements and attributes from an instance of the xsd:schema.
The namespace declarations are optional, so the following <Envelope> is a valid SOAP request:

<Envelope>
	<Body/>
	<!-- an empty body is valid, but the body must be present --> 
</Envelope>

The encodingStyle attribute indicates the serialization rules used in the SOAP message. It is a global attribute scoped to all the children of the envelope. Note that children of the SOAP envelope can explicitly override this setting. We will see a practical use of this scoped encoding when we talk about the serialization of XML documents. The serialization rules as defined in Section 5 of the SOAP specification are identified by the URI http://schemas.xmlsoap.org/soap/encoding/. The absence of the encodingStyle attribute indicates that there are no restrictions on the default encoding used in the SOAP message. Once again, this global setting can be overridden by child elements.
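
The scoping rule can be expressed as a walk from an element up through its ancestors until an encodingStyle attribute is found. The following sketch does this with the standard DOM API; EncodingStyleScope and its methods are hypothetical names, and returning null for "no encoding declared" is a choice made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EncodingStyleScope {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns the nearest encodingStyle declared on the element or one of its
    // ancestors, or null if none is declared anywhere up to the root.
    static String effectiveEncodingStyle(Element element) {
        for (Node n = element; n instanceof Element; n = n.getParentNode()) {
            Element e = (Element) n;
            if (e.hasAttributeNS(SOAP_ENV, "encodingStyle")) {
                return e.getAttributeNS(SOAP_ENV, "encodingStyle");
            }
        }
        return null;
    }

    // Convenience for the demo: parse, find the first element with this tag name, resolve.
    static String styleOf(String xml, String tagName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element target = (Element) doc.getElementsByTagName(tagName).item(0);
            return target == null ? null : effectiveEncodingStyle(target);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not parsable XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<E:Envelope xmlns:E='" + SOAP_ENV + "' E:encodingStyle='urn:outer'>"
                + "<E:Body><call><a E:encodingStyle='urn:inner'/><b/></call></E:Body></E:Envelope>";
        System.out.println(styleOf(xml, "a")); // the child overrides the envelope setting
        System.out.println(styleOf(xml, "b")); // inherited from the envelope
    }
}
```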

Envelope Versioning Model

SOAP does not define a traditional versioning model like 1.0, 1.1, etc. SOAP relies on namespaces to define versions. The envelope of a SOAP message must reference the "http://schemas.xmlsoap.org/soap/envelope/" namespace. If a SOAP server receives a request that references another namespace, it must reject the request with a VersionMismatch fault code in a <Fault> element (see below).
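
A server-side version check along these lines might look like the following sketch. The names are hypothetical, and two points are assumptions made for the example: an envelope with no namespace at all is accepted (consistent with the unqualified <Envelope> shown earlier), and an unparsable or misnamed root element is reported with the Client fault code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EnvelopeVersionCheck {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns null when the envelope is acceptable, otherwise the fault code to report.
    static String faultCodeFor(String messageXml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(messageXml)))
                    .getDocumentElement();
        } catch (Exception e) {
            return "Client"; // assumption: unparsable input is a badly formed request
        }
        String ns = root.getNamespaceURI();
        if (ns != null && !SOAP_ENV.equals(ns)) {
            return "VersionMismatch"; // envelope bound to some other namespace
        }
        // Assumption: an unqualified <Envelope> is accepted.
        return "Envelope".equals(root.getLocalName()) ? null : "Client";
    }

    public static void main(String[] args) {
        System.out.println(faultCodeFor(
            "<e:Envelope xmlns:e='urn:some-other-envelope'><e:Body/></e:Envelope>"));
    }
}
```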

SOAP Header

Despite the fact that the Apache SOAP implementation does not use the SOAP header, it is worth mentioning. The main motivation behind the SOAP header is to provide a mechanism to extend the protocol. For instance, you could add state management or transactional support to SOAP headers. As you can see below, the header is the first child element of the envelope element:



The header is optional. Elements for the SOAP headers must be qualified by a namespace. The mustUnderstand attribute requires a little explanation. The purpose of the mustUnderstand attribute is to support robust evolution of APIs. Imagine that (as the example above suggests) we decide to use SOAP headers to implement a transaction management mechanism. If a SOAP server received the message and decided to ignore the <transaction> element, consequences could be disastrous. To prevent this kind of unwanted behavior, header entries flagged with the mustUnderstand="1" attribute cannot be ignored (mustUnderstand can be 0 (false) or 1 (true)):

. . . 
<SOAP-ENV:Header>   
	<transaction 
		xmlns="http://www.wrox.com/lesavon" 
		SOAP-ENV:mustUnderstand="1">

		<id>124567890-124567890-124567890</id>   
	</transaction> 
</SOAP-ENV:Header>
. . .

When the mustUnderstand attribute is set on unknown header entries, servers must reject the request with a SOAP fault code set to MustUnderstand, as we will see in the later section dealing with SOAP faults.

The Apache SOAP implementation does not support the mustUnderstand attribute.
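
For a server that does honor the attribute, the check could be sketched as follows. This is an illustration only, with hypothetical names: the set of "understood" namespaces stands in for whatever header handlers a real server would have registered.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class MustUnderstandCheck {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns the local name of the first header entry flagged mustUnderstand="1"
    // whose namespace is not in `understood`, or null if every such entry is understood.
    static String firstNotUnderstood(String envelopeXml, Set<String> understood) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(envelopeXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parsable SOAP envelope", e);
        }
        Node header = doc.getElementsByTagNameNS(SOAP_ENV, "Header").item(0);
        if (header == null) {
            return null; // the header is optional
        }
        for (Node n = header.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element entry = (Element) n;
            boolean mandatory = "1".equals(entry.getAttributeNS(SOAP_ENV, "mustUnderstand"));
            if (mandatory && !understood.contains(entry.getNamespaceURI())) {
                return entry.getLocalName(); // caller should answer with a MustUnderstand fault
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<SOAP-ENV:Envelope xmlns:SOAP-ENV='" + SOAP_ENV + "'><SOAP-ENV:Header>"
                + "<transaction xmlns='http://www.wrox.com/lesavon' SOAP-ENV:mustUnderstand='1'>"
                + "<id>124567890-124567890-124567890</id></transaction>"
                + "</SOAP-ENV:Header><SOAP-ENV:Body/></SOAP-ENV:Envelope>";
        System.out.println(firstNotUnderstood(xml, Collections.<String>emptySet()));
    }
}
```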

SOAP Body

The body of a SOAP message is where most of the information is usually located. The SOAP body is meant to contain mandatory information for the ultimate recipient of the message. In this book, we are mostly interested in the RPC calls being serialized in the SOAP body, but we will also see some error messages in the SOAP Fault section.



The diagram above shows a simple example of a SOAP body. As you can see, the <Body> element is namespace-qualified, although this is optional. The SOAP body in the example above contains one entry called a <Body> entry. RPC calls usually contain one <Body> entry: the serialized method call. This statement will become clearer in the next section when we talk about encoding, but an RPC call is serialized similarly to a structure, where the field names are in fact argument names.

The content of the SOAP body is left up to the application with the exception of the SOAP fault that we discussed earlier, the only <Body> entry defined by SOAP. However, the body must obey certain rules:
  • The <Body> element must be encoded as an immediate child of the <Envelope> element. If a header is present, then the <Body> element must immediately follow the header. If no header is present, then the <Body> element must be the first direct child of the SOAP envelope.

  • The SOAP body must be encoded using XML. The SOAP encoding rules (see the section about Encoding) may be followed and the encodingStyle attribute may optionally be used to indicate which encoding rules are being used. The SOAP encoding rules are identified by the namespace http://schemas.xmlsoap.org/soap/encoding/.

    Other possible forms of encoding supported by the Apache SOAP framework include literal XML and XMI-based serialization. Literal XML is plain XML, XML that does not correspond to the encoding of an RPC call. We would typically use literal XML if the argument to a method were an XML document. XMI is the XML Metadata Interchange standard supported by the OMG. More information can be found at http://www.omg.org. In short, XMI is roughly equivalent to XML schemas. We will not use XMI in this book due to its limited acceptance, but it is worth being aware of it.

  • The <Body> entries are serialized as independent elements; for RPC, however, we should only have one <Body> entry: the serialized method call.

SOAP Fault

The following diagram shows that the SOAP <Fault> element appears as a <Body> entry.



The SOAP Fault is optional, but there can only be one <Fault> element per <Body> element. The SOAP <Fault> element contains up to four elements:
  • <faultcode/>
    The <faultcode/> element allows the client to algorithmically identify the source of the fault. As you can see from the following table, this mechanism is similar to what we saw for HTTP when it comes to differentiating the source of the error. However, rather than using numerical values, SOAP uses XML qualified names. The version mismatch and mustUnderstand fault codes are particular to SOAP.

    Name | Meaning
    VersionMismatch | The namespace for the SOAP envelope is not "http://schemas.xmlsoap.org/soap/envelope/".
    MustUnderstand | One of the header entries was not understood by the receiving party and the mustUnderstand flag was set to "1" (true).
    Client | The message received was incorrectly formed. A typical example of this would be a lack of serialization information.
    Server | The receiving web service could not process the message. This situation would arise if the application were to throw an exception. The detail section of the fault might contain the stack trace of the exception.


  • <faultstring/>
    Once again, the strong affinity of SOAP to HTTP shows up in the <faultstring/> element, since it is similar to the reason phrase that we saw earlier. Note that the <faultstring/> must be present. The <faultstring/> is mostly intended for human readers as opposed to the <faultcode/>, which is intended for algorithmic use.

  • <faultactor/>
    The <faultactor/> element names the service that generated the fault. The NullPointerException in the example below was thrown inside the ncrouter of the lesavon context. The <faultactor/> is more meaningful in a situation where there is more than one recipient for the SOAP message. We will look at an example of this when we talk about the SOAPActor in the Messaging section.

  • <detail/>
    The <detail> element is meant to carry application-specific information. It must be present if the application could not process the <Body> element. In other words, its presence or absence can be used to test if the error was related to processing the body. For instance, the fault example with an unknown service does not contain a <detail> element, as the request never made it to the service (in this case because the service was not registered).


Note that the Apache SOAP implementation will not generate a <detail> element if you do not include a fault listener in your deployment descriptor. We will look more closely at deployment descriptors in the next chapter, when we write the HelloWorld sample. Instead of the fault shown in the previous example, you would get the following:

<SOAP-ENV:Fault>
	<faultcode>SOAP-ENV:Server</faultcode> 
	<faultstring>Exception from service object: null</faultstring> 
	<faultactor>/realtimequotes/ncrouter</faultactor>
</SOAP-ENV:Fault>

In addition to these four elements, a SOAP fault may contain additional namespace-qualified elements.
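
On the client side, the elements of a fault can be read with the standard DOM API. The sketch below collects the children of the <Fault> element into a map and uses the presence of <detail> as the test described above; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SoapFaultReader {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns the child elements of <SOAP-ENV:Fault> as name -> text,
    // or null if the message contains no fault.
    static Map<String, String> readFault(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not parsable XML", e);
        }
        Node fault = doc.getElementsByTagNameNS(SOAP_ENV, "Fault").item(0);
        if (fault == null) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<String, String>();
        for (Node n = fault.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                fields.put(n.getNodeName(), n.getTextContent().trim());
            }
        }
        return fields;
    }

    // A <detail> element signals that the error is related to processing the body.
    static boolean isBodyProcessingError(Map<String, String> fault) {
        return fault != null && fault.containsKey("detail");
    }

    public static void main(String[] args) {
        String xml = "<SOAP-ENV:Fault xmlns:SOAP-ENV='" + SOAP_ENV + "'>"
                + "<faultcode>SOAP-ENV:Server</faultcode>"
                + "<faultstring>Exception from service object: null</faultstring>"
                + "<faultactor>/realtimequotes/ncrouter</faultactor></SOAP-ENV:Fault>";
        System.out.println(readFault(xml));
    }
}
```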

Next, we will discuss the rules that SOAP defines to encode its payloads.

Encoding

The encoding section of the SOAP 1.1 specification represents more than half of the volume of the document. This is mostly attributable to the fact that, at the time of the release of SOAP 1.1, XML schemas were still in a state of flux. Had XML schemas been a true standard, the SOAP specification would have shrunk to a fraction of its size. Unfortunately, the XML schema specification is still not complete, and arguably needs a few more features to fully support SOAP encoding.

Basic Encoding Rules

As we mentioned previously, the encoding rules of SOAP are very similar to what you would expect to find in other programming languages and database systems. SOAP uses XML to encode its payload; however, SOAP defines a more restrictive set of rules for encoding than XML.
The specification for the data types supported by XML Schemas can be found at http://www.w3.org/TR/xmlschema-2/.
The schema used for SOAP encoding can be found at http://schemas.xmlsoap.org/soap/encoding/. To define the specific rules of encoding for your data, you can either reference an external schema or use the xsi:type mechanism. Apache SOAP uses the latter, as you can see in this example that encodes a string and an integer:

<?xml version='1.0' encoding='UTF-8'?> 
<SOAP-ENV:Envelope 
	xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" 
	xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
	xmlns:xsd="http://www.w3.org/1999/XMLSchema"> 
	<SOAP-ENV:Body>  

		<ns1:getOrder 
			xmlns:ns1="urn:order"
			SOAP-ENV:encodingStyle="http://xml.apache.org/xml-soap/literalxml"> 

			<arg0 
				xsi:type="xsd:string"     
				SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">atase</arg0> 
			<arg1
				xsi:type="xsd:int"     
				SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">123456010</arg1>
		</ns1:getOrder>  
	</SOAP-ENV:Body> 
</SOAP-ENV:Envelope>

The advantage of using the xsi:type mechanism is that the SOAP document is self-describing, in both its structure and in the values of its data types. The disadvantage is that there is no schema describing the SOAP messages.

The rules governing serializations are strict. The complete rules are available in the specification, but here are the essential ones:
  • All values are represented as element content.
    In other words, attributes may not be used to represent values or pass arguments to remote methods. For instance, the ticker value that we saw earlier may not be represented as <ticker value="sunw"/>.

  • A simple value is represented as character data.
    In particular, a simple value may not contain elements, since this would make it a complex value. In addition, strings may be represented as character data.

  • A compound value is encoded as a sequence of elements.
    A compound value is a structure simply encoded as an unordered list of elements. For instance, the complex number (1, 2i) could be encoded with its real and imaginary numbers:
    <arg0 xsi:type="complex">   
    	<real xsi:type="xsd:double">1</real>   
    	<img xsi:type="xsd:double">2</img> 
    </arg0>
  • Each field of a compound element is distinguished using a role name "accessor", which is the name of the field.
    The accessor is the XML element used to access the data. For instance, the accessor of the first argument in the previous example is <arg0/>.

  • Arrays are compound values of type SOAP:Array.
    SOAP arrays have one or more dimensions. The elements of an array are accessed using an ordinal position or accessor. SOAP arrays may be heterogeneous arrays. The serialization rules for SOAP arrays support sparse arrays as well as partially transmitted arrays, but Apache SOAP does not support that specific serialization.
The SOAP specification allows for a role name accessor and an ordinal accessor to be used at the same time. However, in this book, we will only work with ordinal accessors to arrays (natural integers like 1, 2, 3), which is typically what you will encounter in RPC development as in array[0] or array[144].

The following table contains a few examples so you can see the implications of those rules. For each example, we have included the Java definition of the data along with the serialized version as produced by the Apache SOAP framework. As we mentioned earlier, this serialization is not the only valid serialization for SOAP.

Java Declaration, followed by its SOAP Serialization:
int index = 144;
<index xsi:type="xsd:int">144</index>
String str = "1234";
<str xsi:type="xsd:string">1234</str>
class Cl1 {
    int index;
    float fl1;
    ...
}
<item   
	xmlns:ns3="urn:your-service-urn"   
	xsi:type="ns3:Cl1">   
		<index xsi:type="xsd:int">1</index>
		<fl1 xsi:type="xsd:float">1.0</fl1>
</item>
int ints[] = { 1, 2, 3 };
<ns2:Array   
	xmlns:ns2= "http://schemas.xmlsoap.org/soap/encoding/"   
	xsi:type="ns2:Array"   
	ns2:arrayType="xsd:int[3]">   
	
		<item xsi:type="xsd:int">1</item>   
		<item xsi:type="xsd:int">2</item>   
		<item xsi:type="xsd:int">3</item> 
</ns2:Array>
String[] strArray = { "first",
"second", "third" };
<ns2:Array   
	xmlns:ns2= "http://schemas.xmlsoap.org/soap/encoding/"   
	xsi:type="ns2:Array"   
	ns2:arrayType="xsd:string[3]">   
		<item xsi:type="xsd:string">first</item>
		<item xsi:type="xsd:string">second</item>
		<item xsi:type="xsd:string">third</item>
</ns2:Array>
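
To make the rules above concrete, here is a minimal sketch that produces the same kind of serialization by hand. The EncodingSketch class and its method names are hypothetical and are not part of the Apache SOAP framework; the sketch handles only a few types and assumes that the xsi and xsd prefixes are declared on an enclosing element.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Apache SOAP: writes encoded values by hand. */
final class EncodingSketch {

    static final String ENC_NS = "http://schemas.xmlsoap.org/soap/encoding/";

    /** A simple value is written as character data inside its accessor element. */
    static String simple(String accessor, Object value) {
        String type;
        if (value instanceof Integer) {
            type = "xsd:int";
        } else if (value instanceof Float) {
            type = "xsd:float";
        } else if (value instanceof Double) {
            type = "xsd:double";
        } else if (value instanceof String) {
            type = "xsd:string";
        } else {
            throw new IllegalArgumentException("unsupported type: " + value.getClass());
        }
        return "<" + accessor + " xsi:type=\"" + type + "\">"
                + escape(String.valueOf(value)) + "</" + accessor + ">";
    }

    /** A compound value is written as one child element per field, named after the field. */
    static String compound(String accessor, String type, Map<String, Object> fields) {
        StringBuilder xml = new StringBuilder();
        xml.append("<").append(accessor).append(" xsi:type=\"").append(type).append("\">");
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            xml.append(simple(field.getKey(), field.getValue()));
        }
        return xml.append("</").append(accessor).append(">").toString();
    }

    /** An array is written with ordinal accessors: every element is named item. */
    static String intArray(int[] values) {
        StringBuilder xml = new StringBuilder();
        xml.append("<ns2:Array xmlns:ns2=\"").append(ENC_NS).append("\"")
           .append(" xsi:type=\"ns2:Array\"")
           .append(" ns2:arrayType=\"xsd:int[").append(values.length).append("]\">");
        for (int value : values) {
            xml.append(simple("item", value));
        }
        return xml.append("</ns2:Array>").toString();
    }

    /** Escapes the characters that may not appear literally in character data. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(simple("index", 144));
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("index", 1);
        fields.put("fl1", 1.0f);
        // The ns3 prefix is assumed to be bound to the service URN elsewhere.
        System.out.println(compound("item", "ns3:Cl1", fields));
        System.out.println(intArray(new int[] { 1, 2, 3 }));
    }
}
```

Note that every value ends up as element content and that nothing is ever written as an attribute value, which is the first rule in the list.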

With the serialization rules mentioned so far, we should have enough information to understand the SOAP messages that we will work with in this book. For completeness, we will briefly mention another important feature of SOAP encoding: multiple references.

Multi-References

SOAP was designed as a wire-level protocol, and as such it needs to pay special attention to the amount of data that is transmitted in its messages. For instance, if you are sending an array of strings to the server and several of those strings are identical, you would be duplicating the strings with the serialization rules that we have mentioned so far:

<ns2:Array   
	xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"   
	xmlns:ns2="http://schemas.xmlsoap.org/soap/encoding/"   
	xsi:type="ns2:Array"   
	ns2:arrayType="xsd:string[4]">   
	
	<item xsi:type="xsd:string">This is a long string to transmit on the wire</item>   
	<item xsi:type="xsd:string">This is a long string to transmit on the wire</item>
	<item xsi:type="xsd:string">This is a long string to transmit on the wire</item>   
	<item xsi:type="xsd:string">This is a long string to transmit on the wire</item> 
</ns2:Array>

The string in this example is not especially long, but the strings in our application could be. This is where the use of multi-reference accessors allows for a more compact representation:

<ns2:Array   
	xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"   
	xmlns:ns2="http://schemas.xmlsoap.org/soap/encoding/"   
	xsi:type="ns2:Array"   
	ns2:arrayType="xsd:string[4]">   

		<item soap:href="#id1"/>   
		<item soap:href="#id1"/>   
		<item soap:href="#id1"/>   
		<item soap:href="#id1"/>   
		<item xsi:type="xsd:string" soap:id="id1">This is a long string to transmit on the wire</item> 
</ns2:Array>

More importantly, multi-reference accessors allow you to preserve the semantics of a method call (or of any data type) from the client to the server. Consider the following case:

String s1 = "My first string"; 
myMethod(s1, s1);

Without multi-reference, the call would be serialized as follows:

<ns1:myMethod  
	xmlns:ns1="urn:my-service"   
	SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">   
	
		<firstString xsi:type="xsd:string">My first string</firstString>
		<secondString xsi:type="xsd:string">My first string</secondString>
</ns1:myMethod>

With that information alone, the server side, which does the unmarshaling, cannot reconstitute the intent of the call: it cannot tell whether the caller passed the same string twice or two distinct strings that happen to be equal. However, if the call is serialized using multi-references, the semantics of the call can be preserved, because the SOAP payload carries the information that the first and second arguments of the call are in fact identical:

<ns1:myMethod   
	xmlns:ns1="urn:my-service"   
	SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">   

	<firstString xsi:type="xsd:string" soap:href="#id1"/> 
	<secondString xsi:type="xsd:string" soap:id="id1">My first string</secondString>
</ns1:myMethod>

When working with the Apache SOAP framework, you must remember that it does not support multi-referencing. Therefore, you need to be careful when defining your method signatures.
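
The following sketch illustrates how a serializer could decide when to emit a reference. It is only an illustration under stated assumptions: the MultiRefSketch class is hypothetical, it is not how any particular framework works, and the soap:id and soap:href attribute spelling simply follows the listings above. The key idea is to track arguments by object identity rather than by equality.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical sketch, not part of Apache SOAP: emits references for repeated objects. */
final class MultiRefSketch {

    /**
     * Serializes string arguments. An argument that is the very same object as an
     * earlier argument is written as a reference to the earlier element.
     * XML escaping of the values is omitted to keep the sketch short.
     */
    static String serializeArgs(String[] names, String[] values) {
        // IdentityHashMap compares keys with ==, not equals(), so two distinct
        // strings with the same characters are treated as different values.
        Map<String, String> ids = new IdentityHashMap<>();
        StringBuilder xml = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            String id = ids.get(values[i]);
            if (id == null) {
                id = "id" + (ids.size() + 1);
                ids.put(values[i], id);
                xml.append("<").append(names[i])
                   .append(" xsi:type=\"xsd:string\" soap:id=\"").append(id).append("\">")
                   .append(values[i])
                   .append("</").append(names[i]).append(">");
            } else {
                xml.append("<").append(names[i])
                   .append(" soap:href=\"#").append(id).append("\"/>");
            }
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        String[] names = { "firstString", "secondString" };
        String s1 = new String("My first string");
        String s2 = new String("My first string"); // equal to s1, but a different object

        System.out.println(serializeArgs(names, new String[] { s1, s1 }));
        System.out.println(serializeArgs(names, new String[] { s1, s2 }));
    }
}
```

In the first call the second argument is written as a reference, so a receiver could rebuild two arguments that point to a single object. In the second call both strings are written in full, because they are equal but not identical.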

Enumerations

The Java language (prior to Java 5) does not support an enumerated type. However, many other programming languages do, and so does XML Schema. SOAP defines its enumerated type in the same way as XML Schema does: an enumerated type or enumeration is a list of distinct values compatible with the base type of the enumeration. For instance, if you wanted to represent a Color enumerated type, you could base your representation on the type string as shown in the following example:

<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope 
	xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" 
	xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" 
	xmlns:xsd="http://www.w3.org/1999/XMLSchema">
	
	<SOAP-ENV:Body>   
		<ns1:getRGB 
			xmlns:ns1="urn:rgb-converter"    
			SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">    
			<arg0 xsi:type="xsd:string">red</arg0>
		</ns1:getRGB>
	</SOAP-ENV:Body> 
</SOAP-ENV:Envelope>

You could also base your representation on the type int, as demonstrated in the following SOAP request:

<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope 
	xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" 
	xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" 
	xmlns:xsd="http://www.w3.org/1999/XMLSchema">
	
	<SOAP-ENV:Body>   
		<ns1:getRGB 
			xmlns:ns1="urn:rgb-converter"    
			SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">    
			<arg0 xsi:type="xsd:int">0</arg0>
		</ns1:getRGB>
	</SOAP-ENV:Body> 
</SOAP-ENV:Envelope>

As you can see in the previous examples, an enumerated type is simply serialized according to its base type.
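
On the Java side, the two representations could be mapped as in the following sketch. It assumes Java 5 or later, which added an enum keyword to the language; the ColorEnumSketch class, the Color type and the arg0 accessor are illustrative names only and are not part of the Apache SOAP framework.

```java
import java.util.Locale;

/** Hypothetical sketch: maps a Java enum to a string-based or int-based enumeration. */
final class ColorEnumSketch {

    enum Color { RED, GREEN, BLUE }

    /** String-based enumeration: the value travels as its lower-case name. */
    static String asString(Color color) {
        return "<arg0 xsi:type=\"xsd:string\">"
                + color.name().toLowerCase(Locale.ROOT) + "</arg0>";
    }

    /** Int-based enumeration: the value travels as its position in the list. */
    static String asInt(Color color) {
        return "<arg0 xsi:type=\"xsd:int\">" + color.ordinal() + "</arg0>";
    }

    /** Reverse mapping for the string form; throws IllegalArgumentException if unknown. */
    static Color fromString(String text) {
        return Color.valueOf(text.toUpperCase(Locale.ROOT));
    }

    /** Reverse mapping for the int form; throws ArrayIndexOutOfBoundsException if out of range. */
    static Color fromInt(int ordinal) {
        return Color.values()[ordinal];
    }

    public static void main(String[] args) {
        System.out.println(asString(Color.RED)); // <arg0 xsi:type="xsd:string">red</arg0>
        System.out.println(asInt(Color.RED));    // <arg0 xsi:type="xsd:int">0</arg0>
    }
}
```

The int form is more compact, but it ties both ends of the wire to the order of the constants; the string form survives a reordering of the list.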

Default Values

SOAP does not define a clear semantic for elements that do not have an accessor. For instance, if the argument of a method is omitted and no overloaded implementation is available, what should the receiving end do? The specifics are left to the implementation, since the SOAP specification merely suggests resolution through the use of Booleans and numeric values. In practice, you will maximize your chances of interoperability by not relying on default values.

Since we are on the subject of overloading methods, it is worth mentioning that SOAP will allow you to overload a method simply by changing its return type. However, this kind of overloading (since it is not legal in Java) might also hamper your ability to work with other services.

Messaging

When we discussed distributed application protocols in Chapter 1, we introduced the concept of document-based (also known as message-based) services, as opposed to the procedure-based services that we have been talking about in the previous sections.

As we said in Chapter 1, the fundamental difference between the two systems is not architectural, since it is based on the payload exchanged between computers: message-based systems do not encode a procedure call, they simply carry a document or a data structure. SOAP is designed to support message-based services as well as procedure-based services. Similarly to procedure-based services, message-based services can either be one way, or be based on a request-response protocol. We will review a one-way message-based service shortly.

So, when should we use a message service as opposed to an RPC service? A key advantage of a message service is its flexibility. Consider the following scenario. You have a customer database, and you want to periodically check and possibly update the quality of your data. If your database already has an XML import/export capability, you can export your customer addresses into a large XML document and send it to an address standardization provider.

Standardizing an address is the process of putting a mailing address in a canonical form that your local post office will deliver. For instance, in the US, 123 Pearl Street Colorado Boulder would be transformed into 123 Pearl St, Boulder Co 80301. Standardization of addresses is used in data quality applications.


As a response, you get back a modified document with the standardized addresses, which you can now import back into your database for updates. In an RPC system, you have to convert the customer addresses into one or more method calls, submit the request, and then do the reverse operation when you receive the response. You can see that with messaging you can leverage existing import/export infrastructure that you already have in place.

Another example of the usefulness of message systems is in setting up SOAP services to audit SOAP messages as they pass through. The following diagram shows just such an arrangement: the first recipient of the SOAP request is the audit system, which passes the request to its final destination as specified by the actor attribute.



Most messaging protocols are asynchronous, so you might be tempted to list that characteristic as an advantage of messaging over RPC. However, as we have seen, SOAP is fundamentally a one-way protocol that supports asynchronous connections for RPC packets and document-based messages.

The audit system is called a SOAP intermediary, an application that is capable of both receiving and forwarding SOAP messages. There might be cases where you need to give specific directives to a SOAP intermediary. For instance, imagine that you have been given a promotional key and you want to tell the billing system that you are entitled to preferential treatment. Such a requirement could be handled with a SOAP actor.

A SOAP actor is a global attribute that can be used to indicate the intended recipient of a header element. Like any global identifier in XML, a SOAP actor is a URI. When a SOAP intermediary receives a header element intended for its use, it must not forward it to the next recipient of the message. (You do not want to dispense a double discount, do you?) In the previous diagram, the <h1/> header entry is morphed into the <h2/> header entry to notify the ultimate recipient of the message that the audit has been performed successfully.

The header elements that we saw previously while introducing SOAP actors did not have a SOAP actor attribute, and as such were intended for the ultimate destination of the SOAP message. This use of header elements shows you another example of how the header is used to extend the protocol. Note that the Apache SOAP framework does not support SOAP actors.
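
Since the Apache SOAP framework does not handle actors for us, the following sketch shows what an intermediary would have to do by hand. It uses only the DOM classes that ship with Java; the IntermediarySketch class and the urn:audit actor URI are made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of an intermediary that consumes the header entries addressed to it. */
final class IntermediarySketch {

    static final String ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Removes the header entries whose actor is myActor and returns their local names. */
    static List<String> consumeHeaders(Document envelope, String myActor) {
        List<String> consumed = new ArrayList<>();
        NodeList headers = envelope.getElementsByTagNameNS(ENV_NS, "Header");
        if (headers.getLength() == 0) {
            return consumed; // the header is optional
        }
        Element header = (Element) headers.item(0);

        // Collect the entries first, so that removing a node does not disturb the walk.
        List<Element> entries = new ArrayList<>();
        for (Node node = header.getFirstChild(); node != null; node = node.getNextSibling()) {
            if (node instanceof Element) {
                entries.add((Element) node);
            }
        }
        for (Element entry : entries) {
            // getAttributeNS returns an empty string when the attribute is absent,
            // so entries without an actor are left for the ultimate destination.
            if (myActor.equals(entry.getAttributeNS(ENV_NS, "actor"))) {
                consumed.add(entry.getLocalName());
                header.removeChild(entry); // must not be forwarded
            }
        }
        return consumed;
    }

    /** Parses an envelope with namespace support turned on. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed envelope", e);
        }
    }

    public static void main(String[] args) {
        Document envelope = parse(
            "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + ENV_NS + "\">"
            + "<SOAP-ENV:Header>"
            + "<h1 SOAP-ENV:actor=\"urn:audit\"/>"
            + "<promo/>"
            + "</SOAP-ENV:Header>"
            + "<SOAP-ENV:Body/>"
            + "</SOAP-ENV:Envelope>");
        System.out.println(consumeHeaders(envelope, "urn:audit"));
    }
}
```

After the call, the <h1/> entry is gone from the document and the <promo/> entry, which carries no actor, is still there for the next recipient. An intermediary that wants to signal its work, as in the diagram, would add a new entry such as <h2/> before forwarding the message.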

Because SOAP serializes method calls similarly to structures, the distinction between an RPC service and a message service is largely academic until you actually get to the implementation of your service. When it comes to RPC services, the Apache SOAP implementation relies on the service providing a remote method to call. All you need to do to implement a procedure-based service is to provide a Java method that implements the functionality. For instance, let's imagine that you want to expose a method through SOAP using the Apache SOAP framework. Specifically, to expose a method that returns "Hello World!", we would only need to write the following code:

package com.lesavon.service; 
public class HelloWorldService {     
	public String getMessage() {         
		return "Hello World!";     
	} 
}
We will discuss this idea in more detail in Chapter 3, but the important point here is that you can develop your RPC service using only Java without knowing anything about SOAP or the Apache SOAP framework. When it comes to the development of a message service, you have to provide a callback method containing low-level (in other words, implementation-dependent) objects. The signature for the callback method in Apache SOAP is:

void name(SOAPEnvelope requestEnvelope, 
	SOAPContext requestContext, 
	SOAPContext responseContext);

where SOAPEnvelope and SOAPContext are classes of the Apache SOAP implementation. In the case of RPC services, the Apache SOAP implementation is able to hide the details of SOAP development.

Where Are the Objects?

So where are the objects in the Simple Object Access Protocol? Surely, you cannot say that you have a distributed object-oriented protocol if you cannot hold a reference to a remote object! In particular, how can a protocol be called object-oriented if it does not support the fundamental characteristics of an object, characteristics that are apparent in most object-oriented languages from SmallTalk, to C++/C#, to Java?

The main features of objects are:
  • Instance Methods: the ability to automatically associate data with a set of functions, also known as methods in that context

  • Polymorphism: the ability to define the behavior of an object at runtime

  • Data Encapsulation: the ability to hide the inner data of an object from the users of the object
The polymorphic accessors of SOAP are closer to method overloading than they are to method overriding. They are not polymorphic in the object-oriented sense.
In short, an object has instance data and polymorphic behavior.

The Apache SOAP implementation allows us to map SOAP calls to (static) class methods. We will discuss the lifetime of remote services when we introduce deployment descriptors.

The only "object" calls that we can make using SOAP are calls to static methods, since we cannot specifically call a remote object without a remote reference to that object. We could argue that the serialization of objects is specified in SOAP, but the data that travels inside a SOAP packet could just as easily represent a structure. There is no guarantee that an object is on the other side of the wire.

In other words, we can use SOAP to implement distributed object-oriented systems, but a purely function-based implementation is perfectly compatible with the specifications of SOAP.

Summary

In this chapter, we looked at the SOAP specification in more detail. We saw that the specification addresses three major pieces of functionality:
  • The SOAP envelope, which along with the header and the body defines a framework for exchanging messages in a distributed architecture

  • The encoding rules, which define an XML-based serialization mechanism that can be used for interoperability

  • The SOAP RPC mechanism, that uses structure-like serialization rules for method call encoding

In addition, SOAP defines HTTP as its preferred transport without precluding the use of other transports, such as SMTP. The use of HTTP is a key ingredient to the acceptance of SOAP by security-minded network administrators.

For the most part, SOAP is true to its goal of simplicity. Arguably the 'S' in SOAP gets lost in Section 5 where the SOAP specification tackles encoding, since it takes more than half of the overall document. However, this complexity should disappear as the XML Schema specification is accepted as standard.

What the SOAP specification does not specify is even more remarkable:
  • SOAP defines no proprietary protocol to carry its payload.

  • SOAP defines no lifetime and no garbage collection rules.

  • SOAP defines no remote activation.

  • SOAP defines no object-by-reference mechanism.

  • SOAP defines no batching process.

  • SOAP defines no programming language bindings.

  • SOAP defines no compatibility to the legacy systems that we discussed in Chapter 1 (CORBA, DCOM, amongst others).

Historically, these features are the toughest to implement and the hardest to get accepted across platforms.

In this chapter, we looked at the features of the Apache SOAP framework that we will be using in the remainder of the book. We also discussed some potential interoperability issues, such as the use of the SOAPAction header element.

Hopefully, this chapter gave you the urge to start developing web services using SOAP. In the next chapter, we will set up our SOAP development environment. We will also write the unavoidable HelloWorld! program, only this time with a SOAP flavor.