SOAP over binary XML

Abstract:

As one of the core components of Web service technologies, SOAP has evolved into the most widely supported messaging format and protocol for use with XML Web services. SOAP is usually bound to HTTP, over which a SOAP message encoded as a textual XML document is sent between client and server. XML processing can be slow and memory-intensive, however, especially for scientific data. Consequently, SOAP has been regarded as a poorly performing messaging protocol for scientific applications. Binary XML offers an alternative: a more efficient encoding of XML, and thus of SOAP messages. By having SOAP use a binary XML encoding, we can obtain high-performance Web services while sacrificing little of the interoperability that XML and SOAP provide. In this paper we present a generic implementation of a SOAP messaging system that supports both textual XML and binary XML as the encoding of the SOAP message. We show that its performance is comparable to, and can even compete with, the common practice in scientific applications of handling control and data separately.

Introduction

Motivation

Binary XML and BXSA

Generic SOAP implementation

SOAP

XQuery Data Model

The internal data model is based on the XML Infoset, augmented with atomic, typed values. This allows our API to represent numbers in their native machine form rather than as character strings. Our API is DOM-like but follows the XML Infoset more closely.
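To illustrate what "atomic, typed values" can mean in practice, the following C++17 sketch stores numbers in an element tree in their machine form and reads them back without any text parsing. All names here (AtomicValue, Element, as_double, sum_doubles, to_text, make_sample) are hypothetical and chosen for illustration only; this is not the API described in this paper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical atomic value: keeps its native machine representation.
using AtomicValue = std::variant<std::int32_t, double, std::string>;

// Hypothetical infoset-like element carrying typed values and child elements.
struct Element {
    std::string name;
    std::vector<AtomicValue> values;   // typed leaf values
    std::vector<Element> children;     // nested elements
};

// Read a value known to be a double; no string-to-number conversion occurs.
double as_double(const AtomicValue& v) {
    assert(std::holds_alternative<double>(v) && "value is not a double");
    return std::get<double>(v);
}

// Sum every double in the subtree directly from its machine form.
double sum_doubles(const Element& e) {
    double total = 0.0;
    for (const AtomicValue& v : e.values) {
        if (std::holds_alternative<double>(v)) total += as_double(v);
    }
    for (const Element& child : e.children) total += sum_doubles(child);
    return total;
}

// Text is produced only on demand, e.g. by a textual XML serializer.
std::string to_text(const AtomicValue& v) {
    return std::visit([](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, std::string>) {
            return x;
        } else {
            return std::to_string(x);
        }
    }, v);
}

// Builds <sample><temperature>20.5 21.5</temperature><count>2</count></sample>.
Element make_sample() {
    Element temperature{"temperature", {20.5, 21.5}, {}};
    Element count{"count", {std::int32_t{2}}, {}};
    return Element{"sample", {}, {temperature, count}};
}
```

A binary encoder could write such values byte for byte, while a textual encoder would call something like to_text for each value; the tree itself stays the same in both cases.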

The interface of the generic SOAP implementation

Concepts:

Concept of SOAP binding

Associated types


Name                     Type                   Description
Message stream type      X::MessageStream       the type of the stream of messages
Server connection type   X::ServerConnection    the connection type on the server side
Client connection type   X::ClientConnection    the connection type on the client side
Server singleton type    X::ServerSingleton     the singleton type that represents the server instance

Valid expressions


Send the SOAP request to the server
    Expression: x.send_request(ch, env)
    Semantics:  ch is of type X::ClientChannel; env is the SOAP envelope to be sent via ch.

Receive the SOAP response from the server
    Expression: SoapEnvelope x.receive_response(ch)
    Semantics:  ch is of type X::ClientChannel, the channel from which the response is received; the return value is the envelope of the SOAP response.

Receive the SOAP request from the client
    Expression: SoapEnvelope x.receive_request(ch)
    Semantics:  ch is of type X::ServerChannel, the channel from which the request is received; the return value is the envelope of the SOAP request.

Send the SOAP response back to the client
    Expression: x.send_response(ch, env)
    Semantics:  ch is of type X::ServerChannel, the channel on the server side; env is the envelope of the SOAP response message.
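As an illustration of how a type might model this concept, here is a minimal C++17 sketch that assumes an in-memory loopback transport. LoopbackBinding, Channel, serve_one, loopback_round_trip and the one-field SoapEnvelope are hypothetical stand-ins; only the four member function names and the X::ClientChannel / X::ServerChannel type names are taken from the expressions above.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Simplified stand-in for the envelope type (assumption: a real one wraps the data model).
struct SoapEnvelope {
    std::string body;
};

// Hypothetical in-memory transport: two queues shared by both endpoints.
struct Channel {
    std::deque<SoapEnvelope> requests;
    std::deque<SoapEnvelope> responses;
};

// One possible model of the SOAP binding concept.
struct LoopbackBinding {
    using ClientChannel = Channel;
    using ServerChannel = Channel;

    void send_request(ClientChannel& ch, const SoapEnvelope& env) { ch.requests.push_back(env); }
    SoapEnvelope receive_request(ServerChannel& ch) { return pop(ch.requests); }
    void send_response(ServerChannel& ch, const SoapEnvelope& env) { ch.responses.push_back(env); }
    SoapEnvelope receive_response(ClientChannel& ch) { return pop(ch.responses); }

private:
    // Remove and return the oldest pending message.
    static SoapEnvelope pop(std::deque<SoapEnvelope>& q) {
        assert(!q.empty() && "no message pending on this channel");
        SoapEnvelope env = std::move(q.front());
        q.pop_front();
        return env;
    }
};

// Generic server step: written against the concept, not against one transport.
template <class Binding, class Handler>
void serve_one(Binding& x, typename Binding::ServerChannel& ch, Handler handle) {
    SoapEnvelope request = x.receive_request(ch);
    x.send_response(ch, handle(request));
}

// Single-threaded round trip: client sends, server handles, client receives.
SoapEnvelope loopback_round_trip(const SoapEnvelope& request) {
    LoopbackBinding x;
    Channel ch;
    x.send_request(ch, request);
    serve_one(x, ch, [](const SoapEnvelope& req) {
        return SoapEnvelope{"echo: " + req.body};
    });
    return x.receive_response(ch);
}
```

Because serve_one depends only on the valid expressions, a textual XML binding and a binary XML binding could in principle be swapped in without changing it.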

General XML realization

Binary XML realization

Experiments and Results

File Size Comparison

Figure: serialization size (image file: datasize)

The BXSA file and the netCDF file are almost the same size.

Invocation Performance

The client machine is bleu.
The server machine is brick.

Two solutions are compared:

  1. BXSA: the client sends the request and the data in a single SOAP request, which is encoded in binary form using BXSA. The encoded message is sent to the server over XBS, a raw binary transport protocol. The server receives the request and verifies the request and the data. If everything is OK, the server sends back a SOAP response indicating the result.

  2. Mixed solution: the client generates the data and saves it in a netCDF file, which remote machines can access via various protocols (such as HTTP, GridFTP, or FTP). The client then sends the server a request whose only content is the URL of the netCDF file. When the server receives the request, it extracts the URL and retrieves the netCDF file to its local file system. The server then reads the local netCDF file and verifies its content. If everything is OK, the server sends back a SOAP response indicating the result.
Note that the mixed solution above is pull-based (i.e., the server pulls the data from the client). A push-based variant, in which the client pushes the data file to the server, is also possible; our tests show that the two approaches have the same performance.
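The difference between the two flows can be sketched in a few lines of C++17. Everything in this sketch is a hypothetical stand-in: FileStore plays the role of the file system reachable over HTTP, GridFTP or FTP, the URL is made up, plain vectors replace the BXSA and netCDF encodings, and "verification" is reduced to checking that the data is non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Data = std::vector<double>;

// Stand-in for files reachable by URL (netCDF files in the mixed solution).
using FileStore = std::map<std::string, Data>;

// Stand-in for the SOAP response that indicates the result.
struct Response {
    bool ok;
    std::size_t element_count;
};

// Placeholder verification step shared by both solutions.
Response verify(const Data& d) {
    return Response{!d.empty(), d.size()};
}

// Solution 1: request and data arrive together in one message.
Response serve_bxsa(const Data& request_payload) {
    return verify(request_payload);
}

// Solution 2 (pull-based): the request carries only a URL; the server fetches the file first.
Response serve_mixed(const std::string& url, const FileStore& remote) {
    assert(!url.empty() && "request must carry a URL");
    auto it = remote.find(url);
    if (it == remote.end()) return Response{false, 0};  // file could not be retrieved
    const Data local_copy = it->second;                 // models the copy to the local file system
    return verify(local_copy);
}

// Client side of solution 1: a single step.
Response client_bxsa(const Data& data) {
    return serve_bxsa(data);
}

// Client side of solution 2: save the file, then send only its URL.
Response client_mixed(const Data& data, FileStore& store) {
    const std::string url = "http://client.example/data.nc";  // made-up URL
    store[url] = data;                                         // models saving the netCDF file
    return serve_mixed(url, store);
}
```

The mixed path touches the data twice more than the single-message path (one save on the client, one retrieval on the server), which is the extra file I/O at issue in the measurements.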

When the data size is small (fewer than 1K doubles and integers), the invocation performance compares as follows:

Figure: invocation performance for small binary data (image file: small)

GridFTP takes far longer than the other two solutions because its transport layer is SSL. To make the comparison meaningful, we therefore compare only BXSA against the HTTP + SOAP mixed solution.

Figure: invocation performance for small binary data, BXSA versus the HTTP + SOAP mixed solution (image file: small_h_b)

The figure above shows that, for the mixed solution, the extra file I/O is the main performance cost when the data size is relatively small.

When the data size is large (up to 1M doubles and integers), the invocation performance compares as follows:

Figure: invocation performance for big binary data (image file: big)

For both solutions, network transfer now dominates the performance cost. Since the BXSA and netCDF data sizes are similar, the BXSA solution performs close to the mixed solution (via either GridFTP or HTTP).

Performance Breakdown

For BXSA to invoke a Web service with 1M elements (the BXSA file size is around 12.58M),

In the SOAP + HTTP + netCDF solution, the netCDF file size for 1M elements is around 12.58M (almost the same as the BXSA serialization result). The basic steps involved in the client and server are



Wei Lu 2005-10-19