By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.
This Internet-Draft will expire on February 27, 2008.
Copyright © The IETF Trust (2007).
1. Introduction
2. Design Goals
3. Hessian Grammar
4. Serialization
    4.1. binary data
        4.1.1. Compact: short binary
        4.1.2. Binary Examples
    4.2. boolean
        4.2.1. Boolean Examples
    4.3. date
        4.3.1. Date Examples
    4.4. double
        4.4.1. Compact: double zero
        4.4.2. Compact: double one
        4.4.3. Compact: double octet
        4.4.4. Compact: double short
        4.4.5. Compact: double float
        4.4.6. Double Examples
    4.5. int
        4.5.1. Compact: single octet integers
        4.5.2. Compact: two octet integers
        4.5.3. Compact: three octet integers
        4.5.4. Integer Examples
    4.6. list
        4.6.1. Compact: repeated list
        4.6.2. List examples
    4.7. long
        4.7.1. Compact: single octet longs
        4.7.2. Compact: two octet longs
        4.7.3. Compact: three octet longs
        4.7.4. Compact: four octet longs
        4.7.5. Long Examples
    4.8. map
        4.8.1. Map examples
    4.9. null
    4.10. object
        4.10.1. Compact: class definition
        4.10.2. Compact: object instantiation
        4.10.3. Object examples
    4.11. ref
        4.11.1. Compact: two octet reference
        4.11.2. Compact: three octet reference
        4.11.3. Ref Examples
    4.12. string
        4.12.1. Compact: short strings
        4.12.2. String Examples
    4.13. type
    4.14. Compact: type references
5. Reference Maps
    5.1. value reference
    5.2. class reference
    5.3. type reference
6. Bytecode map
§ Authors' Addresses
§ Intellectual Property and Copyright Statements
1. Introduction
Hessian is a dynamically-typed, binary serialization and Web Services protocol designed for object-oriented transmission.
2. Design Goals
Hessian is dynamically-typed, compact, and portable across languages.
The Hessian protocol has the following design goals:
3. Hessian Grammar
Serialization Grammar
               # starting production
    top        ::= value

               # 8-bit binary data split into 64k chunks
    binary     ::= 'b' b1 b0 <binary-data> binary  # non-final chunk
               ::= 'B' b1 b0 <binary-data>         # final chunk
               ::= [x20-x2f] <binary-data>         # binary data of length 0-15

               # boolean true/false
    boolean    ::= 'T'
               ::= 'F'

               # definition for an object (compact map)
    class-def  ::= 'O' type int string*

               # time in UTC encoded as 64-bit long milliseconds since epoch
    date       ::= 'd' b7 b6 b5 b4 b3 b2 b1 b0

               # 64-bit IEEE double
    double     ::= 'D' b7 b6 b5 b4 b3 b2 b1 b0
               ::= x67                  # 0.0
               ::= x68                  # 1.0
               ::= x69 b0               # byte cast to double (-128.0 to 127.0)
               ::= x6a b1 b0            # short cast to double
               ::= x6b b3 b2 b1 b0      # 32-bit float cast to double

               # 32-bit signed integer
    int        ::= 'I' b3 b2 b1 b0
               ::= [x80-xbf]            # -x10 to x2f
               ::= [xc0-xcf] b0         # -x800 to x7ff
               ::= [xd0-xd7] b1 b0      # -x40000 to x3ffff

               # list/vector length
    length     ::= 'l' b3 b2 b1 b0
               ::= x6e int

               # list/vector
    list       ::= 'V' type? length? value* 'z'
               ::= 'v' int int value*   # type-ref, length

               # 64-bit signed long integer
    long       ::= 'L' b7 b6 b5 b4 b3 b2 b1 b0
               ::= [xd8-xef]            # -x08 to x0f
               ::= [xf0-xff] b0         # -x800 to x7ff
               ::= [x38-x3f] b1 b0      # -x40000 to x3ffff
               ::= x77 b3 b2 b1 b0      # 32-bit integer cast to long

               # map/object
    map        ::= 'M' type? (value value)* 'z'  # key, value map pairs

               # null value
    null       ::= 'N'

               # Object instance
    object     ::= 'o' int value*

               # value reference (e.g. circular trees and graphs)
    ref        ::= 'R' b3 b2 b1 b0      # reference to nth map/list/object in stream
               ::= x4a b0               # reference to 1-255th map/list/object
               ::= x4b b1 b0            # reference to 1-65535th map/list/object

               # UTF-8 encoded character string split into 64k chunks
    string     ::= 's' b1 b0 <utf8-data> string  # non-final chunk
               ::= 'S' b1 b0 <utf8-data>         # string of length 0-65535
               ::= [x00-x1f] <utf8-data>         # string of length 0-31

               # map/list types for OO languages
    type       ::= 't' b1 b0 <type-string>       # type name
               ::= x75 int                       # type reference

               # main production
    value      ::= null
               ::= binary
               ::= boolean
               ::= date
               ::= double
               ::= int
               ::= list
               ::= long
               ::= map
               ::= class-def value
               ::= object
               ::= ref
               ::= string
Figure 1
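4. Serialization
Hessian's object serialization has 8 primitive types: raw binary data, boolean, 64-bit millisecond date, 64-bit double, 32-bit int, 64-bit long, null, and UTF-8 encoded string.
It has 3 recursive types: list, map, and object.
Finally, it has one special construct: ref, for shared and circular references.
Hessian 2.0 has 3 internal reference maps: a value reference map (Section 5.1), a class reference map (Section 5.2), and a type reference map (Section 5.3).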
4.1. binary data
Binary Grammar
    binary ::= b b1 b0 <binary-data> binary
           ::= B b1 b0 <binary-data>
           ::= [x20-x2f] <binary-data>
Figure 2
Binary data is encoded in chunks. The octet x42 ('B') encodes the final chunk and x62 ('b') represents any non-final chunk. Each chunk has a 16-bit length value.
len = 256 * b1 + b0
4.1.1. Compact: short binary
Binary data with length less than 16 may be encoded by a single octet length [x20-x2f].
len = code - 0x20
4.1.2. Binary Examples
    x20                # zero-length binary data
    x23 x01 x02 x03    # 3 octet data
    B x10 x00 ....     # 4k final chunk of data
    b x04 x00 ....     # 1k non-final chunk of data
Figure 3
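The chunk rules above can be sketched as a small decoder. This is an illustrative sketch only, not the reference implementation: the class name HessianBinarySketch and its decode method are hypothetical, and the input is assumed to hold exactly one well-formed binary value starting at offset 0.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper: reassembles one Hessian binary value from its chunks. */
final class HessianBinarySketch {
    /** Decodes the binary value that starts at buf[0]. */
    static byte[] decode(byte[] buf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int code = buf[pos++] & 0xff;
            int len;
            boolean isFinal;
            if (code >= 0x20 && code <= 0x2f) {
                // compact form: len = code - 0x20, always a final chunk
                len = code - 0x20;
                isFinal = true;
            } else if (code == 'B' || code == 'b') {
                // chunk with a 16-bit length: len = 256 * b1 + b0
                int b1 = buf[pos++] & 0xff;
                int b0 = buf[pos++] & 0xff;
                len = 256 * b1 + b0;
                isFinal = (code == 'B');
            } else {
                throw new IllegalArgumentException(
                    "not a binary chunk: 0x" + Integer.toHexString(code));
            }
            out.write(buf, pos, len);
            pos += len;
            if (isFinal) {
                return out.toByteArray();
            }
        }
    }
}
```

A reader keeps appending 'b' chunks until it sees either a 'B' chunk or a compact chunk, both of which end the value.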
4.2. boolean
Boolean Grammar
    boolean ::= T
            ::= F
Figure 4
The octet 'F' represents false and the octet 'T' represents true.
4.2.1. Boolean Examples
    T    # true
    F    # false
Figure 5
4.3. date
Date Grammar
    date ::= d b7 b6 b5 b4 b3 b2 b1 b0
Figure 6
A date is represented as a 64-bit long holding the number of milliseconds since the epoch, Jan 1 1970 00:00H UTC.
4.3.1. Date Examples
    d x00 x00 x00 xd0 x4b x92 x84 xb8    # 09:51:31 May 8, 1998 UTC
Figure 7
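As a rough illustration of the date encoding, the following sketch turns the nine octets of a date into a java.time.Instant. The class name HessianDateSketch is hypothetical, and the sketch assumes the buffer contains only the date value.

```java
import java.time.Instant;

/** Hypothetical helper: decodes a Hessian date ('d' plus 8 big-endian octets). */
final class HessianDateSketch {
    static Instant decode(byte[] buf) {
        if (buf.length != 9 || buf[0] != 'd') {
            throw new IllegalArgumentException("not a date");
        }
        long millis = 0;
        for (int i = 1; i <= 8; i++) {
            // shift in b7 first, b0 last (big-endian order)
            millis = (millis << 8) | (buf[i] & 0xffL);
        }
        return Instant.ofEpochMilli(millis);
    }
}
```

Calling `HessianDateSketch.decode(bytes).toString()` on the octets of Figure 7 is a quick way to check the UTC timestamp they encode.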
4.4. double
Double Grammar
    double ::= D b7 b6 b5 b4 b3 b2 b1 b0
           ::= x67
           ::= x68
           ::= x69 b0
           ::= x6a b1 b0
           ::= x6b b3 b2 b1 b0
Figure 8
A 64-bit IEEE floating point number.
4.4.1. Compact: double zero
The double 0.0 can be represented by the octet x67.
4.4.2. Compact: double one
The double 1.0 can be represented by the octet x68.
4.4.3. Compact: double octet
Doubles between -128.0 and 127.0 with no fractional component can be represented in two octets by casting the byte value to a double.
value = (double) b0
4.4.4. Compact: double short
Doubles between -32768.0 and 32767.0 with no fractional component can be represented in three octets by casting the short value to a double.
value = (double) (256 * b1 + b0)
4.4.5. Compact: double float
Doubles which are equivalent to their 32-bit float representation can be represented as the 4-octet float and then cast to double.
4.4.6. Double Examples
    x67              # 0.0
    x68              # 1.0
    x69 x00          # 0.0
    x69 x80          # -128.0
    x69 x7f          # 127.0
    x6a x00 x00      # 0.0
    x6a x80 x00      # -32768.0
    x6a x7f xff      # 32767.0
    D x40 x28 x80 x00 x00 x00 x00 x00    # 12.25
Figure 9
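The five double encodings can be sketched as one decoder. This is a minimal sketch under assumptions: HessianDoubleSketch is a hypothetical name, the buffer starts with the double's leading octet, and the octet and short forms are interpreted as signed two's-complement values, which is what the stated ranges imply.

```java
/** Hypothetical helper: decodes any of the Hessian double encodings. */
final class HessianDoubleSketch {
    static double decode(byte[] buf) {
        int code = buf[0] & 0xff;
        switch (code) {
            case 0x67:
                return 0.0;
            case 0x68:
                return 1.0;
            case 0x69:
                // Java bytes are signed, so the cast yields -128.0 to 127.0
                return (double) buf[1];
            case 0x6a:
                // assemble b1 b0, then narrow to a signed short before widening
                return (double) (short) (((buf[1] & 0xff) << 8) | (buf[2] & 0xff));
            case 0x6b: {
                int bits = ((buf[1] & 0xff) << 24) | ((buf[2] & 0xff) << 16)
                         | ((buf[3] & 0xff) << 8) | (buf[4] & 0xff);
                return (double) Float.intBitsToFloat(bits);
            }
            case 'D': {
                long bits = 0;
                for (int i = 1; i <= 8; i++) {
                    bits = (bits << 8) | (buf[i] & 0xffL);
                }
                return Double.longBitsToDouble(bits);
            }
            default:
                throw new IllegalArgumentException(
                    "not a double: 0x" + Integer.toHexString(code));
        }
    }
}
```

A writer would typically try the forms in order of size and fall back to the full 'D' form whenever the value has a fractional part that a 32-bit float cannot hold exactly.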
4.5. int
Integer Grammar
    int ::= 'I' b3 b2 b1 b0
        ::= [x80-xbf]
        ::= [xc0-xcf] b0
        ::= [xd0-xd7] b1 b0
Figure 10
A 32-bit signed integer. An integer is represented by the octet x49 ('I') followed by the 4 octets of the integer in big-endian order.
value = (b3 << 24) + (b2 << 16) + (b1 << 8) + b0;
4.5.1. Compact: single octet integers
Integers between -16 and 47 can be encoded by a single octet in the range x80 to xbf.
value = code - 0x90
4.5.2. Compact: two octet integers
Integers between -2048 and 2047 can be encoded in two octets with the leading byte in the range xc0 to xcf.
value = ((code - 0xc8) << 8) + b0;
4.5.3. Compact: three octet integers
Integers between -262144 and 262143 can be encoded in three bytes with the leading byte in the range xd0 to xd7.
value = ((code - 0xd4) << 16) + (b1 << 8) + b0;
4.5.4. Integer Examples
    x90                  # 0
    x80                  # -16
    xbf                  # 47
    xc8 x00              # 0
    xc0 x00              # -2048
    xc7 x00              # -256
    xcf xff              # 2047
    xd4 x00 x00          # 0
    xd0 x00 x00          # -262144
    xd7 xff xff          # 262143
    I x00 x00 x00 x00    # 0
    I x00 x00 x01 x2c    # 300
Figure 11
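The integer formulas above translate almost directly into code. The following is a sketch only; the class name HessianIntSketch is hypothetical and the buffer is assumed to begin with the integer's leading octet.

```java
/** Hypothetical helper: decodes any of the Hessian int encodings. */
final class HessianIntSketch {
    static int decode(byte[] buf) {
        int code = buf[0] & 0xff;
        if (code >= 0x80 && code <= 0xbf) {
            // single octet: value = code - 0x90
            return code - 0x90;
        }
        if (code >= 0xc0 && code <= 0xcf) {
            // two octets: value = ((code - 0xc8) << 8) + b0
            return ((code - 0xc8) << 8) + (buf[1] & 0xff);
        }
        if (code >= 0xd0 && code <= 0xd7) {
            // three octets: value = ((code - 0xd4) << 16) + (b1 << 8) + b0
            return ((code - 0xd4) << 16) + ((buf[1] & 0xff) << 8) + (buf[2] & 0xff);
        }
        if (code == 'I') {
            // full form: four big-endian octets
            return ((buf[1] & 0xff) << 24) + ((buf[2] & 0xff) << 16)
                 + ((buf[3] & 0xff) << 8) + (buf[4] & 0xff);
        }
        throw new IllegalArgumentException("not an int: 0x" + Integer.toHexString(code));
    }
}
```

The subtraction of the range midpoint (0x90, 0xc8, 0xd4) is what makes the compact forms signed: leading octets below the midpoint produce negative values.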
4.6. list
List Grammar
    list ::= V type? length? value* z
         ::= v int int value*
Figure 12
An ordered list, like an array. In the 'V' form, a list consists of an optional type string, an optional length, a sequence of values, and a trailing octet x7a ('z'). The type string may be an arbitrary UTF-8 string understood by the service. The length may be omitted to indicate that the list is variable length.
Each list item is added to the reference list to handle shared and circular elements. See the ref element.
Any parser expecting a list must also accept a null or a shared ref.
The valid values of type are not specified in this document and may depend on the specific application. For example, a server implemented in a language with static typing which exposes a Hessian interface can use the type information to instantiate the specific array type. On the other hand, a server written in a dynamically-typed language would likely ignore the contents of type entirely and create a generic array.
4.6.1. Compact: repeated list
Hessian 2.0 allows a compact form of the list for successive lists of the same type where the length is known beforehand. The type and length are encoded by integers, where the type is a reference to an earlier specified type.
4.6.2. List examples
Serialization of a typed int array: int[] = {0, 1}
    V
      t x00 x04 [int     # encoding of int[] type
      x6e x02            # length = 2
      x90                # integer 0
      x91                # integer 1
      z
Figure 13
Anonymous variable-length list = {0, "foobar"}
    V
      x90                # integer 0
      x06 foobar         # "foobar"
      z
Figure 14
Repeated list type
    V
      t x00 x04 [int     # type for int[] (save as type #1)
      x6e x02            # length 2
      x90                # integer 0
      x91                # integer 1
      z
    v
      x91                # type reference to int[] (integer #1)
      x92                # length 2
      x92                # integer 2
      x93                # integer 3
Figure 15
4.7. long
Long Grammar
    long ::= L b7 b6 b5 b4 b3 b2 b1 b0
         ::= [xd8-xef]
         ::= [xf0-xff] b0
         ::= [x38-x3f] b1 b0
         ::= x77 b3 b2 b1 b0
Figure 16
A 64-bit signed integer. A long is represented by the octet x4c ('L') followed by the 8 bytes of the integer in big-endian order.
4.7.1. Compact: single octet longs
Longs between -8 and 15 are represented by a single octet in the range xd8 to xef.
value = (code - 0xe0)
4.7.2. Compact: two octet longs
Longs between -2048 and 2047 are encoded in two octets with the leading byte in the range xf0 to xff.
value = ((code - 0xf8) << 8) + b0
4.7.3. Compact: three octet longs
Longs between -262144 and 262143 are encoded in three octets with the leading byte in the range x38 to x3f.
value = ((code - 0x3c) << 16) + (b1 << 8) + b0
4.7.4. Compact: four octet longs
Longs which fit into 32 bits are encoded in five octets with the leading byte x77.
value = (b3 << 24) + (b2 << 16) + (b1 << 8) + b0
4.7.5. Long Examples
    xe0                      # 0
    xd8                      # -8
    xef                      # 15
    xf8 x00                  # 0
    xf0 x00                  # -2048
    xf7 x00                  # -256
    xff xff                  # 2047
    x3c x00 x00              # 0
    x38 x00 x00              # -262144
    x3f xff xff              # 262143
    x77 x00 x00 x00 x00      # 0
    x77 x00 x00 x01 x2c      # 300
    L x00 x00 x00 x00 x00 x00 x01 x2c    # 300
Figure 17
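The long encodings follow the same pattern as the int encodings with different leading-octet ranges. The sketch below is illustrative; HessianLongSketch is a hypothetical name, and the x77 form is assumed to be sign-extended from 32 bits, as "32-bit integer cast to long" suggests.

```java
/** Hypothetical helper: decodes any of the Hessian long encodings. */
final class HessianLongSketch {
    static long decode(byte[] buf) {
        int code = buf[0] & 0xff;
        if (code >= 0xd8 && code <= 0xef) {
            // single octet: value = code - 0xe0
            return code - 0xe0;
        }
        if (code >= 0xf0) {
            // two octets: value = ((code - 0xf8) << 8) + b0
            return ((code - 0xf8) << 8) + (buf[1] & 0xff);
        }
        if (code >= 0x38 && code <= 0x3f) {
            // three octets: value = ((code - 0x3c) << 16) + (b1 << 8) + b0
            return ((code - 0x3c) << 16) + ((buf[1] & 0xff) << 8) + (buf[2] & 0xff);
        }
        if (code == 0x77) {
            // 32-bit form: build a signed int, then widen to long
            int v = ((buf[1] & 0xff) << 24) | ((buf[2] & 0xff) << 16)
                  | ((buf[3] & 0xff) << 8) | (buf[4] & 0xff);
            return v;
        }
        if (code == 'L') {
            long v = 0;
            for (int i = 1; i <= 8; i++) {
                v = (v << 8) | (buf[i] & 0xffL);
            }
            return v;
        }
        throw new IllegalArgumentException("not a long: 0x" + Integer.toHexString(code));
    }
}
```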
4.8. map
Map Grammar
    map ::= M type? (value value)* z
Figure 18
Represents serialized maps and can represent objects. The type element describes the type of the map.
The type may be empty, i.e. a zero length. The parser is responsible for choosing a type if one is not specified. For objects, unrecognized keys will be ignored.
Each map is added to the reference list. Any time the parser expects a map, it must also be able to support a null or a ref.
The type is chosen by the service.
4.8.1. Map examples
A sparse array
    map = new HashMap();
    map.put(new Integer(1), "fee");
    map.put(new Integer(16), "fie");
    map.put(new Integer(256), "foe");
    ---
    M
      x91                # 1
      x03 fee            # "fee"
      xa0                # 16
      x03 fie            # "fie"
      xc9 x00            # 256
      x03 foe            # "foe"
      z
Figure 19
Map Representation of a Java Object
    public class Car implements Serializable {
      String color = "aquamarine";
      String model = "Beetle";
      int mileage = 65536;
    }
    ---
    M
      t x00 x13 com.caucho.test.Car    # type
      x05 color                        # color field
      x0a aquamarine
      x05 model                        # model field
      x06 Beetle
      x07 mileage                      # mileage field
      I x00 x01 x00 x00
      z
Figure 20
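To make the untyped map layout concrete, the following sketch serializes an integer-keyed map such as the sparse array above. It is a simplified illustration: HessianMapSketch and its methods are hypothetical names, and values are assumed to be strings shorter than 32 characters drawn from the Basic Multilingual Plane.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical helper: writes an untyped map of int keys to short string values. */
final class HessianMapSketch {
    /** Writes an int in the most compact form that can hold it. */
    static void writeInt(ByteArrayOutputStream out, int v) {
        if (v >= -16 && v <= 47) {
            out.write(v + 0x90);
        } else if (v >= -2048 && v <= 2047) {
            out.write((v >> 8) + 0xc8);
            out.write(v & 0xff);
        } else if (v >= -262144 && v <= 262143) {
            out.write((v >> 16) + 0xd4);
            out.write((v >> 8) & 0xff);
            out.write(v & 0xff);
        } else {
            out.write('I');
            out.write(v >>> 24);
            out.write(v >>> 16);
            out.write(v >>> 8);
            out.write(v);
        }
    }

    /** Writes a string in the compact short form (assumption: fewer than 32 chars). */
    static void writeShortString(ByteArrayOutputStream out, String s) {
        if (s.length() > 31) {
            throw new IllegalArgumentException("sketch handles only short strings");
        }
        out.write(s.length());
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.write(utf8, 0, utf8.length);
    }

    /** Writes 'M', the key/value pairs in iteration order, then 'z'. */
    static byte[] encode(Map<Integer, String> map) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('M');
        for (Map.Entry<Integer, String> e : map.entrySet()) {
            writeInt(out, e.getKey());
            writeShortString(out, e.getValue());
        }
        out.write('z');
        return out.toByteArray();
    }
}
```

Passing a LinkedHashMap keeps the pairs in insertion order, which makes the output easy to compare against a hand-written byte sequence.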
4.9. null
Null Grammar
    null ::= N
Figure 21
Null represents a null pointer.
The octet 'N' represents the null value.
4.10. object
Object Grammar
    object    ::= 'o' int value*
    class-def ::= 'O' type int string*
Figure 22
4.10.1. Compact: class definition
Hessian 2.0 has a compact object form where the field names are only serialized once. Following objects only need to serialize their values.
The object definition includes a mandatory type string, the number of fields, and the field names. The object definition is stored in the object definition map and will be referenced by object instances with an integer reference.
4.10.2. Compact: object instantiation
The object instantiation creates a new object based on a previous definition. The integer value refers to the object definition.
4.10.3. Object examples
Object serialization
    class Car {
      String color;
      String model;
    }

    out.writeObject(new Car("red", "corvette"));
    out.writeObject(new Car("green", "civic"));
    ---
    O                        # object definition (#0)
      t x00 x0b example.Car  # type is example.Car
      x92                    # two fields
      x05 color              # color field name
      x05 model              # model field name
    o
      x90                    # object definition #0
      x03 red                # color field value
      x08 corvette           # model field value
    o
      x90                    # object definition #0
      x05 green              # color field value
      x05 civic              # model field value
Figure 23
    enum Color {
      RED,
      GREEN,
      BLUE,
    }

    out.writeObject(Color.RED);
    out.writeObject(Color.GREEN);
    out.writeObject(Color.BLUE);
    out.writeObject(Color.GREEN);
    ---
    O                          # object definition #0
      t x00 x0d example.Color  # type is example.Color
      x91                      # one field
      x04 name                 # enumeration field is "name"
    o                          # object #0
      x90                      # object definition ref #0
      x03 RED                  # RED value
    o                          # object #1
      x90                      # object definition ref #0
      x05 GREEN                # GREEN value
    o                          # object #2
      x90                      # object definition ref #0
      x04 BLUE                 # BLUE value
    x4a x01                    # object ref #1, i.e. Color.GREEN
Figure 24
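The "define once, instantiate many times" behaviour can be sketched as a tiny writer. This is an assumption-laden illustration, not the Caucho implementation: HessianObjectSketch and its methods are hypothetical, field values are restricted to short strings, and at most 48 definitions and fields are supported so that every count fits a one-octet compact int.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical writer: emits a class definition once, then compact instances. */
final class HessianObjectSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Position in this list is the definition number used by 'o'.
    private final List<String> classDefs = new ArrayList<>();

    void writeObject(String type, String[] fieldNames, String... fieldValues) {
        int ref = classDefs.indexOf(type);
        if (ref < 0) {
            // First instance of this type: write 'O' type int string*
            ref = classDefs.size();
            classDefs.add(type);
            out.write('O');
            out.write('t');
            out.write(type.length() >> 8);
            out.write(type.length() & 0xff);
            byte[] name = type.getBytes(StandardCharsets.UTF_8);
            out.write(name, 0, name.length);
            writeSmallInt(fieldNames.length);
            for (String field : fieldNames) {
                writeShortString(field);
            }
        }
        // Every instance: 'o', definition number, then the field values
        out.write('o');
        writeSmallInt(ref);
        for (String value : fieldValues) {
            writeShortString(value);
        }
    }

    private void writeSmallInt(int v) {
        if (v < -16 || v > 47) {
            throw new IllegalArgumentException("sketch handles only one-octet ints");
        }
        out.write(0x90 + v);
    }

    private void writeShortString(String s) {
        if (s.length() > 31) {
            throw new IllegalArgumentException("sketch handles only short strings");
        }
        out.write(s.length());
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.write(utf8, 0, utf8.length);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}
```

Writing two Car objects through the same writer produces one 'O' block followed by two 'o' blocks, mirroring the layout of Figure 23.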
4.11. ref
Ref Grammar
    ref ::= R b3 b2 b1 b0
        ::= x4a b0
        ::= x4b b1 b0
Figure 25
An integer referring to a previous list, map, or object instance. As each list, map or object is read from the input stream, it is assigned the integer position in the stream, i.e. the first list or map is '0', the next is '1', etc. A later ref can then use the previous object. Writers MAY generate refs. Parsers MUST be able to recognize them.
ref can refer to incompletely-read items. For example, a circular linked-list will refer to the first link before the entire list has been read.
A possible implementation would add each map, list, and object to an array as it is read. The ref will return the corresponding value from the array. To support circular structures, the implementation would store the map, list or object immediately, before filling in the contents.
Each map or list is stored into an array as it is parsed. ref selects one of the stored objects. The first object is numbered '0'.
4.11.1. Compact: two octet reference
References between 0 and 255 can be encoded in two octets.
value = b0
4.11.2. Compact: three octet reference
References between 0 and 65535 can be encoded in three octets.
value = (b1 << 8) + b0
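The array-based implementation suggested above might look like the following sketch. All names (HessianRefSketch, register, resolve, decodeIndex, circularNode) are hypothetical, and a HashMap stands in for whatever object the parser would really build.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical value reference table for a Hessian reader. */
final class HessianRefSketch {
    private final List<Object> refs = new ArrayList<>();

    /** Stores a container before its contents are read; returns its ref number. */
    int register(Object container) {
        refs.add(container);
        return refs.size() - 1;
    }

    /** Returns the container stored under the given ref number. */
    Object resolve(int index) {
        return refs.get(index);
    }

    /** Decodes the index carried by a ref that starts at buf[0]. */
    static int decodeIndex(byte[] buf) {
        int code = buf[0] & 0xff;
        switch (code) {
            case 0x4a:
                return buf[1] & 0xff;
            case 0x4b:
                return ((buf[1] & 0xff) << 8) + (buf[2] & 0xff);
            case 'R':
                return ((buf[1] & 0xff) << 24) + ((buf[2] & 0xff) << 16)
                     + ((buf[3] & 0xff) << 8) + (buf[4] & 0xff);
            default:
                throw new IllegalArgumentException(
                    "not a ref: 0x" + Integer.toHexString(code));
        }
    }

    /** Builds a one-node circular list: the node is registered first, filled in after. */
    static Map<String, Object> circularNode() {
        HessianRefSketch table = new HessianRefSketch();
        Map<String, Object> node = new HashMap<>();
        int self = table.register(node);        // store before reading fields
        node.put("data", 1);
        node.put("tail", table.resolve(self));  // a ref to #0 yields the node itself
        return node;
    }
}
```

Registering the container before its fields are parsed is the step that lets a ref inside the container point back at it.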
4.11.3. Ref Examples
Circular list
    list = new LinkedList();
    list.data = 1;
    list.tail = list;
    ---
    O
      t x00 x0a LinkedList   # type is LinkedList
      x92                    # two fields
      x04 data
      x04 tail
    o x90                    # object stores ref #0
      x91                    # data = 1
      x4a x00                # tail field refers to itself, i.e. ref #0
Figure 26
ref only refers to list, map and object elements. Strings and binary data, in particular, will only share references if they're wrapped in a list or map.
4.12. string
String Grammar
    string ::= s b1 b0 <utf8-data> string
           ::= S b1 b0 <utf8-data>
           ::= [x00-x1f] <utf8-data>
Figure 27
A 16-bit unicode character string encoded in UTF-8. Strings are encoded in chunks. x53 ('S') represents the final chunk and x73 ('s') represents any non-final chunk. Each chunk has a 16-bit length value.
The length is the number of characters, which may be different than the number of bytes.
String chunks may not split surrogate pairs.
4.12.1. Compact: short strings
Strings with length less than 32 may be encoded with a single octet length [x00-x1f].
value = code
4.12.2. String Examples
    x00                # "", empty string
    x05 hello          # "hello"
    x01 xc3 x83        # "\u00c3"
    S x00 x05 hello    # "hello" in long form
    s x00 x07 hello,   # "hello, world" split into two chunks
      x05 world
Figure 28
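Because the length counts characters rather than bytes, a reader has to decode UTF-8 as it goes instead of copying a fixed number of octets. The sketch below illustrates this; HessianStringSketch is a hypothetical name, and the sketch only handles 1 to 3 octet UTF-8 sequences, which is an assumption that covers every 16-bit character.

```java
/** Hypothetical reader: decodes one Hessian string, chunk by chunk. */
final class HessianStringSketch {
    private final byte[] buf;
    private int pos;

    HessianStringSketch(byte[] buf) {
        this.buf = buf;
    }

    String readString() {
        StringBuilder sb = new StringBuilder();
        while (true) {
            int code = next();
            int len;
            boolean isFinal;
            if (code <= 0x1f) {
                // compact form: the code itself is the character count
                len = code;
                isFinal = true;
            } else if (code == 'S' || code == 's') {
                len = (next() << 8) + next();
                isFinal = (code == 'S');
            } else {
                throw new IllegalArgumentException(
                    "not a string chunk: 0x" + Integer.toHexString(code));
            }
            // len counts 16-bit characters, so decode one character at a time
            for (int i = 0; i < len; i++) {
                sb.append(readChar());
            }
            if (isFinal) {
                return sb.toString();
            }
        }
    }

    private int next() {
        return buf[pos++] & 0xff;
    }

    private char readChar() {
        int b0 = next();
        if (b0 < 0x80) {
            return (char) b0;
        }
        if ((b0 & 0xe0) == 0xc0) {
            return (char) (((b0 & 0x1f) << 6) | (next() & 0x3f));
        }
        if ((b0 & 0xf0) == 0xe0) {
            int b1 = next();
            int b2 = next();
            return (char) (((b0 & 0x0f) << 12) | ((b1 & 0x3f) << 6) | (b2 & 0x3f));
        }
        throw new IllegalArgumentException("sketch handles only 1-3 octet UTF-8 sequences");
    }
}
```

For the third example above, the length octet is x01 even though two octets of UTF-8 data follow it.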
4.13. type
Type Grammar
    type ::= 't' b1 b0 <type-string>
         ::= x75 int
Figure 29
A map or list MAY include a type attribute indicating the type name of the map or list for object-oriented languages.
Each type is added to the type map (Section 5.3) for future reference.
4.14. Compact: type references
Repeated type strings MAY use the type map (Section 5.3) to refer to a previously used type. The type reference is zero-based over all the types encountered during parsing.
5. Reference Maps
Hessian 2.0 has 3 internal reference maps: the value reference map, the class reference map, and the type reference map.
The value reference map lets Hessian support arbitrary graphs, and recursive and circular data structures.
The class and type maps improve Hessian efficiency by avoiding repetition of common string data.
5.1. value reference
Hessian supports arbitrary graphs by adding each list, object, and map to the value reference map as it encounters them in the bytecode stream.
Parsers MUST store each list, object and map in the reference map as they are encountered.
The stored objects can be used with a ref bytecode.
5.2. class reference
Each object definition is automatically added to the class map. Parsers MUST add a class definition to the class map as each is encountered. Following object instances will refer to the defined class.
5.3. type reference
The type strings for map and list values are stored in a type map for reference.
Parsers MUST add a type string to the type map as each is encountered.
6. Bytecode map
Hessian is organized as a bytecode protocol. A Hessian reader is essentially a switch statement on the initial octet.
Bytecode Encoding
    x00 - x1f    # utf-8 string length 0-31
    x20 - x2f    # binary data length 0-15
    x30 - x37    # reserved
    x38 - x3f    # long from -x40000 to x3ffff
    x40 - x41    # reserved
    x42          # 8-bit binary data final chunk ('B')
    x43          # reserved ('C' streaming call)
    x44          # 64-bit IEEE encoded double ('D')
    x45          # reserved ('E' envelope)
    x46          # boolean false ('F')
    x47          # reserved
    x48          # reserved ('H' header)
    x49          # 32-bit signed integer ('I')
    x4a          # reference to 1-256th map/list
    x4b          # reference to 1-65536th map/list
    x4c          # 64-bit signed long integer ('L')
    x4d          # map with optional type ('M')
    x4e          # null ('N')
    x4f          # object definition ('O')
    x50          # reserved ('P' streaming message/post)
    x51          # reserved
    x52          # reference to map/list - integer ('R')
    x53          # utf-8 string final chunk ('S')
    x54          # boolean true ('T')
    x55          # reserved
    x56          # list/vector ('V')
    x57 - x61    # reserved
    x62          # 8-bit binary data non-final chunk ('b')
    x63          # reserved ('c' call for RPC)
    x64          # UTC time encoded as 64-bit long milliseconds since epoch ('d')
    x65          # reserved
    x66          # reserved ('f' for fault for RPC)
    x67          # double 0.0
    x68          # double 1.0
    x69          # double represented as byte (-128.0 to 127.0)
    x6a          # double represented as short (-32768.0 to 32767.0)
    x6b          # double represented as float
    x6c          # list/vector length ('l')
    x6d          # reserved ('m' method for RPC call)
    x6e          # list/vector compact length
    x6f          # object instance ('o')
    x70          # reserved ('p' - message/post)
    x71          # reserved
    x72          # reserved ('r' reply for message/RPC)
    x73          # utf-8 string non-final chunk ('s')
    x74          # map/list type ('t')
    x75          # type-ref
    x76          # compact vector ('v')
    x77          # long encoded as 32-bit int
    x78 - x79    # reserved
    x7a          # list/map terminator ('z')
    x7b - x7f    # reserved
    x80 - xbf    # one-octet compact int (-x10 to x2f, x90 is 0)
    xc0 - xcf    # two-octet compact int (-x800 to x7ff)
    xd0 - xd7    # three-octet compact int (-x40000 to x3ffff)
    xd8 - xef    # one-octet compact long (-x8 to xf, xe0 is 0)
    xf0 - xff    # two-octet compact long (-x800 to x7ff, xf8 is 0)
Figure 30
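The "switch statement on the initial octet" can be sketched as a classifier that maps a leading octet to the grammar production it starts. This is only an illustration: HessianBytecodeSketch and the returned labels are hypothetical, and a real reader would go on to parse the value rather than return a name.

```java
/** Hypothetical dispatcher: names the production that a leading octet starts. */
final class HessianBytecodeSketch {
    static String classify(int code) {
        if (code < 0 || code > 0xff) {
            throw new IllegalArgumentException("not an octet: " + code);
        }
        // Range-based codes first
        if (code <= 0x1f) return "string";
        if (code <= 0x2f) return "binary";
        if (code >= 0x38 && code <= 0x3f) return "long";
        if (code >= 0x80 && code <= 0xd7) return "int";
        if (code >= 0xd8) return "long";
        // Single-octet codes
        switch (code) {
            case 'B': case 'b':
                return "binary";
            case 'T': case 'F':
                return "boolean";
            case 'd':
                return "date";
            case 'D': case 0x67: case 0x68: case 0x69: case 0x6a: case 0x6b:
                return "double";
            case 'I':
                return "int";
            case 'L': case 0x77:
                return "long";
            case 'V': case 'v':
                return "list";
            case 'M':
                return "map";
            case 'N':
                return "null";
            case 'O':
                return "class-def";
            case 'o':
                return "object";
            case 'R': case 0x4a: case 0x4b:
                return "ref";
            case 'S': case 's':
                return "string";
            case 't': case 0x75:
                return "type";
            case 'l': case 0x6e:
                return "length";
            case 'z':
                return "end";
            default:
                return "reserved";
        }
    }
}
```

Since every value is identified by its first octet, a reader never needs to look ahead more than one byte to decide how to parse what follows.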
Authors' Addresses
Scott Ferguson
Caucho Technology Inc.
P.O. Box 9001
La Jolla, CA 92038
USA
Email: ferg@caucho.com
Emil Ong
Caucho Technology Inc.
P.O. Box 9001
La Jolla, CA 92038
USA
Email: emil@caucho.com
Intellectual Property and Copyright Statements
Copyright © The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an “AS IS” basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.
Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).