<!-- ========================================================================
AnonymousData DTD:

Author: John Tigue <jtigue@datachannel.com>
Last edited: 1998.4.5
Copyright (c) 1998 DataChannel Inc.
http://www.datachannel.com
-->

<!-- ========================================================================
History:

Release 1998.3.19 (John Tigue)
    first relatively clean pass
Release 1998.3.21 (JFT)
    more clean up
Release 1998.3.25 (JFT)
Release 1998.3.26 (JFT)
    changed "size" to "length" to more closely reflect prevalent
    array terminology
Release 1998.3.31 (JFT)
    1998.3.30: comment clean up
    1998.3.31: satisfied XML processor
Release NEXT
    1998.4.5: added primitive data type entities
-->

<!-- ========================================================================
Introductory Note:

This work might best be thought of as standardized HTTP tunnelling.

These declarations allow for the expression of sequences of typed data
in XML 1.0 documents using the data-types defined in XML-Data:
    http://www.w3.org/TR/1998/NOTE-XML-data-0105

Although XML-Data's data types are used, its schema facilities are not
yet used. XML 1.0 DTD facilities are used as much as possible. The only
thing not achievable in vanilla XML 1.0 is data typing.

(Note that some firewall administrators do not like the concept of HTTP
tunnelling. There is an effort to design a new HTTP method. Like GET,
POST, PUT et alia, this new method would be something like RPC. Products
would still be needed to detect HTTP tunnelling on POSTs even if there
were a new RPC method.)

For maximum flexibility, these definitions allow any sequence of
datatypes to be expressed. More specific DTDs would want to define more
highly constrained data structures. Indeed, this DTD has only primitive
data-types and arrays of such, no complex data-types. Such things would
be best expressed with the help of other features from XML-Data.

This work is a simple subset of XML-Data.
It is essentially how to mark up a datastream as an XML document. This
is really a legacy code issue. New code would not be written to this DTD
but rather would use a more structured schema. There is a lot of legacy
code which needs to get on the Web and this is a handy tool.

For example, CORBA/IIOP and DCOM do not "name" data as it goes across
the wire. Rather they simply "know" where data element boundaries are
supposed to occur in the byte stream. XML is character based and
structurally self-describing so the boundaries of the data elements can
simply be represented as XML open and close tags. This is useful for
variably sized data.

This design also allows for quick adaptation of existing systems (e.g.
CORBA and DCOM) to this model. Also, a simple XML 1.0 processor can be
used, with a trivial amount of application code layered on top of it to
do the data typing. Of course, integrating the processor and data typers
is more efficient but this makes it easier to reproduce with current
technology.

This DTD has been designed to handle many common datatypes which are
expressed over networks. The first 2 columns of the following data-type
schema mapping are taken from the Microsoft site with subURL:
    /java/sdk/20/jnative/type_mappings_between_java_and_com.htm
The rest are added for clarity.

Ole Auto Type   Java Type          WebBroker Type   XML-Data Type
=============   ================   ==============   =================
boolean         boolean            boolean          boolean
char            char               char             char
double          double             double           float.IEEE.754.64
int             int                int              int
long            long               long             i8 (alias = long)
float           float              float            float.IEEE.754.32
long            int                int              int
short           short              short            i2
unsigned char   byte               byte             bin.hex
BSTR            java.lang.String   string           string
                java.net.URL       URI              URI

(Note: the following Ole Automation types have not yet been implemented:
    CY
    DATE
    VARIANT
    SAFEARRAY
The following Ole Automation types will not be expressed. Rather they
will be expressed as COM+ object references not DCOM structures.
DCOM structures can be expressed in a DTD on a higher level than this
low-level primitive datatyping DTD:
    IDispatch
    IUnknown
)

These definitions allow for the above listed types and arrays of them.

It could be argued that the XML-Data attribute "dt:dt" on arrays should
be of the form "primitiveArray" rather than just "primitive" e.g.:
    <byteArray dt:dt="byteArray" length="2" >
OR
    <byteArray dt:dt="byte" length="2" >
That decision was not made.

Some types may actually be null. Null is indicated by an empty element
of the correct type e.g.:
    <string />

Arrays appear as follows:
    <intArray length="2"><int>432908</int><int>0</int></intArray>
or
    <intArray />
or
    <intArray></intArray>
The first is a normal int array, the second is null occurring where an
"intArray" should be, the third is an intArray of length zero. Note the
dt:dt attribute is not explicitly included because it has a default
value declared in the DTD.

Note that an argument could be made for having an ID or ENTREF attribute
which could be used to signify &null; but that choice was not made.

There are special cases:

boolean: is always an empty element which always has a non-defaulted
    "value" attribute. "boolean"s are never null after interpretation.
    They may appear in the form:
        <boolean value="true" />
    or
        <boolean value="true"></boolean>

char: can never be null, so an empty element means that it should be
    interpreted as 0.

numerics: whole numbers or floating-point numbers can never be null. An
    empty element indicates that it should be interpreted as 0 or 0.0
    as appropriate.

The length attribute is currently required on all array element types.
As an aside, this boils down to the same issue as chunked streams and
the HTTP Content-Length header.
Declaring the length is nice (easier parser memory allocation on the
read side because the length is known at the start) but mandatorily
having to calculate it can be expensive in terms of memory for small
machines; the entire array needs to be held in memory to determine the
value which needs to be assigned to the length attribute in the open
tag. (Perhaps this could be made optional but strongly recommended.)
Having an explicit length attribute for strings may seem unnatural to
SGML experts but it helps (stupid) data marshallers and reduces the
amount of code which needs to be written in order to Web enable legacy
code.

These declarations can be referred to as an external entity using
either a public ID:
    <!DOCTYPE data PUBLIC "-//DataChannel//DTD AnonymousData V1.0//EN"
        "http://xml.datachannel.com/system/dtd/AnonymousData.dtd" >
or using a system id such as:
    <!DOCTYPE data SYSTEM
        "http://xml.datachannel.com/system/dtd/AnonymousData.dtd" >

Note that for network and parse efficiency, all the following element
and attribute names can be mapped to single character name tokens. This
is not done here for the sake of human readability. The terse analog of
this DTD is available at:
    http://xml.datachannel.com/system/dtd/TerseAnonymousData.dtd

(Note: could have defined datatypes as notations a la WebSGML TC
datatypes (N1958(?) at: http://www.ornl.gov/sgml/WG4/ ) or data
attributes from full SGML)

Thanks to Eve Maler and W. Eliot Kimber for their help.
-->

<!-- ========================================================================
The XML-Data namespace declaration follows. XML 1.0 processors simply
pass this processing instruction on to the application and do not
understand the implications. XML-Data processors use it to determine the
datatype of elements.

Attribute definitions of the form "dt:dt" will not be recognized as
namespace prefixed by XML 1.0 processors because the ":" character is
just another name character in XML 1.0.
This little trick will be dropped in later versions of the DTD but for
now others can implement WebBrokers with an XML 1.0 processor.
-->
<?namespace name="urn:uuid:C2F41010-65B3-11d1-A29F-00AA00C14882/" as="dt"?>

<!-- ========================================================================
The next bit simply brings in the primitive datatype entities for use in
data typing attributes. This way data typing can be syntactically
expressed with either an XML 1.0 processor or later with an XML-Data
processor.
-->
<!ENTITY % primitiveDataTypeEntities SYSTEM "PrimitiveDataTypes.dtd" >
%primitiveDataTypeEntities;

<!-- ========================================================================
'loquaciousPrimitiveDataTypes' is an entity used simply for shorthand
convenience. It is referenced in other DTDs which depend on this one.
-->
<!ENTITY % loquaciousPrimitiveDataTypes
    "boolean | booleanArray |
     char    | charArray    |
     double  | doubleArray  |
     int     | intArray     |
     long    | longArray    |
     float   | floatArray   |
     short   | shortArray   |
     byte    | byteArray    |
     string  | stringArray" >

<!-- ========================================================================
In a simple AnonymousData document (as opposed to in combination with
the ObjectMethodMessages declarations), "data" is the intended root
element. It can contain any sequence of the datatyped elements.
-->
<!ELEMENT data ( ( %loquaciousPrimitiveDataTypes; )* ) >

<!-- ========================================================================
A boolean can be true or false. e.g.
    <boolean value="true" dt:dt="boolean"/>
or
    <boolean value="false" />
-->
<!ELEMENT boolean EMPTY>
<!ATTLIST boolean
    value    ( true | false )  #REQUIRED
    dt:dt    CDATA             #FIXED "boolean"
    dataType ENTITY            #FIXED "boolean" >

<!--
A booleanArray is an array of booleans. This element is given the
XML-Data data type of boolean to help the processor. It could be argued
that the datatype is actually "array" or "booleanArray" but that
decision was not taken.
The same holds for all arrays in these declarations. E.g.:
    <booleanArray length="2" dt:dt="boolean">
        <boolean value="false"/>
        <boolean value="true" />
    </booleanArray>
-->
<!ELEMENT booleanArray ( boolean* ) >
<!ATTLIST booleanArray
    length   CDATA   #REQUIRED
    dt:dt    CDATA   #FIXED "boolean"
    dataType ENTITY  #FIXED "boolean" >

<!-- ========================================================================
A char is any single Unicode character. Perhaps ISO/IEC 10646 needs to
be considered for characters which are not in Unicode.
-->
<!ELEMENT char (#PCDATA)>
<!ATTLIST char
    dt:dt    CDATA   #FIXED "char"
    dataType ENTITY  #FIXED "char" >

<!--
For a charArray, each character in the element content is an element in
the array. The "length" attribute reflects the length of the array after
XML entity processing, of course.
-->
<!ELEMENT charArray (#PCDATA)>
<!ATTLIST charArray
    length   CDATA   #REQUIRED
    dt:dt    CDATA   #FIXED "char"
    dataType ENTITY  #FIXED "char" >

<!-- ========================================================================
String is pretty natural in XML but needs to be explicitly declared
here.
-->
<!ELEMENT string (#PCDATA)>
<!ATTLIST string
    dt:dt    CDATA   #FIXED "string"
    dataType ENTITY  #FIXED "string" >

<!ELEMENT stringArray (string*)>
<!ATTLIST stringArray
    length   CDATA   #REQUIRED
    dt:dt    CDATA   #FIXED "string"
    dataType ENTITY  #FIXED "string" >

<!-- ========================================================================
Currently each URI must have at least a protocol. Something like
relative URIs could be done by putting a BASE attribute on a URIArray?
-->
<!ELEMENT URI (#PCDATA)>
<!ATTLIST URI
    dt:dt    CDATA   #FIXED "URI"
    dataType ENTITY  #FIXED "URI" >

<!ELEMENT URIArray (URI*)>
<!ATTLIST URIArray
    length   CDATA   #REQUIRED
    dt:dt    CDATA   #FIXED "URI"
    dataType ENTITY  #FIXED "URI" >

<!-- ========================================================================
"byte"s represent 8 bits of information. The interpreted values are
signed and range from -128 to 127.
"byte"s are encoded in hexadecimal, i.e. 0-9A-F, and will always have 2
characters per byte and start with "0x" e.g.
    <byte>0xB5</byte>
-->
<!ELEMENT byte (#PCDATA)>
<!ATTLIST byte
    dt:dt    CDATA   #FIXED "bin.hex"
    dataType ENTITY  #FIXED "byte" >

<!--
Byte arrays are encoded in hexadecimal (but could be in base64 for
efficiency). Note this is not defined as
    <!ELEMENT byteArray (byte*)>
because that would be too inefficient. An example:
    <byteArray length="3" dt:dt="bin.hex">0x9E00F8</byteArray>
-->
<!ELEMENT byteArray (#PCDATA)>
<!ATTLIST byteArray
    length   CDATA   #REQUIRED
    dt:dt    CDATA   #FIXED "bin.hex"
    dataType ENTITY  #FIXED "byte" >

<!-- ========================================================================
"short" represents 16 bits of information about a whole number. shorts
are encoded in decimal. Hexadecimal and base64 were the other options.
shorts range in value from -32768 to 32767 (see XML-Data spec)
-->
<!ELEMENT short (#PCDATA)>
<!ATTLIST short
    dt:dt    CDATA   #FIXED "i2"
    dataType ENTITY  #FIXED "short" >

<!ELEMENT shortArray (short*)>
<!ATTLIST shortArray
    length   CDATA   #REQUIRED
    dt:dt    CDATA   #FIXED "i2"
    dataType ENTITY  #FIXED "short" >

<!-- ========================================================================
"int" represents 32 bits of information about a whole number. ints are
encoded in decimal. Hexadecimal and base64 were the other options.
ints range in value from -2147483648 to 2147483647 (see XML-Data spec)
-->
<!ELEMENT int (#PCDATA)>
<!ATTLIST int
    dt:dt    CDATA   #FIXED "int"
    dataType ENTITY  #FIXED "int" >

<!ELEMENT intArray (int*)>
<!ATTLIST intArray
    length   CDATA   #REQUIRED
    dt:dt    CDATA   #FIXED "int"
    dataType ENTITY  #FIXED "int" >

<!-- ========================================================================
"long" represents 64 bits of information about a whole number. longs are
encoded in decimal. Hexadecimal and base64 were the other options.
longs range in value from -9223372036854775808 to 9223372036854775807
(see XML-Data spec)
-->
<!ELEMENT long (#PCDATA)>
<!ATTLIST long
    dt:dt    CDATA   #FIXED "i8"
    dataType ENTITY  #FIXED "long" >

<!ELEMENT longArray (long*)>
<!ATTLIST longArray
    length   CDATA   #REQUIRED
    dt:dt    CDATA   #FIXED "i8"
    dataType ENTITY  #FIXED "long" >

<!-- ========================================================================
"float"s are IEEE 754 32-bit floating-point numbers (see XML-Data spec)
-->
<!ELEMENT float (#PCDATA)>
<!ATTLIST float
    dt:dt    CDATA   #FIXED "float.IEEE.754.32"
    dataType ENTITY  #FIXED "float" >

<!ELEMENT floatArray (float*)>
<!ATTLIST floatArray
    length   CDATA   #REQUIRED
    dt:dt    CDATA   #FIXED "float.IEEE.754.32"
    dataType ENTITY  #FIXED "float" >

<!-- ========================================================================
"double"s are IEEE 754 64-bit floating-point numbers (see XML-Data spec)
-->
<!ELEMENT double (#PCDATA)>
<!ATTLIST double
    dt:dt    CDATA   #FIXED "float.IEEE.754.64"
    dataType ENTITY  #FIXED "double" >

<!ELEMENT doubleArray (double*)>
<!ATTLIST doubleArray
    length   CDATA   #REQUIRED
    dt:dt    CDATA   #FIXED "float.IEEE.754.64"
    dataType ENTITY  #FIXED "double" >
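
<!-- ========================================================================
Illustrative sketch only, not part of the released declarations: how a
complete AnonymousData document might look when validated against this
DTD. It assumes that the system id quoted in the Introductory Note
resolves to these declarations and that PrimitiveDataTypes.dtd declares
the entities named by the fixed "dataType" attributes. All values are
made up for illustration.

    <?xml version="1.0"?>
    <!DOCTYPE data SYSTEM
        "http://xml.datachannel.com/system/dtd/AnonymousData.dtd" >
    <data>
        <boolean value="true" />
        <int>432908</int>
        <string>hello</string>
        <string />
        <byteArray length="3">0x9E00F8</byteArray>
        <intArray length="2"><int>432908</int><int>0</int></intArray>
        <stringArray length="0"></stringArray>
    </data>

Read in order, this is a boolean, an int, a string, a null string, a
three-byte array, a two-element int array and a zero-length string
array. The dt:dt and dataType attributes are left out because they are
declared #FIXED and are therefore supplied by the DTD. Every array in the
sketch carries "length" because that attribute is declared #REQUIRED.
-->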