November 24, 2004
Basic idea
Does the encoding have to stay the same for every type? I don't think so. Why is XML verbose? I think it is because the XML text representation uses one unified representation for every type in the XML infoset. Put another way, the XML text representation suits an XML infoset without an XML Schema, but not an XML infoset with an XML Schema.
So within a single message, different parts can use different encoding rules, chosen by the type of each part, for the sake of coding performance. This is not quite the idea of binary XML, which also uses unified encoding rules, only in a binary format; it is the basic idea behind BOXSA and BXSA.
I am thinking of using tree grammars to give a taxonomy of the relationship between type and coding. If you are not familiar with tree grammars, ``Taxonomy of XML Schema Languages using Formal Language Theory'' is recommended.
Information in a message
Taxonomy of Message Encoding
OpenMetaData(PBIO) is an example of this policy.
XBS and many other binary data representations will use
Taxonomy of Message Type for the high performance encoding
In the tree grammar, the production rule uses only , in its regular expression, such as Doc -> doc(Para1, Para2)
In this case, the NAME information for every child element is redundant if the size of the representation of each element is known and SIZE information is associated with every child element.
If the size of every child element is fixed/static, then the SIZE information is also redundant and only DATA is needed (i.e. non-self-descriptive).
The XML-Schema is <sequence> <a> <b> </sequence>
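A minimal sketch of the non-self-descriptive case, assuming Para1 is a 32-bit integer and Para2 is a 64-bit double (these types, the little-endian layout, and the function names are illustrative assumptions, not part of any of the formats named above). Because both sides know the sequence and the fixed sizes from the schema, only DATA is written:

```python
import struct

# Hypothetical layout for Doc -> doc(Para1, Para2): both children are
# fixed-size, so sender and receiver agree on positions in advance.
# "<id" = little-endian, 32-bit int (Para1) followed by 64-bit double (Para2).
DOC_LAYOUT = struct.Struct("<id")


def encode_doc(para1, para2):
    """Write only DATA; no NAME or SIZE fields are emitted."""
    return DOC_LAYOUT.pack(para1, para2)


def decode_doc(buf):
    """Recover the children purely from their position in the buffer."""
    return DOC_LAYOUT.unpack(buf)


encoded = encode_doc(7, 2.5)
decoded = decode_doc(encoded)
```

The encoded message here is exactly the 12 bytes of payload, which is the saving over a text form that repeats the element names.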
In the tree grammar, the production rule uses only , * or ? in its regular expression, such as Doc -> doc(Para1, Para2*)
In this case, the NAME information of Para2 is not totally redundant, since we need to detect the last element or whether the list is empty. However, if the SIZE of Para2* can be provided in the message, then the NAME information of Para2 can be omitted (i.e. semi-self-descriptive). An example is a dynamic array of doubles.
The XML-Schema is <element name = 'a' maxOccurs = '100'>
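A minimal sketch of the semi-self-descriptive case, assuming Para1 is a 32-bit integer and Para2* is a dynamic array of doubles (the field types, the 32-bit count prefix, and the function names are illustrative assumptions). The count plays the role of the SIZE information, so no per-element NAME is needed to find the end of the array:

```python
import struct

# Hypothetical header: Para1 (32-bit int) plus the element count of Para2*.
HEADER = struct.Struct("<iI")


def encode_doc(para1, para2_items):
    """Write Para1, then a count (SIZE), then the raw doubles (DATA)."""
    header = HEADER.pack(para1, len(para2_items))
    body = struct.pack("<%dd" % len(para2_items), *para2_items)
    return header + body


def decode_doc(buf):
    """Read the count first, then exactly that many doubles."""
    para1, count = HEADER.unpack_from(buf, 0)
    items = struct.unpack_from("<%dd" % count, buf, HEADER.size)
    return para1, list(items)


encoded = encode_doc(7, [1.0, 2.0, 3.0])
decoded = decode_doc(encoded)
```

The overhead is one count field per repeated group rather than one name per element.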
In the tree grammar, the production rule has | in its regular expression, such as Doc -> doc(Para1 | Para2)
In this case, the NAME information is necessary, since we need to tell which element actually appears in the message. For this message type we need NAME and DATA (SIZE is optional because of the NAME), i.e. a self-descriptive representation.
The XML-Schema is <choice> <a> <b> </choice>
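A minimal sketch of the self-descriptive case, assuming a one-byte tag stands in for the element NAME and that Para1 carries a 32-bit integer while Para2 carries a 64-bit double (the tag values, payload types, and function names are illustrative assumptions). The tag tells the decoder which alternative is present, and the payload size follows from it, which is why SIZE can be left out:

```python
import struct

# Hypothetical tags standing in for the element NAMEs of the choice.
TAG_PARA1 = 1  # payload: 32-bit int
TAG_PARA2 = 2  # payload: 64-bit double

# The payload layout is looked up from the tag, so no SIZE field is written.
PAYLOAD = {
    TAG_PARA1: struct.Struct("<i"),
    TAG_PARA2: struct.Struct("<d"),
}


def encode_choice(tag, value):
    """Write NAME (as a one-byte tag) followed by DATA."""
    return struct.pack("<B", tag) + PAYLOAD[tag].pack(value)


def decode_choice(buf):
    """Read the tag, then decode the payload that the tag implies."""
    tag = buf[0]
    (value,) = PAYLOAD[tag].unpack_from(buf, 1)
    return tag, value


as_para1 = encode_choice(TAG_PARA1, 7)
as_para2 = encode_choice(TAG_PARA2, 2.5)
```

A variable-size alternative would need an explicit SIZE after the tag; this sketch covers only the fixed-size alternatives.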