Monday, 11 August 2014

Generalized variable-length integers

A more general scheme uses a mixed-radix notation (here written little-endian), in which a_0 a_1 a_2 denotes a_0 + a_1 b_1 + a_2 b_1 b_2, and so on.

This is used in Punycode, one aspect of which is representing a sequence of non-negative integers of arbitrary size as a sequence of "digits" without delimiters. The digits come from a set of 36: a–z and 0–9, representing 0–25 and 26–35 respectively. A digit lower than a threshold value marks it as the most-significant digit, and hence the end of the number. The threshold value depends on the position within the number. For example, if the threshold value for the first digit is b (i.e. 1), then a (i.e. 0) marks the end of the number (it has just one digit), so in numbers of more than one digit the first digit ranges only over b–9 (1–35), and therefore the weight b_1 is 35 instead of 36. Suppose the threshold values for the second and third digits are c (2). Then in numbers of more than two digits the second digit ranges over c–9 (2–35), so b_2 = 34, the third digit has weight b_1 b_2 = 35 × 34 = 1190, and we get the following sequence:

a (0), ba (1), ca (2), .., 9a (35), bb (36), cb (37), .., 9b (70), bca (71), .., 99a (1260), bcb (1261), etc.
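The rules in the example can be checked with a short encoder and decoder. This is a minimal sketch, assuming the example's fixed thresholds (b for the first digit, c for every later one); `threshold`, `decode` and `encode` are hypothetical names. Real Punycode (RFC 3492) does not use a fixed schedule: it derives each threshold from an adaptive bias, clamped to the range 1–26.

```python
# Digit alphabet from the text: a-z stand for 0-25, 0-9 for 26-35.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
BASE = len(ALPHABET)  # 36


def threshold(position: int) -> int:
    """Hypothetical threshold schedule taken from the example:
    b (1) for the first digit, c (2) for every later digit."""
    return 1 if position == 0 else 2


def decode(s: str) -> int:
    """Decode one little-endian generalized variable-length integer."""
    value, weight = 0, 1
    for pos, ch in enumerate(s):
        d = ALPHABET.index(ch)
        value += d * weight
        t = threshold(pos)
        if d < t:
            # A digit below the threshold is the most-significant one.
            if pos != len(s) - 1:
                raise ValueError("digits follow the terminating digit")
            return value
        # Only BASE - t values can continue the number, so the weight is
        # multiplied by BASE - t (35 after the first digit, 34 afterwards).
        weight *= BASE - t
    raise ValueError("number is not terminated")


def encode(n: int) -> str:
    """Inverse of decode, emitting the least-significant digit first."""
    out, pos = [], 0
    while True:
        t = threshold(pos)
        if n < t:
            out.append(ALPHABET[n])  # terminating digit
            return "".join(out)
        # Non-terminating digits lie in t..BASE-1.
        out.append(ALPHABET[t + (n - t) % (BASE - t)])
        n = (n - t) // (BASE - t)
        pos += 1


print([encode(n) for n in (0, 1, 2, 35, 36, 37, 70, 71, 1260, 1261)])
# ['a', 'ba', 'ca', '9a', 'bb', 'cb', '9b', 'bca', '99a', 'bcb']
```

The encoder loop has the same shape as the one in RFC 3492: emit t + (n − t) mod (36 − t), then continue with (n − t) div (36 − t) until the remainder fits below the threshold.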

Unlike in an ordinary positional system with a fixed base, there are numbers like 9b in which 9 and b each represent 35 (9 through its digit value, b through its weight of 35); yet the representation is unique, because ac and aca are not allowed: the a would terminate the number.
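Because every number ends at its first below-threshold digit, a concatenation of numbers can be split without any delimiters. The sketch below assumes the same hypothetical thresholds as the example (b first, then c); `decode_stream` is an illustrative name. It also shows why ac and aca never denote a single number: "aca" is read as a (0) followed by ca (2), and "ac" is rejected because c starts a second number that never terminates.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
BASE = 36


def threshold(position: int) -> int:
    # Hypothetical schedule from the example: b (1) first, then c (2).
    return 1 if position == 0 else 2


def decode_stream(s: str) -> list[int]:
    """Split a delimiter-free digit string into the integers it encodes."""
    numbers = []
    value, weight, pos = 0, 1, 0
    for ch in s:
        d = ALPHABET.index(ch)
        value += d * weight
        t = threshold(pos)
        if d < t:
            # Terminating digit: the number is complete, start the next one.
            numbers.append(value)
            value, weight, pos = 0, 1, 0
        else:
            weight *= BASE - t
            pos += 1
    if pos != 0:
        raise ValueError("input ends in the middle of a number")
    return numbers


print(decode_stream("9bbcaabcb"))  # [70, 71, 0, 1261]
print(decode_stream("aca"))        # [0, 2]
```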

The flexibility in choosing threshold values allows optimization depending on the frequency of occurrence of numbers of various sizes.
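To make the trade-off concrete, the sketch below counts the digits needed under two hypothetical constant schedules: threshold 26 everywhere, so that 0–25 fit in a single digit, and threshold 1 everywhere, which gives each later digit more weight. `encoded_length` is an illustrative name. Punycode itself tunes its thresholds through an adaptive bias (RFC 3492) rather than using either fixed schedule.

```python
BASE = 36


def encoded_length(n: int, thresholds: tuple[int, ...]) -> int:
    """Digits needed to encode n; thresholds[i] applies to digit i and the
    last entry is reused for all later digits."""
    length = 0
    while True:
        t = thresholds[min(length, len(thresholds) - 1)]
        length += 1
        if n < t:
            return length  # fits in a terminating digit
        n = (n - t) // (BASE - t)  # remainder carried to the next digit


# Hypothetical schedules: a high threshold favours small numbers,
# a low threshold favours large ones.
for n in (0, 20, 1000, 10**6):
    print(n, encoded_length(n, (26,)), encoded_length(n, (1,)))
```

With these schedules, 20 needs one digit under threshold 26 but two under threshold 1, while 10**6 needs six digits under threshold 26 but only five under threshold 1.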

The case with all threshold values equal to 1 corresponds to bijective numeration (here bijective base 35), in which the zeros act as separators between numbers written with non-zero digits.
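A quick check of this correspondence, as a sketch: with every threshold set to b (1), every non-terminating digit lies in b–9 (1–35), each digit position has weight 35^k, and only a (0) can end a number. `encode_all_ones` and `bijective_base35` are hypothetical helper names. The loop asserts, for the first 2000 integers, that the digits before the terminating a are exactly the little-endian bijective base-35 digits.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
BASE = 36


def encode_all_ones(n: int) -> str:
    """Encode n with every threshold equal to 1 (b)."""
    out = []
    while n >= 1:
        # Non-terminating digits are 1..35 (b..9); each carries weight 35**k.
        out.append(ALPHABET[1 + (n - 1) % (BASE - 1)])
        n = (n - 1) // (BASE - 1)
    out.append(ALPHABET[0])  # only a (0) is below 1, so a always terminates
    return "".join(out)


def bijective_base35(n: int) -> list[int]:
    """Bijective base-35 digits of n (values 1..35), least significant first."""
    digits = []
    while n > 0:
        d = n % 35 or 35  # a remainder of 0 becomes the digit 35
        digits.append(d)
        n = (n - d) // 35
    return digits


# Dropping the terminating a leaves exactly the bijective base-35 digits.
for n in range(2000):
    body = encode_all_ones(n)[:-1]
    assert body == "".join(ALPHABET[d] for d in bijective_base35(n))

print([encode_all_ones(n) for n in (0, 1, 35, 36, 1260)])
# ['a', 'ba', '9a', 'bba', '99a']
```

Concatenating such encodings gives runs of non-zero digits separated by a, which is the separator reading described above.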
