ArtsAutosBooksBusinessEducationEntertainmentFamilyFashionFoodGamesGenderHealthHolidaysHomeHubPagesPersonal FinancePetsPoliticsReligionSportsTechnologyTravel

How to Represent Integer Values of Arbitrary Size in QBasic?

Updated on June 20, 2012

A non-zero rational number can always be represented by:

numerator / denominator

where both numerator as well as denominator are whole numbers. What this means is that a rational number can be represented as:

sign * p(1)n(1) * p(2)n(2) * ...

* p(k-1)n(k-1) * p(k)n(k)

where p(i) > 1 is a prime and

n(i) ≠ 0 is whole number

(i=1,2,...,k-1,k; k>0).

And furthermore this representation of a rational number is unique if primes p(i) are sorted according to size.

When you are developing a software that would let you do floating point operations without loss of precision only limitation on size comes from one of these prime factors p(i) or their exponents n(i) becoming larger than maximum whole number that can be stored natively. This brings us to the problem: How to Represent Integer Values of Unlimited Size?

Inputting Large Number

If we get our number from a file as a number which is too large then as soon as an input statement is executed we will have overflow error. So it is not enough to make storage provision for large numbers but also input and output will have to be done creatively to work around this problem. One solution is to read large numbers into a string. So for example when we know we are going to input a number from a file and file has "coded" this numeric value as (say) 5523707870487398 then we have to suspend normal input process and switch over to a method which will process the file in character scan mode which must now read file one character at a time until entire number has been scanned into a string. Alternatively, if we have freedom to design the input file coding too, then we can code the input numbers too as strings. That is, above number would be presented as "5523707870487398" and not as a number. Either way what we have is a string that holds the number in human readable format as we got it from input file.

(Note: At this writing I do not know whether it is possible to use verb repertoir available in QBasic to build a software which will suspend normal input process and switch over to a method which will process the file in character scan mode. When QBasic excutes keyboard input verb "INPUT" which expects a numeric value and the number input is too large then QBasic responds with an error prompt:

Overflow
Redo from start

But in case of file input verb: "INPUT #" there are no equivalent tools or verbs available in QBasic which a software can use to recover from error and examine the offending input again using different syntax analyzer. If input design is outside the jurisdiction of a programmer then there is no way to solve this problem but to learn some compiler writing techniques and write a software that will perform tasks similar to "INPUT" verb but which gives more options to enter large numbers.)

Simplest Representation System for Integer of Unlimited Size

Absolute first idea that comes to our mind for representing large integer values is to do nothing! We got the number into our software as string and we keep it as a string! We do arithmetic on it by manipulating string. No fuss no muss. When one of our numbers need to be displayed or printed to a file we just print the string. No conversion from binary number system required! Simple!

String System

Another way would be to use string to hold a binary value. Just as we use MKI$ to produce string image of 16-bit 2's complement binary number, in the same way in string system we can store a 2's complement binary number of arbitrary size. Unlike the result of MKI$ and MKL$ which will produce strings or 2 and 4 byte long respectively, our string system will have as many bytes as we need. Maximum positive number that can be stored in a 16-bit signed number in 2's complement format is +32767. This can be stored in QBasic type INTEGER.

When we store this as 2 byte string it looks like this: 0111 1111 1111 1111b (the subscript b is used to indicate radix 2, "b" for binary). This can be written as hexadecimal digits as: 7FFFh (the subscript h is used to indicate radix 16, "h" for hexadecimal). The value of this maximum in decimal number system is: +32767d. Here subscript d is used to indicate radix 10, "d" for decimal.

Byte Order

Please note that the QBasic function MKI$ would produce string in which these bytes are in the increasing order of significance. But we are writing these bytes in reverse order because human readers of this article are more accustomed to reading numbers in Hindu-Arabic numeral system. Hindu mathematicians of antiquity may have written numbers in increasing order of significance from left to right, but the way this numeral system was transmitted to Europe by Arabian writers has come to be right to left in increasing order of significance. And this is how most English reading people are more accustomed with. Therefore, for the purpose of writing in this article we write numbers in increasing order of significance from right to left. In fact this is the reason why it would be more precise to term this digit order as Arabic numeral system and we can reserve the term "Hindu numeral system" for the positional notation devised by Hindu mathematicians of antiquity.

Byte Order in Intel x86 Based Computers

Just be aware that in computers based on Intel x86 processor chips, the numbers are stored in increasing order of significance from low address memory to high address memory. Also please be aware there are computer systems in which this order is reversed. So this order is not sacrosanct. It is up to a computer hardware designer to build computer hardware in any way that suits his whim or requirement.

We are writing bytes in increasing order of significance going from right to left because human readers would find it easier to read. However, we are free to design our string system in any way we like. We need not adhere to the same byte order when we design this string system. We are free to use reverse order. When we design something we are the boss. We can design in whichever way we like. As long as we are clear in our communications it is fine. We can write numbers in written communications right-to-left. And actual order designed into our system may be different. As is the case with QBasic types INTEGER and LONG.

Throughout this article we will show numbers in right to left byte order but assume that inside the string the order is left to right just as it is in Intel x86 and QBasic's native integer types. This design decision will save us some computational labor because should the need arise to "load" a substring of 4 bytes into a QBasic type LONG variable we will not have to reverse the order.

Sign Extension

For +32767d if we use the same byte order in our string system as output by MKI$ then both our string system and the output of MKI$ will have same bit pattern. Both are 2-byte long and same bytes in same order by design. But both will react differently when we add +1d to this integer. Remember that +32767d is the maximum that can be stored in QBasic type INTEGER, therefore QBasic add operation will fail and cause overflow error. But not so in our string system. In our proposed string system we will prefix one or more bytes if space required to store the result of arithmetic has increased. Please recall that +32767d was 7FFFh in both QBasic type INTEGER and in our string system by design. This number can be thought as 007FFFh. This is like writing +32767d as +032767d. In fact we can prefix as many zeroes as we like. That is, +32767d = +032767d = +0032767d. It makes no difference to the value of our number. Value remains +32767d. In the same way in binary number system it makes no difference to the value of a non-negative integer if we prefix one or two or ten or twenty bytes which all are: 00h. For negative integer in 2's complement binary system it makes no difference to the value of integer if we prefix one or more bytes all of which are: FFh.

For example, -32768d is minimum number that QBasic type INTEGER can store. MKI$ will output this as: 8000h. This is a negative number in 2's complement system that QBasic uses. However, if you assign this value to a QBasic type LONG variable and then use MKL$ to produce the string output to see how a string storage system would have stored this negative number as 4-byte binary number, what you would find is this: FFFF8000h.

What our string system will produce when +1d is added to a value of +32767d is the following bit pattern: 008000h. This is the 3-byte 2's complement representation of number +32768d. Our original number +32767d was stored in the string system as a 2-byte string to be interpreted as a 2-byte 2's complement integer. So why we were forced to prefix our number by 00h? Reason for this is that if we had removed the leading byte 00h then we would be left with 8000h. And 80h = 10000000b (these are 8 bits that make up the byte 80h and they start with bit 1b). Which means that the sign bit is 1b and therefore the number will be interpreted as a negative number in 2's complement system. (Note. In fact QBasic type INTEGER value for bit pattern 8000h is -32768d.) But result of addition of +1d to +32767d cannot be negative. Therefore we must prefix our resulting bit pattern with 00h.

Alternative Method

Because of problem highlighted above an alternative algorithm is preferred. In this alternative method before we perform the requested arithmetic we prefix the number with 00h. Doing that we are not changing the value of number because the number is non-negative. If number had been negative then we would have prefixed FFh. After this prefixing has been done we perform the requested arithmetic.

Now after arithmetic operation we examine the 3 bytes and if high order byte is 00h and then ask overselves: is the leading byte 00h an unnecessary leading 00h byte? It could be if after removing 00h we are left with a 2-byte number which is non-negative. In our case this is not the case. Our result is supposed to be +32768d. This cannot be stored in 2-byte 2's complement number system. We need the 3rd byte. We call such prefixing by extra digits or bytes as "sign extension" or "sign padding" or simply "padding".

10's Complement System

Reader might ask what is corresponding situation similar to this in decimal world? In decimal world -32768d is same as -032768d, -0032768d, etc. In this case we are not using 10's complement system. Instead what we are using is "sign magnitude" system. In sign magnitude system sign is supplied separately and magnitude is an unsigned number. An unsigned number behaves like positive number and in this case appropriate digit to prefix is 0d (zero) in both negative numbers and non-negative numbers because in both cases sign is supplied separately and magnitude is to be interpreted as an unsigned number. So these examples of padding from sign magnitude system are not exact correspondent of padding we saw in previous paragraph.

The exact correspondent of 2's complement padding in decimal system is when we work with 10's complement system. When we work with complement systems the sign would be "encoded" in the high order digit. And then we would have situation similar to what we described in case of string system. It would be instructive to study this 10's complement system.

(Note: High order digit is the digit that would occupy the position of most significant digit in a system which stores only unsigned numbers. This is the digit which is coefficient y(n-1) of highest exponent of radix (b) in positional notation of a n-digit radix b number:

y = y0 * b0 + y1 * b1 + ... + y(n-2) * b(n-2) + y(n-1) * b(n-1)

When we encode sign in radix 10 system using 10's complement system we are restricting the values y(n-1) can take so that entire space of digit configurations is divided into two halves of same size; one for non-negative numbers and one for negative numbers.)

In 10's complement system number -32768d would be represented by 67232d. This number is arrived at by first calculating 9s' complement of 32768d = 67231d. To get 10's complement we add 1d to the 9s' complement: 67231d + 1d = 67232d. This is a 5-digit number in 10's complement system. Please note if we wish to make a 6-digit number from 67232d then we have to prefix it with 9d. This would be "sign extension" or "sign padding" in case of 10's complement system. In 10's complement decimal number system: 67232d is same as 967232d, 9967232d etc. This is similar to prefixing the byte FFh to a negative number in 2's complement system stored in the byte string in our string system.

In 5-digit 10's complement system we can represent numbers from -50000d through 49999d. In this system -00001d is represented by 99999d, -00002d by 99998d, and so on. A 5-digit 10's complement number can be turned into a 6-digit 10's complement number by prefixing 0d if the number is non-negative (that is, it is in the range: 00000d through 49999d). But for negative numbers (that is, those in the range 99999d through 50000d) we will have to prefix 9d. For example, in 10's complement system 50000d is same as: 950000d, 9950000d, etc.

Negative Numbers

What will happen if we have a 2 byte string representing -32768 in 2's complement and we subtract 1 (or add -1)? There will be an overflow error if such a value were stored in QBasic type INTEGER variable. In this case resulting value will be -32769 which cannot be stored in 2 bytes as 2's complement binary number. Absolute value of -32769 is +32769. In hex this is: 008001h (Remember that this is 2's complement format so we have to prefix 00h to prevent an incorrect interpretation of the number as a negative number). Now let us calculate negative of this number:

Example: Negation of hexadecimal of +32769

008001 (hexadecimal of +32769 in 2's complement)
FF7FFE (1s' complement)
    +1 (add 1 to get 2's complement)
------
FF7FFF (2's complement)


In this case we were forced to prefix 00h because our binary number without this prefixed byte will be incorrectly interpreted as negative number. So this is not a good example to demonstrate utility of our alternative method. So let us take another example:

Example: Negation of hexadecimal of -32768

Problem is to negate 8000 (hexadecimal of -32768)
First prefix FF to this negative number:
FF8000
007FFF (1s' complement)
    +1 (add 1 to get 2's complement)
------
008000 (2's complement, value in decimal: 32768)


If we had not prefixed FFh before proceeding with negation operation then how we would have performed the negation operation? Let us try without prefixing first:

Example: Without Prefixing First

Problem is to negate 8000
First note the sign of number: it is negative
8000 (it is negative)
7FFF (1s' complement)
  +1 (add 1 to get 2's complement)
----
8000 (2's complement)
But we expected a positive number
because it was negation of a negative
therefore, prefix 00 to this
and we get: 008000


It is easy to see that the alternative method of prefixing FFh before performing arithmetic avoids complicated reasoning. Avoiding complicated reasoning is important because we want to keep our algorithms and software simple.

Example: Subtract 1 From -32768

Subtract 01 from 8000
FF8000 (Prefix FF because number is negative)
    -1
------
FF7FFF (Leading FF not redundant)


So we started with two bytes: 8000h in 2's complement. Since we are subtracting a positive number +1h from a negative number -32768d we expect a negative number. But if we drop the "leading" FFh then what we get is: 7FFFh which is a positive number! So we cannot drop the leading FFh. Leading FFh is not redundant, it is necessary and it shows that the number can no longer be stored in 2 bytes. Here also it is easy to see that the alternative method of prefixing FFh before performing arithmetic avoids complicated reasoning. The alternative method starts with "sign extension". What we started with was 8000h which is same as FF8000h. We subtract +1h from this to get: FF7FFFh. In other words if an add or subtract operation is expected to give positive or negative result but existing byte storage shows wrong sign bit then we simply prefix 00h if expected result is supposed to be positive and FFh if expected result is supposed to be negative.

Example: Add -1 to -32768

Add negative of +1
 01 (hexadecimal of +1)
 0001 (prefix to equalize size with size of other operand)
 FFFE (1s' complement)
   +1 (Add 1 to get 2's complement)
 FFFF (This is 2's complement of +1)
+8000 (Add -32768)
-----
17FFF (Result is incorrect in sign and value
       even if "carry" is discarded
Same computation after bigger operand is prefixed with FF
 FFFFFF (2's complement of +1)
+FF8000 (Add -32768)
-------
1FF7FFF (Disacard "carry")


Again the alternative method of "Prefix First-Arithmetic Later" works beautifully.

-32769d in hex can be stored in 3 bytes as: FF7FFFh and that is the minimum number of bytes needed to store it. Alternatively it can be stored in 4 bytes as: FFFF7FFFh or in 5 bytes as: FFFFFF7FFFh etc.

In summary the alternative method amounts to sign extension as first step followed by the arithmetic operation with a possibility of having to discard carry. This means that we prefix a sign appropriate byte: 00h if number is non-negative and FFh if it is negative. And then perform the arithmetic. The method in which we perform arithmetic first and then prefix a byte if need be is slightly complicated. Reader should try to implement such a method in a QBasic program to understand complexity involved. It is easier to first prefix. This is because all we need to do is to determine the sign of number being prefixed. Once sign is known we prefix sign appropriate byte. In this way we are not trying to figure what the sign of the result is going to be.

The underlying mathematics of the alternative method of "prefix first" is this: when two numbers are added at most one more extra byte will be needed should the size of result increase from size of operands. Same holds good for negation and subtraction if we discard carry.

In Summary

For addition, subtraction and negation we examine high order byte to determine the sign bit. If sign bit is 0b and the byte is not 00h then prefix the number with 00h. If sign bit is 1b and the byte is not FFh then prefix it with FFh. In second step we perform the arithmetic operation requested. Discard the carry if one generated. We may leave this prefix byte in the value. That is, after operation is performed we may not remove redundant 00h or FFh unless if there are more then one redundant 00h or FFh bytes. If we adopt this strategy we might not have to prefix a byte before operation because the prefix byte might already be there. Minimizing string operation of removing redundant byte or prefixing 00h or FFh will save computational time.

Multiplication

Multiplication is best handled in conventional way. If one or both operands are negative they are first negated. If only one number is negative we negate the result. Details are discussed in a later section with examples.

Division

Division too is best handled in conventional way. If one or both operands are negative they are first negated. If only one number is negative we negate the result. Details are discussed in a later section with examples.

Other Integer Representation Systems

When we use a string storage to represent an integer we are not bound by 2's complement system. The choice of 2's complement is a design decision. We might as well use Sign Magnitude system or 1s' Complement system while using a string to store its value. There is a new extensible integer storage scheme that has arisen in connection with development of a universal character system called: Unicode.

Unicode

Unicode is a vast topic. We will reserve its detailed discussion for an article to be published later. Here we are just interested in describing how a system inspired by UTF-8 can be adapted to represent integers of unspecified size. Primary focus in Unicode is character representation for characters in human written languages. In Unicode an encoding model is proposed which treats characters in human written languages in an unified extensible manner. Entire model looks at symbols used in human written languages as existing at 4 levels:

  1. First level thinks of symbols as abstract entities in a set. There is one set of symbols for one language.
  2. Next level associates each element in a symbol set to a nonnegative integer.
  3. Third level associates each nonnegative integer of level 2 to sequences of particular code units of some specified width, such as 32-bit integers. Examples of coding units of specified width are: Byte, 16-bit unsigned or signed number, 32-bit unsigned or signed number, etc.
  4. Fourth level is a reversible transformation from a set of sequences of code units (from one or more schemes used in level 3 to a serialized sequence of bytes)

In this article we are aiming at methods for representing value of integers of arbitrary size in QBasic and therefore, we are interested in processing performed in level 3 of Unicode encoding scheme. Just like the processing in level 3 we wish to associate an integer to a sequence of bytes. And we want to do this in a way that will permit byte sequence encoding of all integers regardless of size and sign. UTF-8 is one of the proposed schemes to implement level 3 processing. In Unicode level 3 processing a mapping from the set of integers to the set of sequences of code units is computed. A code unit is an integer occupying a specified binary width in a computer architecture, such as an 8-bit byte and so on. Since we will be using bytes to illustrate our scheme of integer representation we will use byte as our basic code unit.

Key concept in this is the sequence of code units. Since we have agreed to use only bytes for our illustration we will concern with only sequence of bytes. There are Unicode schemes whose sequences are not all of the same length. That is, they are variable width sequences. One such scheme is UTF-8. We shall use UTF-8 as an example to illustrate how such schemes can be adapted for integer representation purpose. In UTF-8 the width of a sequence of bytes is as shown in following table. It shows how UTF-8 uses multiple bytes to represent non-negative integer values requiring up to 31 bits:

Bits  Last         Byte     Number
      Code           1      of Bytes
      Point                 After
                            Byte #1
____________________________________
 7    U+007F      0xxxxxxx   0
11    U+07FF      110xxxxx   1
16    U+FFFF      1110xxxx   2
21    U+1FFFFF    11110xxx   3
26    U+3FFFFFF   111110xx   4
31    U+7FFFFFFF  1111110x   5
____________________________________
All bytes except Byte #1 have
pattern like 10xxxxxx
where "x" stands for bits that
can be assigned any value.


Now let us demonstrate how a scheme inpired by this scheme can be used to represent integers of unspecified size. Please note that in UTF-8 the bytes other than first byte always has 10b as most significant 2 bits. Reason for this is to allow the UTF-8 scheme to be "self-synchronizing", meaning that it was not necessary to read from the beginning of the string to find code point boundaries. The idea was proposed by Ken Thompson of Bell Labs. You can read more on these topics and other details of UTF-8 scheme at the wikipedia article on UTF-8.

For the purpose of integer representation we do not need self-synchronization. But a scheme of using the most significant bits to specify number of bytes used can be used as shown in following table:

Byte 1    Number of bytes
0xxxxxxx  1
10xxxxxx  2
110xxxxx  3
and so on ...


Remaining bits (if any) of the byte where first 0b bit is seen starts 2'complement representation of an integer. This means that bit following first 0b is sign bit and it must be extended to fill most significant bits of byte. Also please note that the first 0b bit might be at end of a byte. In which case number starts in next byte. Also we can interprete "number of bytes" to mean number bytes starting from first byte with sign. For example, if first byte is: 11111110b then that means that the following byte starts an integer in 2's complement format. And the integer value it stores occupies 8 bytes starting from following byte. That means total 9 bytes are used. Consider another example in which first 2 bytes are: 11111111 0xxxxxxxb. This means the integer occupies 9 bytes and first byte in integer value is the second byte: 0xxxxxxxb. Total 10 bytes. In second byte since most significant bit is used to specify size we have room for 7 bits only. So total number of bits available would be 7 + 8 * 8 = 7 + 64 = 71 bits. This can represent a value from -270 through +270-1.

Alternatively, we do not use remaining bits in the byte where first 0b occurs. Rather we begin our integer value in next byte. This avoids complicated bit manipulation. In the two examples given above first one specifies 8 bytes storage by 11111110b. The second example specifies 9 bytes: 11111111 0xxxxxxxb. Actual value is stored in following 9 bytes. So it will be a 8 * 9 = 72 bit 2's complement integer. This alternative scheme also avoids complicated programming to figure whether the last byte of length specifying bytes is part of value or not because it is always excluded.

Sky is the Limit

Human ingenuity knows no bound. Sky is the limit. We can use string of values of QBasic type INTEGER as radix 10000 integer or as radix +32767 integers either as a complement system or as a sign magnitude system. We can use string of values of QBasic type LONG as radix 1000000000 integer or as radix +2147483647 integers. These too can be stored in byte strings because storing them in resizeable arrays may not be the best option. We can even store individual digits in QBasic type DOUBLE which permits storage of integers of larger size as compared to LONG. These are just few of systems that come to my mind off the top of head.

Examples

We illustrate how the number 5523707870487398d will be represented in the proposed string system. Also we show how a negative number such as -5523707870487398d will be stored in these system. Then we demonstrate operations like negation, addition, subtraction, multiplication and division. Some of illustrations will be developed partially leaving the development of a full featured software to an incremental exercise which I shall conduct in future. As my software takes shape I shall be reporting the progress here in hubpages.

Before we present the examples we first show a QBasic program UNLMTD.BAS. This program uses random number generator to generate a random string of decimal digits. If you run this program as given it will generate the number: 5523707870487398d and then a function CVU$ is called which returns the string representation of input number as an unsigned binary number. For 5523707870487398d it produces: 139FC87576A366h. For 552370787048d it produces: 809BDD52E8h. Please note as an unsigned number we need not worry about 809BDD52E8h because it is just an unsigned binary number and it not 2's complement number. However if the output was going to be interpreted as a 2's complement number then we must prefix it with 00h to prevent it from being interpreted as a negative number. To find out what is the value of 809BDD52E8h interpreted as a negative number in 2's complement, we have to perform negation operation on it to find out its absolute value in string system and then convert the resulting string into decimal number string. The conversion from binary string system into decimal string is reverse of the conversion that CVU$ does. (See examples following this program where I have specific advice on how to use the Microsoft Windows program called Calculator to do this).

QBasic Program to Convert a Decimal Number of Arbitrary Size to String System Binary Number

UNLMTD.BAS Program

DECLARE FUNCTION CVU$ (unlimited AS STRING)
DECLARE FUNCTION numericLtrim$ (unlimited AS STRING)
CLS
DIM SHARED radix AS INTEGER
radix = 256
DIM SHARED capacity AS INTEGER
capacity = 9 - LEN(LTRIM$(STR$(radix - 1)))
DIM unlimited AS STRING
DIM unlimitedBinary AS STRING
DIM byteVAL AS INTEGER
DIM unlimitedBinaryHexStr AS STRING
DIM byteHexStr AS STRING
'
'Generate a random number
'RANDOMIZE TIMER
'10 ensures that number of digits are at least 10
nodigits = INT(RND * 9 + 10)
'
'ensure first digit is always > 0
unlimited = LTRIM$(STR$(INT(RND * 9 + 1)))
FOR i = 1 TO nodigits - 1
  unlimited = unlimited + LTRIM$(STR$(INT(RND * 10)))
NEXT i
PRINT "unlimited: "; unlimited
'
unlimitedBinary = CVU(unlimited)
'
'Print the converted binary number in human readable hex format
unlimitedBinaryHexStr = ""
FOR i = 1 TO LEN(unlimitedBinary)
  byteVAL = ASC(MID$(unlimitedBinary, i, 1))
  byteHexStr = HEX$(byteVAL)
  IF LEN(byteHexStr) = 1 THEN byteHexStr = "0" + byteHexStr
  unlimitedBinaryHexStr = byteHexStr + unlimitedBinaryHexStr
NEXT i
PRINT "unlimitedBinaryHexStr: "; unlimitedBinaryHexStr
END
'
FUNCTION CVU$ (u AS STRING)
DIM accumulator AS STRING
DIM accumulatorVAL AS LONG
DIM quotient AS STRING
DIM quotientVAL AS LONG
DIM quotientAccumulator AS STRING
DIM returnValueStr AS STRING
DIM remainder AS STRING
DIM remainderVAL AS INTEGER
DIM unlimited AS STRING
unlimited = u
DO
  digitsToPick = LEN(unlimited) MOD capacity'for first iteration
  iterations = LEN(unlimited) \ capacity
  IF digitsToPick = 0 THEN
    digitsToPick = digitsToPick + capacity
  ELSE
    iterations = iterations + 1
  END IF
  accumulator = LEFT$(unlimited, digitsToPick)
  accumulatorVAL = VAL(accumulator)
  unlimited = MID$(unlimited, digitsToPick + 1)
  digitsToPick = capacity'for subsequent iterations
  FOR i = 1 TO iterations
    quotientVAL = accumulatorVAL \ radix
    quotient = LTRIM$(STR$(quotientVAL))
    leadingZeroes = 6 - LEN(quotient)
    quotient = STRING$(leadingZeroes, "0") + quotient
    quotientAccumulator = quotientAccumulator + quotient
    remainderVAL = accumulatorVAL MOD radix
    remainder = LTRIM$(STR$(remainderVAL))
    accumulator = remainder + LEFT$(unlimited, digitsToPick)
    accumulatorVAL = VAL(accumulator)
    unlimited = MID$(unlimited, digitsToPick + 1)
  NEXT i
  'numeric ltrim - removes leading zeroes
  quotientAccumulator = numericLtrim(quotientAccumulator)
  returnValueStr = returnValueStr + CHR$(remainderVAL)
  unlimited = quotientAccumulator
  quotientAccumulator = ""
LOOP UNTIL unlimited = ""
CVU$ = returnValueStr
END FUNCTION
FUNCTION numericLtrim$ (unlimited AS STRING)
DIM returnValueStr AS STRING
FOR i = 1 TO LEN(unlimited)
  IF MID$(unlimited, i, 1) <> "0" THEN EXIT FOR
NEXT i
IF i > LEN(unlimited) THEN
  returnValueStr = ""
ELSE
  'IF i > 1 THEN STOP
  returnValueStr = MID$(unlimited, i)
END IF
numericLtrim$ = returnValueStr
END FUNCTION

Example: Convert 5523707870487398d into Unsigned Binary Number

If you run the program as given above then it generates 5523707870487398d randomly. But to illustrate how this program is used we give below an alternative mainline routine which uses an assignment statement to load a string with value: "5523707870487398" before calling CVU$ function. This alternative program illustrates how this program will be actually used for input purpose.

DECLARE FUNCTION CVU$ (unlimited AS STRING)
DECLARE FUNCTION numericLtrim$ (unlimited AS STRING)
CLS
DIM SHARED radix AS INTEGER
radix = 256
DIM SHARED capacity AS INTEGER
capacity = 9 - LEN(LTRIM$(STR$(radix - 1)))
DIM unlimited AS STRING
DIM unlimitedBinary AS STRING
DIM byteVAL AS INTEGER
DIM unlimitedBinaryHexStr AS STRING
DIM byteHexStr AS STRING
'
unlimited = "5523707870487398"
PRINT "unlimited: "; unlimited
'
unlimitedBinary = CVU(unlimited)
'
'Print the converted binary number in human readable hex format
unlimitedBinaryHexStr = ""
FOR i = 1 TO LEN(unlimitedBinary)
  byteVAL = ASC(MID$(unlimitedBinary, i, 1))
  byteHexStr = HEX$(byteVAL)
  IF LEN(byteHexStr) = 1 THEN byteHexStr = "0" + byteHexStr
  unlimitedBinaryHexStr = byteHexStr + unlimitedBinaryHexStr
NEXT i
PRINT "unlimitedBinaryHexStr: "; unlimitedBinaryHexStr
END


To use this program make a copy of all the SUB and FUNCTION subroutines of QBasic program UNLMTD.BAS in previous section. And copy above source as mainline steps of your program. An easy way to do all this is to first make a copy of the program UNLMTD.BAS and replace steps which are generating a random decimal integer by following assignment statement:

unlimited = "5523707870487398"


We run the program and get the binary number in string as: 139FC87576A366h. As just the binary number we do not need to do anything more on this string.

Question: As a 2's complement binary number in string system with value +5523707870487398d do we need to prefix 139FC87576A366h with 00h?

Answer: No. Reason for this is the fact that the high order byte is 13h which means the number as it is will be considered as positive number in 2's complement system. Which means that we need not prefix the unsigned binary number 139FC87576A366h with 00h. Considered as 2's complement binary number it has same value.

Example: Convert 552370787048d into Unsigned Binary Number

We follow the same steps as in previous example with this number: 552370787048d. What you will get is this binary number in string: 809BDD52E8h.

Question: As a 2's complement binary number in string system with value +552370787048d do we need to prefix 809BDD52E8h with 00h?

Answer: Yes. If we do not prefix 00h then the number will be interpreted as a negative number in 2's complement system. Reason for this is the fact that the high order byte is 80h which has most significant bit 1b. So correct 2's complement binary number in string storage system is: 00809BDD52E8h.

Negation

Example: Negation of 809BDD52E8h

We wish to calculate the negative of 2's complement number 809BDD52E8h so that we know what is its absolute value. If the value of this number is: -x then negation will give us x. We will enter this number in Microsoft Windows Calculator program as hexadecimal number to know its value in decimal. (Note: This is reverse of what CVU$ function in UNLMTD.BAS does. Later I shall publish an article with a QBasic program which will do this reverse conversion from binary number in our string system and produce value in decimal as a string.)

We first calculate 1s' complement. (Note: I have written an article on complement systems: Complement Systems of Integer Representation. In that article I call 1s' complement as Digitwise complement. I prefer the terms I use in that article because they are less confusing. What is Digitwise complement in that article is also known as Reduced Radix Complement. For 2's complement I use the term Modular complement. What is Modular complement in that article is also known as Radix Complement. I prefer my own terms but since these are new I am using more popular terminology in this article. In the context of radix 2 Reduced Radix Complement is 1s' Complement and Radix Complement is 2's Complement.)

809BDD52E8h
7F6422AD17h (1's Complement)
        +1h
7F6422AD18h (2's Complement)


Enter 7F6422AD18h in Microsoft Windows Calculator in hex and then switch to decimal to read off its value in decimal: 547140840728d. So the value of 809BDD52E8h interpreted as 2's complement number is: -547140840728d.

(Note. Older Windows systems used to have only two views of calculator: Standard and Scientific. Newer Windows comes with an improved Calculator program which now has a Programmer view. It is this view which allows you to switch between hex and decimal.)

To help readers not familiar with hexadecimal numerals I am giving below the list of numerals with their 1s' complements:

Decimal  Hexadecimal  Complement
 0       0            F
 1       1            E
 2       2            D
 3       3            C
 4       4            B
 5       5            A
 6       6            9
 7       7            8
 8       8            7
 9       9            6
10       A            5
11       B            4
12       C            3
13       D            2
14       E            1
15       F            0


Addition

(From this point onwards our examples deal with 2's complement numbers only.)

Example: Add 55d (37h) to 5d (05h)

 37h
+05h
----
 3Ch
 3Ch = 3d * 16d + 12d = 48d + 12d = 60d


Example: Add D7C5h to 1593h

Please note that as unsigned numbers these are: 55237d and 5523d respectively. But we are dealing them as 2's complement numbers. Therefore, D7C5h is a negative number and 1593h is a positive number. So carry if any will be consistent with no overflow. In 2's complement system there is a simple criteria for overflow. If "Carry In" bit and "Carry Out" bit are identical for most significant bit then there is no overflow.

00100 <---Carry in for the hexadecimal digit
 D7C5h
+1593h
------
 ED58h


As you can see there is no overflow. To verify that there is no overflow we must do the bit level arithmetic on most significant hexadecimal digit at:

    00000 0010 <---Carry in for the digit
 D = 1000 0101b
+1 = 0000 0001b
---------------
1000 0110b (please note that "Carry In"
'           and "Carry Out" are same
'           for most significant bit.)


Question: What if there is overflow?

Answer: If there is overflow then our string system reacts by "creating" one more byte of storage and overflow is avoided. That is, our binary numbers increase in their storage size dynamically and overflow never occurs.

We verify that our add operation for adding D7C5h to 1593h has worked correctly:

First we find out the decimal values of operands:
1s' complement of D7C5h is: 283Ah
Add 1h to it: 283Bh which has decimal value of: 10299.
So the decimal value of D7C5h = -10299.
And 1593h is 5523 in decimal. Sum of these two is: -4776.
Number +4776d in hexadecmal is: 12A8h.
1s' complement of 12A8h = ED57h.
Add 1 to it to get 2's complement: ED58h.


Reader can carry out this verification in Microsoft Windows Calculator program.

(Note. New Windows operating systems has better calculator which permits one to select a Programmer option from View menu option. This is better than Scientific View of earlier versions. It allows you to specify integer size.)

Oveflow

We now illustrate the case when we would be forced to increase storage space. Best way to do this is to double a non-zero number repeatedly until the resulting number becomes so large that we are forced to allocate a byte to accommodate the number. We can double a number by adding it to itself. We would be forced to allocate a byte to accommodate larger number when storage provided to operands would be insufficient to store the result of arithmetic. In fixed storage systems which are natively understood by QBasic this will cause overflow but in string system the result is stored in larger string. Absolute Gold Standard for a definition of what constitutes an overflow is given in my article on Complements: Complement Systems of Integer Representation. Here I am giving briefly its salient points:

An arithmetic operation has caused overflow if result of same operation when performed on same operands stored in a bigger storage capable of storing more digits cannot be shrunk back into original storage regardless of how many more digits the bigger storage has. In other words, size of result has exceeded the number of digits in original operands.

This definition of overflow seems to imply that we will be forced to check for results in infinite number of storage sizes before arriving at conclusion that overflow has not occurred. This is not true. If we know size of both operands then the size of a sum or difference of these is unlikely to be bigger than by one more digit than bigger of the two sizes. Same holds good for negation. For multiplication the size of multiplication result will be sum of the two sizes. So in case of addition, subtraction and negation we perform the operaton with size one digit larger than bigger of the two sizes. In case of multiplication we perform the arithmetic expecting a result requiring storage of sum of the two sizes. After arithmetic if we cannot shrink back to original size then we declare that overflow has occurred. (Note: I am leaving mathematical justification of these claims to readers. It is not difficult to work out a rigorous proof of these claims.)

Please note that in string system we are not interested in declaring overflow when it occurs. Rather the whole purpose of string system is to dynamically increase storage required to a bigger size so that overflow never occurs except when we run out of string space.

Since there will not be overflow in string system we can present only an example of addition causing increase in space required to store the result. We first try adding negative number D7C5h to itself and see what happens. That will double it. If that does not cause an increase in the space required then we keep doubling by adding result to itself until increase in space becomes necessary.

When we add we will prefix FFh to D7C5h first. After arithmetic we will remove leading redundant bytes. A leading redundant byte in 2's complement is the high order byte which is either 00h or FFh which can be removed without changing the sign. For example, FFD7C5h is same negative number as: D7C5h. Similarly, 0007C5h is same positive number as: 07C5h. We would have caused an increase in storage space requirement if we are forced to retain larger number of bytes than original storage required for any of operands.

Example: Add D7C5h to D7C5h

We would have managed to cause overflow for 2-byte system if we need more than 2-bytes to store the result.

First we increase the storage.
For this we will prefix this number with
hexadecimal digit "F". This gives us: FD7C5h.
(Note. In String System a byte FFh is prefixed.
But here we will demonstrate technique with
just one hexadecimal digit.)
This "expanded" number has same value
in 2's complement system.
10100 <---Carry
 FD7C5h
+FD7C5h
-------
1FAF8Ah discard 1 carry. We get FAF8Ah
This can be "shrunk" to AF8Ah


Now add AF8Ah to AF8Ah. Let us hope this time we are forced to increase the string size:

Example: Add AF8Ah to AF8Ah

111110 <---Carry
 FAF8Ah
+FAF8Ah
 ------
1F5F14h discard 1 carry. We get F5F14h.
This cannot be "shrunk" to 4 heaxadecimal digits.
Therefore we will be forced to increase storage
for the result to accommodate one more hexadecimal
digit. In string system we will now need 3 bytes
to store the number.


Multiplication

Now a multiplication example:

Example: Multiply 37h by 05h

  37h (3d*16d+7d=55d)
* 05h (0d*16d+5d=5d) ... 55d*5d=275d
-----
  23h (7h*5h=7d*5d=35d=23h)
+ Fh  (3h*5h=3d*5d=15d=Fh)
+ 0h  (7h*0h=7d*0d=0d=0h)
+0h   (3h*0h=3d*0d=0d=0h)
-----
 113h (113h=256d+16d+3d=272d+3d=275d)
Please note storage required would 
be 3 hexadecimal digits or in string 
system 2 bytes. And in 2 bytes it
would store: 0113h


Now we look at one example of negative number multipliying a positive number:

Example: Multiply 87h by 07h

87h is negative.
1s' complement= 78h.
Add 1h= 79h=7d*16d+9d=112d+9d=121d.
So value in decimal is: -121d.
-121d*5d=-605d.
605d from calculator=25Dh.
1s' complement of 25Dh= DA2h.
Add 1h= DA3h.
In Calculator in a Windows 7 system
I verified that FDA3h is: -605d.


  87h
  78h 1s' complement
+  1h Add 1h for 2's complement
-----
  79h
* 05h
-----
  2Dh (9h*5h=9d*5d=45d=2Dh)
+23h  (7h*5h=7d*5d=35d=23h)
+ 0h  (9h*0h=9d*0d=0d=0h)
+0h   (7h*0h=7d*0d=0d=0h)
-----
 25Dh 
 DA2h (1s' complement)
  +1h (add 1h for 2's complement)
-----
 DA3h 
=FDA3h=-605d


In this example we are showing computation as humans would do. In software we are free to use a pair of hexadecimal digits because we have access to QBasic type LONG which can handle multiplication of positive integers up to 255. So we can process numbers stored in strings one byte at a time. We cannot use QBasic type INTEGER because square of 255 (65025) is greater than 32767 (65025 > 32767).

Division

If you have studied the QBasic program included in this article then you must have noticed that the subroutine CVU$ processes the decimal number input to it into binary number by treating the input decimal number as if it is a radix 1000000d number. The reason for this is that the subroutine CVU$ is supposed to divide the number repeatedly by 256d to extract the remainders which become radix 256d digits of progressively higher exponent positions starting with unit position in positional notation. The algorithm we used was one of a long division that every primary school student learns. We could have used 65536d as divisor and then one round of division would have yielded a value < 65536d for remainder making up two radix 256d digits and but then we would have less space left for decimal digits to "bring down". The "bring down" is term used to describe a step in long division whereby some digits of dividend are read off to carry foreward the division algorithm. I will illustrate this by showing how the number 5523707870487398d was processed by CVU$ subroutine to produce the first pair of hexadecimal digits 66h. As you know from the first example in this article the number 5523707870487398d was converted to binary number which in hexadecimal happens to be: 139FC87576A366h. The first round of division by 256 should yield the remainder 102d which is in hexadecimal system: 66h. The algorithm is well known to any graduate of primary school. But to evolve terminology used for later use I am using this illustration.

Example: Compute Unit Position Digit For 5523707870487398d in Radix 256d System

Divisor: 256d
Dividend: [005523][707870][487398]
                                  *
Quotient: [000021][576983][869091]*
         _________________________*
      256)[005523][707870][487398]
          [005523]                 algorithm begins by "bringing 
          [005376]                 down" digit [005523] because
          ________________         we know divisor to be 3
          [000147][707870]         digit number. [000021] is
                                   highest exponent digit in
                                   quotient. Remainder=[000147]
                                   is multiplied by 1000000 and
                                   next "digit" [707870] is
                                   added. This is another "bring
                                   down". 147707870 is now in a
                                   QBasic LONG variable divide it
                                   by 256 to get next quotient
                                   digit: [576983].
          [000147][707648]         256*[576983]=147707648 
          ________________         subtract 147707648 giving 222
                  [000222][487398] "bring down" [487398]
                  [000222][487296] this is unit position "digit"
                  ________________ quotient determination in this
                          [000102] position ends division
                                   The remainder in this position
                                   is radix 256 digit in unit
                                   position: [000102]=102d = 66h


Apply this process repeatedly quotient to give higher exponent position radix 256 digits until all radix 256 digits have been computed.

The rationale for using the radix 1000000 for input number is the fact that at any point we will have a remainder which in maximum coud be 255 followed by some "brought down" decimal digits which could in maximum be all 9's And maximum number of decimal digits that can follow is 6 before exceeding the capacity to store positive integer in QBasic type LONG variable. That is, 255999999 (six 9's) can be accomodated in a LONG variable but not 2559999999 (seven 9's).

We could have used for divisor 65536 and got 2 bytes as remainder in one cycle of division instead of 1 byte. In that case we will have used radix 10000 for input numbers. That is because 655369999 (four 9's) can be stored in LONG variable but not 6553699999 (five 9's). And lastly we could have used 16777216 and got 3 bytes as remainder in one cycle of division. In that case we will have used radix 100 for input numbers. That is because 1677721699 can be stored in LONG variable but not 16777216999. (Hint. Maximum positive integer LONG can store is: 2147483647).

Above explanation of CVU$ subroutine had dual purpose. First I wanted to introduce terminology we use in division algorithm used by primary school students. And secondly, we wanted to make readers accustomed to idea that unlike primary school students we freely help ourselves with facility of LONG variable to quicken the process of computation of radix 256 digits. Same approach will be used when we devise the division algorithm for string system. But in general case of dividing a string system number by another string system number we have no guarantee that divisor is small. When divisor was small like 256, 65536 or 16777216 we still had some room where we could squeeze in some decimal digits needed to be squeezed in when performing the "bring down" step of division algorithm. In string system we do not have guarantee that our divisor would be small so we cannot use the same technique that we used in CVU$ subroutine. For divisions in string system we will have to perform algorithm which mimics primary school students when they do the divisions. This process is known as "long division" but I prefer to call it "primary school student's algorithm". Alternatively we can call it the "bring down" algorithm. So let us try to clarify that process. While we are trying to clarify we will at the same time be on look out for any economy in algorithm that we can have by exploiting features available in QBasic.

Example: Divide 5523707870487d by 398d

Dividend: 5523707870487d
Divisor: 398d

Quotient:1<---trial quotient digit
          3...etc
       _______________
   398)552370787048739
       552              "bring down" 552. why 552? because same
      -398<- trial*398  number of digits as divisor. Trial
       ___              quotient digit: is calculated by 5 \ 4
bring  1543             where "\" is integer division of QBasic.
down  -1194<- 3*398     Why 5 \ 4? because we round off 552
1 more ____             downwards to 500 and 398 upward to 400 to
etc     349             calculate "trial" digit. We get trial
         .              digit=1. Note we could have divisor=100
          .             in which case upward round off would give
           .            us 200 and if brought down digits were
            etc         999 then trial digit would be=4 and but
                        4*100=400 but 999-400=499 is not <100 as
                        required of a remainder. Fortunately for
                        this example we can store both 100 as
                        well as 999 in LONG and would know from
                        integer division of these number that
                        required quotient digit is: 9. But this
                        example was to illustrate how a primary
                        school student who cannot handle numbers
                        bigger than single digit proceeds.
                        Biggest byte string LONG can handle is 3
                        bytes if nothing is known about most
                        significant byte's sign bit. If we know
                        that most significant byte's sign bit is
                        zero then we can load 4 bytes into a LONG
                        variable.


Discussion

An average primary school student can memorize a division table like following:

Division Table
9 \ 2 = 4 ("\" is for integer division)
9 \ 3 = 3
9 \ 4 = 2
9 \ 5 = 1
9 \ 6 = 1
9 \ 7 = 1
9 \ 8 = 1
8 \ 2 = 4
8 \ 3 = 2
8 \ 4 = 2
8 \ 5 = 1
8 \ 6 = 1
8 \ 7 = 1
etc etc


(Note. I am sure many cheat by doing a quick multiplication to figure these digits. For example, doing 2*2, 2*3, 2*4, 2*5 in head quickly reveals that 9\2 is 4, because 2*5 exceeds 9. Exposing the psychology of children's arithmetic ability is not topic of this article. I am sure reader understands my sweeping statements about that topic as broad and inaccurate generalizations.)

Is there a corresponding situation in QBasic? Availibility of in-built integer types INTEGER and LONG are corresponding features of QBasic which can be construed to be "implementing" these division table inside the "head" of a computer.

Also please note that we said that 398 should be rounded off upwards to 400 is correct but when divisor is 100 rounding off upwards should not yield 200 but just 100. But in case of string system we may have divisors of many bytes and it is not practical to examine all lower order bytes to see if they are all 00h. So we may use this so-called upwards round-off in any case. Since this is not actual upwards round-off we may call it "good faith upwards round-off".

Question: Should we use actual upwards round-off or good faith upwards round-off?

Answer: The "trial quotient digit" may be different from actual quotient digit. Purpose of round-off is to "estimate" quotient digit which is less than or equal to the actual quotient digit. If our estimate is way off then we will be forced to revise it upwards and repeat all the calculations which are needed to verify that it is correct quotient digit. This verification involves multiplication of a number by divisor followed by subtraction from "brought down" digits. When size of dividend is greater than size of divisor this can happen many times. When size of divisor is small then we can check remaining digits in string system and satisfy that all of them are zeros and then use actual upwards round-off instead of good faith round-off.

The probability of actual upwards round-off differing from good faith upwards round-off is small. And gets smaller as number of digits increases. For example when divisor is 2 byte long second byte can take up 255 non-zero values and probability of it being zero is 1/256. For a divisor of n bytes it is (1/256)(n-1).

As you will see in the following paragraphs we are going to use QBasic type LONG in which we shall squeeze in as many radix 256 digits as possible for the purpose of estimating quotient digit. So savings are minuscule and increase in software complexity is quite unnecessary.

For example, supposing our divisor is: C87576A366h. We can easily place 3 bytes C87576h in low order positions in a QBasic type LONG variable and then add 1 which gives C87577h. If dividend is 139FC87576A366h then first "bring down" is: 139FC87576h. And our primary school student algorithm yields 00h (zero) as first trial as well as final quotient digit being the result of division of 139FC8h by C87577h.

In next step in the algorithm trial quotient digit will be calculated by dividing: 139FC87576A3h by C87576A366h. Fortunately we can now squeeze in 4 bytes of 139FC87576A3h in this case. So we divide 139FC875h by C87577h and get trial quotient digit as: 19h. If we could not squeeze in 4 bytes of divisor made from high order digits in "brought down" digits then we would have certainly managed to load 3 bytes and in that case we would have divided by C875h+1h=C876h because one more digit A3h has been brought down. This one more digit "bring down" causes shift right in divisor or we may manage to squeeze in one more digit of trial dividend. In this case we managed to squeeze 4 digits of trial dividend. So effectively 3 bytes of trial divisor are already shifted right and no shift in divisor is required.

This entire process is illustrated in the following Example section:

Example: Divide 139FC87576A366h by C87576A366h

Dividend: 139FC87576A366h
Divisor: C87576A366h
Quotient:           0019...
            ______________
C87576A366h)139FC87576A366
            139FC87576     "bring down" 139FC87576. Why
           -0000000000     139FC87576? Because same number of
            __________     digits as divisor. Trial quotient
            139FC87576A3   digit: 00h is calculated by
         -(19*C87576A366h) 139FC8 \ C87577 using QBasic type LONG
                           variables. Why 139FC8 \ C87577?
                           Because we round off 139FC87576
Divisions in LONG:         downwards to 139FC80000 and C87576A366
139FC8\C87577=0            upward to C875770000 to calculate
139FC875\C87577=19         "trial" digit. we get trial digit=00h.
                           "bring down" one more digit: A3. Now
19*C87576A366 calculated   "brought down" digits are:
in string system           139FC87576A3. Load 139FC875 in a LONG.
                           139FC875\C87577=19. If we could not
                           load 4 bytes(say) when high order
                           byte>80 when it will be interpreted as
                           negative number, then load only 3
subtraction 139FC87576A3   bytes. And "shift" divisor's 3 bytes
-(19*C87576A366h)          by 1 byte. That is, we would have
performed in string        performed 139FC8\C876 in LONG.
system


Because of use of QBasic type LONG the possibility of our trial quotient digit being different from actual quotient digit is virtually nil. But when we write the software we must take care of that possibility until we can prove mathematically that this possibility does not exist.

Last Words

I have given enough pointers in this article to those with brave heart to attempt construction of a software which will allow one to represent integers of arbitrary size. Our final destination is construction of a software which will perform exact rational arithmetic. This article was part of a series of articles working towards that final goal.

Please keep a watch on my articles here on hubpages for at least one more article on the design of a system which will permit exact floating point arithmetic. I have tentatively titled that article as: A Framework of Exact Floating Point Arithmetic.

Comments

    0 of 8192 characters used
    Post Comment

    No comments yet.

    Click to Rate This Article