Introduction to the digital representation of alphanumeric data

The translation of this page is incomplete

Readers who are familiar with the digital representation of numbers, characters and symbols in general may skip this section. However, the DICOM standard cannot be understood without the principles by which computers encode "signs".
The most basic concept of the digital representation of data is the "bit". A physical bit is a unit that has two distinguishable states, for example a higher or lower voltage between two points, or an electric current running or not running through a piece of wire. Independently of the concrete physical realization, we treat the bit as an abstract unit with two possible logical states.

The usual signs for the possible states of a bit are

0 and 1.
From this point on we are not going to worry about the actual physical representation of these two logical values. The only thing that matters is that computers can encode and recognize alphanumeric characters on the basis of these two logical values. It is good to know that in the case of characters (letters) and other non-numeric symbols the coding is arbitrary. In the case of numbers, however, the binary number representation has many advantages, since its only two digits are zero and one, exactly the two states of a bit. Before introducing the binary number representation, let us describe the usual way of grouping bits. Computer users rarely come across individual bits; the bits are grouped in quadruples called

half bytes

There are 16 different half bytes as shown in the following table:

The four bits	The corresponding hexadecimal digit	The decimal value according to the binary representation
0000	0	0
0001	1	1
0010	2	2
0011	3	3
0100	4	4
0101	5	5
0110	6	6
0111	7	7
1000	8	8
1001	9	9
1010	A	10
1011	B	11
1100	C	12
1101	D	13
1110	E	14
1111	F	15

Here:

Hexadecimal: place value representation of base 16 (the place values in decimal form are 1, 16, 256, ..., the successive powers of 16, the base of the hexadecimal system).
Decimal: place value representation of base 10 (the place values are 1, 10, 100, 1000, ..., the successive powers of 10, the base of the decimal system).
Binary: place value representation of base 2 (the place values are 1, 2, 4, 8, 16, ..., the successive powers of 2, the base of the binary system).

Note that our number system of base ten is not the only place value system ever used. The Maya used a vigesimal (base 20) system and already took advantage of the zero; the Mesopotamians used a sexagesimal (base 60) system but struggled with the missing place value: it took time to envision "nothing".
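The idea of place values can be sketched in a few lines of Python. This is only an illustration: the function names `place_values` and `to_decimal` are hypothetical helpers introduced here, not part of any standard. The sketch lists the place values of a base and evaluates the same digit sequence 1, 0, 1 in three different bases.

```python
def place_values(base, count):
    """Return the first `count` place values of a positional system, lowest first."""
    return [base ** k for k in range(count)]


def to_decimal(digits, base):
    """Evaluate a list of digit values (most significant digit first) in `base`."""
    value = 0
    for digit in digits:
        if not 0 <= digit < base:
            raise ValueError(f"digit {digit} is not valid in base {base}")
        # Shifting one place to the left multiplies the value so far by the base.
        value = value * base + digit
    return value


print(place_values(2, 5))    # [1, 2, 4, 8, 16]
print(place_values(10, 4))   # [1, 10, 100, 1000]
print(place_values(16, 3))   # [1, 16, 256]

# The same digits mean different numbers in different bases.
print(to_decimal([1, 0, 1], 2))    # 5
print(to_decimal([1, 0, 1], 10))   # 101
print(to_decimal([1, 0, 1], 16))   # 257
```

The digits alone do not determine the number; the base has to be agreed on as well.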

The first column of the table of half bytes lists the 16 different states of a half byte (four bits together). The second column lists the hexadecimal symbol (hex digit) that belongs to each half byte. The third column contains the usual numeric value associated with the four bits forming the half byte. When calculating the numeric value we follow the usual rules of place-value arithmetic. However, for numbers represented by half bytes the base is not 10 but 2, so each bit has the place value listed below.

The decimal place values of the bits in a half byte:

Leftmost bit	Second bit from the left	Third bit from the left	Rightmost bit
8	4	2	1

As an exercise let us calculate the decimal value of the hex digit "B", which denotes the bit stream

1011

by definition. Using the place value table above we go from right to left: in B=1011 we have a "one", because the rightmost bit of B is 1 and the decimal place value of the rightmost bit is also 1. The next bit to the left is also 1 and its place value is two, so we have a "two" in B as well. The next bit to the left is 0, so we do not have a "four". The leftmost bit, however, is 1, so we have an "eight". Putting together what we have:

1*"one" + 1*"two" + 0*"four" + 1*"eight"

That is, the decimal value of B is

1*1+1*2+0*4+1*8 = 11

As a result we may say that according to our „agreement”

B=11

Where B is a hexadecimal digit and 11 is its decimal value.
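The exercise can be repeated for every half byte with a short Python sketch. The names `half_byte_value` and `half_byte_to_hex` are hypothetical and chosen only for this illustration; the place values 8, 4, 2, 1 are the ones from the table above.

```python
PLACE_VALUES = (8, 4, 2, 1)      # place values of the four bits, leftmost first
HEX_DIGITS = "0123456789ABCDEF"  # hex digit for each value from 0 to 15


def half_byte_value(bits):
    """Decimal value of a 4-character bit string such as '1011'."""
    if len(bits) != 4 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly four characters, each '0' or '1'")
    return sum(int(bit) * place for bit, place in zip(bits, PLACE_VALUES))


def half_byte_to_hex(bits):
    """Hexadecimal digit that denotes the given four bits."""
    return HEX_DIGITS[half_byte_value(bits)]


print(half_byte_value("1011"))    # 11
print(half_byte_to_hex("1011"))   # B

# Reproduce the three columns of the table of all 16 half bytes.
for n in range(16):
    bits = format(n, "04b")
    print(bits, half_byte_to_hex(bits), half_byte_value(bits), sep="\t")
```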

This is how to decode and encode all 16 possible half bytes when we want to represent numbers. The half byte, however, is still not the final unit of the digital representation of numbers and alphanumeric characters (so far we have been talking about numbers only). The better-known unit of data representation in computers is the

byte

The byte is the combination of two half bytes. There are 16*16=256 possible combinations of eight bits, that is there are 256 different bytes. The following table contains a part of the full list of the bytes when they represent numbers. Note that the decimal values of the half bytes have to be calculated as shown above. Then the decimal value of the left half byte has to be multiplied by 16 and, finally, the two decimal values have to be added.

The two half bytes forming a byte	The decimal value of the byte
00	0*16+0=0
01	0*16+1=1
02	0*16+2=2
...	...
19	1*16+9=25
1A	1*16+10=26
1B	1*16+11=27
...	...
BA	11*16+10=186
BB	11*16+11=187
BC	11*16+12=188
...	...
FE	15*16+14=254
FF	15*16+15=255

Table 3.
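The rule behind this table (left half byte times 16 plus right half byte) can be sketched as follows. The function name `byte_value` is a hypothetical helper for this illustration only.

```python
HEX_DIGITS = "0123456789ABCDEF"


def byte_value(hex_pair):
    """Decimal value of a byte written as two hexadecimal digits, e.g. 'BA'."""
    if len(hex_pair) != 2:
        raise ValueError("expected exactly two hexadecimal digits")
    # str.index gives the decimal value of each hex digit (A -> 10, ..., F -> 15)
    # and raises ValueError for anything that is not a hex digit.
    left, right = (HEX_DIGITS.index(ch) for ch in hex_pair.upper())
    return left * 16 + right


for pair in ("00", "19", "1A", "BA", "FF"):
    print(pair, byte_value(pair))
# 00 0
# 19 25
# 1A 26
# BA 186
# FF 255
```

Python's built-in `int(pair, 16)` performs the same conversion; the explicit version is shown only to make the place values visible.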

It is clear now that one byte can represent the decimal values from 0 to 255 (256 different possibilities). One byte is usually not enough to represent the values that occur in practice. The next unit is the double byte or the

word

The decimal value of a double byte (word) is calculated as follows. First we calculate the value of each byte from its half bytes, then we multiply the value of the left byte by 256 and add the value of the right byte. For instance:

AB 1F = AB*256 + 1F = (10*16+11)*256 + (1*16+15)*1 = 171*256 + 31 = 43807

One more remark. When we discussed the numeric representation of bytes we took it for granted that the place values of the bytes increase from right to left. For instance, we saw that the meaning of the byte stream is:

AB 1F = 43807

We could have done the calculation in the reverse order, had we agreed that the place values increase from left to right. In that case we would have arrived at the following result:

AB 1F = AB*1 + 1F*256 = 171*1 + 31*256 = 171 + 7936 = 8107
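Both reading directions can be checked with a small Python sketch. The variable names are chosen for this illustration. The terms "big-endian" and "little-endian" do not appear in the text above; they are the names commonly used for the two conventions, and Python's `int.from_bytes` accepts them as `"big"` and `"little"`.

```python
word = bytes.fromhex("AB 1F")   # the two bytes AB (171) and 1F (31)

# Place values grow from right to left: the first byte is the high byte.
right_to_left = word[0] * 256 + word[1]
# Place values grow from left to right: the first byte is the low byte.
left_to_right = word[0] + word[1] * 256

print(right_to_left)   # 43807  (171*256 + 31)
print(left_to_right)   # 8107   (171 + 31*256)

# The standard library calls the two conventions "big" and "little".
assert right_to_left == int.from_bytes(word, "big")
assert left_to_right == int.from_bytes(word, "little")
```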

Having understood the traditional number representation in digital (hexadecimal) form, let us take a look at the digital representation of alphanumeric characters: the letters of the alphabets of different languages, punctuation marks and other symbols. Note that digits may appear in contexts where their meaning is not numeric. For example, the symbol "2" may stand for the picture of a swan and have nothing to do with the number two.
The following table, the so-called Latin-2 code table, contains the digital representation of 256 different characters as given by the ISO 8859-2 standard:

Latin-2 character coding

Fig 2.

The rows and columns of this table (Figure 2) are labelled with hexadecimal digits. For instance, the character "Ő" is in row D- and in column -5. That is, the two half bytes coding the Hungarian character Ő are D and 5, and the byte formed by these half bytes is D5. That is, in the Latin-2 code table of the ISO standard:

Ő=D5

In binary form (since D=1101 and 5=0101)

Ő=D5=11010101

As we have already seen the byte D5 has another meaning when it is interpreted as a number:

D5=13*16+5=213 and at the same time: D5=Ő
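The double meaning of D5 can be reproduced with Python's built-in codecs. This is a minimal sketch: `iso8859_2` is Python's codec name for ISO 8859-2, and the comparison with ISO 8859-1 (Latin-1) is an addition made here for illustration, since that code table is not discussed in the text.

```python
char = "Ő"
encoded = char.encode("iso8859_2")   # Python's codec for ISO 8859-2 (Latin-2)

byte = encoded[0]
print(encoded.hex().upper())   # D5
print(format(byte, "08b"))     # 11010101
print(byte)                    # 213

# The same byte read back as text under two different code tables.
print(bytes([byte]).decode("iso8859_2"))   # Ő
print(bytes([byte]).decode("iso8859_1"))   # Õ
```

The byte itself carries no hint about which of these readings is intended.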

How do we know what D5 exactly means?

We see now that the two bytes AB 1F may have three different meanings:

43807	8107	Ť and "not interpretable"
If we consider AB 1F to be a number and calculate the decimal value assuming that the hexadecimal place values increase from right to left	If we consider AB 1F to be a number and calculate the decimal value assuming that the hexadecimal place values increase from left to right	If we consider AB 1F to be a sequence of alphanumeric characters and interpret it according to the Latin-2 code page

There are many other code pages and many other numeric interpretations. So when we see a so-called byte dump or DICOM dump in hexadecimal form, we have to make sure that we know the right interpretation. Consider, for instance, the following dump of 32 bytes:

4B 65 64 76 65 73 4F 6C 76 61 73 F3 21 4E 65 6D
65 6E 6A 65 6E 65 6C 61 6B 65 64 76 65 64 21 21

If we assume that these are Latin-2 characters then the interpretation is:

KedvesOlvasó!Nemenjenelakedved!!

(Hungarian, written without spaces; roughly "Dear Reader! Don't lose heart!!")

If we assume instead that these are words (double bytes) to be interpreted as numbers, with place values increasing from left to right, then we get 16 numbers…
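Both readings of the dump can be tried out with the Python standard library. This is a sketch under the assumptions stated in the text: the text reading uses the `iso8859_2` codec, and the numeric reading uses the `struct` format `"<16H"`, which means sixteen unsigned 16-bit words with the low byte first (place values increasing from left to right).

```python
import struct

dump = bytes.fromhex(
    "4B 65 64 76 65 73 4F 6C 76 61 73 F3 21 4E 65 6D "
    "65 6E 6A 65 6E 65 6C 61 6B 65 64 76 65 64 21 21"
)
print(len(dump))   # 32

# Interpretation 1: every byte is one Latin-2 character.
text = dump.decode("iso8859_2")
print(text)        # KedvesOlvasó!Nemenjenelakedved!!

# Interpretation 2: every pair of bytes is one word, low byte first.
words = struct.unpack("<16H", dump)
print(len(words))  # 16
print(words[0])    # 25931  (4B 65 -> 75 + 101*256)
print(words[-1])   # 8481   (21 21 -> 33 + 33*256)
```

The same 32 bytes yield a readable sentence under one rule and a list of 16 integers under the other.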

In DICOM a word usually describes the state of one pixel on the screen. Two bytes (four hexadecimal digits) can represent 65536 different values, so such a four-digit hexadecimal group can describe 65536 different states of a pixel. The simplest interpretation is that a pixel can take 65536 different grey levels.

What we have learned so far is that a half byte, a byte, or a combination of bytes such as a word can be interpreted in many different ways. That is, a dump of bytes cannot be read unless the rules by which the characters, symbols and numbers are encoded have been clearly communicated. Part of the DICOM standard is the description of the allowed coding methods. We are going to discuss the related part of the standard later.
