create your own

SIMD Parallel computing

63
rate or flag this page

By AlexK2009


At  one very fulfilling time in my career, 8  years in total, I  developed code on  a Distributed Array Processor. (DAP).  This  was a grid of 64 by 64 single bit processors with nearest neighbour  data connections. This meant  each processor could access it's  4 nearest neighbours.  For example the content of each of the 4096 processors could be  added to the content of, say its northern neighbour in one parallell operation.

Here I give a few  example of how this architecture can give  more  speedup than expected. It  is not a comprehensive review of techniques, and I plan to include a few more advanced methods in  a second article.    While SIMD parallelism is not fashionable these days I think this architecture offers potential and the ideas involved should not be lost entirely. In any case tI hope to offer a glimpse into  an interesting byway of technology.


History

The original DAP was developed by ICL and the academic  part of the research was done at Queen Mary College London. While I was there ICL lost interest in the DAP and a group of academics and ex ICL employees founded Active Memory Technology (AMT) to develop the commercial potential of the DAP.  The main competitor of the DAP  was  the Thinking  Machines  Connection Machine and  AMT  had legal battles with them  over  alleged patent infringements by CM.
In the end both companies failed and AMT was, as I recall, taken over by an American company called Masspar. When I heard this I confidently predicted it would not be long before the DAP was introduced into the UK as a marvellous AMERICAN invention and I was not disappointed.

The Connection Machine, with 265 by 256 processors had a  catwalk with a handrail that allowed you to walk around  the top.  I  saw a photograph  of the CM that showed the entire machine and the catwalk looked so small it was almost invisible.   By comparison the AMT  64 by 64 DAP was the size of a four drawer filing cabinet and later versions were about the size of a can of soft drink.
The CM used a  parallel version of C and the DAP used a parallel version of Fortran, which, I believe, influenced the standard for  Fortran 90.  The DAP, and I believe the CM, were both Batch machines with  a compiled program sent from a host to the DAP  to be run there. Later versions of the operating system allowed a limited degree of interactivity. I believe both machine were inspired by the ILLIAC IV

Spatial Averaging

Here I use a pseudocode based on DAP Fortran. For example to replace each processor with the average of its 4 nearest neighbours the code would be

(1) X = 0.25*(SHEP(X) + SHWP(X) + SHSP(X) + SHNP(X))

Where for example SHEP(X) meant “Place the content of each processor in the processor to the east and pad the edges with zero. Cyclic boundary conditions were also possible and the corresponding command was SHEC(X).

Let's look at (1). In (1) an averaging is performed in 4 parallel additions. On a 64 by 64 DAP and 64 bit data assume the addition takes 64 times longer on a one bit processor than on a conventional processor. The result will still be 16 time faster than with a Von Neumann Architecture. In fact adding two numbers with a single bit processor was quicker than I assumed so the speedup will be greater.

More efficiency can be obtained when averaging content of all all eight neighbours with the following line of code

(2)X = X + SHEP(X) + SHWP(X)
X = X + SHSP(X) + SHNP(X)
X= 0.125*X
A total of only four parallel additions and one division at each Processing element (PE)\

To see what happened look at 3 by 3 patch of Processors with the following content

1 2 3
4 5 6
7 8 9

For the central column applying the first line gives

1 +2 + 3
4 + 5 + 6
7 + 8 + 9

and applying the second line to the intermediate result from the first line gives the central sum

1 +2 + 3 + 4 + 5 + 6 + 7 + 8 + 9


For completeness I should mention that the DAP provided the facility to specify which Processors would carry out an operation, for example

(3) X(x>3)=3


Is an instruction to clip only those Processors that hold a number larger than 3. A Processor holding a 2 will not be affected.


Adding numbers

The DAP allowed a vector view of data.  The array could be treated like a chain of 4096 processors.
This allowed the contents of the 4096 PEs to be  summed in 12  parallel additions.  To see how this could be done  look a a chain of  8 Processors  holding the values


1 2 3 4 5 6 7 8

first, to each  each  processor add  the one on its right with planar boundary conditions

(1+2 ) (2+3)  (3+4)  (4 + 5)  (5+6)  (6 +7)  (7+8)  (8)

Now  to each  processor the one two places to the right

(1+2+ 3+4)  ( 2+3+ 4 + 5)   (3+4 + 5+6)    ( 4 + 5 + 5+6)    (5+6 +7+8) (6+7 + 7+8)  (7+8) (8)

Now  to each PE add  the one 4 place to the right The first PE now holds
1+2+ 3+4 +  5+6 +7+8

In general  N number can be summed in  O(Log2(n)) parallel additions.

Interesting Applications

While the DAP was mostly seen as useful only for image based  applications like noise removal, optic flow matching and  other computer vision algorithms it  was used for many other things, for example a colleague developed  Prolog for the DAP and others developed sorting algorithms, while I  myself developed a fast Fourier Transform algorithm and worked for three years developing Cellular Automaton  models of Polymers.

Wrapping up

I have here only given the briefest flavour of the  elegant   ways in which this architecture can speed up calculations.  With all this I think I can be forgiven for considering  the recent introduction of very restricted SIMD  by Microsoft and others as a pale imitation. Four object not able to communicate with each other indeed.

Or am I missing something in  all the VISIO diagrams I see  on the  web that appear to be deigned to make a  simple idea complicated and incomprehensible.

Print   —   Rate it:  up  down  flag this hub

Comments

RSS for comments on this Hub

No comments yet.

Submit a Comment

Members and Guests

Sign in or sign up and post using a hubpages account.


optional


  • No HTML is allowed in comments, but URLs will be hyperlinked
  • Comments are not for promoting your hubs or other sites

working