
User:Vasiln/Goblin Logic 2

Revision as of 23:07, 4 September 2011 by Vasiln

Memory Addressing

The key to addressing memory is to have a very small memory space that we can perform operations on, and a much larger memory space with very limited functionality.

The least functionality we need from memory is the ability to read and write. If we break it down further, we need the ability to read a true, read a false, write a true, and write a false.

######
rddc  
######
rdhc  
######
 #<#
 #h#
 #^##
##hd#
#dh##
##^#
 #h#
 #<#
######
rhdc
######
rhhc
######

r is reset, c is completion, < is stairway pathing to exit

This is the start of that functionality. We see 5 elements here. From top to bottom: read true from memory; write true to memory; memory; read false from memory; write false to memory.

We've talked about memory before, and you should recognize it. As for the other four pathways, we're going to call them 'addressing paths.'

The left door or hatch in each pathway is set by the truth of either our memory cell (for the read paths) or the truth of the cell we're reading from (for the write paths). The second door or hatch is determined by the truth of a read bit-- a bit that keeps track of whether we're currently reading or writing. If we're reading, the doors and hatches open; if we're writing, they stay closed.

The completion will send a signal: either set memory to true or false, in the case of writes, or set a distant memory bit to true or false, in the case of reads.

So only one of these four pathways is going to be open at any given time. That's nice-- it means we can just connect all the entries to a single reset position, fed from each completion point.
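To make the path selection concrete, here's a minimal Python sketch (function and parameter names are mine, not part of the circuit) of which of the four addressing paths ends up open for a given read bit and bit value:

```python
def open_pathway(reading, value):
    """Which of the four addressing paths is open.

    reading: state of the read bit (True = read, False = write)
    value:   the memory cell's truth (for reads) or the source bit's
             truth (for writes) -- this sets the left door or hatch
    """
    # Exactly one path is open for any combination, which is why all
    # four entries can share a single reset position.
    if reading:
        return "read true" if value else "read false"
    return "write true" if value else "write false"
```

Since the four results are mutually exclusive, a single goblin released from reset can only ever find one way through.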

There's one other thing we can do here.

######
rdddc  
######
rdhdc  
######
 #<#
 #h#
 #^##
##hd#
#dh##
##^#
 #h#
 #<#
######
rhddc
######
rhhdc
######

I've added an extra door before completion. This is going to be set by the state of our addressing bit. Imagine a second structure like this, a few z-levels below, fed from the same reset position-- the only difference being that we have hatches linked to our addressing bit instead of doors. We still only ever have one pathway-- but we can switch which memory cell we're going to read from or write to by changing the position of our addressing bit.

One addressing bit will let us address 2 bits separately. 4 will let us address 16 bits separately. Imagine 16 of the following, forming a tower of addressable memory:

########
rddddddc  
########
rdhddddc  
########
 #<#
 #h#
 #^##
##hd#
#dh##
##^#
 #h#
 #<#
########
rhdddddc
########
rhhddddc
########

Different levels would see different combinations of hatches and doors at the end of each completion.
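The address decoding works like a one-hot decoder: each level's combination of doors and hatches passes a goblin only when the addressing bits match that level's pattern. A hypothetical Python sketch (my naming, assuming doors pass on a true bit and hatches on a false one):

```python
def level_is_open(level, address_bits):
    """True when this level's extra doors/hatches all permit passage.

    level:        index of the memory cell (0 .. 2**n - 1)
    address_bits: one boolean per addressing bit, least significant first
    """
    address = sum(bit << i for i, bit in enumerate(address_bits))
    return address == level

# With 4 addressing bits, exactly one of the 16 levels is ever open.
```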

If you do make a tower out of these, don't forget practical considerations. Either each cell needs to be offset, or there need to be two z-levels between each cell so that we can pit goblins without being interrupted by goblins from above. (We could get away with 1 z-level between cells if we only ever pitted from bottom to top, but then if there's a problem with a lower cell, we'd have to kill every goblin above that cell before we could pit a new goblin.)

We're using 2-bit words for demonstration purposes, though. We can address our first bit and our second bit independently, and read each to a buffer-- the first to the buffer's first bit, the second to the buffer's second bit. So really, we need two of these towers.

Addressing Toggle Memory

Remember our toggle memory? It had no way to set true or false, only a way of toggling it. That doesn't mean we can't address it though-- we just need a few more pathways.

#########
rdddddd c  
#########
rdhdddddc  
#########
#########
rdhddddhc  
#########
 #<#
 #h#
 #^#
 #h#
 #h#
 #^#
 #h#
 #<#
#########
rhddddd c
#########
rhhddddhc
#########
rhhdddddc
#########

Whereas before we had pathways for read true from memory, write true to memory, read false from memory, write false to memory--

We've changed two of those pathways, and added two more. We still only have two read functions, but now we have four write functions: Write true to true memory, write true to false memory, write false to true memory, and write false to false memory. The extra hatches and doors at the ends of these write functions are linked to the status of our memory. Writing true to true memory doesn't trigger anything, but writing true to false memory triggers a toggle. The extra complexity of writing to toggle memory means that we're not going to use it. Settable memory is preferable.
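The four write functions reduce to one rule: pulse the toggle only when the value being written differs from what's already stored. As a sketch (the function name is mine):

```python
def toggle_write(write_value, mem_value):
    """Writing to toggle memory. Returns True when a toggle pulse must
    be sent: 'write true to false memory' and 'write false to true
    memory' trigger a toggle; the other two paths do nothing."""
    return write_value != mem_value
```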

Addressing manager

If we operate on all of our bits at the same time, we'll need to know when our operations have completed. Managing that task is tricky. We can't know when to evaluate for completion, because we have to know that we've completed to know that none of our completion flags are going to undergo any change! Luckily, we CAN know that the flags aren't going to change until we tell them to, because we can use steady-state signalling. So we can get away with something that I said not to do. Here we have a circuit where the goblin has the potential to be without path, yet with multiple tiles to stand on.

####    ####
#<h######h<#
###^dd^h^###
##h######h#
 ##     d##
  ########


I've spelled out infrastructure because it's an abnormal circuit, something we don't normally want to do. Left ^ is reset; right ^ is completion (indicating that memory has been read). Notice the important omission of a reset begin hatch! Instead, I have a completion begin hatch. Central doors are opened by the presence of a goblin on a memory address completion plate. Central ^ sends a trip to a hatch permitting return to reset for our addressing goblins, as well as to the central doors and the SE hatch leading from completion to reset. That ensures the doors close at the same time as the hatch opens, permitting return to reset only when no path exists from reset to completion. (We can do it another way instead, if we want: addressing completion opens the central doors and SE hatch. That works because we know the goblins are going to return to reset as soon as they can, leading to a close signal to the SE hatch at the same time that we send a close to the central doors.)

We can only do this because we know that:

  1. All doors will open before any close (because it's only movement through the trip that will lead to their close)
  2. Addressing manager reset is completely infrastructural, not linked to any outside operations

Buffers and Registers: Full-featured memory

So remember: we've built minimal functionality into our basic memory, but we're going to build maximum functionality into a few pieces of memory. Let's ignore maximal functionality for right now and just make one bit of our buffer with basic functionality: read true from memory, write true to memory, read false from memory, write false to memory.

 #<#
 #h#
 #^##
##hd#
#dh##
##^#
 #h#
 #<#

Woah. That's just memory. Where's the read and write?

We don't need it. We built that functionality into our basic memory, and it's shared functionality. Every read true from memory trips the southern door; every read false from memory trips the northern door. Meanwhile, our northern and southern pressure plates are linked to the doors and hatches in our memory addresses, permitting only write true to memory, for instance, while our buffer bit is true.

Couldn't we build that functionality into our buffer, rather than our memory? Really, memory read/writes with addressing aren't built into either the memory or the buffer. It's a function that works between the two of them. The physical location of our memory addressing circuit doesn't matter-- we can build it next to the buffer, or we can build it next to the memory, or we can build it someplace else entirely. But we can't make it any simpler by building it one place than the other: we still need four paths per bit. Placing it near the memory lets us keep an eye on it at the same time that we keep an eye on memory, and since both are proportional in size to the same value (number of bits), that makes the most sense-- at least to me. Really, it doesn't matter where you put it.

So are we going to build in functionality, for instance, for our ordinal compare? Not yet. Remember, ordinal compares require two values. To do an ordinal compare with every piece of memory would take 16 compare circuits, if we had 16 words of memory! So really, what we need to do is give functionality to our buffer to write to at least one other new piece of memory-- a register. That way, we can read a value, write it to the register, read a new value, and run an ordinal compare on any two words.

But wait-- we might need to output something from our compare to a piece of memory, and the only way we have of writing to memory is through our buffer. So instead of reading from and writing to a single bit, we need to build our buffer such that it's capable of reading from at least 1 bit, and writing to at least 2 bits.


#####
rdddc      write true from buffer to register 1
#####
rddhc      write true from buffer to register 2
#####
rhddc      read true to buffer from register 1
#####
 #<#
 #h#
 #^##
##hd#
#dh##
##^#
 #h#
 #<#
#####
rdddc     write false from buffer to register 1
#####
rddhc     write false from buffer to register 2
#####
rhhdc     read false to buffer from register 1
#####

You can see this is pretty similar to our previous addressing system. The only differences are that we have only a 1 bit address (2 values, register 1 or register 2) and that we have extra pathways, allowing us to write to multiple locations. In reality, we probably want to be able to write to more than two locations from our buffer, but we can get away with reading from only one extra location. We could string these buffers along: we could write to buffer 1 instead of register 1, for instance, and from there, write to register 1 or 2, and write to buffer 2 instead of register 2, from there writing to registers 3 and 4, but that's unnecessary. To write to four different locations, we just need 2 new paths for every extra location, for a total of 8 write pathways, and we need an extra door, to allow addressing of 4 different locations. 4 locations will be enough. (Two are going to be data registers, and one is going to be an instruction register, that keeps track of what our instruction is while we execute it, but we'll get to that later. The extra location is just that-- extra. We can do whatever we want with it later. It's never a bad idea to build in reserve like this when you can.)

######
rdddhc      write true from buffer to instruction register
######
rddhhc      reserved
######
rddddc      write true from buffer to register 1
######
rddhdc      write true from buffer to register 2
######
rhdddc      read true to buffer from register 1
######
 #<#
 #h#
 #^##
##hd#
#dh##
##^#
 #h#
 #<#
######
rdhdhc     write false from buffer to instruction register
######
rdhhhc     reserved
######
rdhddc     write false from buffer to register 1
######
rdhhdc     write false from buffer to register 2
######
rhhddc     read false to buffer from register 1
######

Our complete buffer. Next, of course, we'll need a few registers. But before that, we have to know exactly what kinds of operations we want our registers to do: for instance, we've shown how the capability to add or subtract can be built into memory. Before I recreate that, though, I want to try and figure out if I can make a look-ahead adder, and I need to spell out exactly which operations our registers are capable of.


The two-bit add revisited: Examining the look-ahead adder

If you remember, I demonstrated two different ways to add earlier in the thread.

Two? Yes. First I demonstrated an adder that adds a bit, sending a percolating effect through the word, waits 200 ticks, then adds the next bit. After that, I demonstrated a way to add multiple words, by keeping track of a carry bit between adds. If you think about it, the second method would work fine for single bit words as well as it does for two-bit words or eight-bit words, although it should be clear that this method leads to much slower computation. Either method is very space efficient; we need slightly modified memory, a delay circuit, a bit of memory for our carry, and a disgustingly simple operation pipeline.

The second version is a full adder (that means we add two integers of any word size). The first is a ripple-carry adder. That means that the carry ripples through the word, and we wait for the carry to ripple before continuing the add. Why is the first method so much faster than the second method? It's because we're actually processing carry adds and our main add in parallel. We add the first bit, then ripple the carry; the thing is, we don't wait for the ripple to finish before processing the next add. We can do this because each of our bits of memory is capable of processing on its own. By the time we reach the third bit, any carry from our first add is being rippled through the sixth bit, and any carry from our second add is rippling through the fourth bit. We're actually processing three bits at once. The reason we need any kind of delay at all is because of the latency of our memory-- we can't write a carry and an add at the same time.

That's still slow. It's still going to take over a day to process the addition of two 6-bit words. There's another way to do it. What we're going to do is include processing our carry with our add.

Adding our first bit is easy. There's no carry, so we only need three pathways: adding a 1 to a 0, adding a 1 to a 1, or adding a 0 to a 0.

 #######
##hh   1
# ######
r#dd   2
# ######
## #hd##
 ## ## 3
  ##dh##
   #####

Here we have 3 continuation points, labelled 1 to 3. You'll notice that two paths lead to continuation 3, but only one is ever going to be open. You might also notice that this is extraordinarily similar to the beginning of our ordinal compare. We have an XNOR for the top two paths and an XOR for the bottom two paths.

Let's consider what each of the continuation points means. 1 means neither bit was true. 0+0 is 0, so we write 0 to the output, and we also write 0 to our carry. 2 means both were true. 1+1 is 2, so we write 0 to our output, and 1 to our carry. 3 means one or the other was true, but not both. 1+0 is 1, so we write 1 to the output, and we write a 0 to our carry.
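The three continuation points implement a half adder. In Python terms (hypothetical naming):

```python
def half_add(a, b):
    """First-bit add, no carry-in. Returns (output bit, carry bit):
    continuation 1 (neither true)  -> (0, 0)
    continuation 2 (both true)     -> (0, 1)
    continuation 3 (exactly one)   -> (1, 0)
    """
    return a ^ b, a & b
```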

Adding the next bit is where it gets ugly. This is an octopus function.

      ######
     ##ddd 1
    ## #####
   ## #hhd 2
  ## #######
 ## #  hdd 3
## #########
# #    dhd 4
r###########
# #    hdh 5
## #########
 ## #  dhh 6
  ## #######
   ## #ddh 7
    ## #####
     ##hhh 8
      ######

The leftmost door or hatch corresponds to the second bit of the first word; the next door, the second bit of the second word. The third corresponds to the state of our carry bit. Essentially, what we're going to do is to get around rippling our carry by including it in our add so that we aren't limited by the latency of our memory.

I don't know what to call this. I'm not sure it exists in an electronic version.

Let's make sure we know what the 8 completion points mean:

  1. Both bits true, carry true: output 1 to bit and carry
  2. Both bits false, carry true: output 1 to bit, 0 to carry
  3. First bit false, second bit true, carry true: output 0 to bit, 1 to carry
  4. First bit true, second bit false, carry true: ditto
  5. First bit false, second bit true, carry false: output 1 to bit, 0 to carry
  6. First bit true, second bit false, carry false: ditto
  7. Both bits true, carry false: output 0 to bit, 1 to carry
  8. Both bits false, carry false: output 0 to both bit and carry
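Taken together, the eight completion points are exactly a one-bit full adder. A sketch in Python (my naming):

```python
def full_add(a, b, carry_in):
    """One bit position of the octopus: returns (output bit, carry out).
    Matches the eight completion points, e.g. both bits true with
    carry true -> (1, 1); both false with carry true -> (1, 0)."""
    total = a + b + carry_in
    return total & 1, total >> 1
```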

We could write the carry to memory (ideally, more than one bit, maybe alternating, maybe 8 bits, to get around latency)-- unless--

Let's rearrange this and expand it.

            ####                   ####        
           ##  ##                 ##  ##       
      ######d## ##           ######d## ##      
     ##hdhh5#### ##         ##hdhh5#### ##     
    ## #####h<### ##       ## #####h<### ##    
   ## #hhdh2###### ##     ## #hhdh2###### ##   
  ## #######d    ## ##   ## #######d    ## ##  
 ## #  dddh1##### ## ## ## #  dddh1##### ## ## 
## #########h<#### ## ### #########h<#### ## ##
# #    hddh3####### ## # #    hddh3####### ## ##
r###########d         #r###########d         # 
# #    dhdh4####### ## # #    dhdh4####### ## ##
## #########h<#### ## ### #########h<#### ## ##
 ## #  ddhh7##### ## ## ## #  ddhh7##### ## ## 
  ## #######d    ## ##   ## #######d    ## ##  
   ## #dhhh6###### ##     ## #dhhh6###### ##   
    ## #####h<# # ##       ## #####h<# # ##    
     ##hhhh8#### ##         ##hhhh8#### ##     
      ######d## ##           ######d## ##      
           ##  ##                 ##  ##       
            ####                   ####        

Nope. Doesn't work.

What I tried to do was see if I could design it in such a way that tripping a carry would last until the goblin next reached a carry=true completion point. In general, you've got 10 moves before the close part of a pressure plate signal comes; but even with redesign, I've got 23 moves, many of them diagonals. Not going to work.

However, this is still a decent design for memory based carry bits-- because of the latency of memory, we have to make sure that we're not going to write to it particularly frequently! In this case, I see that our shortest path is 23 straight steps long, no problem for using a single bit to store carry. Enough time to propagate the close from carry no longer being true, enough time by far for a goblin to move from false position to true position.

But our longest carry path is still 23 steps! That's far longer than our path with our parallel ripple add. This design is going to function more slowly than our parallel ripple add. 200+ ticks per bit is good enough, I guess.

It was a good design exercise, in any case.

Improved XOR

I've been using a fully expanded (two path) XOR, but I just realized this isn't necessary.

 ###
##d##
 dhh
#####

This is a single path XOR that will work as well as our two-path XOR for everything but our compare function. With it, our octopus can become a sextopus. That's not enough to save it though.

Look-ahead Adder

There's one more design to consider. Can we add in parallel, design a two-bit carry, then add our carry in parallel? It might be problematic-- what if our carry generates its own carry? Of course, it couldn't with two-bit word size, like we've been demonstrating; we have to consider it in something bigger, like four bits. It would look quite a bit like our previous carry check pathway: slower than it takes to propagate ripple carries.

What if we evaluated each bit in parallel (at the same time that we added each bit in parallel) for its ability to create, propagate, or end a carry? We'd know where carries began, we'd know where carries ended, and we could send a minimum of carries, all at the same time, to propagate in parallel. Each carry evaluation would be the same size (about 20 tiles across), so we'd have our addition and our look-ahead completed after about 200 ticks, regardless of our word size. Then we'd propagate the carries, at the precise locations called for.

In a best case scenario (no carries), we'd add two words of any size in 200 ticks. In a worst-case scenario, we'd have to wait an extra 800 ticks or so for propagation across an 8-bit word. 1600 ticks, down to 200-1000 ticks? Less than half the time.

I didn't really want to design it, but I guess I have to now. We'll do it in three bits.

Assume memory a1-a3 (operand 1), b1-b3 (operand 2), c1-c3 (output)

 ####
##dh#
# dh1
r#####
# dd2
# ###
##hh3
 #### 

This is our basic add. Each operand is linked to two doors and two hatches. Top is an XOR, completion at 1 outputs a 1. Bottom is an XNOR, completion at 2 or 3 outputs a 0. We have one of these for each bit-- so 3 of them. Each can be run in, at most, 5 steps (2 diagonal)-- that's about 65 ticks to add all of our bits with no carry.

Why separate completion points? We're going to keep track of carry state. Completion 1 propagates a carry. Completion 2 creates a carry. Completion 3 ends a carry.
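In conventional carry-lookahead terms, the three completion points classify each bit pair as propagate, generate, or kill (what this page calls "end"). A sketch (my naming):

```python
def carry_state(a, b):
    """Classify one bit position's effect on an incoming carry:
    completion 1 (XOR, exactly one true)  -> 'propagate'
    completion 2 (both true)              -> 'generate'
    completion 3 (both false)             -> 'end'
    """
    if a and b:
        return "generate"
    return "propagate" if (a or b) else "end"
```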

In an electronic circuit, carry determination would normally be done in parallel with adding, I believe, but in our circuit, our carry determination is our add. So it's sort of like running in parallel.

Notice that we need a few more bits of memory. We need to keep track of each bit's carry status: propagate, generate, or end. We could design special 3-state memory for this purpose, but for now, let's just imagine that we have 3 carry state bits, each keeping track of propagate T/F, generate T/F, end T/F. Memory latency adds another 105 ticks.

Now, we need to create a carry word out of our carry states. This can turn into another octopus. It's simpler in the case of 3-bit words. Let's spell out what our LCU(2) (look-ahead carry unit, bit 2) does in pseudocode before we design it.

If generate(1) OR (generate(0) AND propagate(1)) then carry(2):=1

We can't just do an else with creature logic. We have to specify the else.

If end(1) OR (propagate(1) AND (end(0) OR propagate(0))) then carry(2):=0

Let's simplify that last one.

If end(1) OR (propagate(1) AND NOT generate(0)) then carry(2):=0

Pay careful attention to the delimiters. End(x) means that at bit location x, carry state is end. Carry(x) refers to the carry bit (not the carry state!) at bit location x.

What about bits 0 and 1? Bit 0 can't be written to by a carry. We don't need an LCU(0). Bit 1 gets a carry IFF generate(0).

We need one more bit, so that we can use multiple words to add big numbers-- a carry(3) and LCU(3).

If generate(2) OR (propagate(2) AND (generate(1) OR (propagate(1) AND generate(0)))) then carry(3):=1

If end(2) OR (propagate(2) AND NOT (generate(1) OR (propagate(1) AND generate(0)))) then carry(3):=0

Just for shits and giggles, let's spell out the carry bit for carry(9)-- 8-bit words.

If generate(8) OR (propagate(8) AND (generate(7) OR (propagate(7) AND (generate(6) OR (propagate(6) AND (generate(5) OR (propagate(5) AND (generate(4) OR (propagate(4) AND (generate(3) OR (propagate(3) AND (generate(2) OR (propagate(2) AND (generate(1) OR (propagate(1) AND generate (0) (with a whole mess of closing delimiters here) then carry(9):=1

If end(8) OR (propagate(8) AND NOT (generate(7) OR (propagate(7) AND NOT (generate(6) OR (propagate(6) AND NOT (generate(5) OR (propagate(5) AND NOT (generate(4) OR (propagate(4) AND NOT (generate(3) OR (propagate(3) AND NOT (generate(2) OR (propagate(2) AND NOT (generate(1) OR (propagate(1) AND NOT generate (0) (with a whole mess of closing delimiters here) then carry(9):=0

What an ugly fucking mess. I think you can see why I didn't want to have to design this. However, I think it's clear how to extend the technique to any bit length.
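Those nested formulas are just the unrolled form of the standard lookahead recurrence carry(i+1) = generate(i) OR (propagate(i) AND carry(i)). A sketch for any word length (bit lists are least-significant-first; naming is mine):

```python
def carry_word(a_bits, b_bits):
    """Compute the carry into every bit position at once, without
    rippling. carries[i] is the carry into bit i; the final entry is
    the carry out of the whole word (the 9th bit for 8-bit words)."""
    carries = [0]  # bit 0 can't be written to by a carry
    for a, b in zip(a_bits, b_bits):
        g = a & b   # carry state 'generate'
        p = a ^ b   # carry state 'propagate'
        carries.append(g | (p & carries[-1]))
    return carries
```

Expanding the recurrence symbolically, instead of iterating, reproduces the generate/propagate chains written out above.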


Time to design our LCU(2).

    ####
   ##dd##                 AND propagate(1) AND generate(0)
  ## ## #
 ##d#d ##    propagate(2) AND generate(1)
## ### ##  
# #d  #1#    generate(2)                                            carry(2):=1
r########
# #d  #2#    end(2)                                                 carry(2):=0
## ### ##
 ##d#h# #    propagate(2) AND NOT generate(1)
  ## ## #
   ##hh##                 AND NOT (propagate(1) AND generate(0))
    ####

Looks like a 10-long path at worst. That means with 3-bit adds, we can perform at 600 ticks with parallel ripple, or ~250 ticks with XOR(a,b), LCU(a,b), XOR(a, LCU). We don't really have to worry about the timing of LCU(3)-- it won't be accessed this instruction, giving us plenty of time.

Crazy time: LCU(7). Carry(7):=1 arm displayed. This is much more than an octopus.

            ####
           ##dd##                                                                             AND propagate(1) AND generate(0)
          ## ## #
         ##d#d ##                                                            AND propagate(2) AND generate(1)
        ## ## ##
       ##d#d ##                                             AND propagate(3) AND generate(2)
      ## ## ##
     ##d#d ##                              AND propagate(4) AND generate(3)
    ## ## ##
   ##d#d ##               AND propagate(5) AND generate(4)
  ## ## ## 
 ##d#d ##    propagate(6) AND generate(5)
## ### ##    OR
# #d  #1#    generate(6)                                            carry(7):=1
r########

Max length, 26 steps. LCU(8) would run for 30 steps. ~365 ticks. XOR, LCU, XOR for an 8-bit add in less than 600 ticks, whereas the parallel ripple add would take us 1600. Total size, 29x10x1, plus infrastructure.

The really cool thing about this operation is that it generates a word that, when added to the original addition, generates NO carries. So when we do our second XOR, LCU(x)=0, for any x.
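The whole add is then XOR, LCU, XOR, and the second XOR is carry-free by construction: each sum bit is a_i XOR b_i XOR carry_in(i), and the LCU word holds exactly those carry-ins. A sketch over 8-bit integers (my framing of the scheme):

```python
def lcu_add(a, b, width=8):
    """Add via XOR(a, b) ^ carry_word. Returns (sum mod 2**width,
    carry out) -- the carry out is the LCU word's extra 9th bit."""
    partial = a ^ b
    carries = 0
    c = 0
    for i in range(width):
        g = (a >> i) & (b >> i) & 1    # 'generate' at bit i
        p = ((a >> i) ^ (b >> i)) & 1  # 'propagate' at bit i
        c = g | (p & c)
        carries |= c << (i + 1)        # carry lands one bit to the left
    return partial ^ (carries & ((1 << width) - 1)), carries >> width
```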

Operation 0101: XOR with LCU

So we're going to do adds via XOR, with a write to LCU. That's really cool, because it means we can manage with very few instructions. We were going to need an XOR anyways, to write to output.

From here on out, we're talking about using 8 bit words, containing bits 0-7. Operation XOR will take operands a and b, write XOR to a, and write an LCU-generated word to b. Operand b will have to be 9 bits long; XOR will write a zero to bit 0 and output to bits 1-8, whereas add will operate on bits 0-7.

Of course, we need to get our inputs in, and outputs out.

Operation 0001: Write from memory to register a
Operation 0010: Write from memory to register b
Operation 0011: Write from register a to memory

In reality, these operations will consist of an intermediary step-- writing to the buffer. Since we're now working with 8-bit words and 4-bit addresses, and these are 4-bit instructions, we can now specify a memory address to read from or write to.

But wait a minute. What if we want to add larger numbers? Numbers too large to be expressed in 8 bits? We've broken our carry functionality by our redesign. We need a way to access LCU(8).

Operation 0110: Arithmetic right shift

What we can do is shift LCU right 8 times! We'll move the value of each bit from x to x-8, and fill in the blank spaces with zeroes. So LCU=100000000 -> LCU=000000001, and we can XOR it with the next value to carry the one from an earlier addition.

But arithmetic right shift is a really useful function for other stuff too. We'll make it so that arithmetic right shift operates on register b, and takes an argument in the last four bits specifying the number of bits to shift. Since 0000=0 shifts, which is totally useless, we'll make an argument of 0000 shift it 8 times, rather than zero times; and since we might want to use this for other functions, we'll make shifts of fewer than 8 drop the 9th bit (read in 0s instead). That way, we can also use this to create an XOR word for toggling output memory.
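A sketch of that shift rule, assuming the 9-bit b register holds the saved LCU word (the argument encoding follows the description above; the function name is mine):

```python
def arith_right_shift(b_reg, arg):
    """Operation 0110 on the 9-bit b register.

    arg: 4-bit shift count. 0000 means shift 8 times, bringing the
    9th bit (a saved carry) down to bit 0. Shifts of fewer than 8
    drop the 9th bit first, reading in zeroes from the left."""
    n = 8 if arg == 0 else arg
    if n < 8:
        b_reg &= 0xFF  # drop the 9th bit
    return b_reg >> n
```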

Of course, we'll need to be able to save the LCU value to be able to do this.

Operation 0100: Write from register b to memory

We'll want a left shift as well, and we might want to do a bit rotate too. A left shift is just like a right shift but in reverse-- but we can do this by adding a number to itself. A right-hand bit rotate is just like a right shift, except it doesn't write in zeroes on the right hand side. Since there's no possible reason to want to do this with our LCU, we'll ignore the 9th bit of b.

Operation 0111: Right-hand bit rotate

There's one thing you might be worried about. We totally forgot about subtraction! Before, we had this beautiful memory design that can be used to add or subtract, but now, our system is so complex that nobody knows if we can subtract with it. Don't worry-- there's another trick we can do to make everything work.

Operation 1000: Twos complement

By changing a number to its two's complement value, we can just add it instead of subtracting it. That's next, as well as:

Operation 1001: Compare
Operation 1010: Jump if
Operation 1011: Long jump if

That leaves us with five operation codes to spare: 0000, 1100, 1101, 1110, and 1111.
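As a closing sketch of the two's complement trick from Operation 1000 (Python, my naming), showing why negate-then-add replaces subtraction in 8-bit arithmetic:

```python
def twos_complement(x, width=8):
    """Operation 1000: negate x so that subtraction becomes addition:
    (a + twos_complement(b)) mod 2**width == (a - b) mod 2**width."""
    return (~x + 1) & ((1 << width) - 1)

# Example: 100 - 58 becomes 100 + 198, which is 42 mod 256.
```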