User:Vasiln/Goblin Logic 3

From Dwarf Fortress Wiki
Revision as of 20:38, 6 September 2011 by Vasiln (talk | contribs)

Optimization: Opcodes revisited

Right now, we have seven operation codes to spare: 0000, 1010, 1011, 1100, 1101, 1110, and 1111; plus 15 unused arguments for XOR and 8 unused arguments for bit shift or bit rotate. Way more than we need. At the same time, we don't have nearly enough memory. We could do a couple of things: we could add instructions, so that, for instance, it wouldn't take two instructions to add two numbers. We could also switch to a 3-bit opcode and a 5-bit address.

Or, we could do both. Let's redefine our operations:

  • 000xxxxx Write from xxxxx to register a
  • 001xxxxx Write from xxxxx to register b
  • 010xxxxx Write from register a to xxxxx
  • 011xxxxx Write from register b to xxxxx
  • 100xxxxx Register operations, receives xxxxx as specifying argument
  • 101xxyyy Bit shifting register operations, receives xx as specifying argument, yyy as bit value to shift
  • 110xxxxx Jump if not b to address xxxxx
  • 111xxxxx Reserved (Halt?)

Two of these need specifying.
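As a software sketch, the split above is just a matter of masking and shifting. This is my own restatement of the table, not part of the design; the function name and return shapes are invented for illustration:

```python
def decode(instruction):
    """Split an 8-bit instruction into a 3-bit opcode and a 5-bit argument.

    Opcode 101 splits its argument further into a 2-bit specifier (xx)
    and a 3-bit shift amount (yyy), per the table above.
    """
    opcode = (instruction >> 5) & 0b111    # top three bits select the operation
    arg = instruction & 0b11111            # bottom five bits: address or specifier
    if opcode == 0b101:                    # bit-shift ops split further
        return opcode, (arg >> 3) & 0b11, arg & 0b111
    return opcode, arg
```

For example, `decode(0b10101011)` yields opcode `101`, specifier `01`, and shift amount `011`.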

100xxxxx Register operations

  • 10000000 XOR a and b -> a
  • 10000001 Add: XOR a and b -> a, write 9-bit carry word to buffer, XOR a and carry word -> a, write 8-bit carry word to buffer, with carry(8) = output of first XOR(8) OR output of second XOR(8)
  • 10000010 Add with carry: add a and 0000000x, where x is equal to carry(8), then add a and b, outputting carry(8)
  • 10000011 XOR a with 11111111
  • 10000100 Add 1 to a (XOR a and 1, XOR a and carry word)
  • 10000101 Compare a to b, output to b
  • 10000110-10011111 Reserved (26 choices!)
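The add in 10000001 is the familiar software trick of adding with only XOR and AND: XOR gives the sum ignoring carries, AND-and-shift gives the carry word, and you feed the carries back in. A minimal sketch (iterating until the carry word is empty, rather than the two fixed hardware passes described above):

```python
def add8(a, b):
    """8-bit add built from XOR and a carry word.

    Returns (sum mod 256, carry(8)) for 8-bit inputs a and b.
    """
    while b:
        carry = (a & b) << 1  # carry word: positions where both bits are 1
        a = a ^ b             # partial sum, ignoring carries
        b = carry             # carries become the next addend
    return a & 0xFF, (a >> 8) & 1
```

So `add8(200, 100)` gives the 8-bit sum 44 with carry(8) set, since 300 overflows a byte.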

101xxyyy Bit shifting register operations

  • 10100yyy Arithmetic right shift 0-7 units (y units)
  • 10101yyy Arithmetic left shift 0-7 units (y units)
  • 10110yyy Right rotate 0-7 units (y units)
  • 10111yyy Reserved

Note that there's no reason to send a y of 000, so I suppose 10100000, 10101000, and 10110000 are also reserved values. We'll try to save 10111yyy for an operation that requires a 3-bit argument-- maybe something like "write true" or "write false" or "bitwise XOR."
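For reference, here is what those three shift operations do to an 8-bit value, sketched in software (function names are mine; note an arithmetic right shift copies the sign bit in from the left):

```python
def asr8(a, y):
    """Arithmetic right shift of an 8-bit value: the sign bit is copied in."""
    sign = a & 0x80
    for _ in range(y):
        a = (a >> 1) | sign
    return a

def asl8(a, y):
    """Left shift of an 8-bit value, dropping bits off the top."""
    return (a << y) & 0xFF

def ror8(a, y):
    """Right rotate of an 8-bit value by y positions (y = 0..7)."""
    y %= 8
    return ((a >> y) | (a << (8 - y))) & 0xFF
```

E.g. `asr8(0b10000000, 1)` keeps the sign bit, giving `0b11000000`, while `ror8(0b00000001, 1)` wraps the low bit around to `0b10000000`.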

So I just bit the bullet. We made a carry byte; we established a carry bit. We also built two constants into our system: 1 and -1. We've succumbed to 5-bit addressing, but we still have plenty of reserved instructions, and we can add, subtract, multiply, or divide with a smaller instruction count. Finally, we're ready to design the kind of memory to be used by our registers:

########
<h^hh^h<
########

Lol. All of our functionality is built into our operations circuits or our buffer. Our register memory doesn't need to do anything special.

A note on the notion of "Turing complete"

To be Turing complete, a computer has to have four things: a NAND, a NOR, a jump if, and infinite memory.

The idea is that a computer with all of those things can run any program that any computer imaginable can run. Those are our minimum specifications.

You might notice that we fail one of those criteria. That's okay: so does every other computer that exists. Typically, when people talk about being "Turing complete," they ignore the infinite memory part.

So if all we need are 3 operations, why do we have the rest? If you've been reading the programs so far, you might imagine that it's for reasons of speed and memory space. I could write an XOR with a NAND and NOR. I could write a bit shift with an XOR and a jump if-- it would just be long and slow. I could redesign my look-ahead unit in software. I could store carry bits in main memory.

With only 32 bytes, operating under circumstances that are less than ideal-- I need all of the speed and memory space I can get.
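The claim that XOR can be rebuilt from simpler gates is easy to check: the standard construction uses four NANDs. A quick sketch (my illustration, not part of the design):

```python
def nand(x, y):
    """NAND on single bits (0 or 1)."""
    return 0 if (x and y) else 1

def xor(x, y):
    """XOR built from four NAND gates: the classic construction."""
    m = nand(x, y)
    return nand(nand(x, m), nand(y, m))
```

Running all four input pairs through `xor` reproduces the XOR truth table (0, 1, 1, 0) -- slow in goblin hardware, which is exactly the point: dedicated XOR circuits buy speed, not capability.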

Optimization: Memory Addressing Revisited

We spent a lot of time optimizing our adder, and ended up with a parallel add/look-ahead that was very fast-- less than half the time of our parallel ripple, and far faster than a full adder. What I forgot to mention is that there was a lot of stupid delay built into the add just by virtue of requiring two memory accesses (two XOR operations). We've been assuming that a single goblin will address all of our bits in a particular bit location, but that takes a lot of time: tripping the hatch takes 100 ticks just to get him started, and with 32 bits to address, we're probably looking at something like 30 moves to make it to our addressable bit-- so we've got around 400 ticks just to read memory/buffer, followed by 100 ticks of latency for our memory.

We can cut this in half by addressing our bits in parallel.

      #########
   ####       ###
  ##   ######d  ##
  # ####    ^### #
  # # #hhhhh#hX# #                    null path, if not addressed
  # #       #### #
  # # ########d  ##
  # # ddddddd^### #                   write true to memory
 ## # ########hX# #
 # ## ########### #
 #h#d ########d  ##
 ##^# ddddddd^### #  reset position   write true to buffer
 #h## ########hX# #
 #X## ########### #
 #h## ########d  ##
 #^## #######^### #  memory true
##hd# ddddddd#hX# #                   write true to buffer
#dh## ########### #
##^## ddddddd#d  ##  memory false     write false to buffer
 #h## #######^####
 #X###########hX#
 ###         ####

This is memory bit 11111(0): that means it is addressed by our memory address being 11111, and it reads/writes buffer bit location 0. For memory at address 01111, we would see the first door of each path replaced with a hatch, and the first hatch of our null path replaced by a door.

We've replaced our begin hatch with a begin door. When it gets tripped, the goblin begins operations. That lowers latency, so long as we can guarantee that the reset path doesn't become available before that door closes. He returns r/w t/f or null via the top NAND, then returns to reset. Longest time to completion: 12/4, 176 ticks, followed by another 100 ticks of memory latency. Time to reset isn't important-- it can happen simultaneously with other operations.

I could make it smaller, and thus faster, by using z-levels, but I don't want to do that. There is a lot to be said for keeping your design understandable. By the time I'm done, even I won't understand what I'm saying or what I've done.

One memory cell needs 2 goblins, 52 doors/hatches, and 80 mechanisms, if I counted right. Entire memory space will thus require 64 goblins, 1664 doors/hatches, and 2560 mechanisms.
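Checking those totals, on my reading that "entire memory space" means 32 of these cells:

```python
# Per-cell counts from the text; cells = 32 is my assumption about
# what "entire memory space" covers.
goblins_per_cell, doors_per_cell, mechanisms_per_cell = 2, 52, 80
cells = 32

totals = (cells * goblins_per_cell,
          cells * doors_per_cell,
          cells * mechanisms_per_cell)
print(totals)  # (64, 1664, 2560)
```

The arithmetic checks out against the figures above: 64 goblins, 1664 doors/hatches, 2560 mechanisms.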

Our memory cell is large-- 19 wide, 22 tall. Sides are just walls. 8 of these, side by side, would take up 128 tiles. That means that to keep a nice-looking computer, we need a 3-wide embark. An embark 2 squares tall would give us enough room for 4 of these. We might just need that-- each byte needs to be separated from the others by 2 dead z-levels, remember (we could get around that with offsets, but that requires visualizing everything at once). So we could fit our entire memory space inside 24 z-levels of one map.

I'd rather keep it subterranean. I don't like the idea of a roc flying in and killing my memory. I guess I could get away with bridged pitting squares instead.

Let's consider build order too. We want any linked hatches or doors to be nearby in the build queue-- so that we can find them with a minimum of keystrokes. In our case, we can't build address doors until we've built an address buffer, buffer T/F doors until we've built a buffer, or the begin door until we've built a cycle manager, but everything else in this circuit just links to itself.

Optimization: Memory Latency

So far, we've only been linking our operations to the true part of memory. We've been using FALSE and NOT TRUE interchangeably. But they're not quite interchangeable-- not really. Consider the following two circuits:

  ###      #####
###d####   # h #
Xh^hh^hX   r###c
####d###   # d #
   ###     #####

Let's link the leftmost plate of our memory (TRUE) to the door, and the rightmost plate of our memory to the hatch. Starting state of memory is TRUE. The door is open, and the hatch is closed.

What happens when we open the northern memory door at tick 0, allowing our goblin to move to false? At tick 5, he leaves his TRUE pressure plate. At tick 33, he reaches the FALSE pressure plate. The hatch opens at 33. The door doesn't close, however, until 105.

It happens in reverse, the other way. When we set him to true at tick 0, the hatch closes at tick 105, and the door opens at tick 33.

Our memory has two different latencies and a refractory period. The latency to switch to TRUE is the time it takes a goblin to move from FALSE to TRUE-- about 33 ticks. The latency to switch to FALSE is the time it takes a goblin to move off of TRUE plus 100 ticks-- about 105 ticks. The refractory period is the time after a write to memory during which we cannot write to memory again-- usually, 110 ticks (time for the door to close from a delay 10 goblin running a trip plate).
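The asymmetry falls out of the trigger mechanics: a plate opens its linked building as soon as the goblin arrives, but only closes it 100 ticks after he steps off. A rough model of the numbers above (constants are my reading of the timings in the text):

```python
LEAVE = 5          # ticks for the goblin to step off his starting plate
TRAVEL = 28        # remaining ticks to walk to the far plate (33 - 5)
CLOSE_DELAY = 100  # a plate's linked door/hatch closes 100 ticks after departure

latency_to_true = LEAVE + TRAVEL        # TRUE-linked building opens on arrival: 33
latency_to_false = LEAVE + CLOSE_DELAY  # must wait for TRUE-linked door to close: 105
refractory = 10 + CLOSE_DELAY           # delay-10 goblin trips plate, then close delay: 110

print(latency_to_true, latency_to_false, refractory)  # 33 105 110
```

Anything reading this memory has to be designed around the slower of the two numbers unless it can exploit the FALSE / NOT TRUE distinction directly.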

Usually, design that takes advantage of this disconnect between FALSE and NOT TRUE gets really byzantine: you need to combine ORs with NANDs in the same design, make twice as many arms, etc. Logic was never designed to differentiate FALSE from NOT TRUE. But we're still going to keep it in mind for future design.

Cycle Manager

Every cycle of our computer follows a certain flowchart. Let's spell it out.

1) Copy memory at pointer to buffer
2) Copy buffer to instruction buffer
3) Increment pointer
4) Evaluate instruction
5) Go to 1

That's all. The complicated part is "evaluate instruction," which is sometimes going to involve writing to various places, sometimes reading from various places, and sometimes running operations to copy to the buffer followed by operations to copy from the buffer to a register. But our basic cycle manager is very simple:

  ###############
 ##           h##
 # ############4#####    
 #d#hh1h^h23h^h#h#hX#
 ##r#############^###
##h#h#############h#
#X###d            ##
###################

r is reset. Each number corresponds to a pressure plate beginning that process. Each ^ corresponds to a pressure plate where we wait for completion of that operation. Notice that 2 and 3 can happen at the same time (we don't really need a separate pressure plate).

I've added two other things. One is a lever-operated door before reset. With this, we can permit operation or halt the system (very important if we accidentally program an endless loop!). I didn't design it very well, because it doesn't limit the goblin to a single tile, but it works. The other is a "back door" into the "execute instruction" part of the loop. This is important, because we need to get our values into memory to program the computer; we're going to do that by manually altering the instruction register and register a to write our program where we want it. We use the back door by opening it and closing off, via hatch, normal progress along the cycle. Again, the design was lazy and needs improvement.
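In software, the whole cycle flowchart is just a fetch-execute loop. A minimal sketch -- `execute` stands in for the messy "evaluate instruction" step, and treating the reserved 111xxxxx opcode as Halt is my assumption, not something the design has committed to:

```python
def run(memory, execute):
    """Fetch-execute loop over 32 bytes of memory.

    execute(instruction, pointer) performs step 4 and returns the next
    pointer value (so jumps just return a different address).
    """
    pointer = 0
    while True:
        buffer = memory[pointer]                 # 1) copy memory at pointer to buffer
        instruction = buffer                     # 2) copy buffer to instruction buffer
        pointer = (pointer + 1) % 32             # 3) increment pointer (5-bit addresses)
        if instruction >> 5 == 0b111:            # reserved opcode, used here as Halt
            return
        pointer = execute(instruction, pointer)  # 4) evaluate instruction (may jump)
        # 5) go to 1
```

Programming the computer through the back door amounts to stuffing `memory` by hand before calling `run` -- which is exactly what manually driving the instruction register does in the goblin version.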