Difference between revisions of "User talk:Jifodus/Dwarf Fortress Utility Framework"

Revision as of 09:24, 13 December 2007

Sphr's comments

Update: added some comments on binary version data at the end.

A suggestion to consider: unify the map type definition so that it can be used in inline definitions as well as global references.

In general a map has:

@name (optional) Does not need name if inlined
@size (optional) not needed for predefined types of known size.  
    Optional for most cases except when used as valuetype in array/vector
@type (optional) name of predefined/user-defined map
@(type-specific parameters) (e.g. vector/array/pointer may have a "valuetype", 
    array may also need a "length". complex will have a "mapping" of named offsets)

An element in a "complex" map's "mapping" has:

@name (required) name of mapping
@offset (required) address offset from base of map
@type (optional) name of predefined/user-defined map OR inlined defined type
@(type-specific parameters) (e.g. vector/array/pointer may have a "valuetype", 
    array may also need a "length". complex will have a "mapping" of named offsets)

Basically

Every map definition can be used inline in another definition where a "type" is expected.
Every named map definition can be referenced wherever a "type" is expected.
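The two rules above can be sketched in C++ as a "type" slot that holds either a name to look up or an inline definition. This is only an illustration: TypeRef, MapDef, Registry and resolve are hypothetical names, not part of any existing framework.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

struct MapDef;

// A "type" slot: either the name of a registered map or an inline definition.
// (Hypothetical model for illustration only.)
struct TypeRef {
    std::string name;                    // set when the type is referenced by name
    std::shared_ptr<MapDef> inline_def;  // set when the type is defined inline
};

struct MapDef {
    std::string name;   // optional: left empty for inline definitions
    std::size_t size;   // 0 means "not specified"
    std::string kind;   // "dword", "pointer", "vector", "complex", ...
    TypeRef valuetype;  // only meaningful for pointer/vector/array
};

typedef std::map<std::string, std::shared_ptr<MapDef> > Registry;

// Resolve a type slot to a definition, whichever way it was written.
std::shared_ptr<MapDef> resolve(const Registry& reg, const TypeRef& ref) {
    if (ref.inline_def) return ref.inline_def;       // inline wins, no lookup needed
    Registry::const_iterator it = reg.find(ref.name);
    if (it == reg.end()) throw std::runtime_error("unknown map: " + ref.name);
    return it->second;
}
```

With this shape, code that consumes a "type" never needs to know whether the definition was inlined or referenced by name.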

The following shows some examples (which may contain human errors). Pardon the bastardized Lua syntax: I added map{ ... } to mark out the part that defines a map, and I used [..] to denote a list.

E.g.

string_map = 
map{ 
   name="string",
   size=28,
   type="complex",
   mapping=[ 			// mapping is used only by "complex" type
   	{name="buffer", offset=0x04, type="array", valuetype="byte", length="16"},  
   	    // have to define length for array if arrays are used as valuetypes for other arrays/vectors
   	{name="pointer", offset=0x04, type="pointer", valuetype=map{type="array", valuetype="byte"} }, 
   	    // inline definition of byte array as valuetype
   	{name="length", offset=0x14, type="dword"},
   	{name="capacity", offset=0x18, type="dword"}
   ]
}

creature_33b_map = 
map{
	name = "creature_33b",
	type="complex",
	mapping=[
		{ name="first_name"	, offset=0x00, type="string" }, // refers to string type, no inlined definitions
		{ name="nick_name"	, offset=0x1C, type="string" }, // 0x1C = 28, the size of the preceding string
		....
	]
}

creature_33b_ptr_map =
map{
    name = "creature_33b_ptr",
    // maybe no need to define size.  size of predefined types can be assumed to be well-known
    type="pointer",
    valuetype="creature_33b"
}

creature_vector_33b_map =
map{
    name = "creature_vector_33b",
    type="vector",
    valuetype="creature_33b_ptr"
 }

df_33b_map =  
map{
  name="df_v0_27_169_33b",
  size = ???,
  type = "complex",   
  mapping= // start of list of mappings for complex type
  [  
   { name="main_creature_vector",  offset=0x0141FA30, type = "creature_vector_33b"  },
   
    // the following is similar to "main_creature_vector", but defined almost entirely inline (except string)
   
   { name="creature_vector_2",  offset=0x01417A48,
        type = "vector",
        valuetype = map{
            type = "pointer",
            valuetype = map{
                type="complex",
                mapping=[
                    { name="first_name"	, offset=0x00, type="string" },
                    { name="nick_name"	, offset=0x1C, type="string" },
                    ....
                ]
            }
        }
    },

    
    ... 
    
  ] // end of list of mappings
 }

Notice that the mapping defined for the whole process is no different from one defined for a structure. The process structure is just a global structure bound to address 0x0000 of the DF process at run-time.

As for versions, I suggest a separate section or even a separate file.

versions={
  {  version="v0_27_169_33b", 
    timestamp="????",  // include a variety of data to support different binary identification methods. 
    crc32="????",
    map="df_v0_27_169_33b"  //name of base process's memory map, as defined earlier
  }, 
  
   {  version="v0_27_169_33c", 
    timestamp="????",  // include a variety of data to support different binary identification methods. 
    crc32="????",
    map= {... } //can even define the whole monstrous structure inline!
  }, 
  ...
  
}
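As a rough sketch of how such a table might be consulted, the following C++ looks up an entry by timestamp. VersionEntry and FindByTimestamp are hypothetical names, and matching on the timestamp alone is an assumption made for brevity; the other fields could be checked in the same way.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record mirroring one entry of the versions table above.
struct VersionEntry {
    std::string version;      // e.g. "v0_27_169_33b"
    std::uint32_t timestamp;  // build timestamp of the binary
    std::uint32_t crc32;      // checksum of the binary, if known
    std::string map;          // name of the process memory map to use
};

// Return the first entry whose timestamp matches, or a null pointer if none does.
const VersionEntry* FindByTimestamp(const std::vector<VersionEntry>& table,
                                    std::uint32_t timestamp) {
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (table[i].timestamp == timestamp) return &table[i];
    }
    return nullptr;
}
```

Returning a null pointer for an unknown binary lets the caller refuse to attach rather than read memory with the wrong map.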

Additional notes:

This is some stuff taken from what I did, which can hopefully be of use as ideas even if your method is different. A memory map just describes a structure. A run-time mapped memory object simply consists of an ordered pair: a real address acting as a base, and a map.

e.g. the df_process "memory object" is simply (0x0000, getmap("v0_27_169_33c")) or something similar. Note that in the FULL case, it probably needs the process handle as well, since it is addressing another process's memory, i.e. { baseaddress=???, process_handle=???, map=??? }

Not sure about your implementation, but from what I have tried out, I find that it works better if there is a means to automatically keep track of the memory objects (their base address as well as their map) returned by queries of named offsets. Example follows:

say I create a df_process object

df_obj = CreateMemObject(0x0000, hProcess, getmap("v0_27_169_33c"))

where hProcess is the process handle the program has to obtain, and getmap just returns the process memory map structure for a given binary version.

If, say, I call a function to retrieve the named offset "main_creature_vector", e.g.

my_creature_vector = GetSubObject(df_obj, "main_creature_vector")

I should get back something equivalent to

my_creature_vector == (0x0141FA30, hProcess, getmap("creature_vector_33b"))

where the base address, process handle and resultant map are all resolved automatically, so that the end user doesn't have to deal with addresses and such.
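A minimal C++ sketch of that resolution step, assuming maps are stored by name and a sub-object is simply (base + offset, same handle, member's map). Member, Mapping, MapRegistry and MemObject are hypothetical types, and the void* merely stands in for the process handle.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical types for illustration; none come from an actual framework.
struct Member {
    std::uint32_t offset;  // offset from the base of the enclosing map
    std::string type;      // name of the member's own map
};

typedef std::map<std::string, Member> Mapping;       // members of one "complex" map
typedef std::map<std::string, Mapping> MapRegistry;  // all named maps

struct MemObject {
    std::uint32_t base;  // address in the target process
    void* process;       // stand-in for the process handle (hProcess)
    std::string map;     // name of the map describing the bytes at base
};

// Resolve a named offset: new base = old base + member offset, same handle,
// and the member's map becomes the map of the returned object.
MemObject GetSubObject(const MapRegistry& maps, const MemObject& obj,
                       const std::string& member) {
    MapRegistry::const_iterator m = maps.find(obj.map);
    if (m == maps.end()) throw std::runtime_error("unknown map: " + obj.map);
    Mapping::const_iterator f = m->second.find(member);
    if (f == m->second.end()) throw std::runtime_error("unknown member: " + member);
    MemObject sub;
    sub.base = obj.base + f->second.offset;
    sub.process = obj.process;
    sub.map = f->second.type;
    return sub;
}
```

Because the result is itself a MemObject, the same call can be chained to walk from the process down to a creature's fields.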

// mock up program (with no additional object wrappers, have to know memory maps)

df_obj = CreateMemObject(0x0000, hProcess, getmap("v0_27_169_33c"));
ASSERT(df_obj)
my_creature_vector = df_obj.GetSubObject("main_creature_vector");
ASSERT(my_creature_vector)
num_creatures = my_creature_vector.GetLength();
for( i=0; i < num_creatures; ++ i ) {
	acreature = my_creature_vector.GetIndexedObject(i);
	first_name = acreature.GetSubObject("first_name");
	// print out first name
	// etc
}

// mock up program (with object wrapping, so that you don't have to deal with memory maps after binding)
// i.e. you don't have to use the GetSubObject("...") method, which can be error-prone if the user gets the string wrong.

df_obj = CreateMemObject(0x0000, hProcess, getmap("v0_27_169_33c"));
ASSERT(df_obj)

DFWrappedProcess df_wrapped_obj(df_obj); //creates wrapped object
ASSERT(df_wrapped_obj.IsValid())

int num_creatures = df_wrapped_obj.GetCreatures().GetLength();

for( i=0; i < num_creatures; ++ i ) {
	DFWrappedCreature acreature = df_wrapped_obj.GetCreatures().GetIndexedObject(i);
	first_name = acreature.GetFirstName();
	// print out first name
	// etc
}

Of course, the above is a little cumbersome due to C++'s strong typing. Perhaps you can come up with an even easier-to-use version for Lua.

Sphr 03:00, 13 December 2007 (EST)

Response to Sphr's Comments + Current Implementation Details

Response to Sphr's Comments

Correct me if I'm reading your comments incorrectly (it probably wasn't a good idea to respond while my brain is falling asleep).

I think I've got the basic system down already, though some changes will probably still be made (fortunately it's still in development, so the structure can still change). A rough idea, taken straight from the data files as they stand right now:

Types[V0_27_169_33E]["raw"] = { size = 1 }; -- size is one, it represents a fixed
         -- array of chars which is done through overriding fixed_size
Types[V0_27_169_33E]["word"] = { size = 2 };
Types[V0_27_169_33E]["dword"] = { size = 4 };
Types[V0_27_169_33E]["pointer"] = { size = 4 };
Types[V0_27_169_33E]["string"] = { size = 28, members = {
	buffer = { type = { type = "raw", fixed_size = 16 }, offset = 0x4 },
	buffer_ptr = { type = "pointer", offset = 0x8 },
	length = { type = "dword", offset = 0x14 },
	capacity = { type = "dword", offset = 0x18 }
} };
Types[V0_27_169_33E]["creature"] = { size = 1636, members = {
	firstname = { type = "string", offset = 0x000 },
	nickname = { type = "string", offset = 0x01C },
	languagename = { type = "langname", offset = 0x038 },
	customprofession = { type = "string", offset = 0x06C },
	typeid = { type = { type = "word", fixed_size = 2 }, offset = 0x088 },
	...
	unknown1 = { type = { type = "vector", subtypes = { "word" } }, offset = 0x0B4 },
	...
} };
AddressMaps[V0_27_169_33E]["main_creatures"] = {
	type = {
		type = "vector",
		subtypes = { type = "pointer", subtypes = { "creature" } }
	},
	pointer = 0x01240AC8
};

Now to explain the above data definition. You have your basic types: raw (equivalent to Sphr's array type), word (a 2-byte integer), and pointer (a pointer to a memory location). Then there is the first complex object, the string. The only bit that really needs explaining is the type field of buffer: the type gets overridden. It takes the basic type (raw) and changes the fixed array size from the default of 1 to 16, so the internal object managing the type "raw" will correctly read the 16 bytes of the buffer. The typeid field of the creature structure works the same way. The next bit needing explanation is unknown1 of creature: it overrides the vector object to set the subtype to word. Then, when utilities start accessing indices of the vector, the framework creates the correct wrapper objects. The address map example takes the wrapping a step further by nesting the subtypes.
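The fixed_size override can be pictured with a small C++ sketch. TypeInfo, override_fixed_size and bytes_to_read are hypothetical names, and the rule "bytes to read = element size times fixed_size" is an assumption inferred from the 16-byte buffer example, not a statement about the framework's internals.

```cpp
#include <cstddef>

// Hypothetical description of a basic type such as "raw" or "word".
struct TypeInfo {
    std::size_t size;        // size of one element in bytes
    std::size_t fixed_size;  // number of elements, 1 by default
};

// Copy the base type and override only its fixed_size; the base is left untouched,
// so other members that use the plain type are unaffected.
TypeInfo override_fixed_size(const TypeInfo& base, std::size_t fixed_size) {
    TypeInfo t = base;
    t.fixed_size = fixed_size;
    return t;
}

// ASSUMPTION: total bytes = element size * element count.
std::size_t bytes_to_read(const TypeInfo& t) {
    return t.size * t.fixed_size;
}
```

For the string buffer, overriding raw (size 1) with fixed_size 16 yields a 16-byte read under this assumption.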

There are two data limitations (partially caused by framework limitations) that prevent me from directly following what Sphr suggested: it is unable to nest definitions, and you can't extend or override the member map. This means you can't create a vector object inline.

As for identifying DF versions, this is what's available in the data file:

Signatures[V0_27_169_33E] = {
	pe_timestamp = 0x475B7526,
	adler32_of_text_section = 0x????????,
	text_segments = {
		{ address = 0x00??????, segment_data = "\034\123d_l..." },
		{ address = 0x00??????, segment_data = "\234\143r*3..." },
	}
}

The PE timestamp is currently the only item checked; the rest is for future versions of the framework to use. Also, I avoided CRC because Wikipedia states that there is no single standard divisor on which the CRC is built (there are standards, but not one standard). Since Adler-32 does have a standard construction, I chose it instead.
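For reference, the standard Adler-32 construction (as defined in RFC 1950) fits in a few lines of C++. This is a plain textbook version, not the framework's code; how the text section bytes would be obtained is left out.

```cpp
#include <cstddef>
#include <cstdint>

// Standard Adler-32: two running sums modulo 65521, packed as (b << 16) | a.
std::uint32_t adler32(const unsigned char* data, std::size_t len) {
    const std::uint32_t MOD = 65521;  // largest prime below 2^16
    std::uint32_t a = 1;              // sum of bytes, starting at 1
    std::uint32_t b = 0;              // sum of the successive values of a
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % MOD;
        b = (b + a) % MOD;
    }
    return (b << 16) | a;
}
```

Taking the modulo on every byte is slower than deferring it, but it keeps the sketch obviously free of overflow.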

Pre-release Implementation Details

(Basically the only reason why I'm including it here is so that the chosen data format actually makes some sort of reasonable sense.)

One thing about my framework that is somewhat good and somewhat bad is that none of the types are actually hardcoded. Sure, for accessing types there are hardcoded limitations. But leaving memory access aside, the base framework doesn't care about the difference between:

Types[V0_27_169_33E]["pointer"] = { size = 4 };

and:

Types[V0_27_169_33E.."x64"]["pointer"] = { size = 8 };

However, I do have interfaces that wrap access to integers, pointers, and floating-point values, and they have hardcoded limitations. Pointers are not much of a problem, because I also have an interface wrapping a pointer, and if a utility doesn't need the actual address, the framework can handle and store the pointer however it wants.

A side note about pointers: if a pointer gets changed (and it can only be changed internally by the framework), all the pointers stemming from that pointer get changed appropriately as well. The framework takes advantage of that by having each "memory object" maintain a pointer to where it is in memory. As the utility maps members of the "memory object" for access, it has the pointer wrapper create a new pointer wrapper for the offset location.

i.e.

cPointer *pointer = base->getAddress(member);
// Returns a new cPointer object. base maintains full rights to that new cPointer
// object and will destroy it when base gets freed.
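One way to picture this behaviour is a wrapper that stores only an offset relative to its parent and computes the absolute address on demand. The Pointer class below is a hypothetical stand-in for cPointer, written under that assumption; the real class may propagate changes differently.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the cPointer wrapper described above.
class Pointer {
public:
    // A root pointer stores an absolute address.
    explicit Pointer(std::uint32_t address) : parent_(nullptr), value_(address) {}

    // Create a child at a fixed offset. This object owns the child and
    // destroys it when it is itself destroyed.
    Pointer* getAddress(std::uint32_t offset) {
        children_.push_back(std::unique_ptr<Pointer>(new Pointer(this, offset)));
        return children_.back().get();
    }

    // Children resolve through their parent, so they never hold a stale address.
    std::uint32_t resolve() const {
        return parent_ ? parent_->resolve() + value_ : value_;
    }

    // Only a root can be moved; every derived pointer follows automatically.
    void rebase(std::uint32_t address) {
        if (!parent_) value_ = address;
    }

private:
    Pointer(Pointer* parent, std::uint32_t offset) : parent_(parent), value_(offset) {}

    Pointer* parent_;      // null for a root
    std::uint32_t value_;  // absolute address (root) or offset from parent (child)
    std::vector<std::unique_ptr<Pointer> > children_;
};
```

With this design nothing has to be updated when the root moves, at the cost of walking up the parent chain on every resolve().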

What benefits does this have? I have this type of code in the vector wrapper object.

if (cache[index] == NULL) {
 cPointer *begin_ptr = begin->getAddress(); // begin_ptr is actually just the address of a member in the begin object
 iType *subtype = type->getSubType(0); // first subtype is the type the vector wraps
 iMemoryType *member = dfprocess->mapObject(begin_ptr->getAddress(index * subtype->getSize()), subtype);
 cache[index] = member;
}
return cache[index];

So now, all the vector wrapper has to do is initialize the index once. Then when the position of the vector suddenly changes in memory (due to DF spamming the creation of new creatures), I don't have to worry about updating the cache. In addition, if the utility has stored any of the returned objects, those objects will still be usable.

I think I've covered just about everything worth covering. -- Jifodus 04:24, 13 December 2007 (EST)