User talk:Jifodus/Dwarf Fortress Utility Framework
Sphr's comments
Update: added some comments on binary version data at the end.
A suggestion to consider: unify the map type definition so that it can be used in inline definitions as well as in global references.
In general a map has:
- @name (optional): not needed if the map is inlined.
- @size (optional): not needed for predefined types of known size. Optional in most cases, except when the map is used as a valuetype in an array/vector.
- @type (optional): name of a predefined/user-defined map.
- @(type-specific parameters): e.g. vector/array/pointer may have a "valuetype", array may also need a "length", and complex will have a "mapping" of named offsets.
An element in the "mapping" of a "complex" map has:
- @name (required): name of the mapping.
- @offset (required): address offset from the base of the map.
- @type (optional): name of a predefined/user-defined map OR an inline-defined type.
- @(type-specific parameters): e.g. vector/array/pointer may have a "valuetype", array may also need a "length", and complex will have a "mapping" of named offsets.
Basically
- Every map definition can be used inline in another definition where a "type" is expected.
- Every named map definition can be referenced wherever a "type" is expected.
The following shows some examples (they may contain human errors). Pardon the bastardized Lua syntax: I added map{ ... } to mark out the part that defines a map, and I used [..] to denote a list.
E.g.
string_map = map{
    name="string",
    size=28,
    type="complex",
    mapping=[ // mapping is used only by the "complex" type
        {name="buffer", offset=0x04, type="array", valuetype="byte", length="16"}, // length must be defined if arrays are used as valuetypes for other arrays/vectors
        {name="pointer", offset=0x04, type="pointer", valuetype=map{type="array", valuetype="byte"} }, // inline definition of a byte array as valuetype
        {name="length", offset=0x14, type="dword"},
        {name="capacity", offset=0x18, type="dword"}
    ]
}

creature_33b_map = map{
    name = "creature_33b",
    type="complex",
    mapping=[
        { name="first_name", offset=0x00, type="string" }, // refers to the string type, no inlined definitions
        { name="nick_name", offset=0x04, type="string" }
        ....
    ]
}

creature_33b_ptr_map = map{
    name = "creature_33b_ptr",
    // maybe no need to define size; the size of predefined types can be assumed to be well known
    type="pointer",
    valuetype="creature_33b"
}

creature_vector_33b_map = map{
    name = "creature_vector_33b",
    type="vector",
    valuetype="creature_33b_ptr"
}

df_33b_map = map{
    name="df_v0_27_169_33b",
    size = ???,
    type = "complex",
    mapping= // start of the list of mappings for the complex type
    [
        { name="main_creature_vector", offset=0x0141FA30, type = "creature_vector_33b" },
        // the following is similar to "main_creature_vector", but defined almost totally inline (except string)
        { name="creature_vector_2", offset=0x01417A48, type = "vector",
          valuetype = map{
              type = "pointer",
              valuetype = map{
                  type="complex",
                  mapping=[
                      { name="first_name", offset=0x00, type="string" },
                      { name="nick_name", offset=0x04, type="string" }
                      ....
                  ]
              }
          }
        },
        ...
    ] // end of the list of mappings
}
Notice that mapping defined for the whole process is no different from that defined for a structure. The process structure is just a global structure bound to address 0x0000 of the DF process at run-time.
As for version, I suggest a separate section or even separate file.
versions={
    {
        version="v0_27_169_33b",
        timestamp="????", // include a variety of data to support different binary identification methods
        crc32="????",
        map="df_v0_27_169_33b" // name of the base process's memory map, as defined earlier
    },
    {
        version="v0_27_169_33c",
        timestamp="????", // include a variety of data to support different binary identification methods
        crc32="????",
        map= {... } // can even define the whole monstrous structure inline!
    },
    ...
}
Additional notes
This is some stuff taken from what I did, which can hopefully be of use as ideas even if your method is different. A memory map just describes a structure. A run-time mapped memory object simply consists of an ordered pair: a real address acting as a base, and a map.
e.g. the df_process "memory object" is simply (0x0000, getmap("v0_27_169_33c")) or something. Note that in the FULL case, it probably needs the process handle as well, as it is addressing shared memory, i.e. { baseaddress=???, process_handle=???, map=??? }
I'm not sure about your implementation, but from what I have tried out, things work better if there is a way to automatically keep track of the memory objects (their base address as well as their map) returned by queries of named offsets. An example follows:
say I create a df_process object
df_obj = CreateMemObject(0x0000, hProcess, getmap("v0_27_169_33c"))
where hProcess is the process handle the program has to obtain, and getmap just returns the process memory map structure for some binary version.
If, say, I call a function to retrieve the named offset "main_creature_vector", e.g.
my_creature_vector = GetSubObject(df_obj, "main_creature_vector")
I should get back something equivalent to
my_creature_vector == (0x0141FA30, hProcess, getmap("creature_vector_33b"))
where the base address, the process handle and the resultant map are all resolved automatically, so that the end user doesn't have to deal with addresses and the like.
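The resolution step described above could be sketched roughly as follows. This is only an illustration under assumptions: `Member`, `MapDef`, `MapRegistry`, `MemObject`, `get_sub_object` and `example_maps` are hypothetical names, the process handle is a plain `int` placeholder, and no process memory is actually read.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for the ideas above; not the framework's real API.
struct Member {
    std::uint32_t offset;  // offset from the base of the enclosing map
    std::string type;      // name of the map that describes this member
};

struct MapDef {
    std::map<std::string, Member> mapping;  // named offsets of a "complex" map
};

using MapRegistry = std::map<std::string, MapDef>;

// A run-time memory object: base address + process handle + map name.
struct MemObject {
    std::uint32_t base;
    int process_handle;    // placeholder for a real OS process handle
    std::string map_name;
};

// Resolve a named offset into a new memory object. Returns false if the map
// or the member is unknown. The caller never computes raw addresses itself.
bool get_sub_object(const MapRegistry& maps, const MemObject& obj,
                    const std::string& member_name, MemObject& out) {
    auto map_it = maps.find(obj.map_name);
    if (map_it == maps.end()) return false;
    auto mem_it = map_it->second.mapping.find(member_name);
    if (mem_it == map_it->second.mapping.end()) return false;
    const Member& m = mem_it->second;
    out = MemObject{obj.base + m.offset, obj.process_handle, m.type};
    return true;
}

// Example registry using the offset quoted in the discussion.
MapRegistry example_maps() {
    MapRegistry maps;
    maps["df_v0_27_169_33b"].mapping["main_creature_vector"] =
        Member{0x0141FA30, "creature_vector_33b"};
    maps["creature_vector_33b"];  // no named members in this sketch
    return maps;
}
```

Looking up "main_creature_vector" on an object based at 0x0000 yields an object based at 0x0141FA30 that carries the same handle and the map name "creature_vector_33b".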
// mock-up program (with no additional object wrappers; the user has to know the memory maps)
df_obj = CreateMemObject(0x0000, hProcess, getmap("v0_27_169_33c"));
ASSERT(df_obj)
my_creature_vector = df_obj.GetSubObject("main_creature_vector");
ASSERT(my_creature_vector)
num_creatures = my_creature_vector.GetLength();
for( i=0; i < num_creatures; ++i )
{
    acreature = my_creature_vector.GetIndexedObject(i);
    first_name = acreature.GetSubObject("first_name");
    // print out first name
    // etc
}

// mock-up program (with object wrapping, so that the user doesn't have to deal with memory maps after binding)
// i.e. no need to use the GetSubObject("...") method, which can be error-prone if the user gets the string wrong.
df_obj = CreateMemObject(0x0000, hProcess, getmap("v0_27_169_33c"));
ASSERT(df_obj)
DFWrappedProcess df_wrapped_obj(df_obj); // creates the wrapped object
ASSERT(df_wrapped_obj.IsValid())
int num_creatures = df_wrapped_obj.GetCreatures().GetLength();
for( i=0; i < num_creatures; ++i )
{
    DFWrappedCreature acreature = df_wrapped_obj.GetCreatures().GetIndexedObject(i);
    first_name = acreature.GetFirstName();
    // print out first name
    // etc
}
Of course, the above is a little troublesome due to C's strong typing. Perhaps you can come up with an even easier-to-use version for Lua.
Sphr 03:00, 13 December 2007 (EST)
Response to Sphr's Comments + Current Implementation Details
Response to Sphr's Comments
Correct me if I'm reading your comments incorrectly (it probably wasn't a good idea to respond while my brain is falling asleep).
I think I've got the basic system down already. Some changes will probably still be made (fortunately it's still in development, so the structure can still change). Here is a rough idea, taken straight from the data files as they stand right now:
-- size is one; "raw" represents a fixed array of chars,
-- which is done through overriding fixed_size
Types[V0_27_169_33E]["raw"] = { size = 1 };
Types[V0_27_169_33E]["word"] = { size = 2 };
Types[V0_27_169_33E]["dword"] = { size = 4 };
Types[V0_27_169_33E]["pointer"] = { size = 4 };
Types[V0_27_169_33E]["string"] = {
    size = 28,
    members = {
        buffer = { type = { type = "raw", fixed_size = 16 }, offset = 0x4 },
        buffer_ptr = { type = "pointer", offset = 0x8 },
        length = { type = "dword", offset = 0x14 },
        capacity = { type = "dword", offset = 0x18 }
    }
};
Types[V0_27_169_33E]["creature"] = {
    size = 1636,
    members = {
        firstname = { type = "string", offset = 0x000 },
        nickname = { type = "string", offset = 0x01C },
        languagename = { type = "langname", offset = 0x038 },
        customprofession = { type = "string", offset = 0x06C },
        typeid = { type = { type = "word", fixed_size = 2 }, offset = 0x088 },
        ...
        unknown1 = { type = { type = "vector", subtypes = { "word" } }, offset = 0x0B4 },
        ...
    }
};
AddressMaps[V0_27_169_33E]["main_creatures"] = {
    type = { type = "vector", subtypes = { type = "pointer", subtypes = { "creature" } } },
    pointer = 0x01240AC8
};
Now to explain the above data definition. You have your basic types: raw (equivalent to Sphr's array type), word (2-byte integer) and pointer (a pointer to a memory location). Then there is the first complex object, the string. The only bit that really needs explaining is the type field of buffer. What happens is that the type gets overridden: it takes the basic type (raw) and changes the fixed array size from the default of 1 to 16. The internal object managing the type "raw" will then correctly read the 16 bytes of the buffer. The same goes for the typeid field of the creature structure. The next bit that needs explaining is unknown1 of creature: it overrides the vector object to set the subtype to word. Then, when utilities start accessing indices of the vector, the framework correctly creates meaningful wrapper objects. The address map example takes the wrapping to a new level: it nests the subtypes.
There are two data limitations (partially caused by framework limitations) that prevent me from directly following what Sphr suggested: definitions cannot be nested, and you can't extend or override the member map. This means you can't create a vector object inline.
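As a rough illustration of the fixed_size override described above, the following sketch copies a base type and changes only its element count. The names `TypeDef`, `with_fixed_size` and `bytes_to_read` are hypothetical and are not taken from the framework.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a basic type entry from the data file.
struct TypeDef {
    std::size_t size;            // size of one element in bytes
    std::size_t fixed_size = 1;  // number of consecutive elements (default: one)
};

// Overriding a type copies the base definition and changes only the
// fixed array size; the base definition itself is left untouched.
TypeDef with_fixed_size(TypeDef base, std::size_t count) {
    assert(count > 0);
    base.fixed_size = count;
    return base;
}

// Number of bytes the object managing this type would have to read.
std::size_t bytes_to_read(const TypeDef& t) {
    return t.size * t.fixed_size;
}
```

With `raw` defined as one byte, `with_fixed_size(raw, 16)` describes the 16-byte string buffer, and `with_fixed_size(word, 2)` describes the 4-byte typeid field.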
As for identifying DF versions, this is what's available in the data file:
Signatures[V0_27_169_33E] = {
    pe_timestamp = 0x475B7526,
    adler32_of_text_section = 0x????????,
    text_segments = {
        { address = 0x00??????, segment_data = "\034\123d_l..." },
        { address = 0x00??????, segment_data = "\234\143r*3..." },
    }
}
The PE timestamp is currently the only item checked; the rest is for future versions of the framework to use. Also, I avoided CRC because Wikipedia states that there is no single standard divisor upon which the CRC is built (there are standards, but not one standard). Since Adler-32 does have a standard construction, I chose Adler-32 instead.
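For reference, both checks are small. The sketch below computes Adler-32 as specified in RFC 1950 and reads the TimeDateStamp field from the COFF header of a PE image held in memory. The function names and the in-memory `std::vector` input are assumptions made for illustration; this is not the framework's actual loading code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adler-32 (RFC 1950): two running sums, both taken modulo 65521.
std::uint32_t adler32(const std::uint8_t* data, std::size_t len) {
    const std::uint32_t MOD = 65521;
    std::uint32_t a = 1, b = 0;
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % MOD;
        b = (b + a) % MOD;
    }
    return (b << 16) | a;
}

// Read a little-endian 32-bit value; the caller guarantees the bounds.
static std::uint32_t read_u32_le(const std::vector<std::uint8_t>& buf, std::size_t pos) {
    return static_cast<std::uint32_t>(buf[pos]) |
           (static_cast<std::uint32_t>(buf[pos + 1]) << 8) |
           (static_cast<std::uint32_t>(buf[pos + 2]) << 16) |
           (static_cast<std::uint32_t>(buf[pos + 3]) << 24);
}

// Extract the PE TimeDateStamp from an executable image.
// Returns false if the buffer does not look like a PE file.
bool pe_timestamp(const std::vector<std::uint8_t>& image, std::uint32_t& out) {
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') return false;
    const std::size_t pe = read_u32_le(image, 0x3C);  // e_lfanew: offset of the PE header
    if (pe > image.size() || image.size() - pe < 12) return false;
    if (image[pe] != 'P' || image[pe + 1] != 'E' ||
        image[pe + 2] != 0 || image[pe + 3] != 0) return false;
    // Skip the signature (4 bytes), Machine (2) and NumberOfSections (2).
    out = read_u32_le(image, pe + 8);
    return true;
}
```

The value returned by `pe_timestamp` is what would be compared against an entry such as `pe_timestamp = 0x475B7526` in the signature table.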
Pre-release Implementation Details
(Basically the only reason why I'm including it here is so that the chosen data format actually makes some sort of reasonable sense.)
One thing about my framework that is somewhat good and somewhat bad is that none of the types are actually hardcoded. Sure, for accessing types there are hardcoded limitations. But if there were no memory accessing, the base framework wouldn't care about the difference between:
Types[V0_27_169_33E]["pointer"] = { size = 4 };
and:
Types[V0_27_169_33E.."x64"]["pointer"] = { size = 8 };
However, I do have interfaces that wrap access to integers/pointers/floating-point values, and they have hardcoded limitations. Pointers are not much of a problem, because I also have an interface wrapping a pointer, and if the utility doesn't need the actual address, the framework can handle and store the pointer however it wants.
A side note about pointers: if a pointer gets changed (and it can only be changed internally by the framework), all the pointers stemming from that pointer get changed appropriately as well. The framework takes advantage of that by having each "memory object" maintain a pointer to where it is in memory. As the utility maps members of the "memory object" for access, it has the pointer wrapper create a new pointer wrapper for the offset location, i.e.
// returns a new cPointer object; base maintains full rights to that new
// cPointer object and will destroy it when base gets freed
cPointer *pointer = base->getAddress(member);
What benefits does this have? I have this type of code in the vector wrapper object.
if (cache[index] == NULL)
{
    // begin_ptr is actually just the address of a member in the begin object
    cPointer *begin_ptr = begin->getAddress();
    // the first subtype is the type the vector wraps
    iType *subtype = type->getSubType(0);
    iMemoryType *member = dfprocess->mapObject(begin_ptr->getAddress(index * subtype->getSize()), subtype);
    cache[index] = member;
}
return cache[index];
So now, all the vector wrapper has to do is initialize the index once. Then when the position of the vector suddenly changes in memory (due to DF spamming the creation of new creatures), I don't have to worry about updating the cache. In addition, if the utility has stored any of the returned objects, those objects will still be usable.
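The reason the cache stays valid can be shown with a small sketch in which every derived pointer stores only an offset from its parent and recomputes its address on demand. `RelPointer` and `VectorWrapper` here are hypothetical illustrations of that idea, not the framework's actual `cPointer` or vector wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// A pointer that is either a root (absolute address) or an offset from a parent.
class RelPointer {
public:
    explicit RelPointer(std::uint32_t absolute) : parent_(nullptr), value_(absolute) {}
    RelPointer(const RelPointer* parent, std::uint32_t offset)
        : parent_(parent), value_(offset) {}

    // Recomputed on every call, so it follows changes made to any ancestor.
    std::uint32_t address() const {
        return parent_ ? parent_->address() + value_ : value_;
    }

    // Only a root holds an absolute address that can be moved.
    void rebase(std::uint32_t absolute) {
        assert(parent_ == nullptr);
        value_ = absolute;
    }

private:
    const RelPointer* parent_;
    std::uint32_t value_;  // absolute address (root) or offset from parent
};

class VectorWrapper {
public:
    VectorWrapper(std::uint32_t begin, std::uint32_t element_size, std::size_t count)
        : begin_(begin), element_size_(element_size), cache_(count) {}

    // Cached elements point at begin_, so the wrapper must not be copied or moved.
    VectorWrapper(const VectorWrapper&) = delete;
    VectorWrapper& operator=(const VectorWrapper&) = delete;

    // Each index is initialised once; later calls return the cached object.
    const RelPointer& at(std::size_t index) {
        std::unique_ptr<RelPointer>& slot = cache_.at(index);
        if (!slot) {
            const auto offset = static_cast<std::uint32_t>(index * element_size_);
            slot = std::make_unique<RelPointer>(&begin_, offset);
        }
        return *slot;
    }

    // Called when the vector's storage moves in the target process.
    void relocate(std::uint32_t new_begin) { begin_.rebase(new_begin); }

private:
    RelPointer begin_;
    std::uint32_t element_size_;
    std::vector<std::unique_ptr<RelPointer>> cache_;
};
```

After `relocate`, an element reference obtained earlier reports its new address without the cache being touched, which is the property described above.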
I think I've covered just about everything worth covering. -- Jifodus 04:24, 13 December 2007 (EST)
- I forgot to point out something that may have caused a misunderstanding: everything before the "additional notes" part of my comments actually refers to the persistent data format rather than the run-time format :) I just chose something similar to the Lua structs that you used. When defining the persistent data, I think it would be nice to allow various ways of doing the same thing, to suit potential users. And I think that even if the specification data is defined inline, the framework you have should be able to resolve it nicely during parsing (e.g. when encountering an inlined map, just create that map first, give it some name/reference, and then continue parsing the current map with the new map id, as if it had been defined much earlier). It can all be resolved nicely into separate entries in your Type map.
- Another thing. You could extend your map all the way to the process itself. Looking up something like creature vector is just looking up an offset in the process's map just like any other complex type, so that you don't need to deal with a separate "AddressMaps" structure. Just a comment though, as you might want to keep things working as they are.
Types[V0_27_169_33E]["process"] = {
    members = {
        main_creatures = { type = "vector", subtypes = { type = "pointer", subtypes = { "creature" } }, offset = 0x01240AC8 }
    }
}
df_process = { type = "process", version = V0_27_169_33E, pointer = 0x00 }
??
- Sphr 04:56, 13 December 2007 (EST)
- I just realized something: type inlining would be fairly difficult to implement with Lua. I've been using lua_next, so I can't assume anything about the order in which the keys will be returned. That basically means I would have to scan the entire tree multiple times to locate all the types first.
- Though I just had an idea: I could always change the way the system parses. If it encounters a type not previously defined, it could create a new type with that information and subclass from there. This, of course, would have to come after I enable subclassing/extending of the member map.
- I however, will not get to this for the first release. Down the road, having inline type definitions would be incredibly useful.
- Jifodus 15:37, 13 December 2007 (EST)
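One way the two ideas in this exchange could fit together is sketched below: inline definitions are hoisted into the registry under a generated name as soon as they are encountered, and references are only checked once everything has been loaded, so the order in which keys arrive no longer matters. All names here (`TypeExpr`, `FlatType`, `TypeRegistry`) and the generated-name scheme are assumptions made for illustration, not part of either implementation.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Parsed form of a "type" field: a plain reference by name, or an inline
// definition that wraps another type expression.
struct TypeExpr {
    std::string name;                   // e.g. "creature", "pointer", "vector"
    std::shared_ptr<TypeExpr> subtype;  // non-null only for inline definitions
};

// Flattened registry entry: every type refers to other types by name only.
struct FlatType {
    std::string base;     // wrapped kind, "" for a primitive or complex type
    std::string subtype;  // name of the wrapped type, "" if none
};

class TypeRegistry {
public:
    void define(const std::string& name, FlatType t) { types_[name] = std::move(t); }

    // Hoist an inline definition into the registry and return the name
    // that the enclosing definition should use to refer to it.
    std::string intern(const TypeExpr& expr) {
        if (!expr.subtype) return expr.name;            // plain reference
        const std::string sub = intern(*expr.subtype);  // hoist nested definitions first
        const std::string generated = expr.name + "<" + sub + ">";
        types_[generated] = FlatType{expr.name, sub};
        return generated;
    }

    // Run once after all definitions are loaded: names that are referenced
    // but never defined. Definition order is irrelevant at this point.
    std::set<std::string> unresolved() const {
        std::set<std::string> missing;
        for (const auto& entry : types_) {
            const FlatType& t = entry.second;
            if (!t.base.empty() && types_.count(t.base) == 0) missing.insert(t.base);
            if (!t.subtype.empty() && types_.count(t.subtype) == 0) missing.insert(t.subtype);
        }
        return missing;
    }

    bool has(const std::string& name) const { return types_.count(name) != 0; }

private:
    std::map<std::string, FlatType> types_;
};
```

Interning a vector of pointers to "creature" before "creature" itself has been defined simply leaves "creature" in the `unresolved()` set until its definition arrives.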
- I'm just commenting based on the presumed desired outcome. No need to get there in one large leap. Small steps are fine. Not implementing that is OK too. It's nice to give more convenience to the end users, but it is not necessary, especially at a cost too great to be economical for the implementors :) By the way, I hope that your parser has a convenient way to ignore parts that it doesn't recognize or use? If we are working towards a common persistent format, chances are there will be data that is only used by one party and not by the other. Ideally, the unused data gets ignored safely (or just generates warnings without killing the whole process). I'll be occupied this weekend, but if I have time next week, I'll see if I can come up with an XML alternative to what you have. (XML is easier for me if I'm using an existing library, like tiny. Otherwise, I'll have to define the Lua grammar manually for the parser, which could take quite a few rounds of grammar debugging. Once the formats stabilize, the next step could be a converter tool to freely transform one file format into the other. :) ) Sphr 07:57, 15 December 2007 (EST)
- Heh, at the moment it ignores parts that it doesn't understand or that are not formatted correctly. In fact, when I was debugging StartProfile, nearly all (if not all) of the problems I had were caused by incorrectly formatted data. So I will come up with an additional tag:
some_member = { type_ptr = "creature" }
- Which is the same as doing:
some_member = { type = { type = "pointer", subtypes = { "creature" } } }
- Not properly formatting pointers alone caused the majority of my problems. As to writing a parser for the lua data files, why not use the Lua library itself? It's small and relatively easy to use (if you need an example, you can always take a look at my loading code.) Jifodus 00:55, 16 December 2007 (EST)