Utility:Accent Removal

Some tile sets use the slots of the accented characters for additional graphical symbols, which can make the text of the racial languages difficult to read. You can replace the accented characters and symbols in the language data files with plain ASCII equivalents. This works on existing worlds and saved games.
  
Since the structure of the language files may change between versions, it is safest to remove the problem characters from your own copies of the files. Here are four methods for doing so. The first (Jackard's) works only on Windows but is probably the easiest for novice users. The second (frobnic8's) works anywhere Python does (i.e. just about anywhere). The third is a quick fix for Linux, and the fourth is a small Windows application.
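Before converting, it can help to see which extended characters a language file actually contains. The following is a hypothetical helper and not part of any of the methods on this page. It is a minimal Python 3 sketch that assumes three things: the files are encoded in code page 437, they are named <code>language_*</code> (as the ASCII Hammer script expects), and the script is run from the <code>raw/objects</code> directory. The names <code>extended_bytes</code> and <code>describe</code> are illustrative.

```python
#!/usr/bin/env python3
"""Hypothetical helper: list the extended (non-ASCII) bytes in each
Dwarf Fortress language file, assuming code page 437 encoding."""
import unicodedata
from collections import Counter
from glob import glob


def extended_bytes(data: bytes) -> Counter:
    """Count the bytes at or above 0x80, i.e. outside 7-bit ASCII."""
    return Counter(b for b in data if b >= 0x80)


def describe(byte: int) -> str:
    """Return the Unicode name of the character code page 437 assigns to a byte."""
    char = bytes([byte]).decode('cp437')
    return unicodedata.name(char, 'UNNAMED')


if __name__ == '__main__':
    for name in sorted(glob('language_*')):
        # Read raw bytes so no locale-dependent decoding takes place.
        with open(name, 'rb') as f:
            counts = extended_bytes(f.read())
        for byte, n in sorted(counts.items()):
            print(f'{name}: 0x{byte:02x} {describe(byte)} (x{n})')
```

Each output line shows the file, the byte value, the name of the character it stands for in code page 437, and how often it occurs. Printing Unicode names instead of the characters themselves keeps the output readable on any console.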
  
 
==[[User:Jackard|Jackard]]'s [http://www.inforapid.de/html/searchreplace.htm InfoRapid] Script==
<ol>
<li>Ensure you have [http://www.python.org Python] installed.</li>
<li>Copy and paste the code below into a file called <code>ascii_hammer.py</code> in the <code>raw/objects</code> sub-directory of your Dwarf Fortress directory. (The ASCII Hammer: Is that a name worthy of Dwarf Fortress, or what?)<p><pre>
#!/usr/bin/env python
"""Convert Dwarf Fortress language files from extended ASCII (code page 437)
to unaccented ASCII. Based on the Unicode Hammer from:
http://code.activestate.com/recipes/251871-latin1-to-ascii-the-unicode-hammer/

by frobnic8
"""

from glob import glob
from shutil import move

def latin1_to_ascii(unicrap):
    """This takes a string read from a language file and replaces extended
    (code page 437) characters with something equivalent in 7-bit ASCII.
    It returns a plain ASCII string.
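    """
    # Keys are code page 437 byte values (the character set Dwarf Fortress
    # uses); values are ASCII look-alikes. Shading, box-drawing and block
    # characters (0xb0-0xdf) have no entry.
    xlate = {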
            0x80: 'C',
            0x81: 'u',
            0x82: 'e',
            0x83: 'a',
            0x84: 'a',
            0x85: 'a',
            0x86: 'a',
            0x87: 'c',
            0x88: 'e',
            0x89: 'e',
            0x8a: 'e',
            0x8b: 'i',
            0x8c: 'i',
            0x8d: 'i',
            0x8e: 'A',
            0x8f: 'A',
            0x90: 'E',
            0x91: 'ae',
            0x92: 'AE',
            0x93: 'o',
            0x94: 'o',
            0x95: 'o',
            0x96: 'u',
            0x97: 'u',
            0x98: 'y',
            0x99: 'O',
            0x9a: 'U',
            0x9b: 'c',
            0x9c: 'E',
            0x9d: 'Y',
            0x9e: 'P',
            0x9f: 'f',
            0xa0: 'a',
            0xa1: 'i',
            0xa2: 'o',
            0xa3: 'u',
            0xa4: 'n',
            0xa5: 'N',
            0xa6: 'a',
            0xa7: 'o',
            0xa8: 'b',
            0xa9: 'r',
            0xaa: 'n',
            0xab: '1/2',
            0xac: '1/4',
            0xad: 'i',
            0xae: '<<',
            0xaf: '>>',
            0xe0: 'a',
            0xe1: 'B',
            0xe2: 't',
            0xe3: 'n',
            0xe4: 'E',
            0xe5: 'o',
            0xe6: 'u',
            0xe7: 't',
            0xe8: 'o',
            0xe9: 'o',
            0xea: 'o',
            0xeb: 'o',
            0xec: 'oo',
            0xed: 'o',
            0xee: 'e',
            0xef: 'N',
            0xf0: 'E',
            0xf1: 't',
            0xf2: 'D',
            0xf3: 'k',
            0xf4: 'f',
            0xf5: 'j',
            0xf6: 'i',
            0xf7: 'e',
            0xf8: 'o',
            0xf9: 'o',
            0xfa: 'i',
            0xfb: 'v',
            0xfc: 'n',
            0xfd: 'z',
            }
  
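    # Replace each mapped character with its ASCII look-alike, drop any other
    # character at or above 0x80, and keep plain 7-bit ASCII unchanged.
    r = ''
    for i in unicrap:
        if ord(i) in xlate:
            r += xlate[ord(i)]
        elif ord(i) >= 0x80:
            pass
        else:
            r += str(i)
    return r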
  
 
if __name__ == '__main__':
    # Convert every language_* file in the current folder in place,
    # keeping each untouched original as orig_<filename>.
    for lang in glob('language_*'):
        source = open(lang)
        dest = open('tmp_' + lang, 'w')
        for line in source:
            dest.write(latin1_to_ascii(line))
        source.close()
        dest.close()
        move(lang, 'orig_' + lang)
        move('tmp_' + lang, lang)
</pre></p></li>
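<li>Double-click the <code>ascii_hammer.py</code> file in the folder. The script converts every <code>language_*</code> file in that folder and keeps each original under the name <code>orig_</code> plus the original file name. Run it only once: a second run would overwrite those backups with the already-converted files.</li>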
 
 
<li>Enjoy!</li>
 
</ol>
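The ASCII Hammer opens the files in text mode. On Python 3 it may therefore fail or misread the extended characters, because Python 3 decodes text files using the locale's preferred encoding. Below is a minimal Python 3 sketch of the same lookup-table approach that works on raw bytes instead. It is not frobnic8's script, and it makes several assumptions. The function names <code>hammer</code> and <code>convert_all</code> are hypothetical. The files are assumed to be code page 437. <code>XLATE</code> is deliberately abbreviated, so copy the remaining entries from <code>xlate</code> in <code>ascii_hammer.py</code> before real use.

```python
#!/usr/bin/env python3
"""Python 3 sketch of the ASCII Hammer idea (hypothetical helper names).

Reads Dwarf Fortress language files as raw bytes, assuming code page 437,
and replaces extended characters with ASCII look-alikes.
"""
import os
from glob import glob
from shutil import move

# Abbreviated table: copy the remaining entries from xlate in
# ascii_hammer.py before real use. Keys are code page 437 byte values.
XLATE = {
    0x80: 'C', 0x81: 'u', 0x82: 'e', 0x83: 'a', 0x84: 'a', 0x85: 'a',
    0x86: 'a', 0x87: 'c', 0x88: 'e', 0x89: 'e', 0x8a: 'e', 0x8b: 'i',
    0x8c: 'i', 0x8d: 'i', 0x8e: 'A', 0x8f: 'A', 0x90: 'E', 0x91: 'ae',
    0x92: 'AE', 0x93: 'o', 0x94: 'o', 0x95: 'o', 0x96: 'u', 0x97: 'u',
    0x98: 'y', 0x99: 'O', 0x9a: 'U', 0xa0: 'a', 0xa1: 'i', 0xa2: 'o',
    0xa3: 'u', 0xa4: 'n', 0xa5: 'N',
}


def hammer(data: bytes) -> bytes:
    """Map each byte through XLATE and drop unmapped bytes >= 0x80."""
    out = []
    for b in data:  # iterating over bytes yields ints in Python 3
        if b in XLATE:
            out.append(XLATE[b])
        elif b < 0x80:
            out.append(chr(b))  # plain 7-bit ASCII passes through
    return ''.join(out).encode('ascii')


def convert_all(pattern: str = 'language_*') -> None:
    """Convert matching files in place, keeping originals as orig_<name>."""
    for name in glob(pattern):
        backup = 'orig_' + name
        if os.path.exists(backup):
            continue  # already converted once; keep the first backup
        with open(name, 'rb') as f:
            data = f.read()
        move(name, backup)
        with open(name, 'wb') as f:
            f.write(hammer(data))


if __name__ == '__main__':
    convert_all()
```

Run it from the <code>raw/objects</code> directory with Python 3. Binary mode avoids the locale-dependent decoding and leaves line endings untouched. The backup-exists check is an addition so that running the sketch twice does not overwrite the original files.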
 

Please note that all contributions to Dwarf Fortress Wiki are considered to be released under the GFDL & MIT (see Dwarf Fortress Wiki:Copyrights for details). If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource. Do not submit copyrighted work without permission!

To protect the wiki against automated edit spam, we kindly ask you to solve the following CAPTCHA:

Cancel Editing help (opens in new window)