
Learning Python: Loading a JBeijing Dictionary in VNR

source link: https://xiaix.me/pythonxue-xi-vnrjia-zai-jbeijingzi-dian/

Posted on 2019-01-02   |   Tags: Python

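The troublesome parts of translation are character names, place names, onomatopoeia, and other proper nouns. In Chinese some of these are transliterated (place names and personal names, for example), and feeding them straight into translation software often produces odd results. This is where a dictionary comes in. In programming terms it is just a mapping: a source string that matches an entry is not run through the translator but is mapped directly to the string defined for it.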
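To make the "dictionary as a mapping" idea concrete, here is a minimal sketch in plain Python. It is not how JBeijing works internally; the glossary entries and the function names `build_term_pattern` and `apply_glossary` are made up for illustration. Longer terms are tried first so that a short entry does not break up a longer one.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical glossary: source term -> fixed rendering (illustrative entries only).
GLOSSARY = {
    u"姫": u"公主",
    u"姫様": u"公主殿下",
}

def build_term_pattern(glossary):
    """Compile one regex that matches any glossary key, longest key first."""
    keys = sorted(glossary, key=len, reverse=True)
    return re.compile(u"|".join(re.escape(k) for k in keys))

def apply_glossary(text, glossary):
    """Replace every glossary term found in text with its fixed rendering."""
    if not glossary:
        return text  # nothing to map
    pattern = build_term_pattern(glossary)
    return pattern.sub(lambda m: glossary[m.group(0)], text)

print(apply_glossary(u"姫様と姫", GLOSSARY))
```

Text that matches no entry passes through unchanged, which is the point where a real translator would take over.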

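Open JBeijing -> Load dictionary -> Edit dictionary

Then locate the corresponding dictionary file.

Next, look at the VNR code to see how it loads the dictionary: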

  def openAllUserDic(self, paths):
    """Add one user dictionary.
    @param  paths  [unicode]  path to Jcuser.dic, but without ".dic" suffix.
                              At most 3 elements.
    @return  bool
    @raise  WindowsError, AttributeError

    Guessed from OllyDbg:
    //int __cdecl DJC_OpenAllUserDic_Unicode(LPWSTR, LPWSTR unknown, LPWSTR unknown, int unknown)
    int __cdecl DJC_OpenAllUserDic_Unicode(LPWSTR, int unknown)
    Return 1 or -255 if succeeded.

    According to how it is invoked in JCT.exe (push 0), unknown is always 0.
    This function will beep when failed.

    According to how DJC_OpenAllUserDic_Unicode invoke DJC_OpenAllUserDic,
    the first parameter type is supposed to be:
        wchar_t[0x408/sizeof(wchar_t)][3]

    10020150 >/$ 81EC 10060000  SUB ESP,0x610
    10020156  |. A1 50E70710    MOV EAX,DWORD PTR DS:[0x1007E750]
    1002015B  |. 33C4           XOR EAX,ESP
    1002015D  |. 898424 0C06000>MOV DWORD PTR SS:[ESP+0x60C],EAX
    10020164  |. 55             PUSH EBP
    10020165  |. 8B2D A4F20610  MOV EBP,DWORD PTR DS:[<&KERNEL32.WideCha>;  kernel32.WideCharToMultiByte
    1002016B  |. 56             PUSH ESI
    1002016C  |. 8BB424 1C06000>MOV ESI,DWORD PTR SS:[ESP+0x61C]
    10020173  |. 6A 00          PUSH 0x0                                 ; /pDefaultCharUsed = NULL
    10020175  |. 6A 00          PUSH 0x0                                 ; |pDefaultChar = NULL
    10020177  |. 68 04020000    PUSH 0x204                               ; |MultiByteCount = 204 (516.) ; jichi 12/31/2013: 0x204 comes from here
    1002017C  |. 8D4424 14      LEA EAX,DWORD PTR SS:[ESP+0x14]          ; |
    10020180  |. 50             PUSH EAX                                 ; |MultiByteStr
    10020181  |. 6A FF          PUSH -0x1                                ; |WideCharCount = FFFFFFFF (-1.)
    10020183  |. 56             PUSH ESI                                 ; |WideCharStr
    10020184  |. 6A 00          PUSH 0x0                                 ; |Options = 0
    10020186  |. 6A 00          PUSH 0x0                                 ; |CodePage = CP_ACP
    10020188  |. FFD5           CALL EBP                                 ; \WideCharToMultiByte
    1002018A  |. 8BC6           MOV EAX,ESI
    1002018C  |. 8D50 02        LEA EDX,DWORD PTR DS:[EAX+0x2]
    1002018F  |. 90             NOP

    """
    if not paths:
      return False
    if len(paths) > MAX_USERDIC_COUNT:
      print("too many user-defined dictionaries")

    MAX_PATH_LENGTH = USERDIC_PATH_SIZE

    #path = os.path.splitext(path)[0] # remove ".dic" suffix
    buf = self.userdicBuffer
    ctypes.memset(buf, 0, USERDIC_BUFFER_SIZE) # zero memory

    for i in xrange(min(len(paths), MAX_USERDIC_COUNT)):
      path = paths[i]
      if len(path) > MAX_PATH_LENGTH:
        print("path is too long: %s" % path)
        continue
      offset = i * USERDIC_PATH_SIZE
      for index, c in enumerate(path):
        buf[index + offset] = c

    ret = self.dll.DJC_OpenAllUserDic_Unicode(buf, 0)
    return ret in (1,-255)
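
A quick test, first without and then with the user dictionary:

  # Without the user dictionary
  ret = Loader().translate(u"姫", simplified=True)
  print(ret)
  ret = Loader().translate(u"WTF", simplified=True)
  print(ret)

  # Load the user dictionary (path given without the ".dic" suffix)
  l = Loader()
  l.setUserDic((
    u"Y:\\JDict\\China",
  ))
  # Translate with the same Loader instance that loaded the dictionary
  ret = l.translate(u"姫", simplified=True)
  print(ret)
  ret = l.translate(u"WTF", simplified=True)
  print(ret)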
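
The path-packing loop in openAllUserDic can be tried in isolation, without the JBeijing DLL. The sketch below is an approximation under stated assumptions: the slot width 0x204 is taken from the disassembly comment and the `wchar_t[0x408/sizeof(wchar_t)][3]` note in the docstring, and the limit of 3 comes from "At most 3 elements". The actual values of VNR's constants are not shown in this post, and `pack_paths` and `unpack_paths` are hypothetical helper names.

```python
# -*- coding: utf-8 -*-
import ctypes

USERDIC_PATH_SIZE = 0x204  # assumed slot width, in wchar_t units
MAX_USERDIC_COUNT = 3      # assumed from "At most 3 elements"

def pack_paths(paths):
    """Return a zeroed wchar_t buffer with each path at the start of its own slot."""
    buf = ctypes.create_unicode_buffer(USERDIC_PATH_SIZE * MAX_USERDIC_COUNT)
    for i, path in enumerate(paths[:MAX_USERDIC_COUNT]):
        if len(path) >= USERDIC_PATH_SIZE:
            continue  # too long: leave the slot empty, keep room for the NUL
        offset = i * USERDIC_PATH_SIZE
        for index, c in enumerate(path):
            buf[offset + index] = c
    return buf

def unpack_paths(buf):
    """Read back the NUL-terminated string stored in each slot."""
    result = []
    for i in range(MAX_USERDIC_COUNT):
        slot = buf[i * USERDIC_PATH_SIZE:(i + 1) * USERDIC_PATH_SIZE]
        result.append(slot.split(u"\0", 1)[0])
    return result

buf = pack_paths([u"Y:\\JDict\\China", u"D:\\b"])
print(unpack_paths(buf))
```

Each path occupies a fixed-width slot and unused slots stay zeroed, which matches the three-row array layout guessed in the docstring.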

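JBeijing handles slang, internet expressions and the like poorly, so a good dictionary matters here. Such dictionaries are available online, and most of them are stored as JSON. The next step is to look into how to add dictionary entries automatically with Python.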
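
As a first step in that direction, a JSON dictionary can be read into a Python mapping. This is only a sketch: it assumes the file is a flat JSON object of source-to-target strings, which may not match any particular dictionary found online, and it does not write the Jcuser.dic format, which this post does not describe. The name `load_glossary` and the sample data are hypothetical.

```python
# -*- coding: utf-8 -*-
import json

text_type = type(u"")  # unicode on Python 2, str on Python 3

def load_glossary(json_text):
    """Parse a flat JSON object into a dict of clean source -> target pairs."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of source -> target pairs")
    glossary = {}
    for source, target in data.items():
        # Skip entries whose target is not a string or whose text is empty.
        if not isinstance(target, text_type):
            continue
        source, target = source.strip(), target.strip()
        if source and target:
            glossary[source] = target
    return glossary

# Hypothetical sample: one valid entry, one empty key, one non-string value.
SAMPLE = u'{"姫": "公主", "": "x", "WTF": 42}'
print(load_glossary(SAMPLE))
```

Filtering out malformed entries up front keeps bad data from reaching whatever step eventually writes the user dictionary.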

