Learning Python: Loading a JBeijing Dictionary in VNR
source link: https://xiaix.me/pythonxue-xi-vnrjia-zai-jbeijingzi-dian/
Posted on 2019-01-02 | Tags: Python
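The troublesome parts of translation are character names, place names, onomatopoeia, and other proper nouns. In Chinese some of these are rendered phonetically (place names and personal names, for example), so feeding them straight into translation software often produces something odd. This is where a dictionary comes in. In programming terms it is simply a mapping: a source string that matches an entry is not translated, and the string the entry maps to is output directly.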
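The mapping idea can be sketched in a few lines of Python 3. This is only an illustration of the concept, not how JBeijing applies its user dictionary internally; `GLOSSARY`, `apply_glossary`, and the sample entries are hypothetical.

```python
import re

# Hypothetical glossary: source term -> fixed rendering chosen by hand.
GLOSSARY = {
    "桜井": "樱井",
    "東京": "东京",
}

def apply_glossary(text, glossary):
    """Replace every glossary term in text with its fixed rendering.

    All terms are matched in a single pass, longest first, so a short
    entry cannot split a longer entry that contains it, and replaced
    text is never matched again.
    """
    if not glossary:
        return text
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    return pattern.sub(lambda m: glossary[m.group(0)], text)

print(apply_glossary("桜井は東京にいる", GLOSSARY))
```

Anything that does not match an entry passes through unchanged and would still go to the translation engine.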
Open JBeijing -> Load dictionary -> Edit dictionary
Then locate the corresponding dictionary file.
Next, look at how VNR's code loads it:
def openAllUserDic(self, paths):
    """Add one user dictionary.
    @param  paths  [unicode]  path to Jcuser.dic, but without ".dic" suffix.
                              At most 3 elements.
    @return  bool
    @raise  WindowsError, AttributeError

    Guessed from OllyDbg:
    //int __cdecl DJC_OpenAllUserDic_Unicode(LPWSTR, LPWSTR unknown, LPWSTR unknown, int unknown)
    int __cdecl DJC_OpenAllUserDic_Unicode(LPWSTR, int unknown)
    Return 1 or -255 if succeeded.

    According to how it is invoked in JCT.exe (push 0), unknown is always 0.
    This function will beep when failed.

    According to how DJC_OpenAllUserDic_Unicode invoke DJC_OpenAllUserDic,
    the first parameter type is supposed to be:
      wchar_t[0x408/sizeof(wchar_t)][3]

    10020150 >/$ 81EC 10060000  SUB ESP,0x610
    10020156  |. A1 50E70710    MOV EAX,DWORD PTR DS:[0x1007E750]
    1002015B  |. 33C4           XOR EAX,ESP
    1002015D  |. 898424 0C06000>MOV DWORD PTR SS:[ESP+0x60C],EAX
    10020164  |. 55             PUSH EBP
    10020165  |. 8B2D A4F20610  MOV EBP,DWORD PTR DS:[<&KERNEL32.WideCha>;  kernel32.WideCharToMultiByte
    1002016B  |. 56             PUSH ESI
    1002016C  |. 8BB424 1C06000>MOV ESI,DWORD PTR SS:[ESP+0x61C]
    10020173  |. 6A 00          PUSH 0x0                                 ; /pDefaultCharUsed = NULL
    10020175  |. 6A 00          PUSH 0x0                                 ; |pDefaultChar = NULL
    10020177  |. 68 04020000    PUSH 0x204                               ; |MultiByteCount = 204 (516.) ; jichi 12/31/2013: 0x204 comes from here
    1002017C  |. 8D4424 14      LEA EAX,DWORD PTR SS:[ESP+0x14]          ; |
    10020180  |. 50             PUSH EAX                                 ; |MultiByteStr
    10020181  |. 6A FF          PUSH -0x1                                ; |WideCharCount = FFFFFFFF (-1.)
    10020183  |. 56             PUSH ESI                                 ; |WideCharStr
    10020184  |. 6A 00          PUSH 0x0                                 ; |Options = 0
    10020186  |. 6A 00          PUSH 0x0                                 ; |CodePage = CP_ACP
    10020188  |. FFD5           CALL EBP                                 ; \WideCharToMultiByte
    1002018A  |. 8BC6           MOV EAX,ESI
    1002018C  |. 8D50 02        LEA EDX,DWORD PTR DS:[EAX+0x2]
    1002018F  |. 90             NOP
    """
    if not paths:
        return False
    if len(paths) > MAX_USERDIC_COUNT:
        # Only warn here; the loop below ignores the extra paths.
        print("too many user-defined dictionaries")
    MAX_PATH_LENGTH = USERDIC_PATH_SIZE

    #path = os.path.splitext(path)[0] # remove ".dic" suffix
    buf = self.userdicBuffer
    ctypes.memset(buf, 0, USERDIC_BUFFER_SIZE) # zero memory
    for i in xrange(min(len(paths), MAX_USERDIC_COUNT)):
        path = paths[i]
        if len(path) > MAX_PATH_LENGTH:
            print("path is too long: %s" % path)
            continue
        # Each path gets its own fixed-size slot in the shared buffer.
        offset = i * USERDIC_PATH_SIZE
        for index, c in enumerate(path):
            buf[index + offset] = c
    ret = self.dll.DJC_OpenAllUserDic_Unicode(buf, 0)
    return ret in (1, -255)
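The slot layout described in the docstring is easier to see outside the class. The sketch below reproduces only the packing step, in Python 3 and without calling the DLL, so it runs on any platform. The constants are assumptions: `USERDIC_PATH_SIZE = 0x204` is read off the `PUSH 0x204` comment and the `wchar_t[0x408/sizeof(wchar_t)][3]` note, and `MAX_USERDIC_COUNT = 3` off "At most 3 elements". `pack_userdic_paths` and `read_slot` are hypothetical helpers, not part of VNR.

```python
import ctypes

USERDIC_PATH_SIZE = 0x204  # assumed: wide characters per slot (0x408 bytes / 2)
MAX_USERDIC_COUNT = 3      # assumed: the DLL accepts at most three paths

def pack_userdic_paths(paths):
    """Return a wide-char buffer holding each path in its own fixed-size slot."""
    total = USERDIC_PATH_SIZE * MAX_USERDIC_COUNT
    buf = ctypes.create_unicode_buffer(total)  # ctypes arrays start zero-filled
    for i, path in enumerate(paths[:MAX_USERDIC_COUNT]):
        # Skip paths that would leave no room for the terminating NUL.
        if len(path) >= USERDIC_PATH_SIZE:
            continue
        offset = i * USERDIC_PATH_SIZE
        for index, c in enumerate(path):
            buf[offset + index] = c
    return buf

def read_slot(buf, i):
    """Read back the NUL-terminated string stored in slot i."""
    start = i * USERDIC_PATH_SIZE
    return buf[start:start + USERDIC_PATH_SIZE].split("\0", 1)[0]

buf = pack_userdic_paths(["C:\\a", "D:\\dict\\b"])
print(len(buf), repr(read_slot(buf, 0)), repr(read_slot(buf, 1)))
```

Unlike the function above, this sketch rejects a path whose length equals the slot size, since such a path would fill the slot with no terminator. Whether the DLL depends on that terminator is not confirmed here.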
# Without the dictionary
ret = Loader().translate(u"姫", simplified=True)
print(ret)
ret = Loader().translate(u"WTF", simplified=True)
print(ret)

# With the dictionary loaded
l = Loader()
l.setUserDic((
    u"Y:\\JDict\\China",  # path without the ".dic" suffix
))
# Translate with the same loader that opened the dictionary.
ret = l.translate(u"姫", simplified=True)
print(ret)
ret = l.translate(u"WTF", simplified=True)
print(ret)
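JBeijing handles slang, internet expressions, and the like poorly, and that is when a good dictionary is needed. Such dictionaries can be found online, and most of them are stored as JSON. The next step is to work out how to add dictionary entries automatically with Python.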
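As a starting point for that next step, reading such a JSON dictionary might look like the following Python 3 sketch. The layout of community dictionaries varies, so the flat `{"source": "target"}` object assumed here is hypothetical, as is the `load_glossary` helper. Writing the entries into JBeijing's own `.dic` file is not covered, since that format is not described in this post.

```python
import json

def load_glossary(json_text):
    """Parse a JSON object of source -> target terms.

    Entries with an empty key or a non-string or empty value are skipped.
    """
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of source -> target terms")
    return {
        source: target
        for source, target in data.items()
        if source and isinstance(target, str) and target
    }

# Hypothetical sample content of a downloaded dictionary file.
sample = '{"姫": "公主", "": "ignored", "x": 1}'
print(load_glossary(sample))
```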