Siripong's English Blog: Custom Dictionary

When building a personal dictionary, we write only the data
and leave the display to a free program such as GoldenDict or StarDict.
There are several formats to choose from for that data, for example:

Strengths and weaknesses

Probably the most convenient option if you do not need attractive output.

A plain text file. Unlike gls, it does not have to be converted to a binary file before it can be displayed.

HTML can be embedded to make entries attractive and easy to read.
More than one keyword can point to the same description.

For example, teach|taught makes a search for either word return the same result.

The obvious drawback is that the whole description must be written on a single line.
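The two gls properties described above (several keywords sharing one description, and the description sitting on a single line) can be illustrated with a minimal Python 3 sketch. The helper names `headword_line` and `single_line` are hypothetical and exist only for this illustration; they are not part of any tool mentioned here.

```python
# Hypothetical helpers for illustration only.

def headword_line(keywords):
    """Join alternative keywords with '|' so they share one description."""
    return "|".join(keywords)


def single_line(description_lines):
    """Collapse a multi-line description into one line."""
    return "".join(line.strip() for line in description_lines)


entry = headword_line(["teach", "taught"]) + "\n" + single_line([
    "<b>teach</b> (past tense: taught)",
    "<i>to give lessons</i>",
]) + "\n"
print(entry)
```

Writing the description as separate lines and joining them only at the end is the same idea the rest of this article builds on.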

To get a gls file whose source is as easy to read in a plain text editor as dsl,
while keeping the gls strengths listed above,
an extra step is needed. That extra step is the subject of this article.

We start by writing the notes in this format:

#2011-04-13
สลายการชุมนุม
 = disperse
  - He'd crack down on PAD demonstrators if they failed to disperse.
ร้านอาหาร
 = diner: a small, usually cheap, restaurant
  - This diner has reservations about its clients.
 * diner != dinner
ฝ่าไฟแดง
 = Don't run a red light.
แต่งงาน|สมรส
  - The Monash University report suggests that <u>wedlock</u> was increasingly becoming the province of the well-educated and wealthy.
  - Before <u>settling into married life</u>, Holly worked at Disney World for six years.
frozen
 ( โฟร เส่น ไม่ใช่ ฟรอส เส้น
  - frozen food
ผู้รักษาประตู
 = Goalkeeper ย่อว่า GK ผู้รักษาประตูฟุตบอล
 ~ doorkeeper คนเฝ้าประตูโรงแรม

This is easy to read and easy to edit (the description does not have to be merged into one line, as gls requires).
Each line starts with a symbol, and when the data is converted to the output file that symbol is replaced by a color.
(Typing a single symbol is more convenient than specifying colors the usual gls way.)
The color used for each symbol can be chosen later.
HTML tags can be inserted (for example, the u tag that underlines the word wedlock).
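The link between a symbol and its color goes through a CSS class: the converter wraps the text in a div, and the stylesheet gives that class a color. Below is a minimal Python 3 sketch of that indirection. The class names and colors are copied from formatter.py and article-style.css later in this article; the function names `to_div` and `css_rule` are hypothetical, and only three symbols are shown.

```python
# Symbol -> CSS class, following the class names used in formatter.py.
SYMBOL_CLASS = {"(": "pronounce", "=": "meaning", "~": "relate"}

# CSS class -> color, copied from the sample article-style.css.
CLASS_COLOR = {"pronounce": "#093B8F", "meaning": "#B93B8F", "relate": "#B9FB80"}


def to_div(symbol, text):
    """Wrap the text in a div whose class is chosen by the leading symbol."""
    return '<div class="%s">%s</div>' % (SYMBOL_CLASS[symbol], text)


def css_rule(css_class):
    """Build the CSS rule that gives the class its color."""
    return "div.%s { color: %s; }" % (css_class, CLASS_COLOR[css_class])


print(to_div("=", "disperse"))
print(css_rule("meaning"))
```

Because the color lives only in the stylesheet, it can be changed later without touching the notes or converting them again.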

(If you skip the steps that follow, you can also save the file with the .dsl extension and use it as it is.)

Once the source database file is written,
all that remains is to run a program that converts the data into a format ready for display.

This format should make note-taking more convenient.

Example in use

(There is no reasonably good Thai-English dictionary in a form that is easy to search.
The ones that are shared around are not yet as complete as their foreign counterparts.
Let's build one up together, for our children and grandchildren.)

Steps

(After installing the programs and setting up the environment listed below.)
Edit the source database file (item 1 in the picture).
Convert the source database file into StarDict's standard format (item 2 in the picture):

c:\Python27\python.exe formatter.py < input.txt > output.txt

Convert the file into the binary format that GoldenDict can display, using StarDict-Editor (item 3 in the picture above).

Item 4: point GoldenDict at our dictionary.
Item 5: style how entries are displayed in GoldenDict (optional).

Environment needed to build a personal dictionary

MS Windows XP
Text editor, such as Notepad
StarDict-Editor

http://stardict.sourceforge.net/download.php

GoldenDict
Python

http://www.python.org/ftp/python/2.7.1/python-2.7.1.msi

article-style.css

This file controls how GoldenDict displays entries (for example, text color and text background color).
If you display the resulting database with another program, such as QStarDict, you can skip this part.
Create the file in the folder C:\Documents and Settings\{USERNAME}\Application Data\GoldenDict\. Sample contents are at the bottom of this article.
More details in [url: decorate-goldendict-by-css.html]
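As a small illustration, the location of article-style.css can be built from the user's Application Data folder. This Python 3 sketch assumes that folder is what the APPDATA environment variable points to on Windows XP; that is an assumption, and the function name `goldendict_css_path` is hypothetical. `ntpath` is used so that Windows path rules apply on any platform.

```python
import ntpath  # Windows path rules, usable on any platform


def goldendict_css_path(appdata_dir):
    """Return the article-style.css path inside the GoldenDict folder."""
    return ntpath.join(appdata_dir, "GoldenDict", "article-style.css")


# Example value only; on a real machine os.environ["APPDATA"] could be
# passed instead (assumption: it points to the Application Data folder).
appdata = r"C:\Documents and Settings\{USERNAME}\Application Data"
print(goldendict_css_path(appdata))
```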

Format of the source database file

The first line is either blank
or any comment, such as the date of the last edit.
Save the file as UTF-8.
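These two rules (UTF-8 encoding, and a first line that carries no entry) can be sketched in Python 3 as follows. The function name `body_lines` is hypothetical. The `utf-8-sig` codec is used because some editors put a byte-order mark at the start of a UTF-8 file; the codec drops it when present and otherwise behaves like plain UTF-8.

```python
def body_lines(raw_bytes):
    """Decode UTF-8 (dropping a BOM if present) and skip the first line."""
    text = raw_bytes.decode("utf-8-sig")
    return text.splitlines()[1:]


# A tiny source file: BOM, comment line, one keyword, one example line.
sample = "\ufeff#2011-04-13\nfrozen\n  - frozen food\n".encode("utf-8")
print(body_lines(sample))
```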
keyword:

Written flush against the left margin. If there are several words, separate them with the | symbol,
as with "แต่งงาน" in the example above.
A keyword can be in any language, such as English, Thai, or Japanese,
and languages can be mixed, for example sleep|นอน.
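The keyword rules can be expressed as a short Python 3 sketch. The function names are hypothetical. One choice here is an assumption of the sketch rather than something the article specifies: a blank line is not treated as a keyword line.

```python
def is_keyword_line(line):
    """A keyword line is non-empty and does not start with a space."""
    return bool(line.strip()) and not line.startswith(" ")


def split_keywords(line):
    """Split a keyword line on '|' into the individual search terms."""
    return [word.strip() for word in line.strip().split("|") if word.strip()]


print(split_keywords("แต่งงาน|สมรส"))
print(split_keywords("sleep|นอน"))
print(is_keyword_line(" = disperse"))
```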

Translation or description section:

space + symbol + space + text
The symbols defined are:

( pronunciation, as in the "โฟร เส่น" line in the example above
= translation
* note, for example where we came across the word
~ related word, as in the "doorkeeper" line in the example above
! opposite word
> more information

There can be more than one such line.
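The "space + symbol + space + text" layout can be checked and split with a few lines of Python 3. This is a sketch only: the function name `parse_definition` and the English labels in the table are hypothetical, chosen to mirror the list above.

```python
# Symbol meanings as listed above.
DEFINITION_SYMBOLS = {
    "(": "pronunciation",
    "=": "translation",
    "*": "note",
    "~": "related word",
    "!": "opposite",
    ">": "more information",
}


def parse_definition(line):
    """Return (kind, text) for a ' <symbol> <text>' line, or None otherwise."""
    line = line.rstrip("\r\n")
    if len(line) < 3 or line[0] != " " or line[2] != " ":
        return None
    kind = DEFINITION_SYMBOLS.get(line[1])
    if kind is None:
        return None
    return kind, line[3:]


print(parse_definition(" = disperse"))
print(parse_definition(" ~ doorkeeper คนเฝ้าประตูโรงแรม"))
print(parse_definition("  - frozen food"))
```

A line with two leading spaces is rejected here because its third character is a symbol rather than a space.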

Example sentence section:

space + space + symbol + space + text
The symbols defined are:

- example sentence
* note

There can be more than one such line,
as with "แต่งงาน" in the example above.
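Example lines differ from the other lines only by their two leading spaces, so they can be collected per keyword with a small Python 3 sketch. The names `parse_example` and `group_examples` are hypothetical, and skipping lines that do not match the layout is a choice made for this illustration.

```python
EXAMPLE_SYMBOLS = {"-": "example", "*": "note"}


def parse_example(line):
    """Return (kind, text) for a '  <symbol> <text>' line, or None otherwise."""
    line = line.rstrip("\r\n")
    if len(line) < 4 or not line.startswith("  ") or line[3] != " ":
        return None
    kind = EXAMPLE_SYMBOLS.get(line[2])
    if kind is None:
        return None
    return kind, line[4:]


def group_examples(lines):
    """Collect the example lines that follow each keyword line."""
    entries = {}
    current = None
    for line in lines:
        if line.strip() and not line.startswith(" "):
            current = line.strip()  # a new keyword line starts a new entry
            entries[current] = []
        elif current is not None:
            parsed = parse_example(line)
            if parsed is not None:
                entries[current].append(parsed)
    return entries


notes = ["frozen", " ( fro-zen", "  - frozen food"]
print(group_examples(notes))
```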

formatter.py

# formatter.py (Python 2.7)
# Reads the source notes from stdin and writes gls text to stdout.
import sys
import codecs

# Header lines. Plain byte strings are used throughout so that UTF-8 text
# (for example Thai) passes through without an implicit ASCII decode.
print codecs.BOM_UTF8
print "#bookname=My English Notes"
print "#description=Dictionary with incredible definitions."
print "#author=Siripong"
print "#stripmethod=stripnewline"
print "#sametypesequence=h",

lines = sys.stdin.readlines()

# The first line is blank or a comment (and may carry a BOM), so skip it.
for line in lines[1:]:
    if not line.startswith(' '):
        # No leading space: keyword line, which starts a new entry.
        print "\n\n" + line[:-1]
    elif line.startswith('  '):
        # Two leading spaces: example-sentence section.
        symbol = line[2:3]
        if symbol == '-':
            print '<div class="sample">' + line[4:-1] + '</div>',
        elif symbol == '*':
            print '<div class="snote">' + line[4:-1] + '</div>',
        else:
            print line[:-1],
    else:
        # One leading space: translation or description section.
        symbol = line[1:2]
        if symbol == '(':
            print '<div class="pronounce">' + line[3:-1] + '</div>'
        elif symbol == '=':
            print '<div class="meaning">' + line[3:-1] + '</div>'
        elif symbol == '*':
            print '<div class="mnote">' + line[3:-1] + '</div>'
        elif symbol == '+':
            print '<div class="next-meaning">' + line[3:-1] + '</div>'
        elif symbol == '~':
            print '<div class="relate">' + line[3:-1] + '</div>'
        elif symbol == '!':
            print '<div class="opposite">' + line[3:-1] + '</div>'
        elif symbol == '>':
            print '<div class="moreinfo">' + line[3:-1] + '</div>'
        else:
            print line[:-1],
# End the output with a blank line.
print "\n"
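For readers on Python 3, the same conversion idea can be sketched with lookup tables instead of an if/elif chain. This is not the script above and not a drop-in replacement: it is a simplified illustration that returns (keyword, description) pairs instead of writing the header and entries to stdout, it skips blank lines, and it leaves skipping the first line to the caller. The function names are hypothetical; the class names come from formatter.py.

```python
DEFINITION_CLASSES = {
    "(": "pronounce", "=": "meaning", "*": "mnote", "+": "next-meaning",
    "~": "relate", "!": "opposite", ">": "moreinfo",
}
EXAMPLE_CLASSES = {"-": "sample", "*": "snote"}


def format_line(line):
    """Turn one indented source line into an HTML fragment."""
    if line.startswith("  "):
        symbol, text, classes = line[2:3], line[4:], EXAMPLE_CLASSES
    else:
        symbol, text, classes = line[1:2], line[3:], DEFINITION_CLASSES
    if symbol in classes:
        return '<div class="%s">%s</div>' % (classes[symbol], text)
    return line.strip()  # unknown symbol: keep the text as it is


def convert(lines):
    """Group source lines into (keyword, single-line description) pairs."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if not line.startswith(" "):
            entries.append((line, []))  # keyword line starts a new entry
        elif entries:
            entries[-1][1].append(format_line(line))
    return [(keyword, "".join(parts)) for keyword, parts in entries]


source = [
    "ร้านอาหาร",
    " = diner: a small, usually cheap, restaurant",
    "  - This diner has reservations about its clients.",
]
for keyword, description in convert(source):
    print(keyword)
    print(description)
    print()
```

Keeping the symbols in dictionaries means a new symbol only needs a new table entry and a matching rule in article-style.css.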

article-style.css

div.pronounce { text-indent: 9px; color: #093B8F; font-weight: bolder;}
div.meaning { text-indent: 9px; color: #B93B8F; font-weight: bolder;}
div.mnote { text-indent: 9px; color: #808080; }
div.next-meaning { text-indent: 9px; color: #B93B8F; }
div.relate { text-indent: 9px; color: #B9FB80; font-weight: bolder;}
div.opposite { text-indent: 9px; color: #8F8F8F; }
div.moreinfo { text-indent: 9px; color: #09F08F; font-weight: bolder;}
div.moreinfo:before { content: 'more information at '; }
div.sample { text-indent: 18px; color: #7E2217; font-weight: bolder;}
div.sample:before { content: '- '; }
div.snote { text-indent: 24px; color: #6D7B8D; }

formatter.bat

@ECHO OFF
cls
c:\Python27\python.exe formatter.py < Src.txt > MyEnglishNotes.txt
IF ERRORLEVEL 1 GOTO WAIT
start notepad.exe MyEnglishNotes.txt
GOTO END
:WAIT
pause
:END

Saturday, April 9, 2011
