thctype.h

Language: en

Version: 378018 (fedora - 01/12/10)

Section: 3 (Library functions)

NAME

thai/thctype.h - Thai character classifications.

SYNOPSIS


Functions


int th_istis (thchar_t c)
Is the character a valid TIS-620 code?
int th_isthai (thchar_t c)
Is the character a Thai character?
int th_iseng (thchar_t c)
Is the character an English character?
int th_isthcons (thchar_t c)
Is the character a Thai consonant?
int th_isthvowel (thchar_t c)
Is the character a Thai vowel?
int th_isthtone (thchar_t c)
Is the character a Thai tone mark?
int th_isthdiac (thchar_t c)
Is the character a Thai diacritic?
int th_isthdigit (thchar_t c)
Is the character a Thai digit?
int th_isthpunct (thchar_t c)
Is the character a Thai punctuation mark?
int th_istaillesscons (thchar_t c)
Is the character a Thai consonant that fits the x-height?
int th_isovershootcons (thchar_t c)
Is the character a Thai consonant with stem above ascender?
int th_isundershootcons (thchar_t c)
Is the character a Thai consonant with stem below baseline?
int th_isundersplitcons (thchar_t c)
Is the character a Thai consonant with split part below baseline?
int th_isldvowel (thchar_t c)
Is the character a Thai leading vowel?
int th_isflvowel (thchar_t c)
Is the character a Thai following vowel?
int th_isupvowel (thchar_t c)
Is the character a Thai upper vowel?
int th_isblvowel (thchar_t c)
Is the character a Thai below vowel?
int th_chlevel (thchar_t c)
Position (level) of the character for rendering.
int th_iscombchar (thchar_t c)
Is the character a combining character?

Detailed Description

Thai character classifications.

The Thai Industrial Standards Institute (TISI) defined TIS-620, the Thai character set for use with computers. This character set is 8-bit encoded and includes both English and Thai characters. Aliases of TIS-620 are TIS620, TIS620-0, TIS620.2529-1, TIS620.2533-0 and ISO-IR-166.

The following table lists the encoding values in hexadecimal, the corresponding Unicode values, and the character names.

 
  0x00   <U0000> NULL (NUL)
  0x01   <U0001> START OF HEADING (SOH)
  0x02   <U0002> START OF TEXT (STX)
  0x03   <U0003> END OF TEXT (ETX)
  0x04   <U0004> END OF TRANSMISSION (EOT)
  0x05   <U0005> ENQUIRY (ENQ)
  0x06   <U0006> ACKNOWLEDGE (ACK)
  0x07   <U0007> BELL (BEL)
  0x08   <U0008> BACKSPACE (BS)
  0x09   <U0009> CHARACTER TABULATION (HT)
  0x0A   <U000A> LINE FEED (LF)
  0x0B   <U000B> LINE TABULATION (VT)
  0x0C   <U000C> FORM FEED (FF)
  0x0D   <U000D> CARRIAGE RETURN (CR)
  0x0E   <U000E> SHIFT OUT (SO)
  0x0F   <U000F> SHIFT IN (SI)
  0x10   <U0010> DATA LINK ESCAPE (DLE)
  0x11   <U0011> DEVICE CONTROL ONE (DC1)
  0x12   <U0012> DEVICE CONTROL TWO (DC2)
  0x13   <U0013> DEVICE CONTROL THREE (DC3)
  0x14   <U0014> DEVICE CONTROL FOUR (DC4)
  0x15   <U0015> NEGATIVE ACKNOWLEDGE (NAK)
  0x16   <U0016> SYNCHRONOUS IDLE (SYN)
  0x17   <U0017> END OF TRANSMISSION BLOCK (ETB)
  0x18   <U0018> CANCEL (CAN)
  0x19   <U0019> END OF MEDIUM (EM)
  0x1A   <U001A> SUBSTITUTE (SUB)
  0x1B   <U001B> ESCAPE (ESC)
  0x1C   <U001C> FILE SEPARATOR (IS4)
  0x1D   <U001D> GROUP SEPARATOR (IS3)
  0x1E   <U001E> RECORD SEPARATOR (IS2)
  0x1F   <U001F> UNIT SEPARATOR (IS1)
  0x20   <U0020> SPACE
  0x21   <U0021> EXCLAMATION MARK
  0x22   <U0022> QUOTATION MARK
  0x23   <U0023> NUMBER SIGN
  0x24   <U0024> DOLLAR SIGN
  0x25   <U0025> PERCENT SIGN
  0x26   <U0026> AMPERSAND
  0x27   <U0027> APOSTROPHE
  0x28   <U0028> LEFT PARENTHESIS
  0x29   <U0029> RIGHT PARENTHESIS
  0x2A   <U002A> ASTERISK
  0x2B   <U002B> PLUS SIGN
  0x2C   <U002C> COMMA
  0x2D   <U002D> HYPHEN-MINUS
  0x2E   <U002E> FULL STOP
  0x2F   <U002F> SOLIDUS
  0x30   <U0030> DIGIT ZERO
  0x31   <U0031> DIGIT ONE
  0x32   <U0032> DIGIT TWO
  0x33   <U0033> DIGIT THREE
  0x34   <U0034> DIGIT FOUR
  0x35   <U0035> DIGIT FIVE
  0x36   <U0036> DIGIT SIX
  0x37   <U0037> DIGIT SEVEN
  0x38   <U0038> DIGIT EIGHT
  0x39   <U0039> DIGIT NINE
  0x3A   <U003A> COLON
  0x3B   <U003B> SEMICOLON
  0x3C   <U003C> LESS-THAN SIGN
  0x3D   <U003D> EQUALS SIGN
  0x3E   <U003E> GREATER-THAN SIGN
  0x3F   <U003F> QUESTION MARK
  0x40   <U0040> COMMERCIAL AT
  0x41   <U0041> LATIN CAPITAL LETTER A
  0x42   <U0042> LATIN CAPITAL LETTER B
  0x43   <U0043> LATIN CAPITAL LETTER C
  0x44   <U0044> LATIN CAPITAL LETTER D
  0x45   <U0045> LATIN CAPITAL LETTER E
  0x46   <U0046> LATIN CAPITAL LETTER F
  0x47   <U0047> LATIN CAPITAL LETTER G
  0x48   <U0048> LATIN CAPITAL LETTER H
  0x49   <U0049> LATIN CAPITAL LETTER I
  0x4A   <U004A> LATIN CAPITAL LETTER J
  0x4B   <U004B> LATIN CAPITAL LETTER K
  0x4C   <U004C> LATIN CAPITAL LETTER L
  0x4D   <U004D> LATIN CAPITAL LETTER M
  0x4E   <U004E> LATIN CAPITAL LETTER N
  0x4F   <U004F> LATIN CAPITAL LETTER O
  0x50   <U0050> LATIN CAPITAL LETTER P
  0x51   <U0051> LATIN CAPITAL LETTER Q
  0x52   <U0052> LATIN CAPITAL LETTER R
  0x53   <U0053> LATIN CAPITAL LETTER S
  0x54   <U0054> LATIN CAPITAL LETTER T
  0x55   <U0055> LATIN CAPITAL LETTER U
  0x56   <U0056> LATIN CAPITAL LETTER V
  0x57   <U0057> LATIN CAPITAL LETTER W
  0x58   <U0058> LATIN CAPITAL LETTER X
  0x59   <U0059> LATIN CAPITAL LETTER Y
  0x5A   <U005A> LATIN CAPITAL LETTER Z
  0x5B   <U005B> LEFT SQUARE BRACKET
  0x5C   <U005C> REVERSE SOLIDUS
  0x5D   <U005D> RIGHT SQUARE BRACKET
  0x5E   <U005E> CIRCUMFLEX ACCENT
  0x5F   <U005F> LOW LINE
  0x60   <U0060> GRAVE ACCENT
  0x61   <U0061> LATIN SMALL LETTER A
  0x62   <U0062> LATIN SMALL LETTER B
  0x63   <U0063> LATIN SMALL LETTER C
  0x64   <U0064> LATIN SMALL LETTER D
  0x65   <U0065> LATIN SMALL LETTER E
  0x66   <U0066> LATIN SMALL LETTER F
  0x67   <U0067> LATIN SMALL LETTER G
  0x68   <U0068> LATIN SMALL LETTER H
  0x69   <U0069> LATIN SMALL LETTER I
  0x6A   <U006A> LATIN SMALL LETTER J
  0x6B   <U006B> LATIN SMALL LETTER K
  0x6C   <U006C> LATIN SMALL LETTER L
  0x6D   <U006D> LATIN SMALL LETTER M
  0x6E   <U006E> LATIN SMALL LETTER N
  0x6F   <U006F> LATIN SMALL LETTER O
  0x70   <U0070> LATIN SMALL LETTER P
  0x71   <U0071> LATIN SMALL LETTER Q
  0x72   <U0072> LATIN SMALL LETTER R
  0x73   <U0073> LATIN SMALL LETTER S
  0x74   <U0074> LATIN SMALL LETTER T
  0x75   <U0075> LATIN SMALL LETTER U
  0x76   <U0076> LATIN SMALL LETTER V
  0x77   <U0077> LATIN SMALL LETTER W
  0x78   <U0078> LATIN SMALL LETTER X
  0x79   <U0079> LATIN SMALL LETTER Y
  0x7A   <U007A> LATIN SMALL LETTER Z
  0x7B   <U007B> LEFT CURLY BRACKET
  0x7C   <U007C> VERTICAL LINE
  0x7D   <U007D> RIGHT CURLY BRACKET
  0x7E   <U007E> TILDE
  0x7F   <U007F> DELETE (DEL)
  0xA1   <U0E01> THAI CHARACTER KO KAI
  0xA2   <U0E02> THAI CHARACTER KHO KHAI
  0xA3   <U0E03> THAI CHARACTER KHO KHUAT
  0xA4   <U0E04> THAI CHARACTER KHO KHWAI
  0xA5   <U0E05> THAI CHARACTER KHO KHON
  0xA6   <U0E06> THAI CHARACTER KHO RAKHANG
  0xA7   <U0E07> THAI CHARACTER NGO NGU
  0xA8   <U0E08> THAI CHARACTER CHO CHAN
  0xA9   <U0E09> THAI CHARACTER CHO CHING
  0xAA   <U0E0A> THAI CHARACTER CHO CHANG
  0xAB   <U0E0B> THAI CHARACTER SO SO
  0xAC   <U0E0C> THAI CHARACTER CHO CHOE
  0xAD   <U0E0D> THAI CHARACTER YO YING
  0xAE   <U0E0E> THAI CHARACTER DO CHADA
  0xAF   <U0E0F> THAI CHARACTER TO PATAK
  0xB0   <U0E10> THAI CHARACTER THO THAN
  0xB1   <U0E11> THAI CHARACTER THO NANGMONTHO
  0xB2   <U0E12> THAI CHARACTER THO PHUTHAO
  0xB3   <U0E13> THAI CHARACTER NO NEN
  0xB4   <U0E14> THAI CHARACTER DO DEK
  0xB5   <U0E15> THAI CHARACTER TO TAO
  0xB6   <U0E16> THAI CHARACTER THO THUNG
  0xB7   <U0E17> THAI CHARACTER THO THAHAN
  0xB8   <U0E18> THAI CHARACTER THO THONG
  0xB9   <U0E19> THAI CHARACTER NO NU
  0xBA   <U0E1A> THAI CHARACTER BO BAIMAI
  0xBB   <U0E1B> THAI CHARACTER PO PLA
  0xBC   <U0E1C> THAI CHARACTER PHO PHUNG
  0xBD   <U0E1D> THAI CHARACTER FO FA
  0xBE   <U0E1E> THAI CHARACTER PHO PHAN
  0xBF   <U0E1F> THAI CHARACTER FO FAN
  0xC0   <U0E20> THAI CHARACTER PHO SAMPHAO
  0xC1   <U0E21> THAI CHARACTER MO MA
  0xC2   <U0E22> THAI CHARACTER YO YAK
  0xC3   <U0E23> THAI CHARACTER RO RUA
  0xC4   <U0E24> THAI CHARACTER RU
  0xC5   <U0E25> THAI CHARACTER LO LING
  0xC6   <U0E26> THAI CHARACTER LU
  0xC7   <U0E27> THAI CHARACTER WO WAEN
  0xC8   <U0E28> THAI CHARACTER SO SALA
  0xC9   <U0E29> THAI CHARACTER SO RUSI
  0xCA   <U0E2A> THAI CHARACTER SO SUA
  0xCB   <U0E2B> THAI CHARACTER HO HIP
  0xCC   <U0E2C> THAI CHARACTER LO CHULA
  0xCD   <U0E2D> THAI CHARACTER O ANG
  0xCE   <U0E2E> THAI CHARACTER HO NOKHUK
  0xCF   <U0E2F> THAI CHARACTER PAIYANNOI
  0xD0   <U0E30> THAI CHARACTER SARA A
  0xD1   <U0E31> THAI CHARACTER MAI HAN-AKAT
  0xD2   <U0E32> THAI CHARACTER SARA AA
  0xD3   <U0E33> THAI CHARACTER SARA AM
  0xD4   <U0E34> THAI CHARACTER SARA I
  0xD5   <U0E35> THAI CHARACTER SARA II
  0xD6   <U0E36> THAI CHARACTER SARA UE
  0xD7   <U0E37> THAI CHARACTER SARA UEE
  0xD8   <U0E38> THAI CHARACTER SARA U
  0xD9   <U0E39> THAI CHARACTER SARA UU
  0xDA   <U0E3A> THAI CHARACTER PHINTHU
  0xDF   <U0E3F> THAI CHARACTER SYMBOL BAHT
  0xE0   <U0E40> THAI CHARACTER SARA E
  0xE1   <U0E41> THAI CHARACTER SARA AE
  0xE2   <U0E42> THAI CHARACTER SARA O
  0xE3   <U0E43> THAI CHARACTER SARA AI MAIMUAN
  0xE4   <U0E44> THAI CHARACTER SARA AI MAIMALAI
  0xE5   <U0E45> THAI CHARACTER LAKKHANGYAO
  0xE6   <U0E46> THAI CHARACTER MAIYAMOK
  0xE7   <U0E47> THAI CHARACTER MAITAIKHU
  0xE8   <U0E48> THAI CHARACTER MAI EK
  0xE9   <U0E49> THAI CHARACTER MAI THO
  0xEA   <U0E4A> THAI CHARACTER MAI TRI
  0xEB   <U0E4B> THAI CHARACTER MAI CHATTAWA
  0xEC   <U0E4C> THAI CHARACTER THANTHAKHAT
  0xED   <U0E4D> THAI CHARACTER NIKHAHIT
  0xEE   <U0E4E> THAI CHARACTER YAMAKKAN
  0xEF   <U0E4F> THAI CHARACTER FONGMAN
  0xF0   <U0E50> THAI DIGIT ZERO
  0xF1   <U0E51> THAI DIGIT ONE
  0xF2   <U0E52> THAI DIGIT TWO
  0xF3   <U0E53> THAI DIGIT THREE
  0xF4   <U0E54> THAI DIGIT FOUR
  0xF5   <U0E55> THAI DIGIT FIVE
  0xF6   <U0E56> THAI DIGIT SIX
  0xF7   <U0E57> THAI DIGIT SEVEN
  0xF8   <U0E58> THAI DIGIT EIGHT
  0xF9   <U0E59> THAI DIGIT NINE
  0xFA   <U0E5A> THAI CHARACTER ANGKHANKHU
  0xFB   <U0E5B> THAI CHARACTER KHOMUT
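
The table above shows a regular pattern: codes 0x00-0x7F map to the same Unicode values, and each defined Thai code maps to U+0E01 plus its offset from 0xA1. The following is a minimal sketch of that mapping. The function name tis620_to_unicode() is hypothetical and is not part of this header; it only restates the table and treats every code missing from the table as undefined.

```c
#include <assert.h>

/* Hypothetical helper, not part of thctype.h: map one TIS-620 byte to a
 * Unicode code point, following the table above.
 * Returns -1 for codes that the table does not list. */
long tis620_to_unicode(unsigned char c)
{
    if (c <= 0x7F)
        return c;                        /* US-ASCII half maps to itself */
    if ((c >= 0xA1 && c <= 0xDA) || (c >= 0xDF && c <= 0xFB))
        return 0x0E01 + (c - 0xA1);      /* Thai half is a fixed offset */
    return -1;                           /* 0x80-0xA0, 0xDB-0xDE, 0xFC-0xFF */
}

/* Usage example: a few rows of the table, checked with assert(). */
void tis620_to_unicode_example(void)
{
    assert(tis620_to_unicode(0x41) == 0x0041);  /* LATIN CAPITAL LETTER A */
    assert(tis620_to_unicode(0xA1) == 0x0E01);  /* KO KAI */
    assert(tis620_to_unicode(0xDF) == 0x0E3F);  /* SYMBOL BAHT */
    assert(tis620_to_unicode(0xFB) == 0x0E5B);  /* KHOMUT */
    assert(tis620_to_unicode(0xDB) == -1);      /* not in the table */
}
```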
  
 

The Thai characters comprise 44 consonants, vowels, tone marks, diacritics and Thai digits. Thai vowels are divided into 4 groups: Leading Vowels (LV), Following Vowels (FV), Below Vowels (BV) and Above Vowels (AV). There are 4 tone marks, which are positioned above a consonant. Diacritics are divided into 2 groups: Above Diacritics (AD) and Below Diacritics (BD).

Libthai has defined 4 levels for the position of a character.

* Below level: the character is placed below the consonant. th_chlevel() returns -1 for these characters.
* Base level: this includes consonants, FV and LV. The character is placed on the baseline. th_chlevel() returns 0 for these characters.
* Above level: the character is placed just above the consonant. th_chlevel() returns 1 for these characters.
* Top level: this includes tone marks and diacritics. For plain character cell rendering, it is safe to put these characters at the top-most level. However, some rendering engines may lower them when no character occupies the Above level, for typographical quality. th_chlevel() returns 2 for these characters.

There is an extra level value, 3, for certain characters which are usually classified as characters at Above level, but may also be placed at Top level in some rare cases. Two characters fall in this category, namely MAITAIKHU and NIKHAHIT.

MAITAIKHU can be placed at Top level when writing some minority languages such as Kuy, to shorten some syllables with compound vowels, such as Sara Ia and Sara Uea. NIKHAHIT can be placed at Top level in Pali/Sanskrit words, to represent the final -ng sound above SARA I.
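
The level values described above can be sketched as a small lookup. This is an illustration only and not the libthai implementation: the names sketch_chlevel() and the LEVEL_* constants are hypothetical, the assignment of individual codes to Below, Above and Top follows the usual classification of the characters in the table, and placing YAMAKKAN at Top level is an assumption based on the statement that Top level includes diacritics.

```c
#include <assert.h>

/* Illustrative names for the level values (assumed, not from thctype.h). */
enum sketch_level {
    LEVEL_BELOW = -1,
    LEVEL_BASE = 0,
    LEVEL_ABOVE = 1,
    LEVEL_TOP = 2,
    LEVEL_ABOVE_OR_TOP = 3
};

/* Hypothetical stand-in for th_chlevel(), written from the prose above. */
int sketch_chlevel(unsigned char c)
{
    switch (c) {
    case 0xD8: case 0xD9:      /* SARA U, SARA UU (below vowels) */
    case 0xDA:                 /* PHINTHU (below diacritic) */
        return LEVEL_BELOW;
    case 0xD1:                 /* MAI HAN-AKAT */
    case 0xD4: case 0xD5:      /* SARA I, SARA II */
    case 0xD6: case 0xD7:      /* SARA UE, SARA UEE */
        return LEVEL_ABOVE;
    case 0xE8: case 0xE9:      /* MAI EK, MAI THO */
    case 0xEA: case 0xEB:      /* MAI TRI, MAI CHATTAWA */
    case 0xEC:                 /* THANTHAKHAT */
    case 0xEE:                 /* YAMAKKAN (assumed Top, see lead-in) */
        return LEVEL_TOP;
    case 0xE7: case 0xED:      /* MAITAIKHU, NIKHAHIT */
        return LEVEL_ABOVE_OR_TOP;
    default:                   /* consonants, LV, FV, digits, ASCII, ... */
        return LEVEL_BASE;
    }
}

/* Usage example: one character per level. */
void sketch_chlevel_example(void)
{
    assert(sketch_chlevel(0xD8) == LEVEL_BELOW);         /* SARA U */
    assert(sketch_chlevel(0xA1) == LEVEL_BASE);          /* KO KAI */
    assert(sketch_chlevel(0xD4) == LEVEL_ABOVE);         /* SARA I */
    assert(sketch_chlevel(0xE8) == LEVEL_TOP);           /* MAI EK */
    assert(sketch_chlevel(0xE7) == LEVEL_ABOVE_OR_TOP);  /* MAITAIKHU */
}
```

A renderer that receives the value 3 has to decide from context whether the character sits at Above or Top level; the sketch does not attempt that decision.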

The following figure illustrates a Thai word and the levels of its characters.

 
  --------------------------- Top(2) 
  ------*-------------------- Top(2) 
  ------*-------------------- Top(2) 
  ---------------------------
  --------------------------- Above(1)
  ------*---------------*---- Above(1)
  ---****---------------*---- Above(1)
  --------------------------- Above(1)
  ---------------------------
  --------------------------- Base(0) 
  --*---*----***-----*--*---- Base(0) 
  -*-*-*-*--*---*---*-*-*---- Base(0) 
  --**-*-*------*---**--*---- Base(0) 
  ---**--*---*--*---*---*---- Base(0) 
  ---**--*--*-*-*----*--*---- Base(0) 
  ---*---*--**--*---*---*---- Base(0) 
  ---*---*--*---*---*---*---- Base(0) 
  ---*---*--*****---*****---- Base(0) 
  --------------------------- Baseline
  --------------------------- Below(-1)
  -------------------**-*---- Below(-1)
  --------------------***---- Below(-1)
  --------------------------- Below(-1)
  
 

A character placed at below, above or top level is also called a dead character. It is usually combined with a consonant: after a dead character is typed, the cursor does not advance to the next display cell. BV, BD, TONE, AD and AV are classified as dead characters.
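
Because dead characters do not advance the cursor, the number of display cells a TIS-620 string occupies is the number of its non-dead characters. The following sketch counts cells that way. Both function names are hypothetical; sketch_is_dead() is a stand-in written as code ranges so that the example compiles without the library, and a real program would more likely ask th_iscombchar() or th_chlevel() instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero for the codes that combine with a
 * consonant (MAI HAN-AKAT, SARA I to PHINTHU, MAITAIKHU to YAMAKKAN). */
int sketch_is_dead(unsigned char c)
{
    return c == 0xD1 || (c >= 0xD4 && c <= 0xDA) || (c >= 0xE7 && c <= 0xEE);
}

/* Count the display cells used by a NUL-terminated TIS-620 string:
 * every character except a dead character takes one cell. */
size_t sketch_cell_count(const unsigned char *s)
{
    size_t cells = 0;

    for (; *s != '\0'; s++) {
        if (!sketch_is_dead(*s))
            cells++;
    }
    return cells;
}

/* Usage example: THO THAHAN + SARA II + MAI EK stack in a single cell. */
void sketch_cell_count_example(void)
{
    const unsigned char word[] = { 0xB7, 0xD5, 0xE8, 0x00 };

    assert(sketch_cell_count(word) == 1);
}
```

The sketch assumes one cell per base character and ignores control characters, which a terminal may handle differently.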

Function Documentation

int th_chlevel (thchar_t c)

Position for rendering:

* 3 = above/top
* 2 = top
* 1 = above
* 0 = base
* -1 = below

int th_istis (thchar_t c)

Is the character a valid TIS-620 code? TIS-620 here means US-ASCII plus the TIS-620 extension. Character codes in the CR area (0x80-0x9f), the non-breaking space (0xa0), and the code gap ranges (0xdb-0xde and 0xfc-0xff) are excluded.
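
The exclusions listed above translate directly into range checks. The following is a minimal sketch of such a test, written only from that description; sketch_istis() is a hypothetical name and the code is not taken from the libthai sources.

```c
#include <assert.h>

/* Hypothetical re-statement of the rule documented for th_istis():
 * accept US-ASCII and the defined Thai codes, reject everything else. */
int sketch_istis(unsigned char c)
{
    if (c <= 0x7F)
        return 1;              /* US-ASCII */
    if (c <= 0xA0)
        return 0;              /* CR area 0x80-0x9f and NBSP 0xa0 */
    if (c >= 0xDB && c <= 0xDE)
        return 0;              /* first code gap */
    if (c >= 0xFC)
        return 0;              /* second code gap, 0xfc-0xff */
    return 1;                  /* 0xa1-0xda and 0xdf-0xfb */
}

/* Usage example: one accepted and one rejected code on each side of a gap. */
void sketch_istis_example(void)
{
    assert(sketch_istis('A') == 1);
    assert(sketch_istis(0xA0) == 0);
    assert(sketch_istis(0xDA) == 1);
    assert(sketch_istis(0xDB) == 0);
    assert(sketch_istis(0xDF) == 1);
    assert(sketch_istis(0xFC) == 0);
}
```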

Author

Generated automatically by Doxygen for libthai from the source code.