
PHP7 IntlChar: Handling Unicode in Detail

neo, created 6 years ago, view count: 2573

IntlChar::charAge($codepoint)

Returns the Unicode version in which the given code point was first assigned to a character.

<?php
var_dump(IntlChar::charAge("\u{2603}"));
/*
array(4) {
  [0]=>
  int(1)
  [1]=>
  int(1)
  [2]=>
  int(0)
  [3]=>
  int(0)
}
*/
This shows that code point U+2603 was first assigned in Unicode 1.1.0.0.
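The four-element array can be awkward to read or log. A minimal sketch of turning it into a dotted version string follows; the helper name unicodeAge is hypothetical and not part of the intl extension.

```php
<?php
// Hypothetical helper: format the version array returned by
// IntlChar::charAge() as a dotted string such as "1.1.0.0".
function unicodeAge(string $char): string
{
    return implode('.', IntlChar::charAge($char));
}

echo unicodeAge("\u{2603}"), "\n"; // 1.1.0.0
```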

IntlChar::charDigitValue($codepoint)

Returns the decimal digit value of a decimal digit character.

<?php
// Fullwidth digit 1 (U+FF11)
var_dump(IntlChar::charDigitValue("\u{FF11}"));
// Fullwidth digit 2 (U+FF12)
var_dump(IntlChar::charDigitValue("\u{FF12}"));
// int(1)
// int(2)
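For characters that are not decimal digits, charDigitValue returns -1. Building on that, here is a rough sketch of parsing a run of digits from any script into an integer; the helper name parseUnicodeDigits is hypothetical, and the sketch assumes valid UTF-8 input.

```php
<?php
// Hypothetical helper: convert a string of decimal digits from any script
// (for example fullwidth digits) into an integer. Returns null if the
// string is empty or contains a character that is not a decimal digit.
function parseUnicodeDigits(string $text): ?int
{
    // Split the UTF-8 string into individual code points.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false || $chars === []) {
        return null;
    }

    $value = 0;
    foreach ($chars as $char) {
        $digit = IntlChar::charDigitValue($char);
        // -1 means "not a decimal digit"; null means invalid input.
        if ($digit === null || $digit < 0) {
            return null;
        }
        $value = $value * 10 + $digit;
    }
    return $value;
}

var_dump(parseUnicodeDigits("\u{FF11}\u{FF12}")); // int(12)
var_dump(parseUnicodeDigits("1a"));               // NULL
```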

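IntlChar::charDirection($codepoint)

Returns the bidirectional category (writing direction) of a character.

For the related algorithm, see: http://www.unicode.org/reports/tr9/

Arabic, for example, is written from right to left. Once you know the writing direction, you can apply custom settings with the CSS unicode-bidi property.

<?php
// Check the writing direction of a Chinese character; "\u{4E00}" is the character 一 (one).
var_dump(IntlChar::charDirection("\u{4E00}") === IntlChar::CHAR_DIRECTION_LEFT_TO_RIGHT);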
// bool(true)
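To connect this with CSS, one option is to map the bidi class of a character to an "ltr" or "rtl" value. The following is only a sketch: the helper name cssDirection is hypothetical, and it treats just the two strong right-to-left classes as "rtl", which is a simplification of the full algorithm in TR9.

```php
<?php
// Hypothetical helper: map a character's bidi class to a CSS direction.
// Only the strong right-to-left classes are treated as "rtl"; everything
// else falls back to "ltr".
function cssDirection(string $char): string
{
    $rtlClasses = [
        IntlChar::CHAR_DIRECTION_RIGHT_TO_LEFT,        // e.g. Hebrew
        IntlChar::CHAR_DIRECTION_RIGHT_TO_LEFT_ARABIC, // e.g. Arabic
    ];

    return in_array(IntlChar::charDirection($char), $rtlClasses, true)
        ? 'rtl'
        : 'ltr';
}

echo cssDirection("\u{4E00}"), "\n"; // ltr (Chinese character one)
echo cssDirection("\u{0627}"), "\n"; // rtl (Arabic letter alef)
echo cssDirection("\u{05D0}"), "\n"; // rtl (Hebrew letter alef)
```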

IntlChar::charName($codepoint)

Returns the name of a character.

var_dump(IntlChar::charName("@"));
// string(13) "COMMERCIAL AT"
var_dump(IntlChar::charName("."));
// string(9) "FULL STOP"

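IntlChar::getBidiPairedBracket($codepoint)

Returns the paired bracket for a bracket character: the closing bracket for an opening one, and the opening bracket for a closing one.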

var_dump(IntlChar::getBidiPairedBracket('['));
var_dump(IntlChar::getBidiPairedBracket(')'));
// string(1) "]"
// string(1) "("

IntlChar::charMirror($codepoint)

Returns the mirror-image character of a character. Characters without a mirror image are returned unchanged.

<?php
var_dump(IntlChar::charMirror("A"));
var_dump(IntlChar::charMirror("<"));
var_dump(IntlChar::charMirror("("));
// string(1) "A"
// string(1) ">"
// string(1) ")"

IntlChar::isMirrored($codepoint)

Checks whether a character has a mirror-image form (the Bidi_Mirrored property).
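A short example, reusing the same three characters as the charMirror snippet:

```php
<?php
var_dump(IntlChar::isMirrored("A")); // bool(false)
var_dump(IntlChar::isMirrored("<")); // bool(true)
var_dump(IntlChar::isMirrored("(")); // bool(true)
```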

IntlChar::charType($codepoint)

Returns the general category of a character.

<?php
var_dump(IntlChar::charType("A") === IntlChar::CHAR_CATEGORY_UPPERCASE_LETTER);
var_dump(IntlChar::charType(".") === IntlChar::CHAR_CATEGORY_OTHER_PUNCTUATION);
var_dump(IntlChar::charType("\t") === IntlChar::CHAR_CATEGORY_CONTROL_CHAR);
var_dump(IntlChar::charType("\u{2603}") === IntlChar::CHAR_CATEGORY_OTHER_SYMBOL);

// bool(true)
// bool(true)
// bool(true)
// bool(true)

The following character category constants are supported:

IntlChar::CHAR_CATEGORY_UNASSIGNED
IntlChar::CHAR_CATEGORY_GENERAL_OTHER_TYPES
IntlChar::CHAR_CATEGORY_UPPERCASE_LETTER
IntlChar::CHAR_CATEGORY_LOWERCASE_LETTER
IntlChar::CHAR_CATEGORY_TITLECASE_LETTER
IntlChar::CHAR_CATEGORY_MODIFIER_LETTER
IntlChar::CHAR_CATEGORY_OTHER_LETTER
IntlChar::CHAR_CATEGORY_NON_SPACING_MARK
IntlChar::CHAR_CATEGORY_ENCLOSING_MARK
IntlChar::CHAR_CATEGORY_COMBINING_SPACING_MARK
IntlChar::CHAR_CATEGORY_DECIMAL_DIGIT_NUMBER
IntlChar::CHAR_CATEGORY_LETTER_NUMBER
IntlChar::CHAR_CATEGORY_OTHER_NUMBER
IntlChar::CHAR_CATEGORY_SPACE_SEPARATOR
IntlChar::CHAR_CATEGORY_LINE_SEPARATOR
IntlChar::CHAR_CATEGORY_PARAGRAPH_SEPARATOR
IntlChar::CHAR_CATEGORY_CONTROL_CHAR
IntlChar::CHAR_CATEGORY_FORMAT_CHAR
IntlChar::CHAR_CATEGORY_PRIVATE_USE_CHAR
IntlChar::CHAR_CATEGORY_SURROGATE
IntlChar::CHAR_CATEGORY_DASH_PUNCTUATION
IntlChar::CHAR_CATEGORY_START_PUNCTUATION
IntlChar::CHAR_CATEGORY_END_PUNCTUATION
IntlChar::CHAR_CATEGORY_CONNECTOR_PUNCTUATION
IntlChar::CHAR_CATEGORY_OTHER_PUNCTUATION
IntlChar::CHAR_CATEGORY_MATH_SYMBOL
IntlChar::CHAR_CATEGORY_CURRENCY_SYMBOL
IntlChar::CHAR_CATEGORY_MODIFIER_SYMBOL
IntlChar::CHAR_CATEGORY_OTHER_SYMBOL
IntlChar::CHAR_CATEGORY_INITIAL_PUNCTUATION
IntlChar::CHAR_CATEGORY_FINAL_PUNCTUATION
IntlChar::CHAR_CATEGORY_CHAR_CATEGORY_COUNT
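
Several of these constants can be grouped to answer broader questions. As a sketch, the helper below (the name isLetter is hypothetical) reports whether a character belongs to any of the five letter categories:

```php
<?php
// Hypothetical helper: true when the character's general category is one
// of the five letter categories listed above.
function isLetter(string $char): bool
{
    $letterCategories = [
        IntlChar::CHAR_CATEGORY_UPPERCASE_LETTER,
        IntlChar::CHAR_CATEGORY_LOWERCASE_LETTER,
        IntlChar::CHAR_CATEGORY_TITLECASE_LETTER,
        IntlChar::CHAR_CATEGORY_MODIFIER_LETTER,
        IntlChar::CHAR_CATEGORY_OTHER_LETTER,
    ];

    return in_array(IntlChar::charType($char), $letterCategories, true);
}

var_dump(isLetter("A"));        // bool(true)
var_dump(isLetter("\u{4E00}")); // bool(true)
var_dump(isLetter("1"));        // bool(false)
```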

IntlChar::chr($codepoint)

Returns the character represented by the given code point.

<?php
$values = ["A", 63, 123, 9731];
foreach ($values as $value) {
    var_dump(IntlChar::chr($value));
}
// string(1) "A"
// string(1) "?"
// string(1) "{"
// string(3) "☃"

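IntlChar::enumCharNames(mixed $start, mixed $limit, callable $callback [, int $nameChoice = IntlChar::UNICODE_CHAR_NAME])

Enumerates the names of all characters in the code point range from $start up to, but not including, $limit, calling $callback once for each character.

<?php
IntlChar::enumCharNames(0x2600, 0x2610, function($codepoint, $nameChoice, $name) {
    // Print the code point as four-digit hex followed by the character name.
    printf("U+%04x %s\n", $codepoint, $name);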
});

// U+2600 BLACK SUN WITH RAYS
// U+2601 CLOUD
// U+2602 UMBRELLA
// U+2603 SNOWMAN
// U+2604 COMET
// U+2605 BLACK STAR
// U+2606 WHITE STAR
// U+2607 LIGHTNING
// U+2608 THUNDERSTORM
// U+2609 SUN
// U+260a ASCENDING NODE
// U+260b DESCENDING NODE
// U+260c CONJUNCTION
// U+260d OPPOSITION
// U+260e BLACK TELEPHONE
// U+260f WHITE TELEPHONE