[String] a new component for object-oriented strings management with an abstract...

 4 years ago
source link: https://github.com/symfony/symfony/pull/33553
| Q             | A   |
| ------------- | --- |
| Branch?       | 4.4 |
| Bug fix?      | no  |
| New feature?  | yes |
| Deprecations? | no  |
| Tickets       | -   |
| License       | MIT |
| Doc PR        | -   |

This is a reboot of #22184 (thanks @hhamon for working on it) and a generalization of my previous work on the topic (patchwork/utf8). Unlike existing libraries (including patchwork/utf8), this component provides a unified API for the 3 unit systems of strings: bytes, code points and grapheme clusters.

The unified API is defined by the `AbstractString` class. It has 2 direct child classes: `BinaryString` and `AbstractUnicodeString`, itself extended by `Utf8String` and `GraphemeString`.

All objects are immutable and provide clear edge-case semantics, using exceptions and/or (nullable) types!

Three helper functions are provided to create such strings:

    new GraphemeString('foo') == g('foo');
    new Utf8String('foo') == u('foo');
    new BinaryString('foo') == b('foo');

`GraphemeString` is the most linguistic-friendly variant of them, which means it's the one people should use most of the time when dealing with written text.

Future ideas:

- improve tests
- add more docblocks (only where they'd add value!)
- consider adding more methods in the string API (`is*()`?, `*Encode()`?, etc.)
- first-class Emoji support
- merge the Inflector component into this one
- use `width()` to improve `truncate()` and `wordwrap()`
- move method `slug()` to a dedicated locale-aware service class
- propose your ideas (send PRs after merge)

Out of (current) scope:

- what intl provides (collations, transliterations, confusables, segmentation, etc)

Here is the unified API I'm proposing in this PR, borrowed from looking at many existing libraries, but also Java, Python, JavaScript and Go.

    function __construct(string $string = '');
    static function unwrap(array $values): array
    static function wrap(array $values): array
    function after($needle, bool $includeNeedle = false, int $offset = 0): self;
    function afterLast($needle, bool $includeNeedle = false, int $offset = 0): self;
    function append(string ...$suffix): self;
    function before($needle, bool $includeNeedle = false, int $offset = 0): self;
    function beforeLast($needle, bool $includeNeedle = false, int $offset = 0): self;
    function camel(): self;
    function contains($needle, int $offset = 0): bool;
    function chunk(int $length = 1): array;
    function collapseWhitespace(): self
    function endsWith($suffix): bool;
    function ensureEnd(string $suffix): self;
    function ensureStart(string $prefix): self;
    function equalsTo($string): bool;
    function folded(): self;
    function ignoreCase(): self;
    function indexOf($needle, int $offset = 0): ?int;
    function indexOfLast($needle, int $offset = 0): ?int;
    function isEmpty(): bool;
    function join(array $strings): self;
    function jsonSerialize(): string;
    function length(): int;
    function lower(): self;
    function match(string $pattern, int $flags = 0, int $offset = 0): array;
    function padBoth(int $length, string $padStr = ' '): self;
    function padEnd(int $length, string $padStr = ' '): self;
    function padStart(int $length, string $padStr = ' '): self;
    function prepend(string ...$prefix): self;
    function repeat(int $multiplier): self;
    function replace(string $from, string $to): self;
    function replaceMatches(string $fromPattern, $to): self;
    function slice(int $start = 0, int $length = null): self;
    function snake(): self;
    function splice(string $replacement, int $start = 0, int $length = null): self;
    function split(string $delimiter, int $limit = null, int $flags = null): array;
    function startsWith($prefix): bool;
    function title(bool $allWords = false): self;
    function toBinary(string $toEncoding = null): BinaryString;
    function toGrapheme(): GraphemeString;
    function toUtf8(): Utf8String;
    function trim(string $chars = " \t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"): self;
    function trimEnd(string $chars = " \t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"): self;
    function trimStart(string $chars = " \t\n\r\0\x0B\x0C\u{A0}\u{FEFF}"): self;
    function truncate(int $length, string $ellipsis = ''): self;
    function upper(): self;
    function width(bool $ignoreAnsiDecoration = true): int;
    function wordwrap(int $width = 75, string $break = "\n", bool $cut = false): self;
    function __clone();
    function __toString(): string;

`AbstractUnicodeString` adds these:

    static function fromCodePoints(int ...$codes): self;
    function ascii(array $rules = []): self;
    function codePoint(int $index = 0): ?int;
    function folded(bool $compat = true): parent;
    function normalize(int $form = self::NFC): self;
    function slug(string $separator = '-'): self;

and `BinaryString`:

    static function fromRandom(int $length = 16): self;
    function byteCode(int $index = 0): ?int;
    function isUtf8(): bool;
    function toUtf8(string $fromEncoding = null): Utf8String;
    function toGrapheme(string $fromEncoding = null): GraphemeString;

Case-insensitive operations are done with the `ignoreCase()` method, e.g. `b('abc')->ignoreCase()->indexOf('B')` will return `1`.

For reference, CLDR transliterations (used in the `ascii()` method) are defined here:
https://github.com/unicode-org/cldr/tree/master/common/transforms
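
To make the three unit systems named in the description (bytes, code points, grapheme clusters) concrete, here is a minimal sketch that measures the same string in each unit. It deliberately uses only native PHP functions rather than the classes proposed in this PR, so it illustrates the distinction and is not how the component is implemented. The sample string and variable names are assumptions chosen for illustration.

```php
<?php
// "café" written with a combining acute accent (U+0301) after the "e".
$text = "cafe\u{301}";

// Bytes: strlen() counts the raw bytes of the UTF-8 encoding.
$bytes = strlen($text);

// Code points: with the /u modifier, "." matches one Unicode code point
// (the /s modifier lets "." match newlines too).
$codePoints = preg_match_all('/./su', $text);

// Grapheme clusters: \X matches one extended grapheme cluster,
// so "e" plus its combining accent counts as a single unit.
$graphemes = preg_match_all('/\X/u', $text);

printf("bytes=%d codePoints=%d graphemes=%d\n", $bytes, $codePoints, $graphemes);
// prints "bytes=6 codePoints=5 graphemes=4"
```

The same text has three different lengths depending on the unit, which is why the description recommends the grapheme-based variant for written text: it is the one that matches what a reader perceives as a character.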
