docs/utils/en/strings.texy at master · nette/docs

637 lines (428 loc) · 20 KB
String Functions
****************
[api:Nette\Utils\Strings] is a static class containing useful functions for working with UTF-8 encoded strings.
Installation:
composer require nette/utils
All examples assume the following class alias is defined:
use Nette\Utils\Strings;
Letter Case
===========
These functions require the PHP extension `mbstring`.
lower(string $s): string .[method]
----------------------------------
Converts a UTF-8 string to lowercase.
Strings::lower('Hello World'); // 'hello world'
upper(string $s): string .[method]
----------------------------------
Converts a UTF-8 string to uppercase.
Strings::upper('Hello World'); // 'HELLO WORLD'
firstUpper(string $s): string .[method]
---------------------------------------
Converts the first character of a UTF-8 string to uppercase and leaves the other characters unchanged.
Strings::firstUpper('hello world'); // 'Hello world'
firstLower(string $s): string .[method]
---------------------------------------
Converts the first character of a UTF-8 string to lowercase and leaves the other characters unchanged.
Strings::firstLower('Hello World'); // 'hello world'
capitalize(string $s): string .[method]
---------------------------------------
Converts the first character of every word in a UTF-8 string to uppercase and the others to lowercase.
Strings::capitalize('hello world'); // 'Hello World'
Editing a String
================
normalize(string $s): string .[method]
--------------------------------------
Removes control characters, normalizes line endings to `\n`, trims leading and trailing blank lines, trims trailing spaces on lines, and normalizes UTF-8 to the NFC normal form.
unixNewLines(string $s): string .[method]
-----------------------------------------
Converts line endings to `\n` used on Unix systems. Line endings are: `\n`, `\r`, `\r\n`, U+2028 line separator, U+2029 paragraph separator.
$unixLikeLines = Strings::unixNewLines($string);
platformNewLines(string $s): string .[method]
---------------------------------------------
Converts line endings to characters specific to the current platform, i.e., `\r\n` on Windows and `\n` elsewhere. Line endings are: `\n`, `\r`, `\r\n`, U+2028 line separator, U+2029 paragraph separator.
$platformLines = Strings::platformNewLines($string);
webalize(string $s, ?string $charlist=null, bool $lower=true): string .[method]
-------------------------------------------------------------------------------
Modifies a UTF-8 string to the form used in URLs, i.e., removes diacritics and replaces all characters except letters of the English alphabet and numbers with hyphens.
Strings::webalize('žluťoučký kůň'); // 'zlutoucky-kun'
If other characters should be preserved, they can be specified in the second parameter.
Strings::webalize('10. image_id', '._'); // '10.-image_id'
The third parameter can suppress conversion to lowercase.
Strings::webalize('Dobrý den', null, false); // 'Dobry-den'
Requires the PHP extension `intl`.
trim(string $s, ?string $charlist=null): string .[method]
---------------------------------------------------------
Strips leading and trailing whitespace (or other characters specified by the second parameter) from a UTF-8 string.
Strings::trim('  Hello  '); // 'Hello'
truncate(string $s, int $maxLen, string $append=`'…'`): string .[method]
------------------------------------------------------------------------
Truncates a UTF-8 string to the specified maximum length, trying to preserve whole words. If the string is truncated, an ellipsis (changeable via the third parameter) is added to the end.
$text = 'Hello, how are you today?';
Strings::truncate($text, 5);       // 'Hell…'
Strings::truncate($text, 20);      // 'Hello, how are you…'
Strings::truncate($text, 30);      // 'Hello, how are you today?'
Strings::truncate($text, 20, '~'); // 'Hello, how are you~'
indent(string $s, int $level=1, string $indentationChar=`"\t"`): string .[method]
---------------------------------------------------------------------------------
Indents a multiline text from the left. The second parameter specifies the number of indentation characters, and the third parameter specifies the character(s) to use for indentation (default is tab).
Strings::indent('Nette');         // "\tNette"
Strings::indent('Nette', 2, '+'); // '++Nette'
padLeft(string $s, int $length, string $pad=`' '`): string .[method]
--------------------------------------------------------------------
Pads a UTF-8 string to a specified length by prepending the `$pad` string from the left.
Strings::padLeft('Nette', 6);        // ' Nette'
Strings::padLeft('Nette', 8, '+*');  // '+*+Nette'
padRight(string $s, int $length, string $pad=`' '`): string .[method]
---------------------------------------------------------------------
Pads a UTF-8 string to a specified length by appending the `$pad` string from the right.
Strings::padRight('Nette', 6);       // 'Nette '
Strings::padRight('Nette', 8, '+*'); // 'Nette+*+'
substring(string $s, int $start, ?int $length=null): string .[method]
---------------------------------------------------------------------
Returns a portion of a UTF-8 string `$s` specified by the starting position `$start` and length `$length`. If `$start` is negative, the returned string will start at the `$start`-th character from the end.
Strings::substring('Nette Framework', 0, 5); // 'Nette'
Strings::substring('Nette Framework', 6);    // 'Framework'
Strings::substring('Nette Framework', -4);   // 'work'
reverse(string $s): string .[method]
------------------------------------
Reverses a UTF-8 string.
Strings::reverse('Nette'); // 'etteN'
length(string $s): int .[method]
--------------------------------
Returns the number of characters (not bytes) in a UTF-8 string.
This is the number of Unicode code points, which may differ from the number of graphemes.
Strings::length('Nette');   // 5
Strings::length('červená'); // 7
startsWith(string $haystack, string $needle): bool .[method deprecated]
-----------------------------------------------------------------------
Checks if the string `$haystack` starts with the string `$needle`.
$haystack = 'Starts';
$needle = 'St';
Strings::startsWith($haystack, $needle); // true
Use the native `str_starts_with()`:https://www.php.net/manual/en/function.str-starts-with.php function.
endsWith(string $haystack, string $needle): bool .[method deprecated]
---------------------------------------------------------------------
Checks if the string `$haystack` ends with the string `$needle`.
$haystack = 'Ends';
$needle = 'ds';
Strings::endsWith($haystack, $needle); // true
Use the native `str_ends_with()`:https://www.php.net/manual/en/function.str-ends-with.php function.
contains(string $haystack, string $needle): bool .[method deprecated]
---------------------------------------------------------------------
Checks if the string `$haystack` contains the string `$needle`.
$haystack = 'Auditorium';
$needle = 'dit';
Strings::contains($haystack, $needle); // true
Use the native `str_contains()`:https://www.php.net/manual/en/function.str-contains.php function.
compare(string $left, string $right, ?int $length=null): bool .[method]
-----------------------------------------------------------------------
Compares two UTF-8 strings or their parts, case-insensitively. If `$length` is null, compares the entire strings. If negative, compares the corresponding number of characters from the end of the strings. Otherwise, compares the corresponding number of characters from the beginning.
Strings::compare('Nette', 'nette');     // true
Strings::compare('Nette', 'next', 2);   // true - first 2 characters match
Strings::compare('Nette', 'Latte', -2); // true - last 2 characters match
findPrefix(...$strings): string .[method]
-----------------------------------------
Finds the common prefix of the strings. Returns an empty string if no common prefix is found.
Strings::findPrefix('prefix-a', 'prefix-bb', 'prefix-c');   // 'prefix-'
Strings::findPrefix(['prefix-a', 'prefix-bb', 'prefix-c']); // 'prefix-'
Strings::findPrefix('Nette', 'is', 'great');                // ''
before(string $haystack, string $needle, int $nth=1): ?string .[method]
-----------------------------------------------------------------------
Returns the portion of string `$haystack` before the `$nth` occurrence of string `$needle`. Returns `null` if `$needle` is not found. If `$nth` is negative, it searches from the end of the string.
Strings::before('Nette_is_great', '_', 1);  // 'Nette'
Strings::before('Nette_is_great', '_', -2); // 'Nette'
Strings::before('Nette_is_great', ' ');     // null
Strings::before('Nette_is_great', '_', 3);  // null
after(string $haystack, string $needle, int $nth=1): ?string .[method]
----------------------------------------------------------------------
Returns the portion of string `$haystack` after the `$nth` occurrence of string `$needle`. Returns `null` if `$needle` is not found. If `$nth` is negative, it searches from the end of the string.
Strings::after('Nette_is_great', '_', 2);  // 'great'
Strings::after('Nette_is_great', '_', -1); // 'great'
Strings::after('Nette_is_great', ' ');     // null
Strings::after('Nette_is_great', '_', 3);  // null
indexOf(string $haystack, string $needle, int $nth=1): ?int .[method]
---------------------------------------------------------------------
Returns the character position of the `$nth` occurrence of string `$needle` in string `$haystack`. Returns `null` if `$needle` is not found. If `$nth` is negative, it searches from the end of the string.
Strings::indexOf('abc abc abc', 'abc', 2);  // 4
Strings::indexOf('abc abc abc', 'abc', -1); // 8
Strings::indexOf('abc abc abc', 'd');       // null
fixEncoding(string $s): string .[method]
----------------------------------------
Removes invalid UTF-8 characters from a string.
$correctString = Strings::fixEncoding($invalidString);
checkEncoding(string $s): bool .[method deprecated]
---------------------------------------------------
Checks if a string is a valid UTF-8 string.
$isUtf8 = Strings::checkEncoding($string);
Use [Nette\Utils\Validator::isUnicode() |validators#isUnicode].
toAscii(string $s): string .[method]
------------------------------------
Converts a UTF-8 string to ASCII, i.e., removes diacritics, etc.
Strings::toAscii('žluťoučký kůň'); // 'zlutoucky kun'
Requires the PHP extension `intl`.
chr(int $code): string .[method]
--------------------------------
Returns a specific character in UTF-8 from a code point (a number in the range 0x0000..D7FF or 0xE000..10FFFF).
Strings::chr(0xA9); // '©' in UTF-8 encoding
ord(string $char): int .[method]
--------------------------------
Returns the code point of a specific character in UTF-8 (a number in the range 0x0000..D7FF or 0xE000..10FFFF).
Strings::ord('©'); // 169 (0xA9)
Regular Expressions
===================
The `Strings` class offers functions for working with regular expressions. Unlike native PHP functions, they feature a more understandable API, better Unicode support, and, crucially, error detection. Any error during compilation or expression processing throws a `Nette\RegexpException`.
split(string $subject, string $pattern, bool $captureOffset=false, bool $skipEmpty=false, int $limit=-1, bool $utf8=false): array .[method]
-------------------------------------------------------------------------------------------------------------------------------------------
Splits a string into an array using a regular expression. Expressions in parentheses will also be captured and returned.
Strings::split('hello, world', '~,\s*~');
// ['hello', 'world']
Strings::split('hello, world', '~(,)\s*~');
// ['hello', ',', 'world']
If `$skipEmpty` is `true`, only non-empty items will be returned:
Strings::split('hello, world, ', '~,\s*~');
// ['hello', 'world', '']
Strings::split('hello, world, ', '~,\s*~', skipEmpty: true);
// ['hello', 'world']
If `$limit` is specified, only substrings up to the limit are returned, and the rest of the string is placed in the last element. A limit of -1 or 0 means no limit.
Strings::split('hello, world, third', '~,\s*~', limit: 2);
// ['hello', 'world, third']
If `$utf8` is `true`, evaluation switches to Unicode mode, similar to using the `u` modifier.
If `$captureOffset` is `true`, the position of each match in the string will also be returned (in bytes; in characters if `$utf8` is set). This changes the return value to an array where each element is a pair consisting of the matched string and its position.
Strings::split('žlutý, kůň', '~,\s*~', captureOffset: true);
// [['žlutý', 0], ['kůň', 9]]
Strings::split('žlutý, kůň', '~,\s*~', captureOffset: true, utf8: true);
// [['žlutý', 0], ['kůň', 7]] // positions are in characters
match(string $subject, string $pattern, bool $captureOffset=false, int $offset=0, bool $unmatchedAsNull=false, bool $utf8=false): ?array .[method]
--------------------------------------------------------------------------------------------------------------------------------------------------
Searches a string for a part matching a regular expression and returns an array containing the found expression and individual subexpressions, or `null` if no match is found.
Strings::match('hello!', '~\w+(!+)~');
// ['hello!', '!']
Strings::match('hello!', '~X~');
If `$unmatchedAsNull` is `true`, unmatched subpatterns are returned as `null`; otherwise, they are returned as an empty string or omitted entirely:
Strings::match('hello', '~\w+(!+)?~');
// ['hello'] (the optional group !+ did not match)
Strings::match('hello', '~\w+(!+)?~', unmatchedAsNull: true);
// ['hello', null]
If `$utf8` is `true`, evaluation switches to Unicode mode, similar to using the `u` modifier:
Strings::match('žlutý kůň', '~\w+~'); // Without UTF-8
// ['lut'] (matches only ASCII word characters)
Strings::match('žlutý kůň', '~\w+~', utf8: true); // With UTF-8
// ['žlutý'] (matches Unicode word characters)
The `$offset` parameter can specify the starting position for the search (in bytes; in characters if `$utf8` is set).
If `$captureOffset` is `true`, the position of each match in the string will also be returned (in bytes; in characters if `$utf8` is set). This changes the return value to an array where each element is a pair consisting of the matched string and its offset:
Strings::match('žlutý!', '~\w+(!+)?~', captureOffset: true); // Without UTF-8
// [['lut', 2]] (only ASCII match, offset in bytes)
Strings::match('žlutý!', '~\w+(!+)?~', captureOffset: true, utf8: true); // With UTF-8
// [['žlutý!', 0], ['!', 5]] (Unicode match, offsets in characters)
matchAll(string $subject, string $pattern, bool $captureOffset=false, int $offset=0, bool $unmatchedAsNull=false, bool $patternOrder=false, bool $utf8=false, bool $lazy=false): array|Generator .[method]
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Searches a string for all occurrences matching a regular expression and returns an array of arrays containing the found expression and individual subexpressions.
Strings::matchAll('hello, world!!', '~\w+(!+)?~');
	0 => ['hello'],
	1 => ['world!!', '!!'],
If `$patternOrder` is `true`, the structure of the results changes: the first element is an array of full pattern matches, the second is an array of strings matching the first parenthesized subpattern, and so on:
Strings::matchAll('hello, world!!', '~\w+(!+)?~', patternOrder: true);
	0 => ['hello', 'world!!'],
	1 => ['', '!!'],
If `$unmatchedAsNull` is `true`, unmatched subpatterns are returned as `null`; otherwise, they are returned as an empty string or omitted:
Strings::matchAll('hello, world!!', '~\w+(!+)?~', unmatchedAsNull: true);
	0 => ['hello', null],
	1 => ['world!!', '!!'],
If `$utf8` is `true`, evaluation switches to Unicode mode, similar to using the `u` modifier:
Strings::matchAll('žlutý kůň', '~\w+~');
	0 => ['lut'],
	1 => ['k'],
Strings::matchAll('žlutý kůň', '~\w+~', utf8: true);
	0 => ['žlutý'],
	1 => ['kůň'],
The `$offset` parameter can specify the starting position for the search (in bytes; in characters if `$utf8` is set).
If `$captureOffset` is `true`, the position of each match in the string will also be returned (in bytes; in characters if `$utf8` is set). This changes the return value structure, where each match element is a pair `[matched_string, position]`:
Strings::matchAll('žlutý kůň', '~\w+~', captureOffset: true);
	0 => [['lut', 2]],
	1 => [['k', 8]],
Strings::matchAll('žlutý kůň', '~\w+~', captureOffset: true, utf8: true);
	0 => [['žlutý', 0]],
	1 => [['kůň', 6]],
If `$lazy` is `true`, the function returns a `Generator` instead of an array. This offers significant performance benefits when working with large strings, as matches are found incrementally rather than processing the entire string at once. This allows efficient handling of extremely large inputs. Additionally, you can interrupt processing at any time if you find the desired match, saving computation time.
$matches = Strings::matchAll($largeText, '~\w+~', lazy: true);
foreach ($matches as $match) {
    echo "Found: $match[0]\n";
    // Processing can be interrupted at any time, e.g., with break;
replace(string $subject, string|array $pattern, string|callable $replacement='', int $limit=-1, bool $captureOffset=false, bool $unmatchedAsNull=false, bool $utf8=false): string .[method]
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Replaces all occurrences matching a regular expression. `$replacement` is either a replacement string mask or a callback function.
Strings::replace('hello, world!', '~\w+~', '--');
// '--, --!'
Strings::replace('hello, world!', '~\w+~', fn($m) => strrev($m[0]));
// 'olleh, dlrow!'
The function also allows multiple replacements by passing an array in the format `pattern => replacement` as the second parameter:
Strings::replace('hello, world!', [
	'~\w+~' => '--',
	'~,\s+~' => ' ',
// '-- --!'
The `$limit` parameter restricts the number of replacements performed. A limit of -1 means no limit.
If `$utf8` is `true`, evaluation switches to Unicode mode, similar to using the `u` modifier.
Strings::replace('žlutý kůň', '~\w+~', '--');
// 'ž--ý --ůň'
Strings::replace('žlutý kůň', '~\w+~', '--', utf8: true);
If `$captureOffset` is `true`, the position of each match in the string (in bytes; in characters if `$utf8` is set) is also passed to the callback. This changes the structure of the passed array, where each element is a pair `[matched_string, position]`.
Strings::replace(
	'žlutý kůň',
	function (array $m) { dump($m); return ''; },
	captureOffset: true,
// dumps [['lut', 2]] and [['k', 8]]
Strings::replace(
	'žlutý kůň',
	function (array $m) { dump($m); return ''; },
	captureOffset: true,
	utf8: true,
// dumps [['žlutý', 0]] and [['kůň', 6]]
If `$unmatchedAsNull` is `true`, unmatched subpatterns are passed to the callback as `null`; otherwise, they are passed as an empty string or omitted:
Strings::replace(
	'~(a)(b)*(c)~',
	function (array $m) { dump($m); return ''; },
// dumps ['ac', 'a', '', 'c']
Strings::replace(
	'~(a)(b)*(c)~',
	function (array $m) { dump($m); return ''; },
	unmatchedAsNull: true,
// dumps ['ac', 'a', null, 'c']
Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

FilesExpand file tree

strings.texy

Latest commit

History

strings.texy

File metadata and controls