Strings — selkie.pyx.string
The code examples assume:
>>> import selkie.pyx.string as ss
ASCII representations
- as_ascii(s)
Returns a string containing only ASCII characters. An “objectionable” character is a non-printing character other than space, a non-ASCII character (code point 128 or higher), DEL, or a left or right brace. If the input string contains no objectionable characters, it is returned unchanged. Otherwise, all objectionable characters are eliminated. What they are replaced with depends on the value of the argument use. If 'names', Unicode character names are used. If 'hex', hex codes are used. If 'alts', alternative single characters are used where available, and the character is deleted otherwise. If None, objectionable characters are deleted. The default is 'alts'.
>>> ss.as_ascii('h\u00ff\n')
'h\n'
>>> ss.as_ascii('h\u00ff\n', use='hex')
'h{ff}{nl}'
>>> ss.as_ascii('h\u00ff', use='names')
'h{LATIN SMALL LETTER Y WITH DIAERESIS}'
If the input is not a string, as_ascii() calls str() on it.
>>> ss.as_ascii(10)
'10'
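The replacement modes can be illustrated with a small stand-alone sketch. It is not selkie's implementation: the names is_objectionable and as_ascii_sketch are hypothetical, only the None, 'hex' and 'names' modes are covered, and special abbreviations such as the {nl} shown above are not reproduced (a newline would come out as {a} in 'hex' mode here).

```python
import unicodedata

def is_objectionable(c):
    """True for control characters, DEL, non-ASCII characters, and braces."""
    cp = ord(c)
    return cp < 0x20 or cp >= 0x7f or c in '{}'

def as_ascii_sketch(s, use=None):
    """Hypothetical illustration of the None, 'hex' and 'names' modes."""
    s = str(s)  # non-string input is converted first
    if not any(is_objectionable(c) for c in s):
        return s  # nothing to replace: return the input unchanged
    out = []
    for c in s:
        if not is_objectionable(c):
            out.append(c)
        elif use == 'hex':
            out.append('{%x}' % ord(c))
        elif use == 'names':
            name = unicodedata.name(c, None)
            if name is None:
                # Assumption: fall back to the hex code for unnamed characters.
                out.append('{%x}' % ord(c))
            else:
                out.append('{%s}' % name)
        # use is None: the character is simply dropped
    return ''.join(out)
```

Wrapping replacements in braces is presumably why braces themselves count as objectionable: otherwise a literal brace in the input could not be told apart from a replacement.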
- ascii_chars(s)
The function ascii_chars() is more aggressive and less flexible. It does “smart quote” and “smart dash” substitutions, and it replaces tab with space, vertical tab with newline, and form feed with newline. All characters above code point U+007E (~) are deleted, as are all characters with code point below U+0020 (space) except newline. Returns an iterator over the resulting characters.
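A minimal generator along these lines might look as follows. The name ascii_chars_sketch is hypothetical, and the exact substitution table is an assumption (for example, both en- and em-dash are mapped to a single hyphen here); selkie's own table may differ.

```python
# Assumed substitution table: smart quotes, dashes, and whitespace controls.
_SUBS = {
    '\u2018': "'", '\u2019': "'",   # left/right single smart quotes
    '\u201c': '"', '\u201d': '"',   # left/right double smart quotes
    '\u2013': '-', '\u2014': '-',   # en-dash, em-dash
    '\t': ' ', '\v': '\n', '\f': '\n',
}

def ascii_chars_sketch(s):
    """Yield ASCII characters only, substituting where a mapping exists."""
    for c in s:
        c = _SUBS.get(c, c)
        # Keep newline and the printable range space..tilde; drop the rest.
        if c == '\n' or ' ' <= c <= '~':
            yield c
```

Because the result is an iterator, callers that want a string join it: ''.join(ascii_chars_sketch(text)).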
- deaccent(s)
Converts a Unicode string to ASCII in a lossy way. It replaces characters in the Latin-1 range with corresponding ASCII characters, where natural correspondences exist. Characters without a natural ASCII counterpart are simply deleted. ASCII control characters other than space, tab, newline, and carriage return are deleted. The return value is an ASCII string.
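One common way to get this kind of lossy conversion is Unicode decomposition followed by filtering. The sketch below uses that technique; it is an assumption that it resembles what deaccent() does internally, and the name deaccent_sketch is hypothetical.

```python
import unicodedata

def deaccent_sketch(s):
    """Strip accents via NFKD decomposition, then keep only plain ASCII."""
    kept = []
    for c in unicodedata.normalize('NFKD', s):
        if unicodedata.combining(c):
            continue  # drop the combining accent, keep the base letter
        if c in '\t\n\r' or ' ' <= c <= '~':
            kept.append(c)
        # everything else (other controls, DEL, non-ASCII) is deleted
    return ''.join(kept)
```

Characters such as the copyright sign have no decomposition into ASCII, so they are deleted rather than approximated.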
- as_boolean(s)
Converts the strings 'True' and 'False' to the corresponding boolean values. Given anything else, it signals an error.
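The strictness described here can be sketched in a few lines. The name as_boolean_sketch is hypothetical, and raising ValueError is an assumption: the text above says only that an error is signalled, not which exception type is used.

```python
def as_boolean_sketch(s):
    """Accept exactly 'True' or 'False'; reject everything else."""
    if s == 'True':
        return True
    if s == 'False':
        return False
    # Assumption: ValueError stands in for whatever error the library raises.
    raise ValueError('not a boolean string: %r' % (s,))
```

Unlike bool(s), which is True for any non-empty string (including 'False'), this kind of conversion fails loudly on unexpected input.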
Unicode
- unidescribe(s)
Takes a string and prints out the details of the Unicode characters it contains.
>>> ss.unidescribe('hi')
0 0x68 LATIN SMALL LETTER H
1 0x69 LATIN SMALL LETTER I
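The same information is available from the standard unicodedata module. The sketch below yields one line per character (index, hex code point, character name); describe_chars is a hypothetical name, and the placeholder for unnamed characters is an assumption.

```python
import unicodedata

def describe_chars(s):
    """Yield 'index hexcode NAME' for each character of s."""
    for i, c in enumerate(s):
        # Control characters have no Unicode name; use a placeholder.
        name = unicodedata.name(c, '<unnamed>')
        yield '%d %s %s' % (i, hex(ord(c)), name)
```

Printing each yielded line gives output in the shape shown above.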
- utf8(s, fn)
Writes string s to the file named fn in UTF-8 format. It overwrites the file, if it already exists. If no filename is given, the bytes of the UTF-8 representation are printed out readably.
Miscellany
- quoted(s)
Takes a string and wraps double-quotes around it, escaping any internal double-quotes with backslashes. It also doubles any internal backslashes, and replaces newline with backslash-en.
>>> ss.quoted('L\u00ffc')
'"Lÿc"'
The return value is a string suitable for printing, or suitable for use in JSON.
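The three escaping rules can be written out directly. This is a sketch under the assumption that only backslash, double quote, and newline are escaped, as the description states; quoted_sketch is a hypothetical name.

```python
def quoted_sketch(s):
    """Wrap s in double quotes, escaping backslash, double quote, newline."""
    # Backslashes must be doubled first, so that the backslashes added
    # by the later replacements are not doubled again.
    body = s.replace('\\', '\\\\')
    body = body.replace('"', '\\"')
    body = body.replace('\n', '\\n')
    return '"' + body + '"'
```

For input containing only these special characters the result parses back with json.loads(). Strict JSON also requires other control characters to be escaped, which this sketch does not attempt.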
- trim(w, s)
It first calls as_ascii() on the string s, and then truncates the result to the field width w.
- dtstr(t)
Takes a float representing seconds since the epoch, and returns a readable string representation.
>>> ss.dtstr(1000000000)
'2001-09-08 21:46:40'
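The example output corresponds to 01:46:40 UTC on 2001-09-09 shifted by four hours, which suggests that the timestamp is rendered in local time. That is an inference, not something the text states. The sketch below makes the time zone explicit so the result is reproducible; dtstr_sketch and its tz parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def dtstr_sketch(t, tz=None):
    """Format seconds-since-epoch as 'YYYY-MM-DD HH:MM:SS'.

    With tz=None the local time zone is used (assumed to match dtstr).
    """
    return datetime.fromtimestamp(t, tz).strftime('%Y-%m-%d %H:%M:%S')

# A fixed UTC-4 offset reproduces the documented example.
utc_minus_4 = timezone(timedelta(hours=-4))
stamp = dtstr_sketch(1000000000, utc_minus_4)
```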
- elapsed_time_str(t0, t1)
The arguments t0 and t1 are the start time and end time in seconds. Returns a readable string showing the difference between them.
>>> ss.elapsed_time_str(10, 135)
'0:02:05.0000'
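The hours:minutes:seconds layout of the example can be produced with two divmod steps. This is a sketch that matches the single documented example; elapsed_sketch is a hypothetical name, and behaviour for negative differences or for seconds that round up to 60 is not addressed.

```python
def elapsed_sketch(t0, t1):
    """Format t1 - t0 (in seconds) as H:MM:SS.ffff."""
    total = t1 - t0
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    # %07.4f pads the seconds to two integer digits plus four decimals.
    return '%d:%02d:%07.4f' % (hours, minutes, seconds)
```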
- sizestr(sz)
Takes an int representing a number of bytes and returns a string with three digits after the decimal, suffixed with B, KB, MB, GB, TB, or PB.
>>> ss.sizestr(123456789)
'123.457 MB'
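The example implies decimal units (1 MB = 1,000,000 bytes), since 123456789 bytes comes out as 123.457 MB. The sketch below works on that assumption; sizestr_sketch is a hypothetical name, and how the real function renders values below 1 KB is not documented here.

```python
_UNITS = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']

def sizestr_sketch(nbytes):
    """Scale a byte count by powers of 1000 and attach a unit suffix."""
    value = float(nbytes)
    i = 0
    # Stop at PB so very large values do not run off the end of the list.
    while value >= 1000 and i < len(_UNITS) - 1:
        value /= 1000
        i += 1
    return '%.3f %s' % (value, _UNITS[i])
```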
- expand_envvars(s)
Replaces the pattern ${VAR} with the value of the environment variable VAR, wherever the pattern occurs in s.
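A regular-expression substitution is enough to sketch this. The names expand_sketch and env are hypothetical, only the braced form is handled, and leaving undefined variables untouched is an assumption; the text does not say what happens to them.

```python
import os
import re

_BRACED_VAR = re.compile(r'\$\{(\w+)\}')

def expand_sketch(s, env=None):
    """Replace each ${VAR} in s with env['VAR'] (default: os.environ)."""
    if env is None:
        env = os.environ
    # Assumption: an undefined variable is left in place unchanged.
    return _BRACED_VAR.sub(lambda m: env.get(m.group(1), m.group(0)), s)
```

The standard library's os.path.expandvars() performs a similar expansion and also accepts the unbraced $VAR form.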
Module Documentation
The selkie.pyx.string module contains general string-related functionality.
General functionality
- selkie.pyx.string.unidescribe(s)
Prints out a description of each (Unicode) character in a string.
- selkie.pyx.string.isword(s)
A word consists only of alphanumerics and underscore, and is not the empty string.
- selkie.pyx.string.lines(s)
Iterates over the lines in a string. The lines do not include carriage return and newline.
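The two predicates and iterators described above are simple enough to sketch with the standard library. Both function names are hypothetical, and details such as whether non-ASCII letters count as alphanumerics in isword() are assumptions.

```python
import io
import re

def isword_sketch(s):
    """True if s is non-empty and consists only of alphanumerics and '_'."""
    # Note: in Python 3, \w also matches non-ASCII letters and digits.
    return re.fullmatch(r'\w+', s) is not None

def lines_sketch(s):
    """Yield the lines of s without their trailing CR/LF."""
    for line in io.StringIO(s):
        yield line.rstrip('\r\n')
```

Iterating over io.StringIO splits only at newline, unlike str.splitlines(), which also breaks on vertical tab, form feed, and several other separators.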
Conversion to ASCII
- selkie.pyx.string.as_ascii(s, use='alts')
Convert a string to ASCII. Characters that are not printable ASCII characters are treated as follows.
If use is None, they are deleted.
If use is ‘alts’ (the default), they are replaced with ASCII equivalents if possible, and otherwise deleted. The printable characters lie in the range from space (inclusive) to DEL (exclusive). Tab is replaced with a single space. Newline, vertical tab, and form feed are replaced with newline. “Smart quotes” (left and right, single and double) are replaced with the ASCII single or double quote. Em- and en-dash are replaced with hyphen.
If use is ‘hex’, non-printable characters are replaced with hex codes.
If use is ‘names’, they are replaced with their names, wrapped in braces.
- selkie.pyx.string.from_ascii(s)
This undoes the effects of as_ascii().
- selkie.pyx.string.quoted(s)
Like repr(), but always returns a double-quoted string. May be called as: quoted(as_ascii(s)).
- selkie.pyx.string.deaccent(s)
Maps accented characters to their unaccented form.
Formatting dates/times and sizes
- selkie.pyx.string.dtstr(timestamp)
Returns a string that renders a timestamp readably.
- selkie.pyx.string.sizestr(nbytes)
Returns a string representing a file size readably.
- selkie.pyx.string.elapsed_time_str(start, end)
Returns a readable string showing a time difference.
- selkie.pyx.string.timestr(nsec)
Returns a readable string showing an amount of time.
Expand environment variables
- selkie.pyx.string.expand_envvars(s)
Expands out environment variables. Environment variables begin with dollar sign and are optionally enclosed in braces.