4.2. Unicode Encoding Transformation

uint32_t stringprep_utf8_to_unichar (const char * p)

p: a pointer to a Unicode character encoded as UTF-8.

Converts a sequence of bytes encoded as UTF-8 to a Unicode character. If p does not point to a valid UTF-8 encoded character, results are undefined.

Return value: the resulting character.
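
The decoding step can be sketched in a few lines. The function below is a hypothetical stand-in written for illustration (the name sketch_utf8_to_unichar is not part of the library), and like the documented function it assumes p points to a valid sequence and performs no validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, not the library's code: decode the single
   UTF-8 sequence at p. The lead byte gives the sequence length and
   the high-order payload bits; each continuation byte adds 6 bits.
   No validation is done, so invalid input gives meaningless results. */
static uint32_t
sketch_utf8_to_unichar (const char *p)
{
  const unsigned char *s = (const unsigned char *) p;
  uint32_t c;
  int len, i;

  assert (p != NULL);

  if (s[0] < 0x80)                { c = s[0];        len = 1; }
  else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; len = 2; }
  else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; len = 3; }
  else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; len = 4; }
  else if ((s[0] & 0xFC) == 0xF8) { c = s[0] & 0x03; len = 5; }
  else                            { c = s[0] & 0x01; len = 6; }

  for (i = 1; i < len; i++)
    c = (c << 6) | (s[i] & 0x3F);

  return c;
}
```

For example, the two bytes 0xC3 0xA9 decode to U+00E9 under this scheme.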

int stringprep_unichar_to_utf8 (uint32_t c, char * outbuf)

c: an ISO 10646 character code.

outbuf: output buffer, must have at least 6 bytes of space. If NULL, the length will be computed and returned and nothing will be written to outbuf.

Converts a single character to UTF-8.

Return value: number of bytes written, or the number of bytes that would be written if outbuf is NULL.
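
The 6-byte requirement on outbuf is consistent with the original UTF-8 scheme, which allows sequences of up to 6 bytes. The following is a minimal sketch of such an encoder under that assumption; sketch_unichar_to_utf8 is a hypothetical name, not the library's implementation. It follows the contract described above: a NULL outbuf means the length is only computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, not the library's code: encode c as UTF-8
   (1 to 6 bytes) and return the byte count. If outbuf is NULL,
   nothing is written and only the length is returned. */
static int
sketch_unichar_to_utf8 (uint32_t c, char *outbuf)
{
  unsigned char first;
  int len, i;

  /* The 6-byte form carries at most 31 payload bits. */
  assert (c <= 0x7FFFFFFF);

  if (c < 0x80)           { first = 0x00; len = 1; }
  else if (c < 0x800)     { first = 0xC0; len = 2; }
  else if (c < 0x10000)   { first = 0xE0; len = 3; }
  else if (c < 0x200000)  { first = 0xF0; len = 4; }
  else if (c < 0x4000000) { first = 0xF8; len = 5; }
  else                    { first = 0xFC; len = 6; }

  if (outbuf != NULL)
    {
      /* Fill continuation bytes from the end, 6 bits at a time. */
      for (i = len - 1; i > 0; i--)
        {
          outbuf[i] = (char) ((c & 0x3F) | 0x80);
          c >>= 6;
        }
      /* The remaining high bits go into the lead byte. */
      outbuf[0] = (char) (c | first);
    }

  return len;
}
```

A caller that does not know the required size in advance can call the function once with NULL to obtain the length, then again with a buffer of that size.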

char * stringprep_ucs4_to_utf8 (const uint32_t * str, ssize_t len, size_t * items_read, size_t * items_written)

str: a UCS-4 encoded string.

len: the maximum length of str to use. If len < 0, then the string is terminated with a 0 character.

items_read: location to store the number of characters read, or NULL.

items_written: location to store the number of bytes written, or NULL. The value stored here does not include the trailing 0 byte.

Converts a string from UCS-4, a 32-bit fixed-width representation, to UTF-8. The result will be terminated with a 0 byte.

Return value: a pointer to a newly allocated UTF-8 string. This value must be freed with free(). If an error occurs, NULL will be returned and error set.
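
One way to picture the parameters is a two-pass conversion: measure first, then allocate and write. The sketch below is a hypothetical stand-in (sketch_ucs4_to_utf8 and sketch_utf8_len are illustrative names, not the library's code). It uses long for len in place of ssize_t to stay portable, does not validate code points, and returns NULL only when allocation fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: UTF-8 length of one code point (1 to 6 bytes). */
static int
sketch_utf8_len (uint32_t c)
{
  return c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3
       : c < 0x200000 ? 4 : c < 0x4000000 ? 5 : 6;
}

/* Hypothetical stand-in, not the library's code. */
static char *
sketch_ucs4_to_utf8 (const uint32_t *str, long len,
                     size_t *items_read, size_t *items_written)
{
  static const unsigned char lead[] =
    { 0, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
  size_t nchars = 0, nbytes = 0, i;
  char *result, *p;

  assert (str != NULL);

  /* Pass 1: stop at a 0 character, or after len characters if len >= 0. */
  while ((len < 0 || nchars < (size_t) len) && str[nchars] != 0)
    {
      nbytes += (size_t) sketch_utf8_len (str[nchars]);
      nchars++;
    }

  result = malloc (nbytes + 1);   /* +1 for the trailing 0 byte */
  if (result == NULL)
    return NULL;

  /* Pass 2: write each character, continuation bytes from the end. */
  p = result;
  for (i = 0; i < nchars; i++)
    {
      uint32_t c = str[i];
      int n = sketch_utf8_len (c), k;

      for (k = n - 1; k > 0; k--)
        {
          p[k] = (char) ((c & 0x3F) | 0x80);
          c >>= 6;
        }
      p[0] = (char) (c | lead[n]);
      p += n;
    }
  *p = '\0';

  if (items_read != NULL)
    *items_read = nchars;
  if (items_written != NULL)
    *items_written = nbytes;   /* excludes the trailing 0 byte */
  return result;
}
```

As with the documented function, the caller owns the returned buffer and releases it with free().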

uint32_t * stringprep_utf8_to_ucs4 (const char * str, ssize_t len, size_t * items_written)

str: a UTF-8 encoded string.

len: the maximum length of str to use. If len < 0, then the string is nul-terminated.

items_written: location to store the number of characters in the result, or NULL.

Convert a string from UTF-8 to a 32-bit fixed width representation as UCS-4, assuming valid UTF-8 input. This function does no error checking on the input.

Return value: a pointer to a newly allocated UCS-4 string. This value must be freed with free().
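
The len and items_written semantics can be illustrated with a small stand-in. The function below is hypothetical (sketch_utf8_to_ucs4 is not the library's code): it uses long for len in place of ssize_t, handles sequences of up to 4 bytes only, and over-allocates one output slot per input byte for simplicity. Like the documented function, it does not check the input for validity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in, not the library's code. Returns NULL only
   if allocation fails. */
static uint32_t *
sketch_utf8_to_ucs4 (const char *str, long len, size_t *items_written)
{
  const unsigned char *s = (const unsigned char *) str;
  size_t nbytes = 0, out = 0, i;
  uint32_t *result;

  assert (str != NULL);

  /* Bytes in use: stop at the terminator, or at len bytes if len >= 0. */
  while ((len < 0 || nbytes < (size_t) len) && s[nbytes] != 0)
    nbytes++;

  /* Every character takes at least one byte, so nbytes + 1 slots
     always suffice (the extra slot holds the terminating 0). */
  result = malloc ((nbytes + 1) * sizeof *result);
  if (result == NULL)
    return NULL;

  for (i = 0; i < nbytes; )
    {
      uint32_t c = s[i];
      int extra = 0, k;

      if (c >= 0xF0)      { c &= 0x07; extra = 3; }
      else if (c >= 0xE0) { c &= 0x0F; extra = 2; }
      else if (c >= 0xC0) { c &= 0x1F; extra = 1; }
      i++;

      /* Fold in continuation bytes without running past nbytes. */
      for (k = 0; k < extra && i < nbytes; k++, i++)
        c = (c << 6) | (s[i] & 0x3F);

      result[out++] = c;
    }
  result[out] = 0;

  if (items_written != NULL)
    *items_written = out;   /* characters, not counting the final 0 */
  return result;
}
```

Passing a negative len converts up to the terminating 0 byte; a non-negative len caps the number of input bytes considered.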