GitHub - lemire/fastvalidate-utf-8: header-only library to validate utf-8 string...

 5 years ago
source link: https://github.com/lemire/fastvalidate-utf-8
README.md

fastvalidate-utf-8

Most strings online are Unicode text encoded as UTF-8. Validating these strings quickly before accepting them is important.

This is a header-only C library to validate UTF-8 strings at high speed using SIMD instructions. Specifically, it expects an x64 processor (capable of SSE instructions). It does not currently work on ARM processors.

Quick usage:

make
./test
./benchmark

Code usage:

  #include "simdutf8check.h"

  char * mystring = ...
  bool is_it_valid = validate_utf8_fast(mystring, thestringlength);

It should validate strings at a cost close to 1 cycle per input byte.
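Since the SIMD routines target x64 only, it can help to have a plain scalar check on hand, either as a reference when testing or as a fallback on other processors. The following is a minimal sketch of such a check, not code from this library: the name `validate_utf8_scalar` is hypothetical, and it is assumed here to take the same `(pointer, length)` arguments as the call shown above. It rejects stray continuation bytes, truncated sequences, overlong encodings, surrogates, and code points above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference validator (NOT part of fastvalidate-utf-8). */
static bool validate_utf8_scalar(const char *src, size_t len) {
    const uint8_t *s = (const uint8_t *)src;
    size_t i = 0;
    while (i < len) {
        uint8_t b = s[i];
        size_t n;      /* number of continuation bytes that must follow */
        uint32_t cp;   /* code point being decoded */
        uint32_t min;  /* smallest code point allowed for this length */
        if (b < 0x80) { i++; continue; }  /* ASCII byte */
        else if ((b & 0xE0) == 0xC0) { n = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { n = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { n = 3; cp = b & 0x07; min = 0x10000; }
        else return false;  /* stray continuation byte or invalid lead byte */
        if (len - i <= n) return false;  /* sequence runs past the end */
        for (size_t k = 1; k <= n; k++) {
            uint8_t c = s[i + k];
            if ((c & 0xC0) != 0x80) return false;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return false;                      /* overlong encoding */
        if (cp > 0x10FFFF) return false;                 /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* UTF-16 surrogate */
        i += n + 1;
    }
    return true;
}
```

A scalar loop like this branches on every non-ASCII character, so it is expected to be much slower than the SIMD path; its value is in cross-checking results on small inputs.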

If you expect your strings to be plain ASCII, you can spend less than 0.1 cycles per input byte to check whether that is the case using the validate_ascii_fast function found in the simdasciicheck.h header.
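To see why an ASCII-only check can be so much cheaper than full UTF-8 validation, note that a string is plain ASCII exactly when no byte has its high bit set. The sketch below illustrates that idea in scalar form; `validate_ascii_scalar` is a hypothetical name, and this is not the implementation found in simdasciicheck.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar illustration (NOT the library's validate_ascii_fast). */
static bool validate_ascii_scalar(const char *src, size_t len) {
    uint8_t acc = 0;
    /* OR all bytes together; no per-byte branch is needed. */
    for (size_t i = 0; i < len; i++) {
        acc |= (uint8_t)src[i];
    }
    /* If any byte was >= 0x80, the high bit of the accumulator is set. */
    return (acc & 0x80) == 0;
}
```

Because the loop body is a single OR with no data-dependent branch, the same pattern maps naturally onto wide vector registers, which is consistent with the very low per-byte cost quoted above.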

