simdutf8/lib.rs
#![warn(unused_extern_crates)]
#![deny(
    clippy::all,
    clippy::unwrap_used,
    clippy::unnecessary_unwrap,
    clippy::pedantic,
    clippy::nursery
)]
#![allow(clippy::redundant_pub_crate)] // check is broken
#![allow(clippy::redundant_else)] // can make code more readable
#![allow(clippy::explicit_iter_loop)] // can make code more readable
#![allow(clippy::semicolon_if_nothing_returned)] // see https://github.com/rust-lang/rust-clippy/issues/7768
#![allow(clippy::missing_const_for_fn)] // not necessary most of the time
#![deny(missing_docs)]
#![cfg_attr(not(feature = "std"), no_std)]
#![cfg_attr(docsrs, feature(doc_cfg))]
#![cfg_attr(
    all(target_arch = "aarch64", feature = "aarch64_neon_prefetch"),
    feature(stdarch_aarch64_prefetch)
)]

//! Blazingly fast API-compatible UTF-8 validation for Rust using SIMD extensions, based on the implementation from
//! [simdjson](https://github.com/simdjson/simdjson). Originally ported to Rust by the developers of [simd-json.rs](https://simd-json.rs), but now heavily improved.
//!
//! ## Quick start
//! Add the dependency to your Cargo.toml file:
//! ```toml
//! [dependencies]
//! simdutf8 = "0.1.5"
//! ```
//!
//! Use [`basic::from_utf8()`] as a drop-in replacement for `std::str::from_utf8()`.
//!
//! ```rust
//! use simdutf8::basic::from_utf8;
//!
//! println!("{}", from_utf8(b"I \xE2\x9D\xA4\xEF\xB8\x8F UTF-8!").unwrap());
//! ```
//!
//! If you need detailed information on validation failures, use [`compat::from_utf8()`]
//! instead.
//!
//! ```rust
//! use simdutf8::compat::from_utf8;
//!
//! let err = from_utf8(b"I \xE2\x9D\xA4\xEF\xB8 UTF-8!").unwrap_err();
//! assert_eq!(err.valid_up_to(), 5);
//! assert_eq!(err.error_len(), Some(2));
//! ```
//!
//! ## APIs
//!
//! ### Basic flavor
//! Use the `basic` API flavor for maximum speed. It is fastest on valid UTF-8, but only checks
//! for errors after processing the whole byte sequence and does not provide detailed information if the data
//! is not valid UTF-8. [`basic::Utf8Error`] is a zero-sized error struct.
//!
//! ### Compat flavor
//! The `compat` flavor is fully API-compatible with `std::str::from_utf8()`. In particular, [`compat::from_utf8()`]
//! returns a [`compat::Utf8Error`], which has [`valid_up_to()`](compat::Utf8Error#method.valid_up_to) and
//! [`error_len()`](compat::Utf8Error#method.error_len) methods. The first is useful for verifying streamed data. The
//! second is useful, for example, for replacing invalid byte sequences with a replacement character.
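//!
//! As a minimal sketch of the replacement use case, the hypothetical helper `decode_lossy` below (not part of
//! this crate) substitutes U+FFFD for each invalid sequence. It is written against `std::str::from_utf8` so that
//! it runs without dependencies. Since the `compat` flavor is API-compatible, swapping the import for
//! `simdutf8::compat::from_utf8` should be the only change needed.
//!
//! ```rust
//! // Stand-in import; assumption: `simdutf8::compat::from_utf8` can be swapped in unchanged.
//! use std::str::from_utf8;
//!
//! /// Hypothetical helper: decodes `input`, replacing each invalid sequence with U+FFFD.
//! fn decode_lossy(mut input: &[u8]) -> String {
//!     let mut out = String::with_capacity(input.len());
//!     loop {
//!         match from_utf8(input) {
//!             Ok(valid) => {
//!                 out.push_str(valid);
//!                 return out;
//!             }
//!             Err(err) => {
//!                 // Everything before `valid_up_to()` is known to be valid UTF-8.
//!                 let (valid, rest) = input.split_at(err.valid_up_to());
//!                 out.push_str(from_utf8(valid).unwrap_or_default());
//!                 out.push('\u{FFFD}');
//!                 match err.error_len() {
//!                     // Skip the invalid sequence and continue with the remainder.
//!                     Some(len) => input = &rest[len..],
//!                     // The input ended in the middle of a character.
//!                     None => return out,
//!                 }
//!             }
//!         }
//!     }
//! }
//! ```
//!
//! The standard library already offers `String::from_utf8_lossy()` for this purpose. The sketch only illustrates
//! how the two error methods fit together.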
//!
//! It also fails early: errors are checked on the fly as the input is processed, and once
//! an invalid UTF-8 sequence is encountered, it returns without processing the rest of the data.
//! This comes at a slight performance penalty compared to the [`basic`] API, even if the input is valid UTF-8.
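//!
//! For streamed data, one possible approach is sketched below. The `ChunkValidator` type and its methods are
//! hypothetical and are not this crate's streaming API. The sketch uses `std::str::from_utf8` as a stand-in for
//! the API-compatible `simdutf8::compat::from_utf8`, and it copies every chunk into a buffer for simplicity.
//! An `error_len()` of `None` signals a character cut off at the end of the input, so those bytes are kept
//! until the next chunk arrives.
//!
//! ```rust
//! // Stand-in import; assumption: `simdutf8::compat::from_utf8` can be swapped in unchanged.
//! use std::str::from_utf8;
//!
//! /// Hypothetical chunk-wise validator. `pending` holds the bytes of a character
//! /// that was split across a chunk boundary.
//! #[derive(Default)]
//! struct ChunkValidator {
//!     pending: Vec<u8>,
//! }
//!
//! impl ChunkValidator {
//!     /// Feeds the next chunk. Returns `false` as soon as invalid UTF-8 is seen.
//!     fn feed(&mut self, chunk: &[u8]) -> bool {
//!         self.pending.extend_from_slice(chunk);
//!         // Drop the borrowed `&str` so that `pending` can be modified below.
//!         let result = from_utf8(&self.pending).map(|_| ());
//!         match result {
//!             Ok(()) => {
//!                 self.pending.clear();
//!                 true
//!             }
//!             Err(err) => match err.error_len() {
//!                 // A definite error: the stream is not valid UTF-8.
//!                 Some(_) => false,
//!                 // Incomplete trailing character: discard the valid prefix, keep the tail.
//!                 None => {
//!                     let valid = err.valid_up_to();
//!                     self.pending.drain(..valid);
//!                     true
//!                 }
//!             },
//!         }
//!     }
//!
//!     /// Call at the end of the stream. Leftover bytes mean a truncated character.
//!     fn finish(&self) -> bool {
//!         self.pending.is_empty()
//!     }
//! }
//! ```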
//!
//! ## Implementation selection
//!
//! ### X86
//! The fastest implementation is selected at runtime using the `std::is_x86_feature_detected!` macro, unless the CPU
//! targeted by the compiler supports the fastest available implementation.
//! So if you compile with `RUSTFLAGS="-C target-cpu=native"` on a recent x86-64 machine, the AVX 2 implementation is selected at
//! compile time and runtime selection is disabled.
//!
//! For no-std support (compiled with `--no-default-features`), the implementation is always selected at compile time based on
//! the targeted CPU. Use `RUSTFLAGS="-C target-feature=+avx2"` for the AVX 2 implementation or `RUSTFLAGS="-C target-feature=+sse4.2"`
//! for the SSE 4.2 implementation.
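//!
//! The selection order described above can be illustrated roughly as follows. This is a simplified sketch, not
//! the crate's actual dispatch code: the function `select_implementation` and the returned labels are
//! hypothetical, and a real implementation would return a function pointer rather than a name.
//!
//! ```rust
//! /// Hypothetical helper: reports which implementation would be picked on this machine.
//! fn select_implementation() -> &'static str {
//!     #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
//!     {
//!         // `cfg!` is resolved at compile time, e.g. with `-C target-feature=+avx2`.
//!         // The macro call is the runtime check and needs `std`.
//!         if cfg!(target_feature = "avx2") || std::is_x86_feature_detected!("avx2") {
//!             return "avx2";
//!         }
//!         if cfg!(target_feature = "sse4.2") || std::is_x86_feature_detected!("sse4.2") {
//!             return "sse4.2";
//!         }
//!     }
//!     // Other architectures, or x86 CPUs without these extensions.
//!     "fallback"
//! }
//! ```
//!
//! In a no-std build only the `cfg!` checks would be available, which is why the target features have to be
//! enabled through `RUSTFLAGS` in that case.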
//!
//! ### ARM64
//! The SIMD implementation is used automatically since Rust 1.61.
//!
//! ### WASM32
//! For wasm32 support, the implementation is selected at compile time based on the presence of the `simd128` target feature.
//! Use `RUSTFLAGS="-C target-feature=+simd128"` to enable the WASM SIMD implementation. WASM, at
//! the time of this writing, doesn't have a way to detect SIMD through WASM itself. Although this capability
//! is available in various WASM host environments (e.g., [wasm-feature-detect] in the web browser), there is no portable
//! way from within the library to detect this.
//!
//! [wasm-feature-detect]: https://github.com/GoogleChromeLabs/wasm-feature-detect
//!
//! ### Access to low-level functionality
//! If you want to be able to call a SIMD implementation directly, use the `public_imp` feature flag. The validation
//! implementations are then accessible via [`basic::imp`] and [`compat::imp`]. Traits facilitating streaming validation are available
//! there as well.
//!
//! ## Optimisation flags
//! Do not use [`opt-level = "z"`](https://doc.rust-lang.org/cargo/reference/profiles.html), which prevents inlining and makes
//! the code quite slow.
//!
//! ## Minimum Supported Rust Version (MSRV)
//! This crate's minimum supported Rust version is 1.38.0.
//!
//! ## Algorithm
//!
//! See *Validating UTF-8 In Less Than One Instruction Per Byte*, Software: Practice and Experience 51 (5), 2021,
//! <https://arxiv.org/abs/2010.03090>.

pub mod basic;
pub mod compat;
mod implementation;