onig

Struct Regex

pub struct Regex { /* private fields */ }

This struct is a wrapper around an Oniguruma regular expression pointer. This represents a compiled regex which can be used in search and match operations.

Implementations§

impl Regex

pub fn captures<'t>(&self, text: &'t str) -> Option<Captures<'t>>

Returns the capture groups corresponding to the leftmost-first match in text. Capture group 0 always corresponds to the entire match. If no match is found, then None is returned.

pub fn find_iter<'r, 't>(&'r self, text: &'t str) -> FindMatches<'r, 't>

Returns an iterator for each successive non-overlapping match in text, returning the start and end byte indices with respect to text.

§Example

Find the start and end location of every word with exactly 13 characters:

use onig::Regex;

let text = "Retroactively relinquishing remunerations is reprehensible.";
for pos in Regex::new(r"\b\w{13}\b").unwrap().find_iter(text) {
    println!("{:?}", pos);
}
// Output:
// (0, 13)
// (14, 27)
// (28, 41)
// (45, 58)

pub fn captures_iter<'r, 't>(&'r self, text: &'t str) -> FindCaptures<'r, 't>

Returns an iterator over all the non-overlapping capture groups matched in text. This is operationally the same as find_iter (except it yields information about submatches).

§Example

We can use this to find all movie titles and their release years in some text, where the movie is formatted like “‘Title’ (xxxx)”:

use onig::Regex;

let re = Regex::new(r"'([^']+)'\s+\((\d{4})\)")
               .unwrap();
let text = "'Citizen Kane' (1941), 'The Wizard of Oz' (1939), 'M' (1931).";
for caps in re.captures_iter(text) {
    // `at` returns an Option<&str>, so unwrap it before printing.
    println!("Movie: {}, Released: {}",
             caps.at(1).unwrap_or(""), caps.at(2).unwrap_or(""));
}
// Output:
// Movie: Citizen Kane, Released: 1941
// Movie: The Wizard of Oz, Released: 1939
// Movie: M, Released: 1931

pub fn split<'r, 't>(&'r self, text: &'t str) -> RegexSplits<'r, 't>

Returns an iterator of substrings of text delimited by a match of the regular expression. Namely, each element of the iterator corresponds to text that isn’t matched by the regular expression.

This method will not copy the text given.

§Example

To split a string delimited by arbitrary amounts of spaces or tabs:

use onig::Regex;

let re = Regex::new(r"[ \t]+").unwrap();
let fields: Vec<&str> = re.split("a b \t  c\td    e").collect();
assert_eq!(fields, vec!["a", "b", "c", "d", "e"]);

pub fn splitn<'r, 't>( &'r self, text: &'t str, limit: usize, ) -> RegexSplitsN<'r, 't>

Returns an iterator of at most limit substrings of text delimited by a match of the regular expression. (A limit of 0 will return no substrings.) Namely, each element of the iterator corresponds to text that isn’t matched by the regular expression. The remainder of the string that is not split will be the last element in the iterator.

This method will not copy the text given.

§Example

Get the first two words in some text:

use onig::Regex;

let re = Regex::new(r"\W+").unwrap();
let fields: Vec<&str> = re.splitn("Hey! How are you?", 3).collect();
assert_eq!(fields, vec!["Hey", "How", "are you?"]);

pub fn scan_with_region<F>( &self, to_search: &str, region: &mut Region, options: SearchOptions, callback: F, ) -> i32
where F: Fn(i32, i32, &Region) -> bool,

Scan the given slice, capturing into the given region and executing a callback for each match.

pub fn scan<'t, CB>(&self, to_search: &'t str, callback: CB)
where CB: Fn(i32, Captures<'t>) -> bool,

Scan a Pattern and Observe Captures

The scan function takes a haystack to_search and invokes the given callback once for each match of this expression, passing the Captures for that match.

impl Regex

pub fn capture_names_len(&self) -> usize

Returns the number of named groups in the regex.

pub fn foreach_name<F>(&self, callback: F) -> i32
where F: FnMut(&str, &[u32]) -> bool,

Calls callback for each named group in the regex. Each callback gets the group name and group indices.

impl Regex

pub fn replace<R: Replacer>(&self, text: &str, rep: R) -> String

Replaces the leftmost-first match with the replacement provided. The replacement can be a regular string or a function that takes the match's Captures and returns the replaced string.

If no match is found, then a copy of the string is returned unchanged.

§Examples

Note that this function is polymorphic with respect to the replacement. In typical usage, this can just be a normal string:

use onig::Regex;

let re = Regex::new("[^01]+").unwrap();
assert_eq!(re.replace("1078910", ""), "1010");

But anything satisfying the Replacer trait will work. For example, a closure of type |&Captures| -> String provides direct access to the captures corresponding to a match. This allows one to access submatches easily:

use onig::{Captures, Regex};

let re = Regex::new(r"([^,\s]+),\s+(\S+)").unwrap();
let result = re.replace("Springsteen, Bruce", |caps: &Captures| {
    format!("{} {}", caps.at(2).unwrap_or(""), caps.at(1).unwrap_or(""))
});
assert_eq!(result, "Bruce Springsteen");

pub fn replace_all<R: Replacer>(&self, text: &str, rep: R) -> String

Replaces all non-overlapping matches in text with the replacement provided. This is the same as calling replacen with limit set to 0.

See the documentation for replace for details on how to access submatches in the replacement string.

pub fn replacen<R: Replacer>(&self, text: &str, limit: usize, rep: R) -> String

Replaces at most limit non-overlapping matches in text with the replacement provided. If limit is 0, then all non-overlapping matches are replaced.

See the documentation for replace for details on how to access submatches in the replacement string.

impl Regex

pub fn new(pattern: &str) -> Result<Self, Error>

Create a Regex

Simple regular expression constructor. Compiles a new regular expression with the default options using the Ruby syntax. Once compiled, it can be used repeatedly to search in a string. If an invalid expression is given, then an error is returned.

§Arguments
  • pattern - The regex pattern to compile
§Examples
use onig::Regex;
let r = Regex::new(r#"hello (\w+)"#);
assert!(r.is_ok());

pub fn with_encoding<T>(pattern: T) -> Result<Regex, Error>
where T: EncodedChars,

Create a Regex, Specifying an Encoding

Attempts to compile pattern into a new Regex instance. Instead of assuming UTF-8 as the encoding scheme, the encoding is inferred from the pattern buffer.

§Arguments
  • pattern - The regex pattern to compile
§Examples
use onig::{Regex, EncodedBytes};
let utf8 = Regex::with_encoding("hello");
assert!(utf8.is_ok());
let ascii = Regex::with_encoding(EncodedBytes::ascii(b"world"));
assert!(ascii.is_ok());

pub fn with_options( pattern: &str, option: RegexOptions, syntax: &Syntax, ) -> Result<Regex, Error>

Create a new Regex

Attempts to compile a pattern into a new Regex instance. Once compiled, it can be used repeatedly to search in a string. If an invalid expression is given, then an error is returned. See onig_sys::onig_new for more information.

§Arguments
  • pattern - The regex pattern to compile.
  • option - The regex compilation options.
  • syntax - The syntax which the regex is written in.
§Examples
use onig::{Regex, Syntax, RegexOptions};
let r = Regex::with_options("hello.*world",
                            RegexOptions::REGEX_OPTION_NONE,
                            Syntax::default());
assert!(r.is_ok());

pub fn with_options_and_encoding<T>( pattern: T, option: RegexOptions, syntax: &Syntax, ) -> Result<Self, Error>
where T: EncodedChars,

Create a new Regex, Specifying Options and Encoding

Attempts to compile the given pattern into a new Regex instance. Instead of assuming UTF-8 as the encoding scheme, the encoding is inferred from the pattern buffer. If the regex fails to compile, the returned Error value from onig_new contains more information.

§Arguments
  • pattern - The regex pattern to compile.
  • option - The regex compilation options.
  • syntax - The syntax which the regex is written in.
§Examples
use onig::{Regex, Syntax, EncodedBytes, RegexOptions};
let pattern = EncodedBytes::ascii(b"hello");
let r = Regex::with_options_and_encoding(pattern,
                                         RegexOptions::REGEX_OPTION_SINGLELINE,
                                         Syntax::default());
assert!(r.is_ok());

pub fn match_with_options( &self, str: &str, at: usize, options: SearchOptions, region: Option<&mut Region>, ) -> Option<usize>

Match String

Try to match the regex against the given string slice, starting at a given offset. This method works the same way as match_with_encoding, but the encoding is always UTF-8.

For more information see Match vs Search

§Arguments
  • str - The string slice to match against.
  • at - The byte index in the passed slice to start matching
  • options - The regex match options.
  • region - The region for return group match range info
§Returns

Some(len) if the regex matched, with len being the number of bytes matched. None if the regex doesn’t match.

§Examples
use onig::{Regex, SearchOptions};

let r = Regex::new(".*").unwrap();
let res = r.match_with_options("hello", 0, SearchOptions::SEARCH_OPTION_NONE, None);
assert!(res.is_some()); // it matches
assert!(res.unwrap() == 5); // 5 bytes matched

pub fn match_with_encoding<T>( &self, chars: T, at: usize, options: SearchOptions, region: Option<&mut Region>, ) -> Option<usize>
where T: EncodedChars,

Match String with Encoding

Match the regex against a string. This method will start at the offset at into the string and try to match the regex. If the regex matches, the return value is the number of bytes which matched. If the regex doesn’t match, the return value is None.

For more information see Match vs Search

The contents of chars must have the same encoding that was used to construct the regex.

§Arguments
  • chars - The buffer to match against.
  • at - The byte index in the passed buffer to start matching
  • options - The regex match options.
  • region - The region for return group match range info
§Returns

Some(len) if the regex matched, with len being the number of bytes matched. None if the regex doesn’t match.

§Examples
use onig::{Regex, EncodedBytes, SearchOptions};

let r = Regex::with_encoding(EncodedBytes::ascii(b".*")).unwrap();
let res = r.match_with_encoding(EncodedBytes::ascii(b"world"),
                                0, SearchOptions::SEARCH_OPTION_NONE, None);
assert!(res.is_some()); // it matches
assert!(res.unwrap() == 5); // 5 bytes matched

pub fn match_with_param<T>( &self, chars: T, at: usize, options: SearchOptions, region: Option<&mut Region>, match_param: MatchParam, ) -> Result<Option<usize>, Error>
where T: EncodedChars,

Match string with encoding and match param

Match the regex against a string. This method will start at the offset at into the string and try to match the regex. If the regex matches, the return value is the number of bytes which matched. If the regex doesn’t match, the return value is None.

For more information see Match vs Search

The contents of chars must have the same encoding that was used to construct the regex.

§Arguments
  • chars - The buffer to match against.
  • at - The byte index in the passed buffer to start matching
  • options - The regex match options.
  • region - The region for return group match range info
  • match_param - The match parameters
§Returns

Ok(Some(len)) if the regex matched, with len being the number of bytes matched. Ok(None) if the regex doesn’t match. Err with an Error if an error occurred (e.g. retry-limit-in-match exceeded).

§Examples
use onig::{Regex, EncodedBytes, MatchParam, SearchOptions};

let r = Regex::with_encoding(EncodedBytes::ascii(b".*")).unwrap();
let res = r.match_with_param(EncodedBytes::ascii(b"world"),
                             0, SearchOptions::SEARCH_OPTION_NONE,
                             None, MatchParam::default());
assert!(res.is_ok()); // matching did not error
assert!(res.unwrap() == Some(5)); // 5 bytes matched

pub fn search_with_options( &self, str: &str, from: usize, to: usize, options: SearchOptions, region: Option<&mut Region>, ) -> Option<usize>

Search pattern in string

Search for matches of the regex in a string. This method will return the index of the first match of the regex within the string, if there is one. If from is less than to, then the search is performed in forward order; otherwise it is performed in backward order.

For more information see Match vs Search

§Arguments
  • str - The string to search in.
  • from - The byte index in the passed slice to start search
  • to - The byte index in the passed slice to finish search
  • options - The options for the search.
  • region - The region for return group match range info
§Returns

Some(pos) if the regex matches, where pos is the byte-position of the start of the match. None if the regex doesn’t match anywhere in str.

§Examples
use onig::{Regex, SearchOptions};

let r = Regex::new("l{1,2}").unwrap();
let res = r.search_with_options("hello", 0, 5, SearchOptions::SEARCH_OPTION_NONE, None);
assert!(res.is_some()); // it matches
assert!(res.unwrap() == 2); // match starts at byte index 2 (the third character)

pub fn search_with_encoding<T>( &self, chars: T, from: usize, to: usize, options: SearchOptions, region: Option<&mut Region>, ) -> Option<usize>
where T: EncodedChars,

Search for a Pattern in a String with an Encoding

Search for matches of the regex in a string. This method will return the index of the first match of the regex within the string, if there is one. If from is less than to, then the search is performed in forward order; otherwise it is performed in backward order.

For more information see Match vs Search

The encoding of the buffer passed to search in must match the encoding of the regex.

§Arguments
  • chars - The character buffer to search in.
  • from - The byte index in the passed slice to start search
  • to - The byte index in the passed slice to finish search
  • options - The options for the search.
  • region - The region for return group match range info
§Returns

Some(pos) if the regex matches, where pos is the byte-position of the start of the match. None if the regex doesn’t match anywhere in chars.

§Examples
use onig::{Regex, EncodedBytes, SearchOptions};

let r = Regex::with_encoding(EncodedBytes::ascii(b"l{1,2}")).unwrap();
let res = r.search_with_encoding(EncodedBytes::ascii(b"hello"),
                                 0, 5, SearchOptions::SEARCH_OPTION_NONE, None);
assert!(res.is_some()); // it matches
assert!(res.unwrap() == 2); // match starts at byte index 2 (the third character)

pub fn search_with_param<T>( &self, chars: T, from: usize, to: usize, options: SearchOptions, region: Option<&mut Region>, match_param: MatchParam, ) -> Result<Option<usize>, Error>
where T: EncodedChars,

Search pattern in string with encoding and match param

Search for matches of the regex in a string. This method will return the index of the first match of the regex within the string, if there is one. If from is less than to, then the search is performed in forward order; otherwise it is performed in backward order.

For more information see Match vs Search

The encoding of the buffer passed to search in must match the encoding of the regex.

§Arguments
  • chars - The character buffer to search in.
  • from - The byte index in the passed slice to start search
  • to - The byte index in the passed slice to finish search
  • options - The options for the search.
  • region - The region for return group match range info
  • match_param - The match parameters
§Returns

Ok(Some(pos)) if the regex matches, where pos is the byte-position of the start of the match. Ok(None) if the regex doesn’t match anywhere in chars. Err with an Error if an error occurred (e.g. retry-limit-in-match exceeded).

§Examples
use onig::{Regex, EncodedBytes, MatchParam, SearchOptions};

let r = Regex::with_encoding(EncodedBytes::ascii(b"l{1,2}")).unwrap();
let res = r.search_with_param(EncodedBytes::ascii(b"hello"),
                              0, 5, SearchOptions::SEARCH_OPTION_NONE,
                              None, MatchParam::default());
assert!(res.is_ok()); // matching did not error
assert!(res.unwrap() == Some(2)); // match starts at byte index 2 (the third character)

pub fn is_match(&self, text: &str) -> bool

Returns true if and only if the regex matches the whole of the string given.

For more information see Match vs Search

§Arguments
  • text - The string slice to test against the pattern.
§Returns

true if the pattern matches the whole of text, false otherwise.

pub fn find(&self, text: &str) -> Option<(usize, usize)>

Find a Match in a Buffer

Finds the first match of the regular expression within the buffer.

Note that this should only be used if you want to discover the position of the match within a string. Testing if a pattern matches the whole string is faster if you use is_match. For more information see Match vs Search.

§Arguments
  • text - The text to search in.
§Returns

The offset of the start and end of the first match. If no match exists None is returned.

pub fn find_with_encoding<T>(&self, text: T) -> Option<(usize, usize)>
where T: EncodedChars,

Find a Match in a Buffer, With Encoding

Finds the first match of the regular expression within the buffer.

For more information see Match vs Search

§Arguments
  • text - The text to search in.
§Returns

The offset of the start and end of the first match. If no match exists None is returned.

pub fn encoding(&self) -> OnigEncoding

Get the Encoding of the Regex

§Returns

Returns a reference to the Oniguruma encoding which was used when this regex was created.

pub fn captures_len(&self) -> usize

Get the Number of Capture Groups in this Pattern

pub fn capture_histories_len(&self) -> usize

Get the Size of the Capture Histories for this Pattern

Trait Implementations§

impl Debug for Regex

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Drop for Regex

fn drop(&mut self)

Executes the destructor for this type.

impl PartialEq for Regex

fn eq(&self, other: &Regex) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Eq for Regex

impl Send for Regex

impl StructuralPartialEq for Regex

impl Sync for Regex

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.