Name
regcomp, regexec, regerror, regfree - POSIX regex functions
Library
Standard C library (libc, -lc)
Synopsis
#include <regex.h>

int regcomp(regex_t *restrict preg, const char *restrict regex,
            int cflags);
int regexec(const regex_t *restrict preg, const char *restrict string,
            size_t nmatch, regmatch_t pmatch[_Nullable restrict .nmatch],
            int eflags);
size_t regerror(int errcode, const regex_t *_Nullable restrict preg,
                char errbuf[_Nullable restrict .errbuf_size],
                size_t errbuf_size);
void regfree(regex_t *preg);

typedef struct {
    size_t    re_nsub;
} regex_t;

typedef struct {
    regoff_t  rm_so;
    regoff_t  rm_eo;
} regmatch_t;

typedef /* ... */  regoff_t;
Description
Compilation
regcomp() is used to compile a regular expression into a form that is suitable for subsequent regexec() searches.
On success, the pattern buffer at *preg is initialized. regex is a null-terminated string. The locale must be the same when running regexec().
After regcomp() succeeds, preg->re_nsub holds the number of subexpressions in regex. Thus, a value of preg->re_nsub + 1 passed as nmatch to regexec() is sufficient to capture all matches.
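The sizing rule for nmatch can be sketched as follows. This is a minimal illustration, not a recommended interface: the helper name match_all_groups() and its out-parameters are hypothetical, and the pattern is assumed to be an Extended Regular Expression.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of <regex.h>): compile `pattern` as an ERE,
 * match it against `text`, and return a malloc()ed array holding the overall
 * match plus one entry per subexpression.
 * Returns 0 on a match, or the regcomp()/regexec() error code otherwise.
 * On success the caller must free(*out).
 */
int match_all_groups(const char *pattern, const char *text,
                     regmatch_t **out, size_t *out_n)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL && out != NULL && out_n != NULL);

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    /* One slot for the whole match, plus one per subexpression. */
    size_t n = re.re_nsub + 1;
    regmatch_t *pm = malloc(n * sizeof(*pm));
    if (pm == NULL) {
        regfree(&re);
        return REG_ESPACE;
    }

    rc = regexec(&re, text, n, pm, 0);
    regfree(&re);

    if (rc != 0) {
        free(pm);
        return rc;
    }

    *out = pm;
    *out_n = n;
    return 0;
}
```

Sizing the array from re_nsub avoids hard-coding a group count that silently goes stale when the pattern is edited.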
cflags is the bitwise OR of zero or more of the following:
REG_EXTENDED Use POSIX Extended Regular Expression syntax when interpreting regex. If not set, POSIX Basic Regular Expression syntax is used.
REG_ICASE Do not differentiate case. Subsequent regexec() searches using this pattern buffer will be case insensitive.
REG_NOSUB Report only overall success. regexec() will use only pmatch for REG_STARTEND, ignoring nmatch.
REG_NEWLINE Match-any-character operators don't match a newline.
A nonmatching list ([^...]) not containing a newline does not match a newline.
Match-beginning-of-line operator (^) matches the empty string immediately after a newline, regardless of whether eflags, the execution flags of regexec(), contains REG_NOTBOL.
Match-end-of-line operator ($) matches the empty string immediately before a newline, regardless of whether eflags contains REG_NOTEOL.
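The effect of the compilation flags can be observed with a small probe. The sketch below assumes a hypothetical helper named regex_matches(); it adds REG_NOSUB itself because only overall success is of interest, so nmatch and pmatch can be 0 and NULL.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `text` matches `pattern` compiled with
 * `cflags`, 0 if it does not (or regexec() fails), and -1 if the pattern
 * fails to compile.
 */
int regex_matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    /* With REG_NOSUB, no match offsets are reported. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, regex_matches("^b", "a\nb", 0) is expected to return 0, while the same call with REG_NEWLINE is expected to return 1, since ^ then also matches after the embedded newline.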
Matching
regexec() is used to match a null-terminated string against the compiled pattern buffer in *preg, which must have been initialized with regcomp(). eflags is the bitwise OR of zero or more of the following flags:
REG_NOTBOL The match-beginning-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above). This flag may be used when different portions of a string are passed to regexec() and the beginning of the string should not be interpreted as the beginning of the line.
REG_NOTEOL The match-end-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above).
REG_STARTEND Match [string + pmatch[0].rm_so, string + pmatch[0].rm_eo) instead of [string, string + strlen(string)). This allows matching embedded NUL bytes and avoids a strlen(3) on known-length strings. If any matches are returned (REG_NOSUB wasn't passed to regcomp(), the match succeeded, and nmatch > 0), they overwrite pmatch as usual, and the match offsets remain relative to string (not string + pmatch[0].rm_so).
This flag is a BSD extension, not present in POSIX.
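A bounded search with REG_STARTEND might look like the sketch below. It assumes the platform's <regex.h> defines REG_STARTEND (it is not in POSIX, so this will not compile everywhere), and the helper name search_range() is hypothetical. The single regmatch_t serves as both input (the range to search) and output (the match).

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: search only bytes [start, end) of `buf` for the
 * ERE `pattern`.  Requires the non-POSIX REG_STARTEND flag.
 * Returns 0 on a match (offsets stored in *m, relative to `buf`),
 * or the regcomp()/regexec() error code otherwise.
 */
int search_range(const char *pattern, const char *buf,
                 size_t start, size_t end, regmatch_t *m)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && buf != NULL && m != NULL && start <= end);

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    /* Input: the byte range to search.  Output: the match, if any. */
    m->rm_so = (regoff_t) start;
    m->rm_eo = (regoff_t) end;
    rc = regexec(&re, buf, 1, m, REG_STARTEND);

    regfree(&re);
    return rc;
}
```

Because the range is given explicitly, buf does not need to be null-terminated at end, and it may contain NUL bytes inside the range.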
Match Offsets
Unless REG_NOSUB was passed to regcomp(), it is possible to obtain the locations of matches within string:
regexec() fills nmatch elements of pmatch with results: pmatch[0] corresponds to the entire match, pmatch[1] to the first subexpression, etc. If there were more matches than nmatch, they are discarded; if fewer, unused elements of pmatch are filled with -1s.
Each returned valid (not -1) match corresponds to the range [string + rm_so, string + rm_eo).
regoff_t is a signed integer type capable of storing the largest value that can be stored in either a ptrdiff_t type or a ssize_t type.
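Turning a match range into a string can be sketched like this. The helper dup_match() is hypothetical; it treats rm_so == -1 as an unused slot and otherwise copies the half-open range [string + rm_so, string + rm_eo).

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy the text covered by one regmatch_t into a
 * freshly malloc()ed, null-terminated string.  Returns NULL for an unused
 * slot (rm_so == -1) or on allocation failure.  The caller frees the result.
 */
char *dup_match(const char *string, const regmatch_t *m)
{
    assert(string != NULL && m != NULL);

    if (m->rm_so == -1)
        return NULL;

    /* rm_eo is one past the last matched byte, so this is the length. */
    size_t len = (size_t) (m->rm_eo - m->rm_so);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, string + m->rm_so, len);
    copy[len] = '\0';
    return copy;
}
```

With a pattern such as "(a+)|(b+)" matched against "xxbbb", pmatch[0] and pmatch[2] should both yield "bbb", while pmatch[1] is an unused slot because the first alternative did not take part in the match.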
Error Reporting
regerror() is used to turn the error codes that can be returned by both regcomp() and regexec() into error message strings.
If preg isn't a null pointer, errcode must be the latest error returned from an operation on preg.
If errbuf_size isn't 0, up to errbuf_size bytes are copied to errbuf; the error string is always null-terminated, and truncated to fit.
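Since regerror() returns the buffer size needed for the full message, a two-call pattern avoids truncation. The sketch below uses a hypothetical helper, regex_strerror(): the first call passes an errbuf_size of 0 to learn the size, and the second fills an exactly sized buffer.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return the full message for `errcode` as a
 * malloc()ed string, or NULL on allocation failure.  `preg` may be NULL;
 * otherwise errcode must be the latest error from an operation on *preg.
 * The caller frees the result.
 */
char *regex_strerror(int errcode, const regex_t *preg)
{
    assert(errcode != 0);

    /* With errbuf_size == 0 nothing is copied; only the size is returned. */
    size_t need = regerror(errcode, preg, NULL, 0);
    if (need == 0)
        return NULL;

    char *msg = malloc(need);
    if (msg == NULL)
        return NULL;

    /* The buffer is large enough, so the message is not truncated. */
    regerror(errcode, preg, msg, need);
    return msg;
}
```

A fixed-size stack buffer is also fine when a truncated message is acceptable, since the string is always null-terminated.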
Freeing
regfree() deinitializes the pattern buffer at *preg, freeing any associated memory; *preg must have been initialized via regcomp().
Return Value
regcomp() returns zero for a successful compilation or an error code for failure.
regexec() returns zero for a successful match or REG_NOMATCH for failure.
regerror() returns the size of the buffer required to hold the string.
Errors
The following errors can be returned by regcomp():
REG_BADBR Invalid use of back reference operator.
REG_BADPAT Invalid use of pattern operators such as group or list.
REG_BADRPT Invalid use of repetition operators such as using '*' as the first character.
REG_EBRACE Un-matched brace interval operators.
REG_EBRACK Un-matched bracket list operators.
REG_ECOLLATE Invalid collating element.
REG_ECTYPE Unknown character class name.
REG_EEND Nonspecific error. This is not defined by POSIX.
REG_EESCAPE Trailing backslash.
REG_EPAREN Un-matched parenthesis group operators.
REG_ERANGE Invalid use of the range operator; for example, the ending point of the range occurs prior to the starting point.
REG_ESIZE Compiled regular expression requires a pattern buffer larger than 64 kB. This is not defined by POSIX.
REG_ESPACE The regex routines ran out of memory.
REG_ESUBREG Invalid back reference to a subexpression.
Attributes
For an explanation of the terms used in this section, see attributes(7).
Interface | Attribute | Value |
regcomp(), regexec() | Thread safety | MT-Safe locale |
regerror() | Thread safety | MT-Safe env |
regfree() | Thread safety | MT-Safe |
Standards
POSIX.1-2008.
History
POSIX.1-2001.
Prior to POSIX.1-2008, regoff_t was required to be capable of storing the largest value that can be stored in either an off_t type or a ssize_t type.
Caveats
re_nsub is only required to be initialized if REG_NOSUB wasn't specified, but all known implementations initialize it regardless.
Both regex_t and regmatch_t may (and do) have more members, in any order. Always reference them by name.
Examples
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>

#define ARRAY_SIZE(arr) (sizeof((arr)) / sizeof((arr)[0]))

static const char *const str =
        "1) John Driverhacker;\n2) John Doe;\n3) John Foo;\n";
static const char *const re = "John.*o";

int main(void)
{
    const char  *s = str;
    regex_t     regex;
    regmatch_t  pmatch[1];
    regoff_t    off, len;

    /* REG_NEWLINE keeps ".*" from running past the end of a line. */
    if (regcomp(&regex, re, REG_NEWLINE))
        exit(EXIT_FAILURE);

    printf("String = \"%s\"\n", str);
    printf("Matches:\n");

    for (unsigned int i = 0; ; i++) {
        if (regexec(&regex, s, ARRAY_SIZE(pmatch), pmatch, 0))
            break;

        /* Offsets are relative to s; rebase them onto str for display. */
        off = pmatch[0].rm_so + (s - str);
        len = pmatch[0].rm_eo - pmatch[0].rm_so;
        printf("#%u:\n", i);
        printf("offset = %jd; length = %jd\n", (intmax_t) off,
                (intmax_t) len);
        printf("substring = \"%.*s\"\n", (int) len, s + pmatch[0].rm_so);

        /* Resume the search just past the end of this match. */
        s += pmatch[0].rm_eo;
    }

    regfree(&regex);
    exit(EXIT_SUCCESS);
}