Oracle REGEXP Functions
Explore advanced text matching with Oracle REGEXP functions.
The Oracle REGEXP_ functions search and manipulate strings using regular expression patterns. These functions can be used in SQL and PL/SQL statements. A regular expression is made up of literal characters and special characters (metacharacters), each of which has its own meaning. You pass the expression to a regular expression function and then use the result in your query.
Why Use Regular Expressions?
There are several string operations in Oracle SQL that let you perform some comparisons. For example, you can use UPPER to find upper case values, and you can use LIKE along with the wildcard characters % and _ to find specific values. These functions are insufficient for more intricate inspections, though. Regular expressions can be used to:
Check phone number formats
Check email address formats
Check URLs match a specific format
Check any other type of string value to see if it matches the desired format
There are a few functions in Oracle SQL that can be used with regular expressions:
REGEXP_LIKE
REGEXP_INSTR
REGEXP_REPLACE
REGEXP_SUBSTR
REGEXP_COUNT (added in Oracle 11g)
Oracle REGEXP_LIKE Function
The REGEXP_LIKE function looks for a certain pattern in a column. It is utilised in a WHERE clause to determine if the row should be included in the result set. The syntax for the REGEXP_LIKE function is
REGEXP_LIKE (source_string, pattern [, match_parameter] )
Where:
source_string (mandatory): The value that is searched in. It can be any data type of CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.
pattern (mandatory): This is the regular expression that you provide. It can be up to 512 bytes.
match_parameter (optional): This allows you to change the default matching behaviour of the function, which can be one or more of the following:
“i”: case-insensitive matching
“c”: case-sensitive matching
“n”: allows the “.” character to match the newline character (by default, it matches any character except the newline)
“m”: treats the source_string value as multiple lines, where ^ is the start of a line and $ is the end of a line.
If you specify multiple match_parameter values that contradict each other (e.g. “ci” which matches to case-sensitive and case-insensitive), then Oracle uses the last value. In this example, it will use “i” instead of “c”.
If you don’t specify a match parameter, then:
The default case sensitivity is determined by the parameter NLS_SORT
The period character doesn’t match the newline character
The source string is treated as a single line and not multiple lines
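A small sketch of how these flags behave, using made-up literal values selected from dual rather than a real table:

```sql
-- Contradictory flags: the last one wins.
-- 'ci' ends in i (case-insensitive); 'ic' ends in c (case-sensitive).
SELECT CASE WHEN REGEXP_LIKE('Alexander', '^a', 'ci') THEN 'match' ELSE 'no match' END AS ci_result,  -- match
       CASE WHEN REGEXP_LIKE('Alexander', '^a', 'ic') THEN 'match' ELSE 'no match' END AS ic_result   -- no match
FROM dual;

-- Without "n" the period does not match a newline (CHR(10)); with "n" it does.
SELECT CASE WHEN REGEXP_LIKE('a' || CHR(10) || 'b', 'a.b') THEN 'match' ELSE 'no match' END AS default_result,  -- no match
       CASE WHEN REGEXP_LIKE('a' || CHR(10) || 'b', 'a.b', 'n') THEN 'match' ELSE 'no match' END AS n_result    -- match
FROM dual;
```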
REGEXP_LIKE Example 1
This example uses just a source and pattern value
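select first_name from hr.employees where regexp_like(first_name,'W+');
This returns values that contain one or more consecutive W characters at any position. To return only values that start with W and have one or more characters following it, use the pattern '^W.+' instead.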

REGEXP_LIKE Example 2
This example looks for values where there are at least two consecutive e characters
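The query for this example is not shown. One that fits the description, assuming the same hr.employees table and first_name column as Example 1, could be:

```sql
-- e{2,} matches two or more consecutive lowercase e characters anywhere in the value
SELECT first_name
FROM hr.employees
WHERE REGEXP_LIKE(first_name, 'e{2,}');
```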

REGEXP_LIKE Example 3
This example looks for values that have the letter C at any position in the string
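A possible query for this description, again assuming the hr.employees table from Example 1:

```sql
-- An unanchored pattern matches at any position, so 'C' finds an uppercase C anywhere
SELECT first_name
FROM hr.employees
WHERE REGEXP_LIKE(first_name, 'C');
```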

REGEXP_LIKE Example 4
This example looks for values that have either a C or a c in it at any position where i denotes the search is case-insensitive
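As a sketch (the table and column are carried over from Example 1 as an assumption), the case-insensitive variant might look like this:

```sql
-- The 'i' match parameter makes 'c' match both c and C
SELECT first_name
FROM hr.employees
WHERE REGEXP_LIKE(first_name, 'c', 'i');
```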

REGEXP_LIKE Example 5
This example shows results that have an uppercase V where c denotes the search is case-sensitive
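One way this could be written, assuming the hr.employees table from Example 1:

```sql
-- The 'c' match parameter forces case-sensitive matching, so only an uppercase V matches
SELECT first_name
FROM hr.employees
WHERE REGEXP_LIKE(first_name, 'V', 'c');
```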

REGEXP_LIKE Example 6
This example shows values that contain digits
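A query along these lines could use the POSIX digit class. The phone_number column of hr.employees is an assumption here; any character column would work the same way:

```sql
-- [[:digit:]] matches any single digit, so rows with at least one digit are returned
SELECT phone_number
FROM hr.employees
WHERE REGEXP_LIKE(phone_number, '[[:digit:]]');
```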

REGEXP_LIKE Example 7
This example shows values that have alphabetical characters
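A sketch of such a check, assuming the first_name column from Example 1:

```sql
-- [[:alpha:]] matches any single letter, so rows with at least one letter are returned
SELECT first_name
FROM hr.employees
WHERE REGEXP_LIKE(first_name, '[[:alpha:]]');
```

To require that the value consists only of letters, the pattern could be anchored as '^[[:alpha:]]+$'.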

Oracle REGEXP_INSTR Function
The Oracle REGEXP_INSTR function lets you search a string for a regular expression pattern, and returns a number that indicates where the pattern was found. The syntax for the REGEXP_INSTR function is
REGEXP_INSTR (
source_string, pattern [, position [, occurrence
[, return_option [, match_parameter [, sub_expression ] ] ] ] ])
Where:
source_string (mandatory): This is the character string that the expression is searched in. It can be any of CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.
pattern (mandatory): This is the regular expression that is used to search within the source_string. It can be any of CHAR, VARCHAR2, NCHAR, or NVARCHAR2, and can be up to 512 bytes.
position (optional): This is the position in the source_string where the function should begin the search for the pattern. It must be a positive integer, and the default value is 1 (the search begins at the first character).
occurrence (optional): This is a positive integer that indicates which occurrence of the pattern within the source_string the function should search for. The default value is 1, which means the function finds the first occurrence. If the value is greater than 1, then the function looks for the second occurrence (or further occurrences) after the first occurrence is found.
return_option (optional): This lets you specify what happens when an occurrence is found. If you specify 0, which is the default, the function returns the position of the first character of the occurrence. If you specify 1, then the function returns the position of the character after the occurrence.
match_parameter (optional): This allows you to change the default matching behaviour of the function, which can be one or more of:
“i”: case-insensitive matching
“c”: case-sensitive matching
“n”: allows the “.” character to match the newline character (by default, it matches any character except the newline)
“m”: treats the source_string value as multiple lines, where ^ is the start of a line and $ is the end of a line.
sub_expression (optional): If the pattern has subexpressions (parts enclosed in parentheses), this value indicates which subexpression is used in the function. The default value is 0, which means the position of the whole match is returned.
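To illustrate return_option and sub_expression, here is a minimal sketch on a made-up literal value ('Kathleen' is chosen only for illustration):

```sql
SELECT REGEXP_INSTR('Kathleen', 'ee', 1, 1, 0) AS match_start,   -- 6: position of the first e
       REGEXP_INSTR('Kathleen', 'ee', 1, 1, 1) AS after_match,   -- 8: position of the character after "ee"
       -- sub_expression 2 refers to the second parenthesised group, (le+), which matches "lee"
       REGEXP_INSTR('Kathleen', '(th)(le+)', 1, 1, 0, 'c', 2) AS group_2_start  -- 5
FROM dual;
```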
REGEXP_INSTR Example 1
This example finds the position of the ee within a string
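The query itself is missing here. A version that fits the description, assuming the hr.employees table used in the REGEXP_LIKE examples, might be:

```sql
-- Returns the position of the first "ee", or 0 when there is none.
-- For a value such as 'Kathleen' the result would be 6.
SELECT first_name,
       REGEXP_INSTR(first_name, 'ee') AS ee_position
FROM hr.employees;
```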

REGEXP_INSTR Example 2
This example finds the position of a string that starts with either A, B, or C, and then has 4 alphabetical characters following it
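One possible pattern for this, sketched against the assumed hr.employees table:

```sql
-- [ABC] matches one of A, B, or C; [[:alpha:]]{4} requires four letters after it.
-- For a value such as 'Bruce' the result would be 1.
SELECT first_name,
       REGEXP_INSTR(first_name, '[ABC][[:alpha:]]{4}') AS match_position
FROM hr.employees;
```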

REGEXP_INSTR Example 3
This example finds the position of strings that have two vowels in a row. The REGEXP_INSTR value is different for each row depending on where the two vowels start
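A sketch of this search, assuming the hr.employees table and the default case-sensitive matching (so only lowercase vowels count):

```sql
-- [aeiou]{2} matches two lowercase vowels in a row.
-- 'Daniel' would give 4 (the "ie") and 'Neena' would give 2 (the "ee").
SELECT first_name,
       REGEXP_INSTR(first_name, '[aeiou]{2}') AS vowel_pair_position
FROM hr.employees;
```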

REGEXP_INSTR Example 4
This example shows the position of values where there are two vowels in a row, after position 4
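The following sketch assumes that "after position 4" means passing 4 as the position argument, which starts the search at the fourth character inclusive. The returned position is still counted from the start of the string:

```sql
-- 'Daniel' would give 4 (the "ie" starts at character 4);
-- 'Neena' would give 0 because its only vowel pair starts at character 2.
SELECT first_name,
       REGEXP_INSTR(first_name, '[aeiou]{2}', 4) AS vowel_pair_position
FROM hr.employees;
```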

REGEXP_INSTR Example 5
This example shows the position of the second occurrence in a string where there is a vowel after position 5
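A possible form of this query, with position 5 and occurrence 2 passed as the third and fourth arguments (the table is assumed, as before):

```sql
-- Searching from character 5, 'Alexander' has vowels at positions 5 (a) and 8 (e),
-- so the second occurrence is at position 8.
SELECT first_name,
       REGEXP_INSTR(first_name, '[aeiou]', 5, 2) AS second_vowel_position
FROM hr.employees;
```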

REGEXP_INSTR Example 6
This example shows the position of values that have an A, B, C, D, or E, followed by a vowel, using a case-insensitive search
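As a sketch, the letters are listed explicitly rather than written as a range, so the result does not depend on how the session collation orders characters. The table is an assumption:

```sql
-- position 1, occurrence 1, return_option 0, and 'i' for case-insensitive matching.
-- 'Daniel' would give 1 because "Da" matches.
SELECT first_name,
       REGEXP_INSTR(first_name, '[abcde][aeiou]', 1, 1, 0, 'i') AS match_position
FROM hr.employees;
```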

Oracle REGEXP_REPLACE Function
The Oracle REGEXP_REPLACE function is used to search a string for a regular expression and replace it with other characters. It’s an extension of the standard Oracle REPLACE function: REPLACE does not support regular expressions, whereas REGEXP_REPLACE does. The syntax for this function is
REGEXP_REPLACE (source_string, pattern
[, replace_string [, position [, occurrence [, match_parameter ] ] ] ])
Where:
source_string (mandatory): This is the string to be searched in for this function. It is usually a character value and can be any of CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.
pattern (mandatory): This is the regular expression and is used to search within the source_string. It can be any of CHAR, VARCHAR2, NCHAR, or NVARCHAR2, and can be up to 512 bytes.
replace_string (optional): This is a value that is used to replace the occurrences of the pattern within the source_string. It can be any of CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB. This replace_string can contain backreferences to subexpressions in the pattern by using backslashes (\), which I will show you in the examples below.
position (optional): This is the position in the source_string where the function should begin the search for the pattern. It must be a positive integer, and the default value is 1 (the search begins at the first character).
occurrence (optional): This is a non-negative integer that indicates which occurrence of the pattern within the source_string should be replaced. The default value is 0, which means all occurrences are replaced. If you specify a positive integer n, then only the nth occurrence is replaced.
match_parameter (optional): This allows you to change the default matching behaviour of the function, which can be one or more of:
“i”: case-insensitive matching
“c”: case-sensitive matching
“n”: allows the “.” character to match the newline character (by default, it matches any character except the newline)
“m”: treats the source_string value as multiple lines, where ^ is the start of a line and $ is the end of a line.
The return value is a VARCHAR2 if the source_string is not a CLOB or NCLOB, and CLOB if it is.
REGEXP_REPLACE Example 1
This example removes all occurrences of two consecutive vowels
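The query is not included here. A sketch that matches the description, assuming the hr.employees table from the earlier examples:

```sql
-- With no replace_string, every match is removed.
-- 'Daniel' would become 'Danl'.
SELECT first_name,
       REGEXP_REPLACE(first_name, '[aeiou]{2}') AS without_vowel_pairs
FROM hr.employees;
```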

REGEXP_REPLACE Example 2
This example replaces two consecutive vowels with two dashes
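This might be written as follows (table and column assumed as before):

```sql
-- Each pair of lowercase vowels is replaced with '--'.
-- 'Daniel' would become 'Dan--l'.
SELECT first_name,
       REGEXP_REPLACE(first_name, '[aeiou]{2}', '--') AS dashed
FROM hr.employees;
```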

REGEXP_REPLACE Example 3
This example replaces two consecutive vowels of the same vowel with two dashes
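One way to require the same vowel twice is a backreference in the pattern. This is a sketch on the assumed hr.employees table:

```sql
-- ([aeiou]) captures one vowel and \1 requires the same character again.
-- 'Neena' would become 'N--na'; 'Daniel' would be returned unchanged.
SELECT first_name,
       REGEXP_REPLACE(first_name, '([aeiou])\1', '--') AS dashed
FROM hr.employees;
```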

REGEXP_REPLACE Example 4
This example replaces any digits in the string with a + and then the digit
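A minimal sketch using a backreference in replace_string, run against a made-up literal so the result is easy to check:

```sql
-- ([[:digit:]]) captures each digit; '+\1' writes a plus sign followed by that digit
SELECT REGEXP_REPLACE('A1B22', '([[:digit:]])', '+\1') AS result  -- A+1B+2+2
FROM dual;
```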

REGEXP_REPLACE Example 5
This example replaces any vowel character followed by any letter from a to m, starting from position 4, with two dashes
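A possible version of this query, assuming the hr.employees table and the default binary, case-sensitive matching for the a-m range:

```sql
-- The fourth argument (4) starts the search at the fourth character.
-- 'Michael' would become 'Mich--l': the "ic" before position 4 is left alone,
-- while "ae" at positions 5-6 is replaced.
SELECT first_name,
       REGEXP_REPLACE(first_name, '[aeiou][a-m]', '--', 4) AS dashed
FROM hr.employees;
```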

REGEXP_REPLACE Example 6
This example replaces the second occurrence of any vowel character followed by any letter from a to m, starting from position 1, with two dashes
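As a sketch with position 1 and occurrence 2, under the same table and case-sensitivity assumptions:

```sql
-- 'Gabriel' contains "ab" (first match) and "ie" (second match),
-- so only "ie" is replaced and the result would be 'Gabr--l'.
SELECT first_name,
       REGEXP_REPLACE(first_name, '[aeiou][a-m]', '--', 1, 2) AS dashed
FROM hr.employees;
```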

REGEXP_REPLACE Example 7
This example replaces more than one consecutive capitalised letter with an underscore, starting from position 2
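An illustrative sketch on a made-up literal, assuming default case-sensitive matching:

```sql
-- [[:upper:]]{2,} matches a run of two or more uppercase letters.
-- The search starts at character 2, so the leading H is kept and "TTPS" is replaced.
SELECT REGEXP_REPLACE('HTTPServer', '[[:upper:]]{2,}', '_', 2) AS result  -- H_erver
FROM dual;
```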

Oracle REGEXP_SUBSTR Function
The Oracle REGEXP_SUBSTR function allows you to search for a string inside another string, using regular expressions. It’s similar to the REGEXP_INSTR function, but instead of returning the position of the string, it returns the substring. One of the uses is to split a string into separate rows.
It extends the SUBSTR function but allows the use of regular expressions. The function returns a VARCHAR2 or CLOB data type, depending on what has been provided as an input. The syntax of the REGEXP_SUBSTR function is
REGEXP_SUBSTR (source_string, pattern [, position [, occurrence [, match_parameter ] ] ])
Where:
source_string (mandatory): This is the string to be searched inside of. It is usually the larger of the two parameters, and is usually a character value and can be any of CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.
pattern (mandatory): This is the regular expression and is used to search within the source_string. It can be any of CHAR, VARCHAR2, NCHAR, or NVARCHAR2, and can be up to 512 bytes.
position (optional): This is the position in the source_string where the function should begin the search for the pattern. It must be a positive integer, and the default value is 1 (the search begins at the first character).
occurrence (optional): This is a positive integer that indicates which occurrence of the pattern within the source_string the function should search for. The default value is 1, which means the function finds the first occurrence. If the value is greater than 1, then the function looks for the second occurrence (or further occurrences) after the first occurrence is found.
match_parameter (optional): This allows you to change the default matching behaviour of the function, which can be one or more of:
“i”: case-insensitive matching
“c”: case-sensitive matching
“n”: allows the “.” character to match the newline character (by default, it matches any character except the newline)
“m”: treats the source_string value as multiple lines, where ^ is the start of a line and $ is the end of a line.
REGEXP_SUBSTR Example 1
This example finds a substring that matches two consecutive vowels
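The query is not shown here. One that fits the description, assuming the hr.employees table from the earlier examples:

```sql
-- Returns the first pair of lowercase vowels, or NULL when there is none.
-- 'Daniel' would give 'ie'.
SELECT first_name,
       REGEXP_SUBSTR(first_name, '[aeiou]{2}') AS vowel_pair
FROM hr.employees;
```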

REGEXP_SUBSTR Example 2
This example finds the first pair of identical consecutive vowels in each string, and returns NULL for strings that don’t have one
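A sketch using a backreference to require the same vowel twice (table and column are assumed):

```sql
-- 'Neena' would give 'ee'; 'Daniel' would give NULL because "ie" are different vowels
SELECT first_name,
       REGEXP_SUBSTR(first_name, '([aeiou])\1') AS same_vowel_pair
FROM hr.employees;
```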

REGEXP_SUBSTR Example 3
This example finds substrings that contain one or more digits
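A minimal illustration on a made-up literal value:

```sql
-- [[:digit:]]+ matches the first run of one or more digits
SELECT REGEXP_SUBSTR('Order 4521 shipped', '[[:digit:]]+') AS first_number  -- 4521
FROM dual;
```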

REGEXP_SUBSTR Example 4
This example finds substrings that have a vowel followed by a letter from a to m, starting from position 4
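This could be sketched as follows, assuming the hr.employees table and default case-sensitive matching:

```sql
-- The third argument (4) starts the search at the fourth character.
-- 'Michael' would give 'ae'; the earlier "ic" is skipped because it starts before position 4.
SELECT first_name,
       REGEXP_SUBSTR(first_name, '[aeiou][a-m]', 4) AS match_from_4
FROM hr.employees;
```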

REGEXP_SUBSTR Example 5
This example finds the second occurrence of substrings that have a vowel followed by a letter from a to m
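A possible query, with occurrence 2 passed as the fourth argument (same assumptions as the previous examples):

```sql
-- 'Gabriel' contains "ab" then "ie", so the second occurrence returned would be 'ie'.
-- Values with fewer than two matches give NULL.
SELECT first_name,
       REGEXP_SUBSTR(first_name, '[aeiou][a-m]', 1, 2) AS second_match
FROM hr.employees;
```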

