I’m pretty sure that everybody at least once in his professional coding life had the need to convert a string of character separated values into tokens or rows.
Let’s be honest, we all ended writing something like
1 2 3 4 5 6 7 8 9 10 |
for k in 1..length(mystring) loop if substr(mystring), k, 1) = ',' then ...bla... ...bla... else strBuffer := strBuffer || substr(mystring, k, 1); ...bla... ...bla... end if; end loop; |
Problem is that among all those “bla” it’s quite easy to forget something (is there always a separator at the very end of the string? Any space between values and separators? Did I clean the buffer?) and in such code bugs might find a comfortable home.
Why should we write such risky code to do something that can be achieved differently in a completely reliable, fast and elegant way? And all of this just using one powerful feature that Oracle kindly introduced with version 10g (drum roll): regular expressions!
Yes folks, with regular expressions it’s very easy to extract values from a string and without the need to be worried about all code traps mentioned before.
For our purpose we’ll use the function regexp_substr
1 |
regexp_substr(string, pattern, position, occurrence, match_options) |
You can find a complete description of this function in the official Oracle documentation so I will only explain how to get to our goal.
This function returns from string and starting from position the occurrence of substring that matches the pattern according to standard of regular expressions rules (you need at least a basic understanding of those standards to take maximum advantage of this feature). Last parameter match_options is not mandatory and not important for this demo; it’s used to alter the behavior by enabling, for instance, case-sensitive, multi-row searches and so on.
Let’s go back to our problem and write the query
1 |
select regexp_substr('Val1,Val2,Val3', '[^,]+', 1, 1) from dual; |
Translated into “human language” it means “read the string ‘Val1,Val2,Val3, split it using ‘,’ as a separator, start from position 1 and return me the occurrence number 1 of the resulting set of substrings“.
In fact when executing it we get
1 |
Val1 |
as a result. The pattern ‘[^,]+’ specifies that we want the string to be split at every comma. But we don’t want only the first, we want all the tokens, right? To do that we can use a neat trick by introducing a “connect by” inside our query and replacing the occurrence parameter with the level pseudo-column
1 2 3 |
select regexp_substr('Val1,Val2,Val3', '[^,]+', 1, level) from dual connect by level <= 3 |
Once executed this will return
1 2 3 4 5 |
REGEXP_SUBSTR('VAL1,VAL2,VAL3' -------------------------------------------------------- Val1 Val2 Val3 |
That’s exactly what we wanted! But wait, in our query we had to explicitly specify the number of values 3 writing that “connect by level <= 3” but normally we don’t know in advance how many values are in the string.
Don’t worry, here comes a nice solution for that as well. There is another regex function regexp_count that returns the number of values we are looking for.
Cool, so let’s rewrite our query
1 2 3 |
select regexp_substr('Val1,Val2,Val3', '[^,]+', 1, level) from dual connect by level <= regexp_count('Val1,Val2,Val3', '[^,]+'); |
and if we run it we get all our values
1 2 3 4 5 |
REGEXP_SUBSTR('VAL1,VAL2,VAL3' -------------------------------------------------------- Val1 Val2 Val3 |
Obviously we can use this in Pl/Sql code, a generic example could be
1 2 3 4 5 6 7 8 9 10 11 12 13 |
declare pPar varchar2(100) := 'Val1,Val2,Val3'; -- Input string pNumPars number; pToken varchar2(30); begin pNumPars := regexp_count(pPar, '[^,]+'); dbms_output.put_line('Found '||pNumPars||' parameters:'); for i in 1..pNumPars loop pToken := regexp_substr(pPar, '[^,]+', 1, i); dbms_output.put_line(pToken); end loop; end; / |
that has the following output
1 2 3 4 |
Found 3 parameters: Val1 Val2 Val3 |
Last thing we can do in case there might be leading/trailing spaces between values is to add a simple TRIM function.
This way the code
1 2 3 4 5 6 7 8 9 10 11 12 13 |
declare pPar varchar2(100) := 'Val1, Val2 ,Val3 '; -- Input string with spaces pNumPars number; pToken varchar2(30); begin pNumPars := regexp_count(pPar, '[^,]+'); dbms_output.put_line('Found '||pNumPars||' parameters:'); for i in 1..pNumPars loop pToken := TRIM(regexp_substr(pPar, '[^,]+', 1, i)); -- TRIM is used dbms_output.put_line(pToken); end loop; end; / |
will extract all 3 values without any space. We can actually achieve the same by using only regular expressions simply adding a space to the pattern in the list of separators (no TRIM is needed in this case anymore)
1 2 3 4 5 6 7 8 9 10 11 12 13 |
declare pPar varchar2(100) := 'Val1, Val2 ,Val3 '; -- Input string with spaces pNumPars number; pToken varchar2(30); begin pNumPars := regexp_count(pPar, '[^, ]+'); dbms_output.put_line('Found '||pNumPars||' parameters:'); for i in 1..pNumPars loop pToken := regexp_substr(pPar, '[^, ]+', 1, i); -- TRIM is not needed dbms_output.put_line(pToken); end loop; end; / |
Regexp functions are a really powerful tool and can also be used in check constraints to ensure that data adheres to specific formats. Imagine how easy it would be to validate an email column by using regular expressions, no need to write any custom (and error-prone) function!
But this is another story…
Hey Rob,
The content was very helpful though I cannot use them in my field, and the examples are really great with simple language. Must admit it was happy reading.
Thanks Rajnish, glad you enjoyed it!