Oracle regexp: a smart and reliable way to convert character separated values into rows

I’m pretty sure that everybody at least once in his professional coding life had the need to convert a string of character separated values into tokens or rows.
Let’s be honest, we all ended writing something like

for k in 1..length(mystring) loop
  if substr(mystring), k, 1) = ',' then
    ...bla...
    ...bla...
  else
    strBuffer := strBuffer || substr(mystring, k, 1);
    ...bla...
    ...bla...
  end if;
end loop;

for k in 1..length(mystring) loop

if substr(mystring), k, 1) = ',' then

...bla...

else

strBuffer := strBuffer || substr(mystring, k, 1);

...bla...

end if;

end loop;

Problem is that among all those “bla” it’s quite easy to forget something (is there always a separator at the very end of the string? Any space between values and separators? Did I clean the buffer?) and in such code bugs might find a comfortable home.
Why should we write such risky code to do something that can be achieved differently in a completely reliable, fast and elegant way? And all of this just using one powerful feature that Oracle kindly introduced with version 10g (drum roll): regular expressions!
Yes folks, with regular expressions it’s very easy to extract values from a string and without the need to be worried about all code traps mentioned before.
For our purpose we’ll use the function regexp_substr

regexp_substr(string, pattern, position, occurrence, match_options)

1	regexp_substr(string, pattern, position, occurrence, match_options)

You can find a complete description of this function in the official Oracle documentation so I will only explain how to get to our goal.
This function returns from string and starting from position the occurrence of substring that matches the pattern according to standard of regular expressions rules (you need at least a basic understanding of those standards to take maximum advantage of this feature). Last parameter match_options is not mandatory and not important for this demo; it’s used to alter the behavior by enabling, for instance, case-sensitive, multi-row searches and so on.
Let’s go back to our problem and write the query

select regexp_substr('Val1,Val2,Val3', '[^,]+', 1, 1) from dual;

1	select regexp_substr('Val1,Val2,Val3', '[^,]+', 1, 1) from dual;

Translated into “human language” it means “read the string ‘Val1,Val2,Val3, split it using ‘,’ as a separator, start from position 1 and return me the occurrence number 1 of the resulting set of substrings“.
In fact when executing it we get

Val1

Val1

as a result. The pattern ‘[^,]+’ specifies that we want the string to be split at every comma. But we don’t want only the first, we want all the tokens, right? To do that we can use a neat trick by introducing a “connect by” inside our query and replacing the occurrence parameter with the level pseudo-column

select regexp_substr('Val1,Val2,Val3', '[^,]+', 1, level)
from dual
connect by level <= 3

select regexp_substr('Val1,Val2,Val3', '[^,]+', 1, level)

from dual

connect by level <= 3

Once executed this will return

REGEXP_SUBSTR('VAL1,VAL2,VAL3'
--------------------------------------------------------
Val1
Val2
Val3

REGEXP_SUBSTR('VAL1,VAL2,VAL3'

--------------------------------------------------------

Val1

Val2

Val3

That’s exactly what we wanted! But wait, in our query we had to explicitly specify the number of values 3 writing that “connect by level <= 3” but normally we don’t know in advance how many values are in the string.
Don’t worry, here comes a nice solution for that as well. There is another regex function regexp_count that returns the number of values we are looking for.
Cool, so let’s rewrite our query

select regexp_substr('Val1,Val2,Val3', '[^,]+', 1, level)
from dual
connect by level <= regexp_count('Val1,Val2,Val3', '[^,]+');

select regexp_substr('Val1,Val2,Val3', '[^,]+', 1, level)

from dual

connect by level <= regexp_count('Val1,Val2,Val3', '[^,]+');

and if we run it we get all our values

REGEXP_SUBSTR('VAL1,VAL2,VAL3'
--------------------------------------------------------
Val1
Val2
Val3

REGEXP_SUBSTR('VAL1,VAL2,VAL3'

--------------------------------------------------------

Val1

Val2

Val3

Obviously we can use this in Pl/Sql code, a generic example could be

declare
pPar     varchar2(100) := 'Val1,Val2,Val3'; -- Input string
pNumPars number;
pToken   varchar2(30);
begin
pNumPars := regexp_count(pPar, '[^,]+');
dbms_output.put_line('Found '||pNumPars||' parameters:');
for i in 1..pNumPars loop
  pToken := regexp_substr(pPar, '[^,]+', 1, i);
  dbms_output.put_line(pToken);
end loop;
end;
/

declare

pPar varchar2(100) := 'Val1,Val2,Val3'; -- Input string

pNumPars number;

pToken varchar2(30);

begin

pNumPars := regexp_count(pPar, '[^,]+');

dbms_output.put_line('Found '||pNumPars||' parameters:');

for i in 1..pNumPars loop

pToken := regexp_substr(pPar, '[^,]+', 1, i);

dbms_output.put_line(pToken);

end loop;

end;

that has the following output

Found 3 parameters:
Val1
Val2
Val3

Found 3 parameters:

Val1

Val2

Val3

Last thing we can do in case there might be leading/trailing spaces between values is to add a simple TRIM function.
This way the code

declare
pPar     varchar2(100) := 'Val1,  Val2 ,Val3  '; -- Input string with spaces
pNumPars number;
pToken   varchar2(30);
begin
pNumPars := regexp_count(pPar, '[^,]+');
dbms_output.put_line('Found '||pNumPars||' parameters:');
for i in 1..pNumPars loop
  pToken := TRIM(regexp_substr(pPar, '[^,]+', 1, i)); -- TRIM is used
  dbms_output.put_line(pToken);
end loop;
end;
/

declare

pPar varchar2(100) := 'Val1, Val2 ,Val3 '; -- Input string with spaces

pNumPars number;

pToken varchar2(30);

begin

pNumPars := regexp_count(pPar, '[^,]+');

dbms_output.put_line('Found '||pNumPars||' parameters:');

for i in 1..pNumPars loop

pToken := TRIM(regexp_substr(pPar, '[^,]+', 1, i)); -- TRIM is used

dbms_output.put_line(pToken);

end loop;

end;

will extract all 3 values without any space. We can actually achieve the same by using only regular expressions simply adding a space to the pattern in the list of separators (no TRIM is needed in this case anymore)

declare
pPar     varchar2(100) := 'Val1,  Val2 ,Val3  '; -- Input string with spaces
pNumPars number;
pToken   varchar2(30);
begin
pNumPars := regexp_count(pPar, '[^, ]+');
dbms_output.put_line('Found '||pNumPars||' parameters:');
for i in 1..pNumPars loop
  pToken := regexp_substr(pPar, '[^, ]+', 1, i); -- TRIM is not needed
  dbms_output.put_line(pToken);
end loop;
end;
/

declare

pPar varchar2(100) := 'Val1, Val2 ,Val3 '; -- Input string with spaces

pNumPars number;

pToken varchar2(30);

begin

pNumPars := regexp_count(pPar, '[^, ]+');

dbms_output.put_line('Found '||pNumPars||' parameters:');

for i in 1..pNumPars loop

pToken := regexp_substr(pPar, '[^, ]+', 1, i); -- TRIM is not needed

dbms_output.put_line(pToken);

end loop;

end;

Regexp functions are a really powerful tool and can also be used in check constraints to ensure that data adheres to specific formats. Imagine how easy it would be to validate an email column by using regular expressions, no need to write any custom (and error-prone) function!
But this is another story…

Facebook

Twitter

2 thoughts on “Oracle regexp: a smart and reliable way to convert character separated values into rows”

- August 3, 2016 at 20:04 Rajnish kumar

Hey Rob,
The content was very helpful though I cannot use them in my field, and the examples are really great with simple language. Must admit it was happy reading.

1. - August 3, 2016 at 20:20 Roberto Rigliaco
  
  Thanks Rajnish, glad you enjoyed it!

Oracle Findings

2 thoughts on “Oracle regexp: a smart and reliable way to convert character separated values into rows”

Leave a Reply Cancel reply