Splitting strings with MySQL

Suppose we’ve got a string that we’d like to parse it into a table and run DISTINCT over.

A613; A613; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A613;

MySQL would not automatically convert a string into a table and there’s no library function to do the job for us. But we can leverage a classic SUBSTRING_INDEX(SUBSTRING_INDEX(newString, ‘;’, N), ‘;’, -1) expression to extract Nth item from the string: 

NSUBSTRING_INDEX(SUBSTRING_INDEX(newString, ‘;’, N), ‘;’, -1)
1A613
2A613
3A095

Generating numeric sequence

For the SUBSTRING_INDEX trick to work we need to generate ourselves a long enough sequence of integers so we can cross join it onto our source string. Check out the folowing sequence generator:

select (th*1000+h*100+t*10+u+1) x from
(select 0 th union select 1 union select 2 union select 3 union select 4 union
select 5 union select 6 union select 7 union select 8 union select 9) A,
(select 0 h union select 1 union select 2 union select 3 union select 4 union
select 5 union select 6 union select 7 union select 8 union select 9) B,
(select 0 t union select 1 union select 2 union select 3 union select 4 union
select 5 union select 6 union select 7 union select 8 union select 9) C,
(select 0 u union select 1 union select 2 union select 3 union select 4 union
select 5 union select 6 union select 7 union select 8 union select 9) D
order by x;

yes, this particular example is limited to 10k items but it can be expanded upon. It will come down to knowing your data and ensuring you’ve got enough of these to feed to the splitter. Or use alternatives.

Stopping enumeration

After we’ve got the sequence going we need to find a number of meaningful elements in the source string. We basically need to count the delimiters in our source string. Problem is, MySQL doesn’t have a built in function for that. Mathematics to the rescue though:

SELECT  (
            CHAR_LENGTH(val)
            - CHAR_LENGTH( REPLACE ( val, ";", "") ) 
        ) / CHAR_LENGTH(";")        
   AS count    
FROM "source string"

Putting it together

SELECT distinct SUBSTRING_INDEX(SUBSTRING_INDEX(val, '; ', x), '; ', -1) FROM
(
  select (t*10+u+1) x, val from
  (select 0 t union select 1 union select 2 union select 3 union select 4 union
  select 5 union select 6 union select 7 union select 8 union select 9) A,
  (select 0 u union select 1 union select 2 union select 3 union select 4 union
  select 5 union select 6 union select 7 union select 8 union select 9) B
	CROSS JOIN (SELECT 'A613; A613; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A095; A613' as val
               ) as seq
	ORDER BY x
) as raw_data
WHERE x <= (
            CHAR_LENGTH(val)
            - CHAR_LENGTH( REPLACE ( val, ";", "") ) 
        ) / CHAR_LENGTH(";")