Conditional Constructs in .NET Regular Expressions

I came across a particular problem that seem trivial at first, but ended up being trickier than I thought. What I wanted to do is split a string based on a delimiter value, but only when the delimiter value is not found inside a quoted string. In my particular case (simplified for discussion), I had two SQL queries concatenated by the ";" delimiter as follows:

SELECT "Jane;Sam;Julie;Homer's \"Salad\"" as f1;SELECT 'This is "a test".  \\Part2\\' as f2;

I wanted to split the strings into:

SELECT "Jane;Sam;Julie;Homer's \"Salad\"" as f1

and

SELECT 'This is "a test".  \\Part2\\' as f2

Note, that in each query, the quoted string is either escaped by single or double quotes, and that the string fields in the query themselves contain the delimiter within them. Using a simple String.Split() function will not yield the correct results. Writing a custom function to perform this split would probably work, but would probably be error prone, and ugly. So regular expressions seemed to be the answer. After spending some time doing some google searches, I couldn't find any example of a good way to perform my split using Regex, as much of the examples I found were about matching quoted strings (with escapes).

The trick to get this to work is to use a combination of Grouping Constructs, Back References, Zero Width Assertions, Conditionals, and Balanced Group Definitions in the regular expression pattern. Balanced Group definitions is a Microsoft innovation and is implemented in .NET Regex, so this is not a very common feature but is the key to get this to properly work.

Let's just layout the solution, then analyze it later. Here is the C# code that implements a SplitSql() function and splits the sample input above:

///////////////////////////////////////////////////////////////////////////////
//
//  Copyright 2008. Tom Kaminski. 
//  
// This program is free software; you can redistribute it and/or modify it under the terms 
// of the GNU General Public License as published by the Free Software Foundation; either 
// version 2 of the License, or (at your option) any later version.

// These programs are distributed in the hope that it will be useful, but WITHOUT ANY 
// WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
//  FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

namespace SplitSql {
    class Program {
        public static string[] SplitSql(string multi_sql) {

            Regex re = new Regex(@"(?>(?(IQ)(?(ESC).(?<-ESC>)|
                   \\(?<ESC>))|(?!))|(?(IQ)\k<QUOTE>(?<-IQ>)
                   |(?<QUOTE>[""'])(?<IQ>))|(?(IQ).|[^;]))+",RegexOptions.IgnorePatternWhitespace);          

            List<string> sqls = new List<string>();

            Match m = re.Match(multi_sql);
            while (m.Success) {
                string result = m.Groups[0].ToString().Trim();
                if (result.Length > 0) sqls.Add(result);
                m = m.NextMatch();
            }
            return sqls.ToArray();
        }

        static void Main(string[] args) {
            string sql = @"SELECT ""Jane;Sam;Julie;Homer's \""Salad\""\"" as f1; 
                              SELECT 'This is ""a test"".  \\Part2\\' as f2;";
            string[] ss = SplitSql(sql);
        }
    }
}

Most of the program is trivial except for the regular expression pattern, which is expanded, commented here:

Regex re = new Regex(

@"  (?>                                 " +     // Specify non-backtracking subexpression  (only match sub-expression)

                                                // HANDLE ESCAPED CHARACTERS
@"      (?(IQ)                          " +     // if (IQ {In Quote}) then
@"         (?(ESC)                      " +     //      if(ESCAPE MODE) then
@"              .(?<-ESC>)              " +     //          Consume the next character(it is escaped!)
@"         |                            " +     //      else
@"              \\(?<ESC>))|(?!))       " +     //          Not in escape mode...but if \ is detected, enable ESCAPE MODE
                                                //          otherwise, zero width assert: (?!) never matches - do nothing

@"    |                                 " +     // OR

                                                // DETECT IF IN QUOTE, OR NOT
@"      (?(IQ)                          " +     // if (IQ {In Quote} ) then
@"          \k<QUOTE>(?<-IQ>)           " +     //      If captured quote character matched, then Pop the IQ named group. 
                                                //      No longer in quote.  We don't have to worry about preceeding \ as 
                                                //      escape characters are already handled in escape section!!
@"          |                           " +     // else
@"          (?<QUOTE>[""'])(?<IQ>))     " +     //      Try to match the begin quote character.  The begin quote character
                                                //      is assigned to the named group <QUOTE> for referencing later.  If
                                                //      matched, then indicate that we in a quote. Push IQ.

@"   |                                  " +     // OR 

                                                // CONSUME REGULAR CHARACTERS 
@"      (?(IQ)                          " +     // if (IQ {In Quote} ) then
@"          .                           " +     //      Any character is allowed when we are in the quote.
@"      |                               " +     // else
@"          [^;])                       " +     //      Any non-delimiting character is allowed. 


@"  )+",                                        // Allow the subexpression to be matched one more more times
                                                // Will terminate if end of string, or delimit character is found when 
                                                // not in a quote.
       
RegexOptions.IgnorePatternWhitespace);

The above regular expressions demonstrates how to change modes in the regular expression pattern and conditionally match based on what was detected before. This enables balanced groups (start tags/end tags etc), and escape handling for example. In theory, these concepts and techniques can be used to build a robust parser.

Leave me a comment if you find this code and explanation useful!!!

Comments

it’s good to see this

it’s good to see this information in your post, i was looking the same but there was not any proper resource..
custom writing | dissertation

re

We will say that you are good expert in your topic just about this good topic writing. Thus, some article submission can utilize your great project for various services.

respond this post

The perfect knowledge about this post could be a base for custom thesis or for some dissertation writing service and be used at dissertation work.

respond this post

Would you like to get resume company, that conform the range of study you desire?. You can count on our resume writers, as you rely on yourself. Thanks because that is the useful fact

reply

Professionalism can give high quality and famous custom essays writing companies can write the term paper of solid quality. This is a proper way to success.

reply this post

Lots of different students dream just about correctly written term papers, though they don’t really realize that the professional english writing service could do the best custom term papers writing of great quality!

Conditional Constructs in .NET Regular Expressions

Very useful explanation.....

free baby samples

Conditional Constructs in .NET Regular Expressions

You have put in lot of effort here,but have pulled a good one here...

school uniforms

Conditional Constructs in .NET Regular Expressions

You have pulled a good one here......

g@free baby samples

Conditional Constructs in .NET Regular Expressions

You have pulled a good one that is the code and its explanation....

f@ free baby coupons

Conditional Constructs in .NET Regular Expressions

I would be keen on more articles on this subjects in the near future...

e@baby coupons

Conditional Constructs in .NET Regular Expressions

you deserve a pat for this one looking for more....
d@free baby coupons

Conditional Constructs in .NET Regular Expressions

Very useful code and the detailed explaination was of immense use......

c@auto insurance quotes online

Conditional Constructs in .NET Regular Expressions

Thanks for an excellent explanation was quite useful....

ggg@ipad free

Conditional Constructs in .NET Regular Expressions

Very informative it can be useful to many......

hhh@weight watchers online

re

I am willing to state that a professional essays online service seems a good assistant on the road of biography term paper creating. Hence, all students should utilize it anytime they need buy essay online.

Conditional Constructs in .NET Regular Expressions

Very useful article,you deserve a pat for sure.....

hgm@free samples