Regular Expression to split on spaces unless in quotes


Question

I would like to use the .Net Regex.Split method to split this input string into an array. It must split on whitespace unless it is enclosed in a quote.

Input: Here is "my string"    it has "six  matches"

Expected output:

  1. Here
  2. is
  3. my string
  4. it
  5. has
  6. six  matches

What pattern do I need? Also do I need to specify any RegexOptions?

1
66
4/3/2009 9:49:03 PM

Accepted Answer

No options required

Regex:

\w+|"[\w\s]*"

C#:

Regex regex = new Regex(@"\w+|""[\w\s]*""");

Or if you need to exclude " characters:

    Regex
        .Matches(input, @"(?<match>\w+)|\""(?<match>[\w\s]*)""")
        .Cast<Match>()
        .Select(m => m.Groups["match"].Value)
        .ToList()
        .ForEach(s => Console.WriteLine(s));
62
2/16/2009 7:27:04 PM

Lieven's solution gets most of the way there, and as he states in his comments it's just a matter of changing the ending to Bartek's solution. The end result is the following working regEx:

(?<=")\w[\w\s]*(?=")|\w+|"[\w\s]*"

Input: Here is "my string" it has "six matches"

Output:

  1. Here
  2. is
  3. "my string"
  4. it
  5. has
  6. "six matches"

Unfortunately it's including the quotes. If you instead use the following:

(("((?<token>.*?)(?<!\\)")|(?<token>[\w]+))(\s)*)

And explicitly capture the "token" matches as follows:

    RegexOptions options = RegexOptions.None;
    Regex regex = new Regex( @"((""((?<token>.*?)(?<!\\)"")|(?<token>[\w]+))(\s)*)", options );
    string input = @"   Here is ""my string"" it has   "" six  matches""   ";
    var result = (from Match m in regex.Matches( input ) 
                  where m.Groups[ "token" ].Success
                  select m.Groups[ "token" ].Value).ToList();

    for ( int i = 0; i < result.Count(); i++ )
    {
        Debug.WriteLine( string.Format( "Token[{0}]: '{1}'", i, result[ i ] ) );
    }

Debug output:

Token[0]: 'Here'
Token[1]: 'is'
Token[2]: 'my string'
Token[3]: 'it'
Token[4]: 'has'
Token[5]: ' six  matches'

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon