Add spaces before Capital Letters


Question

Given the string "ThisStringHasNoSpacesButItDoesHaveCapitals", what is the best way to add spaces before the capital letters? The end string would be "This String Has No Spaces But It Does Have Capitals".

Here is my attempt with a RegEx:

System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0")
11/7/2008 4:36:35 PM

Accepted Answer

The regexes will work fine (I even voted up Martin Brown's answer), but they are expensive (and personally I find any pattern longer than a couple of characters prohibitively obtuse).

This function

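// Requires: using System.Text; (for StringBuilder)
string AddSpacesToSentence(string text, bool preserveAcronyms)
{
    if (string.IsNullOrWhiteSpace(text))
        return string.Empty;

    // Worst case is a space before every character, so reserve twice the length.
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        if (char.IsUpper(text[i]))
        {
            // Insert a space when a capital follows a non-space, non-capital character,
            // or (optionally) when it is the last capital of a run and starts a new word.
            if ((text[i - 1] != ' ' && !char.IsUpper(text[i - 1])) ||
                (preserveAcronyms && char.IsUpper(text[i - 1]) &&
                 i < text.Length - 1 && !char.IsUpper(text[i + 1])))
            {
                newText.Append(' ');
            }
        }
        newText.Append(text[i]);
    }
    return newText.ToString();
}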

will do it 100,000 times in 2,968,750 ticks; the regex takes 25,000,000 ticks (and that's with the regex compiled).

It's better, for a given value of better (i.e. faster); however, it's more code to maintain. "Better" is often a compromise between competing requirements.

Hope this helps :)

Update
It's been a good long while since I looked at this, and I just realised the timings haven't been updated since the code changed (it only changed a little).

On a string with 'Abbbbbbbbb' repeated 100 times (i.e. 1,000 characters), a run of 100,000 conversions takes the hand-coded function 4,517,177 ticks and the Regex below 59,435,719 ticks, so the hand-coded function runs in 7.6% of the time the Regex takes.
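A comparison like this could be timed roughly as follows. This is only a sketch under assumptions: the post doesn't say how the ticks were measured or which pattern was timed, so `Stopwatch` and the `([a-z])([A-Z])` pattern are stand-ins, and the simple variant of the function is used. The absolute numbers will differ from machine to machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hand-coded version (the simple variant from this answer).
string AddSpacesToSentence(string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return "";
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        if (char.IsUpper(text[i]) && text[i - 1] != ' ')
            newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}

// Assumption: the timed pattern is not given in the post; this one is a stand-in.
Regex splitter = new Regex("([a-z])([A-Z])", RegexOptions.Compiled);

// Test input from the post: "Abbbbbbbbb" repeated 100 times (1,000 characters).
string input = string.Concat(Enumerable.Repeat("Abbbbbbbbb", 100));
const int iterations = 100000;

Stopwatch handCoded = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
    AddSpacesToSentence(input);
handCoded.Stop();

Stopwatch regexTimer = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
    splitter.Replace(input, "$1 $2");
regexTimer.Stop();

// Stopwatch ticks are in units of 1 / Stopwatch.Frequency seconds.
Console.WriteLine("Hand coded: {0:N0} ticks", handCoded.ElapsedTicks);
Console.WriteLine("Regex:      {0:N0} ticks", regexTimer.ElapsedTicks);
```

Both approaches produce the same output for this input, so the comparison is like for like.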

Update 2
Will it take acronyms into account? It will now! The logic of the if statement is fairly obscure, and as you can see, expanding it to this ...

if (char.IsUpper(text[i]))
    if (char.IsUpper(text[i - 1]))
        if (preserveAcronyms && i < text.Length - 1 && !char.IsUpper(text[i + 1]))
            newText.Append(' ');
        else ;
    else if (text[i - 1] != ' ')
        newText.Append(' ');

... doesn't help at all!
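One possible way to make the condition easier to follow is to give each part a name. The sketch below is a rewrite for illustration, not the author's code: the function name `AddSpacesReadable` and the local variable names are invented here, and it is meant to behave the same as the `preserveAcronyms` version above.

```csharp
using System;
using System.Text;

string AddSpacesReadable(string text, bool preserveAcronyms)
{
    if (string.IsNullOrWhiteSpace(text))
        return string.Empty;
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        if (char.IsUpper(text[i]))
        {
            bool previousIsUpper = char.IsUpper(text[i - 1]);
            bool previousIsSpace = text[i - 1] == ' ';
            bool nextIsNotUpper = i < text.Length - 1 && !char.IsUpper(text[i + 1]);

            // A capital after a lower-case letter, digit, etc. starts a new word.
            bool startsWord = !previousIsUpper && !previousIsSpace;
            // The last capital of a run that is followed by a non-capital,
            // e.g. the final "C" in "SCSICompatible".
            bool endsAcronym = preserveAcronyms && previousIsUpper && nextIsNotUpper;

            if (startsWord || endsAcronym)
                newText.Append(' ');
        }
        newText.Append(text[i]);
    }
    return newText.ToString();
}

Console.WriteLine(AddSpacesReadable("DriveIsSCSICompatible", true));
// prints "Drive Is SCSI Compatible"
Console.WriteLine(AddSpacesReadable("DriveIsSCSICompatible", false));
// prints "Drive Is SCSICompatible"
```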

Here's the original simple method that doesn't worry about acronyms:

// Requires: using System.Text; (for StringBuilder)
string AddSpacesToSentence(string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return "";

    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        // Any capital that isn't already preceded by a space gets one.
        if (char.IsUpper(text[i]) && text[i - 1] != ' ')
            newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}
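To show what "doesn't worry about acronyms" means in practice, here is a small usage sketch of the simple method. The function is repeated so the snippet stands on its own; the acronym input is borrowed from elsewhere on this page.

```csharp
using System;
using System.Text;

string AddSpacesToSentence(string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return "";
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        if (char.IsUpper(text[i]) && text[i - 1] != ' ')
            newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}

Console.WriteLine(AddSpacesToSentence("ThisStringHasNoSpacesButItDoesHaveCapitals"));
// prints "This String Has No Spaces But It Does Have Capitals"

// Every capital in a run gets its own space, so acronyms are split apart.
Console.WriteLine(AddSpacesToSentence("DriveIsSCSICompatible"));
// prints "Drive Is S C S I Compatible"
```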
6/13/2014 1:21:19 PM

Your solution has an issue in that it puts a space before the first letter T, so you get

" This String..." instead of "This String..."

To get around this, look for the lower-case letter preceding the capital as well, and then insert the space in the middle:

newValue = Regex.Replace(value, "([a-z])([A-Z])", "$1 $2");
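A quick side-by-side of the two patterns, as a minimal sketch. The square brackets in the output are only there to make the leading space visible, and the variable name `original` is invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

string value = "ThisStringHasNoSpacesButItDoesHaveCapitals";

// The pattern from the question puts a space before every capital, including the first.
string original = Regex.Replace(value, "[A-Z]", " $0");
Console.WriteLine("[" + original + "]");
// prints "[ This String Has No Spaces But It Does Have Capitals]"

// Requiring a preceding lower-case letter leaves the first character alone.
string newValue = Regex.Replace(value, "([a-z])([A-Z])", "$1 $2");
Console.WriteLine("[" + newValue + "]");
// prints "[This String Has No Spaces But It Does Have Capitals]"
```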

Edit 1:

If you use @"(\p{Ll})(\p{Lu})" instead, it will pick up accented characters as well, since \p{Ll} and \p{Lu} match any Unicode lower-case and upper-case letter.
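As a small illustration of the difference, the sketch below runs both patterns over a made-up input containing accented capitals. The input string and variable names are assumptions for the example, and the accented letters are written as \u escapes so the snippet doesn't depend on the source file's encoding.

```csharp
using System;
using System.Text.RegularExpressions;

// "ÉcoleÉlémentaire", with É = \u00C9 and é = \u00E9.
string accented = "\u00C9cole\u00C9l\u00E9mentaire";

// [A-Z] does not include É, so the ASCII-only pattern finds nothing to split.
string asciiOnly = Regex.Replace(accented, "([a-z])([A-Z])", "$1 $2");

// \p{Ll} / \p{Lu} are Unicode categories, so "e" followed by "É" matches.
string unicodeAware = Regex.Replace(accented, @"(\p{Ll})(\p{Lu})", "$1 $2");

Console.WriteLine(asciiOnly == accented);   // prints "True" (unchanged)
Console.WriteLine(unicodeAware.Contains(" ")); // prints "True" (a space was inserted)
// unicodeAware is "École Élémentaire"
```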

Edit 2:

If your strings can contain acronyms you may want to use this:

newValue = Regex.Replace(value, @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))", " $0");

So "DriveIsSCSICompatible" becomes "Drive Is SCSI Compatible".
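A few more inputs run through the same pattern, as a sketch. The helper name `SplitWords` and the extra test strings are made up for this example. The first alternative matches a capital that follows a lower-case letter; the second matches a capital followed by a lower-case letter anywhere except the very start of the string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the pattern from Edit 2.
string SplitWords(string input)
{
    return Regex.Replace(input, @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))", " $0");
}

Console.WriteLine(SplitWords("DriveIsSCSICompatible")); // prints "Drive Is SCSI Compatible"
Console.WriteLine(SplitWords("HTMLParser"));            // prints "HTML Parser"
Console.WriteLine(SplitWords("UseHTTP"));               // prints "Use HTTP"
Console.WriteLine(SplitWords("ThisString"));            // prints "This String"
```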


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow