Tuesday 24 June 2014

Regular expressions in C#

Recently I had an excuse to learn and use regular expressions in C#. While I found them interesting at first for the sheer challenge of learning them, I am uncertain whether they should be used heavily in everyday development code.

To put it simply, they work and they are wonderful, but code readability seems to suffer immensely. Let's look at an example.

We have a string variable called 's' that contains "The quick brown fox jumped over the lazy dog." and we want a word count of it.
If we think about it, we can immediately see that the words are separated by spaces. So all we do is split the string using the space as the delimiter and count the array size, right?

//s is the string.
var stringarray = s.Split(' ');
//Arrays expose Length; Count is only available as the LINQ method Count().
int count = stringarray.Length;


So in two lines of code we have our word count, and the code is reasonably readable. Given the names used for the string and the methods, it is hard to imagine any programmer worth their salt having difficulty reading that code.
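
One caveat with splitting on a single space is that consecutive spaces produce empty entries, which inflate the count. The following is a minimal sketch of one way around that, not the only one. The class name SplitWordCount and the method CountWords are hypothetical names used purely for illustration.

```csharp
using System;

public static class SplitWordCount
{
    // Hypothetical helper: counts space-separated words and skips the
    // empty entries that Split produces for consecutive spaces.
    public static int CountWords(string s)
    {
        var words = s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return words.Length;
    }
}
```

Calling SplitWordCount.CountWords(s) on the example sentence gives 9, and it still gives 9 if extra spaces creep in between the words.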

But what if we used regular expressions? Before I use a regular expression in the next block of code, please be aware that there will in all likelihood be better expressions for this, and my implementation reflects a beginner's understanding of C# regular expressions.


//s is the string.
//Requires: using System.Text.RegularExpressions;
Regex r = new Regex(@"\w+");
int count = r.Matches(s).Count;

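This regular expression works too. The \w matches a single word character: a letter, a digit or an underscore, so roughly [A-Za-z0-9_], and by default .NET also includes Unicode letters and digits. The + makes it match one or more of those characters in a row, so each match is one whole word.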
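
The two approaches do not always agree, because \w+ treats anything that is not a word character as a boundary, not just spaces. Here is a small sketch of the regular expression version as a reusable method. The names RegexWordCount and CountWords are hypothetical, and the pattern is the same beginner-level one used above.

```csharp
using System.Text.RegularExpressions;

public static class RegexWordCount
{
    // Hypothetical helper: counts runs of word characters
    // (letters, digits, underscore) in the input.
    public static int CountWords(string s)
    {
        return Regex.Matches(s, @"\w+").Count;
    }
}
```

For the example sentence this returns 9, the same as the Split version. For "don't stop" it returns 3, since the apostrophe is not a word character and "don" and "t" are matched separately, whereas splitting on spaces gives 2.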

Both of these methods work, but which one is more readable? That is the question. It is clear that the first version with "Split" is more readable; however, regular expressions are more powerful. On MSDN and other sources you can find regular expressions for many different operations, including ones that rely on backtracking, and they are quite useful for analysing text files. C# provides a multitude of ways to perform an operation and its extensive .NET capability is amazing. However, I would recommend that rather than using regular expressions to impress, you only use them when they are absolutely necessary.
