Regular Expressions!

What the ^.{4}$.

Introduction

The following pages are intended to give you a solid foundation in how to write regular expressions (also referred to as regex or REs). A regular expression is a means of describing a particular pattern of characters in text.

That's a bit abstract, so let's try to put it into perspective. With regular expressions you can:

  • Search for particular items within a large body of text, e.g. you may wish to identify all email addresses in some content using a text editor.
  • Replace particular items, e.g. you may wish to clean up some poorly formatted HTML by replacing all uppercase tags with lowercase equivalents in a text editor.
  • Validate input, e.g. you may want to check that a password meets certain criteria, such as a mix of uppercase and lowercase letters, digits and punctuation, in a program you are writing.
  • Coordinate actions, e.g. you may wish to process certain files in a directory, but only if they meet particular conditions, in work you are doing on the command line.
  • Reformat text, e.g. you may export data from one program as a text file, then use a text editor to modify its layout so you can import it into another program.
  • and more...

with little effort.

Let's look at a very simple example. The following regular expression:

b[ia]

Will match every instance of the character b followed by either the character i or the character a. So if we ran that regular expression over the following text, it would match as follows:

b[ia]
The bat took a bite out of the big boring apple.

Here it matches the ba in bat and the bi in both bite and big, but nothing in boring. That's not very useful, or exciting, but as we delve further into regular expressions the examples will start to become more practical and powerful.

From here onwards I will illustrate regular expressions as above. The blue text is the regular expression and the text below it is what we are testing it on. Anything highlighted in blue is text which the regular expression has matched.
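If you would rather check the example in code than in an editor, here is a minimal sketch in Python. Python is my choice for illustration only (the tutorial itself isn't tied to any language), and the variable names are made up for this example.

```python
import re

# The sample sentence and pattern from the example above.
text = "The bat took a bite out of the big boring apple."
pattern = r"b[ia]"  # a b followed by either an i or an a

# findall returns every non-overlapping match, left to right.
matches = re.findall(pattern, text)
print(matches)  # ['ba', 'bi', 'bi']
```

Notice that boring contributes nothing, because its b is followed by an o.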

Outline

This Regular Expressions tutorial is divided into 3 main sections, followed by some examples and a cheat sheet. In general I recommend you work through them in order, but if you've come here just to learn about a specific topic then who am I to slow you down? Just head straight on over.

Keep reading below to get started with regular expressions or skip to one of the following sections.

  1. Regular expressions basics - the basic building blocks of regular expressions.
  2. Regular expressions intermediate - slightly harder regular expression features.
  3. Advanced regular expressions - features which are a little harder to get your head around.
  4. Examples - Some examples to give you a taste of what Regular Expressions can do.
  5. Cheat Sheet - Because remembering all those different meta characters can be difficult.

So Where and How do I use Regular Expressions?

Regular expressions are a feature of many pieces of software and nearly all programming languages. You are probably using tools right now which can take advantage of regular expressions, and once you master regular expressions you'll be able to do even more with those tools.

Regular expressions aren't a specific feature, so you won't find an entry in a menu which says 'Regular Expression'. Instead they are typically used where you may provide input. A good example of this is the search function in text editors.

For the purposes of experimenting with regular expressions, the search function in a text editor is a good place to practice. A text editor which is available for Windows, Linux and OS X is:

Komodo Edit

It's quite a nice editor which also does syntax highlighting and has a built-in FTP client.

Note: by default its search functionality is basic, but you can enable regular expressions. To do this, go into:

Preferences/ Settings -> Find -> Incremental Search -> Uses

And change the value from Plain text to Regular Expressions.

This is not the only editor, so if you prefer another one, check whether it supports regular expressions. There is a good chance that it does.

Learning Regular Expressions

When you first start learning and playing about with regular expressions you will regularly create expressions which do not work properly. Either they will match many things they shouldn't, or they will match nothing. When this happens don't worry. It's part of learning and it's generally easy to get yourself out of trouble. I find the following approach to be effective for fixing oddly behaving regular expressions.

  • Break the regular expression down into its individual components (so for instance, the regular expression example above would become b and [ia]).
  • Say the steps of the expression out loud. So for the above expression I might say: "First it matches a b, followed by either an i or an a" (This step may sound silly, but trust me, it works. You use different parts of your brain when you speak as opposed to when you think internally.)
  • Build the regular expression incrementally, testing as you go.

If you take this approach then you can easily narrow down exactly where the regular expression is going wrong. You may also find our Problem Solving Tutorial to be worth a quick read.
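The incremental approach can also be sketched in code. This is only an illustration in Python, reusing the b[ia] example from earlier; the names steps and results are my own, not part of any tool.

```python
import re

text = "The bat took a bite out of the big boring apple."

# Build the expression one component at a time: first b, then b[ia].
steps = [r"b", r"b[ia]"]

# Record what each stage matches so a surprising result is easy to spot.
results = {pattern: re.findall(pattern, text) for pattern in steps}

for pattern, found in results.items():
    print(pattern, "->", found)
# b -> ['b', 'b', 'b', 'b']
# b[ia] -> ['ba', 'bi', 'bi']
```

If a stage suddenly matches far more or far less than the one before it, the component you just added is the likely culprit.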

The right test data

Another avenue of attack is creating some test data, then tweaking it to better understand what your regular expression really is matching and not matching.

Creating good test data is really important in making sure your regular expressions behave the way they should (especially once you start diving into more complex expressions). Your regular expression may find some of the correct matches but not all of them. Alternatively, it may match every correct bit of text but also some that it shouldn't. With practice you'll get better at creating test data to fully validate your expressions.

As an example, the following expression matches a b followed by any character. If my task was to match a b followed by either an i or an a (as above), I might test it with the following data and come to the incorrect conclusion that it works.

b.
The bat took a bite out of the big apple.

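The thing is, I gave it data which obviously should match, but I didn't give it data which is close but not quite right. When you create your test data, make sure you also include those edge cases. In the version below, the added word boring exposes the problem: b. matches its bo, which it shouldn't.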

b.
The bat took a bite out of the big boring apple.
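To see the difference the extra word makes, here is a small Python sketch comparing the two patterns on both sets of test data. The variable names (too_loose, intended and so on) are just labels I've picked for this illustration.

```python
import re

too_loose = r"b."    # b followed by any character
intended = r"b[ia]"  # b followed by i or a

easy_data = "The bat took a bite out of the big apple."
edge_data = "The bat took a bite out of the big boring apple."

# On the easy data both patterns agree, so the mistake stays hidden.
same_on_easy = re.findall(too_loose, easy_data) == re.findall(intended, easy_data)

# On the edge-case data the loose pattern picks up an extra match.
extra = [m for m in re.findall(too_loose, edge_data)
         if m not in re.findall(intended, edge_data)]

print(same_on_easy)  # True
print(extra)         # ['bo']
```

Only the data containing a near miss tells the two expressions apart.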

Some of this may not entirely make sense right now, but it will as you start to dabble with regular expressions. Keep it in mind as you progress through the next few pages.