Learning RegEx Basics in Ruby

Written by andresporras | Published 2020/04/22
Tech Story Tags: ruby | regex | beginners | online-courses | learning | tutorial | ruby-on-rails | coding

TLDR In Ruby, regular expressions are defined between two slashes. You can use square brackets to check if a text contains a character from a particular group of characters, with ranges such as [0-9], [a-z] or [A-Z]. Ruby also provides a shorthand syntax for the ranges we need most often, such as \d for digits and \s for tabs, white spaces, or newlines. Use ^ and $ to match at the start and the end of the text, and + to check for one or more characters.

Finding patterns in strings is a common problem for developers. One of the main scenarios where you might need this is form validation: does the email have the right structure? Does the user's first name contain any invalid characters? That's where regular expressions come in.

Let's Begin

In Ruby, regular expressions are defined between two slashes. Let me show you a simple example of a regex in Ruby:
def find_match(text)
  return true if text=~/hello/

  false
end

puts find_match("hello world") #true
puts find_match("nice to see you") #false
The find_match function only checks whether the text contains "hello".
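
The check works inside an if because =~ returns the index where the match starts, or nil when there is no match. As a minimal sketch, the same test can be written with String#match? (available since Ruby 2.4), which returns a boolean directly. The method name contains_hello? is only illustrative and is not part of the original example.

```ruby
# =~ returns the index of the first match, or nil if nothing matches.
p("hello world" =~ /hello/)      # 0
p("oh, hello" =~ /hello/)        # 4
p("nice to see you" =~ /hello/)  # nil

# Hypothetical helper: String#match? returns true or false directly.
def contains_hello?(text)
  text.match?(/hello/)
end

puts contains_hello?("hello world")     # true
puts contains_hello?("nice to see you") # false
```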

Match from at least one of the options

If you want to check if a text contains a character from a particular group of characters, you can use square brackets to accomplish this.
def match_range(text)
  return true if text=~/[0123456789]/

  false
end 

puts match_range("number 1") #true
puts match_range("number one") #false
While /abc/ means "a followed by b followed by c", /[abc]/ means "a or b or c". Instead of /[0123456789]/ you can use /[0-9]/, which produces the same result. Another common range is /[a-z]/, which matches any lowercase letter of the English alphabet.
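
Several ranges can share one pair of square brackets. The sketch below assumes you want to accept both lowercase and uppercase letters; the helper name contains_letter? is made up for illustration.

```ruby
# Hypothetical helper: true if the text contains at least one letter,
# lowercase or uppercase. Both ranges live inside the same brackets.
def contains_letter?(text)
  text.match?(/[a-zA-Z]/)
end

puts contains_letter?("1234 X") # true
puts contains_letter?("1234")   # false
```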

Shorthand for ranges

Ruby provides a shorthand syntax for some of the ranges we frequently need. Let's see the most common:
  • \d for [0-9]
  • \s for tabs, white spaces, or newlines
  • \w for [0-9], [a-z], [A-Z] or the underscore
  • \D everything except [0-9]
  • \S everything except tabs, white spaces or newlines
  • \W for everything except characters from [0-9], [a-z], [A-Z] and the underscore

    For example, \w checks whether the text contains at least one word character:
    def word_match(text)
      return true if text=~/\w/
    
      false
    end 
    
    puts word_match("!&$%") #false
    puts word_match("a!#$%") #true
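
To see several shorthands side by side, here is a small sketch that reports which of them match a given text. The helper shorthand_report is hypothetical and exists only for this illustration.

```ruby
# Hypothetical helper: reports which shorthand classes appear in the text.
def shorthand_report(text)
  {
    digit: text.match?(/\d/),     # any of 0-9
    space: text.match?(/\s/),     # tab, white space, or newline
    word: text.match?(/\w/),      # letter, digit, or underscore
    non_word: text.match?(/\W/)   # anything that \w does not match
  }
end

p shorthand_report("a 1")  # all four entries are true
p shorthand_report("_")    # only the word entry is true
```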

    Checking the first and last character of the string

    So far we have looked for a match anywhere in the string, but if we want to check just the beginning, the end, or the whole string, we need to add something else. Let's see an example:
    def begin_end_match(text)
      return true if text=~/^\w/ || text=~/\w$/
    
      false
    end 
    
    puts begin_end_match("acd#$%") #true
    puts begin_end_match("#$%acd") #true
    puts begin_end_match("#$%acd#$%") #false
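    Use ^ at the beginning of the regex to match at the first character of the string, and $ at the end of the regex to match at the last character. Strictly speaking, in Ruby ^ and $ anchor to the start and end of each line, so for text that may contain newlines you can use \A and \z, which anchor to the start and end of the whole string.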
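
For single-line input the two kinds of anchors behave the same, but with multi-line text the difference shows up. The sketch below compares ^ with \A; both helper names are made up for this illustration.

```ruby
# Hypothetical helpers comparing a line anchor with a string anchor.
def line_starts_with_word?(text)
  text.match?(/^\w/)    # ^ matches at the start of any line
end

def string_starts_with_word?(text)
  text.match?(/\A\w/)   # \A matches only at the start of the whole string
end

multiline = "#!\nabc"
puts line_starts_with_word?(multiline)   # true, the second line starts with "a"
puts string_starts_with_word?(multiline) # false, the string starts with "#"
```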

    Using + for 1 or more matches

    What if we want the pattern to match the whole text? Using ^ and $ alone isn't enough, since that only works if the text has just one character. If you want to check for one or more characters, you need to use +.
    def whole_match(text)
      return true if text=~/^\d+$/
    
      false
    end 
    
    puts whole_match("1") #true
    puts whole_match("123") #true
    puts whole_match("1s3") #false
    In this example, the match succeeds as long as all the characters are digits, no matter how many digits there are.
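
To make the role of + visible, this sketch puts the same pattern with and without it next to each other. The helper names single_digit? and all_digits? are illustrative only.

```ruby
# Hypothetical helpers: the same anchored pattern without and with +.
def single_digit?(text)
  text.match?(/^\d$/)    # exactly one digit between start and end
end

def all_digits?(text)
  text.match?(/^\d+$/)   # one or more digits between start and end
end

puts single_digit?("7")  # true
puts single_digit?("77") # false
puts all_digits?("77")   # true
puts all_digits?("")     # false, + needs at least one digit
```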

    Using ? for 0 or 1 match

    By using "?" we can specify that a pattern may appear at most once. Let's see an example that returns true if the given text has at most one digit at the beginning, followed by at least one non-digit character and nothing else.
    def zero_or_one_match(text)
      return true if text=~/^\d?\D+$/
    
      false
    end 
    
    puts zero_or_one_match("1qwe") #true
    puts zero_or_one_match("asd") #true
    puts zero_or_one_match("12swdwe") #false
    Notice that to combine different patterns you only need to concatenate them in the same expression. First we have "^" to anchor the pattern at the start of the string, then "\d?" to ask for one digit or no digit, followed by "\D+", which looks for at least one non-digit character. Finally, the pattern finishes with "$", which indicates that the string must end right after the pattern described.
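
Another common use of "?" is an optional sign in front of a number. This is a sketch under that assumption; signed_integer? is a hypothetical name.

```ruby
# Hypothetical helper: an optional minus sign followed by one or more digits.
def signed_integer?(text)
  text.match?(/^-?\d+$/)
end

puts signed_integer?("42")   # true
puts signed_integer?("-42")  # true
puts signed_integer?("--42") # false, ? allows at most one minus sign
```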

    Using {}

    A frequent use of regex is validating a phone number, for example checking that its length is between 7 and 10 digits. To solve this scenario you can use the following match.
    def phone_match(phone)
      return true if phone=~/^\d{7,10}$/
    
      false
    end 
    
    puts phone_match("1234567") #true
    puts phone_match("123456") #false
    puts phone_match("3117654321") #true
    You can also use the curly brackets to ask for a specific length. In the previous example, if you use {7} instead of {7,10} then only the first test will return true.
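
The two variants of the curly brackets can be sketched as follows: {7} for an exact length and {7,} for a minimum with no upper limit. The helper names are made up for this example.

```ruby
# Hypothetical helpers showing two other forms of the {} quantifier.
def exactly_seven_digits?(text)
  text.match?(/^\d{7}$/)    # exactly 7 digits
end

def at_least_seven_digits?(text)
  text.match?(/^\d{7,}$/)   # 7 or more digits, no upper limit
end

puts exactly_seven_digits?("1234567")     # true
puts exactly_seven_digits?("3117654321")  # false
puts at_least_seven_digits?("3117654321") # true
puts at_least_seven_digits?("123456")     # false
```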

    Using () to capture groups

    Now, let's think of a compound pattern that combines different rules and must be accepted several times in a row. An example of this can be found in the pattern of an IP address.
    def check_if_ip(ip_number)
      return true if ip_number=~/^(\d{1,3}[.]){3}\d{1,3}$/
    
      false
    end 
    
    puts check_if_ip("255.255.255.0") #true
    puts check_if_ip("128.11.1.0") #true
    puts check_if_ip("191.0.0") #false
    puts check_if_ip("191.0.0.0.") #false
    Here the parentheses group one to three digits followed by a dot, and {3} repeats that group three times. Then comes a final run of one to three digits with no dot.
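
Parentheses also capture the text they match, which String#match exposes through MatchData#captures. The pattern above accepts values such as "999.999.999.999", so as a sketch (not the original author's method) the captured parts can be converted to integers and range-checked. The names ip_octets and valid_ip? are hypothetical.

```ruby
# Hypothetical helper: returns the four parts of the address as integers,
# or nil when the text does not have the expected shape.
def ip_octets(ip_number)
  match = ip_number.match(/^(\d{1,3})[.](\d{1,3})[.](\d{1,3})[.](\d{1,3})$/)
  return nil unless match

  match.captures.map(&:to_i)   # captures holds one string per group
end

# Hypothetical helper: the shape must match and every part must be <= 255.
def valid_ip?(ip_number)
  octets = ip_octets(ip_number)
  !octets.nil? && octets.all? { |octet| octet <= 255 }
end

p ip_octets("128.11.1.0")    # [128, 11, 1, 0]
p ip_octets("191.0.0")       # nil
puts valid_ip?("255.255.255.0") # true
puts valid_ip?("999.1.1.1")     # false
```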

    Using * for zero or more

    Suppose you want to allow a specific pattern in some part of the string. You can use "*" to say that the pattern is optional and repeatable: it can appear once, many times, or not at all. For example, for an integer that uses a dot as the thousands separator, you want to allow a dot followed by three digits to repeat at the end of the string. A bare dot in a regex matches any character, so the dot goes inside square brackets to match it literally.
    def check_if_integer(number)
      return true if number=~/^\d{1,3}([.]\d{3})*$/
    
      false
    end 
    
    puts check_if_integer("23") #true
    puts check_if_integer("1.234") #true
    puts check_if_integer("5.734.123") #true
    puts check_if_integer("24.18") #false
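
The same "group followed by *" idea applies to other separators. As a sketch, the hypothetical helper below accepts words joined by single hyphens, such as the tag ruby-on-rails.

```ruby
# Hypothetical helper: one word, then zero or more "-word" groups.
def hyphenated_words?(text)
  text.match?(/^\w+(-\w+)*$/)
end

puts hyphenated_words?("ruby")          # true, the group appears zero times
puts hyphenated_words?("ruby-on-rails") # true, the group appears twice
puts hyphenated_words?("ruby-")         # false, a hyphen needs a word after it
```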

    Accepting any character

    Now let's suppose you want to accept any character in some part of your string. For example, for a very simple URL validation, you check that it always begins with "www." and ends with ".com". You can do it this way:
    def url_regex(url)
      return true if url=~/^www[.][\d\D]+[.]com$/
    
      false
    end 
    
    puts url_regex("google.com") #false
    puts url_regex("www.google") #false
    puts url_regex("www.google.com") #true
    With "[\d\D]" (which means either a digit or a non-digit) you can accept any character; you get the same result with "[\w\W]". Note that "www" and "com" are written outside the square brackets, because "[www.]" would match just one character (a "w" or a dot) rather than the whole prefix.
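
A single dot is the usual way to accept any character, with one caveat: by default it does not match a newline, while "[\d\D]" does. The sketch below shows the difference; both helper names are illustrative.

```ruby
# Hypothetical helpers: "a", then any single character, then "c".
def dot_match?(text)
  text.match?(/^a.c$/)        # . matches any character except a newline
end

def class_match?(text)
  text.match?(/^a[\d\D]c$/)   # [\d\D] matches any character, newline included
end

puts dot_match?("a-c")     # true
puts dot_match?("a\nc")    # false
puts class_match?("a\nc")  # true
```

In Ruby, adding the m flag, as in /a.c/m, makes the dot match newlines as well.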

    Escaping special characters in Ruby Regex

    What if I want to use a special character in my regex? For example, I want to validate that the string begins with a slash and also contains the character "[" somewhere in the text.
    def escaping_regex(text)
      return true if text =~ %r{^/[\w\W]*\[[\w\W]*$}
    
      false
    end 
    
    puts escaping_regex('/[') #true
    puts escaping_regex("qwer") #false
    puts escaping_regex('/exit[') #true
    puts escaping_regex('exit/[') #false
    To escape a special character in a regex, put a backslash "\" in front of it. For instance, you can use "\[" to check whether the "[" character is in your string. Also, in the example I replaced the initial and final slashes with %r{}. Both forms, "/.../" and "%r{...}", are equivalent, but the second one lets me use slashes in the regex without escaping them.
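
When the text to search for comes from a variable, Ruby's Regexp.escape adds the backslashes for you. A minimal sketch, assuming you want to find a piece of text exactly as written; contains_literal? is a hypothetical helper.

```ruby
# Hypothetical helper: true if text contains literal exactly as written.
# Regexp.escape puts a backslash in front of every special character.
def contains_literal?(text, literal)
  text.match?(/#{Regexp.escape(literal)}/)
end

puts Regexp.escape("a+b")               # a\+b
puts contains_literal?("a+b=c", "a+b")  # true
puts "a+b=c".match?(/a+b/)              # false, + is read as "one or more a"
```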

    In conclusion

    Regex can be confusing at first. It is not exactly one of the first two or three topics you learn in an introduction to programming course, and there are a lot of concepts to learn. But with the basic notions explained here, you can write basic validations (and even some complex ones). To practice more regex, try to create a chatbot: you will need many of these concepts and plenty of others.

Published by HackerNoon on 2020/04/22