Company Email Address

Data Type Requirements

  1. Search case-insensitively for company email addresses with the format <mailbox>@example.com.
  2. A mailbox name is between 2 and 64 characters long and consists of ASCII letters and/or digits.
  3. A valid email address must start with an ASCII letter; the rest of the mailbox name may contain any combination of letters and digits.
  4. Email addresses should be bounded only by non-alphanumeric characters.

Company email address example requirements

Go through the detailed steps to build out the custom GLASS data type (recommended), or jump straight to the recommended GLASS solution.

Part 1 - Email Domain

In Part 1, we use the WORD operator to build the expression that matches the domain (@example.com) in a company email address. The NOCASE option is set because Requirement #1 states that matches are expected to be case-insensitive.

WORD NOCASE '@example.com'

See WORD for more information.

Part 2 - Mailbox Name

In Part 2, we use the RANGE operator to define the mailbox name, which is between 2 and 64 characters long (TIMES 2-64). As the mailbox name can only contain ASCII letters and/or digits, the preset keyword ALNUM is used.

RANGE ALNUM TIMES 2-64

However, because Requirement #3 states that valid email addresses must start with an ASCII letter, the GLASS expression is revised to first look for a single ASCII letter, followed by 1 to 63 ASCII letters and/or digits. The total length therefore stays between 2 and 64 characters.

(RANGE LETTER) THEN (RANGE ALNUM TIMES 1-63)

See RANGE, Preset Keywords, THEN and OR, and Grouping Operator for more information.
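
For readers more familiar with regular expressions, the mailbox rule can be approximated in Python. This is only a rough analogue and not GLASS itself: the names MAILBOX and is_valid_mailbox are hypothetical, and the sketch assumes that RANGE LETTER covers both upper-case and lower-case ASCII letters.

```python
import re

# Hypothetical regex analogue of: (RANGE LETTER) THEN (RANGE ALNUM TIMES 1-63)
# One ASCII letter, then 1 to 63 ASCII letters or digits (2 to 64 in total).
MAILBOX = re.compile(r"[A-Za-z][A-Za-z0-9]{1,63}")


def is_valid_mailbox(name: str) -> bool:
    """Return True if the whole string satisfies Requirements #2 and #3."""
    return MAILBOX.fullmatch(name) is not None


print(is_valid_mailbox("employee1"))  # True
print(is_valid_mailbox("1employee"))  # False: starts with a digit
print(is_valid_mailbox("a"))          # False: shorter than 2 characters
```

Splitting the pattern into "one letter" plus "1 to 63 alphanumerics" mirrors the revised GLASS expression and keeps the overall length limit unchanged.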

Part 3 - Joining the Mailbox Name and Email Domain

In Part 3, we join the expressions in Part 2 and Part 1 to define a pattern that searches for the mailbox name, followed by the domain name in a company email address.

(RANGE LETTER) THEN (RANGE ALNUM TIMES 1-63) THEN \
WORD NOCASE '@example.com'

Part 4 - Boundary Rules

In Part 4, we address Requirement #4 by adding boundary rules for the whole expression. Using the BOUND operator, we can reduce the number of potential false positive matches by only reporting email address matches if they are surrounded by non-alphanumeric (NONALNUM) characters.

( \
  (RANGE LETTER) THEN (RANGE ALNUM TIMES 1-63) THEN \
  WORD NOCASE '@example.com' \
) BOUND NONALNUM

The outermost parentheses () on Line 1 and Line 4 are added so that the BOUND rule is applied to all expressions within the parentheses (Line 2 and Line 3).

Without the outermost parentheses, the BOUND operator would only apply to the WORD expression that immediately precedes it.

See WORD, RANGE, Preset Keywords, THEN and OR, BOUND, and Grouping Operator for more information.
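
The effect of the boundary rule can be illustrated with a regular-expression analogue in Python. This is a sketch under assumptions, not the GLASS engine: the names UNBOUNDED and BOUNDED are hypothetical, lookaround assertions stand in for BOUND NONALNUM, and the start and end of the input are assumed to count as valid boundaries.

```python
import re

MAILBOX = r"[A-Za-z][A-Za-z0-9]{1,63}"
DOMAIN = r"@example\.com"

# re.ASCII keeps the case-insensitive letter classes limited to ASCII.
FLAGS = re.IGNORECASE | re.ASCII

# Analogue of the Part 3 expression: no boundary rules.
UNBOUNDED = re.compile(MAILBOX + DOMAIN, FLAGS)

# Analogue of the Part 4 expression: the match must not be preceded or
# followed by an ASCII letter or digit (assumed meaning of BOUND NONALNUM).
BOUNDED = re.compile(
    r"(?<![A-Za-z0-9])" + MAILBOX + DOMAIN + r"(?![A-Za-z0-9])",
    FLAGS,
)

text = "id=1ab@example.com"
print(UNBOUNDED.findall(text))  # ['ab@example.com'], a false positive
print(BOUNDED.findall(text))    # []
```

Without the boundary check, the tail of an invalid address such as 1ab@example.com is reported as a match; with it, the leading digit disqualifies the whole candidate.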

Match Samples

Employee1,employee1@example.com,Marketing
Customer Support: support@example.com
123@employee.com

Based on the GLASS expression in Part 4 - Boundary Rules, the email addresses employee1@example.com in line 1 and support@example.com in line 2 will be returned as match locations by the GLASS pattern matching engine.

The email address in line 3 will not be marked as a match because the mailbox name of a valid email address cannot start with an ASCII digit (1). Its domain (employee.com) also differs from the required example.com.
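
The match samples can be checked against a Python regular expression that approximates the Part 4 expression. This is a hedged sketch rather than the GLASS engine: PATTERN, samples and results are hypothetical names, and the lookarounds assume that BOUND NONALNUM also accepts the start and end of a line as boundaries.

```python
import re

# Approximate regex analogue of the bounded GLASS expression from Part 4.
PATTERN = re.compile(
    r"(?<![A-Za-z0-9])[A-Za-z][A-Za-z0-9]{1,63}@example\.com(?![A-Za-z0-9])",
    re.IGNORECASE | re.ASCII,
)

samples = [
    "Employee1,employee1@example.com,Marketing",
    "Customer Support: support@example.com",
    "123@employee.com",
]

# Collect the matches found on each sample line.
results = [PATTERN.findall(line) for line in samples]

for number, matches in enumerate(results, start=1):
    print(number, matches)
# 1 ['employee1@example.com']
# 2 ['support@example.com']
# 3 []
```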