By default, matches are not limited by traditional word boundaries (e.g. whitespace, newlines), and the GLASS engine matches data patterns anywhere in a string of data.
For example, if the search pattern is 123, all the following lines will be returned as match locations:
1 | 1123581321 |
2 | Phone No.: +65 6123 4567 |
3 | Amount Paid: $ 3,123 |
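To make the default behaviour concrete, the Python sketch below runs a plain substring search with no boundary rules over the three example lines. It only illustrates "match anywhere" semantics and is not the GLASS engine itself; the function name unbounded_matches is hypothetical.

```python
def unbounded_matches(text: str, pattern: str) -> list[int]:
    """Return the start offset of every occurrence of pattern in text (no boundaries)."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        # Resume one character later so that overlapping occurrences are also found.
        start = text.find(pattern, start + 1)
    return hits


lines = [
    "1123581321",
    "Phone No.: +65 6123 4567",
    "Amount Paid: $ 3,123",
]

for number, line in enumerate(lines, start=1):
    print(number, unbounded_matches(line, "123"))
# prints: 1 [1], then 2 [16], then 3 [17]
```

Every line yields at least one offset, which is why all three are reported when no BOUND rule is applied.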
To reduce false positives and tighten the pattern matching rules, you can use the BOUND operator to define pattern boundaries for an expression (or group of expressions).
Pattern boundaries let you define the content that must be found before (BOUND LEFT), after (BOUND RIGHT), or surrounding (BOUND) a search pattern (WORD, RANGE, or GROUP) for it to be reported as a match.
The generic GLASS syntax for BOUND is:
<search pattern> BOUND [LEFT|RIGHT] <boundary pattern>
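The following Python sketch models the BOUND syntax above under stated assumptions: a boundary rule is treated as a predicate on the single character immediately before (LEFT) or after (RIGHT) the match, and the start or end of the text is assumed to satisfy any rule (the examples in this section still report an ID that ends a line). bounded_matches and not_digit are hypothetical names; this is not the GLASS implementation.

```python
from typing import Callable, Optional

# Hypothetical model of a boundary rule: a predicate over one neighbouring character.
Boundary = Callable[[str], bool]


def bounded_matches(text: str, word: str,
                    left: Optional[Boundary] = None,
                    right: Optional[Boundary] = None) -> list[int]:
    """Offsets of word in text whose neighbours satisfy the boundary rules.

    left=None or right=None means no rule on that side. Assumption: the start
    or end of the text satisfies any rule.
    """
    hits = []
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        ok_left = left is None or start == 0 or left(text[start - 1])
        ok_right = right is None or end == len(text) or right(text[end])
        if ok_left and ok_right:
            hits.append(start)
        start = text.find(word, start + 1)
    return hits


def not_digit(c: str) -> bool:
    """True for any character that is not an ASCII numeral."""
    return c not in "0123456789"


print(bounded_matches("3,123", "123", left=not_digit))            # like BOUND LEFT, prints [2]
print(bounded_matches("6123 4567", "123", right=not_digit))       # like BOUND RIGHT, prints [1]
print(bounded_matches("6123 4567", "123", not_digit, not_digit))  # like BOUND, prints []
```

Passing the same predicate for both sides corresponds to the unqualified BOUND form.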
You can define preset or custom character ranges as boundary rules for a base pattern.
The GLASS reference language supports a list of predefined keywords that represent commonly used literal ranges. These preset keywords may be used in place of the literal ranges that they represent when defining a BOUND rule.
For example, the GLASS preset keyword NONDIGIT can be used in a BOUND rule to represent any character that is not an ASCII numeral (0-9).
<search pattern> BOUND [LEFT|RIGHT] NONDIGIT
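A sketch of a NONDIGIT boundary applied to the earlier 123 example, using Python regular-expression lookarounds. It assumes NONDIGIT means any character other than the ASCII numerals 0-9 and that the start or end of the text also satisfies the boundary; the helper bound() is hypothetical. Under these assumptions only the "3,123" occurrence survives, because the other two occurrences are preceded by a digit.

```python
import re

# Assumption: NONDIGIT stands for any character that is not an ASCII numeral.
NONDIGIT = "[^0-9]"


def bound(word: str, cls: str, left: bool = True, right: bool = True) -> re.Pattern:
    """Compile word with the boundary character class cls on the chosen side(s).

    Assumption: the start (\\A) or end (\\Z) of the text also satisfies the boundary.
    """
    before = rf"(?:\A|(?<={cls}))" if left else ""
    after = rf"(?:\Z|(?={cls}))" if right else ""
    return re.compile(before + re.escape(word) + after)


lines = ["1123581321", "Phone No.: +65 6123 4567", "Amount Paid: $ 3,123"]
rule = bound("123", NONDIGIT)  # roughly: 123 BOUND NONDIGIT
print([n for n, line in enumerate(lines, start=1) if rule.search(line)])  # prints [3]
```

Calling bound() with right=False or left=False approximates BOUND LEFT or BOUND RIGHT respectively.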
ACME Corporation wants to remove all data for a specific customer from its storage systems. To do so, ACME Corporation defines a basic data type to search for the customer's unique regional ID number, SG0000137492. The BOUND pattern rule is specified so that matches are only reported if non-alphanumeric characters (e.g. colon ':', comma ',', whitespace ' ', etc.) are detected on either side of the customer ID.
WORD 'SG0000137492' BOUND NONALNUM
Based on the WORD expression above, the GLASS pattern matching engine will return Line 1 and Line 2 as match locations:
1 | Customer ID: SG0000137492 |
2 | John Doe|SG0000137492|+65 9876 5432|john.doe@example.com |
3 | AB1234SG0000137492DE5678 |
However, Line 3 will not be returned as a match location. The data pattern that resembles a customer ID number fails the BOUND pattern rule because it is surrounded by alphanumeric characters ('4' and 'D') on both sides.
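The ACME example can be reproduced with a Python regular expression, assuming NONALNUM means any character other than an ASCII letter or digit and that the start or end of a line counts as a boundary (Line 1 ends immediately after the ID). This is an illustrative sketch, not GLASS engine behaviour.

```python
import re

# Assumption: NONALNUM stands for any character that is not an ASCII letter or digit,
# and the start (\A) or end (\Z) of a line also counts as a boundary.
NONALNUM = "[^A-Za-z0-9]"
rule = re.compile(rf"(?:\A|(?<={NONALNUM}))SG0000137492(?:\Z|(?={NONALNUM}))")

lines = [
    "Customer ID: SG0000137492",
    "John Doe|SG0000137492|+65 9876 5432|john.doe@example.com",
    "AB1234SG0000137492DE5678",
]

for number, line in enumerate(lines, start=1):
    print(number, "match" if rule.search(line) else "no match")
# prints: 1 match, 2 match, 3 no match
```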
The equivalent configuration in GLASS Studio Visual Builder mode is:
See WORD for more information.
You can specify a custom set of characters when defining a BOUND rule by clicking on Customized in the BOUND section of the base pattern form.
For example, define:
A custom set of characters (0 or 1) that must be found to the left of a search pattern.
<search pattern> BOUND LEFT '01'
A custom set of characters (colon ':', whitespace ' ', or comma ',') that must be found on either side of a search pattern.
<search pattern> BOUND ' ,:'
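The two custom rules above can be sketched in Python by turning the quoted character list into a regular-expression character class. custom_bound() and the search pattern 'ABC' are hypothetical placeholders, and treating the start or end of the text as a valid boundary is an assumption, not documented GLASS behaviour.

```python
import re


def custom_bound(word: str, chars: str, left: bool = True, right: bool = True) -> re.Pattern:
    """Hypothetical helper: chars is the literal list of allowed boundary characters."""
    # re.escape keeps characters such as '-' or ']' literal inside the class.
    cls = "[" + re.escape(chars) + "]"
    before = rf"(?:\A|(?<={cls}))" if left else ""
    after = rf"(?:\Z|(?={cls}))" if right else ""
    return re.compile(before + re.escape(word) + after)


left01 = custom_bound("ABC", "01", right=False)  # roughly: <search pattern> BOUND LEFT '01'
print(bool(left01.search("0ABC")), bool(left01.search("2ABC")))  # prints True False

sep = custom_bound("ABC", " ,:")  # roughly: <search pattern> BOUND ' ,:'
print(bool(sep.search("ID:ABC,XYZ")), bool(sep.search("ID-ABC-XYZ")))  # prints True False
```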
… (e.g. comma ',') when specifying the list of characters to match in a BOUND rule.
ACME Corporation wants to remove all data for a specific customer
from its storage systems. To do so, ACME Corporation defines a basic data type to search for the customer's unique regional ID number, UK0027738122.
As ACME Corporation only wants to search for this customer ID number in CSV files, the BOUND pattern rule is specified so that matches are only reported if common delimiters for fields in CSV files (e.g. comma ',', semicolon ';', tab '\t', whitespace ' ', pipe '|') or newline characters ('\n') are detected on either side of the customer ID.
WORD 'UK0027738122' BOUND ' ,;|\t\n'
Based on the WORD expression above, the GLASS pattern matching engine will return Line 1 and Line 2 as match locations:
1 | Customer ID: UK0027738122 |
2 | John Doe|UK0027738122|+65 9876 5432|john.doe@example.com |
3 | -UK0027738122- |
However, Line 3 will not be returned as a match location. The data pattern that resembles a customer ID number is surrounded by the hyphen character ('-'), which fails the BOUND rule because the hyphen is not one of the expected delimiters.
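A Python sketch of the CSV rule, assuming the \t and \n in the BOUND list denote tab and newline as described above, and that the end of a line also satisfies the rule. The character list is passed through re.escape so that characters such as '|' are matched literally inside the character class; this is an illustration, not the GLASS engine.

```python
import re

# Characters listed in BOUND ' ,;|\t\n': space, comma, semicolon, pipe, tab, newline.
DELIMITERS = " ,;|\t\n"
CLS = "[" + re.escape(DELIMITERS) + "]"

# Assumption: the start (\A) or end (\Z) of a line also counts as a valid boundary.
rule = re.compile(rf"(?:\A|(?<={CLS}))UK0027738122(?:\Z|(?={CLS}))")

lines = [
    "Customer ID: UK0027738122",
    "John Doe|UK0027738122|+65 9876 5432|john.doe@example.com",
    "-UK0027738122-",
]

for number, line in enumerate(lines, start=1):
    print(number, "match" if rule.search(line) else "no match")
# prints: 1 match, 2 match, 3 no match
```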
The equivalent configuration in GLASS Studio Visual Builder mode is:
See WORD for more information.
To add a BOUND rule for a base pattern in GLASS Studio Visual Builder mode:
WORD <data pattern> BOUND LEFT <Preset or custom range>
RANGE <range of characters> BOUND LEFT <Preset or custom range>
GROUP <namespace> BOUND LEFT <Preset or custom range>
WORD <data pattern> BOUND RIGHT <Preset or custom range>
RANGE <range of characters> BOUND RIGHT <Preset or custom range>
GROUP <namespace> BOUND RIGHT <Preset or custom range>
If the same boundary rule is defined for both sides of the search pattern, GLASS Studio automatically combines the two into a single rule.
WORD <data pattern> BOUND <Preset or custom range>
RANGE <range of characters> BOUND <Preset or custom range>
GROUP <namespace> BOUND <Preset or custom range>
For example, the WORD component configured in Line 1 is rendered as a single BOUND rule, as shown in Line 2.
1 | WORD 'SG0000137492' BOUND LEFT NONALNUM BOUND RIGHT NONALNUM |
2 | WORD 'SG0000137492' BOUND NONALNUM |
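The combining behaviour can be sketched as a small rendering function. render_bound() is hypothetical; it only illustrates the rule described above (identical LEFT and RIGHT rules collapse into one BOUND clause) and is not GLASS Studio's code.

```python
from typing import Optional


def render_bound(component: str, left: Optional[str] = None,
                 right: Optional[str] = None) -> str:
    """Hypothetical sketch: render a component and its boundary rules as GLASS text.

    Identical left and right rules collapse into a single BOUND clause.
    """
    if left is not None and left == right:
        return f"{component} BOUND {left}"
    parts = [component]
    if left is not None:
        parts.append(f"BOUND LEFT {left}")
    if right is not None:
        parts.append(f"BOUND RIGHT {right}")
    return " ".join(parts)


print(render_bound("WORD 'SG0000137492'", "NONALNUM", "NONALNUM"))
# prints: WORD 'SG0000137492' BOUND NONALNUM
print(render_bound("WORD 'SG0000137492'", left="NONALNUM"))
# prints: WORD 'SG0000137492' BOUND LEFT NONALNUM
```

Different rules on each side stay as separate BOUND LEFT and BOUND RIGHT clauses.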
To edit a BOUND rule:
To remove a BOUND rule: