A GLASS data type generally consists of several building blocks. For example, the ACME Corporation Customer ID data type is constructed from base pattern components, connectors, and pattern rules.
To build a data type, you start by defining the format of the data pattern you want to search for.
Take, for example, the regional and worldwide customer IDs for ACME Corporation.
A valid worldwide customer ID number starts with a constant WW prefix, followed by a 6-digit date with a 2-digit year (<YYMMDD>), a 5-digit serial number (<5-digit serial number>), and a single check digit (<check digit>).
A valid regional customer ID starts with a supported 2-character ccTLD (<ccTLD>), followed by a 9-digit serial number (<9-digit serial number>) and a single check digit (<check digit>).
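The two layouts can be sketched outside GLASS as a pair of regular expressions. This is only an illustration in Python, not GLASS syntax; the names `WORLDWIDE`, `REGIONAL`, and `split_customer_id` are hypothetical, and any two uppercase letters stand in for the list of supported ccTLDs.

```python
import re

# Illustrative regexes for the two layouts described above (not GLASS syntax).
# Worldwide: WW + 6-digit date + 5-digit serial + 1 check digit (14 characters).
WORLDWIDE = re.compile(r"WW(?P<date>\d{6})(?P<serial>\d{5})(?P<check>\d)")
# Regional: 2-letter ccTLD + 9-digit serial + 1 check digit (12 characters).
# Any two uppercase letters are accepted here; the supported list is not enforced.
REGIONAL = re.compile(r"(?P<cctld>[A-Z]{2})(?P<serial>\d{9})(?P<check>\d)")


def split_customer_id(value):
    """Return the named components of a customer ID, or None if neither layout fits."""
    for kind, pattern in (("worldwide", WORLDWIDE), ("regional", REGIONAL)):
        match = pattern.fullmatch(value)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None


print(split_customer_id("WW301231018313"))
print(split_customer_id("US1003992835"))
```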
To identify both types of customer ID numbers, ACME Corporation uses a combination of the following base pattern components (or keywords) to define the custom GLASS data type.
Use the WORD operator to search for a specific data pattern. A location will be returned as a match if the data pattern is found in the location.
For example, search for the constant prefix WW in a worldwide customer ID.
WORD 'WW'
1 | Client data: WW301231018313 |
See WORD for more information.
Use the GROUP (LIST) operator to search for any element from a set of words (or data patterns) that are defined in a MAP namespace. A location will be marked as a match if any of the data patterns defined in the namespace is found in the location.
For example, ACME Corporation defines a MAP namespace for the accepted ccTLD values (AU, IE, KR, SG, UK, US) in a regional customer ID and searches for these values by referencing the namespace in the GROUP operator.
MAP 'ACME_CUST_ID_CCTLD' 'AU', 'IE', 'KR', 'SG', 'UK', 'US'
GROUP 'ACME_CUST_ID_CCTLD'
1 | Customer ID: US1003992835 |
2 | Client data: AU2648761235 |
3 | John Doe|SG0000137492|+65 9876 5432|john.doe@example.com |
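As a rough analogy only, the MAP and GROUP pair behaves like a named list of alternatives that is looked up at search time. The Python sketch below assumes that behaviour; `NAMESPACES` and `group_matches` are hypothetical names, not part of GLASS.

```python
import re

# Stand-in for MAP 'ACME_CUST_ID_CCTLD': a named set of accepted values.
NAMESPACES = {"ACME_CUST_ID_CCTLD": ["AU", "IE", "KR", "SG", "UK", "US"]}


def group_matches(namespace, text):
    """Return every occurrence of any entry of the namespace found in text."""
    alternatives = "|".join(re.escape(word) for word in NAMESPACES[namespace])
    return re.findall(alternatives, text)


for line in (
    "Customer ID: US1003992835",
    "Client data: AU2648761235",
    "John Doe|SG0000137492|+65 9876 5432|john.doe@example.com",
):
    print(group_matches("ACME_CUST_ID_CCTLD", line))
```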
See LIST and MAP for more information.
Use the RANGE operator to search for a run of N characters (set with TIMES) drawn from a specific set of characters. A location will be marked as a match if N characters from the defined range are found in the location.
For example, search for a range of possible 9-digit numbers (000000001 to 999999999) that represent the serial number in a regional customer ID.
RANGE DIGIT TIMES 9
1 | Customer ID: IE4871209841 |
2 | Client data: UK0027738122 |
See RANGE and Preset Range Keywords for more information.
Base pattern components (or GLASS expressions) can be joined using THEN and OR connectors. These connectors tell the GLASS engine whether the base patterns must be detected in sequence (THEN), or whether detecting either base pattern is enough (OR), for a location to be reported as a match.
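To make the idea of joining base patterns concrete, here is a minimal Python sketch that treats THEN as concatenation and OR as alternation over regular-expression fragments. It is an analogy under that assumption, not the GLASS engine's implementation, and every helper name (`word`, `digits`, `group`, `then`, `either`) is hypothetical.

```python
import re

CCTLD = ["AU", "IE", "KR", "SG", "UK", "US"]


def word(text):
    """Analogue of WORD: a literal string."""
    return re.escape(text)


def digits(times):
    """Analogue of RANGE DIGIT TIMES n: exactly n digits."""
    return rf"\d{{{times}}}"


def group(words):
    """Analogue of GROUP: any one entry from a list."""
    return "(?:" + "|".join(re.escape(w) for w in words) + ")"


def then(*parts):
    """Analogue of THEN: the parts must appear one after another."""
    return "".join(parts)


def either(*parts):
    """Analogue of OR: any one of the parts may appear."""
    return "(?:" + "|".join(parts) + ")"


worldwide = then(word("WW"), digits(6), digits(5), digits(1))
regional = then(group(CCTLD), digits(9), digits(1))
customer_id = re.compile(either(worldwide, regional))

print(customer_id.search("Client data: WW301231018313").group())
print(customer_id.search("Customer ID: IE4871209841").group())
```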
You can define various pattern rules to tighten the criteria for a data pattern to be a match.
The following pattern rules were used to define the relaxed version of ACME Corporation's customer ID data type.
Pattern boundaries let you define the content that must be found before (BOUND LEFT), after (BOUND RIGHT), or surrounding (BOUND) a search pattern (WORD, RANGE, or GROUP) for it to be a match.
For example, the GLASS engine detects a 12-character string that matches the format of ACME Corporation's regional customer ID number.
By specifying the BOUND pattern rule, the GLASS engine only reports the 12-character string as a regional customer ID match if it is bounded by non-alphanumeric characters (e.g. colon :, whitespace, comma ,) on each side.
(<Customer ID GLASS pattern>) BOUND NONALNUM
1 | Customer ID: AU2648761235 |
2 | Client data (US1003992835) |
3 | John Doe|SG0000137492|+65 9876 5432|john.doe@example.com |
4 | AB1234SG0000137492DE5678 |
Line 4 will not be reported as a match because the string that appears to be a customer ID number, SG0000137492, is bounded by alphanumeric characters on both sides.
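The effect of the boundary rule on the four sample lines can be approximated in Python with lookaround assertions. This sketch assumes that the start and end of a line also count as valid boundaries, which the text above does not state; `CUSTOMER_ID` and `BOUNDED` are hypothetical names.

```python
import re

# Regional layout: supported ccTLD, 9-digit serial, 1 check digit.
CUSTOMER_ID = r"(?:AU|IE|KR|SG|UK|US)\d{10}"
# Reject candidates that touch a letter or digit on either side.
BOUNDED = re.compile(rf"(?<![A-Za-z0-9]){CUSTOMER_ID}(?![A-Za-z0-9])")

samples = [
    "Customer ID: AU2648761235",
    "Client data (US1003992835)",
    "John Doe|SG0000137492|+65 9876 5432|john.doe@example.com",
    "AB1234SG0000137492DE5678",
]

for number, line in enumerate(samples, start=1):
    print(number, BOUNDED.findall(line))
```

Only the fourth sample yields an empty list, because its candidate is surrounded by alphanumeric characters.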
See BOUND and Preset Range Keywords for more information.
Applying a REQUIRE rule to a base pattern instructs the GLASS engine to report a match only if the base pattern is explicitly represented in the selected MAP namespace(s).
For example, ACME Corporation defines a general base pattern to search for 2-digit strings that represent the month <MM> component in the worldwide customer ID.
RANGE DIGIT TIMES 2
Since DIGIT represents all integers from 0 to 9, the GLASS pattern above will match any 2-digit string from 00 to 99.
As there are only 12 valid months in a year, the REQUIRE pattern rule is applied so that only 2-digit numbers in a specific range (e.g. 01 to 12) that are defined in a given MAP namespace (MONTH_OF_YEAR_LOOKUP) are returned as matches.
MAP 'MONTH_OF_YEAR_LOOKUP' 1-12
RANGE DIGIT TIMES 2 REQUIRE 'MONTH_OF_YEAR_LOOKUP'
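In effect, REQUIRE acts as an allow-list applied to the candidates that the base pattern produces. A minimal Python sketch of that filtering step follows; it assumes the namespace entries are compared as zero-padded 2-digit strings, and `require` is a hypothetical helper.

```python
import re

# Stand-in for MAP 'MONTH_OF_YEAR_LOOKUP' 1-12, stored as zero-padded strings
# (an assumption about how entries are compared with 2-digit candidates).
MONTH_OF_YEAR_LOOKUP = {f"{month:02d}" for month in range(1, 13)}


def require(candidates, namespace):
    """Keep only the candidates that appear in the namespace."""
    return [c for c in candidates if c in namespace]


# Base pattern: any 2-digit string, the analogue of RANGE DIGIT TIMES 2.
candidates = re.findall(r"\b\d{2}\b", "00 01 07 12 13 99")
print(require(candidates, MONTH_OF_YEAR_LOOKUP))  # ['01', '07', '12']
```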
See REQUIRE for more information.
Applying an EXCLUDE rule to a base pattern instructs the GLASS engine to exclude a pattern from being reported as a match if it is represented in the selected MAP namespace(s).
For example, ACME Corporation defines a general base pattern to search for 9-digit strings that represent the serial number in the regional customer ID.
RANGE DIGIT TIMES 9
Since DIGIT represents all integers from 0 to 9, the GLASS pattern above will match any 9-digit string from 000000000 to 999999999.
As 000000000 is not a valid serial number, the EXCLUDE pattern rule is applied so that 000000000 is excluded as a match.
MAP 'INVALID_SERIAL_NUM' 0
RANGE DIGIT TIMES 9 EXCLUDE 'INVALID_SERIAL_NUM'
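EXCLUDE is the mirror image: a deny-list applied to the candidates. The Python sketch below assumes that the namespace entry 0 is compared numerically, so that it covers the all-zero serial number; `exclude` is a hypothetical helper.

```python
import re

# Stand-in for MAP 'INVALID_SERIAL_NUM' 0. Comparing numerically is an
# assumption, chosen so that the entry 0 covers the string 000000000.
INVALID_SERIAL_NUM = {0}


def exclude(candidates, namespace):
    """Drop every candidate whose numeric value appears in the namespace."""
    return [c for c in candidates if int(c) not in namespace]


# Base pattern: any 9-digit string, the analogue of RANGE DIGIT TIMES 9.
candidates = re.findall(r"\b\d{9}\b", "000000000 000000001 100399283")
print(exclude(candidates, INVALID_SERIAL_NUM))  # ['000000001', '100399283']
```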
See EXCLUDE for more information.
Use the CHECK rule to instruct the GLASS engine to run each potential match through a specific algorithm as a form of validation to reduce or eliminate false positive matches. A location is only returned as a match if it passes the checksum module that is applied to the GLASS expression(s).
For example, the value computed by the PASSPORTMOD10 algorithm must equal the check digit in ACME Corporation's customer ID for the potential match to pass the validity test.
(<Customer ID GLASS pattern>) CHECK 'PASSPORTMOD10'
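The text does not spell out how PASSPORTMOD10 is computed. The Python sketch below assumes it corresponds to the check digit scheme used for passport numbers in ICAO Doc 9303 (weights 7, 3, 1 repeating, letters valued A=10 through Z=35, sum taken modulo 10). The sample customer IDs shown in this document are consistent with that scheme, but treat it as an illustration rather than the engine's definition; `icao_check_digit` and `passes_check` are hypothetical names.

```python
def icao_check_digit(payload):
    """Compute a mod-10 check digit with repeating weights 7, 3, 1.

    Assumed scheme (ICAO Doc 9303 style): digits keep their value,
    uppercase letters map to A=10 ... Z=35.
    """
    weights = (7, 3, 1)
    total = 0
    for index, char in enumerate(payload):
        if char in "0123456789":
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10
        else:
            raise ValueError(f"unsupported character: {char!r}")
        total += value * weights[index % 3]
    return total % 10


def passes_check(customer_id):
    """True if the last character equals the check digit of the rest."""
    payload, check = customer_id[:-1], customer_id[-1]
    return check in "0123456789" and icao_check_digit(payload) == int(check)


print(passes_check("US1003992835"))    # True
print(passes_check("WW301231018313"))  # True
print(passes_check("US1003992836"))    # False
```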
See CHECK for more information on supported CHECK modules.
Contextual matching (CONTEXT and APPLY) is an efficient way to specify a set of contextual keywords whose presence determines whether the GLASS engine reports or ignores a potential match.
For example, the GLASS engine detects a 12-character string (US1003992835) that matches the format of ACME Corporation's regional customer ID number and passes the Check Algorithm validity test.
By specifying the CONTEXT rule, US1003992835 is only considered a match if the GLASS engine detects keywords such as customer or client within the proximity of the detected regional customer ID number.
MAP 'ACME_CUST_ID_CONTEXTS' 'cust id', 'custid', 'customer', 'client', 'cliente', 'kunde', '고객'
((<Customer ID GLASS pattern>) CHECK 'PASSPORTMOD10') APPLY 'ACME_CUST_ID_CONTEXTS'
1 | Customer ID: US1003992835 |
2 | Client data (US1003992835) |
3 | There needs to be at least one contextual keyword within the proximity of the string US1003992835 for it to be reported as a match. |
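The three sample lines can be reproduced with a small Python sketch of proximity-based context matching. How the GLASS engine measures proximity is not stated here, so the 30-character `WINDOW` and the case-insensitive comparison are assumptions, and `matches_with_context` is a hypothetical helper.

```python
import re

# Stand-in for MAP 'ACME_CUST_ID_CONTEXTS'.
ACME_CUST_ID_CONTEXTS = [
    "cust id", "custid", "customer", "client", "cliente", "kunde", "고객",
]
CUSTOMER_ID = re.compile(
    r"(?<![A-Za-z0-9])(?:AU|IE|KR|SG|UK|US)\d{10}(?![A-Za-z0-9])"
)
WINDOW = 30  # hypothetical proximity, in characters on each side


def matches_with_context(text):
    """Return candidate IDs that have a context keyword within WINDOW characters."""
    hits = []
    for match in CUSTOMER_ID.finditer(text):
        start = max(0, match.start() - WINDOW)
        nearby = text[start:match.end() + WINDOW].lower()
        if any(keyword in nearby for keyword in ACME_CUST_ID_CONTEXTS):
            hits.append(match.group())
    return hits


samples = [
    "Customer ID: US1003992835",
    "Client data (US1003992835)",
    "There needs to be at least one contextual keyword within the proximity "
    "of the string US1003992835 for it to be reported as a match.",
]
for line in samples:
    print(matches_with_context(line))
```

The first two samples are reported because "customer" and "client" appear next to the ID; the third is ignored because no keyword from the namespace appears near it.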
See CONTEXT and APPLY for more information.