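A 13-digit ID number that is constructed as follows:
- 4-digit year of birth.
- 2-digit month of birth, or 00 if month of birth is unknown.
- 2-digit day of birth, or 00 if day of birth is unknown.
- 3-digit unique sequential number.
- 2 check digits.

We start by defining the 4-digit year, 2-digit month, and 2-digit day of the month in the XYZ national ID.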
# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
(WORD '19' THEN RANGE DIGIT TIMES 2) OR \ # Year 1900-1999.
(WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.
# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
(WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
(WORD '1' THEN RANGE '0-2') # Month 10, 11 or 12.
# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
(RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
(WORD '3' THEN RANGE '01') # Day of month 30-31.
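To make these ranges concrete, here is a rough Python sketch that expresses the three ALIASes as regular expressions. It is an illustrative translation only (WORD becomes literal text, RANGE a character class, OR an alternation); it is not how the GLASS engine evaluates expressions, and the helper name `is_valid` is hypothetical.

```python
import re

# Hypothetical regex translations of the GLASS ALIASes (illustration only):
# WORD -> literal text, RANGE -> character class, OR -> alternation.
XYZ_ID_YEAR = r"(?:19[0-9]{2}|20[0-2][0-9])"  # 1900-2029
XYZ_ID_MONTH = r"(?:0[0-9]|1[0-2])"  # 00-12, where 00 means "unknown"
XYZ_ID_DAY = r"(?:[0-2][0-9]|3[01])"  # 00-31, where 00 means "unknown"


def is_valid(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    years = [y for y in range(1800, 2100) if is_valid(XYZ_ID_YEAR, str(y))]
    print(years[0], years[-1], len(years))  # 1900 2029 130
    months = [m for m in range(100) if is_valid(XYZ_ID_MONTH, f"{m:02d}")]
    print(months[0], months[-1], len(months))  # 0 12 13
    days = [d for d in range(100) if is_valid(XYZ_ID_DAY, f"{d:02d}")]
    print(days[0], days[-1], len(days))  # 0 31 32
```

Like the GLASS ALIASes, these patterns check each field on its own, so an impossible date such as 2001-02-31 still passes.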
Defining an ALIAS does not match anything on its own, so we use the REFER keyword to reference the ALIASes in the main expression.
# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
(WORD '19' THEN RANGE DIGIT TIMES 2) OR \ # Year 1900-1999.
(WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.
# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
(WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
(WORD '1' THEN RANGE '0-2') # Month 10, 11 or 12.
# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
(RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
(WORD '3' THEN RANGE '01') # Day of month 30-31.
REFER 'XYZ_ID_YEAR' THEN \
REFER 'XYZ_ID_MONTH' THEN \
REFER 'XYZ_ID_DAY'
See ALIAS and REFER for more information.
Let's take a look at how we have named the ALIASes.
ALIAS 'XYZ_ID_YEAR'
ALIAS 'XYZ_ID_MONTH'
ALIAS 'XYZ_ID_DAY'
XYZ is the top-level domain (TLD) of the country in question, and ID is the generic type of data that we are looking for here.
Most relevant expressions for this data type will start with XYZ_ID. This makes it easier to group all related expressions together and avoids clashes with the identifiers or labels of existing data types or expressions.
Next, we add the expression that matches the 3-digit unique sequential number and the check digit.
# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
(WORD '19' THEN RANGE DIGIT TIMES 2) OR \ # Year 1900-1999.
(WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.
# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
(WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
(WORD '1' THEN RANGE '0-2') # Month 10, 11 or 12.
# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
(RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
(WORD '3' THEN RANGE '01') # Day of month 30-31.
REFER 'XYZ_ID_YEAR' THEN \
REFER 'XYZ_ID_MONTH' THEN \
REFER 'XYZ_ID_DAY' THEN \
# 3-digit unique sequential number + the check digit.
RANGE DIGIT TIMES 4
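Chaining the ALIASes with REFER ... THEN and appending RANGE DIGIT TIMES 4 describes a 12-digit candidate. The Python sketch below mirrors that with plain regex concatenation, again as an illustrative translation rather than GLASS itself; `PART2` and `candidates` are hypothetical names. Because nothing constrains the surrounding characters yet, the regex also finds a candidate buried inside a longer run of digits.

```python
import re

# Illustrative regex translations of the GLASS ALIASes (assumption).
YEAR = r"(?:19[0-9]{2}|20[0-2][0-9])"
MONTH = r"(?:0[0-9]|1[0-2])"
DAY = r"(?:[0-2][0-9]|3[01])"

# REFER ... THEN ... concatenates the ALIASes; the 4 extra digits are the
# 3-digit sequential number plus the check digit (positions 9-12).
PART2 = re.compile(YEAR + MONTH + DAY + r"[0-9]{4}")


def candidates(text: str) -> list[str]:
    """Return every non-overlapping 12-digit candidate found in `text`."""
    return [m.group(0) for m in PART2.finditer(text)]


if __name__ == "__main__":
    print(candidates("ID: 2001031441237"))  # ['200103144123']
    # No boundary rules yet, so a candidate inside a longer digit run is found too.
    print(candidates("ref 9920010314412399"))  # ['200103144123']
```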
In Part 3, we add a boundary rule to the left of the string and verify the current 12-digit string with the Luhn checksum algorithm.
# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
(WORD '19' THEN RANGE DIGIT TIMES 2) OR \ # Year 1900-1999.
(WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.
# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
(WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
(WORD '1' THEN RANGE '0-2') # Month 10, 11 or 12.
# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
(RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
(WORD '3' THEN RANGE '01') # Day of month 30-31.
( \
REFER 'XYZ_ID_YEAR' THEN \
REFER 'XYZ_ID_MONTH' THEN \
REFER 'XYZ_ID_DAY' THEN \
# 3-digit unique sequential number + the check digit.
RANGE DIGIT TIMES 4 \
) BOUND LEFT NONALNUM CHECK 'LUHNCHKSUM'
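A minimal Python sketch of Part 3, assuming that CHECK 'LUHNCHKSUM' applies the standard Luhn (mod 10) algorithm to the matched digits and that BOUND LEFT NONALNUM means the preceding character, if any, is not a letter or digit. The names `PART3`, `luhn_ok` and `part3_matches` are hypothetical.

```python
import re

# Illustrative regex translations of the GLASS ALIASes (assumption).
YEAR = r"(?:19[0-9]{2}|20[0-2][0-9])"
MONTH = r"(?:0[0-9]|1[0-2])"
DAY = r"(?:[0-2][0-9]|3[01])"

# The negative lookbehind approximates BOUND LEFT NONALNUM: the character just
# before the match, if there is one, must not be a letter or a digit.
PART3 = re.compile(r"(?<![A-Za-z0-9])" + YEAR + MONTH + DAY + r"[0-9]{4}")


def luhn_ok(digits: str) -> bool:
    """Standard Luhn (mod 10) check over a string of decimal digits."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def part3_matches(text: str) -> list[str]:
    """12-digit candidates with a left boundary that also pass the Luhn check."""
    return [m.group(0) for m in PART3.finditer(text) if luhn_ok(m.group(0))]


if __name__ == "__main__":
    print(part3_matches("ID 2001031441237"))  # ['200103144123']
    print(part3_matches("ref 9920010314412399"))  # []
```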
In Part 4, we match the last digit (position 13) of the XYZ national ID and add a boundary rule to the right of the 13-digit string.
# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
(WORD '19' THEN RANGE DIGIT TIMES 2) OR \ # Year 1900-1999.
(WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.
# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
(WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
(WORD '1' THEN RANGE '0-2') # Month 10, 11 or 12.
# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
(RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
(WORD '3' THEN RANGE '01') # Day of month 30-31.
( \
( \
REFER 'XYZ_ID_YEAR' THEN \
REFER 'XYZ_ID_MONTH' THEN \
REFER 'XYZ_ID_DAY' THEN \
# 3-digit unique sequential number + the check digit.
RANGE DIGIT TIMES 4 \
) BOUND LEFT NONALNUM CHECK 'LUHNCHKSUM' \
) THEN RANGE DIGIT BOUND RIGHT NONALNUM
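The Part 4 structure can be sketched in Python as follows, assuming the boundary rules mean "not next to a letter or digit" and that CHECK 'LUHNCHKSUM' is the standard Luhn algorithm. As in the GLASS expression, only the inner 12 digits (regex group 1) are Luhn-checked at this stage; the 13th digit and the right boundary are matched outside that group. `PART4` and `part4_matches` are hypothetical names.

```python
import re

# Illustrative regex translations of the GLASS ALIASes (assumption).
YEAR = r"(?:19[0-9]{2}|20[0-2][0-9])"
MONTH = r"(?:0[0-9]|1[0-2])"
DAY = r"(?:[0-2][0-9]|3[01])"

# Group 1 holds the first 12 digits (Luhn-checked, as in Part 3). One more digit
# follows, then a negative lookahead approximating BOUND RIGHT NONALNUM.
PART4 = re.compile(
    r"(?<![A-Za-z0-9])(" + YEAR + MONTH + DAY + r"[0-9]{4})[0-9](?![A-Za-z0-9])"
)


def luhn_ok(digits: str) -> bool:
    """Standard Luhn (mod 10) check over a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def part4_matches(text: str) -> list[str]:
    """13-digit strings bounded on both sides whose first 12 digits pass Luhn."""
    return [m.group(0) for m in PART4.finditer(text) if luhn_ok(m.group(1))]


if __name__ == "__main__":
    print(part4_matches("ID 2001031441237."))  # ['2001031441237']
    print(part4_matches("ID 20010314412370"))  # [] (14 digits: no right boundary)
```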
In Part 5, we run the Luhn checksum algorithm again, this time to verify the full 13-digit string.
# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
(WORD '19' THEN RANGE DIGIT TIMES 2) OR \ # Year 1900-1999.
(WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.
# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
(WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
(WORD '1' THEN RANGE '0-2') # Month 10, 11 or 12.
# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
(RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
(WORD '3' THEN RANGE '01') # Day of month 30-31.
( \
( \
( \
REFER 'XYZ_ID_YEAR' THEN \
REFER 'XYZ_ID_MONTH' THEN \
REFER 'XYZ_ID_DAY' THEN \
# 3-digit unique sequential number + the check digit.
RANGE DIGIT TIMES 4 \
) BOUND LEFT NONALNUM CHECK 'LUHNCHKSUM' \
) THEN RANGE DIGIT BOUND RIGHT NONALNUM \
) CHECK 'LUHNCHKSUM'
Using the GLASS expression from Part 5 (Adding the Final Checksum Algorithm), the GLASS pattern matching engine returns the 13-digit strings in line 1 and line 2 as XYZ national ID matches.
Line  String
1     2001031441237
2     1986002312394
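The Python sketch below applies both Luhn checks from Part 5 to the two strings in the table. It assumes that CHECK 'LUHNCHKSUM' is the standard Luhn algorithm and that the boundary rules mean "not next to a letter or digit"; `XYZ_ID` and `xyz_id_matches` are hypothetical names. A third string, identical to line 1 except for its last digit, shows that the second (13-digit) check is what rejects a wrong final digit.

```python
import re

# Illustrative regex translations of the GLASS ALIASes (assumption).
YEAR = r"(?:19[0-9]{2}|20[0-2][0-9])"
MONTH = r"(?:0[0-9]|1[0-2])"
DAY = r"(?:[0-2][0-9]|3[01])"

# Left boundary, first 12 digits (group 1), 13th digit, right boundary.
XYZ_ID = re.compile(
    r"(?<![A-Za-z0-9])(" + YEAR + MONTH + DAY + r"[0-9]{4})[0-9](?![A-Za-z0-9])"
)


def luhn_ok(digits: str) -> bool:
    """Standard Luhn (mod 10) check over a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def xyz_id_matches(text: str) -> list[str]:
    """Mirror Part 5: Luhn over the first 12 digits, then over all 13."""
    return [
        m.group(0)
        for m in XYZ_ID.finditer(text)
        if luhn_ok(m.group(1)) and luhn_ok(m.group(0))
    ]


if __name__ == "__main__":
    # Lines 1 and 2 from the table, then line 1 with its last digit changed.
    for line in ("2001031441237", "1986002312394", "2001031441238"):
        print(line, xyz_id_matches(line))
```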
After a bit more research, you learn that XYZ national ID numbers are expected to appear in three formats:
- 13 consecutive digits with no separators (e.g. "YYYYMMDDSSSCC").
- 4 digits, a whitespace, 2 digits, a whitespace, 2 digits, a whitespace, and finally 5 digits (e.g. "YYYY MM DD SSSCC").
- 4 digits, a hyphen -, 2 digits, a hyphen -, 2 digits, a hyphen -, and finally 5 digits (e.g. "YYYY-MM-DD-SSSCC").

To search for all three XYZ national ID number formats, we declare three separate ALIASes and refer to them as part of the main GLASS expression.
SCORE 'EXCLUDE_ADJACENT_DASH_ANY' -10 BEFORE WORD '-'
SCORE 'EXCLUDE_ADJACENT_DASH_ANY' -10 AFTER WORD '-'
SCORE 'EXCLUDE_ADJACENT_SPACE_DIGIT' -10 BEFORE WORD ' ' THEN RANGE DIGIT
SCORE 'EXCLUDE_ADJACENT_SPACE_DIGIT' -10 AFTER WORD ' ' THEN RANGE DIGIT
# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
(WORD '19' THEN RANGE DIGIT TIMES 2) OR \ # Year 1900-1999.
(WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.
# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
(WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
(WORD '1' THEN RANGE '0-2') # Month 10, 11 or 12.
# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
(RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
(WORD '3' THEN RANGE '01') # Day of month 30-31.
# Search for 13-digit straight numbers.
ALIAS 'XYZ_ID_STRAIGHT' \
( \
( \
( \
REFER 'XYZ_ID_YEAR' THEN \
REFER 'XYZ_ID_MONTH' THEN \
REFER 'XYZ_ID_DAY' THEN \
RANGE DIGIT TIMES 4 \
) BOUND LEFT NONALNUM CHECK 'LUHNCHKSUM' \
) THEN RANGE DIGIT BOUND RIGHT NONALNUM \
) CHECK 'LUHNCHKSUM'
# Search for 13-digit numbers in groups of 4 2 2 5 digits separated by
# whitespace characters.
ALIAS 'XYZ_ID_SPACE' \
( \
( \
( \
REFER 'XYZ_ID_YEAR' THEN RANGE ' \t' TIMES 1-2 THEN \
REFER 'XYZ_ID_MONTH' THEN RANGE ' \t' TIMES 1-2 THEN \
REFER 'XYZ_ID_DAY' THEN RANGE ' \t' TIMES 1-2 THEN \
RANGE DIGIT TIMES 4 \
) BOUND LEFT NONALNUM CHECK 'LUHNCHKSUM' \
) THEN RANGE DIGIT BOUND RIGHT NONALNUM \
) CHECK 'LUHNCHKSUM' RANK 'EXCLUDE_ADJACENT_SPACE_DIGIT'
# Search for 13-digit numbers in groups of 4 2 2 5 digits separated by a
# hyphen character.
ALIAS 'XYZ_ID_HYPHEN' \
( \
( \
( \
REFER 'XYZ_ID_YEAR' THEN WORD '-' THEN \
REFER 'XYZ_ID_MONTH' THEN WORD '-' THEN \
REFER 'XYZ_ID_DAY' THEN WORD '-' THEN \
RANGE DIGIT TIMES 4 \
) BOUND LEFT NONALNUM CHECK 'LUHNCHKSUM' \
) THEN RANGE DIGIT BOUND RIGHT NONALNUM \
) CHECK 'LUHNCHKSUM' RANK 'EXCLUDE_ADJACENT_DASH_ANY'
# Create an ALIAS that includes all three XYZ national ID number formats.
ALIAS 'NID_XYZ_ID' \
REFER 'XYZ_ID_STRAIGHT' OR REFER 'XYZ_ID_SPACE' OR REFER 'XYZ_ID_HYPHEN'
# Define a unique identifier for the custom data type.
LABEL 'NID_XYZ_ID'
# Reference all three XYZ national ID number formats.
REFER 'NID_XYZ_ID'
The example GLASS code is built on top of the recommended GLASS solution in Part 5.
For readability, an ALIAS is created for each 13-digit XYZ national ID number format. These ALIASes are then joined together with the OR connector so that the GLASS pattern searches for all three formats.
The final GLASS expression is assigned a unique identifier using the LABEL operator (LABEL 'NID_XYZ_ID'). Any matches in Enterprise Recon for the XYZ national ID data type will be reported under this label.
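The OR of the three format ALIASes can be sketched in Python by building one regex per separator and trying each in turn. This is an approximation under stated assumptions: standard Luhn for CHECK 'LUHNCHKSUM', "not next to a letter or digit" for the BOUND rules, and separators stripped before the checksum is computed. `build`, `FORMATS` and `find_ids` are hypothetical names.

```python
import re

# Illustrative regex translations of the GLASS ALIASes (assumption).
YEAR = r"(?:19[0-9]{2}|20[0-2][0-9])"
MONTH = r"(?:0[0-9]|1[0-2])"
DAY = r"(?:[0-2][0-9]|3[01])"


def build(sep: str) -> re.Pattern:
    """Compile a 4-2-2-5 pattern with `sep` between the digit groups.

    Group 1 spans the first 12 digits (plus separators); the full match adds
    the 13th digit. Lookarounds approximate BOUND LEFT/RIGHT NONALNUM.
    """
    body = YEAR + sep + MONTH + sep + DAY + sep + r"[0-9]{4}"
    return re.compile(r"(?<![A-Za-z0-9])(" + body + r")[0-9](?![A-Za-z0-9])")


# Counterparts of XYZ_ID_STRAIGHT, XYZ_ID_SPACE and XYZ_ID_HYPHEN.
FORMATS = {
    "straight": build(""),  # YYYYMMDDSSSCC
    "space": build(r"[ \t]{1,2}"),  # YYYY MM DD SSSCC, 1-2 spaces or tabs
    "hyphen": build("-"),  # YYYY-MM-DD-SSSCC
}


def luhn_ok(digits: str) -> bool:
    """Standard Luhn (mod 10) check over a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def find_ids(text: str) -> list[tuple[str, str]]:
    """Try each format in turn, like OR-ing the three ALIASes."""
    found = []
    for name, pattern in FORMATS.items():
        for m in pattern.finditer(text):
            first12 = re.sub(r"[^0-9]", "", m.group(1))  # drop separators
            all13 = re.sub(r"[^0-9]", "", m.group(0))
            if luhn_ok(first12) and luhn_ok(all13):
                found.append((name, m.group(0)))
    return found


if __name__ == "__main__":
    for sample in ("2001031441237", "2001 03 14 41237", "2001-03-14-41237"):
        print(find_ids(sample))
```

This sketch leaves out the SCORE and RANK rules, so it would still report the hyphen-format ID inside a string such as 0-2001-03-14-41237-0.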
The following rules are added for the XYZ national ID number formats with whitespace and hyphen separators.
SCORE 'EXCLUDE_ADJACENT_DASH_ANY' -10 BEFORE WORD '-'
SCORE 'EXCLUDE_ADJACENT_DASH_ANY' -10 AFTER WORD '-'
SCORE 'EXCLUDE_ADJACENT_SPACE_DIGIT' -10 BEFORE WORD ' ' THEN RANGE DIGIT
SCORE 'EXCLUDE_ADJACENT_SPACE_DIGIT' -10 AFTER WORD ' ' THEN RANGE DIGIT
The RANK and SCORE rules strengthen the main GLASS rules by excluding matches that have either of the following immediately before or after a potential 13-digit XYZ national ID number:
- a space or tab (\t) character, followed by a digit, or
- a hyphen (-) character.
For example:
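Line  Excluded string
1     0 2001 03 14 41237 0
2     0-2001-03-14-41237-0

The following strings, which have no such adjacent characters, are still matched:

Line  Matched string
1     2001031441237
2     2001 03 14 41237
3     2001-03-14-41237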
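The exclusion logic can be approximated in Python as a check on the characters around a candidate. This is a hypothetical sketch, not the GLASS RANK/SCORE mechanism: it treats a -10 score as "exclude", interprets the whitespace rule as a space or tab next to the candidate with a digit on its far side (which is what the excluded examples show), and applies each rule only to the format that references it.

```python
import re


def excluded(text: str, start: int, end: int, fmt: str) -> bool:
    """Return True if the candidate text[start:end] should be dropped.

    Hypothetical approximation of EXCLUDE_ADJACENT_DASH_ANY and
    EXCLUDE_ADJACENT_SPACE_DIGIT; RANK/SCORE weighting is not modelled.
    """
    before = text[max(0, start - 2):start]  # up to 2 characters before
    after = text[end:end + 2]  # up to 2 characters after
    if fmt == "hyphen":  # EXCLUDE_ADJACENT_DASH_ANY
        return before.endswith("-") or after.startswith("-")
    if fmt == "space":  # EXCLUDE_ADJACENT_SPACE_DIGIT
        return (
            re.fullmatch(r"[0-9][ \t]", before) is not None
            or re.fullmatch(r"[ \t][0-9]", after) is not None
        )
    return False  # the straight format has no RANK rule


if __name__ == "__main__":
    samples = [
        ("0 2001 03 14 41237 0", "2001 03 14 41237", "space"),
        ("0-2001-03-14-41237-0", "2001-03-14-41237", "hyphen"),
        ("2001 03 14 41237", "2001 03 14 41237", "space"),
        ("2001-03-14-41237", "2001-03-14-41237", "hyphen"),
    ]
    for text, candidate, fmt in samples:
        start = text.index(candidate)
        print(repr(text), excluded(text, start, start + len(candidate), fmt))
```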