XYZ National ID

Data Type Requirements

A 13-digit ID number that is constructed as follows:

  • The year of birth is expressed by four digits.
    The year of birth is limited to the range 1900-2029 to reduce false positives.
  • The month of birth is expressed by two digits.
    This value is set to 00 if month of birth is unknown.
  • The day of birth expressed by two digits.
    This value is set to 00 if day of birth is unknown.
  • A unique sequential number is expressed by three digits.
  • A control number (digit #12) calculated over the first 11 digits according to the Luhn checksum algorithm.
  • A control number (digit #13) calculated over the first 12 digits according to the Luhn checksum algorithm.
  • ID number matches can only be bounded by non-alphanumeric characters.
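
To make the two control-number requirements concrete, here is a minimal Python sketch of the Luhn check applied twice, once to the first 12 digits and once to all 13. It is an illustration outside GLASS, not how the GLASS engine implements its check; the function names luhn_valid and xyz_id_checks are hypothetical.

```python
def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def xyz_id_checks(candidate: str) -> bool:
    """Apply both control-number checks to a 13-digit candidate."""
    if len(candidate) != 13 or not (candidate.isascii() and candidate.isdigit()):
        return False
    # Digit #12 validates the first 12 digits; digit #13 validates all 13.
    return luhn_valid(candidate[:12]) and luhn_valid(candidate)
```

The sketch covers only the checksum requirements; the date ranges and boundary rule from the list are not checked here.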

XYZ National ID example requirements

Work through the detailed steps below to build the custom GLASS data type (recommended), or jump straight to the final GLASS expression in Part 5.

Part 1 - The Year, Month, and Day in the XYZ National ID

We start by defining the 4-digit year, 2-digit month, and 2-digit day of the month in the XYZ national ID.

# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
  (WORD '19' THEN RANGE DIGIT TIMES 2) OR \     # Year 1900-1999.
  (WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.

# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
  (WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
  (WORD '1' THEN RANGE '0-2')      # Month 10, 11 or 12.

# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
  (RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
  (WORD '3' THEN RANGE '01')          # Day of month 30-31.

Defining an ALIAS on its own does not search for anything, so we use the REFER keyword to reference each ALIAS in the main expression.

# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
  (WORD '19' THEN RANGE DIGIT TIMES 2) OR \     # Year 1900-1999.
  (WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.

# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
  (WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
  (WORD '1' THEN RANGE '0-2')      # Month 10, 11 or 12.

# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
  (RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
  (WORD '3' THEN RANGE '01')          # Day of month 30-31.

REFER 'XYZ_ID_YEAR' THEN \
REFER 'XYZ_ID_MONTH' THEN \
REFER 'XYZ_ID_DAY'

See ALIAS and REFER for more information.
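
For readers more familiar with regular expressions, the three aliases and the REFER chain can be approximated in Python as below. This is a rough analogy under the assumption that WORD, RANGE and THEN behave like literal text, character classes and concatenation; the Python names are hypothetical and simply mirror the alias names.

```python
import re

# Hypothetical regex counterparts of the three GLASS aliases.
XYZ_ID_YEAR = r"(?:19[0-9]{2}|20[0-2][0-9])"  # 1900-1999 or 2000-2029
XYZ_ID_MONTH = r"(?:0[0-9]|1[0-2])"           # 00-09 or 10-12
XYZ_ID_DAY = r"(?:[0-2][0-9]|3[01])"          # 00-29 or 30-31

# Concatenation plays the role of REFER ... THEN REFER ...
DATE_PREFIX = re.compile(XYZ_ID_YEAR + XYZ_ID_MONTH + XYZ_ID_DAY)


def has_date_prefix(text: str) -> bool:
    """Return True if text starts with an 8-digit YYYYMMDD prefix in range."""
    return DATE_PREFIX.match(text) is not None
```

As with the GLASS expression, only digit ranges are checked, so a calendar-impossible value such as 20010230 still passes.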

Data Type Naming Convention

Let's take a look at how we have named the ALIASes.

ALIAS 'XYZ_ID_YEAR'
ALIAS 'XYZ_ID_MONTH'
ALIAS 'XYZ_ID_DAY'

XYZ is the top-level domain (TLD) of the country in question, and ID is the generic type of data that we are looking for.

Most relevant expressions for this data type will start with XYZ_ID. This makes it easier to group related expressions together and avoids identifiers or labels that clash with those of existing data types or expressions.

Part 2 - Adding the Unique Serial Number and Check Digit

Next, we extend the expression to search for the 3-digit unique sequential number and the first check digit (digit #12).

# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
  (WORD '19' THEN RANGE DIGIT TIMES 2) OR \     # Year 1900-1999.
  (WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.

# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
  (WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
  (WORD '1' THEN RANGE '0-2')      # Month 10, 11 or 12.

# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
  (RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
  (WORD '3' THEN RANGE '01')          # Day of month 30-31.

REFER 'XYZ_ID_YEAR' THEN \
REFER 'XYZ_ID_MONTH' THEN \
REFER 'XYZ_ID_DAY' THEN \
# 3-digit unique sequential number + the check digit.
RANGE DIGIT TIMES 4

Part 3 - Adding the Checksum Algorithm and Left Boundary Rules

In Part 3, we add on the boundary rules to the left of the string, and verify the validity of the current 12-digit string using the Luhn checksum algorithm.

# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
  (WORD '19' THEN RANGE DIGIT TIMES 2) OR \     # Year 1900-1999.
  (WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.

# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
  (WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
  (WORD '1' THEN RANGE '0-2')      # Month 10, 11 or 12.

# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
  (RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
  (WORD '3' THEN RANGE '01')          # Day of month 30-31.

( \
  REFER 'XYZ_ID_YEAR' THEN \
  REFER 'XYZ_ID_MONTH' THEN \
  REFER 'XYZ_ID_DAY' THEN \
  # 3-digit unique sequential number + the check digit.
  RANGE DIGIT TIMES 4 \
) BOUND LEFT NONALNUM CHECK 'LUHNCHKSUM'

Part 4 - Adding the Final Digit and Right Boundary Rules

In Part 4, we search for the last digit (position 13) of the XYZ national ID and add the boundary rule to the right of the 13-digit string.

# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
  (WORD '19' THEN RANGE DIGIT TIMES 2) OR \     # Year 1900-1999.
  (WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.

# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
  (WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
  (WORD '1' THEN RANGE '0-2')      # Month 10, 11 or 12.

# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
  (RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
  (WORD '3' THEN RANGE '01')          # Day of month 30-31.

( \
  ( \
    REFER 'XYZ_ID_YEAR' THEN \
    REFER 'XYZ_ID_MONTH' THEN \
    REFER 'XYZ_ID_DAY' THEN \
    # 3-digit unique sequential number + the check digit.
    RANGE DIGIT TIMES 4 \
  ) BOUND LEFT NONALNUM CHECK 'LUHNCHKSUM' \
) THEN RANGE DIGIT BOUND RIGHT NONALNUM

Part 5 - Adding the Final Checksum Algorithm

In Part 5, we execute the Luhn checksum algorithm to verify the validity of the 13-digit string.

# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
  (WORD '19' THEN RANGE DIGIT TIMES 2) OR \     # Year 1900-1999.
  (WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.

# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
  (WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
  (WORD '1' THEN RANGE '0-2')      # Month 10, 11 or 12.

# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
  (RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
  (WORD '3' THEN RANGE '01')          # Day of month 30-31.

( \
  ( \
    ( \
      REFER 'XYZ_ID_YEAR' THEN \
      REFER 'XYZ_ID_MONTH' THEN \
      REFER 'XYZ_ID_DAY' THEN \
      # 3-digit unique sequential number + the check digit.
      RANGE DIGIT TIMES 4 \
    ) BOUND LEFT NONALNUM CHECK 'LUHNCHKSUM' \
  ) THEN RANGE DIGIT BOUND RIGHT NONALNUM \
) CHECK 'LUHNCHKSUM'
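
The nested expression above can be read as three stages: match 12 digits with a left boundary and checksum, append one more digit with a right boundary, then run the checksum again over all 13 digits. The Python sketch below mirrors those stages. It assumes NONALNUM means "not an ASCII letter or digit" and that the start or end of the text also counts as a boundary; both are assumptions for illustration, and find_xyz_ids is a hypothetical name.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # every second digit from the right is doubled
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


XYZ_ID = re.compile(
    r"(?<![0-9A-Za-z])"               # left boundary (assumed NONALNUM meaning)
    r"((?:19[0-9]{2}|20[0-2][0-9])"   # year 1900-2029
    r"(?:0[0-9]|1[0-2])"              # month 00-12
    r"(?:[0-2][0-9]|3[01])"           # day 00-31
    r"[0-9]{4})"                      # serial number + digit #12
    r"([0-9])"                        # digit #13
    r"(?![0-9A-Za-z])"                # right boundary
)


def find_xyz_ids(text: str) -> list:
    """Return every 13-digit candidate that passes both checksum stages."""
    matches = []
    for m in XYZ_ID.finditer(text):
        first_twelve = m.group(1)
        full = first_twelve + m.group(2)
        if luhn_valid(first_twelve) and luhn_valid(full):
            matches.append(full)
    return matches
```

A 14-digit run or a candidate glued to a letter is rejected by the boundaries before any checksum is computed.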

Match Samples

Based on the GLASS expression in Part 5 - Adding the Final Checksum Algorithm, the following two 13-digit strings will be returned as XYZ national ID matches by the GLASS pattern matching engine.

2001031441237
1986002312394
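
To see why the two sample strings pass, it can help to derive their control digits from the first 11 digits. The Python sketch below does this with the standard Luhn check-digit calculation; the function names are hypothetical, and the sketch is independent of how GLASS performs the check internally.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the digit that makes payload + digit pass the Luhn checksum."""
    total = 0
    # The digit next to the future check digit is the first one doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def append_control_digits(first_eleven: str) -> str:
    """Append digit #12 and digit #13 to an 11-digit date + serial prefix."""
    first_twelve = first_eleven + str(luhn_check_digit(first_eleven))
    return first_twelve + str(luhn_check_digit(first_twelve))
```

For the prefix 20010314412 the weighted sum is 27, so digit #12 is 3; for 200103144123 the sum is 33, so digit #13 is 7, which gives 2001031441237.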

Challenge 1 - XYZ National ID with Separators

After a bit more research, you learn that XYZ national ID numbers are expected to be in various formats:

  1. 13 consecutive digits.
  2. 4 digits, a whitespace character, 2 digits, a whitespace character, 2 digits, a whitespace character, and finally 5 digits (e.g. "YYYY MM DD SSSCC").
  3. 4 digits, a hyphen (-), 2 digits, a hyphen (-), 2 digits, a hyphen (-), and finally 5 digits (e.g. "YYYY-MM-DD-SSSCC").

Example GLASS Code

To search for all three XYZ national ID number formats, we declare three separate ALIASes and refer to them as part of the main GLASS expression.

SCORE 'EXCLUDE_ADJACENT_DASH_ANY' -10 BEFORE WORD '-'
SCORE 'EXCLUDE_ADJACENT_DASH_ANY' -10 AFTER WORD '-'
SCORE 'EXCLUDE_ADJACENT_SPACE_DIGIT' -10 BEFORE WORD ' ' THEN RANGE DIGIT
SCORE 'EXCLUDE_ADJACENT_SPACE_DIGIT' -10 AFTER WORD ' ' THEN RANGE DIGIT

# 4-digit year from 1900-2029.
ALIAS 'XYZ_ID_YEAR' \
  (WORD '19' THEN RANGE DIGIT TIMES 2) OR \     # Year 1900-1999.
  (WORD '20' THEN RANGE '0-2' THEN RANGE DIGIT) # Year 2000-2029.

# 2-digit month of year, including 00.
ALIAS 'XYZ_ID_MONTH' \
  (WORD '0' THEN RANGE DIGIT) OR \ # Month 00-09.
  (WORD '1' THEN RANGE '0-2')      # Month 10, 11 or 12.

# 2-digit day of month, including 00.
ALIAS 'XYZ_ID_DAY' \
  (RANGE '0-2' THEN RANGE DIGIT) OR \ # Day of month 00-29.
  (WORD '3' THEN RANGE '01')          # Day of month 30-31.

# Search for 13-digit straight numbers.
ALIAS 'XYZ_ID_STRAIGHT' \
  ( \
    ( \
      ( \
        REFER 'XYZ_ID_YEAR' THEN \
        REFER 'XYZ_ID_MONTH' THEN \
        REFER 'XYZ_ID_DAY' THEN \
        RANGE DIGIT TIMES 4 \
      ) BOUND LEFT NONALNUM CHECK 'LUHNCHKSUM' \
    ) THEN RANGE DIGIT BOUND RIGHT NONALNUM \
  ) CHECK 'LUHNCHKSUM'

# Search for 13-digit numbers in groups of 4 2 2 5 digits separated by
# whitespace characters.
ALIAS 'XYZ_ID_SPACE' \
  ( \
    ( \
      ( \
        REFER 'XYZ_ID_YEAR' THEN RANGE ' \t' TIMES 1-2 THEN \
        REFER 'XYZ_ID_MONTH' THEN RANGE ' \t' TIMES 1-2 THEN \
        REFER 'XYZ_ID_DAY' THEN RANGE ' \t' TIMES 1-2 THEN \
        RANGE DIGIT TIMES 4 \
      ) BOUND LEFT NONALNUM CHECK 'LUHNCHKSUM' \
    ) THEN RANGE DIGIT BOUND RIGHT NONALNUM \
  ) CHECK 'LUHNCHKSUM' RANK 'EXCLUDE_ADJACENT_SPACE_DIGIT'

# Search for 13-digit numbers in groups of 4 2 2 5 digits separated by a
# hyphen character.
ALIAS 'XYZ_ID_HYPHEN' \
  ( \
    ( \
      ( \
        REFER 'XYZ_ID_YEAR' THEN WORD '-' THEN \
        REFER 'XYZ_ID_MONTH' THEN WORD '-' THEN \
        REFER 'XYZ_ID_DAY' THEN WORD '-' THEN \
        RANGE DIGIT TIMES 4 \
      ) BOUND LEFT NONALNUM CHECK 'LUHNCHKSUM' \
    ) THEN RANGE DIGIT BOUND RIGHT NONALNUM \
  ) CHECK 'LUHNCHKSUM' RANK 'EXCLUDE_ADJACENT_DASH_ANY'

# Create an ALIAS that includes all three XYZ national ID number formats.
ALIAS 'NID_XYZ_ID' \
  REFER 'XYZ_ID_STRAIGHT' OR REFER 'XYZ_ID_SPACE' OR REFER 'XYZ_ID_HYPHEN'

# Define a unique identifier for the custom data type.
LABEL 'NID_XYZ_ID'
# Reference all three XYZ national ID number formats.
REFER 'NID_XYZ_ID'

Breaking Down the GLASS Code

The example GLASS code is built on top of the recommended GLASS solution in Part 5.

ALIAS for Different XYZ National ID Number Formats

For readability, an ALIAS is created for each 13-digit XYZ national ID number format. These ALIASes are then joined together with the OR connector so that the GLASS pattern searches for all three formats.
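
The same one-pattern-per-format idea can be sketched in Python, with alternation standing in for the OR connector. The sketch assumes that RANGE ' \t' TIMES 1-2 means one or two space or tab characters; it leaves out the boundary and checksum rules to keep the focus on the formats, and build and normalize are hypothetical helper names.

```python
import re
from typing import Optional


def build(sep: str) -> str:
    """Build a 4-2-2-5 digit pattern with the given separator pattern."""
    return (
        r"(?:19[0-9]{2}|20[0-2][0-9])" + sep   # year
        + r"(?:0[0-9]|1[0-2])" + sep           # month
        + r"(?:[0-2][0-9]|3[01])" + sep        # day
        + r"[0-9]{5}"                          # serial + two control digits
    )


XYZ_ID_STRAIGHT = build("")
XYZ_ID_SPACE = build(r"[ \t]{1,2}")
XYZ_ID_HYPHEN = build("-")

# Alternation plays the role of REFER ... OR REFER ... OR REFER ...
ANY_FORMAT = re.compile(
    "(?:" + "|".join([XYZ_ID_STRAIGHT, XYZ_ID_SPACE, XYZ_ID_HYPHEN]) + ")"
)


def normalize(candidate: str) -> Optional[str]:
    """Return the 13 digits if candidate uses one of the three formats."""
    if ANY_FORMAT.fullmatch(candidate) is None:
        return None
    return re.sub(r"[^0-9]", "", candidate)
```

Because each format has its own pattern, a candidate that mixes separators, such as 2001-03 14-41237, matches none of them.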

Assigning a Unique Identifier for the Data Type

The final GLASS expression is assigned a unique identifier using the LABEL operator (LABEL 'NID_XYZ_ID'). Any matches in Enterprise Recon for the XYZ national ID data type will be reported under this label.

Scoring Rules for National ID Numbers with Separators

The following rules are added for the XYZ national ID number formats with whitespace and hyphen separators.

SCORE 'EXCLUDE_ADJACENT_DASH_ANY' -10 BEFORE WORD '-'
SCORE 'EXCLUDE_ADJACENT_DASH_ANY' -10 AFTER WORD '-'
SCORE 'EXCLUDE_ADJACENT_SPACE_DIGIT' -10 BEFORE WORD ' ' THEN RANGE DIGIT
SCORE 'EXCLUDE_ADJACENT_SPACE_DIGIT' -10 AFTER WORD ' ' THEN RANGE DIGIT

The RANK and SCORE rules strengthen the main GLASS rules by excluding matches that have either:

  • a space or tab (\t) character with a digit beyond it, or
  • a hyphen (-) character

immediately before or after a potential 13-digit XYZ national ID number. For example:

0 2001 03 14 41237 0
0-2001-03-14-41237-0
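
One way to picture the exclusion is as a check on the characters that surround a candidate match. The Python sketch below encodes one reading of the rules, based on the two example lines: a hyphen-separated candidate is dropped if a hyphen touches it, and a space-separated candidate is dropped if a space or tab with a digit beyond it touches it. Treating the negative score as a plain yes/no exclusion is a simplification, and is_excluded is a hypothetical name.

```python
import re


def is_excluded(text: str, start: int, end: int) -> bool:
    """Return True if the candidate text[start:end] has a disqualifying neighbour."""
    matched = text[start:end]
    before = text[:start]
    after = text[end:]
    if "-" in matched:
        # Hyphen-separated format: any adjacent hyphen excludes the match.
        return before.endswith("-") or after.startswith("-")
    if " " in matched or "\t" in matched:
        # Space-separated format: a separator with a digit beyond it excludes the match.
        return (
            re.search(r"[0-9][ \t]$", before) is not None
            or re.match(r"[ \t][0-9]", after) is not None
        )
    # The straight 13-digit format has no adjacency rule in this sketch.
    return False
```

Applied to the two example lines, with the candidate spanning characters 2 to 18, both candidates are excluded, while the same candidate surrounded by ordinary words is kept.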

Match Samples

2001031441237
2001 03 14 41237
2001-03-14-41237