Overview

Terminology

The following terms are used throughout this document:

  • GLASS scanning engine and engine can be used interchangeably.
  • GLASS expressions, custom expressions, and expressions can all be used interchangeably.
  • Pattern is a valid combination of expressions.
  • Byte and octet have the same meaning and represent 8 bits.

What is an "octet"?

An octet is a digital unit that represents a sequence of 8 bits. For example, the octet value 00110000 represents the ASCII digit 0.
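
As a quick check of the example above, the short Python sketch below converts the bit string to its numeric value and to the corresponding ASCII character. Python is used purely for illustration here; it is not part of GLASS.

```python
# Interpret the 8-bit string from the example as a single octet.
bits = "00110000"
value = int(bits, 2)            # numeric value of the octet

print(value)                    # 48
print(hex(value))               # 0x30
print(chr(value))               # 0 (the ASCII digit zero)
print(bytes([value]) == b"0")   # True
```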

Syntax Notation Conventions

The following conventions are used when describing the GLASS syntax:

  • Angle brackets (< >) indicate a placeholder for a required parameter. This value must be provided when defining a GLASS expression.
  • Square brackets ([ ]) indicate an optional parameter. If omitted, the GLASS expression uses the default behaviour for the operator.

GLASS Language

GLASS is the language used internally by the GLASS engine to accurately and efficiently detect sensitive data.

By default, the GLASS language is Unicode agnostic, and the engine operates at the octet level unless specified otherwise. This means that any expression that maps to a UTF-8 encoded stream matches the corresponding octet (byte) sequence if it is present in the input stream (that is, the data being scanned).

This property of the GLASS language means that the input GLASS expression can be specified in the native language of the custom data pattern that the author of the expression is trying to match.

Example

The word world in Chinese is 世界. The GLASS expression to search for the phrase Hello, world! in Chinese can be written as a:

  • UTF-8 encoded expression, or
    WORD 'Hello 世界!'
    
  • UTF-8 encoded octet sequence (code units).
    WORD 'Hello \xE4\xB8\x96\xE7\x95\x8C!'
    

Both expressions above match the octet sequence below (shown as a hex dump):

00000000: 48 65 6C 6C 6F 20 E4 B8  96 E7 95 8C 21 0A        Hello ......!.
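
The equivalence of the two forms can be checked outside GLASS. The Python sketch below (illustrative only, not GLASS code) encodes the native-script text as UTF-8 and compares it with the explicitly escaped octet sequence. The trailing 0A in the dump above appears to be a line feed in the sample input and is not part of either expression.

```python
# The expression text written in its native script.
native = "Hello 世界!"

# The same text written as explicit UTF-8 octets (code units).
escaped = b"Hello \xE4\xB8\x96\xE7\x95\x8C!"

encoded = native.encode("utf-8")

print(encoded == escaped)         # True
print(encoded.hex(" ").upper())   # 48 65 6C 6C 6F 20 E4 B8 96 E7 95 8C 21
```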

GLASS Grammar and Syntax

GLASS expressions are written using a combination of operators and values to search for specific sequences of data.

  • Operators are keywords that instruct the scanning engine to perform a given function. For example, WORD, TIMES, EXCLUDE.
  • Values are string literals or integers that describe what data to look for. For example, a-z, John Doe.

GLASS Expressions

All GLASS expressions must follow these basic rules.

  1. An expression is a combination of operators and values. Each expression must start on a new line.
    • For readability, a single expression can be split across multiple lines by ending a line with a backslash \ character.
    • If the backslash \ character is the last character on a line, the GLASS compiler treats the following line as part of the expression on the previous line.

    The example below forms a single expression:

    # MAP namespaces can be written across multiple lines for readability.
    MAP NOCASE 'ACME_CUST_ID_CONTEXTS' \
      'cust id', 'custid', 'customer', 'client', 'cliente', 'kunde', '고객'
    
    # Long GLASS expressions can also be split into multi-line expressions.
    GROUP 'ACME_CUST_ID_CCTLD' THEN \
    (RANGE DIGIT TIMES 9 EXCLUDE 'INVALID_SERIAL_NUM') THEN \
    RANGE DIGIT
    
  2. Operators and values are separated by one or more blank spaces.

  3. Comments can be added to GLASS patterns to explain the implementation and to improve the readability of the code.
    • Start a comment with the hash # character. Any character(s) after the # sign until the end of the line will be ignored by the GLASS compiler.
      # This is a comment.
      WORD 'ID' # All text after the hash symbol will be ignored by the compiler.
      
  4. Blank lines are ignored by the compiler.
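
The rules above can be pictured as a small preprocessing step. The Python sketch below is not the GLASS compiler and makes several assumptions that the rules do not state: comments are removed before the trailing backslash is checked, a # inside a quoted string is treated as literal text, and escaped quote characters are not handled. The function names strip_comment and logical_expressions are hypothetical.

```python
def strip_comment(line):
    """Drop everything from the first unquoted '#' to the end of the line."""
    quote = None
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:          # closing quote ends the string literal
                quote = None
        elif ch in ("'", '"'):       # opening quote starts a string literal
            quote = ch
        elif ch == "#":              # assumption: '#' in quotes is literal
            return line[:i]
    return line


def logical_expressions(source):
    """Return one string per expression, joining backslash-continued lines."""
    expressions = []
    pending = ""
    for raw in source.splitlines():
        line = strip_comment(raw).rstrip()
        if line.endswith("\\"):
            # Continuation: keep the text and wait for the next line.
            pending += line[:-1].strip() + " "
            continue
        pending += line.strip()
        if pending.strip():          # blank and comment-only lines are skipped
            expressions.append(pending.strip())
        pending = ""
    if pending.strip():              # source ended on a continued line
        expressions.append(pending.strip())
    return expressions


sample = "\n".join([
    "# MAP namespaces can be written across multiple lines.",
    "MAP NOCASE 'ACME_CUST_ID_CONTEXTS' \\",
    "  'cust id', 'custid'",
    "",
    "WORD 'ID' # trailing comment",
])

for expression in logical_expressions(sample):
    print(expression)
```

With the sample input, the sketch yields two expressions: the MAP expression joined onto a single line, and WORD 'ID' with its trailing comment removed.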

Values

Values are string literals or integers that, when used with the appropriate operators, function as search terms. Preset Keywords may be used in place of the literal ranges that they represent anywhere in a GLASS pattern or expression.

Values that are enclosed in single ('') or double quotes ("") are processed as string literals.

# The RANGE search term ('ABC') is enclosed in single quotes.
# The value (1-3) passed to the TIMES operator does not need to be enclosed in
# single / double quotes.
RANGE 'ABC' TIMES 1-3

# In the namespace NS0 below, the first key will be processed as the integer
# value 1, while the second key (enclosed in single quotes) will be processed
# as the literal string 00_01.
MAP 'NS0' 00_01, '00_01'

# Preset keywords can be used in place of the equivalent literal ranges.
RANGE ALNUM
RANGE '0-9a-zA-Z'
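
To make the quoted versus unquoted distinction concrete, the Python sketch below classifies a single token the way the comments above describe. It is an illustration only: the function name classify_value and the "other" category (used for anything that is neither quoted nor a plain integer, such as a preset keyword) are assumptions, not part of GLASS.

```python
import re

# Assumed shape of an unquoted integer: a digit, then digits or underscores.
UNQUOTED_INT = re.compile(r"[0-9][0-9_]*")


def classify_value(token):
    """Return a (kind, value) pair for one token of a GLASS expression."""
    is_quoted = (
        len(token) >= 2
        and token[0] == token[-1]
        and token[0] in ("'", '"')
    )
    if is_quoted:
        # Quoted tokens are kept verbatim as string literals.
        return ("string", token[1:-1])
    if UNQUOTED_INT.fullmatch(token):
        # Unquoted digit tokens are reduced to their numeric value.
        return ("integer", int(token.replace("_", "")))
    return ("other", token)


print(classify_value("00_01"))     # ('integer', 1)
print(classify_value("'00_01'"))   # ('string', '00_01')
print(classify_value("ALNUM"))     # ('other', 'ALNUM')
```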

Integers

Integers in the GLASS grammar are sequences of one or more ASCII digits, each in the inclusive range of 0-9.

Optionally, you can separate the digits using the underscore (_) character after the first digit for readability.

For example, the integers from Line 1 to Line 5 are equivalent. All lines are processed by the GLASS parser as 12345.

1 12345
2 1_2_3_4_5
3 12_3_45
4 12345_
5 1_2_3_4_5_
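
The underscore rule can be mimicked in a few lines of Python. This is a sketch of the described behaviour, not the GLASS parser: the function name parse_integer is hypothetical, and the pattern assumes that consecutive underscores are accepted, which the text above does not confirm. The underscores are removed explicitly because Python's own int() rejects a trailing underscore.

```python
import re

# Assumed shape: one leading digit, then any mix of digits and underscores.
INTEGER = re.compile(r"[0-9][0-9_]*")


def parse_integer(text):
    """Return the numeric value of an underscore-separated integer."""
    if not INTEGER.fullmatch(text):
        raise ValueError(f"not a valid integer literal: {text!r}")
    return int(text.replace("_", ""))


for literal in ["12345", "1_2_3_4_5", "12_3_45", "12345_", "1_2_3_4_5_"]:
    print(literal, "->", parse_integer(literal))   # each line ends in 12345
```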

Leading Zeros

When integers with leading zeros are processed by the GLASS parser, the resulting value is equivalent to the numeric value of the integer.

For example, the integers from Line 1 to Line 3 are equivalent. All lines are processed by the GLASS parser as 1.

1 1
2 01
3 00_01

Signed Integers

Certain operators support both positive and negative integers. A negative integer is defined by prepending the minus sign (-), which is ASCII character 0x2D. Integers are positive by default, unless the minus sign is present.
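
As an illustration of how the sign combines with the leading-zero and underscore rules, the Python sketch below parses an optionally negative integer. It is not the GLASS parser: the function name parse_signed is hypothetical, and accepting a literal such as -00_42 is an assumption based on combining the rules described in this section.

```python
MINUS = chr(0x2D)        # the ASCII minus sign, '-'
DIGITS = "0123456789"


def parse_signed(text):
    """Return the value of an integer with an optional leading minus sign."""
    negative = text.startswith(MINUS)
    body = text[1:] if negative else text
    valid = (
        body != ""
        and body[0] in DIGITS                        # must start with a digit
        and all(ch in DIGITS + "_" for ch in body)   # digits and underscores
    )
    if not valid:
        raise ValueError(f"not a valid integer literal: {text!r}")
    magnitude = int(body.replace("_", ""))           # leading zeros drop out
    return -magnitude if negative else magnitude


print(parse_signed("5"))        # 5
print(parse_signed("-5"))       # -5
print(parse_signed("-00_42"))   # -42
```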