The following terms are used throughout this document:
An octet is a digital unit that represents a sequence of 8 bits. For example, the octet value 00110000 represents the ASCII digit 0.
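The octet example above can be checked with a few lines of Python. This is only an illustration of the bit pattern and is not part of GLASS itself:

```python
# The octet 00110000 written as a binary literal.
octet = 0b00110000

# The same value in hexadecimal and decimal.
assert octet == 0x30 == 48

# Interpreted as ASCII, the octet is the digit 0.
print(chr(octet))               # 0

# Going the other way: the ASCII digit 0 as an 8-bit pattern.
print(format(ord("0"), "08b"))  # 00110000
```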
The following conventions are used when describing the GLASS syntax:
Angle brackets (< >) indicate a placeholder for a required parameter. This value must be provided when defining a GLASS expression.
Square brackets ([ ]) indicate an optional parameter. If omitted, the GLASS expression uses the default behaviour for the operator.
GLASS is the language used internally by the GLASS engine to accurately and efficiently detect sensitive data.
By default, the GLASS language is Unicode-agnostic and the engine operates at the octet level unless specified otherwise. This means any expression that maps to a UTF-8 encoded stream will match the required octet (byte) sequences if they are present in the input stream (that is, the data you are scanning).
This property of the GLASS language means that the input GLASS expression can be specified in the native language of the custom data pattern that the author of the expression is trying to match.
The word "world" in Chinese is 世界. The GLASS expression to search for the phrase "Hello, world!" with "world" written in Chinese can be written in either of the following ways:
WORD 'Hello 世界!'
WORD 'Hello \xE4\xB8\x96\xE7\x95\x8C!'
Both the expressions above will match the octet sequence (shown as a hex dump) below:
00000000: 48 65 6C 6C 6F 20 E4 B8 96 E7 95 8C 21 0A Hello ......!.
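The equivalence of the two expressions follows from UTF-8 encoding. The Python sketch below is an illustration only and does not use the GLASS engine; it shows that the literal phrase and the escaped form produce the same octets, and that those octets appear in the dumped input, which additionally ends with a newline (0A):

```python
# The phrase as written in the first expression.
literal = "Hello 世界!"

# The octets spelled out in the second expression.
escaped = b"Hello \xE4\xB8\x96\xE7\x95\x8C!"

# Encoding the literal as UTF-8 yields exactly the escaped octets.
encoded = literal.encode("utf-8")
assert encoded == escaped

# The input stream from the hex dump: the phrase followed by a newline.
stream = bytes.fromhex("48656C6C6F20E4B896E7958C210A")
assert escaped in stream

print(encoded.hex(" ").upper())
# 48 65 6C 6C 6F 20 E4 B8 96 E7 95 8C 21
```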
GLASS expressions are written using a combination of operators and values, to search for specific sequences of data.
All GLASS expressions must follow these basic rules.
Long expressions can be split across multiple lines with the \ character. If the \ character is the last character on a line, the GLASS compiler treats the following line as part of the expression on the previous line. Each example below forms a single expression:
# MAP namespaces can be written across multiple lines for readability.
MAP NOCASE 'ACME_CUST_ID_CONTEXTS' \
'cust id', 'custid', 'customer', 'client', 'cliente', 'kunde', '고객'
# Long GLASS expressions can also be split into multi-line expressions.
GROUP 'ACME_CUST_ID_CCTLD' THEN \
(RANGE DIGIT TIMES 9 EXCLUDE 'INVALID_SERIAL_NUM') THEN \
RANGE DIGIT
Operators and values are separated by one or more blank spaces.
Comments begin with the # character. Any characters after the # sign until the end of the line are ignored by the GLASS compiler.
# This is a comment.
WORD 'ID' # All text after the hash symbol will be ignored by the compiler.
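The line continuation and comment rules can be pictured as a small preprocessing step. The Python sketch below is not the GLASS compiler; the function names `strip_comment` and `logical_lines` are hypothetical, and it makes assumptions the text above does not settle: continuations are joined before comments are removed, a # inside a quoted string is kept as part of the string, and escape sequences inside strings are not interpreted.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first '#' that is outside quotes (assumed rule)."""
    quote = None
    for index, char in enumerate(line):
        if quote:
            if char == quote:      # closing quote ends the string literal
                quote = None
        elif char in "'\"":        # opening single or double quote
            quote = char
        elif char == "#":          # comment starts here
            return line[:index]
    return line


def logical_lines(source: str) -> list[str]:
    """Join lines ending in a backslash, then remove comments and blank lines."""
    expressions = []
    pending = ""
    for raw in source.splitlines():
        if raw.endswith("\\"):     # backslash is the last character on the line
            pending += raw[:-1]
            continue
        text = strip_comment(pending + raw).strip()
        pending = ""
        if text:
            expressions.append(text)
    tail = strip_comment(pending).strip()  # source ended on a continuation
    if tail:
        expressions.append(tail)
    return expressions


source = (
    "# This is a comment.\n"
    "WORD 'ID' # trailing comment\n"
    "GROUP 'ACME_CUST_ID_CCTLD' THEN \\\n"
    "(RANGE DIGIT TIMES 9 EXCLUDE 'INVALID_SERIAL_NUM') THEN \\\n"
    "RANGE DIGIT\n"
)
for expression in logical_lines(source):
    print(expression)
```

Under these assumptions the five physical lines reduce to two expressions: the WORD expression without its comment, and the GROUP expression joined onto one line.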
Values are string literals or integers that, when used with the appropriate operators, function as search terms. Preset Keywords may be used in place of the literal ranges that they represent anywhere in a GLASS pattern or expression.
Values that are enclosed in single ('') or double ("") quotes are processed as string literals.
# The RANGE search term ('ABC') is enclosed in single quotes.
# The value (1-3) passed to the TIMES operator does not need to be enclosed in
# single / double quotes.
RANGE 'ABC' TIMES 1-3
# In the namespace NS0 below, the first key will be processed as the integer
# value 1, while the second key (enclosed in single quotes) will be processed
# as the literal string 00_01.
MAP 'NS0' 00_01, '00_01'
# Preset keywords can be used in place of the equivalent literal ranges.
RANGE ALNUM
RANGE '0-9a-zA-Z'
Integers in the GLASS grammar are sequences of ASCII digits in the inclusive range of 0-9.
Optionally, you can separate the digits using the underscore (_) character after the first digit for readability.
For example, the integers from Line 1 to Line 5 are equivalent. All lines are processed by the GLASS parser as 12345.
1 | 12345 |
2 | 1_2_3_4_5 |
3 | 12_3_45 |
4 | 12345_ |
5 | 1_2_3_4_5_ |
When integers with leading zeros are processed by the GLASS parser, the resulting value is equivalent to the numeric value of the integer.
For example, the integers from Line 1 to Line 3 are equivalent. All lines are processed by the GLASS parser as 1.
1 | 1 |
2 | 01 |
3 | 00_01 |
Certain operators support both positive and negative integers. Negative integers are defined by prepending the minus sign (-), which is ASCII character 0x2D. Integers are positive unless the minus sign is present.
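The integer rules above (underscores after the first digit, leading zeros, optional minus sign) can be summarised in a short Python sketch. The function `parse_glass_int` is hypothetical and is not the GLASS parser; it assumes that a token starting with an underscore is rejected, because the text only permits underscores after the first digit.

```python
import re

# Optional minus sign, one digit, then any mix of digits and underscores.
_INTEGER = re.compile(r"-?[0-9][0-9_]*")


def parse_glass_int(token: str) -> int:
    """Return the numeric value of an integer token (illustrative only)."""
    if not _INTEGER.fullmatch(token):
        raise ValueError(f"not an integer token: {token!r}")
    # Underscores are readability separators; leading zeros do not
    # change the numeric value.
    return int(token.replace("_", ""))


for token in ("12345", "1_2_3_4_5", "12_3_45", "12345_", "1_2_3_4_5_"):
    assert parse_glass_int(token) == 12345

for token in ("1", "01", "00_01"):
    assert parse_glass_int(token) == 1

assert parse_glass_int("-4_2") == -42
```

Python's built-in `int()` is not used directly on the raw token because it rejects a trailing underscore such as `12345_`, which the text above treats as valid.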