RANK and SCORE

Overview of the RANK and SCORE Operators

Every GLASS rule / expression has an implicit score of zero associated with it.

When an input stream is matched against a rule, the GLASS engine inspects the rule's score before deciding whether to generate a match report.

For a match report to be generated, the rule's score must be greater than or equal to zero (≥0). A negative score (<0) stops the GLASS engine from generating a match report for that rule.

This scoring system helps prevent false positives and other undesirable matches from being reported.

The GLASS language has two operators that allow the expression writer to influence the score for the rule: RANK and SCORE. Both RANK and SCORE operators work with the concept of namespaces. These namespaces are distinct from the mapping operator namespaces (MAP) and are used to define a name for a set of scores.

The generic GLASS syntax for RANK and SCORE is:

SCORE '<namespace>' <integer> <BEFORE|AFTER|INSIDE> <expression or pattern>

<expression or pattern> RANK <score modifier> ['<namespace>']

SCORE Operator

SCORE '<namespace>' <integer> <BEFORE|AFTER|INSIDE> <expression or pattern>
...

The SCORE operator either (i) creates a new scoring namespace, or (ii) appends to an existing scoring namespace if it had been previously defined.

Once a scoring namespace is defined, a rule can be associated with it using the RANK operator. A rule can be associated with multiple scoring namespaces at the same time. Each namespace associated with the rule either increments (+) or decrements (-) the rule's score.

If the <expression or pattern> is successfully matched, then the <integer> will be added to the rule's score. The <integer> can be a negative value in order to reduce the rule's score.
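
The accumulation rule can be sketched in Python to make the arithmetic explicit. This is a simplified model, not the GLASS engine: it assumes each SCORE expression has already been reduced to a namespace, an integer, and the number of times its pattern matched (Example 6 below suggests that each occurrence counts). The names `ScoreExpr` and `namespace_scores` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoreExpr:
    """One SCORE line reduced to its outcome (hypothetical model)."""
    namespace: str
    integer: int   # the SCORE <integer>; negative values lower the score
    matches: int   # assumed: how many times the <expression or pattern> matched


def namespace_scores(exprs: List[ScoreExpr]) -> Dict[str, int]:
    """Total the matched SCORE integers per namespace.

    Reusing a namespace name appends to it, so every expression sharing
    the name feeds the same total. A namespace none of whose expressions
    matched keeps a score of 0.
    """
    totals: Dict[str, int] = {}
    for e in exprs:
        totals[e.namespace] = totals.get(e.namespace, 0) + e.integer * e.matches
    return totals
```

For instance, `namespace_scores([ScoreExpr('SN1', 10, 1), ScoreExpr('SN1', 10, 0), ScoreExpr('SN2', -5, 1)])` returns `{'SN1': 10, 'SN2': -5}`: the unmatched SN1 expression contributes nothing, and the negative integer lowers SN2.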

Understanding the SCORE Operator

There are 3 variations of the SCORE operator:

(Figure: the GLASS SCORE rule with BEFORE, AFTER, and WITHIN)

  1. BEFORE
    The <expression or pattern> must match immediately before the section of the input stream that has matched a rule, unless the WITHIN operator is specified. The pattern is matched in reverse, reading leftwards from the start of the matched section.

  2. AFTER
    The <expression or pattern> must match immediately after the section of the input stream that has matched a rule, unless the WITHIN operator is specified.

  3. INSIDE
    The engine attempts to match the <expression or pattern> within the section of the input stream that has matched a rule. For a successful match, the <expression or pattern> must match that section in its entirety. If the <expression or pattern> needs to match in the middle of the matched section, pad it appropriately using either RANGE with the TIMES operator or the WITHIN operator.

RANK and SCORE Example 1

SCORE 'SN0' +1 BEFORE \
  WORD 'Foo' THEN WORD 'Bar' THEN WORD 'Baz'

WORD 'Hello, World!' RANK -1 'SN0'

Based on the above GLASS expression, only line 1 below will be returned by the engine as a match, with a final score of 0.

1  BazBarFooHello, World!
2  FooBarBazHello, World!

Understanding RANK and SCORE Example 1

In Example 1 above, the basic expression / rule (WORD 'Hello, World!') starts with a score of 0. The score is adjusted to -1 (score = 0 - 1 = -1) with the RANK operator, and associated with the scoring namespace SN0 (...RANK -1 'SN0').

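The GLASS rule is then evaluated against the scoring namespace SN0, which uses the BEFORE variation of the SCORE operator to increase the score by 1 if the terms Foo, Bar, and Baz are found immediately to the left of the main match. Because BEFORE matches in reverse, the terms must appear in reverse order (BazBarFoo<main expression / rule>), giving a final score of 0 - 1 + 1 = 0 for line 1. Line 2 has the terms in forward order, so SN0 does not match, the score stays at -1, and no match report is generated.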
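
To make the reverse ordering concrete, here is a simplified Python model of Example 1. It is a sketch under assumptions, not the engine's algorithm: words are compared as exact substrings, strict adjacency is required (no WITHIN), and `before_matches` and `example1_score` are hypothetical names.

```python
from typing import List, Optional


def before_matches(text: str, match_start: int, words: List[str]) -> bool:
    """Simplified model of a BEFORE pattern 'W1 THEN W2 THEN ... Wn'.

    Reading leftwards from the start of the main match, W1 must appear
    first (immediately adjacent), then W2 to its left, and so on. Each
    word keeps its normal spelling; only the sequence order is reversed.
    """
    pos = match_start
    for word in words:
        if not text[:pos].endswith(word):
            return False
        pos -= len(word)  # step left past the word just matched
    return True


def example1_score(text: str) -> Optional[int]:
    """Score of WORD 'Hello, World!' RANK -1 'SN0', or None if no main match."""
    start = text.find("Hello, World!")
    if start < 0:
        return None
    sn0 = 1 if before_matches(text, start, ["Foo", "Bar", "Baz"]) else 0
    return sn0 + (-1)  # namespace score plus the RANK modifier
```

Under this model, `example1_score("BazBarFooHello, World!")` returns 0 (reported) and `example1_score("FooBarBazHello, World!")` returns -1 (not reported).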

RANK and SCORE Example 2

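Since the BEFORE variant of the SCORE operator matches in reverse, BOUND rules also have to be applied in reverse. In the example below, the '<' that precedes foo in the input stream is specified with BOUND RIGHT, because it lies to the right of foo in the reversed reading direction.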

SCORE 'SN1' +1 BEFORE \
  WORD 'foo' BOUND RIGHT '<'
SCORE 'SN1' +1 AFTER \
  WORD 'baz' BOUND RIGHT '>'

WORD 'bar' RANK -2 'SN1'

Based on the above GLASS expression, only line 1 below will be returned by the engine as a match, with a final score of 0 (score = -2 + 1 + 1 = 0).

1  <foobarbaz>
2  -foobarbaz-
3  <bazbarfoo>
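
A simplified Python model of Example 2 shows how the reversed BOUND plays out on the three sample lines. It is a sketch under assumptions, not the engine: words and bound characters are compared as adjacent literal text, and `example2_score` is a hypothetical name.

```python
from typing import Optional


def example2_score(text: str) -> Optional[int]:
    """Simplified model of Example 2: WORD 'bar' RANK -2 'SN1'.

    SCORE 'SN1' +1 BEFORE WORD 'foo' BOUND RIGHT '<':
      read leftwards from the match, 'foo' comes first and the '<' bound
      follows it in reading order, so in the text '<' sits left of 'foo'.
    SCORE 'SN1' +1 AFTER WORD 'baz' BOUND RIGHT '>':
      read rightwards, so '>' sits right of 'baz' as usual.
    """
    start = text.find("bar")
    if start < 0:
        return None
    end = start + len("bar")
    sn1 = 0
    if text[:start].endswith("<foo"):  # BEFORE: bound applied in reverse
        sn1 += 1
    if text[end:].startswith("baz>"):  # AFTER: bound applied normally
        sn1 += 1
    return sn1 + (-2)  # namespace score plus the RANK modifier
```

In this model, lines 2 and 3 both end at -2: line 2 has the wrong bound characters, and line 3 has foo and baz on the wrong sides of bar.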

RANK Operator

...
<expression or pattern> RANK <score modifier> '<namespace>'
<expression or pattern> RANK '<namespace>'

The RANK operator associates one or more scoring namespaces with a rule, and optionally sets a score modifier that contributes to the rule's final score.

If the <score modifier> is specified, then that score is added to the final score for the rule after all the matching SCORE expressions for a scoring namespace have been evaluated.

If a rule is associated with more than one scoring namespace, then all of the RANK score modifiers are summed, and the score from each associated scoring namespace is added to that sum to give the rule's final score.

A RANK specified without a <score modifier> is equivalent to a RANK with the <score modifier> of zero.

RANK and SCORE Example 3

<scoring rules>

WORD 'Marvel Avengers' RANK +5 'SN_Hawkeye' RANK -20 'SN_Superman' RANK 10 'SN_IronMan'

Without defining any scoring rules, the initial score for Example 3 is:

0 + 5 - 20 + 10 = -5
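
The arithmetic for Example 3 can be checked with a small Python model of how the final score is assembled. It is a sketch, not the engine: it assumes the per-namespace scores have already been computed (and remapped, where applicable), and `final_score` and `is_reported` are hypothetical names.

```python
from typing import Dict, List


def final_score(rank_modifiers: List[int], namespace_scores: Dict[str, int]) -> int:
    """Final score in this simplified model: the implicit 0, plus every
    RANK <score modifier> (0 when omitted), plus the already-remapped
    score of each associated scoring namespace."""
    return 0 + sum(rank_modifiers) + sum(namespace_scores.values())


def is_reported(score: int) -> bool:
    """A match report is generated only when the final score is >= 0."""
    return score >= 0


# Example 3 with no SCORE expressions matching (all namespaces at 0).
score = final_score([5, -20, 10], {"SN_Hawkeye": 0, "SN_Superman": 0, "SN_IronMan": 0})
print(score, is_reported(score))  # -5 False
```

Under this model, the scoring rules for Example 3 would need to contribute at least +5 in total before the match is reported.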

RANK and SCORE Example 4

SCORE 'SN2' +10 INSIDE WORD 'bar' WITHIN 20 BYTES
(WORD 'foo' THEN WORD 'baz' WITHIN 20 BYTES) RANK -10 'SN2'

Based on the above GLASS expression, only line 1 and line 3 below will be returned by the engine as a match, with a final score of 0.

Line 5 is not reported because the word bar cannot be found inside the section matched by the main GLASS rule, so the scoring rule in SN2 does not match and the score remains at -10.

In line 7, the word baz cannot be found within 20 bytes of the word foo. This fails the main GLASS rule, so the scoring rule is not evaluated.

1  foobarbaz
2  ...Filler string of more than 20 bytes...
3  foo  bar  baz
4  ...Filler string of more than 20 bytes...
5  foobaz
6  ...Filler string of more than 20 bytes...
7  foo         bar         baz
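
The outcomes for the sample lines can be reproduced with a simplified Python model. The interpretation of WITHIN 20 BYTES used here (the gap between the end of foo and the start of baz is at most 20 bytes) is an assumption, as is the hypothetical function name `example4_score`.

```python
import re
from typing import Optional


def example4_score(text: str, within: int = 20) -> Optional[int]:
    """Simplified model of Example 4.

    Main rule (WORD 'foo' THEN WORD 'baz' WITHIN 20 BYTES) is modelled as
    'baz' starting no more than `within` bytes after 'foo' ends (ASCII
    input, so one character is one byte). SCORE 'SN2' +10 INSIDE WORD
    'bar' adds 10 when 'bar' occurs inside the matched section.
    Returns None when the main rule does not match.
    """
    m = re.search(r"foo(.{0,%d}?)baz" % within, text)
    if m is None:
        return None  # main rule failed; scoring is never evaluated
    sn2 = 10 if "bar" in m.group(1) else 0
    return sn2 + (-10)  # namespace score plus the RANK modifier
```

Lines 1 and 3 give 0, line 5 gives -10, and line 7 gives None because the main rule fails.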

Remapping Functions for SCORE and RANK

In a RANK operator, the namespace is specified as a literal containing the exact name of a namespace created with the SCORE operator, optionally followed by up to 15 remapping functions.

SCORE '<namespace>' <integer> <BEFORE|AFTER|INSIDE> <expression or pattern>

<expression or pattern> RANK <score modifier> '<namespace>(%<remap from integer>:<remap to integer>=<remap value integer>){1-15}'

A remapping function specifies an inclusive integer range and a fixed value: any namespace score that falls within the range is replaced by that value. The remapping functions are applied after all of the SCORE expressions for a given namespace have been evaluated. The remapping function ranges can overlap, in which case the last matching range takes precedence.

The remapping functions for a scoring namespace still execute even if no SCORE expression has matched. This makes it possible to adjust the zero score on a per-scoring-namespace basis.

RANK and SCORE Example 5

# Assume that the namespace 'SN0' has been previously defined.
WORD 'foo' RANK 'SN0%-100:-10=10%10:100=-10'

The RANK operator specifies that, for the final score of the scoring namespace SN0:

  • Any final score that is in the inclusive range [-100:-10] will be converted to 10, and
  • Any score in the inclusive range [10:100] will be converted to -10.
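
The remapping behaviour for Example 5 can be sketched in Python. This is a hypothetical model of the rules described above, not the engine's parser: `parse_rank_namespace`, `remap`, and the tuple layout are illustrative assumptions.

```python
from typing import List, Tuple

Rule = Tuple[int, int, int]  # (range start, range end, remap value)


def parse_rank_namespace(spec: str) -> Tuple[str, List[Rule]]:
    """Split 'NAME%from:to=value%...' into the namespace and its rules.

    Hypothetical parser for the RANK namespace literal; assumes the
    namespace name itself contains no '%'.
    """
    name, *funcs = spec.split("%")
    if len(funcs) > 15:
        raise ValueError("at most 15 remapping functions are allowed")
    rules: List[Rule] = []
    for func in funcs:
        bounds, value = func.split("=")
        start, end = bounds.split(":")
        rules.append((int(start), int(end), int(value)))
    return name, rules


def remap(score: int, rules: List[Rule]) -> int:
    """Apply remapping rules to a namespace score.

    Ranges are inclusive and may overlap; each rule is tested against the
    original score and the last matching rule wins. A score matching no
    rule (including 0 when no SCORE expression matched) is unchanged.
    """
    result = score
    for start, end, value in rules:
        if start <= score <= end:
            result = value
    return result
```

With the rules parsed from 'SN0%-100:-10=10%10:100=-10', `remap(-50, rules)` gives 10, `remap(50, rules)` gives -10, and `remap(0, rules)` leaves an unmatched namespace at 0. Testing every rule against the original score, rather than chaining outputs, is one reading of "the last matching range takes precedence".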

RANK and SCORE Example 6

SCORE 'SN0' +10 BEFORE WORD 'bar' WITHIN 20 BYTES
SCORE 'SN1' +10 INSIDE WORD 'bar' WITHIN 20 BYTES
SCORE 'SN1' +10 BEFORE WORD 'xyz' WITHIN 20 BYTES
SCORE 'SN2' +10 AFTER  WORD 'bar' WITHIN 20 BYTES

ALIAS 'FooBaz' (WORD 'foo' THEN WORD 'baz' WITHIN 20 BYTES) RANK -10
REFER 'FooBaz' RANK 'SN0' RANK 'SN1%10:100=5' RANK 'SN2'

Based on the above GLASS expression, only line 5 and line 7 below will be returned as a match with a score of 0 and 10 respectively.

1  foo012345678901234567890baz
2  ...Filler string of more than 20 bytes...
3  foobarbaz
4  ...Filler string of more than 20 bytes...
5  foo<baz>bar
6  ...Filler string of more than 20 bytes...
7  foo<baz>bar--bar

Input string: foo012345678901234567890baz
Match: -
Final score: - (not evaluated)
Notes: The input string fails the main GLASS expression (WORD 'foo' THEN WORD 'baz' WITHIN 20 BYTES) because the word baz is not within 20 bytes of the word foo. As a result, the scoring namespaces are not evaluated.

Input string: foobarbaz
Match: - (not reported)
Final score: -5
Notes: The input string starts with a score of -10 and is first evaluated against the scoring namespace SN0. It fails to match the scoring rules in SN0, so the score remains at -10. SN1 is evaluated next; the input string matches SN1 (SCORE 'SN1' +10 INSIDE WORD 'bar' WITHIN 20 BYTES), so the SN1 score starts at +10 and is remapped to +5. At this point, the overall score is -10 + 0 + 5 = -5. Evaluating SN2 leaves the score unchanged, because the word bar is not found after the foobarbaz match.

Input string: foo<baz>bar
Match: foo<baz
Final score: 0
Notes: The input string starts with a score of -10 and is evaluated against SN0, followed by SN1. It fails to match the scoring rules in SN0 or SN1, so the score remains at -10. SN2 is evaluated last; the input string matches SN2 (SCORE 'SN2' +10 AFTER WORD 'bar' WITHIN 20 BYTES), giving an overall score of -10 + 0 + 0 + 10 = 0.

Input string: foo<baz>bar--bar
Match: foo<baz
Final score: +10
Notes: Scoring proceeds as for foo<baz>bar, except that the word bar occurs twice within 20 bytes after the match. SN2 adds +10 for each occurrence, so the final score is -10 + 0 + 0 + 10 + 10 = +10.
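
The per-namespace arithmetic above can be reproduced with a short Python sketch. It takes the raw namespace scores from the notes as inputs rather than matching the text, and `remap` and `example6_total` are hypothetical names; the point it illustrates is that SN1 is remapped on its own before the namespace scores are summed.

```python
from typing import List, Tuple


def remap(score: int, rules: List[Tuple[int, int, int]]) -> int:
    """Inclusive-range remapping; the last matching range wins."""
    result = score
    for start, end, value in rules:
        if start <= score <= end:
            result = value
    return result


def example6_total(sn0: int, sn1: int, sn2: int) -> int:
    """Final score for REFER 'FooBaz' RANK 'SN0' RANK 'SN1%10:100=5' RANK 'SN2'.

    The arguments are the raw namespace scores before remapping; the -10
    comes from the RANK -10 on the ALIAS. Only SN1 carries a remap rule.
    """
    return -10 + sn0 + remap(sn1, [(10, 100, 5)]) + sn2
```

For foobarbaz the raw scores (0, 10, 0) yield -5; for foo<baz>bar, (0, 0, 10) yield 0; and for foo<baz>bar--bar, (0, 0, 20) yield +10.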

RANK and SCORE Example 7

MAP 'INVALID_ID' 0
MAP NOCASE 'PRIMARY_KEYWORDS' 'id', 'membership'
MAP NOCASE 'SECONDARY_KEYWORDS' 'account', 'number', 'customer'

SCORE 'PRIMARY_CONTEXTS' +10 BEFORE \
  GROUP 'PRIMARY_KEYWORDS' WITHIN 20 BYTES
SCORE 'SECONDARY_CONTEXTS' +5 BEFORE \
  GROUP 'SECONDARY_KEYWORDS' WITHIN 20 BYTES
SCORE 'SECONDARY_CONTEXTS' +5 AFTER \
  GROUP 'SECONDARY_KEYWORDS' WITHIN 20 BYTES

RANGE DIGIT TIMES 7 BOUND NONALNUM EXCLUDE 'INVALID_ID' \
RANK -10 RANK 'PRIMARY_CONTEXTS' RANK 'SECONDARY_CONTEXTS'

Based on the above GLASS expression, only line 1 and line 3 below will be returned as a match, both with a final score of 0.

Line 5 will not be returned as a match, because the scoring rules require either at least one primary keyword within 20 bytes before the 7-digit number, or at least two secondary keywords within 20 bytes before or after it. Line 5 contains only one secondary keyword (Customer), so its final score is -10 + 5 = -5.

1  Requested for customer account 1122334.
2  ...Filler string of more than 20 bytes...
3  Membership No.: 3344556.
4  ...Filler string of more than 20 bytes...
5  Customer #5566778 filed a request to...
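
The scores for the three sample lines can be approximated with a Python model that uses regular expressions in place of the GLASS operators. This is a sketch under assumptions, not the engine: the 20-byte windows, whole-word keyword matching, and counting every keyword occurrence (as the totals in Examples 6 and 7 suggest) are interpretations, and `example7_scores` is a hypothetical name.

```python
import re
from typing import List

# MAP NOCASE groups, matched case-insensitively as whole words.
PRIMARY = re.compile(r"\b(?:id|membership)\b", re.IGNORECASE)
SECONDARY = re.compile(r"\b(?:account|number|customer)\b", re.IGNORECASE)
# RANGE DIGIT TIMES 7 BOUND NONALNUM, approximated with look-arounds.
SEVEN_DIGITS = re.compile(r"(?<![0-9A-Za-z])[0-9]{7}(?![0-9A-Za-z])")


def example7_scores(text: str, window: int = 20) -> List[int]:
    """Score every 7-digit match in `text` under a simplified model.

    A keyword counts only if it is found, as a whole word, inside the
    `window`-byte slice before (or, for secondary keywords, after) the
    number; each occurrence adds its SCORE integer once. The
    EXCLUDE 'INVALID_ID' clause is not modelled.
    """
    scores = []
    for m in SEVEN_DIGITS.finditer(text):
        before = text[max(0, m.start() - window):m.start()]
        after = text[m.end():m.end() + window]
        primary = 10 * len(PRIMARY.findall(before))
        secondary = 5 * (len(SECONDARY.findall(before)) + len(SECONDARY.findall(after)))
        scores.append(-10 + primary + secondary)  # RANK -10 plus both namespaces
    return scores
```

It returns [0] for lines 1 and 3 and [-5] for line 5, matching the scores stated above.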