ThreatQ Pynoceros documentation

Configuration Driven Feeds

The Pynoceros codebase is responsible for handling feed execution and parsing. Configuration Driven Feeds (CDFs) allow a user to build powerful and robust definitions of how to ingest Threat Intelligence from a Feed provider. This section provides a general overview of writing CDF definitions and documents the options available when doing so.

Feed Runs

This section provides an overview of different Feed Run types that are available to users running Configuration Driven Feeds.

All Feed Runs have an associated time range to query. The start and end datetime values for the Feed Run’s query range are known as the since and until dates within CDFs. These values are available for use within a definition via Run Meta.

Scheduled Runs

Scheduled Runs are a Feed’s primary means of ingesting data and are scheduled and executed via the dynamo process.

The query time range for Scheduled Runs is based on a Feed’s last_import_at and its configured period (also known as frequency) or schedule. The schedule is a limited CRON string that can be used for advanced scheduling. The first time a Feed runs, its time range is set as:

  • since - The current time minus the Feed’s period or schedule interval.

  • until - The current time

When a Scheduled Run completes successfully, the Feed’s last_import_at is updated to the completed Run’s until datetime. The next Scheduled Run is then scheduled for the previous run’s until time plus the Feed’s period, or schedule interval. From then on, Scheduled Run query ranges are set as:

  • since - The Feed’s last_import_at

  • until - The Feed’s last_import_at plus the Feed’s period, or schedule interval.

If a Scheduled Run encounters an error or fails for some reason, the Feed’s last_import_at is not updated. If a Feed’s last_import_at falls behind due to large data throughput, Feed Run errors, the Feed being disabled, or any other reason, the Feed will perform consecutive Scheduled Runs until it catches back up to the current time.

For any given Feed, only one Scheduled Run may be in progress at a time.
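The catch-up behavior described above can be sketched in Python. This is a simplified model of the query-range bookkeeping, not dynamo's actual scheduler:

```python
from datetime import datetime, timedelta

def scheduled_run_windows(last_import_at, period_seconds, now):
    """Yield (since, until) query ranges until the feed catches up to `now`.

    Simplified model: each Scheduled Run covers
    [last_import_at, last_import_at + period], and a successful run
    advances last_import_at to the completed run's `until` time.
    """
    period = timedelta(seconds=period_seconds)
    since = last_import_at
    while since + period <= now:
        until = since + period
        yield since, until
        since = until  # last_import_at is updated on success

# A feed with a 1-hour period that fell 3 hours behind performs
# three consecutive runs to catch back up:
now = datetime(2024, 1, 1, 12, 0, 0)
windows = list(scheduled_run_windows(datetime(2024, 1, 1, 9, 0, 0), 3600, now))
```

Each yielded window becomes one Scheduled Run; a run that errors would simply leave `last_import_at` (and therefore `since`) unchanged.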

Note

Starting in ThreatQ version 5.3.0, first run behavior for CRON-scheduled feeds is determined entirely by the schedule. If a schedule is configured as 30 15 * * *, the feed will not run until the 30th minute of the 15th hour, System Time (UTC), regardless of last_import_at.
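For illustration, the first run time of a schedule like 30 15 * * * can be modeled with a minimal helper. This hypothetical sketch only understands fixed minute/hour fields and * wildcards; it is not the dynamo scheduler:

```python
from datetime import datetime, timedelta

def first_run(schedule, now):
    """Return the first run time at or after `now` for a limited CRON
    string of the form 'minute hour * * *' (numbers or '*' only).
    Illustrative sketch only."""
    minute_f, hour_f = schedule.split()[:2]
    t = now.replace(second=0, microsecond=0)
    for _ in range(24 * 60):  # scan at most one day, minute by minute
        if (minute_f == "*" or t.minute == int(minute_f)) and \
           (hour_f == "*" or t.hour == int(hour_f)):
            return t
        t += timedelta(minutes=1)
    return None

# Regardless of last_import_at, the feed waits for 15:30 system time:
nxt = first_run("30 15 * * *", datetime(2024, 1, 1, 9, 0, 0))
```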

Note

The database stores BOTH a CRON schedule AND a frequency, but the CRON schedule takes precedence: if it is not None for a given feed, the frequency is ignored.

Manual Runs

Manual Runs are on-demand Feed Runs triggered via the ThreatQ UI or CLI. Manual Runs require a since time and accept an optional until time. Since Manual Runs require at least a since time, only Feeds that leverage a since or until value within their definition’s source support Manual Runs.

For any given Feed, only one Manual Run may be in progress at a time.

One can trigger a Manual Run for an installed, enabled, and configured Feed by using the following artisan command via the CLI:

sudo php /var/www/api/artisan threatq:feeds-manual --since <DateTime> --until <DateTime> <Feed Name>

# The format for <DateTime> is "YYYY-MM-DD HH:mm:ss" (e.g. "2020-02-19 23:24:00")

Note

For any given Feed, both a Scheduled Run and a Manual Run can be in progress at the same time.

Feed Definitions

A CDF definition file is written in the YAML data serialization language. For more YAML information and usage examples, see Basic YAML Usage.

CDF Feed Definitions also make extensive use of the Jinja2 templating language. See Jinja2 Templating in CDF for more information.

For more information on how to write a feed definition and the purpose of each of its parts, see Writing a Feed Definition.

Basic YAML Usage

Overview

A CDF definition file is written in the YAML data serialization language. YAML is a superset of JSON, supporting all of the possible data forms allowed within standard JSON data and more. This page lays out common YAML constructs one will encounter while writing a Feed Definition.

Note

YAML comments are denoted by a #. Within this section, commented lines in YAML examples provide more contextual information.

Basic YAML Structure

YAML utilizes indentation to specify scoping of object data. The default indentation size for YAML is two spaces.

Primitive Data Types (Scalars)

YAML supports all the standard primitive data types one would expect from a data markup language, namely:

  • String

  • Integer

  • Floating Point

  • Boolean

  • Null

The following illustrates the use of various primitive data types within YAML:

Hello There    # String
'Hello There'  # String
"Hello There"  # String
42             # Int
20.21          # Floating Point
True           # Boolean
False          # Boolean
null           # Null

|
  This is a
  multiline string

# Result: "This is a\nmultiline string\n"

|-
  This is a
  multiline string with
  the trailing newline removed

# Result: "This is a\nmultiline string with\nthe trailing newline removed"

Warning

In the context of Pynoceros CDF, data does not require explicit quotations to be read as a String. Quotations are required in a few “Gotcha” cases, however:

  • Strings starting with certain reserved symbols, like % and !

  • An empty string ('')

  • Regex strings

Basic Data Structures (Collections)

The following data structures can be used anywhere within a YAML file.

List/Array

Lists (or arrays) are denoted in YAML via a - prior to a data value. When supplying elements to a list via the - syntax, element declarations should be indented once (two spaces only) beneath the list object declaration. Like JSON, YAML lists can contain a mix of multiple data types. The following illustrates the use of a YAML list called example_list that has two elements:

example_list:
  - Hello There
  - 42
Dictionary/Mapping

A Dictionary (or mapping) is a collection of key/value pairs. A Key is expected to be a String (Integers are implicitly cast to Strings), while a Value can be any supported YAML value: a scalar (primitive data value) or a collection (list or dictionary). The following illustrates a complex YAML dictionary called example_dict that uses a variety of YAML types:

example_dict:
  id: 99
  may_be_provided: null
  description: This is an example!
  values:
    - 1
    - 2
    - 3
    - color: Blue
    - nested_dict:
        is_nested: True

Warning

Sometimes YAML needs a little help determining what data structure you are intending to create, particularly when intermixing dictionaries and lists. Note that the nested_dict mapping under values has its is_nested attribute indented twice (four spaces instead of the usual two).

The previous example is directly equivalent to the following JSON:

{
    "example_dict": {
        "values": [
            1,
            2,
            3,
            {
                "color": "Blue"
            },
            {
                "nested_dict": {
                    "is_nested": true
                }
            }
        ],
        "id": 99,
        "may_be_provided": null,
        "description": "This is an example!"
    }
}
Custom YAML Declarations

YAML allows one to declare arbitrary data structures for use anywhere within the YAML file by simple reference. When referenced, the reference is replaced with the content of the custom declaration. The following illustrates creating custom declarations and referencing said declarations within the YAML file:

_anchors:
  - &custom1  # Here, &custom1 denotes the name of this custom declaration
      test_value_A: 1
      test_value_B: 2
  - &custom2  # Here, &custom2 denotes the name of this custom declaration
      test_value_C: 3

example:
  is_true: True
  <<: [*custom1, *custom2]  # Referencing custom declarations
  is_false: False

The previous example written without the custom declarations would look like:

example:
  is_true: True
  test_value_A: 1
  test_value_B: 2
  test_value_C: 3
  is_false: False

The final resulting YAML from either example would be equivalent. Utilizing custom YAML declarations allows a Feed Definition writer to reduce repeated logic and create more modular definitions.
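For readers more familiar with Python than YAML, the merge-key behavior above can be modeled with dictionary unpacking. This is an illustrative analogy, not how the YAML parser is implemented:

```python
# The &custom1 / &custom2 anchors behave like these dictionaries:
custom1 = {"test_value_A": 1, "test_value_B": 2}
custom2 = {"test_value_C": 3}

# `<<: [*custom1, *custom2]` merges both mappings into `example`:
example = {"is_true": True, **custom1, **custom2, "is_false": False}

# The fully written-out form from the second YAML example:
expanded = {
    "is_true": True,
    "test_value_A": 1,
    "test_value_B": 2,
    "test_value_C": 3,
    "is_false": False,
}
```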

Note

Custom YAML declarations cannot be used to extend a list.

Jinja2 Templating in CDF

Overview

Jinja2 is a powerful templating language written for Python. CDF Definitions provide an interface for manipulating and accessing data via Jinja2 Expressions and Templates.

Leveraging Jinja2 in Feed Definitions

By default, CDF Definitions read YAML values as primitive data types. To indicate that a YAML value should be interpreted as a Jinja2 Expression or Template, one must insert a YAML tag before the value.

  • !expr - Read value as Jinja2 Expression

  • !tmpl - Read value as Jinja2 Template

The following illustrates these tags in the context of a CDF Filter Chain:

filters:
  - get: resources
  - iterate
  - new:
      id: !expr 'None' if not value.id else value.id
      display: !tmpl 'Value of A is {{value.A}} and Value of B is {{value.B}}'
Jinja2 Expressions

Jinja2 Expressions allow one to specify Python-like expressions that act upon data passed into them via keyword arguments. Jinja2 Expressions can be used to implement conditional transformation logic, string concatenation, and numerous other functionalities. The following illustrates some common expression constructs as if they were snippets of a CDF Filter Chain:

# =========================
#   String Concatenation
# =========================

- new: Test!
- new: !expr '"This is a " ~ value'

# Result: "This is a Test!"

# Jinja2 Templates should be preferred over Jinja2 Expressions for simple string concatenation.
# For example, the previous snippet could be rewritten as:

- new: Test!
- new: !tmpl 'This is a {{value}}'

# However, in use cases in which string concatenation is part of a more complex expression,
#  Jinja2 Expressions must be used. For example:

- new:
    some_mapping:
      key_abc: 2
      key_def: 42
      key_ghi: 9
    lookup_key: def
- if:
    condition: !expr '"key_" ~ value.lookup_key in value.some_mapping'
    filters:
      - set:
          lookup_value: !expr 'value.some_mapping["key_" ~ value.lookup_key]'
- set-default:
    lookup_value: !expr None
- get: lookup_value

# Result: 42

# =========================
#     Conditional Logic
# =========================

- new:
    num: 28
- new: !expr '"Even" if value.num % 2 == 0 else "Odd"'

# Result: "Even"

- new:
    num: 27
- new: !expr '"Even" if value.num % 2 == 0 else "Odd"'

# Result: "Odd"

# =========================================
#  Python-like Operations & List Creation
# =========================================

- new: !expr '["# This is a comment", 1, 2, 3]'
- iterate
- str
- drop: !expr value.startswith('#')
- new: !expr value|int ** value|int  # Uses the Jinja2 `int` filter

# Yields: 1, 4, 27

# Jinja2 Expressions are not required for creating lists. The above, however,
#  looks better than the following:

- new:
    -
      - '# This is a comment'
      - 1
      - 2
      - 3

# A list of lists has to be used here since the `new` filter unpacks an iterable into
#  its args. This is not necessary when using a Jinja2 Expression because the Expression
#  itself is a single argument. Regardless, the above can be somewhat beautified without
#  using a Jinja2 Expression by writing out the list using JSON syntax, which is supported
#  given that YAML is a superset of JSON.

- new:
    - ['# This is a comment', 1, 2, 3]

# This looks just like the line in the example that uses the Jinja2 Expression, except without
#  the !expr tag and the list isn't wrapped in quotes. For most cases, this is fine, unless
#  there is a list item that is an expression. Then a Jinja2 Expression is required:

- new: !expr '["# This is a comment" ~ "!", 1 ** 1, 2 ** 2, 3 ** 3]'

# Result: ["# This is a comment!", 1, 4, 27]

Note

String concatenation within Jinja2 Expressions utilizes the ~ character rather than the classic + symbol.

Note

Jinja2 Expressions in the context of CDF Definitions do not need to be wrapped entirely in quotes - only a specific string chunk used within an Expression needs to be quoted.

Warning

Jinja2 Expressions do not support list comprehensions or looping.

For more information on Jinja2 Expressions including tests and filters built into Jinja2, see Jinja2’s API Documentation.
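Because Jinja2 Expressions are Python-like, it can help to prototype the logic in plain Python first. The lookup-table example above is equivalent to the following sketch (note that Jinja2 concatenates with ~ where Python uses +):

```python
value = {
    "some_mapping": {"key_abc": 2, "key_def": 42, "key_ghi": 9},
    "lookup_key": "def",
}

# `"key_" ~ value.lookup_key in value.some_mapping` in Jinja2:
key = "key_" + value["lookup_key"]
if key in value["some_mapping"]:
    lookup_value = value["some_mapping"][key]
else:
    lookup_value = None  # mirrors the set-default fallback
```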

Jinja2 Templates

Jinja2 Templates allow one to easily create formatted strings more concisely than with Jinja2 Expressions. The following illustrates how to utilize Jinja2 Templates to create formatted strings as if they were snippets of a CDF Filter Chain:

- new:
    A: 1
    B: 2
- new: !tmpl 'A = {{value.A}} and B = {{value.B}}'

# Result: "A = 1 and B = 2"

- new:
    A: True
    B: False
- new: !tmpl 'A = {{value.A}} and B = {{value.B}}'

# Result: "A = True and B = False"

- new:
    A: !expr '"This is a " ~ "Test!"'
    B: !expr 2 ** 2
- set:
    C: !expr value.A.startswith('This')
- new: !tmpl 'A = {{value.A}} and B = {{value.B}} and C = {{value.C}}'

# Result: "A = This is a Test! and B = 4 and C = True"


# Templates can include inline expressions:

- new:
    num: 27
- new: !tmpl '{{value.num}} is {{"Even" if value.num % 2 == 0 else "Odd"}}.'

# Result: "27 is Odd."


# Templates can be multiline and use Jinja2 Control Structures and Whitespace Control:

- new:
    vulnerable_browsers:
      - Firefox
      - Brave
      - Netscape
- set:
    description: !tmpl |
      {% if value.vulnerable_browsers -%}
      <h2>Vulnerable Browsers</h2><ul>
      {%- for browser in value.vulnerable_browsers -%}
      <li>{{browser}}</li>
      {%- endfor %}</ul>
      {%- endif %}
      {%- if value.vulnerable_vendors %}
      <h2>This Won't Render</h2>
      {% endif %}

# Result:
# {
#     "description": "<h2>Vulnerable Browsers</h2><ul><li>Firefox</li><li>Brave</li><li>Netscape</li></ul>",
#     "vulnerable_browsers": [
#         "Firefox",
#         "Brave",
#         "Netscape"
#     ]
# }

For more information on Jinja2 Templates including tests and filters built into Jinja2, see Jinja2’s API Documentation.

Pynoceros Jinja2 Extension

The Pynoceros SDK used by Dynamo’s CDF runtime contains a Jinja2 Extension which exposes the following Jinja2 Filters:

  • to_timestamp - New in version 4.34.0. This filter passes its input to the arrow module’s get(), returning an Arrow object. The following illustrates some example use cases for this Jinja2 Filter used within Jinja2 Expressions:

    # Subtract 5 minutes (300 seconds) from when the feed run started and display the resultant
    #  timestamp in the format `YYYY-MM-DDTHH:MM:SSZ`:
    
    - new: !expr (((run_meta.since|to_timestamp).format('X')|int - 300)|to_timestamp).strftime('%FT%H:%M:%SZ')
    
    # Example Result: "2020-11-15T19:36:52Z"
    
    
    # Convert a timestamp from one format to another:
    
    - new:
        published: '05/20/2020 05:20:20 AM'
    - set:
        published: !expr value.published | to_timestamp('M/D/YYYY h:mm:ss A')
    - filter-mapping:
        published:
          timestamp  # There is no way to provide an input timestamp format, so to_timestamp is used for that purpose
    
    # Result: {"published": "2020-05-20 05:20:20-00:00"}
    

    Note

    Pynoceros is currently dependent on arrow 0.8.0. As a result, the Arrow object returned from this filter will not have newer convenient methods like shift(). Therefore, doing relative datetime manipulations is a bit more tedious today, as can be seen in one of the above examples.

    Warning

    Arrow objects are not JSON serializable. If one is using tq-feed run to test a Feed Definition’s Filter Chain and has the following at the end of the Filter Chain:

    - new:
        published: '05/20/2020 05:20:20 AM'
    - set:
        published: !expr value.published | to_timestamp('M/D/YYYY h:mm:ss A')
    

    One will see the error: TypeError: <Arrow [2020-05-20T05:20:20+00:00]> is not JSON serializable. To avoid this, one needs to convert the Arrow object into a JSON primitive, similar to what is done in the above use case examples by using the filter-mapping and timestamp filters.
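The same class of error can be reproduced with the standard library, since json.dumps refuses any object it does not recognize. This stdlib analogue uses datetime in place of an Arrow object:

```python
import json
from datetime import datetime, timezone

published = datetime(2020, 5, 20, 5, 20, 20, tzinfo=timezone.utc)

# Serializing the raw object fails, just as an Arrow object would:
try:
    json.dumps({"published": published})
    serialized = None
except TypeError:
    # Convert to a JSON primitive (a string) before serializing:
    serialized = json.dumps({"published": published.isoformat()})
```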

Jinja2 Caveats
  • Jinja2’s dot syntax can only be used to access valid Python identifiers: all characters alphanumeric or underscores, not starting with a number. Keys that do not conform to standard Python identifiers, such as meta-category, can instead be referenced with the direct dictionary access syntax. For instance, the expression !expr value.meta-category would be read by Jinja2 as the subtraction value.meta - category, rather than selecting the key meta-category. The correct expression to reference this value is !expr value["meta-category"].

    Warning

    Some keys (such as ‘items’) are reserved in CDFs and cannot be used with dot notation. It’s recommended to avoid these keys entirely, but if you must use one, accessing it as !expr value["items"] will work.

Writing a Feed Definition

Overview

The basic setup of a CDF Definition is as follows:

version: '1.1.1'
required_threatq_version: '>=4.34.0'

template_values:
  ...

feeds:
  Feed Name ABC:
    category: Commercial
    default_period: 86400
    display_name: ABC
    namespace: threatq.feeds.FeedNameABC
    description: "This is Feed ABC"
    default_indicator_status: Active
    default_signature_status: Active
    timestamp_format: '%s'
    user_fields:
      ...
    source:
      ...
    filters:
      ...
    report:
      ...

  Feed Name XYZ:
    category: Commercial
    default_period: 3600
    namespace: threatq.feeds.FeedNameXYZ
    default_indicator_status: Review
    timestamp_format: '%Y-%m-%d %H:%M:%S'
    user_fields:
      ...
    source:
      ...
    filters:
      ...
    report:
      ...

This section covers all keys in the root scope of a Feed Definition and most keys in a given feed dictionary mapping under feeds. A feed’s Source (source), Filters (filters), and Reporting (report) keys are much more involved and are covered in their own sections.

Feed Definition Version

The version of a Feed Definition is provided via the version key in the root scope of a Feed Definition. All primary feeds in the Feed Definition are versioned based on the version value.

It is highly advised to follow Semantic Versioning.

Warning

Explicitly wrap the version value in quotes so that the YAML parser does not attempt to interpret it as a floating point value.

Feed Definition Requirements

Based on changes in the Pynoceros SDK, Dynamo, the ThreatQ API, etc., a Feed Definition may be constrained to specific ThreatQ versions (these ThreatQ components are typically all versioned to match the ThreatQ version). Attempting to install a Feed Definition that uses a feature unknown to one of these components may result in the feed failing to install or failing to run successfully.

The required_threatq_version key in the root scope of a Feed Definition provides a means of guarding against installing the Feed Definition on a ThreatQ box that is incapable of understanding or running it. The required_threatq_version value is a version specifier modeled after PEP 440.

Note

A version specifier contains constraints. A constraint is in the format {{operator}}{{version}}, such as >=4.34.0. There can be whitespace between the operator and the version in the constraint. Any leading or trailing whitespace in an individual constraint is trimmed.

The currently supported operators are:

  • ==

  • !=

  • <

  • <=

  • >=

  • >

A version specifier can have multiple constraints separated by commas that are implicitly joined by ANDs, meaning that the ThreatQ version must satisfy all the constraints. For example, if a Feed Definition can be supported by at least ThreatQ version 4.34.0 but cannot be supported by ThreatQ version 4.44.0 for some reason, one can provide the required_threatq_version value: >=4.34.0, !=4.44.0.

Most Feed Definitions are only constrained by a minimum ThreatQ version; therefore, most version specifiers are in the format: >=x.y.z.

The required_threatq_version is only checked by the ThreatQ API at Feed Definition installation time. It is not checked by Dynamo before loading or running a Feed Definition.
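The constraint semantics described above can be sketched with a small helper. This is a hypothetical illustration, not the ThreatQ API's actual implementation, and it ignores PEP 440 subtleties such as pre-release versions:

```python
import operator

_OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
        "<=": operator.le, ">=": operator.ge, ">": operator.gt}

def satisfies(version, specifier):
    """Check `version` ('x.y.z') against a comma-separated specifier
    such as '>=4.34.0, !=4.44.0'. Constraints are ANDed together and
    whitespace around each constraint is trimmed."""
    ver = tuple(int(p) for p in version.split("."))
    for constraint in specifier.split(","):
        constraint = constraint.strip()
        for op in ("==", "!=", "<=", ">=", "<", ">"):  # 2-char ops first
            if constraint.startswith(op):
                target = tuple(int(p) for p in constraint[len(op):].strip().split("."))
                if not _OPS[op](ver, target):
                    return False
                break
    return True
```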

Template Values

A Feed Definition writer often wants to reference some hard-coded static data in several locations within the Feed Definition or lookup/resolve values in a global reference dictionary mapping. The optional template_values dictionary mapping, located at the root of a Feed Definition, provides an easy mechanism for declaring this information. The following demonstrates an example template_values declaration:

template_values:
  common_string: This is used often
  lookup_table:
    keyA: A value to insert
    keyB: Another value

Once declared, these template_values can be accessed anywhere in the Feed Definition via Jinja2 Expressions or Templates. For instance, a Feed Definition writer can reference the above template_values like so:

filters:
  - new:
      common: !tmpl '{{common_string}} to dance!'
      needle: keyA
  - set:
      found: !expr lookup_table[value.needle]

  # Result:
  # {
  #   "common": "This is used often to dance!",
  #   "found": "A value to insert",
  #   "needle": "keyA"
  # }

Warning

The following key names should not be used as Template Values as they are reserved and are overwritten during parsing:

Note

Jinja2 Expressions or Templates cannot be used within template_values.

Category

When looking at the “My Integrations” page on the ThreatQ UI, one will see that Feeds have categories that can be filtered on. The two default categories are:

  • OSINT (Open-source intelligence)

  • Commercial

The category key is used to set the category for a feed:

feeds:
  Example OSINT Feed:
    category: OSINT
    ...
  Example Commercial Feed:
    category: Commercial
    ...

Note

category values are case-sensitive. Feed installation may fail if capitalization is incorrect.

Note

If no category is explicitly set, the feed will default to using the Labs category.

Note

While not explicitly enforced, the STIX/TAXII category should be reserved for TAXII feeds that are dynamically created via the ThreatQ UI (“My Integrations” > “Add New Integrations” > “Add New TAXII Feed”). If one is creating a Feed Definition that utilizes the TAXII Source and/or the parse-stix filter, one should stick to using either the OSINT or Commercial categories.

Default Period

When specified, the Default Period value (in seconds) is honored as the Feed’s run period on install. Currently, the ThreatQ UI supports a few frequency options: 3600 (1 Hour), 21600 (6 Hours), 86400 (1 Day), 172800 (2 Days), 1209600 (14 Days), and 2592000 (30 Days).

The default_period key is used to set the default period for a feed:

feeds:
  Example Feed:
    default_period: 3600

Note

The default value for default_period is 86400, or a daily period.

Note

The default_period is only honored on install. Upgrades will not reset an installed Feed’s period back to the default_period.

Note

While the ThreatQ UI only supports a fixed set of period choices, the feed itself can specify any integer value for default_period and have it honored.

New in version 5.3.0.

Default Schedule

When specified, the Default Schedule value (in modified CRON) is honored as the Feed’s run schedule on install. Currently, the ThreatQ UI supports Daily and Weekly configurations.

The default_schedule key is used to set the default schedule for a feed:

feeds:
  Example Feed:
    default_schedule: "30 13 * * *"

Note

The default value for default_schedule is None.

Note

The default_schedule is only honored on install. Upgrades will not reset an installed Feed’s schedule back to the default_schedule.

Note

While the ThreatQ UI only supports daily and weekly schedule choices, the YAML itself can specify any valid CRON-like schedule and have it honored; the schedule is validated on upload.

Note

We do not support the ? or W syntax. We do support the H syntax. For more detailed information about the CRON syntax we support, please reference the Help Center docs. Alternatively, you may use our command line utility, tq-feed analyze, for validation.

New in version 4.42.0.

Display Name

A feed’s display_name is the identifier string shown on the presentation layer. A display_name can be any arbitrary string up to 255 characters. For example:

feeds:
  PhishLabs:
    feed_type: primary
    namespace: threatq.connector.commercial.phishlabs.lab
    description: "Retrieves data at PhishLabs"
    ...
  Phish Labs Intelligence Echo:
    feed_type: primary
    display_name: PhishLabs
    namespace: threatq.connector.commercial.phishlabs.echo
    description: "Retrieves Echo data at PhishLabs"
    ...
  Phish Labs Intelligence Golf:
    feed_type: primary
    display_name: PhishLabs
    namespace: threatq.connector.commercial.phishlabs.golf
    description: "Retrieves Golf data at PhishLabs"
    ...

Note

If the display_name key is not specified, it defaults to the feed’s name.

Note

One can use a shared display_name to present multiple feeds (sources of data) as a single integration.

New in version 5.8.0.

Namespace

A feed’s Namespace is a unique identifier, often used in place of the feed’s human-readable name for identification purposes. A Namespace can be any arbitrary identifier, but they often follow the convention threatq.connector.[category].[vendor].[identifier]. For example:

feeds:
  Phishlabs:
    category: Commercial
    namespace: threatq.connector.commercial.phishlabs.PhishLabs
    ...
  www.dan.me.uk Tor Node List:
    category: OSINT
    namespace: threatq.connector.osint.www_dan_me_ul_torlist.WwwDanMeUkTorList
    ...
  Bambenek Consulting - C2 IP:
    category: OSINT
    namespace: threatq.connector.osint.bambenek.C2IP
    ...

Note

If no namespace is provided, the feed’s namespace will default to threatq.connector.[feed_name], where feed_name is the unmodified feed display name. Note that this may not follow convention, since feed names may contain spaces (whereas namespaces typically do not).

Description

A feed’s Description explains what a feed does. A Description can be any arbitrary string up to 255 characters. For example:

feeds:
  Phishlabs:
    category: Commercial
    namespace: threatq.connector.commercial.phishlabs.PhishLabs
    description: "Retrieves data at PhishLabs and does stuff."
    ...

Note

If no description is provided, the feed’s description will default to a blank string.

Ingest Rules

Ingest Rules is a mapping that contains data ingestion rules for the feed. At the time of writing, the only supported ingest rule is for Attributes.

On feed upload, these configurations are used to set an ingestion rule in the platform. If an attribute key is not specified in the rules, it is assumed to be a multi-value attribute.

If an attribute key is specified within the config, its default rule (insert) is overridden for that attribute and, if provided, for that source. If no source is provided in the ingest_rules and the feed type is not action, the source defaults to the feed’s name.

This means that if an attribute with an ingest_rules entry of rule: update is ingested more than once, the new attribute value will overwrite the existing value in ThreatQ instead of adding a new attribute key/value pair. If an attribute is not specified in the ingest_rules, a new key/value pair is ingested alongside the existing ones.

The name and rule keys are required. The sources key is also required when the feed_type is action. If these keys are not specified, the feed will fail to upload. rule: update enables the updating of an attribute, while rule: insert restores normal multi-value behavior.

When installing an action, a missing sources key raises a DefinitionError. The same error is raised when reinstalling an already-installed action without the sources key.

feeds:
  Phishlabs:
    category: Commercial
    namespace: threatq.connector.commercial.phishlabs.PhishLabs
    description: "Retrieves data at PhishLabs and does stuff."
    ingest_rules:
      attributes:
        - name: No Source Specified Example  # Required
          rule: update  # Required
          sources: Single Source  # Required when feed_type is action
        - name: Single Source Specified Example
          rule: update  # Enables single_value
          sources: Single Source  # Required when feed_type is action
        - name: Multiple Sources Specified Example
          sources:
            - Single Source
            - Another Source
          rule: insert # Normal behavior
    ...
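The insert vs. update semantics above can be sketched as follows. This is a rough model of the described behavior, not the platform's actual implementation:

```python
def ingest_attribute(existing, name, value, rule="insert"):
    """Apply an attribute to a list of {"name": ..., "value": ...} dicts.

    rule='insert' -> normal multi-value behavior (append a new pair).
    rule='update' -> single-value behavior (overwrite the existing value).
    """
    if rule == "update":
        for attr in existing:
            if attr["name"] == name:
                attr["value"] = value  # overwrite instead of adding
                return existing
    existing.append({"name": name, "value": value})
    return existing

attrs = []
ingest_attribute(attrs, "Confidence", "Low", rule="update")
ingest_attribute(attrs, "Confidence", "High", rule="update")  # overwrites "Low"
ingest_attribute(attrs, "Country", "US", rule="insert")
ingest_attribute(attrs, "Country", "FR", rule="insert")       # adds a second pair
```

After these calls, the object carries one Confidence attribute (High) but two Country attributes (US and FR).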
Default Statuses

The default_indicator_status key is used to set the default status for Indicator Threat Objects ingested by a feed. Indicator statuses can be created by administrators via the ThreatQ UI under “System Configurations”. In addition to any user-created statuses, the following values can be used:

  • Active - Poses a threat and is being exported to detection tools.

  • Expired - No longer poses a serious threat.

  • Indirect - Associated to an active indicator or event (i.e. pDNS).

  • Review - Requires further analysis.

  • Whitelisted - Poses no risk and should never be deployed.

The default_signature_status key is used to set the default status for Signature Threat Objects ingested by a feed. The following values can be used:

  • Active - Currently poses a threat.

  • Expired - No longer poses a serious threat.

  • Inactive - Does not currently pose a threat, but could return to Active status.

  • Non-malicious - Signature does not describe a threat and therefore poses no risk.

  • Review - Requires further analysis.

  • Whitelisted - Poses no risk.

The default_indicator_status or default_signature_status keys are not required by feeds that do not ingest Indicator or Signature Threat Objects, respectively.

Timestamp Formatting

Feed providers typically require that datetime parameters be formatted in a certain consistent way when passed in a source request. In order to do this, one can provide a format string via the timestamp_format field. Additionally, a timezone can be provided via the timezone field, if required by the format. An example of how these fields are set in a feed can be found in the code-block below:

1
2
3
4
feeds:
  Example Feed:
    timestamp_format: '%FT%H:%M:%S.%l%Z'
    timezone: 'US/Eastern'

Setting a timestamp_format will automatically format the since, until, and started_at run_meta parameters only within the feed’s source. If accessed via Jinja2 Expressions or Templates outside of a feed’s source, the since, until, and started_at run_meta parameters are in the format YYYY-MM-DD HH:mm:ssZZ where the default timezone is UTC (these are also the defaults in source if timestamp_format or timezone are not explicitly provided).

A supplemental feed or action may define its own timestamp_format and timezone. This will format run_meta source values for only that supplemental or action feed. This is new as of version 5.20.0. Previously, only a primary feed could define these values and any supplemental feeds would inherit from it. This inheritance behavior still exists. Actions would inherit from the primary as well, but there was no way to configure these values in the primary before this change.

Note

For actions before version 5.20.0, users can modify run_meta values and pass them to actions via run-params. For example, passing a run parameter like this: epoch: !expr (run_meta.started_at|to_timestamp).format('X') | int, and referencing it in the invoked feed source as run_params.epoch is equivalent to using timestamp_format: '%s' and referencing run_meta.started_at in source in version 5.20.0 and later.

The timestamp_format values are formatted using Python’s strftime Format Codes, which themselves are largely derived from the format codes required by the C89 language standard.

However, there is a special format string '%s' which formats the aforementioned parameters as integers instead of strings. This integer represents the number of seconds that have elapsed since the Unix epoch.
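As an illustrative sketch (using standard Python datetime semantics, not the actual Pynoceros formatting code), the difference between a strftime-style timestamp_format and the special '%s' format looks like this:

```python
from datetime import datetime, timezone

# A hypothetical run_meta.since value; real run_meta internals may differ.
since = datetime(2024, 1, 15, 6, 30, 0, tzinfo=timezone.utc)

# A strftime-style timestamp_format renders the value as a string:
formatted = since.strftime('%Y-%m-%dT%H:%M:%S')
print(formatted)  # 2024-01-15T06:30:00

# The special '%s' format instead yields an integer: the number of
# seconds elapsed since the Unix epoch.
epoch_seconds = int(since.timestamp())
print(epoch_seconds)  # 1705300200
```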

The timezone value supports a variety of timezones and aliases. For example, America/New_York, US/Eastern, EST/EDT (depending on time of year) are all equivalent. Reference the list of tz database time zones for available options.
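The alias equivalence can be checked with Python’s zoneinfo module (an illustrative sketch; it assumes the system tz database includes the legacy US/Eastern alias):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# US/Eastern is a tz database alias for America/New_York, so both
# spellings resolve to the same UTC offset at any given instant.
canonical = ZoneInfo("America/New_York")
alias = ZoneInfo("US/Eastern")

winter = datetime(2024, 1, 15, 12, 0)  # EST in effect (-05:00)
summer = datetime(2024, 7, 15, 12, 0)  # EDT in effect (-04:00)

assert winter.replace(tzinfo=canonical).utcoffset() == winter.replace(tzinfo=alias).utcoffset()
assert summer.replace(tzinfo=canonical).utcoffset() == summer.replace(tzinfo=alias).utcoffset()
```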

Warning

timestamp_format is not to be confused with the Timestamp Filter.

Feed Run Metadata

The following contextual information is provided to the Feed Definition and available for access via Jinja2 Expressions or Templates. All contextual feed run data is provided via the run_meta object. For example, to access a feed run’s started at time, one would write: !expr run_meta.started_at.

  • started_at - A string or integer representing the time that the current Feed Run actually started. This value is formatted per the feed’s timestamp_format only if accessed in the feed’s source.

  • since - A string or integer representing the start of the date range that is queried from the data source. This value is formatted per the feed’s timestamp_format only if accessed in the feed’s source. Using this in a primary feed’s source enables support for manual runs. Does not need to be used along with until.

  • until - A string or integer representing the end of the date range that is queried from the data source. This value is formatted per the feed’s timestamp_format only if accessed in the feed’s source. Must be used along with since.

  • uuid - Each feed run gets a unique UUID to help distinguish it in the ThreatQ Appliance. This value is provided purely for informational purposes and is not generally used in a Feed Definition.

  • trigger_type - A string that can either be manual if the feed run was started by a user or scheduled if the feed run was started by the system during normal periodic feed processing. This value can be used to make adaptive queries based on how the run was triggered.

User Fields

CDFs have the capability to define user_fields. These fields are presented to users in the ThreatQ UI. This allows feed designers to inject configuration options, credentials, or any other information that is needed for a feed to operate. To learn more about the configuration options available when declaring user fields, see the User Fields and Parameters page.

Feed Run Response

A CDF response is provided via the response object. For example, to access a CDF’s HTTP response header, one would write !expr response.headers. At this time, only HTTP headers are supported in the response object.

  • headers - A dictionary representing the response’s HTTP headers. The values in this dictionary are dynamic, because different feeds can provide different header values. When accessing specific values in the dictionary, the mapping must exist or else an error will occur.

Feed Type

There are four different types of feeds that can be defined:

  • Primary (default)

  • Supplemental

  • Fulfillment

  • Action

A feed’s type is specified by the optional feed_type key. By default, the value of feed_type is None, which implies that the feed is a primary feed.

Primary Feeds
Formatted Feed Name:
  source:
    ...
  filters:
    ...
  report:
    ...

If a feed’s type is not explicitly specified, then the feed is a primary feed. Primary feeds are listed in the ThreatQ UI’s “My Integrations” page and are scheduled by the feed run scheduler when enabled. If the primary feed uses at least run_meta.since (see Feed Run Metadata) in its source section, then manual runs are supported. Users can kick off manual runs via the ThreatQ UI or CLI when the feed is enabled.

Note

The option to trigger a manual run may not be visible in the ThreatQ UI until a scheduled run appears in the feed’s Activity Log.

At least one primary feed must be declared within a Feed Definition. There may be any number of primary feeds within a single Feed Definition.

A “connector” source is created for a primary feed when it is enabled for the first time. The source’s name matches the name of the primary feed. All objects ingested by the primary feed belong to this source.

Note

There is currently no way to overwrite or change the source of objects ingested by a feed.

Primary feeds are the only feed type that utilizes the report section.

Note

Advanced use case: One can specify the primary feed name as _default in the Feed Definition. If a feed’s database record in the connectors table references a Feed Definition that makes use of _default, and this Feed Definition does not contain a primary feed whose name matches the database record’s name, then the _default feed is aliased under the database record’s feed name.

This is how STIX/TAXII feeds dynamically created via the ThreatQ UI work; they all reference the same Feed Definition which has a _default primary feed.

Supplemental Feeds
Formatted Feed Name:
  feed_type: supplemental
  source:
    ...
  filters:
    ...

If a feed’s type is supplemental, then the feed is not listed in the ThreatQ UI and cannot be externally triggered. Supplemental feeds can be invoked by any feed type (including other supplemental feeds) in the same Feed Definition. The purpose of supplemental feeds is to make additional API requests to the feed provider or, really, any external or internal resource. All values yielded from a supplemental feed’s Filter Chain are collected into a single list.

Note

While they cannot be avoided for use cases in which a single feed needs to make multiple requests, one should be cautious when using supplemental feeds. Since all values yielded from a supplemental feed’s Filter Chain are collected into a single list, there are two performance hits to take into account:

  • Memory consumption

  • Blocking the task of the calling feed that invoked the supplemental feed

In other words, a supplemental feed does not function like an asynchronous generator that has values yielded from its Filter Chain one-at-a-time. All the values need to be collected into memory, and subsequent tasks, which may belong to a stage or filter, will not execute until the supplemental feed is finished. However, other tasks running in parallel to the blocked task will not be affected (think of a supplemental feed being invoked for a single value yielded from the Iterate Filter; other yielded items may be at different stages of the Filter Chain or Feed Run Pipeline).

A general rule-of-thumb to keep in mind is to do the minimal amount of work necessary in a supplemental feed in order to get back to processing in the primary feed. For example, if a supplemental feed returns a JSON string containing a list of dictionary mappings, there is absolutely no reason to do further processing of the contents of this JSON string within the supplemental feed’s Filter Chain. Have the supplemental feed return either the JSON string or the parsed JSON, and do further transformations within the primary feed’s Filter Chain.

A common anti-pattern is to use the Iterate Filter at the end of a supplemental feed’s Filter Chain. This is simply wasting time and resources since each yielded item is collected back into a list anyway!
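Conceptually, the collect-then-resume behavior described above can be sketched in Python (an illustration of the semantics only, not Pynoceros internals):

```python
import asyncio

async def supplemental():
    # Stand-in for a supplemental feed's Filter Chain yielding values.
    for i in range(3):
        yield i

async def caller():
    # Every yielded value is collected into a single list before the
    # calling feed's task resumes: memory grows with the result size,
    # and subsequent stages are blocked until collection completes.
    collected = [value async for value in supplemental()]
    return collected

result = asyncio.run(caller())
print(result)  # [0, 1, 2]
```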

The primary way to invoke a supplemental feed is via the Set Filter in a feed’s Filter Chain. Supplemental feeds are also used for HTTP Token-based Authentication.

Regardless of invocation method, a dictionary mapping can be provided via the run-params argument. This mapping can then be accessed within the supplemental feed’s definition by accessing run_params via Jinja2 Expressions or Templates. To see an example use case for run-params, click here.

Supplemental feeds inherit the template context of the calling feed, which means they have access to the following via Jinja2 Expressions or Templates:

As a result, any information needed from any of the above dictionary mappings does not need to be passed via run-params since these dictionary mappings can be accessed directly.

A supplemental feed also inherits the timestamp_format value of the primary feed that invoked it (see Timestamp Formatting).

Fulfillment Feeds
Formatted Feed Name:
  feed_type: fulfillment
  source:
    ...
  filters:
    ...

If a feed’s type is fulfillment, then the feed is not listed in the UI and cannot be directly triggered. A fulfillment feed run automatically starts at the end of a scheduled or manual run of any primary feed in the same Feed Definition.

The purpose of a fulfillment feed is to fulfill placeholder files ingested from attachment-sets in primary feed(s) located in the same Feed Definition. A fulfillment feed is provided a list of placeholder filenames for files matching the same source as the primary feed triggering the fulfillment feed run. This list of placeholder filenames is available via run_params.placeholders in the fulfillment feed.

Note

When a primary feed ingests an attachment object without a content value, the attachment object is ingested as a placeholder file. Placeholder files have their name modified prior to ingestion. A placeholder file’s name, as stored in the ThreatQ Platform, is in the format:

pending-{{original_name}}.txt

However, the name available in the fulfillment feed’s run_params.placeholders list has the added “pending-” and “.txt” removed.

So, for example, if a primary feed ingests a placeholder file whose name is my-malware-sample.tar.gz, the stored placeholder file’s name is pending-my-malware-sample.tar.gz.txt. However, the name provided by run_params.placeholders would be my-malware-sample.tar.gz.
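The naming convention above can be sketched as two small helper functions (illustrative only; these helpers are not part of the Pynoceros API):

```python
# Illustrative sketch of the placeholder naming convention described above.
def stored_placeholder_name(original_name: str) -> str:
    """Name of the placeholder file as stored in the ThreatQ Platform."""
    return f"pending-{original_name}.txt"

def fulfillment_param_name(stored_name: str) -> str:
    """Name as surfaced to the fulfillment feed via run_params.placeholders."""
    return stored_name.removeprefix("pending-").removesuffix(".txt")

stored = stored_placeholder_name("my-malware-sample.tar.gz")
print(stored)  # pending-my-malware-sample.tar.gz.txt
print(fulfillment_param_name(stored))  # my-malware-sample.tar.gz
```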

Note

For clarity, run_params.placeholders is a list of strings representing each placeholder file’s name. It is not a list of file (attachment) objects. Other attributes of the placeholder file, such as its title, type, or mime_type, are not exposed to the fulfillment feed.

A fulfillment feed run will be provided all placeholder files matching the source of the primary feed regardless of which primary feed run ingested the placeholder file. Therefore, if a provider does not yet have file content for a placeholder file during one fulfillment feed run, it will be tried again in subsequent fulfillment feed runs.

Fulfillment feeds cannot be invoked by other feed types, but fulfillment feeds can invoke supplemental feeds. Fulfillment feeds typically depend on supplemental feeds for downloading a given file’s content. A fulfillment feed’s Filter Chain must yield a dictionary mapping containing attributes of a fulfilled file:

- new:
    name: str
    title: str
    type:
      name: str  # ThreatQ file type (dynamically created if it does not exist)
    content: !expr value.content  # downloaded file content (acceptable value types: aiohttp.StreamReader, str, bytearray, bytes)

Since the placeholder file object is not passed into the fulfillment feed for it to be mutated and then pushed back into ThreatQ, the dictionary mapping yielded from the fulfillment feed’s Filter Chain needs to be associated back to a placeholder file object. This association is done by attempting to match the yielded dictionary mapping’s name or title against the regex ^{{name}}\b, where {{name}} is the original placeholder file’s name.
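A minimal sketch of that matching rule (whether the platform escapes regex metacharacters in the placeholder name is an assumption here; re.escape is used so dots in filenames match literally):

```python
import re

def matches_placeholder(placeholder_name: str, yielded_name: str) -> bool:
    # Build the ^{{name}}\b pattern described above, where {{name}} is
    # the original placeholder file's name.
    pattern = rf"^{re.escape(placeholder_name)}\b"
    return re.match(pattern, yielded_name) is not None

print(matches_placeholder("report.pdf", "report.pdf"))         # True
print(matches_placeholder("report.pdf", "report.pdf (copy)"))  # True
print(matches_placeholder("report.pdf", "other-report.pdf"))   # False
```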

The yielded dictionary mapping can also contain attributes or tags to add to the fulfilled file object:

- new:
    # . . .
    # (Required file fields from above snippet)
    # . . .
    tags:
      - name: !expr value.tags  # where `tags` is a list of strings
    attributes:  # or `!expr value.attributes`, where `attributes` is a list of mappings formatted like the following
      - name: Attribute Name 1
        value: !expr value.attribute1
        published_at: !expr value.created_date  # Optional
      - name: Attribute Name 2
        value: !expr value.attribute2
        published_at: !expr value.created_date  # Optional

A Feed Definition can contain at most one fulfillment feed. Fulfilled files maintain the same source as the primary feed that ingested their placeholders.

Fulfillment feeds (and attachments in general) can be difficult to use and work with. If the provider hosts the file, whether it be an analysis report, malware sample, sandbox execution result, etc., it is preferable to link back to the provider as an Attribute Value. Creating a Report Threat Object is an advisable alternative to creating an Attachment Threat Object if the file is a report or results hosted by the provider.

Action Definitions
Formatted Action Name:
  feed_type: action
  invoking_filter:
    ...
  user_fields:
    ...
  source:
    ...
  filters:
    ...
  report:
    ...

If a feed’s type is action, then the feed is not installed as an Integration in the ThreatQ platform, but instead appears as an “Action” on the Orchestrator page of ThreatQ. An action is not executable by itself; it is executed via the ThreatQ Orchestrator.

tq-feed run will execute an action definition and pass it run_params as defined in a run-params-file. The Threat Objects are passed literally, so an author will likely want the run_params to match the data that would be returned from the ThreatQ API by a threat-collection source.

An author has control over writing an invoking_filter, which is the filter that is executed when the action is invoked from the ThreatQ Orchestrator. The invoking_filter is honored literally. Alternatively, an author can skip this and create a config_options dictionary mapping with a supported_objects key whose value is a list of Threat Object types. The config_options route will only create an invoking_filter for Indicator Threat Objects; if additional Threat Object types are desired, the author should write the invoking_filter.

Note

The invoking_filter is not executed when the action is executed via tq-feed run.

Note

An author should take care when writing an invoking_filter so as not to disturb the integrity of the data in the primary filter chain of the CDF it is finally inserted into when constructed by the ThreatQ Orchestrator.

When writing an action definition, an author should assume that the Threat Objects passed to it contain only the data returned for the object from the threat-collection source. For example, as of this writing, tq-feed workflow is configured to request only the fields type, source, and value. This became configurable in ThreatQ Version 5.12.1 with the extension of config_options. See Action Configuration Options for more information.

The threat-collection source of the primary feed into which an action is finally assembled is currently configured as follows:

threat-collection:
  collection_hash: <assigned_hash>
  object_collections:
    - indicators
  object_fields_mapping:
    indicators:
      - value
      - type
      - sources
  object_sort_mapping:
    indicators:
      - -updated_at
  chunk_size: 100
  objects_per_run: 123
  since: !expr run_meta.since
  until: !expr run_meta.until

The object_collections, object_fields_mapping, object_contexts_mapping, and object_sort_mapping are all configurable by specifying config_options in the Action definition.

In ThreatQ Versions 5.12.0 and below, the report of an action mapped its resulting data to a nested dictionary matching the pattern: {<action_name>}_action. This was a slight departure from normal report behavior, and was changed in ThreatQ Version 5.12.1 to match the expected behavior of regular feeds.

For a modern example, see Action Definition.

Action Configuration Options

Beginning in ThreatQ Version 5.12.1, an author may specify a config_options dictionary mapping in an action definition as a means of configuring the primary feed’s threat-collection source mappings. A consequence of this modification is that all actions within the final assembled workflow will be executed with the sum of the threat-collection context derived from the config_options mappings of all actions in the workflow.

The supported_objects mapping is required if there is no invoking_filter. If no invoking_filter is specified, the ThreatQ Orchestrator will create one, but only for Indicators. This has limited use and will likely be deprecated in a future release; it is recommended that an author specify an invoking_filter.

An author may declare both an invoking_filter and supported_objects. In this case, the invoking_filter is honored, but the supported objects field is rendered in the UI.

In order for an action to render the supported objects field in the UI, the author must include supported_objects.

The config_options may look like the following:

feeds:
  ActionFeed:
    feed_type: action
    config_options:
      supported_objects:
        - indicators:
            - FQDN
            - IP Address

The above example will create an invoking_filter which would be inserted into the final workflow as follows:

- invoke-connector:
    condition: !expr value.0.threatq_object_type in ["indicator"]
    connector:
      iterate: True
      name: TestNoInvokingFilter
      return: value
      run-params: !expr value
      to-stage: publish
    filters:
      - each:
          - drop: !expr value.type not in ['IP Address', 'FQDN']

See Invoke Connector Filter for more specific implementation details.

Starting in ThreatQ Version 5.12.1, the config_options may also be used to configure the threat-collection source mappings of the primary feed. The config_options may look like the following:

feeds:
  ActionFeed:
    feed_type: action
    config_options:
      supported_objects:
        - adversaries
      object_fields_mapping:
        adversaries:
          - name
          - sources
      object_contexts_mapping:
        adversaries:
          - attributes
      object_sort_mapping:
        adversaries:
          - -updated_at

The above example will create a threat-collection source mapping which would be inserted into the final workflow as follows:

threat-collection:
  collection_hash: <assigned_hash>
  object_collections:
    - indicators
    - adversaries
  object_fields_mapping:
    indicators:
      - value
      - type
      - sources
    adversaries:
      - name
      - sources
  object_sort_mapping:
    indicators:
      - -updated_at
    adversaries:
      - -updated_at
  chunk_size: 100
  yield_chunk: True
  objects_per_run: 123
  since: !expr run_meta.since
  until: !expr run_meta.until

Note that the new fields are in addition to the existing fields, and not a replacement. This means that the object_collections, object_fields_mapping, object_contexts_mapping, and object_sort_mapping will be merged with the existing fields. By default, Indicators are always included in the source.
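The additive merge can be illustrated with a small Python sketch (assumed semantics based on the description above, not the actual Orchestrator code):

```python
# Illustrative sketch of merging per-action config_options mappings into
# the base threat-collection source mappings.
def merge_mappings(base: dict, extra: dict) -> dict:
    merged = {key: list(values) for key, values in base.items()}
    for key, values in extra.items():
        merged.setdefault(key, [])
        # New entries are added; existing entries are preserved, not replaced.
        merged[key].extend(v for v in values if v not in merged[key])
    return merged

# Indicators are always included in the source by default.
base_fields = {"indicators": ["value", "type", "sources"]}
action_fields = {"adversaries": ["name", "sources"]}
print(merge_mappings(base_fields, action_fields))
# {'indicators': ['value', 'type', 'sources'], 'adversaries': ['name', 'sources']}
```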

Known Limitations

  • Since the executing environments differ between development and production, an author must be careful to ensure that the action definition is written in a production-friendly manner. Specifically, the keys contained in user_fields and template_values, as well as the name of the Action itself, must be unique across all actions in ThreatQ.

  • A Definition Yaml may not contain both Primary and Action definitions.

Note

This feature is in active development, and is subject to change. We intend to resolve many of the above limitations in the near future.

Source Definition

In a Feed Definition, the source is what specifies how to fetch data from a provider. It may, for instance, contain the URL for an API along with authentication information for an endpoint.

See the CDF Sources page for a detailed list of all source types available for use in Feed Definitions complete with comprehensive examples.

Filter Chain

The Filter Chain defines how to manipulate feed data ahead of ingestion into ThreatQ. It follows the operations in the Source Definition, and precedes the Reporting stage.

See the CDF Filters page for a detailed list of all filters available for use in Feed Definitions complete with comprehensive examples.

Reporting

Reporting allows a feed definition writer to define how threat objects are created and related from the data output by the Filter Chain. For more information, see the CDF Reporting page.

Feed CLI Interface

In order to assist with development and installation of definitions, a command-line interface has been built. See the tq-feed page for more information.

CDF Sources

Sources allow a feed definition writer to specify how and where to download feed data from a provider. This section provides in-depth explanations of each source type available for use within a CDF Feed’s Source Definition.

File Source

Overview

The File Source is used to read a specified file from the filesystem. This source is often used for development and debugging, as a CDF writer can avoid making wasteful requests on each feed run. However, some feeds may require parsing local files in order to execute.

If a CDF writer has a source that returns unpredictable intel, a static file can be modified to easily test against edge cases.

Usage
source:
  file:
    mode: rb #OPTIONAL - Read in Binary mode
    file: /path/to/source_file.json

Note

When declaring a file path, the path should be absolute.

Examples

Suppose a CDF writer has a shared folder, integrations, containing a JSON file representing data they want to test against. They can use the static file like so:

source:
  file: /var/www/integrations/file.json

Suppose a CDF writer has a large file that must be parsed in order to execute the Feed. The writer may want to use the file chunking feature, added in 5.4.0, to load the file more efficiently and conserve memory. To do so, set yield_chunk to True (default False), set chunk_size to the approximate size of each chunk of data yielded to the API (default 1000), and set record_responses if the file chunks should be recorded (default False).

source:
  file:
    file: /var/www/integrations/file.json
    chunk_size: 9999
    record_responses: False
    yield_chunk: True

In some cases, no file needs to be provided at all. For this, one can provide None or "None" as the value of the file key. The source then simply returns without reading anything, which can save memory and improve the performance of the Feed Run.

source:
    file: None

OR

source:
    file: "None"

HTTP Source

Overview

The HTTP Source is used to query or poll threat intelligence data from an HTTP provider. This source type can be used with RESTful APIs or plain-text file reads. Basic usage of the HTTP Source is as follows:

source:
  http:
    base_url: https://www.example.com/
    url: feed
    method: GET
    params:
      exampleA: 22
      exampleB:
        - 'A'
        - 'B'
        - 'C'
    data:
      exampleA: 42
      exampleList:
        - 1
        - 2
        - 3
    headers:
      Accepts: application/json
    request_content_type: application/json
    response_content_type: text/plain
    auth:
      ...
    pagination:
      ...
    compress: None
    chunked: 13107200
    expect100: False
    host_ca_certificate: |-
      -----BEGIN CERTIFICATE-----
      AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
      BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
      CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
      -----END CERTIFICATE-----
    verify_host_ssl: True
    disable_proxies: False
    total_timeout: 120
    status_code_handlers:
      201: ignore
      404:
        fail: "we got a 404 here"
      418: pass
      403: pass_save
      429:
        attempts: 3 # Try a total of 3 times
        delay: 5  # Wait 5 seconds after this status code is received

The following fields are available for use within an HTTP Source, most closely mapping to their respective parameters on aiohttp.ClientSession.request():

  • base_url - Optional, String. When supplied, the url field is appended to base_url in order to calculate the final target URL to poll for feed data.

  • url - Required, String. The target URL to poll for feed data.

  • method - Optional, String. Defaults to GET. The HTTP method for this request. Accepts any HTTP method verb, notably GET, POST, PUT, and DELETE.

  • params - Optional, Mapping. Query string parameters to be sent along with the request. Parameters are expected as a simple mapping. Given a list value, the list is expanded such that there is a query string key/value pair for each item in the list. The example above, for instance, sends a request to the final URL: https://www.example.com/feed?exampleA=22&exampleB=A&exampleB=B&exampleB=C

  • data - Optional, Mapping. Data to be sent along via the request body. JSON body data is automatically encoded for the request by supplying application/json to the request_content_type field.

  • headers - Optional, Mapping. Simple mapping of header key/value pairs to be sent along with the request.

  • request_content_type - Optional, String. Explicitly sets the Content-Type header of the HTTP request. Passing application/json as a value to this field automatically formats and encodes any data passed to the data field as JSON.

  • response_content_type - Optional, String. Specifies the content type that Dynamo should read responses from this source as. Usually, providers specify the content type of a response appropriately via the Content-Type response header. Sometimes, however, providers do not specify the correct Content-Type with the response (e.g., specifying Content-Type as text/html instead of text/plain). In these cases, a response_content_type designation is required in order to help Dynamo correctly deserialize data. Currently, HTTP Source supports the following content types:

    • text/plain

    • application/json

    If none of the available content types can decode the response data, or if Dynamo cannot determine the appropriate way to deserialize the response data, the response data is treated as binary file content and passed to the Filter Chain as an aiohttp.StreamReader. See Attachment ingestion.

  • auth - Optional, Mapping. Specifies the authentication object to be used when requesting data from the provider. See HTTP Source Authentication for more details and usage.

  • pagination - Optional, Mapping. Specifies how to paginate large volumes of data returned by a provider. See HTTP Pagination for more details and usage.

  • compress - Optional, Bool. Defaults to None. Specifies whether the request should be compressed. Passing True to this field results in aiohttp compressing the request data.

  • chunked - Optional, Int. Defaults to None. Specifies that the request should be sent as a chunked request, using the supplied Int value as the chunk size.

  • expect100 - Optional, Bool. Defaults to False. Specifies whether the request should expect a HTTP 100 continue response from the provider.

  • host_ca_certificate - Optional, String. Defaults to None. Specifies a base64 PEM encoded CA Certificate Bundle to verify the provider’s SSL certificate against. If not provided, the operating system’s default CA Certificate Bundle is used. Applicable only to https URLs.

  • verify_host_ssl - Optional, Bool. Defaults to True. Specifies whether the provider’s certificate and hostname is verified for each request. Applicable only to https URLs.

  • disable_proxies - Optional, Bool. Defaults to False. Specifies whether configured proxies should be ignored when making HTTP requests. New in version 4.31.0.

  • total_timeout - Optional, Int. Defaults to 119 seconds. Specifies the total timeout of the HTTP request, including connection establishment, request sending, and response reading. New in version 4.36.0.

  • status_code_handlers - Optional, Mapping. Specifies a mapping of HTTP status code to a handler action. By default, status_code_handlers is configured with 204: ignore. To override this behavior, one can specify 204: null. Currently, the following handler actions are available:

    • ignore - Denotes that a response of the specified status code should be ignored. The returned value will not be forwarded along to the filter chain, and the response will not be logged in a response log file. If specified, pagination will still trigger after a response is ignored. New in version 4.34.0.

    • fail - Denotes that a response of the specified status code should throw a user-defined error. New in version 5.0.1.

    • pass - Denotes that a response of the specified status code should be passed along to the filter chain without saving the response to disk in the feed’s activity directory. New in version 5.3.0.

    • pass_save - Denotes that a response of the specified status code should be passed along to the filter chain, with the response saved to disk in the feed’s activity directory. New in version 5.3.0.

    • delay - Denotes that a response of the specified status code should trigger a delay of any further handling of the request for a specified number of seconds. Useful in pagination and in combination with attempts or ignore. New in version 5.19.0.

    • attempts - Denotes that a response of the specified status code should retry the request up to the specified number of times. Minimally, any request makes 1 attempt by default. New in version 5.19.0.

    Handler values which take numbers are intended to be positive, and attempts must be an integer. If invalid values are supplied, an error will be logged, and an attempt will be made to continue through the Feed Run with corrected values.
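The list-expansion behavior of params matches standard query-string encoding; a quick way to preview the resulting string (illustrative, not Pynoceros internals):

```python
from urllib.parse import urlencode

# A list value produces one key/value pair per list item, as in the
# HTTP Source example earlier in this section.
params = {"exampleA": 22, "exampleB": ["A", "B", "C"]}
query = urlencode(params, doseq=True)
print(query)  # exampleA=22&exampleB=A&exampleB=B&exampleB=C
```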

Examples
Simple HTTP GET

Many data sources exist as just an openly available plain-text file on a web server. Because this is so common, a shortcut to simplify these source definitions is available:

source:
  http: http://www.example.com/download_file.txt

This definition is the equivalent of:

source:
  http:
    url: http://www.example.com/download_file.txt
HTTP Dynamic URL

Sometimes, a data source may have a different URL for different types of endpoints, so having the url value as an expression may be helpful:

source:
  http:
    base_url: http://www.example.com/
    url: !expr 'historic.php' if run_meta.trigger_type == 'manual' else 'current.php'

This references the run_meta object to determine how this Feed Run was triggered. The final URL is http://www.example.com/current.php if the current Feed Run is a normal, scheduled Hourly/Daily run. If the Feed Run was triggered manually by a user, the URL is http://www.example.com/historic.php.

HTTP Pagination

Many APIs have so much information in a query result that it is unwieldy to return it all at once. In order to make this data more digestible, providers employ some form of pagination. HTTP Pagination within CDF has been designed as a largely configurable object nested within the http definition. The following shows a typical HTTP Pagination setup wherein the provider expects the offset and limit query string parameters to paginate the response data:

source:
  http:
    url: http://www.example.com/download_file.php
    pagination:
      template_values:
        offset: 0
        limit: 1000
      condition: !expr prev_request_params.limit == (prev_response_data | length)
      params:
        offset: !expr "(prev_request_params.offset or 0) + limit"
        limit: !expr limit

The following fields are available when writing an HTTP Pagination definition:

  • template_values - Optional, Mapping. Allows for declaration of constant default values for use specifically within the pagination definition. For more information on these see the Template Values section.

  • condition - Required, Expression. The condition that is checked after each request to see if the HTTP source should continue requesting paginated data. As long as condition resolves to True, paginated data continues to be requested. In the example above, condition evaluates to True as long as the length of data from the previous response is equal to the limit value.

    Note

    Since the condition check runs after a request is sent and a response is received, the first request in a pagination-enabled HTTP Source is always sent.

    Note

    When developing a Feed Definition, one can easily disable HTTP Pagination by setting condition to !expr False.

  • url - Optional, Expression. Allows pagination to dynamically change the URL. When supplied, any base_url supplied to the HTTP Source definition containing the pagination in question is automatically included within HTTP Pagination and prepended to url in the same manner as HTTP Source.

  • params - Optional, Mapping. Query string parameters which pagination dynamically changes before each request by evaluating the supplied Jinja2 expressions. The resulting dictionary of query string parameters calculated by pagination is used to update the query string request parameters declared within the HTTP Source definition. Thus, any query string parameters set within both the HTTP Source definition and the HTTP Pagination definition take the values set by the HTTP Pagination definition.

  • headers - Optional, Mapping. Supports the same functionality as the params field, but for request headers.

  • data - Optional, Mapping. Supports the same functionality as the params field, but for request body data.

The following values are available via Jinja2 Expressions within an HTTP Pagination. These values are updated after each request is sent and each response is received by the HTTP Source. Using these values, a Feed Definition writer can transform the request fields necessary for pagination:

  • prev_request_params - Mapping of the query string parameters that were used in the previous request. Note that the provided example looks at the limit value in the condition field.

  • prev_request_headers - Mapping of the headers that were used in the previous request.

  • prev_request_data - Mapping of the body data that was sent in the previous request.

  • prev_response_headers - Mapping of the headers that were sent by the provider with the previous response.

  • prev_response_data - This is the raw data response (pre-filter chain) that came back from the data source. Note that in the provided example this is used in the condition parameter in conjunction with Jinja2’s length filter.

  • page_count - Int count of the number of pages that have been requested by the HTTP Pagination in question. page_count starts at 1 and is incremented by 1 each time a paginated request is made.
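
Because page_count is available in this context, a provider that paginates by page number rather than offset can be handled as well. The following is a sketch under the assumption that the provider accepts a page query string parameter (the parameter name is illustrative):

```yaml
source:
  http:
    url: http://www.example.com/download_file.php
    pagination:
      # Continue while the previous response still returned data.
      condition: !expr (prev_response_data | length) > 0
      params:
        # page_count tracks the pages requested so far, so ask for the next one.
        page: !expr page_count + 1
```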

HTTP Source Authentication

Overview

Some data sources also require authentication. For security reasons, it is not recommended to include an authentication header in plain text alongside the source definition. An auth section is supplied within the HTTP source configuration to encapsulate authentication logic and details.

Most authentication methods supply fields to specify required query string parameters or headers. Ultimately, the system combines all of the headers or parameters supplied prior to sending them to the data provider. Any information declared both inside and outside of the auth section is overwritten by the values found in the auth section.
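
For example, in the following sketch the same header is declared both at the HTTP Source level and inside the auth section; per the rule above, the request is sent with the value from the auth section (the header name and token values are illustrative):

```yaml
source:
  http:
    url: http://www.example.com/download_file.txt
    headers:
      X-Api-Key: outer-value        # overwritten by the auth section below
    auth:
      simple:
        headers:
          X-Api-Key: auth-value     # this value is sent to the provider
```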

Simple Authentication

A simple auth declaration allows a Feed Definition writer to specify tokens required by a provider for authentication, via either headers or query string parameters. The following demonstrates the usage of simple auth within an HTTP Source definition.

source:
  http:
    url: http://www.example.com/download_file.txt
    auth:
      simple:
        headers:
          X-Auth-Token: 8675309EABCDEF
        params:
          token: 8675309EABCDEF
Basic Authentication

A basic auth declaration uses the HTTP Basic authentication standard to provide a username and password to the server. The following demonstrates the usage of basic auth within an HTTP Source definition.

source:
  http:
    url: http://www.example.com/download_file.txt
    auth:
      basic:
        username: my_username
        password: my_password
        encoding: utf8 # This is optional and defaults to 'latin1'
Client SSL Authentication

The Client SSL Auth declaration allows one to authenticate with a feed provider using client certificate authentication.

The following fields are available for use within a Client SSL Auth declaration:

  • client_certificate - Required, String. The client certificate in base64 PEM encoding.

  • client_private_key - Optional, String. The private key in base64 PEM encoding for the associated client certificate, if applicable. If the private key is stored in the client certificate string, this field does not need to be used.

  • client_private_key_passphrase - Optional, String. Password for decrypting the client private key.

  • headers - Optional, Mapping. Headers that may also be needed for authentication.

  • params - Optional, Mapping. Query string parameters that may also be needed for authentication.

The following demonstrates the usage of a Client SSL Auth within an HTTP Source definition:

source:
  http:
    url: https://www.example.com/download_file.txt
    auth:
      client_ssl:
        client_certificate: |-
          -----BEGIN CERTIFICATE-----
          AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
          BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
          CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
          -----END CERTIFICATE-----
        client_private_key: |-
          -----BEGIN RSA PRIVATE KEY-----
          DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
          EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
          FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
          -----END RSA PRIVATE KEY-----
        client_private_key_passphrase: super secret
        headers:
          X-Auth-Token: 8675309EABCDEF
        params:
          token: 8675309EABCDEF
Multiple Authentication

The Multiple Authentication declaration allows one to aggregate multiple authentication methods in order to authenticate with a feed provider.

The following demonstrates the usage of a Multiple Authentication within an HTTP Source definition:

source:
  http:
    url: https://www.example.com/download_file.txt
    auth:
      multi:
        - basic:
            username: my_username
            password: my_password
        - client_ssl:
            client_certificate: |-
              -----BEGIN CERTIFICATE-----
              AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
              BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
              CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
              -----END CERTIFICATE-----
            client_private_key: |-
              -----BEGIN RSA PRIVATE KEY-----
              DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
              EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
              FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
              -----END RSA PRIVATE KEY-----
Token-based Authentication

Token-based Authentication allows a CDF writer to authenticate with services that use standards such as OAuth2 or JWT, in cases where user interaction is not required. For OAuth2, this means that only the Client Credentials Grant and Password Grant flows are supported. Authentication tokens are cached in memory to avoid unnecessary reauthentication between feed runs and across feeds. Note that a cached token may be accessed by any Feed that references it with a matching token_identifier_set for as long as the token is unexpired, so it is a best practice for a token_identifier_set to uniquely and securely identify a token.

Within the context of a CDF, a Supplemental Feed is used to define how to fetch and parse a response from an authentication endpoint to obtain a token and, optionally, an expiration (which defaults to one hour). The expiration prevents making requests with an invalid token, lightening the load for both the client and the server; the cache automatically reauthenticates with the provider when an expired token is requested by a feed run. After the supplemental feed run completes, the parent feed may use the token to authenticate its own request (for example, in a header, as shown in the example below). The token is available in the Jinja2 context under the token variable and may be referenced with a Jinja2 Expression or Template within headers or params. An example of this behavior is shown below in the !tmpl under the headers key.

New in version 4.41.0.

The following example demonstrates the usage of a Token-based Authentication within an HTTP Source definition:

feeds:
  Intelligence Provider - Bad Guys:
    user_fields:
      - username
      - name: password
        mask: True
    source:
      http:
        url: https://api.intel-data-provider.com/intel/bad-guys
        auth:
          token:
            headers:
              Authorization: !tmpl 'Bearer {{token}}'
            reauthorize_error_codes:
              - 401
              - 403
              - 500
            token_feed:
              name: Intelligence Provider - Authentication
            token_identifier_set:
              - !expr user_fields.username
              - !expr user_fields.password
              - 'any arbitrary string'
    ...

  Intelligence Provider - Authentication:
    feed_type: supplemental
    source:
      http:
        url: https://api.intel-data-provider.com/auth/token
        params:
          grant_type: client_credentials
        auth:
          basic:
            username: !expr user_fields.username
            password: !expr user_fields.password
    filters:
      - parse-json
      - get: token

The following fields are used under the token key, and are required unless otherwise noted:

  • token_feed - Mapping that denotes the following fields:

    • name - Name of the Supplemental Feed to use to retrieve a token.

    • run-params - Optional, defaults to None. Mapping of run-params to be sent to the token Supplemental Feed. Note that the user field values do not have to be passed as run-params.

  • reauthorize_error_codes - Optional, defaults to [401]. List of HTTP response status codes that will trigger reauthentication with the provider.

  • token_identifier_set - Required. Order-sensitive list of string values used to uniquely identify the fetched authentication token. Typically, these should be dynamically determined based on user fields to avoid collisions.

  • headers - Optional, defaults to None. Set of headers to apply to the main provider request (i.e., the request made by the feed sporting the TokenAuth). Writers will use this to specify how to send along the token they are granted for their request(s).

  • params - Optional, defaults to None. Set of query string params to apply to the main provider request (i.e., the request made by the feed sporting the TokenAuth). Writers will use this to specify how to send along the token they are granted for their request(s).

The supplemental feed is expected to return either a single str for the access token value, or a mapping containing the following fields:

  • token - Required, a str containing the access token returned by the provider.

  • expires_in - Optional, an int equivalent to the number of seconds for which the token is valid. If this parameter is not provided, the expiration defaults to an hour (3600 seconds).
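
For providers that report an expiration alongside the token, the Supplemental Feed's filter chain can produce the mapping form instead of a bare string. The following is a sketch, assuming the provider returns a JSON body whose access_token and expires_in keys follow common OAuth2 conventions (field names vary by provider):

```yaml
Intelligence Provider - Authentication:
  feed_type: supplemental
  source:
    http:
      url: https://api.intel-data-provider.com/auth/token
  filters:
    - parse-json
    - new:
        token: !expr value.access_token
        expires_in: !expr value.expires_in
```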

TAXII Source

Overview

The TAXII Source is used to poll a TAXII server for STIX threat intelligence data. Basic usage of the TAXII Source is as follows:

source:
  taxii:
    version: '1.1'
    discovery_url: http://hailataxii.com/taxii-discovery-service
    collection_name: guest.Abuse_ch
    auth:
      taxii1:
        client_certificate: |-
          -----BEGIN CERTIFICATE-----
          AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
          BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
          CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
          -----END CERTIFICATE-----
        client_private_key: |-
          -----BEGIN RSA PRIVATE KEY-----
          DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
          EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
          FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
          -----END RSA PRIVATE KEY-----
        username: user1234
        password: P@ssw0rd
    verify_ssl: True
    host_ca_certificate: |-
      -----BEGIN CERTIFICATE-----
      AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
      BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
      CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
      -----END CERTIFICATE-----
      -----BEGIN CERTIFICATE-----
      DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
      EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
      -----END CERTIFICATE-----
    disable_proxies: False
    headers:
      header_a: Something
      header_b: Something Else
    poll_url: http://hailataxii.com/abuse-poll
    since: !expr run_meta.since
    until: !expr run_meta.until

The following fields are available for use within a TAXII Source:

  • version - Optional, String. Defaults to 1.1. Specifies the version of the TAXII server being polled. Accepted values include 1.0, 1.1, and 2.0.

  • discovery_url - String. Specifies the URL of the TAXII Server’s discovery service.

  • collection_name - String. Specifies the name of the collection to poll for data.

  • auth - Optional, Mapping. Specifies an authentication object to use when making TAXII requests. Version 1.0 or 1.1 TAXII Servers require a taxii1 authentication object, while version 2.0 TAXII Servers require a basic authentication object. If version is specified as 2.0 and a taxii1 authentication object is supplied, the TAXII Source attempts to automatically generate a basic authentication object from the taxii1 authentication object. While only the username and password fields are available for basic authentication objects, the following fields are available for taxii1 authentication objects:

    • client_certificate - Optional, String. Can be used to specify a Certificate to be passed to the TAXII client. This field is currently only honored for TAXII Server versions 1.0 and 1.1.

    • client_private_key - Optional, String. Can be used to specify an RSA Private Key to be passed to the TAXII client. This field is currently only honored for TAXII Server versions 1.0 and 1.1.

    • username - Optional, String. Can be used to specify a basic authentication username to be passed to the TAXII client.

    • password - Optional, String. Can be used to specify a basic authentication password to be passed to the TAXII client.

  • verify_ssl - Optional, Bool. Defaults to True. Can be used to specify whether the TAXII client should verify the provider’s SSL certificate when making requests. If True, the TAXII client attempts to verify the provider’s certificate against public CAs. However, if host_ca_certificate is provided, this field is treated as True and the provider’s certificate is instead verified against the certificate passed to host_ca_certificate.

  • host_ca_certificate - Optional, String. Can be used to specify a CA Certificate Bundle to verify the provider’s SSL certificate against. Overrides the verify_ssl field if specified.

  • disable_proxies - Optional, Bool. Specifies whether to disable proxies when performing TAXII requests. Defaults to False.

  • headers - Optional, Dictionary. Specifies additional headers that should be applied to requests sent via the TAXII client.

  • poll_url - Optional, String. Specifies the URL to poll for data. If not specified, the TAXII client attempts to discover the appropriate poll URL via the TAXII Server’s Discovery Service.

  • since - Optional, String. Specifies the start date of the poll period.

  • until - Optional, String. Specifies the end date of the poll period. This field is currently only honored for TAXII Server versions 1.0 and 1.1.

Since many of these fields do not apply to a version 2.0 TAXII Server, a succinct version 2.0 TAXII Server source definition would look like the following:

source:
  taxii:
    discovery_url: https://cti-taxii.mitre.org/taxii/
    collection_name: Enterprise ATT&CK
    auth:
      basic:
        username: user1234
        password: P@ssw0rd
    verify_ssl: True
    disable_proxies: False
    headers:
      header_a: Something
      header_b: Something Else
    since: !expr run_meta.since
    version: '2.0'

Warning

The version field is expected to be a string. To ensure that YAML does not parse the version as a float, make sure to wrap the version in single quotes (e.g., '2.0').

Many of these fields have a corresponding user_field value that can be passed from the front end STIX/TAXII feed settings:

  • discovery_url - !expr user_fields.discovery_url

  • collection_name - !expr user_fields.collection_name

  • client_certificate - !expr user_fields.certificate

  • client_private_key - !expr user_fields.private_key

  • username - !expr user_fields.username

  • password - !expr user_fields.password
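
Putting these together, a TAXII 1.1 Source driven entirely by the front end's STIX/TAXII feed settings could be sketched as:

```yaml
source:
  taxii:
    version: '1.1'
    discovery_url: !expr user_fields.discovery_url
    collection_name: !expr user_fields.collection_name
    auth:
      taxii1:
        client_certificate: !expr user_fields.certificate
        client_private_key: !expr user_fields.private_key
        username: !expr user_fields.username
        password: !expr user_fields.password
    since: !expr run_meta.since
    until: !expr run_meta.until
```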

Threat Collection Source

Overview

New in version 4.50.0.

The Threat Collection Source is used to query threat objects from the system’s ThreatQ Threat Library. The following code snippets illustrate several use cases of the Threat Collection Source.

Resolve the provided collection hash for a saved Threat Library Data Collection into an API query and execute the query against the indicators, events, and courses of action object collections:

source:
  threat-collection:
    collection_hash: c7ca1e1ccc799d9e435d22f64022e594
    chunk_size: 500  # Optional, defaults to 1000
    objects_per_run: 10000  # Optional, default is set by engineering in /configuration
    yield_chunk: True  # Optional, defaults to False
    object_collections:  # Optional, defaults to None (all)
      - indicators
      - events
      - course_of_action
    object_fields_mapping:  # Optional, defaults to None (most fields for any absent object collection keys)
      indicators:
        - description
        - status
        - value
      events:
        - title
        - happened_at
        - type
    object_contexts_mapping:  # Optional, defaults to None (most contexts(*) for any absent object collection keys)
      course_of_action:
        - sources
        - adversaries
        - attributes
    object_sort_mapping:  # Optional, defaults to +id in the ThreatQ API for any absent object collection keys
      indicators:
        - -updated_at
      events:
        - +type
        - +happened_at
    since: !expr run_meta.since  # Optional, defaults to None
    until: !expr run_meta.until  # Optional, defaults to None

# (*) Defining explicit fields to return for an object collection affects the default contexts that are returned.

Execute the provided API query against only the indicators object collection to retrieve all FQDN indicators:

source:
  threat-collection:
    object_collections:
      - indicators
    api_query:  # Can also be provided as a string
      criteria: {}
      filters:
        +and:
          - +or:
              - type_name: FQDN

Return all threat objects in the ThreatQ instance’s Threat Library (use with caution):

source:
  threat-collection:

Warning

The collection_hash and api_query arguments are mutually exclusive. If both are provided, the source fails to initialize and raises a ValueError.

The following fields are available for use within a Threat Collection Source:

  • collection_hash - Optional, String. A saved ThreatQ Threat Library search’s Data Collection hash. Collection hashes can be obtained from the ThreatQ API endpoint GET /search/query. Alternatively, one can obtain a collection hash by executing the saved Data Collection in the ThreatQ Threat Library UI and copying the value after the URL’s hash fragment (#) (for example, in the URL https://my.threatq.com/threat-library#c7ca1e1ccc799d9e435d22f64022e594, the collection hash is c7ca1e1ccc799d9e435d22f64022e594). A collection hash is resolved to an API query, which is then provided as the request payload for the set of ThreatQ API endpoints POST /{object_collection}/query. If collection_hash is provided, api_query cannot also be provided; these arguments are mutually exclusive. Only a single collection hash can be provided. If neither collection_hash nor api_query is provided, all objects in the system’s ThreatQ Threat Library are returned (use with caution).

  • api_query - Optional, String or Mapping. A serialized (str) or deserialized (dict) API query. A provided API query does not need to have a matching Data Collection (saved search) entry in the system’s ThreatQ instance. The API query is provided as the request payload for the set of ThreatQ API endpoints POST /{object_collection}/query. If api_query is provided, collection_hash cannot also be provided; these arguments are mutually exclusive. Only a single API query can be provided. If neither collection_hash nor api_query is provided, all objects in the system’s ThreatQ Threat Library are returned (use with caution).

  • chunk_size - Optional, Int. Defaults to 1000. The number of threat objects to return in each paginated response from the set of ThreatQ API endpoints POST /{object_collection}/query.

  • objects_per_run - Optional, Int. Default defined in GET /configuration. The maximum number of threat objects to retrieve from the Threat Library in a given feed run. If dynamo.cdw.run_limit is not set, the default falls back to 1,000,000. New in version 5.0.1.

  • yield_chunk - Optional, Bool. Defaults to False. Specifies whether the source should yield individual threat objects from each paginated chunk (default behavior) or whether the source should yield the entire list of threat objects from each paginated chunk (if provided as True). The latter may be useful in the future if there is a downstream filter that operates on bulk threat objects. Otherwise, yield_chunk being False saves one from needing to use the Iterate Filter at the start of the Filter Chain.

  • object_collections - Optional, List[String]. Defaults to None. A list of object collection names. This list narrows down which object collections the provided (api_query) or resolved (collection_hash) API query is executed against. For example, if one wishes to return all MD5 indicators, then one should provide a list with a single “indicators” element in it since it would not make sense to query against non-indicator object collections. Defaults to querying against all object collection endpoints. Valid object collection names can be obtained from the “collection” field for objects returned from the ThreatQ API endpoint GET /objects. Some examples of valid object collection names: indicators, adversaries, events, attachments, signatures, malware, course_of_action, tool, etc.

  • object_fields_mapping - Optional, Mapping[String, List[String]]. Defaults to None. A mapping of object collection names to a list of field names to include within threat objects returned from executing the query. By default, most fields are returned. In this sense, “most” is defined as the fields that are needed by the Threat Library UI. Due to the volume of data that could be returned, it is recommended to provide only the fields that are needed for each object collection that is being queried against. Fields are typically scalar properties of a threat object. For example: value, type, happened_at, published_at, description, etc. Since fields may only be relevant depending on the given object collection, the list of fields must be keyed by an object collection name. Providing a list of fields to include does prevent all the object contexts from being included by default, so be sure to also configure object_contexts_mapping as necessary. By default, fields id and threatq_object_type (added by the source) are always returned.

  • object_contexts_mapping - Optional, Mapping[String, List[String]]. Defaults to None. A mapping of object collection names to a list of context names to include within threat objects returned from executing the query. By default, most contexts are returned. Due to the volume of data that could be returned, it is recommended to provide only the contexts that are needed for each object collection that is being queried against. Contexts are typically complex objects containing their own scalar fields. For example: sources, adversaries, attributes, etc. Note: adversaries are the only object relation context supported by the ThreatQ API today. Some complex objects, such as type, may be provided as a field in object_fields_mapping, in which case it will be returned in its simplest representation, such as the string name for type instead of a mapping that contains name, class, and id keys. Since contexts may only be relevant depending on the given object collection, the list of contexts must be keyed by an object collection name. If a list of field names is given for an object collection in object_fields_mapping, then explicit context names will need to be provided for that object collection in object_contexts_mapping in order for any contexts to be returned.

  • object_sort_mapping - Optional, Mapping[String, List[String]]. Defaults to None. A mapping of object collection names to a list of field names prefixed with + (ascending order) or - (descending order). The default sort order is by ascending IDs (+id). If multiple sort fields are provided, the sorting is applied for the first sort field, and if the sorting results in a conflict (for example, the objects have the same type name), the second sort field is applied, and so on. Since fields may only be relevant depending on the given object collection, the list of sort fields must be keyed by an object collection name.

  • since - Optional, String. Defaults to None. If provided, a threat object’s last touched datetime must be greater than or equal to the provided value.

  • until - Optional, String. Defaults to None. If provided, a threat object’s last touched datetime must be less than or equal to the provided value.

Note

Reporting on an object that was fetched via the Threat Collection Source will cause the object’s last touched timestamp to be updated based on the ThreatQ API’s current implementation. The following feed definition snippet is a minimal configuration that can cause this behavior:

source:
  threat-collection:
    collection_hash: c7ca1e1ccc799d9e435d22f64022e594
    object_collections:
      - indicators
    since: !expr run_meta.since
    until: !expr run_meta.until
filters:
  - new:
      type: !expr value.type.name
      value: !expr value.value
report:
  indicator-sets:
    default:
      items: !expr data

This will cause an infinite loop of fetching and reporting on the same data every feed run. As a result, until a proper caching mechanism is implemented in the future, it is advised to use this source only when pushing threat data to another service/tool or for fetching new threat data in a supplemental feed that is not reported as being related to the original fetched object.

Proposed Sources

The following sources have been proposed for CDF support and will be considered for future releases:

  • IMAP

    • Exchange

  • SSH

  • FTP

  • SMB

  • SOAP

CDF Filters

Filters enable a feed definition writer to manipulate data ingested from a provider within the source section of a definition file. This section provides in-depth explanations of each filter available for use within a CDF Feed Definition’s Filter Chain.

ACE Filter

Overview

The ACE (Automated Contextualization Engine) parser filters automatically extract Threat Intelligence from unstructured data using NLP and regular expressions.

The parser filters are as follows (formerly known, now-deprecated names in parentheses):

No arguments:

  • ace-adversaries (parse-adversaries)

  • ace-attack-patterns (parse-attack-patterns)

  • ace-malware (parse-malware)

  • ace-tags (parse-tags)

Additional arguments:

  • ace-attributes (parse-attributes)

  • ace-indicators (parse-indicators)

New in version 5.19.0.

Usage
filters:
  - ace-malware

The first four parser filters listed above do not take any arguments. The ace-attributes and ace-indicators filters each take an additional argument: attribute_names and types, respectively.

filters:
  - ace-attributes:
      attribute_names: !expr user_fields.parsed_attribute_names
filters:
  - ace-indicators:
      types: !expr user_fields.parsed_ioc_types

The following ThreatQ indicator types are accepted:

  • ‘md5’

  • ‘sha1’

  • ‘sha256’

  • ‘sha384’

  • ‘sha512’

  • ‘cidr’

  • ‘url’

  • ‘domain’

  • ‘email’

  • ‘ip’

  • ‘cve’

  • ‘filename’

  • ‘filepath’
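
While the earlier example routes the types through a user field, the argument presumably also accepts a literal list; a sketch restricting parsing to hashes and URLs:

```yaml
filters:
  - ace-indicators:
      types:
        - md5
        - sha256
        - url
```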

Incoming Value

A string value, which may be some long unstructured text or HTML.

Transform Result

A list of results.

Examples

Below is an example of turning arbitrary data into a string and parsing out any relevant results:

Filter Chain

filters:
  - ace-malware

Input

"<h3 id='canadian-home-goods-retailer-acknowledges-darkside-ransomware-incident'>"
"Canadian Home-Goods Retailer Acknowledges Darkside Ransomware Incident"
"</h3> Recent CrowdStrike Intelligence Reporting on Darkside Ransomware"

Output

[
    {
        "value": "Darkside"
    }
]

API Filter

Overview

The API Filter is used to make calls to the TQ API from within the Filter Chain. Like the HTTP Source, the API Filter allows one to specify the target URL, HTTP method, request headers, query string parameters, and body data.

Usage
filters:
  - api:
      url: !tmpl 'attachments/{value.id}/attributes'
      method: GET                                    # Optional, defaults to 'GET'
      params:                                        # Optional, defaults to None
        param_a: !expr value.value_a
        param_b: !expr value.value_b
      headers:                                       # Optional, defaults to None
        header_a: !expr value.value_a
        header_b: !expr value.value_b
      data:                                          # Optional, defaults to None
        body_data: !expr value.data
      response_content_type: 'json'
      filters:
        - get: data

By default response_content_type is ‘json’, because in most cases the API responds with a JSON-encoded string, though there are exceptions, e.g. csv or pdf. Regardless of the actual response data type, response_content_type is honored and the response is parsed into the appropriate native Python object for the filter chain.

Supported values:

  • ‘json’ - the response is parsed into a native Python object (e.g., a dict or list).

  • ‘text’ - the response is decoded into a utf-8 Python string.

  • ‘bytes’ - the response is kept as a raw Python byte string.

Any other value is kept as a Python byte string as a failsafe.
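As a rough sketch of this dispatch (illustrative only; the function name and signature are not part of the CDF API):

```python
import json

def parse_response(raw: bytes, response_content_type: str = "json"):
    # Illustrative sketch of response_content_type handling
    if response_content_type == "json":
        return json.loads(raw)       # native Python object (dict, list, etc.)
    if response_content_type == "text":
        return raw.decode("utf-8")   # utf-8 decoded string
    # 'bytes' and any unrecognized value: keep the raw byte string (failsafe)
    return raw
```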

Incoming Value

This can be any value; it is only used to fulfill any Jinja2 expressions or templates used within this filter’s arguments.

Transform Result

API response data that has been formatted as per the filter chain supplied via the filters argument.

Examples

There are a few caveats that one should be aware of when utilizing the API Filter.

Since the API Filter returns formatted data from the TQ API that does not include the incoming value, the value traversing the Filter Chain is entirely replaced with response data after the API Filter runs. In order to avoid overwriting the value in flight, one should leverage a syntax similar to the following:

filters:
  - parse-json
  - get: data
  - set:
      holder_var: !expr value.target_data_for_api_request
  - filter-mapping:
      holder_var:
        api:
            url: attachments
            method: GET
            params:
              name: !expr value  # Here, value is whatever was set into ``holder_var``
            filters:
              - get: data
              ...  # Further transformations on the API response data

Using this kind of setup (setting a holder variable and then running it through the API Filter via a Filter Mapping), you can maintain the original value coming through the filter chain while isolating the API response within value.holder_var.

Further, when leveraging the API Filter, one should be wary of API load and consider when exactly in the filter chain the API request should be made. For example:

filters:
  - parse-json
  - get: data
  - iterate
  - api:
      url: attachments
      method: GET
      filters:
        ...

The example above results in a call being made to the TQ API for each object being iterated, because the API Filter is called after an Iterate Filter. This can balloon quickly if a large enough data set is being iterated over and can effectively DDoS the TQ API. When possible, one should map multiple lookup values into a single key that can then be used to query the TQ API only once for the requisite data.

Below is an example of how to specify a response content type when downloading data from the API utilizing the response_content_type parameter:

filters:
  - parse-json
  - get: data
  - iterate
  - api:
      url: attachments/{{ value.id }}/download
      method: GET
      response_content_type: bytes
      filters:
        ...

Chain Filter

Overview

The Chain Filter is used to combine numerous filters into a single filter that applies each given sub-filter in turn. The Chain Filter is useful in cases where another filter requires exactly one sub-filter but multiple filters are desired (e.g., the Filter Mapping/Each filter construct).

Usage
filters:
  - chain:  # Chain accepts a list of sub-filters
    - get: !expr value.created
    - timestamp
    - new:
        published_at: !expr value
Incoming Value

This can be any value. This value is passed into the sub-filter chain and processed as per normal Filter Chain conventions.

Transform Result

The value having been transformed by each sub-filter specified under the chain Filter.
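Conceptually, the Chain Filter composes its sub-filters left to right, with each sub-filter receiving the previous one's result. A plain-Python sketch of that behavior (the chain function name is illustrative, not part of the CDF API):

```python
from functools import reduce

def chain(sub_filters, value):
    # Apply each sub-filter in turn; each receives the previous result
    return reduce(lambda current, sub_filter: sub_filter(current), sub_filters, value)

# Two sub-filters applied in order: strip whitespace, then title-case
assert chain([str.strip, str.title], "  hypno toad  ") == "Hypno Toad"
```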

Examples

The most common use case for the Chain Filter occurs within the Filter Mapping/Each filter construct. Since the Each Filter only accepts exactly one sub-filter, one needs to wrap multiple sub-filters within a Chain Filter in order to perform more complex formatting within an Each Filter. For example:

filters:
  - filter-mapping:
      created_at: timestamp
      attribution:
        each:      # The Each Filter accepts exactly one sub-filter, so we use a chain
          - chain:
            - drop: !expr not value.name
            - new:
                name: Some Attribute
                value: !expr value.name

Note

As illustrated in the above example, the Chain Filter must be specified as a list element of the Each Filter as the Chain Filter requires arguments (the list of sub-filters to apply). If the Chain Filter were not provided as a list element, an “unexpected keyword argument” error is raised.

Common Pitfalls

If only a single sub-filter is desired within the Filter Mapping/Each filter construct, the Chain Filter is unnecessary and should not be used.

The following illustrates a filter chain that makes use of two unnecessary Chain Filters:

filters:
  - new:
      urls:
        - http://example.com
        - ftp://example.org
  - filter-mapping:
      urls:
        chain:
          - each:
            - chain:
              - drop: !expr not value.startswith('http')

The above should be rewritten as:

filters:
  - new:
      urls:
        - http://example.com
        - ftp://example.org
  - filter-mapping:
      urls:
        each:
          - drop: !expr not value.startswith('http')

Debugging Filters

Log Filter
Overview

The Log Filter is a debugging filter that logs out the current value being processed in the filter chain. This filter can be leveraged during CDF development in order to introspect exactly what a given piece of data looks like at any point as it traverses the Filter Chain.

Usage
filters:
  - log:
      condition: !expr not value   # Optional, defaults to None
      level: CRITICAL              # Optional, defaults to DEBUG
      include:                     # Optional, defaults to value
        - value
        - parent_values
        - condition
  • If a condition is supplied, the current value is only logged out if the condition resolves to True.

  • The log level can be configured with any of the following values:

    • CRITICAL

    • WARNING

    • INFO

    • DEBUG

    • An integer between 1 and 9. In this case, logging more verbose than DEBUG is used, with level 1 being the most verbose.
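These level names correspond to Python's standard logging levels, where a lower numeric level means more verbose output; a small sketch of why integers 1 through 9 are "more verbose than DEBUG":

```python
import logging

# Python's standard logging levels as integers (lower = more verbose)
levels = {
    "CRITICAL": logging.CRITICAL,  # 50
    "WARNING": logging.WARNING,    # 30
    "INFO": logging.INFO,          # 20
    "DEBUG": logging.DEBUG,        # 10
}

# An integer level from 1 to 9 sits below DEBUG (10), so a logger at that
# threshold emits even more verbose records; 1 is the most verbose.
assert 0 < 1 < levels["DEBUG"]
```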

Incoming Value

Any value to be logged.

Transform Result

The incoming value unchanged.

Examples

The Log Filter can be inserted at any point in the Filter Chain in order to log out what the current value is, allowing a CDF writer to debug various filter errors by introspecting what the value being filtered looks like before and after a given filter. For example:

filters:
  - log:
      level: CRITICAL
  - set:
      some_key: !expr value.some_value
  - log:
      level: INFO

With a setup like this, a CDF writer could see the current value in the Filter Chain logged at CRITICAL before the Set Filter runs and again at INFO after the Set Filter has completed.

Delay Filter
Overview

The Delay Filter allows a CDF writer to introduce an artificial delay into the Filter Chain. The Delay Filter can be useful for:

  • testing Dynamo’s stability during a possibly long-running Feed operation

  • attempting to deter a provider’s rate-limiting by artificially slowing down processing in the Filter Chain and thus causing back pressure in the Feed Run Pipeline (best that can be done until first-class rate-limiting support is implemented)

  • waiting some amount of time before making a subsequent request using Supplemental Feeds to allow the provider more time to publish content

Usage
filters:
  - delay: 2

The delay argument denotes the time to sleep in seconds. If not supplied, the default sleep time is 1 second.

Incoming Value

Any value.

Transform Result

The incoming value unchanged.

Decode Binary Filter

Overview

The DecodeBinary Filter decodes string values into bytes using one of the available decoders:

  • base16

  • base32

  • base64

  • base85/ascii85

Usage
filters:
  - decode-binary:
      encoding: base64

Or:

filters:
  - decode-binary: base64

The encoding argument represents the decoder used for decoding the string value.

Incoming Value

Encoded string value.

Transform Result

Decoded bytes value.

Examples
  • base16: '666F6F626172' -> b'foobar'

  • base32: 'MZXW6YTBOI======' -> b'foobar'

  • base64: 'Zm9vYmFy' -> b'foobar'

  • base85/ascii85: 'W^Zp|VR8' -> b'foobar'
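These example decodings correspond to decoders in Python's base64 module; note that the base85 example string matches the RFC 1924-style alphabet used by b85decode:

```python
import base64

# The four decoders applied to the example strings above
assert base64.b16decode("666F6F626172") == b"foobar"
assert base64.b32decode("MZXW6YTBOI======") == b"foobar"
assert base64.b64decode("Zm9vYmFy") == b"foobar"
assert base64.b85decode("W^Zp|VR8") == b"foobar"
```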

Examples

Input File Contents

666F6F626172

Filter Chain:

filters:
  - decode-binary: base16
  - str  # needed in this example; casts the decoded bytes to a string, removing the ``b`` prefix

Output:

[
    "foobar"
]

Dedupe Filter

Overview

The Deduplicate (Dedupe) Filter allows a CDF writer to deduplicate a given data list. Python’s built-in set functionality is used to deduplicate the incoming list before the value is returned back to the Filter Chain once again as a list.
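A plain-Python sketch of this behavior (the dedupe function name is illustrative); note that set-based deduplication does not preserve the input order:

```python
def dedupe(values):
    # set() removes duplicates; the resulting order is arbitrary
    return list(set(values))

result = dedupe(["2.2.2.2", "2.2.2.2", "duckduckgo.com"])
assert sorted(result) == ["2.2.2.2", "duckduckgo.com"]
```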

Usage
filters:
  - dedupe
Incoming Value

A list of values.

Warning

As per standard Python conventions, dictionaries and lists are not hashable and cannot be contained within a set. Accordingly, a dictionary or a list of dictionaries raises a TypeError when passed to the Dedupe Filter.

Transform Result

The list of values deduplicated.

Examples

Suppose a CDF writer is attempting to deduplicate a list of IOC’s:

Input File Contents

2.2.2.2
2.2.2.2
duckduckgo.com
duckduckgo.com
google.net
google.net
facebook.pro
facebook.com

Filter Chain

filters:
  - split-lines
  - dedupe

Output

[
  [
      "duckduckgo.com",
      "google.net",
      "2.2.2.2",
      "facebook.pro",
      "facebook.com"
  ]
]

Suppose a CDF writer is attempting to use a Supplemental Feed to poll a provider for enrichment data. To get this data, a list of IOC names is required to be sent to the provider as body data. The Dedupe Filter here can be used to ensure that a list of only unique names is sent to the provider:

filters:
  # Assume value.names is a list of IoC strings needed for the enrichment request
  - filter-mapping:
      names: dedupe
  - set:
      enrichment_results:
        feed:
          name: Enrichment Supplemental Feed
          run-params:
            target_names: !expr value.names

Here, the list of IoC string names held by value.names will be unique when it is sent as a run-param to the Enrichment Supplemental Feed.

Download Filter

Overview

The Download Filter is used to write the incoming data into an AsyncTemporaryFile, providing this AsyncTemporaryFile to the next filter in the chain. This filter should be used to avoid loading a large data set (oftentimes returned by a provider) into memory.

New in version 4.37.0.

Usage
filters:
  - download:
      named: True # optional, defaults to False

Setting the named argument to True guarantees that the temporary file has a visible name in the file system. Prefer unnamed temporary files, as they are more secure; named temporary files should only be used if a subsequent filter requires a named temporary file.

Incoming Value

A StreamReader, bytes or str.

Transform Result

An AsyncTemporaryFile handler.

Examples

The following filter chain could be used in order to store a large JSON response to a temporary file and then iterate each item in the JSON array.

filters:
  - download
  - iterate-json-file
  ...

Drop Filter

Overview

The Drop Filter is used to remove unnecessary, problematic, or otherwise unneeded data from Filter Chain processing.

Usage

Typical usage involves specifying the drop filter with a single condition argument, e.g.:

filters:
  - drop: !expr not value

If a condition is supplied and resolves as “truthy,” the value in question is dropped from the Filter Chain, effectively ending processing of the current iteration. The Drop Filter also provides keyword arguments similar to the Log Filter:

filters:
  - drop:
      condition: !expr not value  # Optional, defaults to None
      level: DEBUG                # Optional, defaults to 5
      include:                    # Optional, defaults to just ``value``
        - value
        - parent_values
        - condition
Incoming Value

This can be any value.

Transform Result

If the Drop Filter’s condition is met, the value is dropped from Filter Chain processing. This is achieved by simply not returning the value out of the Drop Filter’s transform method.

Examples

The most common use case for the Drop Filter is dropping out formatting or commented lines from some CSV-encoded feed. Assuming that the provider returns some CSV data like so:

Input File Contents

############  Some Header Text  ############
############created_at, ip, hash############

2019-01-01,10.0.0.1,a3ah84lfj03kj8vn
2019-02-01,172.0.0.1,3n7fj3jlakn3fj8d

A CDF writer could leverage the Drop Filter in order to remove the commented and blank lines from Filter Chain processing, eg:

Filter Chain

filters:
  - split-lines
  - iterate
  - drop: !expr not value or value.startswith('#')

Output

[
  "2019-01-01,10.0.0.1,a3ah84lfj03kj8vn",
  "2019-02-01,172.0.0.1,3n7fj3jlakn3fj8d"
]

Each Filter

Overview

The Each filter allows a CDF writer to apply filters to each item within a list value.

Note

For applying filters to each value on an incoming dictionary value, see Each Value Filter

Usage
filters:
  - each: <SubFilter>

The Each Filter is configured with exactly one sub-filter that is applied to each item in the list value. When leveraging a sub-filter that requires its own field arguments, one must specify the sub-filter as a list element of the Each Filter:

filters:
  - each:
      - replace:
          old: target
          new: replacement

While the Each Filter takes only a single sub-filter, the Chain Filter can be leveraged to apply multiple sub-filters to each value:

filters:
  - each:
    - chain:
      - drop: !expr value.startswith('#')
      - new:
          type: IP Address
          value: !expr value

Note

As illustrated in the above example, the Chain Filter must be specified as a list element of the Each Filter as the Chain Filter requires arguments (the list of sub-filters to apply). If the Chain Filter were not provided as a list element, an “unexpected keyword argument” error is raised.

Incoming Value

A list value.

Transform Result

The list value with each item in it transformed as per the configured sub-filter. If any transformations were to result in a None value, that None is dropped from the list value.
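A plain-Python sketch of this looping behavior, including the dropping of None results (function names illustrative, not part of the CDF API):

```python
def each(sub_filter, values):
    # Apply the sub-filter to every item, dropping None results
    results = []
    for value in values:
        transformed = sub_filter(value)
        if transformed is not None:
            results.append(transformed)
    return results

# A sub-filter that drops commented lines and upper-cases the rest
assert each(lambda v: None if v.startswith("#") else v.upper(),
            ["# comment", "abc"]) == ["ABC"]
```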

Examples

The Each Filter is commonly used within a Filter Mapping Filter to format a key/value pair in an incoming value dictionary whose value is a list. For example:

Filter Chain

filters:
  - new:
      aliases:
        - apt 28
        - large alligator
  - filter-mapping:
      aliases:
        each: title

Output

[
    {
        "aliases": [
            "Apt 28",
            "Large Alligator"
        ]
    }
]

While one could, in theory, format the initial list of data entering the filter chain from the source all within a single Each Filter, this is generally not advisable. One should leverage an Iterate Filter instead so that only a single piece of data need be processed at a time, increasing feed efficiency.

Warning

As the Each Filter applies a simple, single-value for loop to the incoming value, it can be applied to an incoming dictionary value; however, only the dictionary’s keys are transformed and the values are lost. This is generally ill-advised.

Each Value Filter

Overview

The EachValue Filter functions in the same manner as the Each Filter, but applies its specified sub-filter to each value on an incoming dictionary value.

Usage
filters:
  - each-value: <SubFilter>

The Each-Value filter leverages the same setup syntax as the Each Filter. Please see the Usage there for leveraging filters requiring configuration or running multiple filters at once.

Incoming Value

A dictionary value.

Transform Result

The same dictionary value with the specified sub-filter applied to each value in the dictionary. The dictionary’s existing keys remain unchanged and are linked to the same, albeit, transformed value.
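A plain-Python sketch of this behavior (names illustrative): a dictionary comprehension that transforms values while leaving keys untouched:

```python
def each_value(sub_filter, mapping):
    # Keys remain unchanged; each value is transformed by the sub-filter
    return {key: sub_filter(value) for key, value in mapping.items()}

assert each_value(str.title, {"alert_level": "critical"}) == {"alert_level": "Critical"}
```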

Examples

The Each-Value filter is useful in cases where a provider serves a dictionary structure whose keys cannot be readily known, yet have similar values. For example:

Filter Chain

filters:
  - new:
      Event_A:
        seen_at: 2019-02-16 10:00:00
        alert_level: critical
        tags: ['DDoS', 'Attack']
      Event_B:
        seen_at: 2019-02-28 10:00:00
        alert_level: info
        tags: ['Spam']
  - each-value:
    - filter-mapping:
        seen_at: timestamp
        alert_level: title
        tags:
          each: upper

Output

{
    "Event_A": {
        "alert_level": "Critical",
        "seen_at": "2019-02-16 10:00:00-00:00",
        "tags": [
            "DDOS",
            "ATTACK"
        ]
    },
    "Event_B": {
        "alert_level": "Info",
        "seen_at": "2019-02-28 10:00:00-00:00",
        "tags": [
            "SPAM"
        ]
    }
}

Enumerate Filter

Overview

The Enumerate Filter allows a CDF writer to yield each member of a mapping or list as individual [key, value] pairs. For lists, the key is the position of the value within the list, starting at zero. Order is preserved for lists, but this is not true for mappings, which are unordered by nature.
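The behavior mirrors Python's built-in enumerate for lists and dict.items for mappings, with each pair emitted as a [key, value] list:

```python
# Mapping input: each member becomes a [key, value] pair
mapping = {"category_one": {"domain": "example_one.com"}}
pairs = [list(pair) for pair in mapping.items()]
assert pairs == [["category_one", {"domain": "example_one.com"}]]

# List input: the key is the zero-based position of each item
items = ["a", "b"]
assert [list(pair) for pair in enumerate(items)] == [[0, "a"], [1, "b"]]
```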

Note

In addition to enumerate, this filter is available under the aliases items and mapping-pairs in an effort to facilitate recognizability by different audiences.

Note

The enumerate filter cannot be called from within a Filter-Mapping Filter. Refer to List-Items Filter for modifying dictionary mapping values from a Filter-Mapping Filter.

Usage
filters:
  - enumerate
Incoming Value

A mapping or list value.

Transform Result

[key, value] pairs, each pair yielded separately, one for each member of the input.

Examples

Suppose a feed provider’s output data is a mapping where the value data structures are all of the same form, but the key for each denotes a category (that might not even be known ahead of time). The CDF writer would like to simplify the remainder of the filter chain by orienting it around the structure of those values.

Input File Contents

{
    "category_one": {"domain": "example_one.com"},
    "category_two": {"domain": "example_two.com"}
}

Filter Chain

filters:
    - enumerate  # or "items" for familiarity to Python programmers

Output

The results yielded (separately) from this filter chain:

["category_one", {"domain": "example_one.com"}]
["category_two", {"domain": "example_two.com"}]

Thus, consistent data structures are being yielded for each member (with the category still referencable if needed). If the input is a list instead of a mapping, the results are similar:

Input File Contents

[
    {"domain": "example_one.com"},
    {"domain": "example_two.com"}
]

Output

The results yielded (separately) from this filter chain:

[0, {"domain": "example_one.com"}]
[1, {"domain": "example_two.com"}]

Fail Filter

Overview

The Fail Filter allows a CDF writer to raise a user-defined error with their own message.

Usage
filters:
  - fail: "this is a user error"
Incoming Value

This can be any value.

Transform Result

No value is returned; an error containing the supplied message is raised.

Examples

The Fail Filter is useful in situations where conditional, complex transform logic is required on some value and the user wishes to exit when that logic holds. Using the If Filter, a CDF writer can purposefully target a condition and exit, for instance:

filters:
  - if:
      condition: !expr value.name == "John"
      filters:
        - fail: "I have encountered John. Exit now."

Filter Mapping Filter

Overview

The FilterMapping Filter allows a CDF writer to easily format a dictionary mapping of values by accepting a configuration mapping the target data field on the incoming value to some sub-filter chain. The Filter Mapping Filter is invaluable to a CDF writer as it allows for complex transforms on data embedded within some dictionary structure.

Usage
filters:
  - filter-mapping:
      value_a: title
      value_b: timestamp
      value_c:
        each:
          - upper

The Filter Mapping Filter allows one to apply specified sub-filters to targeted key fields on some incoming dictionary value.

While the Filter Mapping Filter allows for only one sub-filter per configured field, said sub-filter may be a Chain Filter, allowing a CDF writer to apply any number of sub-filters to a target field.

Incoming Value

Any dictionary mapping value.

Transform Result

The dictionary mapping value, transformed as per the Filter Mapping configuration. Each field on the incoming value is transformed by the matching sub-filter chain configured in the Filter Mapping Filter.

Examples

The Filter Mapping Filter is often used in conjunction with the Each Filter in order to format each object in some iterable field value, e.g.:

Filter Chain

filters:
  - new:
      created: 2018-01-01T12:00:00
      name: fiesty_binturong
      motivations:
        - infiltration
        - ransom
        - command and control
  - filter-mapping:
      created: timestamp
      name:
        chain:
          - replace:
              old: '_'
              new: ' '
          - title
      motivations:
        each:
          - chain:
            - title
            - new:
                name: Motivation
                value: !expr value

Output

{
    "created": "2018-01-01 12:00:00-00:00",
    "motivations": [
        {
            "name": "Motivation",
            "value": "Infiltration"
        },
        {
            "name": "Motivation",
            "value": "Ransom"
        },
        {
            "name": "Motivation",
            "value": "Command And Control"
        }
    ],
    "name": "Fiesty Binturong"
}

As one may have noticed from the previous example, a CDF writer must be aware of changing the Filter Chain’s context when transforming inside of a Filter Mapping Filter. Looking at the sub-filter transforms for motivations, when the Each Filter is stepped into, the context for Jinja2 expressions changes from the entire dictionary mapping value to each individual value in the motivations key. Thus, when the New Filter executes at the end of motivations’ sub-filter chain, the Jinja2 expression to pull out the current item being considered is simply !expr value, not !expr value.some_key.

Filter-Mapping Field Filter Must Yield One Value

When writing a Filter-Mapping, one can run into the following error:

Error applying filter FilterMapping(...) to value {...}: TypeError('Field filter apply() must yield exactly 1 value ancestry',)

This occurs when a Filter-Mapping Filter’s filter chain for a given key results in anything other than 1 item. The Filter-Mapping Filter is designed to update in place the value of some target key / value pair. As such, the Filter-Mapping Filter expects exactly one value result - be that a simple scalar, a single list, or a single dictionary.

This error is generally raised when one uses an Iterate Filter in a Filter-Mapping like so:

filters:
  - new:
      source: "# Comment\nSome Value\n# Comment"
  - filter-mapping:
      source:
        chain:
          - split-lines
          - iterate
          - drop: !expr "value.startswith('#')"

This Filter-Mapping can hit the Filter Must Yield One Value error in two ways:

  • If all values are dropped from source after the Split-Lines Filter, or

  • If source results in more than 1 value

Instead of using the Iterate Filter, one should use the Each Filter like so:

filters:
  - new:
      source: "# Comment\nSome Value\n# Comment"
  - filter-mapping:
      source:
        chain:
          - split-lines
          - each:
              - drop: !expr "value.startswith('#')"

The Each Filter allows in-line looping over the source list so that items can be conditionally dropped. Even if all items were dropped from the source list, since the Each Filter loops in place, the Filter-Mapping Filter still receives a single result for source - an empty list.

Flatten Filter

Overview

The Flatten Filter allows a CDF writer to flatten a list of nested lists or iterables into a single, flat list.

Usage
filters:
  - flatten:
      depth: 1  # Optional, defaults to infinite

The depth argument limits how deep the flattening should go. For instance, a depth of 1 specified when flattening the list [1, 2, [3], [4, [5]]] results in [1, 2, 3, 4, [5]]. The depth argument defaults to infinite, meaning that the input list is completely flattened. The depth argument can also be specified with the shorthand argument syntax:

filters:
  - flatten: 1
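A plain-Python sketch of depth-limited flattening as described above (the flatten function name is illustrative, not part of the CDF API):

```python
def flatten(items, depth=float("inf")):
    # Recursively splice nested lists into the output until depth is exhausted
    out = []
    for item in items:
        if isinstance(item, list) and depth > 0:
            out.extend(flatten(item, depth - 1))
        else:
            out.append(item)
    return out

# depth=1 leaves deeper nesting intact; the default flattens completely
assert flatten([1, 2, [3], [4, [5]]], depth=1) == [1, 2, 3, 4, [5]]
assert flatten([1, 2, [3], [4, [5]]]) == [1, 2, 3, 4, 5]
```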
Incoming Value

A list value.

Transform Result

The same list flattened to the specified depth.

Examples

A writer may encounter a situation wherein it is easier to combine multiple lists of similar, yet distinct, data into a single list to apply filters on, rather than repeating the same logic numerous times. In cases like this, when lists may contain sub-lists of any length (including 0), the Flatten Filter becomes essential. For example, suppose the writer wants to combine all of these attribute dictionaries together into a single list and then drop any that do not have a value:

Input File Contents

{
    "attributes": [
        [
            {"Item1":"Items Thing"},
            {"Item2":"Items Thing"}
        ],
        [
            {"Item3":"Items Thing"},
            {"Item4":"Items Thing"}
        ]
    ]
}

Filter Chain

filters:
  - parse-json
  - get: attributes
  - flatten

Output

[
    {
        "Item1": "Items Thing"
    },
    {
        "Item2": "Items Thing"
    },
    {
        "Item3": "Items Thing"
    },
    {
        "Item4": "Items Thing"
    }
]

Get Filter

Overview

The Get Filter allows CDF writers to get a specified element, attribute, or value out of a list, object, or dictionary value, respectively.

Usage

The Get Filter can be used with dictionary or object incoming values to get the value associated with the specified member argument:

filters:
  - get: key_or_attribute_name

The Get Filter can also be used to get particular elements out of a list, in which case one would specify an integer index instead of a string key:

filters:
  - get: 0

Further, one can specify a default value that would be returned if the target key or list index was not found:

filters:
  - get:
      member: key_or_attribute_name
      default: !expr '[]'

Warning

If a default is not supplied and the member target is not found, an error corresponding to the incoming value’s data type is raised: IndexError, KeyError, or AttributeError.
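A plain-Python sketch of the lookup and default behavior across the three incoming value types (the get_member name is illustrative, not part of the CDF API):

```python
def get_member(value, member, default=...):
    # dict -> KeyError, list -> IndexError, object -> AttributeError
    try:
        if isinstance(value, (dict, list)):
            return value[member]
        return getattr(value, member)
    except (KeyError, IndexError, AttributeError):
        if default is ...:
            raise          # no default supplied: propagate the error
        return default

assert get_member({"a": 1}, "a") == 1
assert get_member(["x", "y"], 1) == "y"
assert get_member({}, "missing", default=[]) == []
```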

Warning

One cannot use a Jinja2 Expression or Template as a value for the Get Filter’s member parameter. One should supply a simple string or integer corresponding to the target key, attribute, or list index instead. One can supply a Jinja2 Expression to the default parameter, however.

Incoming Value

Any list, object, or dictionary value.

Transform Result

The value contained in the target member.

Examples

When used with a dictionary:

Input File Contents

{
    "adversary": [
            {"key1":"value1"},
            {"key2":"value2"}
    ],
    "something else":[
            {"key3":"value3"},
            {"key4":"value4"}
    ]
}

Filter Chain

filters:
    - parse-json
    - get: adversary

Output

[
    {
        "key1": "value1"
    },
    {
        "key2": "value2"
    }
]

When used with a list:

Input File Contents

[
    "Item 1",
    "Item 2",
    "Item 3"
]

Filter Chain

filters:
    - parse-json
    - get: 1  # index starts at zero

Output

[
    "Item 2"
]

Suppose that some JSON data being filtered includes an attribution dictionary on each object that contains a number of key/value pairs, but the CDF writer is only interested in the value contained under an adversary key. To explicitly select only the adversary key as the value for attribution, the CDF writer can leverage a Filter-Mapping Filter along with a get filter like so:

filters:
  - parse-json
  - iterate
  - filter-mapping:
      attribution:
        - get: adversary

Gunzip Filter

Overview

The Gunzip filter allows a CDF writer to decompress a gzip stream (extension .gz; not to be confused with .zip) into a bytearray.

Usage
filters:
  - gunzip:
      chunk_size: 65536  # optional, defaults to 1000000

The filter may take an optional chunk_size argument for the number of bytes read and decompressed at one time.
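A plain-Python sketch of chunked gzip decompression into a bytearray (illustrative only; not the actual filter implementation):

```python
import gzip
import io
import zlib

def gunzip(stream, chunk_size=1_000_000):
    # Read and decompress chunk_size bytes at a time to bound memory use
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # gzip framing
    out = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        out += decompressor.decompress(chunk)
    out += decompressor.flush()
    return out

compressed = gzip.compress(b"<xml>payload</xml>")
assert gunzip(io.BytesIO(compressed), chunk_size=4) == b"<xml>payload</xml>"
```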

Note

The chunk_size parameter is not necessary, and in most cases, the default value is appropriate.

Incoming Value

A gzip compressed stream, either as a StreamReader or bytes.

Transform Result

The uncompressed data as a bytearray.

Examples

Input File Contents

A base64-encoded gzipped XML file:

Filter Chain

filters:
  - decode-binary: base64
  - gunzip
  - str # cast the uncompressed bytearray to a string
  - parse-xml
  ...

Output

A decoded and uncompressed bytearray which is cast to a string before being parsed as XML.

If Filter

Overview

The If Filter allows a CDF writer to conditionally apply a specified sub-filter chain if a specified condition is met.

Usage
filters:
  - if:
      condition: !expr value.type == 'Actor'
      filters:
        - drop: !expr value.name != 'Hypno Toad'
        ...

For the sub-filter chain specified by filters to be applied to the incoming value, the specified condition must evaluate to true. Conditions are usually specified via some Jinja2 Expression, though standard Python truthiness checks are applied to any value passed to the condition field.

Incoming Value

This can be any value.

Transform Result

The incoming value having been transformed by the sub-filters specified via filters if condition evaluates to True.

Examples

The If Filter is useful in situations where conditional, complex transform logic is required on some value. Using the If Filter, a CDF writer can purposefully target and transform deliberate subsets of data, for instance:

Filter Chain

filters:
  - parse-json
  - get: Data
  - iterate
  - if:
      condition: !expr value.type == 'Actor'
      filters:
        - filter-mapping:
            created: timestamp
            name: title
            motivations:
              each:
                - new:
                    name: Motivation
                    value: !expr value

Here, assuming the incoming JSON data has some list of dictionaries under a key called Data, the specified Filter Mapping transforms are only applied to objects whose type is Actor.

While there is not currently an else equivalent for the If Filter, one can achieve the same effect by grouping multiple If Filters together. Extending on the prior example:

Filter Chain

filters:
  - parse-json
  - get: Data
  - iterate
  - if:
      condition: !expr value.type == 'Actor'
      filters:
        - filter-mapping:
            created: timestamp
            name: title
            motivations:
              each:
                - new:
                    name: Motivation
                    value: !expr value
  - if:
      condition: !expr value.type == 'Event'
      filters:
        - filter-mapping:
            occurred: timestamp
            created: timestamp
            targets:
              each:
                - new:
                    name: Target
                    value: !expr value

Another example:

Filter Chain

filters:
  - new:
      contents: [1, 2, 3, 4, 5, 6]
  - get: contents
  - iterate
  - new:
      num: !expr value
      text: !tmpl 'This is Mambo Number {{value}}'
  - if:
      condition: !expr value.num != 5
      filters:
        - filter-mapping:
            text:
              chain:
                - new:
                    text: 'This is not Mambo Number 5!'
                - get: text

Output

[
    {
        "num": 1,
        "text": "This is not Mambo Number 5!"
    },
    {
        "num": 2,
        "text": "This is not Mambo Number 5!"
    },
    {
        "num": 3,
        "text": "This is not Mambo Number 5!"
    },
    {
        "num": 4,
        "text": "This is not Mambo Number 5!"
    },
    {
        "num": 5,
        "text": "This is Mambo Number 5"
    },
    {
        "num": 6,
        "text": "This is not Mambo Number 5!"
    }
]

Invoke Connector Filter

Overview

The InvokeConnector filter allows a feed writer to call a supplemental feed or action from a parent feed or workflow. This functionality is similar to that provided by the Set filter, but with more control over how data is passed, executed, and returned.

Usage

To use the Invoke Connector Filter, a feed writer defines an optional condition, pre-filters, and the connector.

invoke-connector:
  condition: True
  filters:
    - iterate
  connector:
    name: FeedName
    iterate: False
    to-stage: report
    run-params: !expr value
    return: value
  • condition - Optional, defaults to True. Determines whether the connector is invoked.

  • filters - The filters transform the incoming value before passing that value to the defined connector. Data modified by these filters exists only within the scope of the invoking connector and not outside it. The resulting data depends on the return value for the connector.

  • name - The name of the feed to be invoked.

  • iterate - Optional, defaults to False. This configuration determines how the data will be iterated over and sent to the connector feed. Valid options for iterate are: True, False, as_completed, and seq.

    • True - Data will be iterated over and sent in batches, defined by the iterate chunk size, to the connector feed. These batches will be initiated at the same time and will resolve as they complete.

    • False - Data will be submitted without iteration to the connector feed. The entire iterable will be sent to the connector feed.

    • as_completed - Alias for True. This value operates the same as setting iterate to True.

    • seq - Data will be iterated over as with iterate=True, but each batch will be submitted to the connector feed in sequential order and resolved in sequential order.

  • to-stage - The stage (source, filters, report, publish) to run the connector feed to before stopping. Supplemental feeds can only run to source and filters while actions can run to all stages.

  • run-params - Optional, defaults to None. Dictionary mapping of any run-params that need to be passed into the connector feed being called.

  • return - This configuration determines what should be returned after execution of the connector feed. Acceptable values are: result, value, parent_values, filters_result, None.

Incoming Value

A dictionary, list, or object value.

Transform Result

The incoming dictionary or a supported return value defined by the return configuration.

Example

In the following example a primary feed named ExampleWorkflow uses invoke-connector to call the ExampleAction feed. For this example, the chunk size of the incoming data from the threat collection source is 100.

When this feed runs:

  1. Condition is True so the connector will evaluate

  2. Filters will run, in this case chunking the incoming data of 100 into two chunks of 50 (chunk_size: 50)

  3. ExampleAction connector will make two POST requests of 50 items

  4. Data after stage filters (to-stage: filters) will be collected and returned (return: result)

  5. Data will be available to any filters after invoke-connector for further processing and/or reporting

Filter Chain

feeds:
  ExampleWorkflow:
    feed_type: primary
    category: workflow
    # ...
    filters:
      - invoke-connector:
          condition: True
          filters:
            - iterate:
                chunk_size: 50
          connector:
            name: ExampleAction
            iterate: True
            to-stage: filters
            run-params:
              value: !expr value
            return: result
    # ...

ExampleAction:
  feed_type: action
  namespace: threatq.action.example
  source:
    http:
      url: http://example.com/submit
      method: POST
      data:
        value: !expr run_params.value
      request_content_type: application/json
      headers:
        Accept: application/json
        Content-Type: application/json
  filters:
    - parse-json
  report:
    indicator-sets:
      default:
        items:
          - type: String
            value: !expr data

IP Filter

Overview

The IP Filter allows CDF writers to parse an IP address, whether it be a string or an integer. The parsed results will include the original input, the parsed IP, and whether the IP is private.

Usage
filters:
  - new: 16909060
  - ip
Incoming Value

An IP value. Accepted forms include strings and integers.

Transform Result

This filter returns a dictionary containing the following keys: original_ip, parsed_ip, and is_private
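The parsing behavior mirrors Python's ipaddress module; a minimal sketch of the transform (illustrative only, not the filter's actual implementation):

```python
import ipaddress

def ip_filter(value):
    # ipaddress.ip_address accepts an integer or a string,
    # returning an IPv4Address or IPv6Address as appropriate
    parsed = ipaddress.ip_address(value)
    return {
        "original_ip": value,
        "parsed_ip": str(parsed),
        "is_private": parsed.is_private,
    }
```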

Examples

The most common use case for the IP Filter involves taking an integer representation of an IP Address and ingesting it into ThreatQ, provided it is not a private IP:

filters:
  - json
  - iterate
  - filter-mapping:
      address: ip
report:
  indicator-sets:
    default:
      items:
        - condition: !expr not data.address.is_private
          value: !expr data.address.parsed_ip
          type: IP Address

Note

The filter's full result cannot be passed directly to the reporter as an indicator value; you must access and use the parsed_ip key.

The simplest implementation of the IP Filter is as follows:

Filter Chain

filters:
  - new: 16909060
  - ip

Output

[
  {
      "is_private": false,
      "original_ip": 16909060,
      "parsed_ip": "1.2.3.4"
  }
]

A CDF Writer can also leverage this filter for IPv6 addresses as seen here:

Filter Chain

filters:
  - new: 42540766411282592856903984951653826561
  - ip

Output

[
  {
      "is_private": true,
      "original_ip": 42540766411282592856903984951653826561,
      "parsed_ip": "2001:db8::1"
  }
]

Iterate Filter

Overview

The Iterate Filter allows a CDF writer to yield items down the Filter Chain from an incoming list value one at a time or in batches/chunks. The Iterate Filter is key to efficiently looping over data values from a provider.

Note

Though the Each filter also has looping behavior, it works very differently and should not be confused with Iterate.

Usage
filters:
  - iterate

Iterate’s behavior of yielding each item of a list one at a time or in chunks has some important implications:

  • It shifts the context of the Filter Chain: no longer is the CDF writer transforming a large blob of data, but rather individual objects or chunks within said data. The remainder of the Filter Chain runs for each individual item in turn.

  • Further, the original context of the list data as a whole is lost. Situations where a list needs to be kept together in order to derive relationships between objects within are not a good application of the Iterate filter.

By shifting the context of the Filter Chain from a list object to individual items in the list, the Iterate Filter helps a CDF writer focus the reporting of their data on a singular grouping of related objects and attribution data. One will find the Iterate Filter utilized in most JSON-like data feeds like so:

filters:
  - parse-json
  - get: data # Get Filter is not needed if the outermost data structure of the parsed JSON is a list
  - iterate
  # Further filter processing for each individual item in data

If the focus is on processing in chunks rather than individual items, one can use the chunk_size option, assuming the incoming value is not an async-generator object. If the final chunk contains fewer items than chunk_size, it is still yielded. By default, chunk_size is set to 0, meaning the filter yields individual items from the incoming list rather than chunks.

filters:
  - parse-json
  - get: data # Get Filter is not needed if the outermost data structure of the parsed JSON is a list
  - iterate:
      chunk_size: 10
  # Further filter processing for each batch/chunk where a single chunk is 10 items as a list value

Note

To clarify, when using chunk_size, data is yielded as a list value, not as the original individual object. For example, if chunk_size is set to 1, then it processes each individual object as if it is a single value inside a list. If chunk_size is set to 3, then it will yield 3 values inside a list for each chunk.
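The chunking behavior described above can be sketched as a plain Python generator (an illustration, not the filter's actual implementation):

```python
def iterate(items, chunk_size=0):
    if chunk_size <= 0:
        # default behavior: yield individual items one at a time
        yield from items
        return
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # a short final chunk is still yielded
        yield chunk
```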

Incoming Value

Any list or iterable value.

Transform Result

Each item within the incoming list value one at a time, starting from the beginning of the list.

Note

While other Filters may yield lists of data, this is still a single value which happens to be a list. See Examples below for more information.

Examples

Supposing that a feed provider returns a simple, flat-file list of IP data, the Iterate Filter can be leveraged along with the Report Section to transform all data appropriately for ingestion:

Input File Contents

172.217.2.110
172.217.2.111
172.217.2.112

Filter Chain

filters:
  - split-lines
  - iterate
report:
  indicator-sets:
    default:
      items:
        - value: !expr data
          type: IP Address

Output --to-stage=filters

[
    "172.217.2.110",
    "172.217.2.111",
    "172.217.2.112"
]

Output --to-stage=report

[
  {
      "indicators": [
          {
              "description": null,
              "status_id": 1,
              "type": {
                  "name": "IP Address"
              },
              "value": "172.217.2.110"
          }
      ]
  },
  {
      "indicators": [
          {
              "description": null,
              "status_id": 1,
              "type": {
                  "name": "IP Address"
              },
              "value": "172.217.2.111"
          }
      ]
  },
  {
      "indicators": [
          {
              "description": null,
              "status_id": 1,
              "type": {
                  "name": "IP Address"
              },
              "value": "172.217.2.112"
          }
      ]
  }
]

While one could leverage the Each Filter and other Filters to transform all data in a provider response into object dictionaries that are then passed to the reporter, the Iterate Filter is recommended instead: it is implemented as a Python generator, giving it better performance and a smaller memory footprint.

Iterate JSON File Filter

Overview

The IterateJSONFile Filter allows a CDF writer to yield items down the Filter Chain, one at a time, from an incoming AsyncTemporaryFile containing an encoded JSON array.

New in version 4.37.0.

Usage
filters:
  - iterate-json-file:
      entries: 10  # Optional, sets the max number of possible items yielded from the JSON array

By setting the entries argument, the CDF writer can configure the maximum number of yielded items from the JSON array.

Incoming Value

An AsyncTemporaryFile containing an encoded JSON array.

Transform Result

Each deserialized JSON item within the incoming AsyncTemporaryFile one at a time.
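A rough Python equivalent of the item-limiting behavior (note this sketch loads the whole array with json.load, whereas the real filter reads the file incrementally):

```python
import io
import json
from itertools import islice

def iterate_json_file(fileobj, entries=None):
    # deserialize the JSON array, then yield at most `entries` items;
    # entries=None means no limit
    items = json.load(fileobj)
    yield from islice(items, entries)
```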

Examples

The following filter chain could be used in order to iterate each item in a JSON array.

Input File Contents

[
    {"IP Address":"172.217.2.110"},
    {"IP Address":"172.217.2.111"},
    {"IP Address":"172.217.2.112"},
    {"IP Address":"172.217.2.113"},
    {"IP Address":"172.217.2.114"}
]

Filter Chain

filters:
  - download
  - iterate-json-file:
      entries: 2  # Optional - returns the first N entries

Output

[
    {
        "IP Address": "172.217.2.110"
    },
    {
        "IP Address": "172.217.2.111"
    }
]

Iterate Text File Filter

Overview

The IterateTextFile Filter allows a CDF writer to yield lines down the Filter Chain, one at a time, from an incoming AsyncTemporaryFile containing a text file.

Usage
filters:
  - iterate-text-file:
      lines: 10  # Optional, sets the max number of possible lines yielded from the input file

By setting the lines argument, the CDF writer can configure the maximum number of lines yielded from the file; setting lines to 0 yields every line in the file.

Incoming Value

An AsyncTemporaryFile containing a text file.

Transform Result

Each line within the incoming AsyncTemporaryFile one at a time.
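A rough Python equivalent of the line-limiting behavior (illustrative only):

```python
import io
from itertools import islice

def iterate_text_file(fileobj, lines=0):
    # lines == 0 means "no limit": yield every line in the file
    limit = None if lines == 0 else lines
    for line in islice(fileobj, limit):
        yield line.rstrip("\n")
```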

Examples

The following filter chain could be used in order to iterate each line from a text file.

Input File Contents

line1
line2
line3

Filter Chain

filters:
  - download
  - iterate-text-file:
      lines: 2  # Optional - returns the first N lines

Output

[
    "line1",
    "line2"
]

List Items Filter

Overview

The ListItems Filter allows a CDF writer to transform an incoming dictionary mapping value into a list of the dictionary’s key-value pairs.

Note

The Enumerate Filter returns a generator that yields a tuple for each key-value pair in the dictionary mapping and is preferred when the context of the current value in the filter chain needs to be modified. The list-items filter, by contrast, should be used when a CDF writer wants to modify a field within the dictionary mapping value while preserving the remaining fields in the mapping.

New in version 4.40.1.

Usage
filters:
  - list-items
Incoming Value

Any dictionary mapping value.

Transform Result

A list of the dictionary’s key-value pairs.
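In plain Python terms this is equivalent to list(mapping.items()), with each pair rendered as a two-element list (a conceptual sketch; the filter's own pair ordering may differ):

```python
meta = {"attribution-confidence": "50", "country": "CN"}
# each key-value pair becomes a [key, value] list
items = [list(pair) for pair in meta.items()]
```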

Examples

Suppose that some incoming data being filtered includes a meta dictionary on each object and the CDF writer is interested in iterating over a list of the dictionary’s key-value pairs. To create a list of the key-value pairs from the meta dictionary, the CDF writer can leverage a Filter-Mapping Filter along with a list-items filter like so:

Incoming Dictionary Mapping Value:

{
    "meta": {
        "attribution-confidence": "50",
        "country": "CN",
        "refs": [
        "https://paper.seebug.org/papers/APT/APT_CyberCriminal_Campagin/2011/the_nitro_attacks.pdf",
        "https://unit42.paloaltonetworks.com/new-indicators-compromise-apt-group-nitro-uncovered/",
        "https://blog.trendmicro.com/trendlabs-security-intelligence/the-significance-of-the-nitro-attacks/"
        ],
        "synonyms": [
        "Covert Grove"
        ]
    },
    "uuid": "0b06fb39-ed3d-4868-ac42-12fff6df2c80",
    "value": "Nitro",
    "related": [
        {
            "dest-uuid": "6a2e693f-24e5-451a-9f88-b36a108e5662",
            "tags": [
            "estimative-language:likelihood-probability=\"likely\""
            ],
            "type": "similar"
        }
    ]
}

Filter Chain:

filters:
  - filter-mapping:
      meta:
        list-items

Output:

{
    "meta": [
        [
            "attribution-confidence",
            "50"
        ],
        [
            "synonyms",
            [
                "Covert Grove"
            ]
        ],
        [
            "refs",
            [
                "https://paper.seebug.org/papers/APT/APT_CyberCriminal_Campagin/2011/the_nitro_attacks.pdf",
                "https://unit42.paloaltonetworks.com/new-indicators-compromise-apt-group-nitro-uncovered/",
                "https://blog.trendmicro.com/trendlabs-security-intelligence/the-significance-of-the-nitro-attacks/"
            ]
        ],
        [
            "country",
            "CN"
        ]
    ],
    "uuid": "0b06fb39-ed3d-4868-ac42-12fff6df2c80",
    "value": "Nitro",
    "related": [
        {
            "dest-uuid": "6a2e693f-24e5-451a-9f88-b36a108e5662",
            "tags": [
            "estimative-language:likelihood-probability=\"likely\""
            ],
            "type": "similar"
        }
    ]
}

List Keys Filter

Overview

The ListKeys Filter allows a CDF writer to transform an incoming dictionary mapping value into a list of the dictionary’s keys.

Note

Calling the list-keys filter is equivalent to calling the List Filter on a dictionary mapping value.

New in version 4.40.1.

Usage
filters:
  - list-keys
Incoming Value

Any dictionary mapping value.

Transform Result

A list of the dictionary’s keys.

Examples

Suppose that some incoming data being filtered includes a meta dictionary on each object and the CDF writer is interested in iterating over a list of the dictionary’s keys. To create a list of the keys from the meta dictionary, the CDF writer can leverage a Filter-Mapping Filter along with a list-keys filter like so:

Incoming Dictionary Mapping Value:

{
    "meta": {
        "attribution-confidence": "50",
        "country": "CN",
        "refs": [
        "https://www.proofpoint.com/us/exploring-bergard-old-malware-new-tricks",
        "http://researchcenter.paloaltonetworks.com/2016/01/new-attacks-linked-to-c0d0s0-group/",
        "https://www.nytimes.com/2016/06/12/technology/the-chinese-hackers-in-the-back-office.html",
        "https://www.ncsc.gov.uk/content/files/protected_files/article_files/Joint%20report%20on%20publicly%20available%20hacking%20tools%20%28NCSC%29.pdf"
        ],
        "synonyms": [
        "C0d0so",
        "APT19",
        "APT 19",
        "Sunshop Group"
        ]
    },
    "uuid": "0b06fb39-ed3d-4868-ac42-12fff6df2c80",
    "value": "Nitro",
    "related": [
        {
            "dest-uuid": "6a2e693f-24e5-451a-9f88-b36a108e5662",
            "tags": [
            "estimative-language:likelihood-probability=\"likely\""
            ],
            "type": "similar"
        }
    ]
}

Filter Chain:

filters:
  - filter-mapping:
      meta:
        list-keys

Output:

{
    "meta": [
        "synonyms",
        "refs",
        "attribution-confidence",
        "country"
    ],
    "uuid": "0b06fb39-ed3d-4868-ac42-12fff6df2c80",
    "value": "Nitro",
    "related": [
        {
            "dest-uuid": "6a2e693f-24e5-451a-9f88-b36a108e5662",
            "tags": [
            "estimative-language:likelihood-probability=\"likely\""
            ],
            "type": "similar"
        }
    ]
}

List Values Filter

Overview

The ListValues Filter allows a CDF writer to transform an incoming dictionary mapping value into a list of the dictionary’s values.

New in version 4.40.1.

Usage
filters:
  - list-values
Incoming Value

Any dictionary mapping value.

Transform Result

A list of the dictionary’s values.

Examples

Suppose that some incoming data being filtered includes a meta dictionary on each object and the CDF writer is interested in iterating over a list of the dictionary’s values. To create a list of the values from the meta dictionary, the CDF writer can leverage a Filter-Mapping Filter along with a list-values filter like so:

Incoming Dictionary Mapping Value:

{
    "meta": {
        "attribution-confidence": "50",
        "cfr-suspected-state-sponsor": "China",
        "cfr-suspected-victims": [
        "U.S. satellite and aerospace sector"
        ],
        "cfr-target-category": [
        "Private sector",
        "Government"
        ],
        "cfr-type-of-incident": "Espionage",
        "country": "CN",
        "refs": [
        "http://cdn0.vox-cdn.com/assets/4589853/crowdstrike-intelligence-report-putter-panda.original.pdf",
        "https://www.cfr.org/interactive/cyber-operations/putter-panda",
        "https://attack.mitre.org/groups/G0024/"
        ],
        "synonyms": [
        "PLA Unit 61486",
        "APT 2",
        "APT2",
        "Group 36",
        "APT-2",
        "MSUpdater",
        "4HCrew",
        "SULPHUR",
        "SearchFire",
        "TG-6952"
        ]
    },
    "uuid": "0b06fb39-ed3d-4868-ac42-12fff6df2c80",
    "value": "Nitro",
    "related": [
        {
            "dest-uuid": "6a2e693f-24e5-451a-9f88-b36a108e5662",
            "tags": [
            "estimative-language:likelihood-probability=\"likely\""
            ],
            "type": "similar"
        }
    ]
}

Filter Chain:

filters:
  - filter-mapping:
      meta:
        list-values

Output:

{
    "meta": [
        [
            "PLA Unit 61486",
            "APT 2",
            "APT2",
            "Group 36",
            "APT-2",
            "MSUpdater",
            "4HCrew",
            "SULPHUR",
            "SearchFire",
            "TG-6952"
        ],
        [
            "U.S. satellite and aerospace sector"
        ],
        "China",
        [
            "http://cdn0.vox-cdn.com/assets/4589853/crowdstrike-intelligence-report-putter-panda.original.pdf",
            "https://www.cfr.org/interactive/cyber-operations/putter-panda",
            "https://attack.mitre.org/groups/G0024/"
        ],
        [
            "Private sector",
            "Government"
        ],
        "CN",
        "50",
        "Espionage"
    ],
    "uuid": "0b06fb39-ed3d-4868-ac42-12fff6df2c80",
    "value": "Nitro",
    "related": [
        {
            "dest-uuid": "6a2e693f-24e5-451a-9f88-b36a108e5662",
            "tags": [
            "estimative-language:likelihood-probability=\"likely\""
            ],
            "type": "similar"
        }
    ]
}

Map Items Filter

Overview

The MapItems Filter enables a CDF writer to transform an incoming iterable value into a mapping, in which a field name is mapped to a member of the incoming iterable value, in order. The list of field names is either explicitly passed as an argument or implicitly determined by the first incoming iterable value passed to it (e.g. a parsed CSV header).

Note

Naming each element of a numerically-indexed iterable, such as a list, improves the readability and maintainability of the CDF by avoiding the magic numbers software anti-pattern. Instead of needing to remember what data[3] is referring to in the report section of the CDF, one can use the Map Items Filter to assign that index a meaningful name, such as data.malware_family.

Usage
filters:
  - map-items: [field1, field2, field3]

The Map Items Filter’s argument is optional. If not provided, the first incoming iterable value is used as the list of field names to be applied to subsequent incoming iterable values.
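Pairing field names with row elements is equivalent to Python's dict(zip(...)) (a conceptual sketch using field names from the examples below):

```python
fields = ["first_seen", "dst_ip", "dst_port", "last_online", "malware"]
row = ["2019-10-09 11:06:45", "216.98.148.181", "8080", "2019-10-09", "Heodo"]
# each element of the row is keyed by the field name at the same position
mapped = dict(zip(fields, row))
```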

Incoming Value

Any list or iterable value.

Transform Result

The dictionary mapping value in which each element of the incoming iterable value is keyed by each element of the field names list.

{
    'field1': 'element1',
    'field2': 'element2',
    'field3': 'element3',
}
Examples
Implicit Field Names

Input File Contents:

# abuse.ch Feodo Tracker Botnet C2 IP Blocklist (CSV)
Firstseen,DstIP,DstPort,LastOnline,Malware
2019-10-09 11:06:45,216.98.148.181,8080,2019-10-09,Heodo
2019-10-08 03:48:24,5.185.67.137,449,2019-10-09,TrickBot
2019-09-11 20:20:37,5.67.96.120,8080,,Heodo

Filter Chain:

- split-lines
- iterate
- drop: !expr not value or value.startswith("#")
- parse-csv
- map-items
- filter-mapping:
    Firstseen:
      if:
        condition: !expr value
        filters:
          - timestamp
    LastOnline:
      if:
        condition: !expr value
        filters:
          - timestamp

Output:

[
    {
        "DstIP": "216.98.148.181",
        "DstPort": "8080",
        "Firstseen": "2019-10-09 11:06:45-00:00",
        "LastOnline": "2019-10-09 00:00:00-00:00",
        "Malware": "Heodo"
    },
    {
        "DstIP": "5.185.67.137",
        "DstPort": "449",
        "Firstseen": "2019-10-08 03:48:24-00:00",
        "LastOnline": "2019-10-09 00:00:00-00:00",
        "Malware": "TrickBot"
    },
    {
        "DstIP": "5.67.96.120",
        "DstPort": "8080",
        "Firstseen": "2019-09-11 20:20:37-00:00",
        "LastOnline": "",
        "Malware": "Heodo"
    }
]

Comments:

The input file is a slightly edited and truncated version of what the abuse.ch Feodo Tracker Botnet C2 IP Blocklist feed provides. In the feed’s actual file, the CSV header is commented out. For this example, the CSV header is uncommented in order to prevent being dropped by the Drop Filter, illustrating how the Map Items Filter uses the first incoming iterable value to set the filter’s field names for subsequent transformations.

This example also illustrates a general best practice for CDF writing: forcing a normalized timestamp format for any timestamp fields using the Timestamp Filter. Since timestamp fields for this feed are not guaranteed to have provided values, the timestamp filter is conditionally applied.
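The kind of normalization the Timestamp Filter performs can be sketched in Python (a hypothetical normalize_ts helper; the actual filter's parsing is more general):

```python
from datetime import datetime, timezone

def normalize_ts(value):
    # the feed provides "YYYY-MM-DD HH:MM:SS" or date-only "YYYY-MM-DD" strings
    fmt = "%Y-%m-%d %H:%M:%S" if " " in value else "%Y-%m-%d"
    # force a timezone-aware UTC datetime for a consistent output format
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
```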

Using the Field Names Argument

Input File Contents:

# abuse.ch Feodo Tracker Botnet C2 IP Blocklist (CSV)
# Firstseen,DstIP,DstPort,LastOnline,Malware
2019-10-09 11:06:45,216.98.148.181,8080,2019-10-09,Heodo
2019-10-08 03:48:24,5.185.67.137,449,2019-10-09,TrickBot
2019-09-11 20:20:37,5.67.96.120,8080,,Heodo

Filter Chain:

- split-lines
- iterate
- drop: !expr not value or value.startswith("#")
- parse-csv
- map-items: [first_seen, dst_ip, dst_port, last_online, malware]
- filter-mapping:
    first_seen:
      if:
        condition: !expr value
        filters:
          - timestamp
    last_online:
      if:
        condition: !expr value
        filters:
          - timestamp

Output:

[
    {
        "dst_ip": "216.98.148.181",
        "dst_port": "8080",
        "first_seen": "2019-10-09 11:06:45-00:00",
        "last_online": "2019-10-09 00:00:00-00:00",
        "malware": "Heodo"
    },
    {
        "dst_ip": "5.185.67.137",
        "dst_port": "449",
        "first_seen": "2019-10-08 03:48:24-00:00",
        "last_online": "2019-10-09 00:00:00-00:00",
        "malware": "TrickBot"
    },
    {
        "dst_ip": "5.67.96.120",
        "dst_port": "8080",
        "first_seen": "2019-09-11 20:20:37-00:00",
        "last_online": "",
        "malware": "Heodo"
    }
]

Comments:

Compared to the input file in the “Implicit Field Names” example, the input file in this example is more accurate to what the abuse.ch Feodo Tracker Botnet C2 IP Blocklist feed provides; the CSV header line is commented out. Furthermore, our controversial yet brave CDF writer really hates title cased key names, so it was decided that the field names would be explicitly passed to the Map Items Filter with snake cased names instead of spending time on a workaround for the CSV header being commented out.

New Filter

Overview

The New Filter allows a CDF writer to transform an incoming value into a new, different value.

Usage

The new filter can be used to transform an incoming value into any object constructable via YAML and Jinja2. Commonly, one transforms one dictionary mapping to another, possibly changing key names or selecting different data:

filters:
  - new:
      key_a: some string
      key_b: !expr value.b

One can also use the New Filter to create a new list value by supplying a Jinja2 expression as a positional argument:

filters:
  - new: !expr '[1, 2, 3, 4]'

Warning

Supplying a YAML style list value to the New Filter causes a TypeError to be raised as the New Filter is only configured to accept one positional argument.

Incoming Value

This can be any value.

Transform Result

A new data value created based on the New Filter’s configuration.

Examples

The New Filter could be useful for a CDF writer in any number of situations. Its ability to replace an incoming value with a brand new, specified value is exceedingly useful for:

  • Truncating large, unruly objects into smaller, more concise mappings.

  • Formatting data values into mappings representing IoC Objects or Object Attributes.

In fact, the New Filter is consistently used throughout the CDF Filter documentation in order to create an example incoming data value to manipulate.

Note

As the New Filter is used to populate the example Filter Chain with data, most examples in the CDF Filter documentation can be easily run via the TQ Filter command.

Supposing that a CDF writer wants to format some list of data values into Motivation Attribute dictionaries, the New Filter could be leveraged like so:

Filter Chain

filters:
  - new: #Create example incoming data
      name: Motivator Set
      Motivation:
        - infiltration
        - ransom
        - command and control
        - shrek
  - filter-mapping:
      Motivation:
        each:
          - new:
              name: Motive
              value: !expr value

Output

[
    {
        "Motivation": [
            {
                "name": "Motive",
                "value": "infiltration"
            },
            {
                "name": "Motive",
                "value": "ransom"
            },
            {
                "name": "Motive",
                "value": "command and control"
            },
            {
                "name": "Motive",
                "value": "shrek"
            }
        ],
        "name": "Motivator Set"
    }
]

Parse CSV Filter

Overview

The ParseCSV Filter allows a CDF writer to split an incoming CSV-formatted string value into a list of substrings. Unlike the Split Filter and its derivatives, the Parse CSV Filter does not blindly split on provided separator characters; instead, it leverages the full power of csv.reader(). A CDF writer can specify formatting parameters explicitly or implicitly rely on the defaults, which are grouped into the dialects registered by the Python standard library's csv module. The default dialect used throughout the csv module, and by this filter, is excel, which is ideal for the majority of common use cases.

Usage
filters:
  - parse-csv

The above implicitly uses the excel Dialect and is functionally the same as:

filters:
  - parse-csv: excel

or

filters:
  - parse-csv:
      dialect: excel

The Dialects and Formatting Parameters section of the csv module documentation shows attributes of Dialect that can be passed to the Parse CSV Filter.
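To see exactly which defaults a dialect supplies, the registered Dialect instances can be inspected directly. A quick illustration using the csv stdlib module:

```python
import csv

# The "excel" dialect backs this filter's defaults: comma-delimited
# fields, double-quote quoting, and "" to escape embedded quotes.
excel = csv.get_dialect('excel')
print(excel.delimiter)    # → ','
print(excel.quotechar)    # → '"'
print(excel.doublequote)  # → True

# All registered dialect names, including excel-tab and unix.
print(csv.list_dialects())
```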

Example:

filters:
  - parse-csv:
      delimiter: '|'
      quotechar: "'"
Incoming Value

A CSV-formatted string value.

Transform Result

A list of substrings.

Examples
Parsing Comma-Delimited Strings

Input File Contents:

# abuse.ch Feodo Tracker Botnet C2 IP Blocklist (CSV)
# Firstseen,DstIP,DstPort,LastOnline,Malware
2019-10-09 11:06:45,216.98.148.181,8080,2019-10-09,Heodo
2019-10-08 03:48:24,5.185.67.137,449,2019-10-09,TrickBot
2019-09-11 20:20:37,5.67.96.120,8080,,Heodo

Filter Chain:

- split-lines
- iterate
- drop: !expr not value or value.startswith("#")
- parse-csv

Output:

[
    [
        "2019-10-09 11:06:45",
        "216.98.148.181",
        "8080",
        "2019-10-09",
        "Heodo"
    ],
    [
        "2019-10-08 03:48:24",
        "5.185.67.137",
        "449",
        "2019-10-09",
        "TrickBot"
    ],
    [
        "2019-09-11 20:20:37",
        "5.67.96.120",
        "8080",
        "",
        "Heodo"
    ]
]

Comments:

The input file is a truncated version of what the abuse.ch Feodo Tracker Botnet C2 IP Blocklist feed provides.

Commonly, the Map Items Filter is applied to the output of the Parse CSV Filter so that the rest of the filter chain and the report stage can refer to the elements of the created substring lists by field names instead of numerical indices. Please see the Map Items Filter documentation for more details.

Parsing Tab-Delimited Strings

Input File Contents:

# abuse.ch Feodo Tracker Botnet C2 IP Blocklist (CSV)
# Firstseen DstIP   DstPorts        LastOnline      Malware
2019-10-09 11:06:45 216.98.148.181  8080    2019-10-09      Heodo
2019-10-08 03:48:24 5.185.67.137    449     2019-10-09      TrickBot
2019-09-11 20:20:37 5.67.96.120     8080            Heodo

Filter Chain:

- split-lines
- iterate
- drop: !expr not value or value.startswith("#")
- parse-csv: excel-tab

Output:

[
    [
        "2019-10-09 11:06:45",
        "216.98.148.181",
        "8080",
        "2019-10-09",
        "Heodo"
    ],
    [
        "2019-10-08 03:48:24",
        "5.185.67.137",
        "449",
        "2019-10-09",
        "TrickBot"
    ],
    [
        "2019-09-11 20:20:37",
        "5.67.96.120",
        "8080",
        "",
        "Heodo"
    ]
]

Comments:

The input file is a truncated and edited version of what the abuse.ch Feodo Tracker Botnet C2 IP Blocklist feed provides. Instead of being comma-delimited, the lines are tab-delimited. One could override the formatting defaults of the excel Dialect to parse tab-delimited strings, but the csv Python stdlib module already registers a default Dialect instance for this purpose, excel_tab (note that it is registered under the dialect name excel-tab).
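The equivalence can be checked directly with the stdlib; excel-tab differs from excel only in its delimiter (a simplified sketch, not the filter's implementation):

```python
import csv

# The same row as above, but tab-delimited.
line = '2019-10-09 11:06:45\t216.98.148.181\t8080\t2019-10-09\tHeodo'

# "excel-tab" is identical to "excel" except that fields are
# separated by tabs instead of commas.
row = next(csv.reader([line], dialect='excel-tab'))
print(row)
# → ['2019-10-09 11:06:45', '216.98.148.181', '8080', '2019-10-09', 'Heodo']
```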

Parsing Pipe-Delimited Strings

Input File Contents:

# abuse.ch Feodo Tracker Botnet C2 IP Blocklist (CSV)
# Firstseen|DstIP|DstPorts|LastOnline|Malware
2019-10-09 11:06:45|216.98.148.181|8080|2019-10-09|Heodo
2019-10-08 03:48:24|5.185.67.137|449|2019-10-09|TrickBot
2019-09-11 20:20:37|5.67.96.120|'8080|80|443'||Heodo

Filter Chain:

- split-lines
- iterate
- drop: !expr not value or value.startswith("#")
- parse-csv:
    delimiter: '|'
    quotechar: "'"
- map-items: [first_seen, dst_ip, dst_port, last_online, malware]
- filter-mapping:
    dst_port:
      split:
        sep: '|'

Output:

[
    {
        "dst_ip": "216.98.148.181",
        "dst_port": [
            "8080"
        ],
        "first_seen": "2019-10-09 11:06:45",
        "last_online": "2019-10-09",
        "malware": "Heodo"
    },
    {
        "dst_ip": "5.185.67.137",
        "dst_port": [
            "449"
        ],
        "first_seen": "2019-10-08 03:48:24",
        "last_online": "2019-10-09",
        "malware": "TrickBot"
    },
    {
        "dst_ip": "5.67.96.120",
        "dst_port": [
            "8080",
            "80",
            "443"
        ],
        "first_seen": "2019-09-11 20:20:37",
        "last_online": "",
        "malware": "Heodo"
    }
]

Comments:

The input file is a truncated and edited version of what the abuse.ch Feodo Tracker Botnet C2 IP Blocklist feed provides.

There are several differences with this example’s version of the feed:

  • Instead of being comma-delimited, the lines are pipe-delimited.

  • The destination ports column can have multiple pipe-delimited destination ports. If a column has multiple destination ports, the column is quoted with single quotes.

In this example, the delimiter and quotechar formatting parameters are explicitly provided to reflect the formatting of the feed data, overriding the values defined by the excel Dialect.
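The same override can be reproduced with csv.reader() keyword arguments, which is effectively what the filter passes through (a simplified sketch):

```python
import csv

# The quoted third column keeps its embedded pipes intact because
# quotechar is set to a single quote.
line = "2019-09-11 20:20:37|5.67.96.120|'8080|80|443'||Heodo"
row = next(csv.reader([line], delimiter='|', quotechar="'"))
print(row)
# → ['2019-09-11 20:20:37', '5.67.96.120', '8080|80|443', '', 'Heodo']
```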

Parse JSON Filter

Overview

The ParseJSON Filter allows a CDF writer to parse an incoming JSON-encoded string value into native Python objects.

Usage
filters:
  - parse-json
Incoming Value

A JSON-encoded string.

Transform Result

Native Python objects parsed from the incoming JSON-encoded string value.
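The deserialization behaves like calling json.loads() on the incoming string (a minimal sketch):

```python
import json

raw = '{"name": "Fancy Panda", "is_active": true, "total_campaigns": 27, "attribute_mapping": null}'

# JSON types map onto native Python types: true → True,
# null → None, numbers → int/float, objects → dict.
data = json.loads(raw)
print(data['is_active'])              # → True
print(data['attribute_mapping'])      # → None
print(type(data['total_campaigns']))  # → <class 'int'>
```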

Examples

Input File Contents:

[
    {
        "name": "Fancy Panda",
        "is_active": true,
        "cash_amount_stolen": 22001.50,
        "currency": "USD",
        "total_campaigns": 27,
        "aliases": [
            "Arcane Warrior King",
            "APT-2001"
        ],
        "attribute_mapping": {
            "MITRE Technique": "TA0004",
            "Nessus Plugin ID": "70485"
        }
    },
    {
        "name": "Swollen Hog",
        "is_active": false,
        "cash_amount_stolen": 0.00,
        "currency": "N/A",
        "total_campaigns": 3,
        "aliases": [],
        "attribute_mapping": null
    }
]

Filter Chain:

- parse-json

Output:

[
    {
        'is_active': True,
        'attribute_mapping': {
            'MITRE Technique': 'TA0004',
            'Nessus Plugin ID': '70485'
        },
        'total_campaigns': 27,
        'cash_amount_stolen': 22001.5,
        'currency': 'USD',
        'name': 'Fancy Panda',
        'aliases': [
            'Arcane Warrior King',
            'APT-2001'
        ]
    },
    {
        'is_active': False,
        'attribute_mapping': None,
        'total_campaigns': 3,
        'cash_amount_stolen': 0,
        'currency': 'N/A',
        'name': 'Swollen Hog',
        'aliases': []
    }
]

Comments:

Unlike the pretty-printed JSON output shown in most filter documentation, this example’s output illustrates how the Parse JSON filter deserializes the incoming value into native Python objects.

Note

The shown output is a pretty-printed version of the current value displayed via the Log Filter.

Parse JSON Sequence Filter

Overview

ToDo

Usage

ToDo

Incoming Value

ToDo

Transform Result

ToDo

Examples

ToDo

Parse MISP Filter

Overview

The ParseMISP Filter leverages the misp module to parse an incoming data value formatted as MISP JSON into threat object data.

Usage
filters:
  - parse-misp
Incoming Value

A dictionary, list, or string.

Enumerations of possible incoming values:
  • A serialized (string) or deserialized (dictionary) response from the MISP Events JSON API endpoint: {"response": [{"Event": {"id": 1, ...}}, {"Event": {"id": 2, ...}}, ...]}

  • A single dictionary or list of dictionaries, in which each dictionary contains an “Event” key whose value is a MISP Event dictionary: {"Event": {"id": 1, ...}} or [{"Event": {"id": 1, ...}}, {"Event": {"id": 2, ...}}, ...]

  • A single MISP Event dictionary or a list of MISP Event dictionaries: {"id": 1, ...} or [{"id": 1, ...}, {"id": 2, ...}, ...]
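A helper that normalizes these shapes into a flat list of MISP Event dictionaries might look like the following. This is a hypothetical sketch for illustration only (the filter's actual logic lives in the misp module, and the serialized-string case would first pass through json.loads, omitted here):

```python
def normalize_misp_events(value):
    """Reduce any accepted deserialized shape to a list of Event dicts."""
    # An API response wrapper: {"response": [...]}
    if isinstance(value, dict) and 'response' in value:
        value = value['response']
    # A single dictionary becomes a one-element list.
    if isinstance(value, dict):
        value = [value]
    # Unwrap {"Event": {...}} envelopes; pass bare Event dicts through.
    return [item.get('Event', item) for item in value]

print(normalize_misp_events({"response": [{"Event": {"id": 1}}]}))
# → [{'id': 1}]
print(normalize_misp_events({"id": 2}))
# → [{'id': 2}]
```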

Transform Result

Dictionary containing two keys: indicators and events. The value of each key is a list of dictionaries that represent threat object data. The result of this filter can be passed directly to the Reporter.

Examples

Input File Contents (some values were truncated for documentation purposes):

{
    "response": [
        {
            "Event": {
                "id": "845",
                "orgc_id": "1",
                "org_id": "1",
                "date": "2021-02-11",
                "threat_level_id": "2",
                "info": "misp_parser_test_event",
                "published": false,
                "uuid": "6024e6c5-2d54-432e-9276-1604ac107221",
                "attribute_count": "17",
                "analysis": "0",
                "timestamp": "1614870003",
                "Attribute": [
                    {
                        "id": "1273664",
                        "type": "link",
                        "category": "Internal reference",
                        "to_ids": false,
                        "uuid": "60266d36-b7e4-4583-a467-22c3ac107221",
                        "event_id": "845",
                        "distribution": "5",
                        "timestamp": "1613131062",
                        "comment": "link comment",
                        "sharing_group_id": "0",
                        "deleted": false,
                        "disable_correlation": false,
                        "object_id": "0",
                        "object_relation": null,
                        "value": "http:\/\/www.testlink.com"
                    },
                    {
                        "id": "1273670",
                        "type": "yara",
                        "category": "Payload installation",
                        "to_ids": false,
                        "uuid": "603cddde-6930-4136-ad3a-1035ac107221",
                        "event_id": "845",
                        "distribution": "5",
                        "timestamp": "1614606558",
                        "comment": "yara .response[].Event.Attribute[]",
                        "sharing_group_id": "0",
                        "deleted": false,
                        "disable_correlation": false,
                        "object_id": "0",
                        "object_relation": null,
                        "value": "rule notPresent\r\n{\r\n  strings:\r\n    $a = \"KnowBe4\" nocase\r\n\r\n  ..."
                    }
                ],
                "Galaxy": [
                    {
                        "id": "6",
                        "uuid": "03e3853a-1708-11e8-95c1-67cf3f801a18",
                        "name": "Mobile Attack - Malware",
                        "type": "mitre-mobile-attack-malware",
                        "description": "Name of ATT&CK software",
                        "version": "4",
                        "icon": "optin-monster",
                        "namespace": "mitre-attack",
                        "GalaxyCluster": [
                            {
                                "id": "10632",
                                "collection_uuid": "04a165aa-1708-11e8-b2da-c7d7625f4a4f",
                                "type": "mitre-mobile-attack-malware",
                                "value": "AndroRAT - MOB-S0008",
                                "tag_name": "misp-galaxy:mitre-mobile-attack-malware=\"AndroRAT - MOB-S0008\"",
                                "description": "AndroRAT \"allows a third party to control the device and collect ...",
                                "galaxy_id": "6",
                                "source": "https:\/\/github.com\/mitre\/cti",
                                "authors": [
                                    "MITRE"
                                ],
                                "version": "6",
                                "uuid": "a3dad2be-ce62-4440-953b-00fbce7aba93",
                                "tag_id": "10",
                                "meta": {
                                    "external_id": [
                                        "MOB-S0008"
                                    ],
                                    "refs": [
                                        "https:\/\/attack.mitre.org\/mobile\/index.php\/Software\/MOB-S0008",
                                        "https:\/\/blog.lookout.com\/blog\/2016\/05\/25\/spoofed-apps\/"
                                    ],
                                    "synonyms": [
                                        "AndroRAT"
                                    ]
                                }
                            }
                        ]
                    }
                ],
                "Object": [
                    {
                        "id": "22",
                        "name": "suricata",
                        "meta-category": "network",
                        "description": "An object describing one or more Suricata rule(s) ...",
                        "template_uuid": "3c177337-fb80-405a-a6c1-1b2ddea8684a",
                        "template_version": "2",
                        "event_id": "845",
                        "uuid": "603cf586-edbc-4443-a93a-0cdfac107221",
                        "timestamp": "1614607750",
                        "distribution": "5",
                        "sharing_group_id": "0",
                        "comment": "",
                        "deleted": false,
                        "ObjectReference": [],
                        "Attribute": [
                            {
                                "id": "1273672",
                                "type": "comment",
                                "category": "Other",
                                "to_ids": false,
                                "uuid": "603cf586-ee54-4c8f-9696-0cdfac107221",
                                "event_id": "845",
                                "distribution": "5",
                                "timestamp": "1614607750",
                                "comment": "",
                                "sharing_group_id": "0",
                                "deleted": false,
                                "disable_correlation": false,
                                "object_id": "22",
                                "object_relation": "comment",
                                "value": ".response[].Event.Object[].Attribute",
                                "Galaxy": [],
                                "ShadowAttribute": []
                            }
                        ]
                    },
                    {
                        "id": "23",
                        "name": "url",
                        "meta-category": "network",
                        "description": "url object describes an url along with ...",
                        "template_uuid": "60efb77b-40b5-4c46-871b-ed1ed999fce5",
                        "template_version": "7",
                        "event_id": "845",
                        "uuid": "6040f5f2-928c-4c56-bd4c-3f86ac107221",
                        "timestamp": "1614870002",
                        "distribution": "5",
                        "sharing_group_id": "0",
                        "comment": "",
                        "deleted": false,
                        "ObjectReference": [],
                        "Attribute": [
                            {
                                "id": "1278505",
                                "type": "url",
                                "category": "Network activity",
                                "to_ids": true,
                                "uuid": "6040f5f2-44e8-4a75-8e74-3f86ac107221",
                                "event_id": "845",
                                "distribution": "5",
                                "timestamp": "1614870002",
                                "comment": "",
                                "sharing_group_id": "0",
                                "deleted": false,
                                "disable_correlation": false,
                                "object_id": "23",
                                "object_relation": "url",
                                "value": "www.objectattribute.com",
                                "Galaxy": [],
                                "ShadowAttribute": []
                            },
                            {
                                "id": "1278506",
                                "type": "text",
                                "category": "Other",
                                "to_ids": false,
                                "uuid": "6040f5f3-6adc-4a73-9a0e-3f86ac107221",
                                "event_id": "845",
                                "distribution": "5",
                                "timestamp": "1614870003",
                                "comment": "",
                                "sharing_group_id": "0",
                                "deleted": false,
                                "disable_correlation": true,
                                "object_id": "23",
                                "object_relation": "scheme",
                                "value": "http",
                                "Galaxy": [],
                                "ShadowAttribute": []
                            },
                            {
                                "id": "1278507",
                                "type": "domain",
                                "category": "Network activity",
                                "to_ids": true,
                                "uuid": "6040f5f3-4414-47f9-b28d-3f86ac107221",
                                "event_id": "845",
                                "distribution": "5",
                                "timestamp": "1614870003",
                                "comment": "",
                                "sharing_group_id": "0",
                                "deleted": false,
                                "disable_correlation": false,
                                "object_id": "23",
                                "object_relation": "domain",
                                "value": "objectattribute.com",
                                "Galaxy": [],
                                "ShadowAttribute": []
                            }
                        ]
                    }
                ],
                "Tag": [
                    {
                        "id": "11",
                        "name": "tlp:red",
                        "colour": "#CC0033",
                        "exportable": true,
                        "user_id": "0",
                        "hide_tag": false,
                        "numerical_value": null
                    },
                    {
                        "id": "17",
                        "name": "malware_classification:malware-category=\"Worm\"",
                        "colour": "#244100",
                        "exportable": true,
                        "user_id": "0",
                        "hide_tag": false,
                        "numerical_value": null
                    }
                ]
            }
        }
    ]
}

Filter Chain

filters:
  - parse-misp
report:
  event-sets:
    events:
      items: !expr data.events
  indicator-sets:
    indicators:
      items: !expr data.indicators

Output

The above filter chain produces the following output (some values were truncated for documentation purposes):

 [
     {
         "events": [
             {
                 "attributes": [
                     {
                         "name": "External MISP",
                         "published_at": "2023-02-23 14:27:36-00:00",
                         "value": "10.13.0.135/events/view/845"
                     },
                     {
                         "name": "Tag",
                         "published_at": "2023-02-23 14:27:36-00:00",
                         "value": "tlp:red"
                     },
                     {
                         "name": "UUID",
                         "published_at": "2023-02-23 14:27:36-00:00",
                         "value": "6024e6c5-2d54-432e-9276-1604ac107221"
                     },
                     {
                         "name": "ID",
                         "published_at": "2023-02-23 14:27:36-00:00",
                         "value": "845"
                     },
                     {
                         "name": "Category",
                         "published_at": "2023-02-23 14:27:36-00:00",
                         "value": "http://www.testlink.com"
                     },
                     {
                         "name": "Analysis",
                         "published_at": "2023-02-23 14:27:36-00:00",
                         "value": "Initial"
                     },
                     {
                         "name": "MISP Threat Level",
                         "published_at": "2023-02-23 14:27:36-00:00",
                         "value": "Medium"
                     },
                     {
                         "name": "Tag",
                         "published_at": "2023-02-23 14:27:36-00:00",
                         "value": "malware_classification:malware-category=\"Worm\""
                     }
                 ],
                 "description": "misp_parser_test_event",
                 "happened_at": "2021-02-11 00:00:00-00:00",
                 "indicators": [
                     {
                         "type": {
                             "name": "URL"
                         },
                         "value": "www.objectattribute.com"
                     },
                     {
                         "type": {
                             "name": "FQDN"
                         },
                         "value": "objectattribute.com"
                     }
                 ],
                 "malware": [
                     {
                         "value": "AndroRAT"
                     }
                 ],
                 "published_at": "2023-02-23 14:27:36-00:00",
                 "tags": [
                     {
                         "name": "Worm"
                     }
                 ],
                 "title": "misp_parser_test_event",
                 "tlp": {
                     "name": "RED"
                 },
                 "type": {
                     "name": "MISP"
                 }
             }
         ],
         "indicators": [
             {
                 "attributes": [
                     {
                         "name": "Category",
                         "published_at": "2021-03-04 15:00:02-00:00",
                         "value": "Network activity"
                     },
                     {
                         "name": "Tag",
                         "value": "tlp:red"
                     },
                     {
                         "name": "Sharing Group",
                         "published_at": "2021-03-04 15:00:02-00:00",
                         "value": "0"
                     },
                     {
                         "name": "Comment",
                         "value": ""
                     },
                     {
                         "name": "To IDS",
                         "published_at": "2021-03-04 15:00:02-00:00",
                         "value": "True"
                     },
                     {
                         "name": "Distribution",
                         "published_at": "2021-03-04 15:00:02-00:00",
                         "value": "Inherit event"
                     },
                     {
                         "name": "Tag",
                         "value": "malware_classification:malware-category=\"Worm\""
                     }
                 ],
                 "description": null,
                 "events": [
                     {
                         "happened_at": "2021-02-11 00:00:00-00:00",
                         "title": "misp_parser_test_event",
                         "type": {
                             "name": "MISP"
                         }
                     }
                 ],
                 "indicators": [
                     {
                         "type": {
                             "name": "FQDN"
                         },
                         "value": "objectattribute.com"
                     }
                 ],
                 "published_at": "2021-03-04 15:00:02-00:00",
                 "status_id": 1,
                 "tlp": {
                     "name": "RED"
                 },
                 "type": {
                     "name": "URL"
                 },
                 "value": "www.objectattribute.com"
             },
             {
                 "attributes": [
                     {
                         "name": "Category",
                         "published_at": "2021-03-04 15:00:03-00:00",
                         "value": "Network activity"
                     },
                     {
                         "name": "Tag",
                         "value": "tlp:red"
                     },
                     {
                         "name": "Sharing Group",
                         "published_at": "2021-03-04 15:00:03-00:00",
                         "value": "0"
                     },
                     {
                         "name": "Comment",
                         "value": ""
                     },
                     {
                         "name": "To IDS",
                         "published_at": "2021-03-04 15:00:03-00:00",
                         "value": "True"
                     },
                     {
                         "name": "Distribution",
                         "published_at": "2021-03-04 15:00:03-00:00",
                         "value": "Inherit event"
                     },
                     {
                         "name": "Tag",
                         "value": "malware_classification:malware-category=\"Worm\""
                     }
                 ],
                 "description": null,
                 "events": [
                     {
                         "happened_at": "2021-02-11 00:00:00-00:00",
                         "title": "misp_parser_test_event",
                         "type": {
                             "name": "MISP"
                         }
                     }
                 ],
                 "indicators": [
                     {
                         "type": {
                             "name": "URL"
                         },
                         "value": "www.objectattribute.com"
                     }
                 ],
                 "published_at": "2021-03-04 15:00:03-00:00",
                 "status_id": 1,
                 "tlp": {
                     "name": "RED"
                 },
                 "type": {
                     "name": "FQDN"
                 },
                 "value": "objectattribute.com"
             }
         ],
         "malware": [
             {
                 "events": [
                     {
                         "happened_at": "2021-02-11 00:00:00-00:00",
                         "title": "misp_parser_test_event",
                         "type": {
                             "name": "MISP"
                         }
                     }
                 ],
                 "value": "AndroRAT"
             }
         ]
     }
 ]

Parse OLE2 Email Filter

Overview

The ParseOLE2Email Filter is used to parse an OLE2 file, extracting the file's property streams and attachments.

New in version 4.38.0.

Usage
filters:
  - parse-ole2-email
Incoming Value

An OLE2 file.

Transform Result

The stream and attachment data formatted as a dictionary. The ParseOLE2Email filter parses the following OLE2 property streams:

  • PidTagSubject

  • PidTagBody

  • PidTagTransportMessageHeaders

  • PidTagMessageClass

  • PidTagSmtpAddress

  • PidTagAttachLongFilename

  • PidTagAttachMimeTag

  • PidTagAttachDataBinary

Examples

Input File Contents: an OLE2 email file with an attachment (binary contents not shown). The following filter chain parses the file:

Filter Chain

1
2
3
filters:
  - parse-ole2-email
...

Output

The output of this filter chain is the following dictionary (note that some string values are truncated for documentation purposes):

{
    "attachments": [
        {
            "type": "image/jpeg",
            "data": "/9j/4AAQSkZJRgABAQEASABIAAD/4QAWRXhpZgAAT...",
            "name": "test.jpg"
        }
    ],
    "email": {
        "date": "Mon, 24 Sep 2007 15:28:03 +0200",
        "from_address": "Matijs van Zuijlen <Matijs.van.Zuijlen@xs4all.nl>",
        "helo": "from apricot.matijs.net (mvz.xs4all.nl [80.126.4.68])\t",
        "message_id": "<20070924132803.GB10141@matijs.net>",
        "raw_body": "test\r\n",
        "raw_header": "from XXXXXXXXX.XXX.XX.NL ([111.111.111.111]) by XXXXXXXXX.XXX.XX.NL with Microsoft...",
        "reply_to": "",
        "sender": "Matijs.van.Zuijlen@xs4all.nl",
        "subject": "test",
        "to": [
            "matijs@xxxxxx.nl"
        ],
        "x_mailer": "",
        "x_originating_ip": "80.126.4.68"
    },
    "received_date": "2007-09-24 09:25:20"
}

Parse Snort/Suricata Filter

Overview

The ParseSnort Filter allows a CDF writer to parse an incoming string containing one or more Snort/Suricata signatures. The filter leverages the idstools module to parse the signatures into a list of dictionaries.
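For illustration, the header portion of a rule can be broken apart with a few lines of stdlib Python. This is a deliberately simplified sketch, not what the filter does: the filter relies on idstools, which also handles the full option body (flowbits, metadata, multi-line rules, and so on):

```python
import re

rule = ('alert tcp $HOME_NET any -> any 3306 '
        '(msg: "mysql general_log write file"; sid: 3013005; rev: 1;)')

# The header is everything before the first parenthesis; it splits
# into action/protocol/src/src_port/direction/dst/dst_port fields.
header = rule.split('(', 1)[0].strip()
action, proto, src_ip, src_port, direction, dst_ip, dst_port = header.split()

# Pull the msg option out of the rule body.
msg = re.search(r'msg:\s*"([^"]*)"', rule).group(1)

print(action, direction, msg)
# → alert -> mysql general_log write file
```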

Usage
filters:
  - parse-snort
Incoming Value

Snort/Suricata signatures as a string.

Transform Result

Data formatted as a list of dictionaries.

Examples

Input File Contents:

alert tcp $HOME_NET any -> any 3306 (msg: "mysql general_log write file"; flow: established;  content:"|03|";
depth: 5; content:"|67 65 6e 65 72 61 6c 5f 6c 6f 67 5f 66 69 6c 65|"; distance:0; classtype:trojan-activity;
sid: 3013005; rev: 1; metadata:created_at 2018_11_20,by al0ne;)

alert udp $HOME_NET any -> $EXTERNAL_NET 53 (msg:"Suspicious dns request"; flow:established,to_server;
content:"|01 00|"; depth:4; pcre:"/\x00\x10\x00\x01|\x00\x0f\x00\x01|\x00\x05\x00\x01/"; dsize:>200;
classtype:trojan-activity; sid:3011001; rev:1; metadata:created_at 2018_11_09,by al0ne;)

Filter Chain

filters:
  - parse-snort

Output

The output of this filter chain is the following list of dictionaries:

[
    {
        "action": "alert",
        "classtype": "trojan-activity",
        "content": "\"|67 65 6e 65 72 61 6c 5f 6c 6f 67 5f 66 69 6c 65|\"",
        "depth": "5",
        "direction": "->",
        "distance": "0",
        "enabled": True,
        "flow": "established",
        "flowbits": [],
        "gid": 1,
        "group": None,
        "header": "alert tcp $HOME_NET any -> any 3306",
        "metadata": [
            "created_at 2018_11_20",
            "by al0ne"
        ],
        "msg": "mysql general_log write file",
        "options": [
            {
                "name": "msg",
                "value": "\"mysql general_log write file\""
            },
            {
                "name": "flow",
                "value": "established"
            },
            {
                "name": "content",
                "value": "\"|03|\""
            },
            {
                "name": "depth",
                "value": "5"
            },
            {
                "name": "content",
                "value": "\"|67 65 6e 65 72 61 6c 5f 6c 6f 67 5f 66 69 6c 65|\""
            },
            {
                "name": "distance",
                "value": "0"
            },
            {
                "name": "classtype",
                "value": "trojan-activity"
            },
            {
                "name": "sid",
                "value": "3013005"
            },
            {
                "name": "rev",
                "value": "1"
            },
            {
                "name": "metadata",
                "value": "created_at 2018_11_20,by al0ne"
            }
        ],
        "priority": 0,
        "raw": "alert tcp $HOME_NET any -> any 3306 (msg: \"mysql general_log write file\"; flow: established;
                content:\"|03|\"; depth: 5; content:\"|67 65 6e 65 72 61 6c 5f 6c 6f 67 5f 66 69 6c 65|\";
                distance:0; classtype:trojan-activity; sid: 3013005; rev: 1; metadata:created_at 2018_11_20,
                by al0ne;)",
        "references": [],
        "rev": 1,
        "sid": 3013005
    },
    {
        "action": "alert",
        "classtype": "trojan-activity",
        "content": "\"|01 00|\"",
        "depth": "4",
        "direction": "->",
        "dsize": ">200",
        "enabled": True,
        "flow": "established,to_server",
        "flowbits": [],
        "gid": 1,
        "group": None,
        "header": "alert udp $HOME_NET any -> $EXTERNAL_NET 53",
        "metadata": [
            "created_at 2018_11_09",
            "by al0ne"
        ],
        "msg": "Suspicious dns request",
        "options": [
            {
                "name": "msg",
                "value": "\"Suspicious dns request\""
            },
            {
                "name": "flow",
                "value": "established,to_server"
            },
            {
                "name": "content",
                "value": "\"|01 00|\""
            },
            {
                "name": "depth",
                "value": "4"
            },
            {
                "name": "pcre",
                "value": "\"/\\x00\\x10\\x00\\x01|\\x00\\x0f\\x00\\x01|\\x00\\x05\\x00\\x01/\""
            },
            {
                "name": "dsize",
                "value": ">200"
            },
            {
                "name": "classtype",
                "value": "trojan-activity"
            },
            {
                "name": "sid",
                "value": "3011001"
            },
            {
                "name": "rev",
                "value": "1"
            },
            {
                "name": "metadata",
                "value": "created_at 2018_11_09,by al0ne"
            }
        ],
        "pcre": "\"/\\x00\\x10\\x00\\x01|\\x00\\x0f\\x00\\x01|\\x00\\x05\\x00\\x01/\"",
        "priority": 0,
        "raw": "alert udp $HOME_NET any -> $EXTERNAL_NET 53 (msg:\"Suspicious dns request\";
                flow:established,to_server; content:\"|01 00|\"; depth:4; pcre:\"/\\x00\\x10\\x00\\x01
                |\\x00\\x0f\\x00\\x01|\\x00\\x05\\x00\\x01/\"; dsize:>200; classtype:trojan-activity; sid:3011001;
                rev:1; metadata:created_at 2018_11_09,by al0ne;)",
        "references": [],
        "rev": 1,
        "sid": 3011001
    }
]

Note

A CDF writer can construct a signature from this data via:

filters:
 - parse-snort
 - iterate
report:
  signature-sets:
    snort_signatures:
      items:
        type: snort
        name: !expr data.sid | string
        value: !expr data.raw
      attribute-sets:
        - snort_signatures_attributes
  attribute-sets:
    snort_signatures_attributes:
      items:
        - name: Signature Action
          value: !expr data.action

Output:

[
    {
        "attributes": [
            {
                "name": "Signature Action",
                "value": "alert"
            }
        ],
        "description": "",
        "name": "3013005",
        "status_id": 1,
        "type": {
            "name": "snort"
        },
        "value": "alert tcp $HOME_NET any -> any 3306 (msg: \"mysql general_log write file\"; flow:...",
    },
    {
        "attributes": [
            {
                "name": "Signature Action",
                "value": "alert"
            }
        ],
        "description": "",
        "name": "3011001",
        "status_id": 1,
        "type": {
            "name": "snort"
        },
        "value": "alert udp $HOME_NET any -> $EXTERNAL_NET 53 (msg:\"Suspicious dns request\"; flow...",
    }
]

Parse STIX Filter

Overview

The ParseSTIX Filter is used to parse STIX data. The filter leverages the stix module to parse an incoming data value that is formatted as STIX data.

Usage
filters:
  - parse-stix:
      no_tq_prep: True/False # Optional, defaults to False

Note

The Parse STIX Filter does not require the CDF writer to specify which version of STIX should be parsed. The version is inferred from the incoming value’s type and content.

Incoming Value

A dictionary, string, or an iterable of strings of STIX formatted data.

Note

Currently, only STIX versions 1.1.1, 1.2, and 2.0 are supported. See threatq.core.lib.stix.

Transform Result

ThreatObject data formatted as a dictionary if no_tq_prep is False, which can be directly passed to the reporter. If no_tq_prep is True, the result is a simple dictionary that is not preformatted to adhere to ThreatQ’s ThreatObjects, which requires additional processing by the CDF writer before it can be passed to the reporter.

Examples

In most cases, one can simply call the parse-stix filter with no arguments.

Filter Chain

filters:
  - parse-stix

This parses the current value in the filter chain as STIX data, returning the result as the value for the next filter in the filter chain. One can always modify the resulting dictionary before passing it to the reporter. When reporting the results of the parse-stix filter, simply specify object-set objects by referencing keys on the result corresponding to the API type of the model being reported, e.g.:

Filter Chain

filters:
  - parse-stix
report:
  indicator-sets:
    default:
      items: !expr data.indicators
  adversary-sets:
    default:
      items: !expr data.adversaries

In some cases, it may be desirable to get the parsed STIX data as just a simple dictionary that is not preformatted to adhere to ThreatQ’s ThreatObjects. To achieve this, simply pass the no_tq_prep flag to the parse-stix filter like so:

Filter Chain

filters:
  - parse-stix:
      no_tq_prep: True

Warning

Further processing is needed in this case to make the parsed STIX data reportable.

Parse XML Filter

Overview

The ParseXML Filter is used to parse XML data. The filter leverages the xmltodict module to parse an incoming XML file as a dictionary (see examples below).

Usage
filters:
  - parse-xml
Incoming Value

XML formatted data.

Transform Result

Data formatted as a dictionary.

Examples

Input File Contents:

<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>

Filter Chain

filters:
  - parse-xml

Output

The output of this filter chain is the following dictionary:

{
  'root': {
    'e': [
      None,
      'text',
      {
        '@name': 'value'
      },
      {
        '@name': 'value',
        '#text': 'text'
      },
      {
        'b': 'text',
        'a': 'text'
      },
      {
         'a': [
           'text',
           'text'
         ]
      },
      {
        '#text': 'text',
        'a': 'text'
      }
    ]
  }
}
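The conventions in the output above come from xmltodict: attributes are prefixed with @, mixed text lands under #text, and repeated tags are promoted to lists. A rough stdlib approximation of that mapping (for illustration only, not the filter's implementation) looks like:

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Approximate xmltodict's conventions: '@' for attributes,
    '#text' for mixed text, lists for repeated child tags."""
    d = {'@' + k: v for k, v in elem.attrib.items()}
    for child in elem:
        val = element_to_dict(child)
        if child.tag in d:
            if not isinstance(d[child.tag], list):
                d[child.tag] = [d[child.tag]]
            d[child.tag].append(val)
        else:
            d[child.tag] = val
    text = (elem.text or '').strip()
    if text and not d:
        return text          # a pure text element collapses to a string
    if text:
        d['#text'] = text
    return d or None         # an empty element becomes None

root = ET.fromstring('<root><e/><e>text</e><e name="value"/></root>')
print({root.tag: element_to_dict(root)})
```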

Parse YARA Filter

Overview

The ParseYARA Filter allows a CDF writer to parse an incoming value that is a string of YARA signatures. The filter leverages the Plyara module to parse an incoming string containing one or more YARA signatures into a list of dictionaries.
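The heavy lifting is done by Plyara, but the rule header structure it extracts (a rule name plus optional tags) can be illustrated with a plain regular expression. This is a rough sketch of the header grammar only, not the filter's implementation:

```python
import re

signature = """rule DummyRule1 : Tag1 Tag2
{
    condition:
        2 of them
}"""

# Match 'rule <name> [: <tags>] {' -- Plyara additionally parses the
# meta, strings, and condition sections of each rule
m = re.match(r'rule\s+(?P<name>\w+)\s*(?::\s*(?P<tags>[\w\s]+?))?\s*\{', signature)
rule_name = m.group('name')
tags = (m.group('tags') or '').split()
print(rule_name, tags)
```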

Usage
filters:
  - parse-yara
Incoming Value

YARA signatures as a string.

Transform Result

Data formatted as a list of dictionaries.

Examples

Input File Contents

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
rule DummyRule1 : Tag1 Tag2
 {
    meta:
        Author = "Robert"
        MD5_1 = "11111111111111111111111111111111"
        MD5_2 = "22222222222222222222222222222222"
        Description = "blabla bla foo"

    strings:
        $a = "dummy1" nocase fullword
        $b = "dummy2" nocase fullword
        $c = "dummy3" nocase fullword

    condition:
        2 of them
 }

rule DummyRule2 : Tag1 Tag3
 {
    meta:
        Author = "Asztalos"

    strings:
        $d = "dummy4"

    condition:
        all of them
 }

Filter Chain

filters:
  - parse-yara

Output

The output of this filter chain is the following list of dictionaries (note that some string values were truncated for documentation purposes):

[
    {
        "condition_terms": [
            "2",
            "of",
            "them"
        ],
        "metadata": [
            {
                "Author": "Robert"
            },
            {
                "MD5_1": "11111111111111111111111111111111"
            },
            {
                "MD5_2": "22222222222222222222222222222222"
            },
            {
                "Description": "blabla bla foo"
            }
        ],
        "raw_condition": "condition:\n        2 of them\n ",
        "raw_meta": "meta:\n        Author = \"Robert\"\n        MD5_1 = \"11111111111111111111111111111111\"\n...",
        "raw_signature": "rule DummyRule1 : Tag1 Tag2 {\n\n\tmeta:\n\t\tAuthor = \"Robert\"\n\t\tMD5_1 = ...",
        "raw_strings": "strings:\n        $a = \"dummy1\" nocase fullword\n...",
        "rule_name": "DummyRule1",
        "start_line": 1,
        "stop_line": 16,
        "strings": [
            {
                "modifiers": [
                    "nocase",
                    "fullword"
                ],
                "name": "$a",
                "type": "text",
                "value": "dummy1"
            },
            {
                "modifiers": [
                    "nocase",
                    "fullword"
                ],
                "name": "$b",
                "type": "text",
                "value": "dummy2"
            },
            {
                "modifiers": [
                    "nocase",
                    "fullword"
                ],
                "name": "$c",
                "type": "text",
                "value": "dummy3"
            }
        ],
        "tags": [
            "Tag1",
            "Tag2"
        ]
    },
    {
        "condition_terms": [
            "all",
            "of",
            "them"
        ],
        "metadata": [
            {
                "Author": "Asztalos"
            }
        ],
        "raw_condition": "condition:\n        all of them\n ",
        "raw_meta": "meta:\n        Author = \"Asztalos\"\n\n    ",
        "raw_signature": "rule DummyRule2 : Tag1 Tag3 {\n\n\tmeta:\n\t\tAuthor = \"Asztalos\"\n\n\tstrings:...",
        "raw_strings": "strings:\n        $d = \"dummy4\"\n\n    ",
        "rule_name": "DummyRule2",
        "start_line": 18,
        "stop_line": 28,
        "strings": [
            {
                "name": "$d",
                "type": "text",
                "value": "dummy4"
            }
        ],
        "tags": [
            "Tag1",
            "Tag3"
        ]
    }
]

A CDF writer can construct a signature from this data via:

Filter Chain

filters:
 - parse-yara
 - iterate
report:
  signature-sets:
    yara_signatures:
      items:
        type: yara
        name: !expr data.rule_name
        value: !expr data.raw_signature
      attribute-sets:
        - yara_signatures_attributes
  attribute-sets:
        yara_signatures_attributes:
          items:
            - name: Tags
              value: !expr data.tags

Output

[
    {
        "attributes": [
            {
                "name": "Tags",
                "value": "Tag2"
            },
            {
                "name": "Tags",
                "value": "Tag1"
            }
        ],
        "description": "",
        "name": "DummyRule1",
        "status_id": 1,
        "type": {
            "name": "yara"
        },
        "value": "rule DummyRule1 : Tag1 Tag2 {\n\n\tmeta:\n\t\tAuthor = \"Robert\"\n\t\tMD5_1 = ...",
    },
    {
        "attributes": [
            {
                "name": "Tags",
                "value": "Tag1"
            },
            {
                "name": "Tags",
                "value": "Tag3"
            }
        ],
        "description": "",
        "name": "DummyRule2",
        "status_id": 1,
        "type": {
            "name": "yara"
        },
        "value": "rule DummyRule2 : Tag1 Tag3 {\n\n\tmeta:\n\t\tAuthor = \"Asztalos\"\n\n\tstrings...",
    }
]

Regex Filters

Regex Find All Filter
Overview

This filter returns a list containing all of the substrings within the value that match the regular expression. In implementation, this filter is similar to Python’s native re.findall().
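The behavior mirrors Python's re.findall: one (possibly empty) list of matches is produced per input string. Using a few lines from the example below:

```python
import re

# Lines with no alphabetic extension yield an empty list of matches
lines = ['192.168.1.1', 'facebook.pro', 'duckduckgo.com']
print([re.findall(r'\.[a-zA-Z]+', line) for line in lines])
```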

Note

It is a common pattern to use regex filters within a Filter-Mapping Filter. Since a regex filter returns None when no match is found, it is advised to use the Each filter construct to handle the None result.

Usage
filters:
  - regex-findall: regular expression
Incoming Value

Any string value.

Transform Result

Returns all substrings that match the regular expression

Examples

Suppose a CDF writer has a list of IP Addresses mixed with domain names. In this example, the CDF writer only wants the domain extensions:

Input File Contents

192.168.1.1
facebook.pro
1.1.1.1
0.2.4.5
duckduckgo.com
172.217.2.110
52.149.246.39
google.net

Filter Chain

filters:
  - split-lines
  - iterate
  - regex-findall: \.[a-zA-Z]+ #Do not wrap in double quotes

Output

[
    [],
    [
        ".pro"
    ],
    [],
    [],
    [
        ".com"
    ],
    [],
    [],
    [
        ".net"
    ]
]
The CDF writer would then likely use a drop filter to clear out the empty results.

Regex Match Filter
Overview

If the value matches the regular expression at the beginning of the string, this filter returns the match object representing the match; otherwise, it returns None. In implementation, this filter is similar to the native Python re.match().
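The anchoring behavior mirrors Python's re.match, which only succeeds when the pattern matches at the start of the string. A simplified dotted-quad pattern is used here for brevity; it does not range-check octets like the full expression in the example below:

```python
import re

pattern = r'\d{1,3}(\.\d{1,3}){3}'

# The pattern matches at position 0 -> a match object is returned
m = re.match(pattern, '192.168.1.1')
print(m.group(0))

# The pattern does not match at position 0 -> None
print(re.match(pattern, 'host 192.168.1.1'))
```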

Usage
filters:
  - regex-match: regular expression
Incoming Value

Any string value.

Transform Result

Returns the match object containing the match.

Examples

Suppose a CDF writer has a list of IP Addresses mixed with domain names and wants to extract IP Addresses.

Input File Contents

192.168.1.1
facebook.pro
1.1.1.1
0.2.4.5
duckduckgo.com
172.217.2.110
52.149.246.39
google.net

Filter Chain

filters:
  - split-lines
  - iterate
  - regex-match: ^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$
  - new: !expr value.group(0) #Since the filter returns a ``Match Object``, we group the span

Output

[
    "192.168.1.1",
    "1.1.1.1",
    "0.2.4.5",
    "172.217.2.110",
    "52.149.246.39"
]
Regex Replace Filter
Overview

This filter replaces all substrings within the value that match the regular expression with a specified replacement string and returns the result. In implementation, this filter is similar to the native Python re.sub().
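The substitution mirrors Python's re.sub, replacing every non-overlapping match and leaving non-matching input untouched:

```python
import re

# Every matching substring is replaced with the specified content
print(re.sub(r'\.[a-zA-Z]+', '.Replaced', 'duckduckgo.com'))

# No match -> the value passes through unchanged
print(re.sub(r'\.[a-zA-Z]+', '.Replaced', '192.168.1.1'))
```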

Usage
filters:
  - regex-replace: ['regex', 'new content'] #Single quotes only
Incoming Value

Any string value.

Transform Result

Returns the string value with matching regex replaced by specified content.

Examples

Input File Content

192.168.1.1
facebook.pro
1.1.1.1
0.2.4.5
duckduckgo.com
172.217.2.110
52.149.246.39
google.net

Filter Chain

filters:
  - split-lines
  - iterate
  - regex-replace: ['\.[a-zA-Z]+', '.Replaced']

Output

[
    "192.168.1.1",
    "facebook.Replaced",
    "1.1.1.1",
    "0.2.4.5",
    "duckduckgo.Replaced",
    "172.217.2.110",
    "52.149.246.39",
    "google.Replaced"
]
Regex Search Filter
Overview

If the value matches the regular expression anywhere in the string, this filter returns the match object representing the first match; otherwise, it returns None. In implementation, this filter is similar to the native Python re.search().
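The behavior mirrors Python's re.search and differs from the Regex Match Filter in that the match may occur anywhere in the string, not just at the beginning:

```python
import re

# re.search scans the whole string for the first match
print(re.search(r'\.[a-zA-Z]+', 'facebook.pro').group(0))

# re.match would fail here because the match is not at position 0
print(re.match(r'\.[a-zA-Z]+', 'facebook.pro'))

# No match anywhere in the string -> None
print(re.search(r'\.[a-zA-Z]+', '192.168.1.1'))
```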

Usage
filters:
  - regex-search: regular expression
Incoming Value

Any string value.

Transform Result

Returns the match object containing the match.

Examples

Input File Contents

192.168.1.1
facebook.pro
1.1.1.1
0.2.4.5
duckduckgo.com
172.217.2.110
52.149.246.39
google.net

Filter Chain

filters:
  - split-lines
  - iterate
  - regex-search: \.[a-zA-Z]+
  - new: !expr value.group(0) #Since the filter returns a ``Match Object``, we group the span

Output

[
    ".pro",
    ".com",
    ".net"
]
Common Pitfalls

A CDF writer may be accustomed to using single and/or double quotes when specifying filter definitions. Most of the time, it does not functionally matter whether a ' is used versus a ". It is important to note, however, that escape sequences are processed in double-quoted scalars but not in single-quoted scalars. A CDF writer should be careful of this specifically when defining regex filters.

- regex-search: '\.[a-zA-Z]+' OK, quotes not required
- regex-search: "\.[a-zA-Z]+" Results in a ScannerError

Single quotes are required for the regex-replace filter. Double quotes or the omission of quotes may result in an error.

- regex-replace: [\.[a-zA-Z]+, .Replaced] Omission of quotes results in a ParserError
- regex-replace: ["\.[a-zA-Z]+", ".Replaced"] Results in ScannerError
- regex-replace: ['\.[a-zA-Z]+', '.Replaced'] OK. Single quotes required

Run Variable Filters

Get Run Variable Filter
Overview

ToDo

Usage

ToDo

Incoming Value

ToDo

Transform Result

ToDo

Examples

ToDo

Set Run Variable Filter
Overview

ToDo

Usage

ToDo

Incoming Value

ToDo

Transform Result

ToDo

Examples

ToDo

Set Filter

Overview

The Set filter allows a CDF writer to set values onto specified keys on an incoming dictionary or object value. It is also the entry point for calling Supplemental Feeds.

Usage - Standard Dictionary/Object Transformation
filters:
  - set:
      key_a: Some String
      key_b: !expr value.some_key

The Set Filter is configured with any number of key-value pairs. The Set Filter transforms the incoming dictionary or object value by setting each specified key-value pair into the incoming value. In basic Python terms, a loop not unlike the following is performed:

for key, value in set_filter_args.items():
    incoming_value[key] = value

As may be evident from the example code above, the Set Filter overwrites the value of any existing key that is specified. If a key is specified that does not already exist on the incoming dictionary or object value, that key is created on the incoming value with its specified value.

In addition, the declared order of keys is not taken into account; each key is evaluated independently of the other keys on the same level. In the example below, key_xyz will resolve to “charlie”, not “lima”, because the keys are not evaluated and set one by one.

filters:
  - new:
      key_abc: "charlie"

  - set:
      key_abc: "lima"
      key_xyz: !expr value.key_abc

Warning

One should not set a complex object like feed: {key_a: value_a, ...} using the Set Filter. The feed key mapping to some sub-dictionary implies a Supplemental Feed call. See the Usage - Supplemental Feed Calls section below.

Usage - Supplemental Feed Calls

The Set Filter is also the entry point to calling Supplemental Feeds from within a Primary Feed definition:

filters:
  - set:
      supp_feed_results:
        feed:
          name: Supplemental Feed Name
          run-once: False                  # Optional, defaults to False
          default: !expr '[]'              # Optional, defaults to None
          run-params:                      # Optional, defaults to None
            since: !expr run_meta.since
            until: !expr run_meta.until
            ids: !expr value.ids_to_lookup

To call a Supplemental Feed, one specifies the key into which the Supplemental Feed’s results should be set and, under that key, provides a mapping keyed to feed containing the following information:

  • name - The name of the Supplemental Feed one wishes to call. The named Supplemental Feed must exist within the feed definition, otherwise a SupplementalFeedError is raised.

  • run-once - Optional, defaults to False. If True, this Supplemental Feed call only ever triggers once per Feed Run for the Feed in which it is called. This means that, even if the Supplemental Feed call is made after an Iterate Filter, the Supplemental Feed only ever runs once, no matter how many data values are iterated over.

  • default - Optional, defaults to None. Only applies if the run-once field is configured as True. If the Supplemental Feed was configured to run-once, then the specified default value is returned each time the Set Filter calling said Supplemental Feed is called after the first run.

  • run-params - Optional, defaults to None. Dictionary mapping of any run-params that need to be passed in to the Supplemental Feed being called.

For more information on Supplemental Feeds in general, see Supplemental Feeds.
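The interaction between run-once and default described above can be sketched in plain Python. This is a simplified illustration of the documented behavior, not Pynoceros internals; the class and names are hypothetical:

```python
class SupplementalCall:
    """Hypothetical sketch: after the first call, a run-once feed call
    returns its configured default instead of running the feed again."""

    def __init__(self, feed, run_once=False, default=None):
        self.feed = feed
        self.run_once = run_once
        self.default = default
        self.has_run = False

    def __call__(self, **run_params):
        if self.run_once and self.has_run:
            return self.default
        self.has_run = True
        return self.feed(**run_params)

# A run-once call inside an iterated filter chain only ever triggers once
call = SupplementalCall(lambda **p: ['enrichment'], run_once=True, default=[])
print(call(ids=[1]))  # first iteration: the Supplemental Feed actually runs
print(call(ids=[2]))  # later iterations: the configured default is returned
```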

Incoming Value

Any dictionary or object value.

Transform Result

The incoming dictionary or object with values set for each key as supplied per the Set Filter’s configuration.

Examples - Standard Usage

Often, it is useful to use the Set Filter to set a formatted value that is deeply embedded within the incoming data value:

Filter Chain

filters:
  - new:
      actor_group:
          assc_countries:
            - China
            - Russia
            - USA
      name: Swoll Marmot
  - set:
      america: !expr value.actor_group.assc_countries[-1]
  - unset-key: actor_group

Output

{
  "america": "USA",
  "name": "Swoll Marmot"
}

As one may have deduced, most simple Filter Mapping/Get Filter combinations can be expressed as a Set Filter transform.

Indeed, one can also express most Filter Mapping/Text Filter combinations by resetting a value transformed via a Jinja2 expression into its original key:

Filter Chain

filters:
  - new:
      key_a: iS tHiS tHe ReAl LiFe
      key_b: oR is tHIS FANTASY
      key_c: Caught iNnA Landslide
  - set:
      key_a: !expr value.key_a.upper()
      key_b: !expr value.key_b.split()[-1]
      key_c: !expr value.key_c.lower()

Output

{
    "key_a": "IS THIS THE REAL LIFE",
    "key_b": "FANTASY",
    "key_c": "caught inna landslide"
}

Warning

While one can reset keys on the current value with set, some consideration is required. Since order is not assured on Python dictionaries, one cannot be sure which transformation is applied first when the Set Filter runs. As such, one should not change a key on the current value and then reference it again within the same Set Filter. For example, this Filter Chain:

filters:
  - new:
      name: Vast Iguana
  - set:
      name: !expr value.name.upper()
      lizard: !expr value.name.split()[-1]

Can result in two distinctly different values:

{
    "lizard": "IGUANA",
    "name": "VAST IGUANA"
}
########### OR ###########
{
    "lizard": "Iguana",
    "name": "VAST IGUANA"
}
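The two results above correspond to the two possible evaluation orders. A CDF writer can reason about this with a small sketch (illustrative only) in which each expression sees the value as it exists when its pair happens to be processed:

```python
def apply_set(incoming, items):
    # items: (key, expression) pairs; each expression is evaluated
    # against the value as it exists when that pair is processed
    for key, expr in items:
        incoming[key] = expr(incoming)
    return incoming

order_a = [('name', lambda v: v['name'].upper()),
           ('lizard', lambda v: v['name'].split()[-1])]
order_b = list(reversed(order_a))

print(apply_set({'name': 'Vast Iguana'}, order_a))  # lizard -> 'IGUANA'
print(apply_set({'name': 'Vast Iguana'}, order_b))  # lizard -> 'Iguana'
```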

One can also use the Set Filter to set complex objects into an incoming value:

Filter Chain

filters:
  - new:
      name: Elusive Axolotl
  - set:
      dict_a:
          dict_b:
              key_a: value_a
              key_b: value_b
          some_list:
            - 1
            - 2
            - 3
      dict_b:
          key_c: value_c
          key_d: value_d

The resulting value would look like:

Output

{
    "dict_a": {
        "dict_b": {
            "key_a": "value_a",
            "key_b": "value_b"
        },
        "some_list": [
            1,
            2,
            3
        ]
    },
    "dict_b": {
        "key_c": "value_c",
        "key_d": "value_d"
    },
    "name": "Elusive Axolotl"
}
Examples - Supplemental Feed Usage

Supplemental Feeds offer a CDF writer the ability to make subsequent data requests from within a Primary Feed’s Filter Chain. The entry point for calling Supplemental Feeds within the Filter Chain is the Set Filter, as one needs to “set” the Supplemental Feed’s resulting data into the current value in the calling Primary Feed’s Filter Chain. General syntax usually involves sending some value from the Primary Feed into the Supplemental Feed as a run_param and utilizing said value to make some kind of enrichment request. For example:

Filter Chain

feeds:
  Primary Feed:
    # ...
    filters:
      - parse-json
      - get: data
      - iterate
      - set:
          enrichment_data:
            feed:
              name: Supplemental Feed
              run-params:
                adversary_id: !expr value.id_
    # ...

  Supplemental Feed:
    feed_type: supplemental
    source:
      http:
        url: www.some-fake-provider.zyx/adversaries/enrichment
        params:
          id_: !expr run_params.adversary_id
    # ...

One should keep in mind the context of the Filter Chain when making Supplemental Feed requests. For instance, in the example above, since the Supplemental Feed call is made after using the Iterate Filter on the data list, a supplemental enrichment API call is made for each item in the data list.

If a provider supports some sort of bulk API lookup functionality, this can be leveraged to reduce the number of total calls made to the provider. One would just need to format a list/mapping of values to send as a run_param to the bulk enrichment Supplemental Feed first. For example:

Filter Chain

feeds:
  Primary Feed:
    # ...
    filters:
      - parse-json
      # Copy the list of objects
      - set:
          lookup_keys: !expr value.data
      - filter-mapping:
          lookup_keys:
            each:
              get: id_
      - set:
          enrichment_data:
            feed:
              name: Bulk Enrichment Supplemental Feed
              run-params:
                lookup_keys: !expr value.lookup_keys
      # ...

  Bulk Enrichment Supplemental Feed:
    feed_type: supplemental
    source:
      http:
        url: www.some-fake-provider.zyx/adversaries/enrichment
        data:
          ids: !expr run_params.lookup_keys

Note

Further filter processing leveraging parent_values to associate objects in data with enrichment results in enrichment_data would be necessary here.

Set Default Filter

Overview

The SetDefault filter is used to set default values for keys or attributes, depending on whether the current value in the filter chain is a dictionary mapping or an object instance, respectively. If the key or attribute already exists on the incoming value, its value is left untouched; otherwise, the specified default value is set. This filter helps in cases where a feed provider does not always return the same set of data, and allows the CDF writer to normalize the data set used in subsequent filters.
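For dictionary values, the per-key behavior matches Python's dict.setdefault (the keys below are illustrative only):

```python
record = {'tags': ['malware']}

# Existing key: the current value is left untouched
record.setdefault('tags', [])

# Missing key: the specified default is set
record.setdefault('description', '')

print(record)
```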

Note

Before the introduction of the Set Default Filter, it was necessary to use a Jinja2 Expression as a value in a key-value pair passed to the Set Filter.

Usage
filters:
  - set-default:
      some_key_a: !expr '[]'
      some_key_b: !expr '{}'
      some_key_c: Default String!

The Set Default Filter’s argument is a dictionary mapping of key-value pairs in which, for each pair:

  • The key is used to look for a matching dictionary key or attribute name on the incoming value, and

  • The value is set to the key on the incoming value if the key does not exist on it

Incoming Value

A dictionary mapping or an object instance.

Transform Result

The dictionary mapping or object instance passed to the filter is returned containing all of the keys specified in the parameter dictionary mapping. If any of the specified keys did not previously exist on the incoming value, the key is now set to the default value specified for the key in the parameter dictionary mapping.

Examples

In most cases, ensuring that a key or attribute has a default value is essential to avoiding a KeyError or AttributeError when the filter chain is run against data that may not consistently have a wanted key or attribute defined. A feed source may choose to exclude a key for which it has no data instead of providing an appropriate default value.

In the following example, the CDF writer wants to apply the Title Filter to each item in a list referenced by the tags key in JSON objects returned by the feed source. The CDF writer has the following filter chain constructed:

Filter Chain Without set-default

filters:
  - json
  - iterate
  - filter-mapping:
      tags:
        each:
          title

With the sample input that the CDF writer has available at the time, it looks as if the feed source always returns a tags key. Everything may seem fine until one day there is an error generated by the feed:

ERROR Error applying filter FilterMapping(tags=Each(filter=Title())) to value {}: KeyError('tags',)

Looking at the JSON input, the CDF writer notices that there is an object that does not have a tags key.

Output Without set-default

[
  {
    "tags": ["malware", "vulnerability", "exploit"]
  },
  {

  }
]

The CDF writer learned an important lesson this day: be proactive and do not make assumptions that the feed source always provides keys that need to be further transformed in the Filter Chain.

The CDF writer inserts the set-default filter before the filter-mapping filter since this filter needs the tags key to exist on the current value:

Filter Chain With set-default

filters:
  - json
  - iterate
  - set-default:
      tags: []
  - filter-mapping:
      tags:
        each:
          title

The CDF writer also sets the default value of tags to an empty list since the filter-mapping is applying the title filter to each item in the list, which gracefully handles an empty list as opposed to some other data type.

Rerunning the updated filter chain against the input that caused the KeyError results in the following expected output:

Output With set-default

[
  {
    "tags": [
      "Malware",
      "Vulnerability",
      "Exploit"
    ]
  },
  {
    "tags": []
  }
]

Note

The Set Default Filter’s only determination for whether the specified default value should be assigned to the associated key or attribute is based on whether the key or attribute exists on the current value in the filter chain. It has nothing to do with whether the current value assigned to the key or attribute is truthy or falsy.

filters:
  - new:
      tags: null
  - set-default:
      tags: []

This results in value being a dict containing a tags key whose value is set to null. This is because by the time the set-default filter is applied, tags already exists on value, and even though its value is set to null, which is falsy, its value is not reassigned.
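The presence-based behavior described above mirrors Python's dict.setdefault, a plausible basis for the filter (an assumption — the implementation is not shown in this documentation):

```python
# Sketch of set-default semantics using Python's dict.setdefault.
# This only illustrates the presence-based (not truthiness-based)
# behavior; it is not the filter's actual implementation.
value = {"tags": None}

# The key exists, so the default is NOT applied, even though None is falsy.
value.setdefault("tags", [])
assert value["tags"] is None

# A missing key DOES receive the default.
value.setdefault("labels", [])
assert value["labels"] == []
```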

Set Index Filter

Overview

The SetIndex Filter allows a CDF writer to explicitly set a given value into a given element index on a list value.

Usage
filters:
  - set-index:
      index: 0
      value: some value

Warning

One should keep general list IndexErrors in mind when leveraging the Set-Index Filter. For instance, attempting to set a value into an index that is greater than or equal to the list’s length raises an IndexError.
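Since list values behave like Python lists, the index semantics (including negative indices and the IndexError caveat) can be illustrated directly — a sketch, not the filter's implementation:

```python
# Sketch of set-index behavior on a plain Python list.
items = ["a", "b", "c"]

items[0] = "some value"            # index 0 replaces the first element
items[-1] = "Last Value Replaced"  # negative indices count from the end
assert items == ["some value", "b", "Last Value Replaced"]

# Assigning past the end raises IndexError rather than appending.
try:
    items[10] = "boom"
except IndexError:
    pass
```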

Incoming Value

Any list value.

Transform Result

The same list with the specified value set into the specified index.

Examples

To set a value as the last element of a given list, one could do the following:

Input File Contents

2.2.2.2
2.2.2.2
duckduckgo.com
duckduckgo.com
google.net
google.net
facebook.pro
facebook.com

Filter Chain

filters:
  - split-lines
  - set-index:
      index: -1
      value: "Last Value Replaced"

Output

[
    [
        "2.2.2.2",
        "2.2.2.2",
        "duckduckgo.com",
        "duckduckgo.com",
        "google.net",
        "google.net",
        "facebook.pro",
        "Last Value Replaced"
    ]
]

Warning

One cannot currently utilize the Set-Index filter to append a value to a list, only set values into specific, already defined indices. To accomplish an append-like behavior, one would need to leverage the Set Filter and Jinja2 Expressions as such:

filters:
  - set:
      list_to_append_to: !expr 'value.list_to_append_to + [some_element]'

Summarize IP Range Filter

Overview

The SummarizeIPRange Filter allows a CDF writer to transform an IP range into a list of CIDR blocks.

New in version 4.36.0.

Usage
filters:
  - summarize-ip-range:
      start: !expr value.start_ip
      end: !expr value.end_ip

One can also use the Summarize IP Range Filter by supplying Jinja2 expressions as positional arguments:

filters:
  - summarize-ip-range:
      - !expr value.start_ip
      - !expr value.end_ip
Incoming Arguments

Two string formatted IP addresses.

Transform Result

A list of CIDR blocks given the start and end IP addresses.

Examples

Filter Chain

filters:
  - summarize-ip-range:
      start: 136.1.1.0
      end: 136.1.5.255

Output

[
  "136.1.1.0/24",
  "136.1.2.0/23",
  "136.1.4.0/23"
]
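The summarization above matches Python's standard-library ipaddress.summarize_address_range, which the filter plausibly wraps (an assumption — the implementation is not documented here):

```python
from ipaddress import ip_address, summarize_address_range

# Summarize 136.1.1.0 - 136.1.5.255 into the fewest CIDR blocks.
start = ip_address("136.1.1.0")
end = ip_address("136.1.5.255")
blocks = [str(net) for net in summarize_address_range(start, end)]
assert blocks == ["136.1.1.0/24", "136.1.2.0/23", "136.1.4.0/23"]
```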

Feed providers may not provide the boundaries of an IP range in a nicely parsed format. Here’s an example in which the CDF writer needs to parse out the IP Addresses from a string:

Filter Chain

filters:
  - new:
      ip_range: '136.1.1.0 - 136.1.5.255'
  - filter-mapping:
      ip_range:
        chain:
          - split: ' - '
          - summarize-ip-range:
              start: !expr value.0
              end: !expr value.1

Output

{
    "ip_range": [
        "136.1.1.0/24",
        "136.1.2.0/23",
        "136.1.4.0/23"
    ]
}

Switch Filter

Overview

The Switch Filter allows CDF writers to evaluate multiple conditional cases, using the switch statement to determine the branch of execution to follow.

As per standard conventions for Switch Statements, only the first branch of logic whose condition resolves to true will be executed.

Usage
filters:
  - switch:
    - condition: !expr value.is_fqdn
      filter:
        chain:
          - iterate
          - csv
          ...
    - condition: !expr value.is_ip
      filter:
        ...
    - filter:
        ...

A default case can be achieved if no condition is specified.
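The control flow reduces to an if/elif/else chain: only the first branch whose condition is true runs, and a case without a condition acts as the else. A minimal Python sketch of these semantics (hypothetical helper names, not the filter's actual code):

```python
# First-match dispatch, analogous to the switch filter:
# each case is a (condition, transform) pair; a trailing default
# transform with no condition acts as the else branch.
def switch(value, cases, default=None):
    for condition, transform in cases:
        if condition(value):
            return transform(value)   # only the first true branch runs
    return default(value) if default else value

result = switch(
    5,
    cases=[
        (lambda v: v > 10, lambda v: "big"),
        (lambda v: v > 3, lambda v: "medium"),   # first true branch wins
        (lambda v: v > 4, lambda v: "never reached"),
    ],
)
assert result == "medium"
```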

Incoming Value

This can be any value.

Transform Result

If one of the conditions resolves as True, the incoming value will be transformed by the associated filter.

Note

Prior to ThreatQ version 4.55.0, if a value did not satisfy one of the switch filter conditions and a default case was not explicitly set, the value would be silently dropped from the filter-chain. This behavior was modified in version 4.55.0 so that if a value does not satisfy any condition in the switch statement, the value continues to pass through the rest of the filter-chain unchanged.

Examples

The Switch Filter is useful in situations where multiple complex transformations are required conditionally. For instance: given a list of dictionary IOC values, each specifying a type and different key/value pairs relative to said type, a writer could leverage the Switch Filter to concisely map each Indicator Type case:

Filter Chain

- parse-json
- iterate
- switch:
  - condition: !expr value.type == 'FQDN'
    filter:
      new:
          value: !expr value.fqdn
          type: FQDN
          attributes:
            - name: Scheme
              value: !expr value.scheme
  - condition: !expr value.type == 'IP'
    filter:
      chain:
        - filter-mapping:
            sighted_at: timestamp
        - new:
            value: !expr value.ipv4
            type: IP Address
            attributes:
              - name: Sighting
                value: !expr value.sighted_at
  - filter: drop

In this example, a simple Drop Filter is used as the default case, denoting that any object types that are not accounted for within the switch statement will be dropped from consideration.

Filter Chain

filters:
  - new:
      contents: [1, 2, 3, 4, 5, 6]
  - get: contents
  - iterate
  - new:
      num: !expr value
      text: !tmpl 'This is Mambo Number {{value}}'
  - switch:
      - condition: !expr value.num != 5
        filter:
          filter-mapping:
            text:
              chain:
                - new:
                    text: 'This is not Mambo Number 5!'
                - get: text

In this example, prior to ThreatQ version 4.55.0, the switch statement would evaluate on the condition and silently drop any value that did not satisfy a condition.

Output

{
    "num": 1,
    "text": "This is not Mambo Number 5!"
},
{
    "num": 2,
    "text": "This is not Mambo Number 5!"
},
{
    "num": 3,
    "text": "This is not Mambo Number 5!"
},
{
    "num": 4,
    "text": "This is not Mambo Number 5!"
},
{
    "num": 6,
    "text": "This is not Mambo Number 5!"
}

After ThreatQ version 4.55.0, the same condition produces the following:

Output

{
    "num": 1,
    "text": "This is not Mambo Number 5!"
},
{
    "num": 2,
    "text": "This is not Mambo Number 5!"
},
{
    "num": 3,
    "text": "This is not Mambo Number 5!"
},
{
    "num": 4,
    "text": "This is not Mambo Number 5!"
},
{
    "num": 5,
    "text": "This is Mambo Number 5"
},
{
    "num": 6,
    "text": "This is not Mambo Number 5!"
}

Text Filters

The following filters allow a CDF writer to transform a string data value.

CaseFold Filter
Overview

The CaseFold Filter removes all case distinctions present in a string. It is useful for case-less matching.

Usage
filters:
  - casefold
Incoming Value

Any string value.

Transform Result

The same value casefolded.

Examples

The casefold filter can be used to ensure more accurate case-less string comparisons. For instance, the German lowercase letter ß is equivalent to ss. However, since ß is already lowercase, the classic lower method does nothing to it, while casefold converts it to ss.
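Since the filter's behavior matches Python's str.casefold (presumably the underlying call — an assumption), the distinction from lower can be seen directly:

```python
word = "Heißluftballon"

# lower() leaves the already-lowercase ß untouched...
assert word.lower() == "heißluftballon"

# ...while casefold() normalizes it to "ss" for case-less comparison.
assert word.casefold() == "heissluftballon"
assert "HEISSLUFTBALLON".casefold() == word.casefold()
```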

Filter Chain

filters:
  - new: 'Heißluftballon'
  - casefold

Output

[
    "heissluftballon"
]
Lower Filter
Overview

The Lower Filter transforms a text value into all lowercase.

Usage
filters:
  - lower
Incoming Value

Any string value.

Transform Result

The same value transformed to lowercase.

Examples

Filter Chain

filters:
  - new: 'HoWdY paRtNeR!'
  - lower

Output

[
    "howdy partner!"
]
Replace Filter
Overview

The Replace Filter transforms a text value by doing a substring replace on some target substring.

Usage
filters:
  - replace:
      old: target string
      new: replacement string
Incoming Value

Any string value.

Transform Result

The same value transformed via substring replacement.

Examples

The Replace Filter can be immensely useful for removing undesired text from some value. For instance, the following:

Filter Chain

filters:
  - new: 'Reporter - CrowdStrike'
  - replace:
      old: 'Reporter - '
      new: ''

Output

[
    "CrowdStrike"
]
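The transformation above matches Python's str.replace, which substitutes every occurrence of the target substring (presumably the underlying call — an assumption):

```python
# str.replace substitutes the target substring with the replacement.
assert "Reporter - CrowdStrike".replace("Reporter - ", "") == "CrowdStrike"

# All occurrences are replaced, not just the first.
assert "a-b-c".replace("-", "/") == "a/b/c"
```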
Split Filter
Overview

The Split Filter allows a CDF writer to split a string value into a list of substrings based on a given delimiter.

Usage
filters:
  - split:
      sep: ','       # Optional, defaults to None
      maxsplit: 1    # Optional, defaults to None

If sep is not supplied, runs of consecutive whitespace characters are treated as a single separator. If maxsplit is given, at most that many substring splits are done, starting at the beginning of the string.
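These options behave like Python's str.split (presumably the underlying call — an assumption):

```python
# With no sep, runs of any whitespace act as a single separator
# and leading/trailing whitespace is ignored.
assert "  a \t b\nc ".split() == ["a", "b", "c"]

# With sep given, every occurrence counts, and maxsplit caps the
# number of splits made from the left.
assert "a,b,c".split(",", 1) == ["a", "b,c"]
assert "a,,b".split(",") == ["a", "", "b"]
```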

If neither option is required, the filter can be short-handed as:

filters:
  - split
Incoming Value

Any string value.

Transform Result

A list of substring values based on the specified sep delimiter.

Examples

When using the Split Filter, a CDF writer must be aware that they are transforming a singular string value into a list of string values. As such, the result of a Split Filter would usually be acted upon in a Filter Mapping and Each combination. For example:

Filter Chain

filters:
  - new:
      name: Some Mock Adversary
      aliases: APT13##COMMA##APT66##COMMA##Karma Police
  - filter-mapping:
      aliases:
        chain:
          - split:
              sep: ##COMMA##
          - each:
            - new:
                name: Adversary Alias
                value: !expr value

The filter-mapping in this example splits the string of aliases and then loops through each alias substring, transforming it into a new dictionary value. The resulting value would look like:

Output

{
    "aliases": [
        {
            "name": "Adversary Alias",
            "value": "APT13"
        },
        {
            "name": "Adversary Alias",
            "value": "APT66"
        },
        {
            "name": "Adversary Alias",
            "value": "Karma Police"
        }
    ],
    "name": "Some Mock Adversary"
}
RSplit Filter
Overview

The RSplit Filter behaves like the Split Filter, except that it splits substrings starting from the end of the string rather than the beginning.

Usage
filters:
  - rsplit:
      sep: ','       # Optional, defaults to None
      maxsplit: 1    # Optional, defaults to None

If sep is not supplied, runs of consecutive whitespace characters are treated as a single separator. If maxsplit is given, at most that many substring splits are done, starting at the end of the string.
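The only difference from the Split Filter appears when maxsplit limits the number of splits, as Python's str.rsplit illustrates (presumably the underlying call — an assumption):

```python
# rsplit counts maxsplit from the right-hand end of the string.
assert "a,b,c".rsplit(",", 1) == ["a,b", "c"]

# Without maxsplit, split and rsplit produce identical results.
assert "a,b,c".rsplit(",") == "a,b,c".split(",")
```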

If neither option is required, the filter can be short-handed as:

filters:
  - rsplit
Incoming Value

Any string value.

Transform Result

A list of substring values based on the specified sep delimiter.

Examples

Filter Chain

filters:
  - new: "We're gonna walk down to Electric Avenue"
  - rsplit:
     sep: 'o'

Output

[
    [
        "We're g",
        "nna walk d",
        "wn t",
        " Electric Avenue"
    ]
]
Split-Lines Filter
Overview

The SplitLines Filter splits a line boundary delimited string value into a list of line substrings.

Usage
filters:
  - split-lines:
      keepends: False  # Optional, defaults to ``False``

If keepends is specified as True, the line boundary characters are preserved at the end of each line substring. By default, these characters are not included. Since the default usage is almost always desired, the Split-Lines Filter can be short-handed as:

filters:
  - split-lines

Note

Both split-lines and splitlines are valid entry points for the Split-Lines Filter.

Incoming Value

Any line boundary delimited string.

Transform Result

A list of line substring values.

Examples

The Split-Lines Filter can be used to iterate over a line boundary delimited text string. For example:

Filter Chain

filters:
  - new: "test1\ntest2\rtest3" #Double quotes required
  - split-lines
  - iterate
  ...

Output

[
    "test1",
    "test2",
    "test3"
]
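The output above matches Python's str.splitlines, which recognizes \n, \r, and \r\n (among other line boundaries) as separators (presumably the underlying call — an assumption):

```python
# splitlines handles mixed line boundary characters.
assert "test1\ntest2\rtest3".splitlines() == ["test1", "test2", "test3"]

# keepends=True preserves the boundary characters on each line.
assert "a\nb".splitlines(keepends=True) == ["a\n", "b"]
```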
Strip Filter
Overview

The Strip Filter transforms a text value by removing leading and trailing whitespace.

Usage
filters:
  - strip:
      chars: ' \~#' # Optional, defaults to ``None`` (i.e., just whitespace)

If chars is specified, a string of target characters to strip from the value is expected.

Warning

If specifying explicit chars to strip, one must explicitly include whitespace characters for whitespace to be stripped as well.
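As with Python's str.strip (presumably the underlying call — an assumption), chars is a set of characters rather than a substring, and stripping stops at the first character not in the set:

```python
# The chars argument is a SET of characters to strip, not a substring.
assert " #~Tester~# ".strip(" ~#") == "Tester"

# Omitting the space means stripping stops at the leading/trailing
# spaces, leaving the value unchanged.
assert " #~Tester~# ".strip("~#") == " #~Tester~# "
```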

Usually, a user needs to just strip whitespace from a value. In this case, the Strip Filter can be short-handed as:

filters:
  - strip
Incoming Value

Any string value.

Transform Result

The same value stripped of whitespace and any specified chars (if applicable).

Examples

The Strip Filter strips leading and trailing characters from a value. For example:

Filter Chain

filters:
  - new: ' #~Tester~# '
  - strip:
      chars: ' ~#'

Output

[
    "Tester"
]
LStrip Filter
Overview

Like the Strip Filter, but only leading characters are considered for stripping.

Usage
filters:
  - lstrip:
      chars: ' \~#'  # Optional, defaults to ``None`` (ie, just whitespace)

If chars is specified, a string of target characters to strip from the value is expected.

Warning

If specifying explicit chars to strip, one must explicitly include whitespace characters for whitespace to be stripped as well.

Usually, a user needs to just strip whitespace from a value. In this case, the LStrip Filter can be short-handed as:

filters:
  - lstrip
Incoming Value

Any string value.

Transform Result

The same value stripped of leading whitespace and any specified chars (if applicable).

Examples

The LStrip Filter strips only leading characters from a value. For example:

Filter Chain

filters:
  - new: ' #~Tester~# '
  - lstrip:
      chars: ' ~#'

Output

[
    "Tester ~# "
]
RStrip Filter
Overview

Like the Strip Filter, but only trailing characters are considered for stripping.

Usage
filters:
  - rstrip:
      chars: ' \~#'  # Optional, defaults to ``None`` (ie, just whitespace)

If chars is specified, a string of target characters to strip from the value is expected.

Warning

If specifying explicit chars to strip, one must explicitly include whitespace characters for whitespace to be stripped as well.

Usually, a user needs to just strip whitespace from a value. In this case, the RStrip Filter can be short-handed as:

filters:
  - rstrip
Incoming Value

Any string value.

Transform Result

The same value stripped of trailing whitespace and any specified chars (if applicable).

Examples

The RStrip Filter strips only trailing characters from a value. For example:

Filter Chain

filters:
  - new: ' #~Tester~# '
  - rstrip:
      chars: ' ~#'

Output

[
    " #~Tester"
]
Title Filter
Overview

The Title Filter transforms a text value into title case, i.e., the initial letter of each word is uppercase while all other letters are lowercase.

Usage
filters:
  - title
Incoming Value

Any string value.

Transform Result

The same value transformed to title case.

Examples

The Title Filter returns a titlecased version of the string supplied to it. For instance, title casing can be expected to behave as follows:

  • i am just a string -> I Am Just A String

  • ThAts mY pUrSe! -> Thats My Purse!

  • works_with_underscores -> Works_With_Underscores
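These behaviors match Python's str.title (presumably the underlying call — an assumption), which treats any non-letter, including underscores, as the start of a new word:

```python
assert "i am just a string".title() == "I Am Just A String"
assert "ThAts mY pUrSe!".title() == "Thats My Purse!"
# Any non-letter (including underscores) starts a new "word".
assert "works_with_underscores".title() == "Works_With_Underscores"
```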

Filter Chain

filters:
  - new: 'i am just a string :: ThAts mY pUrSe! :: works_with_underscores'
  - title

Output

[
    "I Am Just A String :: Thats My Purse! :: Works_With_Underscores"
]
Upper Filter
Overview

The Upper Filter transforms a text value into all uppercase.

Usage
filters:
  - upper
Incoming Value

Any string value.

Transform Result

The same value transformed to uppercase.

Examples

Filter Chain

filters:
  - new: 'i am just a string :: ThAts mY pUrSe! :: works_with_underscores'
  - upper

Output

[
    "I AM JUST A STRING :: THATS MY PURSE! :: WORKS_WITH_UNDERSCORES"
]

Timestamp Filter

Overview

The Timestamp Filter allows CDF writers to transform a datetime string into a standardized format.

Usage
filters:
  - timestamp:
      fmt: '%Y-%m-%d %H:%M:%S'  # Optional, defaults to ``YYYY-MM-DD HH:mm:ssZZ``
      timezone: EST             # Optional, defaults to ``UTC``
Incoming Value

A datetime value. Accepted forms include strings, Python datetime and Arrow objects, and integers (i.e., epoch times).

Transform Result

The same datetime value as a string, formatted as per the field arguments supplied to the Timestamp Filter.

Examples

The most common use case for the Timestamp Filter involves simply calling the Filter without any arguments. This results in a datetime string formatted as per ThreatQ’s standards for timestamps, resulting in a value like 2019-01-01 12:00:00-00:00. Since the Timestamp Filter is applied to exactly one field, it usually appears within a Filter-Mapping like so:

Filter Chain

filters:
  - json
  - iterate
  - filter-mapping:
      created: timestamp

Note

For consistency’s sake, one should format all datetimes with the Timestamp Filter.

Truncate HTML Filter

Overview

The TruncateHTML Filter allows a CDF writer to transform an incoming HTML string value such that its length is at most the value of the provided limit argument. The CDF writer may also define an end string (defaults to '...') which is inserted at the point where the HTML string is truncated. Even with the inserted end string, the transformed HTML string’s length will not exceed the limit.

This filter attempts to maintain as much of the HTML string as possible by recursively traversing the depth of the HTML snippet provided to it. For example, instead of excluding an entire <p> tag because its contents will not fit within the provided limit, this filter will traverse the contents of the <p> tag to see what can fit in the remaining space.

A truncated HTML string is guaranteed to be valid HTML. If the truncation occurs within the contents of a tag, the tag and its parent tags will have closing tags in the output. If the open and end tags (or a self-closing tag) cannot fit in the remaining space, they are not included in the output.

New in version 4.41.0.

Usage
filters:
  - truncate-html:
      limit: 32630  # Required
      end: ...[Truncated - see full report]  # Optional, default: '...'

    # The following shorthand can be used if the default 'end' value suffices
    - truncate-html: 32630
Incoming Value

An HTML string.

Transform Result

A valid HTML string that guarantees the following:

  • its length is at most the value of the provided limit

  • if the string had to be truncated, it includes the provided end string

Note

Even if the whole incoming HTML string can fit within the provided limit (inclusive) and thus is not truncated, there may still be modifications made to the HTML string due to lxml’s HTML parser. Some example modifications include:

  • Inserting a trailing slash into self-closing tags

    • Input: <br>

    • Output: <br/>

  • Normalizing whitespace between tags to a single whitespace character

    • Input: <img class="large" src="https://example.org"/>   <b>  hey  </b>    <span>hey</span>

    • Output: <img class="large" src="https://example.org"/> <b>  hey  </b> <span>hey</span>

  • Normalizing whitespace within a tag to a single whitespace character (except that whitespace before a trailing slash is stripped)

    • Input: <img  class="large"  src="https://example.org"  /><img />

    • Output: <img class="large" src="https://example.org"/><img/>

  • Rearranging attributes within a tag

    • Input: <img src="https://example.org" class="large"/>

    • Output: <img class="large" src="https://example.org"/>

  • Attempting to recover illegal HTML (e.g., block-level elements containing other block-level elements)

    • Input: <p><h3><strong>Words!</strong></h3></p>

    • Output: <p></p><h3><strong>Words!</strong></h3>

    • Some tags, like <div> and <section>, are considered generic containers and are therefore not affected by this.

Note

lxml’s HTML parser outputs a minimal valid HTML document regardless of the HTML string passed to it. For example, if one were to provide the HTML string <em>hey</em>, lxml’s HTML parser would output <html><body><em>hey</em></body></html>. Since the primary use case of the truncate-html filter is to fit a snippet of HTML within a certain limit such that it can, for example, be rendered as an object’s description in the ThreatQ UI, the filter will always output the contents of the <body> element. Therefore, the <html> and <body> tags are effectively stripped away, even if the incoming HTML string contains them.

Examples

Due to current constraints in the ThreatQ Platform, a ThreatQ Object’s description cannot exceed 32,766 characters. Suppose that a CDF writer wants to set a ThreatQ Object’s description to a provider’s rich text HTML description. The provider’s HTML description may range anywhere from several hundred characters to several hundreds of thousands of characters. The CDF writer wants to fit as much of the provider’s HTML description into the ThreatQ Object’s description as possible.

The CDF writer also wants to prepend a link to the full description, available on the vendor’s website, to the ThreatQ Object’s description. As a result, the CDF writer estimates that at most 32,600 characters of the provider’s HTML description will fit, leaving wiggle room for the link.

The CDF writer may leverage the following solution. For the sake of brevity in this example, please imagine that the rich_text_description field’s value is several large paragraphs of Lorem Ipsum which, in total, exceeds the 32,600 character limit set by the CDF writer.

Response from Provider:

[
    {
        "rich_text_description": "<p>Lorem ipsum dolor sit amet, conse...</p>...<p>...dictum laoreet nisi sit.</p>",
        "url": "https://example.net/27/reports-r-us"
    }
]

Filter Chain:

filters:
  - parse-json
  - iterate
  - filter-mapping:
      rich_text_description:
        chain:
          - truncate-html:
              limit: 32600
              end: ...[Truncated - see link at the top]
          - new: !tmpl '<p><strong>Report Link:</strong> <a href="{{parent_values[1].url}}" target="_blank">{{parent_values[1].url}}</a></p>{{value}}'

Output:

[
    {
        "rich_text_description": "<p><strong>Report Link:</strong> <a href=\"https://example.net/27/reports-r-us\" target=\"_blank\">https://example.net/27/reports-r-us</a></p><p><p>Lorem ipsum dolor sit amet, conse...</p>...<p>Fusce a maximus nisi, in eleifend ligula. D...[Truncated - see link at the top]</p>",
        "url": "https://example.net/27/reports-r-us"
    }
]

Type Filters

The following filters allow a CDF writer to typecast some data value into a specific data type. All type filters inherit from the TypecastFilter base class.

Bool Filter
Overview

The Bool Filter allows a CDF writer to typecast a data value as a boolean value.

Usage
filters:
  - bool
Incoming Value

This can be any value.

Transform Result

The value typecast as a boolean.

Examples

Boolean typecasting here works exactly the same as Python boolean typecasting and includes the same caveats. Any value that resolves as “truthy” typecasts as True, while values resolving as “falsy” typecast as False. For instance, typecasting the string "false" results in True since it is a non-empty string, while typecasting the integer 0 results in False.
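Python's built-in bool makes the caveat concrete — content is irrelevant, only truthiness matters:

```python
# Truthiness, not content, drives the result: any non-empty string
# is truthy, including "false" and "0".
assert bool("false") is True
assert bool("0") is True
assert bool("") is False
assert bool(0) is False
assert bool([]) is False
```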

Filter Chain

filters:
  - new:
      key_a: True
      key_b: False
      key_c: 0
      key_d: 1
      key_e: '0' #strings with content return true regardless of what the content is
      key_f: '' #empty strings return false
  - filter-mapping:
      key_a: bool
      key_b: bool
      key_c: bool
      key_d: bool
      key_e: bool
      key_f: bool

Output

[
    {
        "key_a": true,
        "key_b": false,
        "key_c": false,
        "key_d": true,
        "key_e": true,
        "key_f": false
    }
]
Decimal Filter
Overview

The Decimal Filter allows a CDF writer to typecast a data value as a decimal/float value.

Usage
filters:
  - decimal
Incoming Value

This can be any value.

Transform Result

The value typecast as a Decimal.

Dict Filter
Overview

The Dict Filter allows a CDF writer to typecast a value as a dictionary.

Usage
filters:
  - dict
Incoming Value

This can be any value.

Transform Result

The value typecast as a dictionary.

Examples

The Dict Filter follows the same __init__ construction arguments as Python’s built-in dict, namely:

dict() -> new empty dictionary
dict(mapping) -> new dictionary initialized from a mapping object's
    (key, value) pairs
dict(iterable) -> new dictionary initialized as if via:
    d = {}
    for k, v in iterable:
        d[k] = v
dict(**kwargs) -> new dictionary initialized with the name=value pairs
    in the keyword argument list.  For example:  dict(one=1, two=2)

Given these construction parameters, one can typecast a dictionary from a list of length 2 lists within a CDF like so:

Filter Chain

filters:
  - new: !expr '[["a", 1], ["b", 2]]'
  - dict

Output

[
    {
        "a": 1,
        "b": 2
    }
]

Note

While not currently possible within a CDF, the Dict Filter could be leveraged to transform an object into a dictionary if said object implements an __iter__() method that yields (key, value) pairs.

Int Filter
Overview

The Int Filter allows a CDF writer to typecast a value as an integer.

Usage
filters:
  - int
Incoming Value

This can be any value.

Transform Result

The value typecast as an integer.

Examples

Integer typecasting follows the same __init__ construction arguments as Python’s built-in int, namely:

Convert a number or string to an integer, or return 0 if no arguments
are given.  If x is a number, return x.__int__().  For floating point
numbers, this truncates towards zero.

If x is not a number or if base is given, then x must be a string,
bytes, or bytearray instance representing an integer literal in the
given base.  The literal can be preceded by '+' or '-' and be surrounded
by whitespace.
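Python's built-in int shows the documented truncation and string-parsing behavior directly:

```python
# Floats truncate toward zero; booleans are integers in Python.
assert int(1.6) == 1
assert int(-2.3) == -2
assert int(True) == 1 and int(False) == 0

# Strings may carry surrounding whitespace and a leading sign.
assert int("  -42 ") == -42
```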

Filter Chain

filters:
  - new:
      key_a: True
      key_b: False
      key_c: 1.6 # Truncates toward zero
      key_d: -2.3
  - filter-mapping:
      key_a: int
      key_b: int
      key_c: int
      key_d: int

Output

[
    {
        "key_a": 1,
        "key_b": 0,
        "key_c": 1,
        "key_d": -2
    }
]
List Filter
Overview

The List Filter allows a CDF writer to typecast a value as a list.

Usage
filters:
  - list
Incoming Value

This can be any value.

Transform Result

The value typecast as a list.

Examples

List typecasting follows the same __init__ construction arguments as Python’s built-in list, namely:

list() -> new empty list
list(iterable) -> new list initialized from iterable's items

One could use the List Filter in order to create a list from a given dictionary’s keys like so:

filters:
  - new:
      key_a: A
      key_b: B
  - list

Output

[
    [
        "key_b",
        "key_a"
    ]
]
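In plain Python, list() over a dictionary yields its keys. Note that on Python 3.7+ dictionaries preserve insertion order, so the key order is deterministic (the documented output above reflects an older interpreter without that guarantee):

```python
# list() over a dict yields its keys, in insertion order on Python 3.7+.
assert list({"key_a": "A", "key_b": "B"}) == ["key_a", "key_b"]

# To capture values or (key, value) pairs instead:
assert list({"a": 1}.values()) == [1]
assert list({"a": 1}.items()) == [("a", 1)]
```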
Str Filter
Overview

The Str Filter allows a CDF writer to typecast a value as a string.

Usage
filters:
  - str:
      encoding: utf-8  # Optional, defaults to ``utf-8``

Usually, the default encoding is desired and the filter can be short-handed as:

filters:
  - str
Incoming Value

This can be any value.

Transform Result

The value typecast as a string.

Examples

The Str Filter is useful when a CDF writer, for instance, wants to represent True/False values as strings rather than booleans. In this example, the Str Filter is leveraged within a Filter Mapping to typecast multiple boolean fields:

Filter Chain

filters:
  - new:
      is_malicious: True
      is_unknown: False
  - filter-mapping:
      is_malicious: str
      is_unknown: str

Output

[
    {
        "is_malicious": "True",
        "is_unknown": "False"
    }
]

Unset Key Filter

Overview

The UnsetKey Filter enables a CDF writer to remove a key-value pair from an incoming dictionary value.

Usage
filters:
  - unset-key: some_key_name

Warning

One cannot use a Jinja2 Expression or Template as an argument for the Unset-Key Filter. One should supply a simple string corresponding to the key one wants to remove.

Incoming Value

Any dictionary value.

Transform Result

The same dictionary value with the specified key and its accompanying value removed.

Examples

This filter can be useful in cases where a provider supplies a large amount of data that is not actually necessary for the reporting the CDF writer wants to achieve. For instance, assume a provider returns a JSON list of objects, each of which has a detailed_attribution key that contains a huge amount of data that the CDF writer does not need. To reduce memory utilization, the CDF writer can explicitly drop the detailed_attribution key:

Input File Contents

[
    {
        "IP_Addresses": [
            "1.1.1.1",
            "2.2.2.2",
            "3.3.3.3"
        ],
        "detailed_attribution": [
            "1",
            "2",
            "3",
            "4",
            "5",
            "6",
            "7",
            "8",
            "9",
            "0"
        ]
    }
]

Filter Chain

filters:
  - json
  - iterate
  - unset-key: detailed_attribution

Output

[
    {
        "IP_Addresses": [
            "1.1.1.1",
            "2.2.2.2",
            "3.3.3.3"
        ]
    }
]

Zip Filter

Overview

The Zip filter allows a CDF writer to transform multiple lists of values into a single list of tuples. Python’s built-in zip() functionality is used to pack the values of the incoming list and the result is returned to the Filter Chain once again as a list.

Usage
zip:
  - !expr value.list1
  - !expr value.list2
Incoming Value

A list of lists.

Transform Result

A single list of zipped values.

Note

If the zip filter were to be passed an iterable of length 3 and an iterable of length 5, the resulting list has 3 elements. The iterator stops when the shortest input iterable is exhausted.
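Since the filter delegates to Python's built-in zip(), the truncation behavior can be verified directly in Python:

```python
# Python's zip() stops when the shortest input iterable is exhausted,
# which is the same truncation behavior the Zip filter inherits.
short = [1, 2, 3]
long_list = ["a", "b", "c", "d", "e"]

zipped = list(zip(short, long_list))
print(zipped)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```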

Examples

Input File Contents

{
  "attributes_dict": {
      "creators": [
          "mike.wyatt@riskiq.net"
      ],
      "date": [
          "2017-02-15T13:31:41.256000"
      ],
      "monitors": [
          true
      ],
      "types": [
          "domain"
      ]
  }
}

Filter Chain

- parse-json
- get: attributes_dict
- zip:
    - !expr value.date
    - !expr value.types
    - !expr value.creators
    - !expr value.monitors

Output

[
    [
        "2017-02-15T13:31:41.256000",
        "domain",
        "mike.wyatt@riskiq.net",
        true
    ]
]

Filter CLI Interface

A command-line interface for filters has been provided. See the tq-filter page for more information.

Notes On Examples

Many filter examples leverage the New Filter to create an example value to manipulate. These examples can be easily run with the tq-filter command.

Contexts and Transform Results

When writing a Filter Chain, it is important to keep in mind what data is being manipulated at the time. value may be modified by a number of filters. For instance:

filters:
   - parse-json          # ``value`` == ``{'content': [1, 2, ...], 'pagination': {...}}``
   - get: content        # ``value`` == ``[1, 2, ...]``
   - iterate             # ``value`` == ``1``, then ``value`` == ``2``, ...

value is always set initially based on the data returned by the Source section. It may then be modified using a parsing filter (such as Parse JSON Filter), sometimes followed by a Get Filter. Naturally, the ways in which one needs to modify value and the current context differ from feed to feed, as they are very dependent on the structure of the incoming data.
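As a rough mental model (an analogy only, not the actual Pynoceros implementation), a chain like the one above behaves like a pipeline of plain Python transformations applied to value:

```python
import json

# Analogy for a parse-json -> get -> iterate chain; each step receives
# the previous step's result as ``value``.
raw = '{"content": [1, 2, 3], "pagination": {"page": 1}}'

value = json.loads(raw)    # parse-json
value = value["content"]   # get: content
for item in value:         # iterate: yields 1, then 2, then 3
    print(item)
```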

Similarly, it is possible to change the value of a variable itself - for this purpose, Filter Mapping Filter is commonly used (along with Set Filter and Set Default Filter, among others). For example, if a feed offers a timestamp for when a certain incident occurred, we may format it as follows:

filters:
   - parse-json             # ``value`` == ``{'content': {'occurred': ..., ...}, 'pagination': {...}}``
   - get: content           # ``value`` == ``{'occurred': ..., ...}``
   - filter-mapping:
       occurred: timestamp  # ``value.occurred = (formatted timestamp)``

CDF Reporting

This section provides an in-depth explanation of how to define threat objects and includes information for adding attributes to threat objects and creating relationships between objects.

Creating Threat Objects

Using the Filter Chain, we can transform the data from our source so that it is formatted properly before being passed to the reporter. Data accessed within the Report section is injected into Jinja2 Expressions or Templates as a data variable. Using the definition rules defined in our Report section, we can parse the threat data from the Filter Chain to create sets of threat objects.

The following feed demonstrates how to create a single indicator set containing URL indicators.

feeds:
  VXVault URL:
    source:
      http:
        url: http://vxvault.net/URL_List.php
        response_content_type: text/plain
    filters:
      - split-lines
      - iterate
      - drop: !expr not value.startswith("http")
    report:
      indicator-sets:
        default:
          items:
            - type: URL
              value: !expr data

Examining each part of the feed definition allows us to see exactly how the data is transformed before being passed to the reporter. The following code block depicts the data as received directly from the source, before being processed by the Filter Chain.

VX Vault last 100 Links
Mon, 04 Nov 2019 17:18:20 +0000

http://ring1.ug/files/penelop/5.exe\r\nhttp://happycombi.fr/wp-content/themes/hestia/languages/65y/2c.jpg
http://cleaner-ge.hk/drp\r\nhttp://151.80.8.7/mmort/win.exe
http://217.8.117.22/sokge.exe\r\nhttp://cleaner-ge.hk/kiskis.exe
http://kustdomaetozaebis.hk/klop.exe

Once the data is retrieved from the source, it can be parsed and transformed by the Filter Chain. The following code block is the resulting data after being processed via the Filter Chain. First, the Split Lines Filter is used to split the data into a list of individual URL strings. Next, the Filter Chain uses the Iterate Filter to yield each URL string one at a time from the Split Lines Filter down to the rest of the Filter Chain and Report section. Each value is then evaluated using the Drop Filter to remove any strings that do not begin with http.

[
  "http://ring1.ug/files/penelop/5.exe",
  "http://happycombi.fr/wp-content/themes/hestia/languages/65y/2c.jpg",
  "http://cleaner-ge.hk/drp",
  "http://151.80.8.7/mmort/win.exe",
  "http://217.8.117.22/sokge.exe",
  "http://cleaner-ge.hk/kiskis.exe",
  "http://kustdomaetozaebis.hk/klop.exe"
]

Since the Iterate Filter was used in the Filter Chain, each line from our Filter Chain output is yielded to the reporter, allowing us to create a single indicator set of URL indicators. Observe how the definition leverages the !expr tag to evaluate and set data as the indicator value.
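The split-lines, iterate, and drop steps used above can be sketched in plain Python (an analogy using placeholder URLs, not the actual filter implementations):

```python
# Plain-Python analogy of the split-lines -> iterate -> drop chain,
# using placeholder URLs rather than real feed data.
raw = (
    "VX Vault last 100 Links\n"
    "Mon, 04 Nov 2019 17:18:20 +0000\n"
    "\n"
    "http://example.com/a.exe\n"
    "http://example.org/b.exe\n"
)

lines = raw.splitlines()                           # split-lines
urls = [v for v in lines if v.startswith("http")]  # iterate + drop
print(urls)  # ['http://example.com/a.exe', 'http://example.org/b.exe']
```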

Creating Attributes

When creating a report, we can also map sets of attributes to each threat object in our object sets.

The following feed demonstrates how to create an attribute set named default that is defined as a list of dictionary mappings. Once the attribute set mapping is created, we can apply each attribute from the attribute set to each indicator object in the indicator set by specifying the attribute set name of default in the attribute-sets field in our indicator-sets definition. Thus, in this feed, each created indicator has attributes SBL ID and SBL Link.

feeds:
  Spamhaus DROP List:
    source:
      http:
        url: http://www.spamhaus.org/drop/drop.txt
        response_content_type: text/plain
    filters:
      - split-lines
      - iterate
      - drop: !expr value.startswith(";")
      - split: " ; "
      - map-items: [cidr, sblid]
    report:
      attribute-sets:
        default:
          items:
            - name: SBL ID
              value: !expr data.sblid
            - name: SBL Link
              value: !tmpl https://www.spamhaus.org/sbl/query/{{data.sblid}}
      indicator-sets:
        default:
          attribute-sets:
            - default
          items:
            - type: cidr block
              value: !expr data.cidr

Creating Relationships

When creating a report, we can relate a threat object to other threat objects within the current object set or to threat objects in a different object set.

The following Report section creates an indicator object set and sets the inter-related flag to True, which creates relationships between all indicator objects created from a single item yielded from the Filter Chain for that indicator set.

Note

The inter-related flag defaults to False unless otherwise specified.

Note

Object relationships are implicitly bidirectional. A single declaration is sufficient for a relationship between two objects, as opposed to having to declare both objects as related to each other.

Warning

One should be aware of the laws of combinatorics when inter-related is specified as True. For example, inter-relating 1000 objects results in 499,500 operations according to the formula \(\frac{n!}{k!(n-k)!}\), where \(n\) is the total number of objects and \(k\) is the number of objects picked per combination (here, \(k = 2\)).
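The operation count for inter-relating \(n\) objects is the number of pairs, which can be checked directly in Python:

```python
import math

# Number of pairwise relationships created when inter-relating n objects:
# n choose 2 = n! / (2! * (n - 2)!)
n = 1000
pairs = math.comb(n, 2)
print(pairs)  # 499500
```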

report:
  attribute-sets:
    default:
      items:
        - name: PhishTank ID
          value: !expr data.id
        - name: PhishTank URL
          value: !expr data.detail_url
        - name: Target
          value: !expr data.target
        - name: Announcing Network
          value: !expr data.announcing_networks
        - name: Country
          value: !expr data.countries
        - name: RIR
          value: !expr data.rirs
  indicator-sets:
    default:
      attribute-sets:
        - default
      inter-related: true
      items:
        - type: URL
          value: !expr data.url
        - type: IP Address
          value: !expr data.ip_addresses
        - type: CIDR Block
          value: !expr data.cidr_blocks

The following Report section demonstrates how to relate threat objects from one set/type to threat objects from another set/type. In this feed snippet, indicators created from the default indicator set are related to adversaries created from the default adversary set (based on a single item yielded from the Filter Chain). This is done by defining a related mapping under the default indicator set, in which the key is an object type string and the value is a list of set names defined for the given object type.

Note

While the naming convention for {object_type}-set requires that object_type exists as a built-in model object type or custom object, the name for an object set is arbitrary.

report:
  attribute-sets:
    default:
      items:
        - name: Last Seen
          value: !expr data.last_seen
        - name: Threat Name
          value: !expr data.threat_name
        - name: Category
          value: !expr data.categories
        - name: Classification Disposition
          value: !expr data.classification_dispositions
        - name: Delivery Vector
          value: !expr data.delivery_vectors
        - name: Malware Family
          value: !expr data.malwares
        - name: Threat Type
          value: !expr data.threat_types
        - condition: !expr '"overall_confidence" in data'
          name: Confidence
          value: !expr data.overall_confidence
  indicator-sets:
    default:
      items:
        - type: !expr data.indicator_type
          value: !expr data.item_name
          published_at: !expr data.first_seen
      attributes:
        - default
      related:
        adversary:
          - default
  adversary-sets:
    default:
      items:
        - name: !expr data.actors

Note

If the condition check evaluates to False, the Confidence attribute is not added to the list of attributes for that indicator object.

Creating Tags

Reporting tags with tag-sets is very similar to reporting attributes via attribute-sets, both in terms of declaration and usage. For example:

report:
  tag-sets:
    default:
      items:
        - 'example tag'
        - 'some other tag'
        - 'third example'
        - !expr data.example_tag_field
    ...
  attachment-sets:
    default:
      tag-sets:
        - default
        ...
      ...

Note

Reporting tags was supported only for Attachment objects until ThreatQ 4.45.0. Starting with ThreatQ 4.45.0, tags can be reported for any primary Threat Object.

Custom Sources

Beginning in ThreatQ 5.0.1, a CDF writer may override the default source and provide a list of sources for any given threat object in the report. Previously, sources were purely derived from the feed name. Now, the feed name will only be used as a source if sources is not set.

Example:

report:
  attribute-sets:
    default:
      items:
        - name: Harmless
          value: !expr data.supp_feed_count_results.harmless if data.supp_feed_count_results.harmless else None
          sources: [Source1, Source2]
          # Can be a list

        - name: Malicious
          value: !expr data.supp_feed_count_results.malicious if data.supp_feed_count_results.malicious else None
          sources: NoodleTime
          # Can just be a string

        - name: Suspicious
          value: !expr data.supp_feed_count_results.suspicious if data.supp_feed_count_results.suspicious else None
          sources: !expr user_fields.custom_source
          # Can be a user_field

        - name: Undetected
          value: !expr data.supp_feed_count_results.undetected if data.supp_feed_count_results.undetected else None
          sources: !expr data.supp_feed_count_results.undetected | string
          # Must be a string or list of strings

        - name: No Source Declared
          value: !expr data.supp_feed_count_results.undetected if data.supp_feed_count_results.undetected else None
          # No source declared; defaults to the feed name

  indicator-sets:
    default:
      attribute-sets:
        - default
      items:
        - type: !expr data.indicator_type
          value: !expr data.indicator_value
          sources: ["Source1", "Source2"]

Example Output:

 [
     {
         "indicators": [
             {
                 "attributes": [
                     {
                         "name": "Harmless",
                         "sources": [
                             {
                                 "name": "Source2"
                             },
                             {
                                 "name": "Source1"
                             }
                         ],
                         "value": "0"
                     },
                     {
                         "name": "Suspicious",
                         "sources": [
                             {
                                 "name": "user-field source string"
                             }
                         ],
                         "value": "0"
                     },
                     {
                         "name": "Undetected",
                         "sources": [
                             {
                                 "name": "7"
                             }
                         ],
                         "value": 7
                     },
                     {
                         "name": "No Source Declared",
                         "value": 7
                     },
                     {
                         "name": "Malicious",
                         "sources": [
                             {
                                 "name": "NoodleTime"
                             }
                         ],
                         "value": 49
                     }
                 ],
                 "description": null,
                 "sources": [
                     {
                         "name": "Source2"
                     },
                     {
                         "name": "Source1"
                     }
                 ],
                 "status_id": 1,
                 "type": {
                     "name": "SHA-256"
                 },
                 "value": "6e56322d553de0b63d92ac90a056ab9fc051db1ae500440fc9066fc78b7d7c8d"
             }
         ]
     }
 ]

Common Feed Errors

This page lays out some commonly seen Feed Run errors, their possible causes, and some suggested solutions for when one encounters them. Generally, errors that arise during a Feed Run are stored in the API and shown in the Feed Activity Log in the ThreatQ UI. While error messages are displayed in the ThreatQ UI, one can find the associated Stack Traces for Feed Run errors and other more detailed information within Dynamo’s logs.

For information on the various stages of a Feed Run and writing a definition, see Feed Definitions.

Source Errors

As the Source Definition section of a Feed deals with pulling data from some provider, errors arising within the Source section’s processing generally have to do with error codes returned by the provider or data transfer errors.

Source errors are highly disruptive to a Feed Run and will cause the run to immediately complete with errors. Errors during the Source section’s processing are always reported in this format:

Error fetching data from provider: <Error Message>

Note

If a run completes with errors, the data ingested up to that point will still appear in ThreatQ.

More often than not, the Source section will be dealing with some kind of HTTP request(s). As such, Source section errors will usually cite some HTTP Status Code. A list of HTTP Response Status Codes should be consulted to determine what the Status Code in question means. Dynamo considers any Status Code between 400 and 599 to be an error code, and will raise the error in that case.
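The rule described above amounts to a simple range check (a sketch of the documented behavior, not Dynamo's actual code):

```python
# Sketch of the documented rule: any HTTP status in 400-599 is treated
# as an error by Dynamo (illustration only, not Dynamo's actual code).
def is_error_status(code: int) -> bool:
    return 400 <= code <= 599

print(is_error_status(200))  # False
print(is_error_status(404))  # True
print(is_error_status(503))  # True
```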

The table below lays out some of the common HTTP error Status Codes and what each generally means in the context of a Feed Run:

Status Code

Description

400 Bad Request

The request as sent was considered malformed by the server that received it. Generally, this means some field in the Source section is incorrectly configured.

401 Unauthorized

The request either lacked authentication credentials entirely or the supplied credentials were expired.

403 Forbidden

The authentication credentials sent with the request do not have the necessary permissions to access the endpoint in question.

404 Not Found

The requested resource could not be found at the given URL. If the Feed Run had previously worked without issue, this may mean that the provider has updated their API and moved the target endpoint to a new URL.

500 Internal Server Error

Something went wrong when the server was processing the request. It may be that the Source section is configured incorrectly, or the provider’s server is behaving in an unexpected manner.

503 Service Unavailable

The server is unavailable because it is down or overloaded. This is a provider issue that cannot be readily fixed from within a Feed.

504 Gateway Time-out

The server is not able to serve a response in time. This is a provider issue that cannot be readily fixed from within a Feed.

Outside of HTTP, there are other networking problems that can occur that cannot be readily fixed from within a Feed. The following have been commonly observed when dealing with difficult or offline servers:

  • ClientOSError(104, 'Connection reset by peer')

  • Connect call failed (<Provider IP>, 443)

  • ServerDisconnectedError(None,)

For the ServerDisconnectedError, you may want to ensure that firewall rules and proxies are not blocking outbound communication.

Filter Chain Errors

The most common place to encounter errors during a Feed Run is within the Filter Chain.

Errors during Filter Chain processing are always reported in this format:

Error applying filter <Filter Class> to value <Value>: <Exception>
  • Filter Class identifies which Filter raised the error, along with that Filter’s construction arguments. In the case of Filters like the Filter Mapping Filter, the construction arguments may be quite long, as they contain the definitions of various sub-filters. Further debugging will be needed in these cases to determine exactly which sub-filter is causing the Feed Run error.

  • Value is the value that was being processed when the error was raised. In the case of an exceedingly long value, the value is truncated at 100 characters for the error message shown in the ThreatQ UI. If this happens, the value will still appear in full in Dynamo’s logs.

An error within the Filter Chain can have different effects on a Feed Run depending on where it occurs within the Filter Chain:

  • Errors at the beginning of the Filter Chain could prevent the entire data set from being ingested.

  • Filter errors in the Filter Chain that come after an Iterate Filter may affect only a few objects or all objects in the run depending on the nature of the error.

In any case, an error within the Filter Chain will result in the Feed completing with errors. The following are some examples of Filter Chain errors that are commonly encountered:

Unexpected StreamReader Value

The very first Filter in the Filter Chain can raise an error referencing a StreamReader when some kind of text value was actually expected. For instance, when using the Parse JSON Filter:

Error applying filter ParseJSON() to value <StreamReader 397 bytes eof>: the JSON object must be str, not 'StreamReader'

A StreamReader is passed to the Filter Chain from the Source section when the Source section cannot determine how to decode the response it received from the server. This is usually due to the provider not returning a correct Content-Type header with the response.

To resolve this error, the Source section needs to specify an appropriate response_content_type so that it knows what type of data response to expect. See HTTP Source for more information on the response_content_type flag and other Source configuration arguments.

Filter-Mapping KeyError

Feeds can start raising Filter-Mapping Filter errors that resemble the following:

Error applying filter FilterMapping(...) to value {...}: KeyError('some_key',)

This error means that the key specified within the KeyError is not present on the value dictionary the Filter-Mapping Filter is formatting. Generally, this happens when a provider tweaks their response data structure and decides some key/value pair will no longer be present for every object they return.

To resolve this error, the Filter Chain will need to be updated so that the now potentially missing key is given a default value. Usually this is accomplished with the Set Default Filter.
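The effect of applying a default can be illustrated with plain Python dictionaries (an analogy for the Set Default Filter, not its implementation; the key and values here are hypothetical):

```python
# Analogy for the Set Default Filter: fill in a key only when the
# provider omitted it, so later lookups cannot raise KeyError.
record = {"ip": "1.2.3.4"}          # provider dropped "some_key"

record.setdefault("some_key", "N/A")
print(record["some_key"])  # N/A

record.setdefault("ip", "0.0.0.0")  # existing keys are left untouched
print(record["ip"])  # 1.2.3.4
```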

Reporting and Ingestion Errors

Report Section Errors

Feed Run errors within the Report Section are rather rare as long as the CDF writer follows the correct syntax. Errors here are usually evidenced by Feed Runs completing successfully but never ingesting any objects. In this case, the Dynamo logs should be investigated to see if any Unhandled Exceptions have occurred.

Batch Failures

A Feed Run will raise a failed Batch error when a set of objects fails to be ingested via one of the API’s consume endpoints. Before being marked as failed, a Batch will be retried up to 10 times (by default). If a Batch is marked as failed, the Feed Run in turn will complete with errors. Other successful batches will still have their data ingested into ThreatQ.
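The retry behavior can be pictured as a bounded retry loop. In this sketch, only the default limit of 10 attempts comes from the documentation; consume_with_retries and the sender callback are hypothetical stand-ins:

```python
# Hypothetical bounded-retry loop; only the default limit of 10 retries
# reflects the documented behavior.
def consume_with_retries(batch, send, max_retries=10):
    """Try to ingest ``batch``; give up after ``max_retries`` attempts."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return send(batch)
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"Failed batch after {max_retries} attempts: {last_error}")

# A sender that always fails exhausts every attempt before the batch
# is marked as failed.
attempts = []
def always_fails(batch):
    attempts.append(batch)
    raise ConnectionError("500, message='Internal Server Error'")

try:
    consume_with_retries({"objects": []}, always_fails)
except RuntimeError as err:
    print(len(attempts))  # 10
```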

Batch failure errors are always of the following form:

Failed batch encountered while parsing response for Batch <Feed Name> TQAPIAuth:<Object Type>#<Batch UUID>. Exception: <Error Message>.

The following are error messages commonly found with failed Batches:

  • ClientResponseError("500, message='Internal Server Error'",) - Some error was raised by the API while processing the consume request. An accompanying Stack Trace should be found in the API’s laravel.log. More information about the Batch request data can be found in the Dynamo log if the process is running at log level 2 or lower. At this level, Dynamo will log out the request and response data bodies of Batches.

  • TimeoutError() - Due to system load, the API was not able to process the consume request fast enough to respond. In this case, general system and feed performance should be investigated.

Ingestion Issues

If Feed Run data seems missing or incomplete after being ingested into ThreatQ, one can investigate the data as it was consumed by running Dynamo at log level 9 or lower. At this level, Dynamo will log out any normalization issues or error messages from consume that are ignored during normal processing. If log messages about normalization are found to correlate with suspected missing data, please consult with ThreatQ Support to see if the issue is already known.

Tutorials

In this section there are several tutorials based on real-life examples. Feel free to clone them and use them to build your own feed.

Simple Flat-File Feed

Many intelligence feeds are just “flat”, or plain-text, files sitting on a server somewhere. For the purposes of this tutorial, we’re going to be using a static GitHub Gist that is a snapshot of the abuse.ch ZeuS IP blocklist feed.

The gist is located here

Setting up the Source

First things first: you have to be able to get the data pulled in. Since the requested data is located on the internet and does not require authentication, the source section is very basic:

feeds:
  abuse.ch ZeuS Block IPs:
    source:
      http: https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist

Note

We explicitly state this is an http source but are leveraging the shortcut approach. This avoids the need for the url key.

This example does not validate as it’s missing some required keys and throws a validation error from our tq-feed tool:

{
    "msg": "abuse.ch ZeuS Block IPs is missing the following key configurations: filters, report",
    "type": "DefinitionError"
}
The Filter Chain

The next section we should focus on is what is called “The Filter Chain”. This is the section of the definition that is intended to prepare the data to be supplied to the ThreatQ API. For all of the available filters, click here.

The idea behind the filter chain is that each item in the list of filters gets passed a value, and the next filter in the chain gets passed the result of the previous filter. This allows a feed designer to progressively manipulate the data into the shape they envision.

The first thing to know is that absolutely NOTHING is assumed about the data that was retrieved by the source. The result from the source is provided to the filter-chain as is. In our case it comes back as the raw string from the text file. This means that you must describe what to do with it.

We’re going to take an iterative approach to building this filter chain with extensive examples so that each filter’s contribution is illustrated.

Since abuse.ch returns a text file with different things on each line, we’re going to need to split them up so we can actually start working with that information. That’s what the split-lines filter is for. Let’s take a look at what the following definition gets us:

feeds:
  abuse.ch ZeuS Block IPs:
    source:
      http: https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist
    filters:
      - split-lines

The value after the split-lines filter is as follows (illustrative and truncated for readability):

[
    "# abuse.ch ZeuS IP blocklist",
    "# ...more comment lines...",
    "1.2.3.4",
    "2.3.4.5",
    ...
]

Note

Each line has been split into its own string and the entire object is a list of these strings.

The next phase is to start working on each individual line. In order to do this, we’ll want to use the iterate filter to iterate over each line. The iterate filter allows us to feed each item to the subsequent filters and handle each line as it comes. That definition looks like:

feeds:
  abuse.ch ZeuS Block IPs:
    source:
      http: https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist
    filters:
      - split-lines
      - iterate

And the result is not all that different from the split-lines result, except that each line is now yielded one at a time to the rest of the chain (illustrative):

"# abuse.ch ZeuS IP blocklist"
"# ...more comment lines..."
"1.2.3.4"
"2.3.4.5"
...

Now we want to get rid of the lines that don’t contain any data we care about. Here, the drop filter can be used:

feeds:
  abuse.ch ZeuS Block IPs:
    source:
      http: https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist
    filters:
      - split-lines
      - iterate
      - drop: !expr not value or value.startswith("#")

Notice that we’ve added the drop filter and passed in some arguments. The previously used filters have not needed additional arguments, but since we want to be explicit about what we are dropping, we must pass in an expression that is evaluated to determine if the value should be dropped.

Not only did we introduce passing an argument to a filter, we also introduced the !expr yaml tag. This tells the definition parser to evaluate the value of this string as a Jinja2 Expression as opposed to treating it as a literal string. For more information on the template tags, go here.

The resulting output is (illustrative):

"1.2.3.4"
"2.3.4.5"
...

Note

Notice how all lines that started with # (comments) have been dropped.

Now that we have eliminated all of the extra information returned in the raw text file we are set to start sending that data to the reporter.

The Report Section

Now that we have data that is able to be mapped, we need to build out the reporting section. Since we used the iterate filter to touch each line one at a time, each line is fed to the reporter one at a time. What this means is that we can build out a report from the perspective of a single line in the source file.

Note

Unlike the filter section, the current value of the data you are working with is injected into expressions as data.

Since this feed is fairly simple and only has an IP Address per line, our reporter is a simple single-indicator set:

feeds:
  abuse.ch ZeuS Block IPs:
    source:
      http: https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist
    filters:
      - split-lines
      - iterate
      - drop: !expr not value or value.startswith("#")
    report:
      indicator-sets:
        default:
          indicators:
            - type: IP Address
              value: !expr data

Note

The reporter provides a shortcut for indicator.type. In this example, type: IP Address is expanded to type: { "name": "IP Address" }

Note

We are leveraging the !expr yaml tag to tell the reporter to parse data and set that as the indicator value.

We are now specifying the indicators we want to see show up in ThreatQ from this data source. This definition is fully functional and parses data from abuse.ch.

Analyzing The Definition

We provide tools to help you build your definition, and if you want to see a summary of your definition, all you need to do is run the analyzer:

$ tq-feed analyze feeds/tutorials/examples/abusech.yaml
{
    "definition_yaml": "feeds:\n  abuse.ch ZeuS Block IPs:\n    source:\n      http: https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist\n    filters:\n      - split-lines\n      - iterate\n      - drop: !expr not value or value.startswith(\"#\")\n    report:\n      indicator-sets:\n        default:\n          indicators:\n            - type: IP Address\n              value: !expr data\n",
    "required_threatq_version": null,
    "summary": {
        "abuse.ch ZeuS Block IPs": {
            "additional_run_params": [],
            "config": {
                "category": "Labs",
                "custom_fields": [],
                "description": "",
                "display_name": "abuse.ch ZeuS Block IPs",
                "indicator_status": "Active",
                "ingest_rules": {},
                "name": "abuse.ch ZeuS Block IPs",
                "namespace": "threatq.feeds.custom.abuse.ch ZeuS Block IPs",
                "signature_status": "Active"
            },
            "is_supplemental": false,
            "object_types": [
                "indicator"
            ],
            "supports_manual": false,
            "type": "primary"
        }
    },
    "version": "0.0.1"
}
Conclusion

Now that we have a basic single object reporting section, let’s take a look at a more Complex Flat-File Feed.

Complex Flat-File Feed

Now that we’ve built a Simple Flat-File Feed, we can dive into a slightly more complex example. For this example, we’re going to be using a snapshot of the Bambenek Banjori Master feed.

The source used to create this example can be found here

Setting up the Source

Just as before, we must set up the source section. It should look like:

feeds:
  Bambenek Banjori Master:
    source:
      http: https://gist.githubusercontent.com/nickburns2006/7b9773c9f31331047f3515bd461ac5f4/raw/eb09a4ab77da713f50f2a2a6e016fcfa1689f288/bambenek_banjori_master_2018_09_11_10_30.txt

As before, this example does not yet validate: it is missing required keys, so our tq-feed tool raises a validation error:

{
    "msg": "Bambenek Banjori Master is missing the following key configurations: filters, report",
    "type": "DefinitionError"
}
The Filter Chain

Since you’ve already gone through the simpler example, let’s skip forward to where this feed diverges from it. The starting filter chain is:

feeds:
  Bambenek Banjori Master:
    source:
      http: https://gist.githubusercontent.com/nickburns2006/7b9773c9f31331047f3515bd461ac5f4/raw/eb09a4ab77da713f50f2a2a6e016fcfa1689f288/bambenek_banjori_master_2018_09_11_10_30.txt
    filters:
      - split-lines
      - iterate
      - drop: !expr not value or value.startswith("#")

We’ve built the definition all the way to the drop filter. At this point, the Bambenek data is a list of raw CSV lines: the comment header and any empty lines have been dropped, and each remaining line is fed through the rest of the chain one at a time.

Note

After the drop filter, no lines beginning with # remain.

As with the simple feed, we now have lines of strings containing data. Unlike before, however, we still have more parsing to do. Different feeds use different data structures; the Bambenek data appears to be in CSV format, so we can leverage the Parse CSV filter:

feeds:
  Bambenek Banjori Master:
    source:
      http: https://gist.githubusercontent.com/nickburns2006/7b9773c9f31331047f3515bd461ac5f4/raw/eb09a4ab77da713f50f2a2a6e016fcfa1689f288/bambenek_banjori_master_2018_09_11_10_30.txt
    filters:
      - split-lines
      - iterate
      - drop: !expr not value or value.startswith("#")
      - csv

After the csv filter runs, each line has been parsed into a list of column strings.

Note

Notice that each line is now a sub-list whose elements are individual strings.
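The csv filter's behavior can be approximated with Python's standard csv module; the sample record below is invented for illustration:

```python
import csv

# A hypothetical line in the Bambenek master-file column layout:
# domain(s), ip(s), nameserver host(s), nameserver IP(s), comment, source URL
line = "example-banjori.com,1.2.3.4|5.6.7.8,ns1.example.com,9.9.9.9,banjori,http://osint.example/manifest"

# csv.reader yields one list of column strings per input line
row = next(csv.reader([line]))

print(row)
# Note the second column is still a single pipe-joined string at this stage;
# splitting on "|" happens later in the filter chain.
```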

Once again, this is beginning to look like something that can be ingested into ThreatQ. For easier access, we can turn these sub-lists into objects (what Python calls dictionaries). To do this, let’s utilize the Map Items filter:

feeds:
  Bambenek Banjori Master:
    source:
      http: https://gist.githubusercontent.com/nickburns2006/7b9773c9f31331047f3515bd461ac5f4/raw/eb09a4ab77da713f50f2a2a6e016fcfa1689f288/bambenek_banjori_master_2018_09_11_10_30.txt
    filters:
      - split-lines
      - iterate
      - drop: !expr not value or value.startswith("#")
      - csv
      - map-items: [domains, ips, ns_hosts, ns_host_ips, comment, source]

Each line is now a dictionary whose keys are the names given to the Map Items filter.

Note

The key names chosen are arbitrary and can be set to whatever the feed designer feels makes sense to represent the data.
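Conceptually, map-items pairs each column with the name in the same position, much like Python's dict(zip(...)); the column values below are placeholders:

```python
# Key names chosen by the feed designer, in column order
keys = ["domains", "ips", "ns_hosts", "ns_host_ips", "comment", "source"]

# One parsed CSV row (placeholder values)
row = ["example-banjori.com", "1.2.3.4|5.6.7.8", "ns1.example.com", "9.9.9.9",
       "banjori", "http://osint.example/manifest"]

# Pair each key with the column in the same position
record = dict(zip(keys, row))

print(record["ips"])
```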

Now that the individual keys are readily accessible, we notice that some of these values actually contain multiple values. Bambenek separates these using the | (pipe) character, so we’ll want to split each of them into its own list of values:

feeds:
  Bambenek Banjori Master:
    source:
      http: https://gist.githubusercontent.com/nickburns2006/7b9773c9f31331047f3515bd461ac5f4/raw/eb09a4ab77da713f50f2a2a6e016fcfa1689f288/bambenek_banjori_master_2018_09_11_10_30.txt
    filters:
      - split-lines
      - iterate
      - drop: !expr not value or value.startswith("#")
      - csv
      - map-items: [domains, ips, ns_hosts, ns_host_ips, comment, source]
      - filter-mapping:
          domains:
            split: '|'
          ips:
            split: '|'
          ns_hosts:
            split: '|'
          ns_host_ips:
            split: '|'

Note

Since the | character could be interpreted as a special YAML character, we wrap it in quotes (single or double; it does not matter which).

Here you should notice two additional filters: Filter Mapping and Split.

The Filter Mapping filter takes a key/value structure. Notice how the keys specified in this filter correspond to the ones we declared for the Map Items filter; this is not a coincidence. We want a filter to be executed against the values of each of those keys. The filter we execute here is the Split filter, which takes a delimiter parameter specifying what the string is split on.

With the split applied, each pipe-delimited field is broken apart into a list of values.

Note

All of the ips, ns_host_ips, ns_hosts, and domains values are now lists. Even values that did not contain a | character were converted to single-item lists. This matches the behavior of Python’s str.split.
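This mirrors Python's str.split, which always returns a list even when the delimiter never appears; the record below is a placeholder:

```python
# str.split always returns a list, even when the delimiter is absent
assert "1.2.3.4|5.6.7.8".split("|") == ["1.2.3.4", "5.6.7.8"]
assert "1.2.3.4".split("|") == ["1.2.3.4"]  # single-item list, not a bare string

# Applied to a mapped record, each configured key becomes a list,
# while untouched keys (like comment) stay as plain strings
record = {"domains": "a.example|b.example", "ips": "1.2.3.4", "comment": "banjori"}
for key in ("domains", "ips"):
    record[key] = record[key].split("|")

print(record)
```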

The Report Section

Now that the data can be mapped, we need to build out the reporting section. Because the Iterate filter processes each line individually, each line is fed to the reporter one at a time. This means we can build the report from the perspective of a single line in the source file.

Note

Unlike the filter section, the current value of the data you are working with in the reporter is injected into expressions as data.

The first thing to set up is the indicator set. We have two obvious keys on each of our objects: ips and domains. Let’s create an indicator set to be submitted to the API:

feeds:
  Bambenek Banjori Master:
    source:
      http: https://gist.githubusercontent.com/nickburns2006/7b9773c9f31331047f3515bd461ac5f4/raw/eb09a4ab77da713f50f2a2a6e016fcfa1689f288/bambenek_banjori_master_2018_09_11_10_30.txt
    filters:
      - split-lines
      - iterate
      - drop: !expr not value or value.startswith("#")
      - csv
      - map-items: [domains, ips, ns_hosts, ns_host_ips, comment, source]
      - filter-mapping:
          domains:
            split: '|'
          ips:
            split: '|'
          ns_hosts:
            split: '|'
          ns_host_ips:
            split: '|'
    report:
      indicator-sets:
        default:
          inter-related: true
          indicators:
            - type: fqdn
              value: !expr data.domains
            - type: ip address
              value: !expr data.ips

Note

The inter-related flag defaults to false. In this example, we set it to true so that all indicators created from a line are related to each other.

Note

The reporter provides a shortcut for indicator.type. In this example, the type: fqdn is expanded to type: { "name": "fqdn" }

Note

We are leveraging the !expr yaml tag to tell the reporter to parse data.domains and data.ips and set those as the indicator value.

We are now specifying the indicators we want to appear in ThreatQ from this data source. The reporter automatically creates multiple indicators whenever value is a list. This is important: it lets a feed designer rapidly reference multiple indicators without flattening or iterating over every object.

Note

When dealing with data that may be an iterable or single value, it is often easier to treat that data always as a list, even if it would be a list of length 1.
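One way to follow that advice in ordinary code is a small normalizing helper (illustrative only, not part of the CDF toolset):

```python
def as_list(value):
    """Return value unchanged if it is already a list, else wrap it in one."""
    return value if isinstance(value, list) else [value]

# Both shapes can now be handled by the same list-oriented code path
singles = as_list("1.2.3.4")
several = as_list(["1.2.3.4", "5.6.7.8"])
print(singles, several)
```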

But our source contains more data. We don’t really want these remaining fields to be indicators; they look more like attributes. Let’s see what that looks like:

feeds:
  Bambenek Banjori Master:
    source:
      http: https://gist.githubusercontent.com/nickburns2006/7b9773c9f31331047f3515bd461ac5f4/raw/eb09a4ab77da713f50f2a2a6e016fcfa1689f288/bambenek_banjori_master_2018_09_11_10_30.txt
    filters:
      - split-lines
      - iterate
      - drop: !expr not value or value.startswith("#")
      - csv
      - map-items: [domains, ips, ns_hosts, ns_host_ips, comment, source]
      - filter-mapping:
          domains:
            split: '|'
          ips:
            split: '|'
          ns_hosts:
            split: '|'
          ns_host_ips:
            split: '|'
    report:
      attribute-sets:
        default:
          - name: Description
            value: !expr data.comment
          - name: Source
            value: !expr data.source
          - name: Nameserver
            value: !expr data.ns_hosts
      indicator-sets:
        default:
          attribute-sets:
            - default
          inter-related: true
          indicators:
            - type: fqdn
              value: !expr data.domains
            - type: ip address
              value: !expr data.ips

Here we set up named sets of Attributes. Naming them lets us reference and include them in the indicator-sets we’ve already specified. This definition is fully functional and parses data from Bambenek.

Analyzing The Definition

We provide tools to help you build your definition. To see a summary of the definition, run:

$ tq-feed analyze feeds/tutorials/examples/bambenek.yaml
{
    "definition_yaml": "feeds:\n  Bambenek Banjori Master:\n    source:\n      http: https://gist.githubusercontent.com/nickburns2006/7b9773c9f31331047f3515bd461ac5f4/raw/eb09a4ab77da713f50f2a2a6e016fcfa1689f288/bambenek_banjori_master_2018_09_11_10_30.txt\n    filters:\n      - split-lines\n      - iterate\n      - drop: !expr not value or value.startswith(\"#\")\n      - csv\n      - map-items: [domains, ips, ns_hosts, ns_host_ips, comment, source]\n      - filter-mapping:\n          domains:\n            split: '|'\n          ips:\n            split: '|'\n          ns_hosts:\n            split: '|'\n          ns_host_ips:\n            split: '|'\n    report:\n      attribute-sets:\n        default:\n          - name: Description\n            value: !expr data.comment\n          - name: Source\n            value: !expr data.source\n          - name: Nameserver\n            value: !expr data.ns_hosts\n      indicator-sets:\n        default:\n          attribute-sets:\n            - default\n          inter-related: true\n          indicators:\n            - type: fqdn\n              value: !expr data.domains\n            - type: ip address\n              value: !expr data.ips\n",
    "required_threatq_version": null,
    "summary": {
        "Bambenek Banjori Master": {
            "additional_run_params": [],
            "config": {
                "category": "Labs",
                "custom_fields": [],
                "description": "",
                "display_name": "Bambenek Banjori Master",
                "indicator_status": "Active",
                "ingest_rules": {},
                "name": "Bambenek Banjori Master",
                "namespace": "threatq.feeds.custom.Bambenek Banjori Master",
                "signature_status": "Active"
            },
            "is_supplemental": false,
            "object_types": [
                "attribute",
                "indicator"
            ],
            "supports_manual": false,
            "type": "primary"
        }
    },
    "version": "0.0.1"
}

Feeds with User Fields

Feeds have the capability to define user_fields. These fields are presented to users in the ThreatQ UI. This allows feed designers to inject configuration options, credentials, or any other information that is needed for a feed to operate. To learn more about the configuration options available when declaring user fields, see the User Fields and Parameters page.

Using User Fields

Take the following example:

user_fields:
  - name: attribute_key
    label: Attribute Key
  - name: attribute_value
    label: Attribute Value

Here we define two fields, “Attribute Key” and “Attribute Value”. Both are text fields that allow an analyst to add a custom attribute to every object ingested from this feed.

Once user fields are defined in a definition and that definition has been installed into ThreatQ, analysts can see them on the “Incoming Feeds” page under that specific feed’s settings:

Example Feed User Fields Settings

Any values entered in these fields are injected into the parser at run time. To use them in the definition, a feed designer can do something like:

attributes:
  - name: !expr user_fields.attribute_key
    value: !expr user_fields.attribute_value

Note

We utilize ThreatQ’s custom template tag !expr to tell the definition parser that we want this compiled and evaluated. For more information on what’s going on here, please review Jinja2 Templating in CDF.

Note

In order to access a specific value on user_fields, we are using the name key of the defined user field from before. (i.e. attribute_key for the “Attribute Key” and attribute_value for “Attribute Value”)
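As a plain-Python analogy, the declared name becomes the lookup key for whatever the analyst typed in (the entered values here are hypothetical):

```python
# Declared fields, as in the user_fields definition above
declared = [
    {"name": "attribute_key", "label": "Attribute Key"},
    {"name": "attribute_value", "label": "Attribute Value"},
]

# Hypothetical values an analyst entered in the UI
entered = {"attribute_key": "Campaign", "attribute_value": "Banjori 2018"}

# At run time, values are exposed keyed by the declared "name",
# so an expression like user_fields.attribute_key resolves to the analyst's value
user_fields = {field["name"]: entered.get(field["name"]) for field in declared}

print(user_fields["attribute_key"])
```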

Let’s see what this would look like being utilized within the context of the reporter:

feeds:
  Example Feed:
    user_fields:
      - name: attribute_key
        label: Attribute Key
      - name: attribute_value
        label: Attribute Value
    source:
      http: https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist
    filters:
      - split-lines
      - iterate
      - drop: !expr not value or value.startswith("#")
    report:
      attribute-sets:
        default:
          attributes:
            - name: !expr user_fields.attribute_key
              value: !expr user_fields.attribute_value
      indicator-sets:
        default:
          indicators:
            - type: IP Address
              value: !expr data
              attribute-sets:
                - default

Here we’ve added the custom-defined attribute under an attribute-set named default. The indicators can then use it simply by listing that name under attribute-sets.

Taking Utilization a bit Further

When feed designers allow analysts this level of customization, the data may not always be what you expect. In the case presented here, what happens if the settings for this custom attribute are never changed from the default, which is empty? The ThreatQ application will not allow an empty attribute to be inserted, so we want to prevent this attribute from being added when the settings are empty.

Since we have the custom template tags, we can leverage Jinja2 and Python:

feeds:
  Example Feed:
    user_fields:
      - name: attribute_key
        label: Attribute Key
      - name: attribute_value
        label: Attribute Value
    source:
      http: https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist
    filters:
      - split-lines
      - iterate
      - drop: !expr not value or value.startswith("#")
    report:
      attribute-sets:
        default:
          attributes:
            - name: !expr user_fields.attribute_key
              value: !expr user_fields.attribute_value
      indicator-sets:
        default:
          indicators:
            - type: IP Address
              value: !expr data
              attribute-sets: !expr "['default'] if user_fields.attribute_key else []"

Note

We wrapped the expression in double quotes because it starts with a bracket ([), which YAML would otherwise try to interpret. The attribute-sets key expects a list of string attribute-set names.

We changed how the attribute-sets used by the indicator definition are declared. The definition is still completely valid; we are just using YAML’s more compressed “flow style”, which lets us define the attribute-sets on a single line.

Thanks to the alternative structure, we can leverage Python’s conditional expressions in order to make an if...else statement on a single line.

This single-line expression only includes the default attribute-set when the user fills in the “Attribute Key” in the UI.
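The expression itself is an ordinary Python conditional expression, evaluated against the analyst's input:

```python
def attribute_sets_for(attribute_key):
    # Mirrors: !expr "['default'] if user_fields.attribute_key else []"
    return ["default"] if attribute_key else []

# An empty or missing field yields no attribute-sets at all,
# so no empty attribute is ever submitted
print(attribute_sets_for(""))
print(attribute_sets_for("Campaign"))
```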

Output from Analysis Command
$ tq-feed analyze feeds/tutorials/examples/examplefeed_with_user_fields.yaml
{
    "definition_yaml": "feeds:\n  Example Feed:\n    user_fields:\n      - name: attribute_key\n        label: Attribute Key\n      - name: attribute_value\n        label: Attribute Value\n    source:\n      http: https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist\n    filters:\n      - split-lines\n      - iterate\n      - drop: !expr not value or value.startswith(\"#\")\n    report:\n      attribute-sets:\n        default:\n          attributes:\n            - name: !expr user_fields.attribute_key\n              value: !expr user_fields.attribute_value\n      indicator-sets:\n        default:\n          indicators:\n            - type: IP Address\n              value: !expr data\n              attribute-sets:\n                - default\n              attribute-sets: !expr \"['default'] if user_fields.attribute_key else []\"\n",
    "required_threatq_version": null,
    "summary": {
        "Example Feed": {
            "additional_run_params": [],
            "config": {
                "category": "Labs",
                "custom_fields": [
                    {
                        "default": null,
                        "description": null,
                        "label": "Attribute Key",
                        "name": "attribute_key",
                        "required": false,
                        "type": "text"
                    },
                    {
                        "default": null,
                        "description": null,
                        "label": "Attribute Value",
                        "name": "attribute_value",
                        "required": false,
                        "type": "text"
                    }
                ],
                "description": "",
                "display_name": "Example Feed",
                "indicator_status": "Active",
                "ingest_rules": {},
                "name": "Example Feed",
                "namespace": "threatq.feeds.custom.Example Feed",
                "signature_status": "Active"
            },
            "is_supplemental": false,
            "object_types": [
                "attribute",
                "indicator"
            ],
            "supports_manual": false,
            "type": "primary"
        }
    },
    "version": "0.0.1"
}
Conclusion

Now that we know how to use the injected User Fields, we can start building smarter feeds, such as those that need authentication.

Feeds Requiring Authentication

Now that we’ve walked through creating feeds with user fields, we can use this knowledge to enable our feed to access data from a source that requires authentication.

Defining the User Fields

A simple API ID and key authentication-header setup could use the following user_fields definition:

user_fields:
  - name: api_id
    label: API ID
  - name: api_key
    label: API Key
    mask: True

The mask: True setting in the above definition produces a password field type. To learn more about the configuration options available when declaring user fields, see the User Fields and Parameters page. This definition would be represented in the UI like:

Example Feed with Credentials

Notice that “API Key” is now a hidden-value password field. The UI allows the analyst to reveal the value by clicking the icon on the far right of the input field:

Example Feed with Credentials
Defining Authentication in the Source

Now that we have the definition done, let’s take a look at what it would look like to use these.

Because we want to authenticate with our data source, it only makes sense that the authentication information goes under that section.

Take the following example:

source:
  http:
    url: https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist
    auth:
      simple:
        headers:
          x-api-id: !expr user_fields.api_id
          x-api-key: !expr user_fields.api_key

Note

We are now using the long-form of the http source. This means that the value of http is an object and the source url is on its own explicit key.

In this example, we are using the simple authentication type. This type of authentication makes no assumption about the type of data you include inside, and actually merges any keys you provide with the keys defined in the source definition itself. The advantage here is that anything under the auth section of the definition is stripped prior to any logging.
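Conceptually, the simple auth type behaves like a dictionary merge of your keys into the outgoing request (this sketch stands in for the real request construction; the header names and values are placeholders):

```python
# Headers the source definition would send anyway (illustrative)
base_headers = {"Accept": "text/plain"}

# Headers declared under auth.simple.headers, filled from user fields
auth_headers = {"x-api-id": "my-id", "x-api-key": "my-secret"}

# The simple auth type merges its keys with the source definition's own;
# everything under the auth section is stripped prior to any logging.
request_headers = {**base_headers, **auth_headers}

print(sorted(request_headers))
```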

See also

HTTP Source Authentication

Documentation on available source types.

Full Definition

The full definition for this example looks something like:

feeds:
  Example Feed:
    user_fields:
      - name: api_id
        label: API ID
      - name: api_key
        label: API Key
        mask: True
    source:
      http:
        url: https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist
        auth:
          simple:
            headers:
              x-api-id: !expr user_fields.api_id
              x-api-key: !expr user_fields.api_key
    filters:
      - split-lines
      - iterate
      - drop: !expr not value or value.startswith("#")
    report:
      indicator-sets:
        default:
          indicators:
            - type: IP Address
              value: !expr data
Output from Analysis Command
$ tq-feed analyze feeds/tutorials/examples/examplefeed_with_creds.yaml
{
    "definition_yaml": "feeds:\n  Example Feed:\n    user_fields:\n      - name: api_id\n        label: API ID\n      - name: api_key\n        label: API Key\n        mask: True\n    source:\n      http:\n        url: https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist\n        auth:\n          simple:\n            headers:\n              x-api-id: !expr user_fields.api_id\n              x-api-key: !expr user_fields.api_key\n    filters:\n      - split-lines\n      - iterate\n      - drop: !expr not value or value.startswith(\"#\")\n    report:\n      indicator-sets:\n        default:\n          indicators:\n            - type: IP Address\n              value: !expr data\n",
    "required_threatq_version": null,
    "summary": {
        "Example Feed": {
            "additional_run_params": [],
            "config": {
                "category": "Labs",
                "custom_fields": [
                    {
                        "default": null,
                        "description": null,
                        "label": "API ID",
                        "name": "api_id",
                        "required": false,
                        "type": "text"
                    },
                    {
                        "default": null,
                        "description": null,
                        "label": "API Key",
                        "name": "api_key",
                        "required": false,
                        "type": "password"
                    }
                ],
                "description": "",
                "display_name": "Example Feed",
                "indicator_status": "Active",
                "ingest_rules": {},
                "name": "Example Feed",
                "namespace": "threatq.feeds.custom.Example Feed",
                "signature_status": "Active"
            },
            "is_supplemental": false,
            "object_types": [
                "indicator"
            ],
            "supports_manual": false,
            "type": "primary"
        }
    },
    "version": "0.0.1"
}

Supplemental Feed

In some cases, feed providers will distribute related information across several endpoints or services. For instance, there may be reports associated with adversaries contained in a feed.

Supplemental feeds are used for fetching data related to a parent feed when such data is served separately. In this tutorial, we will look at both illustrative examples and real use cases.

Setting up the Parent Feed

Supplemental feeds require a parent feed, in that they cannot be run independently. In order to incorporate a supplemental feed into an existing feed, we must use a Set Filter:

    filters:
      - set:
          adversary_information:
            feed:
              name: Intelligence Feed Provider Adversary Information
              run_params:
                adversary_name: !expr user_fields.adversary_name
Defining a Supplemental Feed

Supplemental feeds, while different from primary feeds, are still just feeds. As such, defining one is similar to defining other feeds.

Note

Notably, supplemental feeds are different from primary feeds in that supplemental feeds do not have a report stage.

To define a Supplemental Feed in a file, add another entry to your existing feeds mapping:

feeds:
  Cofense Intelligence Query Report Fulfillment Download:
    feed_type: supplemental
    source:
      http:
        url: !expr run_params.reportURL
        auth:
          <<: *cfauth
    filters:
      - new:
          content: !expr value
          type: Intelligence Report

Note

The Supplemental Feed definition must contain feed_type: supplemental - feed type cannot be inferred. That is, adding a feed run as part of another feed’s filters is, by itself, not sufficient.

Note that a supplemental feed can itself be a parent to other supplemental feeds.

Supplemental feeds can be invoked from a single Set Filter or from multiple Set Filters. Multiple Set Filters are required when results returned and transformed by one supplemental feed must be passed as run parameters to a subsequent supplemental feed. For instance:

feeds:
  Flashpoint:
    filters:
      - parse-json
      - get: data
      - iterate
      - set-default:
          asset_ids: []
          sources: []
      - <<: *related_indicators_and_attacks_settings
      - set:
          related_reports:
            feed:
              name: Flashpoint Related Reports
              default: []
              run-params:
                report_id: !expr value.id
          orphaned_indicators:
            feed:
              name: Flashpoint Orphaned Indicators
              default: []
              run-once: True
      - set:
          orphaned_indicators: !expr 'value.orphaned_indicators'
          orphaned_attacks: !expr 'value.orphaned_indicators or []'
      - filter-mapping:
          orphaned_indicators:
            chain:
              - each:
                  - get:
                      member: indicators_obj
                      default: !expr '[]'
              - flatten
          orphaned_attacks:
            chain:
              - each:
                  - get:
                      member: attack_patterns_obj
                      default: !expr '[]'
              - flatten
      - <<: *report_filter_mapping
      - set:
          <<: *report_attributes_mapping

  Flashpoint Related Reports:
    feed_type: supplemental
    source:
      http:
        url: !tmpl "https://fp.tools/api/v4/reports/{{run_params.report_id}}/related"
        method: GET
        <<: [*auth, *pagination]
    filters:
      - parse-json
      - get: data
      - iterate
      - <<: *related_indicators_and_attacks_settings
      - <<: *report_filter_mapping
      - set:
          <<: *report_attributes_mapping
      - new:
          - value: !expr value.title
            description: !expr '(value.body | striptags) if (value.body | striptags | length) < 32766 else ""'
            attributes: !expr value.report_attributes or []
            indicators: !expr value.report_indicators or []
            attack_pattern: !expr value.report_attacks or []

  Flashpoint Related Indicators:
    feed_type: supplemental
    source:
      http:
        url: !tmpl "https://fp.tools/api/v4/indicators/event/{{run_params.indicator_id}}"
        method: GET
        <<: *auth
    <<: *indicators_filter

  Flashpoint Orphaned Indicators:
    feed_type: supplemental
    source:
      http:
        url: https://fp.tools/api/v4/indicators/event
        method: GET
        <<: [*auth, *pagination, *params]
    <<: *indicators_filter
Using Run Parameters

Sometimes a supplemental feed needs additional information to complete its run, such as an API key or a URL. Run parameters enable this flexibility and allow a single supplemental feed definition to be reused, for example when more than one endpoint serves the same data format.
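The reuse hinges on the templated URL. As a rough Python analogy of what the !tmpl tag produces for each caller (using the placeholder URLs from the example):

```python
def build_url(service):
    # Mirrors: !tmpl "http://feedproviderwebsite.com/{{run_params.service}}/reports/api"
    return "http://feedproviderwebsite.com/{}/reports/api".format(service)

# Two primary feeds reuse the same supplemental feed with different run_params
print(build_url("service_1/adversaries"))
print(build_url("service_2_adversary_report_url"))
```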

feeds:
  Generic Supplemental Feed:
    feed_type: supplemental
    source:
      http:
        url: !tmpl "http://feedproviderwebsite.com/{{run_params.service}}/reports/api"
        method: GET
    filters:
      - json

  Primary Feed 1:
    source:
      http:
        url: "http://feedproviderwebsite.com/service_1/api"
        method: GET
    filters:
      - json
      - set:
          provider_report:
            feed:
              name: Generic Supplemental Feed
              run_params:
                service: "service_1/adversaries"

  Primary Feed 2:
    source:
      http:
        url: "http://feedproviderwebsite.com/service_2/api"
        method: GET
    filters:
      - json
      - set:
          provider_report:
            feed:
              name: Generic Supplemental Feed
              run_params:
                service: "service_2_adversary_report_url"
Supplemental Feed Examples

An illustrative example of a feed definition file containing a feed that incorporates a supplemental feed is as follows:

feeds:
  Intelligence Feed Provider:
    user_fields:
      - name: adversary_name
        label: Adversary Name
        required: True
    source:
      http: http://google.com
    filters:
      - new:
          type: test
      - set:
          adversary_information:
            feed:
              name: Intelligence Feed Provider Adversary Information
              run_params:
                adversary_name: !expr user_fields.adversary_name
    report:
      adversary-sets:
        adversary:
          items:
            - name: !expr user_fields.adversary_name

  Intelligence Feed Provider Adversary Information:
    feed_type: supplemental
    source:
      http:
        url: !tmpl "http://google.com/adversary_information/{{run_params.adversary_name}}"
        method: GET
    filters:
      - new:
          content: !expr value
          type: Adversary Information Report


  Generic Supplemental Feed:
    feed_type: supplemental
    source:
      http:
        url: !tmpl "http://feedproviderwebsite.com/{{run_params.service}}/reports/api"
        method: GET
      filters:
        - json

  Primary Feed 1:
    source:
      http:
        url: "http://feedproviderwebsite.com/service_1/api"
        method: GET
      filters:
        - json
        - set:
            provider_report:
              feed:
                name: Generic Supplemental Feed
                run_params:
                  service: "service_1/adversaries"

  Primary Feed 2:
    source:
      http:
        url: "http://feedproviderwebsite.com/service_2/api"
        method: GET
      filters:
        - json
        - set:
            provider_report:
              feed:
                name: Generic Supplemental Feed
                run_params:
                  service: "service_2_adversary_report_url"

A real-world example, for a Cofense Intelligence feed, is the following:

feeds:
  Cofense Intelligence Query Report Fulfillment:
    feed_type: fulfillment
    filters:
      - set:
          reportData:
            feed:
              name: Cofense Intelligence Query Report Fulfillment Download
              default: null
              run_params:
                reportURL: !expr value.apiReportURL
  Cofense Intelligence Query Report Fulfillment Download:
    feed_type: supplemental
    source:
      http:
        url: !expr run_params.reportURL
        auth:
          <<: *cfauth
    filters:
      - new:
          content: !expr value
          type: Intelligence Report

Note

In this example, a fulfillment feed is used as the parent of the supplemental feed. This has no consequence for the supplemental feed - it is written exactly as if its parent were a primary feed.

Action Definition

The ThreatQ TDR Orchestrator platform feature depends on the ability to define and install Actions. This section, together with Writing a Feed Definition, provides the information you need to create your own Action.

Action Definition Example

A real-world example of an Action definition file containing an Action is as follows:

version: 0.0.1
required_threatq_version: '>=5.12.1'

template_values:
  gn_ioc_type_map: # Required on actions. This is a known bug.
    domains: FQDN

feeds:
  GreyNoise:
    feed_type: action
    namespace: threatq.actions.greyNoise.community
    user_fields:
      - name: GreyNoise_gnAPIkey  # Should be action specific
        label: GreyNoise API Key
        required: True
        mask: True
    invoking_filter:  # Required
      - invoke-connector:
          condition: !expr value.0.threatq_object_type in ["indicator"]
          filters: # Pre-Filters only apply to this scope. Results from here are passed to the Action
            - each:
                - drop: !expr value.type not in gn_ioc_type_map # Drop the indicators we don't want
          connector:
            iterate: True # For each value in the incoming list, run the Action. If the values of the list are lists, each list is yielded to the Action.
            name: GreyNoise
            return: value # Return what we were passed so that the next Action has unmodified incoming data.
            run-params: 
              object: !expr value
            to-stage: publish 

    source:
      http:
        url: !tmpl "https://api.greyNoise.io/v3/community/{{run_params.object.value}}"
        auth:
          simple:
            headers:
              key: !expr user_fields.GreyNoise_gnAPIkey
        status_code_handlers:
          404: ignore  # Or `pass` if handled

    filters:
      - parse-json
      - new:
          classification: !expr value.classification
          name: !expr value.name
          riot: !expr value.riot
          object: !expr run_params

    report:
      attribute-sets:
        GreyNoiseSet:
          items:
            - name: Classification
              value: !expr data.classification
              sources: "GreyNoise"  # Use Configurable sources
            - name: Name
              value: !expr data.name
              sources: "GreyNoise"
            - name: Noise
              value: !expr data.noise
              sources: "GreyNoise"
            - name: Riot
              value: !expr data.riot
              sources: "GreyNoise"
      indicator-sets:
        default:
          attribute-sets:
            - GreyNoiseSet
          items:
            - type: !expr data.object.type.name
              value: !expr data.object.value
              sources: !expr data.object.sources[0]  # report on the source if you want to avoid adding a new source to the primary object.
Writing an invoking_filter

As of ThreatQ Version 5.12.1, the Action definition file should contain an Invoke Connector Filter as shown in the example above. In previous versions of ThreatQ, the Action definition was invoked using the set filter. That pattern is deprecated; authors should use the invoke-connector filter instead.

This section describes common implementations and caveats for the invoking_filter. In these examples, note that the incoming value to the invoke-connector filter is a list, because in the generated workflow the Threat Collection Source yields a list of objects to the filter chain.

Single Submit

The most common use case for the invoking_filter is to invoke an Action only when an indicator of matching subtype is being passed through the Workflow. For example, if you want to invoke an Action only when an IP Address indicator is being passed through the Workflow, you would use the following invoking_filter:

invoking_filter:
  - invoke-connector:
      condition: !expr value.0.threatq_object_type == 'indicator' # Ensure the incoming list is of indicators
      filters:
        - each:
            - drop: !expr value.type not in ["IP Address"]
      connector:
        name: My Action Name
        iterate: True
        to-stage: publish
        run-params: !expr value
        return: value

In the above example we are using the invoke-connector filter to invoke the Action, and we are using the each filter to iterate over the list of objects being passed through the Workflow. We then use the drop filter to drop any objects that are not of type IP Address.

After our filters have been executed, the Action is invoked with the remaining objects in the list. In this case, the iterate argument is set to True because our filters return a list of objects. If the iterate argument were set to False, the Action would instead be invoked once, with the entire list of objects as the value assigned in run-params: !expr value

When iterate is set to True, the Action is invoked once for each object in the list. These invocations are executed asynchronously, so the order of execution is not guaranteed. If you need the Action invocations to execute in a specific order, or you must respect conflicting rate limits, use the seq argument instead, which causes the invocations to execute sequentially.
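The ordering difference between asynchronous and sequential invocation can be sketched in plain Python. This is a standalone illustration, not the invoke-connector implementation; the helper names are hypothetical:

```python
import asyncio

# Standalone sketch: 'invoke' stands in for one Action invocation.
async def invoke(name, delay, log):
    await asyncio.sleep(delay)
    log.append(name)

async def concurrent_run():
    # Like iterate: True -- invocations run concurrently, so completion
    # order depends on timing, not on input order.
    log = []
    await asyncio.gather(invoke('a', 0.05, log), invoke('b', 0, log))
    return log

async def sequential_run():
    # Like seq -- each invocation completes before the next starts,
    # preserving input order.
    log = []
    for name, delay in [('a', 0.05), ('b', 0)]:
        await invoke(name, delay, log)
    return log

print(asyncio.run(concurrent_run()))   # 'b' may finish before 'a'
print(asyncio.run(sequential_run()))   # always ['a', 'b']
```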

to-stage is configurable, representing the stage in the Workflow that the Action should be run to. The most common use-case will be to run the Action to the publish stage, but an author may choose to run the Action to the source stage if the Action is not intended to publish any data to ThreatQ.

return is a powerful argument that allows dynamic returns for advanced use-cases in Advanced Workflows only. If you are writing an Action intended for use in the workflow builder, return should always be set to value, which is the default.

Bulk Submit

If an author wishes to invoke an Action with a list of objects, many of the same notes from the previous section apply. However, there are some additional considerations to be made.

The following is an example of an invoke-connector filter that will invoke an Action with a list of 25 IP Address indicators:

invoking_filter:
  - invoke-connector:
      condition: !expr value.0.threatq_object_type == 'indicator'
      filters:
        - each:
            - drop: !expr value.type not in ["IP Address"]
        - iterate:
            chunk_size: 25
      connector:
        name: My Action Name
        iterate: True
        to-stage: publish
        run-params:
            value: !expr value
            ips: !expr value | map(attribute='value')|join(',')
        return: value

In the above example, we use the iterate filter to iterate over the list of objects being passed through the Workflow, with the chunk_size argument specifying that the list be processed in chunks of 25 objects. See Iterate Filter for more detail about this usage. In our case, we leverage this for endpoints that accept several objects in a single request but limit each request to 100 objects or fewer.
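The chunking behavior itself is simple to sketch in plain Python (a hypothetical helper for illustration, not the iterate filter's actual implementation):

```python
# Split a list into chunks of at most `size` items, mirroring what the
# iterate filter's chunk_size argument does to the incoming object list.
def chunk(items, size=25):
    return [items[i:i + size] for i in range(0, len(items), size)]

# With chunk_size: 25, a list of 60 objects yields chunks of 25, 25, and 10.
chunks = chunk(list(range(60)), size=25)
print([len(c) for c in chunks])  # → [25, 25, 10]
```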

As of ThreatQ Version 5.12.1, the incoming list of values has an absolute maximum of 100 objects. The 100 object limit is not yet configurable, applies to all Workflows, and represents the limit that the ThreatLibrary request yields to the filter chain. This limit does not consider object subtypes, and so your result will most likely be less than 100 objects unless the data-collection defined for the Workflow is very specific.

The run-params argument contains a mapping of key-value pairs that will be passed to the Action as arguments. In this example, we are passing the entire list of objects as the value assigned to the value key, and we are also passing a comma-separated list of the IP Address values as the value assigned to the ips key. This would then be used in our action POST body.
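The Jinja-style expression value | map(attribute='value') | join(',') can be mimicked in plain Python to show what the ips run parameter would contain. The sample indicator data below is hypothetical:

```python
# Hypothetical chunk of indicator objects as yielded to the filter chain.
indicators = [
    {'type': 'IP Address', 'value': '10.0.0.1'},
    {'type': 'IP Address', 'value': '10.0.0.2'},
    {'type': 'IP Address', 'value': '10.0.0.3'},
]

# Equivalent of: value | map(attribute='value') | join(',')
ips = ','.join(obj['value'] for obj in indicators)
print(ips)  # → 10.0.0.1,10.0.0.2,10.0.0.3
```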

Since we preserve the original list of objects in the value key, we would then use run_params.value for reporting (and implicitly, publishing) in the Action.

Creating Pynoceros Commands

Pynoceros commands can be created via the tq-scaffold command line tool and leveraged once installed on the platform.

Creating a Pynoceros Command

The tq-scaffold tool can be used to quickly create a project structure for a new Pynoceros command. In order to create a new command project, use the tq-scaffold tool with the command argument. By default, the project’s root directory is created in the current working directory. One can use the -o or --output-directory flag to specify the output directory for the scaffold project.

/opt/threatq/python/bin/tq-scaffold command

The tq-scaffold command will prompt the user with a series of interactive questions, displaying the default value for each option in brackets. At the end of this interactive process, a note is displayed mentioning that the base class for the command is adjustable and providing details about optional arguments and unit tests:

friendly_name [Friendly Name for Command]: mycommand
vendor_prefix [tq]:
slug [mycommand]: mycmd
version [0.0.1]:
pypi_name [tq-mycmd]:
pkg_name [tq_mycmd]: mycmd_pkg
class_name [mycommand]: Mycommand
short_description [Short description of the command's purpose.]: This is my custom command
want_tq_api_credential_parameters [y]:
want_logging_parameters [y]:
want_proxy_parameters [y]:
want_config_file_support [y]:
config_file_path [auto-per-user]:
want_auto_serialized_data_output [y]:
want_threat_object_models [y]:
default_log_level_name [WARNING]:
min_threatq_version [4.56.0]:
full_name [Your Name]: Your Name
email [you@yourdomain.com]: yourname@tq.com

Note

The vendor_prefix field is intended to be overwritten by some unique vendor prefix. If a command is developed internally by ThreatQuotient, the default tq prefix should be used.

Note

The default value for min_threatq_version matches the version of the threatq-pynoceros package that is installed, which is in sync with the version of ThreatQ that is installed. A command is not able to be installed with a version of ThreatQ that is less than the default or user-provided min_threatq_version. If it is known that the command will work with a lesser version of ThreatQ, this should be modified to match the minimum required version.

Scaffold Contents

The scaffold tool creates a new directory in the current folder or the folder given with the -o or --output-directory flag. The directory structure created by the command scaffold tool run with the input data from the previous paragraph is as follows:

/tq-mycmd/
├── setup.cfg
├── setup.py
├── README.md
├── mycmd_pkg
│   └── __init__.py
└── tests
    └── test_mycmd_pkg.py
  • setup.cfg - This file has options that are used by linting tools, such as flake8.

  • setup.py - This file contains information that is used to create a package out of the scaffold. This file contains the following:

    from setuptools import find_packages, setup
    
    setup(
          name='tq-mycmd',
          description='This is my custom command',
          version='0.0.1',
          author='Your Name',
          author_email='yourname@tq.com',
          install_requires=[
              'threatq-pynoceros>=4.56.0',
          ],
          packages=find_packages(),
          entry_points={
              'console_scripts': [
                  'tq-mycmd = mycmd_pkg:mycmd_pkg',
              ],
          },
      )
    

    The user can specify any third party dependency packages in install_requires (using the standard Python versioning convention as specified in PEP 440).

  • test_mycmd_pkg - The tests folder contains the test file. The user can write unit testing functions in this file and add more fixtures as needed. A default Python test function is provided.

  • __init__.py - This holds the logic of the command and is expected to be modified by the user to make the command functional.

Customizing the command

The scaffold tool builds the appropriate directory structure and instantiates a default class that the user is expected to fill in appropriately.

Implement the command

To implement the new command, the __init__.py file needs to be altered to make the command functional:

In its simplest form, the user is expected to modify the params variable and the run function to customize the command. The command line parameters and tests included in the default generated code are intended only as examples: the parameters must be replaced according to the needs of your command, and the tests must be modified or replaced to make assertions on the command's expected results. The documentation for the "click" library may be helpful.

This file contains code as shown below:

import click

from threatq.core.models import Models
from threatq.core.tools.base import CommandError, PynocerosCommand

class Mycommand(PynocerosCommand):
    """
    This is my custom command
    """

    name = 'mycmd'
    config_param_sets = ('tq_api', 'logging', 'proxies')
    params = (
        click.Option(['--first-param', '-f'], help='Help text for parameter'),
        click.Option(['--second-param'], help='Help text for parameter'),
        click.Argument(['argument'], required=True),
    )
    config_file = True
    data_output = True

    async def run(self, first_param, second_param, argument):
        await self.ctx.request_resource(Models)
        # Implement logic here
  • name - holds the name of the command.

  • params - holds the command line arguments of the command.

  • run() - runs the command logic. Here is an example using the previously generated Mycommand class:

    import click
    
    from threatq.core.models import Models
    from threatq.core.tools.base import CommandError, PynocerosCommand
    
    class Mycommand(PynocerosCommand):
        """
        This is my custom command
        """
    
        name = 'mycmd'
        config_param_sets = ('tq_api', 'logging', 'proxies')
        params = (
            click.Option(['--first-param', '-f'], help='Help text for parameter'),
            click.Option(['--second-param'], help='Help text for parameter'),
            click.Argument(['argument'], required=True),
        )
        config_file = True
        data_output = True
    
        async def run(self, first_param, second_param, argument):
            await self.ctx.request_resource(Models)
    
            # Example: text output
            click.secho('This line outputs to stderr in yellow.', fg='yellow', err=True)
    

The default base class for the generated command is PynocerosCommand. There are several possible base classes defined in threatq.dynamo.tools.* that could be used as reference.

Bundling

Command projects generated via tq-scaffold can be bundled into standard Python whl packages. To bundle the new command, run the following command from the project’s root directory:

python setup.py bdist_wheel

Note

The whl package must be built on Python version 3.5+. If you are on a ThreatQ appliance, this version is available in a virtual environment. To drop in Python 3.5 on the ThreatQ appliance, run the following command:

source /opt/threatq/python/bin/activate

Once the package is bundled, the whl file is placed in the dist directory at the project’s root.

Installing the Command

Once built, the whl file must be installed before the Command can be leveraged. The command can be installed via the following command:

sudo -u apache /opt/threatq/python/bin/pip install <Path to whl>

Note

To update a command that is already installed, the --upgrade flag can be used with pip install

Once installed, the command should be available for use.

Usage

After successfully installing the plugin, the new command is ready to be used, invoked by the name defined in its name attribute.

Creating CDF Filters

CDF Filters can be created via the tq-scaffold command line tool and leveraged within any CDF once installed on the platform.

Creating a Filter package

The tq-scaffold tool can be used to quickly create a project structure for a new CDF Filter. In order to create a new filter project, use the tq-scaffold tool with the filter argument. By default, the project’s root directory is created in the current working directory. One can use the -o or --output-directory flag to specify the output directory for the scaffold project.

/opt/threatq/python/bin/tq-scaffold filter

The tq-scaffold command will prompt the user with a series of interactive questions, displaying the default value for each option in brackets. At the end of this interactive process, a note is displayed mentioning that the base class for the filter is adjustable and providing details about optional arguments and unit tests:

[youruser@yourbox bin]$ /opt/threatq/python/bin/tq-scaffold filter -o /home/youruser
friendly_name [Friendly Name for Filter]: myfilter
vendor_prefix [tq]: mycompany
slug [myfilter]: myfilter
version [0.0.1]: 0.0.1
pypi_name [mycompany-myfilter]: myfilter
pkg_name [mycompany_myfilter]: myfilter_package
class_name [myfilter]: Myfilter
short_description [Short description of the filter's purpose.]: Short description of my filter.
min_threatq_version [4.4.4]: 4.24.0
full_name [Your Name]: Myname
email [you@yourdomain.com]: myname@mycompany.com

Note

The vendor_prefix field is intended to be overwritten by some unique vendor prefix. If a filter is developed internally by ThreatQuotient, the default tq prefix should be used.

Note

The default value for min_threatq_version matches the version of the threatq-pynoceros package that is installed, which is in sync with the version of ThreatQ that is installed. A plugin is not able to be installed with a version of ThreatQ that is less than the default or user-provided min_threatq_version. If it is known that the plugin will work with a lesser version of ThreatQ, this should be modified to match the minimum required version.

Scaffold Contents

The scaffold tool creates a new directory in the current folder or the folder given with the -o or --output-directory flag. The directory structure created by the filter scaffold tool run with the input data from the previous paragraph is as follows:

/tq-myfilter/
├── setup.cfg
├── setup.py
├── README.md
├── docs
│   └── myfilter.rst
├── myfilter_package
│   └── __init__.py
└── tests
    ├── conftest.py
    └── test_myfilter_package.py
  • setup.cfg - This file has options that are used by linting tools, such as flake8.

  • setup.py - This file contains information that is used to create a package out of the scaffold. This file contains the following:

    from setuptools import find_packages, setup
    
    setup(
        name='myfilter',
        description="Short description of my filter.",
        packages=find_packages(),
        version='0.0.1',
        author='Myname',
        author_email='myname@mycompany.com',
        install_requires=[
            'threatq-pynoceros>=4.24.0',
        ],
        entry_points={
            'threatq.dynamo.feeds.filters': [
                'myfilter = myfilter_package:Myfilter'
            ],
        },
    )
    

    The user can specify any third party dependency packages in install_requires (using the standard Python versioning convention as specified in PEP 440).

  • test_myfilter_package - The tests folder contains the test file and a conftest.py file with fixtures. The user can write unit testing functions in this file and add more fixtures as needed. A default Python test function is provided (the user is expected to modify the values and expected_call_result_values fixtures, as the default test checks that they are equal after the filter transformation).

  • __init__.py - This holds the logic of the filter and is expected to be modified by the user to make the filter functional.

Customizing the filter

The scaffold tool builds the appropriate directory structure and instantiates a default class that the user is expected to fill in appropriately.

Implement the Filter

To implement the new filter, the __init__.py file needs to be altered to make the filter functional:

In its simplest form, the user is expected to modify the transform function to customize the filter. The value argument represents the input data, and the function should return the data transformed as expected by the filter. The default generated code makes no transformation on the input data and simply returns the received value.

This file contains code as shown below:

import typing as t

from threatq.dynamo.feeds.filters.base import FunctionFilter


class Myfilter(FunctionFilter):

  entry_points = ('myfilter',)
  """"""

  @staticmethod
  def _signature():
      """"""

  def transform(self, value: t.Any) -> t.Any:
      return value
  • entry_points - holds the name of the filter to be referenced in a CDF.

  • _signature() - holds the arguments of the filter. Any argument defined with the _signature function can be referenced in the transform function through self.args.argument_name. Default values can be set for keyword arguments just like any standard Python function. Here is an example using the previously generated Myfilter class:

    class Myfilter(FunctionFilter):
    
        entry_points = ('myfilter',)
        """"""
    
        @staticmethod
        def _signature(prefix_arg: str, optional_arg: bool = True):
            """"""
    
        def transform(self, value: t.Any) -> t.Any:
            if self.args.optional_arg:
                return self.args.prefix_arg + value
    

    In a new CDF, the myfilter will be used with arguments as follows:

    filters:
      myfilter:
        prefix_arg: 'myprefix'

Note

If you have only one positional argument for your filter (here, that would be only having prefix_arg), you can shorthand the filter call in the CDF by writing myfilter: 'myprefix'

The default base class for the generated filter is FunctionFilter. There are several possible base classes defined in threatq.dynamo.feeds.filters.base that could be used.

The optional arguments for the transform() method are currently: args, parent_values, and value_ancestry. If any of these are added to the signature, they are automatically included when the transform function is called (see TransformFilter.call_transform()).

Note

The transform() method needs to be made async if the body of the method requires awaiting on an async operation.
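As a standalone sketch (a toy class, not the real FunctionFilter API), the only change needed is that a transform awaiting an async operation must itself be declared async:

```python
import asyncio

class ToyFilter:
    """Toy stand-in for a filter whose transform awaits an async call."""

    async def transform(self, value):
        await asyncio.sleep(0)   # stand-in for a real async operation
        return value.upper()

result = asyncio.run(ToyFilter().transform('abc'))
print(result)  # → ABC
```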

Bundling

CDF Filter projects generated via tq-scaffold can be bundled into standard Python whl packages. To bundle the new filter, run the following command from the project’s root directory:

python setup.py bdist_wheel

Note

The whl package must be built on Python version 3.5+. If you are on a ThreatQ appliance, this version is available in a virtual environment. To drop in Python 3.5 on the ThreatQ appliance, run the following command:

source /opt/threatq/python/bin/activate

Once the package is bundled, the whl file is placed in the dist directory at the project’s root.

Installing the Filter

Once built, the whl file must be installed before the Filter can be leveraged within a CDF. The filter can be installed via the following command:

sudo -u apache /opt/threatq/python/bin/pip install <Path to whl>

Note

To update a filter that is already installed, the --upgrade flag can be used with pip install

Once installed, the filter should be available for use within a CDF - just leverage the Filter by entry point name within the Filter Chain.

Usage

After successfully installing the plugin, the new filter is ready to be used (using the name specified by its entry_points attribute) in any filter section of a CDF.

Creating CDF Sources

CDF Sources can be created via the tq-scaffold command line tool and leveraged within any CDF once installed on the platform. The purpose of the source is to pull down information from a provider and yield the results back, so that they can be passed to the filter chain for further processing.

Creating a Source package

The tq-scaffold tool can be used to quickly create a project structure for a new CDF Source. In order to create a new source project, use the tq-scaffold tool with the source argument. By default, the project’s root directory is created in the current working directory. One can use the -o or --output-directory flag to specify the output directory for the scaffold project.

/opt/threatq/python/bin/tq-scaffold source

The tq-scaffold command will prompt the user with a series of interactive questions, displaying the default value for each option in brackets. At the end of this interactive process, a note is displayed mentioning that the base class for the source is adjustable and providing details about unit tests:

[youruser@yourbox bin]$ /opt/threatq/python/bin/tq-scaffold source -o /home/youruser
friendly_name [Friendly Name for Source]: mysource
vendor_prefix [tq]: mycompany
slug [mysource]: mysource
version [0.0.1]: 0.0.1
pypi_name [mycompany-mysource]: mysource
pkg_name [mycompany_mysource]: mysource_package
class_name [mysource]: Mysource
short_description [Short description of the source's purpose.]: Short description of my source.
min_threatq_version [4.4.4]: 4.24.0
full_name [Your Name]: Myname
email [you@yourdomain.com]: myname@mycompany.com

Note

The vendor_prefix field is intended to be overwritten by some unique vendor prefix. If a new source is developed internally by ThreatQuotient, the default tq prefix should be used.

Note

The default value for min_threatq_version matches the version of the threatq-pynoceros package that is installed, which is in sync with the version of ThreatQ that is installed. A plugin is not able to be installed with a version of ThreatQ that is less than the default or user-provided min_threatq_version. If it is known that the plugin will work with a lesser version of ThreatQ, this should be modified to match the minimum required version.

Scaffold Contents

The scaffold tool creates a new directory in the current folder or the folder given with the -o or --output-directory flag. The directory structure created by the source scaffold tool run with the input data from the previous paragraph is as follows:

/tq-mysource/
├── setup.cfg
├── setup.py
├── README.md
├── mysource_package
│   └── __init__.py
└── tests
    └── test_mysource_package.py
  • setup.cfg - This file has options that are used by linting tools, such as flake8.

  • setup.py - This file contains information that is used to create a package out of the scaffold. This file contains the following:

    from setuptools import find_packages, setup
    
    setup(
        name='mysource',
        description="Short description of my source.",
        packages=find_packages(),
        version='0.0.1',
        author='Myname',
        author_email='myname@mycompany.com',
        install_requires=[
            'threatq-pynoceros>=4.24.0',
        ],
        entry_points={
            'threatq.dynamo.feeds.source_types': [
                'mysource = mysource_package:Mysource'
            ],
        },
    )
    

    The user can specify any third party dependency packages in install_requires (using the standard Python versioning convention as specified in PEP 440).

  • test_mysource_package - The tests folder contains this test file. The user can write unit testing functions in this file and add fixtures as needed.

  • __init__.py - This holds the logic of the source and is expected to be modified by the user to make the source functional. The initial implementation of the fetch function yields None.

Customizing the source

The scaffold tool builds the appropriate directory structure and instantiates a default class that the user is expected to fill in appropriately.

Implement the Source

To implement the new source, the __init__.py file needs to be altered to make the source functional:

In its simplest form, the user is expected to modify the function fetch to customize the source. The generated code processes no data and simply yields None.

This generated file contains code as shown below:

from async_generator import async_generator, yield_
from threatq.dynamo.feeds.sources.base import FeedSource


class Mysource(FeedSource):
    """
    my description
    """

    entry_points = ('mysource',)

    @staticmethod
    def _signature():
        """"""

    @async_generator
    async def fetch(self):
        """
        Insert logic here, currently just yielding None.

        Returns:
            AsyncGenerator: Async Generator yielding return values
        """
        await yield_(None)
  • entry_points - holds the name of the source to be referenced in the CDF file.

  • _signature() - holds the arguments of the source. Any argument defined with the _signature function can be referenced in the fetch function through self.args.argument_name. Default values can be set for keyword arguments just like any standard Python function. Here is an example using the previously generated Mysource class:

    class Mysource(FeedSource):
    
        entry_points = ('mysource',)
    
        # noinspection PyMethodOverriding
        @staticmethod
        def _signature(content: str, optional_argument: bool = True):
            """"""
    
        @async_generator
        async def fetch(self):
            if self.args.optional_argument:
            await yield_(self.args.content)
    

    In a new CDF, the mysource will be used with arguments as follows:

feeds:
  CustomFeed:
    source:
      mysource:
        content: 'Custom source'

Note

If your source has only one positional argument (here, that would be content), you can shorthand the source call in the CDF by writing mysource: 'mycontent'.
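For example, assuming content is the sole positional argument, the following two hypothetical CDF fragments are equivalent:

```yaml
# Full form
feeds:
  CustomFeed:
    source:
      mysource:
        content: 'mycontent'
---
# Shorthand form (valid only when there is a single positional argument)
feeds:
  CustomFeed:
    source:
      mysource: 'mycontent'
```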

Note

The default base class for the generated source is FeedSource. The fetch function is a coroutine; specifically, it is an asynchronous generator yielding return values via the await yield_(value) syntax. To pass values to the Filter Chain, one simply has to use the await yield_ syntax instead of returning a value and/or list of values.
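The yielding behavior can be illustrated with the standard library alone. This sketch uses a native async generator (available on modern Python) in place of the async_generator backport; it is not ThreatQ code, just the same pattern:

```python
import asyncio

async def fetch():
    # Stand-in for a real source: each value handed to the consumer
    # corresponds to an `await yield_(value)` call in the backported syntax.
    for value in ('a', 'b', 'c'):
        await asyncio.sleep(0)  # placeholder for real async I/O
        yield value

async def consume():
    # In a real Feed Run, the Filter Chain plays the consumer role.
    return [item async for item in fetch()]

results = asyncio.run(consume())
print(results)  # ['a', 'b', 'c']
```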

Bundling

CDF Source projects generated via tq-scaffold can be bundled into standard Python whl packages. To bundle the new source, run the following command from the project’s root directory:

python setup.py bdist_wheel

Note

The whl package must be built on Python version 3.5+. If you are on a ThreatQ appliance, this version is available in a virtual environment. To activate the Python 3.5 virtual environment on the ThreatQ appliance, run the following command:

source /opt/threatq/python/bin/activate

Once the package is bundled, the whl file is placed in the dist directory at the project’s root.

Installing the Source

Once built, the whl file must be installed before the Source can be leveraged within a CDF. The source can be installed via the following command:

sudo -u apache /opt/threatq/python/bin/pip install <Path to whl>

Note

To update a source that is already installed, pass the --upgrade flag to pip install.

Once installed, the source should be available for use within a CDF; simply reference it by entry point name within the source section.

Usage

After successfully installing the plugin, the new source is ready to be used (using the name specified by the entry_point name) in any source section of a CDF.

Operations

The Pynoceros codebase provides support to execute special plugins (built as Python whl packages) called operations. Operations typically allow a user to enrich data in the ThreatQ appliance or send ThreatObjects to third party vendors’ software, although their usage is not limited to these examples. This section provides a general overview of operations and describes how to write, install, and execute a simple operation on the ThreatQ platform.

Quickstart

Overview

ThreatQ provides a quickstart tool to build a project scaffold for an operation as described in Project scaffold. The scaffold tool builds the appropriate directory structure and instantiates a default class that the user is expected to fill in appropriately.

Execution

When you call tq-scaffold from the command line (using the full path) as shown below, it asks a few interactive questions. Default answers for each question are shown in square brackets, and most answers can be left at their defaults.

Below is an example plugin that can be created by using the tq-scaffold tool. This example plugin is based on HTTP and does not use authentication. It has only one action called get_data (which is the default action name provided by the tq-scaffold tool).

[youruser@yourbox bin]$ /opt/threatq/python/bin/tq-scaffold operation -o /home/youruser
plugin_friendly_name [My New Plugin]: Example Plugin
vendor_prefix [tq]:
plugin_slug [example-plugin]:
plugin_version [0.0.1]:
plugin_pypi_name [tq-op-example-plugin]:
plugin_pkg_name [tq_op_example_plugin]:
plugin_class_name [ExamplePlugin]:
plugin_short_description [General, non-action-specific info about this plugin.]: This is an Example Plugin.
plugin_uses_http [y]:
http_plugin_needs_authentication [y]: n
action_method_name [get_data]:
action_help [Info about what this action does]: Gets some data.
min_threatq_version [4.11.1]:
full_name [Your Name]: ThreatQuotient
email [you@yourdomain.com]: info@threatq.com

Note

The vendor_prefix field is intended to be overwritten by some unique vendor prefix. If an operation is developed internally by ThreatQuotient, the default tq prefix should be used.

Note

The default value for min_threatq_version matches the version of the threatq-pynoceros package that is installed, which is in sync with the version of ThreatQ that is installed. A plugin is not able to be installed with a version of ThreatQ that is less than the default or user-provided min_threatq_version. If it is known that the plugin will work with a lesser version of ThreatQ, this should be modified to match the minimum required version.

Scaffold Contents

The above creates a directory called tq-op-example-plugin in the /home/youruser folder. The scaffold directory structure is as follows:

/home/youruser/tq-op-example-plugin/
├── setup.cfg
├── setup.py
├── test_tq_op_example_plugin.py
└── tq_op_example_plugin
    ├── __init__.py
    └── static
        └── plugin_logo.png

Note

The configurations and examples shown in this scaffold represent what ThreatQ considers to be best practices. However, for projects developed by customers or third parties, they are intended to be customizable. They should be viewed as educated suggestions rather than constraints.

  • setup.cfg - This file has options that are used by linting tools, such as flake8.

  • setup.py - This file contains information that is used to create a package out of the scaffold. This file contains the following:

    from setuptools import find_packages, setup
    
    setup(
        name='tq-op-example-plugin',
        version='0.0.1',
        author='ThreatQuotient',
        author_email='info@threatq.com',
        install_requires=[
            'threatq-pynoceros>=4.11.1',
        ],
        packages=find_packages(),
        package_data={
            'tq_op_example_plugin': [
                'static/*.*',
            ],
        },
        entry_points={
            'threatq.plugins.api': [
                'example-plugin = tq_op_example_plugin:ExamplePlugin'
            ],
        },
    )
    

    Most of the required libraries are populated by the interactive tq-scaffold program. The user may want to specify any third party dependency packages in install_requires (using the standard Python versioning convention as specified in PEP 440).

    If the plugin requires additional data files that must be packaged and installed with it, the user can add filepath or glob pattern strings to the list associated with the tq_op_example_plugin key in the package_data parameter’s dictionary mapping.

  • test_tq_op_example_plugin.py - This file contains a test class (which in our case is automatically named as TestExamplePlugin). The user can write any unit testing functions under this class. Unit testing is strongly encouraged as best practice. The suggested unit testing framework is pytest, though any Python unit testing framework can be used. Two default Python test functions are provided (the user is expected to modify both functions as appropriate). Additional test functions can be written as necessary.

  • logo files - Logo images live under the static folder. The plugin_logo.png appears on the Operations Management page in the ThreatQ web interface. The scaffold includes placeholder images for these logos, but they should certainly be replaced. When selecting your logo images, keep in mind that they are constrained to 100x100 pixels when displayed. It is also suggested to use PNG images with transparent backgrounds in most cases.

  • __init__.py - This is the meat of the operation and is expected to be modified by the user to make this operation functional. While one can probably fit all the code necessary for the operation within this file, it should be noted that more files can be added to the operation in order to logically organize code and classes. In its simplest form, the user is expected to modify the function get_data to make this operation functional. This file contains code as shown below:

    import threatq.core.lib.markup as markup
    from threatq.core.lib.plugins import action, APIPluginResponse, HTTPAPIPlugin
    
    
    class ExamplePlugin(HTTPAPIPlugin):
        """This is a Example Plugin."""
    
        friendly_name = 'Example Plugin'
        entry_points = ('example-plugin',)
        external_endpoints = ['https://some.provider']
        version = '0.0.1'
        static_logo_file = 'plugin_logo.png'
    
        @action(
            accepts={
                'indicator': ('FQDN',),
                # Other possible examples:
                # 'event': ('Spearphish',),
                # 'signature': ('Snort',),
                # 'attachment': ('Generic Text',)
                # 'adversary': True,
            },
            # Specify user-provided inputs that vary *per execution*
            parameters=[
                # These entries are just examples. In actual use, the params names must be added to the action signature.
                # {
                #     'name': 'days',
                #     'default': 5,
                #     'required': True,
                #     'label': 'How many days back to search?',
                #     'description': 'The maximum number of days back the provider should search for matches.',
                # },
                # {
                #     'name': 'include_unconfirmed',
                #     'default': True,
                #     'boolean': True,
                #     'label': 'Include unconfirmed results?',
                #     'description': 'Determines if the action should include unconfirmed matches in its results.',
                # },
            ],
            help='Gets some data.',
        )
        async def get_data(self, indicator):
            parsed_response = await self.get('https://some.provider/example_endpoint/{}'.format(indicator['value']))
            data = parsed_response.data['response']
            resp_markup = markup.Info('Use markup classes to build response markup based on data')
            return APIPluginResponse(data=data, markup=resp_markup)
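The entry_points mapping in setup.py is what makes the installed plugin discoverable: the wheel registers the class under the threatq.plugins.api group. As an illustration (not ThreatQ code), the standard library can list the names registered under that group; on a machine without plugins installed, the list is simply empty:

```python
from importlib.metadata import entry_points

try:
    group = entry_points(group='threatq.plugins.api')      # Python 3.10+
except TypeError:
    group = entry_points().get('threatq.plugins.api', [])  # Python 3.8/3.9
names = sorted(ep.name for ep in group)
print(names)
```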
    

Development Guidelines

Overview

The quickstart tool that builds an operation scaffold is described in the Quickstart section. In this section, we describe the general guidelines on writing the operation. Some of the guidelines are standard Python coding conventions, while some of them are specific to how operations should be written on the ThreatQ platform for performance and maintainability.

Updating various parts of the operation

Import statements

In Python, import statements should be included at the top of the file. To minimize conflicts on changes, they are typically done in three groups:

  • Python Core modules (such as datetime and json)

  • Third party Python modules external to this plugin. Included here should be any ThreatQ classes imported into the operation, eg threatq.core, which provides the plugin base classes.

  • Local modules to this plugin. These should be relative imports beginning with a .

Each module section should list the module imports alphabetically; this allows for simpler differentiation when using a source control system.

Since we have built an SDK for putting plugins together, generally one only needs to pull in modules that are in the plugins and markup modules. It is best practice to pull in ONLY the modules, classes, and/or methods that you’ll need.

So, if you want to pull in the markup for a Heading and PreformattedText and then base your plugin on our HTTPAPIPlugin, your import statement would look like:

from threatq.core.lib.markup import Heading, PreformattedText
from threatq.core.lib.plugins import HTTPAPIPlugin

Note

The Scaffold Tool automatically adds in additional import statements in the file. The user can remove unused import statements as appropriate.

Class Definition

You can create an operation using only the generated __init__.py file, though you can add as many files as necessary to logically organize code. The class should be defined with a uniquely identifiable name, such as DomainToolsPlugin or VirusTotalPlugin. This name doesn’t really matter to the end user, as it isn’t used for anything other than Python references.

Note

The Scaffold Tool automatically derives a class name and allows the user to override it during the interactive phase.

The class should be subclassed from one of the provided Plugin base classes: APIPlugin or HTTPAPIPlugin. Most enrichment sources are queried via HTTP web service requests, so HTTPAPIPlugin is commonly used, as it provides behaviors that greatly simplify this type of usage. In cases where HTTP isn’t used (e.g. alternative remote call mechanisms such as XML-RPC or SOAP, or where the plugin is completely self-contained and requires no remote calls), APIPlugin should be used.

Specifically, HTTP-related behaviors provided by HTTPAPIPlugin are:

  • HTTP request methods, such as self.get() and self.post()

  • Some automatic, controllable decoding of the response.

  • The self.setup_auth_plugin() method, which allows you to select and setup one of our automatic authentication plugins for access to the remote system.

The Scaffold Tool creates the following class definition (if the user chose HTTP plugin during the interactive phase):

class ExamplePlugin(HTTPAPIPlugin):
"""This is an Example Plugin."""
Definable Class Properties

Once you have the class defined, you then need to have some properties defined on the class.

  • entry_points: This is a list or tuple of strings. Usually, this should only contain one string that is the name of the plugin that you’re creating.

  • external_endpoints: This should be a list of domains and URLs that the plugin needs access to.

  • friendly_name: This is the name that is shown in the UI as well as the Source Name. It helps to make something like example_plugin look better as Example Plugin. If you don’t specify this name, the name of the class is used. Using our example class declaration above, the default friendly name would be Example Plugin.

  • static_logo_file: Path to a logo image file for this plugin. Must be a path relative to the “static” directory.

  • version: This is the current version of the plugin. You should set this and iterate it following the Semantic Versioning Spec.

    Note

    The Scaffold Tool automatically creates defaults (unless specifically overridden by the user during the interactive phase) as below.

    friendly_name = 'Example Plugin'
    entry_points = ('example-plugin',)
    external_endpoints = ['https://some.provider']
    version = '0.0.1'
    static_logo_file = 'plugin_logo.png'
    
Optional Definable Class Properties

  • status_text_overrides: This should be a dictionary that maps an HTTP error code (the key) to the friendlier error message you’d like displayed (the value). Some messages are already built in:

    {
      400: "Failed to process request",
      403: "The provided credentials are incorrect.",
      404: "Record not found in data-set.",
      500: "Unknown Server Error.",
      503: "Service Unavailable.",
    }
    
  • user_field_spec: This is a list of dictionaries describing configuration settings for the plugin. This allows the settings to be configured via the UI on the “Operations Management” page. To learn more about the configuration options available when declaring user fields, see the User Fields and Parameters page. User fields are declared for Operations in the following manner:

    user_field_spec = [
      {
        'name': 'api_username',
        'mask': False
      },
      {
        'name': 'api_key',
        'mask': True
      }
    ]
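The status_text_overrides property described above amounts to a per-code dictionary merge: plugin entries shadow the built-in defaults. A minimal sketch (the override message is hypothetical):

```python
# Built-in defaults, as listed in the documentation above.
DEFAULT_STATUS_TEXT = {
    400: "Failed to process request",
    403: "The provided credentials are incorrect.",
    404: "Record not found in data-set.",
    500: "Unknown Server Error.",
    503: "Service Unavailable.",
}

# A plugin-supplied override for one code; all other defaults remain.
status_text_overrides = {404: "No enrichment data found for this object."}

status_text = {**DEFAULT_STATUS_TEXT, **status_text_overrides}
print(status_text[404])  # No enrichment data found for this object.
print(status_text[500])  # Unknown Server Error.
```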
    
The Plugin Execution Context Property

During execution, the plugin has a context namespace available at self.ctx. It holds the current state of the entire execution, exposing all other applicable properties:

  • user_fields: This is a dictionary of the configuration that is passed in to the plugin call. It has the keys of the field names that were defined in user_field_spec previously and the values that were set by the user in the UI. These user fields are often used only for authentication information. In that case, it is recommended to avoid referencing them directly. Instead, reference them within templated fields specified in the auth plugin setup, as described below.

User Definable Class Methods

These are the methods that you want to override to do various things.

  • setup: This method is synchronous and is called once immediately after a Plugin Class is instantiated. You do not have to call it explicitly. Generally you would want to set up authentication or whatever is necessary for all actions on a plugin to operate. The following example creates a SimpleAuth instance:

    def setup(self):
      self.setup_auth_plugin('simple', headers={'Authorization': '{{user_fields.api_key}}'})
    

    In this block, an error is thrown if api_key is not passed in or set. If it is set, then we build a SimpleAuth instance and set it on the auth property.
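The {{user_fields.api_key}} placeholder is a template resolved at request time from the plugin’s configured user fields. A minimal, framework-free sketch of the idea (the render helper is hypothetical, not ThreatQ’s implementation):

```python
import re

def render(template, user_fields):
    # Replace {{user_fields.<name>}} placeholders with configured values.
    return re.sub(
        r'\{\{user_fields\.(\w+)\}\}',
        lambda m: str(user_fields[m.group(1)]),
        template,
    )

headers = {'Authorization': render('{{user_fields.api_key}}',
                                   {'api_key': 'Bearer abc123'})}
print(headers)  # {'Authorization': 'Bearer abc123'}
```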

Actions

Actions are the actual legwork of a plugin. A plugin needs at least ONE action to do anything. An action can do almost anything you can imagine, from attaching an attribute to making an external call and displaying the results to the user. Any method on the class can be defined as an action simply by using the @action decorator.

Python uses decorators to wrap methods, for example to track calls or inject information into a method. For more on decorators, see the Python documentation.

The ThreatQ action decorator is located in the Plugins (plugins) module. You can import like the example below:

from threatq.core.lib.plugins import action

The Scaffold Tool has automatically created a method with the decorator, as shown below.

@action(
    accepts={
        'indicator': ('FQDN',),
        # Other possible examples:
        # 'event': ('Spearphish',),
        # 'signature': ('Snort',),
        # 'attachment': ('Generic Text',)
        # 'adversary': True,
    },
    parameters=[
          {
              'name': 'days',
              'default': 5,
              'required': True,
              'label': 'How many days back to search?',
              'description': 'The maximum number of days back the provider should search for matches.',
          },
          {
              'name': 'include_unconfirmed',
              'default': True,
              'boolean': True,
              'label': 'Include unconfirmed results?',
              'description': 'Determines if the action should include unconfirmed matches in its results.',
          },
    ],
    help='Gets some data.',
)
async def get_data(self, indicator, days, include_unconfirmed):
    # Note: yarl must be imported at the top of the module (import yarl).
    url = yarl.URL('https://some.provider/example_endpoint/') / indicator['value']
    parsed_response = await self.get(url, params={'days': int(days), 'include_unconfirmed': int(include_unconfirmed)})
    data = parsed_response.data['response']
    resp_markup = markup.Info('Use markup classes to build response markup based on data')
    return APIPluginResponse(data=data, markup=resp_markup)

The decorator takes the following arguments:

  • accepts: This is a dictionary mapping that indicates which object types this action can handle. Any object type and subtype defined here is reported as accepted by the action. A full example is shown here:

    {
      "adversary": "*",
      "event": (
          "Anonymization", "Command and Control", "Compromised PKI Certificate",
          "DoS Attack", "Exfiltration", "Host Characteristics",
          "Incident", "Login Compromise", "Malware",
          "Spearphish", "SQL Injection Attack", "Watchlist", "Watering Hole"
      ),
      "file": (
          "CrowdStrike Intelligence", "Cuckoo",
          "Early Warning and Indicator Notice (EWIN)",
          "FBI FLASH", "FireEye Analysis", "Generic Text",
          "Intelligence Whitepaper", "iSight Report",
          "iSight ThreatScape Intelligence Report", "JIB", "MAEC",
          "Malware Analysis Report", "Malware Initial Findings Report (MFIR)",
          "Malware Sample", "Packet Capture", "Palo Alto Networks WildFire XML",
          "PCAP", "PDF", "Private Industry Notification (PIN)", "Spearphish Attachment",
          "STIX", "ThreatAnalyzer Analysis", "ThreatQ CSV File", "Whitepaper"
      ),
      "indicator": (
          "CIDR Block", "Email Address", "Email Attachment", "Email Subject",
          "File Path", "Filename", "FQDN", "Fuzzy Hash",
          "GOST Hash", "IP Address", "MD5", "Mutex",
          "Password", "Registry Key", "SHA-1", "SHA-256", "SHA-384",
          "SHA-512", "String", "URL", "URL Path", "User-agent",
          "Username", "X-Mailer"
      ),
      "signature": "*"
    }
    
  • parameters: This is an optional list of parameters that a user executing the action should be able to specify via the UI. Its structure is exactly like that of user_field_spec above. During execution, the user-specified parameters are passed to the action as keyword parameters using the parameter names (which therefore must be valid Python identifiers, and usually present in the action signature explicitly). Note that with the exception of those parameters marked as boolean, all parameters are passed as strings currently. Any necessary validation/conversion should be performed within the implementation of the action. To learn more about the configuration options available when declaring parameters, see the User Fields and Parameters page. The following is an example of a parameters declaration:

[
    {
        'name': 'days',
        'default': 5,
        'required': True,
        'label': 'How many days back to search?',
        'description': 'The maximum number of days back the provider should search for matches.',
    },
    {
        'name': 'include_unconfirmed',
        'default': True,
        'boolean': True,
        'label': 'Include unconfirmed results?',
        'description': 'Determines if the action should include unconfirmed matches in its results.',
    },
]
  • static_logo_file: Path to a logo image file for this specific action. Must be a path relative to the “static” directory. If not provided, an identifying logo is constructed using the static_logo_file of the plugin itself (if specified) with the action name. If it is provided, no additional text decoration is applied, so be sure that the action logo presents an identification that is clear to users.

  • help: This is a string that represents a friendly description of what this action is doing.
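Because non-boolean parameters reach the action as strings, validation and conversion belong inside the action itself. A minimal, framework-free sketch (the helper name is hypothetical):

```python
def normalize_params(days, include_unconfirmed):
    """Convert UI-supplied action parameters to the types the action needs."""
    days = int(days)  # non-boolean parameters are passed as strings
    if days < 0:
        raise ValueError('days must be non-negative')
    return {'days': days, 'include_unconfirmed': bool(include_unconfirmed)}

params = normalize_params('5', True)
print(params)  # {'days': 5, 'include_unconfirmed': True}
```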

An Example Operation

Overview

The quickstart tool that builds an operation scaffold is described in the Quickstart section. In this section, we write a simple operation building on the guidelines described in the Guidelines section.

Note

Modern ThreatQ integrations, including Operations, are run in an asynchronous, concurrent environment. Be aware that at any given time, the underlying ThreatQ libraries may be performing background tasks on your behalf (for example, efficiently uploading created objects to the API is handled in this way).

The example operation uses a simple online random JSON blob generator. This operation contains an action that can be executed against IP Address and FQDN indicators. The JSON blob that gets returned contains a number of key/value pairs. We will map these pairs to attributes and add those attributes to the said indicator.

One example of the query and its output is displayed below.

[youruser@yourbox bin]$ curl https://jsonplaceholder.typicode.com/todos/1
{
  "userId": 1,
  "id": 1,
  "title": "delectus aut autem",
  "completed": false
}
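The pairs-to-attributes mapping described above can be sketched with plain Python, with no ThreatQ dependencies:

```python
import json

# A response body shaped like the jsonplaceholder /todos/1 payload.
body = '{"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false}'

# Each key/value pair becomes one attribute row for the markup table.
attributes = list(json.loads(body).items())
print(attributes)
# [('userId', 1), ('id', 1), ('title', 'delectus aut autem'), ('completed', False)]
```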

A Complete Operation Writeup Guide

The jsonplaceholder service supports the following resources that can be accessed using the GET service.

/posts        100 posts
/comments     500 comments
/albums       100 albums
/photos       5000 photos
/todos        200 todos
/users        10 users

In our simple operation, we call the todos endpoint and request a single todo.

The full operation code is below and the explanation follows.

from threatq.core.lib.markup import Button, Column, MultiMarkup, Table
from threatq.core.lib.plugins import action, APIPluginResponse, HTTPAPIPlugin


class ExamplePlugin(HTTPAPIPlugin):
    """This is an Example Plugin."""

    friendly_name = 'Example Plugin'
    entry_points = ('example-plugin',)
    external_endpoints = ['https://jsonplaceholder.typicode.com']
    version = '0.0.1'
    static_logo_file = 'plugin_logo.png'

    @action(
        accepts={
            'indicator': ('FQDN', 'IP Address'),
        },
        parameters=[
            {
                'name': 'days',
                'default': 5,
                'required': True,
                'label': 'How many days back to search?',
                'description': 'The maximum number of days back the provider should search for matches.',
            },
            {
                'name': 'include_unconfirmed',
                'default': True,
                'boolean': True,
                'label': 'Include unconfirmed results?',
                'description': 'Determines if the action should include unconfirmed matches in its results.',
            },
        ],
        help='Gets some data.',
    )
    async def get_data(self, indicator, days, include_unconfirmed):
        # The action itself.
        request_params = {'days': int(days), 'include_unconfirmed': int(include_unconfirmed)}
        parsed_response = await self.get('https://jsonplaceholder.typicode.com/todos/1', params=request_params)

        # Raw data from the response object.
        data = parsed_response.data

        # The response data is in the JSON format. Get all items (key/value pairs).
        attributes = data.items()

        # ***** Table Markup ***** #
        # Create a Button Markup object that is used in conjunction with the
        # table to add attributes to the Indicator. This parses the
        # (key, value) pairs from the first and second columns respectively
        att_attrs_buttons = Button(
            label='Add Attribute',
            action='add',
            object_type='attribute',
            action_params={'name': Column(0), 'value': Column(1)})

        # Create a Table Markup object that displays two columns, one for
        # the attribute name (key) and the second for the attribute value (value)
        attribute_table = Table(
            title='Attributes',
            headings=['Name', 'Value'],
            rows=list(attributes),
            buttons=[att_attrs_buttons],
        )

        # ***** Final Markup ***** #
        # This is the final Markup object displayed in the UI. It is
        # made up of the table from above.
        resp_markup = MultiMarkup(attribute_table)
        return APIPluginResponse(data=data, markup=resp_markup)

As seen above, the function get_data has three major tasks.

  • The actual action: In this case, the action consists of making a simple API call. However, an action could be arbitrarily complex.

    Note

    As noted above, actions run asynchronously, and other concurrent tasks may be running at the same time. It is important that any call that may not be nearly instantaneous is also run asynchronously so as not to disrupt other tasks. As such, in cases where third-party libraries are needed, it is usually best to select one with asynchronous (“asyncio”) support when available. Sometimes, the necessary functionality is available only in a synchronous library. Please see the section Running third-party synchronous code in a thread for such cases.

  • data: This is the raw data that is available from the response. It is set as a property of the APIPluginResponse object.

  • markup: The markup object created below uses a Button and a Table. The Table contains Key/Value pair columns. This Markup object is returned to the UI when the action is executed, and the Button allows the user to associate the Attributes to the Object in question. See the section Installing Operations for further details.

    Note

    A markup could be a simpler object that doesn’t have to be a table. For example:

    markup = MultiMarkup(Heading('Example'), Paragraph('Hello World!'))
    

Third-Party Synchronous Code

It is sometimes necessary to call synchronous functions when asynchronous counterparts are not available (typically in third-party code). Below is an example of how to handle these cases in a thread.

  • Only one synchronous function: In this case, we can execute it in a thread as follows:

    # Equivalent to "result = other_module.fetch_information(value)", but in a thread.
    result = await self.ctx.call_in_executor(other_module.fetch_information, value)
    

    Note

    call_in_executor runs any synchronous process in a thread. To pass parameters to the synchronous function being threaded, pass them in this manner: await self.ctx.call_in_executor(callable_function, arg1, arg2, kwarg1=1, kwarg2=2)

  • Several synchronous functions: Use the form like below.

    async with self.ctx.threadpool():
      result = other_module.fetch_information(value)
      result = other_module.process_the_result(result)
      result = other_module.process_it_some_more(result)
    
  • Several synchronous functions: Another way of handling this case is as shown below. See the Asphalt Framework for more advanced techniques like these.

    from asphalt.core import executor
    
    @executor  # This decorator transforms the function into a coroutine wrapping a thread
    def fetch_and_process(self, value):
      result = other_module.fetch_information(value)
      result = other_module.process_the_result(result)
      result = other_module.process_it_some_more(result)
      return result
    
    @action(
        accepts={
            'indicator': ('FQDN', 'IP Address'),
        },
        help='Gets some data.',
    )
    async def get_data(self, indicator):
        result = await self.fetch_and_process(indicator['value'])  # The decorator makes this call awaitable
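The thread-offloading idea behind both call_in_executor and the @executor decorator can be illustrated with the standard library alone; run_in_executor is the asyncio primitive such helpers typically wrap (the blocking function here is a stand-in):

```python
import asyncio
import time

def fetch_information(value):
    # Stand-in for slow, synchronous third-party code.
    time.sleep(0.01)
    return value.upper()

async def main():
    loop = asyncio.get_running_loop()
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await loop.run_in_executor(None, fetch_information, 'abc')

result = asyncio.run(main())
print(result)  # ABC
```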
    

Debugging

Overview

Often, the code in an operation is complex enough to warrant the use of command-line utility tools. If the operation fails to install, fails to execute, or returns unexpected results, the developer will want to debug it.

tq-plugin

An operation installation via the GUI can fail for a variety of reasons. Some of these reasons are listed below:

  • You do not have proper directory ownership privileges

  • Bad third party dependency

  • Syntax errors in the code

The tq-plugin command-line tool is used to debug most of the operations workflow from the command-line. The tq-plugin command can:

  • List all plugins

  • List all plugins with their metadata information

  • List actions for a specific plugin

  • Install a specific plugin

  • Execute a specific plugin action

Note

The tq-plugin command is available in the Python 3.5 virtual environment on the ThreatQ appliance. To call it, execute the following command:

/opt/threatq/python/bin/tq-plugin

Below, we discuss how to debug a specific operation action.

List Plugins and Plugin Actions

The first step in debugging is to list plugins and actions specific to the plugin. For our example plugin, the output is shown below.

(python) [youruser@yourbox tq-op-example-plugin]$ tq-plugin list

  Plugin            Ver.    Description
  domaintools       0.0.3   Enrichment data made available by domaintools.com
  emerging_threats  1.0.1   Enrichment data from Emerging Threats IQRisk
  example-plugin    0.0.1   This is an Example Plugin.
  virustotal        0.0.2   Enrichment data made available by virustotal.com

(python) [youruser@yourbox tq-op-example-plugin]$ tq-plugin list-actions example-plugin
  Plugin          Action    Description
  example-plugin  get_data  Gets some data.
Create Test Input

As seen above, the plugin name is example-plugin and the plugin action is get_data. In order to call this plugin action from the command line, we create an input file with an indicator (simulating the indicator against which this plugin would be executed from the GUI), as shown below.

{
    "data": {
      "type": {
          "name": "FQDN"
      },
      "value": "google.com"
    }
}
Execute Plugin

Below is the tq-plugin command that you can execute from the command-line for the example plugin’s action.

(python) [youruser@yourbox tq-op-example-plugin]$  tq-plugin --indent 2 execute -i /tmp/input.json example-plugin get_data

This execution should return valid data and markup objects (refer to the example code, where the get_data action function returns these two objects). For this example plugin, we should see the following data object.

"data": {
  "completed": false,
  "id": 1,
  "title": "delectus aut autem",
  "userId": 1
}

This same data object is returned to the UI in the form of a Table (as described here and shown here) consisting of attribute key/value pairs. Obtaining this data object by invoking the tq-plugin command-line tool indicates that the operation is working properly and is ready for deployment.

Note

While a successful tq-plugin execution is a good indication that the operation basically works, it is not a substitute for real testing. The operations writer is expected to implement meaningful tests. The Scaffold Tool helps by creating a shell test driver class in a test file that the developer can further enhance.
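A meaningful test typically stubs out the remote service and asserts on the shape of the returned data and markup objects. The sketch below is self-contained and uses hypothetical stand-ins (get_data, run_action) rather than the real plugin module; a real test would import the installed plugin package instead.

```python
import asyncio

# Self-contained stand-in for a plugin action under test (hypothetical names).
async def get_data(indicator):
    # A real action would call a remote service; this stub returns a payload
    # shaped like the example output shown earlier.
    data = {"completed": False, "id": 1, "title": "delectus aut autem", "userId": 1}
    markup = [{"type": "table", "data": data}]
    return data, markup

def run_action(action, indicator):
    # Drive the coroutine to completion on a fresh event loop.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(action(indicator))
    finally:
        loop.close()

data, markup = run_action(get_data, {"data": {"type": {"name": "FQDN"}, "value": "google.com"}})
assert data["id"] == 1 and isinstance(markup, list)
```

Asserting on both objects catches the most common regressions (a malformed payload or a missing markup entry) before the operation ever reaches the appliance.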

Execute Plugin (Advanced Debugging)

If the execution does not result in a desired outcome, consider putting breakpoints in the code. In this section, a pudb workflow is described. pudb is a Python debugger that provides a console-based user interface.

  • Install pudb: If pudb is not installed, install it in the virtual environment as below.

    (python) [youruser@yourbox tq-op-example-plugin]$ sudo -u apache /opt/threatq/python/bin/pip install pudb
    
  • Locate Source Code: Python packages are installed in a directory called site-packages. In the virtual environment on the ThreatQ appliance, this directory can be found under /opt/threatq/python/lib/python3.5/. The specific module can be easily located by the plugin name. For our example, the source code is as below.

    (python) [youruser@yourbox tq-op-example-plugin]$ ls -al /opt/threatq/python/lib/python3.5/site-packages/tq_op_example_plugin/__init__.py
    -rw-r--r--. 1 apache apache 3134 Nov  1 19:41 /opt/threatq/python/lib/python3.5/site-packages/tq_op_example_plugin/__init__.py
    

    Note

    Directory ownership belongs to the apache user and group, reinforcing that all modules in the virtual environment have this default ownership. As a good practice, this convention should be followed when installing any additional modules. For example, the pudb module above was installed as the apache user.

  • Put Breakpoints: Breakpoints can be set at any line in the code using the following two lines:

    import pudb
    pudb.set_trace()
    
  • Happy Debugging: After the breakpoints are set in the source code, save the file and execute the plugin from the command line again. Refer to the pudb workflow on how to debug, step through the code, and diagnose any problems.

Bundling

The operation bundle is a standard Python whl package. To build the package, run the following command:

python setup.py bdist_wheel

Note

The whl package must be built with Python version 3.5+. On a ThreatQ appliance, this version is available in a virtual environment. To activate the Python 3.5 virtual environment on the ThreatQ appliance, run the following command:

source /opt/threatq/python/bin/activate

Note

Make sure you are in the project’s root directory when running the above command to build the package.

When the package is built, it is placed in the dist directory, as shown below for our example operation.

[youruser@yourbox tq-op-my-new-plugin]$ source /opt/threatq/python/bin/activate

(python) [youruser@yourbox tq-op-my-new-plugin]$ python setup.py bdist_wheel
running bdist_wheel
running build
running build_py
creating build
creating build/lib
creating build/lib/tq_op_my_new_plugin
copying tq_op_my_new_plugin/__init__.py -> build/lib/tq_op_my_new_plugin
creating build/lib/tq_op_my_new_plugin/static
copying tq_op_my_new_plugin/static/plugin_logo.png -> build/lib/tq_op_my_new_plugin/static
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/tq_op_my_new_plugin
copying build/lib/tq_op_my_new_plugin/__init__.py -> build/bdist.linux-x86_64/wheel/tq_op_my_new_plugin
creating build/bdist.linux-x86_64/wheel/tq_op_my_new_plugin/static
copying build/lib/tq_op_my_new_plugin/static/plugin_logo.png -> build/bdist.linux-x86_64/wheel/tq_op_my_new_plugin/static
running install_egg_info
running egg_info
creating tq_op_my_new_plugin.egg-info
writing top-level names to tq_op_my_new_plugin.egg-info/top_level.txt
writing tq_op_my_new_plugin.egg-info/PKG-INFO
writing dependency_links to tq_op_my_new_plugin.egg-info/dependency_links.txt
writing entry points to tq_op_my_new_plugin.egg-info/entry_points.txt
writing manifest file 'tq_op_my_new_plugin.egg-info/SOURCES.txt'
reading manifest file 'tq_op_my_new_plugin.egg-info/SOURCES.txt'
writing manifest file 'tq_op_my_new_plugin.egg-info/SOURCES.txt'
Copying tq_op_my_new_plugin.egg-info to build/bdist.linux-x86_64/wheel/tq_op_my_new_plugin-0.0.1-py3.5.egg-info
running install_scripts
creating build/bdist.linux-x86_64/wheel/tq_op_my_new_plugin-0.0.1.dist-info/WHEEL

(python) [youruser@yourbox tq-op-my-new-plugin]$ ls dist/
tq_op_my_new_plugin-0.0.1-py3-none-any.whl

Installation and Execution

Installation

As described in the Bundling section above, an operation bundle is a Python Wheel package. This package can be installed on a ThreatQ Appliance in one of two ways.

UI Installation

The operation package can be installed via the GUI as well by navigating to Operations Management and clicking Install Operations.

Operation Installation via the UI

Once installed, the operation appears in the list of operations. Any additional user arguments can be configured at this point too.

The Installed Test Operation
Command Line Installation

Command line installation occurs in two steps:

  • Installation: The operation can be installed via the following artisan command:

    sudo -u apache php /var/www/api/artisan threatq:plugin-install <Path To whl File>
    

Note

To update an operation that is already installed, this command can be passed the --upgrade option.

  • Syncing: This step syncs the installed plugin data with the database.

    sudo php /var/www/api/artisan threatq:plugin-sync
    

Execution

Once an operation is installed, it needs to be activated on the UI as shown below.

Test Operation Activated

This operation is then available for execution for any indicator type that was specified in the action decorator as described here.

The test operation available for execution

Clicking an action on the UI results in the execution of the operation. In many use cases, the user is presented with a Table of Attribute key/value pairs as shown below.

Output of Operation Execution

The user can select one or more of these attribute key/value pairs and add the selected attributes to the indicator in question.

Note

While the above describes the most common way an operation is used in the ThreatQ Appliance, the use of operations is not limited to this process. An operations package essentially returns a Python Coroutine object to the caller that the ThreatQ framework can then execute just like any standard Python Coroutine. An operation could be used to achieve any useful functionality for this reason.
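To illustrate that point with plain Python (names here are purely illustrative, not the actual framework API), any caller that can drive an event loop can execute an operation's coroutine and use its result however it likes:

```python
import asyncio

# Illustrative only: an operation action is an ordinary coroutine, so any
# caller able to run an event loop can execute it, not just the ThreatQ UI.
async def enrich(value):
    return {"value": value, "score": 42}

loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(enrich("google.com"))
finally:
    loop.close()

assert result["score"] == 42
```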

Common Features and Functionalities

A number of features are widely used within the Pynoceros system and are available for use in Configuration Driven Feeds, Operations, command line tools, and more.

User Fields and Parameters

Feeds and Operations can define User Fields and/or Parameters. These fields are presented to users in the ThreatQ UI and allow feed/operation designers to inject configuration options, credentials, or any other information needed for the feed/operation to operate.

Defining User Fields or Parameters with Value

The structure of a user field or parameter definition is relatively straightforward. Be it in YAML when designing a Configuration Driven Feed or Python when writing an Operation, User Fields and Parameters are a list of dictionaries containing metadata about each field. Display ordering of the fields is based on the order in which the field objects are declared in the list. Each field dictionary can have the following keys:

  • name (string, required): Name of the field for internal reference within a CDF or Operation.

  • label (string, optional): Friendly name for the field that is displayed in the UI.

  • description (string, optional): Longer text describing the purpose of the field exposed to users as a tooltip in the UI. Defaults to the value of label.

  • threat_collection (bool, optional): Denotes whether or not a drop down field with a list of data collections should be present. Defaults to False.

  • required (bool, optional): Denotes whether or not the field is required. Defaults to False.

  • default (any, optional): Specifies a default value for a field that is used unless overwritten in the UI. Defaults to None.

  • mask (bool, optional): Specifies whether a text input should be hidden as a password. This key is only valid for use on text fields. Defaults to False.

  • large (bool, optional): Denotes a large textarea field. Defaults to False.

  • boolean (bool, optional): Denotes a True/False checkbox. Defaults to False.

  • options (list, optional): A list that allows CDF writers to specify options for a field. The presence of the options key inherently denotes a select dropdown in the UI for the field. To specify a multiselect or radio choice in the UI, see the multiple and radio keys respectively. Option values can be declared either as strings or dictionaries:

    • Options as strings:

      • If one simply specifies a string value, that value is treated as both the display text for the option and the actual option value passed back for the field.

    • Options as Dictionaries:

      • value (any, required): Defines the value that is passed to the field if this option is chosen.

      • text (string, optional): Defines the display text for the option that is shown in the UI. If not defined, the display text defaults to value.

      • default (bool, optional): If True, denotes that this option value should be treated as the default selected option. In the case of normal select dropdowns and radio button choices, one should only specify a single option as the default. When declaring options for a multiselect, one can declare multiple options as default.

  • multiple (bool): Used in conjunction with an options list to denote a multiselect field if set to True. Defaults to False.

  • radio (bool): Used in conjunction with an options list to denote a radio button choice field if set to True. Defaults to False.

  • display_condition (list, optional): Denotes the conditions for hiding or showing the user field based on name-value pairs. Each pair represents a different and separately defined user field and its current value. One or more user fields must be given. If none of the conditions are met, then the user field will be hidden. If any condition is met, the user field will be shown.

    Each name-value pair is defined as follows:
    • name (string, required): Denotes a user field’s name value, e.g. authentication_method

    • value (string, required): Denotes a user field current value, e.g. password

Warning

The required and boolean keys are mutually exclusive and cannot be used together.
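Putting several of these keys together, a user field list might look like the following sketch (the field names and values are illustrative, not taken from a real feed):

```yaml
user_fields:
  - name: api_key
    label: API Key
    description: API key issued by the provider
    required: True
    mask: True
  - name: authentication_method
    label: Authentication Method
    options:
      - value: token
        text: API Token
        default: True
      - value: password
        text: Password
  - name: password
    label: Password
    mask: True
    display_condition:
      - name: authentication_method
        value: password
```

Here the password field is shown only when authentication_method is set to password, and the options list renders as a single-select dropdown because neither multiple nor radio is set.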

Defining User Fields or Parameters without Value

New in ThreatQ Version 5.5.0:

This release simplifies type declaration and extends functionality to display-only types. For all user field types, it is now possible to declare a single type key/value pair representing the intended user field type. For example, an author may now use the syntax type: radio to declare a radio button field.

user_fields:
  - name: radio_field
    type: radio
    label: Radio Field
    description: Radio Field Description
    options:
      - value: 1
        text: Radio 1 Option 1
      - value: 2
        text: Radio 1 Option 2
  - name: username
    type: text
    label: Username
    description: Username of your account
    required: True
  - name: password_field
    type: password
    label: Password Field
    description: Password of your account
  - name: other_mask
    type: password
    label: Other Mask
    description: Other Mask
    required: True

Additionally, this change introduced new display-only user field types. These types DO NOT take the traditionally required name key, as they serve no function in the integration other than display.

Note

The following types only accept the syntax type: <type>. The type key is required.

On header types, the type and label keys are the only available keys. label defines the display value of the header field in the configuration UI.

No value may be derived from these types within the integration.

  • h1: A header 1 text field.

  • h2: A header 2 text field.

  • h3: A header 3 text field.

  • p: A paragraph text field.

  • hr: A horizontal rule. No other keys accepted.

user_fields:
  - type: h1
    label: Header 1 label
  - type: h2
    label: Header 2 label
  - type: h3
    label: Header 3 label
  - type: p
    label: Paragraph label
  - type: hr  # No label is allowed for this type.

Defining User Field Fieldsets

New in ThreatQ Version 5.6.0:

Fieldsets are a way to group related fields together, along with an optional heading. By defining type: fieldset on a field, additional fields can be passed to the items key.

user_fields:
  - type: fieldset
    name: my_fieldset
    label: Fieldset Label
    items:
      - name: fieldset_field
        type: text
        label: Fieldset Field
        description: Fieldset Field Description
        required: True
      - name: fieldset_field2
        type: text

These nested values can then be accessed by using dot notation and including the name of the parent item:

value: !expr user_fields.my_fieldset.fieldset_field | string

Process Log Levels

One of the common run arguments available for dynamo, tq-filter, and tq-feed, the --log-level flag specifies under which logging level the process should run. Like most logging systems, specifying a lower log level will include all logs from the levels above it as well.

The following log levels are available for use, with logging verbosity increasing as the level decreases:

  • Critical

  • Error

  • Warning

  • Info

  • Debug

  • 9

  • 8

  • 7

  • 6

  • 5

  • 4

  • 3

  • 2

  • 1
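In standard Python logging terms, the numeric levels sit below DEBUG (which is 10), so each step down admits more verbose output. The sketch below illustrates that threshold behavior using only the standard library; the mapping to Pynoceros's internal levels is an assumption for illustration:

```python
import logging

# Assumption for illustration: the numeric levels 9..1 behave like standard
# logging thresholds below DEBUG (10), so a lower level admits more records.
logger = logging.getLogger("pynoceros.example")
logger.setLevel(5)

assert logger.isEnabledFor(logging.DEBUG)  # 10 >= 5: included
assert logger.isEnabledFor(7)              # 7 >= 5: included
assert not logger.isEnabledFor(3)          # 3 < 5: filtered out
```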

Python Coding Style Guide

The following python file can be used as a general style guide when developing code for Pynoceros itself or one of the Integrations built on top of it.

# This file contains some example Python and is intended to act as a Style Guide for Pynoceros, defining best practices
# for development and documentation within the code base.

# Each source file defined should contain the following Copyright Disclaimer at the top. The current year should always
# be kept up to date:

# -*- coding: utf-8 -*-
# --------------------------------------------------------------------------------------------------
# ThreatQuotient Proprietary and Confidential
# Copyright ©2022 ThreatQuotient, Inc. All rights reserved.
#
# NOTICE: All information contained herein, is, and remains the property of ThreatQuotient, Inc.
# The intellectual and technical concepts contained herein are proprietary to ThreatQuotient, Inc.
# and its suppliers and may be covered by U.S. and Foreign Patents, patents in process, and are
# protected by trade secret or copyright law.
#
# Dissemination of this information or reproduction of this material is strictly forbidden unless
# prior written permission is obtained from ThreatQuotient, Inc.
# --------------------------------------------------------------------------------------------------

# Import statements should be at the top of the file in three distinct groupings, each separated by a new line:
# 1.) Built in library imports
# 2.) Third Party library imports
# 3.) In-project imports (use relative imports!)

import json  # Comments on the same line as code must have 2 spaces before their start
import typing as t  # By convention, we import typing as t to avoid collisions with other namespaces
from types import MappingProxyType

import addict

# Relative import from threatq.core.lib...
from ..threatq.core.lib.stix.stix2.attack_patterns import AttackPatterns
from ..threatq.core.lib.stix.stix2.campaigns import Campaigns
from ..threatq.core.lib.stix.stix2.courses_of_action import CoursesOfAction

# Warnings will be issued on imports that are not explicitly used. To resolve these in cases where the imports are
# necessary for a file but unused, one can assert them like so:
assert(any([json, addict]))


# There should be two blank lines between imports and a class definition
class ExampleClass:
    """
    A description block should be supplied for each Class, explaining the purpose of the class and its behavior unless
    the behavior is patently obvious from the code. Logically, the more complex a class, the more in-depth the
    documentation should be to explain said class. After the description block, one should stub out the class's
    Attributes and Args (parameters).

    When defining Args/Attributes, one should supply a type hint in parentheses after the arg / attribute name. If the
    arg / attribute is a complex type, one should use a directive like ``:class:`` or ``:mod:`` along with a leading
    ``~`` to create a relative reference to the complex type within the generated docs.

    One should define initialization parameters in the Args section under the _Class_ docstring if the Class's __init__
    does not warrant further documentation. If one intends to show the Class's "magic" members like __init__ within the
    generated docs, Args should be defined in the Class's __init__ docstring instead.

    One should define all public attributes for a Class under the Attributes section.

    Args:
        attack_patterns (:class:`~threatq.core.lib.stix.stix2.attack_patterns.AttackPatterns`): AttackPatterns instance
        campaigns (:class:`~threatq.core.lib.stix.stix2.campaigns.Campaigns`): Campaigns instance
        courses_of_action (:class:`~threatq.core.lib.stix.stix2.courses_of_action.CoursesOfAction`): CoursesOfAction
            instance

    Attributes:
        attack_patterns (:class:`~threatq.core.lib.stix.stix2.attack_patterns.AttackPatterns`): AttackPatterns instance
        campaigns (:class:`~threatq.core.lib.stix.stix2.campaigns.Campaigns`): Campaigns instance
        courses_of_action (:class:`~threatq.core.lib.stix.stix2.courses_of_action.CoursesOfAction`): CoursesOfAction
            instance
    """

    class_attr_a = 'Some Value'
    """
    Class attributes can be documented just like methods
    """
    attribute_base_map = MappingProxyType(
        dict(last_seen_as='Last Seen As', severity='Severity', threat_types='Threat Type', created_on='First Seen')
    )
    """
    Mappings should be placed at the top of the class
    """
    _private_class_attr = 'Super secret value'
    """
    Private class attributes can also be documented, but won't show in generated docs like private methods
    """

    def __init__(self, attack_patterns: AttackPatterns, campaigns: Campaigns, courses_of_action: CoursesOfAction):
        self.attack_patterns = attack_patterns
        self.campaigns = campaigns
        self.courses_of_action = courses_of_action

    # Specify a return type hint with ``-> Type:``. This can help clear up IDE warnings and help future developers.
    def do_something_special(self, arg_a: t.Any) -> t.Any:
        """
        Each public method should have a docstring detailing what its behavior / purpose is.

        When a method returns a value, one should specify a ``Returns`` section in the docstring that denotes the type
        of the returned value and give some description of what a user should expect to be returned.

        When a method raises an error in some case, one should specify a ``Raises`` section that denotes the type of
        each exception raised, followed by the condition under which said error is raised.

        Args:
            arg_a (Any): Some arg

        Returns:
            Any: Value that had something special done to it.

        Raises:
            TypeError: If arg_a is an ``int``
            TypeError: If arg_a is a ``str``
        """
        if isinstance(arg_a, int):
            raise TypeError(
                'Strings or parameters that exceed the 120 character line limit should be spaced on their own line'
            )
        elif isinstance(arg_a, str):
            raise TypeError(
                'Strings that exceed the 120 character line limit on their own should be spaced out even more like '
                'so. No need for plus signs or backslashes, Python can gracefully continue strings that have new lines '
                'in the middle.'
            )
        return self._some_private_method(
            arg_a,
            kwarg_a='A',
            kwarg_b='B',
            kwarg_c='C',
            kwarg_d='D',
            kwarg_e='E',
            kwarg_f='F',
            kwarg_g='G',
            kwarg_h='H',  # Trailing commas should be kept in when expanding arguments like this within method calls
        )

    @staticmethod
    def _some_private_method(arg_a: t.Any, **kwargs) -> bool:
        """
        By default, private methods beginning with an underscore are *not* included within generated documentation for a
        class. Thus, docstrings for private members of a class are not particularly necessary. If the logic/behavior is
        substantially complex and involved though, it may be a good idea to supply a docstring to help future developers
        in maintaining the codebase.

        Args:
            arg_a (Any): Some arg
            kwargs (str): Mapping of keyword arguments

        Returns:
            bool: True if ``arg_a`` and ``kwargs`` have the necessary attributes, False otherwise.
        """
        # When a line grows longer than 120 characters, it should be continued onto the next line by wrapping the entire
        # expression in parentheses. Line continuation with ``\`` should be explicitly **avoided**.
        return (getattr(arg_a, 'member_a', None) and getattr(arg_a, 'member_b', None) and
                kwargs.get('kwarg_c') and kwargs.get('kwarg_d') and kwargs.get('kwarg_e'))


# Files must have a single blank line at the end

Pynoceros-based Command Line Tools

Pynoceros provides several command line tools that function as part of the ThreatQ system, as well as user/developer oriented tools. In addition to details about these tools, this section also shows how to easily create new/custom Pynoceros-driven command line tools.

Included Tools

A few tools are provided to assist in building out feeds and other components built on top of the Pynoceros framework.

TQ Scaffold

tq-scaffold

Create scaffolds (starter project structures) for ThreatQ-based projects; i.e., it builds a directory structure and instantiates the respective base class for the given argument. The result requires additional customization to be practical.

  • command: Uses the PynocerosCommandBase class to build a custom Pynoceros Command, e.g. tq-update

  • filter: Uses the Filter class to build a custom CDF (Dynamo) filter

  • operation (plugin): Uses the APIPlugin class to execute specific action(s), e.g. Send data per IP

  • source: Uses the FeedSource class to build a custom CDF (Dynamo) source, which retrieves data and passes itself to the filter chain

tq-scaffold [OPTIONS] [command|filter|operation|source]

Options

-o, --output-directory <output_directory>

Directory in which to generate the project scaffold directory [default: current directory]

Arguments

SCAFFOLD_TYPE

Required argument

Available scaffolds: command, filter, operation, source


TQ Filter

tq-filter

Interactive Pynoceros filter tool

Note

tq-filter requires API access for custom objects and currently requires the --threatq-client-id and --threatq-client-secret flags

tq-filter [OPTIONS] COMMAND [ARGS]...

Options

--threatq-url <threatq_url>

ThreatQ Application URL

Default

https://threatq/

--threatq-client-id <threatq_client_id>

Required Client ID for ThreatQ API authentication

--threatq-client-secret <threatq_client_secret>

Required Client secret for ThreatQ API authentication

--log-level <log_level>

Logging level

Default

ERROR

--format <data_output_format>

Data output format

Default

json-pretty

Options

json | json-pretty | json-pretty-unsorted | yaml

--config-file <config_file>

File specifying defaults for these parameters (YAML/JSON; use "_" in place of "-" and omit the leading dashes)

Default

<user application config directory>/tq-filter/config.yaml

run

Pass input data through a filter and output the results.

tq-filter run [OPTIONS] YAML_FILTER_DEFINITION_STRING

Options

-f, --filter-definition-file <filter_definition_file>

File containing YAML filter definition.

-i, --input <infile>

Input file or “-” for standard input [default: standard input]

Arguments

YAML_FILTER_DEFINITION_STRING

Optional argument

Exactly one of --filter-definition-file or YAML_FILTER_DEFINITION_STRING must be specified

TQ Feed

tq-feed

Interactive Pynoceros feed tool

Note: tq-feed requires API access for some operations, so it may require the --threatq-client-id and --threatq-client-secret parameters.

tq-feed [OPTIONS] COMMAND [ARGS]...

Options

--log-level <log_level>

Logging level

Default

ERROR

--threatq-url <threatq_url>

ThreatQ Application URL

Default

https://threatq/

--threatq-client-id <threatq_client_id>

Client ID for ThreatQ API authentication

--threatq-client-secret <threatq_client_secret>

Client secret for ThreatQ API authentication

--format <data_output_format>

Data output format

Default

json-pretty

Options

json | json-pretty | json-pretty-unsorted | json-seq | json-seq-pretty | json-seq-pretty-unsorted | human

--config-file <config_file>

File specifying defaults for these parameters (YAML/JSON; use "_" in place of "-" and omit the leading dashes)

Default

<user application config directory>/tq-feed/config.yaml

analyze

Analyze a feed definition file and parse the output

tq-feed analyze [OPTIONS] DEFINITION

Options

--show-supplemental, --no-show-supplemental

Should we include the supplemental and fulfillment feeds in the output

Default

False

Arguments

DEFINITION

Required argument

inspect

Inspect a feed definition and output results.

Note: All the rules defined in the rules folder will be executed, outputting the result of each one.

tq-feed inspect [OPTIONS]

Options

-d, --definition <definition>

Required Standard feed definition file (YAML format) defining the feed that needs to be inspected.

run

Simulate a partial feed run and output results.

Be aware that although this tool does not generally require access to the ThreatQ API, it is needed in some cases, as described below. Those cases require providing the relevant command line options (--threatq-url, --threatq-client-id, and --threatq-client-secret) to tq-feed via the command line or the configuration file.

One case that requires API access is if the “report” stage is desired to be included in the simulated run, as details of user-configurable object types that can be generated must be loaded from the API.

Note: as most feed names contain spaces, FEED_NAME should almost always be quoted per standard shell conventions.

tq-feed run [OPTIONS] FEED_NAME

Options

--from-stage <from_stage>

Stage from which to start the feed run. Note that data provided in the input file must be appropriate for this stage. [default: first non-empty choice present in definition]

Options

source | filters | report

--to-stage <to_stage>

Stage at which to end the feed (showing the output from that stage). [default: last consecutive non-empty choice present in definition after the from-stage (inclusive)]

Options

source | filters | report

-u, --user-fields-file <user_fields_file>

A file in YAML (or JSON) format representing a name/value mapping for any user-specified configuration fields defined by the feed definition. Definitions containing user fields will require API access if no user fields file is provided. User fields that are present in the definition but omitted from the user fields file will be filled in from the API, if found there.

-r, --run-params-file <run_params_file>

Run params file.

-i, --input <infile>

File containing input data, or - for standard input. Note that this input data is ignored if --from-stage is source (as a source generates its own input data). The prescribed format of the input data is JSON-Seq; see RFC 7464. (This format can also be conveniently generated by this tool by using the --format json-seq or json-seq-pretty output options.) Alternatively, one JSON object per line is also accepted. [default: standard input]

-d, --definition <definition>

Standard feed definition file (YAML format) defining the feed as referenced by FEED_NAME. The feed does not need to be fully defined (as it does for real ingestion in Dynamo). It is sufficient to define only the stages implied by the --from-stage and --to-stage options. If not specified, an attempt will be made to fetch the definition for the named feed from the ThreatQ API, but this does require the API access options be provided as described above.

-s, --since <since>

Since date for the feed run, defaults to None

-un, --until <until>

Until date for the feed run, defaults to None

-t, --trigger-type <trigger_type>

Run Trigger Type, options include scheduled and manual. Defaults to scheduled

Arguments

FEED_NAME

Required argument
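The JSON-Seq input format accepted by the run command's --input option (RFC 7464) frames each record with an ASCII Record Separator (0x1E) prefix and a trailing newline. A minimal sketch of producing and consuming it (the helper names are hypothetical, not part of Pynoceros):

```python
import json

RS = "\x1e"  # ASCII Record Separator (0x1E), per RFC 7464

def to_json_seq(records):
    # Each record is framed as: RS + JSON text + newline.
    return "".join(RS + json.dumps(record) + "\n" for record in records)

def from_json_seq(text):
    # Split on RS and parse each non-empty chunk back into an object.
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]

records = [{"value": "google.com"}, {"value": "8.8.8.8"}]
encoded = to_json_seq(records)
assert from_json_seq(encoded) == records
```

The RS framing is what lets pretty-printed (multi-line) JSON records be concatenated unambiguously, which plain newline-delimited JSON cannot do.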

workflow

Build a completed Workflow from a JSON blob of modular configs

tq-feed workflow [OPTIONS]

Options

-f, --file <file>

JSON file path

-h, --hex <hex_>

Hex representation of the JSON

-b, --b64 <b64>

Base64 representation of the JSON

-s, --string <string>

String representation of the JSON

Building Command Line Tools Using Pynoceros

Overview

In addition to a rich set of programmatic tools for working with and ingesting threat data and interacting with the ThreatQ API, the Pynoceros library also provides a strong foundation for easily building command line tools on top of those capabilities.

Until a tutorial is written, please see the threatq.core.tools.base documentation for details, as well as tq_filter for an example.

Base Classes for Creating Pynoceros-driven CLI Tools: threatq.core.tools.base

This module provides base classes used to easily create Pynoceros-based CLI tools. These base classes provide: CLI definition and parsing using click; automatic setup of a standard Pynoceros environment with resources made available, such as HTTP sessions for different purposes and a Jinja2 templating environment; Asphalt asynchronous execution of a subclass-defined coroutine with access to these resources and the parsed CLI arguments; and handling of errors raised during its execution.

class threatq.core.tools.base.PynocerosCommandComponent(*, name: Optional[str] = None, config_param_sets: Iterable[str] = (), kwargs: Optional[MutableMapping] = None, config: Optional[MutableMapping] = None)

This Asphalt component sets up a Pynoceros execution environment and handles execution of a (sub-)command. It is used internally, and implements many of the interface semantics described in other classes in this module.

class threatq.core.tools.base.PynocerosCommandBase

The base class for the classes below. These arguments, methods, and attributes apply to those classes.

name

the command name, as used from the command line. If not specified in a subclass’s definition, it is automatically derived from the class name (converting CamelCase to lowercase_underscore_separated).

Type

str
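For illustration, the CamelCase-to-underscore derivation described above can be sketched with a pair of regular expressions. This is an assumption about the mechanism, not the library's actual conversion code:

```python
import re

def camel_to_snake(class_name: str) -> str:
    # Insert an underscore before each capitalized word, then lowercase.
    step1 = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", class_name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step1).lower()

print(camel_to_snake("MyCoolCommand"))  # my_cool_command
```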

click_cls

Normally Command for PynocerosCommand and Group for PynocerosCommandGroup, but can be overridden to specify an alternative click class to be instantiated to handle CLI parsing duties.

Type

subclass of BaseCommand

click_kwargs

if specified, additional keyword arguments to be passed when instantiating the click_cls class.

Type

dict

config_param_sets

There are groups of CLI options that are standard across Pynoceros apps, and are specifically referenced as part of setting up a Pynoceros execution environment, such as credentials for the ThreatQ API. These sets of parameters are predefined and named, but not all commands need all of these parameters (e.g. a command may not need API access at all). This attribute explicitly defines the param sets that will be added. Current parameter sets and the parameters they imply are:

  • tq_api:
    • --threatq-url

    • --threatq-client-id

    • --threatq-client-secret

  • proxies
    • --http-proxy

    • --https-proxy

  • logging
    • --log-level

The values for these parameters will be available to the running application in the configuration object at self.ctx.config.

Type

tuple[str]

params

This attribute defines additional, command-specific click options and arguments for the command’s CLI. These values will be passed as keyword arguments to the run() coroutine.

Type

tuple[Parameter]

config_file

Pynoceros commands have the option of supporting a configuration file, from which default values for any or all command line parameters can be loaded. If the file exists, it should contain a YAML or JSON representation of a dictionary where the keys are the parameter names with leading dashes removed and all other dashes translated to underscores. The default config_file value of False turns off this support. If True, the default is config.yaml in a user-specific directory that is automatically determined based on the platform where the command is executing (the resulting location is visible in the command’s --help output). If a string, it explicitly defines a location for the configuration file.

If support is enabled, an additional --config-file parameter is implicitly added to the command to allow the caller to specify a location, as well.

It should be noted that, since --config-file is itself a CLI parameter, and configuration files can specify values for all parameters, a configuration file can delegate to a different configuration file. This situation is honored, with the values from the indirectly-referenced file having lower priority than those of the original one.

Type

str|bool, optional
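As an illustration, a hypothetical config.yaml supplying defaults for the tq_api and logging parameter sets might look like the following (all values here are placeholders):

```yaml
# Keys are parameter names with leading dashes removed and other dashes
# translated to underscores: --threatq-client-id -> threatq_client_id
threatq_url: https://threatq.example.com
threatq_client_id: example-client-id
threatq_client_secret: example-client-secret
log_level: INFO
```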

data_output

True enables automatic formatting of data returned by the command’s run() method. If False, the other data-output-formatting related attributes described here (except data_output_command_errors) are ignored.

Type

bool

data_output_command_errors

True enables automatic formatting of either a CommandError’s original_exception, if applicable, or a CommandError itself as data (see CommandError.as_data()) for a CommandError raised by the command’s run() method. If False, the CommandError’s msg attribute is echoed. Enabling this is useful for commands that are expected to be utilized programmatically, as the data format for stdout and stderr can match.

Type

bool

data_output_formats

A list of data output format names to be supported by the command. If more than one is listed, a --format parameter is implicitly added to the command’s CLI to allow callers to choose. These names refer to formats that have been registered on, or inherited by, the command class (see the register_data_output_format() decorator). Built-in formats registered on this base class are:

  • json (which automatically implies the availability of json-pretty, as well)

  • json-pretty (multiline JSON with four-space indentation and sorted keys)

  • yaml

  • human (enables the capability of specifying an output template - not registered on this base class)

Type

tuple[str]
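The difference between the json and json-pretty formats can be sketched with the standard library, per the description above (four-space indentation and sorted keys for json-pretty):

```python
import json

data = {"b": 2, "a": 1}

compact = json.dumps(data)                           # "json"
pretty = json.dumps(data, indent=4, sort_keys=True)  # "json-pretty"

print(compact)  # {"b": 2, "a": 1}
print(pretty)
```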

default_data_output_format

specifies the default data format name (visible in --help output). If None, the first member of data_output_formats becomes the default.

Type

str, optional

human_data_output_template

specifies the template to use if human output format is requested.

Type

str, optional

data_output_format_descriptor

an object exposing details about the active data output format.

Type

DataOutputFormatDescriptor

Note

config_param_sets and all data-output-related attributes are currently supported only on the root command object, e.g. the top-level “group” object if the command has sub-commands.

async format_data(data, fmt: Optional[str] = None)

Returns the result of formatting the specified data.

Parameters
  • data – The data structure to be formatted.

  • fmt (str, optional) – the name of the data output format to use in formatting the data before echoing. See register_data_output_format(). Defaults to the format specified by default_data_output_format.

Raises

KeyError – indicates that no data output format matching the passed fmt identifier was registered.

make_command() → click.core.Command

Generates a click command object based on the attributes of the class. User-facing commands are generally exposed by following the class definition with the pattern my_command = MyCommand().make_command() and referencing this object’s location in a console_scripts entry point in the package’s setup.py.

async output_data(data, fmt: Optional[str] = None, **kwargs)

Formats the passed data and echoes it. Arguments and semantics are as with format_data(), with the addition of all keyword arguments supported by click.secho(), which will be passed through to it.

classmethod register_data_output_format(name: str, *, implies: Iterable[str] = (), default: Optional[str] = None, seq: bool = False, pass_ctx: bool = False)

This decorator method is applied to a data-output-formatter function to register it as an available format for the class and its subclasses. The decorated function will be passed the output data as its only argument (usually - see pass_ctx) and is expected to return a string for screen output.

Parameters
  • name – The name of the format.

  • implies – Optional list of additional formats that should be implicitly exposed whenever this format is exposed in data_output_formats.

  • default – An alternative default format that is to be preferred whenever this format is set as default. An example of this is the built-in json format: json-pretty is always the preferred default format when json is specified (so compact JSON output always requires an explicit --format json argument). The rationale here is that defaults should always be oriented toward humans as much as possible, as machines are generally more patient about having to pass explicit arguments.

  • seq – Indicates that the output formatter is intended for sequenced output, i.e. that in most cases the command code should call output_data() as it progresses rather than simply returning data for output upon completion. This is an advertisement of intent for the command code to inspect via self.data_output_format_descriptor.seq, and is not enforced (both methods of data output are available regardless of this flag).

  • pass_ctx – If true, the Pynoceros execution context will additionally be passed to the formatter function via the ctx keyword argument. Few formatter functions need this, but all of the resources created by Pynoceros, including the configuration object itself, are exposed on this context if necessary.
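The registration mechanism can be sketched as a class-level registry populated by a decorator. This is a simplified stand-in, not the library's implementation (it omits seq, pass_ctx, and inheritance handling):

```python
import json

class CommandBase:
    _formats = {}  # name -> {"fn": formatter, "implies": (...)}

    @classmethod
    def register_data_output_format(cls, name, *, implies=()):
        def decorator(fn):
            # Record the formatter function under its format name.
            cls._formats[name] = {"fn": fn, "implies": tuple(implies)}
            return fn
        return decorator

    @classmethod
    def format_data(cls, data, fmt):
        return cls._formats[fmt]["fn"](data)

@CommandBase.register_data_output_format("json", implies=("json-pretty",))
def fmt_json(data):
    return json.dumps(data)

@CommandBase.register_data_output_format("json-pretty")
def fmt_json_pretty(data):
    return json.dumps(data, indent=4, sort_keys=True)

print(CommandBase.format_data({"a": 1}, "json"))  # {"a": 1}
```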

class threatq.core.tools.base.PynocerosCommand

Bases: threatq.core.tools.base.PynocerosCommandBase

This class should be subclassed to define an executable (sub-)command.

epilog

If present, this text is appended at the bottom of --help output.

Type

str, optional

short_help

More concise help text to be used in a sub-command list in --help output when this command is part of a group.

Type

str, optional

click_cls

alias of click.core.Command

abstract async run(*args)

This asynchronous coroutine method encapsulates the primary functionality of this class. It must be overridden to implement the code that should actually be executed when running the command. It will receive all passed CLI arguments defined in params or those of any PynocerosCommandGroup objects owning it. Within its body, Pynoceros’ execution context object will be available as self.ctx, with the configuration object at self.ctx.config, as well as all other available resources.

Any text output intended for human consumption should normally be output directly (click utilities such as echo(), secho(), style(), and progressbar() are excellent for these purposes). This allows incremental output, which is appreciated by humans, especially during longer-running commands.

Machine output, on the other hand, usually requires the completion of a data structure for serialization. These needs are fulfilled by the built-in support for data output formatting (described in more detail in PynocerosCommandBase). If this support is enabled, run() need only return a data structure, which will be formatted as determined from class defaults and user CLI specifications.

Note

Returning data for formatting does not preclude incremental/progress output for humans during the execution of run(), which can, in fact, be very beneficial. Be aware that all such output should go to stderr instead of stdout so as not to interfere with parsing of the data output by any receiving process. If using click.echo() or click.secho() for output, this is a simple matter of passing err=True.
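The stdout/stderr split can be illustrated with the standard library (click's echo(..., err=True) accomplishes the same thing; the run body here is a made-up example):

```python
import sys

def run():
    # Human-oriented progress goes to stderr...
    print("processing 3 records...", file=sys.stderr)
    # ...while the returned data structure is formatted to stdout,
    # where a receiving process can parse it cleanly.
    return {"count": 3}
```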

Errors should be handled by raising a CommandError or a custom subclass of it. See its documentation for more.

Returns: any data structure intended for formatted output, or None.

class threatq.core.tools.base.PynocerosCommandGroup

Bases: threatq.core.tools.base.PynocerosCommandBase

This allows the definition of a sub-command oriented CLI in the style of git. Typical usage is easiest to illustrate with an example:

class MainCommand(PynocerosCommandGroup):
    params = (...)

@MainCommand.command
class SubCommand1(PynocerosCommand):
    params = (...)

    async def run(self, ...):
        ...

@MainCommand.command
class SubCommand2(PynocerosCommand):
    params = (...)

    async def run(self, ...):
        ...

main_command = MainCommand().make_command()
command_classes

a list of all sub-commands (and sub-groups) contained within the group.

Type

list[PynocerosCommandBase]

click_cls

alias of click.core.Group

classmethod command(command_cls: Type[threatq.core.tools.base.PynocerosCommandBase]) → Type[threatq.core.tools.base.PynocerosCommandBase]

This decorator should be applied to other command classes to add them to the group. See the class documentation above for an example. Note that these classes are usually PynocerosCommand subclasses but can, in fact, be other PynocerosCommandGroup subclasses, if a multilevel hierarchy of sub-commands is desired.

exception threatq.core.tools.base.CommandError(msg: Optional[str] = None, exit_code: Optional[int] = None, *, secho_kwargs: Optional[Mapping] = None, original_exception: Optional[BaseException] = None)

This class (and any custom subclasses) represents a runtime error condition that the command code has detected and needs to present to the end user, but that is not a bug in the command code itself. Raising it from inside a run() coroutine will cause its error message to be echoed to the user (via stderr by default), and the process to exit with the specified exit code. As it is intended to be seen by the end user, and explicitly does not indicate a bug, no stack trace is produced. The error message should be sufficiently explanatory.

Note

In the case that an exit with a specific exit code is desired, but echoing an error message is not, CommandError can be raised thusly: raise CommandError('', exit_code=5)

Note

Any other exception type that is raised is considered to be an unhandled error that does indicate a bug. It therefore triggers a process exit accompanied by a stack trace, as per normal Python behavior.

Parameters
  • msg (str) – Error text for presentation to the user. The default value is a generic message.

  • exit_code (int) – The exit code to be returned at process exit.

  • secho_kwargs (dict) – The error message is echoed via click.secho(). Any keyword arguments specified in secho_kwargs will be passed to it. The default results in red text echoed to stderr.

  • original_exception (BaseException) – An exception instance. If the CommandError is being raised due to another expected exception, the original exception can be accessed from where the CommandError is caught by accessing its original_exception attribute.

It’s often a good practice to create subclasses of CommandError for specific error conditions. In such subclasses, defaults for all of the above arguments can be specified using class attributes of the same names.

as_data() → dict

Parses either the instance’s original_exception, if applicable, or the instance itself. Returns a dict containing the following keys:

  • type (str): the class name of the chosen exception instance

  • msg (str): the result of friendly_str if the chosen exception has a friendly_str method, else the result of the exception’s __str__ method
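A hedged sketch of this behavior (a simplified stand-in, not the actual class) might look like:

```python
class CommandError(Exception):
    """Simplified stand-in illustrating as_data(); the real class also
    carries exit_code and secho_kwargs."""

    def __init__(self, msg="Command failed", *, original_exception=None):
        super().__init__(msg)
        self.msg = msg
        self.original_exception = original_exception

    def as_data(self):
        # Prefer the wrapped original exception when one was given.
        exc = self.original_exception or self
        friendly = getattr(exc, "friendly_str", None)
        msg = friendly() if callable(friendly) else str(exc)
        return {"type": type(exc).__name__, "msg": msg}

err = CommandError("lookup failed", original_exception=ValueError("bad id"))
print(err.as_data())  # {'type': 'ValueError', 'msg': 'bad id'}
```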

API Documentation

Here you can find API documentation for Pynoceros modules.

Feeds

Base Feed Classes

class threatq.dynamo.feeds.BaseFeed(*args, **kwargs)

Bases: object

Base class for feeds.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)

run_class

alias of threatq.dynamo.feeds.common.FeedRun

class threatq.dynamo.feeds.base.DynamoFeedElement(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.core.lib.utils.ErrorManaging, threatq.dynamo.feeds.base.FeedTemplateContextMixin, threatq.dynamo.base.DynamoElement

Base class for a dynamo feed element, e.g. source, publisher, filters.

class threatq.dynamo.feeds.base.FeedTemplateContextMixin(*args, tmpl_contexts: Sequence[Mapping] = (), **kwargs)

Bases: object

Class used for creating a feed template context with mix-in classes. Context(s) are “hubs” through which resources are shared between components.

__init__(*args, tmpl_contexts: Sequence[Mapping] = (), **kwargs)

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)

Feed Types

class threatq.dynamo.feeds.Feed(connector: threatq.core.lib.connectors.Connector, ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.templates.TemplateRenderMixin, threatq.dynamo.feeds.BaseFeed

Class used for creating a feed. There are four different types of feeds that can be created: Primary (default), Supplemental, Fulfillment, Action.

Parameters
  • connector (Connector) – the connector instance for the feed

  • ctx (Context) – Context instance.

__init__(connector: threatq.core.lib.connectors.Connector, ctx: threatq.core.lib.asphalt.Context)

Initialize self. See help(type(self)) for accurate signature.

__str__() → str

Return str(self).

class threatq.dynamo.feeds.SupplementalFeed(parent_feed: threatq.dynamo.feeds.Feed, name: str, definition: Mapping)

Bases: threatq.core.lib.asphalt.ThreatQComponentMixin, threatq.core.lib.asphalt.ContextRefMixin, threatq.core.lib.templates.TemplateRenderMixin, threatq.dynamo.feeds.BaseFeed

Supplemental feeds work functionally the same as primary feeds, but are used for fetching data related to a parent feed. They cannot be run independently, are not listed in the UI, and cannot be externally triggered.

Parameters
  • parent_feed (Feed) – The feed’s parent

  • name (str) – The name of the feed

  • definition (t.Mapping) – the feed’s definition

__init__(parent_feed: threatq.dynamo.feeds.Feed, name: str, definition: Mapping)

Initialize self. See help(type(self)) for accurate signature.

run_class

alias of threatq.dynamo.feeds.common.SupplementalFeedRun

class threatq.dynamo.feeds.NonAPIUpdatingFeed(connector: threatq.core.lib.connectors.Connector, ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.dynamo.feeds.Feed

Feeds that do not update the API or have a FulfillmentManager.

Parameters
  • connector (Connector) – the connector instance for the feed

  • ctx (Context) – Context instance.

__init__(connector: threatq.core.lib.connectors.Connector, ctx: threatq.core.lib.asphalt.Context)

Initialize self. See help(type(self)) for accurate signature.

Feed Runs

class threatq.dynamo.feeds.common.FeedRun(feed: threatq.dynamo.feeds.BaseFeed, trigger_type: str, *, uuid: Optional[uuid.UUID] = None, since: Optional[Union[arrow.arrow.Arrow, str]] = None, until: Optional[Union[arrow.arrow.Arrow, str]] = None, source: Union[bool, Any] = True, source_request_recorder: Optional[Union[bool, threatq.dynamo.feeds.sources.base.RequestRecorder]] = True, stages: Optional[Iterable[Union[str, threatq.core.lib.pipeline.PipelineSegment]]] = None, lock: Optional[asyncio.locks.Lock] = None, **kwargs)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.dynamo.feeds.base.FeedTemplateContextMixin, threatq.core.lib.templates.TemplateRenderMixin

Class for processing a feed run.

Parameters

ctx (threatq.core.lib.asphalt.Context) – The feed’s context object

__eq__(other: threatq.dynamo.feeds.common.FeedRun) → bool

Return self==value.

__hash__()

Return hash(self).

__init__(feed: threatq.dynamo.feeds.BaseFeed, trigger_type: str, *, uuid: Optional[uuid.UUID] = None, since: Optional[Union[arrow.arrow.Arrow, str]] = None, until: Optional[Union[arrow.arrow.Arrow, str]] = None, source: Union[bool, Any] = True, source_request_recorder: Optional[Union[bool, threatq.dynamo.feeds.sources.base.RequestRecorder]] = True, stages: Optional[Iterable[Union[str, threatq.core.lib.pipeline.PipelineSegment]]] = None, lock: Optional[asyncio.locks.Lock] = None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

__repr__() → str

Return repr(self).

ack_milestone(milestone, *, notify_api=True)

Validate and process the given milestone for the feed run.

Parameters
  • milestone (str) – The milestone’s name

  • notify_api (bool) – Whether or not to send a message on the context’s MessageManager

kill(failed: bool = True)

Kill the FeedRun. By default, the killed FeedRun is also marked as failed.

Parameters

failed (bool) – Optional, defaults to True. Whether the FeedRun being killed failed.

new_pipeline_segment(stage: Union[str, threatq.core.lib.pipeline.PipelineSegment])

Using the given stage, create a new segment for the pipeline.

Parameters

stage – The feed’s stage name as a string or PipelineSegment

new_pipeline_source(source: Union[bool, Any] = True)

Add a new pipeline source or set the pipeline source.

Parameters

source – If True, create a new source; otherwise, set the pipeline source

notify(event, description: Optional[str] = None, *, notify_api=True)

Adds a message to the pipeline.

Parameters
  • event – an object to be converted into a pipeline message

  • description (str) – additional information about the message

  • notify_api (bool) – Whether or not to send a message on the context’s MessageManager

notify_err(descriptor: Optional[Union[Exception, str]] = None)

Adds an error level message to the pipeline.

Parameters

descriptor – additional information about the exception or error

class threatq.dynamo.feeds.common.SupplementalFeedRun(feed: threatq.dynamo.feeds.BaseFeed, *, parent: Optional[threatq.dynamo.feeds.common.FeedRun] = None, trigger_type: str = 'supplemental', run_params: Mapping = {}, **kwargs)

Bases: threatq.dynamo.feeds.common.FeedRun

Class for processing a feed run of a supplemental type.

Parameters
  • feed (BaseFeed) – The supplemental feed

  • parent (FeedRun) – The parent of the feed

  • trigger_type (str) – how the feed was started, e.g. manual, scheduled, supplemental

__init__(feed: threatq.dynamo.feeds.BaseFeed, *, parent: Optional[threatq.dynamo.feeds.common.FeedRun] = None, trigger_type: str = 'supplemental', run_params: Mapping = {}, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

notify(event, description: Optional[str] = None, *, notify_api=False)

Adds a message to the pipeline.

Parameters
  • event – an object to be converted into a pipeline message

  • description (str) – additional information about the message

  • notify_api (bool) – Whether or not to send a message on the context’s MessageManager

Feed Components

class threatq.dynamo.feeds.FeedManager(logname: Optional[str] = None)

Bases: threatq.core.lib.asphalt.Component, threatq.core.lib.asphalt.ContextRefMixin

A feed manager manages a feed’s run schedule.

Parameters

logname (str) – The supplemental feed’s log name

__init__(logname: Optional[str] = None)

Initialize self. See help(type(self)) for accurate signature.

async start(ctx: threatq.core.lib.asphalt.Context)

Perform any necessary tasks to start the services provided by this component.

In this method, components typically use the context to:
  • add resources and/or resource factories to it (add_resource() and add_resource_factory())

  • request resources from it asynchronously (request_resource())

It is advisable for Components to first add all the resources they can to the context before requesting any from it. This will speed up the dependency resolution and prevent deadlocks.

Parameters

ctx – the containing context for this component

async workflow_fetch_manual_on_event(event: threatq.core.lib.messaging.base.MessageEvent)None

Handle workflow fetch manual events from the dynamo_control queue.

This function is called when a signal is received indicating that a manual workflow run has been requested. Based on the workflow IDs that it receives as part of the event message, it creates new workflow run objects (FeedRun) that use a different source than the one defined. That new source, used for all of the workflows, is determined by the single API query that is passed as part of the event message. The workflows are started concurrently and awaited until they all complete.

Parameters

event – event containing message data passed for the “workflow fetch manual” message type

Raises

ValueError – when the event message does not contain the required workflow IDs or API query

class threatq.dynamo.feeds.FeedRunScheduler(ctx: threatq.core.lib.asphalt.Context, logname: Optional[str] = None)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.asphalt.ContextRefMixin

A feed run scheduler runs a feed and determines when it should run again in the future. The scheduler sleeps until there is a new feed run or a feed run has completed.

Parameters
  • ctx (Context) – Context instance.

  • logname (str) – The supplemental feed’s log name

__init__(ctx: threatq.core.lib.asphalt.Context, logname: Optional[str] = None)

Initialize self. See help(type(self)) for accurate signature.

class threatq.dynamo.feeds.FulfillmentManager(feed: threatq.dynamo.feeds.Feed, name: str)

Bases: threatq.core.lib.asphalt.ContextRefMixin, threatq.core.lib.logging.InstanceLoggingMixin

A fulfillment manager handles a feed that is not listed in the UI and cannot be directly triggered.

Parameters
  • feed (Feed) – Feed to be run

  • name (str) – The fulfillment feed’s name

__init__(feed: threatq.dynamo.feeds.Feed, name: str)

Initialize self. See help(type(self)) for accurate signature.

Feed Segments

class threatq.dynamo.feeds.common.FeedRunPipelineSegment(feed_run: threatq.dynamo.feeds.common.FeedRun, *args, **kwargs)

Bases: threatq.core.lib.pipeline.PipelineSegment

Base class for a feed run pipeline segment

Parameters

feed_run (FeedRun) – The feed run of the pipeline segment

__init__(feed_run: threatq.dynamo.feeds.common.FeedRun, *args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

class threatq.dynamo.feeds.common.FeedRunFirstSegment(feed_run: threatq.dynamo.feeds.common.FeedRun, *args, **kwargs)

Bases: threatq.dynamo.feeds.common.FeedRunPipelineSegment

A pipeline segment to be added as the first segment in a feed run.

handle_message(message: threatq.core.lib.pipeline.PipelineMessage)

Handle a received message. It can be overridden to provide specific handling, but in the common case it is desirable to pass messages through to the next processor, which is what the default implementation does.

Note

As above, any exceptions raised are ignored.

Parameters

message – the received message

Returns

Any object.

class threatq.dynamo.feeds.common.FeedRunPrePublisher(feed_run: threatq.dynamo.feeds.common.FeedRun, *args, **kwargs)

Bases: threatq.dynamo.feeds.common.FeedRunPipelineSegment

A pipeline segment to be added before starting the publish stage.

handle_value(value)

Handle a received value. The default implementation simply returns the original value so that it can be transparently forwarded, but it is normally overridden.

Note

Any exceptions raised are ignored, so must be appropriately and fully handled within this method.

Parameters

value – the received value

Returns

Any object.

Filters

Filter Overview

The following Filters are available for use within the filters section of a CDF definition.

Base Filters

class threatq.dynamo.feeds.filters.base.Filter(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.base.DynamoFeedElement

Base filter class.

__call__(items: AsyncIterable)

Asynchronous generator call method of a Filter. Will call the filter’s apply() method before yielding results.

Parameters

items – an asynchronous generator yielding a value ancestry tuple for each value to be processed. Pipeline messages can be interspersed with these ancestries - they are passed through verbatim.

Yields

values yielded by the filter’s apply() method, interspersed with any received pipeline messages

property binary

Property representing whether this filter strictly requires its incoming value to be binary. These values are introspected by FeedFilters in order to inform a TQFeed or TQFilter run as to which file opening mode should be passed into the IOSource.

on_finish()

This hook may be defined by subclasses needing to be notified upon exhaustion of source values. It can be a function, a coroutine, or an asynchronous generator. Any values or pipeline messages returned (except None) or yielded will be sent through the rest of the filter chain.

class threatq.dynamo.feeds.filters.base.TransformFilter(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.Filter

Simple filter which transforms one value to another.

abstract apply(value_ancestry: Tuple[Any, ...]) → AsyncIterator[Any]

An asynchronous generator that wraps transform(), calling it as necessary.

Override as appropriate for each subclass so that results from transform() are presented as an asynchronous generator. This allows transform() itself to be a simpler implementation to ease creation of filters.

Parameters

value_ancestry – a tuple containing the value being processed as its first item (often referred to as simply “the value” in these documents), and the value that was passed into this filter’s parent filter, grandparent filter, etc., if any. This is referred to as a “value ancestry” or “value ancestry tuple” elsewhere.

call_transform(value_ancestry: Tuple[Any, ...], **kwargs)

The transform methods of a TransformFilter should normally be called via this wrapper. It provides calling behavior that is adaptive to the signature of the transform, such that those calling for certain parameters will have them calculated and passed automatically:

  • args: The filter’s arguments will be traversed, with any Template and TemplateExpression objects pre-rendered, and then passed.

  • value_ancestry: the value ancestry corresponding to the value being processed. This is convenient, for example, when the transform calls the apply() method of another filter.

  • parent_values: similar to value_ancestry, but with only the parent members (i.e. the first value, the current value, is omitted). A common case where this is needed is when the filter does any template rendering of data structures sourced from a definition, as they may reference parent_values.

Parameters
  • value_ancestry – a value ancestry tuple

  • **kwargs – optional keyword arguments to be passed through to the transform.

Returns

The resulting value from the filter
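The signature-adaptive calling behavior can be sketched with inspect. This is an illustration of the idea only (the real call_transform() also handles template rendering of args), and the transforms here are made-up examples:

```python
import inspect

def call_transform(transform, value_ancestry, **kwargs):
    # Pass value_ancestry / parent_values only if the transform asks for them.
    params = inspect.signature(transform).parameters
    if "value_ancestry" in params:
        kwargs["value_ancestry"] = value_ancestry
    if "parent_values" in params:
        kwargs["parent_values"] = value_ancestry[1:]
    return transform(value_ancestry[0], **kwargs)

def upper(value):
    return value.upper()

def tag(value, parent_values):
    return f"{value} (from {parent_values[0]})"

print(call_transform(upper, ("abc", "raw")))  # ABC
print(call_transform(tag, ("abc", "raw")))    # abc (from raw)
```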

abstract transform(value: Any) → Any

Receive a single value and return a transformed value.

Parameters

value – the value being processed.

Note

See call_transform() for information on additional parameters overrides may support.

class threatq.dynamo.feeds.filters.base.FunctionFilter(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.TransformFilter

Simple filter which calls transform() as an async_generator.

apply(value_ancestry: Tuple[Any, ...])

This override is appropriate for a transform() that is a simple function or a coroutine.

Parameters

value_ancestry – a value ancestry tuple

class threatq.dynamo.feeds.filters.base.AsyncGeneratorFilter(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.TransformFilter

Filter which allows for transform() methods that are async_generators.

apply(value_ancestry: Tuple[Any, ...])

This override is appropriate for a transform() that is an asynchronous generator.

Parameters

value_ancestry – a value ancestry tuple

abstract transform(value: Any) → AsyncIterator[Any]

This provides for yielding multiple results for a single transformed value. Semantics are the same as TransformFilter.transform(), except that it should be overridden with an asynchronous generator yielding results instead of returning one.
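A minimal sketch of such a transform (a made-up example, not a shipped filter) shows one incoming value yielding several results:

```python
import asyncio

async def transform(value):
    # Async generator: yield one result per non-empty line of the value.
    for line in value.splitlines():
        if line.strip():
            yield line.strip()

async def collect():
    return [item async for item in transform("a\n\n b \nc")]

print(asyncio.run(collect()))  # ['a', 'b', 'c']
```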

class threatq.dynamo.feeds.filters.base.ConditionalFunctionFilter(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.base._ConditionalFilterMixin, threatq.dynamo.feeds.filters.base.FunctionFilter

Filter which calculates a condition attribute before transforming, calling the transform() method if the condition resolves to true and negative_transform() otherwise.

Parameters

condition (TemplateExpression) – condition on which the ConditionalFunctionFilter will transform.

condition

condition on which the ConditionalFunctionFilter will transform.

Type

TemplateExpression

negative_transform(value: Any)Any

Transform method used in place of transform() if the condition is not satisfied. Semantics are as with transform().

Chain Filter

The Chain Filter allows definition writers to group multiple filters together and treat them as a single filter.

class threatq.dynamo.feeds.filters.chain.Chain(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.Filter

This is an aggregating filter that accepts a chain of subfilters. Running it is the same as running the value(s) through the subfilter chain in succession. For information on using this filter in a definition, see Chain Filter.

apply(value_ancestry: Tuple[Any, ...]) → collections.abc.AsyncIterable

Apply the chain filter. Runs each filter specified via the filters arg and returns the result to the filter chain.

Parameters

value_ancestry (ValueAncestry) – Value ancestry for this Chain filter.

Returns

Async generator of result values.

Return type

AsyncIterable

property binary

Property representing whether this filter strictly requires its incoming value to be binary. These values are introspected by FeedFilters in order to inform a TQFeed or TQFilter run as to which file opening mode should be passed into the IOSource.

entry_points = ('chain',)
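The chain's semantics can be sketched in plain Python (a hypothetical helper, not the actual Filter API): each sub-filter may yield zero or more results, and every result is fed to the next sub-filter in succession.

```python
def run_chain(filters, value):
    """Feed a value through a list of sub-filters in succession.

    Each sub-filter is modeled as a callable returning an iterable of
    results; every result becomes an input to the next sub-filter.
    """
    values = [value]
    for f in filters:
        values = [out for v in values for out in f(v)]
    return values

# Two toy sub-filters: add one, then fan out into double and triple.
results = run_chain([lambda v: [v + 1], lambda v: [v * 2, v * 3]], 1)
# results == [4, 6]
```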

Compression Filters

Filters for compressing or decompressing data for various file formats.

class threatq.dynamo.feeds.filters.compression.Gunzip(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter accepts a StreamReader or bytes containing gzipped data and returns the uncompressed data as a bytearray. For information on using this filter in a definition, see Gunzip Filter.

entry_points = ('gunzip',)
async transform(value: Union[aiohttp.streams.StreamReader, bytes], args: Mapping) → bytearray

Transforms the gzipped data contained in the incoming StreamReader or bytes into a bytearray. If the bytearray is meant to represent a str, pass the returned bytearray into the Str filter.

Parameters
  • value (StreamReader | bytes) – StreamReader or bytes containing gzipped data

  • args (MutableMapping) – Mapping of field arguments for this filter

Returns

The uncompressed data

Return type

bytearray
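The decompression step can be approximated with the stdlib gzip module (a synchronous sketch only; the real filter also accepts an aiohttp StreamReader):

```python
import gzip

def gunzip_sketch(data: bytes) -> bytearray:
    """Decompress gzipped bytes into a bytearray."""
    return bytearray(gzip.decompress(data))

compressed = gzip.compress(b"hello feed data")
assert gunzip_sketch(compressed) == bytearray(b"hello feed data")
```

As noted above, if the result represents text, it would still need to be decoded (in a definition, via the Str filter).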

Data Structure Filters

Filters that apply or modify various data structure objects.

class threatq.dynamo.feeds.filters.data_structures.Deduplicate(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter will deduplicate a list of objects. For information on using this filter in a definition, see Dedupe Filter.

entry_points = ('dedupe',)
transform(value: Iterable)

Deduplicate an incoming list of values.

Parameters

value (Iterable) – an iterable value

Returns

value having been deduplicated

Return type

list
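The behavior can be sketched as order-preserving deduplication (whether the actual filter preserves order or handles unhashable members is not documented here; this sketch assumes hashable values):

```python
def dedupe_sketch(values):
    """Remove duplicates while keeping first-seen order."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

# dedupe_sketch(["a", "b", "a", "c", "b"]) == ["a", "b", "c"]
```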

class threatq.dynamo.feeds.filters.data_structures.Each(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.Filter

This filter is configured with a sub-filter, (which may be a chain), which it applies to each member of an iterable value. For information on using this filter in a definition, see Each Filter.

apply(value_ancestry: Tuple[Any, ...])

For each item in the current value, (value_ancestry[0]), apply the specified sub-filter. The current value should usually be a list, though any iterable value can be looped over.

Parameters

value_ancestry (ValueAncestry) – Tuple of filter chain ancestry values.

Returns

The value transformed by having the specified sub-filter applied to each of its members.

Return type

Any

entry_points = ('each',)
class threatq.dynamo.feeds.filters.data_structures.EachValue(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.data_structures.Each

Like Each, but applies the specified sub-filter to each value in an incoming dictionary value. For information on using this filter in a definition, see Each Value Filter.

apply(value_ancestry: Tuple[Any, ...])

For each key/value pair in the current value, (value_ancestry[0]), apply the specified sub-filter to the value. The keys of the current value will be maintained. The incoming value should be a dictionary.

Parameters

value_ancestry (ValueAncestry) – Tuple of filter chain ancestry values.

Returns

The dictionary value transformed by having the specified sub-filter applied to each of its values.

Return type

ValueAncestry

entry_points = ('each-value',)
class threatq.dynamo.feeds.filters.data_structures.Enumerate(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.AsyncGeneratorFilter

This filter accepts an iterable and yields an index, value pair for each of its values. The index is the corresponding key if the input is a mapping (similar to items()), or otherwise an integer corresponding to the value’s position, starting at zero (similar to enumerate()). For lists in particular, this is the same as the value’s index in the list. For information on using this filter in a definition, see Enumerate Filter.

entry_points = ('enumerate', 'items', 'mapping-pairs')
transform(value: Iterable)

This provides for yielding multiple results for a single transformed value. Semantics are the same as TransformFilter.transform(), except that it should be overridden with an asynchronous generator yielding results instead of returning one.
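The yielded pairs can be sketched with the stdlib enumerate() and dict.items() (a plain-Python illustration, not the async filter itself):

```python
def enumerate_sketch(value):
    """Yield (index, value) pairs: keys for mappings, positions otherwise."""
    if isinstance(value, dict):
        yield from value.items()
    else:
        yield from enumerate(value)

assert list(enumerate_sketch(["x", "y"])) == [(0, "x"), (1, "y")]
assert list(enumerate_sketch({"a": 1})) == [("a", 1)]
```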

class threatq.dynamo.feeds.filters.data_structures.FilterMapping(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.Filter

This filter is configured with a dictionary mapping of field names keyed to sub-filters. It transforms incoming mappings by applying to each member value the corresponding filter. Any members of the mapping that have no filter specified are left untouched. The Filter Mapping Filter must be configured with exactly one filter per field, though that filter may be a Chain. For information on using this filter in a definition, see Filter Mapping Filter.

apply(value_ancestry: Tuple[Any, ...])

For each specified key on the incoming value, apply the specified sub-filter to said key’s value.

Parameters

value_ancestry (ValueAncestry) – Tuple of filter chain ancestry values.

Returns

The value transformed by having the specified sub-filters applied to the specified keys.

Return type

Any

entry_points = ('filter-mapping',)
class threatq.dynamo.feeds.filters.data_structures.Flatten(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter flattens out nested lists or generators. It takes a parameter of depth (defaulting to infinite) that will limit how deep the flattening should go. For information on using this filter in a definition, see Flatten Filter.

entry_points = ('flatten',)
transform(value: Iterable, args: Mapping, depth=None) → Iterable

Flatten an incoming list or iterable value.

Parameters
  • value (Iterable) – Incoming iterable value.

  • args (MutableMapping) – Mapping of field arguments for this filter

  • depth (int | float) – Depth level that the flattening should go to. Defaults to infinite.

Returns

The list/iterable value now flattened.

Return type

Iterable
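The depth-limited flattening can be sketched recursively (an illustration of the semantics described above, assuming nesting via lists):

```python
import math

def flatten_sketch(value, depth=math.inf):
    """Flatten nested lists up to `depth` levels (default: unlimited)."""
    for item in value:
        if isinstance(item, list) and depth > 0:
            yield from flatten_sketch(item, depth - 1)
        else:
            yield item

assert list(flatten_sketch([1, [2, [3, [4]]]])) == [1, 2, 3, 4]
assert list(flatten_sketch([1, [2, [3, [4]]]], depth=1)) == [1, 2, [3, [4]]]
```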

class threatq.dynamo.feeds.filters.data_structures.ListItems(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter accepts a dict and returns a list of tuples containing the dict’s key-value pairs. For information on using this filter in a definition, see List Items Filter.

entry_points = ('list-items',)
transform(value: dict) → List[Tuple[Any, Any]]

Transforms the incoming dictionary into a list of tuples containing the dictionary’s key-value pairs.

Parameters

value (dict) – Incoming dictionary value

Returns

The dictionary’s key-value pairs

Return type

list

class threatq.dynamo.feeds.filters.data_structures.ListKeys(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter accepts a dict and returns a list of the dict’s keys. For information on using this filter in a definition, see List Keys Filter.

entry_points = ('list-keys',)
transform(value: dict) → List[Any]

Transforms the incoming dictionary into a list of the dictionary’s keys.

Parameters

value (dict) – Incoming dictionary value

Returns

The dictionary’s keys

Return type

list

class threatq.dynamo.feeds.filters.data_structures.ListValues(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter accepts a dict and returns a list of the dict’s values. For information on using this filter in a definition, see List Values Filter.

entry_points = ('list-values',)
transform(value: Mapping) → List[Any]

Transforms the incoming dictionary into a list of the dictionary’s values.

Parameters

value (dict) – Incoming dictionary value

Returns

The dictionary’s values

Return type

list

class threatq.dynamo.feeds.filters.data_structures.MapItems(*field_names, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter is configured with a list of field names and transforms an incoming iterable value into a dictionary, mapping a field name to a member of the incoming iterable value, in order. For information on using this filter in a definition, see Map Items Filter.

Parameters

field_names (Iterable) – Optional, list of field names to use as the keys of the transformed mapping. The order of this list dictates which field name is mapped to which member of the incoming iterable value. If not provided, this filter’s field_names will be set to the value passed to the first invocation of its transform() method.

entry_points = ('map-items',)
transform(value: Iterable) → Optional[Mapping]

Transforms the incoming iterable value into a dictionary, mapping a field name to a member of the incoming iterable value, in order. If field_names was not set in this filter’s constructor, then the incoming value is set as the list of field names to be used by subsequent calls to this method.

Parameters

value (Iterable) – Incoming value.

Returns

Transformed value, if field_names were set prior to this call.

Return type

MutableMapping
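The core mapping step can be sketched with zip() (a plain-Python illustration; the field names and values here are hypothetical):

```python
def map_items_sketch(field_names, values):
    """Map field names onto the members of an iterable, in order."""
    return dict(zip(field_names, values))

row = map_items_sketch(["indicator", "type"], ["1.2.3.4", "IP Address"])
# row == {"indicator": "1.2.3.4", "type": "IP Address"}
```

This mirrors the common CSV pattern described above: the first row supplies the field names, and each subsequent row is mapped onto them.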

class threatq.dynamo.feeds.filters.data_structures.New(*_value, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter is configured with either a positional argument or keyword arguments. It transforms an incoming value into a new value equal to the positional argument, or into a new dictionary built from the keyword arguments. The incoming value is passed as template context to the new value. For information on using this filter in a definition, see New Filter.

Parameters
  • _value (Any) – Positional argument(s)

  • kwargs (Any) – Keyword argument(s)

Raises
  • TypeError – Raised if more than one positional argument is supplied.

  • TypeError – Raised if both positional and keyword arguments are supplied.

entry_points = ('new',)
transform(value: Any, args: Mapping)

Transform the incoming value into a new value as per configuration.

Parameters
  • value (Any) – Incoming value.

  • args (MutableMapping) – Configuration arguments.

Returns

Transformed value.

Return type

Any

class threatq.dynamo.feeds.filters.data_structures.UnsetKey(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter removes a key/value pair from a dictionary value. For information on using this filter in a definition, see Unset Key Filter.

entry_points = ('unset-key',)
transform(value: MutableMapping, args: Mapping)

Pop target key out of the incoming dictionary value.

Parameters
  • value (Any) – Incoming value

  • args (MutableMapping) – Mapping of field arguments for this filter

Returns

The same value with the specified key removed.

Return type

MutableMapping

class threatq.dynamo.feeds.filters.data_structures.Zip(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter is configured with lists of values and transforms the input lists into a single list of zipped values. For information on using this filter in a definition, see Zip Filter.

transform(value: Any, args: Mapping) → Any

Receive a single value and return a transformed value.

Parameters

value – the value being processed.

Note

See call_transform() for information on additional parameters overrides may support.
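The zipping behavior can be sketched with the built-in zip() (an illustration of the semantics, not the filter's exact configuration handling):

```python
def zip_sketch(lists):
    """Combine several input lists into a single list of zipped tuples."""
    return list(zip(*lists))

assert zip_sketch([[1, 2, 3], ["a", "b", "c"]]) == [(1, "a"), (2, "b"), (3, "c")]
```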

Miscellaneous Filters

Filters that do not fit very well into the other categories.

class threatq.dynamo.feeds.filters.misc.Delay(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter simply introduces a delay, and is intended for debugging filter chains. For information on using this filter in a definition, see Delay Filter.

entry_points = ('delay',)
async transform(value: Any, args: Mapping) → Any

Introduce a delay and return the current value unchanged.

Parameters
  • value (Any) – Incoming value

  • args (MutableMapping) – Mapping of field arguments for this filter

Returns

The value unchanged

Return type

Any

class threatq.dynamo.feeds.filters.misc.Drop(condition=None, **kwargs)

Bases: threatq.dynamo.feeds.filters.misc.Log

This filter drops (and logs) incoming values. It would normally be used with a condition so that only specific values are dropped. Drop leverages the same field arguments as Log. For information on using this filter in a definition, see Drop Filter.

entry_points = ('drop',)
transform(value: Any, args: Mapping, parent_values)

Drop the current value from the filter chain by not returning. By default, the value is logged at level 5.

Parameters
  • value (Any) – Current value

  • args (MutableMapping) – Mapping of field arguments for this filter

  • parent_values (MutableMapping) – Mapping of parent_values

class threatq.dynamo.feeds.filters.misc.Fail(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

Raises an exception with the given message.

entry_points = ('fail',)
async transform(value: Any, args: Mapping) → Any

Raise an exception with the given message.

Parameters
  • value (Any) – Incoming value

  • args (MutableMapping) – Mapping of field arguments for this filter

Returns

The value unchanged

Return type

Any

class threatq.dynamo.feeds.filters.misc.Get(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter yields a specified member of each received value, attempting to find the member as an index for lists, a key for mappings, or an attribute for other object types. If the member is not found, the default is yielded instead if one is specified, otherwise the appropriate exception is raised (IndexError, KeyError, or AttributeError). For information on using this filter in a definition, see Get Filter.

entry_points = ('get',)
transform(value: object, args: Mapping)

Get the specified member out of value and return it. If specified, default will be returned if the target member is not found.

Parameters
  • value (list | object | dict) – Incoming dict, object, or list value

  • args (MutableMapping) – Mapping of field arguments for this filter

Returns

The target member if found in value; otherwise the default, if one is specified.

Return type

Any
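The lookup semantics can be sketched in plain Python (the sentinel and dispatch here are illustrative assumptions, not the filter's actual implementation):

```python
_MISSING = object()  # sentinel so None can be a valid default

def get_sketch(value, member, default=_MISSING):
    """Look up `member` as an index, key, or attribute, with optional default."""
    try:
        if isinstance(value, (list, tuple)):
            return value[member]       # may raise IndexError
        if isinstance(value, dict):
            return value[member]       # may raise KeyError
        return getattr(value, member)  # may raise AttributeError
    except (IndexError, KeyError, AttributeError):
        if default is _MISSING:
            raise
        return default

assert get_sketch({"a": 1}, "a") == 1
assert get_sketch([10, 20], 5, default=None) is None
```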

class threatq.dynamo.feeds.filters.misc.GetRunVar(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter will get a value that is globally available for a feed run. For information on using this filter in a definition, see Get Run Variable Filter.

transform(value: Any, args: Mapping)

Receive a single value and return a transformed value.

Parameters

value – the value being processed.

Note

See call_transform() for information on additional parameters overrides may support.

class threatq.dynamo.feeds.filters.misc.If(condition, filters, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.ConditionalFunctionFilter

This filter applies a given list of Filters as a Filter Chain if its condition evaluates to True. For information on using this filter in a definition, see If Filter.

entry_points = ('if',)
async transform(value: Any, value_ancestry: Tuple[Any]) → Any

If condition was true, transform the incoming value by applying the specified filters to it.

Parameters
  • value (Any) – Any incoming value.

  • value_ancestry (Tuple) – Tuple of Filter Chain ancestry values.

Returns

Transformed value result.

Return type

Any

class threatq.dynamo.feeds.filters.misc.InvokeConnector(filters, connector, condition=True, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.ConditionalFunctionFilter

This filter will invoke a feed and return the result. For information on using this filter in a definition, see Invoke Feed Filter.

adjust_return(result, ret_obj, value, parent_values, filters_result)

This method will adjust the return value of the nested connector based on the return value specified in the nested connector info. filters_result is the result of the prefilter chain, exposed here for future use.

entry_points = ('invoke-connector',)
async execute_prefilters(value_ancestry: Tuple[Any]) → Any

Filters which are used here are executed before the nested connector. This function encapsulates the application of these filters via wrapping with a chain filter.

Parameters

value_ancestry (Tuple) – Tuple of Filter Chain ancestry values.

Returns

Transformed value result.

Return type

Any

async run_nested_connector(nested_connector_info, value: Any, parent_values: Any)

This method runs the nested connector and returns the result.

async transform(value: object, value_ancestry: Tuple[Any], parent_values) → object

Receive a single value and return a transformed value.

Parameters

value – the value being processed.

Note

See call_transform() for information on additional parameters overrides may support.

class threatq.dynamo.feeds.filters.misc.Iterate(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.AsyncGeneratorFilter

This filter receives iterable values, then iterates them and yields each item individually. For information on using this filter in a definition, see Iterate Filter.

entry_points = ('iterate',)
transform(value: Iterable, args: Mapping) → AsyncIterator[Any]

Yield each item of an incoming iterable value individually.

Parameters
  • value (Iterable) – Any iterable value.

  • args (MutableMapping) – Mapping of field arguments for this filter

Returns

Async generator of values

Return type

AsyncIterator

class threatq.dynamo.feeds.filters.misc.Log(condition=None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.ConditionalFunctionFilter

Intended as a debugging filter, this simply logs a representation of each value, with its associated parent values, passing through it. The log level can be configured. If a condition is specified, the log is generated only if the value satisfies it. For information on using this filter in a definition, see Log Filter.

entry_points = ('log',)
transform(value: Any, args: Mapping, parent_values)

Log out the current value along with any additional information specified via the include field.

Parameters
  • value (Any) – Current value

  • args (MutableMapping) – Mapping of field arguments for this filter

  • parent_values (MutableMapping) – Mapping of parent_values

Returns

The current value unchanged

Return type

Any

class threatq.dynamo.feeds.filters.misc.Set(*values, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter receives key/value pairs and assigns the evaluated value expression to the given key on the value object passed through the filter chain. This filter can be used on value objects or dictionaries. For information on using this filter in a definition, see Set Filter.

entry_points = ('set',)
async transform(value: object, parent_values) → object

Transform the incoming dictionary or object value by setting the specified values to their respective keys as per the Set Filter’s configuration.

Parameters
  • value (dict | object) – Incoming value.

  • parent_values – Tuple of Filter Chain parent values.

Returns

The incoming value transformed by setting either the specified data or the result of a Supplemental Feed.

Return type

dict | object

class threatq.dynamo.feeds.filters.misc.SetDefault(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter receives key/value pairs and assigns the evaluated value expression to the given key on the value object passed through the filter chain if and only if the given key on the value object does not already have a value. This filter can be used on value objects or dictionaries. For information on using this filter in a definition, see Set Default Filter.

entry_points = ('set-default',)
transform(value: object, args: Mapping) → Any

Set default values on the incoming value and return it.

Parameters
  • value (dict | object) – Incoming dict or object value

  • args (MutableMapping) – Mapping of field arguments for this filter

Returns

The same value but with defaults set as per the values argument.

Return type

dict | object

class threatq.dynamo.feeds.filters.misc.SetIndex(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter receives a key/value pair and assigns the evaluated value expression to the given index key on the value list passed through the filter chain. This filter should be used on value lists. When setting values on objects or dictionaries, Set should be used. For information on using this filter in a definition, see Set Index Filter.

entry_points = ('set-index',)
transform(value: List, args: Mapping) → Any

Sets the given value arg as a list element at index.

Parameters
  • value (list) – List value

  • args (MutableMapping) – Mapping of field arguments for this filter

Returns

The same list value with the specified value set at the specified index.

Return type

list

class threatq.dynamo.feeds.filters.misc.SetRunVar(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter will set a value on an object that is globally available for a feed run. For information on using this filter in a definition, see Set Run Variable Filter.

transform(value: Any, args: Mapping)

Receive a single value and return a transformed value.

Parameters

value – the value being processed.

Note

See call_transform() for information on additional parameters overrides may support.

class threatq.dynamo.feeds.filters.misc.Switch(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.Filter

This Filter applies a given sub-Filter if a Switch condition evaluates to True. For information on using this filter in a definition, see Switch Filter.

entry_points = ('switch',)

Parsing Filters

Filters which parse a value as a particular type of data.

class threatq.dynamo.feeds.filters.parse.IP(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter transforms a value into a parsed IP Address dictionary. APIs sometimes return IP Addresses in integer format, in which case the integer must be parsed to recover the actual IP string; there is no native way to do so without this filter. The ipaddress library used here parses both strings and integers, and also reports whether the IP is private.

entry_points = ('ip',)
transform(value: Union[str, int]) → Dict

Formats value as an IP Address dictionary.

Parameters

value (str | int) – IP Address value

Returns

A simplified representation of the IPv4 or IPv6 Address object

Return type

Dict
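The parsing step can be sketched with the stdlib ipaddress module (the output keys shown here are illustrative; the real filter's dictionary shape is not documented in this section):

```python
import ipaddress

def ip_sketch(value):
    """Parse a string or integer into a simplified IP Address dictionary."""
    addr = ipaddress.ip_address(value)  # accepts str or int, IPv4 or IPv6
    return {
        "address": str(addr),
        "version": addr.version,
        "is_private": addr.is_private,
    }

assert ip_sketch(16909060) == {"address": "1.2.3.4", "version": 4, "is_private": False}
```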

class threatq.dynamo.feeds.filters.parse.Timestamp(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter transforms value into a standard string representation of a timestamp. Inputs are reasonably flexible, see arrow.get(). For information on using this filter in a definition, see Timestamp Filter.

entry_points = ('timestamp',)
transform(value: Union[str, arrow.arrow.Arrow, int], args: Mapping) → str

Formats value as a timestamp string.

Parameters
  • value (str | Arrow | int) – Timestamp value

  • args (MutableMapping) – Mapping of field arguments for this filter

Returns

Formatted timestamp value.

Return type

str
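The real filter parses inputs with arrow.get(); the normalization idea can be sketched with the stdlib datetime module (the output format shown is an assumption for illustration):

```python
from datetime import datetime, timezone

def timestamp_sketch(value):
    """Normalize an epoch integer or ISO-8601 string to 'YYYY-MM-DD HH:MM:SS'."""
    if isinstance(value, int):
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        dt = datetime.fromisoformat(value)
    return dt.strftime("%Y-%m-%d %H:%M:%S")

assert timestamp_sketch(0) == "1970-01-01 00:00:00"
assert timestamp_sketch("2023-06-01T12:30:00") == "2023-06-01 12:30:00"
```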

Serialization Filters

Filters for parsing common data serialization types.

class threatq.dynamo.feeds.filters.serialization.DecodeBinary(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter decodes string values into bytes using one of the following values for the encoding argument:

  • base16

  • base32

  • base64

  • base85/ascii85

For information on using this filter in a definition, see Decode Binary Filter.

entry_points = ('decode-binary',)
transform(value: str) → bytes

Decodes the incoming value using one of the available decoders.

Parameters

value (str) – Incoming string value

Returns

The value decoded.

Return type

bytes
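The listed encodings map naturally onto the stdlib base64 module (a sketch; the real filter's argument handling and exact decoder mapping are assumptions here):

```python
import base64

DECODERS = {
    "base16": base64.b16decode,
    "base32": base64.b32decode,
    "base64": base64.b64decode,
    "base85": base64.b85decode,
    "ascii85": base64.a85decode,
}

def decode_binary_sketch(value: str, encoding: str) -> bytes:
    """Decode an encoded string into bytes using the named decoder."""
    return DECODERS[encoding](value)

assert decode_binary_sketch("aGVsbG8=", "base64") == b"hello"
```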

class threatq.dynamo.feeds.filters.serialization.IterateJSONFile(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.AsyncGeneratorFilter

This filter accepts an AsyncTemporaryFile containing a JSON array as data, yielding each item in the deserialized JSON array one at a time. This filter is key to efficiently looping over large JSON files as it avoids loading the whole JSON into memory. For information on using this filter in a definition, see Iterate JSON File Filter.

entry_points = ('iterate-json-file',)
transform(value: Union[asyncio_extras.file.AsyncFileWrapper, tempfile._TemporaryFileWrapper], args: Mapping) → AsyncIterator[Any]

Transforms the incoming JSON-encoded data into native Python objects yielding each item one at a time.

Parameters
  • value (AsyncTemporaryFile) – Incoming file containing JSON-encoded string.

  • args (MutableMapping) – Mapping of field arguments for this filter.

Returns

Async generator of values

Return type

AsyncIterator

class threatq.dynamo.feeds.filters.serialization.IterateTextFile(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.AsyncGeneratorFilter

This filter accepts an AsyncTemporaryFile containing text as data, yielding each line of the text file one at a time. This filter is key to efficiently looping over large text files as it avoids loading the whole file into memory. For information on using this filter in a definition, see Iterate Text File Filter.

entry_points = ('iterate-text-file',)
transform(value: Union[asyncio_extras.file.AsyncFileWrapper, tempfile._TemporaryFileWrapper], args: Mapping) → AsyncIterator[Any]

Yields each line of the input file one at a time.

Parameters
  • value (AsyncTemporaryFile) – Incoming file containing text.

  • args (MutableMapping) – Mapping of field arguments for this filter.

Returns

Async generator of values

Return type

AsyncIterator

class threatq.dynamo.feeds.filters.serialization.ParseCSV(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter splits an incoming CSV-formatted string value into a list of substrings. For information on using this filter in a definition, see Parse CSV Filter.

entry_points = ('parse-csv', 'csv')

The csv entry point is deprecated. Please utilize the parse-csv entry point.

fmt_keywords = [delimiter=None, doublequote=None, escapechar=None, lineterminator=None, quotechar=None, quoting=None, skipinitialspace=None]

Additional formatting parameters derived from Dialect that can be passed as arguments to this filter.

classmethod get_signature() → inspect.Signature

Extends the Parse CSV Filter’s signature with formatting parameters dynamically derived from attributes of the Dialect class.

transform(value: str) → Iterable[str]

Transforms the incoming CSV-formatted string into a list of substrings.

Parameters

value (str) – Incoming CSV-formatted string.

Returns

List of parsed substrings.

Return type

Iterable
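The split can be sketched with the stdlib csv module (a plain illustration; the real filter also accepts the Dialect formatting parameters noted above):

```python
import csv
import io

def parse_csv_sketch(value: str, delimiter: str = ",") -> list:
    """Split one CSV-formatted line into a list of substrings."""
    return next(csv.reader(io.StringIO(value), delimiter=delimiter))

assert parse_csv_sketch('1.2.3.4,"bad host",80') == ["1.2.3.4", "bad host", "80"]
```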

class threatq.dynamo.feeds.filters.serialization.ParseJSON(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter deserializes JSON-encoded strings. For information on using this filter in a definition, see Parse JSON Filter.

entry_points = ('parse-json', 'json')

The json entry point is deprecated. Please utilize the parse-json entry point.

transform(value: str) → Any

Transforms the incoming JSON-encoded string into native Python objects.

Parameters

value (str) – Incoming JSON-encoded string.

Returns

Native Python objects deserialized from the incoming value.

Return type

Any

class threatq.dynamo.feeds.filters.serialization.ParseJSONSequence(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.AsyncGeneratorFilter

Deserializes sequences of individual JSON objects from strings. Accepts either RFC 7464 “json-seq” format or one-object-per-line. For information on using this filter in a definition, see Parse JSON Sequence Filter.

entry_points = ('parse-json-seq',)
on_finish()

This hook may be defined by subclasses needing to be notified upon exhaustion of source values. It can be a function, a coroutine, or an asynchronous generator. Any values or pipeline messages returned (except None) or yielded will be sent through the rest of the filter chain.

transform(lines: str) → Any

This provides for yielding multiple results for a single transformed value. Semantics are the same as TransformFilter.transform(), except that it should be overridden with an asynchronous generator yielding results instead of returning one.
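The two accepted formats can be sketched with the stdlib json module (the format detection here is a simplifying assumption; RFC 7464 records are separated by the ASCII Record Separator character):

```python
import json

RS = "\x1e"  # RFC 7464 json-seq record separator

def parse_json_seq_sketch(text: str):
    """Yield objects from RFC 7464 json-seq or one-object-per-line text."""
    sep = RS if RS in text else "\n"
    for record in text.split(sep):
        record = record.strip()
        if record:
            yield json.loads(record)

lines = '{"a": 1}\n{"b": 2}\n'
assert list(parse_json_seq_sketch(lines)) == [{"a": 1}, {"b": 2}]
```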

class threatq.dynamo.feeds.filters.serialization.ParseMISP(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter parses MISP JSON data into ThreatQ threat object data that can be fed directly to the reporter. For information on using this filter in a definition, see Parse MISP Filter.

entry_points = ('parse-misp',)
async transform(value: Union[str, list, dict]) → MutableMapping

Transforms the incoming MISP JSON data into ThreatQ threat object data.

Parameters

value (str | list | dict) – Incoming MISP JSON value

Returns

ThreatQ threat object data

Return type

MutableMapping

class threatq.dynamo.feeds.filters.serialization.ParseOLE2Email(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter reads the streams from an OLE2 file. For information on using this filter in a definition, see Parse OLE2 Email Filter.

property binary

Property representing whether this filter strictly requires its incoming value to be binary. These values are introspected by FeedFilters in order to inform a TQFeed or TQFilter run as to which file opening mode should be passed into the IOSource.

entry_points = ('parse-ole2-email',)
transform(data: bytes) → MutableMapping

Parses an OLE2 file’s property and attachment streams via parse_ole2_email().

Parameters

data (bytes) – OLE2 file.

Returns

Dictionary of parsed OLE2 file streams

Return type

MutableMapping

class threatq.dynamo.feeds.filters.serialization.ParseSTIX(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter deserializes STIX-encoded values. Can be applied to a dict, str, or an iterable of strings. For information on using this filter in a definition, see Parse STIX Filter.

entry_points = ('parse-stix',)
async transform(value: Union[str, dict, Iterable[str]], args: Mapping) → MutableMapping

Receive some STIX data, parse it via parse_stix(), and return the results back to the filter chain.

Parameters
  • value (str | dict | Iterable[str]) – Incoming STIX value

  • args (MutableMapping) – Mapping of field arguments for this filter

Returns

Dictionary of parsed STIX data

Return type

MutableMapping

class threatq.dynamo.feeds.filters.serialization.ParseSnort(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter deserializes Snort/Suricata signature strings. For information on using this filter in a definition, see Parse Snort Filter.

entry_points = ('parse-snort', 'parse-suricata')
transform(value: str) → Iterable[Mapping]

Transforms the incoming string containing Snort/Suricata signatures into a list of parsed Snort/Suricata signature mappings.

Parameters

value (str) – Incoming string containing one or multiple Snort/Suricata signatures.

Returns

A list of parsed Snort/Suricata signature mappings.

Return type

Iterable[MutableMapping]

class threatq.dynamo.feeds.filters.serialization.ParseXML(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter deserializes XML-encoded strings. For information on using this filter in a definition, see Parse XML Filter.

entry_points = ('parse-xml',)
async transform(value: str) → Any

Receive a single value and return a transformed value.

Parameters

value – the value being processed.

Note

See call_transform() for information on additional parameters overrides may support.

class threatq.dynamo.feeds.filters.serialization.ParseYARA(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter deserializes YARA signature strings. For information on using this filter in a definition, see Parse YARA Filter.

entry_points = ('parse-yara',)
async transform(value: str) → Iterable[Mapping]

Transforms the incoming string containing YARA signatures into a list of parsed YARA signature mappings.

Parameters

value (str) – Incoming string containing one or multiple YARA signatures.

Returns

A list of parsed YARA signature mappings.

Return type

Iterable[MutableMapping]

class threatq.dynamo.feeds.filters.serialization.SummarizeIPRange(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter transforms an IP range into a list of CIDR blocks. For information on using this filter in a definition, see Summarize IP Range Filter.

entry_points = ('summarize-ip-range',)
transform(value, args: Mapping) → List[str]

Converts an IP range into a list of CIDR blocks.

Parameters
  • value (Any) – Any incoming value.

  • args (MutableMapping) – Mapping of field arguments for this filter.

Returns

Summarized CIDR blocks represented as a list of strings.

Return type

list[str]
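
The transformation corresponds to the standard library's `ipaddress.summarize_address_range()`. A sketch of the equivalent operation (the filter's actual argument names are not shown here):

```python
import ipaddress

def summarize_ip_range(start: str, end: str) -> list:
    """Collapse an inclusive IP range into the minimal list of CIDR blocks."""
    blocks = ipaddress.summarize_address_range(
        ipaddress.ip_address(start), ipaddress.ip_address(end)
    )
    return [str(block) for block in blocks]

print(summarize_ip_range("192.0.2.0", "192.0.2.130"))
# ['192.0.2.0/25', '192.0.2.128/31', '192.0.2.130/32']
```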

Text Filters

Filters which offer various actions that can be taken upon text data values.

class threatq.dynamo.feeds.filters.text.CaseFold(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter casefolds values passing through it. This is similar to lowercasing, but is intended for string comparisons and works with unicode inputs more reliably for that purpose. For information on using this filter in a definition, see CaseFold Filter.

entry_points = ('casefold',)
transform(value: str) → str

Casefold the incoming value.

Parameters

value (str) – Incoming string value.

Returns

The value casefolded.

Return type

str
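
The difference from plain lowercasing matters for caseless Unicode comparison, as Python's `str.casefold()` shows:

```python
# casefold() applies full Unicode case folding; lower() does not.
s = "Straße"
print(s.lower())      # 'straße'
print(s.casefold())   # 'strasse'

# Caseless comparison succeeds only with casefolding:
print("STRASSE".casefold() == s.casefold())  # True
print("STRASSE".lower() == s.lower())        # False
```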

class threatq.dynamo.feeds.filters.text.LStrip(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.text.Strip

Like Strip, but only strips leading characters. For information on using this filter in a definition, see LStrip Filter.

entry_points = ('lstrip',)
transform(value: str) → str

Strip leading whitespace and/or any characters specified in _method_args from the incoming value.

Parameters

value (str) – Incoming string value.

Returns

The value with leading characters stripped.

Return type

str

class threatq.dynamo.feeds.filters.text.Lower(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter transforms string values to lowercase. For information on using this filter in a definition, see Lower Filter.

entry_points = ('lower',)
transform(value: str) → str

Lowercase the incoming value.

Parameters

value (str) – Incoming string value.

Returns

The value lowercased.

Return type

str

class threatq.dynamo.feeds.filters.text.RSplit(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.text.Split

Like Split, but if a max number of splits N is specified, then only the rightmost N splits will be performed. For information on using this filter in a definition, see RSplit Filter.

entry_points = ('rsplit',)
transform(value: str, args: Mapping) → Iterable[str]

Split a string value into a list of substrings using sep as the delimiter string. If sep is not specified or is None, any whitespace character is assumed to be a separator. If maxsplit is given, at most maxsplit splits are done, starting from the end of the string.

Parameters

value (str) – Incoming string value.

Returns

The value split into an array of substrings.

Return type

list[str]
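
The leftmost/rightmost distinction mirrors Python's built-in `str.split()` and `str.rsplit()`, whose semantics these filters follow:

```python
value = "a,b,c,d"
print(value.split(",", 2))    # ['a', 'b', 'c,d']    leftmost 2 splits
print(value.rsplit(",", 2))   # ['a,b', 'c', 'd']    rightmost 2 splits
print(value.rsplit(","))      # ['a', 'b', 'c', 'd'] no limit: same as split
```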

class threatq.dynamo.feeds.filters.text.RStrip(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.text.Strip

Like Strip, but only strips trailing characters. For information on using this filter in a definition, see RStrip Filter.

entry_points = ('rstrip',)
transform(value: str) → str

Strip trailing whitespace and/or any characters specified in _method_args from the incoming value.

Parameters

value (str) – Incoming string value.

Returns

The value with trailing characters stripped.

Return type

str

class threatq.dynamo.feeds.filters.text.RegexFindAll(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.text.RegexMatch

This filter returns a list containing all of the substrings within the value that match the regular expression. For information on using this filter in a definition, see Regex Find All Filter.

entry_points = ('regex-findall',)
transform(value: str) → str

Receive a single value and return a transformed value.

Parameters

value – the value being processed.

Note

See call_transform() for information on additional parameters overrides may support.

class threatq.dynamo.feeds.filters.text.RegexMatch(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

If the value matches the regular expression as the beginning of the string, this filter returns the match object representing the match (otherwise None). For information on using this filter in a definition, see Regex Match Filter.

entry_points = ('regex-match',)
transform(value: str) → str

Receive a single value and return a transformed value.

Parameters

value – the value being processed.

Note

See call_transform() for information on additional parameters overrides may support.

class threatq.dynamo.feeds.filters.text.RegexReplace(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.text.RegexMatch

This filter replaces all substrings within the value that match the regular expression with a specified replacement string and returns the result. For information on using this filter in a definition, see Regex Replace Filter.

entry_points = ('regex-replace',)
transform(value: str, args: Mapping) → str

Receive a single value and return a transformed value.

Parameters

value – the value being processed.

Note

See call_transform() for information on additional parameters overrides may support.

class threatq.dynamo.feeds.filters.text.RegexSearch(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.text.RegexMatch

If the value matches the regular expression (anywhere in the string), this filter returns the match object representing the first match (otherwise None). For information on using this filter in a definition, see Regex Search Filter.

entry_points = ('regex-search',)
transform(value: str) → str

Receive a single value and return a transformed value.

Parameters

value – the value being processed.

Note

See call_transform() for information on additional parameters overrides may support.
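
The four regex filters correspond to Python's `re.match()`, `re.search()`, `re.findall()`, and `re.sub()`; the anchoring difference between match and search is the usual pitfall:

```python
import re

text = "id=42, id=43"

# match: the pattern must match at the START of the string
print(re.match(r"id=\d+", text))        # matches 'id=42'
print(re.match(r"\d+", text))           # None -- string starts with 'id'

# search: first match anywhere in the string
print(re.search(r"\d+", text).group())  # '42'

# findall: every non-overlapping match
print(re.findall(r"\d+", text))         # ['42', '43']

# sub: replace every match
print(re.sub(r"\d+", "N", text))        # 'id=N, id=N'
```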

class threatq.dynamo.feeds.filters.text.Replace(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter runs string find/replace and returns the result. For information on using this filter in a definition, see Replace Filter.

entry_points = ('replace',)
transform(value: str, args: Mapping) → str

Perform a substring replacement on value.

Parameters

value (str) – Incoming string value.

Returns

The value with occurrences of old replaced with new.

Return type

str
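
This mirrors Python's `str.replace()`. Whether the filter exposes `str.replace()`'s optional replacement count is not documented here, but the underlying semantics are:

```python
value = "a-b-c"
print(value.replace("-", "_"))     # 'a_b_c'  -- every occurrence replaced
print(value.replace("-", "_", 1))  # 'a_b-c'  -- only the first occurrence
```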

class threatq.dynamo.feeds.filters.text.Split(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter splits incoming strings into lists on the specified separator character. If a maximum number of splits N is specified, only the leftmost N splits will be performed. For information on using this filter in a definition, see Split Filter.

entry_points = ('split',)
transform(value: str, args: Mapping) → Iterable[str]

Split a string value into a list of substrings using sep as the delimiter string. If sep is not specified or is None, any whitespace character is assumed to be a separator. If maxsplit is given, at most maxsplit splits are done.

Parameters

value (str) – Incoming string value.

Returns

The value split into an array of substrings.

Return type

list[str]

class threatq.dynamo.feeds.filters.text.SplitLines(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter splits incoming blocks of text into lists of lines (on newlines). The newlines are left off the resulting strings unless keepends is specified True. For information on using this filter in a definition, see Split Lines Filter.

entry_points = ('split-lines', 'splitlines')
transform(value: str, args: Mapping) → Iterable[str]

Splits incoming blocks of text into lists of lines (on newlines). The newlines are left off the resulting strings unless keepends is specified True.

Parameters

value (str) – Incoming string value.

Returns

The value split into an array of substrings on line boundary characters.

Return type

list[str]
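
Python's `str.splitlines()` recognizes \n, \r\n, and other Unicode line boundaries, unlike splitting on a literal "\n":

```python
text = "alpha\nbravo\r\ncharlie"
print(text.splitlines())               # ['alpha', 'bravo', 'charlie']
print(text.splitlines(keepends=True))  # ['alpha\n', 'bravo\r\n', 'charlie']
print(text.split("\n"))                # ['alpha', 'bravo\r', 'charlie'] -- stray '\r'
```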

class threatq.dynamo.feeds.filters.text.Strip(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter strips leading and trailing characters (whitespace by default) from values passing through it. For information on using this filter in a definition, see Strip Filter.

entry_points = ('strip',)
transform(value: str) → str

Strip the incoming value of whitespace and/or any characters specified in _method_args.

Parameters

value (str) – Incoming string value.

Returns

The value stripped.

Return type

str
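
With no arguments, `str.strip()` removes whitespace; given a character set, it removes every leading/trailing character in that set (not a prefix/suffix string), and `lstrip()`/`rstrip()` do the same on one side only:

```python
print("  spaced  ".strip())       # 'spaced'
print("xx-value-xx".strip("x-"))  # 'value'      -- strips any of 'x' or '-'
print("xx-value-xx".lstrip("x"))  # '-value-xx'  -- leading side only
print("xx-value-xx".rstrip("x"))  # 'xx-value-'  -- trailing side only
```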

class threatq.dynamo.feeds.filters.text.Title(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter transforms values to title-cased text, i.e., the initial letter of each word uppercase and the remaining letters lowercase. For information on using this filter in a definition, see Title Filter.

entry_points = ('title',)
transform(value: str) → str

Title case the incoming value.

Parameters

value (str) – Incoming string value.

Returns

The value title cased.

Return type

str
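
Python's `str.title()` uses a simple word-boundary heuristic, which can surprise with apostrophes:

```python
print("threat intelligence feed".title())  # 'Threat Intelligence Feed'
print("it's a trap".title())               # "It'S A Trap" -- apostrophe starts a new 'word'
```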

class threatq.dynamo.feeds.filters.text.TruncateHTML(*args, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter truncates an incoming HTML string value if its length exceeds the provided limit (factoring in the provided end string). If the whole incoming HTML string can fit within the provided limit (inclusive), there may still be modifications made to the HTML string due to lxml’s HTML parser (e.g., <br> changed to <br/>). For information on using this filter in a definition, see Truncate HTML Filter.

entry_points = ('truncate-html',)
transform(value: str) → str

Truncates the incoming HTML string if its length exceeds the provided limit.

Parameters

value (str) – Incoming HTML string value

Returns

Truncated HTML string

Return type

str

class threatq.dynamo.feeds.filters.text.Upper(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter transforms string values to uppercase. For information on using this filter in a definition, see Upper Filter.

entry_points = ('upper',)
transform(value: str) → str

Uppercase the incoming value.

Parameters

value (str) – Incoming string value.

Returns

The value uppercased.

Return type

str

ThreatQ Filters

Filters that are specifically tied to the ThreatQ Appliance.

class threatq.dynamo.feeds.filters.threatq.Api(url, filters, *, method='GET', params=None, headers=None, data=None, response_content_type='json', **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter reaches out to a given endpoint (url) in the ThreatQ API and applies a chain of filters to the resulting response data. For information on using this filter in a definition, see API Filter.

async transform(value: Any, args: Mapping, value_ancestry) → Any

Poll the TQ API at the given url with any supplied params, headers, or data, and format the results per the filters defined in filters before returning them to the filter chain. For in-definition usage, see API Filter.

Parameters
  • value (Any) – Incoming value. Only used here to build out any possible Jinja2 expressions/templates present in this filter’s args.

  • args (MutableMapping) – Mapping of field arguments for this filter.

  • value_ancestry (MutableMapping) – Mapping of this filter’s value ancestry. Needs to be passed into any filters declared via the filters arg.

Returns

Results from the TQ API.

Return type

Any

class threatq.dynamo.feeds.filters.threatq.DownloadAttachment(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter reaches out to the attachments endpoint in the ThreatQ API and returns a string representation of the downloaded file.

entry_points = ('download-attachment',)
async transform(value: Any, args: Mapping) → str

Poll the TQ API at the given url with any supplied params.

Parameters
  • value (Any) – Incoming value.

  • args (t.Mapping) – Incoming named arguments

Returns

A text representation of the attachment download

Return type

str

threatq.dynamo.feeds.filters.threatq.is_json(content_type: str) → bool

Checks to see if the content_type is referring to JSON.

Parameters

content_type (str) – Content type to check.

Returns

True if content_type is referring to JSON.

Return type

bool
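
A content-type predicate of this kind typically needs to accept both application/json and structured-suffix types like application/vnd.api+json. A hedged sketch of such a check (the actual is_json() implementation may differ):

```python
import re

# Hypothetical re-implementation; the real is_json() may behave differently.
_JSON_RE = re.compile(r"^application/(?:[\w.-]+\+)?json\s*(?:;.*)?$", re.IGNORECASE)

def is_json(content_type: str) -> bool:
    """Return True if content_type refers to a JSON media type."""
    return bool(_JSON_RE.match(content_type.strip()))

print(is_json("application/json"))                 # True
print(is_json("application/vnd.api+json"))         # True
print(is_json("application/json; charset=utf-8"))  # True
print(is_json("text/html"))                        # False
```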

Type Filters

Filters that allow for typecasting of data values.

class threatq.dynamo.feeds.filters.types.Bool(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.types.TypecastFilter

This filter allows a user to typecast a value as a bool. For information on using this filter in a definition, see Bool Filter.
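
Note that Python's built-in bool() reflects truthiness rather than parsing string content, so casting string values can be surprising (whether this filter special-cases strings such as "false" is not documented here):

```python
# Built-in bool() semantics: truthiness, not string parsing.
print(bool("False"))  # True  -- any non-empty string is truthy
print(bool(""))       # False
print(bool(0))        # False
print(bool([0]))      # True  -- non-empty list
```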

class threatq.dynamo.feeds.filters.types.Decimal(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.types.TypecastFilter

This filter allows a user to typecast a value as a Decimal. For information on using this filter in a definition, see Decimal Filter.

class threatq.dynamo.feeds.filters.types.Dict(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.types.TypecastFilter

This filter allows a user to typecast a value as a dict. For information on using this filter in a definition, see Dict Filter.

class threatq.dynamo.feeds.filters.types.Int(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.types.TypecastFilter

This filter allows a user to typecast a value as an int. For information on using this filter in a definition, see Int Filter.

class threatq.dynamo.feeds.filters.types.List(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.types.TypecastFilter

This filter allows a user to typecast a value as a list. For information on using this filter in a definition, see List Filter.

class threatq.dynamo.feeds.filters.types.Str(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

This filter allows a user to typecast a value as a string. For information on using this filter in a definition, see Str Filter.

entry_points = ('str',)
transform(value: Any, args: Mapping) → str

Typecast value as a string. value will be decoded with the specified encoding if the value supports a decode method.

Parameters
  • value (Any) – Any incoming value.

  • args (MutableMapping) – Mapping of field arguments for this filter

Returns

The value typecast as a string.

Return type

str

class threatq.dynamo.feeds.filters.types.TypecastFilter(*args, error_mgr_factory: Optional[Union[Type[ContextManager], Callable]] = None, **kwargs)

Bases: threatq.dynamo.feeds.filters.base.FunctionFilter

Dynamic filter that allows a user to typecast some value as a specific data type. For information on using these filters in a definition, see Type Filters.

transform(value: Any)

Typecast value to the type specified by _type.

Parameters

value (Any) – Value to typecast

Returns

The typecast value.

Return type

Any
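
A dynamic typecast filter of this shape can be sketched as a mapping from entry-point name to callable (a hypothetical dispatch table; the real class resolves _type differently):

```python
from decimal import Decimal

# Hypothetical dispatch table mirroring the documented Type Filters.
CASTS = {
    "bool": bool,
    "decimal": Decimal,
    "dict": dict,
    "int": int,
    "list": list,
}

def typecast(entry_point: str, value):
    """Typecast value to the type registered for entry_point."""
    return CASTS[entry_point](value)

print(typecast("int", "42"))         # 42
print(typecast("decimal", "1.5"))    # Decimal('1.5')
print(typecast("list", ("a", "b")))  # ['a', 'b']
```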

Publishers

Base Publisher

class threatq.dynamo.feeds.publishers.base.Publisher(feed_run: threatq.dynamo.feeds.common.FeedRun, *args, logname: Optional[str] = None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, **kwargs)

Bases: threatq.core.lib.pipeline.PipelineSegment, threatq.dynamo.feeds.base.DynamoFeedElement

Class used for creating a publisher for a feed.

Parameters

feed_run (FeedRun) – The feed run for the publisher

__init__(feed_run: threatq.dynamo.feeds.common.FeedRun, *args, logname: Optional[str] = None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

abstract handle_objects(objects: Set[threatq.core.models.base.ThreatObject])

This is automatically called with each queued object set so that the Publisher subclass can take appropriate publication actions.

This method can be overridden as a function, an async coroutine, or an async generator.

Parameters

objects – an incoming feed objects set

TQ Publisher

class threatq.dynamo.feeds.publishers.tq_api.TQAPIPublisher(*args, **kwargs)

Bases: threatq.dynamo.feeds.publishers.base.Publisher

Class used for creating a TQ API publisher for a feed.

Parameters

feed_run (FeedRun) – The feed run for the publisher

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

handle_message(message: threatq.core.lib.pipeline.PipelineMessage)

Handle a message from the pipeline.

Parameters

message (PipelineMessage) – The message to handle.

Returns

Any result of handling the message.

async handle_objects(objects: Set[threatq.core.models.base.ThreatObject])

Handle a set of objects by pushing them to the TQ API.

Parameters

objects (set) – The objects to handle

Library

The modules in the threatq.core.lib package provide a variety of useful classes and functions for general use.

Connectors: threatq.core.lib.connectors

The following tools are provided for connectors.

class threatq.core.lib.connectors.Connector(ctx: threatq.core.lib.asphalt.Context, id: Optional[int] = None, name: Optional[str] = None, namespace: Optional[str] = None)

Bases: threatq.core.lib.elements.Element

async classmethod all_enabled_infos(ctx, limit=100)

Get all enabled feeds from the API.

Parameters
  • ctx (threatq.core.lib.asphalt.Context) – a context object

  • limit (int, optional) – determines how many feeds to fetch per API request

Returns

A list containing a dict for each feed that is enabled and has a definition.

Return type

list

async update_info(info: Optional[Mapping] = None, *, validate: bool = True)

Updates the connector instance with either the provided info mapping or by getting the connector info from the ThreatQ API by this connector instance’s ID, namespace, or name attribute, in that order of precedence based on availability.

Parameters
  • info (MutableMapping) – Optional mapping of key-value pairs that match those returned by the ThreatQ API’s GET /api/connectors?with=definition,gateOauth2Client endpoint for a single connector object. If provided, this connector instance is updated based only on this mapping; no request is made to the ThreatQ API. Since this is a positional argument, pass None to update the connector’s info from the ThreatQ API.

  • validate (bool) – Optional, defaults to True. If True, the connector’s definition is analyzed for correctness (see ConnectorDefinition.validate()).

Raises

ValueError – Raised if an info mapping is not provided and this connector instance does not have a non-None value for any of id, namespace, or name.

class threatq.core.lib.connectors.ConnectorDefinition(definition_yaml: str, *, ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.asphalt.ContextRefMixin

A Connector (Feed) Definition

Parameters

definition_yaml (str) – The raw string value of the connector definition

definition_yaml

The raw string value of the connector definition

Type

str

definition

The definition_yaml run through the YAML parser

Type

dict

version

The version of the connector definition, if defined in the definition_yaml

Type

str

required_threatq_version

The version specifier indicating the required ThreatQ version(s) needed to install and run the connector definition, if defined in the definition_yaml

Type

str

create_invoking_filter(feed_name: str, supported_objects: collections.OrderedDict) → list

Create a filter that can be used to invoke the action.

Parameters
  • feed_name (str) – The name of the action feed.

  • supported_objects (OrderedDict) – The supported objects from the action feed.

Returns

A filter that can be used to invoke the action.

Return type

list

format_supported_objects(supported_objects: list) → list

Format the supported objects for an action.

Parameters

supported_objects (list) – A list of supported objects.

Returns

A list of formatted supported objects.

Return type

list

get_action_config_options(action_feed, feed_name) → dict

Handle the config options for an action.

Parameters
  • action_feed (dict) – The action feed to handle config options for.

  • feed_name (str) – The name of the action feed.

Returns

A dictionary of the configuration options for the action feed.

Return type

dict

get_action_objects(feed_name: str) → collections.OrderedDict

Return a summary object of an action to be passed to the ThreatQ API.

Parameters

feed_name (str) – The name of the feed.

Returns

A dictionary that maps action to a list of action objects. Each action object consists of an action_body, a user_fields, and a report. The action_body contains the name, source, and filters of the action.

Return type

dict

get_category(feed_name: str) → str

Get the category name for the supplied feed.

Parameters

feed_name (str) – The name of the feed.

Returns

The category name of the feed.

Return type

str

get_default_period(feed_name: str) → int

Get the default period for the supplied feed.

Parameters

feed_name (str) – The name of the feed.

Returns

The default period of the feed in seconds or None if not defined

Return type

int

get_default_schedule(feed_name: str) → str

Get the default schedule for the supplied feed.

Parameters

feed_name (str) – The name of the feed.

Returns

The default schedule of the feed as a crontab string, or None if not defined.

Return type

str

get_description(feed_name: str) → str

Get the description for the supplied feed.

Parameters

feed_name (str) – The name of the feed.

Returns

The default description specified by this feed. If the description is not defined, then it will default to a blank string.

Return type

str

get_display_name(feed_name: str) → str

Get the display name for the supplied feed.

Parameters

feed_name (str) – The name of the feed.

Returns

The value of the “display_name” key specified by this feed definition. If “display_name” key is not defined in the feed definition, return feed_name.

Return type

str

get_feed_config(feed_name: str) → dict

Get the configuration for a feed.

Parameters

feed_name (str) – The name of the feed.

Returns

A representation of the default configuration for this feed.

Return type

dict

get_feed_definition(feed_name: str, *, all_keys: bool = False) → dict

Get the definition for the given feed_name. If a definition for feed_name is not found within definition, then the definition associated with the default feed name _default is returned. If neither a definition for feed_name nor _default is found, then an error is raised.

Parameters
  • feed_name (str, optional) – The name of the feed. If omitted, it will return a dict with the feed names as the keys and the definitions as the values.

  • all_keys (bool, optional) – Defaults to False. If True all keys will be returned, if False, only the keys defined in the required_feed_keys set will be returned.

Returns

A dictionary of the definition of the specific feed or one where the dict has the keys as the feed names and the values as the definitions.

Return type

dict

Raises

LookupError – Raised if no definition is found for feed_name or _default.

get_feed_filters(feed_name: str) → list

Get list of filters that a specific feed uses.

Parameters

feed_name (str) – The name of the feed.

Returns

The list of filters and the arguments used to call them in the filters section.

Return type

list[dict]

get_feed_names(*, exclude: Optional[list] = None, only: Optional[list] = None) → list

Get list of feed names that are in this definition.

Parameters
  • exclude (list[str], optional) – Which feed types should we exclude.

  • only (list[str], optional) – Which feed types should we include.

Returns

The list of feed names in the definition

Return type

list[str]

get_feed_object_types(feed_name: str) → list

Get the object types a feed will ingest

Parameters

feed_name (str) – The name of the feed.

Returns

The list of object types that are ingested by this feed.

Return type

list[ThreatObject]

get_feed_type(feed_name: str) → str

Get the type of feed.

Parameters

feed_name (str, optional) – The name of the feed. If omitted, it will return a dict with the feed names as the keys and the types as the values.

Returns

The type of feed, possible values are defined here: Feed Type Definition

Return type

str

get_global_template_values() → dict

Get the template_values that are defined at the base definition.

Returns

The template_values dict found at the root of the definition.

Return type

dict

get_ingest_rules(feed_name: str) → dict

Get the ingest rules for the supplied feed.

Parameters

feed_name (str) – The name of the feed.

Returns

The ingest rules specified by this feed. If ingest rules are not defined, it will default to an empty dict.

Return type

dict

get_namespace(feed_name: str) → str

Get the namespace for the supplied feed.

Parameters

feed_name (str) – The name of the feed.

Returns

The default namespace specified by this feed. If the namespace is not defined, then it will default to: ‘threatq.feeds.custom.<Feed Name>’

Return type

str

get_non_builtin_used_params(feed_name: str) → set

Get the run params that aren’t defined in built_in_run_meta_params

Parameters

feed_name (str) – The name of the feed.

Returns

Unique set of parameter names that are used by this feed.

Return type

set[str]

get_summary_data(exclude_supplemental=True) → dict

Return a summary object of the entire feed which is served to the API for installation.

Returns

A dictionary where the keys are the feed names and the values are dicts giving an overview of each feed.

Return type

dict

get_used_run_params(feed_name: str) → set

Get the parameters and metadata used by a feed.

Parameters

feed_name (str) – The name of the feed.

Returns

Unique set of parameter names that are used by this feed.

Return type

set[str]

get_user_fields(feed_name: str) → dict

Get the user fields for the supplied feed.

Parameters

feed_name (str) – The name of the feed.

Returns

The hash of the fields that will be user configurable.

Return type

dict

is_supplemental(feed_name: str) → bool

Get whether or not the specified feed is considered a supplemental feed.

Parameters

feed_name (str) – The name of the feed.

Returns

True if the feed is supplemental or fulfillment.

Return type

bool

parse_filter_schema(filter_schema: Any) → dict

Parse the schema.

Parameters

filter_schema (Any) – The filter schema to parse.

Returns

a structured dictionary of used filters.

Return type

dict

supports_manual(feed_name: str) → bool

Get whether or not the specified feed is one that can perform a manual run.

Parameters

feed_name (str) – The name of the feed.

Returns

True if feed is capable of a manual run.

Return type

bool

validate()

Validate the definition file to make sure it will at least run.

This is currently limited to just ensuring that it has the basic parts of a feed definition.

Raises

DefinitionError – A DefinitionError will be raised with the offending issue in its errmsg.

Database: threatq.core.lib.database

class threatq.core.lib.database.base.DatabasePool(logname: Optional[str] = None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, max_connections: Optional[int] = None)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.asphalt.ContextRefMixin

Base class for Database Pool classes that provides an interface for managing connections to a database server.

Context manager usage example:

async with releasing_connection(database_pool) as connection:
    # `connection` is an object of the underlying database library
    await connection.execute('SELECT . . .')
# `connection` is automatically released back to the pool when the context manager exits

Coroutine usage example:

connection = await database_pool.acquire()
await connection.execute('SELECT . . .')
await database_pool.release(connection)

If an Asphalt Context is passed to the DatabasePool’s constructor, the database engine is automatically closed when the Context is closed.

Parameters
  • logname (str) – Name for this pool’s logger

  • ctx (Context) – Optional, Asphalt context object

  • max_connections (int) – Maximum number of connections held by this pool

async acquire()

Coroutine that returns a connection object of the derived class’ underlying database library. The database engine/pool is started if it was not previously started.

If all available connections were acquired by other tasks, the current task is blocked until another task releases the connection using release().

async close()

Coroutine that closes out the database engine/pool for the derived class’ underlying database library. If started is set to False, then this coroutine is a no-op.

async release(connection)

Coroutine that releases the connection object of the derived class’ underlying database library back to the database pool.

Parameters

connection – Connection object of the derived class’ underlying database library to release back to the pool

async start()

Coroutine that initializes and starts the database engine/pool for the derived class’ underlying database library. This coroutine is automatically called by acquire(). If started is set to True, then this coroutine is a no-op.

property started

Indicates whether the database pool is ready for use.

class threatq.core.lib.database.threatq.TQMariaDatabasePool(logname: Optional[str] = None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, username: Optional[str] = None, password: Optional[str] = None, database_name: Optional[str] = None, host: Optional[str] = None, port: Optional[int] = None, loop=None, **pool_config)

Bases: threatq.core.lib.database.base.DatabasePool

Derived Database Pool class that functions as a connector to a MariaDB server.

Credentials, database name, and database host can either be passed as constructor arguments or accessed from a provided Asphalt Context object’s config resource via the following keys:

  • tq_maria_username

  • tq_maria_password

  • tq_maria_database_name

  • tq_maria_host

  • tq_maria_port

Connection objects returned by acquire() are of type SAConnection.

Number of connections in the database pool is derived from the following keys in order should they be present in pool_config:

  • maxsize

  • minsize

If neither are provided, the number of connections defaults to 5.

In order for this database pool’s engine to be closed, all acquired connections must be released back to the pool. Therefore, close() will block until release() is called for each acquired connection.

Parameters
  • logname (str) – Name for this pool’s logger

  • ctx (Context) – Optional, Asphalt context object

  • username (str) – Optional, username of the user to log into the MariaDB server as

  • password (str) – Optional, password for the user for authentication with the MariaDB server

  • database_name (str) – Optional, name of the database to use

  • host (str) – Optional, IP address or hostname of the MariaDB server

  • port (int) – Optional, port that the MariaDB server listens on

  • loop – Optional, asyncio event loop to use

  • pool_config – Optional, additional kwargs that are passed along to create_engine()

Exceptions: threatq.core.lib.exceptions

The following exceptions are raised within Pynoceros:

exception threatq.core.lib.exceptions.ActionFeedError

Bases: threatq.core.lib.exceptions.ConnectorError

Error raised when an exception is encountered while running a threatq.dynamo.feeds.ActionFeed.

exception threatq.core.lib.exceptions.AuthenticationError(*args, exc=None, **kwargs)

Bases: Exception

Error raised when an exception is encountered while authenticating.

Parameters

exc (Exception) – The causing exception being wrapped.

exc

Inner exception.

Type

Exception

exception threatq.core.lib.exceptions.ConnectorError

Bases: Exception

Ambiguous error raised when an exception is encountered while running a feed from threatq.dynamo.feeds.

exception threatq.core.lib.exceptions.DefinitionError(definition: Union[str, List, Mapping], errmsg: Optional[str] = None)

Bases: ValueError

Base Error used by exceptions raised during parsing of a Feed Definition.

Parameters
  • definition – Definition in question that errored

  • errmsg – Error message

definition

YAML Definition in question that errored

Type

Iterable

errmsg

Error message

Type

str

exception threatq.core.lib.exceptions.ElementDefinitionEntryPointError(definition: Iterable, entry_point: str, errmsg: Optional[str] = None)

Bases: threatq.core.lib.exceptions.ElementDefinitionError

Base error raised when an element definition specifies an invalid entry_point.

Parameters
  • definition (Iterable) – YAML Definition in question that errored

  • entry_point (str) – Specified entry_point

  • errmsg (str) – Error message

definition

YAML Definition in question that errored

Type

Iterable

entry_point

Specified entry_point

Type

str

errmsg

Error message

Type

str

exception threatq.core.lib.exceptions.ElementDefinitionEntryPointLookupError(definition: Iterable, errmsg: Optional[str] = None, entry_point_group: Optional[threatq.core.lib.decorators.unique.<locals>.UniqueFactory] = None, entry_point: Optional[str] = None)

Bases: threatq.core.lib.exceptions.ElementDefinitionEntryPointError, threatq.core.lib.exceptions.EntryPointLookupError

Error raised when an Element Definition specifies an entry_point that is not found in a given EntryPointGroup.

Parameters
  • definition (Iterable) – YAML Definition in question that errored

  • errmsg (str) – Error message

  • entry_point_group (EntryPointGroup) – Target entry_point group

  • entry_point (str) – Specified entry_point

entry_point_group

entry_point group

Type

EntryPointGroup

property errmsg

Formatted error message.

Returns

The formatted error message

Return type

str

exception threatq.core.lib.exceptions.ElementDefinitionError(definition: Union[str, List, Mapping], errmsg: Optional[str] = None)

Bases: threatq.core.lib.exceptions.DefinitionError

Base error raised when an invalid element is encountered in a Feed Definition.

exception threatq.core.lib.exceptions.ElementMappingDefinitionError(definition: Union[str, List, Mapping], errmsg: Optional[str] = None)

Bases: threatq.core.lib.exceptions.ElementDefinitionError

Error raised when an element definition mapping is invalid.

exception threatq.core.lib.exceptions.ElementMappingMemberCountDefinitionError(definition: Union[str, List, Mapping], errmsg: Optional[str] = None)

Bases: threatq.core.lib.exceptions.ElementMappingDefinitionError

Error raised when an element definition mapping specifies more than one member.

exception threatq.core.lib.exceptions.EntryPointLookupError

Bases: LookupError

Error raised when an entry_point is not found during lookup.

exception threatq.core.lib.exceptions.InvalidElementEntryPointArgumentsDefinitionError(definition: Iterable, entry_point: str, errmsg: Optional[str] = None)

Bases: threatq.core.lib.exceptions.ElementDefinitionEntryPointError

Error raised when an element definition specifies invalid arguments.

Parameters
  • definition (Iterable) – YAML Definition in question that errored

  • entry_point (str) – Specified entry_point

  • errmsg (str) – Error message

property errmsg

Formatted error message

Returns

The formatted error message

Return type

str

exception threatq.core.lib.exceptions.InvalidElementEntryPointDefinitionError(definition: Union[str, List, Mapping], errmsg: Optional[str] = None)

Bases: threatq.core.lib.exceptions.ElementDefinitionError

Error raised when an element definition specifies an invalid entry point.

exception threatq.core.lib.exceptions.MissingPlaceholderFileError

Bases: Exception

Error raised when a placeholder attachment does not have an associated placeholder file in the ThreatQ appliance.

exception threatq.core.lib.exceptions.ModelsNotReadyError

Bases: RuntimeError

Exception indicating an attempt to access the models map before it’s been initialized.

exception threatq.core.lib.exceptions.ResumablePosterError(errmsg, exc)

Bases: Exception

Error raised when an exception is encountered during threatq.core.lib.http.resumable.ResumablePoster processing.

Parameters
  • errmsg (str) – The error message.

  • exc (Exception) – The causing exception being wrapped.

exception

Inner exception.

Type

Exception

exception threatq.core.lib.exceptions.RetryExceededError

Bases: Exception

Error raised when a task has been retried too many times.

exception threatq.core.lib.exceptions.SSLError(*args, cert_type: Optional[str] = None, **kwargs)

Bases: Exception

Error raised when an exception is encountered when loading SSL certificates. Since the error messages returned by the underlying SSL library are often unfriendly, an attempt is made to generate a friendlier error message for some known errors based on the underlying SSL library’s module and errno.

Parameters

cert_type (str) – Certificate type that failed to be loaded, used for generating certain error messages.

cert_type

Certificate type that failed to be loaded, used for generating certain error messages.

Type

str

exception threatq.core.lib.exceptions.STIXMappingError(message, *, exc=None)

Bases: Exception

Error raised when an exception is encountered during STIX data processing.

Parameters

exc (Exception) – The causing exception being wrapped.

exc

Inner exception.

Type

Exception

exception threatq.core.lib.exceptions.SupplementalFeedError

Bases: threatq.core.lib.exceptions.ConnectorError

Error raised when an exception is encountered while running a threatq.dynamo.feeds.SupplementalFeed.

exception threatq.core.lib.exceptions.TAXIIPollError

Bases: Exception

Error raised when an exception is encountered when polling a TAXII server.

exception threatq.core.lib.exceptions.TaskCancelledException

Bases: Exception

Error raised when a task is cancelled for any reason.

exception threatq.core.lib.exceptions.ThreatFileFetchError(errmsg, exc: Optional[Exception] = None)

Bases: RuntimeError

Error raised when an exception is encountered while fetching the file.

Parameters

exc (Exception) – The exception that was raised during the fetch.

exception

The exception that was raised during the fetch.

Type

Exception

exception threatq.core.lib.exceptions.UserDefinedError

Bases: Exception

Error raised when an exception is defined by the user.

HTTP: threatq.core.lib.http

Authentication: threatq.core.lib.http.auth
Authentication Overview

This package provides a variety of HTTP authentication classes for CDFs and Operations. The following Authentication options are available for use:

HTTP Auth Base
class threatq.core.lib.http.auth.base.HTTPAuthBase(*args, params=None, headers=None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, tmpl_contexts: Sequence[Mapping] = (), **kwargs)

Bases: threatq.core.lib.elements.Element

async authenticate()

Should perform authentication or raise an exception in the event of failure. Will be called automatically prior to a request if in an unauthenticated state. It should save any ephemeral authentication info (e.g. an auth token) needed by update_request().

async inspect_response(response: threatq.core.lib.http.ClientResponse)

Should examine response to determine if existing authentication has expired or is otherwise invalid and if so, call the raise_unauthorized() helper (or equivalent). The default behavior handles the case of the server returning an HTTP Unauthorized status. Override this method if that’s not sufficient for a given auth plugin.

abstract update_request(request: threatq.core.lib.http.ClientRequest)

This method will be called so that, prior to sending the request, the plugin can make necessary modifications to the request object. Note that due to their ubiquity, query params and headers are automatically handled separately. If update_request (optionally) returns a mapping, its contents will be added to the template context used for rendering any templates params and headers might contain.

Basic Auth
class threatq.core.lib.http.auth.basic.BasicAuth(*args, encoding='latin1', **kwargs)

Bases: threatq.core.lib.http.auth.base.HTTPAuthBase

Class used for enabling HTTP BASIC Authentication in Feeds and Plugins

Parameters
  • username (str) – The username to be used for authentication

  • password (str) – The password to be used for authentication

  • encoding (str) – Defaults to ‘latin1’. How the result will be encoded.

update_request(request: threatq.core.lib.http.ClientRequest)

This method will be called so that, prior to sending the request, the plugin can make necessary modifications to the request object. Note that due to their ubiquity, query params and headers are automatically handled separately. If update_request (optionally) returns a mapping, its contents will be added to the template context used for rendering any templates params and headers might contain.
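The credential encoding can be illustrated with a standard-library sketch. `build_basic_header` is a hypothetical helper (not part of this codebase) showing how a username/password pair is combined and encoded, with `latin1` as the default to match the `encoding` parameter above:

```python
import base64


def build_basic_header(username: str, password: str, encoding: str = 'latin1') -> str:
    """Combine credentials into an HTTP Basic Authorization header value."""
    credentials = f'{username}:{password}'.encode(encoding)
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')


# build_basic_header('user', 'pass') -> 'Basic dXNlcjpwYXNz'
```

In practice this value is carried in the request's `Authorization` header; `update_request()` below is the hook where such a modification would occur.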

Client SSL Auth
class threatq.core.lib.http.auth.client_ssl.ClientSSLAuth(*args, params=None, headers=None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, tmpl_contexts: Sequence[Mapping] = (), **kwargs)

Bases: threatq.core.lib.http.auth.base.HTTPAuthBase

Class used for enabling SSL Client Certificate Authentication in Feeds and Plugins.

Parameters
  • client_certificate (str) – base64 PEM-encoded client certificate

  • client_private_key (str) – Optional, base64 PEM-encoded client private key

  • client_private_key_passphrase (str) – Optional, password used for decrypting client private key

update_request(request: threatq.core.lib.http.ClientRequest)

This method will be called so that, prior to sending the request, the plugin can make necessary modifications to the request object. Note that due to their ubiquity, query params and headers are automatically handled separately. If update_request (optionally) returns a mapping, its contents will be added to the template context used for rendering any templates params and headers might contain.

HMAC Auth
class threatq.core.lib.http.auth.hmac.HMACAuth(secret_key, message, *args, ctx: Optional[threatq.core.lib.asphalt.Context] = None, **kwargs)

Bases: threatq.core.lib.http.auth.base.HTTPAuthBase

update_request(request: threatq.core.lib.http.ClientRequest)

This method will be called so that, prior to sending the request, the plugin can make necessary modifications to the request object. Note that due to their ubiquity, query params and headers are automatically handled separately. If update_request (optionally) returns a mapping, its contents will be added to the template context used for rendering any templates params and headers might contain.
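HMACAuth's constructor takes a `secret_key` and `message`. As a hedged illustration of the underlying primitive only (the exact algorithm and header format this class emits are not documented here), a request signature can be computed with the standard library:

```python
import hashlib
import hmac


def sign_message(secret_key: bytes, message: bytes) -> str:
    """Compute a hex-encoded HMAC-SHA256 signature over a message."""
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


signature = sign_message(b'secret-key', b'GET /api/v1/indicators')
# `signature` is a 64-character hex digest, typically placed in a request header
```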

Multiple Auth
class threatq.core.lib.http.auth.multi.MultiAuth(*args, params=None, headers=None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, tmpl_contexts: Sequence[Mapping] = (), **kwargs)

Bases: threatq.core.lib.http.auth.base.HTTPAuthBase

Class used for aggregating multiple authentication methods in Feeds and Plugins.

async authenticate()

Should perform authentication or raise an exception in the event of failure. Will be called automatically prior to a request if in an unauthenticated state. It should save any ephemeral authentication info (e.g. an auth token) needed by update_request().

update_request(request: threatq.core.lib.http.ClientRequest)

This method will be called so that, prior to sending the request, the plugin can make necessary modifications to the request object. Note that due to their ubiquity, query params and headers are automatically handled separately. If update_request (optionally) returns a mapping, its contents will be added to the template context used for rendering any templates params and headers might contain.

Simple Auth
class threatq.core.lib.http.auth.simple.SimpleAuth(*args, params=None, headers=None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, tmpl_contexts: Sequence[Mapping] = (), **kwargs)

Bases: threatq.core.lib.http.auth.base.HTTPAuthBase

update_request(request: threatq.core.lib.http.ClientRequest)

This method will be called so that, prior to sending the request, the plugin can make necessary modifications to the request object. Note that due to their ubiquity, query params and headers are automatically handled separately. If update_request (optionally) returns a mapping, its contents will be added to the template context used for rendering any templates params and headers might contain.

Token Auth
class threatq.core.lib.http.auth.token.TokenAuth(*args, **kwargs)

Bases: threatq.core.lib.http.auth.base.HTTPAuthBase

Class used for enabling token-based Authentication in Feeds and Plugins.

This class implements a cache that is accessible by any instances of TokenAuth, as well as its subclasses, within a process.

Parameters
  • token_identifier_set (t.Iterable[str]) – a set of strings that uniquely identify a token.

  • reauthorize_error_codes (t.Container[int]) – a set of HTTP status codes that will trigger reauthentication.

  • get_token (t.Callable[[], t.Awaitable[t.Union[str, t.Mapping]]]) – coroutine that returns a token string or mapping with token and expires_in.

async authenticate()

Acquires a token for subsequent authorized requests.

If a corresponding, unexpired _Token is found in the cache, uses that. Otherwise, invokes get_token.

entry_points = ('token',)

async inspect_response(response: threatq.core.lib.http.ClientResponse)

Handles reauthenticating for certain HTTP response status codes.

Retries when a response’s HTTP status code is in the list of status codes contained in reauthorize_error_codes.

Parameters

response (http.ClientResponse) – The response object to be inspected (i.e., whose status code will be checked).

update_request(request: threatq.core.lib.http.ClientRequest)

If authenticated, updates the request object.

Parameters

request (http.ClientRequest) – The request object to be updated.

Returns

the corresponding token under token, if authenticated; None otherwise.

Return type

dict
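The `get_token` parameter can be satisfied by any coroutine returning either a bare token string or a mapping with `token` and `expires_in`. A minimal sketch follows; the token fetch is simulated here, whereas a real implementation would call the provider's authentication endpoint:

```python
import asyncio


async def get_token():
    """Simulated token fetch; a real implementation would POST to the provider's auth endpoint."""
    # Pretend the provider returned this JSON payload.
    return {'token': 'example-opaque-token', 'expires_in': 3600}


token_info = asyncio.run(get_token())
# TokenAuth caches the token and reuses it until it expires
```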

ThreatQ API Auth
class threatq.core.lib.http.auth.threatq.ThreatQAPIAuth(*args, **kwargs)

Bases: threatq.core.lib.http.auth.token.TokenAuth

MISP Parsing: threatq.core.lib.misp

async threatq.core.lib.misp.parse_misp(data: Union[Dict, List, str], args=None) → Dict

Parses MISP JSON data into threat object data.

Parameters
  • data (dict | list | str) – MISP JSON data to be parsed.

  • args (dict) – Additional information

Returns

Dictionary containing two keys: indicators and events. The value of each key is a list of dictionaries that represent threat object data.

Return type

dict

NOTICE: Any updates made to this method must also be reflected in the MISP Import CDF Readme file.

Base MISP Object Classes
class threatq.core.lib.misp.base.BaseObject

Bases: list

MISP parser objects base class.

classmethod get_value(path: str, data: Dict) → Any

Returns the value by path.

Parameters
  • path (str) – Dictionary path from where to extract the data.

  • data (dict) – Dictionary from where to extract the data.

Returns

Value at path or None if not found.

Return type

t.Any
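The lookup behavior can be sketched with a small dotted-path walker. The dot-separated path syntax shown here is an assumption for illustration, not a documented guarantee of `BaseObject.get_value`:

```python
from typing import Any, Dict, Optional


def get_value(path: str, data: Dict) -> Optional[Any]:
    """Walk a nested dictionary along a dot-separated path; return None if any key is missing."""
    current: Any = data
    for key in path.split('.'):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


event = {'Event': {'info': 'Phishing campaign', 'Org': {'name': 'CIRCL'}}}
# get_value('Event.Org.name', event) -> 'CIRCL'
# get_value('Event.missing', event) -> None
```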

MISP Object Classes
class threatq.core.lib.misp.adversaries.Adversaries

Bases: threatq.core.lib.misp.base.BaseObject

MISP Adversaries parser class.

add(data: Union[Dict, str])

Adds an adversary to the list of parsed adversaries.

Parameters

data (dict | str) – Dictionary from which to extract the adversary data or the name of the adversary

class threatq.core.lib.misp.attachments.Attachments

Bases: threatq.core.lib.misp.base.BaseObject

MISP Attachments parser class.

add(data: Dict)

Adds an attachment to the list of parsed attachments.

Parameters

data (dict) – Dictionary from which to extract the attachment data.

class threatq.core.lib.misp.attributes.Attributes

Bases: threatq.core.lib.misp.base.BaseObject

MISP Attributes parser class.

add(name: str, value: Any, transform: Optional[Union[Dict, Callable]] = None, published_at: Optional[str] = None)

Adds an attribute to the list of parsed attributes.

Parameters
  • name (str) – Attribute name.

  • value (t.Any) – Value from which to extract the attribute value.

  • transform (t.Union[dict, t.Callable]) – Optional, dictionary or method used to transform the value.

  • published_at (str) – Optional, published_at timestamp.

bulk_add(data: Dict, attributes_map: List[Tuple], published_at: Optional[str] = None)

Adds multiple attributes to the list of parsed attributes by using an attribute mapping table.

Parameters
  • data (dict) – Dictionary from which to extract the attribute values.

  • attributes_map (list[tuple]) – List of tuples representing how the attributes should be parsed.

  • published_at (str) – Optional, published_at timestamp.
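The mapping table driving `bulk_add` can be pictured as a list of `(source_key, attribute_name)` tuples. Both that tuple layout and the `bulk_extract` helper below are assumptions for illustration, not the class's actual internals:

```python
from typing import Dict, List, Optional, Tuple


def bulk_extract(data: Dict, attributes_map: List[Tuple[str, str]],
                 published_at: Optional[str] = None) -> List[Dict]:
    """Extract attribute dicts from `data` according to a (source_key, attribute_name) table."""
    attributes = []
    for source_key, attribute_name in attributes_map:
        value = data.get(source_key)
        if value is not None:
            attr = {'name': attribute_name, 'value': value}
            if published_at:
                attr['published_at'] = published_at
            attributes.append(attr)
    return attributes


event = {'threat_level_id': '2', 'analysis': '1'}
attrs = bulk_extract(event, [('threat_level_id', 'Threat Level'), ('analysis', 'Analysis')])
# attrs -> [{'name': 'Threat Level', 'value': '2'}, {'name': 'Analysis', 'value': '1'}]
```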

class threatq.core.lib.misp.events.Events(event, args=None)

Bases: threatq.core.lib.misp.base.BaseObject

MISP Events parser class.

parse()

Parses an event.

class threatq.core.lib.misp.generic.GenericObjects(transform: Callable = <function GenericObjects.<lambda>>, key: str = 'value')

Bases: threatq.core.lib.misp.base.BaseObject

MISP GenericObjects parser class.

Parameters
  • transform (t.Callable) – Optional, method used to transform the generic object value.

  • key (str) – Optional, generic object dictionary key name, defaults to value.

add(value: Any)

Adds a generic object to the list of parsed generic objects.

Parameters

value (t.Any) – Value of the generic object.

class threatq.core.lib.misp.indicators.Indicators(*args, obj_iocs: bool = False)

Bases: threatq.core.lib.misp.base.BaseObject

MISP Indicators parser class.

Parameters

obj_iocs (bool) – Optional, if True, uses the logic to parse MISP .response[].Event.Object[].Attribute[] indicators, otherwise by default uses the logic to parse MISP .response[].Event.Attribute[] indicators.

add(data: Dict, tlp: Optional[Dict] = None, additional_attrs: Optional[List] = None)

Adds an indicator to the list of parsed indicators.

Parameters
  • data (dict) – Data from which to extract the indicator data.

  • tlp (dict) – Optional, TLP data to be added to the parsed indicator.

  • additional_attrs (list) – Optional, list of additional attributes to be added to the parsed indicator.

Returns the list of parsed indicators with minimal data used only for relating them to other objects.

Returns

List of parsed indicators containing only type and value.

Return type

list[dict]

class threatq.core.lib.misp.signatures.Signatures(*args, logname: Optional[str] = None, **kwargs)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.misp.base.BaseObject

MISP Signatures parser class.

parse(type_: str, value: str)

Adds a signature to the list of parsed signatures.

Parameters
  • type_ (str) – Type of the signature to be parsed.

  • value (str) – Value of the signature to be parsed.

OLE2 Parsing: threatq.core.lib.ole2

threatq.core.lib.ole2.ATTACHMENT_DATA = '_3701'

Contains the contents of the file to be attached.

threatq.core.lib.ole2.ATTACHMENT_NAME = '_3707'

Contains the full filename and extension of the Attachment object.

threatq.core.lib.ole2.ATTACHMENT_TYPE = '_370E'

Contains a content-type MIME header.

threatq.core.lib.ole2.BODY = '_1000'

Contains message body text in plain text format.

threatq.core.lib.ole2.HEADER = '_007D'

Contains transport-specific message envelope information for email.

threatq.core.lib.ole2.MESSAGE_CLASS = '_001A'

Denotes the specific type of the Message object.

threatq.core.lib.ole2.RECIPIENT_EMAIL = '_39FE'

Contains the SMTP address of the Message object.

threatq.core.lib.ole2.SUBJECT = '_0037'

Contains the subject of the email message.

threatq.core.lib.ole2.parse_ole2_email(data: bytes) → MutableMapping

Parse an OLE2.0 file to obtain data inside an email including subject, body, headers, and attachments.

Parameters

data (bytes) – OLE2 file

Returns

Dictionary of file stream mappings

Return type

dict
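Assuming the returned mapping is keyed by the stream identifiers listed above (an assumption for illustration; only "Dictionary of file stream mappings" is documented), message fields can be pulled out via the constants:

```python
# Stream-identifier constants from the OLE2 property tables above.
SUBJECT = '_0037'
BODY = '_1000'
RECIPIENT_EMAIL = '_39FE'

# Hypothetical result of parse_ole2_email() for a small message.
streams = {
    SUBJECT: 'Quarterly report',
    BODY: 'Please see the attached file.',
    RECIPIENT_EMAIL: 'analyst@example.com',
}

subject = streams.get(SUBJECT)
# subject -> 'Quarterly report'
```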

Object Pipeline Management: threatq.core.lib.pipeline

This module provides mix-in classes that aid in creating a “pipeline” of processors, each of which accepts input items from the previous processor, performs any desired processing on them, and forwards the results (and/or the original objects) onto the next processor. Each processor’s input queue is throttled to a defined number of waiting items, so that any slower processors exert backpressure on earlier ones, preventing memory ballooning.

The initial processor in a pipeline should be fed its input items using its put() asynchronous coroutine method. The pipeline should be constructed such that all desired processing is completed by the time a value reaches the end of the pipeline, as the last processor doesn’t forward its results at all (be aware that anywhere this document refers to forwarding to the next processor, this is the exception).

There are two distinct types of items that pass through a pipeline: messages and values. Values are the main inputs for processing, while messages are intended to communicate any required application state without interfering with that processing. See is_message() for distinguishing them (which should only be necessary with PipelineGeneratorSegment subclasses) and new_message() for creating them. As ordering of state transitions within the processing stream is often important, it is maintained at all times.
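The throttled-queue arrangement described above can be sketched with plain asyncio primitives. This is an analogy for how bounded intake queues create backpressure, not the module's actual implementation:

```python
import asyncio


async def stage(intake: asyncio.Queue, outlet: asyncio.Queue):
    """Consume items from a bounded intake queue, process them, and forward results."""
    while True:
        item = await intake.get()
        if item is None:              # sentinel marking end of stream
            await outlet.put(None)
            break
        await outlet.put(item * 2)    # blocks if the next stage's queue is full


async def run_pipeline():
    q1 = asyncio.Queue(maxsize=2)     # small depth: a slow consumer throttles the producer
    q2 = asyncio.Queue(maxsize=2)
    results = []

    async def sink():
        while (item := await q2.get()) is not None:
            results.append(item)

    workers = [asyncio.create_task(stage(q1, q2)), asyncio.create_task(sink())]
    for value in range(5):
        await q1.put(value)           # backpressure: blocks while q1 is full
    await q1.put(None)
    await asyncio.gather(*workers)
    return results


# asyncio.run(run_pipeline()) -> [0, 2, 4, 6, 8]
```

Because every queue is bounded, a slow sink propagates delay all the way back to the producer instead of letting memory balloon.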

threatq.core.lib.pipeline.new_message(content: Any)

Create a pipeline message.

Parameters

content – The content of the message. Can be any object.

Returns

PipelineMessage

threatq.core.lib.pipeline.is_message(item)

Determine whether the item is a pipeline message.

Parameters

item – the item to be classified

Returns

True if the item is a pipeline message, otherwise False.

Return type

bool

class threatq.core.lib.pipeline.BasePipelineSegment(*args, intake_depth: Optional[int] = None, logname: Optional[str] = None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, **kwargs)

The base class for below classes. These arguments, methods, and attributes apply to those classes.

Parameters

intake_depth – the maximum size of the processor’s intake queue. Callers attempting to put items in the queue will block until space is available, similarly to asyncio.Queue semantics. Note that while it’s possible to set this <= 0 to indicate infinite depth, this is not recommended.

next_segment

If necessary, the structure of the pipeline can be changed dynamically by setting this attribute to a different “next” processor. Setting it to None effectively makes this processor the terminus of the pipeline.

intake_active

True/False indicating whether consumption from the intake queue is still active or has been stopped.

auto_received_value_logs

An iterable of (level, template-string) tuples. If logging is available (see above), then each string will be interpolated (using str.format(), PEP 3101-style) with the object available via the field name value. The result is then logged at the associated logging level. Setting this attribute to None disables automatic logging of received values.

auto_received_message_logs

As above, but for messages. The template field name is message.

auto_forward_value_logs, auto_forward_message_logs

As with the similar options above, but these specify logging for values being forwarded to the next processor.

full()

True if the queue is at maximum capacity, otherwise False.

Returns

bool

async join(closed=True)

Block until all intake items have been handled.

Parameters

closed – if True (the default), in addition to requiring that the queue is empty, the processor must also have been closed via stop_intake(). This state ensures that no further processing will occur.

async pre_intake()

This coroutine method is called prior to beginning intake. It is intended to be overridden in order to perform any required setup, and should block until it’s desired for processing to begin.

async put(item: Any)

Pass an item into the component for handling as described above. This is the intended intake point.

Parameters

item – the value or message to be processed

qsize()

Number of items in the queue.

Returns

int

stop_intake(*, clear: bool = False)

Permanently disallow any new values entering the intake queue. Calls to put() will be ignored (silently, to avoid blocking the source).

Parameters

clear – if True, the intake queue will immediately be emptied so no further values will be processed.

class threatq.core.lib.pipeline.PipelineSegment(*args, **kwargs)

Bases: threatq.core.lib.pipeline.BasePipelineSegment

This mixin provides behavior where each received item is passed to an appropriate handler: handle_value() for values and handle_message() for messages. Results are then dispatched to the next processor. These handlers are passed the incoming item as the only argument, and may be implemented differently for different effects. A handler can be:

  • A normal (synchronous) method. If it returns any value other than None, it will be forwarded.

  • An asynchronous coroutine. It will be awaited, and then the return value (except for None) forwarded.

  • An asynchronous generator. All values it yields (including None) will be forwarded.

See below for more details. Note that a handler must complete before processing will begin for the next received item. This is necessary to assure ordering and provide throttling as described above.

Note

There’s no restriction on the return values themselves. A value handler is free to return/yield a message and vice versa (and there are legitimate cases for doing so).
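The three handler shapes and their forwarding rules can be sketched with a small dispatcher. This is an illustrative analog, not the mixin's actual dispatch code:

```python
import asyncio
import inspect


async def dispatch(handler, item, forward):
    """Apply a handler to an item and forward its results per the rules above."""
    if inspect.isasyncgenfunction(handler):
        async for result in handler(item):
            await forward(result)          # generators forward everything they yield, even None
    else:
        result = handler(item)
        if inspect.isawaitable(result):
            result = await result          # coroutine handlers are awaited first
        if result is not None:
            await forward(result)          # a None return suppresses forwarding


async def demo():
    forwarded = []

    async def forward(x):
        forwarded.append(x)

    def sync_handler(v):
        return v + 1                       # plain method: non-None return is forwarded

    async def coro_handler(v):
        return None                        # coroutine: None means nothing forwarded

    async def gen_handler(v):
        yield v                            # async generator: every yield is forwarded
        yield v * 10

    await dispatch(sync_handler, 1, forward)
    await dispatch(coro_handler, 1, forward)
    await dispatch(gen_handler, 2, forward)
    return forwarded


# asyncio.run(demo()) -> [2, 2, 20]
```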

add_intake_done_callback(func: Callable)

This method registers a callback function to be called when intake has been completed/stopped. It is attached to an asyncio.Future and will be called as described in that documentation. If intake has already stopped when this method is called, the callback will be scheduled to be called immediately (in this case, it will be passed a None instead of the Future).

Parameters

func – the callback function

handle_message(message: threatq.core.lib.pipeline.PipelineMessage)

Handle a received message. It can be overridden to provide specific handling, but the common case is that it’s desirable to pass messages through to the next processor, which is what the default implementation does.

Note

As above, any exceptions raised are ignored.

Parameters

message – the received message

Returns

Any object.

handle_value(value: Any)

Handle a received value. The default implementation simply returns the original value so that it can be transparently forwarded, but it is normally overridden.

Note

Any exceptions raised are ignored, so must be appropriately and fully handled within this method.

Parameters

value – the received value

Returns

Any object.

class threatq.core.lib.pipeline.PipelineGeneratorSegment(*args, **kwargs)

Bases: threatq.core.lib.pipeline.BasePipelineSegment

Provides pipeline behavior appropriate for processors implemented as asynchronous generators. It provides the items() asynchronous generator, which simply yields received items. The intended usage is for a subclass to override items(), wrapping it via super() to perform any additional processing. This is especially useful when the desire is to lay a stack of asynchronous generators on top of the input generator. This wrapper generator is automatically consumed and all items it yields are forwarded.

Warning

Under this paradigm, preservation of ordering is ensured by yielding both values and messages through the asynchronous generator stack. It’s important to be aware that the wrapper(s) must handle them appropriately, as well. For most uses, simply yielding the original message is the right thing to do.

Warning

It is critical that any wrapping asynchronous generators yield a received item through unchanged if the item is threatq.core.lib.pipeline.ITEM_DONE or is threatq.core.lib.pipeline.PIPELINE_CLOSE.

items()

Asynchronous generator yielding items as they appear in the intake queue. It should be wrapped with an override utilizing super() to make it useful.

Note

To reiterate: this asynchronous generator, or an override of it, will be consumed automatically to forward yielded values to the next processor.

Warning

It’s critical to ensure that every member of the generator stack handles any exceptions internally. By the nature of generators, any unhandled exception would cause it to exit, breaking the whole stack, and the entire rest of the pipeline, as a result.

logging_wrapper(source, value_logs: Iterable[Tuple[int, str]], message_logs: Iterable[Tuple[int, str]])

This is a convenience method for creating an asynchronous generator that logs items passing through it.

Parameters
  • source – Asynchronous generator to wrap

  • value_logs (Iterable[Tuple[int, str]]) – Iterable of (log level, message) pairs to emit for values passing through

  • message_logs (Iterable[Tuple[int, str]]) – Iterable of (log level, message) pairs to emit for messages passing through

Yields

the same sequence of values consumed from its source.

threatq.core.lib.pipeline.ITEM_DONE pipeline message

This is a special message object that acts as an internal sentinel for PipelineGeneratorSegment. Please see the related warning above.

Subprocess Execution: threatq.core.lib.subprocess

The threatq.core.lib.subprocess module provides functions and utilities related to running and communicating with external executables.

threatq.core.lib.subprocess.ensure_interpreter_path()

Ensures that the directory containing the active Python interpreter is in the command search path. This is desirable in most contexts, as related tools are frequently installed in the same path. This is especially true for virtualenvs, which is the common case in ThreatQ environments. This should be called at application startup.

async threatq.core.lib.subprocess.exec_subprocess(cmd: List[str], *, timeout=None, kill_timeout=1, **kwargs)

Run a command asynchronously, awaiting its completion before returning. The interface and return value are as exec_subprocess_sync() with the following additional option:

Parameters

kill_timeout (int) – (seconds) When used with timeout, initial timeout will signal the command to exit cleanly. Exit will be awaited for the kill_timeout duration, at which point the command will be forcibly killed. Default: 1s

Returns

(subprocess.CompletedProcess)

Raises

asyncio.TimeoutError – Process was terminated due to timeout.

Note

The input argument is currently not supported.
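The timeout behavior described above can be approximated with the standard library alone; the following is a sketch of the terminate-then-kill pattern, not the actual threatq implementation:

```python
import asyncio

async def run_with_kill_timeout(cmd, timeout, kill_timeout=1):
    # On timeout, signal the command to exit cleanly, wait up to
    # kill_timeout seconds, then forcibly kill it.
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdin=asyncio.subprocess.DEVNULL)
    try:
        return await asyncio.wait_for(proc.wait(), timeout)
    except asyncio.TimeoutError:
        proc.terminate()  # polite request to exit
        try:
            await asyncio.wait_for(proc.wait(), kill_timeout)
        except asyncio.TimeoutError:
            proc.kill()  # grace period expired; kill outright
            await proc.wait()
        raise

returncode = asyncio.run(run_with_kill_timeout(["sleep", "0"], timeout=5))
```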

threatq.core.lib.subprocess.exec_subprocess_background(cmd: List[str], **kwargs)

Run a command asynchronously. Parameters are as create_subprocess_exec(), except that the process’s stdin is set to subprocess.DEVNULL by default.

Returns

(asyncio.subprocess.Process)

threatq.core.lib.subprocess.exec_subprocess_sync(cmd: List[str], **kwargs)

Run a command synchronously. The interface and return value are as subprocess.run() with the enhancements included in Python 3.7 (i.e. the text and capture_output parameters), except with the following changes and rationales:

  • Optional keyword arguments cannot be passed positionally (clarity; prevents unintended external info leaks)

  • The shell argument is prohibited (security)

  • stdin defaults to subprocess.DEVNULL when input is not passed (passing through stdin by default is error-prone and rarely desired in ThreatQ environments).

  • universal_newlines is prohibited - use its alias, text, instead (force consistency)

  • check defaults to True (prevent hiding of errors by default)

  • text defaults to True (we use unicode as pervasively as possible)
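The defaults above map onto the standard library roughly as follows. This is a sketch of the default handling only; the real function additionally rejects shell= and universal_newlines=:

```python
import subprocess

def run_sync(cmd, **kwargs):
    # stdin is closed unless input is supplied; errors raise by default;
    # text mode is on by default, matching the rationale listed above.
    if "input" not in kwargs:
        kwargs.setdefault("stdin", subprocess.DEVNULL)
    kwargs.setdefault("check", True)
    kwargs.setdefault("text", True)
    return subprocess.run(cmd, **kwargs)

result = run_sync(["echo", "hello"], capture_output=True)
```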

STIX Parsing: threatq.core.lib.stix

This module contains logic that maps STIX data to ThreatQ appliance ThreatObjects. The parse_stix() function offers a top level entry point into STIX parsing. From there, if opinionated ThreatObject parsing is desired, the following logic is applied:

  • a BaseParser instance is created with the passed in data.

  • The BaseParser instance handles preformatting the incoming STIX data as necessary.

  • parse() is then called, which asynchronously calls parse_objects() on each BaseMapping subclass instance to parse ThreatObjects out of the STIX data.

  • Each BaseMapping class implements parse_object() such that each object parsed includes object attributes and any object relations.
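The dispatch described above can be sketched with toy stand-ins; the class names, field names, and output shape below are illustrative assumptions, not the real API:

```python
import asyncio

class DomainMapping:
    # Each mapping declares the STIX types it handles, mirroring the
    # stix_object_types attribute documented below.
    stix_object_types = ("domain-name",)

    async def parse_objects(self, stix_objects):
        for obj in stix_objects:
            yield {"type": "indicator", "value": obj["value"]}

async def parse(stix_objects, mappers):
    # Fan each STIX object out to the mapper that handles its type.
    results = []
    for mapper in mappers:
        relevant = [o for o in stix_objects
                    if o["type"] in mapper.stix_object_types]
        async for result in mapper.parse_objects(relevant):
            results.append(result)
    return results

data = [{"type": "domain-name", "value": "example.com"},
        {"type": "unhandled", "value": "ignored"}]
parsed = asyncio.run(parse(data, [DomainMapping()]))
```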

async threatq.core.lib.stix.parse_stix(stix_data: Union[str, Iterable[str], dict], ctx: threatq.core.lib.asphalt.Context, *, parse: bool = True, logname: Optional[str] = None) → MutableMapping

Parses STIX data.

Parameters
  • stix_data (str | Iterable[str] | dict) – String, iterable of strings, or dict STIX formatted data

  • ctx (Context) – Context instance needed for custom object models.

  • parse (bool) – Optional, defaults to True. If True, the STIX data will be parsed and formatted as ThreatQ ThreatObjects before being returned. If False, the STIX data will immediately be returned after being parsed to JSON.

  • logname (str) – Parser log name.

Returns

Dictionary of parsed STIX data.

Return type

MutableMapping

Base STIX Parsing Classes
class threatq.core.lib.stix.base.BaseMapping(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.asphalt.ContextRefMixin, abc.ABC

Base class for STIX Mapping classes.

Parameters

ctx (Context) – Context instance.

create_parse_results(obj_type: str, *obj_data: dict, inter_relate=False, **kwargs)

Create BaseParseResult instances for each object dictionary passed in via obj_data. Object dictionaries will be created as ThreatObjects via create_threat_object() and inter-related if specified via the inter_relate argument.

Parameters
  • obj_type (str) – Type of the ThreatObject to create corresponding to a key in Models.

  • obj_data (dict) – Dictionary of object data usable by from_data().

  • inter_relate (bool) – Optional, defaults to False. If true and an iterable of values is supplied to obj_data, those objects will be inter-related upon creation as BaseParseResult instances.

  • kwargs – Additional kwargs that should be passed to the BaseParseResult constructor call for each object in obj_data.

Returns

Asynchronous Generator of BaseParseResult instances created from obj_data.

Return type

AsyncIterable[BaseParseResult]

create_threat_object(obj_type: str, obj_data: dict) → threatq.core.models.base.BaseThreatObject

Create a BaseThreatObject instance of type obj_type from obj_data.

Parameters
  • obj_type (str) – Type of the ThreatObject to create corresponding to a key in Models.

  • obj_data (dict) – Dictionary of object data usable by from_data().

Returns

ThreatObject instance

Return type

BaseThreatObject

obj_type = None

Type

Corresponds to a key in Models.

Note

Custom object collection names are not plural.

abstract parse_object(stix_obj) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parse a STIX object and return an async generator of resulting objects.

Parameters

stix_obj (Any) – STIX object to build ThreatObject dictionary mappings for.

Yields

BaseParseResult – Parsed ThreatObjects wrapped into BaseParseResult instances.

parse_objects(stix_objects: Iterable)

This method intakes STIX data as an iterable and formats the STIX data objects as mappings usable by from_data().

Parameters

stix_objects (Iterable) – STIX objects to parse

Yields

BaseParseResult – BaseParseResult instances.

stix_object_types = ()

Tuple of the type values of the STIX objects this Mapping instance is concerned with parsing.

class threatq.core.lib.stix.base.BaseParser(opinionated: bool, logname: str, ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.asphalt.ContextRefMixin, abc.ABC

Base Controller class for STIX parsing. Holds the final result set of parsed STIX Objects in a BaseResultSet instance and triggers individual BaseMapping instances.

Note

BaseParser instances must be initialized via the create() method to allow for any asynchronous pre-processing that may be necessary.

Parameters
  • opinionated (bool) – If True, the STIX data will be parsed and formatted as ThreatQ ThreatObjects before being returned. If False, the STIX data will immediately be returned as JSON after being parsed.

  • logname (str) – Logname of the instance

  • ctx (Context) – Context instance.

opinionated

Whether data should be parsed as ThreatQ ThreatObjects or returned as JSON.

Type

bool

object_map

Dictionary mapping of STIX objects to parse.

Type

dict

mappers

Mapping of BaseMapping instances applicable for the Parser.

Type

MutableMapping

result

BaseResultSet instance defined by the result_set_cls attribute.

Type

BaseResultSet

async classmethod create(stix_data: Any, opinionated: bool = True, logname: Optional[str] = None, ctx: Optional[threatq.core.lib.asphalt.Context] = None)

Asynchronous constructor for a parser instance. Returns a new instance initialized with stix_data. Creates the BaseParser instance and calls initialize_data() and init_mappers().

Parameters
  • stix_data (Any) – STIX data (see initialize_data() for structure)

  • opinionated (bool) – Optional, defaults to True. If True, the STIX data will be parsed and formatted as ThreatQ ThreatObjects before being returned. If False, the STIX data will immediately be returned as JSON after being parsed.

  • logname (str) – Logname of the instance.

  • ctx (Context) – Context instance.

Returns

Parser instance

Return type

BaseParser

init_mappers()

Import and initialize all appropriate BaseMapping subclasses that are needed for this BaseParser.

abstract async initialize_data(stix_data: Any)

Initialize instance with passed STIX data. Usually called via create() rather than directly.

Parameters

stix_data – STIX data (data structure depends upon STIX version)

abstract property mapping_cls

Subclasses must specify the base class of their BaseMapping implementations.

async parse()

Parses each STIX object in the passed data according to its corresponding mapper type and returns the results of report().

Returns

The result of report().

Return type

MutableMapping

abstract property result_set_cls

Subclasses must specify the class of their BaseResultSet implementation.

abstract property version

Subclasses must specify their STIX version.

Base STIX Result Classes
class threatq.core.lib.stix.results.BaseParseResult(obj: Optional[threatq.core.models.base.BaseThreatObject] = None, ids: Optional[List[str]] = None, inline_relations: Optional[List[threatq.core.models.base.BaseThreatObject]] = None, idref_relations: Optional[List[str]] = None, **kwargs)

Represents a parsed STIX object. Any objects passed in as inline_relations will be constructed as BaseParseResults themselves before being related via relate().

Parameters
  • obj (BaseThreatObject) – Optional, ThreatObject Model instance

  • inline_relations (List[BaseThreatObject]) – Optional, list of inline objects this object is related to.

  • idref_relations (List[str]) – Optional, list of STIX IDs for STIX objects to relate to this BaseParseResult.

  • kwargs (MutableMapping) – Further keyword arguments to apply to any specified inline_relations when initializing them as BaseParseResult’s.

obj

Object as represented by a Model instance

Type

BaseThreatObject

inline_relations

Inline related BaseParseResults

Type

List[BaseParseResult]

relation_ids

Set of STIX IDs for STIX objects to relate to this BaseParseResult

Type

Set[str]

add_inline_relations(*inline_relations: Iterable[Union[threatq.core.lib.stix.results.BaseParseResult, threatq.core.models.base.BaseThreatObject]], ids: Optional[List[str]] = None, **kwargs)

Add inline_relations to this BaseParseResult.

Parameters
  • inline_relations (BaseThreatObject | BaseParseResult) – BaseThreatObjects or BaseParseResults to inline relate to this BaseParseResult. A BaseParseResult is created for each provided BaseThreatObject.

  • ids (Iterable[str]) – Optional, iterable of STIX Reference ID strings to apply to any BaseParseResults created from BaseThreatObjects.

  • kwargs (MutableMapping) – Optional, further keyword arguments to apply to any BaseParseResults created from BaseThreatObjects.

add_relation_id(*idref: Iterable[str])

Adds an idref string to the relation_ids set. Can be passed either a single string or a list of strings.

Parameters

idref (str | List[str]) – A single idref string or a list of idref strings

add_stix_id(*ids: Iterable[str])

Add each STIX Reference ID in ids to the result's ids attribute.

Parameters

ids (Iterable[str]) – Iterable of STIX Reference ID strings.

relate(*others: threatq.core.lib.stix.results.BaseParseResult)

Relate this BaseParseResult to other BaseParseResults.

Parameters

others (BaseParseResult) – Other BaseParseResult instances to relate to this one.

class threatq.core.lib.stix.results.BaseResultSet(**kwargs)

Manages a set of BaseParseResults, allowing for deduplication of BaseParseResults on the fly. Further, the BaseResultSet is responsible for resolving any relations in the STIX data and formatting the final result as a MutableMapping.

results

List of BaseParseResult objects

Type

List

get_objects_by_stix_id(id_) → List[threatq.core.lib.stix.results.BaseParseResult]

Get all objects in results that have STIX ID id_.

Parameters

id_ (str) – STIX ID.

Returns

List of BaseParseResults.

Return type

List

get_unresolved_relations()

Gets a list of any relation_ids on any ParseResult objects in results.

Returns

List of unresolved relation ids.

Return type

List[str]

merge_add(obj: threatq.core.lib.stix.results.BaseParseResult)

Add a BaseParseResult to results. If the BaseParseResult is already present in results, the matching BaseParseResults will be merged.

Parameters

obj (BaseParseResult) – Object to add to results

report()

Get results as a MutableMapping.

Returns

Result set data formatted like {model.api_collection: [{...}, ...], ...}

Return type

MutableMapping
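A minimal sketch of the merge-on-add deduplication and report() shape described above, using illustrative names and keys rather than the real classes:

```python
class ToyResultSet:
    def __init__(self):
        self._results = {}

    def merge_add(self, obj):
        # Results describing the same object merge instead of duplicating;
        # their STIX IDs accumulate across duplicates.
        key = (obj["collection"], obj["value"])
        if key in self._results:
            self._results[key]["ids"] |= obj["ids"]
        else:
            self._results[key] = obj

    def report(self):
        # Shape the output like {model.api_collection: [{...}, ...], ...}.
        out = {}
        for obj in self._results.values():
            out.setdefault(obj["collection"], []).append(obj)
        return out

rs = ToyResultSet()
rs.merge_add({"collection": "indicators", "value": "example.com", "ids": {"a"}})
rs.merge_add({"collection": "indicators", "value": "example.com", "ids": {"b"}})
report = rs.report()
```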

STIX 1 Parsing: threatq.core.lib.stix.stix1
Base STIX 1 Parsing Classes
class threatq.core.lib.stix.stix1.base.Mapping(ctx: threatq.core.lib.asphalt.Context, util: threatq.core.lib.stix.stix1.utils.CyboxSTIX)

Bases: threatq.core.lib.stix.base.BaseMapping

Base class for STIX 1 Mapping classes.

Parameters

util (CyboxSTIX) – CyboxSTIX instance.

cyboxSTIX

CyboxSTIX instance.

Type

CyboxSTIX

build_attributes(data: mixbox.entities.Entity, attr_name: str, attr_paths: List[Tuple[Union[str, Callable], ...]], stix_object: threatq.core.lib.stix.stix1.utils.STIXObject, published_at: Optional[Union[arrow.arrow.Arrow, str, int]] = None)

Build out attributes that are embedded within a STIX object. A published at value is checked for on the top level of each attribute being built. If present, this value is applied as the attribute’s published_at, otherwise published_at is used.

Parameters
  • data (Entity) – object from which attributes are being built

  • attr_name (str) – Name of the attribute being built

  • attr_paths (List[Tuple[str, ...]]) –

    List of attribute path tuples. These attribute paths can be simple strings, or callable functions. A callback function reference within an attribute path tuple will be called with the following arguments:

    • obj: STIX data object currently being processed within _loop_attrs()

    • parent_objs: tuple of parent objects of the STIX data object

  • stix_object (STIXObject) – STIX Object to build attributes for

  • published_at (str | int | Arrow) – Published at date

Returns

List of attribute dictionaries

Return type

list

get_stix_ids(stix_obj: threatq.core.lib.stix.stix1.utils.STIXObject, object_: Optional[mixbox.entities.Entity] = None) → List[str]

This method parses STIX Reference IDs for a given stix_obj as Attributes.

Parameters
  • stix_obj (STIXObject) – STIX Object to parse STIX Reference ID attributes for.

  • object_ (Entity) – Given the complexities of STIX Objects, it may be necessary to override the default object (found at path stix_obj.object) from which attributes are parsed. In that case, the object_ keyword argument can be used to override attribute parsing for stix_obj.object by passing any STIX Entity (likely an object further embedded under stix_obj.object).

Returns

List of STIX Reference ID strings

Return type

list

parse_attributes(stix_obj: threatq.core.lib.stix.stix1.utils.STIXObject, published_at: Optional[Union[arrow.arrow.Arrow, str, int]] = None) → List[dict]

This method parses ThreatObject attributes from a given STIXObject based on the Attribute name/path mapping defined in _attribute_key_map, returning a list of attribute dictionaries. STIX Reference ID attributes are initially parsed and then extended with attributes as defined in _attribute_key_map.

Parameters
  • stix_obj (STIXObject) – STIX Object to parse attributes for

  • published_at (str | int | Arrow) – Published at date

Returns

List of attribute dictionaries

Return type

list

abstract parse_object(stix_obj: threatq.core.lib.stix.stix1.utils.STIXObject) → AsyncIterable[threatq.core.lib.stix.stix1.results.ParseResult]

Parses a STIX object and returns an async generator of parsed ThreatObject dictionaries.

Parameters

stix_obj (STIXObject) – STIXObject instance to build ThreatObject dictionary mappings for.

Yields

ParseResult – Parsed ThreatObjects wrapped into ParseResult instances.

parse_relations(stix_entity: mixbox.entities.Entity) → List[str]

This method intakes STIX data and creates a list of id references that is usable during ParseResult initialization. In order for this method to be utilized, a child class must implement the _relation_key_list class attribute.

Parameters

stix_entity (Entity) – STIX entity to get relations for

Returns

list of idref relations passed to ParseResult’s idref_relations parameter.

Return type

List[str]

class threatq.core.lib.stix.stix1.results.ParseResult(obj: Optional[threatq.core.models.base.BaseThreatObject] = None, ids: Optional[List[str]] = None, inline_relations: Optional[List[threatq.core.models.base.BaseThreatObject]] = None, idref_relations: Optional[List[str]] = None, **kwargs)

Bases: threatq.core.lib.stix.results.BaseParseResult

Represents a parsed STIX object.

add_attributes(*attributes: threatq.core.models.common_fields.Attribute)

Add Attribute objects to ParseResult.obj’s attributes.

Parameters

attributes (Iterable[Attribute]) – Attribute objects to add

STIX 1 Preprocessing
class threatq.core.lib.stix.stix1.preprocessing.XMLPreprocesser

Bases: object

Class which handles any preprocessing of a STIX XML file required for parse(). This preprocessor injects controlled structure XPaths for TLP marking nodes that are missing a controlled structure.

async static get_object_map(stix_containers: Iterable[stixmarx.container.MarkingContainer]) → Dict[str, threatq.core.lib.stix.stix1.utils.STIXObject]

Generate an object map from a list of MarkingContainers that maps object IDs to STIXObject instances.

Parameters

stix_containers (Iterable[MarkingContainer]) – Iterable of stix containers

Returns

Dictionary mapping object IDs to STIXObject instances.

Return type

dict

classmethod preprocess_xml(ctx: threatq.core.lib.asphalt.Context, xml_content: str) → Coroutine

Preprocess XML for STIX parsing. An XML tree object is created from an encoded XML string and passed through various preprocessing methods before being parsed by stixmarx and returned as a MarkingContainer.

Parameters
  • ctx (Context) – context instance

  • xml_content (str) – String of XML content

Returns

Parsed STIX MarkingContainer

Return type

MarkingContainer

STIX 1 Mappings
class threatq.core.lib.stix.stix1.campaigns.Campaigns(ctx: threatq.core.lib.asphalt.Context, util: threatq.core.lib.stix.stix1.utils.CyboxSTIX)

Bases: threatq.core.lib.stix.stix1.base.Mapping

Campaign STIX mapping class.

parse_object(stix_obj: threatq.core.lib.stix.stix1.utils.STIXObject) → AsyncIterable[threatq.core.lib.stix.stix1.results.ParseResult]

Parses a STIX Campaign object.

Parameters

stix_obj (STIXObject) – STIXObject instance to build Campaign dictionary mappings for.

Yields

ParseResult – Parsed Campaign ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix1.identities.CiqIdentityMapping

Bases: object

CIQIdentity3_0Instance STIX extension mapping class.

This mapping is a special case: it is not a subclass of Mapping and is not meant to be executed as an independent top-level mapper by Parser. Instead, it enriches a Mapping subclass's parsed object with additional attributes derived from that mapper's identities_key_map attribute, which provides paths to where a CIQIdentity3_0Instance object may be.

classmethod parse_attributes(parent_mapping: threatq.core.lib.stix.stix1.base.Mapping, parent_stix_object: threatq.core.lib.stix.stix1.utils.STIXObject, published_at: Optional[Union[arrow.arrow.Arrow, str, int]] = None) → Generator

Parses out attributes from a collection of CIQIdentity3_0Instance objects. The provided parent_mapping should have an identities_key_map attribute, which is a dict where:

  • the key is a str label to be prepended to the derived attribute names from the keys of CiqIdentityMapping._attribute_key_map

  • the value is a list of paths where each path is a tuple containing str or callable items; each list provides a path to where a CIQIdentity3_0Instance object may be for the associated key

In other words: the structure of identities_key_map is the same as _attribute_key_map but used in a different context.

The published_at for each parsed attribute is set to the provided published_at. Usually, this should be the derived published_at value of parent_stix_object.

The TLP for each parsed attribute of a CIQIdentity3_0Instance object is the same as the TLP marked on the CIQIdentity3_0Instance object itself. This is required due to a bug in the python-stix library.

Parameters
  • parent_mapping (Mapping) – STIX mapping that contains the identities_key_map attribute.

  • parent_stix_object (STIXObject) – STIX object whose object is of the type handled by parent_mapping and that may contain CIQIdentity3_0Instance objects.

  • published_at (str | int | Arrow) – Published at value of the parent_stix_object to be applied to all parsed attributes.

Returns

Each iteration yields a dict with keys: name, value, tlp, and published_at.

Return type

Generator

class threatq.core.lib.stix.stix1.courses_of_action.CoursesOfAction(ctx: threatq.core.lib.asphalt.Context, util: threatq.core.lib.stix.stix1.utils.CyboxSTIX)

Bases: threatq.core.lib.stix.stix1.base.Mapping

Course of Action STIX mapping class.

parse_object(stix_obj: threatq.core.lib.stix.stix1.utils.STIXObject) → AsyncIterable[threatq.core.lib.stix.stix1.results.ParseResult]

Parses a STIX Course of Action object.

Parameters

stix_obj (STIXObject) – STIXObject instance to build CourseOfAction dictionary mappings for.

Yields

ParseResult – Parsed Course of Action ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix1.exploit_targets.ExploitTargets(ctx: threatq.core.lib.asphalt.Context, util: threatq.core.lib.stix.stix1.utils.CyboxSTIX)

Bases: threatq.core.lib.stix.stix1.base.Mapping

Exploit Target STIX mapping class.

parse_object(stix_obj: threatq.core.lib.stix.stix1.utils.STIXObject) → AsyncIterable[threatq.core.lib.stix.stix1.results.ParseResult]

Parses a STIX Exploit Target object.

Parameters

stix_obj (STIXObject) – STIXObject instance to build Exploit Target dictionary mappings for.

Yields

ParseResult – Parsed Exploit Target ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix1.incidents.Incidents(ctx: threatq.core.lib.asphalt.Context, util: threatq.core.lib.stix.stix1.utils.CyboxSTIX)

Bases: threatq.core.lib.stix.stix1.base.Mapping

Incident STIX mapping class.

parse_object(stix_obj: threatq.core.lib.stix.stix1.utils.STIXObject) → AsyncIterable[threatq.core.lib.stix.stix1.results.ParseResult]

Parses a STIX Incident object.

Parameters

stix_obj (STIXObject) – STIXObject instance to build Incident dictionary mappings for.

Yields

ParseResult – Parsed Incident ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix1.indicators.Indicators(ctx, util)

Bases: threatq.core.lib.stix.stix1.base.Mapping

Indicator STIX mapping class. Parses STIX Indicators and Observables.

parse_object(stix_obj: threatq.core.lib.stix.stix1.utils.STIXObject) → AsyncIterable[threatq.core.lib.stix.stix1.results.ParseResult]

Build ThreatObject indicator mappings for an observable, related object, or cybox object embedded within a STIXPackage. Indicator ThreatObject dictionaries will be built as long as a definition exists for them within _indicator_type_map.

Parameters

stix_obj (STIXObject) – STIXObject instance to build indicator dictionary mappings for.

Yields

ParseResult – Parsed Indicator ThreatObjects wrapped into ParseResult instances.

async update_indicator_observables(stix_objs: Iterable[threatq.core.lib.stix.stix1.utils.STIXObject], result_set)

For each Indicator, gather observables that the Indicator references (already parsed in _parse_observables()) and add attributes to them as needed for the indicator in question.

Parameters

stix_objs (Iterable[STIXObject]) – Iterable of STIXObjects

threatq.core.lib.stix.stix1.indicators.format_registry_key(object_: cybox.objects.win_registry_key_object.WinRegistryKey, parent_objects) → List[str]

Callback for generating Registry Key values by concatenating hive values with key values.

Parameters
  • object_ (WinRegistryKey) – WinRegistryKey object containing the Registry Key

  • parent_objects – Iterable of parent_objects

Returns

List of string values formatted as {hive}\{key}

Return type

List[str]
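The {hive}\{key} concatenation can be illustrated as follows; this is a hypothetical helper, not the documented callback signature:

```python
def format_registry_values(hive, keys):
    # Concatenate the hive with each key as {hive}\{key}.
    return [f"{hive}\\{key}" for key in keys]

values = format_registry_values("HKEY_LOCAL_MACHINE", ["Software\\Vendor"])
```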

threatq.core.lib.stix.stix1.indicators.get_address_type(properties: cybox.objects.address_object.Address, value: str) → Optional[str]

Callback for determining the type_name for Address objects. Returns IP Address if value is an IPv4 Address, IPv6 Address if value is an IPv6 Address, or None if value is not a valid IP Address.

Parameters
  • properties (ObjectProperties) – Object properties associated with the provided IP address

  • value (str) – String to determine the IP Address version from

Returns

ThreatQ indicator type

Return type

Optional(str)
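The type decision described above can be sketched with the standard ipaddress module; this is a simplified stand-in for the documented callback, not its implementation:

```python
import ipaddress

def address_type(value):
    # Return the ThreatQ indicator type names documented above, or None
    # when the value is not a valid IP address.
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return None
    return "IP Address" if addr.version == 4 else "IPv6 Address"
```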

threatq.core.lib.stix.stix1.indicators.get_file_hash_type(properties: cybox.common.object_properties.ObjectProperties, value: str) → Optional[str]

Callback for determining the type_name for File and WinExecutableFile file hashes that have a simple_hash_value property.

Parameters
  • properties (ObjectProperties) – Object properties to search for the file hash value within

  • value (str) – Hash string to search for in order to determine the indicator type

Returns

ThreatQ indicator type

Return type

Optional(str)

class threatq.core.lib.stix.stix1.threat_actors.ThreatActors(ctx: threatq.core.lib.asphalt.Context, util: threatq.core.lib.stix.stix1.utils.CyboxSTIX)

Bases: threatq.core.lib.stix.stix1.base.Mapping

Threat Actor STIX mapping class.

parse_object(stix_obj: threatq.core.lib.stix.stix1.utils.STIXObject) → AsyncIterable[threatq.core.lib.stix.stix1.results.ParseResult]

Parses a STIX Threat Actor object. Multiple results may be yielded, since multiple adversary names or organizations can be parsed from a CIQIdentity3_0Instance identity object.

Parameters

stix_obj (STIXObject) – STIXObject instance to build Adversary dictionary mappings for.

Yields

ParseResult – Parsed Adversary ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix1.ttps.TTPs(ctx, util)

Bases: threatq.core.lib.stix.stix1.base.Mapping

TTP STIX mapping class.

parse_object(stix_obj: threatq.core.lib.stix.stix1.utils.STIXObject) → AsyncIterable[threatq.core.lib.stix.stix1.results.ParseResult]

Parses a STIX TTP object.

Parameters

stix_obj (STIXObject) – STIXObject instance to build TTP dictionary mappings for.

Yields

ParseResult – Parsed TTP ThreatObjects wrapped into ParseResult instances.

STIX 1 Utilities
class threatq.core.lib.stix.stix1.utils.STIXObject(obj: mixbox.entities.Entity, container: stixmarx.container.MarkingContainer, parent_objects: Optional[Set[mixbox.entities.Entity]] = None)

Bases: object

STIX Object to be parsed.

Parameters
  • obj (Entity) – STIX Entity

  • container (MarkingContainer) – Container that contains obj

  • parent_objects (Set[Entity]) – Set of STIX Entities that are parent objects of obj

object

STIX Entity

Type

Entity

container

Container that contains object

Type

MarkingContainer

parent_objects

Set of STIX Entities that are parent objects of obj

Type

Set[Entity]

add_parent_objects(objs: Union[mixbox.entities.Entity, Iterable[mixbox.entities.Entity]])

Add each Entity to parent_objects.

Parameters

objs (Entity | Iterable[Entity]) – objects to add to parent_objects.

class threatq.core.lib.stix.stix1.utils.CyboxSTIX(ctx: threatq.core.lib.asphalt.Context, object_map: dict, logname: Optional[str] = None)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.asphalt.ContextRefMixin

Utility class for performing various functions on STIX/Cybox objects.

Parameters
  • ctx (Context) – Context instance.

  • object_map (dict) – Dictionary mapping IDs to STIXObject instances

  • logname (Optional(str)) – CyboxSTIX log name.

object_map

Dictionary mapping IDs to STIXObject instances

Type

dict

build_attribute(attr_name: str, attr_value: Any, container: stixmarx.container.MarkingContainer, *, published_at: Optional[Union[arrow.arrow.Arrow, str, int]] = None, parent_obj: Optional[Any] = None) → dict

Returns an attribute dictionary with name equal to attr_name, value equal to attr_value, published_at equal to published_at, and tlp equal to the result of parse_tlp().

Parameters
  • attr_name (str) – Attribute name

  • attr_value (Any) – Attribute value

  • container (MarkingContainer) – STIX data container

  • published_at (str | int | Arrow) – Attribute published at

  • parent_obj (Any) – STIX/Cybox object that contains attr_value (e.g. UnsignedLong).

Returns

Attribute dictionary.

Return type

dict

find(idref: str) → Optional[threatq.core.lib.stix.stix1.utils.STIXObject]

Find a STIX object in a STIXPackage by idref.

Parameters

idref (str) – ID of the STIX object to find

Returns

STIX/Cybox object

Return type

Any

get_kill_chain_phase(kill_chain_id: str, phase_id: str) → Optional[stix.common.kill_chains.KillChainPhase]

Finds and returns a KillChainPhase based on the idref of a stixCommon:Kill_Chain (kill_chain_id) and the idref of the target phase contained in the found stixCommon:Kill_Chain (phase_id). If a matching stixCommon:Kill_Chain cannot be found, or a matching stixCommon:Kill_Chain is found but does not contain a matching stixCommon:Kill_Chain_Phase, then None is returned and the specific failed lookup attempt is logged as a WARNING message.

Parameters
  • kill_chain_id (str) – Idref for a stixCommon:Kill_Chain block

  • phase_id (str) – Idref of the target phase within the stixCommon:Kill_Chain with idref kill_chain_id

Returns

KillChainPhase object with phase_id

Return type

Optional(KillChainPhase)

get_max_tlp(tlps: list) → str

Get max TLP from a list of TLP string names.

Parameters

tlps (List[str]) – list of TLP names

Returns

Name of the TLP value with the most restrictive level

Return type

str
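Assuming the usual TLP restrictiveness ordering (WHITE < GREEN < AMBER < RED; the actual ordering lives in the library), the selection might look like:

```python
# Assumed restrictiveness ordering, least to most restrictive.
TLP_ORDER = ["WHITE", "GREEN", "AMBER", "RED"]

def max_tlp(tlps):
    # Pick the name of the most restrictive TLP present in the list.
    return max(tlps, key=lambda name: TLP_ORDER.index(name.upper()))

most_restrictive = max_tlp(["GREEN", "AMBER", "WHITE"])
```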

static get_min_timestamp(timestamps: list) → str

Get earliest timestamp from a list of datetime strings or datetime objects.

Parameters

timestamps (list) – list of timestamps

Returns

Timestamp string formatted for use as a published_at value.

Return type

str
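A sketch of the earliest-timestamp selection using the standard library; the real method formats via arrow, so the output format here is an assumption:

```python
from datetime import datetime

def min_timestamp(timestamps):
    # Normalize strings and datetime objects, then return the earliest
    # as an ISO-8601 string.
    parsed = [ts if isinstance(ts, datetime) else datetime.fromisoformat(ts)
              for ts in timestamps]
    return min(parsed).isoformat()

earliest = min_timestamp(["2021-03-01T00:00:00+00:00",
                          "2020-01-01T00:00:00+00:00"])
```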

get_published_at(data) → Optional[str]

This method parses the timestamp value from a STIX object or list of STIX objects. If a list of STIX objects is provided, the earliest timestamp of all of the STIX objects’ timestamps is returned.

Parameters

data – STIX object or list of STIX objects

Returns

Timestamp string formatted for use as a published_at value.

Return type

Optional(str)

kill_chain_callback(obj: stix.common.kill_chains.KillChainPhase, parent_objs: Tuple[mixbox.entities.Entity, ...]) → Optional[stix.common.kill_chains.KillChainPhase]

Callback used by _get_values() to find a KillChainPhase object. Returns obj if obj has a name, otherwise returns the result of get_kill_chain_phase().

Parameters
  • obj (KillChainPhase) – STIX data object currently being processed

  • parent_objs (Tuple[Entity, ...]) – Tuple of parent objects of obj

Returns

KillChainPhase found for object

Return type

Optional(KillChainPhase)

parse_tlp(data: Any, container: Optional[stixmarx.container.MarkingContainer]) → str

Returns the TLP name for the most restrictive TLP marking that applies to the provided STIX/Cybox object.

Parameters
  • data (Any) – STIX/Cybox object

  • container (MarkingContainer) – STIX data container

Returns

TLP name if found

Return type

str

Traverse dict_ and search for value within dict_’s values.

Parameters
  • dict_ (dict) – dictionary to search

  • value – value to search dict_’s values for

  • exclude_keys (List[Union[str, Callable]]) – list of key values or callbacks that should return True for a key that should be excluded from the search

Returns

True if value is found, False otherwise.

Return type

bool
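
The traversal described above can be sketched as a recursive descent. The method's actual name and defaults are not shown in the docs, so the name and signature below are illustrative.

```python
def search_values(dict_, value, exclude_keys=()):
    """Recursively search dict_'s values for value.

    exclude_keys may hold literal keys or callables returning True for
    keys to skip. Illustrative sketch; not the real implementation.
    """
    for key, val in dict_.items():
        if any(ex(key) if callable(ex) else ex == key for ex in exclude_keys):
            continue  # key is excluded from the search
        if isinstance(val, dict):
            if search_values(val, value, exclude_keys):
                return True
        elif isinstance(val, (list, tuple)):
            for item in val:
                found = (search_values(item, value, exclude_keys)
                         if isinstance(item, dict) else item == value)
                if found:
                    return True
        elif val == value:
            return True
    return False
```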

STIX 1 ID Namespaces

In order to maintain uniqueness and avoid ID collisions, STIX v1.2 provides a convention for formatting namespaces. Parsing STIX v1.2 content requires that some of these namespaces be loaded.

Our STIX parser automatically loads the following namespaces:

STIX 2 Parsing: threatq.core.lib.stix.stix2
Base STIX 2 Parsing Classes
class threatq.core.lib.stix.stix2.base.Mapping(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.base.BaseMapping

Base class for STIX2 Mapping classes.

classmethod get_external_references(stix_object: dict, published_at: Union[str, int, arrow.arrow.Arrow])

This method parses External References attributes from a given STIX2 object and returns a list of corresponding attribute dictionaries.

Parameters
  • stix_object (dict) – STIX2 Object to parse for external references

  • published_at (Arrow | str | int) – Published at date

Returns

List of external references attribute dictionaries

Return type

list
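
Since external_references is a standard STIX 2.x property (a list of dicts with source_name, url, external_id, etc.), the parsing can be sketched as below. The attribute dict shape (name/value/published_at) and the attribute names are assumptions; the real method's output may differ.

```python
def external_reference_attributes(stix_object, published_at):
    """Build attribute dicts from a STIX 2 object's external_references.

    Illustrative sketch; attribute names are hypothetical.
    """
    attributes = []
    for ref in stix_object.get("external_references", []):
        # source_name is required by the STIX 2.x spec; the rest are optional.
        for field in ("source_name", "url", "external_id"):
            if field in ref:
                attributes.append({
                    "name": "External Reference " + field.replace("_", " ").title(),
                    "value": ref[field],
                    "published_at": published_at,
                })
    return attributes
```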

classmethod get_kill_chain_phases(stix_obj: dict, published_at: Union[str, int, arrow.arrow.Arrow])

This method parses Kill Chain Phase attributes from a given STIX2 object and returns a list of corresponding attribute dictionaries.

Parameters
  • stix_obj (dict) – STIX2 Object to parse for Kill Chain Phases

  • published_at (Arrow | str | int) – Published at date

Returns

List of Tactic attribute dictionaries

Return type

list

parse_attributes(stix_object: dict, published_at: Union[str, int, arrow.arrow.Arrow])

This method parses ThreatObject attributes from a given STIX2 object based on _attribute_key_map, returning a list of attribute dictionaries.

Parameters
  • stix_object (dict) – STIX2 Object to parse attributes from

  • published_at (Arrow | str | int) – Published at date

Returns

List of attribute dictionaries

Return type

list
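
A key-map-driven attribute parse can be sketched as follows. The contents of _attribute_key_map are not shown in the docs, so the map below is a hypothetical example of a STIX-property-to-attribute-name mapping.

```python
# Hypothetical key map: STIX 2 property name -> ThreatQ attribute name.
ATTRIBUTE_KEY_MAP = {
    "description": "Description",
    "confidence": "Confidence",
}

def parse_attributes(stix_object, published_at, key_map=None):
    """Collect an attribute dict for every mapped key present on the object."""
    key_map = key_map or ATTRIBUTE_KEY_MAP
    return [
        {"name": attr_name, "value": stix_object[key], "published_at": published_at}
        for key, attr_name in key_map.items()
        if stix_object.get(key) not in (None, "", [])  # skip absent/empty values
    ]
```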

abstract parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX2 object and returns an iterable of BaseParseResult’s.

Parameters

stix_obj (dict) – STIX2 object to build BaseParseResult’s for.

Yields

BaseParseResult – Parsed ThreatObjects wrapped into BaseParseResult instances.

class threatq.core.lib.stix.stix2.results.MarkingDefinitionParseResult(definition_data: Optional[dict] = None, obj=None, inline_relations: Optional[List[threatq.core.models.base.BaseThreatObject]] = None, idref_relations: Optional[List[str]] = None, ids: Optional[List[str]] = None)

Bases: threatq.core.lib.stix.results.BaseParseResult

class threatq.core.lib.stix.stix2.results.RelationParseResult(source_ref: str, target_ref: str, obj=None, inline_relations: Optional[List[threatq.core.models.base.BaseThreatObject]] = None, idref_relations: Optional[List[str]] = None, ids: Optional[List[str]] = None)

Bases: threatq.core.lib.stix.results.BaseParseResult

class threatq.core.lib.stix.stix2.results.ResultSet(**kwargs)

Bases: threatq.core.lib.stix.results.BaseResultSet

Manages a set of ParseResult’s.

STIX 2 Mappings
class threatq.core.lib.stix.stix2.attack_patterns.AttackPatterns(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Attack Patterns STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX2 Attack Pattern object.

Parameters

stix_obj (dict) – STIX2 object to build Attack Pattern dictionary mappings for.

Yields

ParseResult – Parsed Attack Pattern ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.campaigns.Campaigns(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Campaign STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX2 Campaign object.

Parameters

stix_obj (dict) – STIX2 object to build Campaign dictionary mappings for.

Yields

ParseResult – Parsed Campaign ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.courses_of_action.CoursesOfAction(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Course of Action STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX2 Course of Action object.

Parameters

stix_obj (dict) – STIX2 object to build Courses of Action dictionary mappings for.

Yields

ParseResult – Parsed Courses of Action ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.identities.Identity(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Identity STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX2 Identity object.

Parameters

stix_obj (dict) – STIX2 object to build Identity dictionary mappings for.

Yields

ParseResult – Parsed Identity ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.indicator_patterns.IndicatorPatterns(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

STIX 2 Indicator Patterns mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX 2 Indicator object as a ThreatQ Signature of type STIX Indicator Pattern and attempts to extrapolate observable objects from the pattern field in order to parse ThreatQ objects from them using ObservedDataParser. Any objects parsed via ObservedDataParser are related back to the parsed ThreatQ Signature.

Parameters

stix_obj (dict) – STIX2 object to build Signature dictionary mappings for.

Yields

ParseResult

Parsed Signature, Indicator, and/or Event

ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.intrusion_sets.IntrusionSet(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Intrusion Set STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX 2 Intrusion Set object. MITRE ATT&CK Intrusion Sets are parsed as ThreatQ Adversaries, while all other STIX 2 Intrusion Sets are parsed as ThreatQ Intrusion Sets.

Parameters

stix_obj (dict) – STIX2 object to build Intrusion Set/Adversary dictionary mappings for.

Yields

ParseResult – Parsed Intrusion Set/Adversary ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.malware.Malware(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Malware STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX2 Malware object.

Parameters

stix_obj (dict) – STIX2 object to build Malware dictionary mappings for.

Yields

ParseResult – Parsed Malware ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.marking_definitions.MarkingDefinition(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Marking Definition STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.stix2.results.MarkingDefinitionParseResult]

Parses a STIX 2 Marking Definition object.

Parameters

stix_obj (dict) – STIX 2 object to build Marking Definition dictionary mappings for.

Yields

MarkingDefinitionParseResult – Parsed MarkingDefinition ThreatObjects wrapped into MarkingDefinitionParseResult instances.

class threatq.core.lib.stix.stix2.observed_data.ObservedData(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Observed Data STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX 2 Observed Data Object.

Parameters

stix_obj (dict) – STIX 2 object to build Observed Data dictionary mappings for.

Yields

ParseResult – Parsed Observed Data ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.observed_data.ObservedDataParser

Bases: object

Utility class used by several mapping classes for parsing STIX 2 Observed Data SDOs.

classmethod create_parse_results(*obj_data: dict, inter_relate=False, parent_mapping: threatq.core.lib.stix.stix2.base.Mapping, **kwargs)

Create BaseParseResult instances for each object dictionary passed in via obj_data. Object dictionaries will be created as ThreatObjects via create_threat_object() and inter-related if specified via the inter_relate argument.

Each object in obj_data must have a _type key whose value corresponds to a key in Models. This differs from the base class’ implementation, which expects an obj_type argument that applies to all objects created from obj_data.

If an object in obj_data has a _inline_relations key, each object dictionary in _inline_relations will be created as ThreatObjects via create_threat_object(), wrapped in a new BaseParseResult instance that has no associated STIX IDs, and inline-related to the BaseParseResult instance created from the parent object from obj_data. An inline relation’s BaseParseResult does not have any associated STIX IDs so that other STIX 2 SDOs are not related to them. These inline relations are also not inter-related with other objects in obj_data.

Parameters
  • obj_data (dict) – Dictionary of object data usable by from_data(), in addition to keys _type and _inline_relations to be used internally for deriving ThreatObject type and relations specific to a single created ThreatObject.

  • inter_relate (bool) – Optional, defaults to False. If true and an iterable of values is supplied to obj_data, those objects will be inter-related upon creation as BaseParseResult instances.

  • parent_mapping (Mapping) – STIX 2 mapping class invoking this parser.

  • kwargs – Additional kwargs that should be passed to the BaseParseResult constructor call for each object in obj_data.

Returns

Asynchronous Generator of BaseParseResult instances created from obj_data.

Return type

AsyncIterable[BaseParseResult]

classmethod parse_object(stix_obj: dict, parent_mapping: threatq.core.lib.stix.stix2.base.Mapping) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX 2 Observed Data Object.

Parameters
  • stix_obj (dict) – STIX 2 object to build Observed Data dictionary mappings for.

  • parent_mapping (Mapping) – STIX 2 mapping class invoking this parser.

Yields

ParseResult – Parsed Observed Data ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.relationships.Relationships(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Relationships STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.stix2.results.RelationParseResult]

Parses a STIX 2 Relationship object.

Parameters

stix_obj (dict) – STIX 2 object to build Relationship dictionary mappings for.

Yields

RelationParseResult – Parsed Relationship ThreatObjects wrapped into RelationParseResult instances.

class threatq.core.lib.stix.stix2.reports.Reports(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Reports STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX2 Report object.

Parameters

stix_obj (dict) – STIX2 object to build Report dictionary mappings for.

Yields

ParseResult – Parsed Report ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.sightings.Sightings(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Sightings STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX2 Sighting object.

Parameters

stix_obj (dict) – STIX2 object to build Sighting dictionary mappings for.

Yields

ParseResult – Parsed Sighting ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.threat_actors.ThreatActors(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Threat Actors STIX 2 Mapping Class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX 2 Threat Actor Object.

Parameters

stix_obj (dict) – STIX2 object to build Threat Actor dictionary mappings for.

Yields

ParseResult – Parsed Threat Actor ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.tools.Tools(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Tool STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX2 Tool object.

Parameters

stix_obj (dict) – STIX2 object to build Tool dictionary mappings for.

Yields

ParseResult – Parsed Tool ThreatObjects wrapped into ParseResult instances.

class threatq.core.lib.stix.stix2.vulnerabilities.Vulnerabilities(ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.core.lib.stix.stix2.base.Mapping

Vulnerabilities STIX 2 mapping class.

parse_object(stix_obj: dict) → AsyncIterable[threatq.core.lib.stix.results.BaseParseResult]

Parses a STIX2 Vulnerability object.

Parameters

stix_obj (dict) – STIX2 object to build Vulnerability dictionary mappings for.

Yields

ParseResult – Parsed Vulnerability ThreatObjects wrapped into ParseResult instances.

STIX 2 Pattern Grammar Classes
class threatq.core.lib.stix.stix2.pattern_grammar.base.BaseSTIXPatternGrammar

Bases: object

Provides purely class-based functionality for parsing STIX 2 Patterns.

classmethod get_grammar(spec_version: str, pattern_listener_subclass=None)

Generates a new class that is a subclass of BaseSTIXPatternGrammar and the STIXPatternGrammar class dynamically imported for the provided spec_version.

The new agglomerated class has the following class attributes from the imported STIXPatternGrammar:

  • version (str): The spec version for this collection of lexer, parser, and listener classes

  • lexer_cls (STIXPatternLexer): ANTLR4-generated lexer class

  • parser_cls (STIXPatternParser): ANTLR4-generated parser class

  • listener_cls (STIXPatternListener): ANTLR4-generated listener class

Parameters
  • spec_version (str) – The version of the STIX 2 Pattern grammar to use.

  • pattern_listener_subclass (STIXPatternListener) – Optional, subclass of STIXPatternListener. Used to create a new listener class that is a subclass of the provided pattern_listener_subclass and the listener_cls of the imported STIXPatternGrammar class. This new listener class is then set to the listener_cls class attribute of the returned class. This is used as a means to inject new listener functionality into the returned class.

Returns

A new class that is a subclass of BaseSTIXPatternGrammar and the STIXPatternGrammar class dynamically imported for the provided spec_version.

Return type

BaseSTIXPatternGrammar

Raises

STIXPatternGrammarImportError – Raised if the provided spec_version does not have an associated module or if the imported module does not have a STIXPatternGrammar class.

classmethod parse(pattern: str) → antlr4.tree.Tree.ParseTree

Parses the provided STIX 2 Pattern.

Parameters

pattern (str) – STIX 2 Pattern to parse.

Returns

The ANTLR4 parse tree for the parsed pattern.

Return type

ParseTree

Raises

STIXPatternParseError – Raised if the provided pattern causes any syntax errors during parsing.

classmethod walk_parse_tree(parse_tree: antlr4.tree.Tree.ParseTree, *listener_args, **listener_kwargs)

Walks the provided parse_tree, using an instance of listener_cls instantiated with any provided listener_args or listener_kwargs.

Parameters
  • parse_tree (ParseTree) – ANTLR4 parse tree

  • listener_args (Any) – Optional, positional arguments to pass to the listener class’ constructor.

  • listener_kwargs (Any) – Optional, keyword arguments to pass to the listener class’ constructor.

Returns

Instance of the listener after walking through the parse tree.

Return type

STIXPatternListener

class threatq.core.lib.stix.stix2.pattern_grammar.extrapolating_grammar.ExtrapolateObjectsListener(version: str)

Bases: object

Injected as a subclass of STIXPatternListener via get_grammar() and contains the business logic for extrapolating observable objects.

Parameters

version (str) – STIX 2 Pattern Grammar version

version
Type

str

property objects

Generates the observable objects mapping of a string index -> observable object dict after this listener was used to walk the parse tree.

class threatq.core.lib.stix.stix2.pattern_grammar.extrapolating_grammar.ExtrapolatingSTIXPatternGrammar

Bases: threatq.core.lib.stix.stix2.pattern_grammar.base.BaseSTIXPatternGrammar

Provides an interface for extrapolating observable objects from a STIX 2 Pattern.

classmethod extrapolate_objects(pattern: str) → dict

Extrapolates observable objects from the provided STIX 2 Pattern. That is, this method attempts to derive an Observed Data SDO’s objects property dict.

A set of observable objects is derived from each Comparison Expression within an Observation Expression. An Observation Expression is a set of brackets containing at least one Comparison Expression, such as [domain-name:value = 'something.org']. A Comparison Expression consists of an Object Path (domain-name:value), a Comparison Operator (=), and a constant ('something.org') or a set of constants (('something.org', 'somethingelse.net')).

In the STIX Pattern Grammar, these Comparison Expressions are referred to as PropTest rules. Non-empty observable object sets can only be derived from two types of PropTests: PropTestEqual (in which the only Comparison Operators that observables can be derived from are =, ==, NOT !=, and NOT <>) and PropTestSet (in which the Comparison Operator is IN). All other Comparison Operators are ignored since they cannot reasonably result in a set of discrete values, so the corresponding Comparison Expression yields an empty set of observable objects. A PropTestEqual rule may result in an observable object set of size one, and a PropTestSet rule may result in an observable object set of size N, where N is the number of elements in the constant set.

As an example, the Comparison Expression domain-name:value IN ('something.org', 'somethingelse.net') yields the following two dicts in its observable object set:

{
    'type': 'domain-name',
    'value': 'something.org'
}
{
    'type': 'domain-name',
    'value': 'somethingelse.net'
}

If a Comparison Expression’s Object Path contains a _ref property, the Comparison Expression will yield an empty observable object set since the pattern itself does not denote what the observable type of the referenced object should be. For example, the Comparison Expression domain-name:resolves_to_refs[*].value IN ('198.51.100.1/32', 'https://bad-foo-stuff.com', 'fd7e:cb41:cb1b:d476::') provides no indicator that the first element of the constant set is an ipv4-addr, the second element is a url, and the third element is an ipv6-addr.

If a Comparison Expression’s Object Path contains a List Object Property (e.g. file:extensions.'windows-pebinary-ext'.sections[*].entropy = 7.0), the List Object Property is resolved at the termination of the Observation Expression according to the following rules:

  • Dictionaries indexed by the same list index (a positive integer) or the literal * are merged together.

  • Dictionaries indexed by a list index are merged with a dictionary indexed by the literal *.

  • Dictionaries or constants indexed by list indices are sorted in ascending numerical order.

  • If the only index is the constant *, the resultant list simply contains the dictionary or constant keyed by *.

All constant values are normalized as to how they would be represented in an Observed Data SDO’s objects property dict:

  • Escaped string characters are unescaped.

  • String and string-like constants are stripped of the single quotes wrapping them.

  • The leading b, h, and t characters for binary, hex, and timestamp constant types, respectively, are removed.

  • Boolean constants (true/false) are converted to a Python bool.
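
The normalization rules above can be sketched as a single helper. This is an illustrative, partial sketch: it handles only the listed cases, and the real listener also deals with numeric constants and full escape handling.

```python
def normalize_constant(text):
    """Normalize a STIX pattern constant per the rules above (sketch)."""
    if text in ("true", "false"):          # booleans become Python bools
        return text == "true"
    for prefix in ("b", "h", "t"):         # binary / hex / timestamp prefixes
        if text.startswith(prefix + "'"):
            text = text[1:]
            break
    if text.startswith("'") and text.endswith("'"):
        text = text[1:-1]                  # strip the wrapping single quotes
    # Unescape \' and \\ (escaped quote first so we don't double-process).
    return text.replace("\\'", "'").replace("\\\\", "\\")
```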

An Observation Expression may contain multiple Comparison Expressions joined by the Boolean Operators AND or OR. Observable object sets from each Comparison Expression are recursively joined until there is one set of observable objects remaining for an Observation Expression. The logic for joining observable object sets from Comparison Expressions or already-joined Comparison Expressions is the following:

  • For the expression a AND b, where a and b are observable object sets derived from Comparison Expressions or already-joined Comparison Expressions, the resultant observable object set is the Cartesian Product of a and b, in which each pair (observable_from_a, observable_from_b) is merged together to result in a new observable object. Observable objects are only merged together if their observable type is the same. If the observable types are different, the AND is instead treated as an OR.

  • For the expression a OR b, where a and b are observable object sets derived from Comparison Expressions or already-joined Comparison Expressions, the resultant observable object set is the Union of a and b.

Some examples of the above logic follow:

[file:name IN ('pdf.exe', 'foos.jar') AND file:hashes.MD5 = 'eb889de3cb5fa2166fe88fb45220567a']:

The left-hand Comparison Expression yields the following two dicts:

{
    'type': 'file',
    'value': 'pdf.exe'
}
{
    'type': 'file',
    'value': 'foos.jar'
}

The right-hand Comparison Expression yields the following dict:

{
    'type': 'file',
    'hashes': {
        'MD5': 'eb889de3cb5fa2166fe88fb45220567a'
    }
}

The joining of the above observable object sets results in the observable object set yielding the following two dicts:

{
    'type': 'file',
    'value': 'pdf.exe',
    'hashes': {
        'MD5': 'eb889de3cb5fa2166fe88fb45220567a'
    }
}
{
    'type': 'file',
    'value': 'foos.jar',
    'hashes': {
        'MD5': 'eb889de3cb5fa2166fe88fb45220567a'
    }
}

If the above example instead had an OR joining the two Comparison Expressions ([file:name IN ('pdf.exe', 'foos.jar') OR file:hashes.MD5 = 'eb889de3cb5fa2166fe88fb45220567a']), the resultant observable object set would yield the following three dicts:

{
    'type': 'file',
    'value': 'pdf.exe'
}
{
    'type': 'file',
    'value': 'foos.jar'
}
{
    'type': 'file',
    'hashes': {
        'MD5': 'eb889de3cb5fa2166fe88fb45220567a'
    }
}

If a pattern contains multiple Observation Expressions, the observable object set from each is simply yielded. Observation Operators and Qualifiers have no effect on the extrapolated observable objects since these pattern constructs make sense only in the context of the pattern.
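
The AND/OR joining logic above can be sketched as set operations over lists of observable dicts. This is a simplified illustration: merges here are shallow (nested dicts such as hashes are replaced rather than deep-merged), and the fallback from AND to OR is applied to the whole expression rather than per pair.

```python
from itertools import product

def join_or(a, b):
    """OR: the union of two observable object sets."""
    return a + b

def join_and(a, b):
    """AND: Cartesian-product merge of same-typed observables; if no pair
    shares an observable type, fall back to treating the AND as an OR."""
    merged = [{**left, **right}
              for left, right in product(a, b)
              if left["type"] == right["type"]]
    return merged if merged else join_or(a, b)
```

Running the doc's file-name/MD5 example through join_and reproduces the two merged dicts shown above; switching to join_or yields all three.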

Parameters

pattern (str) – STIX 2 Pattern to parse and extrapolate observable objects from.

Returns

Extrapolated observable objects represented as a mapping of a string index -> observable object dict. This is the format used by the objects property of an Observed Data SDO.

Return type

dict

classmethod get_grammar(spec_version: str, pattern_listener_subclass=<class 'threatq.core.lib.stix.stix2.pattern_grammar.extrapolating_grammar.ExtrapolateObjectsListener'>)

Generates a version of BaseSTIXPatternGrammar that uses an injected pattern listener class which constructs observable objects as nodes of the pattern tree are entered and exited.

class threatq.core.lib.stix.stix2.pattern_grammar.exceptions.STIXPatternErrorListener

Bases: antlr4.error.ErrorListener.ErrorListener

An ANTLR4 ErrorListener that collects syntax error messages into errors.

errors

Collected syntax error messages

Type

List[str]

raise_on_error(pattern: str)

Raises a STIXPatternParseError containing the last syntax error message encountered during parsing if this ErrorListener collected any syntax error messages while parsing the provided pattern.

Parameters

pattern (str) – The STIX Pattern that may have failed to be parsed.

Raises

STIXPatternParseError – Raised if this ErrorListener collected any syntax error messages during parsing.

exception threatq.core.lib.stix.stix2.pattern_grammar.exceptions.STIXPatternGrammarImportError(errmsg)

Bases: Exception

Error raised when the dynamic importing of a STIXPatternGrammar fails.

Parameters

errmsg (str) – Error message

exception threatq.core.lib.stix.stix2.pattern_grammar.exceptions.STIXPatternParseError(pattern: str, errmsg: str)

Bases: Exception

Error raised when the parsing of a STIX Pattern generated errors.

Parameters
  • pattern (str) – The STIX Pattern that failed to be parsed.

  • errmsg (str) – The ANTLR4 syntax error message.

TAXII: threatq.core.lib.taxii

class threatq.core.lib.taxii._base.TAXIIClient(*args, version: str = '1.1', **kwargs)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.asphalt.ContextRefMixin, abc.ABC

Base class for TAXII client functionality. Exposes poll() as an asynchronous generator.

discovery_url

URL of the TAXII server’s discovery service

Type

str

auth

Authentication information necessary for requests to the TAXII server.

Type

BasicAuth | CabbyAuth

headers

Additional headers for the client to send with requests to the TAXII server

Type

dict

use_proxies

Specifies if the TAXII client should use proxies

Type

bool

verify_ssl

Defaults to True, specifies whether the client should verify SSL certificates when polling the server

Type

bool

version

Version of the TAXII server the client is connecting to

Type

str

__init__(discovery_url: str, *, auth: Optional[Union[threatq.core.lib.http.auth.basic.BasicAuth, CabbyAuth]] = None, headers: Optional[dict] = None, use_proxies: bool = True, verify_ssl: bool = True, host_ca_certificate: Optional[str] = None, version: str = '1.1', ctx: Optional[threatq.core.lib.asphalt.Context] = None, logname: Optional[str] = None, **kwargs)

Initialize a TAXII Client.

Parameters
  • discovery_url (str) – URL of the TAXII server’s discovery service

  • auth – (BasicAuth | CabbyAuth ): Optional, defaults to None. TAXII authentication credentials.

  • headers (dict) – Optional, defaults to None. Dictionary of additional headers to send with requests to the TAXII server.

  • use_proxies (bool) – Optional, defaults to True. Specifies if the TAXII client should use proxies. If True, proxies will be extracted from ctx.proxies.

  • verify_ssl (bool) – Optional, defaults to True. Specifies whether the client should verify a provider’s SSL certificate when polling.

  • host_ca_certificate (str) – Optional, defaults to None. Specifies a base64 PEM encoded CA Certificate Bundle to verify the provider’s SSL certificate against. Applicable only to https URLs.

  • version (str) – Optional, defaults to 1.1. Specifies the version of TAXII server the client is connecting to.

  • ctx (Context) – Optional, Context of the instance

  • logname (str) – Optional, logname the instance should utilize

static __new__(cls, *args, version: str = '1.1', **kwargs)

Encapsulates factory logic of initializing a TAXIIClient. Checks the version parameter against each subclass’s handled_versions and returns the appropriate subclass instance.

Parameters

version (str) – Optional, defaults to 1.1. Specifies the version of TAXII server that the client is connecting to.

Returns

TAXIIClient instance

Return type

TAXIIClient

Raises

ValueError – If the specified version argument is not supported by any of TAXIIClient’s subclasses

handled_versions = ()

Tuple of strings denoting which TAXII versions are supported by this TAXIIClient implementation

classmethod handles_taxii_version(version: str)

Checks if a given version is supported by this TAXIIClient class. If needed, a subclass can override this method to perform a more involved check.

Parameters

version (str) – Version to check support for

Returns

True if the given version is in handled_versions, False otherwise.

Return type

bool
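
The version-dispatch factory described for __new__ and handled_versions can be sketched as below. The class names here are illustrative, not the real subclasses; only the dispatch pattern mirrors the docs.

```python
class Client:
    """Minimal sketch of the handled_versions dispatch used by
    TAXIIClient.__new__ (illustrative, not the real implementation)."""

    handled_versions = ()

    def __new__(cls, *args, version="1.1", **kwargs):
        # Check the class itself first so subclasses can be built directly.
        for candidate in (cls, *cls.__subclasses__()):
            if candidate.handles_taxii_version(version):
                return super().__new__(candidate)
        raise ValueError(f"unsupported TAXII version: {version}")

    @classmethod
    def handles_taxii_version(cls, version):
        return version in cls.handled_versions


class Taxii11Client(Client):
    handled_versions = ("1.0", "1.1")


class Taxii21Client(Client):
    handled_versions = ("2.0", "2.1")
```

Calling the base class with a version keyword then returns an instance of whichever subclass advertises support for that version.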

abstract poll(collection_name: str, begin_date: Optional[Union[arrow.arrow.Arrow, str, int]] = None, end_date: Optional[Union[arrow.arrow.Arrow, str, int]] = None, poll_url: Optional[str] = None, new_task=<function ensure_future>)

Poll the TAXII server for collection_name.

Parameters
  • collection_name (str) – Name of the collection to poll on the TAXII server.

  • begin_date (str | int | Arrow) – Optional, defaults to None. Poll start time.

  • end_date (str | int | Arrow) – Optional, defaults to None. Poll end time.

  • poll_url (str) – Optional, defaults to None. URL to poll objects from. If not specified, the client will attempt to determine the collection’s poll URL via services advertised by the TAXII server’s discovery service.

  • new_task – Function used to spawn an internal thread if needed by the client. Abstracted here such that a caller can supply an appropriate method for spawning tasks, eg threatq.dynamo.feeds.common.FeedRun.new_task(). Defaults to ensure_future().

Returns

Asynchronous Generator of STIX content retrieved by the poll

Return type

AsyncGenerator

class threatq.core.lib.taxii.cabby.CabbyAuth(client_certificate: Optional[str] = None, client_private_key: Optional[str] = None, username: Optional[str] = None, password: Optional[str] = None, verify_ssl: bool = True, host_ca_certificate: Optional[str] = None, logname: Optional[str] = None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, tmpl_contexts: Sequence[Mapping] = (), **kwargs)

Bases: threatq.core.lib.elements.Element

Authentication class for use with the TAXIIFeedSource source. This authentication element is presented as a context manager because cabby clients require certificate/private key material to be supplied as file paths rather than strings.

client_certificate

String representation of a client certificate

Type

str

client_certificate_file

holder attribute for the temporary file created for data passed into client_certificate

Type

NamedTemporaryFile()

client_private_key

String representation of a client private key

Type

str

client_private_key_file

holder attribute for the temporary file created for data passed into client_private_key

Type

NamedTemporaryFile()

username

String username for TAXII basic authentication

Type

str

password

String password for TAXII basic authentication

Type

str

verify_ssl

Defaults to True, specifies whether cabby should verify SSL certificates when polling the server.

Type

bool

host_ca_certificate

String representation of a host’s CA Certificate Bundle to verify SSL against.

Type

str

host_ca_certificate_file

holder attribute for the temporary file created for data passed into host_ca_certificate

Type

NamedTemporaryFile()

__enter__()

Enters CabbyAuth as a Context Manager. Temporary file objects are created for values supplied to client_certificate, client_private_key, and/or host_ca_certificate.

Returns

This instance

Return type

CabbyAuth

__exit__(exc_type, exc_val, exc_tb)

Exit the CabbyAuth Context Manager. Closes and removes references to temporary file objects that were created for client_certificate, client_private_key, and/or host_ca_certificate.

__init__(client_certificate: Optional[str] = None, client_private_key: Optional[str] = None, username: Optional[str] = None, password: Optional[str] = None, verify_ssl: bool = True, host_ca_certificate: Optional[str] = None, logname: Optional[str] = None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, tmpl_contexts: Sequence[Mapping] = (), **kwargs)

Initialize CabbyAuth

Parameters
  • client_certificate (str) – Optional, string representation of a client certificate

  • client_private_key (str) – Optional, string representation of a client private key

  • username (str) – Optional, string username for TAXII basic authentication

  • password (str) – Optional, string password for TAXII basic authentication

  • verify_ssl (bool) – Defaults to True, specifies whether cabby should verify SSL when polling a client.

  • host_ca_certificate (str) – Optional, string representation of a CA Certificate Bundle to verify provider SSL certificate against. If specified, verify_ssl will effectively be ignored, and SSL verification will be checked against this specified certificate rather than public CA Bundles.

  • logname (str) – logname of the CabbyAuth instance

  • ctx (Context) – Context of the CabbyAuth instance.

  • tmpl_contexts (Sequence[Mapping]) – Sequence of template context mappings used to render TemplateExpression arguments.

__iter__()

Enable dict() representation of a CabbyAuth. Attributes are formatted for use with set_auth().
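
The pattern CabbyAuth implements, writing certificate/key strings out to temporary files so a filepath-only client library can consume them, can be sketched with the stdlib alone. The class below is a hypothetical illustration of that pattern, not the real CabbyAuth:

```python
import os
import tempfile

class FileBackedAuth:
    """Hypothetical sketch of the CabbyAuth pattern: PEM strings are
    written to named temporary files on __enter__ so that a library
    which only accepts filepaths can use them, and the files are
    removed again on __exit__."""

    def __init__(self, client_certificate=None, client_private_key=None):
        self.client_certificate = client_certificate
        self.client_private_key = client_private_key
        self.client_certificate_file = None
        self.client_private_key_file = None

    def __enter__(self):
        for attr in ('client_certificate', 'client_private_key'):
            data = getattr(self, attr)
            if data is not None:
                tmp = tempfile.NamedTemporaryFile(
                    mode='w+', suffix='.pem', delete=False)
                tmp.write(data)
                tmp.flush()  # ensure the data is on disk before use
                setattr(self, attr + '_file', tmp)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Close and delete any files created in __enter__.
        for tmp in (self.client_certificate_file, self.client_private_key_file):
            if tmp is not None:
                tmp.close()
                os.unlink(tmp.name)
        self.client_certificate_file = None
        self.client_private_key_file = None
```

Inside the with block, `auth.client_certificate_file.name` is a real filesystem path that can be handed to the client; the file disappears when the block exits.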

class threatq.core.lib.taxii.cabby.ThreadedCabby(*args, version: str = '1.1', **kwargs)

Bases: threatq.core.lib.taxii._base.TAXIIClient

Class wrapping cabby functionality so that its synchronous processing can be called in a Thread and not block Dynamo’s event loop.

__init__(discovery_url: str, **kwargs)

Initialize ThreadedCabby. TAXII Server Authentication information must be supplied as a CabbyAuth instance passed via the auth parameter.

Parameters

discovery_url (str) – URL of the TAXII server’s discovery service

Raises

ValueError – If an auth is supplied that is not a CabbyAuth instance

poll(collection_name: str, begin_date: Optional[Union[arrow.arrow.Arrow, str, int]] = None, end_date: Optional[Union[arrow.arrow.Arrow, str, int]] = None, poll_url: Optional[str] = None, new_task=<function ensure_future>)

Poll the TAXII server for collection_name.

Parameters
  • collection_name (str) – Name of the collection to poll on the TAXII server.

  • begin_date (str | int | Arrow) – Optional, defaults to None. Poll start time. Will be converted to an Arrow and then to a datetime.

  • end_date (str | int | Arrow) – Optional, defaults to None. Poll end time. Will be converted to an Arrow and then to a datetime.

  • poll_url (str) – Optional, defaults to None. URL to poll objects from. If not specified, the cabby client will attempt to determine the collection’s poll URL via services advertised by the TAXII Server’s discovery service.

  • new_task – Function that spawns the internal threaded TAXII polling and returns a future.

Returns

Asynchronous Generator of STIX content retrieved by the poll

Return type

AsyncGenerator

Raises

TAXIIPollError – If the TAXII client failed to parse XML contained in a response from the TAXII server

class threatq.core.lib.taxii.taxii2.TAXII2Client(*args, version: str = '1.1', **kwargs)

Bases: threatq.core.lib.taxii._base.TAXIIClient

Class representing a TAXII 2.X client.

auth

Basic Auth credentials for requests to the TAXII Server.

Type

BasicAuth

paginate

Number of objects to retrieve per request in poll_objects().

Type

int

__init__(discovery_url: str, paginate: int = 1000, **kwargs)

Initialize TAXII2Client. TAXII Server Authentication information, if required, must be supplied as a BasicAuth instance via the auth parameter.

Parameters
  • discovery_url (str) – URL of the TAXII server’s discovery service

  • paginate (int) – Number of objects to retrieve per request in poll_objects().

Raises

ValueError – If an auth is supplied that is not a BasicAuth instance

poll(collection_name: str, begin_date: Optional[Union[arrow.arrow.Arrow, str, int]] = None, end_date: Optional[Union[arrow.arrow.Arrow, str, int]] = None, poll_url: Optional[str] = None, new_task=<function ensure_future>)

Poll the TAXII server for collection_name. While poll_objects() paginates data, we need to collate the paginated response data into a single response object here so that relation and marking objects are all present in the package when the parse_stix() process goes to parse the data.

Parameters
  • collection_name (str) – Name of the collection to poll on the TAXII server.

  • begin_date (str | int | Arrow) – Optional, defaults to None. Poll start time. Sent to TAXII 2.0 servers as an added_after query string parameter.

  • end_date (str | int | Arrow) – Optional, defaults to None. Poll end time. Not currently applicable for TAXII 2.0 Servers.

  • poll_url (str) – Optional, defaults to None. URL to poll objects from. If not specified, the client will determine the collection’s poll URL via services advertised by the TAXII Server’s discovery service.

  • new_task – Function used to spawn an internal thread if needed by the client.

Returns

Asynchronous Generator of STIX content retrieved by the poll

Return type

AsyncGenerator
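
The collation rationale described in poll() can be sketched generically: page through a source until it is exhausted, accumulating every object into one bundle so that relationship and marking objects end up in the same package as the objects they annotate. The fetch function below is a hypothetical stand-in for one paginated TAXII request:

```python
import asyncio

async def fetch_page(offset, page_size, store):
    """Hypothetical stand-in for one paginated TAXII request."""
    return {"objects": store[offset:offset + page_size]}

async def poll_collated(store, page_size=2):
    """Collate paginated responses into a single bundle, mirroring the
    rationale in TAXII2Client.poll(): relationship and marking objects
    must be present in the same package as the objects they reference
    when the STIX parser runs."""
    bundle = {"type": "bundle", "objects": []}
    offset = 0
    while True:
        page = await fetch_page(offset, page_size, store)
        if not page["objects"]:
            break  # source exhausted
        bundle["objects"].extend(page["objects"])
        offset += page_size
    return bundle
```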

poll_collection_objects(collection, collection_params)

Poll the given TAXII collection for its objects.

Parameters
  • collection (Collection) – TAXII collection to poll on the TAXII server.

  • collection_params (dict) – Additional request parameters when polling on the TAXII server.

Returns

object values

Return type

dict

poll_collection_using_discovery(collection_name, collection_params)

Poll the TAXII server for collection_name via services advertised by the TAXII Server’s discovery service.

Parameters
  • collection_name (str) – Name of the collection to poll on the TAXII server.

  • collection_params (dict) – Additional request parameters when polling on the TAXII server.

Returns

object values

Return type

dict

poll_collection_using_url(url, collection_name, collection_params)

Poll the TAXII server for collection_name using the URL given

Parameters
  • url (str) – TAXII server URL.

  • collection_name (str) – Name of the collection to poll on the TAXII server.

  • collection_params (dict) – Additional request parameters when polling on the TAXII server.

Returns

object values

Return type

dict

Template Parsing: threatq.core.lib.templates

exception threatq.core.lib.templates.NotAReferencePath
class threatq.core.lib.templates.Template(source: str)

Class for enabling !tmpl tags inside a yaml file.

Parameters

source (str) – The value that will be compiled and parsed.

yaml_tag

The tag that should be used to specify the usage of this parser.

Type

str

jinja2_env

The jinja2 environment to be used.

Type

jinja2.environment.Environment

source

The value that was passed in at instantiation.

Type

str

fields

A dict with the different types of fields that are used in the Template.

Type

dict

compile()

Compile the source string against the jinja2_env.

find_reference_paths(base: Union[Tuple[str, ...], str] = (), depth: Optional[int] = None) → set

Inspect the template source to find any references to variables.

Parameters
  • base (str, tuple[str]) – Lookup references to this variable or list of variables.

  • depth (int, optional) – Limits the number of reference levels that should be followed when searching the template. None or 0 makes the depth unlimited.

Returns

A unique set of the tuples of variables and their paths.

Return type

set[tuple[str]]

find_references(base: Union[Tuple[str, ...], str] = ())

Look for any references to the base variable and return a set of the paths on those that are used.

Parameters

base (str, tuple[str]) – Lookup references to this variable or list of variables.

Returns

A unique set of bases that the source references.

Return type

set[str]

render(*args, **kwargs)

Passthrough call to the pre-compiled jinja2 Template render() method.

Parameters
  • *args – The positional arguments to pass to render

  • **kwargs – The keyword arguments to pass to render

Returns

The rendered template/expression.

Return type

str

class threatq.core.lib.templates.TemplateExpression(source: str)

Class for enabling !expr tags inside a yaml file.

Parameters

source (str) – The value that will be compiled and parsed.

yaml_tag

The tag that should be used to specify the usage of this parser.

Type

str

jinja2_env

The jinja2 environment to be used.

Type

jinja2.environment.Environment

source

The value that was passed in at instantiation.

Type

str

fields

A dict with the different types of fields that are used in the TemplateExpression.

Type

dict

compile()

Compile the source string against the jinja2_env.

render(*args, **kwargs)

Passthrough call to the pre-compiled jinja2 Template render() method.

Parameters
  • *args – The positional arguments to pass to render

  • **kwargs – The keyword arguments to pass to render

Returns

The rendered template/expression.

Return type

str

class threatq.core.lib.templates.YAMLTagged

Utilities: threatq.core.lib.utils

threatq.core.lib.utils.AsyncTemporaryFile = typing.Union[asyncio_extras.file.AsyncFileWrapper, tempfile._TemporaryFileWrapper]

Returned by get_async_temporary_file()

class threatq.core.lib.utils.ErrorManager(context_desc: str, log: Optional[logging.Logger] = None, *, propagate: Optional[bool] = None)
classmethod format_exc(exc)

Attempt to grab the more informative readout of exc by choosing the longer of the results of str() and repr().

Parameters

exc (Exception) – The exception to stringify

Returns

The stringified exception

Return type

str
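
The "longer of str() and repr()" heuristic is simple enough to sketch directly; this is an illustrative re-implementation, not the real ErrorManager method:

```python
def format_exc(exc):
    # Prefer whichever of str()/repr() carries more information.
    # An empty str(exc) (e.g. ValueError()) loses to its repr.
    return max(str(exc), repr(exc), key=len)
```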

class threatq.core.lib.utils.ParsedFQDN(subdomain, domain, suffix)
property domain

Alias for field number 1

property subdomain

Alias for field number 0

property suffix

Alias for field number 2
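
The "alias for field number N" properties above are the signature of a namedtuple; ParsedFQDN behaves as if defined like this (an equivalent sketch, assuming no extra methods):

```python
from collections import namedtuple

# Field order matches the documented aliases:
# 0 = subdomain, 1 = domain, 2 = suffix.
ParsedFQDN = namedtuple('ParsedFQDN', ['subdomain', 'domain', 'suffix'])
```

So both attribute access (`parsed.domain`) and index access (`parsed[1]`) work interchangeably.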

class threatq.core.lib.utils.SystemExitingErrorManager(context_desc: str, log: Optional[logging.Logger] = None)

Similar to ErrorManager, except raises a SystemExit instead of logging and propagating the error. Designed for use with CLI tools like TQFilter.

class threatq.core.lib.utils.TempFiles(mode='w+', buffering=- 1, encoding=None, newline=None, suffix=None, prefix=None, dir=None, delete=True, named: bool = False, contents: Optional[MutableMapping] = None)

Creates and manages the cleanup of temporary files. A temporary file is created on-demand when an undefined attribute is accessed.

tempfiles = TempFiles()
a_tempfile = tempfiles.my_file
a_tempfile.write('this is a test')
a_tempfile.seek(0)
print(a_tempfile.read())  # Prints 'this is a test'
tempfiles.close()

An instance can be used as a context manager, in which case the lifetime of the temporary files is scoped to the with block (a named temporary file will not be deleted from the filesystem if delete was passed to the constructor with a False value).

with TempFiles() as tempfiles:
    tempfiles.my_file.write('this is a test')
    tempfiles.my_file.seek(0)
    print(tempfiles.my_file.read())  # Prints 'this is a test'
# Any temporary files created in the `with` block are closed, deleted from the filesystem
# (if the constructor arguments `named == True` and `delete == True` were provided),
# and memory deallocated

A mapping of attribute name to file contents can be passed via the contents constructor argument. A temporary file is created for each key-value pair in the contents mapping: the key is the attribute name used to access the temporary file object, and the value is the content written to the created temporary file.

with TempFiles(contents=dict(my_file='this is a test')) as tempfiles:
    print(tempfiles.my_file.read())  # Prints 'this is a test'
    print(tempfiles.some_other_file.read())  # Prints empty string, matching name not in `contents`

If temporary files are to be added over time and the contents constructor argument will not work, the class method write_content_to_tempfile() can be used to conveniently write content to a file that is created by accessing an attribute on the TempFiles object.

with TempFiles() as tempfiles:
    tempfiles.write_content_to_tempfile(tempfiles.my_file, 'this is a good file')
    print(tempfiles.my_file.read())  # Prints 'this is a good file'

For all constructor arguments other than named and contents, see the TemporaryFile() or NamedTemporaryFile() documentation for information about these arguments.

Raises

ValueError – Raised if a provided contents mapping contains a key that is not a valid Python identifier.

close()

Closes, deletes from the filesystem (if the constructor arguments named == True and delete == True were provided), and clears references to all temporary files previously created with this TempFiles instance.

get_tempfile_attrs()

Returns sorted attribute names whose values are temporary files in this TempFiles instance.

Returns

Sorted attribute names whose values are temporary files in this TempFiles instance.

Return type

list

classmethod write_content_to_tempfile(tempfile_, content: str, seek_to_beginning: bool = True)

Writes content to tempfile_.

Parameters
  • tempfile_ – Temporary file object

  • content (str) – Content to write to the temporary file

  • seek_to_beginning (bool) – If True, sets the file cursor to the start of the file object.

Returns

The provided temporary file object

threatq.core.lib.utils.all_subclasses(cls_: type)

Recursively list all of the subclasses of the specified class.

Parameters

cls_ (type) – The class whose subclasses should be recursively listed.

Returns

A set of the subclasses.

Return type

set[type]
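
Recursive subclass discovery can be sketched with `type.__subclasses__()`; this is an illustrative implementation of the documented behaviour, not necessarily the real one:

```python
def all_subclasses(cls_):
    # Direct subclasses, plus the subclasses of each of those,
    # flattened into a single set.
    direct = set(cls_.__subclasses__())
    return direct.union(*(all_subclasses(sub) for sub in direct))
```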

threatq.core.lib.utils.copy_path(src: Union[importlib_resources.abc.Traversable, pathlib.Path], dst: pathlib.Path)

Copy an importlib.abc.Traversable object (including most Path-likes) to the Path-like destination.

Parameters
  • src (Traversable | Path) – Source package resource (file or directory)

  • dst (Path) – Destination

threatq.core.lib.utils.flatten(value: Any, depth: Optional[Union[float, int]] = inf) → Any

Get a flat list of all values in iterables and nested iterables.
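
A depth-limited flatten can be sketched as follows. This is an illustration of the documented signature; treating strings as scalars is an assumption about the real implementation:

```python
import math

def flatten(value, depth=math.inf):
    # Recursively unroll nested iterables into a flat list, descending
    # at most `depth` levels. Strings/bytes are treated as scalars
    # (an assumption; the real implementation may differ).
    if isinstance(value, (str, bytes)) or not hasattr(value, '__iter__'):
        return [value]
    if depth <= 0:
        return list(value)  # depth exhausted: stop flattening here
    out = []
    for item in value:
        out.extend(flatten(item, depth - 1))
    return out
```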

async threatq.core.lib.utils.for_items_chunked(action: Callable, object_groups: Iterable[Any], *, chunk_size: int = 100)

Perform action on each object in object_groups. Can be used to process a large object_groups in chunks to avoid bogging down the event loop.

Parameters
  • action (Callable | Coroutine) – Action to be applied to each object returned from object_groups.

  • object_groups (Iterable[Any]) – Iterable of objects.

  • chunk_size (int) – Optional, defaults to 100. Size of chunks to grab from object_groups.
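
The chunking behaviour can be sketched with stdlib asyncio: pull a chunk at a time from the iterable and yield control back to the event loop between chunks. An illustration, not the actual implementation:

```python
import asyncio
import itertools

async def for_items_chunked(action, object_groups, *, chunk_size=100):
    # Apply `action` to every object, pausing between chunks so that a
    # very large iterable cannot starve other tasks on the event loop.
    iterator = iter(object_groups)
    while True:
        chunk = list(itertools.islice(iterator, chunk_size))
        if not chunk:
            break
        for obj in chunk:
            result = action(obj)
            if asyncio.iscoroutine(result):
                await result  # support coroutine actions too
        await asyncio.sleep(0)  # let other tasks run between chunks
```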

threatq.core.lib.utils.get_all_values(obj: Any)

Get all values in an object.

Parameters

obj (Any) – The dict, list, or other object to collect values from.

Returns

A list of all the values nested within the object or list of objects.

Return type

list

async threatq.core.lib.utils.get_async_temporary_file(mode='w+b', buffering=- 1, encoding=None, newline=None, suffix=None, prefix=None, _dir=None, delete=True, named=False)

Creates a temporary file with certain I/O operations wrapped so they’re guaranteed to run in a thread pool.

The wrapped methods are:

  • flush()

  • read()

  • readline()

  • readlines()

  • seek()

  • truncate()

  • write()

  • writelines()

The standard file handler together with all its methods can still be used by using the _raw_file attribute.

temp_file = await get_async_temporary_file()
await temp_file.write('Asynchronous write')
temp_file._raw_file.write('Synchronous write')
Returns

A temporary file having asynchronous capabilities.

Return type

AsyncTemporaryFile

threatq.core.lib.utils.merge_async_gens(*sub_gens: AsyncIterator) → AsyncIterable[Any]

Merge multiple async generators into a single async generator. Yields back the results of sub_gens as they complete, regardless of which generator in sub_gens the value was yielded from.

Parameters

sub_gens (AsyncIterator) – Async generators to merge into a single async generator.

Yields

Any – Item yielded by one of the specified sub_gens.
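
Yield-in-completion-order merging can be sketched with a shared queue: each sub-generator is drained by its own task, and the merged generator pulls items off the queue until every drainer has signalled exhaustion. An illustrative sketch, not the real implementation:

```python
import asyncio

async def merge_async_gens(*sub_gens):
    # Drive every sub-generator concurrently; yield items in the order
    # they arrive, regardless of which generator produced them.
    queue = asyncio.Queue()
    done = object()  # sentinel marking one generator's exhaustion

    async def drain(gen):
        async for item in gen:
            await queue.put(item)
        await queue.put(done)

    tasks = [asyncio.ensure_future(drain(g)) for g in sub_gens]
    remaining = len(tasks)
    try:
        while remaining:
            item = await queue.get()
            if item is done:
                remaining -= 1
            else:
                yield item
    finally:
        for task in tasks:
            task.cancel()  # clean up if the consumer stops early
```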

threatq.core.lib.utils.new_ssl_context(verify_host_ssl: bool = True, host_ca_certificate: Optional[str] = None)

Helper function to create and initialize a client-side SSLContext.

Parameters
  • verify_host_ssl (bool) – Validates the server’s certificate and checks whether the server’s hostname matches if True.

  • host_ca_certificate (str) – Specifies a base64 PEM encoded CA Certificate Bundle to verify the provider’s SSL certificate against. If not provided, the operating system’s default CA Certificate Bundle is used.

Returns

SSLContext initialized based on provided input.

Return type

SSLContext
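
The described behaviour maps naturally onto the stdlib ssl module. The sketch below is an assumption about how such a helper could be built, shown for illustration:

```python
import ssl

def new_ssl_context(verify_host_ssl=True, host_ca_certificate=None):
    # Start from a sensible client-side default, then either swap in a
    # custom CA bundle or relax verification, matching the documented
    # precedence (a supplied bundle takes priority over verify_host_ssl).
    if host_ca_certificate is not None:
        # Trust only the supplied PEM bundle instead of system CAs.
        return ssl.create_default_context(cadata=host_ca_certificate)
    ctx = ssl.create_default_context()
    if not verify_host_ssl:
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```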

threatq.core.lib.utils.normalize_action_name(name: str) → str

Normalize an action name by replacing spaces/dots with underscores and adding _action suffix.

Parameters

name – action feed name to be normalized

Returns

normalized action feed name
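
A minimal sketch of the described normalization, assuming no transformations beyond those stated (the real function may do more):

```python
import re

def normalize_action_name(name):
    # Spaces and dots become underscores; an `_action` suffix is added.
    return re.sub(r'[ .]', '_', name) + '_action'
```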

async threatq.core.lib.utils.parse_fqdn(fqdn: str)

This function is not called anywhere within pynoceros itself, but it is used by plugins, so it is retained.

Parameters

fqdn (str) – Domain name to parse

Returns

None on a parse failure (e.g. the FQDN uses a TLD that does not exist); otherwise a ParsedFQDN namedtuple (defined above) containing the domain parts.

threatq.core.lib.utils.resource_as_dir(resource_dir_path: Union[importlib_resources.abc.Traversable, pathlib.Path])
threatq.core.lib.utils.resource_as_dir(path: pathlib.Path)

A context manager that, given an importlib.abc.Traversable object representing a package resource directory, will return a real directory on the local filesystem containing its contents. If a temporary directory was needed, it will be automatically deleted upon context exit.

Models

Models Overview

Models allow one to easily work with ThreatQ threat intelligence data:

  • Create threat objects (Indicators, Adversaries, Events, etc.) using the model’s constructor.

  • Push threat objects and their relationships to the ThreatQ API in batches for ingestion.

  • Create a threat object or a list of threat objects from dictionary representations of the threat object’s attributes (usually by calling json.loads() on JSON data returned by the API).

    • This allows one to easily access and manipulate fetched threat intelligence data by accessing attributes or calling methods on the threat object. The user can further enrich the object or add relationships before pushing them back to the ThreatQ API, or the user can use the fetched threat intelligence to script integrations with other vendors’ products.

  • Create, retrieve, update, and remove common fields of a threat object if the threat object’s definition has those fields.

Some use cases for Models include Operations and Configuration Driven Feeds.

Known Models (threat object classes) can be accessed inside of an Asphalt application that has the component ModelsComponent added. After adding this component and asynchronously requesting the Models resource from the Asphalt context (await self.ctx.request_resource(Models)), a model may be obtained from the models mapping by calling self.ctx.models[type_], where type_ is a str of the model’s name or collection name (e.g. self.ctx.models['indicator']). The class instance needing the model in this example needs to have the Asphalt application’s context stored as the ctx attribute.

Todo

It would be nice to have some example code here instead of trying to explain how to use models in paragraph form above. Problem is trying to think of a non-contrived example.
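
To make the lookup semantics concrete without a full Asphalt application, the stand-in below illustrates the documented behaviour that a model is reachable by either its api_name or its api_collection. ModelsMapping and the stub Indicator class are hypothetical, not the real Models resource:

```python
class ModelsMapping:
    """Illustrative stand-in for the Models resource: a model class is
    reachable by either its api_name or its api_collection."""

    def __init__(self, model_classes):
        self._by_key = {}
        for cls in model_classes:
            # Register both lookup keys for each model class.
            self._by_key[cls.api_name] = cls
            self._by_key[cls.api_collection] = cls

    def __getitem__(self, type_):
        return self._by_key[type_]

class Indicator:
    api_name = 'indicator'
    api_collection = 'indicators'

models = ModelsMapping([Indicator])
```

With the real resource, `self.ctx.models['indicator']` and `self.ctx.models['indicators']` would resolve to the same class in the same way.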

Base Objects

class threatq.core.models.base.BaseThreatObject(*, common: Iterable[threatq.core.models.common_fields.CommonFieldObject] = (), **kwargs)

An abstract base threat object.

api_collection = None

Class attribute, the collection (a component of the object path) in the ThreatQ API

Type

(str)

api_name = None

Class attribute, the canonical name of the object type in the ThreatQ API

Type

(str)

ctx = None

Context instance for use by the ThreatObject

Type

(Context, optional)

classmethod from_data(obj_data: Any, **kwargs)

Create a threat object from a dictionary representation of the threat object’s attributes. Calls _kwargs_from_data() on whatever threat object from_data() is called on.

Parameters

obj_data (dict) – dictionary of threat object attributes.

Returns

Threat object created

Return type

ThreatObject

classmethod from_data_expanded(obj_data: Any, **kwargs)

Create multiple ThreatObjects expanded from obj_data. If obj_data is a dict, returns an object for every possible combination of the member values. Otherwise obj_data must be iterable, and returns an object for each member of obj_data.

Parameters
  • obj_data (dict or iterable) – source data - see above

  • **kwargs – Kwargs for the ThreatObject(s)

Returns

Threat objects created.

Return type

list[ThreatObject]
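
The expansion rule for dict input ("an object for every possible combination of the member values") is a Cartesian product over the values. The helper below is a hypothetical sketch of just that expansion step, producing the kwargs mappings rather than real ThreatObjects:

```python
from itertools import product

def expand_obj_data(obj_data):
    # Dict input: one kwargs mapping per combination of member values
    # (scalars are treated as single-element lists). Other iterables
    # pass through item by item, as documented.
    if isinstance(obj_data, dict):
        keys = list(obj_data)
        value_lists = [v if isinstance(v, (list, tuple)) else [v]
                       for v in obj_data.values()]
        return [dict(zip(keys, combo)) for combo in product(*value_lists)]
    return list(obj_data)
```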

merge(other: Optional[threatq.core.models.base.BaseThreatObject])

Merges this threat object with another BaseThreatObject. Implementing classes should provide logic in _merge() that specifies how to merge this threat object’s attributes with other.

Parameters

other (BaseThreatObject) – The threat object to merge with this one.

Returns

self

Return type

BaseThreatObject

unique_params = ()

Class attribute, attributes on ThreatObject that are considered unique. Used by equality comparisons

Type

(tuple[str])

class threatq.core.models.base.DynamicThreatObject(*, id: Optional[int] = None, source: Optional[Any] = None, **kwargs)

Bases: threatq.core.models.base.ThreatObject

A dynamically generated threat object.

classmethod new_subclass_from_definition(definition: Mapping[str, Any]) → Type[threatq.core.models.base.ThreatObject]

Class factory for dynamically generating a subclass of ThreatObject from an object’s definition from the API.

Parameters

definition (MutableMapping) – Mapping of the object definition.

Returns

Dynamically generated threat object class

Return type

ThreatObject

reference()

Get a dictionary mapping of reference information for threat object.

Returns

Mapping including reference information for the threat object.

Return type

dict

class threatq.core.models.ThreatObject(*, id: Optional[int] = None, source: Optional[Any] = None, **kwargs)

Bases: threatq.core.models.base.BaseThreatObject

A threat object.

Parameters
  • common (list[common_fields.CommonFieldObject], optional) – Common descriptive objects to be added to the threat object.

  • id (int, optional) – Id of the threat object.

  • source (optional) – Source of the threat object. Can be set to instance of a Feed.

id

Id of the threat object.

Type

int

attributes

Attributes of the threat object.

Type

frozenset[Attribute]

relations

Mapping of relations to other threat objects. Mapping is set as {ThreatObject.api_name: set[ThreatObject]}

Type

defaultdict(set)

classmethod check_relatable(obj: Any)

Checks if given ThreatObject obj is relatable to this threat object.

Parameters

obj (ThreatObject) – ThreatObject to check if this threat object can be related to.

Raises
get_markup(with_relations: bool = False)

Get markup content for this threat object.

Parameters

with_relations (bool, optional) – Defaults to False. Whether this threat object’s relations should be included with the markup.

Returns

markup for this threat object

Return type

str

merge_overwrite_params = ()

Class attribute, attributes on the ThreatObject that are explicitly overwritten during _merge()

Type

(tuple[str])

async push(now=False, *, group=None, update='id')

Push the threat object to the ThreatQ API. Object will be passed to the batcher associated with the threat object.

Parameters
  • now (bool, optional) – Defaults to False. Specifies whether the push should be triggered immediately or left up to the batcher to determine activation.

  • group (str, optional) – Defaults to None. A run_uuid can be passed to group that will be associated with the threat object on creation within the ThreatQ API.

  • update (str|bool, optional) – Defaults to ‘id’. When supplied, the threat object will be updated based on the creation response from the API, updating the threat object’s id to be equal to the id created by the API.

Returns

Future representing the completed push of the threat object.

Return type

asyncio.Future

async push_relationships(objects: Optional[Iterable[threatq.core.models.base.ThreatObject]] = None, *, group=None, id_strict=True, now=False)

Push the threat object’s relations to the ThreatQ API. Object will be passed to the batcher associated with the threat object.

Parameters
  • objects (list[ThreatObject], optional) – Defaults to None. Related objects to post to the ThreatQ API. If None, the objects found within relations will be used.

  • group (str, optional) – Defaults to None. A run_uuid can be passed to group that will be associated with the threat object on creation within the ThreatQ API.

  • id_strict (bool, optional) – Defaults to True. When True, an error is raised when attempting to push related objects that do not have ids. When False, related objects without id values are skipped and not pushed.

  • now (bool, optional) – Defaults to False. Specifies whether the push should be triggered immediately or left up to the batcher to determine activation.

Returns

Future representing the completed push of the threat objects.

Return type

asyncio.Future

Raises
abstract reference()

Get a dictionary mapping of reference information for threat object.

Returns

Mapping including reference information for the threat object.

Return type

dict

relate(threat_objects: Union[Iterable[threatq.core.models.base.ThreatObject], threatq.core.models.base.ThreatObject])

Relate this threat object to each ThreatObject in given list threat_objects.

Parameters

threat_objects (list[ThreatObject] | ThreatObject) – Threat object(s) to relate to this threat object.

related(obj: threatq.core.models.base.ThreatObject)

Check if given ThreatObject obj is related to this threat object.

Parameters

obj (ThreatObject) – ThreatObject to check relations for.

Returns

True if obj is already related and present in this threat object’s relations, False otherwise.

Return type

bool

unrelate(threat_objects: Optional[Union[Iterable[threatq.core.models.base.ThreatObject], threatq.core.models.base.ThreatObject]] = None)

Unrelate this threat object from each ThreatObject in given list threat_objects.

Parameters

threat_objects (list[ThreatObject] | ThreatObject) – Threat object(s) to unrelate from this threat object.

Built-in Models

Until the existing built-in models can be loaded completely from object definitions returned by the API, these models will remain in the framework. However, direct use of them should be considered deprecated in favor of using the Models Asphalt resource.

Adversaries
class threatq.core.models.Adversary(name: str, description: Optional[str] = None, **kwargs)

Bases: threatq.core.models.base.ThreatObject

A threat adversary.

Parameters
  • name (str) – Name for the adversary.

  • description (str) – Adversary description.

name

The name of the adversary.

Type

str

description

The description of the Adversary.

Type

str

published_at

Publish date of the adversary.

Type

arrow.Arrow | str | int, optional

reference()

Get a dictionary mapping of reference information for threat object.

Returns

Mapping including reference information for the threat object.

Return type

dict

Events
class threatq.core.models.events.Event(type: Union[str, threatq.core.models.events.EventType], title: str, happened_at: Union[arrow.arrow.Arrow, str, int], description: str, **kwargs)

Bases: threatq.core.models.base.ThreatObject

A threat event.

Parameters
  • type (str | EventType) – The type of the event.

  • title (str) – Title for the event.

  • happened_at (arrow.Arrow | str | int) – When the event happened.

  • description (str) – Description of the event.

type

The type of the event.

Type

EventType

title

Title of the event.

Type

str

description

Description of the event.

Type

str

published_at

Publish date of the event.

Type

arrow.Arrow | str | int, optional

api_collection = 'events'
api_name = 'event'
merge_overwrite_params = ('description',)
reference()

Get a dictionary mapping of reference information for threat object.

Returns

Mapping including reference information for the threat object.

Return type

dict

unique_params = ('title', 'happened_at')
class threatq.core.models.events.EventType(name)

Bases: object

A threat event type.

Parameters

name (str) – Name of the threat event type.

name

Name of the threat event type.

Type

str

Indicators
class threatq.core.models.indicators.Indicator(type: Union[str, threatq.core.models.indicators.IndicatorType], value, *, status: Optional[Union[str, threatq.core.models.indicators.IndicatorStatus]] = None, description: Optional[str] = None, **kwargs)

Bases: threatq.core.models.base.ThreatObject

A threat indicator.

Parameters
  • type (str | IndicatorType) – The type of the indicator.

  • value (object) – The indicator value. Valid values depend on the indicator type.

  • status (str | IndicatorStatus) – The indicator status.

  • description (str) – Description of the indicator.

type

The type of the indicator.

Type

IndicatorType

value

The indicator value. Valid values depend on the indicator type.

Type

Any

status

The indicator status.

Type

str | IndicatorStatus

description

Description of the indicator.

Type

str

published_at

Publish date of the indicator.

Type

arrow.Arrow | str | int, optional

api_collection = 'indicators'
api_name = 'indicator'
merge_overwrite_params = ('description',)
reference()

Get a dictionary mapping of reference information for threat object.

Returns

Mapping including reference information for the threat object.

Return type

dict

unique_params = ('type', 'value')
class threatq.core.models.indicators.IndicatorStatus(name)

Bases: object

An indicator threat status.

Parameters

name (str) – Name of the indicator threat status.

name

Name of the indicator threat status.

Type

str

class threatq.core.models.indicators.IndicatorType(name)

Bases: object

An indicator threat type.

Parameters

name (str) – Name of the indicator threat type.

name

Name of the indicator threat type.

Type

str

Signatures
class threatq.core.models.signatures.Signature(type: Union[str, threatq.core.models.signatures.SignatureType], name, description, value, *, status: Optional[Union[str, threatq.core.models.signatures.SignatureStatus]] = None, **kwargs)

Bases: threatq.core.models.base.ThreatObject

A threat signature.

Parameters
  • type (str | SignatureType) – The type of the signature.

  • name (str) – Name for the signature.

  • description (str) – Description of the signature.

  • value – The signature content.

  • status (SignatureStatus) – The signature threat status.

type

The type of the signature.

Type

SignatureType

name

Name of the signature.

Type

str

description

Description of the signature.

Type

str

published_at

Publish date of the signature.

Type

arrow.Arrow | str | int, optional

status

The signature threat status

Type

SignatureStatus

api_collection = 'signatures'
api_name = 'signature'
api_type_url = 'signature/types'
merge_overwrite_params = ('description',)
reference()

Get a dictionary mapping of reference information for threat object.

Returns

Mapping including reference information for the threat object.

Return type

dict

unique_params = ('type', 'value')
class threatq.core.models.signatures.SignatureStatus(name)

Bases: object

A Signature threat status.

Parameters

name (str) – Name of the Signature threat status.

name

Name of the Signature threat status.

Type

str

class threatq.core.models.signatures.SignatureType(name)

Bases: object

A threat signature type.

Parameters

name (str) – name of the threat signature type.

Threat Files

The ThreatFile represents an Attachment in the ThreatQ API.

class threatq.core.models.threat_files.ThreatFile(type: Union[str, threatq.core.models.threat_files.ThreatFileType], title, name, attributes=(), *, malware_locked: bool = False, content: Optional[Union[aiohttp.streams.StreamReader, str, bytearray, bytes]] = None, mime_type: Optional[str] = 'text/plain', content_type_id: Optional[int] = None, created_at: Optional[Union[arrow.arrow.Arrow, str, int]] = None, description: Optional[str] = None, file_size: Optional[int] = None, hash: Optional[str] = None, touched_at: Optional[Union[arrow.arrow.Arrow, str, int]] = None, type_id: Optional[int] = None, updated_at: Optional[Union[arrow.arrow.Arrow, str, int]] = None, **kwargs)

Bases: threatq.core.models.base.ThreatObject

A threat file. ThreatFiles support creation as placeholder files by initializing a ThreatFile without specifying content.

Parameters
  • type (str | ThreatFileType) – The type of the threat file.

  • title (str) – Title for the threat file.

  • name (str) – Name for the threat file.

  • tags (list[Tag], optional) – Tags to be added to the threat file.

  • malware_locked (bool, optional) – Whether the ThreatFile should be locked to prevent accidental downloading of possible malware files.

  • content (aiohttp.StreamReader | str | bytearray | bytes, optional) – File content. Links directly with the placeholder attribute. File content can be supplied as a plain text str or as binary file content via an aiohttp.StreamReader.

  • mime_type (str, optional) – Mime type of the threat file needed for upload into the ThreatQ API. Defaults to text/plain.

  • content_type_id (int, optional) – Id of the threat file’s content type. Supplied by ThreatQ API.

  • created_at (arrow.Arrow | str | int, optional) – Date the threat file was created. Supplied by ThreatQ API.

  • description (str, optional) – Description associated with the threat file. Supplied by ThreatQ API.

  • file_size (int, optional) – Size in bits of the threat file’s content. Supplied by ThreatQ API.

  • hash (str, optional) – MD5 hash of the threat file. Supplied by ThreatQ API.

  • touched_at (arrow.Arrow | str | int, optional) – Last time the threat file was touched. Supplied by ThreatQ API.

  • type_id (int, optional) – ID of the threat file’s type. Supplied by ThreatQ API.

  • updated_at (arrow.Arrow | str | int, optional) – Last time the threat file was updated. Supplied by ThreatQ API.

type

The type of the threat file.

Type

ThreatFileType

title

Title of the threat file.

Type

str

name

Name of the threat file. This name is calculated based on the placeholder attribute. If the threat file is a placeholder, pending- is prepended to the name and .txt is appended to it. If the threat file is not a placeholder, the name is returned as supplied.

Type

str
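The placeholder naming rule above can be sketched as a small standalone function (illustrative only; the actual ThreatFile implementation may differ):

```python
def placeholder_name(name: str, placeholder: bool) -> str:
    # Mirror the documented naming rule: placeholder files get
    # "pending-" prepended and ".txt" appended; non-placeholder
    # files keep the name as supplied.
    if placeholder:
        return f'pending-{name}.txt'
    return name
```

For example, placeholder_name('sample.bin', True) produces 'pending-sample.bin.txt', while a non-placeholder file keeps its original name.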

tags

Tags of the threat file.

Type

list[Tag]

malware_locked

Whether the threat file is locked in ThreatQ API to prevent downloading of potential malware.

Type

bool

placeholder

Whether the threat file is a placeholder.

Type

bool

content

File content of the threat file.

Type

aiohttp.StreamReader | str | bytearray | bytes, optional

published_at

Publish date of the threat file.

Type

arrow.Arrow | str | int, optional

api_collection = 'attachments'
api_name = 'attachment'
async fetch()

Returns a response object for obtaining the file’s contents from the ThreatQ API.

Returns

the HTTP fetch’s response object.

Return type

threatq.core.lib.http.ClientResponse

async fulfill_placeholder()

Fulfill placeholder ThreatFile. The ThreatFile should be supplied file content via the content attribute first. The ThreatFile’s placeholder attribute will be set to false before the ThreatFile is PUT to the ThreatQ API.

Returns

Response from ThreatQ API for ThreatFile PUT.

Return type

dict

async classmethod get_placeholder_files(source: str, ctx)

Poll the ThreatQ API for placeholder files for a given source name.

Parameters
  • source (str) – Name of the connector/source to find placeholders for.

  • ctx (Context) – Context instance used to poll ThreatQ API.

Returns

List of placeholder ThreatFiles found for given source.

Return type

list

get_post_meta_data()

Get meta data associated with the ThreatFile needed to POST a file to the ThreatQ API.

Returns

Mapping including the ThreatFile’s type, malware_locked, title, name, published_at, placeholder, description, and tlp.

Return type

dict

async handle_fulfillment_error(exception: Exception)

Handles an error during placeholder fulfillment. The ThreatFile will have the following actions taken on it before being PUT to the ThreatQ API:

  • name attribute will be reverted back to the original name supplied.

  • placeholder attribute will be set to True

  • The placeholder file associated with the ThreatFile will be downloaded, have its try_count incremented, and have the exception param appended to its error_log

Parameters

exception (Exception) – Exception which occurred during fulfillment.

async push(now=False, *, group=None, update='id', put_data=False)

Push the threat file to the ThreatQ API.

Parameters
  • now (bool, optional) – Defaults to False. Ignored for ThreatFiles as AttachmentPoster always immediately activates.

  • group (str, optional) – Defaults to None. A run_uuid can be passed to group that will be associated with the ThreatFile on creation within the ThreatQ API.

  • update (str|bool, optional) – Defaults to ‘id’. When supplied, the ThreatFile object will be updated based on the creation response from the API, updating the ThreatFile’s id to be equal to the id created by the API.

  • put_data (bool, optional) – Defaults to False. Indicates whether the ThreatFile push to the API should be POST’ed or PUT. ThreatFiles are initially created via POST and updated via PUT.

Returns

Future representing the completed push of the ThreatFile.

Return type

asyncio.Future

push_url = URL('attachments')
reference()

Get a dictionary mapping of reference information for threat object.

Returns

Mapping including reference information for the threat object.

Return type

dict

unique_params = ('title',)
class threatq.core.models.threat_files.ThreatFileType(name)

Bases: object

A threat file type.

Parameters

name (str) – Name of the threat file type.

name

Name of the threat file type.

Type

str

Common Field Base Objects

class threatq.core.models.common_fields.CommonFieldObject(*, common: Iterable[threatq.core.models.common_fields.CommonFieldObject] = (), **kwargs)

Bases: threatq.core.models.base.BaseThreatObject

class threatq.core.models.common_fields.CommonMultiFieldObject(*, common: Iterable[threatq.core.models.common_fields.CommonFieldObject] = (), **kwargs)

Bases: threatq.core.models.common_fields.CommonFieldObject

Common Field Objects

class threatq.core.models.common_fields.Attribute(name: str, value: Any, **kwargs)

Bases: threatq.core.models.common_fields.CommonMultiFieldObject

Attribute on a threat object.

Parameters
  • name (str) – Attribute name.

  • value (Any) – Attribute value.

name

Attribute name.

Type

str

value

Attribute value.

Type

Any

class threatq.core.models.common_fields.PublishedAt(timestamp: Optional[Union[arrow.arrow.Arrow, str, int]], **kwargs)

Bases: threatq.core.models.common_fields.CommonFieldObject

class threatq.core.models.common_fields.Tag(name: str, **kwargs)

Bases: threatq.core.models.common_fields.CommonMultiFieldObject

A tag on a threat object.

Parameters
  • name (str) – Tag name.

  • source (str, optional) – Defaults to None. Can be set to a Feed instance.

name

Tag name.

Type

str

source

source of the tag.

Type

str, optional

class threatq.core.models.common_fields.TLP(name: str, **kwargs)

Bases: threatq.core.models.common_fields.CommonFieldObject

Traffic Light Protocol object. Represents the US Cert TLP information relating to data sensitivity.

Parameters

name (str) – Name of the Color of the traffic light protocol.

name

Name of the Color of the traffic light protocol.

Type

str

sensitivity_level

A numerical representation of the restriction of the TLP.

Type

int

is_standard

Property indicating if this TLP value is a standard TLP value.

Type

bool
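Since sensitivity_level gives a numerical ordering of restriction, TLP values can be compared. The sketch below is standalone and uses hypothetical numeric levels; the values assigned by the actual TLP class may differ:

```python
# Hypothetical sensitivity levels for illustration only; not the
# values used by threatq.core.models.common_fields.TLP.
TLP_LEVELS = {'WHITE': 0, 'GREEN': 1, 'AMBER': 2, 'RED': 3}

def more_restrictive(a: str, b: str) -> str:
    # Return whichever TLP color carries the higher sensitivity level.
    return a if TLP_LEVELS[a.upper()] >= TLP_LEVELS[b.upper()] else b
```

For example, more_restrictive('green', 'red') resolves to 'red', the more tightly restricted color.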

Framework Support

class threatq.core.models.Models(*args, **kwargs)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.asphalt.ContextRefMixin, Mapping

Asphalt resource for sharing a mapping of known threat objects. During setup, models are dynamically created from object definitions retrieved from the API.

class threatq.core.models.ModelsComponent(*args, definition: Optional[Any] = None, **kwargs)

Bases: threatq.core.lib.asphalt.Component

async start(ctx: threatq.core.lib.asphalt.Context)

Perform any necessary tasks to start the services provided by this component.

In this method, components typically use the context to:
  • add resources and/or resource factories to it (add_resource() and add_resource_factory())

  • request resources from it asynchronously (request_resource())

It is advisable for Components to first add all the resources they can to the context before requesting any from it. This will speed up the dependency resolution and prevent deadlocks.

Parameters

ctx – the containing context for this component

Operations

Plugin Overview

ThreatQ Pynoceros allows custom operations to be performed on all of the Object Types.

class threatq.core.lib.plugins.APIPlugin(ctx: threatq.core.lib.asphalt.Context, logname: Optional[str] = None, tmpl_contexts: Sequence[Mapping] = (), user_fields: Optional[Mapping[str, str]] = None)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.asphalt.ContextRefMixin, threatq.core.lib.templates.TemplateRenderMixin

Define an APIPlugin.

Parameters
  • ctx (Context) – The asphalt context variable to be used for referencing the instance of the asphalt framework.

  • logname (str, optional) – The logname to use for this operation instance. Defaults to None.

author

The person/organization that built this plugin.

Type

str

author_email

The email address of the author that built this plugin.

Type

str

entry_point_group_prefix

This should be overridden with the Python entry point that this plugin should hook onto. Defaults to “threatq.plugins”.

Type

str

entry_point_group_name

This is the name of the grouping of entry points that this plugin will be associated with once it is hooked in. Defaults to “api”.

Type

str

external_endpoints

Represents the external endpoints that this operation will need to access. These should be used to configure firewalls.

Type

list(str)

force_include_data_files

List of files in the static files location that should be included with this plugin. Use this if there is a CSV, PDF, or some other supporting information that should be shipped with this operation.

Type

tuple(str)

friendly_name

A more “Human Readable” version of the name of this operation.

Type

str

install_requires

The list of Python 3 package dependencies that must be installed prior to installing this plugin.

Type

list(str)

min_threatq_version

The minimum version of the ThreatQ Appliance Application code required to use this plugin.

Type

str

package_name

The name of the package that this Plugin belongs to.

Type

str

static_logo_file

The name of the file that is in the static files folder that should be used as the logo of this plugin.

Type

str, optional

user_field_spec

List of defined user fields to allow configuration to be provided to this plugin.

Type

tuple(UserFieldSpecItem)

version

The version of this plugin. This should follow the conventions established at semver.org.

Type

str

async execute_action(action: str, data, user_params: Mapping)

Execute the action on this operation.

This method is typically invoked via the command line rather than directly from code. The command line equivalent would be:

cat data.json | tq-plugin execute <operation_name> <action_name>
Parameters
  • action (str) – The name of the action to perform

  • data (obj) – The data passed in as parameters for this action.

  • user_params – User-specified action parameters

Raises

PluginExecutionError – If there was a problem during the execution of this plugin, this exception is raised with reasoning.

classmethod get_info() → dict

Get the full info for this operation.

Returns

The dictionary mapping of the various aspects of this operation. The dictionary will have the following keys:

Return type

dict

classmethod list_actions()

List the available actions for this operation.

Returns

A list of actions that can be run using this operation.

classmethod resource_exists(name, base: Optional[Union[str, pkg_resources.Requirement]] = None) → bool

Determine if the specified resource exists.

This will use the pkg_resources.resource_exists() method to determine if the resource exists. The only difference is that before the data gets passed to pkg_resources.resource_exists(), it will attempt to determine the default base of the resource.

Returns

True if the resource exists, otherwise False.

classmethod resource_filename(name, base: Optional[Union[str, pkg_resources.Requirement]] = None) → str

Return the filename for the specified resource.

This will use the pkg_resources.resource_filename() method to get the result; the only difference is that before the data gets passed to pkg_resources.resource_filename(), it will attempt to determine the default base of the resource.

Returns

The filename on disk of the resource.

classmethod resource_isdir(name, base: Optional[Union[str, pkg_resources.Requirement]] = None) → bool

Is the named resource a directory?

This will use the pkg_resources.resource_isdir() method to determine if the resource is a directory. The only difference is that before the data gets passed to pkg_resources.resource_isdir(), it will attempt to determine the default base of the resource.

Returns

True if the resource is a directory, otherwise False.

classmethod resource_listdir(name, base: Optional[Union[str, pkg_resources.Requirement]] = None) → List[str]

List the contents of the named resource directory.

Behaves just like os.listdir except that it works even if the resource is in a zipfile. This will use the pkg_resources.resource_listdir() method to get the result. The only difference is that before the data gets passed to pkg_resources.resource_listdir(), it will attempt to determine the default base of the resource.

Returns

Resources in the specified directory.

classmethod resource_stream(name, base: Optional[Union[str, pkg_resources.Requirement]] = None) → str

Return resource as a readable file-like object.

This will use the pkg_resources.resource_stream() method to get the readable file-like object for the specified resource; it may be an actual file, a StringIO, or some similar object. The stream is in “binary mode”, in the sense that whatever bytes are in the resource will be read as-is.

Returns

The resource as a readable file-like object

classmethod resource_string(name, decode: bool = True, encoding: str = 'utf-8', base: Optional[Union[str, pkg_resources.Requirement]] = None) → str

Return the specified resource as a string.

This will use the pkg_resources.resource_string() method to get the result; the only difference is that before the data gets passed to pkg_resources.resource_string(), it will attempt to determine the default base of the resource.

Returns

The resource is read in binary fashion, such that the returned string contains exactly the bytes that are stored in the resource.

class threatq.core.lib.plugins.APIPluginResponse(data: Any, markup: Union[threatq.core.lib.markup.Markup, str])

Bases: object


class threatq.core.lib.plugins.HTTPAPIPlugin(ctx: threatq.core.lib.asphalt.Context, logname: Optional[str] = None, tmpl_contexts: Sequence[Mapping] = (), user_fields: Optional[Mapping[str, str]] = None)

Bases: threatq.core.lib.plugins.APIPlugin

This class will create a plugin to access an HTTP API. It inherits from APIPlugin.

auth

The attribute that holds the class describing how this plugin authenticates with its remote API.

Type

obj

status_text_overrides

Set values in the start method to override the text applied to HTTP error responses, making them friendlier to users of this plugin. By default, the overrides are:

{
    400: "Failed to process request",
    403: "The provided credentials are incorrect.",
    404: "Record not found in data-set.",
    500: "Unknown Server Error.",
    503: "Service Unavailable.",
}
Type

dict

async delete(*args, response_decode: Optional[str] = 'auto', auth: Optional[threatq.core.lib.http.auth.base.HTTPAuthBase] = <unset>, **kwargs)

Perform HTTP DELETE Request.

This method is a convenience wrapper for HTTPAPIPlugin.request().

It is the same as calling:

HTTPAPIPlugin().request('DELETE', *args, **kwargs)
async execute_action(action: str, data, user_params: Mapping)

Execute the action on this operation.

This method is typically invoked via the command line rather than directly from code. The command line equivalent would be:

cat data.json | tq-plugin execute <operation_name> <action_name>
Parameters
  • action (str) – The name of the action to perform

  • data (obj) – The data passed in as parameters for this action.

  • user_params – User-specified action parameters

Raises

PluginExecutionError – If there was a problem during the execution of this plugin, this exception is raised with reasoning.

async get(*args, response_decode: Optional[str] = 'auto', auth: Optional[threatq.core.lib.http.auth.base.HTTPAuthBase] = <unset>, **kwargs)

Perform HTTP GET Request.

This method is a convenience wrapper for HTTPAPIPlugin.request().

It is the same as calling:

HTTPAPIPlugin().request('GET', *args, **kwargs)
get_auth_plugin(name: str, *args, **kwargs)

Authentication Class Factory/Getter Method.

This method should be used to get an authentication class object instance without setting it to be the main auth for the plugin. This may be useful in cases where more than one auth object is needed, such as in a token-based authentication workflow.

Parameters
  • name – The name of the Authentication Class to use. Available options are defined in the threatq.core.http.auth module.

  • *args – Any additional positional arguments to be passed to the auth class.

  • **kwargs – Any additional keyword arguments to be passed to the auth class.

Returns

an authentication object

Return type

threatq.core.lib.http.auth.HTTPAuthBase

classmethod get_info() → dict

Get the full info for this operation.

Returns

The dictionary mapping of the various aspects of this operation. The dictionary will have the following keys:

Return type

dict

async get_session() → threatq.core.lib.http.ClientSession

Fetch the current client session.

async head(*args, auth: Optional[threatq.core.lib.http.auth.base.HTTPAuthBase] = <unset>, **kwargs)

Perform HTTP HEAD Request.

This method is a convenience wrapper for HTTPAPIPlugin.request().

It is the same as calling:

HTTPAPIPlugin().request('HEAD', *args, **kwargs)
classmethod list_actions()

List the available actions for this operation.

Returns

A list of actions that can be run using this operation.

async options(*args, response_decode: Optional[str] = 'auto', auth: Optional[threatq.core.lib.http.auth.base.HTTPAuthBase] = <unset>, **kwargs)

Perform HTTP OPTIONS Request.

This method is a convenience wrapper for HTTPAPIPlugin.request().

It is the same as calling:

HTTPAPIPlugin().request('OPTIONS', *args, **kwargs)
async patch(*args, data=None, json=None, response_decode: Optional[str] = 'auto', auth: Optional[threatq.core.lib.http.auth.base.HTTPAuthBase] = <unset>, **kwargs)

Perform HTTP PATCH Request.

This method is a convenience wrapper for HTTPAPIPlugin.request().

It is the same as calling:

HTTPAPIPlugin().request('PATCH', *args, **kwargs)
async post(*args, data=None, json=None, response_decode: Optional[str] = 'auto', auth: Optional[threatq.core.lib.http.auth.base.HTTPAuthBase] = <unset>, **kwargs)

Perform HTTP POST Request.

This method is a convenience wrapper for HTTPAPIPlugin.request().

It is the same as calling:

HTTPAPIPlugin().request('POST', *args, **kwargs)
async put(*args, data=None, json=None, response_decode: Optional[str] = 'auto', auth: Optional[threatq.core.lib.http.auth.base.HTTPAuthBase] = <unset>, **kwargs)

Perform HTTP PUT Request.

This method is a convenience wrapper for HTTPAPIPlugin.request().

It is the same as calling:

HTTPAPIPlugin().request('PUT', *args, **kwargs)
async raise_for_status(response: threatq.core.lib.http.ClientResponse)

Raise error if one occurs.

Parameters

response – The response from the request.

Raises

aiohttp.ClientResponseError – The error that occurred (if it did)

async request(method: str, *args, response_decode: Optional[str] = 'auto', auth: Optional[threatq.core.lib.http.auth.base.HTTPAuthBase] = <unset>, **kwargs) → threatq.core.lib.plugins.ParsedHTTPResponse

Perform HTTP Request.

Parameters
  • method – The method name to use for this request.

  • *args – Any additional positional arguments to be passed to the request method.

  • response_decode – How to decode the response. Possible values: “auto”: attempt to determine the response data type automatically by inspecting the content_type value; “json”: convert the response text using the json method; “binary”: read the binary file response from the request; “text”: return just the plain text of the response without any parsing.

  • auth (optional) – Override the class’s set auth method. If unsupplied, it will use the class definition.

  • **kwargs

    Any additional optional keyword arguments to be passed to the request method.

    Notable keyword arguments that are specific to this request method (and thus do not apply to aiohttp.ClientSession.request()):

    • verify_host_ssl - Defaults to True. Boolean specifying whether the provider’s certificate and hostname is verified for each request. Applicable only to https URLs. Favor using this keyword argument over the keyword argument verify_ssl.

    • host_ca_certificate - Defaults to None. String specifying a base64 PEM encoded CA Certificate Bundle to verify the provider’s SSL certificate against. If not provided, the operating system’s default CA Certificate Bundle is used. Applicable only to https URLs.

Returns

The response from the request.
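The response_decode dispatch described above can be illustrated with a standalone sketch (this is not the library’s actual decoding logic, only an approximation of the documented behavior):

```python
import json

def decode_body(body: bytes, content_type: str, response_decode: str = 'auto'):
    # Resolve "auto" by inspecting the content type, then decode the
    # body according to the chosen mode.
    if response_decode == 'auto':
        response_decode = 'json' if 'json' in content_type else 'text'
    if response_decode == 'json':
        return json.loads(body)
    if response_decode == 'binary':
        return body
    return body.decode()
```

For example, a body with content type application/json is parsed into a Python object, while a text/plain body is returned as a plain string.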

classmethod resource_exists(name, base: Optional[Union[str, pkg_resources.Requirement]] = None) → bool

Determine if the specified resource exists.

This will use the pkg_resources.resource_exists() method to determine if the resource exists. The only difference is that before the data gets passed to pkg_resources.resource_exists(), it will attempt to determine the default base of the resource.

Returns

True if the resource exists, otherwise False.

classmethod resource_filename(name, base: Optional[Union[str, pkg_resources.Requirement]] = None) → str

Return the filename for the specified resource.

This will use the pkg_resources.resource_filename() method to get the result; the only difference is that before the data gets passed to pkg_resources.resource_filename(), it will attempt to determine the default base of the resource.

Returns

The filename on disk of the resource.

classmethod resource_isdir(name, base: Optional[Union[str, pkg_resources.Requirement]] = None) → bool

Is the named resource a directory?

This will use the pkg_resources.resource_isdir() method to determine if the resource is a directory. The only difference is that before the data gets passed to pkg_resources.resource_isdir(), it will attempt to determine the default base of the resource.

Returns

True if the resource is a directory, otherwise False.

classmethod resource_listdir(name, base: Optional[Union[str, pkg_resources.Requirement]] = None) → List[str]

List the contents of the named resource directory.

Behaves just like os.listdir except that it works even if the resource is in a zipfile. This will use the pkg_resources.resource_listdir() method to get the result. The only difference is that before the data gets passed to pkg_resources.resource_listdir(), it will attempt to determine the default base of the resource.

Returns

Resources in the specified directory.

classmethod resource_stream(name, base: Optional[Union[str, pkg_resources.Requirement]] = None) → str

Return resource as a readable file-like object.

This will use the pkg_resources.resource_stream() method to get the readable file-like object for the specified resource; it may be an actual file, a StringIO, or some similar object. The stream is in “binary mode”, in the sense that whatever bytes are in the resource will be read as-is.

Returns

The resource as a readable file-like object

classmethod resource_string(name, decode: bool = True, encoding: str = 'utf-8', base: Optional[Union[str, pkg_resources.Requirement]] = None) → str

Return the specified resource as a string.

This will use the pkg_resources.resource_string() method to get the result; the only difference is that before the data gets passed to pkg_resources.resource_string(), it will attempt to determine the default base of the resource.

Returns

The resource is read in binary fashion, such that the returned string contains exactly the bytes that are stored in the resource.

setup_auth_plugin(name: str, *args, **kwargs)

Authentication Class Factory/Setter Method.

This method should be used to set the main authentication class of a plugin instead of interacting with the classes directly. The main authentication class of a plugin is implicitly injected into any HTTP requests made by the plugin. This helps prevent changes to Authentication classes from breaking existing plugins.

Parameters
  • name – The name of the Authentication Class to use. Available options are defined in the threatq.core.http.auth module.

  • *args – Any additional positional arguments to be passed to the auth class.

  • **kwargs – Any additional keyword arguments to be passed to the auth class.

class threatq.core.lib.plugins.ParsedHTTPResponse(data, response)

Bases: tuple

count(value) → integer – Return number of occurrences of value.
property data

Alias for field number 0

index(value[, start[, stop]]) → integer – Return first index of value.

Raises ValueError if the value is not present.

property response

Alias for field number 1
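ParsedHTTPResponse behaves like a two-field named tuple, so data and response alias positions 0 and 1. A standalone analog (not the library class itself):

```python
from collections import namedtuple

# Standalone stand-in with the same field layout as ParsedHTTPResponse:
# field 0 is the decoded data, field 1 is the raw response object.
ParsedResponse = namedtuple('ParsedResponse', ('data', 'response'))

parsed = ParsedResponse(data={'ok': True}, response='<ClientResponse>')
data, response = parsed  # tuple unpacking also works
```

Because it is a tuple, callers can use attribute access, indexing, or unpacking interchangeably.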

exception threatq.core.lib.plugins.PluginExecutionError(phase: str, message: Optional[str] = None, exc: Optional[Exception] = None)

Bases: RuntimeError, threatq.core.lib.plugins.APIPluginResponse

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception threatq.core.lib.plugins.PluginNonexistentActionError(phase: str, message: Optional[str] = None, exc: Optional[Exception] = None)

Bases: threatq.core.lib.plugins.PluginExecutionError

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

threatq.core.lib.plugins.action(accepts=None, parameters=None, static_logo_file=None, help='')

Decorator used for defining a method as an action for a plugin.

This should be applied to the methods in the instance of the Plugin class that will be identified as being able to run as actions for that Operation.

Parameters
  • accepts (dict, optional) – A dictionary of Models and Sub-types that this action will be able to be performed against.

  • parameters (list, optional) – A list of specification dicts describing user-supplied execution parameters for the action.

  • static_logo_file (str, optional) – A string that represents the name of the file that is put into the static directory included with this operation’s package.

  • help (str, optional) – This is used to add some further context as to what this action will do.

Raises

TypeError – If the decorated method was not defined using the async flag.

Examples

Note that the key for accepts is an actual Model class, not a string.

>>> @action(
...     accepts={Indicator: ('FQDN', 'IP Address')},
...     static_logo_file='logo.png',
...     help="You can totally do this thing"
... )
... async def fun_action(self):
...     # Do the thing
...     return
threatq.core.lib.plugins.isaction(meth)

Is the method an action?

Parameters

meth (func) – The method that you want to check if it is an action.

Returns

True if the method is an action.

Reporting

Reporting Overview

The functionality of the Report Definition section of a Feed Definition YAML is encompassed by the following classes.

class threatq.dynamo.feeds.reporter.FeedThreatReporter(feed_run: threatq.dynamo.feeds.common.FeedRun, definition: Mapping, *args, **kwargs)

Entry class for threat object parsing and reporting. This class handles incoming data objects by pushing them into a FeedThreatReport object and forwarding each resulting object set down the pipeline.

Parameters
  • feed_run – the feed run object owning this reporter

  • definition – data structure defining parsing rules to extract threat objects from incoming pipeline data

class threatq.dynamo.feeds.reporter.FeedThreatReport(definition: Mapping, threat_data, **kwargs)

Generates a set of threat objects by parsing a specified data structure according to the rules set out in a report definition, including defined relationships.

Parameters
  • definition – Definition data specifying the parsing of threat objects from the data. The structure would normally include Jinja2 templates to be rendered as part of the parsing process.

  • threat_data – data being parsed. It will be included in Jinja2 template contexts as data.

async get_objects() → Set[threatq.core.models.base.ThreatObject]

Execute the parsing process, building out threat objects from the data, including their relationships, then return the entire set.

Returns

The resulting set of threat objects.

Sources

Source Overview

The following Sources are available for use within the source section of a CDF definition.

Base Source

class threatq.dynamo.feeds.sources.base.RequestRecorder(*args, logname: Optional[str] = None, **kwargs)

Bases: abc.ABC, threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.asphalt.ContextRefMixin

Base class for recording sources.

abstract async record_configurations(private_data: Optional[Any] = None, **configurations)

Writes the configuration data into a file.

Parameters
  • private_data

  • configurations

abstract async record_request(request_data: Optional[Any] = None, **request_data_items)

Writes the request data into a file.

Parameters
  • request_data

  • request_data_items

abstract async record_response(response_data: Optional[Any] = None, **response_data_items)

Writes the response data into a file.

Parameters
  • response_data

  • response_data_items

class threatq.dynamo.feeds.sources.base.FeedSource(*args, request_recorder: Optional[threatq.dynamo.feeds.sources.base.RequestRecorder] = None, new_task: Callable = <function ensure_future>, logname: Optional[str] = None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, **kwargs)

Bases: threatq.dynamo.feeds.base.DynamoFeedElement

Base class for retrieving feed data from a source, e.g. http, io, taxii.

Recorders

class threatq.dynamo.feeds.sources.base.NullRequestRecorder(*args, logname: Optional[str] = None, **kwargs)

Bases: threatq.dynamo.feeds.sources.base.RequestRecorder

A RequestRecorder that does not record requests or responses.

async record_configurations(private_data: Optional[Any] = None, **configurations)

Writes the configuration data into a file.

Parameters
  • private_data

  • configurations

async record_request(request_data: Optional[Any] = None, **request_data_items)

Writes the request data into a file.

Parameters
  • request_data

  • request_data_items

async record_response(response_data: Optional[Any] = None, **response_data_items)

Writes the response data into a file.

Parameters
  • response_data

  • response_data_items

class threatq.dynamo.feeds.sources.base.FileRequestRecorder(run_uuid, storage_directory, ctx: threatq.core.lib.asphalt.Context)

Bases: threatq.dynamo.feeds.sources.base.RequestRecorder

A RequestRecorder that is used for writing feed data to a file.

async record_configurations(private_data: Optional[Any] = None, **configurations)

Writes the configuration data into a file.

Parameters
  • private_data

  • configurations

async record_request(request_data: Optional[Any] = None, **request_data_items)

Writes the request data into a file.

Parameters
  • request_data

  • request_data_items

async record_response(response_data: Optional[Any] = None, **response_data_items)

Writes the response data into a file

Parameters
  • response_data

  • response_data_items

Returns

File

class threatq.dynamo.feeds.sources.io.IOSource(*args, request_recorder: Optional[threatq.dynamo.feeds.sources.base.RequestRecorder] = None, new_task: Callable = <function ensure_future>, logname: Optional[str] = None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, **kwargs)

Bases: threatq.dynamo.feeds.sources.base.FeedSource

A source that yields the contents of a file. Useful for testing and reproducing feed runs from a given file of response data.

Parameters
  • file (str | int | TextIOBase | LazyFile) – Filename, file descriptor, or file stream from which to read content

  • mode (str) – File opening mode. Defaults to reading text files ("r").

  • yield_chunk (bool) – Optional, defaults to False. Specifies whether the source should yield a “chunk” of text (based on the chunk_size parameter), or the full text.

  • chunk_size (int) – Optional, defaults to 1000. The number of lines to read into memory as a chunk of data.

  • record_responses (bool) – Optional, defaults to False. Whether the feed should record the file's data in the feed's activity log.

entry_points = ('file',)
fetch()

Fetch data from a file specified via file.

Returns

Async Generator yielding the file’s content

Return type

AsyncGenerator
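The yield_chunk and chunk_size behavior described above can be illustrated with a plain-Python sketch. This is a simplified stand-in for IOSource.fetch, not the actual implementation; the function name and the in-memory stream are assumptions for demonstration:

```python
import asyncio
import io

async def read_chunks(stream, yield_chunk=False, chunk_size=1000):
    """Simplified stand-in for IOSource.fetch: yields the whole text,
    or successive chunks of at most ``chunk_size`` lines."""
    if not yield_chunk:
        yield stream.read()
        return
    while True:
        # Pull up to chunk_size lines from the stream into memory.
        lines = [line for _, line in zip(range(chunk_size), stream)]
        if not lines:
            break
        yield "".join(lines)

async def main():
    data = "line1\nline2\nline3\nline4\n"
    return [c async for c in read_chunks(io.StringIO(data),
                                         yield_chunk=True, chunk_size=2)]

print(asyncio.run(main()))  # ['line1\nline2\n', 'line3\nline4\n']
```

With yield_chunk left at its default of False, the same generator yields the file's full text as a single value.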

class threatq.dynamo.feeds.sources.io.LineIOSource(*args, request_recorder: Optional[threatq.dynamo.feeds.sources.base.RequestRecorder] = None, new_task: Callable = <function ensure_future>, logname: Optional[str] = None, ctx: Optional[threatq.core.lib.asphalt.Context] = None, **kwargs)

Bases: threatq.dynamo.feeds.sources.io.IOSource

A source that yields the contents of a file line-by-line. Useful for testing and reproducing feed runs from a given file of response data.

Parameters
  • file (str | int | TextIOBase | LazyFile) – Filename, file descriptor, or file stream from which to read content

  • strip (bool) – If true (the default), lines will be stripped of leading and trailing whitespace.

  • mode (str) – File opening mode. Defaults to reading text files ("r").

entry_points = ('file-lines',)

HTTP

class threatq.dynamo.feeds.sources.http.HTTPFeedSource(*args, **kwargs)

Bases: threatq.dynamo.feeds.sources.base.FeedSource

HTTP Source Element used to make HTTP requests to various providers. Largely wraps the parameters available in request(). See http source for a detailed explanation of HTTPFeedSource’s usage within a Feed Definition.

Parameters
  • url (str) – URL to send requests to

  • method (str) – Optional, defaults to GET. Method to use when sending requests to url

  • base_url (str) – Optional. If specified, url will be treated as a relative path and appended to base_url in order to get the final target URL to send requests to. base_url is further passed to a HTTPPaginator if one is specified.

  • params (dict) – Optional. Dictionary of query string parameters. Supports multiple values for a single key by formatting the final URL as such: given {'key': [1,2,3]}, the resulting query string will be ?key=1&key=2&key=3.

  • data (dict) – Optional. Dictionary of body data to send with the request.

  • headers (dict) – Optional. Dictionary of header name and value pairs

  • request_content_type (str) – Optional. Denotes the request Content-Type header if specified.

  • response_content_type (str) – Optional. Necessary in some cases to help processing determine the content type of a response. Sometimes a provider supplies an incorrect or undesirable content type with its responses (e.g., an HTML content type instead of text/plain). If HTTPFeedSource cannot appropriately determine the response’s content type, the response is yielded as a Stream.

  • auth (dict) – Optional. Definition of an authentication Element

  • pagination (dict) – Optional. Definition of a HTTPPaginator

  • compress (bool) – Optional, defaults to None. If True, the request will be compressed with deflate encoding.

  • chunked (int) – Optional, defaults to None. Enables chunked transfer encoding

  • status_code_handlers (dict) – Optional. Can be used to define response status codes and how to handle them. Currently, one can only specify to ignore a response given its status code - effectively dropping the response.

  • expect100 (bool) – Optional, defaults to False. If True, aiohttp will expect the request to return a 100 Continue response from the server.

  • host_ca_certificate (str) – Optional, defaults to None. Specifies a base64 PEM encoded CA Certificate Bundle to verify the provider’s SSL certificate against. Applicable only to https URLs.

  • verify_host_ssl (bool) – Optional, defaults to True. Specifies whether the provider’s certificate and hostname are verified for each request. Applicable only to https URLs.

  • disable_proxies (bool) – Optional, defaults to False. Specifies whether configured proxies should be ignored when making HTTP requests.

  • total_timeout (int) – Optional, defaults to 119 seconds. Specifies the total timeout of the HTTP request, including connection establishment, request sending, and response reading.

Warning

The status_code_handlers attribute is not yet implemented.

entry_points = ('http',)
fetch()

Fetch data from the server specified via url.

Returns

Async Generator yielding response values returned by the HTTP request(s)

Return type

AsyncGenerator
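The multi-value params expansion documented above ({'key': [1,2,3]} becoming ?key=1&key=2&key=3) can be reproduced with the standard library. This is an illustrative sketch; HTTPFeedSource builds its query strings internally:

```python
from urllib.parse import urlencode

# A list value expands into repeated query keys, matching the
# documented behavior of the params option.
params = {"key": [1, 2, 3], "limit": 50}
query = urlencode(params, doseq=True)
print(query)  # key=1&key=2&key=3&limit=50
```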

TAXII

class threatq.dynamo.feeds.sources.taxii.TAXIIFeedSource(*args, **kwargs)

Bases: threatq.dynamo.feeds.sources.base.FeedSource

TAXII source Element used to make requests to a TAXII source utilizing TAXIIClient. See taxii source for a detailed explanation of TAXIIFeedSource’s usage within a Feed Definition.

Parameters
  • discovery_url (str) – URL of the TAXII server’s discovery service

  • collection_name (str) – Name of the collection to poll from the TAXII server

  • auth (dict) – Optional, defaults to None. Definition of a CabbyAuth or BasicAuth authentication.

  • disable_proxies (bool) – Optional, defaults to False. Specifies whether proxy use should be disabled when making TAXII requests.

  • headers (dict) – Optional, defaults to None. Dictionary of optional header key and value pairs to be sent with requests to the TAXII Server.

  • poll_url (str) – Optional, defaults to None. URL of the collection to poll on the TAXII server.

  • since (str | int | Arrow) – Optional, defaults to None. Start time to poll the TAXII server with.

  • until (str | int | Arrow) – Optional, defaults to None. End time to poll the TAXII server with.

  • verify_ssl (bool) – Optional, defaults to True. Specifies whether the TAXIIClient should attempt to verify the provider’s SSL certificate against public CA Certificate Bundles.

  • host_ca_certificate (str) – Optional, defaults to None. Specifies a CA Certificate Bundle to verify a provider’s SSL certificate against. If specified, verify_ssl is effectively ignored, and SSL verification is checked against this specified certificate rather than public CA Bundles. Currently only applies for TAXII Server versions 1.0 or 1.1.

  • version (str) – Optional, defaults to 1.1. Specifies the TAXII version of the server being polled. Available versions are 1.0, 1.1, and 2.0.

  • paginate (int) – Optional, defaults to 1000. Defines the number of objects per paginated request to a TAXII Server. Currently only applicable to TAXII 2.0 Servers.

client

TAXIIClient instance used to poll the provider

Type

TAXIIClient

entry_points = ('taxii',)
fetch()

Fetch data from the TAXII server specified via discovery_url.

Returns

Async Generator yielding response values returned by the TAXII Server requests

Return type

AsyncGenerator
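The since and until parameters accept several time representations (str | int | Arrow). A sketch of how such values might be normalized to aware UTC datetimes, using only the standard library; the function name and coercion rules are assumptions, and the real TAXIIFeedSource also accepts Arrow objects with its own parsing:

```python
from datetime import datetime, timezone

def normalize_poll_time(value):
    """Coerce a str or int time value to an aware UTC datetime.
    Illustrative only; naive strings are assumed to be UTC."""
    if isinstance(value, int):   # epoch seconds
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if isinstance(value, str):   # ISO-8601-style string
        return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)
    raise TypeError(f"unsupported time value: {value!r}")

print(normalize_poll_time(0))
print(normalize_poll_time("2023-01-01 00:00:00"))
```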

Threat Collection

class threatq.dynamo.feeds.sources.threat_collection.ThreatCollectionFeedSource(*args, ctx: Optional[threatq.core.lib.asphalt.Context] = None, **kwargs)

Bases: threatq.dynamo.feeds.sources.base.FeedSource

Threat Collection Source Element used to make an API query against the system’s ThreatQ Threat Library. One can provide either a well-formed API query (api_query) or a saved Threat Library search’s Data Collection hash (collection_hash). If neither are provided, all objects in the system’s ThreatQ Threat Library are returned (use with caution). One can also narrow down what object collection names to query against (object_collections) (all by default), which fields (object_fields_mapping) and contexts (object_contexts_mapping) are returned for a given object collection name (all by default), and whether returned threat objects were last touched in a given time range defined by since and until (no filtering by default). See Threat Collection Source for a detailed explanation of ThreatCollectionFeedSource’s usage within a Feed Definition.

Parameters
  • collection_hash (str) – Optional, defaults to None. A saved ThreatQ Threat Library search’s Data Collection hash. Collection hashes can be obtained from the ThreatQ API endpoint GET /search/query. A collection hash is resolved to an API query, which is then provided as the request payload for the set of ThreatQ API endpoints POST /{object_collection}/query. If collection_hash is provided, api_query cannot also be provided; these arguments are mutually exclusive. Only a single collection hash can be provided. If neither collection_hash nor api_query is provided, all objects in the system’s ThreatQ Threat Library are returned (use with caution).

  • api_query (str | dict) – Optional, defaults to None. A serialized (str) or deserialized (dict) API query. An example API query looks like: {"criteria": {"+or": [{"mentions": "mailer"}]}, "filters": {"+and": [{"+or": [{"type_name": "X-Mailer"}]}]}}. A provided API query does not need to have a matching Data Collection (saved search) entry in the system’s ThreatQ instance. The API query is provided as the request payload for the set of ThreatQ API endpoints POST /{object_collection}/query. If api_query is provided, collection_hash cannot also be provided; these arguments are mutually exclusive. Only a single API query can be provided. If neither collection_hash nor api_query is provided, all objects in the system’s ThreatQ Threat Library are returned (use with caution).

  • chunk_size (int) – Optional, defaults to 1000. The number of threat objects to return in each paginated response from the set of ThreatQ API endpoints POST /{object_collection}/query.

  • objects_per_run (int) – Optional, default is set in GET /configuration. Allows a specification of the number of objects permitted in a given Feed Run.

  • yield_chunk (bool) – Optional, defaults to False. Specifies whether the source should yield individual threat objects from each paginated chunk (default behavior) or whether the source should yield the entire list of threat objects from each paginated chunk (if provided as True). The latter may be useful in the future if there is a downstream filter that operates on bulk threat objects.

  • object_collections (list[str]) – Optional, defaults to None. A list of object collection names. This list narrows down which object collections the provided (api_query) or resolved (collection_hash) API query is executed against. For example, if one wishes to return all MD5 indicators, then one should provide a list with a single “indicators” element in it since it would not make sense to query against non-indicator object collections. Defaults to querying against all object collection endpoints. Valid object collection names can be obtained from the “collection” field for objects returned from the ThreatQ API endpoint GET /objects. Some examples of valid object collection names: indicators, adversaries, events, attachments, signatures, malware, course_of_action, tool, etc.

  • object_fields_mapping (dict[str, list[str]]) – Optional, defaults to None. A mapping of object collection names to a list of field names to include within threat objects returned from executing the query. By default, most fields are returned. Due to the volume of data that could be returned, it is recommended to provide only the fields that are needed for each object collection that is being queried against. Fields are typically scalar properties of a threat object. For example: value, type, happened_at, published_at, description, etc. Since fields may only be relevant depending on the given object collection, the list of fields must be keyed by an object collection name. By default, fields id and threatq_object_type (added by the source) are always returned.

  • object_contexts_mapping (dict[str, list[str]]) – Optional, defaults to None. A mapping of object collection names to a list of context names to include within threat objects returned from executing the query. By default, most contexts are returned. Due to the volume of data that could be returned, it is recommended to provide only the contexts that are needed for each object collection that is being queried against. Contexts are typically complex objects containing their own scalar fields. For example: sources, adversaries, attributes, etc. Note: adversaries are the only object relation context supported by the ThreatQ API today. Since contexts may only be relevant depending on the given object collection, the list of contexts must be keyed by an object collection name.

  • object_sort_mapping (dict[str, list[str]]) – Optional, defaults to None. A mapping of object collection names to a list of field names prefixed with + (ascending order) or - (descending order). The default sort order is by ascending IDs (+id). If multiple sort fields are provided, the sorting is applied for the first sort field, and if the sorting results in a conflict (for example, the objects have the same type name), the second sort field is applied, and so on. Since fields may only be relevant depending on the given object collection, the list of sort fields must be keyed by an object collection name.

  • since (str | int | Arrow) – Optional, defaults to None. A threat object’s last touched datetime must be greater than or equal to a provided value.

  • until (str | int | Arrow) – Optional, defaults to None. A threat object’s last touched datetime must be less than or equal to a provided value.

collection_hash

Optional, defaults to None. Stores the rendered collection_hash if it was provided as an argument.

Type

str

api_query

Optional, defaults to {"filters": [], "criteria": []}. Stores the rendered and deserialized api_query.

Type

dict

object_collections

Optional, defaults to an empty set. If empty at the time of initialization, the first call to fetch() will store the set of all object collection names in the system’s ThreatQ instance in this attribute.

Type

set[str]

object_fields_mapping

Optional, defaults to an empty dict. If object_fields_mapping was provided as an argument, the list of field names in each value of the dict is transformed into a CSV string.

Type

dict[str, str]

object_contexts_mapping

Optional, defaults to an empty dict. If object_contexts_mapping was provided as an argument, the list of context names in each value of the dict is transformed into a CSV string.

Type

dict[str, str]

object_sort_mapping

Optional, defaults to an empty dict. If object_sort_mapping was provided as an argument, the list of sort fields in each value of the dict is transformed into a CSV string.

Type

dict[str, str]

since

Optional, defaults to None. Stores the rendered timestamp in the format YYYY-MM-DD HH:mm:ss-00:00 if it was provided as an argument.

Type

str

until

Optional, defaults to None. Stores the rendered timestamp in the format YYYY-MM-DD HH:mm:ss-00:00 if it was provided as an argument.

Type

str

Raises

ValueError – If api_query is provided as a str that is not valid JSON or if api_query and collection_hash are both provided as arguments (they are mutually exclusive).

entry_points = ('threat-collection', 'data-collection')
fetch()

Fetches threat objects from the system’s ThreatQ Threat Library.

Yields

Union[dict, list] – If yield_chunk is True, yields the entire paginated chunk as a list of mappings representing threat objects. If yield_chunk is False, yields one mapping representing a threat object at-a-time from each paginated chunk.
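As documented for the object_fields_mapping, object_contexts_mapping, and object_sort_mapping attributes, each list of names is transformed into a CSV string at initialization. A minimal sketch of that transformation (the helper name is an assumption, not part of the actual source):

```python
def to_csv_mapping(mapping):
    """Turn {collection: [names, ...]} into {collection: "a,b,c"},
    mirroring the documented list-to-CSV transformation."""
    return {collection: ",".join(names)
            for collection, names in (mapping or {}).items()}

object_fields_mapping = {"indicators": ["id", "value", "type"]}
print(to_csv_mapping(object_fields_mapping))
# {'indicators': 'id,value,type'}
```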

Scoring

Scorer

class threatq.dynamo.scoring.IndicatorScorer(ctx: threatq.core.lib.asphalt.Context, logname: Optional[str] = None)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.asphalt.ContextRefMixin

Class used for scoring indicators and starting jobs.

Parameters
  • ctx (threatq.core.lib.asphalt.Context) – The context object where resources are shared

  • logname (str) – name of the log

__init__(ctx: threatq.core.lib.asphalt.Context, logname: Optional[str] = None)

Initialize self. See help(type(self)) for accurate signature.

async reload()

Reload score configuration and start mass scoring

Scoring Dictionaries

class threatq.dynamo.scoring.ScoringDict

Bases: dict

Dictionary for scoring values.

__contains__(key)

True if D has a key k, else False.

__delitem__(key)

Delete self[key].

__getitem__(key)

x.__getitem__(y) <==> x[y]

__setitem__(key, value)

Set self[key] to value.

__weakref__

list of weak references to the object (if defined)

setdefault(k[, d])

D.get(k, d); also set D[k] = d if k not in D

class threatq.dynamo.scoring._ScorableIndicators

Bases: dict

Dictionary for holding scorable indicators.

__weakref__

list of weak references to the object (if defined)

Scoring Classes

class threatq.dynamo.scoring.IndicatorScorerConfig(ctx: threatq.core.lib.asphalt.Context, logname: Optional[str] = None)

Bases: threatq.core.lib.logging.InstanceLoggingMixin, threatq.core.lib.asphalt.ContextRefMixin

Handles (re)loading score configuration from the threatquotient2 MariaDB database.

__init__(ctx: threatq.core.lib.asphalt.Context, logname: Optional[str] = None)

Initialize self. See help(type(self)) for accurate signature.

property is_ready

Indicates whether score configuration is loaded and that scoring jobs can run

class threatq.dynamo.scoring._IndicatorScoringJob(name: str, database_pool, score_config: threatq.dynamo.scoring.IndicatorScorerConfig, logger, indicator_ids: Optional[Iterable[int]] = None, progress_log_level: int = 9, score_batch_size: int = 1000)

Bases: object

Class used for creating a scoring indicator job.

Parameters
  • name (str) – name of the job

  • database_pool – pool from the threatquotient2 MariaDB database

  • score_config (IndicatorScorerConfig) – score configuration from the threatquotient2 MariaDB database.

  • logger – the logger for the job

  • indicator_ids – The IDs of the indicators

  • progress_log_level (int) – log level for the given logger

  • score_batch_size (int) – limit on the scores to be retrieved from the database

__init__(name: str, database_pool, score_config: threatq.dynamo.scoring.IndicatorScorerConfig, logger, indicator_ids: Optional[Iterable[int]] = None, progress_log_level: int = 9, score_batch_size: int = 1000)

Initialize self. See help(type(self)) for accurate signature.

__weakref__

list of weak references to the object (if defined)
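The score_batch_size parameter limits how many scores are retrieved from the database at once, which implies the job processes indicators in fixed-size batches. A generic batching helper of the kind such a job might use; this is an assumption for illustration, not the actual implementation:

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield successive lists of at most ``batch_size`` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

indicator_ids = range(1, 8)
print(list(batched(indicator_ids, 3)))  # [[1, 2, 3], [4, 5, 6], [7]]
```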

Scoring Components

class threatq.dynamo.scoring.ScoringComponent(*args, definition: Optional[Any] = None, **kwargs)

Bases: threatq.core.lib.asphalt.ContainerComponent

Class used to initialize the scoring components, e.g. IndicatorScorer, and uses the given context as shared resources.

Parameters

ctx (threatq.core.lib.asphalt.Context) – The context object where resources are shared

async start(ctx: threatq.core.lib.asphalt.Context)

Creates child components that have been configured but not yet created, then calls their start() methods in separate tasks and waits until they have completed.

Core

class threatq.dynamo.core.ApplicationComponent(**config)

Bases: threatq.core.lib.asphalt.ContainerComponent

Class used to initialize the application’s components, e.g. feed/message manager, and uses the given context as shared resources to run the components.

Parameters

ctx (threatq.core.lib.asphalt.Context) – The context object where resources are shared

__init__(**config)

Initialize self. See help(type(self)) for accurate signature.

async start(ctx: asphalt.core.context.Context)

Creates child components that have been configured but not yet created, then calls their start() methods in separate tasks and waits until they have completed.

CLI

threatq.dynamo.cli.setup_asyncio_debug()

Set up debug-level logging for asyncio.

threatq.dynamo.cli.setup_logging(log_level)

Set up logging for the CLI.

Parameters

log_level (int) – level of the log
