threatq:sync-export
This export command pulls all objects, object context, tags, and object links from the source ThreatQ installation and then stores them in CSV data dump files. You can specify which objects are pulled, based on a date or via configuration. All data pulled into the CSV data dump files can then be transferred to a target air gapped ThreatQ installation for validation and import. Each run of this command also generates a sync report with output logs for the run.
Upon upgrade to ThreatQ 6.x, the /var/lib/threatq/agds_transfer directory is created and becomes the default location for exporting and importing AGDS archive files. As such, AGDS commands only need to specify the relative path to the folders you created within this directory for AGDS exports or imports. Use the --target parameter to specify the location when exporting the AGDS archive, and the --file parameter to specify the location from which to import the .tar.gz file.
Parameters
The following table outlines the parameters for the command. All parameters for the threatq:sync-export command are optional. If you do not set any parameters, the system runs a default configuration as explained in threatq:sync-export Configuration.
Parameter | Explanation |
---|---|
--target | Requires a value. Target directory where the output file should be placed. Examples: --target=/my/directory, --target=relative/path |
--start-date | Requires a value. The start date for data selection. Example: --start-date="2018-01-01 00:00:00" |
--end-date | Requires a value. The end date for data selection. Applies only to objects, not object context or object links. Example: --end-date="2018-01-02 00:00:00" |
--include-deleted | Requires a value. Determines whether objects that have been soft-deleted are included in the result set. Options are Y(es) or N(o). Default: N. Example: --include-deleted=Y |
--include-investigations | Requires a value. Determines whether Investigations and Tasks are included in the result set. Options are Y(es) or N(o). Default: N. Example: --include-investigations=N |
--meta-only | Takes no value. If present, tells the command to include only meta data (no object data) in the result set. |
--memory-limit | Requires a value. Sets the PHP memory limit in megabytes or gigabytes. Default: 2G. Example: --memory-limit=4G |
--object-limit | Requires a value. Sets the limit on the number of objects selected at a time. ThreatQuotient recommends setting a limit smaller than the default (50,000) on installations with very large data sets. Default: 50,000. Example: --object-limit=10000 |
--ignore-file-types | Requires a value. A comma-delimited list of ThreatQ File Types for which physical files stored on the source ThreatQ installation should not be transferred to the target air gapped ThreatQ installation. Database records are still included in the export tarball. Examples: --ignore-file-types="Malware Analysis Report", --ignore-file-types="Malware Analysis Report,Malware Sample" |
--sources | Requires a value. Filters the objects in the sync by the sources they include, allowing you to send out a subset of data that contains a specific source. For objects with multiple sources, the other sources are included as long as the object contains the specified source(s). Multiple sources are supported. For existing cron runs, use the --initial-start-date option to avoid pulling all historical data. Example: --sources="Black Source" |
--include-all-relationships | Takes no value. Used with --sources: when an object's source matches the --sources value, the command exports the primary object's relationships to any object on the system, regardless of the sources of the related objects or the source that created the relationships. Example: --sources="Black Source" --include-all-relationships |
Examples
The following examples provide use cases for air gapped data sync.
No Time Limit, Default Configuration
This example pulls all objects in the system (with the exception of Investigations, Tasks, and soft-deleted Objects). The output appears in /tmp.
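A minimal invocation, using the ./artisan entry point shown in the Cron Job Configuration section and leaving every parameter at its default, is a sketch along these lines:

```
./artisan threatq:sync-export
```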
Meta Data Only
This example pulls only meta data objects from the system (Attributes, Sources, Object Statuses and Types, and so on).
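A sketch of the invocation, assuming the defaults for all other options:

```
./artisan threatq:sync-export --meta-only
```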
Time Limit
This example pulls objects whose updated_at or touched_at date occurs between the start and end dates.
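A sketch of the invocation, reusing the sample dates from the Parameters table:

```
./artisan threatq:sync-export --start-date="2018-01-01 00:00:00" --end-date="2018-01-02 00:00:00"
```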
Exclude Malware Files
This example pulls all objects, but excludes the physical files attached to any File objects with the type Malware Sample. The File objects themselves (as well as their context and relationships) are still included in the export tarball.
Any File Type can be used with this option, and multiple File Types can be included as a comma-delimited list.
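A sketch of the invocation excluding the physical files of Malware Sample File objects:

```
./artisan threatq:sync-export --ignore-file-types="Malware Sample"
```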
Cron Configuration
This example searches for a previous synchronization record with the same hash (comprised of the three options provided). If any hash matches are found, the run uses the started_at date of the most recent matching record as the start date for the current run.
If you do not require soft-deleted Objects, Investigations, or Tasks to be transferred to the target ThreatQ installation, then only the --target option is necessary (the other two options both default to No).
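A minimal cron-style invocation, specifying only the target directory (the path is illustrative):

```
./artisan threatq:sync-export --target=/my/directory
```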
Initial Cron for First Time Use
Determine what the cron configuration options should be:
- Target directory
- Investigations/tasks included (Y/N)
- Deleted objects included (Y/N)
The cron configuration options must be the same for every run, but they only need to be specified if different from the defaults.
Run the command with the cron configuration options:
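A sketch with all three cron configuration options stated explicitly (the target path and Y/N values are illustrative; they must stay the same on every run):

```
./artisan threatq:sync-export --target=/my/directory --include-deleted=N --include-investigations=N
```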
Cron Job Configuration
- Retrieve the DB password.
kubectl get secret mariadb-root --output go-template='{{range $k,$v := .data}}{{printf " %s: " $k}}{{if not $v}}{{$v}} {{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}' -n threatq
- Run the command interactively the first time, since you will be prompted for the DB password.
kubectl exec --namespace threatq --stdin --tty deployment/api-schedule-run -- ./artisan threatq:sync-export --target=test --include-deleted=Y --include-investigations=N
- Enter the DB password
- Set up cron for the non root user.
sudo crontab -u <non-root user that installed RKE2> -e
- Use the following command in the cronjob:
/var/lib/rancher/rke2/bin/kubectl exec --namespace threatq --stdin --tty deployment/api-schedule-run -- ./artisan threatq:sync-export --target=test --include-deleted=Y --include-investigations=N
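For reference, a complete crontab entry combines a schedule with the command above. The nightly 02:00 schedule below is an assumption for illustration; adjust it and the --target value for your environment:

```
0 2 * * * /var/lib/rancher/rke2/bin/kubectl exec --namespace threatq --stdin --tty deployment/api-schedule-run -- ./artisan threatq:sync-export --target=test --include-deleted=Y --include-investigations=N
```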
Instructions for Larger Data Sets (Starting from the Beginning of Time)
For larger data sets, it is undesirable to do a full run from the beginning of time (performance will suffer).
ThreatQuotient recommends that you use the --end-date option to specify an upper limit on the date range pulled. Multiple runs will be necessary to process all data up to the current date.
For each of the runs, provide the configuration options along with the --end-date option:
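Illustrative runs, each advancing the upper bound until the current date is reached (the dates and target path are hypothetical):

```
./artisan threatq:sync-export --target=/my/directory --end-date="2018-07-01 00:00:00"
./artisan threatq:sync-export --target=/my/directory --end-date="2019-01-01 00:00:00"
```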
Once the current date has been reached, the --end-date option is no longer necessary.
Instructions for Larger Data Sets (Starting from a Specified Date)
For larger data sets, it is undesirable to do a full run from the beginning of time (performance will suffer).
ThreatQuotient recommends that you use the --end-date option to specify an upper limit on the date range pulled. Multiple runs will be necessary to process all data up to the current date.
If only a subset of data needs to be processed up to the current date, then you should use the --initial-start-date option.
For the first run, provide the configuration options along with the --initial-start-date option.
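An illustrative first run (the date and target path are hypothetical):

```
./artisan threatq:sync-export --target=/my/directory --initial-start-date="2018-01-01 00:00:00"
```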
For each of the subsequent runs, provide the configuration options along with the --end-date option:
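An illustrative follow-up run (the date and target path are hypothetical):

```
./artisan threatq:sync-export --target=/my/directory --end-date="2018-07-01 00:00:00"
```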
Once the current date has been reached, the --end-date option is no longer necessary.
Run Scenarios
Export Success
When a run of this command completes successfully, a tarball of data appears in the target directory you specified (or /tmp by default). A report file describing the run is available in the data tarball, under the /sync directory. There is also a record in the database synchronizations table for the run.
Export Errors
If a run of this command fails before completion, the tarball is not created. There is a data directory in the target directory (where the data is stored before it is compressed) that contains all the data that was processed before the failure. The report file appears in this directory under /sync. Error messages do not appear in the report file. However, they appear in the Laravel log and in the console.
Regardless of whether the run was part of a cron configuration, it can simply be restarted. The cron configuration will look for the last completed run to find the next start date.
Dates
Start Date
A start date is applied to objects according to the date column available: touched_at or updated_at.
touched_at Objects
Adversaries, Attachments, Events, Indicators, Signatures, Custom Objects
updated_at Objects
Investigations, Tasks, Object Links, Tagged Objects
End Date
An end date is applied only if you provide one at run time. It is applied everywhere a start date is used.
Configuration
The configuration used for each run of this command consists of the --target, --include-deleted, and --include-investigations command line options and is stored in the config_json column of the Synchronization record. The hash column of each Synchronization record is an MD5 hash of the config_json column.
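To illustrate how the hash relates to config_json, the sketch below computes an MD5 over a hypothetical config_json payload. The exact JSON serialization ThreatQ stores may differ, so treat this as a conceptual example only:

```shell
# Hypothetical config_json payload; key names mirror the Default Configuration section.
config_json='{"target_directory":"/tmp","include_deleted":false,"include_investigations":false}'

# MD5 of the JSON string, analogous to the synchronizations.hash column.
printf '%s' "$config_json" | md5sum | awk '{print $1}'
```

Because the hash is derived from these three options, changing any one of them produces a different hash and therefore a separate cron lineage.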
Default Configuration
The default configuration is used if the command is run with no options provided:
- target_directory = /tmp
- include_deleted = false
- include_investigations = false
In this configuration, the initial run start date defaults to 1970-01-01 00:00:00.
Cron
If the command is run with the --target, --include-deleted, and --include-investigations parameters, the hash of these values is compared against the hash column of previous runs. Using these three options on every run allows the command to be incorporated into a scheduled task.
If any hash matches are found, the start date for the run is set to the started_at date in the Synchronization record of the previous run with the same hash.
If no hash matches are found, the start date is set to 1970-01-01 00:00:00.
Start Date Provided
If a start date is included in the command run using the --start-date option, any other options provided are honored. However, if the --target, --include-deleted, and --include-investigations options are also included, the Cron check against the hash of these three options does not occur. The start date provided is included in config_json as manual_start_date so that the run does not collide with any Cron-related runs.
If a "beginning of time" run is necessary, use the option as --start-date="1970-01-01 00:00:00".
Output and Sync Report
The following sections detail the data you may find in the export output and sync report.
Meta Data
Meta data is transferred with every run of this command by default. You can specify that only meta data (no object data) should be pulled in a run by using the --meta-only option.
Meta data includes information about Sources, Attributes, Tags, as well as Object Statuses and Types (both seeded and user-provided).
While meta data such as Connectors and Operations is included in this list, these items are not installed on the target ThreatQ installation as part of the air gapped data sync process. They are only placed in the requisite tables for use as Sources of transferred Objects. The same is true of any Users that are copied: they are not enabled Users on the target installation; they are transferred as disabled.
Meta Data Objects
- Attributes
- Clients
- Connectors
- Connector Categories
- Connector Definitions
- Content Types
- Groups
- Investigation Priorities
- <Object Type> Statuses
- <Object Type> Types
- Other Sources
- Operations
- Sources
- Tags
- TLP
- Users
Objects
This command covers any objects installed on the system by default, and any custom objects that have been installed by the user. The only objects that can be excluded are Investigations and Tasks (using the --include-investigations command line option).
Custom Objects that are installed on a source ThreatQ installation that have NOT been installed on a target ThreatQ installation will NOT be installed by the air gapped data sync process. If an object is included in the export data, but is not found on the target, it will be ignored.
Default Objects:
- Adversaries
- Attachments (Files)
- Events
- Indicators
- Signatures
- Campaigns
- Courses of Action
- Exploit Targets
- Incidents
- TTPs
- Tasks
- Assets
- Notes
- Attack Pattern
- Identity
- Intrusion Set
- Malware
- Report
- Tool
- Vulnerability
Storage:
The data for each object is copied to a dump file in CSV format using "SELECT * INTO OUTFILE..." MariaDB syntax. The full query for the data is built from the options you provided (start date, end date, and so on).
Dump files contain at most 50,000 objects each (a limit set in the Synchronization base class). Dump files are created (with a counter appended to the file name) until the entire object result set has been covered.
To ensure that any Objects present in Object Context (Attributes, Comments, and Sources), Object Links, Tagged Objects, or Investigation Timeline Objects are also included in the base Object data, CSV dump files for each Object type are also created from queries against each of these tables. This is necessary because of the differing date columns used in each query (an object may appear in an Object Link in the specified date range according to the Object Link's updated_at date, even though the Objects themselves saw no change to their touched_at date in that range). When the data from all of these object files is transferred to the target ThreatQ installation, any duplicates across dump files are consolidated. Files that contain Object data always include "_obj_" in the file title.
Sample Object File List (all of these files will contain Adversary records):
- adversaries/adversaries_obj_0.csv
- adversaries/adversaries_obj_attributes_0.csv
- adversaries/adversaries_obj_comments_0.csv
- adversaries/adversaries_obj_investigation_timelines_0.csv
- adversaries/adversaries_obj_object_links_dest_0.csv
- adversaries/adversaries_obj_object_links_src_0.csv
- adversaries/adversaries_obj_sources_0.csv
- adversaries/adversaries_obj_tags_0.csv
Object Context
The date range for queries on Object Context tables uses the updated_at date column, with the exception of Adversary Descriptions, which use the created_at date column.
Adversary Descriptions are handled as part of the Object Context gathering process. The adversary_descriptions table is queried using the created_at date column, and the entirety of the adversary_description_values table is pulled, as it does not have a date column.
Not all Objects have all Object Contexts (Attributes, Attribute Sources, Comments, and Sources). Tables are only polled if they exist.
Tables Covered for each Object Type:
- <object type>_attributes
- <object type>_attribute_sources
- <object type>_comments
- <object type>_sources
Sample Object Context File List (Indicator Object Type):
- indicators/indicator_attribute_sources_0.csv
- indicators/indicator_attributes_0.csv
- indicators/indicator_comments_0.csv
- indicators/indicator_sources_0.csv
Other Data
Attachment Files
Physical files for all attachments included in the date range are copied into the attachments/files directory of the data tarball.
Object Links
The date range for queries on Object Links uses the updated_at date column.
Tables Covered (Object Links and Object Link Context):
- object_links
- object_link_attributes
- object_link_attribute_sources
- object_link_comments
- object_link_sources
Sample Object Link File List:
- object_links/object_links_0.csv
- object_links/object_link_attributes_0.csv
- object_links/object_link_attribute_sources_0.csv
- object_links/object_link_comments_0.csv
- object_links/object_link_sources_0.csv
Tags
The date range for queries on Tagged Objects uses the updated_at date column.
Tables Covered (Tags themselves are covered in the Meta Data):
tagged_objects
Sample Tagged Objects File List:
tagged_objects/tagged_objects_0.csv
Spearphish
The date range for queries on Spearphish uses the updated_at date column.
Tables Covered:
spearphish
Sample Spearphish File List (Spearphish files are stored with Event data):
events/spearphish_0.csv
Investigations
The date range for queries on additional Investigation context tables uses the updated_at column.
Tables Covered:
- investigation_nodes
- investigation_node_properties
- investigation_timelines
- investigation_timeline_objects
- investigation_viewpoints
Sample Investigation additional context File List:
- investigations/investigation_node_properties_0.csv
- investigations/investigation_nodes_0.csv
- investigations/investigation_timeline_objects_0.csv
- investigations/investigation_timelines_0.csv
- investigations/investigation_viewpoints_0.csv
File Output
Data Tarball
Once all data has been processed, a tarball is created containing all output files. This tarball is placed in the directory specified in the --target option, or in the /tmp directory by default.
Tarball Naming Convention: tqSync_<run date>.tar.gz
Example
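Assuming a hypothetical run date, a tarball name under this convention could look like the following (the exact run-date format is an assumption for illustration):

```
tqSync_2018-01-01-12-00-00.tar.gz
```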
Sync Report
The output for each run is stored in a Sync Report output file, which is located in the sync directory of the data tarball. The file is always named sync-export.txt.
Command Line Output
Command line output displays command progress, object totals, and files written.
Synchronizations
Table
synchronizations
- id - The auto-incremented id for the Synchronization record
- type - The Synchronization direction (options are "export" or "import")
- started_at - The date and time the command run was started
- finished_at - The date and time the command run completed
- config_json - A JSON representation of the command run configuration
- report_json - A JSON representation of the command run parameters (command line options, object counts, files created, etc.)
- pid - The process id of the command run
- hash - Unique identifier for a command run (MD5 hash of the config_json column)
- created_at - The date and time the Synchronization record was created
- updated_at - The date and time the Synchronization record was updated
Record Handling
Hash
The Synchronization record hash column is automatically calculated as an MD5 of the config_json column on record creation.
Initial Creation
A Synchronization record is created at the beginning of a command run, right after all command line options have been processed. Initial creation covers only the type, started_at, pid, and config_json columns. For this command (threatq:sync-export), the type is "export". The command line option portion of the report_json is added as well, but this column is not complete until the record is finalized. The finished_at column remains NULL.
Finalization
A Synchronization record is finalized when the command run has completed. At this time, the finished_at column is filled with the completion datetime, and the report_json column is updated to include information about the run (object counts, files created, etc.).