threatq:sync-export
This export command pulls all objects, object context, tags, and object links from the source ThreatQ installation and then stores them in CSV data dump files. You can specify which objects are pulled, based on a date or via configuration. All data pulled into the CSV data dump files can then be transferred to a target air gapped ThreatQ installation for validation and import. Each run of this command also generates a sync report with output logs for the run.
Upon upgrade to ThreatQ 6.x, the /var/lib/threatq/agds_transfer directory is created and becomes the default location for exporting and importing AGDS archive files. As such, AGDS commands only need to specify the relative path to the folders you created within this directory for AGDS exports or imports. Use the --target parameter to specify the location when exporting the AGDS archive, and the --file parameter to specify the location from which to import the .tar.gz file.
Parameters
The following table outlines the parameters for the command. All parameters for the threatq:sync-export command are optional. If you do not set any parameters, the system runs a default configuration as explained in threatq:sync-export Configuration.
Parameter | Explanation |
---|---|
--target | Requires a value. Target directory where the output file should be placed. Examples: --target=/my/directory, --target=relative/path |
--start-date | Requires a value. The start date for data selection. Example: --start-date="2018-01-01 00:00:00" |
--end-date | Requires a value. The end date for data selection. Applies only to objects, not object context or object links. Example: --end-date="2018-01-02 00:00:00" |
--include-deleted | Requires a value. Determines whether objects that have been soft-deleted are included in the result set. Options are Y(es) or N(o). Default: N. Example: --include-deleted=Y |
--include-investigations | Requires a value. Determines whether Investigations and Tasks are included in the result set. Options are Y(es) or N(o). Default: N. Example: --include-investigations=N |
--meta-only | Takes no value. If present, tells the command to include only meta data (no object data) in the result set. |
--memory-limit | Requires a value. Sets the PHP memory limit in megabytes or gigabytes. Default: 2G. Example: --memory-limit=4G |
--object-limit | Requires a value. Sets the limit on the number of objects selected at a time. ThreatQuotient recommends setting a limit smaller than the default (50,000) on installations with very large data sets. Default: 50,000. Example: --object-limit=10000 |
--ignore-file-types | Requires a value. A comma-delimited list of ThreatQ File Types for which physical files stored on the source ThreatQ installation should not be transferred to the target air gapped ThreatQ installation. Database records are still included in the export tarball. Examples: --ignore-file-types="Malware Analysis Report", --ignore-file-types="Malware Analysis Report,Malware Sample" |
--sources | Requires a value. Filters the objects in the sync by the sources they include, allowing you to send out a subset of data that contains a specific source. For objects with multiple sources, the other sources are included as long as the object contains the specified source(s). Multiple sources are supported. For existing cron runs, use the --initial-start-date option to avoid pulling all historical data. Example: --sources="Black Source" |
--include-all-relationships | Takes no value. Used with --sources: when an object's source matches the --sources value, the command exports the primary object's relationships to any object on the system, regardless of the sources of the related objects or the source that created the relationships. Example: --sources="Black Source" --include-all-relationships |
Examples
The following examples provide use cases for air gapped data sync.
No Time Limit, Default Configuration
This example pulls all objects in the system (with the exception of Investigations, Tasks, and soft-deleted Objects). The output appears in /tmp.
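A minimal invocation, using the ./artisan entry point shown in the Cron Job Configuration section and leaving every parameter at its default, is a sketch along these lines:

```
./artisan threatq:sync-export
```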
Meta Data Only
This example pulls only meta data objects from the system (Attributes, Sources, Object Statuses and Types, and so on).
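A sketch of the invocation, assuming the defaults for all other options:

```
./artisan threatq:sync-export --meta-only
```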
Time Limit
This example pulls objects whose updated_at or touched_at date occurs between the start and end dates.
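A sketch of the invocation, reusing the sample dates from the Parameters table:

```
./artisan threatq:sync-export --start-date="2018-01-01 00:00:00" --end-date="2018-01-02 00:00:00"
```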
Exclude Malware Files
This example pulls all objects, but excludes the physical files attached to any File objects with the type Malware Sample. The File objects themselves (as well as their context and relationships) are still included in the export tarball.
Any File Type can be used with this option, and multiple File Types can be included as a comma-delimited list.
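A sketch of the invocation excluding the physical files of Malware Sample File objects:

```
./artisan threatq:sync-export --ignore-file-types="Malware Sample"
```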
Cron Configuration
This example searches for a previous synchronization record with the same hash (comprised of the three options provided). If any hash matches are found, the run uses the started_at date of the most recent matching record as the start date for the current run.
If you do not require soft-deleted Objects, Investigations, or Tasks to be transferred to the target ThreatQ installation, then only the --target option is necessary (the other two options both default to No).
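A minimal cron-style invocation, specifying only the target directory (the path is illustrative):

```
./artisan threatq:sync-export --target=/my/directory
```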
Initial Cron for First Time Use
Determine what the cron configuration options should be:
- Target directory
- Investigations/tasks included (Y/N)
- Deleted objects included (Y/N)
The cron configuration options must be the same for every run, but they only need to be specified if different from the defaults.
Run the command with the cron configuration options:
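A sketch with all three cron configuration options stated explicitly (the target path and Y/N values are illustrative; they must stay the same on every run):

```
./artisan threatq:sync-export --target=/my/directory --include-deleted=N --include-investigations=N
```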
Cron Job Configuration
- Retrieve the DB password.
kubectl get secret mariadb-root --output go-template='{{range $k,$v := .data}}{{printf " %s: " $k}}{{if not $v}}{{$v}} {{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}' -n threatq
- Run the command interactively the first time, since you will be prompted for the DB password.
kubectl exec --namespace threatq --stdin --tty deployment/api-schedule-run -- ./artisan threatq:sync-export --target=test --include-deleted=Y --include-investigations=N
- Enter the DB password
- Set up cron for the non root user.
sudo crontab -u <non-root user that installed RKE2> -e
- Use the following command in the cronjob:
/var/lib/rancher/rke2/bin/kubectl exec --namespace threatq --stdin --tty deployment/api-schedule-run -- ./artisan threatq:sync-export --target=test --include-deleted=Y --include-investigations=N
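For reference, a complete crontab entry combines a schedule with the command above. The nightly 02:00 schedule below is an assumption for illustration; adjust it and the --target value for your environment:

```
0 2 * * * /var/lib/rancher/rke2/bin/kubectl exec --namespace threatq --stdin --tty deployment/api-schedule-run -- ./artisan threatq:sync-export --target=test --include-deleted=Y --include-investigations=N
```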
Instructions for Larger Data Sets (Starting from the Beginning of Time)
For larger data sets, it is undesirable to do a full run from the beginning of time (performance will suffer).
ThreatQuotient recommends that you use the --end-date option to specify an upper limit on the date range pulled. Multiple runs will be necessary to process all data up to the current date.
For each of the runs, provide the configuration options along with the --end-date option:
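Illustrative runs, each advancing the upper bound until the current date is reached (the dates and target path are hypothetical):

```
./artisan threatq:sync-export --target=/my/directory --end-date="2018-07-01 00:00:00"
./artisan threatq:sync-export --target=/my/directory --end-date="2019-01-01 00:00:00"
```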
Once the current date has been reached, the --end-date option is no longer necessary.
Instructions for Larger Data Sets (Starting from a Specified Date)
For larger data sets, it is undesirable to do a full run from the beginning of time (performance will suffer).
ThreatQuotient recommends that you use the --end-date option to specify an upper limit on the date range pulled. Multiple runs will be necessary to process all data up to the current date.
If only a subset of data needs to be processed up to the current date, then you should use the --initial-start-date option.
For the first run, provide the configuration options along with the --initial-start-date option.
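An illustrative first run (the date and target path are hypothetical):

```
./artisan threatq:sync-export --target=/my/directory --initial-start-date="2018-01-01 00:00:00"
```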
For each of the subsequent runs, provide the configuration options along with the --end-date option:
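An illustrative follow-up run (the date and target path are hypothetical):

```
./artisan threatq:sync-export --target=/my/directory --end-date="2018-07-01 00:00:00"
```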
Once the current date has been reached, the --end-date option is no longer necessary.
Run Scenarios
Export Success
When a run of this command completes successfully, a tarball of data appears in the target directory you specified (or /tmp by default). A report file describing the run is available in the data tarball, under the /sync directory. There is also a record in the database synchronizations table for the run.
Export Errors
If a run of this command fails before completion, the tarball is not created. There is a data directory in the target directory (where the data is stored before it is compressed) that contains all the data that was processed before the failure. The report file appears in this directory under /sync. Error messages do not appear in the report file. However, they appear in the Laravel log and in the console.
Regardless of whether the run was part of a cron configuration, it can simply be restarted. The cron configuration will look for the last completed run to find the next start date.
Dates
Start Date
A start date is applied to objects according to the date column available: touched_at or updated_at.
touched_at Objects
Adversaries, Attachments, Events, Indicators, Signatures, Custom Objects
updated_at Objects
Investigations, Tasks, Object Links, Tagged Objects
End Date
An end date is applied only if you provide one at run time. It is applied everywhere a start date is used.
Configuration
The configuration used for each run of this command consists of the --target, --include-deleted, and --include-investigations command line options and is stored in the config_json column of the Synchronization record. The hash column of each Synchronization record is an MD5 hash of the config_json column.
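To illustrate how the hash relates to config_json, the sketch below computes an MD5 over a hypothetical config_json payload. The exact JSON serialization ThreatQ stores may differ, so treat this as a conceptual example only:

```shell
# Hypothetical config_json payload; key names mirror the Default Configuration section.
config_json='{"target_directory":"/tmp","include_deleted":false,"include_investigations":false}'

# MD5 of the JSON string, analogous to the synchronizations.hash column.
printf '%s' "$config_json" | md5sum | awk '{print $1}'
```

Because the hash is derived from these three options, changing any one of them produces a different hash and therefore a separate cron lineage.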
Default Configuration
The default configuration is used if the command is run with no options provided:
- target_directory = /tmp
- include_deleted = false
- include_investigations = false
In this configuration, the initial run start date defaults to 1970-01-01 00:00:00.
Cron
If the command is run with the --target, --include-deleted, and --include-investigations parameters, the hash of these values is compared against the hash column of previous runs. Using these three options on every run allows the command to be incorporated into a scheduled task.
If any hash matches are found, the start date for the run is set to the started_at date in the Synchronization record of the previous run with the same hash.
If no hash matches are found, the start date is set to 1970-01-01 00:00:00.
Start Date Provided
If a start date is included in the command run using the --start-date option, any other options provided are honored. However, if the --target, --include-deleted, and --include-investigations options are also included, the Cron check against the hash of these three options does not occur. The start date provided is included in config_json as manual_start_date so that the run does not collide with any Cron-related runs.
If a "beginning of time" run is necessary, use the option as --start-date="1970-01-01 00:00:00".
Output and Sync Report
The following sections detail the data you may find in the export output and sync report.
Meta Data
Meta data is transferred with every run of this command by default. You can specify that only meta data (no object data) should be pulled in a run by using the --meta-only option.
Meta data includes information about Sources, Attributes, Tags, as well as Object Statuses and Types (both seeded and user-provided).
While meta data such as Connectors and Operations is included in this list, these items are not installed on the target ThreatQ installation as part of the air gapped data sync process. They are only placed in the requisite tables for use as Sources of transferred Objects. The same is true of any Users that are copied: they are not enabled Users on the target installation; they are transferred as disabled.
Meta Data Objects
- Attributes
- Clients
- Connectors
- Connector Categories
- Connector Definitions
- Content Types
- Groups
- Investigation Priorities
- <Object Type> Statuses
- <Object Type> Types
- Other Sources
- Operations
- Sources
- Tags
- TLP
- Users
Objects
This command covers any objects installed on the system by default, and any custom objects that have been installed by the user. The only objects that can be excluded are Investigations and Tasks (using the --include-investigations command line option).
Custom Objects that are installed on a source ThreatQ installation that have NOT been installed on a target ThreatQ installation will NOT be installed by the air gapped data sync process. If an object is included in the export data, but is not found on the target, it will be ignored.
Default Objects:
- Adversaries
- Attachments (Files)
- Events
- Indicators
- Signatures
- Campaigns
- Courses of Action
- Exploit Targets
- Incidents
- TTPs
- Tasks
- Assets
- Notes
- Attack Pattern
- Identity
- Intrusion Set
- Malware
- Report
- Tool
- Vulnerability
Storage:
The data for each object is copied to a dump file in CSV format using "SELECT * INTO OUTFILE..." MariaDB syntax. The full query for the data is built from the options you provided (start date, end date, and so on).
Dump files contain at most 50,000 objects each (a limit set in the Synchronization base class). Dump files are created (with a counter appended to the file name) until the entire object result set has been covered.
To ensure that any Objects present in Object Context (Attributes, Comments, and Sources), Object Links, Tagged Objects, or Investigation Timeline Objects are also included in the base Object data, CSV dump files for each Object type are also created from queries against each of these tables. This is necessary because of the differing date columns used in each query (an object may appear in an Object Link in the specified date range according to the Object Link's updated_at date, even though the Objects themselves saw no change to their touched_at date in that range). When the data from all of these object files is transferred to the target ThreatQ installation, any duplicates across dump files are consolidated. Files that contain Object data always include "_obj_" in the file title.
Sample Object File List (all of these files will contain Adversary records):
- adversaries/adversaries_obj_0.csv
- adversaries/adversaries_obj_attributes_0.csv
- adversaries/adversaries_obj_comments_0.csv
- adversaries/adversaries_obj_investigation_timelines_0.csv
- adversaries/adversaries_obj_object_links_dest_0.csv
- adversaries/adversaries_obj_object_links_src_0.csv
- adversaries/adversaries_obj_sources_0.csv
- adversaries/adversaries_obj_tags_0.csv
Object Context
The date range for queries on Object Context tables uses the updated_at date column, with the exception of Adversary Descriptions, which use the created_at date column.
Adversary Descriptions are handled as part of the Object Context gathering process. The adversary_descriptions table is queried using the created_at date column, and the entirety of the adversary_description_values table is pulled, as it does not have a date column.
Not all Objects have all Object Contexts (Attributes, Attribute Sources, Comments, and Sources). Tables are only polled if they exist.
Tables Covered for each Object Type:
- <object type>_attributes
- <object type>_attribute_sources
- <object type>_comments
- <object type>_sources
Sample Object Context File List (Indicator Object Type):
- indicators/indicator_attribute_sources_0.csv
- indicators/indicator_attributes_0.csv
- indicators/indicator_comments_0.csv
- indicators/indicator_sources_0.csv
Other Data
Attachment Files
Physical files for all attachments included in the date range are copied into the attachments/files directory of the data tarball.
Object Links
The date range for queries on Object Links uses the updated_at date column.
Tables Covered (Object Links and Object Link Context):
- object_links
- object_link_attributes
- object_link_attribute_sources
- object_link_comments
- object_link_sources
Sample Object Link File List:
- object_links/object_links_0.csv
- object_links/object_link_attributes_0.csv
- object_links/object_link_attribute_sources_0.csv
- object_links/object_link_comments_0.csv
- object_links/object_link_sources_0.csv
Tags
The date range for queries on Tagged Objects uses the updated_at date column.
Tables Covered (Tags themselves are covered in the Meta Data):
tagged_objects
Sample Tagged Objects File List:
tagged_objects/tagged_objects_0.csv
Spearphish
The date range for queries on Spearphish uses the updated_at date column.
Tables Covered:
spearphish
Sample Spearphish File List (Spearphish files are stored with Event data):
events/spearphish_0.csv
Investigations
The date range for queries on additional Investigation context tables uses the updated_at column.
Tables Covered:
- investigation_nodes
- investigation_node_properties
- investigation_timelines
- investigation_timeline_objects
- investigation_viewpoints
Sample Investigation additional context File List:
- investigations/investigation_node_properties_0.csv
- investigations/investigation_nodes_0.csv
- investigations/investigation_timeline_objects_0.csv
- investigations/investigation_timelines_0.csv
- investigations/investigation_viewpoints_0.csv
File Output
Data Tarball
Once all data has been processed, a tarball is created containing all output files. This tarball is placed in the directory specified in the --target option, or in the /tmp directory by default.
Tarball Naming Convention: tqSync_<run date>.tar.gz
Example
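Assuming a hypothetical run date, a tarball name under this convention could look like the following (the exact run-date format is an assumption for illustration):

```
tqSync_2018-01-01-12-00-00.tar.gz
```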
Sync Report
The output for each run is stored in a Sync Report output file, which is located in the sync directory of the data tarball. The file is always named sync-export.txt.
Command Line Output
Command line output displays command progress, object totals, and files written.
Synchronizations
Table
synchronizations
- id - The auto-incremented id for the Synchronization record
- type - The Synchronization direction (options are "export" or "import")
- started_at - The date and time the command run was started
- finished_at - The date and time the command run completed
- config_json - A JSON representation of the command run configuration
- report_json - A JSON representation of the command run parameters (command line options, object counts, files created, etc.)
- pid - The process id of the command run
- hash - Unique identifier for a command run (MD5 hash of the config_json column)
- created_at - The date and time the Synchronization record was created
- updated_at - The date and time the Synchronization record was updated
Record Handling
Hash
The Synchronization record hash column is automatically calculated as an MD5 of the config_json column on record creation.
Initial Creation
A Synchronization record is created at the beginning of a command run, right after all command line options have been processed. Initial creation covers only the type, started_at, pid, and config_json columns. For this command (threatq:sync-export), the type is "export". The command line option portion of the report_json is added as well, but this column is not complete until the record is finalized. The finished_at column remains NULL.
Finalization
A Synchronization record is finalized when the command run has completed. At this time, the finished_at column is filled with the completion datetime, and the report_json column is updated to include information about the run (object counts, files created, etc.).