threatq:sync-import
This import command processes the tarball of object data created by the threatq:sync-export
command. Temporary sync tables are created on the target to house this object data, and integrity checks are run against existing data to verify IDs and check for duplicate objects. Duplicate objects from the source ThreatQ installation are updated, and new objects are inserted. The temporary sync tables are dropped when data processing is complete. Each run of this command also generates a sync report with output logs for the run.
Upon upgrade to ThreatQ 6x, the /var/lib/threatq/agds_transfer directory is created and becomes the default location for exporting and importing AGDS zip files. As such, AGDS commands only need to specify the relative path to the folders you created within this directory for AGDS exports or imports. Then, use the --target
parameter to specify the location when exporting the AGDS zip file and the --file
parameter to specify the location from which to import the .gz file.
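The relationship between the default transfer directory and the relative value passed to --file can be sketched as follows. This is an illustration only: the imports subfolder and the tarball name are example values, and the variable names are hypothetical.

```shell
# Default AGDS transfer directory on ThreatQ 6x (from the text above).
AGDS_DIR=/var/lib/threatq/agds_transfer
# Example subfolder and tarball name, used only for illustration.
SUBDIR=imports
TARBALL=tqSync-19-01-16-1547660837-8345.tar.gz

# Absolute location where the tarball is expected to sit on disk.
ABSOLUTE_PATH="${AGDS_DIR}/${SUBDIR}/${TARBALL}"
# Relative value passed to the import command.
FILE_OPTION="--file=${SUBDIR}/${TARBALL}"

echo "tarball on disk: ${ABSOLUTE_PATH}"
echo "import option:   ${FILE_OPTION}"
```

Only the part of the path below the transfer directory appears in the option value.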
The import process moves data from the source instance to the target instance based on changes to the object's updated_at
date. It does not explicitly remove data from the target instance.
Parameters
The following table outlines the parameters for the command. With the exception of --file,
all parameters for the threatq:sync-import
command will use the default value unless otherwise defined by the user.
Parameter | Explanation |
---|---|
--file | Required value. File path to the tarball created by the threatq:sync-export command. Examples: --file=/tmp/tqSync-19-01-16-1547660837-8345.tar.gz or --file=imports/tqSync-19-01-16-1547660837-8345.tar.gz |
--keep-created-at | Determines whether the oldest created_at date between the source and target ThreatQ installations is maintained, or a new created_at date is set on the target system. If this option is not provided, the oldest created_at date is maintained. Options are Y(es) or N(o). Default: Y Example: --keep-created-at=N |
--object-limit | Integer value used as the limit for the number of objects updated or inserted at a time. When using this option, take into account the size of the data sets on both the source and target ThreatQ installations. Setting the limit too high may hinder performance. Default: 1000 Example: --object-limit=50000 |
--memory-limit | Sets the PHP memory limit in megabytes or gigabytes. Default: 2G Example: --memory-limit=4G |
--override-description | Determines whether the descriptions on existing objects on the target ThreatQ installation are updated. If an existing object has a NULL description, it is updated regardless of this flag. Default: Y Example: --override-description=N |
Examples
Basic Run
This example processes all the data in the tarball provided in the --file
option, using an object limit of 1000 for all inserts and updates. If an object is already present on the target ThreatQ installation, its created_at
date is updated on the target only if the created_at
date from the source is older than the current one. Newly inserted objects keep the created_at
date of the source ThreatQ installation.
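A basic run might look like the following sketch. It assumes the command is invoked through artisan from /var/www/api, in the same way as the threatq:fill-sync-hash-column command in the Initial Setup section. The tarball path is the example value from the parameter table.

```shell
# Assumption: run from /var/www/api on the target installation.
# Only --file is supplied, so every other option uses its default value.
sudo ./artisan threatq:sync-import --file=/tmp/tqSync-19-01-16-1547660837-8345.tar.gz
```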
Set New created_at Dates on the Write System
This example processes all the data in the tarball provided in the --file
option using an object limit of 1000 for all inserts and updates. The created_at
date of all transferred objects is left alone in the case of object updates, and changed to the current date in the case of new object inserts.
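A sketch of this run, under the same assumption that the command is invoked through artisan from /var/www/api as in the Initial Setup section. The tarball path is an example value.

```shell
# Assumption: run from /var/www/api on the target installation.
# --keep-created-at=N overrides the default of Y.
sudo ./artisan threatq:sync-import --file=/tmp/tqSync-19-01-16-1547660837-8345.tar.gz --keep-created-at=N
```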
Increase the Object Limit
This example processes all the data in the tarball provided in the --file
option using an object limit of 50000 for all inserts and updates. The --keep-created-at
option has been left out, so it uses the default setting of Y(es) and created_at
dates are retained from the source system.
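A sketch of this run, again assuming the command is invoked through artisan from /var/www/api as in the Initial Setup section. The tarball path is an example value.

```shell
# Assumption: run from /var/www/api on the target installation.
# --object-limit raises the batch size from the default of 1000.
sudo ./artisan threatq:sync-import --file=/tmp/tqSync-19-01-16-1547660837-8345.tar.gz --object-limit=50000
```

As the parameter table notes, a limit this high may hinder performance depending on the size of the data sets.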
Initial Setup
You must run the threatq:fill-sync-hash-column
command before running the threatq:sync-import
command on an air gapped ThreatQ installation. The threatq:fill-sync-hash-column command prepares the database of an air gapped installation to run the threatq:sync-import
command. Upon upgrade to ThreatQ version 4.17 or later, several tables include a sync_hash column, which stores an MD5 hash of the unique fields for records in each table. The threatq:fill-sync-hash-column command fills in the data in this column before you attempt an air gapped data sync import. Data added after the upgrade automatically has its sync_hash column populated on insert and update, so it is only necessary to run this command once.
The threatq:sync-import
command checks for any NULL values in the sync_hash column in the events, indicators, and object_links tables before importing any data, and will fail if any NULL values are found. If the
threatq:fill-sync-hash-column
command has not been run and NULL sync_hash values are found on the indicators, events, or object_links tables, the import fails and asks you to run the command to fill that column before continuing.
On installations where artisan runs directly on the host:
- SSH to your target ThreatQ installation.
- Change directories to /var/www/api.
- Put the ThreatQ platform into maintenance mode:
php artisan down
- Run the following command:
sudo ./artisan threatq:fill-sync-hash-column
- Run
php artisan up
to bring ThreatQ out of maintenance mode.
On Kubernetes-based installations, where artisan runs through kubectl:
- SSH to your target ThreatQ installation.
- Put the ThreatQ platform into maintenance mode:
kubectl exec --namespace threatq --stdin --tty deployment/api-schedule-run -- ./artisan down
- Run the following command:
kubectl exec --namespace threatq --stdin --tty deployment/api-schedule-run -- ./artisan threatq:fill-sync-hash-column
- Bring ThreatQ out of maintenance mode:
kubectl exec --namespace threatq --stdin --tty deployment/api-schedule-run -- ./artisan up
Run Scenarios
Import Success
When a run of this command completes successfully, a report appears in the directory the command was run in. There is also a record in the database synchronizations table for the run. Both of these contain data describing performance metrics and object counts.
Excluded Files
If the --ignore-file-types
option was used during creation of the export tarball, then the physical files associated with File objects that have the File Types specified in that option are not available during the import of those objects. If the import command detects that a file is missing from the export tarball, it creates a placeholder file under the same file path as was set on the source ThreatQ installation (this is defined in the path field of the File). This placeholder file is a simple text file containing the phrase "File excluded from export." Because the original physical file associated with the File object has been replaced, it is no longer possible to open the physical file on the Details page for that File object.
Import Errors
If a run of this command fails before completion, error messages do not appear in the report file, though they do appear in the laravel log and in the console. There is currently no way to resume the command from where it left off; it must be restarted and will run through all the data again. Any data from the tarball that was written during the previous failed run is updated (rather than inserted again), so the end result is the same: all data is transferred from the tarball to the target system.
Data Processing
Data found in CSV dump files for a table from the tarball provided in the --file option
is inserted into a corresponding sync table. A sync table is a copy of a base table, with column structure maintained but indexes excluded. Because indexes slow the insertion process down, indexes are added to unique columns on sync tables (which are later used in table joins and where clauses) only once data insertion from dump files is complete.
The naming convention for a sync table is sync_import_<base table name>_<process id>.
Base table: adversaries
Sync table: sync_import_adversaries_12345
All sync tables are removed from the target ThreatQ installation's database once data processing is complete.
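The naming convention can be illustrated with a short sketch. The sync_table_name helper is hypothetical and only mirrors the pattern described above. It is not part of the ThreatQ command.

```shell
# Hypothetical helper: build a sync table name from a base table name
# and the process ID of the command run.
sync_table_name() {
  local base_table="$1" pid="$2"
  printf 'sync_import_%s_%s\n' "$base_table" "$pid"
}

SYNC_TABLE="$(sync_table_name adversaries 12345)"
echo "$SYNC_TABLE"   # prints: sync_import_adversaries_12345
```

Because the process ID is part of the name, tables from different command runs do not share a name.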
Basic Table
A basic table has no foreign keys pointing to other tables in the database. It has a single identifier (id) column for each record. Once all the data stored in the tarball for a basic table has been transferred to a sync table, the sync table has an existing_id
column added with a default value of NULL for each record. This column is used to determine whether the record already exists on the target ThreatQ installation. The ID for the record on the target system may be different from that of the record from the source ThreatQ installation, so this existing_id
column ensures that data integrity is maintained between the two.
Sample Basic Table:
attachment_types - (id, name, is_parsable, parser_class, created_at, updated_at, deleted_at)
Sample Sync Table created from Basic Table:
sync_import_attachment_types_12345 - (existing_id, id, name, is_parsable, parser_class, created_at, updated_at, deleted_at)
Tables with Pivots
A table with pivots has one or more foreign keys pointing to other tables in the database. Once all the data stored in the tarball for a table with pivots has been transferred to a sync table, the sync table has an existing_<pivot>_id
column added for each foreign key column, as well as an existing_id
column for the record itself (all set to a default value of NULL).
File Output
threatq:sync-import File Output and Sync Report
Once all data has been processed, a Sync Report is generated. This file is named after the tarball used in the run, with the .tar.gz extension replaced by "-sync-import.txt".
Tarball used: tqSync-19-01-16-1547660837-8345.tar.gz
Sync Report name: tqSync-19-01-16-1547660837-8345-sync-import.txt
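To predict the report file name for a given tarball, the naming rule can be sketched with shell parameter expansion. This mirrors the example above and is an assumption about the pattern, not the command's actual implementation. The variable names are hypothetical.

```shell
# Example tarball path, as passed to --file.
TARBALL=/tmp/tqSync-19-01-16-1547660837-8345.tar.gz

# Drop the directory part, then swap the .tar.gz extension for the report suffix.
TARBALL_NAME="${TARBALL##*/}"
REPORT_NAME="${TARBALL_NAME%.tar.gz}-sync-import.txt"

echo "$REPORT_NAME"   # prints: tqSync-19-01-16-1547660837-8345-sync-import.txt
```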
threatq:sync-import Command Line Output
Command line output displays command progress and object totals. It is similar to the output in the Sync Report.
Synchronizations
Column | Description |
---|---|
id | The auto-incremented ID for the Synchronization record. |
type | The Synchronization direction. Options are export or import. |
started_at | The date and time the command run was started. |
finished_at | The date and time the command run completed. |
config_json | A JSON representation of the command run configuration. |
report_json | A JSON representation of the command run parameters (command line options, object counts, tables created, etc.). |
pid | The process ID of the command run. |
hash | Unique identifier for a command run (MD5 hash of the config_json column). |
created_at | The date and time the Synchronization record was created. |
updated_at | The date and time the Synchronization record was updated. |
Record Handling
Hash
The Synchronization record hash column is automatically calculated as an MD5 of the config_json
column on record creation.
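The idea of hashing the configuration can be sketched as follows. The config_json value here is hypothetical, since this document does not show the real column content. Whether ThreatQ hashes exactly this serialized string is also an assumption.

```shell
# Hypothetical config_json value, for illustration only.
CONFIG_JSON='{"file":"/tmp/tqSync-19-01-16-1547660837-8345.tar.gz","object-limit":1000}'

# printf avoids the trailing newline that echo would add to the hashed input.
HASH="$(printf '%s' "$CONFIG_JSON" | md5sum | cut -d' ' -f1)"

echo "$HASH"   # a 32-character hexadecimal string
```

The same configuration string always produces the same hash.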
Initial Creation
A Synchronization record is created at the beginning of a command run, right after all command line options have been processed. Initial creation only covers the type, started_at, pid, and config_json columns. For this command (threatq:sync-import), the type will be "import". The command line option portion of the report_json column is added as well, but this column will not be complete until the record is finalized. The finished_at column remains NULL.
Finalization
A Synchronization record is finalized when the command run has completed. At this time, the finished_at
column is filled with the completion date and time, and the report_json
column is updated to include information about the run (object counts, tables created, etc).