The newest and best output plugin for data analysis uses Google's BigQuery service. As of December 2015 it isn't in a server release, so you'll need to sync past at least this commit to use it.
Setup
To set it up, visit console.developers.google.com and create a Google Cloud project that will hold your BigQuery data. Then follow the instructions to create a service account and download the credential file. From that file, populate these values in your GRR config file:
BigQuery.service_account: accountname@projectname.iam.gserviceaccount.com
BigQuery.private_key: "-----BEGIN PRIVATE KEY-----......."
BigQuery.project_id: "projectname"
Note that OpenSSL is picky about newlines, so make sure you copy-paste the private key as a single line with the embedded \n escapes exactly as it appears in the JSON file.
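If you'd rather not hand-copy the key, a short script can print all three values straight from the credential file. This is just a convenience sketch, not something GRR ships with; the filename credentials.json is an assumption, so use whatever name you saved the file under.

# Sketch: pull the three GRR config values out of the service account
# credential file downloaded from the Cloud console.
# "credentials.json" is an assumed filename.
import json

with open("credentials.json") as f:
    creds = json.load(f)

print("BigQuery.service_account:", creds["client_email"])
# json.dumps keeps the key on one line with literal \n escapes,
# which is the form the GRR config expects.
print("BigQuery.private_key:", json.dumps(creds["private_key"]))
print("BigQuery.project_id:", creds["project_id"])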
Restart the server processes (you can use the grr_restart_all shell helper) and test it by running a flow (e.g. FileFinder) that will generate some results, with "BigQueryOutputPlugin" added as an output plugin. Within 5 minutes you should see a "grr" dataset with a table created in the BigQuery console. If the data doesn't turn up, check the GRR worker logs in /var/log/grr/grr-worker.log.
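You can also check for the table from a script instead of the console. Here is a minimal sketch using the google-cloud-bigquery client (my choice of client, not part of GRR), assuming the default "grr" dataset, an ExportedFile table, and "projectname" from the config above:

# Sketch: confirm the exported table exists and has rows.
from google.cloud import bigquery

client = bigquery.Client(project="projectname")
table = client.get_table("grr.ExportedFile")  # dataset.table
print(table.table_id, "has", table.num_rows, "rows")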
Create a hunt with BigQuery output
To get hunt data into BigQuery just choose the BigQuery output plugin in the hunt creation wizard. FileFinder or RegistryFinder are good ones to start with since their output formats are known to export cleanly.
Flows whose result types have exporters defined in export.py should export to BigQuery correctly; these cover the common use cases. For everything else, without defined converters, we attempt to export on a best-effort basis.
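If you want a custom result type to export cleanly, the usual route is to add a converter alongside the existing ones in export.py. The sketch below shows the rough shape only: the base class and method names are approximations from memory, and MyType / ExportedMyType are hypothetical placeholders, so check export.py in your GRR checkout before copying anything.

# Rough shape of a custom converter, intended to live in export.py.
# Names are approximate; MyType / ExportedMyType are hypothetical.
class MyTypeToExportedMyTypeConverter(export.ExportConverter):
  """Flattens MyType flow results into a BigQuery-friendly proto."""

  input_rdf_type = "MyType"  # the flow result type this converter handles

  def Convert(self, metadata, value, token=None):
    # Yield one or more flat export records; nested or repeated
    # fields generally need flattening into separate exported rows.
    yield ExportedMyType(metadata=metadata, name=value.name)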
Use BigQuery to analyze your results
BigQuery is extremely powerful, and intuitive for anyone familiar with SQL syntax. A full query reference is here. Below are some example queries operating on FileFinder hunt results. I ran these on an ExportedFile table with 604,600 rows (243 MB table size).
Calculate some file size stats for a hunt (query time: 1.4s)
SELECT COUNT(st_size) AS file_count,
       SUM(st_size) AS total_size,
       MAX(st_size) AS max_size,
       MIN(st_size) AS min_size,
       AVG(st_size) AS avg_size,
       STDDEV(st_size) AS standard_deviation_size
FROM grr.ExportedFile
WHERE metadata.timestamp > PARSE_UTC_USEC("2015-11-17")
  AND metadata.source_urn == "aff4:/hunts/H:ED7458F8/Results"

[
  {
    "file_count": "598207",
    "total_size": "5218211208",
    "max_size": "106065056",
    "min_size": "0",
    "avg_size": "8723.086169168866",
    "standard_deviation_size": "176877.15047685735"
  }
]
Count results for hunts and flows (query time: 1.3s)
SELECT COUNT(*) AS result_count,
       metadata.source_urn AS source
FROM grr.ExportedFile
GROUP BY metadata.source_urn, source

[
  {
    "result_count": "3196",
    "source": "aff4:/hunts/H:F5AF9AB4/Results"
  },
  {
    "result_count": "598207",
    "source": "aff4:/hunts/H:ECDB3112/Results"
  },
  {
    "result_count": "1",
    "source": "aff4:/C.82f05be53ee950dc/analysis/FileFinder/admin-1447724230.58"
  },
  {
    "result_count": "3196",
    "source": "aff4:/hunts/H:ED7458F8/Results"
  }
]
The 100 least-common filenames found (query time: 3.0s)
SELECT SUBSTR(urn, 25) AS filename,
       COUNT(metadata.hostname) AS host_count
FROM grr.ExportedFile
GROUP BY filename
ORDER BY host_count ASC
LIMIT 100
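These queries are written in BigQuery's legacy SQL dialect (PARSE_UTC_USEC and == are legacy constructs). If you want to run them from a script rather than the web UI, here is a sketch using the google-cloud-bigquery client — again my choice of client, with the project name assumed — that flags the dialect explicitly:

# Sketch: run the least-common-filenames query above from Python.
# use_legacy_sql=True is needed because these queries use legacy dialect.
from google.cloud import bigquery

client = bigquery.Client(project="projectname")  # assumed project name
query = """
SELECT SUBSTR(urn, 25) AS filename,
       COUNT(metadata.hostname) AS host_count
FROM grr.ExportedFile
GROUP BY filename
ORDER BY host_count ASC
LIMIT 100
"""
job_config = bigquery.QueryJobConfig(use_legacy_sql=True)
for row in client.query(query, job_config=job_config).result():
    print(row.filename, row.host_count)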
Screenshots
The BigQuery exporter takes advantage of the protobuf definition to give you rich field name descriptions for the table.
Writing queries, re-running old queries, and tweaking as you go are all spectacularly easy with the BigQuery UI.
Comments
Would it be at all possible to set this up in an offline fashion if, say, due to the sensitivity of the data being analyzed, it could not be uploaded into public tables?
It's an online service that takes advantage of Google compute power to give you very fast querying, so there's not really a way to do that without moving the data to Google. Note, however, that BigQuery has configurable ACLs and quite restrictive settings by default, so it is far from "public".
If you have your own analytical tools and resources on-prem, you could export from GRR as CSV and do the analysis yourself.