Wednesday, November 18, 2015

Using BigQuery to analyze data collected by GRR

GRR is great at collecting large amounts of data, but once you get more than a handful of results you need to rely on external systems for analyzing that data. To make this work at scale, GRR has output plugins that export data as results are received from clients. Upload is automatic, with a latency of under 5 minutes.

The newest and best output plugin for data analysis uses Google's BigQuery service. As of December 2015 it isn't in a server release, so you'll need to at least sync past this commit to use it.

Setup


To set it up, visit console.developers.google.com and create a Google Cloud project that will hold your BigQuery data. Then follow the instructions to create a service account and download its JSON credential file. From that file, populate these values in your GRR config file:


BigQuery.service_account: accountname@projectname.iam.gserviceaccount.com
BigQuery.private_key: "-----BEGIN PRIVATE KEY-----......."
BigQuery.project_id: "projectname"

Note that OpenSSL is picky about newlines, so make sure you copy-paste the private key as a single line with the embedded \n escapes intact, exactly as it appears in the JSON file.
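
If you don't want to copy the values by hand, a small helper script can pull them straight out of the downloaded JSON key file. This is just a convenience sketch, relying only on the standard key fields (client_email, private_key, project_id); pass the path of the key file as the only argument:

#!/usr/bin/env python
# Print the BigQuery-related GRR config values from a downloaded
# service account JSON key file.
import json
import sys

with open(sys.argv[1]) as keyfile:
  key = json.load(keyfile)

# json.dumps() keeps the private key on a single line with its \n escapes,
# matching the single-line format described above.
print("BigQuery.service_account: %s" % key["client_email"])
print("BigQuery.private_key: %s" % json.dumps(key["private_key"]))
print("BigQuery.project_id: %s" % json.dumps(key["project_id"]))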

Restart the server processes (you can use the grr_restart_all shell helper) and test the plugin by running a flow (e.g. FileFinder) with the "BigQueryOutputPlugin" added so that it generates some results. Within 5 minutes you should see a "grr" dataset with a table created in the BigQuery console. If the data doesn't turn up, check the GRR worker logs in /var/log/grr/grr-worker.log.
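
If you'd rather check from the command line than the BigQuery console, a quick row count will confirm that results are arriving. The sketch below assumes the google-cloud-bigquery Python client library (installed separately from GRR) and application-default credentials with read access to the project:

# Quick sanity check: count the rows GRR has exported so far.
# Requires: pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client(project="projectname")  # same value as BigQuery.project_id
job_config = bigquery.QueryJobConfig(use_legacy_sql=True)  # the queries in this post use legacy SQL
query = "SELECT COUNT(*) AS row_count FROM grr.ExportedFile"

for row in client.query(query, job_config=job_config).result():
  print("ExportedFile rows: %d" % row.row_count)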

Create a hunt with BigQuery output


To get hunt data into BigQuery just choose the BigQuery output plugin in the hunt creation wizard. FileFinder or RegistryFinder are good ones to start with since their output formats are known to export cleanly.

Flows whose result types have exporters defined in export.py should export to BigQuery correctly, and these cover the common use cases. Everything else is exported on a best-effort basis.

Use BigQuery to analyze your results


BigQuery is extremely powerful, and intuitive for anyone familiar with SQL syntax. A full query reference is available in the BigQuery documentation. Below are some example queries operating on FileFinder hunt results. I ran these on an ExportedFile table with 604,600 rows (243 MB table size).

Calculate some file size stats for a hunt (query time: 1.4s)

SELECT
  COUNT(st_size) AS file_count,
  SUM(st_size) AS total_size,
  MAX(st_size) AS max_size,
  MIN(st_size) AS min_size,
  AVG(st_size) AS avg_size,
  STDDEV(st_size) AS standard_deviation_size
FROM
  grr.ExportedFile
WHERE
  metadata.timestamp > PARSE_UTC_USEC("2015-11-17")
  AND metadata.source_urn = "aff4:/hunts/H:ED7458F8/Results"

[
  {
    "file_count": "598207",
    "total_size": "5218211208",
    "max_size": "106065056",
    "min_size": "0",
    "avg_size": "8723.086169168866",
    "standard_deviation_size": "176877.15047685735"
  }
]

Count results for hunts and flows (query time: 1.3s)

SELECT
  COUNT(*) AS result_count,
  metadata.source_urn AS source
FROM
  grr.ExportedFile
GROUP BY
  metadata.source_urn,
  source

[
  {
    "result_count": "3196",
    "source": "aff4:/hunts/H:F5AF9AB4/Results"
  },
  {
    "result_count": "598207",
    "source": "aff4:/hunts/H:ECDB3112/Results"
  },
  {
    "result_count": "1",
    "source": "aff4:/C.82f05be53ee950dc/analysis/FileFinder/admin-1447724230.58"
  },
  {
    "result_count": "3196",
    "source": "aff4:/hunts/H:ED7458F8/Results"
  }
]

The 100 least-common filenames found (query time: 3.0s)

SELECT
  SUBSTR(urn, 25) AS filename,
  COUNT(metadata.hostname) AS host_count
FROM
  grr.ExportedFile
GROUP BY
  filename
ORDER BY
  host_count ASC
LIMIT
  100

Screenshots


The BigQuery exporter takes advantage of the protobuf definition to give you rich field name descriptions for the table.


Writing queries, re-running old queries, and tweaking as you go are all spectacularly easy with the BigQuery UI.

Friday, May 22, 2015

Hashing: The Maslow's Hammer of Forensics

We get asked this question a lot:

Can GRR search for a hash on all machines?

The shortest answer is yes.  If you’re filling out a checklist you can stop reading here, complete the column, crack open a beer, and bask in the warm glow of satisfaction that comes with knowing you did a great job.

The real-world answer is much more complicated, but as a starting point let's go with "that's almost never useful".

Can GRR search for a hash on all machines?


Technically yes.  You can use the FileFinder flow to hash every file on the filesystem for every machine, and then check if your target hash is in the list.  You don’t want to do this, and in fact GRR has default safeguards that won’t let you without an explicit override.

The reason is that it's incredibly expensive in terms of machine resources for the client. It's lots of disk I/O, since you're reading every bit of every file on every machine, and lots of CPU cycles, since you're putting all of those bits through complex hashing operations. You're also spending network bytes to send all the hash results.

If there’s anything users hate more than having security agents on their machines, it’s having security agents that cripple their ability to do their jobs.

But say you do this anyway, and at incredible cost, you now have a snapshot of the hash of every file in your fleet. The problem is the word 'snapshot'. GRR is a mechanism for capturing the state of a machine at a certain point in time. Even before you finish hashing the disk of a single machine, the hash set is out of date. So how often do you re-hash everything to pick up new files and changes? Clearly this is impractical, and GRR wasn't designed to be an IDS.

But collection/measurement of hashes is only part of the picture.  Why are we using hashes in the first place?  Often it’s because that’s what’s shared or bought as “threat intelligence”.

Most of this hash-checking behaviour is aimed at malware detection, but it's a terrible way to detect malware because a one-bit change in the binary means no detection. For anything more interesting than mass malware, you will basically never get a hash hit. Any targeted payload will at a minimum contain a campaign code, be recompiled, or be completely customised for the target. Context-aware "fuzzy hashing" can reduce this brittleness to some extent, but it doesn't help with the resource consumption problem above.

So what should we do?

A better way to hunt for files


Here are the questions I ask when someone comes to me asking to hunt a hash. You can use them as a checklist for constructing a GRR hunt for a file that is as targeted and efficient as the circumstances allow. The more files you exclude with each condition, the faster things will be:

  • Where is the file most likely to be on the filesystem?  If it always lands in System32, or it’s most likely in a browser tempdir or user homedir, just target those paths in FileFinder.
  • How big is the file likely to be?  e.g. if you know it will never be larger than 50MB, set a file size range in FileFinder. And if you know the exact file size, that alone tends to be highly distinctive and is fast to check.
  • Is it likely to have had a timestamp modified recently or within a certain time window? Even if the time window is large (months/years) it can help when hunting in places like system directories.  Set a modification time range in FileFinder (be careful here if you suspect timestomping).
  • Can we identify some part of the file that we could use as a binary signature?  For malware the best case scenario would be some reversing work that gives us a binary sequence that lives in a fairly predictable location of the file and gets reused across whole families or classes of malware.  But the technique applies generally: checking a byte range of a file is much faster than reading the whole content.  Set a contents match condition in FileFinder.

You could set the FileFinder action to HASH, but if something matches all of these conditions it is probably interesting, so in most cases you should just make it DOWNLOAD. GRR only transfers globally unique files. Also, you'll kick yourself if you see a really promising match but only get the hash before the machine goes offline.

GRR is smart and knows about the relative resource cost of each of these conditions.  It will apply cheaper stat-attribute constraints like size and modification time before resorting to more expensive file content matches, as the sketch below illustrates.
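
To make that ordering concrete, here is a small standalone sketch in plain Python (not GRR code, and with made-up thresholds and signature bytes) of the same idea: check the cheap stat attributes first, then a small byte range, and only read and hash the whole file for candidates that survive every check:

import hashlib
import os

MAX_SIZE = 50 * 1024 * 1024         # "never larger than 50MB"
MIN_MTIME = 1420070400              # only files modified after 2015-01-01 UTC
SIGNATURE = b"\xde\xad\xbe\xef"     # hypothetical byte sequence from reversing work
SIG_OFFSET, SIG_WINDOW = 0x400, 4096

def candidate_hash(path):
  """Return the file's SHA-256, but only if it passes the cheap checks first."""
  st = os.stat(path)                           # cheapest: a single stat() call
  if st.st_size > MAX_SIZE or st.st_mtime < MIN_MTIME:
    return None
  with open(path, "rb") as f:
    f.seek(SIG_OFFSET)                         # still cheap: read only a small window
    if SIGNATURE not in f.read(SIG_WINDOW):
      return None
    f.seek(0)                                  # expensive: full read and hash, but only
    return hashlib.sha256(f.read()).hexdigest()  # for files that matched everything else

In a real hunt you would express the same checks as FileFinder size, modification time, and contents match conditions rather than writing any code.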

Before running this on all machines you can test it on your own machine to see if it’s too expensive or returns too many results.  If you’re downloading files this will also pre-populate the server-side collection of common binaries.

Better ways to use hashes


Searching for hashes isn't a good fit for live forensics systems like GRR.  But that's not to say hashes are useless.  Here are a few high-value ways to use hashes you should consider:

  • Searching stream capture systems on the server-side, like:
    • A central database of executable hashes that’s fed by on-host systems hashing every executable run and every library loaded (e.g. CarbonBlack or Bit9).
    • Network security monitors that hash files downloaded, email attachments etc. that cross a network boundary (e.g. Suricata).
  • Dead-disk forensics where you have fast I/O and the ability to spend machine resources with zero user impact.

Why No Stream Capture?


So if hashes work better for stream capture, why doesn’t GRR do that too?

We want to be really good at state capture. Finding the bits, transporting the bits, and making the bits available for analysis, all at scale, is a hard problem and we don’t see ourselves running out of work any time soon.

Stream capture is a distinct but similarly hard problem involving instrumentation/hooking at the kernel level.  Adding it would significantly complicate maintenance, client stability, and the guarantees we make about client performance.  Conversely, keeping a separation between response and detection means you always have a response capability.

Other products have already made good progress on detection using stream capture.  Our approach is to use those capabilities to tell GRR about interesting places to grab state.