Wednesday, November 18, 2015

Using BigQuery to analyze data collected by GRR

GRR is great at collecting large amounts of data, but once you get more than a handful of results you need to rely on external systems to analyse that data. To make this work at scale, GRR has output plugins that export data as results arrive from the clients. Upload is automatic, with a latency of under 5 minutes.

The newest and best output plugin for data analysis uses Google's BigQuery service. As of December 2015 it isn't in a server release, so you'll need to at least sync past this commit to use it.

Setup

To set it up, visit console.developers.google.com and create a Google Cloud project that will hold your BigQuery data. Then follow the instructions to create a service account and download the credential file. From that file, populate these values in your GRR config file:

BigQuery.service_account: accountname@projectname.iam.gserviceaccount.com
BigQuery.private_key: "-----BEGIN PRIVATE KEY-----......."
BigQuery.project_id: "projectname"

Note that OpenSSL is picky about newlines, so make sure you copy-paste the private key as a single line, with the embedded newlines (\n) exactly as they appear in the JSON file.
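If you'd rather not copy the key by hand, a small script can produce the three config lines straight from the downloaded key file. This is a minimal sketch, not part of GRR: the helper name is hypothetical, it assumes the key file uses the standard service account fields client_email, private_key and project_id, and it relies on json.dumps writing the key's newlines as \n escapes inside a double-quoted string, which YAML reads back as real newlines.

```python
import json


def grr_bigquery_config(key_json):
    """Build GRR BigQuery config lines from a service account JSON key.

    Hypothetical helper; assumes the standard key-file fields
    client_email, private_key and project_id.
    """
    key = json.loads(key_json)
    # json.dumps turns each real newline in the PEM key back into a "\n"
    # escape inside a double-quoted string, keeping the key on one line.
    return "\n".join([
        "BigQuery.service_account: " + key["client_email"],
        "BigQuery.private_key: " + json.dumps(key["private_key"]),
        "BigQuery.project_id: " + json.dumps(key["project_id"]),
    ])
```

For example, print(grr_bigquery_config(open("key.json").read())) prints lines you can paste into the config file (key.json stands in for whatever your downloaded file is called).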

Restart the server processes (you can use the grr_restart_all shell helper), then test the setup by running a flow that generates some results (e.g. FileFinder) with the "BigQueryOutputPlugin" output plugin added. Within 5 minutes you should see a "grr" dataset containing a table in the BigQuery console. If the data doesn't turn up, check the GRR worker log in /var/log/grr/grr-worker.log.

Create a hunt with BigQuery output

To get hunt data into BigQuery just choose the BigQuery output plugin in the hunt creation wizard. FileFinder or RegistryFinder are good ones to start with since their output formats are known to export cleanly.

Flows that output results with types that have exporters defined in export.py should export to BigQuery correctly and cover the common use cases. For everything else without defined converters we attempt to export on a best-effort basis.

Use BigQuery to analyze your results

BigQuery is extremely powerful and intuitive for anyone familiar with SQL syntax. A full query reference is here. Below are some example queries operating on FileFinder hunt results. I ran these on an ExportedFile table with 604,600 rows (243 MB table size).

Calculate some file size stats for a hunt (query time: 1.4s)

SELECT
  COUNT(st_size) AS file_count,
  SUM(st_size) AS total_size,
  MAX(st_size) AS max_size,
  MIN(st_size) AS min_size,
  AVG(st_size) AS avg_size,
  STDDEV(st_size) AS standard_deviation_size
FROM
  grr.ExportedFile
WHERE
  metadata.timestamp > PARSE_UTC_USEC("2015-11-17")
  AND metadata.source_urn == "aff4:/hunts/H:ED7458F8/Results"

[
  {
    "file_count": "598207",
    "total_size": "5218211208",
    "max_size": "106065056",
    "min_size": "0",
    "avg_size": "8723.086169168866",
    "standard_deviation_size": "176877.15047685735"
  }
]

Count results for hunts and flows (query time: 1.3s)

SELECT
  COUNT(*) AS result_count,
  metadata.source_urn AS source
FROM
  grr.ExportedFile
GROUP BY
  metadata.source_urn,
  source

[
  {
    "result_count": "3196",
    "source": "aff4:/hunts/H:F5AF9AB4/Results"
  },
  {
    "result_count": "598207",
    "source": "aff4:/hunts/H:ECDB3112/Results"
  },
  {
    "result_count": "1",
    "source": "aff4:/C.82f05be53ee950dc/analysis/FileFinder/admin-1447724230.58"
  },
  {
    "result_count": "3196",
    "source": "aff4:/hunts/H:ED7458F8/Results"
  }
]

The 100 least-common filenames found (query time: 3.0s)

SELECT
  SUBSTR(urn, 25) AS filename,
  COUNT(metadata.hostname) AS host_count
FROM
  grr.ExportedFile
GROUP BY
  filename
ORDER BY
  host_count ASC
LIMIT
  100
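The SUBSTR(urn, 25) call strips the client prefix from each result URN. SQL substring positions are 1-based, so position 25 skips the 24 characters of "aff4:/C." plus a 16-digit client ID. The Python sketch below reproduces that arithmetic, assuming every urn looks like aff4:/C.&lt;16 hex digits&gt;/...; the example URN and helper names are hypothetical. It also shows that the "filename" column is really the whole client path, not just the base name.

```python
import re

# Assumed URN layout: "aff4:/C." + 16 hex digits (client ID) + client path.
CLIENT_PREFIX = re.compile(r"^aff4:/C\.[0-9a-f]{16}")


def sql_substr(s, start):
    """Mimic SQL SUBSTR(s, start) without a length: positions are 1-based."""
    return s[start - 1:]


def client_path(urn):
    """Strip the client prefix explicitly instead of using a fixed offset."""
    return CLIENT_PREFIX.sub("", urn, count=1)


urn = "aff4:/C.82f05be53ee950dc/fs/os/etc/passwd"  # hypothetical result URN
print(sql_substr(urn, 25))  # /fs/os/etc/passwd
print(client_path(urn))     # /fs/os/etc/passwd
```

The regex version leaves a URN untouched if it doesn't match the assumed layout, whereas a fixed offset would silently cut in the wrong place.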

Screenshots

The BigQuery exporter takes advantage of the protobuf definition to give you rich field name descriptions for the table.

Writing queries, re-running old queries, and tweaking them as you go are all spectacularly easy with the BigQuery UI.

Friday, May 22, 2015

Hashing: The Maslow's Hammer of Forensics

We get asked this question a lot:

Can GRR search for a hash on all machines?

The shortest answer is yes. If you’re filling out a checklist you can stop reading here, complete the column, crack open a beer, and bask in the warm glow of satisfaction that comes with knowing you did a great job.

The real-world answer is much more complicated, but as a starting point let’s go with “that’s almost never useful”.

Can GRR search for a hash on all machines?

Technically yes. You can use the FileFinder flow to hash every file on the filesystem for every machine, and then check if your target hash is in the list. You don’t want to do this, and in fact GRR has default safeguards that won’t let you without an explicit override.

The reason is that it’s incredibly expensive in terms of machine resources for the client. It’s lots of disk IO, since you’re reading every bit of every file on every machine, and it’s lots of CPU cycles since you’re putting all of those bits through complex hashing operations. You’re also spending network bytes to send all the hash results.

If there’s anything users hate more than having security agents on their machines, it’s having security agents that cripple their ability to do their jobs.

But say you do this anyway. At incredible cost, you now have a snapshot of the hash of every file in your fleet. The problem is the word ‘snapshot’. GRR is a mechanism for capturing the state of a machine at a certain point in time. Even before you finished hashing the disk of a single machine, the hash set was out of date. So how often do you re-hash everything to pick up new or changed files? Clearly this is impractical, and GRR wasn’t designed to be an IDS.

But collection/measurement of hashes is only part of the picture. Why are we using hashes in the first place? Often it’s because that’s what’s shared or bought as “threat intelligence”.

Most of this hash-checking behaviour is aimed at malware detection, but it’s a terrible way to detect malware because a one-bit change in the binary means no detection. For anything more interesting than mass malware, you will basically never get a hash hit. Any targeted payload will at a minimum contain a campaign code, be recompiled, or be completely customised for the target. Context-aware "fuzzy hashing" can reduce this brittleness to some extent, but it doesn’t help with the resource consumption problem above.

So what should we do?

A better way to hunt for files

Here are the questions I ask when someone comes to me asking to hunt for a hash. You can use them as a checklist for constructing a GRR hunt for a file that will make it as targeted and efficient as the circumstances allow. The more files you exclude with each condition, the faster things will be:

Where is the file most likely to be on the filesystem? If it always lands in System32, or it’s most likely in a browser tempdir or user homedir, just target those paths in FileFinder.
How big is the file likely to be? e.g. if you know it will never be larger than 50MB, set a file size range in FileFinder. Or if you know the exact filesize, this tends to be very unique and is fast to check.
Is it likely to have been modified recently, or within a certain time window? Even if the time window is large (months/years) it can help when hunting in places like system directories. Set a modification time range in FileFinder (be careful here if you suspect timestomping).
Can we identify some part of the file that we could use as a binary signature? For malware the best case scenario would be some reversing work that gives us a binary sequence that lives in a fairly predictable location of the file and gets reused across whole families or classes of malware. But the technique applies generally: checking a byte range of a file is much faster than reading the whole content. Set a contents match condition in FileFinder.

You could set the FileFinder action to HASH, but anything that matches all of these conditions is probably interesting, so in most cases you should just make it DOWNLOAD. GRR only transfers globally unique files, so a file already collected from another machine isn’t sent again. Also, you’ll kick yourself if you see a really promising match but only have the hash when the machine goes offline.

GRR is smart and knows about the relative resource cost of each of these conditions. It applies cheaper stat-based constraints like size and modification time before resorting to more expensive file content matches.
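The cheap-first ordering is easy to see in a local sketch. The hypothetical matches() and find() helpers below are not GRR code; they just apply the same checklist to files on one machine: glob only the likely paths, reject on size and modification time from a single os.stat() call, and read only the few bytes at a known offset when a file survives those checks.

```python
import glob
import os


def matches(path, size_range=None, mtime_range=None, signature=None, offset=0):
    """Hypothetical check, cheapest conditions first.

    size_range and mtime_range are inclusive (low, high) tuples; signature
    is a byte string expected at byte `offset` in the file.
    """
    st = os.stat(path)  # one stat() call gives both size and mtime
    if size_range and not size_range[0] <= st.st_size <= size_range[1]:
        return False
    if mtime_range and not mtime_range[0] <= st.st_mtime <= mtime_range[1]:
        return False
    if signature is not None:
        # Only files that passed the stat checks are opened, and only
        # len(signature) bytes at the known offset are read.
        with open(path, "rb") as f:
            f.seek(offset)
            if f.read(len(signature)) != signature:
                return False
    return True


def find(patterns, **conditions):
    """Yield files under the likely locations (glob patterns) that match."""
    for pattern in patterns:
        for path in glob.glob(pattern, recursive=True):
            if os.path.isfile(path) and matches(path, **conditions):
                yield path
```

A call such as find([r"C:\Windows\System32\*.dll"], size_range=(0, 50 * 1024 * 1024), signature=b"MZ") mirrors the path, size and content conditions from the checklist; the values are only illustrative.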

Before running this on all machines you can test it on your own machine to see if it’s too expensive or returns too many results. If you’re downloading files this will also pre-populate the server-side collection of common binaries.

Better ways to use hashes

Searching for hashes isn’t a good fit for live forensics systems like GRR. But that’s not to say hashes are useless. Here are a few high-value ways to use hashes you should consider:

Searching stream capture systems on the server side, such as:

A central database of executable hashes that’s fed by on-host systems hashing every executable run and every library loaded (e.g. CarbonBlack or Bit9).
Network security monitors that hash files downloaded, email attachments etc. that cross a network boundary (e.g. Suricata).

Dead-disk forensics where you have fast I/O and the ability to spend machine resources with zero user impact.

Why No Stream Capture?

So if hashes work better for stream capture, why doesn’t GRR do that too?

We want to be really good at state capture. Finding the bits, transporting the bits, and making the bits available for analysis, all at scale, is a hard problem and we don’t see ourselves running out of work any time soon.

Stream capture is a distinct but similarly hard problem involving instrumentation/hooking at the kernel level. Adding it would significantly complicate maintenance, client stability, and the guarantees we make about client performance. Conversely, keeping a separation between response and detection means you always have a response capability.

Other products have already made good progress on detection using stream capture. Our approach is to use those capabilities to tell GRR about interesting places to grab state.

Monday, December 8, 2014

Wrapping GRR Windows Installers as a .MSI File

GRR client installation files for Windows are provided as .EXE (executable) files. These files can be installed interactively or pushed to clients using a software management tool such as SCCM. Administrators intending to deploy GRR using Group Policy Software Installation will find that they need installers packaged as .MSI (Windows Installer package) files. This post describes two ways to wrap the GRR client .EXE in a .MSI file as required by Group Policy.

The first option is to use an open source tool called WiX Toolset. It’s a developer tool capable of creating complex software installation packages. Using the toolset involves creating an XML configuration file (sample provided) and running a few command-line operations.

An alternative is to use a commercial tool like MSI Wrapper. This purpose-built tool offers a point-and-click experience. I’ve evaluated the free version of MSI Wrapper and found it adequately met the needs of this wrapping task.

Please continue reading below for a step-by-step guide of using each method.

WiX Toolset

Step 1) Create a grr.wxs configuration file in the same directory as your GRR client installer .exe.

An example is provided below. Steps 1a to 1c describe the values you need to modify.

Step 1a) UpgradeCode = a unique GUID for this installation package. You can create a new one in your tool of choice or using GuidGen.

Step 1b) Version = the version of GRR you are packaging.

Step 1c) Source = the name of the file to be packaged.
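If you don't have GuidGen handy, Python's standard uuid module can generate an UpgradeCode in the same upper-case format as the example below. This is just one option; the function name is hypothetical and any GUID generator works.

```python
import uuid


def new_upgrade_code():
    """Return a random (version 4) GUID in upper case, e.g. for UpgradeCode."""
    return str(uuid.uuid4()).upper()


print(new_upgrade_code())
```

Windows Installer treats the UpgradeCode as the identity of a product family, so you would normally generate it once and keep it for later GRR versions, changing only the Version from Step 1b.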

Example grr.wxs

<?xml version="1.0"?>
<Product Id="*" UpgradeCode="9DA302DE-B45A-4EE8-9758-E547CF3D57F3"
    Name="GRR Agent" Version="3.0.0.2" Manufacturer="Grr Opensource" Language="1033">
</Component>
</Directory>
</InstallExecuteSequence>
</Feature>
</Product>
</Wix>

Step 2) Run WiX Tools

Candle compiles the wxs file into a WiX object file.
Light is a linker for the object file and produces the MSI file.

c:\grr-msi>"\Program Files (x86)\WiX Toolset v3.9\bin\candle.exe" grr_3.0.0.2.wxs
Windows Installer XML Toolset Compiler version 3.9.1006.0
grr_3.0.0.2.wxs

c:\grr-msi>"\Program Files (x86)\WiX Toolset v3.9\bin\light.exe" grr_3.0.0.2.wixobj
Windows Installer XML Toolset Linker version 3.9.1006.0
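If you rebuild the package for every GRR release, the two steps are easy to script. The sketch below is an example under assumptions, not part of WiX or GRR: the helper names are hypothetical, it takes the WiX v3.9 install path from the commands above, and it passes -out explicitly so the .wixobj and .msi names always follow the .wxs name.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed WiX location, taken from the commands above; adjust for your install.
WIX_BIN = PureWindowsPath(r"C:\Program Files (x86)\WiX Toolset v3.9\bin")


def wix_commands(wxs, wix_bin=WIX_BIN):
    """Return the candle (compile) and light (link) command lines for a .wxs."""
    src = PureWindowsPath(wxs)
    obj = src.with_suffix(".wixobj")
    msi = src.with_suffix(".msi")
    candle = [str(wix_bin / "candle.exe"), str(src), "-out", str(obj)]
    light = [str(wix_bin / "light.exe"), str(obj), "-out", str(msi)]
    return candle, light


def build_msi(wxs):
    """Run candle, then light; check=True stops at the first failing tool."""
    for cmd in wix_commands(wxs):
        subprocess.run(cmd, check=True)
```

Running build_msi("grr_3.0.0.2.wxs") from c:\grr-msi should leave grr_3.0.0.2.msi next to the .wxs file.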

Step 3) Done.

A new .msi file will be created in the directory.

MSI Wrapper

Step 1) Launch the tool.

Step 2) Provide paths to the installer (exe) and output (msi).

Step 3) Leave the Application Id blank. Create a new Upgrade code.

Step 4) Manually enter the Name, Manufacturer and Version.

Step 5) These fields are optional. Next.

Step 6) No arguments are necessary. Next.

Step 7) The exe has been successfully wrapped as an MSI file. Done.

Friday, October 24, 2014

Using the distributed data store in GRR.

In our experience, the scalability of a GRR installation depends heavily on which data store backend is used. Currently, we provide two different data stores, one based on MongoDB and one based on MySQL, that are pretty simple to use but also, sadly, not very scalable. To mitigate this problem, we have recently released a new distributed data store that offers a huge performance improvement over the other data stores. This data store is a bit harder to set up than the others, but we think it's well worth it performance-wise. In this blog post I'll describe how I installed GRR using a small group of data store servers.

I started out by getting four machines on Amazon EC2, all of them running Ubuntu 14.04.

I planned to have one main server running all the GRR processes:

54.76.238.125

And the rest are data store servers:

54.171.141.180 <-- the first one is the master server.
54.171.117.245
54.171.117.72

First, I installed GRR on all those machines using the provided install script (install_script_ubuntu.sh). This is a little easier because it automatically installs all the dependencies we need, but in the end I was going to use the latest code from the repository (highly recommended if you want to try the latest features!), so I did a:

> git clone https://github.com/google/grr.git

and

> python setup.py build
> sudo python setup.py install

GRR usually uses two config files:

/etc/grr/grr-server.yaml : This one comes with every server release and should not be touched.
/etc/grr/server.local.yaml : This is the one where all the local modifications go. On the main server where I installed the whole GRR package, this already contains my customized settings, like the keys I generated. This is where I'm going to put all my modifications.

The first thing I did was to switch away from the MongoDB backend. I tried the SqliteDataStore, so I appended to the config file:

Datastore.implementation: SqliteDataStore

I did a quick restart using the initctl_switch script and, while I was there, also switched to multiprocessing:

sudo /usr/share/grr/scripts/initctl_switch.sh multi

(If you already have grr running in multiprocess mode, use

sudo /usr/share/grr/scripts/initctl_switch.sh restart

instead.)

I confirmed that all processes are there:

> ps -ef | grep grr
root 13297 1 35 13:34 ? 00:00:00 /usr/bin/python /usr/bin/grr_server --start_http_server --config=/etc/grr/grr-server.yaml
root 13310 1 34 13:34 ? 00:00:00 /usr/bin/python /usr/bin/grr_server --start_ui --config=/etc/grr/grr-server.yaml
root 13323 1 33 13:34 ? 00:00:00 /usr/bin/python /usr/bin/grr_server --start_enroller --config=/etc/grr/grr-server.yaml
root 13336 1 33 13:34 ? 00:00:00 /usr/bin/python /usr/bin/grr_server --start_worker --config=/etc/grr/grr-server.yaml

and that I can reach the GUI at http://54.76.238.125:8000/. It also seemed to be using the SQLite data store (with the default path in /tmp):

> ls /tmp/grr-datastore/
aff4.sqlite  config.sqlite  files.sqlite  foreman.sqlite  pmem%2Dsigned.sqlite  pmem.sqlite  stats_store.sqlite  winpmem%2Eamd64%2Esys.sqlite  winpmem%2Ex86%2Esys.sqlite

After the basic setup worked, I started setting up the remote data store. For this to work I had to set up

client credentials so the GRR server can talk to the data store servers (need to be set on the GRR server and the master server)
server credentials so the slaves can talk to the master (need to be set on all data store servers)
the data store master server, which has to know about all the slave servers.
the slave server(s).

I decided to go with two data store servers at first - one master, one slave. First I set up the data master by putting in both servers (the master has to be the first entry!), client credentials, and server username + password:

> sudo cat /etc/grr/server.local.yaml
Dataserver.server_list:
  - http://54.171.141.180:7000
  - http://54.171.117.245:7000
Dataserver.client_credentials:
  - myawesomeuser:thepassword:rw
Dataserver.server_username: servergroup
Dataserver.server_password: 65f2271f7

Starting it up:

> sudo python grr/server/data_server/data_server.py --config=/etc/grr/grr-server.yaml --master
[...]
Starting Data Master/Server on port 7000 ...

Yay, it works! Now I set up the slave server. This one only needs to know the master and the server credentials, client credentials will be automatically distributed:

> sudo cat /etc/grr/server.local.yaml
Dataserver.server_list:
  - http://54.171.141.180:7000
Dataserver.server_username: servergroup
Dataserver.server_password: 65f2271f7

Starting the slave (I added --verbose to get some more output):

> PYTHONPATH=. python grr/server/data_server/data_server.py --config=grr/config/grr-server.yaml --verbose
[...]
Registering with data master at 54.171.141.180:7000.
[...]
DataServer fully registered.
Starting Data Server on port 7000 ...

At the same time, since our group only consisted of two servers, the master also displayed:

Registered server 54.171.117.245:7000
All data servers have registered!

which indicates that our data store is now good to go. I could then set up the main server to actually use the data store server group:

> sudo cat /etc/grr/server.local.yaml
[...]
Dataserver.server_list:
  - http://54.171.141.180:7000
  - http://54.171.117.245:7000
Datastore.implementation: HTTPDataStore
HTTPDataStore.username: myawesomeuser
HTTPDataStore.password: thepassword

Running a quick test with the console confirms that it's working:

> sudo PYTHONPATH=. python grr/tools/console.py --config=/etc/grr/grr-server.yaml
[...]
In [1]:

Now I still had one more server that I wanted to include in my group, so I fired up the data store manager (after making sure that no one was using the data server group):

> sudo PYTHONPATH=. python grr/server/data_server/manager.py --config=grr/config/grr-server.yaml
Manager(2 servers)> servers
Last refresh: Thu Oct 23 12:53:12 2014
Server 0 54.171.141.180:7000 (Size: 100KB, Load: 0)
14 components
7KB average size
Server 1 54.171.117.245:7000 (Size: 111KB, Load: 0)
13 components
8KB average size
Manager(2 servers)> ranges
Server 0 54.171.141.180:7000 50% [00000000000000000000, 09223372036854775808[
Server 1 54.171.117.245:7000 50% [09223372036854775808, 18446744073709551616[
Manager(2 servers)>

So all my servers were registered, they held 100KB and 111KB of data (with an average component size of 7KB and 8KB), and the hash space was distributed evenly:

hex(9223372036854775808) == '0x8000000000000000L'
hex(18446744073709551616) == '0x10000000000000000L'
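The even split is easy to reproduce. The sketch below uses a hypothetical helper, not the manager's code, that divides the 64-bit hash space [0, 2**64) into n half-open ranges with exact integer arithmetic; under Python 3, hex() drops the trailing L shown above. For three servers the manager later prints a boundary of 6148914691236516864, a few hundred below the exact third (6148914691236517205). That value equals int(2**64 / 3) computed with floating-point division, which suggests the manager rounds its boundaries through a float.

```python
def even_ranges(n, space=2 ** 64):
    """Split [0, space) into n contiguous half-open ranges of near-equal size."""
    return [(i * space // n, (i + 1) * space // n) for i in range(n)]


# Two servers: boundaries at 2**63 and 2**64, as in the manager output.
print([hex(end) for _, end in even_ranges(2)])
# ['0x8000000000000000', '0x10000000000000000']

# Exact first third vs. the float-rounded value the manager printed.
print(even_ranges(3)[0][1], int(2 ** 64 / 3))
# 6148914691236517205 6148914691236516864
```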

To add a third server, we need to first add it to the data store server master:

Manager(2 servers)> addserver 54.171.117.72 7000
Master server allows us to add server 54.171.117.72:7000
Do you really want to add server //54.171.117.72:7000? (y/n) y
=============================================
Operation completed.
To rebalance server data you have to do the following:
1. Add '//54.171.117.72:7000' to Dataserver.server_list in your configuration file.
2. Start the new server at 54.171.117.72:7000
3. Run 'rebalance'

The server was added, but it has an empty hash range so far:

Manager(3 servers)> ranges
Server 0 54.171.141.180:7000 50% [00000000000000000000, 09223372036854775808[
Server 1 54.171.117.245:7000 50% [09223372036854775808, 18446744073709551616[
Server 2 54.171.117.72:7000 0% [18446744073709551616, 18446744073709551616[

So I took down all servers, added the third server to the config on the master and the GRR server, and brought everything back up (including the new data store server). Afterwards, I just used the data store manager to do a rebalance operation to actually fill the new server with data:

> sudo PYTHONPATH=. python grr/server/data_server/manager.py --config=grr/config/grr-server.yaml
Manager(3 servers)> ranges
Server 0 54.171.141.180:7000 50% [00000000000000000000, 09223372036854775808[
Server 1 54.171.117.245:7000 50% [09223372036854775808, 18446744073709551616[
Server 2 54.171.117.72:7000 0% [18446744073709551616, 18446744073709551616[
Manager(3 servers)> rebalance
The new ranges will be:
Server 0 54.171.141.180:7000 33% [00000000000000000000, 06148914691236516864[
Server 1 54.171.117.245:7000 33% [06148914691236516864, 12297829382473033728[
Server 2 54.171.117.72:7000 33% [12297829382473033728, 18446744073709551616[
Contacting master server to start re-sharding... OK
The following servers will need to move data:
Server 0 moves 30KB
Server 1 moves 69KB
Server 2 moves 14KB
Proceed with re-sharding? (y/n) y
Rebalance with id e03c2a40-06cf-46f1-ab27-dcc30aa55ce6 fully performed.
Manager(3 servers)> sync
Sync done.
Manager(3 servers)> ranges
Server 0 54.171.141.180:7000 33% [00000000000000000000, 06148914691236516864[
Server 1 54.171.117.245:7000 33% [06148914691236516864, 12297829382473033728[
Server 2 54.171.117.72:7000 33% [12297829382473033728, 18446744073709551616[
Manager(3 servers)> servers
Last refresh: Thu Oct 23 13:09:51 2014
Server 0 54.171.141.180:7000 (Size: 70KB, Load: 0)
10 components
7KB average size
Server 1 54.171.117.245:7000 (Size: 72KB, Load: 0)
9 components
8KB average size
Server 2 54.171.117.72:7000 (Size: 83KB, Load: 0)
10 components
8KB average size
Manager(3 servers)>

And it worked: all servers were now roughly equally loaded.
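Rebalancing is essentially a reassignment of hash ranges, so you can work out which slices have to change hands by intersecting each server's old range with every other server's new range. The sketch below does that for the ranges the manager printed during the rebalance. transfers() is a hypothetical helper, not part of the data store manager, and it says nothing about the KB figures, which depend on how much data lives in each slice.

```python
def transfers(old, new):
    """List (src, dst, start, end) hash slices whose owner changes.

    old[i] and new[i] are server i's half-open [start, end) ranges
    before and after rebalancing.
    """
    moves = []
    for src, (old_start, old_end) in enumerate(old):
        for dst, (new_start, new_end) in enumerate(new):
            start, end = max(old_start, new_start), min(old_end, new_end)
            if src != dst and start < end:
                moves.append((src, dst, start, end))
    return moves


# Ranges copied from the manager output.
old = [(0, 9223372036854775808),
       (9223372036854775808, 18446744073709551616),
       (18446744073709551616, 18446744073709551616)]  # new, still-empty server
new = [(0, 6148914691236516864),
       (6148914691236516864, 12297829382473033728),
       (12297829382473033728, 18446744073709551616)]

for src, dst, start, end in transfers(old, new):
    share = (end - start) / 2 ** 64
    print(f"server {src} -> server {dst}: {share:.1%} of the hash space")
# server 0 -> server 1: 16.7% of the hash space
# server 1 -> server 2: 33.3% of the hash space
```

With these boundaries, server 0 hands about a sixth of the hash space to server 1 and server 1 hands about a third to server 2, so roughly half of the keyspace changes owner.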

GRR Rapid Response Blog

Hey everyone,

Your favorite open source incident response project, GRR (https://github.com/google/grr) now has a blog!

We are going to use this blog to write about incident response, tricky technical challenges we have encountered, how you can use GRR to do amazing things, and whatever else comes to our minds.

We hope you'll find it interesting!

-The GRR team