Friday, October 24, 2014

Using the distributed data store in GRR.

The scalability of a GRR installation depends heavily on which data store backend is used. Currently, we provide two different data stores, one based on MongoDB and one on MySQL, that are pretty simple to use but also, sadly, not very scalable. To mitigate this problem, we have recently released a new distributed data store that offers a huge performance improvement over all the other data stores. This data store is a bit harder to set up than the others, but we think it's really worth it performance-wise. In this blog post I'll describe how I installed GRR using a small group of data store servers.

I started out by getting four machines on Amazon EC2, all of them running Ubuntu 14.04.

I planned to have one main server running all the GRR processes:

54.76.238.125

And the rest are data store servers:

54.171.141.180 <-- the first one is the master server.
54.171.117.245
54.171.117.72

First, I installed GRR on all those machines using the provided install script ( install_script_ubuntu.sh ). This is a little easier because it automatically installs all the dependencies we need. In the end, though, I wanted to run the latest code from the repository (highly recommended if you want to try the latest features!), so I did a:

> git clone https://github.com/google/grr.git

and

> python setup.py build
> sudo python setup.py install


GRR usually uses two config files:
  • /etc/grr/grr-server.yaml : This one comes with every server release and should not be touched.
  • /etc/grr/server.local.yaml : This is the one where all the local modifications go. On the main server where I installed the whole GRR package, this already contains my customized settings, like the keys I generated. This is where I'm going to put all my modifications.
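To illustrate the layering, here is a minimal sketch using plain Python dicts to stand in for the two parsed YAML files (the keys and values are illustrative, not GRR's real defaults): GRR reads the packaged defaults first and then applies the local file on top, so any key set in server.local.yaml wins.

```python
# Packaged defaults from grr-server.yaml (illustrative values only).
defaults = {
    "Datastore.implementation": "MongoDataStore",
    "AdminUI.port": 8000,
}

# Local overrides from server.local.yaml (also illustrative).
local = {
    "Datastore.implementation": "SqliteDataStore",
}

# Later config sources win: start from the defaults, overlay the local file.
config = dict(defaults)
config.update(local)

print(config["Datastore.implementation"])  # SqliteDataStore
print(config["AdminUI.port"])              # 8000
```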

The first thing I did was to switch away from the MongoDB backend. I wanted to try the SqliteDataStore, so I appended to the config file:

Datastore.implementation: SqliteDataStore

I did a quick restart using the initctl_switch script and, while I was there, also switched to multiprocessing:

sudo /usr/share/grr/scripts/initctl_switch.sh multi

(If you already have GRR running in multiprocess mode, use

sudo /usr/share/grr/scripts/initctl_switch.sh restart

instead.)

I confirmed that all the processes were there:

> ps -ef | grep grr
root     13297     1 35 13:34 ?        00:00:00 /usr/bin/python /usr/bin/grr_server --start_http_server --config=/etc/grr/grr-server.yaml
root     13310     1 34 13:34 ?        00:00:00 /usr/bin/python /usr/bin/grr_server --start_ui --config=/etc/grr/grr-server.yaml
root     13323     1 33 13:34 ?        00:00:00 /usr/bin/python /usr/bin/grr_server --start_enroller --config=/etc/grr/grr-server.yaml
root     13336     1 33 13:34 ?        00:00:00 /usr/bin/python /usr/bin/grr_server --start_worker --config=/etc/grr/grr-server.yaml

and that I could reach the GUI at http://54.76.238.125:8000/. It also seemed to be using the SQLite data store (with the default path in /tmp):

> ls /tmp/grr-datastore/
aff4.sqlite  config.sqlite  files.sqlite  foreman.sqlite  pmem%2Dsigned.sqlite  pmem.sqlite
stats_store.sqlite  winpmem%2Eamd64%2Esys.sqlite  winpmem%2Ex86%2Esys.sqlite
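Side note: the %2D and %2E sequences in those filenames look like percent-encoded '-' and '.' characters, presumably escaped to keep the filenames filesystem-safe. They decode with the standard library (my decoding here, not a GRR API):

```python
import urllib.parse

# Decode the percent-escaped data store filenames from the listing above.
for name in ["pmem%2Dsigned.sqlite", "winpmem%2Eamd64%2Esys.sqlite"]:
    print(urllib.parse.unquote(name))
# pmem-signed.sqlite
# winpmem.amd64.sys.sqlite
```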


After the basic setup worked, I started setting up the remote data store. For this to work I had to set up

  • client credentials, so the GRR server can talk to the data store servers (set on the GRR server and the master server),
  • server credentials, so the slaves can talk to the master (set on all data store servers),
  • the data store master server, which has to know about all the slave servers, and
  • the slave server(s).

I decided to go with two data store servers at first: one master, one slave. First I set up the data master by listing both servers (the master has to be the first entry!), the client credentials, and the server username and password:

> sudo cat /etc/grr/server.local.yaml
Dataserver.server_list:
  - http://54.171.141.180:7000
  - http://54.171.117.245:7000
Dataserver.client_credentials:
  - myawesomeuser:thepassword:rw
Dataserver.server_username: servergroup
Dataserver.server_password: 65f2271f7
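The server password is just a shared secret between the data servers; the short hex string above is one I generated. A quick way to produce such a token (my approach, nothing GRR-specific):

```python
import binascii
import os

# 16 random bytes, hex-encoded: a 32-character shared secret.
token = binascii.hexlify(os.urandom(16)).decode("ascii")
print(token)
```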

Starting it up:

> sudo python grr/server/data_server/data_server.py --config=/etc/grr/grr-server.yaml --master
[...]
Starting Data Master/Server on port 7000 ...

Yay, it works! Now I set up the slave server. This one only needs to know the master and the server credentials; the client credentials will be distributed automatically:

> sudo cat /etc/grr/server.local.yaml
Dataserver.server_list:
  - http://54.171.141.180:7000
Dataserver.server_username: servergroup
Dataserver.server_password: 65f2271f7

Starting the slave (I added --verbose so there is some more info):

> PYTHONPATH=. python grr/server/data_server/data_server.py --config=grr/config/grr-server.yaml --verbose
[...] Registering with data master at 54.171.141.180:7000.
[...] DataServer fully registered.
Starting Data Server on port 7000 ...

At the same time, since our group only consisted of two servers, the master also displayed:
Registered server 54.171.117.245:7000
All data servers have registered!

which indicated that our data store was good to go. I could then set up the main server to actually use the data store server group:
> sudo cat /etc/grr/server.local.yaml
[...]
Dataserver.server_list:
  - http://54.171.141.180:7000
  - http://54.171.117.245:7000
Datastore.implementation: HTTPDataStore
HTTPDataStore.username: myawesomeuser
HTTPDataStore.password: thepassword

Running a quick test with the console confirmed it was working:

> sudo PYTHONPATH=. python grr/tools/console.py --config=/etc/grr/grr-server.yaml
[...]
In [1]:

Now I still had one more server that I wanted to include in my group, so I fired up the data store manager (after making sure that no one was using the data server group):
> sudo PYTHONPATH=. python grr/server/data_server/manager.py --config=grr/config/grr-server.yaml
Manager(2 servers)> servers
Last refresh: Thu Oct 23 12:53:12 2014
Server 0 54.171.141.180:7000 (Size: 100KB, Load: 0)
                14 components 7KB average size
Server 1 54.171.117.245:7000 (Size: 111KB, Load: 0)
                13 components 8KB average size
Manager(2 servers)> ranges
Server 0 54.171.141.180:7000 50% [00000000000000000000, 09223372036854775808[
Server 1 54.171.117.245:7000 50% [09223372036854775808, 18446744073709551616[
Manager(2 servers)>

So both my servers were registered, each held around 100 KB of data in components averaging 7 and 8 KB, and the hash space was distributed evenly:

hex(9223372036854775808)  == '0x8000000000000000L'
hex(18446744073709551616) == '0x10000000000000000L'
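A quick way to double-check those shares is to compute what fraction of the 64-bit hash space each range covers (my helper, not part of GRR):

```python
TOP = 2 ** 64  # size of the unsigned 64-bit hash space

def share(start, end):
    """Fraction of the hash space covered by the half-open range [start, end)."""
    return (end - start) / TOP

# The two ranges reported by the manager above.
ranges = [
    (0, 9223372036854775808),
    (9223372036854775808, 18446744073709551616),
]
for i, (start, end) in enumerate(ranges):
    print("Server %d: %.0f%%" % (i, 100 * share(start, end)))
# Server 0: 50%
# Server 1: 50%
```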

To add a third server, we first need to register it with the data store master:

Manager(2 servers)> addserver 54.171.117.72 7000
Master server allows us to add server 54.171.117.72:7000
Do you really want to add server //54.171.117.72:7000? (y/n) y
=============================================
Operation completed. To rebalance server data you have to do the following:
        1. Add '//54.171.117.72:7000' to Dataserver.server_list in your configuration file.
        2. Start the new server at 54.171.117.72:7000
        3. Run 'rebalance'

The server was added, but it has an empty hash range so far:
Manager(3 servers)> ranges
Server 0 54.171.141.180:7000 50% [00000000000000000000, 09223372036854775808[
Server 1 54.171.117.245:7000 50% [09223372036854775808, 18446744073709551616[
Server 2 54.171.117.72:7000 0% [18446744073709551616, 18446744073709551616[

So I took down all servers, added the third server to the config on the master and the GRR server, and brought everything back up (including the new data store server). Afterwards, I just used the data store manager to do a rebalance operation to actually fill the new server with data:
> sudo PYTHONPATH=. python grr/server/data_server/manager.py --config=grr/config/grr-server.yaml
Manager(3 servers)> ranges
Server 0 54.171.141.180:7000 50% [00000000000000000000, 09223372036854775808[
Server 1 54.171.117.245:7000 50% [09223372036854775808, 18446744073709551616[
Server 2 54.171.117.72:7000 0% [18446744073709551616, 18446744073709551616[
Manager(3 servers)> rebalance
The new ranges will be:
Server 0 54.171.141.180:7000 33% [00000000000000000000, 06148914691236516864[
Server 1 54.171.117.245:7000 33% [06148914691236516864, 12297829382473033728[
Server 2 54.171.117.72:7000 33% [12297829382473033728, 18446744073709551616[
Contacting master server to start re-sharding... OK
The following servers will need to move data:
Server 0 moves 30KB
Server 1 moves 69KB
Server 2 moves 14KB
Proceed with re-sharding? (y/n) y
Rebalance with id e03c2a40-06cf-46f1-ab27-dcc30aa55ce6 fully performed.
Manager(3 servers)> sync
Sync done.
Manager(3 servers)> ranges
Server 0 54.171.141.180:7000 33% [00000000000000000000, 06148914691236516864[
Server 1 54.171.117.245:7000 33% [06148914691236516864, 12297829382473033728[
Server 2 54.171.117.72:7000 33% [12297829382473033728, 18446744073709551616[
Manager(3 servers)> servers
Last refresh: Thu Oct 23 13:09:51 2014
Server 0 54.171.141.180:7000 (Size: 70KB, Load: 0)
                10 components 7KB average size
Server 1 54.171.117.245:7000 (Size: 72KB, Load: 0)
                9 components 8KB average size
Server 2 54.171.117.72:7000 (Size: 83KB, Load: 0)
                10 components 8KB average size
Manager(3 servers)>
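The new split points the manager picked are (almost exactly) thirds of the 64-bit hash space; they differ from i * 2**64 / 3 only by a tiny rounding amount, which you can verify quickly:

```python
TOP = 2 ** 64
# Boundaries reported by the manager after the rebalance.
boundaries = [6148914691236516864, 12297829382473033728]

for b in boundaries:
    print("%.4f%% of the hash space" % (100 * b / TOP))
# ~33.3333% and ~66.6667%
```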


And it worked: all servers ended up roughly equally loaded.

GRR Rapid Response Blog

Hey everyone,

Your favorite open source incident response project, GRR (https://github.com/google/grr), now has a blog!

We are going to use this blog to write about incident response, tricky technical challenges we have encountered, how you can use GRR to do amazing things, and whatever else comes to our minds.
 
We hope you'll find it interesting!
-The GRR team