Friday, October 24, 2014

Using the distributed data store in GRR.

We have found that the scalability of a GRR installation depends highly on which data store backend is used. Currently, we provide two different data stores, a MongoDB and a MySQL based one, that are pretty simple to use but also, sadly, not very scalable. To mitigate this problem, we have recently released a new distributed data store that offers a huge performance improvement over all the other data stores. This data store is a bit harder to set up than the others, but we think it's really worth it performance-wise. In this blog post I'll describe how I installed GRR using a small group of data store servers.

I started out by getting four machines on Amazon EC2, all of them running Ubuntu 14.04.

I planned to have one main server running all the GRR processes:

54.76.238.125

And the rest are data store servers:

54.171.141.180 <-- the first one is the master server.
54.171.117.245
54.171.117.72

First, I installed GRR on all those machines using the provided install script (install_script_ubuntu.sh). This is just a little easier because it automatically installs all the dependencies we need. In the end, though, I was going to use the latest code from the repository (highly recommended if you want to try the latest features!), so I did a:

> git clone https://github.com/google/grr.git

and

> python setup.py build
> sudo python setup.py install


GRR usually uses two config files:
  • /etc/grr/grr-server.yaml : This one comes with every server release and should not be touched.
  • /etc/grr/server.local.yaml : This is the one where all the local modifications go. On the main server where I installed the whole GRR package, this already contains my customized settings, like the keys I generated. This is where I'm going to put all my modifications.

The first thing I did was to switch away from the MongoDB backend. I wanted to try the SqliteDataStore, so I appended to the config file:

Datastore.implementation: SqliteDataStore
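
The SQLite store keeps its files under /tmp by default (as we'll see in a moment), so for anything beyond a quick test it's worth pointing it at a persistent directory. If I remember correctly, the option for that is Datastore.location, but verify the exact name against the defaults in grr-server.yaml before relying on it:

Datastore.location: /var/grr-datastore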

I did a quick restart using the initctl_switch script and, while I was there, also switched to multiprocessing:

sudo /usr/share/grr/scripts/initctl_switch.sh multi

(If you already have grr running in multiprocess mode, use

sudo /usr/share/grr/scripts/initctl_switch.sh restart

instead.)

I confirmed that all processes were up and running:

> ps -ef | grep grr
root     13297     1 35 13:34 ?        00:00:00 /usr/bin/python /usr/bin/grr_server --start_http_server --config=/etc/grr/grr-server.yaml
root     13310     1 34 13:34 ?        00:00:00 /usr/bin/python /usr/bin/grr_server --start_ui --config=/etc/grr/grr-server.yaml
root     13323     1 33 13:34 ?        00:00:00 /usr/bin/python /usr/bin/grr_server --start_enroller --config=/etc/grr/grr-server.yaml
root     13336     1 33 13:34 ?        00:00:00 /usr/bin/python /usr/bin/grr_server --start_worker --config=/etc/grr/grr-server.yaml

and that I could reach the GUI at http://54.76.238.125:8000/.
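
If you'd rather check from the command line, requesting just the response headers from the AdminUI works too (my own quick check, not part of the original setup; any HTTP response means the UI process is serving):

> curl -sI http://54.76.238.125:8000/

It also seemed like the server was now using the SQLite data store (with the default path in /tmp):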

> ls /tmp/grr-datastore/
aff4.sqlite  config.sqlite  files.sqlite  foreman.sqlite  pmem%2Dsigned.sqlite  pmem.sqlite  stats_store.sqlite  winpmem%2Eamd64%2Esys.sqlite  winpmem%2Ex86%2Esys.sqlite
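
The funny-looking %2D and %2E in those filenames are just URL-encoded '-' and '.' characters. A quick way to decode them (plain Python 2, matching what GRR ran on at the time):

> python -c "import urllib; print urllib.unquote('winpmem%2Eamd64%2Esys')"
winpmem.amd64.sys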


After the basic setup worked, I moved on to the remote data store. For this to work, I had to configure

  • client credentials so the GRR server can talk to the data store servers (need to be set on the GRR server and the master server)
  • server credentials so the slaves can talk to the master (need to be set on all data store servers)
  • the data store master server - has to know about all the slave servers.
  • the slave server(s).
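
To keep track of all this, here's a quick summary of which of these options (all shown in detail below) go into which machine's server.local.yaml:

GRR server:   Datastore.implementation, Dataserver.server_list, HTTPDataStore.username, HTTPDataStore.password
Master:       Dataserver.server_list (all servers), Dataserver.client_credentials, Dataserver.server_username, Dataserver.server_password
Slave(s):     Dataserver.server_list (master only), Dataserver.server_username, Dataserver.server_password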

I decided to go with two data store servers at first - one master, one slave. First I set up the data store master by listing both servers (the master has to be the first entry!), the client credentials, and the server username and password:
> sudo cat /etc/grr/server.local.yaml
Dataserver.server_list:
  - http://54.171.141.180:7000
  - http://54.171.117.245:7000
Dataserver.client_credentials:
  - myawesomeuser:thepassword:rw
Dataserver.server_username: servergroup
Dataserver.server_password: 65f2271f7

Starting it up:

> sudo python grr/server/data_server/data_server.py --config=/etc/grr/grr-server.yaml --master
[...]
Starting Data Master/Server on port 7000 ...
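
To double-check that the master was actually up and listening, a plain port check does the trick (nothing GRR-specific about this):

> sudo netstat -nltp | grep 7000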

Yay, it works! Now I set up the slave server. This one only needs to know the master and the server credentials; the client credentials will be distributed automatically:

> sudo cat /etc/grr/server.local.yaml
Dataserver.server_list:
  - http://54.171.141.180:7000
Dataserver.server_username: servergroup
Dataserver.server_password: 65f2271f7

Starting the slave (I added --verbose so there is some more info in the output):

> PYTHONPATH=. python grr/server/data_server/data_server.py --config=grr/config/grr-server.yaml --verbose
[...]
Registering with data master at 54.171.141.180:7000.
[...]
DataServer fully registered.
Starting Data Server on port 7000 ...

At the same time, since our group only consisted of two servers, the master also displayed:
Registered server 54.171.117.245:7000
All data servers have registered!

which indicates that our data store is now good to go. I could then set up the main server to actually use the data store server group:
> sudo cat /etc/grr/server.local.yaml
[...]
Dataserver.server_list:
  - http://54.171.141.180:7000
  - http://54.171.117.245:7000
Datastore.implementation: HTTPDataStore
HTTPDataStore.username: myawesomeuser
HTTPDataStore.password: thepassword

Running a quick test with the console confirmed that it was working:

> sudo PYTHONPATH=. python grr/tools/console.py --config=/etc/grr/grr-server.yaml
[...]
In [1]:

Now I still had one more server that I wanted to include in my group, so I fired up the data store manager (after making sure that no one was using the data server group):
> sudo PYTHONPATH=. python grr/server/data_server/manager.py --config=grr/config/grr-server.yaml
Manager(2 servers)> servers
Last refresh: Thu Oct 23 12:53:12 2014
Server 0 54.171.141.180:7000 (Size: 100KB, Load: 0)
                14 components 7KB average size
Server 1 54.171.117.245:7000 (Size: 111KB, Load: 0)
                13 components 8KB average size
Manager(2 servers)> ranges
Server 0 54.171.141.180:7000 50% [00000000000000000000, 09223372036854775808[
Server 1 54.171.117.245:7000 50% [09223372036854775808, 18446744073709551616[
Manager(2 servers)>

So both servers were registered, each holding around 100 KB in components of 7 and 8 KB average size, and the hash space was distributed evenly between them:

hex(9223372036854775808)  == '0x8000000000000000L'
hex(18446744073709551616) == '0x10000000000000000L'
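
In other words, the boundary between the two servers sits at exactly 2**63, half of the full 64 bit hash space of 2**64, which a quick check in a Python shell confirms:

>>> 9223372036854775808 == 2**63
True
>>> 9223372036854775808 / 2.0**64
0.5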

To add a third server, we first need to add it to the data store master:

Manager(2 servers)> addserver 54.171.117.72 7000
Master server allows us to add server 54.171.117.72:7000
Do you really want to add server //54.171.117.72:7000? (y/n) y
=============================================
Operation completed.
To rebalance server data you have to do the following:
        1. Add '//54.171.117.72:7000' to Dataserver.server_list in your configuration file.
        2. Start the new server at 54.171.117.72:7000
        3. Run 'rebalance'

The server was added, but it had an empty hash range so far:
Manager(3 servers)> ranges
Server 0 54.171.141.180:7000 50% [00000000000000000000, 09223372036854775808[
Server 1 54.171.117.245:7000 50% [09223372036854775808, 18446744073709551616[
Server 2 54.171.117.72:7000 0% [18446744073709551616, 18446744073709551616[
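
Following the manager's instructions, the new entry has to be appended to Dataserver.server_list on the master and on the GRR server, so that section of both config files ends up looking like this (just a sketch based on the addresses above; the rest of each file stays unchanged):

Dataserver.server_list:
  - http://54.171.141.180:7000
  - http://54.171.117.245:7000
  - http://54.171.117.72:7000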

So I took down all servers, added the third server to the config on the master and the GRR server (as sketched above), and brought everything back up (including the new data store server). Afterwards, I just used the data store manager to run a rebalance operation to actually fill the new server with data:
> sudo PYTHONPATH=. python grr/server/data_server/manager.py --config=grr/config/grr-server.yaml
Manager(3 servers)> ranges
Server 0 54.171.141.180:7000 50% [00000000000000000000, 09223372036854775808[
Server 1 54.171.117.245:7000 50% [09223372036854775808, 18446744073709551616[
Server 2 54.171.117.72:7000 0% [18446744073709551616, 18446744073709551616[
Manager(3 servers)> rebalance
The new ranges will be:
Server 0 54.171.141.180:7000 33% [00000000000000000000, 06148914691236516864[
Server 1 54.171.117.245:7000 33% [06148914691236516864, 12297829382473033728[
Server 2 54.171.117.72:7000 33% [12297829382473033728, 18446744073709551616[
Contacting master server to start re-sharding... OK
The following servers will need to move data:
Server 0 moves 30KB
Server 1 moves 69KB
Server 2 moves 14KB
Proceed with re-sharding? (y/n) y
Rebalance with id e03c2a40-06cf-46f1-ab27-dcc30aa55ce6 fully performed.
Manager(3 servers)> sync
Sync done.
Manager(3 servers)> ranges
Server 0 54.171.141.180:7000 33% [00000000000000000000, 06148914691236516864[
Server 1 54.171.117.245:7000 33% [06148914691236516864, 12297829382473033728[
Server 2 54.171.117.72:7000 33% [12297829382473033728, 18446744073709551616[
Manager(3 servers)> servers
Last refresh: Thu Oct 23 13:09:51 2014
Server 0 54.171.141.180:7000 (Size: 70KB, Load: 0)
                10 components 7KB average size
Server 1 54.171.117.245:7000 (Size: 72KB, Load: 0)
                9 components 8KB average size
Server 2 54.171.117.72:7000 (Size: 83KB, Load: 0)
                10 components 8KB average size
Manager(3 servers)>
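
As a quick sanity check, the new boundaries again split the hash space into (almost exactly) even thirds:

>>> round(6148914691236516864 / 2.0**64, 6)
0.333333
>>> round(12297829382473033728 / 2.0**64, 6)
0.666667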


And it worked: all servers were roughly equally loaded.
