[HOWTO] Can Graylog tell me who my IoT devices are talking to?
Posted by BourbonInExile

Hi, it's Rob again. For those who don't know me, I'm the North American Engineering Team Lead here at Graylog. I posted a while back about getting Graylog up and running in Docker on my Raspberry Pi. More recently, I posted about getting Graylog to parse the DHCP logs from my router.

This weekend, I finally got around to getting GeoIP lookups set up and combining them with the data from my firewall and my Pi-hole DNS server so I could visualize who's trying to talk to my home network and who my home network is trying to talk to. You can see my tweet about it here.

What am I skipping?

I'm not going to get into the details of installing Pi-hole. You can find that info here. I talked about getting the logs from Pi-hole into Graylog in my first howto post.

Also, if you look at the pipeline rules for my DNS parsing pipeline, you'll see that I have lookup tables for hostname and friendly hostname values. These use a MongoDB data adapter and get populated in my DHCP processing pipeline.
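For reference, that enrichment boils down to a one-line lookup in a pipeline rule. Here's a rough sketch (the table and field names are just illustrative; the real rules are in the repo):

```
rule "add friendly hostname"
when
  has_field("src_ip")
then
  // Pull the friendly name for this IP out of the MongoDB-backed table.
  set_field("source_friendly_hostname",
            lookup_value("friendly_hostname", to_string($message.src_ip)));
end
```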

Setting up GeoIP lookups

To get started on the GeoIP stuff, I recommend you go have a look at Nick's post over on the official Graylog blog. It gives a pretty good overview of the subject. As suggested there, I went with the MaxMind GeoLite2 City and ASN databases.

Once I had the MMDB files downloaded, I realized I needed to put them somewhere Graylog (running in Docker) would be able to see them. I finally settled on adding a new bind mount in my docker-config.yml: the MMDB files live in /mnt/graylog/data on the host, and when I set up the data adapter in Graylog, I told it the files were in /usr/share/graylog/data/geo/.
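If you're following along at home, the compose change is just one more entry under volumes. Something like this (only the volumes line matters; the rest of your Graylog service config stays as-is):

```yaml
services:
  graylog:
    # ...existing image/environment/ports settings...
    volumes:
      # Host directory holding the MaxMind MMDB files, mounted where
      # the Graylog container (and thus the data adapter) can read them.
      - /mnt/graylog/data:/usr/share/graylog/data/geo
```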

DNS Processing

The first thing I did was set up a new Index Set and Stream for my Pi-hole logs. The stream rule is "application_name must match exactly dnsmasq". With all of my Pi-hole logs going into one stream, I now have something to attach a pipeline to.

The first stage of the pipeline simply changes the source value on the messages. The box Pi-hole is running on is called "plex", but I want these logs to be identifiable at a glance as coming from Pi-hole.
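That stage is just one rule. Roughly (the new source value is whatever you want to see on your dashboards):

```
rule "relabel pi-hole source"
when
  to_string($message.source) == "plex"
then
  // Make these messages identifiable at a glance as Pi-hole logs.
  set_field("source", "pihole");
end
```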

The next stage has five rules, one for each of the five types of logs Pi-hole sends: Query, Cached, Forwarded, Blocked, and Reply. One of the frustrations I had with the DNS logs was that the host requesting the DNS lookup only gets mentioned in the Query log. To address this, I added a new MongoDB-backed lookup table: on each Query log, I add an entry using the DNS query as the key and the IP of the requesting machine as the value. In the other four rules, which all include the DNS query, I do a lookup on this table to get back the identity of the host the response is going to.
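Sketched out, the write side and the read side look something like this. The field and table names are illustrative, and lookup_set_value needs a lookup table backed by a writable data adapter (which is exactly what the MongoDB one is for):

```
rule "dns query - remember requester"
when
  has_field("dns_query") && has_field("src_ip")
then
  // Key: the queried domain. Value: the IP that asked for it.
  lookup_set_value("dns_requesters",
                   to_string($message.dns_query),
                   to_string($message.src_ip));
end

rule "dns other - recover requester"
when
  has_field("dns_query") && ! has_field("src_ip")
then
  // Cached/Forwarded/Blocked/Reply logs don't name the requester,
  // so pull it back out of the lookup table.
  set_field("src_ip",
            lookup_value("dns_requesters", to_string($message.dns_query)));
end
```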

In the final stage, I do the GeoIP lookup and add the data to the message. Getting the city and country out of the lookup response was easy enough, but the state is weirdly nested, so I had to get kind of creative to pull it out.
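The GeoIP rule ends up looking a lot like the example in the Graylog docs. Something like this, with my field names (the lookup table name and the exact shape of the response depend on how you configured the data adapter, so double-check with a test lookup):

```
rule "geoip - destination"
when
  has_field("dest_ip")
then
  let geo = lookup("geoip-city", to_string($message.dest_ip));
  set_field("destination_geo_coordinates", geo["coordinates"]);
  set_field("destination_geo_country", geo["country"].iso_code);
  set_field("destination_geo_city", geo["city"].names.en);
  // The state lives deeper in the nested response, which is
  // where the creativity came in.
end
```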

Dashboarding

Once all the data is flowing and the GeoIP enrichment is working, it's time for the fun part: setting up a dashboard. Since one of the selling points of Pi-hole is that it can block nasty sites (ads, trackers, malware, etc.), I wanted to know how many queries were being made and how many were being blocked. For that, I used a couple of Single Number aggregation widgets. I also threw in a Data Table aggregation widget to get counts for the most-queried domains. But the real star of the show is the World Map widget. I'm using my destination_geo_coordinates field for the rollup column (with the limit bumped up to 500) and dropping in a count for the metric.

Thanks to my hostname/friendly hostname lookup tables, I'm also able to do a breakout tab just for our IoT devices (Kasa plugs and bulbs). That tab uses most of the same widgets with the query changed to source_friendly_hostname:("Kasa Smart Wifi Plug Mini", "Kasa Smart Wifi Plug", "Kasa Smart Wi-Fi Plug Mini", "Kasa Smart Wifi Bulb", "Kasa Smart Light Bulb"). It could probably be cleaned up a bit, but it's a pretty good start.


Hopefully this helps a few folks get more useful (or at least interesting) info out of their log data. As always, I've uploaded all the pipeline rules to my GitHub repo. And, of course, I'm happy to try and answer any questions you might have down in the comments.
