Monday, November 17, 2014

Monitor Outgoing Internet Connections - #2 (Interim)

Originally posted at OpenWRT forum at https://forum.openwrt.org/viewtopic.php?pid=254679#p254679 


Hi folks,
short update with option 3 (DNSMASQ logging) after 24 hrs with one device connected:
- browsing experience on iPad is NOT slower than normal - great!
- router is working fine (my log/grep process taking 1% CPU/Mem)
- logfile has grown to 800kB in a day with 7500 log entries (plenty of memory left...)
- download log to PC and quick analysis in Excel (pivot table & little string formatting)
- quick graphs to show #of DNS queries by minute, top-requested domain names

And voila, here you are: 
DNS Queries by Minute/by Domain
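
For anyone who wants to skip Excel: the per-minute counts and the top-domain list can probably be produced on the command line as well. This is only a minimal sketch, assuming the log lines have the layout shown in post #1 (time in field 4, query type in field 8, domain in field 9) and that the log is in chronological order. The function names queries_per_minute and top_domains are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes the logread line layout from post #1, e.g.
#   Sun Nov 16 18:14:56 2014 daemon.info dnsmasq[11659]: query[A] host from 192.168.1.244

# Count DNS queries per minute (relies on the log being in time order).
queries_per_minute() {
    # Field 4 is HH:MM:SS; keep "Mon DD HH:MM" as the bucket key.
    awk '$8 ~ /^query\[/ { print $2, $3, substr($4, 1, 5) }' "$1" | uniq -c
}

# List the most requested domain names, most frequent first.
# Second argument is the number of entries to show (default 10).
top_domains() {
    awk '$8 ~ /^query\[/ { print $9 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

Possible usage on the downloaded file: "queries_per_minute dnsmasq.log" and "top_domains dnsmasq.log 20". Note that A and AAAA queries are counted separately, so most names show up twice per lookup.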



Summary: Option 3 (DNS Query logging)
- no noticeable performance degradation
- logfile size is manageable
- detail level okay (domain name only, not the full URL)
Improvement ideas:
- switch more devices to my modded DNS server (performance? log size?...)
- check that really every connection shows up in DNS log (at least per minute)
- restart log process every day, and offload previous day via Email/FTP
- minimize logfile by writing only relevant text (grep/awk/sed magic)
- create log analysis files in HTML with charting via google chart api?
- start filtering all these advertising trackers
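
On the "grep/awk/sed magic" idea: one way to shrink the logfile might be to keep only date, time, query type, domain and client IP from each line. A rough sketch, again assuming the logread layout from post #1; compact_dns_log is a hypothetical name, and fflush() is there so that lines reach the file promptly when the filter runs on a live stream.

```shell
#!/bin/sh
# Sketch only: reduce dnsmasq query lines on stdin to
#   "Mon DD HH:MM:SS TYPE domain client-ip"
# Lines that are not query lines are dropped.
compact_dns_log() {
    awk '$8 ~ /^query\[/ {
        type = $8
        sub(/^query\[/, "", type)   # strip the leading "query["
        sub(/\]$/, "", type)        # strip the trailing "]"
        print $2, $3, $4, type, $9, $11
        fflush()                    # do not hold lines back in a buffer
    }'
}
```

This could replace the grep from post #1, for example "logread -f | compact_dns_log >> dnsmasq.log &", or be run afterwards on the downloaded file with "compact_dns_log < dnsmasq.log". I have not measured how much space it saves.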
Comments to my commenters:
  • CPU power on router is fine (1%), analysis is done in Excel. I'm not so interested in bandwidth tracking, more in domains/URLs with traffic
  • yes, DNS-logging can be circumvented by changing DNS setting on client. And, my log does not show traffic to IP addresses (without hostnames). I want to log/observe - not control access.

Also, TCPDUMP does NOT run in background even when starting from scripts - google "tcpdump in background" for plenty of people having tried this route (and I did not find a success story).

Stay tuned for updates...

Sunday, November 16, 2014

Monitor Outgoing Internet Connections - #1 (Starting)

Originally posted at OpenWRT forum at https://forum.openwrt.org/viewtopic.php?id=54048 

Hi folks!
Wanted to share my experience and get some feedback and advice. My objective is to log and analyze/visualize all Internet connections of my local network to get visibility of internet usage by individual devices.
Setup: cable modem (100MB down, 6MB up) – with Fritzbox 7390 for WLAN connectivity. Amazingly many networked devices (iPad, Chromebook, smartphone, Internet radio, Skype appliance, home pc, work laptop, Blu-ray player, little NAS, …).
Got me a TP-Link WDR3600 for 44€ at my local Mediamarkt and installed OpenWRT – worked flawlessly. Nice.
Now, how do I capture/log all Internet traffic?

Option 1 – transparent proxy with TinyProxy
Installation worked fine (opkg install tinyproxy luci-app-tinyproxy). Before changing routing for transparent proxy I manually changed client to use proxy. Web performance was noticeably slower. And it proxies HTTP only, for HTTPS I have to install my own certificates on all devices? Is there a simpler way? And honestly, routing on OpenWRT is a piece of work. By default, the logs show the URL without the hostname.
Internet reference (http://www.farville.com/home-networks-a-transparent-proxy-to-monitor-kids/) requires another server for log processing (yes, I already got a Raspberry lying around, but no…).

Option 2 – TCPDUMP
If setting up a transparent proxy and logging is so complex, why not listen to the traffic directly? Wireshark/Tshark is not available on OpenWRT, but TCPdump is.
Installing TCPdump is very easy: “opkg install tcpdump”.
Then use “tcpdump -D” to show all network interfaces, or “tcpdump -q -tttt” to show all connections with timestamp.
tcpdump -q -i br-lan -tttt
2014-11-14 16:51:16.282484 IP iPadwwwi2daycom.lan.53516 > ea-in-f94.1e100.net.https: tcp 0
2014-11-14 16:51:16.282818 IP iPadwwwi2daycom.lan.53516 > ea-in-f94.1e100.net.https: tcp 0
But holy moly, it logs every fart in the air. Every Ping, every ARP, everything. Piping the simplified output (without the actual packet data) to file generates a few MB for a few minutes. And without some magic it only shows the host, not the visited URL.
tcpdump -q -i br-lan -tttt > tcpdump.log
2014-11-14 19:11:04.356741 IP iPadwwwi2daycom.lan.54532 > 195.10.18.43.https: tcp 0
2014-11-14 19:11:04.357311 IP iPadwwwi2daycom.lan.54533 > 195.10.18.43.https: tcp 0
2014-11-14 19:11:04.400291 IP 195.10.18.43.https > iPadwwwi2daycom.lan.54532: tcp 0

I guess I have to invest some time in setting up proper filtering.
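
One filter that might cut the noise a lot: only capture the first packet of each new TCP connection (SYN set, ACK not set) instead of every packet. A sketch under assumptions: the filter uses standard pcap filter syntax, the tcp[tcpflags] form only matches IPv4, and it ignores UDP, ping and ARP completely. NEW_CONN_FILTER and log_new_connections are names I made up.

```shell
#!/bin/sh
# Sketch only: pcap filter for TCP packets with SYN set and ACK clear,
# i.e. the opening packet of each new connection (IPv4 only).
NEW_CONN_FILTER='tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn'

# Hypothetical helper: print one line per new TCP connection on an interface.
# Same flags as in the commands above; the interface defaults to br-lan.
log_new_connections() {
    tcpdump -q -i "${1:-br-lan}" -tttt "$NEW_CONN_FILTER"
}
```

Usage would be something like "log_new_connections br-lan > tcpdump.log". It still shows only hosts, not URLs.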
But here is the showstopper: tcpdump does not run in background. After you kill the SSH session, TCPdump will stop too (even if you run with “&” parameter). Did not find any successful recipe on the Internet (tried “screen” too).

Option 3  - DNS logging
Another direction is to monitor all DNS requests coming from the local network. OpenDNS offers some services/functionality here. Of course, this will only display the Internet host and not the full URL either.
Apparently OpenWRT ships with DNSMASQ for DHCP and DNS. The manpage at http://www.thekelleys.org.uk/dnsmasq/docs/dnsmasq-man.html shows us how to manipulate the DNS answers for a short Time-To-Live – so that every website visit triggers a new DNS query (even if the client should remember the correct IP from a minute ago).
First install text editor nano (alternative to VI) via “opkg install nano”.  Now configure DNSMASQ with “nano /etc/dnsmasq.conf”.
Add the following lines:
# Set the TTL value returned in answers from the authoritative server.
max-ttl=0
auth-ttl=0
Finally restart dnsmasq with “reboot” or “killall dnsmasq” and “/etc/init.d/dnsmasq start”.
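
A note on the query[...] lines used in the next step: they come from dnsmasq's query logging, which the log-queries option switches on. Whether it is already enabled on a given OpenWRT install is something to check, so treat this as an assumption. The small helper below (ensure_conf_line is a made-up name) appends a line to a config file only if it is not there yet, so it can be run more than once.

```shell
#!/bin/sh
# Sketch only: append a line to a config file unless an identical
# line already exists in it.
ensure_conf_line() {
    conf_file="$1"
    line="$2"
    grep -qxF -e "$line" "$conf_file" 2>/dev/null || printf '%s\n' "$line" >> "$conf_file"
}

# Example (on the router, followed by a dnsmasq restart):
#   ensure_conf_line /etc/dnsmasq.conf log-queries
```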
Now you can read every DNS query on syslog with “logread | grep "query\[A"” (try logread to see every message). And with “logread -f | grep "query\[A" >> dnsmasq.log &” we write all new entries to a logfile in the background (can disconnect from SSH session). The logfile is only a few dozen kB after an hour and looks like this:
Sun Nov 16 18:14:56 2014 daemon.info dnsmasq[11659]: query[A] www.facebook.com from 192.168.1.244
Sun Nov 16 18:14:56 2014 daemon.info dnsmasq[11659]: query[AAAA] www.facebook.com from 192.168.1.244
Sun Nov 16 18:14:57 2014 daemon.info dnsmasq[11659]: query[A] farm.plista.com from 192.168.1.244
Sun Nov 16 18:14:57 2014 daemon.info dnsmasq[11659]: query[AAAA] farm.plista.com from 192.168.1.244
Sun Nov 16 18:14:57 2014 daemon.info dnsmasq[11659]: query[A] csi.gstatic.com from 192.168.1.244
Sun Nov 16 18:14:57 2014 daemon.info dnsmasq[11659]: query[AAAA] csi.gstatic.com from 192.168.1.244
Sun Nov 16 18:14:58 2014 daemon.info dnsmasq[11659]: query[A] pubads.g.doubleclick.net from 192.168.1.244
Sun Nov 16 18:14:58 2014 daemon.info dnsmasq[11659]: query[AAAA] pubads.g.doubleclick.net from 192.168.1.244
Sun Nov 16 18:14:58 2014 daemon.info dnsmasq[11659]: query[A] www.google-analytics.com from 192.168.1.244
Sun Nov 16 18:14:58 2014 daemon.info dnsmasq[11659]: query[AAAA] www.google-analytics.com from 192.168.1.244
Sun Nov 16 18:14:58 2014 daemon.info dnsmasq[11659]: query[AAAA] partnerad.l.doubleclick.net from 192.168.1.244
Nice, excellent! Small logfile, every webvisit on there, timestamp and IP address of local host. Now I just have to filter out all the advertising BS associated with every website (create your own adblocker by manipulating the DNS records for all these trackers).
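
For the do-it-yourself adblocker idea, dnsmasq has an address=/domain/ip option that answers a domain (and its subdomains) with a fixed IP. A minimal sketch of turning a list of tracker domains into such lines; make_blocklist and the file name trackers.txt are hypothetical, and answering with 0.0.0.0 is just one common choice, not something I have tested here.

```shell
#!/bin/sh
# Sketch only: read domains from stdin (one per line, blank lines and
# lines starting with "#" are skipped) and print dnsmasq "address="
# lines that answer each domain with 0.0.0.0.
make_blocklist() {
    awk 'NF && $1 !~ /^#/ { print "address=/" $1 "/0.0.0.0" }'
}
```

Possible usage: "make_blocklist < trackers.txt >> /etc/dnsmasq.conf", then restart dnsmasq as described above.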
Now I just have to download the logfile, and display connections over time (still have to figure this one out).


Summary:
I am surprised that there are not that many posts on the Internet for this specific use case (why has nobody posted a simple how-to?)
I am surprised how much traffic a simple website generates (yes, I knew this before, but seeing it is different).
I am curious if this forum can give me some new pointers/ideas that actually work!