I love programming and programming logic, so the Splunk Search Processing Language (SPL) was a natural jump when I began learning it for my daily work as a Threat Hunter. Recently, I started tackling several problems around teasing user behavior analytics out of the varied logs and other data that Splunk ingests.
The query below shows login times, source IP addresses, and login locations (geolocated from the source IP address) for a particular user. A login would be suspect if any of the following criteria were met:
- A login from the same user occurs from a different IP address/location simultaneously
- A login from the same user occurs from a different IP address/location shortly after a previous login, at a distance too great to have been traveled in the time between them
- A login from the same user occurs from a different IP address/location and there is external evidence that it was not that user (i.e., they had logged off for the evening, they did not travel to the location indicated, etc.)
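The first two criteria can be checked mechanically once each login is paired with the same user's previous login. The Python sketch below only illustrates that logic and is not part of the Splunk search: the LoginPair record, the classify function, and the 900 km/h threshold (roughly airliner cruising speed) are hypothetical choices, and the distance between the two locations is assumed to be already computed. The third criterion needs evidence from outside the logs, so it is left to the analyst.

```python
from dataclasses import dataclass

# Hypothetical threshold (roughly airliner cruising speed); tune it for your users.
MAX_PLAUSIBLE_KMH = 900.0


@dataclass
class LoginPair:
    """Hypothetical record: one login plus the same user's previous login."""
    prev_src: str
    src: str
    prev_time: float    # epoch seconds of the previous login
    time: float         # epoch seconds of this login
    distance_km: float  # distance between the two geolocated IPs (assumed precomputed)


def classify(pair: LoginPair, max_kmh: float = MAX_PLAUSIBLE_KMH) -> str:
    """Label a login pair against the first two criteria."""
    if pair.src == pair.prev_src:
        return "same-ip"            # not a change of IP at all
    elapsed_s = pair.time - pair.prev_time
    if elapsed_s <= 0:
        return "simultaneous"       # criterion 1: different IP at the same time
    speed_kmh = pair.distance_km / (elapsed_s / 3600.0)
    if speed_kmh > max_kmh:
        return "improbable-travel"  # criterion 2: too far, too fast
    return "plausible"


# Example: 5,000 km between logins one hour apart implies 5,000 km/h.
label = classify(LoginPair("198.51.100.7", "203.0.113.9", 0.0, 3600.0, 5000.0))
```

Dividing by elapsed time turns "how far apart" into "how fast would they have had to travel", which is easier to threshold than raw distance.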
I'm sharing this in the hope that it helps you find the Bad in your enterprise.
There are a couple of requirements for making this query work in its entirety. The first is (somewhat obviously) data containing user login information, such as Windows logs, Syslog data, Active Directory logs, IPA or RADIUS logs, web authentication data, etc. If you have all of these, even better: covering every avenue of authentication makes the results more complete.
Next, you'll need the Splunk Common Information Model (CIM) app, which can be found here:
https://splunkbase.splunk.com/app/1621/
This app normalizes your data so that the same kind of data can be addressed the same way no matter where it came from (e.g., "src_ip" instead of "srcip" or "source_ip"). Essentially, it maps like data from every source and sourcetype into shared data models, so you can query it one way instead of with a long list of OR statements. Note that the query also calls two macros, get_asset and globedistance, that the CIM app does not provide, so they need to exist in your environment as well (get_asset, for example, is part of the asset and identity framework in Splunk Enterprise Security).
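To make the normalization idea concrete, here is a small Python illustration. In Splunk, CIM mappings are done through configuration (for example, field aliases in technology add-ons), not code; the FIELD_ALIASES table and normalize function below are hypothetical and only show why one shared field name replaces a chain of OR clauses.

```python
# Hypothetical alias table: vendor-specific field names -> one common name.
FIELD_ALIASES = {
    "srcip": "src_ip",
    "source_ip": "src_ip",
}


def normalize(event: dict) -> dict:
    """Return a copy of the event with aliased field names renamed."""
    return {FIELD_ALIASES.get(key, key): value for key, value in event.items()}


raw_events = [
    {"sourcetype": "firewall", "srcip": "198.51.100.7"},
    {"sourcetype": "vpn", "source_ip": "203.0.113.9"},
]

# After normalization, one condition covers every source,
# instead of (srcip=X OR source_ip=X OR ...).
matches = [e for e in map(normalize, raw_events) if e.get("src_ip") == "203.0.113.9"]
```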
Here's the query:
| from datamodel:"Authentication"
| search user="*USERNAME*"
| `get_asset(src)`
| iplocation src
| sort 0 + _time
| eval session_lat=if(isnull(src_lat), lat, src_lat)
| eval session_lon=if(isnull(src_long), lon, src_long)
| eval session_city=if(isnull(src_city), City, src_city)
| where isnotnull(session_lat) and isnotnull(session_lon)
| sort 0 + _time
| streamstats current=t window=2 earliest(session_lat) as prev_lat, earliest(session_lon) as prev_lon, earliest(session_city) as prev_city, earliest(_time) as prev_time, earliest(src) as prev_src, latest(user_bunit) as user_bunit by user
| where (src!=prev_src)
| `globedistance(session_lat,session_lon,prev_lat,prev_lon,kilometers)`
| table _time src distance session_city session_lat session_lon
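The middle of the query (the eval fallbacks, the where filters, the sort, and the streamstats pairing) can be hard to picture. The following Python sketch models those steps on a list of event dictionaries. The field names match the query, but the coalesce and pair_logins functions and the sample events are hypothetical, and the sketch keeps a separate two-event window per user rather than reproducing every streamstats option.

```python
from typing import Optional


def coalesce(primary: Optional[float], fallback: Optional[float]) -> Optional[float]:
    """Mirror eval x=if(isnull(primary), fallback, primary)."""
    return fallback if primary is None else primary


def pair_logins(events: list[dict]) -> list[dict]:
    """Pair each geolocated login with the same user's previous one.

    Events carry _time, user, src, optional asset fields (src_lat, src_long)
    and optional iplocation fields (lat, lon).
    """
    sessions = []
    for e in events:
        lat = coalesce(e.get("src_lat"), e.get("lat"))
        lon = coalesce(e.get("src_long"), e.get("lon"))
        if lat is None or lon is None:
            continue  # where isnotnull(session_lat) and isnotnull(session_lon)
        sessions.append({**e, "session_lat": lat, "session_lon": lon})

    sessions.sort(key=lambda s: s["_time"])  # sort 0 + _time

    previous_by_user: dict[str, dict] = {}
    rows = []
    for s in sessions:
        # Two-event window: the user's previous login, or this one if it is the first.
        prev = previous_by_user.get(s["user"], s)
        previous_by_user[s["user"]] = s
        if s["src"] != prev["src"]:  # where (src!=prev_src)
            rows.append({
                "_time": s["_time"], "src": s["src"],
                "prev_src": prev["src"], "prev_time": prev["_time"],
                "session_lat": s["session_lat"], "session_lon": s["session_lon"],
                "prev_lat": prev["session_lat"], "prev_lon": prev["session_lon"],
            })
    return rows


sample = [
    {"_time": 100, "user": "jdoe", "src": "198.51.100.7", "lat": 40.7, "lon": -74.0},
    {"_time": 200, "user": "jdoe", "src": "198.51.100.7", "lat": 40.7, "lon": -74.0},
    # Asset coordinates win over iplocation coordinates when both exist.
    {"_time": 300, "user": "jdoe", "src": "203.0.113.9",
     "src_lat": 51.5, "src_long": -0.1, "lat": 0.0, "lon": 0.0},
    {"_time": 250, "user": "jdoe", "src": "192.0.2.1"},  # no coordinates: dropped
]
rows = pair_logins(sample)
```

Only the 300-second login survives: the first two logins share an IP, and the event without coordinates never reaches the pairing step.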
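Below, the query is broken out line by line, with each SPL statement followed by a description of what it does.
- | from datamodel:"Authentication"
  Pull data from the "Authentication" data model (from Splunk_SA_CIM, the CIM app).
- | search user="*USERNAME*"
  Match the username (replace USERNAME with the username you are investigating; the wildcards also match any account name that contains it).
- | `get_asset(src)`
  Run the get_asset macro to enrich the source (src) with asset information, such as src_lat, src_long, and src_city, when the source is a known asset.
- | iplocation src
  Geolocate the source IP (adds fields such as lat, lon, and City).
- | sort 0 + _time
  Sort events by time in ascending order (the 0 removes sort's default limit of 10,000 results).
- | eval session_lat=if(isnull(src_lat), lat, src_lat)
  Take the latitude from the asset data (src_lat) if present, otherwise from iplocation (lat).
- | eval session_lon=if(isnull(src_long), lon, src_long)
  Take the longitude from the asset data (src_long) if present, otherwise from iplocation (lon).
- | eval session_city=if(isnull(src_city), City, src_city)
  Take the city the same way (OPTIONAL; only used in the final table).
- | where isnotnull(session_lat) and isnotnull(session_lon)
  Keep only events that have both a latitude and a longitude.
- | sort 0 + _time
  Sort again by ascending time so that streamstats sees the logins in chronological order.
- | streamstats current=t window=2 earliest(session_lat) as prev_lat, earliest(session_lon) as prev_lon, earliest(session_city) as prev_city, earliest(_time) as prev_time, earliest(src) as prev_src, latest(user_bunit) as user_bunit by user
  Look at a two-event window (the current event and the one before it) and copy the earlier event's latitude, longitude, city, time, and source IP into prev_* fields for comparison; user_bunit is carried along if your identity data provides it. For a user's first event the window holds only that event, so the prev_* values equal its own. When window is set, streamstats uses a single global window by default, so if the search matches more than one user, adding global=false keeps a separate window per user.
- | where (src!=prev_src)
  Keep only events whose source IP differs from the previous event's (this also drops each user's first event).
- | `globedistance(session_lat,session_lon,prev_lat,prev_lon,kilometers)`
  Run the globedistance macro to compute the distance, in kilometers, between the current and previous geolocated IPs (returned in the distance field).
- | table _time src distance session_city session_lat session_lon
  Display a table with the time, source IP, distance from the previous (different) IP, city, latitude, and longitude.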
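The globedistance macro's definition isn't shown here, so as an assumption, the sketch below computes the great-circle (haversine) distance in kilometers on a spherical Earth with a mean radius of 6,371 km; the real macro may use a different formula or radius. The haversine_km name is hypothetical.

```python
import math

# Assumption: mean Earth radius in km; the real macro may use another value.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp guards against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Arguments in the same order as the query: session_lat, session_lon, prev_lat, prev_lon.
distance = haversine_km(51.5074, -0.1278, 40.7128, -74.0060)
```

With these London and New York coordinates the result is roughly 5,570 km.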
That query took a lot of time to research, figure out, and get right, so I'm pretty proud of it. I'd like to thank the good folks over at GoSplunk for their service as a repository of useful Splunk queries, and (of course) Google for... well... letting me google all the obscure InfoSec topics I look into.
If you see something wrong, or something that could be executed better, please drop me a line and let me know.
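References:
- splunk.com (Splunk)
- docs.splunk.com/Splexicon (Splexicon, the Splunk glossary)
- splunkbase.splunk.com/app/1621 (Splunk Common Information Model app)
- gosplunk.com (GoSplunk query repository)
- google.com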