Splunk

Main Roles

  • Admin: can install apps, create knowledge objects for all users.
  • Power user: can create and share knowledge objects for users of an app and do real-time searches.
  • User: only sees their own knowledge objects and those shared with them.

Apps

Apps are preconfigured environments, like workspaces built to solve a specific use case. Splunk Enterprise comes with 2 default apps:

  • Home app: quick place to explore and launch other apps, and to add custom dashboards.
  • Search and Reporting app: default interface for searching and analyzing data.

Source types

Within the Search and Reporting app, click on Data Summary to list available hosts, sources and source types.

Windows Event Log (sourcetype=WinEventLog)

Splunk usually contains security, system and application logs – collected via Splunk Universal Forwarder.

Microsoft Sysmon

A free Sysinternals tool available from Microsoft. Events are collected by the Splunk Universal Forwarder.

Search modes

  • Fast mode: only returns the default fields and the fields necessary to fulfill the search; field discovery is disabled.
  • Verbose mode: returns as many fields and as much event data as possible, discovering all the fields it can.
  • Smart mode (default): toggles behavior based on the type of search you are running.

Search Processing Language (SPL)

SPL is designed by Splunk for use with Splunk software. SPL encompasses all the search commands and their functions, arguments, and clauses. Its syntax was originally based on the Unix pipeline and SQL. The scope of SPL includes data searching, filtering, modification, manipulation, insertion, and deletion.

  • Search Terms: foundation of search queries
  • Commands: tells Splunk what to do with search results (create charts, compute statistics, and formatting).
  • Functions: explains how we want to chart, compute and evaluate the results.
  • Arguments: variables to apply to the functions
  • Clauses: explains how we want results grouped or defined

Search Commands

The pipe “|” sends the results to next component (like in Unix). Commands starting with “|” do not search into raw log files.

Search returns a list of events.

Index adds structure to unstructured data.

BEST PRACTICE: Select a time range (on the right of the search bar) for faster results. After time, the most efficient default fields are index, source, host and sourcetype.

any words you want

There is a Save As button on the top right corner to save search as a knowledge object.

Event types

Common searches can be saved by clicking Save As -> Event Type. Event types provide a way to categorize data. An event type can later be used in the search bar as:

eventtype=<event name>
eventtype=pentesting

Using wildcard

Will return “fail”, “failure”, “failed”, and more…

fail*
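
A minimal sketch to try the wildcard without any indexed data, assuming a made-up field called msg (not part of the original notes). It generates four sample values with makeresults and keeps only those matching fail*:

```spl
| makeresults
| eval msg=split("fail,failure,failed,success", ",")
| mvexpand msg
| search msg="fail*"
| table msg
```

The row with “success” should be dropped, leaving the three values that start with “fail”.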

Escape characters

info="user \"john\" not in database"

Comments

``` | this is commented ```

Boolean operators

NOT case sensitive: search terms and field values.
CASE sensitive: Boolean operators (NOT, OR, AND must be uppercase), values referenced inside commands (in “replace www1 with server1”, www1 is case sensitive) and field names.

AND is implied when using multiple search terms without specifying Boolean operators. Order of evaluation (unless in parentheses) is: NOT, OR, AND.

failed NOT password
failed OR password
failed password
failed AND password
failed NOT (success OR accepted)
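
A small sketch of the order of evaluation, using three generated events and hypothetical fields A, B and C (names chosen only for this illustration). If OR is evaluated before AND as described above, the search below is read as A=1 AND (B=2 OR C=3):

```spl
| makeresults count=3
| streamstats count AS n
| eval A=if(n<=2, 1, 0), B=if(n=1, 2, 0), C=if(n>=2, 3, 0)
| search A=1 AND B=2 OR C=3
| table n A B C
```

Under that reading, rows n=1 and n=2 should be returned. Row n=3 (A=0, C=3) would only match if the expression were read as (A=1 AND B=2) OR C=3. When in doubt, add parentheses.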

Exact terms

"failed password"

Fields and field operators

“| fields” includes or excludes specific fields from search results. It defaults to inclusion (“| fields +status”), but can remove fields using “| fields -status”.

Field operators:

  • “=”, “!=”: Can be used with numerical or string values.
  • “>”, “>=”, “<”, “<=”: only with numerical values

index=security sourcetype=linux_secure action=failure host!="mail*"

index=web status IN ("500", "503", "505")

index=web status IN ("500", "503", "505")
| fields status
| stats count by status
| rename status as "HTTP Status"

Will add product_name and price to returned fields.

index=web sourcetype=access_combined product_name=*
| fields product_name price

Will remove the raw data that is included by default.

index=web sourcetype=access_combined product_name=*
| fields -_raw

Will remove BOTH product_name and price from fields because there is a space between “-” and the field name.

index=web sourcetype=access_combined product_name=*
| fields - product_name price

Will remove ONLY product_name because there is NO space between “-” and product_name.

index=web sourcetype=access_combined product_name=*
| fields -product_name price

Events where status field is not 200 – excluding events without a status field

status!=200

Events that do not have a field of status=200 – including events without a status field

NOT status=200
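
To see the difference on generated data, here is a sketch with three sample events: one with status=200, one with status=404, and one with no status field at all (case returns no value for the third row). The row counter n is only there for illustration:

```spl
| makeresults count=3
| streamstats count AS n
| eval status=case(n=1, 200, n=2, 404)
| search NOT status=200
| table n status
```

This should return rows n=2 and n=3. Replacing the search line with “| search status!=200” should return only n=2, because the event without a status field is excluded.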

| table

Best human readable format 😉 Use the fields command before table command to limit the search to return only these fields before formatting them. The fields command MUST include all the fields used by the table command.

The table command is similar to the fields command in that only the specified fields are kept in the results. It is a transforming command that retains the data in a tabulated format.

index=web sourcetype=access_combined product_name=*
| fields JSESSIONID product_name price
| table JSESSIONID product_name price

| dedup

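Remove duplicates. NOT like the DISTINCT function in SQL: dedup keeps only the first event returned for each unique value (or combination of values) of the specified fields.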

Be careful as this can keep only one event when providing the wrong field.

index=web sourcetype=access_combined product_name=*
| fields JSESSIONID product_name price
| table JSESSIONID product_name price
| dedup JSESSIONID
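
A quick sketch of what dedup keeps, on four generated events where two session IDs each appear twice. The values “SD1”, “SD2” and “Item N” are invented for the example:

```spl
| makeresults count=4
| streamstats count AS n
| eval JSESSIONID=if(n<=2, "SD1", "SD2"), product_name="Item ".n
| dedup JSESSIONID
| table n JSESSIONID product_name
```

Only the first event of each session should remain (n=1 and n=3), so the products on rows 2 and 4 disappear from the results. This is why picking the wrong field can silently drop data.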

| addtotals

Will compute the sum of all numeric fields (per row) by default and will add a “Total” column. Use “fieldname” to rename the column.

index=sales sourcetype=vendor product_name=* VendorCountry IN ("United States", "Canada")
| chart sum(price) over product_name by VendorCountry
| addtotals

| addtotals fieldname="Total By Product"

Will add a column summary, displayed as an added row at the end without a label. Use “label=” to set a name, and “labelfield” to choose in which existing column to display the result.

| addtotals col=true label="Total Sales" labelfield="product_name"

Remove the row total (the added “Total” column).

| addtotals col=true label="Total Sales" labelfield="product_name" row=false
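
A self-contained sketch combining the options above on two generated rows. The product names and the Canada and USA columns are placeholder values, not data from the original searches:

```spl
| makeresults count=2
| streamstats count AS n
| eval product_name=if(n=1, "Game A", "Game B"), Canada=n*10, USA=n*100
| fields - _time n
| addtotals col=true label="Total Sales" labelfield="product_name" fieldname="Total By Product"
```

Each row should get a “Total By Product” column (110 and 220), and an extra row labelled “Total Sales” in the product_name column should hold the column sums. Adding row=false would drop the per-row column and keep only the summary row.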

| fieldformat

Format the appearance of field values without changing the underlying raw data.

index=sales sourcetype=vendor product_name=* VendorCountry IN ("United States", "Canada")
| chart sum(price) over product_name by VendorCountry
| addtotals
| fieldformat Total = "$" + tostring(Total, "commas")
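
To check that fieldformat only changes the display, this sketch formats a made-up Total value and then reuses it in a calculation (the doubled field is hypothetical, added only to show the underlying number):

```spl
| makeresults
| eval Total=1234567
| fieldformat Total="$".tostring(Total, "commas")
| eval doubled=Total*2
| table Total doubled
```

Total should display as $1,234,567 while doubled is still computed from the numeric value 1234567. Had eval been used instead of fieldformat, Total would have become a string and the multiplication would no longer work.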

Transforming commands

Order search results into a data table that can be used for statistical purposes. Required to transform search results into Visualizations (require at least 2 columns, col1=x, col2=y).

| top
| rare
| stats
| chart
| timechart
| trendline

“| top” command: returns the most common values of the given fields in a result set. Returns count and percent for each value (Vendors in the example below) and limits results to the top 10 by default. All results: add “limit=0”. Remove the percentage column: “showperc=false”. To add a row regrouping all other vendors, add “useother=true”.

index=sales sourcetype=vendor_sales
| top Vendor product_name

| top Vendor product_name limit=0

| top Vendor product_name limit=5 showperc=false countfield="Number of Sales" useother=true

Top 3 products sold per Vendor

| top product_name by Vendor limit=3 countfield="Number of Sales" showperc=false

“| rare” command: same as “top” but with the least common values. Has exact same options.

| rare Vendor limit=3 countfield="Number of Sales" useother=true
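
A minimal sketch of top on generated data, with three invented vendors appearing 3, 2 and 1 times:

```spl
| makeresults count=6
| streamstats count AS n
| eval Vendor=case(n<=3, "VendorA", n<=5, "VendorB", true(), "VendorC")
| top Vendor limit=2 showperc=false countfield="Number of Sales"
```

This should list VendorA (3) and VendorB (2). Swapping top for rare should instead start from the least common value, VendorC.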

“| stats” command: common functions are count, distinct count (dc), sum, average (avg), min, max, list and values.

index=sales sourcetype=vendor_sales
| stats count as "Total Sales by Vendors" by product_name, categoryId

The “| chart” commands can take two clause statements: “over” (x axis) and “by” (adds grouping or columns in x axis, can only specify one field in “by”). The y axis must always have numeric values.

index=web sourcetype=access_combined status>299
| chart count over status by host usenull=false useother=false

index=web sourcetype=access_combined status>299 product_name=*
| chart count over host by product_name limit=5

“| timechart”: performs stats aggregation against time. Time is always the x axis.

index=sales sourcetype=vendor_sales
| timechart span=12h count by product_name limit=0

“| trendline”: computes moving averages of field values. Requires 3 arguments:

  • Trend Type: simple moving average (sma), exponential moving average (ema) or weighted moving average (wma). sma computes the plain average of the data points over the period; ema and wma give more weight to the most recent data points.
  • Time Period: number of data points (result rows) used to compute the trend, an integer between 2 and 10 000. For example, wma2 covers 2 rows, which is 2 days when timechart produces one row per day.
  • Field name: field to calculate the trend from.

index=web sourcetype=access_combined action=purchase status=200
| timechart sum(price) as sales
| trendline wma2(sales) as trend
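
The arithmetic is easiest to check with a simple moving average on generated numbers. This sketch assumes four rows with invented sales values 10, 20, 30 and 40, and uses sma2 rather than wma2 so the result can be verified by hand:

```spl
| makeresults count=4
| streamstats count AS n
| eval sales=n*10
| trendline sma2(sales) AS trend
| table n sales trend
```

The trend column should be empty on the first row (not enough data points yet), then 15, 25 and 35, each being the average of the current and previous row.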

Using index

To validate, the best practice is to start the search with an index name or index=*

index=* any words you want

List source types

| metadata type=sourcetypes
| tstats values(sourcetype) where index=*

Example – Number of visits to prohibited site, grouped by username

index=network sourcetype=cisco_wsa_squid usage=Violation
| stats count(usage) as Visits by cs_username

Example – Number of visits to prohibited site > 1, grouped by username

The “| search” command filters results further down the pipeline, the same way a normal search does.

index=network sourcetype=cisco_wsa_squid usage=Violation
| stats count(usage) as Visits by cs_username
| search Visits > 1
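
The same stats-then-filter pattern can be tried on generated events. In this sketch the usernames “alice” and “bob” are made up, with three and one violations respectively:

```spl
| makeresults count=4
| streamstats count AS n
| eval cs_username=if(n<=3, "alice", "bob"), usage="Violation"
| stats count(usage) AS Visits by cs_username
| search Visits > 1
```

Only alice (Visits=3) should remain. The filter has to come after stats because the Visits field does not exist before the aggregation.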

Example – HTTP error when purchasing an item

index=web sourcetype=access_combined action=purchase status!=200

Example

index=security sourcetype=linux_secure host!=mail* "failed password" | top limit=20 src_ip

Temporary fields – Example – Bandwidth usage

index=network sourcetype=cisco_wsa_squid
| stats sum(sc_bytes) as Bytes by usage
| eval bandwidth = Bytes/1024/1024
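
A tiny sketch of the same conversion on a single generated row, assuming a sample value of 5242880 bytes and adding round to keep two decimals (the rounding is an addition, not part of the search above):

```spl
| makeresults
| eval Bytes=5242880
| eval bandwidth=round(Bytes/1024/1024, 2)
| table Bytes bandwidth
```

5242880 bytes divided by 1024 twice is 5, so bandwidth is expressed in megabytes. The field only exists for the duration of the search.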

Calculated fields (stored)

Can only reference fields already present in the events returned by a search (already extracted).

index=network sourcetype=cisco_wsa_squid
| stats sum(sc_bytes) as Bytes by usage
| eval bandwidth = Bytes/1024/1024

Extract fields from data that were not automatically extracted

Use Field Extractor, or use regular expressions to extract temporarily for the duration of a search:

“| erex” works like the automatic Field Extractor: give a sample of values, and Splunk will decide what field to extract. Splunk requires sample data to generate the regular expression.

| erex <field name that will appear in search> fromfield=_raw examples="..."

index=games sourcetype=SimCubeBeta
| erex Character fromfield=_raw examples="pixie, Kooby"

To see values that were missed during extraction (add more samples in examples to fix).

index=games sourcetype=SimCubeBeta
| erex Character fromfield=_raw examples="pixie, Kooby"
| where isnull(Character)

“| rex” extracts fields at search time using regular expressions (named “capture groups”) applied to field values or raw data. It does not require sample data. Example to extract User and Character fields from raw data:

BEST PRACTICE: When possible, use rex.

index=games sourcetype=SimCubeBeta
| rex field=_raw "^[^'\n]*'(?P<User>[a-zA-Z0-9_.-]+@[a-zA-Z0-9.]+\.[a-zA-Z0-9.]+)'\s[a-zA-Z:]+'(?P<Character>[a-zA-Z0-9.-]+)"
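
The regular expression can be tested without the games index by running it against a generated string. The sample line below is invented to fit the pattern and is not the real SimCubeBeta log format:

```spl
| makeresults
| eval sample="user:'jane@example.com' character:'pixie'"
| rex field=sample "^[^'\n]*'(?P<User>[a-zA-Z0-9_.-]+@[a-zA-Z0-9.]+\.[a-zA-Z0-9.]+)'\s[a-zA-Z:]+'(?P<Character>[a-zA-Z0-9.-]+)"
| table sample User Character
```

With this sample, User should come out as jane@example.com and Character as pixie. Each named capture group becomes a field with that name.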

List events from Windows Event Log

4688 Process Auditing with full command line arguments

sourcetype=WinEventLog EventCode=4688

Search specific words from Windows Event Log events.

sourcetype=WinEventLog SomeFreeText

Add a new field (hex_convert_pid) that converts the hexadecimal process ID to a decimal number.

sourcetype=WinEventLog EventCode=4688
| eval hex_convert_pid=tonumber(New_Process_ID,16)
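
A sketch of the conversion on a single generated value, assuming a made-up process ID of 1A2C written without the 0x prefix (check how your own events format the field before relying on this):

```spl
| makeresults
| eval New_Process_ID="1A2C"
| eval hex_convert_pid=tonumber(New_Process_ID, 16)
| table New_Process_ID hex_convert_pid
```

hex_convert_pid should be 6700, since 1A2C in base 16 is 1*4096 + 10*256 + 2*16 + 12. The second argument of tonumber is the base of the input string.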

Pivoting on time is a key Splunk investigation tactic…

  • Perform any search.
  • On the event of interest, click on the time (it is a link).
  • A pop-up window will appear asking for a time window around the event (a number of seconds, or another unit). Choose what seems appropriate.
  • The time interval for the search will change. In the search area, enter *

Summarize data (stats)

Analyze all Sysmon data. Count each event by every unique combination of both EventCode and EventDescription, then sort descending by the resulting count.

sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational"
| stats count by EventCode,EventDescription
| sort -count

Knowledge Objects

Tools to help users discover and analyze your data. Knowledge objects are grouped into 5 categories:

  • Data Interpretation: Fields, Field Extractions, Calculated Fields
  • Data Classification: Event Types, Transactions
  • Data Enrichment: Lookups (information not included in the index data), Workflow Actions (create links within events that interact with external resources or narrow our search)
  • Data Normalization: Tags (labels), Field Aliases (normalize fields amongst many sources)
  • Data Models: hierarchically structured datasets (events, searches or transactions)

Knowledge objects: fields, field extractions, field aliases, calculated fields, lookups, event types, tags, workflow actions, reports, alerts, macros, data models. They are private to the user by default. They can be shared with a specific app (requires Power user or Admin role) or with all apps (Admin role).

Order of evaluation: Field Extractions -> Field Aliases -> Calculated Fields -> Lookups -> Event Types -> Tags

Reports and Dashboards

Dashboards are a collection of reports.

Save and share searches

By default, the report will display for the owner only. This can be changed in the report Edit Permissions -> Display For. Select App, and select roles that should be able to read/write.

  • Do a search.
  • On top right corner, click Save As -> Report.
  • Enter a name