Splunk is a big data tool, where big data is commonly characterised by the 3 Vs. It can be used to detect trends in data and, using machine learning to classify what is considered malicious, to find malicious activity that does not fit the normal trend of the data.
The 3 Vs of big data
Volume - a large amount of data
Velocity - the speed at which the data is produced
Variety - the range of data types that need to be stored, as the data comes from a range of different sources
Introduction
Splunk was made to search, monitor, and analyse large amounts of data, typically produced by machines talking to other machines (machine-generated data). Splunk can be run on all major platforms and virtual machines, and can now also be run in the cloud using Splunk Cloud. Splunk was designed to index a variety of data, which is achieved through pre-defined configurations in the Splunk config, and it can be extended with technology add-ons developed by the community.
The key Splunk components are forwarders, indexers, and search heads; each is discussed in more detail below.
Forwarders
Forwarders consume the data for Splunk, so they are run on the devices that create the data. Once a forwarder has collected the data, it forwards it on to the indexer, the next component in the Splunk architecture. There are two types of forwarder: universal and heavy. The universal forwarder simply passes the data on to the indexer, whereas the heavy forwarder performs additional tasks such as parsing the data, extracting fields, or filtering the data before it is passed to the indexer, so that only relevant data is indexed. Both types of forwarder, however, perform these tasks on the data: buffering and compressing it, breaking it into 64 KB blocks, and assigning metadata to incoming data such as the source and source type.
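As a concrete sketch of this, a universal forwarder is typically told where to send its data through an outputs.conf file. The group name, hostname, and port below are placeholder assumptions (9997 is simply the conventional Splunk receiving port):

    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = indexer.example.local:9997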
Indexer
This component is responsible for indexing: it converts the data from the forwarders into Splunk events, and these events are then stored in an index. There are two types of index: events and metrics. The indexer should have a high input/output capacity, since it needs to do a lot of reading from and writing to disk when indexing the data; if this capacity is low, it can become a bottleneck for Splunk. Multiple indexers can be combined to form a cluster, which increases data availability, data recovery, and search capability. The component responsible for searching through the data that has just been indexed is the search head. The inputs for Splunk can be edited in a file called inputs.conf, or through the Splunk web interface (Settings > Add Data), where the forwarder can be configured to accept different types of data.
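For example, a minimal inputs.conf stanza telling a forwarder to monitor a log file might look like the sketch below; the file path, index name, and sourcetype are illustrative assumptions:

    [monitor:///var/log/nginx/access.log]
    index = web
    sourcetype = nginx_access
    disabled = false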
Search heads
This is the component that the user interacts with through the browser interface, and it can help visualize the data. To search the data, search heads use the Search Processing Language (SPL): an SPL query is sent to the indexers in the form of bundles, the indexers find the data that matches the query and send the results back to the search head, which displays them to the user. Search heads can coordinate searches across multiple indexers and, like indexers, can be combined to form a search head cluster, which allows for quicker searches and provides redundancy if one search head fails.
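As an illustration of SPL, the query below counts failed SSH logins by source IP address; the index name, sourcetype, and extracted field are assumptions that depend on how the data was onboarded:

    index=linux sourcetype=syslog "Failed password"
    | rex "from (?<src_ip>\d+\.\d+\.\d+\.\d+)"
    | stats count by src_ip
    | sort -count

Here the rex command extracts the source address from the raw event, stats aggregates the matching events, and sort orders the addresses from most to fewest failed attempts.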