
New to Google SecOps: Introducing Statistical Search

jstoner
Staff

Last year, Google Security Operations added a pivot capability for aggregating and calculating statistics in UDM search. It let users take a UDM search and aggregate values through an interface, like this.

ntc-stats-01.png

Fast forward to this year, and aggregated search with statistics has taken a big step forward: statistical outcomes are now available directly from search!

Let me start by saying that the fundamentals of search have not changed a bit, so if you are looking for all of your network connections from your Zeek sensor, that is still the same search you’ve used before.

 

metadata.event_type = "NETWORK_CONNECTION" and metadata.vendor_name = "Zeek"

 

Statistical search introduces structural concepts of YARA-L to search. For those unfamiliar with YARA-L, Google SecOps uses it for building detection rules. Rules follow a standard format, and search now leverages that structure and its capabilities to handle aggregation and statistical outcomes. Oh, and one more thing: search can now also use functions that were previously only available in rules. This unlocks a lot of potential for users!

A good search to start with is the Top 10 IP Address Pairs by event count. We start our search with our filtering statement. This is simply the boolean expression of the fields and values that we are interested in. We can use parentheses, as well as and, or, and not, to organize the search. The filtering statement contains regular expression and/or string comparisons to find the events we are looking for. In our example, the filtering statement is:

 

metadata.event_type = "NETWORK_CONNECTION" and metadata.vendor_name = "Zeek"

 

ntc-stats-02.png

Our result set contains a few thousand rows, based on our timeframe, and contains a number of fields including the ones shown here. As we move into generating a statistical search, we need to determine which fields we are going to aggregate. The principal.ip and target.ip fields are obvious choices since we are looking for a count by address pairs. This is where we start applying YARA-L concepts to search.

In a YARA-L rule, we can aggregate multiple events together by one or more variables, and these variables are listed in the match section. The same concept applies to search. It is important to remember that UDM fields cannot be placed in the match section directly. They must be represented by variables, so we need to define these variables in our filtering statement before the match section.

Currently in search, we support three types of variables: placeholder, match, and outcome. Here we see the placeholder and match variables in action. The placeholder variables reside in the filtering statement, and in this case we are using both placeholder variables as match variables to group, or aggregate, our search by the common values in these two fields.

Commas separate variables in the match section. Those familiar with writing rules in YARA-L may notice that search does not contain the term over followed by a time. In rules, matches occur over a window of time because rules are continually evaluating new data. In search, the time window is the range specified in the time picker, so there is no need to specify a window in the match section. This lets users search over the past hour, day, week, or month and aggregate over the entire period.

ntc-stats-03.png
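In text form, the search at this stage would look roughly like the following. It is reconstructed from the complete query shown later in this post, with only the placeholder variables and the match section in place.

```
metadata.event_type = "NETWORK_CONNECTION" and metadata.vendor_name = "Zeek"
$principal_ip = principal.ip
$target_ip = target.ip
match:
   $principal_ip, $target_ip
```

Note that there is no over clause after the match variables; the time picker supplies the window.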

When we execute our search, we get a listing of 140 IP address pairs from our Zeek network connection events in the time window we selected. That’s a good start, but it doesn’t tell us which ones are in the top 10, nor does it give us the number of events per pair. To get there, we need to move to the next part of our search: the outcome section.

The outcome section in YARA-L is where aggregation, mathematical, and logical operations take place. These capabilities were previously available only in rules but are now available in search! We will get to additional aggregation functions, but let’s focus for the moment on the function count_distinct. In our search, we want a count of the events in our result set, grouped by the principal and target IP addresses. The match section grouped the IPs; now we want to count the events. Counting the distinct values of metadata.id does this because each event carries its own metadata.id. The outcome section is where this calculation occurs, and the result is written to the $event_count outcome variable. The name of this variable, like all variable names, is arbitrary.

ntc-stats-04.png
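Written out as text, the search with the outcome section added would look roughly like this. It is again reconstructed from the complete query later in this post, before any ordering or limiting is applied.

```
metadata.event_type = "NETWORK_CONNECTION" and metadata.vendor_name = "Zeek"
$principal_ip = principal.ip
$target_ip = target.ip
match:
   $principal_ip, $target_ip
outcome:
   $event_count = count_distinct(metadata.id)
```

Each row of the result is one IP address pair from the match section, with $event_count as an extra column.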

Now when we run our search, we get our IP address pairs but this time an additional column is available that displays the number of events for each pair. Notice we still have 140 pairs of IPs and the first one on the list has three events, another has 12, and the listing is a little all over the place. We need to add some organization to our output. This is where two new sections of YARA-L, order and limit, are going to be used.

Order and limit are not part of YARA-L for rules; they are new sections that provide a way to organize our search results. The order section handles sorting. It takes variables, and those variables must already exist in the match or outcome section. Results can be sorted in ascending (asc) or descending (desc) order, with ascending as the default. Sorting can use multiple variables, in any mix of ascending and descending, separated by commas.

The limit section expects an integer, which sets the maximum number of rows returned. Because we wanted a top 10 search by event count, we first sorted by event count in descending order and then limited our results to 10 rows.

 

metadata.event_type = "NETWORK_CONNECTION" and metadata.vendor_name = "Zeek"
$principal_ip = principal.ip
$target_ip = target.ip
match:
   $principal_ip, $target_ip
outcome:
   $event_count = count_distinct(metadata.id)
order:
   $event_count desc
limit:
   10

 

ntc-stats-05.png

Notice that the event count has a down arrow indicating descending sort and at the top of the results we see that 10 rows of statistics have been returned where previously the number was 140.
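The order section can also take more than one variable. As a minimal sketch, assuming the comma-separated form described earlier, the variation below breaks ties in event count by principal IP in ascending order. The tie-breaker is an illustrative choice rather than part of the original search, and this variation has not been run against a live instance.

```
metadata.event_type = "NETWORK_CONNECTION" and metadata.vendor_name = "Zeek"
$principal_ip = principal.ip
$target_ip = target.ip
match:
   $principal_ip, $target_ip
outcome:
   $event_count = count_distinct(metadata.id)
order:
   $event_count desc, $principal_ip asc
limit:
   10
```

Both variables in the order section already exist in the match or outcome section, which is the requirement noted above.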

So there you have it, your first statistical search using YARA-L constructs in Google SecOps! We have just scratched the surface. In our next blog, we will take this basic search and extend it to show what else statistical search can do.

2 Comments
Fred_Frey
Bronze 2

@jstoner thanks for posting this! This opens up quite a few powerful ideas. 

I copied your search and ran it in my instance, and I'm getting an error. Narrowing it down, it seems it doesn't like me assigning the variables (e.g., $principal_ip). Any thoughts on what I may be doing wrong?

Fred_Frey_0-1716901122401.png

 

jstoner
Staff

Since this is a feature coming out in preview, I am aware they are rolling it out slowly across tenants. Perhaps the instance you are on is not enabled yet, as an error for a variable would normally be more specific about the line of the search it is on. I would suggest opening a support case to ask whether it is enabled and, if it isn't, whether they can ensure that statistical search is activated.