Main terms and concepts
=======================

.. index:: Zone, Fallback
.. _zone_tree:

Zones
-----

Principles
++++++++++

A :term:`zone` is an **arbitrary container** in which groups of peers can be
kept and organized according to their network address.

Each peer being attributed a zone, a conversation between two peers comes with
two zones: a client and a server zone.

A zone consists merely of a name, a priority and a set of optional filters.
Each conversation is tagged with a client and server zone (using the client and
server IP and MAC addresses) according to this process: every rule is tried
in order of priority, and the first zone that has filters that comply with this
conversation is selected. Thus, it may be important to consider the priority of
a zone in the rare occurrence where the default ordering scheme does not yield
the expected results.

For instance, here is a simple configuration (in order of priority):

========= ===================== ========================= ==== ===== ========= ======
 Priority  Name                 Subnet                    MAC  VLAN  Capture   Device
========= ===================== ========================= ==== ===== ========= ======
    20    /LAN/Servers/Mail     192.168.1.25                   120   localhost
    20    /LAN/Servers/Web      192.168.1.80                   120   localhost
    10    /LAN/Servers/Fallback 192.168.1.1-192.168.1.100      120   localhost
     0    /LAN/Fallback         192.168.1.0/24                 120   localhost
     0    /Remote                                                    capture2
 -1000    /Internet
========= ===================== ========================= ==== ===== ========= ======

Here, we have two servers (for mail and web) that are tested first by
IP (if the VLAN is 120 and the capture is `localhost`), then all other
servers (using an IP range), then the LAN, then the remote site
(everything from ``capture2``), and everything else in ``Internet``.

Notice that some fields are unused (MAC, Device), meaning any value will do.

Whatever changes are made in the zone tree, a global fallback (here,
it's ``/Internet``) will be created by default to store any
conversation that is not matched by any rule (this remains true even
after filters are added for this zone).  Also, this zone is special in
that the IP addresses of these conversations will be degraded over
time to reduce storage requirements.

Your actual configuration will, of course, be much more complex. Indeed, even the
default configuration is larger:

.. gofigure:: img/DefaultZoneTree.png

   Zone tree as displayed in PVX select boxes, showing the default configuration.

Selections
++++++++++

Zone names, although not used in the aforementioned process, play an important
role in the GUI. As you can see on the example, zone names are organised in a
tree of sub-names delimited with slashes (/), not unlike a standard file system.

For instance, ``/LAN/Servers/Web`` is made of three components, meant to be read
as the host ``Web``, amidst the ``Servers`` in the ``LAN``. Here ``/LAN`` is said to
be the parent zone of ``/LAN/Servers`` and ``/LAN/Fallback``, and ``/LAN/Servers`` is
said to be the parent zone of ``/LAN/Servers/Mail`` and ``/LAN/Servers/Web``.

In all select boxes of the GUI, selecting a parent zone will select all
conversations that fall in this zone or in any of its child zone.

For instance, in the above example, selecting ``/LAN/Servers`` will select all
conversations in ``/LAN/Servers/Mail``, ``/LAN/Servers/Web`` and
``/LAN/Servers/Fallback``.

.. _fallback:

Fallbacks
+++++++++

By convention, a :term:`fallback` is a zone with a larger filter
but lower priority than a set of more specific rules. For instance, in the
above example, the ``/LAN/Servers/Fallback`` zone collects all IP addresses in
the 192.168.1.0/24 subnet after some more precise zones tried to match with
subsets of this subnet.

Notice that the priority of the fallback must be lower than the priority of
these smaller zones; otherwise, they would be shadowed by the fallback.

Notice also that if the example configuration was instead:

========= ===================== ========================= ==== ===== ========= ======
 Priority  Name                 Subnet                    MAC  VLan  Capture   Device
========= ===================== ========================= ==== ===== ========= ======
    20    /LAN/Servers/Mail     192.168.1.25                   120   localhost
    20    /LAN/Servers/Web      192.168.1.80                   120   localhost
    10    /LAN/Servers          192.168.1.1-192.168.1.100      120   localhost
========= ===================== ========================= ==== ===== ========= ======

i.e., with ``/LAN/Servers`` instead of ``/LAN/Servers/Fallback``, then selecting the
``/LAN/Servers`` zone in the GUI would actually select ``/LAN/Servers/Mail`` and
``/LAN/Servers/Web`` in addition to the fallback.  In other words, there would be
no way to select in the GUI only the peers that are in the servers IP range but
that are neither the mail nor the web server.  Using the ``Fallback`` naming
convention allow one to select either a specific server (``/LAN/Servers/Mail``,
``/LAN/Servers/Web``), all servers (``/LAN/Servers``) or only the other servers
than mail and web (``/LAN/Servers/Fallback``).

.. index:: Application
.. _application_concept:

Application
-----------

The main objective of :term:`application` is to easily categorize
network usage. Through this concept, which is a key notion of
|Product|, the administrator can group similar network usages
into categories that will make sense for his network context.
Additionally, by configuring Applications, reports on network traffic
are made clearer and are readable by any user regardless of their
understanding of the underlying infrastructure (IP addresses and
:term:`subnet`, or ports used by each application).

An `application` is a set of network services which together
correspond to a business application. For example, an application
named *ERP* could be configured to match network traffic on port
TCP/80 on a server :term:`Zone` containing the specific server
``192.168.20.4/32``.

Application definition
++++++++++++++++++++++
An :term:`application` can be defined using a set of filters a flow must match
to enter the application. These filters can use various elements of a flow, from
its IP addresses to its ports, capture, protocols, and so on.

Notice that depending on what flow is considered, some of the information
may not be available.  For instance, the attribution of an application for a
NetFlow cannot use anything beside bare IP addresses, protocol and ports.
As a consequence, an application defined on a given VLAN, MAC address or
protocol stack will never accept a NetFlow.

All rules are checked one after the other and the first matching rule gives
the flow its application, in a process similar to the one used for zone
attribution.

The priority of these rules can be changed to alter the order in which these checks
are performed.

For more information about the configuration of applications, refer to
the :ref:`configuration` section.

Examples
++++++++
An application which is run on a server which has an IP of ``192.168.1.4`` with `MSSQL`
will be defined as follows:

- Port Range: ``1433``.
- IP protocol: ``TCP``.
- IP Server: ``192.168.1.4/32``.

An HTTP application running on a server along with several other
applications will be defined as follows:

- Web Application Pattern: ``*intranet.securactive.lan*``.

.. index:: IP merging

IP Merging
----------

In order to maximize usage of the available disk space, some
information are removed to allow better aggregation. This is the case
for IP data of foreign host on aggregation levels 3 and 4.

Principle
+++++++++
Upon data consolidation at the third aggregation level, all IP tagged on
the *Internet* zone (or whatever name was given to this default zone)
will be removed in favor of a *merged* identifier.  Consequently, these IPs
will appear as merged in all tables where IP values are displayed if the IP was
belonging to Internet Zone and your observation period is such that the third
or the fourth aggregation level is used. This will happen with long observation
periods (> 8 hours) and also on old data (> 1 week old).

Example
+++++++
Let’s say a user has access to the *Internet* zone using the same
application; for example, a web browser using ``HTTP`` on port ``80``
to access to different web sites for a period of time. Originally,
you will see for that period.

.. gofigure:: img/screenshots/spv/version_unk/merged_ip_1.png

   TCP conversation before degradation

Once data has been aggregated, if you query the same period of back in time, you
will have:

.. gofigure:: img/screenshots/spv/version_unk/merged_ip_2.png

   TCP conversation after degradation

For the *Client IP*, *merged* means that the two conversations to the
different Internet clients have been merged into one single
entry. This is only done when the Zone is *Internet* and matches the
same server / application couple. So, you still know that this server
was accessed from the *Internet* zone with the ``http`` application on
the port ``80``.

.. index:: Conversation

Conversation
------------

Objective & Definition
++++++++++++++++++++++
The objective of a :term:`conversation` is to group a set of data
exchanges between two hosts for a single :term:`application` into one
basic entity to be able to generate a more user-friendly report on
network traffic.

A :term:`flow` is a group of data exchanges between two hosts for one
:term:`application` over the :term:`aggregation period`. A
:term:`conversation` is a group of flows over the observation
period. The observation period is defined by a start time and an
end time provided by the user. A `conversation` is defined by the
following criteria:

- The :term:`device identifier` that received the packets
- The VLAN tag that might be present in the packets
- Source or client IP address (please refer to the chapter
  :ref:`types_of_conversations`).
- Destination or server IP address
- Application (please refer to the chapter :ref:`application_concept`)

.. _types_of_conversations:

Types of Conversations
++++++++++++++++++++++
|Product| offers two ways to analyze network
:term:`conversation`.  From a user's perspective, network
conversations can be seen in two different ways, which correspond to
two different needs: Client/Server or Source/Destination. This
chapter explains how those views differ, which kind of information
they provide, and how they can be used.

.. index:: Source, Destination
.. src_dst:

Source / Destination
^^^^^^^^^^^^^^^^^^^^
In a source/destination :term:`conversation`, all flows between two hosts will
be classified following the concepts of source and destination. This
means that the flows will group data exchanges from a source IP
address to a destination IP address regardless of whether they
function as a client or a server.

For instance, a traffic ``from A to B`` for an application will be broken
down in two conversations: a conversation ``from A to B`` and a
conversation ``from B to A``.

Src/Dst conversations correspond to a view of network flows for
traffic analysis. When reviewing data for traffic analysis purposes,
an administrator wants to view flows without considering the role of
each host, that is to say, **disregarding if the host is a client or a
server**.

.. gofigure:: img/src_dst_explain_schema.png
   :scale: 60%

   Source/Destination treatment

.. container:: example

   For example, traffic ``from A to B`` takes into account all traffic
   coming from a host in A to a host in B, regardless of the role they
   played (client or server). The above graphs take into account the
   communications ``from A to B``, only in one direction.

.. index:: Client, Server
.. _clt_srv:

Client / Server
^^^^^^^^^^^^^^^
In a client/server :term:`conversation`, all flows between two hosts
will be classified following the concepts of client and server. This
means that the flows will group data exchanges to (and from) a client
IP address from (and to) a server IP address.

.. container:: example

   For instance, a traffic ``from A to B`` for an application
   (provided both A and B can be a server for a single application)
   will be broken down in two conversations: a conversation for
   ``client A & server B`` (with traffic ``from A to B`` and ``from B
   to A``) and a conversation from client B to server A (with traffic
   ``from A to B`` and ``from B to A``).

Clt/Srv corresponds to a view of network flows for performance analysis.
When reviewing data for performance analysis purposes, an administrator wants to
view flows **in function of the role of each host, client or server**. Indeed,
the role of a host has an impact on the metrics displayed and the clients and
servers cannot be mixed.

.. gofigure:: img/clt_srv_explain_schema.png
   :scale: 60%

   Client/Server treatment

.. container:: example

  For example, the clt/srv graphs shown above will be
  generated taking into account the communications:

  - from clients in A to servers in B
  - from servers in B to clients in A

  In short, the traffic displayed in client/server conversations will
  take into consideration the data transfer in both directions.

.. note:: The appliances can only distinguish reliably clients from
  servers when the IP protocol in use is TCP, when the connection
  establishment was successfully received by the probe, and when the
  connection state is sufficiently active to not be in timeout. In all
  other cases, the probe assumes that the lower port is used on the
  server's side.

Where are both being used?
^^^^^^^^^^^^^^^^^^^^^^^^^^
Src/Dst will be used for all views of oriented traffic, i.e., where
the reports need to show the amount of data from one zone to another zone.
Hereunder (in the first and second lines of the table) you can see that the data
exchange between the two hosts has been split into two conversations from A to
B and from B to A.

.. gofigure:: img/screenshots/spv/version_unk/src_dst_explain_table.png

   Source/Destination conversations

On the other hand, client/server conversations will be used for all
views reporting performance. Hereunder you can see (in the first line
of the table) that a client/server conversation takes into account the
traffic in both directions.


.. gofigure:: img/screenshots/spv/version_unk/clt_srv_explain_table.png

   Client/Server conversations

In general, you will find that:

- Client/Server is relevant when we are speaking about Performance;
- Source/Destination is relevant when we are speaking about Usage.


Top-Down Analysis
+++++++++++++++++
The Src/Dst matrix can be the starting point for a fine-tuning analysis
of traffic: bandwidth and conversation. In each cell, there are two
buttons:

- one to display the bandwidth graph ``from zone A to zone B``
- one to display the conversations ``from zone A to zone B``.

.. gofigure:: img/matrix_cell_detail.png

   Cell detailed view

The first link will open the conversation table and will display all the traffic
between the two zones, whereas the second one will display a bandwidth chart
from the source zone on the left and the destination zone on the top.


.. index:: Aggregation
.. _data_aggregation:

Data Aggregation
----------------

Rationale
+++++++++
By nature, the operations of statistical analysis require the storage
of large amounts of data. Furthermore, that data must be stored over
extended periods of time so as to expose overall trends.
In order to minimize storage space while still making it possible to
reveal trends over weeks or months, |Product| automatically
summarizes the collected data. The process of creating these summaries
is called aggregation.

Process
+++++++
`Aggregation` occurs automatically. Whenever your probe displays a chart or a
table, this is based on already aggregated data. In order to display this,
|Product| first decides on an aggregation granularity
depending on the length of the time period you requested and how far back into
the past it goes.

Note that because the higher aggregate levels summarize more data at
once, they take up less disk space, and can be kept in storage much
longer without filling out the hard drive. This strikes a good balance
between data granularity and duration of retention: performance data
with the best granularity will occupy much of your disk space, and
long-lasting global trends can be exposed from far back
(albeit with less detail), all from the same interface.

Requests
++++++++

Generaly the chosen aggregate level is transparent for the user. The system
will query the one that minimizes the computing effort for the same result.
But when large interval are requested, data can be degraded to guarantee a fast
enough response latency.

Depending on your particular needs, it may be necessary to request
more data points in graphs or to split a particular flow into smaller
chunks to get a more precise timeline. The ``Mode`` option allows you
to specify how the data granularity will automatically change
depending on how far back you request, how many columns are requested
(using themes) and how much filtering is applied to the page. Choosing
the default ``Fast`` mode will report fewer data points or possible
simpler data. The ``Precise`` mode will report more data points or
more detailed data but may also extend your querying time. (See
:ref:`precision_mode` for more details about the precision mode.)

If the data can be extended with more details, the option will be
presented as a ``Refine`` button, that let you ask the page to query
again with a higher level of resources allocation. This will be
usually slower (up to several minutes for some complex pages). This
refinement is applied for the current page. When navigating to other
pages, or when submitting a new search, this will default back to the
current ``Mode`` (``Fast`` or ``Precise``). The same behavior apply
when a page is added to a report. Note that increasing resources
allocation will work with no guarantee and may trigger an error if
there is not enough physical memory on the node that executes the
query.