Here, you can find details on how some of the less obvious metrics are computed, and how they are affected by the sniffer configuration. You may safely skip this section unless you need a deeper understanding of how the sniffer works.
Many generic metrics are computed on TCP streams. To be able to interpret these correctly, it may be useful to be aware of a few things.
To find out which peer is the client, the sniffer tries several options:
Applicative keep-alives are small messages that are sent from either peer to the other when no traffic has used the socket for some time. They must not be taken into consideration when computing SRT, DTT, and so on. The ica_keepalive_max_size parameter is dedicated to the detection of ICA (Citrix) keep-alive messages.
The standard TCP keep-alive packet is normally detected using its size and sequence number, according to the RFC. In case the previous sequence number is unknown, though, the tcp_keepalive_timer may be used as an alternative; after this inactivity period, any TCP packet that looks like a keep-alive will be ignored.
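As a rough illustration of the two detection paths above, here is a minimal sketch in Python, assuming hypothetical packet and per-direction state objects (the field names are inventions for the example, not the sniffer's actual internals):

```python
# Illustrative sketch only, not the sniffer's implementation.
# `pkt` is assumed to expose .seq, .payload_len and .timestamp; `state` holds
# .next_seq (next expected sequence number, or None when unknown) and
# .last_activity (timestamp of the previous packet in this direction).

def looks_like_tcp_keepalive(pkt, state, tcp_keepalive_timer):
    """Return True if `pkt` should be ignored as a TCP keep-alive probe."""
    small = pkt.payload_len <= 1              # keep-alive probes carry 0 or 1 byte
    if state.next_seq is not None:
        # RFC-style detection: sequence number just below the next expected one.
        return small and pkt.seq == state.next_seq - 1
    # Fallback when the previous sequence number is unknown: after
    # `tcp_keepalive_timer` seconds of inactivity, any packet that merely
    # looks like a keep-alive is ignored.
    idle = pkt.timestamp - state.last_activity
    return small and idle >= tcp_keepalive_timer
```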
The objective of the TCP DTT metric is to measure the duration of a single write (or of a sequence of closely related writes). For protocols that do not follow a request/response pattern, it is very important to detect when two data transfers are separated in time (suggesting they are unrelated). The tcp_dtt_timeout parameter helps with that: if two packets are separated by more than this duration, they do not belong to the same DTT. By default it is set to 1s, so that neither lost packets nor a full reception buffer interrupt the DTT, while an actual pause from the sending application is detected as such.
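The splitting rule can be pictured with this minimal sketch, assuming packets sorted by timestamp and hypothetical field names (not the sniffer's actual code):

```python
# Illustrative sketch only: group data packets of one direction into DTTs.
# A gap longer than `tcp_dtt_timeout` (1 s by default) starts a new DTT.

def split_into_dtts(packets, tcp_dtt_timeout=1.0):
    """Yield lists of packets, each list being one data transfer (DTT)."""
    current = []
    for pkt in packets:                       # packets are sorted by timestamp
        if current and pkt.timestamp - current[-1].timestamp > tcp_dtt_timeout:
            yield current                     # pause detected: close this DTT
            current = []
        current.append(pkt)
    if current:
        yield current

# The duration of each DTT would then be:
#   group[-1].timestamp - group[0].timestamp
```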
According to the sniffer, any TCP packet with a payload (or a SYN, FIN or RST flag) whose sequence number was already covered is a retransmission (here, "covered" means that this sequence number was in a packet that has already been analyzed).
Fast retransmissions are thus counted as retransmissions.
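A "covered sequence numbers" check of that kind can be sketched as follows; this is a simplification (real TCP analysis must also cope with 32-bit sequence wrap-around, SACK, and so on), and SYN/FIN/RST segments can be treated as consuming one sequence number:

```python
# Illustrative sketch only: track which sequence ranges were already analyzed.

class SeqCoverage:
    def __init__(self):
        self.covered = []                     # sorted, disjoint (start, end) ranges

    def is_retransmission(self, seq, length):
        """True if [seq, seq + length) overlaps an already analyzed range."""
        end = seq + length
        return any(seq < c_end and end > c_start
                   for c_start, c_end in self.covered)

    def add(self, seq, length):
        """Record [seq, seq + length) as analyzed, merging touching ranges."""
        start, end = seq, seq + length
        kept = []
        for c_start, c_end in self.covered:
            if c_end < start or c_start > end:
                kept.append((c_start, c_end))        # disjoint: keep as-is
            else:
                start, end = min(start, c_start), max(end, c_end)
        kept.append((start, end))
        self.covered = sorted(kept)
```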
The HTTP metric offers a very synthetic notion of a page, which is a set of HTTP documents fetched by the same user and combined by their browser into a single object, a “page”. Reconstructing pages from the actual packets involves an unusually high number of operations and thus deserves quite a detailed description.
Although not required to use SkyLIGHT PVX, the following definitions are needed to understand the description below.
The sniffer receives fragments of HTTP messages. It starts to reconstruct a new HTTP message as soon as it receives the start of a header. Some fragments of the message may be missing, though, in which case it may be incapable of:
HTTP offers no better way to associate a response with its corresponding query than to rely on ordering: the first response on a socket is paired with the first query, and so on.
So, for every socket, the sniffer stores all queries not yet paired with a response. Notice that on a given socket, a proxy may mix queries from different users, and that two interconnected proxies may even mix queries to distinct servers.
Notice also how damaging a single dropped packet may be if it hides a query or a full response from the sniffer, since all pairings following this gap will be questionable.
Also, servers may not respond, leading to a timeout of the pending queries (which are then inserted into the database without any response).
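This per-socket, order-based pairing can be sketched as follows (a simplification using hypothetical structures, not the sniffer's actual code):

```python
# Illustrative sketch only: FIFO pairing of HTTP queries and responses on one
# socket. A lost query or response shifts every later pairing, which is why a
# single gap is so damaging.
from collections import deque

class HttpSocketState:
    def __init__(self):
        self.pending_queries = deque()        # queries not yet paired

    def on_query(self, query):
        self.pending_queries.append(query)

    def on_response(self, response):
        if not self.pending_queries:
            return None                       # response with no known query
        query = self.pending_queries.popleft()
        return (query, response)              # first response <-> first query

    def on_timer(self, now, max_wait):
        """Flush queries whose server never answered (stored without response)."""
        expired = []
        while self.pending_queries and now - self.pending_queries[0].timestamp > max_wait:
            expired.append((self.pending_queries.popleft(), None))
        return expired
```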
Since all transactions of a page are necessarily emitted by the same user, all transactions are associated with this user, in chronological order (time and the “Referrer” field are our two best tools from now on). Notice that since a page routinely involves transactions on several sockets, and since different sockets are reassembled by different TCP parsers which deliver segments at different paces, it is possible for the HTTP metric to reconstruct a transaction A before a transaction B even if B happened and was received by the probe before A (for instance, if A's socket reassembly was delayed by a missing frame). In such an occurrence, the referrer relation between A and B may not be honored.
We do not wait for the pairing with a response to attach a query to the page it belongs to. When we attach a new query to a client, we look for the referrer of this transaction among the ones that are already attached to this client (in case the referrer field is absent, we use the same kind of referrer cache as found in KSniffer). If the referred page is itself attached to another page, two behaviors are possible:
Note that the first behavior is possible only when the content-type of the referred page does not prevent it (i.e., it is not a type typically reserved for non-root transactions, such as images, CSS, and other typically embedded content).
You can choose between these two behaviors with the http-detach-referred parameter.
The second behavior (keep referred transactions attached) is better when iframes are involved, but it is believed that the first (and default) one generally leads to better results. Other than iframes, the only observed case where a referenced transaction was obviously not a page root was an AJAX request continuously POSTing to the same URL as its referrer, thus detaching its predecessor.
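The attachment decision can be pictured with the sketch below; the structures (client.transactions, attach, detach, new_page) and the exact effect of the second behavior are assumptions made for the example:

```python
# Illustrative sketch only: attach a new query to a page through its referrer.

EMBEDDED_TYPES = ("image/", "text/css")       # content typically not a page root

def attach_query(client, query, http_detach_referred=True):
    referred = client.transactions.get(query.referrer)
    if referred is None:
        client.new_page(root=query)           # unknown referrer: start a new page
        return
    if referred.parent is None:               # referred is already a page root
        referred.attach(query)
        return
    # The referred transaction is itself attached to another page: two behaviors.
    content_type = referred.content_type or ""
    can_detach = not content_type.startswith(EMBEDDED_TYPES)
    if http_detach_referred and can_detach:
        referred.detach()                     # first (default) behavior: promote it to root
    # Second behavior (or embedded content): the referred transaction stays attached.
    referred.attach(query)
```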
If/when we eventually receive the response of a transaction (and, hopefully, its content-type), we revise our judgment on the attachment. If the transaction does not seem to have been triggered by AJAX, and its content-type is indicative of a standalone document (PDF, PS or HTML with status 200), then we detach it (turning it into a root). Otherwise, if the content-type is not indicative of typically embedded content (image, CSS, etc.), then we check the delay between the page root and this transaction; if it is greater than the http-page-construction-max-delay parameter, the transaction is detached as well.
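A minimal sketch of this revision step, assuming hypothetical transaction fields and content-type groupings:

```python
# Illustrative sketch only: revise the attachment once the response is known.

STANDALONE_TYPES = ("application/pdf", "application/postscript", "text/html")
EMBEDDED_TYPES = ("image/", "text/css")

def revise_attachment(txn, http_page_construction_max_delay):
    if txn.parent is None:
        return                                # already a page root
    content_type = txn.content_type or ""
    standalone = txn.status == 200 and content_type.startswith(STANDALONE_TYPES)
    if not txn.triggered_by_ajax and standalone:
        txn.detach()                          # becomes a new page root
    elif not content_type.startswith(EMBEDDED_TYPES):
        delay = txn.timestamp - txn.page_root().timestamp
        if delay > http_page_construction_max_delay:
            txn.detach()                      # attached too late to belong to this page
```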
To speed up information retrieval, some global per-page values are precomputed in the sniffer: every transaction attached to a page contributes to the page provided it was received less than http-page-contribution-max-delay seconds after the root. All of these transactions will contribute to the page load time.
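As an illustration, the contribution rule and one plausible way of deriving a load time from the contributing transactions could look like this (the exact formula is an assumption for the example, not a specification of the sniffer):

```python
# Illustrative sketch only; field names are assumptions.

def page_contributors(root, transactions, http_page_contribution_max_delay):
    """Transactions received soon enough after the root to contribute to the page."""
    return [t for t in transactions
            if t.timestamp - root.timestamp < http_page_contribution_max_delay]

def page_load_time(root, transactions, http_page_contribution_max_delay):
    contributors = page_contributors(root, transactions,
                                     http_page_contribution_max_delay)
    last_end = max(t.timestamp + t.duration for t in [root] + contributors)
    return last_end - root.timestamp
```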
To be able to dump a root transaction with all of these counters, we must, of course, delay the dump of roots as late as possible, thus raising memory requirements.
To limit memory and CPU usage, the sniffer implements these protections:
- http-max-tracked (unlimited by default): new transactions above this limit will be ignored (with catastrophic consequences to transaction pairing).
- http-max-tracked-for-reconstruction (unlimited by default).
- http-max-content-size (50k by default).
- http-referrer-mem.

Page load time is the most interesting metric, yet we have seen that many conditions must be met to accurately reconstruct pages.
The SMB module produces one flow for each pair formed by a query and its answer. To link queries and responses together, the SMB protocol uses the following IDs:
The sniffer uses these IDs together with the Tree ID, the command type and the underlying connection (i.e. IP addresses, ports, VLAN and such) to properly link requests and responses together for each conversation.
However, this may induce a high number of flows for some simple and common operations like reading from or writing to files: these operations are sent as multiple read or write commands, using buffers with a maximum size of 64KiB or 1MiB (for the more recent versions of the protocol).
For example, writing a 1GiB file over 10s (at a rate of roughly 100MiB/s) would generate 1000 SMB2 WRITE commands with a buffer of 1MiB, resulting in 1000 flows stored in the database. The interval between two of these write commands would be roughly 10ms. The number of flows would be an order of magnitude higher if the protocol used 64KiB buffers.
This gives fine-grained precision, but it is not of much use most of the time, and the resulting number of flows may quickly grow the database usage or push it toward the license limit.
It is much more interesting to have these statistics aggregated at a higher level: read and write commands can be aggregated together if they act on the same underlying file (based on its File ID).
As such, from PVX 5.0 onwards, the sniffer aggregates successions of the following commands together for a small, configurable period of time:
Some of these commands use the File ID as a discriminating factor; others require comparing paths or patterns.
You can expect fewer SMB flows after upgrading from PVX 4.2 and, more importantly, this should decrease SMB flow bursts. Unfortunately, some frequent commands, like opens or closes, may not be aggregated together since they only appear once during file manipulations.
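To picture how such aggregation might work, here is a minimal sketch assuming a hypothetical aggregation window and command fields (the actual key and timeout used by the sniffer are configurable and not detailed here):

```python
# Illustrative sketch only: merge successive SMB READ/WRITE commands on the
# same file into a single aggregated flow, closed after a quiet period.

class SmbAggregator:
    def __init__(self, max_gap=1.0):          # hypothetical aggregation window (seconds)
        self.max_gap = max_gap
        self.open_aggregates = {}             # (connection, file_id, command) -> aggregate

    def on_command(self, cmd):
        """Return a finished aggregate to store as a flow, or None."""
        key = (cmd.connection, cmd.file_id, cmd.command)    # e.g. SMB2 READ or WRITE
        agg = self.open_aggregates.get(key)
        if agg and cmd.timestamp - agg["last_ts"] <= self.max_gap:
            agg["bytes"] += cmd.length                      # extend the current flow
            agg["count"] += 1
            agg["last_ts"] = cmd.timestamp
            return None
        self.open_aggregates[key] = {"first_ts": cmd.timestamp,
                                     "last_ts": cmd.timestamp,
                                     "bytes": cmd.length,
                                     "count": 1}
        return agg                                          # previous flow, if any, is emitted
```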
Starting from PVX 5.2, when the new “VXLAN stripping” option of the VXLAN parser is enabled (which is the case by default), the transport layer is simply discarded. The discarded layers are not accounted for in the traffic. Enabling this option makes sense when considering the VXLAN transport as a mirror mechanism. You can switch back to the old behavior by disabling this option.