tcp Provider

Skip to end of metadata
Go to start of metadata

tcp Provider

The tcp provider provides probes for tracing the TCP protocol.

This provider integrated into Solaris Nevada build 142.

Top

Probes

The tcp probes are described in the table below.

tcp Probes
Probe Description
state-change Probe that fires a TCP session changes its TCP state. Previous state is noted in the tcplsinfo_t * probe argument. The tcpinfo_t * and ipinfo_t * arguments are NULL.
send Probe that fires whenever TCP sends a segment (either control or data).
receive Probe that fires whenever TCP receives a segment (either control or data).
connect-request Probe that fires when a TCP active open is initiated by sending an initial SYN segment. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the initial SYN segment sent.
connect-established This probe fires when either of the following occurs: either a TCP active OPEN succeeds - the initial SYN has been sent and a valid SYN,ACK segment has been received in response. TCP enters the ESTABLISHED state, and the tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the SYN,ACK segment received; or a simultaneous active OPEN succeeds and a final ACK is received from the peer TCP. TCP has entered the ESTABLISHED state and the tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers of the final ACK received. The common thread in these cases is that an active-OPEN connection is established at this point, in contrast with tcp:::accept-established which fires on passive connection establishment. In both cases above, the TCP segment that is presented via the tcpinfo_t * is the segment that triggers the transition to ESTABLISHED - the received SYN,ACK in the first case and the final ACK segment in the second.
connect-refused A TCP active OPEN connection attempt was refused by the peer - a RST segment was received in acknowledgment of the initial SYN. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the RST,ACK segment received.
accept-established A passive open has succeeded - an initial active OPEN initiation SYN has been received, TCP responded with a SYN,ACK and a final ACK has been received. TCP has entered the ESTABLISHED state. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the final ACK segment received.
accept-refused An incoming SYN has arrived for a destination port with no listening connection, so the connection initiation request is rejected by sending a RST segment ACKing the SYN. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the RST segment sent.

The send and receive probes trace packets on physical interfaces and also packets on loopback interfaces that are processed by tcp. On Solaris, loopback TCP connections can bypass the TCP layer when transferring data packets - this is a performance feature called tcp fusion; these packets are also traced by the tcp provider.

Top

Arguments

The argument types for the tcp probes are listed in the table below. The arguments are described in the following section. All probes expect state-change have 5 arguments - state-change has 6.

tcp Probe Arguments
Probe args[0] args[1] args[2] args[3] args[4] args[5]
state-change null csinfo_t * null tcpsinfo_t * null tcplsinfo_t *
send pktinfo_t * csinfo_t * ipinfo_t * tcpsinfo_t * tcpinfo_t *  
receive pktinfo_t * csinfo_t * ipinfo_t * tcpsinfo_t * tcpinfo_t *  
connect-request pktinfo_t * csinfo_t * ipinfo_t * tcpsinfo_t * tcpinfo_t *  
connect-established pktinfo_t * csinfo_t * ipinfo_t * tcpsinfo_t * tcpinfo_t *  
connect-refused pktinfo_t * csinfo_t * ipinfo_t * tcpsinfo_t * tcpinfo_t *  
accept-established pktinfo_t * csinfo_t * ipinfo_t * tcpsinfo_t * tcpinfo_t *  
accept-refused pktinfo_t * csinfo_t * ipinfo_t * tcpsinfo_t * tcpinfo_t *  

Top

pktinfo_t structure

The pktinfo_t structure is where packet ID info can be made available for deeper analysis if packet IDs become supported by the kernel in the future.
The pkt_addr member is currently always NULL.

typedef struct pktinfo {
        uintptr_t pkt_addr;             /* currently always NULL */
} pktinfo_t;

Top

csinfo_t structure

The csinfo_t structure is where connection state info is made available. It contains a unique (system-wide) connection ID, and the process ID and zone ID associated with the connection.

typedef struct csinfo {
        uintptr_t cs_addr;
	uint64_t cs_cid;
	pid_t cs_pid;
	zoneid_t cs_zoneid;
 } csinfo_t;
csinfo_t Members
cs_addr Address of translated ip_xmit_attr_t *.
cs_cid Connection id. A unique per-connection identifier which identifies the connection during its lifetime.
cs_pid Process ID associated with the connection.
cs_zoneid Zone ID associated with the connection.

Top

ipinfo_t structure

The ipinfo_t structure contains common IP info for both IPv4 and IPv6.

typedef struct ipinfo {
        uint8_t ip_ver;                 /* IP version (4, 6) */
        uint16_t ip_plength;            /* payload length */
        string ip_saddr;                /* source address */
        string ip_daddr;                /* destination address */
} ipinfo_t;

These values are read at the time the probe fired in TCP, and so ip_plength is the expected IP payload length - however the IP layer may add headers (such as AH and ESP) which will increase the actual payload length. To examine this, also trace packets using the ip provider.

ipinfo_t Members
ip_ver IP version number. Currently either 4 or 6.
ip_plength Payload length in bytes. This is the length of the packet at the time of tracing, excluding the IP header.
ip_saddr Source IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC-1884 convention 2 with lower case hexadecimal digits.
ip_daddr Destination IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC-1884 convention 2 with lower case hexadecimal digits.

Top

tcpsinfo_t structure

The tcpsinfo_t structure contains tcp state info.

typedef struct tcpsinfo {
        uintptr tcps_addr;
        int tcps_local;                 /* is delivered locally, boolean */
        int tcps_active;                /* active open (from here), boolean */
        uint16_t tcps_lport;            /* local port */
        uint16_t tcps_rport;            /* remote port */
	string tcps_laddr;		/* local address, as a string */
	string tcps_raddr;		/* remote address, as a string */
	int32_t tcps_state;             /* TCP state. Use inline tcp_state_string[] to convert to string */
        uint32_t tcps_iss;              /* initial sequence # sent */
        uint32_t tcps_suna;             /* sequence # sent but unacked */
        uint32_t tcps_snxt;             /* next sequence # to send */
        uint32_t tcps_rack;             /* sequence # we have acked */
        uint32_t tcps_rnxt;             /* next sequence # expected */
        uint32_t tcps_swnd;             /* send window size */
        uint32_t tcps_snd_ws;           /* send window scaling */
        uint32_t tcps_rwnd;             /* receive window size */
        uint32_t tcps_rcv_ws;           /* receive window scaling */
	uint32_t tcps_cwnd;		/* congestion window */
	uint32_t tcps_cwnd_ssthresh;	/* threshold for congestion avoidance */
	uint32_t tcps_sack_fack;	/* SACK sequence # we have acked */
	uint32_t tcps_sack_snxt;	/* next SACK seq # for retransmission */
        uint32_t tcps_rto;              /* round-trip timeout, msec */
	uint32_t tcps_mss;		/* max segment size */
        int tcps_retransmit;            /* retransmit send event, boolean */
} tcpsinfo_t;

It may seem redundant to supply the local and remote ports and addresses here as well as in the tcpinfo_t below, but the tcp:::state-change probes do not have associated tcpinfo_t data, so in order to map the state change to a specific port, we need this data here.

tcpsinfo_t Members
tcps_addr Address of translated tcp_t *.
tcps_local is local, boolean. 0: is not delivered locally (uses a physical network interface), 1: is delivered locally (including loopback interfaces, eg lo0),.
tcps_active is an active open, boolean. 0: TCP connection was created from a remote host, 1: TCP connection was created from this host.
tcps_lport local port associated with the TCP connection.
tcps_rport remote port associated with the TCP connection.
tcps_laddr local address associated with the TCP connection, as a string.
tcps_raddr remote address associated with the TCP connection, as a string.
tcps_state TCP state. Inline defintions are provided for the various TCP states: TCP_STATE_CLOSED, TCP_STATE_SYN_SENT, etc. Use inline tcp_state_string[] to convert state to a string.
tcps_iss Initial sequence number sent.
tcps_suna Lowest sequence number for which we have sent data but not received acknowledgement.
tcps_snxt Next sequence number to send. tcps_snxt - tcps_suna gives the number of bytes pending acknowledgement for the TCP connection
tcps_rack Highest sequence number for which we have received and sent acknowledgement.
tcps_rnxt Next sequence number expected on receive side. tcps_rnxt - tcps_rack gives the number of bytes we have received but not yet acknowledged for the TCP connection.
tcps_swnd TCP send window size.
tcps_snd_ws TCP send window scale. tcps_swnd << tcp_snd_ws gives the scaled window size if window scaling options are in use.
tcps_rwnd TCP receive window size.
tcps_rcv_ws TCP receive window scale. tcps_rwnd << tcp_rcv_ws gives the scaled window size if window scaling options are in use.
tcps_cwnd TCP congestion window size.
tcps_cwnd_ssthresh TCP congestion window threshold. When the congestion window is greater than ssthresh, congestion avoidance begins.
tcps_sack_fack Highest SACK-acked sequence number.
tcps_sack_snxt Next sequence num to be retransmitted using SACK.
tcps_rto Round-trip timeout. If we do not receive acknowledgement of data sent tcps_rto msec ago, retransmit is required.
tcps_mss Maximum segment size.
tcps_retransmit send is a retransmit, boolean. 1 for tcp:::send events that are retransmissions, 0 for tcp events that are not send events, and for send events that are not retransmissions.

Top

tcplsinfo_t structure

The tcplsinfo_t structure contains the previous tcp state during a state change.

typedef struct tcplsinfo {
        int32_t tcps_state;              /* TCP state */
} tcplsinfo_t;
tcplsinfo_t Members
tcps_state previous TCP state. Inline defintions are provided for the various TCP states: TCP_STATE_CLOSED, TCP_STATE_SYN_SENT, etc. Use inline tcp_state_string[] to convert state to a string.

Top

tcpinfo_t structure

The tcpinfo_t structure is a DTrace translated version of the TCP header.

typedef struct tcpinfo {
        uint16_t tcp_sport;             /* source port */
        uint16_t tcp_dport;             /* destination port */
        uint32_t tcp_seq;               /* sequence number */
        uint32_t tcp_ack;               /* acknowledgment number */
        uint8_t tcp_offset;             /* data offset, in bytes */
        uint8_t tcp_flags;              /* flags */
        uint16_t tcp_window;            /* window size */
        uint16_t tcp_checksum;          /* checksum */
        uint16_t tcp_urgent;            /* urgent data pointer */
        tcph_t *tcp_hdr;                /* raw TCP header */
} tcpinfo_t;
tcpinfo_t Members
tcp_sport TCP source port.
tcp_dport TCP destination port.
tcp_seq TCP sequence number.
tcp_ack TCP acknowledgment number.
tcp_offset Payload data offset, in bytes (not 32-bit words).
tcp_flags TCP flags. See the tcp_flags table below for available macros.
tcp_window TCP window size, bytes.
tcp_checksum Checksum of TCP header and payload.
tcp_urgent TCP urgent data pointer, bytes.
tcp_hdr Pointer to raw TCP header at time of tracing.
tcp_flags Values
TH_FIN No more data from sender (finish).
TH_SYN Synchronize sequence numbers (connect).
TH_RST Reset the connection.
TH_PUSH TCP push function.
TH_ACK Acknowledgment field is set.
TH_URG Urgent pointer field is set.
TH_ECE Explicit congestion notification echo (see RFC-3168).
TH_CWR Congestion window reduction.

See RFC-793 for a detailed explanation of the standard TCP header fields and flags.

Top

Examples

Some simple examples of tcp provider usage follow.

Connections by host address

This DTrace one-liner counts inbound TCP connections by source IP address:

# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_raddr] = count(); }'
dtrace: description 'tcp:::state-change' matched 1 probes
^C

  127.0.0.1                                                         1
  192.168.2.88                                                      1
  fe80::214:4fff:fe8d:59aa                                          1
  192.168.1.109                                                     3

The output above shows there were 3 TCP connections from 192.168.1.109, a single TCP connection from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.

Connections by TCP port

This DTrace one-liner counts inbound TCP connections by local TCP port:

# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_lport] = count(); }'
dtrace: description 'tcp:::state-change' matched 1 probes
^C

 40648                1
    22                3

The output above shows there were 3 TCP connections for port 22 (ssh), a single TCP connection for port 40648 (an RPC port).

Who is connecting to what

Combining the previous two examples produces a useful one liner, to quickly identify who is connecting to what:

# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_raddr, args[3]->tcps_lport] = count(); }' 
dtrace: description 'tcp:::state-change' matched 1 probes
^C

  192.168.2.88                                       40648                1
  fe80::214:4fff:fe8d:59aa                              22                1
  192.168.1.109                                         22                3

The output above shows there were 3 TCP connections from 192.168.1.109 to port 22 (ssh), etc.

Who isn't connecting to what

It may be useful when troubleshooting connection issues to see who is failing to connect to their requested ports. This is equivalent to seeing where incoming SYNs arrive when no listener is present, as per RFC793:

# dtrace -n 'tcp:::accept-refused { @[args[2]->ip_daddr, args[4]->tcp_sport] = count(); }' 
dtrace: description 'tcp:::receive ' matched 1 probes
^C

  192.168.1.109                                         23                2

Here we traced two failed attempts by host 192.168.1.109 to connect to port 23 (telnet).

Packets by host address

This DTrace one-liner counts TCP received packets by host address:

# dtrace -n 'tcp:::receive { @[args[2]->ip_saddr] = count(); }'
dtrace: description 'tcp:::receive ' matched 5 probes
^C

  127.0.0.1                                                         7
  fe80::214:4fff:fe8d:59aa                                         14
  192.168.2.30                                                     43
  192.168.1.109                                                    44
  192.168.2.88                                                   3722

The output above shows that 7 TCP packets were recieved from 127.0.0.1, 14 TCP packets from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.

Packets by local port

This DTrace one-liner counts TCP received packets by the local TCP port:

# dtrace -n 'tcp:::receive { @[args[4]->tcp_dport] = count(); }'
dtrace: description 'tcp:::receive ' matched 5 probes
^C

 42303                3
 42634                3
  2049               27
 40648               36
    22              162

The output above shows that 162 packets were received for port 22 (ssh), 36 packets were received for port 40648 (an RPC port), 27 packets for 2049 (NFS), and a few packets to high numbered client ports.

Sent size distribution

This DTrace one-liner prints distribution plots of IP payload size by destination, for TCP sends:

# dtrace -n 'tcp:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'
dtrace: description 'tcp:::send ' matched 3 probes
^C

  192.168.1.109                                     
           value  ------------- Distribution ------------- count    
              32 |                                         0        
              64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    14       
             128 |@@@                                      1        
             256 |                                         0        

  192.168.2.30                                      
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@@@@@@@@@@@@@@@@@@@                     7        
              64 |@@@@@@@@@                                3        
             128 |@@@                                      1        
             256 |@@@@@@                                   2        
             512 |@@@                                      1        
            1024 |                                         0        
tcpstate.d

This DTrace script demonstrates the capability to trace TCP state changes:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10

int last[int];

dtrace:::BEGIN
{
        printf(" %3s %12s  %-20s    %-20s\n", "CPU", "DELTA(us)", "OLD", "NEW");
        last = timestamp;
}

tcp:::state-change
/ last[args[1]->cs_cid] /
{
        this->elapsed = (timestamp - last[args[1]->cs_cid]) / 1000;
        printf(" %3d %12d  %-20s -> %-20s\n", cpu, this->elapsed,
            tcp_state_string[args[5]->tcps_state], tcp_state_string[args[3]->tcps_state]);
        last[args[1]->cs_cid] = timestamp;
}

tcp:::state-change
/ last[args[1]->cs_cid] == 0 /
{
        printf(" %3d %12s  %-20s -> %-20s\n", cpu, "-",
            tcp_state_string[args[5]->tcps_state],
            tcp_state_string[args[3]->tcps_state]);
        last[args[1]->cs_cid] = timestamp;
}

This script was run on a system for a couple of minutes:

# ./tcpstate.d 

 CPU    DELTA(us)  OLD                     NEW                 
   0            -  state-listen         -> state-syn-received  
   0          613  state-syn-received   -> state-established   
   0            -  state-idle           -> state-bound         
   0           63  state-bound          -> state-syn-sent      
   0          685  state-syn-sent       -> state-bound         
   0           22  state-bound          -> state-idle          
   0          114  state-idle           -> state-closed  

In the above example output, an inbound connection is traced, It takes 613 us to go from syn-received to
established. An outbound connection attempt is also made to a closed port. It takes 63us to go from bound
to syn-sent, 685 us to go from syn-sent to bound etc.

The fields printed are:

field description
CPU CPU id for the event
DELTA(us) time since previous event for that connection, microseconds
OLD old TCP state
NEW new TCP state
tcpio.d

The following DTrace script traces TCP packets and prints various details:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

dtrace:::BEGIN
{
        printf(" %3s %15s:%-5s      %15s:%-5s %6s  %s\n", "CPU",
            "LADDR", "LPORT", "RADDR", "RPORT", "BYTES", "FLAGS");
}

tcp:::send
{
        this->length = args[2]->ip_plength - args[4]->tcp_offset;
        printf(" %3d %16s:%-5d -> %16s:%-5d %6d  (", cpu,
            args[2]->ip_saddr, args[4]->tcp_sport,
            args[2]->ip_daddr, args[4]->tcp_dport, this->length);
}

tcp:::receive
{
        this->length = args[2]->ip_plength - args[4]->tcp_offset;
        printf(" %3d %16s:%-5d <- %16s:%-5d %6d  (", cpu,
            args[2]->ip_daddr, args[4]->tcp_dport,
            args[2]->ip_saddr, args[4]->tcp_sport, this->length);
}

tcp:::send,
tcp:::receive
{
        printf("%s", args[4]->tcp_flags & TH_FIN ? "FIN|" : "");
        printf("%s", args[4]->tcp_flags & TH_SYN ? "SYN|" : "");
        printf("%s", args[4]->tcp_flags & TH_RST ? "RST|" : "");
        printf("%s", args[4]->tcp_flags & TH_PUSH ? "PUSH|" : "");
        printf("%s", args[4]->tcp_flags & TH_ACK ? "ACK|" : "");
        printf("%s", args[4]->tcp_flags & TH_URG ? "URG|" : "");
        printf("%s", args[4]->tcp_flags & TH_ECE ? "ECE|" : "");
        printf("%s", args[4]->tcp_flags & TH_CWR ? "CWR|" : "");
        printf("%s", args[4]->tcp_flags == 0 ? "null " : "");
        printf("\b)\n");
}

This example output has captured a TCP handshake:

# ./tcpio.d
 CPU           LADDR:LPORT                RADDR:RPORT  BYTES  FLAGS
   1     192.168.2.80:22    ->    192.168.1.109:60337    464  (PUSH|ACK)
   1     192.168.2.80:22    ->    192.168.1.109:60337     48  (PUSH|ACK)
   2     192.168.2.80:22    ->    192.168.1.109:60337     20  (PUSH|ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337      0  (SYN)
   3     192.168.2.80:22    ->    192.168.1.109:60337      0  (SYN|ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337      0  (ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337      0  (ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337     20  (PUSH|ACK)
   3     192.168.2.80:22    ->    192.168.1.109:60337      0  (ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337      0  (ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337    376  (PUSH|ACK)
   3     192.168.2.80:22    ->    192.168.1.109:60337      0  (ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337     24  (PUSH|ACK)
   2     192.168.2.80:22    ->    192.168.1.109:60337    736  (PUSH|ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337      0  (ACK)

The fields printed are:

field description
CPU CPU id that event occurred on
LADDR local IP address
LPORT local TCP port
RADDR remote IP address
RPORT remote TCP port
BYTES TCP payload bytes
FLAGS TCP flags

Note: The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering, and events such as the TCP handshake can be printed out of order. Keep an eye on changes in the CPU column, or add a timestamp column to this script and post sort.

Top

tcp Stability

The tcp provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 39, Stability.

Element Name stability Data stability Dependency class
Provider Evolving Evolving ISA
Module Private Private Unknown
Function Private Private Unknown
Name Evolving Evolving ISA
Arguments Evolving Evolving ISA
Labels:
None
Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.

Sign up or Log in to add a comment or watch this page.


The individuals who post here are part of the extended Oracle community and they might not be employed or in any way formally affiliated with Oracle. The opinions expressed here are their own, are not necessarily reviewed in advance by anyone but the individual authors, and neither Oracle nor any other party necessarily agrees with them.