tcp Provider
The tcp provider provides probes for tracing the TCP protocol.
This provider was integrated into Solaris Nevada build 142.
Probes
The tcp probes are described in the table below.
tcp Probes
Probe | Description |
---|---|
state-change | Probe that fires when a TCP session changes its TCP state. The previous state is noted in the tcplsinfo_t * probe argument. The tcpinfo_t * and ipinfo_t * arguments are NULL. |
send | Probe that fires whenever TCP sends a segment (either control or data). |
receive | Probe that fires whenever TCP receives a segment (either control or data). |
connect-request | Probe that fires when a TCP active open is initiated by sending an initial SYN segment. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the initial SYN segment sent. |
connect-established | Probe that fires when an active-open connection is established, in either of two cases. (1) A TCP active OPEN succeeds: the initial SYN has been sent and a valid SYN,ACK segment has been received in response. TCP enters the ESTABLISHED state, and the tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers of the SYN,ACK segment received. (2) A simultaneous active OPEN succeeds and a final ACK is received from the peer TCP. TCP enters the ESTABLISHED state, and the tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers of the final ACK received. In both cases, the segment presented via the tcpinfo_t * is the one that triggers the transition to ESTABLISHED. This contrasts with tcp:::accept-established, which fires on passive connection establishment. |
connect-refused | A TCP active OPEN connection attempt was refused by the peer - a RST segment was received in acknowledgment of the initial SYN. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the RST,ACK segment received. |
accept-established | A passive open has succeeded: an initial SYN from a peer's active OPEN has been received, TCP has responded with a SYN,ACK, and a final ACK has been received. TCP has entered the ESTABLISHED state. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the final ACK segment received. |
accept-refused | An incoming SYN has arrived for a destination port with no listening connection, so the connection initiation request is rejected by sending a RST segment ACKing the SYN. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the RST segment sent. |
The send and receive probes trace packets on physical interfaces and also packets on loopback interfaces that are processed by tcp. On Solaris, loopback TCP connections can bypass the TCP layer when transferring data packets - this is a performance feature called tcp fusion; these packets are also traced by the tcp provider.
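The connect-request and connect-established probes described in the table above can be paired to measure how long an active open takes. The following is a minimal sketch, not a definitive script: it assumes that cs_cid stays constant for the lifetime of a connection (as the csinfo_t description states), and the names start and @latency are illustrative only.

```dtrace
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Record when the initial SYN is sent, keyed by connection ID. */
tcp:::connect-request
{
        start[args[1]->cs_cid] = timestamp;
}

/* Active open succeeded: aggregate elapsed microseconds by remote address. */
tcp:::connect-established
/start[args[1]->cs_cid]/
{
        @latency[args[3]->tcps_raddr] =
            quantize((timestamp - start[args[1]->cs_cid]) / 1000);
        start[args[1]->cs_cid] = 0;
}

/* Active open refused: discard the saved start time. */
tcp:::connect-refused
/start[args[1]->cs_cid]/
{
        start[args[1]->cs_cid] = 0;
}
```

The aggregation is printed when the script exits. Assigning 0 to an element of the associative array frees its storage, so completed and refused connections do not accumulate.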
Arguments
The argument types for the tcp probes are listed in the table below. The arguments are described in the following section. All probes except state-change have five arguments; state-change has six.
tcp Probe Arguments
Probe | args[0] | args[1] | args[2] | args[3] | args[4] | args[5] |
---|---|---|---|---|---|---|
state-change | null | csinfo_t * | null | tcpsinfo_t * | null | tcplsinfo_t * |
send | pktinfo_t * | csinfo_t * | ipinfo_t * | tcpsinfo_t * | tcpinfo_t * | |
receive | pktinfo_t * | csinfo_t * | ipinfo_t * | tcpsinfo_t * | tcpinfo_t * | |
connect-request | pktinfo_t * | csinfo_t * | ipinfo_t * | tcpsinfo_t * | tcpinfo_t * | |
connect-established | pktinfo_t * | csinfo_t * | ipinfo_t * | tcpsinfo_t * | tcpinfo_t * | |
connect-refused | pktinfo_t * | csinfo_t * | ipinfo_t * | tcpsinfo_t * | tcpinfo_t * | |
accept-established | pktinfo_t * | csinfo_t * | ipinfo_t * | tcpsinfo_t * | tcpinfo_t * | |
accept-refused | pktinfo_t * | csinfo_t * | ipinfo_t * | tcpsinfo_t * | tcpinfo_t * | |
pktinfo_t structure
The pktinfo_t structure is where packet ID info can be made available for deeper analysis if packet IDs become supported by the kernel in the future.
The pkt_addr member is currently always NULL.
    typedef struct pktinfo {
            uintptr_t pkt_addr;             /* currently always NULL */
    } pktinfo_t;
csinfo_t structure
The csinfo_t structure is where connection state info is made available. It contains a unique (system-wide) connection ID, and the process ID and zone ID associated with the connection.
    typedef struct csinfo {
            uintptr_t cs_addr;
            uint64_t cs_cid;
            pid_t cs_pid;
            zoneid_t cs_zoneid;
    } csinfo_t;
csinfo_t Members
cs_addr | Address of translated ip_xmit_attr_t *. |
cs_cid | Connection id. A unique per-connection identifier which identifies the connection during its lifetime. |
cs_pid | Process ID associated with the connection. |
cs_zoneid | Zone ID associated with the connection. |
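As a hedged illustration of the csinfo_t members above, the following one-liner sketch counts TCP sends by the process ID and zone ID associated with each connection. It relies only on the cs_pid and cs_zoneid members documented here; how useful the result is depends on the kernel populating those members for the connections of interest.

```dtrace
tcp:::send
{
        /* key: process ID and zone ID associated with the connection */
        @[args[1]->cs_pid, args[1]->cs_zoneid] = count();
}
```

The clause can be saved to a file and run with dtrace -s, or passed on the command line with dtrace -n in the same way as the one-liners in the Examples section.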
ipinfo_t structure
The ipinfo_t structure contains common IP info for both IPv4 and IPv6.
    typedef struct ipinfo {
            uint8_t ip_ver;                 /* IP version (4, 6) */
            uint16_t ip_plength;            /* payload length */
            string ip_saddr;                /* source address */
            string ip_daddr;                /* destination address */
    } ipinfo_t;
These values are read at the time the probe fired in TCP, and so ip_plength is the expected IP payload length - however the IP layer may add headers (such as AH and ESP) which will increase the actual payload length. To examine this, also trace packets using the ip provider.
ipinfo_t Members
ip_ver | IP version number. Currently either 4 or 6. |
ip_plength | Payload length in bytes. This is the length of the packet at the time of tracing, excluding the IP header. |
ip_saddr | Source IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC-1884 convention 2 with lower case hexadecimal digits. |
ip_daddr | Destination IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC-1884 convention 2 with lower case hexadecimal digits. |
tcpsinfo_t structure
The tcpsinfo_t structure contains tcp state info.
    typedef struct tcpsinfo {
            uintptr_t tcps_addr;
            int tcps_local;                 /* is delivered locally, boolean */
            int tcps_active;                /* active open (from here), boolean */
            uint16_t tcps_lport;            /* local port */
            uint16_t tcps_rport;            /* remote port */
            string tcps_laddr;              /* local address, as a string */
            string tcps_raddr;              /* remote address, as a string */
            int32_t tcps_state;             /* TCP state. Use inline tcp_state_string[] to convert to string */
            uint32_t tcps_iss;              /* initial sequence # sent */
            uint32_t tcps_suna;             /* sequence # sent but unacked */
            uint32_t tcps_snxt;             /* next sequence # to send */
            uint32_t tcps_rack;             /* sequence # we have acked */
            uint32_t tcps_rnxt;             /* next sequence # expected */
            uint32_t tcps_swnd;             /* send window size */
            uint32_t tcps_snd_ws;           /* send window scaling */
            uint32_t tcps_rwnd;             /* receive window size */
            uint32_t tcps_rcv_ws;           /* receive window scaling */
            uint32_t tcps_cwnd;             /* congestion window */
            uint32_t tcps_cwnd_ssthresh;    /* threshold for congestion avoidance */
            uint32_t tcps_sack_fack;        /* SACK sequence # we have acked */
            uint32_t tcps_sack_snxt;        /* next SACK seq # for retransmission */
            uint32_t tcps_rto;              /* round-trip timeout, msec */
            uint32_t tcps_mss;              /* max segment size */
            int tcps_retransmit;            /* retransmit send event, boolean */
    } tcpsinfo_t;
It may seem redundant to supply the local and remote ports and addresses here as well as in the tcpinfo_t below, but the tcp:::state-change probes do not have associated tcpinfo_t data, so in order to map the state change to a specific port, we need this data here.
tcpsinfo_t Members
tcps_addr | Address of translated tcp_t *. |
tcps_local | is local, boolean. 0: is not delivered locally (uses a physical network interface), 1: is delivered locally (including loopback interfaces, e.g. lo0). |
tcps_active | is an active open, boolean. 0: TCP connection was created from a remote host, 1: TCP connection was created from this host. |
tcps_lport | local port associated with the TCP connection. |
tcps_rport | remote port associated with the TCP connection. |
tcps_laddr | local address associated with the TCP connection, as a string. |
tcps_raddr | remote address associated with the TCP connection, as a string. |
tcps_state | TCP state. Inline definitions are provided for the various TCP states: TCP_STATE_CLOSED, TCP_STATE_SYN_SENT, etc. Use inline tcp_state_string[] to convert state to a string. |
tcps_iss | Initial sequence number sent. |
tcps_suna | Lowest sequence number for which we have sent data but not received acknowledgement. |
tcps_snxt | Next sequence number to send. tcps_snxt - tcps_suna gives the number of bytes pending acknowledgement for the TCP connection. |
tcps_rack | Highest sequence number for which we have received and sent acknowledgement. |
tcps_rnxt | Next sequence number expected on receive side. tcps_rnxt - tcps_rack gives the number of bytes we have received but not yet acknowledged for the TCP connection. |
tcps_swnd | TCP send window size. |
tcps_snd_ws | TCP send window scale. tcps_swnd << tcps_snd_ws gives the scaled window size if window scaling options are in use. |
tcps_rwnd | TCP receive window size. |
tcps_rcv_ws | TCP receive window scale. tcps_rwnd << tcps_rcv_ws gives the scaled window size if window scaling options are in use. |
tcps_cwnd | TCP congestion window size. |
tcps_cwnd_ssthresh | TCP congestion window threshold. When the congestion window is greater than ssthresh, congestion avoidance begins. |
tcps_sack_fack | Highest SACK-acked sequence number. |
tcps_sack_snxt | Next sequence num to be retransmitted using SACK. |
tcps_rto | Round-trip timeout. If we do not receive acknowledgement of data sent tcps_rto msec ago, retransmit is required. |
tcps_mss | Maximum segment size. |
tcps_retransmit | send is a retransmit, boolean. 1 for tcp:::send events that are retransmissions, 0 for tcp events that are not send events, and for send events that are not retransmissions. |
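The tcpsinfo_t members above can be combined in aggregations. The following sketch, offered as an illustration rather than a reference script, uses the tcps_snxt - tcps_suna expression from the table to show the distribution of unacknowledged bytes at each send, and uses tcps_retransmit to count retransmitted sends. The aggregation names @unacked and @retransmits are illustrative.

```dtrace
#!/usr/sbin/dtrace -s

/* Distribution of bytes sent but not yet acknowledged, per remote endpoint. */
tcp:::send
{
        @unacked[args[3]->tcps_raddr, args[3]->tcps_rport] =
            quantize(args[3]->tcps_snxt - args[3]->tcps_suna);
}

/* Count only the send events that are retransmissions. */
tcp:::send
/args[3]->tcps_retransmit/
{
        @retransmits[args[3]->tcps_raddr, args[3]->tcps_rport] = count();
}
```

Both sequence numbers are unsigned 32-bit values, so the subtraction remains correct when the sequence space wraps.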
tcplsinfo_t structure
The tcplsinfo_t structure contains the previous tcp state during a state change.
    typedef struct tcplsinfo {
            int32_t tcps_state;             /* TCP state */
    } tcplsinfo_t;
tcplsinfo_t Members
tcps_state | previous TCP state. Inline definitions are provided for the various TCP states: TCP_STATE_CLOSED, TCP_STATE_SYN_SENT, etc. Use inline tcp_state_string[] to convert state to a string. |
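To illustrate how the previous state in tcplsinfo_t pairs with the current state in tcpsinfo_t, the following minimal sketch counts each observed state transition. It uses only the state-change probe arguments and the tcp_state_string[] inline described above.

```dtrace
tcp:::state-change
{
        /* key: previous state (args[5]) and new state (args[3]) as strings */
        @[tcp_state_string[args[5]->tcps_state],
            tcp_state_string[args[3]->tcps_state]] = count();
}
```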
tcpinfo_t structure
The tcpinfo_t structure is a DTrace translated version of the TCP header.
    typedef struct tcpinfo {
            uint16_t tcp_sport;             /* source port */
            uint16_t tcp_dport;             /* destination port */
            uint32_t tcp_seq;               /* sequence number */
            uint32_t tcp_ack;               /* acknowledgment number */
            uint8_t tcp_offset;             /* data offset, in bytes */
            uint8_t tcp_flags;              /* flags */
            uint16_t tcp_window;            /* window size */
            uint16_t tcp_checksum;          /* checksum */
            uint16_t tcp_urgent;            /* urgent data pointer */
            tcph_t *tcp_hdr;                /* raw TCP header */
    } tcpinfo_t;
tcpinfo_t Members
tcp_sport | TCP source port. |
tcp_dport | TCP destination port. |
tcp_seq | TCP sequence number. |
tcp_ack | TCP acknowledgment number. |
tcp_offset | Payload data offset, in bytes (not 32-bit words). |
tcp_flags | TCP flags. See the tcp_flags table below for available macros. |
tcp_window | TCP window size, bytes. |
tcp_checksum | Checksum of TCP header and payload. |
tcp_urgent | TCP urgent data pointer, bytes. |
tcp_hdr | Pointer to raw TCP header at time of tracing. |
tcp_flags Values
TH_FIN | No more data from sender (finish). |
TH_SYN | Synchronize sequence numbers (connect). |
TH_RST | Reset the connection. |
TH_PUSH | TCP push function. |
TH_ACK | Acknowledgment field is set. |
TH_URG | Urgent pointer field is set. |
TH_ECE | Explicit congestion notification echo (see RFC-3168). |
TH_CWR | Congestion window reduction. |
See RFC-793 for a detailed explanation of the standard TCP header fields and flags.
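The flag macros can be used in predicates to select particular segments. As a small sketch, the following clause counts received segments that have the RST flag set, keyed by source address and local port. It applies only to tcp:::receive, where the tcpinfo_t * argument is populated.

```dtrace
tcp:::receive
/(args[4]->tcp_flags & TH_RST) != 0/
{
        /* key: sender of the RST and the local port it was addressed to */
        @[args[2]->ip_saddr, args[4]->tcp_dport] = count();
}
```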
Examples
Some simple examples of tcp provider usage follow.
Connections by host address
This DTrace one-liner counts inbound TCP connections by source IP address:
    # dtrace -n 'tcp:::accept-established { @[args[3]->tcps_raddr] = count(); }'
    dtrace: description 'tcp:::state-change' matched 1 probes
    ^C

      127.0.0.1                                                         1
      192.168.2.88                                                      1
      fe80::214:4fff:fe8d:59aa                                          1
      192.168.1.109                                                     3
The output above shows there were 3 TCP connections from 192.168.1.109, a single TCP connection from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.
Connections by TCP port
This DTrace one-liner counts inbound TCP connections by local TCP port:
    # dtrace -n 'tcp:::accept-established { @[args[3]->tcps_lport] = count(); }'
    dtrace: description 'tcp:::state-change' matched 1 probes
    ^C

        40648                1
           22                3
The output above shows there were 3 TCP connections for port 22 (ssh) and a single TCP connection for port 40648 (an RPC port).
Who is connecting to what
Combining the previous two examples produces a useful one-liner that quickly identifies who is connecting to what:
    # dtrace -n 'tcp:::accept-established { @[args[3]->tcps_raddr, args[3]->tcps_lport] = count(); }'
    dtrace: description 'tcp:::state-change' matched 1 probes
    ^C

      192.168.2.88                                          40648                1
      fe80::214:4fff:fe8d:59aa                                 22                1
      192.168.1.109                                            22                3
The output above shows there were 3 TCP connections from 192.168.1.109 to port 22 (ssh), etc.
Who isn't connecting to what
When troubleshooting connection issues, it may be useful to see who is failing to connect to their requested ports. This is equivalent to seeing where incoming SYNs arrive when no listener is present, as per RFC-793:
    # dtrace -n 'tcp:::accept-refused { @[args[2]->ip_daddr, args[4]->tcp_sport] = count(); }'
    dtrace: description 'tcp:::receive ' matched 1 probes
    ^C

      192.168.1.109                                            23                2
Here we traced two failed attempts by host 192.168.1.109 to connect to port 23 (telnet).
Packets by host address
This DTrace one-liner counts TCP received packets by host address:
    # dtrace -n 'tcp:::receive { @[args[2]->ip_saddr] = count(); }'
    dtrace: description 'tcp:::receive ' matched 5 probes
    ^C

      127.0.0.1                                                         7
      fe80::214:4fff:fe8d:59aa                                         14
      192.168.2.30                                                     43
      192.168.1.109                                                    44
      192.168.2.88                                                   3722
The output above shows that 7 TCP packets were received from 127.0.0.1, 14 TCP packets from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.
Packets by local port
This DTrace one-liner counts TCP received packets by the local TCP port:
    # dtrace -n 'tcp:::receive { @[args[4]->tcp_dport] = count(); }'
    dtrace: description 'tcp:::receive ' matched 5 probes
    ^C

        42303                3
        42634                3
         2049               27
        40648               36
           22              162
The output above shows that 162 packets were received for port 22 (ssh), 36 packets were received for port 40648 (an RPC port), 27 packets for 2049 (NFS), and a few packets to high numbered client ports.
Sent size distribution
This DTrace one-liner prints distribution plots of IP payload size by destination, for TCP sends:
    # dtrace -n 'tcp:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'
    dtrace: description 'tcp:::send ' matched 3 probes
    ^C

      192.168.1.109
               value  ------------- Distribution ------------- count
                  32 |                                         0
                  64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    14
                 128 |@@@                                      1
                 256 |                                         0

      192.168.2.30
               value  ------------- Distribution ------------- count
                  16 |                                         0
                  32 |@@@@@@@@@@@@@@@@@@@@                     7
                  64 |@@@@@@@@@                                3
                 128 |@@@                                      1
                 256 |@@@@@@                                   2
                 512 |@@@                                      1
                1024 |                                         0
tcpstate.d
This DTrace script demonstrates the capability to trace TCP state changes:
    #!/usr/sbin/dtrace -s

    #pragma D option quiet
    #pragma D option switchrate=10

    /*
     * Time of the last state change, keyed by connection ID. Both the
     * timestamp and cs_cid are 64-bit, so the array is declared accordingly.
     */
    uint64_t last[uint64_t];

    dtrace:::BEGIN
    {
            printf(" %3s %12s %-20s %-20s\n", "CPU", "DELTA(us)", "OLD", "NEW");
    }

    /* Connection seen before: print the time since its previous state change. */
    tcp:::state-change
    / last[args[1]->cs_cid] /
    {
            this->elapsed = (timestamp - last[args[1]->cs_cid]) / 1000;
            printf(" %3d %12d %-20s -> %-20s\n", cpu, this->elapsed,
                tcp_state_string[args[5]->tcps_state],
                tcp_state_string[args[3]->tcps_state]);
            last[args[1]->cs_cid] = timestamp;
    }

    /* First state change seen for this connection: no delta is available. */
    tcp:::state-change
    / last[args[1]->cs_cid] == 0 /
    {
            printf(" %3d %12s %-20s -> %-20s\n", cpu, "-",
                tcp_state_string[args[5]->tcps_state],
                tcp_state_string[args[3]->tcps_state]);
            last[args[1]->cs_cid] = timestamp;
    }
This script was run on a system for a couple of minutes:
    # ./tcpstate.d
     CPU    DELTA(us) OLD                  NEW
       0            - state-listen         -> state-syn-received
       0          613 state-syn-received   -> state-established
       0            - state-idle           -> state-bound
       0           63 state-bound          -> state-syn-sent
       0          685 state-syn-sent       -> state-bound
       0           22 state-bound          -> state-idle
       0          114 state-idle           -> state-closed
In the above example output, an inbound connection is traced. It takes 613 us to go from syn-received to established. An outbound connection attempt is also made to a closed port. It takes 63 us to go from bound to syn-sent, 685 us to go from syn-sent to bound, etc.
The fields printed are:
field | description |
---|---|
CPU | CPU id for the event |
DELTA(us) | time since previous event for that connection, microseconds |
OLD | old TCP state |
NEW | new TCP state |
tcpio.d
The following DTrace script traces TCP packets and prints various details:
    #!/usr/sbin/dtrace -s

    #pragma D option quiet
    #pragma D option switchrate=10hz

    dtrace:::BEGIN
    {
            printf(" %3s %15s:%-5s %15s:%-5s %6s %s\n", "CPU",
                "LADDR", "LPORT", "RADDR", "RPORT", "BYTES", "FLAGS");
    }

    tcp:::send
    {
            this->length = args[2]->ip_plength - args[4]->tcp_offset;
            printf(" %3d %16s:%-5d -> %16s:%-5d %6d (", cpu,
                args[2]->ip_saddr, args[4]->tcp_sport,
                args[2]->ip_daddr, args[4]->tcp_dport, this->length);
    }

    tcp:::receive
    {
            this->length = args[2]->ip_plength - args[4]->tcp_offset;
            printf(" %3d %16s:%-5d <- %16s:%-5d %6d (", cpu,
                args[2]->ip_daddr, args[4]->tcp_dport,
                args[2]->ip_saddr, args[4]->tcp_sport, this->length);
    }

    tcp:::send,
    tcp:::receive
    {
            printf("%s", args[4]->tcp_flags & TH_FIN ? "FIN|" : "");
            printf("%s", args[4]->tcp_flags & TH_SYN ? "SYN|" : "");
            printf("%s", args[4]->tcp_flags & TH_RST ? "RST|" : "");
            printf("%s", args[4]->tcp_flags & TH_PUSH ? "PUSH|" : "");
            printf("%s", args[4]->tcp_flags & TH_ACK ? "ACK|" : "");
            printf("%s", args[4]->tcp_flags & TH_URG ? "URG|" : "");
            printf("%s", args[4]->tcp_flags & TH_ECE ? "ECE|" : "");
            printf("%s", args[4]->tcp_flags & TH_CWR ? "CWR|" : "");
            printf("%s", args[4]->tcp_flags == 0 ? "null " : "");
            printf("\b)\n");
    }
This example output has captured a TCP handshake:
    # ./tcpio.d
     CPU           LADDR:LPORT           RADDR:RPORT  BYTES FLAGS
       1     192.168.2.80:22    ->    192.168.1.109:60337    464 (PUSH|ACK)
       1     192.168.2.80:22    ->    192.168.1.109:60337     48 (PUSH|ACK)
       2     192.168.2.80:22    ->    192.168.1.109:60337     20 (PUSH|ACK)
       3     192.168.2.80:22    <-    192.168.1.109:60337      0 (SYN)
       3     192.168.2.80:22    ->    192.168.1.109:60337      0 (SYN|ACK)
       3     192.168.2.80:22    <-    192.168.1.109:60337      0 (ACK)
       3     192.168.2.80:22    <-    192.168.1.109:60337      0 (ACK)
       3     192.168.2.80:22    <-    192.168.1.109:60337     20 (PUSH|ACK)
       3     192.168.2.80:22    ->    192.168.1.109:60337      0 (ACK)
       3     192.168.2.80:22    <-    192.168.1.109:60337      0 (ACK)
       3     192.168.2.80:22    <-    192.168.1.109:60337    376 (PUSH|ACK)
       3     192.168.2.80:22    ->    192.168.1.109:60337      0 (ACK)
       3     192.168.2.80:22    <-    192.168.1.109:60337     24 (PUSH|ACK)
       2     192.168.2.80:22    ->    192.168.1.109:60337    736 (PUSH|ACK)
       3     192.168.2.80:22    <-    192.168.1.109:60337      0 (ACK)
The fields printed are:
field | description |
---|---|
CPU | CPU id that event occurred on |
LADDR | local IP address |
LPORT | local TCP port |
RADDR | remote IP address |
RPORT | remote TCP port |
BYTES | TCP payload bytes |
FLAGS | TCP flags |
Note: The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering, and events such as the TCP handshake can be printed out of order. Keep an eye on changes in the CPU column, or add a timestamp column to this script and sort the output afterwards.
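One way to apply the timestamp suggestion is sketched below. This is a simplified illustration, not a replacement for tcpio.d: it prints the DTrace timestamp variable (nanoseconds) as the first column of each event and omits the byte count and flags.

```dtrace
#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

/* Prefix each event with a nanosecond timestamp so output can be sorted. */
tcp:::send
{
        printf("%d %3d %s:%d -> %s:%d\n", timestamp, cpu,
            args[2]->ip_saddr, args[4]->tcp_sport,
            args[2]->ip_daddr, args[4]->tcp_dport);
}

tcp:::receive
{
        printf("%d %3d %s:%d <- %s:%d\n", timestamp, cpu,
            args[2]->ip_daddr, args[4]->tcp_dport,
            args[2]->ip_saddr, args[4]->tcp_sport);
}
```

Because the timestamp is the first field, captured output can be put back in event order by passing it through sort -n.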
tcp Stability
The tcp provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 39, Stability.
Element | Name stability | Data stability | Dependency class |
---|---|---|---|
Provider | Evolving | Evolving | ISA |
Module | Private | Private | Unknown |
Function | Private | Private | Unknown |
Name | Evolving | Evolving | ISA |
Arguments | Evolving | Evolving | ISA |