Commit Graph

50 Commits

Author SHA1 Message Date
Jordan Whited
2fcdaf9799 conn: fix StdNetBind fallback on Windows
If RIO is unavailable, NewWinRingBind() falls back to StdNetBind.
StdNetBind uses x/net/ipv{4,6}.PacketConn for sending and receiving
datagrams, specifically via the {Read,Write}Batch methods.
These methods are unimplemented on Windows and will return runtime
errors as a result. Additionally, only Linux benefits from these
x/net types for reading and writing, so we update StdNetBind to fall
back to the standard library net package for all platforms other than
Linux.

Reviewed-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-03-10 14:52:24 +01:00
Jason A. Donenfeld
dbd949307e conn: inch BatchSize toward being non-dynamic
There's not really a use at the moment for making this configurable, and
once bind_windows.go behaves like bind_std.go, we'll be able to use
constants everywhere. So begin that simplification now.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-03-10 14:52:22 +01:00
Jordan Whited
f26efb65f2 conn: set SO_{SND,RCV}BUF to 7MB on the Bind UDP socket
The conn.Bind UDP sockets' send and receive buffers are now being sized
to 7MB, whereas they were previously inheriting the system defaults.
The system defaults are considerably small and can result in dropped
packets on high speed links. By increasing the size of these buffers we
are able to achieve higher throughput in the aforementioned case.

The iperf3 results below demonstrate the effect of this commit between
two Linux computers with 32-core Xeon Platinum CPUs @ 2.9Ghz. There is
roughly ~125us of round trip latency between them.

The first result is from commit 792b49c which uses the system defaults,
e.g. net.core.{r,w}mem_max = 212992. The TCP retransmits are correlated
with buffer full drops on both sides.

Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-10.00  sec  4.74 GBytes  4.08 Gbits/sec  2742   285 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  4.74 GBytes  4.08 Gbits/sec  2742   sender
[  5]   0.00-10.04  sec  4.74 GBytes  4.06 Gbits/sec         receiver

The second result is after increasing SO_{SND,RCV}BUF to 7MB, i.e.
applying this commit.

Starting Test: protocol: TCP, 1 streams, 131072 byte blocks
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-10.00  sec  6.14 GBytes  5.27 Gbits/sec    0   3.15 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  6.14 GBytes  5.27 Gbits/sec    0    sender
[  5]   0.00-10.04  sec  6.14 GBytes  5.25 Gbits/sec         receiver

The specific value of 7MB is chosen as it is the max supported by a
default configuration of macOS. A value greater than 7MB may further
benefit throughput for environments with higher network latency and
lower CPU clocks, but will also increase latency under load
(bufferbloat). Some platforms will silently clamp the value to other
maximums. On Linux, we use SO_{SND,RCV}BUFFORCE in case 7MB is beyond
net.core.{r,w}mem_max.

Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-03-10 14:52:20 +01:00
Jordan Whited
9e2f386022 conn, device, tun: implement vectorized I/O on Linux
Implement TCP offloading via TSO and GRO for the Linux tun.Device, which
is made possible by virtio extensions in the kernel's TUN driver.

Delete conn.LinuxSocketEndpoint in favor of a collapsed conn.StdNetBind.
conn.StdNetBind makes use of recvmmsg() and sendmmsg() on Linux. All
platforms now fall under conn.StdNetBind, except for Windows, which
remains in conn.WinRingBind, which still needs to be adjusted to handle
multiple packets.

Also refactor sticky sockets support to eventually be applicable on
platforms other than just Linux. However Linux remains the sole platform
that fully implements it for now.

Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-03-10 14:52:17 +01:00
Jordan Whited
3bb8fec7e4 conn, device, tun: implement vectorized I/O plumbing
Accept packet vectors for reading and writing in the tun.Device and
conn.Bind interfaces, so that the internal plumbing between these
interfaces now passes a vector of packets. Vectors move untouched
between these interfaces, i.e. if 128 packets are received from
conn.Bind.Read(), 128 packets are passed to tun.Device.Write(). There is
no internal buffering.

Currently, existing implementations are only adjusted to have vectors
of length one. Subsequent patches will improve that.

Also, as a related fixup, use the unix and windows packages rather than
the syscall package when possible.

Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-03-10 14:52:13 +01:00
Jason A. Donenfeld
ebbd4a4330 global: bump copyright year
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-02-07 20:39:29 -03:00
Jason A. Donenfeld
bb719d3a6e global: bump copyright year
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-20 17:21:32 +02:00
Brad Fitzpatrick
b51010ba13 all: use Go 1.19 and its atomic types
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-04 12:57:30 +02:00
Brad Fitzpatrick
c31a7b1ab4 conn, device, tun: set CLOEXEC on fds
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-07-04 01:42:12 +02:00
Josh Bleecher Snyder
ef5c587f78 conn: remove the final alloc per packet receive
This does bind_std only; other platforms remain.

The remaining alloc per iteration in the Throughput benchmark
comes from the tuntest package, and should not appear in regular use.

name           old time/op      new time/op      delta
Latency-10         25.2µs ± 1%      25.0µs ± 0%   -0.58%  (p=0.006 n=10+10)
Throughput-10      2.44µs ± 3%      2.41µs ± 2%     ~     (p=0.140 n=10+8)

name           old alloc/op     new alloc/op     delta
Latency-10           854B ± 5%        741B ± 3%  -13.22%  (p=0.000 n=10+10)
Throughput-10        265B ±34%        267B ±39%     ~     (p=0.670 n=10+10)

name           old allocs/op    new allocs/op    delta
Latency-10           16.0 ± 0%        14.0 ± 0%  -12.50%  (p=0.000 n=10+10)
Throughput-10        2.00 ± 0%        1.00 ± 0%  -50.00%  (p=0.000 n=10+10)

name           old packet-loss  new packet-loss  delta
Throughput-10        0.01 ±82%       0.01 ±282%     ~     (p=0.321 n=9+8)

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-04-07 03:31:10 +02:00
Jason A. Donenfeld
193cf8d6a5 conn: use netip for std bind
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-03-17 22:23:02 -06:00
Josh Bleecher Snyder
42c9af45e1 all: update to Go 1.18
Bump go.mod and README.

Switch to upstream net/netip.

Use strings.Cut.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2022-03-16 16:09:48 -07:00
Jason A. Donenfeld
9c9e7e2724 global: apply gofumpt
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-12-09 23:15:55 +01:00
Jason A. Donenfeld
ef8d6804d7 global: use netip where possible now
There are more places where we'll need to add it later, when Go 1.18
comes out with support for it in the "net" package. Also, allowedips
still uses slices internally, which might be suboptimal.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-11-23 22:03:15 +01:00
Jason A. Donenfeld
dfd688b6aa global: remove old-style build tags
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-10-12 12:02:10 -06:00
Jason A. Donenfeld
982d5d2e84 conn,wintun: use unsafe.Slice instead of unsafeSlice
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-10-11 14:57:53 -06:00
Jason A. Donenfeld
2ef39d4754 global: add new go 1.17 build comments
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-09-05 16:00:43 +02:00
Jason A. Donenfeld
bd83f0ac99 conn: linux: protect read fds
The -1 protection was removed and the wrong error was returned, causing
us to read from a bogus fd. As well, remove the useless closures that
aren't doing anything, since this is all synchronized anyway.

Fixes: 10533c3 ("all: make conn.Bind.Open return a slice of receive functions")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-05-20 18:09:55 +02:00
Jason A. Donenfeld
5846b62283 conn: windows: set count=0 on retry
When retrying, if count is not 0, we forget to dequeue another request,
and so the ring fills up and errors out.

Reported-by: Sascha Dierberg <dierberg@dresearch-fe.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-05-11 16:47:17 +02:00
Jason A. Donenfeld
8246d251ea conn: windows: do not error out when receiving UDP jumbogram
If we receive a large UDP packet, don't return an error to receive.go,
which then terminates the receive loop. Instead, simply retry.

Considering Winsock's general finickiness, we might consider other
places where an attacker on the wire can generate error conditions like
this.

Reported-by: Sascha Dierberg <sascha.dierberg@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-04-26 22:07:03 -04:00
Jason A. Donenfeld
54dbe2471f conn: reconstruct v4 vs v6 receive function based on symtab
This is kind of gross but it's better than the alternatives.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-04-12 15:35:32 -06:00
Jason A. Donenfeld
5f6bbe4ae8 conn: windows: reset ring to starting position after free
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-04-09 18:09:41 -06:00
Jason A. Donenfeld
75526d6071 conn: windows: compare head and tail properly
By not comparing these with the modulo, the ring became nearly never
full, resulting in completion queue buffers filling up prematurely.

Reported-by: Joshua Sjoding <joshua.sjoding@scjalliance.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-04-09 14:26:08 -06:00
Jason A. Donenfeld
fbf97502cf winrio: test that IOCP-based RIO is supported
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-04-09 14:26:08 -06:00
Josh Bleecher Snyder
10533c3e73 all: make conn.Bind.Open return a slice of receive functions
Instead of hard-coding exactly two sources from which
to receive packets (an IPv4 source and an IPv6 source),
allow the conn.Bind to specify a set of sources.

Beneficial consequences:

* If there's no IPv6 support on a system,
  conn.Bind.Open can choose not to return a receive function for it,
  which is simpler than tracking that state in the bind.
  This simplification removes existing data races from both
  conn.StdNetBind and bindtest.ChannelBind.
* If there are more than two sources on a system,
  the conn.Bind no longer needs to add a separate muxing layer.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-02 11:07:08 -06:00
Jason A. Donenfeld
8ed83e0427 conn: winrio: pass key parameter into struct
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-04-02 10:36:41 -06:00
Josh Bleecher Snyder
517f0703f5 conn: document retry loop in StdNetBind.Open
It's not obvious on a first read what the loop is doing.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-03-30 12:09:38 -07:00
Josh Bleecher Snyder
204140016a conn: use local ipvN vars in StdNetBind.Open
This makes it clearer that they are fresh on each attempt,
and avoids the bookkeeping required to clearing them on failure.

Also, remove an unnecessary err != nil.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-03-30 12:09:38 -07:00
Josh Bleecher Snyder
822f5a6d70 conn: unify code in StdNetBind.Send
The sending code is identical for ipv4 and ipv6;
select the conn, then use it.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-03-30 12:09:32 -07:00
Jason A. Donenfeld
0eb7206295 conn: linux: unexport mutex
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-03-08 21:04:09 -07:00
Jason A. Donenfeld
3c11c0308e conn: implement RIO for fast Windows UDP sockets
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-25 15:08:08 +01:00
Jason A. Donenfeld
9a29ae267c device: test up/down using virtual conn
This prevents port clashing bugs.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-23 20:00:57 +01:00
Jason A. Donenfeld
a4f8e83d5d conn: make binds replacable
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-23 20:00:57 +01:00
Jason A. Donenfeld
4e439ea10e conn: bump to 1.16 and get rid of NetErrClosed hack
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-16 21:05:25 +01:00
Jason A. Donenfeld
aabc3770ba conn: close old fd before trying again
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-10 00:43:31 +01:00
Jason A. Donenfeld
5cdb862f15 conn: use errors.Is for unwrapping
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-09 19:46:57 +01:00
Jason A. Donenfeld
30b96ba083 conn: try harder to have v4 and v6 ports agree
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-09 18:45:12 +01:00
Jason A. Donenfeld
d4112d9096 global: bump copyright
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-28 17:52:15 +01:00
Brad Fitzpatrick
23b2790aa0 conn: fix interface parameter name in Bind interface docs
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-01-26 15:20:22 +01:00
Jason A. Donenfeld
294d3bedf9 device: allow compiling with Go 1.15
Until we depend on Go 1.16 (which isn't released yet), alias our own
variable to the private member of the net package. This will allow an
easy find replace to make this go away when we eventually switch to
1.16.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-20 20:12:32 +01:00
Josh Bleecher Snyder
f07177c762 conn: remove _ method receiver
Minor style fix.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-20 20:03:40 +01:00
Jason A. Donenfeld
ea6c1cd7e6 device: receive: do not exit immediately on transient UDP receive errors
Some users report seeing lines like:

> Routine: receive incoming IPv4 - stopped

Popping up unexpectedly. Let's sleep and try again before failing, and
also log the error, and perhaps we'll eventually understand this
situation better in future versions.

Because we have to distinguish between the socket being closed
explicitly and whatever error this is, we bump the module to require Go
1.16.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-08 14:30:04 +01:00
Jason A. Donenfeld
3b3de758ec conn: linux: do not allow ReceiveIPvX to race with Close
If Close is called after ReceiveIPvX, then ReceiveIPvX will block on an
invalid or potentially reused fd.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-07 17:08:58 +01:00
Jason A. Donenfeld
890cc06ed5 conn: do not SO_REUSEADDR on linux
SO_REUSEADDR does not make sense for unicast UDP sockets.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-07 14:49:44 +01:00
David Crawshaw
00bcd865e6 conn: add comments saying what uses these interfaces
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-06-22 10:40:59 +10:00
Jason A. Donenfeld
c403da6a39 conn: unbreak boundif on android
Another thing never tested ever.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-06-07 01:48:28 -06:00
Jason A. Donenfeld
d6de6f3ce6 conn: remove useless comment
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-06-07 01:37:01 -06:00
Jason A. Donenfeld
59e556f24e conn: fix windows situation with boundif
This was evidently never tested before committing.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-06-07 01:26:25 -06:00
Jason A. Donenfeld
db0aa39b76 global: update header comments and modules
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-05-02 02:08:26 -06:00
David Crawshaw
203554620d conn: introduce new package that splits out the Bind and Endpoint types
The sticky socket code stays in the device package for now,
as it reaches deeply into the peer list.

This is the first step in an effort to split some code out of
the very busy device package.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-05-02 01:46:42 -06:00