Commit Graph

53 Commits

Author SHA1 Message Date
4f8b7857f9 change imports to personal fork 2024-01-07 22:03:11 +03:00
Jordan Whited
4ffa9c2032 device: change Peer.endpoint locking to reduce contention
Access to Peer.endpoint was previously synchronized by Peer.RWMutex.
This has now moved to Peer.endpoint.Mutex. Peer.SendBuffers() is now the
sole caller of Endpoint.ClearSrc(), which is signaled via a new bool,
Peer.endpoint.clearSrcOnTx. Previous Callers of Endpoint.ClearSrc() now
set this bool, primarily via peer.markEndpointSrcForClearing().
Peer.SetEndpointFromPacket() clears Peer.endpoint.clearSrcOnTx when an
updated conn.Endpoint is stored. This maintains the same event order as
before, i.e. a conn.Endpoint received after peer.endpoint.clearSrcOnTx
is set, but before the next Peer.SendBuffers() call results in the
latest conn.Endpoint source being used for the next packet transmission.

These changes result in throughput improvements for single flow,
parallel (-P n) flow, and bidirectional (--bidir) flow iperf3 TCP/UDP
tests as measured on both Linux and Windows. Latency under load improves
especially for high throughput Linux scenarios. These improvements are
likely realized on all platforms to some degree, as the changes are not
platform-specific.

Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-12-11 16:34:09 +01:00
Jordan Whited
1ec454f253 device: move Queue{In,Out}boundElement Mutex to container type
Queue{In,Out}boundElement locking can contribute to significant
overhead via sync.Mutex.lockSlow() in some environments. These types
are passed throughout the device package as elements in a slice, so
move the per-element Mutex to a container around the slice.

Reviewed-by: Maisem Ali <maisem@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-10-10 15:07:36 +02:00
Jordan Whited
3bb8fec7e4 conn, device, tun: implement vectorized I/O plumbing
Accept packet vectors for reading and writing in the tun.Device and
conn.Bind interfaces, so that the internal plumbing between these
interfaces now passes a vector of packets. Vectors move untouched
between these interfaces, i.e. if 128 packets are received from
conn.Bind.Read(), 128 packets are passed to tun.Device.Write(). There is
no internal buffering.

Currently, existing implementations are only adjusted to have vectors
of length one. Subsequent patches will improve that.

Also, as a related fixup, use the unix and windows packages rather than
the syscall package when possible.

Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-03-10 14:52:13 +01:00
Jason A. Donenfeld
c7b76d3d9e device: uniformly check ECDH output for zeros
For some reason, this was omitted for response messages.

Reported-by: z <dzm@unexpl0.red>
Fixes: 8c34c4c ("First set of code review patches")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-02-16 16:33:14 +01:00
Jason A. Donenfeld
ebbd4a4330 global: bump copyright year
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2023-02-07 20:39:29 -03:00
Jason A. Donenfeld
bb719d3a6e global: bump copyright year
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-20 17:21:32 +02:00
Brad Fitzpatrick
b51010ba13 all: use Go 1.19 and its atomic types
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-04 12:57:30 +02:00
Jason A. Donenfeld
e3134bf665 device: defer state machine transitions until configuration is complete
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-11-15 23:40:47 +01:00
Jason A. Donenfeld
9087e444e6 device: optimize Peer.String even more
This reduces the allocation, branches, and amount of base64 encoding.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-05-18 17:43:53 +02:00
Josh Bleecher Snyder
25ad08a591 device: optimize Peer.String
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-14 00:37:30 +02:00
Jason A. Donenfeld
326aec10af device: remove unusual ... in messages
We dont use ... in any other present progressive messages except these.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-05-07 12:17:41 +02:00
Jason A. Donenfeld
593658d975 device: get rid of peers.empty boolean in timersActive
There's no way for len(peers)==0 when a current peer has
isRunning==false.

This requires some struct reshuffling so that the uint64 pointer is
aligned.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-03-06 08:44:38 -07:00
Jason A. Donenfeld
a4f8e83d5d conn: make binds replacable
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-23 20:00:57 +01:00
Jason A. Donenfeld
75e6d810ed device: use container/list instead of open coding it
This linked list implementation is awful, but maybe Go 2 will help
eventually, and at least we're not open coding the hlist any more.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-10 18:19:11 +01:00
Jason A. Donenfeld
484a9fd324 device: flush peer queues before starting device
In case some old packets snuck in there before, this flushes before
starting afresh.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-10 00:39:28 +01:00
Jason A. Donenfeld
5bf8d73127 device: create peer queues at peer creation time
Rather than racing with Start(), since we're never destroying these
queues, we just set the variables at creation time.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-10 00:21:12 +01:00
Josh Bleecher Snyder
78ebce6932 device: only allocate peer queues once
This serves two purposes.

First, it makes repeatedly stopping then starting a peer cheaper.
Second, it prevents a data race observed accessing the queues.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-02-09 18:33:48 +01:00
Jason A. Donenfeld
a816e8511e device: fix comment typo and shorten state.mu.Lock to state.Lock
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-09 15:37:04 +01:00
Jason A. Donenfeld
6ac1240821 device: do not attach finalizer to non-returned object
Before, the code attached a finalizer to an object that wasn't returned,
resulting in immediate garbage collection. Instead return the actual
pointer.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-09 15:37:04 +01:00
Josh Bleecher Snyder
d8dd1f254f device: remove mutex from Peer send/receive
The immediate motivation for this change is an observed deadlock.

1. A goroutine calls peer.Stop. That calls peer.queue.Lock().
2. Another goroutine is in RoutineSequentialReceiver.
   It receives an elem from peer.queue.inbound.
3. The peer.Stop goroutine calls close(peer.queue.inbound),
   close(peer.queue.outbound), and peer.stopping.Wait().
   It blocks waiting for RoutineSequentialReceiver
   and RoutineSequentialSender to exit.
4. The RoutineSequentialReceiver goroutine calls peer.SendStagedPackets().
   SendStagedPackets attempts peer.queue.RLock().
   That blocks forever because the peer.Stop
   goroutine holds a write lock on that mutex.

A background motivation for this change is that it can be expensive
to have a mutex in the hot code path of RoutineSequential*.

The mutex was necessary to avoid attempting to send elems on a closed channel.
This commit removes that danger by never closing the channel.
Instead, we send a sentinel nil value on the channel to indicate
to the receiver that it should exit.

The only problem with this is that if the receiver exits,
we could write an elem into the channel which would never get received.
If it never gets received, it cannot get returned to the device pools.

To work around this, we use a finalizer. When the channel can be GC'd,
the finalizer drains any remaining elements from the channel and
restores them to the device pool.

After that change, peer.queue.RWMutex no longer makes sense where it is.
It is only used to prevent concurrent calls to Start and Stop.
Move it to a more sensible location and make it a plain sync.Mutex.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-02-08 13:02:52 -08:00
Josh Bleecher Snyder
15810daa22 device: separate timersInit from timersStart
timersInit sets up the timers.
It need only be done once per peer.

timersStart does the work to prepare the timers
for a newly running peer. It needs to be done
every time a peer starts.

Separate the two and call them in the appropriate places.
This prevents data races on the peer's timers fields
when starting and stopping peers.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-02-08 10:32:07 -08:00
Josh Bleecher Snyder
0bcb822e5b device: overhaul device state management
This commit simplifies device state management.
It creates a single unified state variable and documents its semantics.

It also makes state changes more atomic.
As an example of the sort of bug that occurred due to non-atomic state changes,
the following sequence of events used to occur approximately every 2.5 million test runs:

* RoutineTUNEventReader received an EventDown event.
* It called device.Down, which called device.setUpDown.
* That set device.state.changing, but did not yet attempt to lock device.state.Mutex.
* Test completion called device.Close.
* device.Close locked device.state.Mutex.
* device.Close blocked on a call to device.state.stopping.Wait.
* device.setUpDown then attempted to lock device.state.Mutex and blocked.

Deadlock results. setUpDown cannot progress because device.state.Mutex is locked.
Until setUpDown returns, RoutineTUNEventReader cannot call device.state.stopping.Done.
Until device.state.stopping.Done gets called, device.state.stopping.Wait is blocked.
As long as device.state.stopping.Wait is blocked, device.state.Mutex cannot be unlocked.
This commit fixes that deadlock by holding device.state.mu
when checking that the device is not closed.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-02-08 10:32:07 -08:00
Jason A. Donenfeld
01e176af3c device: take peer handshake when reinitializing last sent handshake
This papers over other unrelated races, unfortunately.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-03 17:52:31 +01:00
Josh Bleecher Snyder
8a374a35a0 device: tie encryption queue lifetime to the peers that write to it
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-02-03 00:57:57 +01:00
Jason A. Donenfeld
9263014ed3 device: simplify peer queue locking
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-29 16:21:53 +01:00
Jason A. Donenfeld
d4112d9096 global: bump copyright
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-28 17:52:15 +01:00
Jason A. Donenfeld
1b092ce584 device: get rid of nonce routine
This moves to a simple queue with no routine processing it, to reduce
scheduler pressure.

This splits latency in half!

benchmark                  old ns/op     new ns/op     delta
BenchmarkThroughput-16     2394          2364          -1.25%
BenchmarkLatency-16        259652        120810        -53.47%

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-27 18:38:27 +01:00
Jason A. Donenfeld
8cc99631d0 device: use linked list for per-peer allowed-ip traversal
This makes the IpcGet method much faster.

We also refactor the traversal API to use a callback so that we don't
need to allocate at all. Avoiding allocations we do self-masking on
insertion, which in turn means that split intermediate nodes require a
copy of the bits.

benchmark               old ns/op     new ns/op     delta
BenchmarkUAPIGet-16     3243          2659          -18.01%

benchmark               old allocs     new allocs     delta
BenchmarkUAPIGet-16     35             30             -14.29%

benchmark               old bytes     new bytes     delta
BenchmarkUAPIGet-16     1218          737           -39.49%

This benchmark is good, though it's only for a pair of peers, each with
only one allowedips. As this grows, the delta expands considerably.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-27 01:48:58 +01:00
Jason A. Donenfeld
d669c78c43 device: combine debug and info log levels into 'verbose'
There are very few cases, if any, in which a user only wants one of
these levels, so combine it into a single level.

While we're at it, reduce indirection on the loggers by using an empty
function rather than a nil function pointer. It's not like we have
retpolines anyway, and we were always calling through a function with a
branch prior, so this seems like a net gain.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-26 23:05:48 +01:00
Josh Bleecher Snyder
7139279cd0 device: change logging interface to use functions
This commit overhauls wireguard-go's logging.

The primary, motivating change is to use a function instead
of a *log.Logger as the basic unit of logging.
Using functions provides a lot more flexibility for
people to bring their own logging system.

It also introduces logging helper methods on Device.
These reduce line noise at the call site.
They also allow for log functions to be nil;
when nil, instead of generating a log line and throwing it away,
we don't bother generating it at all.
This spares allocation and pointless work.

This is a breaking change, although the fix required
of clients is fairly straightforward.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-26 22:40:20 +01:00
Josh Bleecher Snyder
d0f8e9477c device: remove unnecessary zeroing
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-07 14:49:44 +01:00
Josh Bleecher Snyder
b42e32047d device: call wg.Add outside the goroutine
One of the first rules of WaitGroups is that you call wg.Add
outside of a goroutine, not inside it. Fix this embarrassing mistake.

This prevents an extremely rare race condition (2 per 100,000 runs)
which could occur when attempting to start a new peer
concurrently with shutting down a device.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-07 14:49:44 +01:00
Jason A. Donenfeld
25b01723dd device: fix alignment of peer stats member
This was shifted by 2 bytes when making persistent keepalive into a u32.
Fix it by placing it after the aligned region.

Fixes: e739ff7 ("device: fix persistent_keepalive_interval data races")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-01-07 14:49:44 +01:00
Josh Bleecher Snyder
f7bbdc31a0 device: fix data race in peer.timersActive
Found by the race detector and existing tests.

To avoid introducing a lock into this hot path,
calculate and cache whether any peers exist.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-07 14:49:44 +01:00
Josh Bleecher Snyder
70861686d3 device: fix races from changing private_key
Access keypair.sendNonce atomically.
Eliminate one unnecessary initialization to zero.

Mutate handshake.lastSentHandshake with the mutex held.

Co-authored-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-07 14:49:44 +01:00
Josh Bleecher Snyder
2832e96339 device: use channel close to shut down and drain outbound channel
This is a similar treatment to the handling of the encryption
channel found a few commits ago: Use the closing of the channel
to manage goroutine lifetime and shutdown.
It is considerably simpler because there is only a single writer.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-07 14:49:44 +01:00
Josh Bleecher Snyder
63066ce406 device: fix persistent_keepalive_interval data races
Co-authored-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-07 14:49:44 +01:00
Josh Bleecher Snyder
fc0aabbae9 device: prevent spurious errors while closing a device
When closing a device, packets that are in flight
can make it to SendBuffer, which then returns an error.
Those errors add noise but no light;
they do not reflect an actual problem.

Adding the synchronization required to prevent
this from occurring is currently expensive and error-prone.
Instead, quietly drop such packets instead of
returning an error.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-07 14:49:44 +01:00
Josh Bleecher Snyder
c9e4a859ae device: remove starting waitgroups
In each case, the starting waitgroup did nothing but ensure
that the goroutine has launched.

Nothing downstream depends on the order in which goroutines launch,
and if the Go runtime scheduler is so broken that goroutines
don't get launched reasonably promptly, we have much deeper problems.

Given all that, simplify the code.

Passed a race-enabled stress test 25,000 times without failure.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-01-07 14:49:44 +01:00
Haichao Liu
913f68ce38 device: add write queue mutex for peer
fix panic: send on closed channel when remove peer

Signed-off-by: Haichao Liu <liuhaichao@bytedance.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-11-18 14:22:15 +01:00
Jason A. Donenfeld
5ca1218a5c device: format a few things
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-11-06 18:01:27 +01:00
Jason A. Donenfeld
c8fe925020 device: remove global for roaming escape hatch
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-10-14 10:45:31 +02:00
Jason A. Donenfeld
db0aa39b76 global: update header comments and modules
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-05-02 02:08:26 -06:00
Jason A. Donenfeld
28c4d04304 device: use atomic access for unlocked keypair.next
Go's GC semantics might not always guarantee the safety of this, and the
race detector gets upset too, so instead we wrap this all in atomic
accessors.

Reported-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2020-05-02 01:56:48 -06:00
David Crawshaw
203554620d conn: introduce new package that splits out the Bind and Endpoint types
The sticky socket code stays in the device package for now,
as it reaches deeply into the peer list.

This is the first step in an effort to split some code out of
the very busy device package.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2020-05-02 01:46:42 -06:00
David Anderson
3dce460c88 device: add test to ensure Peer fields are safe for atomic access on 32-bit
Adds a test that will fail consistently on 32-bit platforms if the
struct ever changes again to violate the rules. This is likely not
needed because unaligned access crashes reliably, but this will reliably
fail even if tests accidentally pass due to lucky alignment.

Signed-Off-By: David Anderson <danderson@tailscale.com>
2020-05-02 01:44:58 -06:00
Jason A. Donenfeld
4739708ca4 noise: unify zero checking of ecdh 2020-03-17 23:07:14 -06:00
Jason A. Donenfeld
4e3018a967 uapi: skip peers with invalid keys 2019-08-05 16:57:41 +02:00
Jason A. Donenfeld
a961aacc9f device: immediately rekey all peers after changing device private key
Reported-by: Derrick Pallas <derrick@pallas.us>
2019-07-11 17:37:35 +02:00