Hmm, I still am not sure I completely follow, but I think I get the broad strokes.
TBT is 40Gbps (5GBps), but it’s never fully stated if that’s “each way” or not. After doing some napkin math, I’m going to assume it is, as otherwise you can’t fit 25.6Gbps (3.2GBps). But it’s generally assumed that it should peak out at 32Gbps for PCIe data, with the rest free for DisplayPort as that comes into the controller over a separate link.
TBT is full duplex, yes. All the modern wired multi-gigabit comms standards I can think of are. (Wireless is a different story: WiFi is half duplex. Full duplex radio is hard because, while the radio is transmitting, its transmit power, un-attenuated by distance, is akin to a 130dB jet engine piped directly into the "ear" of the receive side of the radio.)
Hector's point is that lots of people assume that because Intel says certain things about TBT capabilities, they must be universal to all TBT implementations and baked into the spec at a deep level, when actually they are not.
TBT is a channel for moving packets around. It's designed primarily to encapsulate packets from other packetized standards, most notably PCIe and DisplayPort. Each end of the link needs a bridge to the appropriate other kind of bus, and the bridge handles wrapping the "alien" packets in TBT framing.
The stuff you see about TBT handling 32 Gbps of PCIe comes from Intel's most common implementations, where the host side PCIe-TBT bridge includes a physical Gen3 x4 link to a host PCIe port. But there is no requirement that it actually be done that way. Intel only requires a minimum of 32 Gbps PCIe bandwidth; more is possible.
In Apple's case, there is no physical PCIe link at the host end. They have a PCIe root complex in the SoC, but PCIe is a layered networking spec and Apple can discard the layers it doesn't need. They don't have to include any of the physical layers at all; they can simply clock packets through an internal parallel SoC link into a FIFO for the TBT controller to read out at its own pace.
So, if Apple didn't internally bottleneck it anywhere, their TBT ports can theoretically do 40 Gbps PCIe, assuming the other end of the link is up to it. And as Hector pointed out, there are now some peripheral-end TBT chips whose PCIe bridge provides Gen4 x4 connectivity, meaning it's quite possible (at least in theory) to hit 40G. (You'd end up bottlenecked by TBT rather than the device-end Gen4 x4 link.)
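To put rough numbers on the bottleneck argument, here's a quick Python sketch. It assumes the standard PCIe per-lane rates (8 GT/s for Gen3, 16 GT/s for Gen4, both with 128b/130b line coding) and treats TBT as a flat 40 Gbps per direction. The function names are made up for illustration, and all packet and framing overhead is ignored, so treat the results as ceilings rather than measured throughput.

```python
# Raw per-lane signalling rates in GT/s. Gen3 and Gen4 both use 128b/130b coding.
PCIE_GT_PER_LANE = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130

# Assumption: nominal TBT rate per direction, with framing overhead ignored.
TBT_LINK_GBPS = 40.0


def pcie_gbps(gen, lanes):
    """Line-coded PCIe bandwidth per direction in Gbps (no TLP/DLLP overhead)."""
    return PCIE_GT_PER_LANE[gen] * lanes * ENCODING_EFFICIENCY


def tunnel_ceiling_gbps(host_gbps, device_gen, device_lanes):
    """Upper bound on tunnelled PCIe bandwidth: the slowest segment wins.

    host_gbps=None models a host with no physical PCIe link in the path,
    so only the TBT link and the device-end PCIe link can limit throughput.
    """
    segments = [TBT_LINK_GBPS, pcie_gbps(device_gen, device_lanes)]
    if host_gbps is not None:
        segments.append(host_gbps)
    return min(segments)


# Host bridge fed by a physical Gen3 x4 link, Gen4 x4 device end:
print(round(tunnel_ceiling_gbps(pcie_gbps(3, 4), 4, 4), 1))  # 31.5
# No physical host PCIe link, Gen4 x4 device end: TBT itself is the limit.
print(round(tunnel_ceiling_gbps(None, 4, 4), 1))  # 40.0
```

The first case lands a little under the familiar 32 Gbps figure because of line coding, and the second shows why a Gen4 x4 device end (about 63 Gbps) leaves TBT as the bottleneck.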
You'll also see lots of people talking about TBT as if there's a fixed DisplayPort bandwidth allocation. I don't think any such thing needs to exist. Instead, they should just be giving DP packets absolute priority in the TBT bridge's transmission queue. That should automatically "allocate" exactly the amount of bandwidth required by the current resolution, color depth, and refresh rate of the DP device at the other end.
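Here's a toy Python model of that strict-priority idea, just to show how the "allocation" falls out on its own. It is not how any real TBT controller is known to work: the `simulate` function, the tick-based model, and the abstract bandwidth units are all assumptions for illustration.

```python
def simulate(link_units_per_tick, dp_arrivals, pcie_arrivals):
    """Strict-priority link model.

    Each tick the link can carry link_units_per_tick units. DisplayPort
    traffic is always served first; PCIe gets whatever budget is left.
    Returns (dp_sent, pcie_sent, pcie_backlog) totals after all ticks.
    """
    dp_backlog = pcie_backlog = 0
    dp_sent = pcie_sent = 0
    for dp_in, pcie_in in zip(dp_arrivals, pcie_arrivals):
        dp_backlog += dp_in
        pcie_backlog += pcie_in
        budget = link_units_per_tick

        # DP has absolute priority: drain it before touching PCIe.
        sent = min(dp_backlog, budget)
        dp_backlog -= sent
        dp_sent += sent
        budget -= sent

        # PCIe takes the leftover budget, if any.
        sent = min(pcie_backlog, budget)
        pcie_backlog -= sent
        pcie_sent += sent
    return dp_sent, pcie_sent, pcie_backlog


ticks = 10
# A display needing 13 units per tick next to PCIe traffic offering 40 per tick:
print(simulate(40, [13] * ticks, [40] * ticks))  # (130, 270, 130)
# No display attached: PCIe gets the whole link.
print(simulate(40, [0] * ticks, [40] * ticks))   # (0, 400, 0)
```

Nothing reserves bandwidth for DP here. It takes exactly what the display stream needs each tick, and PCIe expands into the rest.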