CyberSecurity Project: Transparent Filtering Bridge (+ Extras) 5.0

Sometimes on a project, you run into a problem that you cannot solve.

Sometimes, this is due to lack of skill or experience. Sometimes it is due to lack of authority to do what needs to be done. Sometimes it's because of technological limitations.

Sometimes, it's because of bad defaults and a stubborn refusal to accept valid criticism.

The Progression Blocker

To recap, the project has a simple goal: implement a transparent filtering bridge with OPNsense. I acquired a miniPC, installed OPNsense, and ran into issues connecting to the webGUI.

Initially, this was due to two things:

  1. Shutting down the WiFi interface completely obliterated the ability to reactivate it.
  2. I couldn't apply an IP address to the LAN port without breaking the filtering bridge.

So I eventually did a clean reinstall of OPNsense so that I could reactivate the WiFi and use it as my management connection:

Planned network architecture, with Transparent Filtering Bridge connected to router WiFi interface.

There were two problems with this:

  1. The WiFi interface itself would throw repeated errors that referenced a specific FreeBSD bug.
  2. Even when I did connect the miniPC to WiFi, I couldn't access the webGUI.

Those two problems might've actually been two sides of the same coin, but that was not obvious at the start.

The FreeBSD Bug

The issue I was running into was this error message:

iwl_mvm_tx_mpdu:1204: fc 0x00b0 tid 8 txq_id 65535 mvm 0xfffffe015572d4c8 skb 0xfffff80035b83000 { len 30 } info 0xfffffe00e1253cd8 sta 0xfffff80035ecc080 (if you see this please report to PR 274382)
iwl_mvm_tx_mpdu:1204: fc 0x00b0 tid 8 txq_id 65535 mvm 0xfffffe015572d4c8 skb 0xfffff80035b83000 { len 30 } info 0xfffffe00e1253cd8 sta 0xfffff80035ecc080 (if you see this please report to PR 274382)

The error, according to the FreeBSD Foundation, seems to be related to receiving frames for a state that is no longer known to the driver or firmware.
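For anyone curious about what the driver was actually trying to send, here's a small Python sketch that pulls the fc, tid and txq_id fields out of a line like the ones above and decodes fc. It assumes fc is the standard IEEE 802.11 Frame Control field printed in host byte order (protocol version in bits 0-1, type in bits 2-3, subtype in bits 4-7). The function and table names are made up for illustration, and the subtype table only covers common management frames.

```python
import re

# IEEE 802.11 frame type values (bits 2-3 of Frame Control).
FRAME_TYPES = {0: "management", 1: "control", 2: "data", 3: "extension"}

# A subset of IEEE 802.11 management-frame subtypes (bits 4-7 of Frame Control).
MGMT_SUBTYPES = {
    0: "Association Request",
    1: "Association Response",
    2: "Reassociation Request",
    3: "Reassociation Response",
    4: "Probe Request",
    5: "Probe Response",
    8: "Beacon",
    10: "Disassociation",
    11: "Authentication",
    12: "Deauthentication",
    13: "Action",
}

# Matches the numeric fields printed by the iwl_mvm_tx_mpdu message.
LOG_PATTERN = re.compile(r"fc (0x[0-9a-fA-F]+) tid (\d+) txq_id (\d+)")


def parse_log_line(line: str) -> dict:
    """Extract fc, tid and txq_id from one iwl_mvm_tx_mpdu log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError("not an iwl_mvm_tx_mpdu line")
    return {
        "fc": int(match.group(1), 16),
        "tid": int(match.group(2)),
        "txq_id": int(match.group(3)),
    }


def decode_frame_control(fc: int) -> tuple[str, str]:
    """Return (frame type, subtype name), assuming the standard 802.11 layout."""
    frame_type = (fc >> 2) & 0x3
    subtype = (fc >> 4) & 0xF
    type_name = FRAME_TYPES[frame_type]
    if frame_type == 0:
        subtype_name = MGMT_SUBTYPES.get(subtype, f"subtype {subtype}")
    else:
        subtype_name = f"subtype {subtype}"
    return type_name, subtype_name
```

Run against the log line above, `decode_frame_control(0x00b0)` returns `("management", "Authentication")`. That's only a reading of the header bits, not a diagnosis of the bug itself.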

What was odd was that this error would trigger on startup, even when the WiFi configuration was set.

The webGUI

Diagnosing the webGUI connection issues was much harder: the Intel WiFi driver bug at least produced an error whenever I tried to set up the connection, but the webGUI failures did not. The obvious solution of making a firewall rule to allow HTTPS connections didn't seem to do anything. On the occasions when I could get the WiFi to connect, attempts to access the webGUI just timed out.

So to figure out what to do next, I asked Bing/Microsoft Copilot (they've changed the URL, so I have no idea what's going on with the branding) for advice on troubleshooting the issue. That advice included a console command to disable the firewall.

After executing the command, I went into the network interface commands to get the WiFi connection to communicate with the DHCP server, and I had an IP address.

This made me take a closer look at the firewall settings for the interface.

The Settings

It turns out that when I was implementing that HTTPS rule earlier, I had overlooked an entire suite of default rules hidden in a collapsed accordion section. So I opened it up and took a look inside.

As you can see above, the list starts with a deny-anything rule, even though it is supposedly a last-match rule. That made me suspicious, so I replaced the HTTPS rule with a generic allow-all IPv4/IPv6 rule inbound on the interface to see if that would change anything.

No dice. Still had the same errors and connection problems.

I looked to see if I could disable the default rules either one by one or all at once, but there wasn't a GUI option for that. I also couldn't move my own custom rule above the default rules.

This was a big problem, because one of the earliest things you learn in cybersecurity is that access control lists are evaluated in the order the rules are listed, so you have to get the order right to ensure things work as intended.
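To make the ordering question concrete, here's a simplified Python model of two evaluation styles: classic first-match ACLs, and pf-style last-match evaluation, where the last matching rule wins unless a matching rule is marked `quick`. The `Rule` class, its port-only matcher and the function names are hypothetical; real pf rules match on far more than a destination port, and this is not how pf is implemented internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str                 # "pass" or "block"
    matches_all: bool           # simplified matcher: True matches every packet
    port: Optional[int] = None  # destination port to match when matches_all is False
    quick: bool = False         # pf-style "quick": stop evaluating on this match

    def matches(self, dst_port: int) -> bool:
        return self.matches_all or self.port == dst_port


def first_match(rules: list[Rule], dst_port: int, default: str = "block") -> str:
    """Classic ACL semantics: the first matching rule decides."""
    for rule in rules:
        if rule.matches(dst_port):
            return rule.action
    return default


def last_match(rules: list[Rule], dst_port: int, default: str = "pass") -> str:
    """pf-style semantics: every rule is checked and the last match wins,
    unless a matching rule is quick, which ends evaluation immediately.
    The default of "pass" mirrors pf's behavior when no rule matches."""
    decision = default
    for rule in rules:
        if rule.matches(dst_port):
            decision = rule.action
            if rule.quick:
                break
    return decision
```

With a deny-all rule listed before an allow-HTTPS rule, `first_match` returns "block" for port 443 while `last_match` returns "pass". Mark the deny rule `quick`, and `last_match` blocks port 443 as well, so a deny rule's position alone doesn't tell you whether it wins.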

Eventually, I couldn't figure out what to do next, so I opened an issue on the OPNsense GitHub repo and got this helpful advice:

You can easily debug the effect of these rules by editing /tmp/rules.debug and after modifying the ruleset, apply changes using pfctl -f /tmp/rules.debug.

One thing I discovered while attempting this was that the rules in the debug file were listed in the same order as in the webGUI, which meant there was a high likelihood that the problem was the default IPv4 deny rule. So I commented it out, applied the changes and...
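Commenting rules out by hand works fine, but for repeated experiments a small helper can do it. This is a hypothetical Python sketch that prefixes every uncommented line containing a given piece of text with `#` (pf's comment character) and keeps a `.bak` copy of the original file. It assumes each rule sits on a single line, and the search text is whatever identifies the rule in your own /tmp/rules.debug; it does not quote OPNsense's exact rule text.

```python
import shutil
import sys


def comment_out(ruleset: str, needle: str) -> tuple[str, int]:
    """Prefix every non-comment line containing `needle` with '# '.

    Returns the new ruleset text and how many lines were commented out.
    Assumes one rule per line (no backslash continuations).
    """
    out_lines = []
    changed = 0
    for line in ruleset.splitlines(keepends=True):
        if needle in line and not line.lstrip().startswith("#"):
            out_lines.append("# " + line)
            changed += 1
        else:
            out_lines.append(line)
    return "".join(out_lines), changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python comment_rule.py /tmp/rules.debug "text from the rule"
    path, needle = sys.argv[1], sys.argv[2]
    shutil.copyfile(path, path + ".bak")  # keep the original for easy rollback
    with open(path, encoding="utf-8") as fh:
        original = fh.read()
    updated, count = comment_out(original, needle)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(updated)
    print(f"commented out {count} line(s) in {path}")
```

After running it, `pfctl -nf /tmp/rules.debug` parses the edited ruleset without loading it, and `pfctl -f /tmp/rules.debug` loads it as the advice above describes.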

Got an IP address with no problems and could connect to the webGUI.

Poking Around Some More

Experimenting some more with various logging techniques, I discovered a few other things.

First, the outbound rules were definitely not the problem:

OPNsense OPT1 interface traffic being allowed out onto network.

Second, the default IPv4 deny rule was definitely part of the problem:

OPT1 interface default IPv4 deny rule applying to multiple packets.

I tried packet captures through OPNsense's internal diagnostic tools, then ran into a bizarre problem. By default, the captures would end after a certain number of packets, which makes sense for keeping them manageable. But if I set the capture limit to a value like 50 or 100, the capture wouldn't end, even after several minutes.

As a sanity check, I opened Wireshark on a separate computer, connected to the webGUI over the WiFi, used the webGUI IP address as the capture filter, and began capturing. Within ten seconds, I had over 100 packets. By contrast, even the most permissive captures I made in OPNsense were lucky to have more than a dozen.

This makes troubleshooting this issue much harder.

The Theory of the Bug

Here's what I think is happening:

  1. There's a bug somewhere in the code.
  2. When the firewall is initialized, the default IPv4 deny rule triggers first, despite being a last-match rule.
  3. When the WiFi interface initializes, the rule blocks the DHCP Offer and/or Acknowledge packets from the DHCP server, triggering the Intel WiFi error.
  4. If/when the WiFi connection is made, the firewall blocks the webGUI TCP three-way handshake at some point.
  5. The webGUI connection then times out, because it doesn't receive the SYN-ACK and/or ACK packets it was expecting.
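Steps 3 through 5 boil down to which packets in each exchange arrive inbound on the OPNsense WiFi interface. The Python sketch below is a deliberately simplified model of that theory, with hypothetical names: it ignores pf's state tracking (which in practice lets many replies through) and just assumes an inbound block drops every inbound packet, with each exchange stalling at the first drop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    step: str
    direction: str  # "in" or "out", relative to the OPNsense WiFi interface


# DHCP exchange with OPNsense acting as the DHCP client.
DHCP_EXCHANGE = [
    Packet("DHCPDISCOVER", "out"),
    Packet("DHCPOFFER", "in"),
    Packet("DHCPREQUEST", "out"),
    Packet("DHCPACK", "in"),
]

# TCP three-way handshake with OPNsense acting as the webGUI server.
TCP_HANDSHAKE = [
    Packet("SYN", "in"),
    Packet("SYN-ACK", "out"),
    Packet("ACK", "in"),
]


def run_exchange(packets: list[Packet], blocks_inbound: bool) -> list[str]:
    """Return the steps delivered before the exchange stalls.

    Simplified model: no state table, so an inbound block drops every
    inbound packet, and each step depends on the one before it, so the
    first dropped packet ends the exchange.
    """
    delivered = []
    for packet in packets:
        if blocks_inbound and packet.direction == "in":
            break
        delivered.append(packet.step)
    return delivered
```

Under those assumptions, `run_exchange(DHCP_EXCHANGE, True)` stops at `["DHCPDISCOVER"]`, and `run_exchange(TCP_HANDSHAKE, True)` returns an empty list: the client's SYN never gets through, so it never sees a SYN-ACK and eventually times out.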

The UX Part of The Problem

The frustrating part of this situation is that there's a potentially simple solution to the problem. It just doesn't exist, because the developers seem resistant to the idea that their implementation of the default rules might be a problem.

No fewer than three issues reference the default firewall rules, and one of those is my own.

Now, the main point of disagreement, from what I can tell, is this:

  1. The developers want to provide default firewall rules that protect every connection.
  2. The users want to be able to alter/disable/delete the rules from the rules GUI.

These two things are not mutually exclusive, which is why the general tone of users in this issue thread (the first one I could find) is one of frustration:

I think it's a little more than agreeing to disagree.

What is the purpose of using a firewall if I don't have full autonomy over what rules I want to create/delete?


normally closedsource products try to think for me and prevent me of doing stuff. i prefer the straight way of linux and its rm -rf /. if i want to do that i can. it is really sad that you/the opnsense team is blocking such an important functionality which, as this ticket shows, costs developer a lot of time and blocks them from using your awesome software. i would love to stand with opnsense instead of pfsense. so i would really prefer to use opnsense.

i think i can speak for the people who commented here that it is not about removing a default setting, which you say make the firewall more secure, but the option for an advanced user to do so.

So, since this is a somewhat contentious topic, I'm just going to ask a simple question:

Isn't turning things on and off in different orders, and sometimes moving things around, the most common and proven method of troubleshooting?

It seems odd to remove this basic capability, then expect people to always stumble upon the potential solutions. And to be honest, if your default settings are causing issues, is that not a big red flag suggesting there might need to be changes?