It makes sense to me that a relatively narrowly focused/narrowly trained AI system eventually beats humans at vulnerability detection. In this scenario, a significant false-positive rate is probably acceptable, much more so than false negatives. Someone will have to work off those false positives (and presumably feed them back so the tool learns and gets better).
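To put some rough numbers on that tradeoff, here's a minimal sketch with made-up counts; the point is just that a tool drowning in false positives can still be worth running if it almost never misses a real vulnerability.

```python
# Minimal sketch with made-up counts: why a high false-positive rate can be tolerable
# when missed vulnerabilities (false negatives) are the expensive failure mode.

def scanner_metrics(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall) for one scanner run against a labeled codebase."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Illustrative numbers only: 90 real findings confirmed, 200 spurious, 2 missed.
precision, recall = scanner_metrics(true_positives=90, false_positives=200, false_negatives=2)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# -> precision=0.31, recall=0.98: lots of triage work, but very few missed vulnerabilities.
```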
But at the end of the day, the question is "How do you trust that the tool is correct?" Here, at least, you can write a reasonably testable requirement: "Must detect security vulnerabilities," along with a definition (which I'm probably not qualified to write :-) ) of 'security vulnerability'. But then someone has to figure out what the verification approach will be, and how that's established and documented. Should there be a formal registry of 'trusted AI vulnerability scanners'? Certainly, if we expect such tools to be used for product qualification ("Your website must be shown to contain no vulnerabilities, as inspected by this tool and set of procedures we trust."), we have to have a way to establish that trust.
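For what a verification approach might look like concretely, here's a rough sketch of a qualification harness: run the candidate scanner over a benchmark corpus of sample projects with known, labeled vulnerabilities, and check it against an explicit detection-rate threshold. The corpus layout, file names, and the 95% figure are all hypothetical placeholders, not anything NIST or anyone else has actually defined.

```python
# Rough sketch of a qualification harness: evaluate a candidate scanner against a
# benchmark corpus of sample projects with known, labeled vulnerabilities.
# Corpus layout, file names, and the 95% threshold are hypothetical placeholders.
import json
from pathlib import Path

REQUIRED_DETECTION_RATE = 0.95  # the kind of number a qualification standard would have to fix

def run_scanner(sample_dir: Path) -> set[str]:
    """Stand-in for invoking the tool under test on one sample project.
    Here it just reads a pre-generated report (scanner_output.json) so the
    harness itself is runnable; a real harness would shell out to the tool."""
    return set(json.loads((sample_dir / "scanner_output.json").read_text()))

def qualify(corpus_dir: Path) -> bool:
    """Return True if the scanner meets the required detection rate over the corpus."""
    detected = expected = 0
    for sample in sorted(p for p in corpus_dir.iterdir() if p.is_dir()):
        known = set(json.loads((sample / "expected_findings.json").read_text()))
        reported = run_scanner(sample)
        expected += len(known)
        detected += len(known & reported)
    rate = detected / expected if expected else 0.0
    print(f"detection rate: {rate:.1%} (required: {REQUIRED_DETECTION_RATE:.0%})")
    return rate >= REQUIRED_DETECTION_RATE

if __name__ == "__main__":
    qualify(Path("benchmark_corpus"))
```

Notably, most of the hard part lives in the benchmark corpus itself: deciding what counts as a 'security vulnerability' and keeping the expected-findings labels honest, which is exactly the definitional and managerial work rather than the code.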
This is a good-news story, but there's much more work to be done to turn this into a production capability. And a lot of that work is not strictly technical but managerial (probably including government participation, e.g. a NIST set of qualification criteria and maybe even a registry of tools that meet those criteria).