When your AI compliance tool drifts: why automated NIS2 checks need human oversight
Agent drift in NIS2 compliance: how AI tools can misclassify risks, skip controls, and report "compliant" when real gaps exist. Why Article 20 demands human accountability.
Introduction
In April 2025, a QA engineer discovered that their team's AI agent - configured to prioritize Jira tasks - had deleted half the backlog. The agent was not malicious. It simply optimized for the wrong metric: "reduce the number of open tasks" instead of "prioritize the important ones." The QA community called this "agent drift" - the moment when an AI tool loses direction and starts making decisions that seem logical but produce dangerous effects.
Now imagine the same phenomenon applied to NIS2 compliance.
An automated compliance verification tool can run 140 controls in minutes. It can generate reports, assess risks, even fill out DNSC forms. But it can also misclassify a critical risk as "medium," skip a control it deems irrelevant, or report "compliant" when real gaps exist. And unlike a human mistake - which leaves visible traces - an AI tool that drifts produces results that look perfect.
What is "agent drift" and why it matters for NIS2
Agent drift occurs when an AI system maintains its form but loses its substance. It continues producing reports, continues checking boxes, continues displaying compliance scores. But the decisions behind those scores gradually decouple from reality.
In software testing, this means an automation tool runs 500 tests and reports "all pass" - but the tests were automatically modified to match the application's current behavior, not the original specifications. The tests pass because they adapted to bugs, not because the bugs do not exist.
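A minimal sketch of that mechanism, using a hypothetical snapshot-testing helper (not any specific framework) - the auto-update path is where the drift creeps in:

```python
from pathlib import Path

SNAPSHOT_DIR = Path("snapshots")

def check_snapshot(name: str, actual: str, update: bool = False) -> bool:
    """Compare output against a stored snapshot; optionally overwrite it."""
    snapshot = SNAPSHOT_DIR / f"{name}.txt"
    if update or not snapshot.exists():
        # The drift happens here: regenerating snapshots against a buggy build
        # silently turns today's wrong behavior into tomorrow's "expected" result.
        SNAPSHOT_DIR.mkdir(exist_ok=True)
        snapshot.write_text(actual)
        return True
    return snapshot.read_text() == actual

# A bug changes the app's output from "42" to "41". Someone re-runs the suite
# with update=True to "fix" the red build - and every test goes green again,
# adapted to the bug rather than to the specification.
print(check_snapshot("invoice_total", "41", update=True))  # True - looks fine
```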
In NIS2 compliance, the equivalent would be:
- An automated scanner evaluating security policies based on the presence of documents, not their content (sketched in code after this list)
- A risk assessment tool assigning scores based on generic templates, not the organization's specifics
- A report generator filling fields with predefined answers without verifying whether measures are actually implemented
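To make the first of these failure modes concrete, here is a deliberately naive sketch; the control IDs and file paths are illustrative, not taken from the CyFunRO framework:

```python
from pathlib import Path

# Illustrative mapping of control IDs to the policy document expected for each.
CONTROLS = {
    "GV-01": "policies/information_security_policy.pdf",
    "RS-03": "policies/incident_management.pdf",
    "RC-02": "policies/disaster_recovery.pdf",
}

def naive_compliance_scan(base: Path) -> dict[str, str]:
    # The drift: "the file exists" quietly becomes "the control is implemented".
    # Nothing here reads the document, checks its age, or verifies practice.
    return {
        control: "compliant" if (base / doc).exists() else "gap"
        for control, doc in CONTROLS.items()
    }

print(naive_compliance_scan(Path(".")))
```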
Concrete example: risk level classification
OUG 155/2024 requires organizations to submit a risk level assessment to DNSC. This assessment determines what security measures must be implemented. An AI tool could analyze the sector, size, and type of data processed - and produce a risk level. But what happens if the tool does not know the organization suffered an undisclosed security incident in the past 6 months? Or that a critical supplier was compromised?
The automatically generated risk level would look correct. It would be professionally formatted. It would cite the relevant directive articles. But it would be wrong - and the organization would plan insufficient measures based on it.
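A sketch of why that happens: when an input never reaches the tool, a silent default takes its place, and the polished output hides the hole. The factors and weights below are illustrative, not a real scoring methodology:

```python
from dataclasses import dataclass

@dataclass
class OrgProfile:
    sector_criticality: int         # illustrative scale: 1 (low) .. 5 (high)
    employee_count: int
    recent_incidents: int = 0       # silent default: "no incidents we were told about"
    compromised_suppliers: int = 0  # same gap for supply-chain exposure

def risk_level(org: OrgProfile) -> str:
    score = org.sector_criticality * 2 + (org.employee_count > 250) * 3
    score += org.recent_incidents * 4 + org.compromised_suppliers * 4
    return "high" if score >= 12 else "medium" if score >= 7 else "low"

# The undisclosed incident and the compromised supplier never reach the tool,
# so the defaults apply and the report reads "medium" - neatly formatted, wrong.
print(risk_level(OrgProfile(sector_criticality=4, employee_count=300)))  # medium
print(risk_level(OrgProfile(4, 300, recent_incidents=1,
                            compromised_suppliers=1)))                   # high
```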
What NIS2 says about human accountability
The NIS2 Directive does not explicitly mention AI tools. But it clearly establishes who is responsible - and it is not an algorithm.
Article 20: management accountability
The management bodies of essential and important entities must:
- Approve the measures for cybersecurity risk management
- Oversee the implementation of those measures
- Participate in training on cybersecurity
- Bear responsibility for non-compliance
The key word is "oversee." An AI tool can generate recommendations, automate checks, accelerate the process. But management must understand what they approve. If an AI tool produces a compliance report and management signs it without verification, the responsibility remains with management - not the tool.
Article 23: incident reporting
NIS2 requires notifications in strict phases:
- Early warning: within 24 hours of becoming aware of the incident
- Incident notification: within 72 hours
- Final report: within one month
An AI tool can detect anomalies. It can even classify an event as an "incident." But the decision to report to DNSC involves human judgment: the actual severity, the impact on services, the number of affected users, whether the incident is isolated or part of a broader attack. These assessments require context that an automated tool does not have.
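The windows themselves are pure date arithmetic and safe to automate; the classification that starts the clock is not. A minimal sketch, with the one-month final report approximated as 30 days:

```python
from datetime import datetime, timedelta

def nis2_deadlines(detected_at: datetime) -> dict[str, datetime]:
    """Fixed notification windows, counted from when the incident was detected."""
    return {
        "early_warning": detected_at + timedelta(hours=24),
        "incident_notification": detected_at + timedelta(hours=72),
        "final_report": detected_at + timedelta(days=30),  # ~ one month
    }

# A tool can compute these the moment an anomaly is flagged. Whether the event
# *is* a reportable incident - severity, service impact, affected users - is
# the human decision that starts this clock.
for phase, due in nis2_deadlines(datetime(2026, 3, 1, 9, 30)).items():
    print(f"{phase}: {due:%Y-%m-%d %H:%M}")
```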
Article 21: security measures
The 140 controls in the CyFunRO framework (adapted from NIST CSF) cover everything from governance to disaster recovery. For each control, the organization must demonstrate that the measure is:
- Implemented (not just documented)
- Tested (not just checked off)
- Maintained (not just configured once)
An AI tool can verify whether a policy document exists. It cannot verify whether that policy is followed in practice. It can confirm that a firewall is configured. It cannot assess whether the firewall rules are adequate for the organization's specific threats.
Three real scenarios where automation can fail
Scenario 1: superficial gap analysis
An organization uses an AI tool to compare existing policies with NIS2 requirements. The tool finds 85% coverage and reports a "good" compliance level. But the analysis was based on keyword matching between documents and requirements - not on evaluating implementation quality.
In reality, the organization has policies that mention "incident management" but do not describe concrete procedures. It has a "risk assessment" document that has not been updated in 2 years. The AI tool checked boxes based on document existence, not relevance.
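A sketch of how keyword matching arrives at a reassuring percentage; the requirement phrases and policy excerpt are illustrative, not actual NIS2 text:

```python
# Illustrative requirement keywords and a policy excerpt - not real NIS2 text.
REQUIREMENTS = ["incident management", "risk assessment", "access control",
                "backup", "encryption", "supplier security", "training"]

POLICY_TEXT = """
Our policy covers incident management, risk assessment, access control,
backup, encryption and training across the organization.
""".lower()

covered = [req for req in REQUIREMENTS if req in POLICY_TEXT]

# Six of seven phrases appear somewhere, so the tool reports ~86% coverage -
# even if "incident management" is one sentence with no procedure behind it
# and the risk assessment was last updated two years ago.
print(f"coverage: {len(covered) / len(REQUIREMENTS):.0%}")  # coverage: 86%
```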
Scenario 2: risk assessment with incomplete data
An automated tool evaluates security risks based on available data: network topology, server configurations, application inventory. It produces a risk matrix with numerical scores.
But the tool does not know that:
- The IT team deployed a new VPN last month without updating the documentation
- A cloud provider changed its data processing terms
- Three employees with privileged access left the company without their rights being fully revoked
The risk assessment looks impeccable on paper. The real risks remain invisible.
Scenario 3: continuous monitoring with ignored alerts
A monitoring system generates alerts 24/7. In the first month, the team verifies each alert. By the third month, the volume is so high that the team starts triaging alerts in bulk. By the sixth month, an AI tool is configured to filter out "false alarms."
The problem: what the tool classifies as a "false alarm" is based on historical patterns. A new type of attack - that resembles nothing in history - is automatically classified as a false positive. The attack progresses undetected.
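A sketch of that filtering logic pushed to its conclusion: if "false alarm" means "resembles no attack we have seen before," a genuinely novel attack is suppressed by construction. The signatures are illustrative:

```python
# Illustrative signatures of alerts previously confirmed as real attacks.
KNOWN_ATTACK_PATTERNS = {"bruteforce_ssh", "sql_injection", "phishing_link"}

def triage(alert_signature: str) -> str:
    # The drift: everything outside the historical set is presumed benign.
    if alert_signature in KNOWN_ATTACK_PATTERNS:
        return "escalate"
    return "false_alarm"  # novel == unfamiliar == silently dropped

# A new exfiltration technique matches nothing in history, so it is filtered
# out - precisely the alert that most needed a human look.
print(triage("dns_tunnel_exfiltration"))  # false_alarm
```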
How NIS2 Manager keeps humans in the compliance loop
NIS2 Manager is built on the principle that automation should accelerate human decisions, not replace them.
Structure, not blind automation
The platform organizes the 140 CyFunRO controls into a structured workflow. For each control, the user must:
- Assess the current state - not an auto-generated score, but a deliberate evaluation
- Upload evidence - documents, screenshots, audit reports demonstrating implementation
- Document the measures - what was done, who is responsible, when to re-verify
The platform does not generate answers. It provides the framework for informed human answers.
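As an illustration of that principle (a hypothetical structure, not NIS2 Manager's actual data model), a control record can be designed so it never reaches "assessed" until a named person supplies the evaluation and the evidence:

```python
from dataclasses import dataclass, field

@dataclass
class ControlRecord:
    control_id: str                     # e.g. a CyFunRO control reference
    assessed_by: str | None = None      # a named person, never a service account
    assessment: str | None = None       # the deliberate human evaluation
    evidence: list[str] = field(default_factory=list)  # uploaded proof

    @property
    def status(self) -> str:
        # No auto-generated score can flip this to "assessed": the human-entered
        # fields are the gate; the platform only provides the structure.
        if self.assessed_by and self.assessment and self.evidence:
            return "assessed"
        return "pending human input"

record = ControlRecord("GV-01")
print(record.status)                    # pending human input
record.assessed_by = "Ana Pop (CISO)"
record.assessment = "Implemented and tested in the Q1 review"
record.evidence.append("audit_report_2026_q1.pdf")
print(record.status)                    # assessed
```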
Verification through eligibility
The eligibility calculator helps organizations determine whether they fall under NIS2. But it does not produce an automatic "yes/no" answer based on a single criterion. It guides the user through sectors, size thresholds, exceptions, and special conditions - requiring human input at every step.
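A simplified sketch of why a single criterion cannot settle the question. The size thresholds follow the EU definition of a medium-sized enterprise (50 or more employees, or annual turnover above €10 million) that NIS2 builds on; sector rules and exceptions are deliberately left to human review:

```python
def eligibility_hint(employees: int, turnover_m_eur: float,
                     in_nis2_sector: bool) -> str:
    """A rough first pass only - never a final yes/no."""
    if not in_nis2_sector:
        return "likely out of scope - verify the sector annexes with a human"
    if employees >= 50 or turnover_m_eur > 10:
        return "likely in scope - confirm entity class and exceptions with a human"
    # Some entities are covered regardless of size - exactly the kind of
    # special condition that requires human review, not an automatic "no".
    return "below size thresholds - check special conditions manually"

print(eligibility_hint(employees=120, turnover_m_eur=8.0, in_nis2_sector=True))
```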
Complete traceability
Every action in NIS2 Manager is recorded: who assessed a control, when, what evidence they uploaded, what score they assigned (a sketch of such a record follows this list). This traceability serves two purposes:
- Internal audit: management can verify that assessments were made by competent people, not auto-generated
- DNSC audit: in case of an inspection, the organization can demonstrate that the compliance process involved human judgment at every step
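A hypothetical append-only trail for that purpose (illustrative fields, not the product's schema):

```python
from datetime import datetime, timezone

audit_trail: list[dict] = []  # append-only: entries are never edited or removed

def log_assessment(control_id: str, user: str, score: int,
                   evidence: list[str]) -> None:
    audit_trail.append({
        "control_id": control_id,
        "assessed_by": user,  # a named person - the proof of human judgment
        "assessed_at": datetime.now(timezone.utc).isoformat(),
        "score": score,
        "evidence": evidence,
    })

log_assessment("RS-03", "Ana Pop (CISO)", 3, ["incident_runbook_v4.pdf"])
# An internal auditor or DNSC can later replay exactly who decided what, when.
print(audit_trail[0]["assessed_by"], audit_trail[0]["assessed_at"])
```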
Deadline-based alerts, not AI predictions
Instead of using AI to "predict" non-compliance risks, NIS2 Manager sends notifications based on concrete deadlines: the DNSC deadline of September 2026, the 24-hour incident reporting window, periodic review cycles. These alerts are deterministic - they do not depend on probabilistic models that can drift.
The lesson from software testing: BetterQA and human oversight
BetterQA, the company behind NIS2 Manager, built a software testing practice on the same principle. After 15+ years of QA experience, the conclusion is clear: automated tools are excellent for speed, but human judgment remains essential for quality.
In software testing, BetterQA uses:
- Automation (via Flows) for repetitive tests that must run on every build
- AI (via BugBoard) for generating test cases from screenshots - but a QA engineer reviews every generated case
- Security scanning (via AI Security Toolkit) with 30+ scanners in parallel - but a specialist interprets the results and reconstructs attack chains
No automated result reaches a client without human verification. This philosophy is directly reflected in NIS2 Manager: the platform structures the compliance process but does not make decisions for the user.
Checklist: are you using AI in NIS2 compliance?
If your organization uses automated tools for compliance, verify:
- Who validates the results? - If nobody checks what the AI tool produces, you are exposed to agent drift
- What data is missing? - Automated tools work with what they receive. If the input data is incomplete, the results are misleading
- Is there traceability? - Can you demonstrate to DNSC that a human assessed each control, not just an algorithm?
- How do you handle exceptions? - Edge cases such as internal IT departments acting as MSPs, corporate group membership, and DORA vs. NIS2 sector overlaps require human judgment
- Does management understand what they approve? - NIS2 Article 20 requires management bodies to oversee security measures. Signing an AI-generated report without understanding it does not fulfill this requirement
Conclusion
AI tools are valuable in NIS2 compliance - they can accelerate assessments, organize evidence, generate reports. But NIS2 was designed for a reason: cybersecurity is too important to be fully delegated to an automated process.
"Agent drift" is not a theoretical problem. It is a documented risk that manifests precisely when you have the most confidence in your tool - when everything appears to work perfectly.
NIS2 Manager exists to provide structure without eliminating human judgment. Because in compliance, just as in software testing, "the tool says it is fine" is not the same as "it is fine."
Check your organization's eligibility with our free calculator and start the compliance process with humans in the loop - not outside it.
NIS2 Manager is a product of BetterQA, one of the top software testing companies in Europe.
