The author wishes to acknowledge the valuable contributions and review provided by Leo DeCoteau, which greatly improved the quality and accuracy of this document.

Section 1: Foundational Frameworks for Secure OT/IT Convergence

The contemporary industrial enterprise faces a fundamental tension between the operational necessity of digital transformation and the security imperative of protecting critical infrastructure. The historical model of complete physical isolation, or “air-gapping,” of Operational Technology (OT) environments is no longer tenable in an era that demands data-driven decision-making, remote monitoring, and integrated business logistics. Consequently, the architectural paradigm has shifted from absolute isolation to controlled, secured, and continuously monitored convergence. This blog post outlines a resilient architecture for OT application delivery that is founded upon three pillars of modern industrial cybersecurity: the structural segmentation of the Purdue Enterprise Reference Architecture (PERA), the prescriptive security controls of the National Institute of Standards and Technology (NIST) Special Publication (SP) 800-82 Revision 3, and the operational philosophy of a Zero Trust Architecture (ZTA). The convergence of these frameworks establishes a defensible boundary at the nexus of Information Technology (IT) and OT, enabling secure data exchange while mitigating the significant risks posed by an interconnected landscape.

1.1 The Purdue Enterprise Reference Architecture (PERA): A Model for Logical Segmentation

The Purdue Model serves as the foundational blueprint for segmenting industrial networks. It organizes Industrial Control Systems (ICS) and enterprise networks into a logical hierarchy of distinct layers, or levels, thereby separating the real-time, high-availability OT functions from the general-purpose, transaction-oriented IT systems. This hierarchical structure is not merely a network topology diagram; it is a security framework that dictates communication flows and establishes trust boundaries. A thorough understanding of each level is essential for implementing effective security controls.

  • Level 0 (Physical Process): This is the foundation of the industrial process, comprising the physical devices that interact directly with the physical world. Assets at this level include sensors (e.g., temperature, pressure, flow), actuators, motors, and valves. The primary security concern at this level is physical security and ensuring the integrity of the signals being sent and received.
  • Level 1 (Intelligent Devices): This level consists of the intelligent devices that directly monitor and manage the physical processes at Level 0. These include Programmable Logic Controllers (PLCs), Remote Terminal Units (RTUs), and Intelligent Electronic Devices (IEDs). These devices interpret data from sensors and execute commands to actuators, often operating under strict real-time constraints. They are vulnerable due to their often-limited computing power and legacy operating systems. [1]
  • Level 2 (Area Supervisory Control): This level involves the systems used by human operators to supervise and control specific sections or areas of the facility. Key components include Human-Machine Interfaces (HMIs) and Supervisory Control and Data Acquisition (SCADA) software. These systems aggregate data from Level 1 devices, manage alarms, and allow for real-time adjustments to the process. [1]
  • Level 3 (Site Operations): At the top of the OT zone, Level 3 contains systems that manage site-wide production workflows and operations. This includes assets such as data historians, alarm servers, and plant-level analytics platforms. This level represents the first line of defense between the process control environment and the enterprise IT network.
  • Level 4 (Business Logistics): Residing in the IT zone, Level 4 houses the business logistics systems that orchestrate manufacturing operations. These include Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), and other systems crucial for business planning. Data from the OT environment is integrated here to inform high-level decision-making.
  • Level 5 (Enterprise Network): The highest level of the model, Level 5, encompasses the corporate IT network. This includes standard enterprise services such as email, file storage, internet access, and general user workstations. From a security perspective, this level is the least trusted and is the primary source of external threats targeting the industrial environment.

The fundamental security principle of the Purdue Model is the enforcement of hierarchical data flow. Communication should be restricted to adjacent levels only. For instance, a system in the Enterprise Zone (Level 4/5) should never be permitted to communicate directly with a control system in Level 2. All traffic must be mediated and inspected as it traverses the levels, ensuring that a compromise in the less-trusted IT environment cannot directly impact the highly sensitive OT environment.

Table 1.1: Purdue Model Level Definitions and Security Characteristics

| Level | Name | Typical Assets | Communication Flow | Primary Security Concern |
| --- | --- | --- | --- | --- |
| 5 | Enterprise Network | Corporate Servers, Email, Internet Access, User Workstations | To/From Level 4 and External Networks | External Threats, Malware, Phishing |
| 4 | Business Logistics | ERP, CRM, Business Planning Systems | To/From Level 5 and Level 3.5 (DMZ) | Data Exfiltration, Compromise of Business Systems |
| 3.5 | Industrial DMZ | Proxy Servers, Application Gateways, Patch Servers | Mediates traffic between Level 4 and Level 3 | Unauthorized Access, Lateral Movement from IT to OT |
| 3 | Site Operations | Data Historians, Alarm Servers, Site MES | To/From Level 3.5 and Level 2 | Loss of View, Manipulation of Site-Wide Production Data |
| 2 | Area Supervisory Control | HMIs, SCADA Systems, Engineering Workstations | To/From Level 3 and Level 1 | Loss of Control, Manipulation of Local Process View |
| 1 | Intelligent Devices | PLCs, RTUs, IEDs | To/From Level 2 and Level 0 | Manipulation of Control Logic, Denial of Control |
| 0 | Physical Process | Sensors, Actuators, Motors, Valves | Physical I/O to/from Level 1 | Physical Tampering, Signal Integrity |
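
The adjacent-levels rule can be expressed as a simple check. The following is a minimal sketch, not a firewall implementation: it assumes the level ordering from Table 1.1 (with the iDMZ at 3.5), and the function name `is_flow_permitted` is purely illustrative.

```python
# Ordered Purdue levels, including the Level 3.5 iDMZ from Table 1.1.
PURDUE_LEVELS = [0.0, 1.0, 2.0, 3.0, 3.5, 4.0, 5.0]


def is_flow_permitted(src_level: float, dst_level: float) -> bool:
    """Return True only when the two levels are the same or directly adjacent."""
    if src_level not in PURDUE_LEVELS or dst_level not in PURDUE_LEVELS:
        raise ValueError("unknown Purdue level")
    distance = abs(PURDUE_LEVELS.index(src_level) - PURDUE_LEVELS.index(dst_level))
    return distance <= 1


if __name__ == "__main__":
    print(is_flow_permitted(4, 3.5))  # ERP to an iDMZ proxy: True
    print(is_flow_permitted(4, 2))    # ERP directly to an HMI: False
    print(is_flow_permitted(4, 3))    # bypassing the iDMZ: False
```

Because Level 3.5 sits between Levels 3 and 4 in the ordering, a Level 4 to Level 3 flow is two hops apart and is rejected, which is exactly the effect the iDMZ is meant to have.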

1.2 The Industrial DMZ (Level 3.5): The Nexus of Secure Convergence

The Industrial Demilitarized Zone (iDMZ), classified as Level 3.5, is a modern, security-driven addition to the Purdue Model. It is a buffer network zone that sits between the IT and OT environments, specifically between Level 4 and Level 3. The primary purpose of the iDMZ is not to host operational applications but to serve as a dedicated security zone that enforces data security policies and prevents direct communication paths between the enterprise and industrial networks.

The iDMZ is the architectural embodiment of the shift from air-gapped isolation to secure convergence. It acts as a critical chokepoint where all traffic attempting to cross the IT/OT boundary can be terminated, inspected, authenticated, and controlled. This prevents the lateral movement of threats from the IT network into the OT environment while still allowing for the necessary, authorized flow of data, such as production data from a Level 3 historian to a Level 4 ERP system. By forcing traffic through brokered services within the iDMZ, such as proxies and application gateways, organizations can ensure that no session from the untrusted IT zone ever directly reaches the trusted OT zone.

1.3 Core Tenets of NIST SP 800-82 Revision 3: The Authoritative Guide for OT Security

NIST SP 800-82 Revision 3, “Guide to Operational Technology (OT) Security,” is the authoritative U.S. government guidance for securing industrial systems. Its latest revision significantly expands its scope from a narrow focus on ICS to the broader category of OT, encompassing a wide range of cyber-physical systems, including building automation and transportation systems.

The standard provides comprehensive guidance on OT risk management, advocating for a defense-in-depth architecture and the implementation of security controls that are specifically tailored to the unique performance, reliability, and safety requirements of OT environments. A central recommendation of NIST 800-82 R3 is the implementation of robust network segmentation. The guide explicitly advises separating OT networks from IT networks and using a DMZ as a key enforcement boundary to monitor, log, and filter all inter-network traffic. This recommendation directly validates the use of the Purdue Model as a foundational architectural pattern.

Furthermore, NIST 800-82 R3 includes an “OT overlay” for the security controls cataloged in NIST SP 800-53. This overlay provides tailored baselines of security controls for low-, moderate-, and high-impact OT systems, offering a concrete framework for selecting and implementing the specific safeguards needed to protect critical infrastructure. This blog post will leverage the OT overlay as the basis for mapping specific AVI platform capabilities to NIST-recommended controls.

1.4 Applying Zero Trust Principles to the iDMZ

The Zero Trust Architecture (ZTA), as defined in NIST SP 800-207, is a cybersecurity model founded on the principle of “never trust, always verify”. It assumes that a breach is inevitable or has already occurred, and therefore, no user or device can be implicitly trusted based on its network location, whether inside or outside the perimeter. This philosophy is perfectly suited for governing the iDMZ, which serves as the boundary between the untrusted IT world and the trusted OT world.

The iDMZ is the logical and ideal location to implement the core components of a ZTA. According to NIST SP 800-207, all communication must be secured regardless of network location, and access to resources must be granted on a per-session basis, enforcing the principle of least privilege. This means every request from an IT system to access an OT-related application service exposed in the iDMZ must be independently authenticated and authorized. The iDMZ becomes the Zero Trust boundary, and the application delivery platform within it acts as the Policy Enforcement Point (PEP), making access decisions based on dynamic, context-aware policies. This approach moves beyond static firewall rules and enables a more granular, identity-aware security posture that is essential for protecting high-value OT assets.
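
To make the per-request idea concrete, here is a minimal sketch of a policy decision that a PEP could consult. Everything in it is hypothetical (the user names, resource names, the `device_compliant` signal, and the `GRANTS` table); it is not how any particular product implements ZTA. The point it illustrates is that network location is never an input to the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    authenticated: bool      # identity verified for this session
    device_compliant: bool   # hypothetical device-posture signal
    resource: str            # application service exposed in the iDMZ
    action: str


# Hypothetical least-privilege grants: (user, resource) -> allowed actions.
GRANTS = {
    ("analyst01", "historian-portal"): {"read"},
    ("operator07", "hmi-web"): {"read", "acknowledge-alarm"},
}


def decide(request: AccessRequest) -> bool:
    """Evaluate one request on its own; source network is deliberately ignored."""
    if not (request.authenticated and request.device_compliant):
        return False
    allowed_actions = GRANTS.get((request.user, request.resource), set())
    return request.action in allowed_actions
```

Each call to `decide` stands alone, so a user who was allowed to read the historian a moment ago is still denied a write, and is denied everything once authentication or device posture fails.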

The synthesis of these foundational frameworks provides the intellectual underpinning for a robust and defensible architecture. The Purdue Model supplies the structural blueprint for segmentation, defining where the IT/OT boundary exists. NIST SP 800-82 R3 provides the prescriptive security guidance, defining what specific controls and architectural patterns, such as the DMZ, are required at that boundary. Finally, the Zero Trust model provides the operational philosophy, defining how those controls should function: dynamically, on a per-request basis, and with no implicit trust.

Section 2: VMware AVI Platform Architecture and Capabilities

The VMware AVI Load Balancer (formerly Avi Networks) is an application delivery platform built on a software-defined, scale-out architecture that fundamentally separates the control plane from the data plane. This architectural design is not merely a technical detail; it is a key enabler for implementing the segmented, secure architecture required by the Purdue Model and NIST 800-82 R3. Unlike monolithic hardware appliances, AVI’s decoupled nature allows for strategic placement of its components in alignment with security zone principles, providing flexibility, scalability, and centralized control without compromising the integrity of the IT/OT boundary.

2.1 Decoupled Control and Data Planes: The Foundation of Flexibility

The AVI platform is composed of two primary components that operate independently but in concert.

  • The Avi Controller Cluster (Control Plane): The Controller is the centralized “brain” of the AVI platform. It serves as the single point of management, policy configuration, and analytics aggregation for the entire system. For high availability, the Controller is deployed as a three-node, active-active cluster, which keeps the management plane operational through a single node failure (losing two nodes breaks quorum). A virtual IP (VIP) address is assigned to the cluster, providing a single, resilient endpoint for all administrative and API interactions. The Controller houses the policy engine and exposes a 100% REST API, making it fully automatable and integratable with CI/CD pipelines and orchestration tools.
  • The Avi Service Engines (Data Plane): The Service Engines (SEs) constitute the distributed data plane. These are lightweight, high-performance load balancers that handle all application traffic. SEs are deployed as virtual machines or containers and are placed proximate to the applications they are servicing. They receive their configuration from the Controller and stream a rich set of near-real-time telemetry and logs back to it. This architecture allows the data plane to be elastically scaled out or in based on application demand, without requiring manual intervention.

This separation of planes is the cornerstone of the platform’s architectural advantage. It allows the control plane (the Controller cluster) to be deployed and managed within the IT zone, where administrative access is appropriate, while the data plane (the SEs) can be deployed in a secure enforcement zone like the iDMZ. This alignment with the Purdue Model’s segmentation goals is difficult to achieve with traditional, monolithic load-balancing appliances, which would require extending management access directly into the secure zone, thereby creating an undesirable attack vector.

2.2 High Availability and Resiliency Models

Ensuring continuous availability is a paramount concern in OT environments. The AVI platform provides robust high availability (HA) at both the control and data plane levels.

  • Controller Cluster HA: The three-node Controller cluster operates in an active-active model, with a leader elected to handle certain tasks but with all nodes capable of serving API requests. If the leader node fails, a new leader is elected from the remaining nodes, ensuring the management plane remains available. A two-node failure results in a loss of quorum, which prevents any new configuration on the control plane. If all three nodes of the control plane are offline, the data plane continues to function in a headless state: application traffic continues to flow through the SEs, but new configurations are only possible once the control plane is restored.
  • Service Engine HA Modes: The SEs, which handle the live application traffic, can be configured in several HA modes to ensure data plane resiliency. The choice of mode depends on the specific requirements for failover time, resource utilization, and scalability.
    • Legacy HA (Active/Standby): This is a traditional HA model where two SEs are paired. One SE is active and handles all traffic for a given virtual service, while the other remains in a standby state. Upon failure of the active SE, the virtual service fails over to the standby SE. This model is simple but does not permit active load sharing for a given virtual service or scaling beyond two SEs.
    • Elastic HA N+M Mode: This is the default and most commonly recommended mode. In an N+M configuration, ‘N’ SEs are actively handling traffic, while ‘M’ additional SEs are deployed as a buffer. If one of the ‘N’ active SEs fails, the Controller automatically moves its virtual services to one of the ‘M’ buffer SEs. This provides a balance between resource efficiency and failover capacity.
    • Elastic HA Active/Active Mode: In this high-performance model, a single virtual service is scaled out across multiple active SEs simultaneously. If one SE fails, it only results in a partial degradation of capacity for the virtual service, as the remaining SEs continue to handle their share of the traffic. This mode offers the fastest failover and is ideal for mission-critical applications where even a brief interruption is unacceptable.
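
The Controller failure scenarios described above come down to simple majority arithmetic. The sketch below is an illustration under that assumption; the state labels (`available`, `no-quorum`, `headless`) are made up for this example and are not platform terminology.

```python
def control_plane_state(healthy_nodes: int, cluster_size: int = 3) -> str:
    """Classify the control plane from the number of healthy Controller nodes."""
    if not 0 <= healthy_nodes <= cluster_size:
        raise ValueError("healthy_nodes must be between 0 and cluster_size")
    quorum = cluster_size // 2 + 1  # simple majority: 2 of 3
    if healthy_nodes >= quorum:
        return "available"   # a leader can be (re-)elected; config changes accepted
    if healthy_nodes > 0:
        return "no-quorum"   # no new configuration until quorum is restored
    return "headless"        # all Controllers down; SEs keep forwarding traffic


if __name__ == "__main__":
    for healthy in (3, 2, 1, 0):
        print(healthy, control_plane_state(healthy))
```

With three nodes the majority is two, which is why one failure is tolerated and a second one is not.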

2.3 Core Communication Paths and Network Requirements

A secure and functional deployment of the AVI platform requires specific communication paths to be allowed through network firewalls. Each path serves a distinct purpose and must be explicitly permitted.

  • Controller-to-SE Communication: The Controller communicates with the SEs over a secure channel to push configuration, check health status (heartbeats), and receive metrics and logs. This requires allowing SSH (TCP/22) and a secure control channel (TCP/8443) from the SEs' management IPs to the Controller IPs.
  • Intra-Controller Cluster Communication: The nodes within the Controller cluster must communicate with each other to maintain quorum, elect a cluster leader, replicate configuration, and synchronize state. This requires allowing traffic between the Controller nodes on TCP ports 22, 443 and 8443.
  • SE-to-SE Communication: For Elastic HA modes, the SEs within a Service Engine Group send health monitoring heartbeats to each other over their data interfaces. This requires allowing traffic between the SE data IPs using TCP Ports 9001 and 4001.
  • Administrative Access: Human administrators and automation systems access the platform’s UI and API via the Controller Cluster VIP. This requires allowing HTTPS (TCP/443) access from authorized administrative networks to the Controller Cluster VIP.

Table 2.1: Required Network Communication Ports for AVI Platform

| Source Zone (Purdue Level) | Destination Zone (Purdue Level) | Source Component | Destination Component | Protocol | Port(s) | Purpose | NIST 800-82 R3 Justification |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 4 (Enterprise) | 4 (Enterprise) | Avi Controller | Avi Controller | TCP | 22, 443, 8443 | Intra-cluster state synchronization and quorum. | AC-3, SC-7 |
| 3.5 (iDMZ) | 4 (Enterprise) | Avi Service Engine (Mgmt) | Avi Controller | TCP | 22, 8443 | Configuration push, heartbeats, metrics/log collection. | AC-3, SC-7 |
| 4 (Enterprise) | 4 (Enterprise) | Admin Workstation | Avi Controller (VIP) | TCP | 443 | Administrative UI/API access. | AC-3, AC-17 |
| 3.5 (iDMZ) | 3.5 (iDMZ) | Avi Service Engine (Data) | Avi Service Engine (Data) | TCP | 9001, 4001 | Elastic HA health monitoring. | SC-7 |
| 4 (Enterprise) | 3.5 (iDMZ) | IT Client | Avi Service Engine (VIP) | TCP/UDP | App-specific | Application traffic to Virtual Service. | AC-4, SC-7 |
| 3.5 (iDMZ) | 3 (Site Ops) | Avi Service Engine (Data) | OT Application Server | TCP/UDP | App-specific | Proxied application traffic to backend server. | AC-4, SC-7 |
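
One way to keep such a port matrix honest is to hold it as data and test candidate flows against it. The sketch below encodes only the fixed-port flows, using the directions given in the bullets of Section 2.3; the component labels (`se-mgmt`, `controller-vip`, and so on) are illustrative names, and each rule describes who initiates the connection, as a stateful firewall would see it.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    src: str
    dst: str
    proto: str
    ports: frozenset


# Fixed-port flows from the bullets in Section 2.3.
# App-specific virtual service and back-end ports are intentionally omitted.
ALLOWED = [
    Rule("controller", "controller", "tcp", frozenset({22, 443, 8443})),
    Rule("se-mgmt", "controller", "tcp", frozenset({22, 8443})),
    Rule("admin", "controller-vip", "tcp", frozenset({443})),
    Rule("se-data", "se-data", "tcp", frozenset({9001, 4001})),
]


def is_allowed(src: str, dst: str, proto: str, port: int) -> bool:
    """Default deny: a flow passes only if an explicit rule matches it."""
    return any(
        r.src == src and r.dst == dst and r.proto == proto and port in r.ports
        for r in ALLOWED
    )
```

For example, `is_allowed("se-mgmt", "controller", "tcp", 8443)` passes, while an administrator trying to reach an SE management interface directly over SSH matches no rule and is denied.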

Section 3: Integrated Design: Deploying VMware AVI in a Purdue-Aligned Architecture

This section presents the definitive architectural blueprint for deploying the VMware AVI platform in a manner that is strictly aligned with the Purdue Model and fortified by the principles of NIST 800-82 R3. The design leverages AVI’s decoupled architecture to strategically place components in their appropriate security zones, establishing a robust and inspectable boundary between the IT and OT environments.

3.1 Strategic Component Placement

The placement of AVI components is one of the most critical architectural decisions and directly reflects the security principles of the Purdue Model.

  • Avi Controller Cluster in the Enterprise Zone (Level 4): The three-node Avi Controller cluster will be deployed on the IT Management Network, which resides within the Purdue Model’s Level 4. This placement is deliberate and offers several key advantages. It allows network administrators, security teams, and automation tools residing within the corporate network to access the central management and analytics plane of the AVI platform without requiring privileged access that crosses into more secure zones. Centralizing control in the appropriate administrative domain simplifies management and aligns with the principle of least privilege, as IT personnel do not need direct network access to the iDMZ or OT zones to perform their duties.
  • Avi Service Engines in the Industrial DMZ (Level 3.5): In accordance with the primary design constraint, the Avi Service Engines (SEs) will be deployed within the Industrial DMZ at Level 3.5. This placement positions the SEs as security gateways that proxy all application traffic flowing between the IT and OT networks. The SEs become the primary policy enforcement point, terminating connections from the less-trusted IT side, inspecting the traffic for threats, and then initiating new, clean connections to the application servers on the more-trusted OT side. This “proxy-in-the-middle” architecture is fundamental to preventing the direct propagation of threats and enforcing granular security policies at the IT/OT boundary.

3.2 Logical Network Topology and Traffic Flows

To support the strategic component placement and enforce strict segmentation, a specific logical network topology is required. This topology utilizes multiple network segments (which can be implemented as VLANs) to isolate different types of traffic and enforce the principle of least privilege at the network layer.

  • Network Segmentation Design:
    • IT Management Network (Level 4): A dedicated network segment within the corporate data center for the management interfaces of the three Avi Controller nodes and the Controller Cluster VIP.
    • iDMZ Management Network (Level 3.5): A dedicated network segment within the iDMZ for the management interfaces of the Avi Service Engines. The firewall policy between Level 4 and Level 3.5 must explicitly allow communication from the SE management IPs to the Controller IPs on TCP ports 22 and 8443.
    • iDMZ Front-End “VIP” Network (Level 3.5): This segment hosts the Virtual IPs (VIPs) for the applications being protected. It is the destination network for clients in the IT zone (Level 4/5) who are accessing OT applications. The firewall must allow application-specific traffic (e.g., TCP/443 for HTTPS) from authorized client networks to the VIPs on this segment.
    • iDMZ Back-End Data Network (Level 3.5): This segment is used for the SEs’ data interfaces that communicate with the back-end application servers in Level 3. The firewall policy between Level 3.5 and Level 3 must be highly restrictive, allowing traffic only from the SEs’ back-end interface IPs to the specific application servers on their required ports.

This multi-network design for the SEs within the iDMZ constitutes a form of microsegmentation within the boundary itself. It ensures that an attacker who compromises a system in the IT zone can only reach the front-end VIPs. They have no direct network path to the back-end OT application servers. To reach those servers, they would need to successfully exploit a vulnerability in the application presented on the VIP and then pivot through the AVI SE’s data plane—a significantly more complex and difficult attack path than a direct network connection.

  • Traffic Flow Analysis:
    1. An administrator in the IT Zone (Level 4) connects via HTTPS to the Avi Controller Cluster VIP to configure a new WAF policy. This traffic remains entirely within Level 4.
    2. The Avi Controller (Level 4) pushes the new policy configuration via a secure channel (TCP/8443) to the management interface of the SEs in the iDMZ Management Network (Level 3.5).
    3. A user in the IT Zone (Level 4) opens a web browser and connects to an HMI application, resolving to a VIP on the iDMZ Front-End Network (Level 3.5).
    4. The SE receives the connection, terminates the TLS session, and applies the WAF policy to inspect the HTTP request.
    5. Assuming the request is legitimate, the SE uses its interface on the iDMZ Back-End Network (Level 3.5) to initiate a new, separate connection to the HMI web server in the Site Operations zone (Level 3).
    6. The HMI server responds to the SE, which then relays the response back to the client. The client never communicates directly with the HMI server.
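
The segmentation design and the six-step flow above can be checked as a small reachability problem: every permitted path from an IT client to an OT application server should pass through a Service Engine. This is a rough sketch with illustrative node names; a real audit would build the graph from actual firewall rules.

```python
# Directed reachability permitted by the firewall policy described above.
# Node names are illustrative labels, not product identifiers.
ALLOWED_EDGES = {
    "it-client": {"frontend-vip"},
    "frontend-vip": {"service-engine"},
    "service-engine": {"ot-app-server"},
    "ot-app-server": set(),
}


def all_paths(graph, start, goal, path=()):
    """Enumerate every loop-free path from start to goal (depth-first)."""
    path = path + (start,)
    if start == goal:
        return [path]
    found = []
    for nxt in sorted(graph.get(start, ())):
        if nxt not in path:
            found.extend(all_paths(graph, nxt, goal, path))
    return found


def paths_bypassing_proxy(graph, proxy="service-engine"):
    """Return any IT-to-OT path that does not traverse the proxy."""
    return [
        p for p in all_paths(graph, "it-client", "ot-app-server")
        if proxy not in p
    ]
```

With the topology as designed, `paths_bypassing_proxy(ALLOWED_EDGES)` returns an empty list. Adding a stray rule that lets `it-client` reach `ot-app-server` directly makes the function report that path, which is the kind of drift this check is meant to catch.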

3.3 Data Flow Control and Policy Enforcement at the Boundary

By placing the AVI SEs in the iDMZ, they become the definitive Policy Enforcement Point (PEP) for all application-layer traffic crossing the IT/OT boundary, perfectly aligning with the Zero Trust model. This allows for the enforcement of granular security policies that go far beyond simple IP and port-based firewall rules.

The AVI platform can be configured to enforce the Purdue principle of unidirectional data flow where appropriate. For example, a Network Security Policy can be created that allows connections from a Level 4 business analytics server to a Level 3 data historian VIP, but denies any attempt by the historian to initiate connections back into the IT network. The most crucial function, however, is the termination and inspection of all traffic. By acting as a full proxy, the SEs break the network connection at the boundary, preventing attacks like malware propagation or network reconnaissance from passing directly from IT to OT.
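
The historian example can be sketched as a default-deny rule evaluation. This is an illustration of the logic only, not AVI's policy syntax; the addresses, the port, and the function name `evaluate` are all hypothetical.

```python
import ipaddress

# Hypothetical addresses for illustration only.
ANALYTICS_NET = ipaddress.ip_network("10.40.10.0/24")  # Level 4 analytics servers
HISTORIAN_VIP = ipaddress.ip_address("10.35.20.10")    # VIP on the iDMZ front-end
HISTORIAN_PORT = 443


def evaluate(src_ip: str, dst_ip: str, dst_port: int) -> str:
    """Return 'allow' for analytics-to-historian-VIP flows, else 'deny'."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src in ANALYTICS_NET and dst == HISTORIAN_VIP and dst_port == HISTORIAN_PORT:
        return "allow"
    return "deny"  # default deny also covers historian-initiated connections
```

Because only one direction is ever listed, a connection attempt from the historian side toward the analytics network falls through to the default deny without needing a dedicated rule.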

Figure 3.1: Detailed Logical Architecture Diagram

Section 4: Implementing NIST 800-82 R3 Controls with VMware AVI

This section provides a practical mapping of specific security controls from the NIST SP 800-82 R3 OT overlay to concrete features and configurations within the VMware AVI platform. This demonstrates how the proposed architecture achieves a standards-based, verifiable security posture.

4.1 Enforcing Access Control (AC) at the IT/OT Boundary

The Access Control (AC) family of controls is fundamental to protecting the IT/OT boundary. AVI provides multiple layers of access enforcement for application traffic.

  • AC-3 (Access Enforcement) & AC-4 (Information Flow Enforcement): AVI enforces access and information flow policies at both Layer 4 and Layer 7. Network Security Policies function as Layer 4 access control lists (ACLs), allowing or denying traffic based on source/destination IP, port, and protocol. This can be used to restrict access to an OT application’s VIP to only authorized subnets within the IT network. More granularly, the Web Application Firewall (WAF) acts as a Layer 7 enforcement point, inspecting the content of the application traffic itself to block malicious payloads or unauthorized requests, thereby controlling the flow of information.
  • AC-17 (Remote Access): For web-based OT applications such as remote HMIs, the AVI platform can serve as a secure remote access gateway. By terminating the remote user’s connection at the SE in the iDMZ, AVI can enforce strong authentication policies, including integration with SAML or LDAP for multi-factor authentication, before allowing any traffic to proceed to the sensitive OT application server. All remote activity is logged and auditable through the platform. [21]

4.2 System and Communications Protection (SC)

The System and Communications Protection (SC) family focuses on securing the network infrastructure and the data transmitted across it.

  • SC-7 (Boundary Protection): The deployment of the AVI SEs within the iDMZ directly implements the core requirement of SC-7. The SEs act as managed interfaces at the boundary between the IT and OT zones, monitoring and controlling all communications that pass through them. This creates a chokepoint for inspection and policy enforcement, effectively isolating the OT network from untrusted IT traffic.
  • SC-8 (Transmission Confidentiality and Integrity): AVI provides robust capabilities for ensuring data in transit is protected. It can terminate TLS sessions from clients at the VIP, inspecting the decrypted traffic with the WAF. It can then re-encrypt the traffic before sending it to the back-end OT application server. This ensures that data is encrypted on both the IT-side and OT-side legs of the connection, and it lets clients negotiate strong TLS at the VIP even if the back-end server supports only weak or outdated TLS versions.
  • SC-11 (Trusted Path): For critical user-to-application communications, such as an operator interacting with an HMI, AVI can help establish a trusted path. By using strong, validated SSL/TLS certificates on the virtual service, the platform provides assurance to the user’s browser that it is connecting to the legitimate, authenticated endpoint and not a spoofed or man-in-the-middle server.

4.3 Advanced WAF Protection for Critical OT Applications (e.g., HMIs, Historians)

OT applications, particularly web-based HMIs and data historian portals, are often high-value targets. They may be built on legacy code or custom platforms, making them susceptible to common web vulnerabilities. The AVI WAF provides a multi-faceted defense against these threats.

  • Applying OWASP CRS (Negative Security Model): The first layer of defense is the application of AVI’s default WAF policy, which is built upon the industry-standard OWASP ModSecurity Core Rule Set (CRS). This provides immediate, out-of-the-box protection against the OWASP Top 10 and other known attack signatures, such as SQL Injection (SQLi) and Cross-Site Scripting (XSS).
  • Implementing a Positive Security Model: The primary challenge in securing custom or legacy OT applications is that their specific vulnerabilities may not be covered by generic signature sets. A negative security model, which blocks known bad traffic, is insufficient against unknown or zero-day attacks. The AVI WAF addresses this through its machine learning-driven Positive Security Model.
    • Learning Mode: When a WAF policy is placed in “Learning Mode,” the AVI analytics engine observes legitimate traffic to the application. It automatically builds a profile of normal behavior, including valid URIs, parameters, request methods, and other attributes. From this profile, it generates a set of positive security rules that define what is allowed.
    • Tuning and False Positive Management: During the learning phase, the WAF may flag legitimate but unusual traffic as a potential false positive. The AVI UI provides explicit recommendations for these events, allowing an administrator to review the flagged transaction and, with a single click, accept a recommendation to create a specific rule exception. This streamlined workflow greatly simplifies the process of tuning the WAF policy to eliminate false positives without disabling broad categories of protection.
  • Phased Rollout Strategy (Detection vs. Enforcement): To prevent operational disruptions, WAF policies must be deployed in a phased manner. The policy should initially be applied to a virtual service in “Detection Mode.” In this mode, the WAF analyzes traffic and logs any rule violations but does not block any requests. After a period of monitoring and tuning to resolve any false positives, the policy can be confidently switched to “Enforcement Mode,” where it will actively block malicious traffic. This detect-then-enforce strategy is critical for gaining operational acceptance in risk-averse OT environments.
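
The relationship between learning, detection, and enforcement can be shown with a toy model. This sketch is deliberately simplistic and is not how the AVI WAF works internally: it learns only (method, path) pairs, whereas the text describes a richer profile of URIs, parameters, and other attributes. The class and attribute names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PositiveModelWaf:
    mode: str = "learning"                     # "learning", "detection", or "enforcement"
    allowed: set = field(default_factory=set)  # learned (method, path) pairs
    log: list = field(default_factory=list)    # recorded violations

    def handle(self, method: str, path: str) -> bool:
        """Return True if the request is forwarded to the back end."""
        key = (method, path)
        if self.mode == "learning":
            self.allowed.add(key)          # build the profile of normal traffic
            return True
        if key in self.allowed:
            return True
        self.log.append(key)               # a violation is logged in both modes
        return self.mode == "detection"    # only enforcement actually blocks


if __name__ == "__main__":
    waf = PositiveModelWaf()
    waf.handle("GET", "/hmi/status")           # learned as normal
    waf.mode = "detection"
    print(waf.handle("POST", "/hmi/admin"))    # True: logged but not blocked
    waf.mode = "enforcement"
    print(waf.handle("POST", "/hmi/admin"))    # False: blocked
```

The detection phase produces the same log entries that enforcement would act on, which is what lets operators review and tune exceptions before any request is actually dropped.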

Table 4.1: NIST SP 800-82 R3 Control Mapping to VMware AVI Features

| NIST Control Family & ID | NIST Control Objective | AVI Feature | Configuration/Implementation Notes |
| --- | --- | --- | --- |
| AC-3 | Access Enforcement | Network Security Policy, WAF Policy | Create L4 ACLs to restrict VIP access to authorized source IPs. Apply L7 WAF rules to block unauthorized requests. |
| AC-4 | Information Flow Enforcement | WAF Policy, L7 DataScripts | Use WAF to inspect and control application data flows. Use DataScripts for custom logic on complex flows. |
| AC-17 | Remote Access | Virtual Service Authentication Profile | Configure virtual service to require SAML or LDAP authentication for users accessing web-based OT applications. |
| SC-7 | Boundary Protection | Service Engine Deployment in iDMZ | Deploy SEs in the Level 3.5 iDMZ to act as the managed interface between the IT and OT networks. |
| SC-8 | Transmission Confidentiality and Integrity | SSL/TLS Profile, SSL Everywhere | Terminate and re-encrypt traffic at the SE. Apply an SSL Profile to the virtual service and the server pool. |
| SC-11 | Trusted Path | SSL/TLS Profile with Trusted Certificates | Assign a valid, trusted certificate to the virtual service to provide authentication of the endpoint to the client. |
| SI-4 | Information System Monitoring | WAF Policy, Anomaly Detection | Configure WAF in detection/enforcement mode. Monitor application health scores for security anomalies. |
| AU-2 | Event Logging | Log Streaming to External Server | Configure SE Group to stream application and WAF logs in JSON format to a central SIEM for analysis and retention. |
| IR-4 | Incident Handling | WAF Logs and Analytics | Use the detailed WAF logs and security insights dashboard to investigate alerts, identify attack vectors, and determine impact. |

Section 5: Operationalizing Security: Advanced Monitoring, Logging, and Response

Deploying a secure architecture is only the first step; maintaining its security posture requires continuous monitoring, robust logging, and well-defined incident response procedures. The VMware AVI platform provides a rich set of integrated tools that move beyond simple log generation to offer deep, actionable insights into application health and security. This capability is particularly valuable in OT environments, which have historically suffered from a lack of visibility into application-layer traffic.

5.1 Continuous Monitoring with AVI Security Analytics

The AVI Controller provides a centralized dashboard for real-time monitoring of all managed applications, translating vast amounts of telemetry into easily digestible insights.

  • Application Health Scores: Each virtual service is assigned a Health Score from 0-100, which is a composite metric calculated from four components: Performance, Resource Penalty, Anomaly Penalty, and Security Penalty. A sudden increase in the Security Penalty, which lowers the overall Health Score, immediately alerts an operator to a potential issue, such as a DDoS attack or a spike in WAF-flagged transactions.
  • Security Insights Dashboard: For each virtual service, a dedicated Security dashboard provides a focused view of security-related events. This includes metrics on SSL/TLS transactions (e.g., failed handshakes, weak ciphers), DDoS attack details (e.g., attack type, mitigated traffic volume), and WAF analytics. This allows security personnel to quickly assess the security posture of an application without needing to sift through performance metrics.
  • Anomaly Detection: The platform’s analytics engine automatically establishes a baseline of normal behavior for each application across hundreds of metrics. It then uses machine learning to detect significant deviations from this baseline. These anomalies, such as a sudden spike in HTTP 500 error codes or an unusual increase in end-to-end latency, are flagged and can serve as early indicators of a security incident or an impending operational failure.
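To make the idea of baseline deviation concrete, the following sketch flags samples that sit far outside a learned baseline using a plain standard-deviation test. It is a deliberately simplified stand-in and does not represent the platform's actual analytics engine; the `find_anomalies` helper, the baseline size, the threshold, and the sample values are all assumptions for illustration.

```python
from statistics import mean, stdev


def find_anomalies(samples, baseline_size=10, threshold=3.0):
    """Return the indexes of samples that deviate strongly from the baseline.

    The first `baseline_size` samples define "normal"; each later sample is
    flagged when it lies more than `threshold` standard deviations from the
    baseline mean.
    """
    baseline = samples[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)
    anomalies = []
    for index in range(baseline_size, len(samples)):
        deviation = abs(samples[index] - mu)
        if sigma == 0:
            # A perfectly flat baseline: any change at all is unusual.
            is_anomaly = deviation > 0
        else:
            is_anomaly = deviation / sigma > threshold
        if is_anomaly:
            anomalies.append(index)
    return anomalies


# Hypothetical count of HTTP 500 responses per minute for one virtual service.
http_500_per_minute = [2, 3, 2, 4, 3, 2, 3, 4, 2, 3, 3, 2, 41, 4]
print(find_anomalies(http_500_per_minute))  # [12]
```

A production system would typically use a rolling baseline and account for daily or shift-based patterns, which matter in OT environments where traffic follows production schedules.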

5.2 Centralized Logging for Forensic Analysis

While the Controller’s dashboards provide high-level insights, deep forensic analysis requires access to detailed transaction logs. AVI facilitates this by enabling the export of rich, structured logs to external Security Information and Event Management (SIEM) systems.

  • Configuring Secure Syslog Forwarding: The AVI platform can be configured to stream logs to an external syslog server. To ensure the confidentiality and integrity of this sensitive log data, the connection should be configured to use syslog over TLS. This requires configuring a PKI Profile on the Controller with the syslog server’s CA certificate and an SSL/TLS Profile with the Controller’s client certificate. The configuration is performed via the AVI CLI.
  • Leveraging JSON Log Formats: For maximum utility, logs should be exported in a structured format. AVI supports sending application and WAF logs in newline-delimited JSON (NDJSON) format. This format is easily parsable by modern SIEMs like Splunk and allows for powerful searching, correlation, and dashboarding based on specific fields within the log, such as client IP, requested URL, WAF rule ID, or response code.
  • Integrating with SIEM Platforms: To integrate with Splunk, a data input (e.g., TCP on port 9998) must be configured on a Splunk forwarder or indexer. A corresponding Splunk Add-on (e.g., Splunk Add-on for VMware ESXi Logs, which can be adapted) is then used to parse the incoming JSON data, extracting the fields and making them searchable within the Splunk platform. This integration transforms raw logs into a powerful forensic database.
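The value of a structured log format is easiest to see from the consumer's side. The sketch below parses a newline-delimited JSON stream with the Python standard library and filters on a field. The field names (`vs_name`, `client_ip`, `uri_path`, `response_code`, `waf_rule_id`) and all sample values are placeholders invented for this example; the actual field names should be taken from the logs the platform emits.

```python
import json


def parse_ndjson(stream_text):
    """Yield one dict per valid line; skip blank, corrupted, or non-object lines."""
    for line in stream_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a truncated line should not stop ingestion
        if isinstance(record, dict):
            yield record


# Hypothetical stream: two complete records and one truncated in transit.
SAMPLE_STREAM = "\n".join([
    '{"vs_name": "historian-portal", "client_ip": "10.4.1.20", '
    '"uri_path": "/login", "response_code": 403, "waf_rule_id": "rule-sqli-01"}',
    '{"vs_name": "historian-portal", "client_ip": "10.4.1.3',
    '{"vs_name": "historian-portal", "client_ip": "10.4.1.21", '
    '"uri_path": "/trend", "response_code": 200}',
])

records = list(parse_ndjson(SAMPLE_STREAM))
blocked = [r for r in records if r.get("response_code") == 403]
print(len(records), len(blocked))  # 2 1
```

Because each line is an independent JSON object, a damaged line affects only that one record, which is one reason the format suits streaming over a network link.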

5.3 Incident Investigation Workflow

The combination of real-time analytics on the AVI Controller and deep forensic data in the SIEM enables an efficient incident response workflow.

  1. Alert Generation: An event, such as a WAF rule match in enforcement mode, triggers an alert. This alert is sent from the AVI Controller to the SIEM over the configured syslog-over-TLS connection.
  2. Initial Triage in SIEM: The security analyst sees the alert in the SIEM console. The structured JSON log provides immediate context, such as the virtual service name, client IP address, and the specific WAF rule that was triggered (e.g., “SQL Injection Detected”).
  3. Pivoting to AVI Analytics: For broader context, the analyst pivots to the AVI Controller UI and navigates to the Security dashboard for the affected virtual service. Here, they can see if this was an isolated event or part of a larger pattern, such as a distributed attack from multiple source IPs.
  4. Deep-Dive Forensic Analysis: The analyst uses the virtual service’s “Logs” tab to perform a deep dive. They can filter for the specific transaction ID from the SIEM alert or filter by client IP. The log details provide the full, un-truncated request headers and body, showing the exact malicious payload that was blocked. This level of detail is critical for understanding the attacker’s methods and intent, and for providing definitive evidence for an incident report.
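The question raised in step 3, whether an alert is an isolated event or part of a distributed pattern, can also be asked of the structured logs directly. The following is a minimal sketch under stated assumptions: the `summarize_rule_hits` helper, the record field names, the rule identifiers, and the threshold of three distinct sources are hypothetical and would need to match the organization's own log schema and risk tolerance.

```python
from collections import defaultdict


def summarize_rule_hits(records, distributed_threshold=3):
    """Label each WAF rule's hits as 'isolated' or 'distributed'.

    A rule is considered 'distributed' when it was triggered by at least
    `distributed_threshold` distinct client IP addresses.
    """
    sources_by_rule = defaultdict(set)
    for record in records:
        sources_by_rule[record["waf_rule_id"]].add(record["client_ip"])
    return {
        rule: "distributed" if len(sources) >= distributed_threshold else "isolated"
        for rule, sources in sources_by_rule.items()
    }


# Hypothetical WAF hits already parsed from the SIEM.
waf_hits = [
    {"waf_rule_id": "rule-sqli-01", "client_ip": "10.4.1.20"},
    {"waf_rule_id": "rule-sqli-01", "client_ip": "10.4.7.12"},
    {"waf_rule_id": "rule-sqli-01", "client_ip": "10.4.9.5"},
    {"waf_rule_id": "rule-xss-02", "client_ip": "10.4.1.20"},
    {"waf_rule_id": "rule-xss-02", "client_ip": "10.4.1.20"},
]
print(summarize_rule_hits(waf_hits))
# {'rule-sqli-01': 'distributed', 'rule-xss-02': 'isolated'}
```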

This integrated workflow bridges the often-problematic gap between real-time detection and deep forensic analysis. The AVI Controller’s analytics provide the immediate “what is happening now,” while the structured logs streamed to the SIEM provide the rich historical data needed to answer “what happened, how, and by whom.” This capability is often lacking in OT environments and represents a significant improvement in incident response readiness at the critical IT/OT boundary.

Section 6: Lifecycle Management for High-Availability OT Environments

In Operational Technology, system stability and availability are paramount. Any maintenance or upgrade activity must be meticulously planned and executed to minimize or, ideally, eliminate downtime. The architecture of the VMware AVI platform is designed with these stringent requirements in mind, offering robust procedures for backup, recovery, and zero-downtime upgrades that are essential for maintaining both the availability and the security posture of the application delivery infrastructure.

6.1 Controller Backup and Disaster Recovery

The AVI Controller cluster is the central point of configuration and management for the entire platform. A comprehensive backup and recovery strategy is therefore critical.

  • Backup Configuration: The AVI platform includes a native mechanism for automated, scheduled backups. This feature should be configured to export the full system configuration to a remote, secure server using SCP or SFTP. A strong, unique passphrase must be set to encrypt the backup file, ensuring the confidentiality of the configuration data. The schedule should be configured for daily backups, with a retention policy that aligns with the organization’s disaster recovery objectives.
  • Restore Procedure: In the event of a catastrophic failure of the entire Controller cluster, recovery is achieved through a well-defined procedure. The process involves deploying three new, factory-default Controller virtual machines with the same IP addresses as the original cluster members. Then, from the CLI of one of the new nodes, the restore_config.py script is executed. This script takes the encrypted backup file and the passphrase as input, restores the configuration, and automatically reforms the three-node cluster, including the cluster VIP. This scripted, predictable process ensures a reliable and efficient recovery of the entire management plane.
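Because the restore procedure depends on the replacement nodes reusing the original addresses, it can help to check that precondition before starting. The sketch below is a hypothetical pre-flight check written for illustration; it is not an AVI tool, and the `restore_preflight` helper and the example addresses are assumptions.

```python
def restore_preflight(original_ips, new_ips, expected_nodes=3):
    """Return a list of problems that should be fixed before a restore.

    An empty list means the new nodes match the original cluster members.
    """
    problems = []
    if len(new_ips) != expected_nodes:
        problems.append(f"expected {expected_nodes} new nodes, found {len(new_ips)}")
    missing = sorted(set(original_ips) - set(new_ips))
    if missing:
        problems.append("original addresses not assigned: " + ", ".join(missing))
    unexpected = sorted(set(new_ips) - set(original_ips))
    if unexpected:
        problems.append("addresses not in original cluster: " + ", ".join(unexpected))
    return problems


# Hypothetical management addresses for the original and rebuilt clusters.
original = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
rebuilt = ["192.0.2.11", "192.0.2.12", "192.0.2.14"]
for problem in restore_preflight(original, rebuilt):
    print(problem)
```

Running a check of this kind as part of a disaster recovery runbook catches addressing mistakes before the restore script is invoked.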

6.2 Zero-Downtime Upgrade Procedures

One of the most significant challenges in OT security is the reluctance to perform software updates due to the risk of downtime. This often leads to systems running with known vulnerabilities for extended periods. The AVI platform’s architecture directly addresses this challenge by enabling zero-downtime upgrades for the data plane.

  • Decoupled Plane Upgrades: The separation of the control and data planes allows them to be upgraded independently. The Controller cluster is upgraded first, followed by the Service Engine groups. This staged approach isolates the upgrade process and minimizes risk.
  • Controller Cluster Upgrade: The upgrade process is initiated from the Controller UI. All three nodes in the cluster are upgraded simultaneously. During this process, which typically takes several minutes, the management plane (UI and API) will be unavailable. However, the data plane is unaffected. All existing Service Engines will continue to operate on their current software version and forward application traffic without any interruption.
  • Service Engine Group Upgrade (Rolling Upgrade): This is the key to achieving zero data plane downtime. Once the Controller cluster upgrade is complete, the administrator can initiate the upgrade for each Service Engine Group. The Controller orchestrates a rolling upgrade process, handling one SE at a time. For a virtual service configured in an Elastic HA mode (N+M or Active/Active), the Controller’s workflow is as follows:
    1. It places one SE in the group into a “maintenance mode.”
    2. It gracefully migrates all virtual services from that SE to other available SEs within the same group.
    3. Once the SE is no longer handling traffic, the Controller upgrades its software.
    4. After the upgrade is complete and the SE is verified as healthy, it is brought back into service.
    5. The process repeats for the next SE in the group until all SEs are running the new version.
    This automated, hitless process ensures that application traffic is never interrupted during the upgrade of the data plane components.
  • Upgrade Dry Run Feature: To further de-risk the upgrade process, AVI offers an “upgrade dry run” feature. This allows an administrator to test the upgrade in an isolated Docker container on a follower Controller node. The system performs a full upgrade of the configuration within the container and provides a detailed report of the outcome, all without impacting the live production environment. This provides a high degree of confidence that the actual upgrade will succeed.
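The rolling upgrade workflow described above can be sketched as a small simulation. This is a simplified model for illustration, not the Controller's actual orchestration logic: the `rolling_upgrade` function, the least-loaded placement choice, the SE and virtual service names, and the version labels are all assumptions, and the health verification in step 4 is assumed to pass.

```python
def rolling_upgrade(placement, versions, target_version):
    """Simulate upgrading one Service Engine (SE) at a time.

    placement: dict mapping SE name -> list of virtual service names
    versions:  dict mapping SE name -> current software version
    Both dicts are updated in place; the ordered list of steps is returned.
    """
    steps = []
    for se in list(placement):
        peers = [name for name in placement if name != se]
        if not peers:
            raise ValueError("a rolling upgrade needs at least two SEs in the group")
        # Steps 1-2: drain the SE by moving each service to the least-loaded peer.
        for vs in list(placement[se]):
            target = min(peers, key=lambda name: len(placement[name]))
            placement[se].remove(vs)
            placement[target].append(vs)
            steps.append(f"migrate {vs}: {se} -> {target}")
        # Step 3: upgrade only once the SE carries no traffic.
        if placement[se]:
            raise RuntimeError(f"{se} still has services placed on it")
        versions[se] = target_version
        steps.append(f"upgrade {se} to {target_version}")
        # Step 4: return the SE to service (health check assumed to pass).
        steps.append(f"return {se} to service")
    return steps


placement = {"se-1": ["hmi-portal", "historian"], "se-2": ["mes-api"]}
versions = {"se-1": "v-old", "se-2": "v-old"}
for step in rolling_upgrade(placement, versions, "v-new"):
    print(step)
```

At every point in the simulation each virtual service is placed on exactly one SE, which is the property that makes the upgrade hitless. The model also makes the capacity requirement visible: the remaining SEs must be able to absorb the load of the one being upgraded.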

The ability to perform zero-downtime upgrades of the data plane is a transformative capability for OT security. It directly resolves the long-standing conflict between maintaining availability and maintaining a strong security posture. By removing the operational barrier of required maintenance windows, this feature enables security teams to keep the critical enforcement points at the IT/OT boundary patched against the latest vulnerabilities, fundamentally improving the organization’s resilience to cyber threats.

Section 7: Conclusion and Strategic Recommendations

This blog post has detailed a comprehensive architecture for deploying VMware AVI as a secure application delivery platform within an industrial network, grounded in the established principles of the Purdue Model and the prescriptive controls of NIST SP 800-82 Revision 3. By strategically placing the AVI Service Engines in the Industrial DMZ (Level 3.5) and the AVI Controller Cluster in the Enterprise Zone (Level 4), this design creates a robust, defensible, and highly observable boundary between the IT and OT environments.

7.1 Summary of Architectural Benefits

The integrated design presented herein offers a multitude of benefits that address the core challenges of modern OT security:

  • Standards-Based Security: The architecture is not based on ad-hoc decisions but is directly aligned with the Purdue Model for logical segmentation and implements specific, auditable controls from NIST SP 800-82 R3, ensuring a design that is rooted in industry best practices.
  • Zero Trust Enforcement: By functioning as a full proxy and Policy Enforcement Point in the iDMZ, the AVI platform enforces Zero Trust principles, ensuring that no traffic from the IT network is implicitly trusted and that all access to OT applications is explicitly authenticated, authorized, and inspected on a per-session basis.
  • Scalability and Resilience: The software-defined, scale-out architecture provides the elasticity to handle dynamic traffic loads, while the comprehensive High Availability models for both the control and data planes ensure the resilience required for mission-critical OT applications. The platform’s ability to self-heal by automatically migrating services from failed SEs further enhances operational continuity.
  • Deep Observability and Rapid Response: The platform’s integrated analytics engine, application health scores, and anomaly detection capabilities provide unprecedented real-time visibility into the security and performance of OT applications. When combined with the ability to stream rich, structured logs to a central SIEM, this creates a powerful ecosystem for proactive threat hunting and rapid forensic analysis, significantly reducing Mean Time To Resolution (MTTR) for security incidents.
  • Operational Viability in OT Environments: The architecture’s support for zero-downtime, rolling upgrades of the data plane directly addresses the primary operational constraint in OT environments—the intolerance for downtime. This allows security teams to maintain a strong patch posture without disrupting critical industrial processes, resolving the classic conflict between security and availability.

7.2 Implementation Roadmap

A phased implementation is recommended to ensure a smooth, low-risk adoption of this architecture.

  • Phase 1: Foundational Deployment and Passive Monitoring.
    • Actions: Deploy the 3-node AVI Controller cluster in the Level 4 IT zone. Create the necessary network segments for the iDMZ. Deploy an initial Service Engine Group in the iDMZ. Onboard a single, non-critical OT application (e.g., a development historian web portal). Apply a WAF policy in Detection Mode only. Configure log forwarding to the SIEM.
    • Goal: Establish the core platform and begin collecting baseline traffic data and security logs to gain visibility without any risk of production impact.
  • Phase 2: Policy Tuning and Initial Enforcement.
    • Actions: Analyze the logs and analytics from Phase 1. Use the WAF’s learning engine and log recommendations to tune the policy for the pilot application, creating exceptions for any identified false positives. Once confident, switch the WAF policy for the pilot application to Enforcement Mode.
    • Goal: Validate the security policy, the tuning workflow, and the incident response procedures on a non-critical application, ensuring all operational processes are sound.
  • Phase 3: Scaled Rollout.
    • Actions: Begin onboarding additional, more critical OT applications onto the platform. Repeat the “Detect -> Tune -> Enforce” cycle for each new application or group of similar applications. Scale out the Service Engine Group as needed to accommodate the increased load.
    • Goal: Systematically expand the security perimeter to protect all critical web-based applications at the IT/OT boundary.
  • Phase 4: Continuous Improvement.
    • Actions: Establish a regular operational rhythm for reviewing security dashboards, investigating anomalies, and tuning policies. Implement a schedule for performing zero-downtime upgrades of the AVI platform to ensure it remains patched against the latest vulnerabilities.
    • Goal: Transition from an implementation project to a mature, ongoing security operation focused on continuous monitoring and improvement.

By following this strategic roadmap, an organization can methodically and securely implement a modern, standards-based architecture that enables the benefits of IT/OT convergence while robustly defending its most critical assets.


VCP-DV, VCP-NV, VCAP-DCD currently working at VMware in the PSO organization.
