Network World
@networkworld.com.web.brid.gy
Network World provides news and analysis of enterprise data center technologies, including networking, storage, servers and virtualization.

[bridged from https://networkworld.com/ on the web: https://fed.brid.gy/web/networkworld.com ]
Google agrees to acquire infrastructure builder Intersect to accelerate capacity development
Google parent Alphabet is taking steps to enable speedier addition of capacity to feed AI’s increasing demands with its announcement of plans to buy data center and energy company Intersect. This, it said, will meet intensive demand, increase energy reliability, reduce power delays, and support development of alternative energy sources. “AI infrastructure across the board appears to be at capacity, and there are questions whether upcoming investments in data centers will come to fruition on time,” said Thomas Randall, research lead at Info-Tech Research Group. “Alphabet acquiring Intersect quickly opens capacity that it expects [to require, to meet demand] from Gemini’s growing popularity, training, and embeddedness in nearly all Google searches.” ## Expands efforts to build power capacity With the $4.75 billion deal, expected to close in the first half of 2026, Google will absorb Intersect’s team and “multiple gigawatts” of energy and data projects already in development, including its first co-located data center and power site under construction in Texas. The companies will jointly continue work on those projects and build out new ones, according to Google. However, Intersect’s existing Texas assets and those in development in California are not part of the deal, and will continue as an independent entity. Intersect will “explore a range of emerging technologies” to help “increase and diversify” energy supply and bolster the tech giant’s data center investments, according to the announcement. The tech giant said it is committed to working with energy and utility companies to “unlock abundant, reliable, affordable energy supply” that supports data center buildouts. The acquisition is yet another step in Google’s efforts to build this capacity. Earlier this year, it announced a partnership with NV Energy that will bring 115MW of clean energy, via geothermal power, to Nevada’s grid. The company is also working with Energy Dome on CO2 battery innovations for long-duration energy storage, and is supporting carbon capture and storage (CCS) technologies at a gas power plant in partnership with Broadwing Energy. Through that initiative, Broadwing will capture and permanently store roughly 90% of its CO2 emissions, and Google has committed to buying most of the power it generates. With the Intersect purchase, Google is effectively signaling that the traditional model of relying on utilities and third-party energy developers isn’t dependable enough in the AI era, noted Sanchit Vir Gogia, chief analyst at Greyhound Research. It is a “recognition that Google’s constraint has moved upstream,” he said. Google doesn’t need to be schooled on data center design or real estate footprint, it “needs a way to bring megawatts online with predictability in a market where grid timelines, interconnection queues, substation upgrades, and permitting cycles are now slower than the compute deployment cycle.” It is also a risk internalization play, he noted; Alphabet wants to reduce its exposure to delays that occur when phased power delivery is late and “[leaves] capacity stranded and utilization depressed.” ## ‘Shifts the dependency’ Intersect adds time certainty, sequencing control, and a developer-style operating model that is not reliably provided by utilities and co-location contracts, Gogia noted. The company is “explicitly framed” around co-locating demand and dedicated gas and renewable generation. That model shifts the dependency, he noted. 
Instead of waiting for grid capacity to become available, then placing load into it, generation is placed alongside the load path and both are orchestrated together. “It is a very different approach to reliability and speed,” said Gogia. Alphabet’s support of numerous renewable and clean energy technologies indicates that the tech giant is looking to diversify and stabilize power capacity across different regions and grid conditions and reduce single point dependency. When one technology pathway stalls, another can carry part of the load, Gogia explained. “Intersect allows Google to coordinate the sequencing so that compute and power arrive together,” he said. ## Tie energy strategy to capacity planning Ultimately, the acquisition reduces Alphabet’s dependency on third-party energy partners, noted Info-Tech’s Randall. Energy is a fundamental component of the core infrastructure stack, but it is becoming more and more scarce as AI providers scoop up resources. “Data center managers should use this moment as an opportunity to tie energy strategy with capacity planning, sustainability goals, and competitive positioning,” Randall advised. Traditionally, Gogia added, CIO decision-making around build-versus-lease-versus-cloud was framed around cost, agility, security, and compliance. But the missing variable was certainty around power delivery. If hyperscalers are investing billions to bring generation and load together, enterprises should assume they will face the same constraints, he said, “just earlier and with less negotiating power.” Cloud abstracts energy risk, Gogia noted. When regions hit power and GPU ceilings, capacity gets rationed, timelines shift, and customers are nudged to alternate regions or delivery models. This can result in delayed deployments, and, often, higher costs because scarcity pricing differs. This reality is even more evident with on-premises builds, he observed; builders can complete a facility on time, yet still run under capacity for months or longer if power does not arrive as planned. CIOs must adjust their governance model, he advised, noting that energy due diligence should be part of the technology decision process. Site selection requires a “time to power” view, not just a network latency view. Contracts should provide greater transparency around capacity commitments, region expansion goals, and contingency plans. “Data center planning [duration] may get shorter because designs are modular and repeatable,” said Gogia. “Energy contracting will get longer because supply is constrained and approvals are slow.” ## Power shouldn’t be an afterthought Enterprises must develop a power risk strategy and be mindful of social license, he noted. Data center expansion is increasingly contested by communities and regulators, especially when it comes to its impact on the local grid. With its current move, Alphabet is scaling its energy supply while managing perceptions that local customers will shoulder the costs. “Enterprises should learn from that,” said Gogia. “If you are planning a major facility or even a large colocation footprint, stakeholder management is no longer optional. It is part of delivery.” Google’s acquisition of Intersect also signals a shift in vendor strategies. That could reshape pricing, availability, and negotiation dynamics for enterprise buyers, said Gogia. Power shouldn’t be treated as an afterthought, he advised; that will lead to slipped timelines. 
Assume that some capacity will be constrained and plan for alternatives far before projects are underway. “This is the decade where kilowatts, permits, and politics quietly decide whether your ‘cloud first’ roadmap actually lands on time,” Gogia noted.
www.networkworld.com
December 23, 2025 at 1:29 PM
Top 5 enterprise tech priorities for 2026
It’s the season for “looking ahead to next year” articles that tell some group or another what they should be doing or what some “expert” says they should do. Let’s take a different slant and focus on what one group — arguably the most important tech buyer group — actually says they’re going to do. I’ve collected 284 comments on tech priorities for 2026 from enterprises, and here are the top five. ### 1. AI optimization and uncertainty Let’s start with a priority that embraces two unsurprising sentiments. The first is that the top priority, cited by 211 of the enterprises, is to “deploy the hardware, software, data, and network tools needed to optimize AI project value.” The second is that there is significant uncertainty regarding just what those tools are and even whether they can be confidently identified. The great majority of the 211 said that their focus is on the agent form of AI, but recall from this blog that enterprises have always seen three distinct agent models: the “interactive” one workers use much like we use chatbot AI today, the “workflow” model that sits in an application workflow like another piece of software, and the “embedded” form that’s built into an application. Enterprises think they have a handle on workflow AI, in terms of what it runs on and what it connects to. But even there, many enterprises say that a new AI element in the workflow is useful because it integrates insights drawn from a broader set of sources, which might in theory be anywhere, data-wise. And they’re even less sure about the impact of the other two models. Embedded AI agents are mostly used either in business analytics or to support network or IT operations, and both these missions seem to put AI in a simple extension role, so they aren’t generally expected to require a lot of rethinking on infrastructure or practices. Not so, say enterprises. AI agents are, in general, data magnets. They want more information, they want more consistent information, and they need some measure of data value or weight in making decisions. For example, a netops AI tool might want information on seasonal sales patterns to forecast traffic better. Without broader information sources, it’s harder to build enough value to make a business case, and AI use of data is often implicit in how the model works, whereas traditional applications use data because the developer called for it. How do you know what AI is going to call for? The interactive form of AI poses the greatest risk. While enterprises see this model as limited to perhaps ten or fifteen percent of workers, those with the highest unit value of labor, it’s nearly impossible to predict what resources a given worker interaction might demand. “One question could use as much compute, as much data, and generate as much traffic, as a week’s running of a normal application,” one planner complained, noting that this sudden resource drawdown can actually impact IT performance across a big chunk of the business. And interactive agents might need broad access permissions, raising data governance concerns. Enterprises agree you **never** want to give an agent access to everything. First, obviously, this would create massive privacy and governance risks, but second, there’s a significant risk generated by any sort of redundancy in the data. It goes beyond simple deduplication, too. Enterprises recommend against using a mixture of detail and summary data, including “derived” data. There’s a risk that working on mixed detail levels can bias results accidentally.
“Twenty summaries of the same detail data can look to AI like twenty other sources,” one enterprise noted. IBM’s decision to buy Confluent, a known player in building data-flow applications, may be linked to the data access control and governance issue. ### 2. Cloud backups The second priority is also unsurprising given recent news. Of the 284 enterprises that commented, 173 said that they needed a strategy to back up cloud components in the event of cloud outages. This, they say, is a lot more complicated than senior management thinks. First, you have to decide just what really needs to be backed up. “You can’t totally immunize yourself against a massive cloud or Internet problem,” say planners. Most cloud outages, they note, resolve in a maximum of a few hours, so you can let some applications ride things out. When you know the “what,” you can look at the “how.” Is multi-cloud the best approach, or can you build out some capacity in the data center? Enterprises note that building in resilience in any form may require redesigning some applications to make cloud-hosted elements portable, and that can also mean looking at where application data is stored and how access to it is connected. ### 3. Infrastructure simplification Priority three is managing the technical complexity of infrastructure, cited by 139 enterprises. “We have too many **things** to buy and to manage,” one planner said. “Too many sources, too many technologies.” Nobody thinks they can do some massive fork-lift restructuring (there’s no budget), but they do believe that current projects can be aligned to a long-term simplification strategy. This, interestingly, is seen by over a hundred of the group as reducing the number of vendors. They think that “lock-in” is a small price to pay for greater efficiency and for reduced complexity in operations, integration, and fault isolation. This is the biggest shift against multi-vendor or open infrastructure I’ve ever seen. ### 4. Prioritize governance The close number four priority is more administrative than infrastructure-related; 124 enterprises in our group said they needed to “totally revamp governance.” Yes, AI is a big factor in this, but so are the elastic-hosting model of cloud and multi-cloud, the “sovereignty” issues associated with operating across multiple jurisdictions, and the increasingly chaotic nature of regulations. The percentage of enterprises that say they need some formal “government affairs” input to management practices has increased from 12% in 2020 to 47% for 2026. For example, EU cloud and AI sovereignty concerns impact plans for both AI and cloud application resilience. The biggest problem, these enterprises say, is that governance has tended to be applied to projects at the planning level, meaning that absent major projects, governance tended to limp along based on aging reviews. Enterprises note that, as with AI, orderly expansions in how applications and data are used can introduce governance issues, just as changes in laws and regulations do. AI complicates this because it’s difficult or impossible to know just what data AI is accessing, if there are no filters on data availability. All this is a governance challenge, but it can pale in comparison to the fact that companies aren’t used to even thinking about governance absent a project framework. Do you need to create “governance projects?” If so, how are they justified and funded? Where there are hard changes in law or regulations, there are procedures, but not so much with other challenges.
AI agents, even workflow agents, can creep into governance problems as usage grows, for example. ### 5. Cost management The final priority on our list, with 108 enterprises citing it, is in many ways a barrier to fulfilling any of the other goals they identify: Do more for less. Of our 284 enterprises, 226 said that they were under more budget pressure for 2026, and only 9 said they had less pressure (for the rest, pressures were the same). It’s interesting, though, that number five on the priorities list is the lowest scored for cost management since 2008/2009. The interesting thing about this particular priority is that, unlike prior years where the “cost” being managed was presumed to be the capital cost of the technologies involved, the equipment and software, the focus for 2026 is the total cost, what would be classified as “total cost of ownership” or TCO. This would be easy, or at least possible, in the context of traditional project thinking, but so many of these priorities blur the lines between “projects” that require review, justification, and approval, and normal day-to-day business decisions that usually dodge much of that formality. How do you assess the TCO of AI efficiency optimization overall, or cloud application resilience, or governance​? Overall, the comments from enterprises suggest that while they’re prioritizing many expected issues, they’re also dealing with more subtle ones, and even on topics like AI, they’re taking a different slant than many had expected. It’s more about their ecosystem than the individual parts, and that should make 2026 an interesting year with a lot of important trends to watch!
www.networkworld.com
December 22, 2025 at 2:07 PM
WatchGuard fixes ‘critical’ zero-day allowing firewall takeover
WatchGuard has issued an urgent patch alert for its Firebox firewall appliances after discovering a critical-rated vulnerability that is under exploit by threat actors. Tracked as CVE-2025-14733, with a CVSS score of 9.3, the flaw is an Out-of-bounds Write vulnerability affecting the iked process, a WatchGuard Fireware OS component responsible for the IKEv2 key exchange in IPSec VPNs. According to the WatchGuard advisory, this weakness could “allow a remote unauthenticated attacker to execute arbitrary code,” taking control of the appliance through remote code execution (RCE) without having to log in. Because it was under attack before a patch was made available by WatchGuard on December 18, this makes CVE-2025-14733 a bona fide zero-day vulnerability. The first job for admins should therefore be to check Firebox appliances for signs of current or recent compromise. WatchGuard’s advisory lists four IP addresses associated with exploitation; outbound traffic to them is “a strong indicator of compromise,” while inbound connections from them “could indicate reconnaissance efforts or exploit attempts,” the advisory said. With logging enabled, other strong indicators were an IKE_AUTH request log message with an abnormally large CERT payload greater than 2,000 bytes, or evidence of an iked process hang, the company said. Affected Fireware OS versions are 2025.1 up to and including 2025.1.3, 12.0 up to and including 12.11.5, and legacy 11.10.2 up to and including 11.12.4_Update1. The resolved versions are 2025.1.4, 12.11.6, 12.5.15 (T15 & T35 models), and 12.3.1_Update4 (B728352) for the FIPS-certified release. There is no fix for 11.x, which is considered end of life. Importantly, WatchGuard warned, patching may not be enough: “If the Firebox was previously configured with the mobile user VPN with IKEv2 or a branch office VPN using IKEv2 to a dynamic gateway peer, and both of those configurations have since been deleted, that Firebox may still be vulnerable if a branch office VPN to a static gateway peer is still configured.” And some admins have even more post-patching tasks to perform, it said, noting, “in addition to installing the latest Fireware OS that contains the fix, administrators that have confirmed threat actor activity on their Firebox appliances must take precautions to rotate all locally stored secrets on vulnerable Firebox appliances.” ## Deja vu In September, WatchGuard patched a similar Firebox vulnerability, CVE-2025-9242, also affecting the iked VPN configuration and given a CVSS score of 9.3. At the time, WatchGuard said there were no reports of active exploitation, but by October, the company had revised this assessment after exploitation attempts were detected. This is a reminder not to read initial vulnerability assessments for this type of infrastructure too optimistically — exploitation is frequently detected after a flaw has been made public. Firewalls and VPNs are major targets for cybercriminals, and every significant vulnerability in them represents a clear and present cyber security risk. Unfortunately, the evidence shows that some WatchGuard customers don’t patch vulnerabilities as quickly as they should. In October, a scan by The Shadowserver Foundation found that over 71,000 Firebox appliances had not yet been patched for CVE-2025-9242, including 23,000 in the US. Despite its zero-day status, it’s likely to be a similar story for CVE-2025-14733. 
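For admins working through those checks, the sketch below (a rough helper, not WatchGuard tooling) scans exported Firebox logs for the two indicators described above: connections involving the advisory’s listed IP addresses and IKE_AUTH log lines reporting an oversized CERT payload. The IP addresses shown are placeholders for the four published in the advisory, and the log-line pattern is an assumption about the export format; adjust both to match your environment.

```python
#!/usr/bin/env python3
"""Rough triage helper for Firebox indicators of compromise.

Minimal sketch: SUSPECT_IPS are placeholders for the advisory's four
addresses, and the regexes assume plain-text syslog exports containing
"IKE_AUTH" lines with a "CERT payload ... N bytes" field.
"""
import re
import sys

SUSPECT_IPS = {"192.0.2.10", "192.0.2.11", "198.51.100.20", "198.51.100.21"}  # placeholders
CERT_PAYLOAD_THRESHOLD = 2000  # bytes; advisory flags CERT payloads larger than 2,000 bytes

ip_re = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
cert_re = re.compile(r"IKE_AUTH.*CERT payload.*?(\d+)\s*bytes", re.IGNORECASE)

def scan(path: str) -> None:
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, 1):
            hits = SUSPECT_IPS.intersection(ip_re.findall(line))
            if hits:
                print(f"{path}:{lineno}: traffic involving suspect IP(s) {sorted(hits)}")
            m = cert_re.search(line)
            if m and int(m.group(1)) > CERT_PAYLOAD_THRESHOLD:
                print(f"{path}:{lineno}: IKE_AUTH with oversized CERT payload ({m.group(1)} bytes)")

if __name__ == "__main__":
    for log_file in sys.argv[1:]:
        scan(log_file)
```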
Slow or reluctant patching might also explain why Russian-aligned ‘Sandworm’ hackers were recently discovered to be targeting WatchGuard Firebox and XTM appliances by exploiting CVEs dating back several years. _This article originally appeared on CSO Online._
www.networkworld.com
December 20, 2025 at 8:38 AM
Snowflake software update caused 13-hour outage across 10 regions
A software update knocked out Snowflake’s cloud data platform in 10 of its 23 global regions for 13 hours on December 16, leaving customers unable to execute queries or ingest data. Customers saw “SQL execution internal error” messages when trying to query their data warehouses, according to Snowflake’s incident report. The outage also disrupted Snowpipe and Snowpipe Streaming file ingestion, and data clustering appeared unhealthy. “Our initial investigation has identified that our most recent release introduced a backwards-incompatible database schema update,” Snowflake wrote in the report. “As a result, previous release packages errantly referenced the updated fields, resulting in version mismatch errors and causing operations to fail or take an extended amount of time to complete.” The outage affected customers in Azure East US 2 in Virginia, AWS US West in Oregon, AWS Europe in Ireland, AWS Asia Pacific in Mumbai, Azure Switzerland North in Zürich, Google Cloud Platform Europe West 2 in London, Azure Southeast Asia in Singapore, Azure Mexico Central, and Azure Sweden Central, the report said. Snowflake initially estimated service would be restored by 15:00 UTC that day, but later revised it to 16:30 UTC as the Virginia region took longer than expected to recover. The company offered no workarounds during the outage, beyond recommending failover to non-impacted regions for customers with replication enabled. It said it will share a root cause analysis (RCA) document within five working days. “We do not have anything to share beyond this for now,” the company said. ## Why multi-region architecture failed to protect customers The type of failure that hit Snowflake — a backwards-incompatible schema change causing multi-region outages — represents a consistently underestimated failure class in modern cloud data platforms, according to Sanchit Vir Gogia, chief analyst at Greyhound Research. Schema and metadata sit in the control plane layer that governs how services interpret state and coordinate behavior across geographies, he said. “Regional redundancy works when failure is physical or infrastructural. It does not work when failure is logical and shared,” Gogia said. “When metadata contracts change in a backwards-incompatible way, every region that depends on that shared contract becomes vulnerable, regardless of where the data physically resides.” The outage exposed a misalignment between how platforms test and how production actually behaves, Gogia said. Production involves drifting client versions, cached execution plans, and long-running jobs that cross release boundaries. “Backwards compatibility failures typically surface only when these realities intersect, which is difficult to simulate exhaustively before release,” he said. The issue raises questions about Snowflake’s staged deployment process. Staged rollouts are widely misunderstood as containment guarantees when they are actually probabilistic risk reduction mechanisms, Gogia said. Backwards-incompatible schema changes often degrade functionality gradually as mismatched components interact, allowing the change to propagate across regions before detection thresholds are crossed, he said. 
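To make that failure class concrete, the toy sketch below (purely illustrative Python, not Snowflake internals; the record and field names are invented) shows how a backwards-incompatible rename in a shared metadata contract breaks any component still running the previous release, no matter which region it runs in.

```python
"""Toy illustration of a backwards-incompatible metadata change.

Nothing here is Snowflake code; the "contract" records and field names are
invented. It simply shows why renaming a shared field breaks older readers
that still reference the previous schema, wherever they happen to run.
"""

OLD_CONTRACT = {"table_id": 42, "cluster_key": "order_date"}                   # what v1 readers expect
NEW_CONTRACT = {"table_id": 42, "clustering_spec": {"keys": ["order_date"]}}   # v2 renames the field

def v1_reader(metadata: dict) -> str:
    # A previous-release component still referencing the old field name.
    return f"clustering on {metadata['cluster_key']}"

def v2_reader(metadata: dict) -> str:
    return f"clustering on {metadata['clustering_spec']['keys']}"

print(v1_reader(OLD_CONTRACT))      # works: old reader, old contract
print(v2_reader(NEW_CONTRACT))      # works: new reader, new contract
try:
    print(v1_reader(NEW_CONTRACT))  # version mismatch: old reader, new contract
except KeyError as exc:
    print(f"v1 reader failed with KeyError: {exc}  <- the class of failure described above")
```

The usual defense is an expand-and-contract migration: add the new field alongside the old one, move readers over, and only then drop the old field, which is essentially the compatibility governance Gogia is calling for.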
Snowflake’s release documentation describes a three-stage deployment approach that “enables Snowflake to monitor activity as accounts are moved and respond to any issues that may occur.” The documentation states that “if issues are discovered while moving accounts to a full release or patch release, the release might be halted or rolled back,” with follow-up typically completed within 24 to 48 hours. The December 16 outage affected 10 regions simultaneously and lasted well beyond that window. “When a platform relies on globally coordinated metadata services, regional isolation is conditional, not absolute,” Gogia said. “By the time symptoms become obvious, rollback is no longer a simple option.” Rollback presents challenges because while code can be rolled back quickly, state cannot, Gogia said. Schema and metadata changes interact with live workloads, background services, and cached state, requiring time, careful sequencing, and validation to avoid secondary corruption when reversed. ## Security breach and outage share common weakness The December outage, combined with Snowflake’s security troubles in 2024, should fundamentally change how CIOs define operational resilience, according to Gogia. In mid-2024, approximately 165 Snowflake customers were targeted by criminals using stolen credentials from infostealer infections. “These are not separate incidents belonging to different risk silos. They are manifestations of the same underlying issue: control maturity under stress,” Gogia said. “In the security incidents, stolen credentials exploited weak identity governance. In the outage, a backwards-incompatible change exploited weak compatibility governance.” CIOs need to move beyond compliance language and uptime averages to ask behavioral questions about how platforms behave when assumptions fail, Gogia said. “The right questions are behavioral. How does the platform behave when assumptions fail. How does it detect emerging risk. How quickly can blast radius be constrained.” _This article first appeared on InfoWorld._
www.networkworld.com
December 20, 2025 at 8:39 AM
HPE OneView vulnerable to remote code execution attack
A maximum severity remote code execution vulnerability in the Hewlett Packard Enterprise (HPE) OneView network and systems management suite is “bad” and needs to be patched immediately, says a cybersecurity expert. “Vendors typically downplay the severity of a vulnerability,” says Curtis Dukes, executive VP for security best practices at the Center for Internet Security, “but HPE did not – it’s a 10.” The vulnerability is remotely executable by an unauthenticated user, he added, and it impacts every recent version of the suite. On top of that, he pointed out, OneView is a central manager of IT infrastructure in organizations. “For these reasons, the patch should be implemented immediately,” Dukes said. “Adversaries, nation-state, and criminal gangs alike know there is a window of opportunity and are likely working on an exploit.” HPE says in its advisory that the vulnerability, CVE-2025-37164, affects all versions between 5.20 and 10.20. It can be resolved by applying a security hotfix, which must be reapplied after an appliance upgrade from HPE OneView version 6.60.xx to 7.00.00, as well as after any HPE Synergy Composer reimage. HPE offers separate hotfixes for the HPE OneView virtual appliance and HPE Synergy Composer. The advisory adds that any third-party security patches that are to be installed on systems running HPE software products should be applied in accordance with the customer’s patch management policy. Asked for comment, an HPE spokesperson said the company has nothing to say beyond its advisory, other than to urge admins to download and install the patches as soon as possible. Jack Bicer, director of vulnerability research at Action1, said that because this vulnerability can be exploited without authentication or any user interaction, it is “an extremely severe security issue. There are no available workarounds, so the patch should be applied immediately. Until the patch can be applied, restrict network access to the OneView management interface to trusted administrative networks only.” HPE describes OneView as a solution that simplifies infrastructure lifecycle management across compute, storage, and networking through a unified API. It allows admins to create a catalogue of workload-optimized infrastructure templates so more general IT staff can rapidly and reliably provision resources. These templates can quickly provision physical, virtual, and containerized systems, setting up BIOS settings, local RAID configuration, firmware baseline, shared storage and more. HPE says software-defined intelligence allows IT to run multiple applications simultaneously with repeatable templates that ensure high reliability, consistency, and control. The vendor also says the embedded automation speeds provisioning and lowers operating expenses. The most recent major vulnerability in OneView was revealed in June: CVE-2025-37101, a local elevation of privilege issue which relates specifically to OneView for VMware vCenter. If exploited, an attacker with read-only privilege could upgrade their access to allow them to perform admin actions. _This article originally appeared on CSO Online._
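For admins taking Bicer’s advice, a first step is knowing which OneView appliances answer on the management network at all. The sketch below is a minimal probe, assuming the appliance exposes the documented unauthenticated GET /rest/version endpoint (which reports supported REST API versions); it does not confirm whether the hotfix is installed, which still requires checking the appliance software version against HPE’s advisory through the UI or an authenticated session.

```python
"""Quick reachability probe for OneView appliances.

Minimal sketch, assuming the unauthenticated GET /rest/version endpoint is
available. Hostnames are supplied on the command line.
"""
import sys
import requests  # pip install requests

def probe(host: str) -> None:
    url = f"https://{host}/rest/version"
    try:
        # verify=False only because many appliances use self-signed certs; prefer a CA bundle.
        resp = requests.get(url, timeout=5, verify=False)
        resp.raise_for_status()
        data = resp.json()
        print(f"{host}: reachable, REST API versions {data.get('minimumVersion')} to {data.get('currentVersion')}")
    except requests.RequestException as exc:
        print(f"{host}: not reachable or not OneView ({exc})")

if __name__ == "__main__":
    for appliance in sys.argv[1:]:
        probe(appliance)
```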
www.networkworld.com
December 19, 2025 at 4:38 AM
Breaking the ransomware kill chain: Why distributed lateral security is no longer optional
Ransomware attacks in 2025 have caused business operations to cease for weeks and months at a time, resulting in massive financial losses in organizations around the globe in sectors such as retail, manufacturing, and healthcare. These major breaches go well beyond the purview of the security team alone. They demand boardroom attention and a fundamental rethinking of enterprise defense strategies. Much of the urgency stems from how artificial intelligence (AI) has rapidly transformed the threat landscape. AI-powered autonomous attacks now probe enterprise networks with minimal human intervention, discovering thousands of potential entry points where human attackers might find only a handful. The automated nature of these attacks means they’re finding far more vulnerabilities much faster. What happens after infiltration hasn’t changed: lateral movement, hunting for high-value assets, and initiating the ransom process. But AI makes the need for proper security hygiene even more pronounced. Enterprises need to take a different approach to security. Traditional perimeter-based security assumes a fortress model, with strong walls that protect sensitive internal assets from external threats. But modern enterprises deploy distributed workloads, containers, and dynamic infrastructure that render static perimeter defenses obsolete. Once attackers breach the perimeter, they can move laterally (freely) through flat (unsegmented) networks like burglars in an empty mansion. **Breaking the ransomware kill chain** Breaking the ransomware kill chain requires distributed security controls at multiple stages. During initial infiltration, intrusion prevention capabilities must operate wherever vulnerabilities exist, such as across private clouds, virtual desktop environments, and application layers. This distributed approach is critical, because a single Java or Linux vulnerability might expose dozens of applications simultaneously across hundreds of servers. Macro- and micro segmentation are the crucial second line of defense. By creating virtual barriers at the workload and hypervisor level, organizations prevent lateral movement after initial compromise. Rather than allowing attackers to roam freely once inside, macro- and micro segmentation contain any threats, limiting damage and buying security teams critical response time. However, implementation requires discipline. Organizations often mistake micro segmentation’s ultimate goal for the first step, attempting to jump directly to granular application-level controls. The more effective path progresses systematically, guided by built-in deployment tooling in the firewall itself: assess the environment, segment shared infrastructure services, establish zone-based protections, and then evolve toward application-level micro segmentation. Network detection and response (NDR) provides the third critical capability. As attackers leave behavioral signatures while moving laterally, AI-powered integrated threat defense can correlate these indicators across the environment, identifying malicious activity before data exfiltration and encryption begin. Locking down protocols such as Remote Desktop Protocol becomes essential. The operational reality is that security tool sprawl undermines even sophisticated strategies. Having multiple disconnected solutions creates deployment delays, policy management nightmares, and incomplete coverage across the attack chain. 
Organizations purchase numerous tools but deploy only a fraction, across only a subset of applications, leaving dangerous gaps. The solution lies in integrated software-defined security that deploys at the data center private cloud level, where applications and data reside. Exemplifying this approach is VMware vDefend, a unified stack that provides distributed firewall capabilities for macro- and micro segmentation with automated deployment workflows as well as advanced threat detection and prevention that automatically extend as environments scale. By embedding security into the virtualization and Kubernetes layer with policy mobility and dynamic workload protection, organizations gain comprehensive visibility without IP address complexity or deployment delays. Modern ransomware demands modern defenses — not more disparate tools but smarter architecture that breaks the kill chain before attacks succeed. Click to learn more about how VMware vDefend can help your security approach meet AI-powered threats. * * *
www.networkworld.com
December 18, 2025 at 11:34 PM
Cisco confirms zero-day exploitation of Secure Email products
Cisco has warned that a China-linked hacking group is actively exploiting a previously unknown vulnerability in its Secure Email appliances to gain persistent access, forcing affected organizations to consider disruptive rebuilds of critical security infrastructure while patches remain unavailable. Cisco Talos said the campaign has been active since at least late November, raising concerns for security leaders about unseen compromise and how far incident response efforts may need to extend beyond the affected devices. The vulnerability affects Cisco Secure Email Gateway, Cisco Secure Email, and Web Manager appliances running AsyncOS, but only in configurations where the Spam Quarantine feature is enabled and exposed to the internet, according to Cisco. The company said there is currently no patch available, and that rebuilding affected appliances is the only way to fully remove the attackers’ persistence mechanisms in confirmed compromise cases. ## Enterprise exposure and risk scope Cisco said that systems where the Spam Quarantine feature is not enabled are not affected, but analysts said this does not necessarily reduce enterprise risk. “This vulnerability may remain a high-risk issue because affected appliances typically sit in privileged network positions, even though the feature is not enabled by default,” said Sunil Varkey, a cybersecurity analyst. It is also not clear how many enterprises may have enabled the feature in production environments, said Keith Prabhu, founder and CEO of Confidis. “The Spam Quarantine provides a way for administrators to review and release ‘false positives,’ i.e., legitimate email messages that the appliance has deemed to be spam,” Prabhu said. “In today’s remote support and 24×7 operations, it is entirely possible that this feature has been enabled by many enterprises.” Akshat Tyagi, associate practice leader at HFS Research, said the bigger concern is the nature of the target. Unlike a user laptop or a standalone server, email security systems sit at the center of how organizations filter and trust email traffic, meaning attackers would be operating inside infrastructure designed to stop threats rather than receive them. “The fact that there’s no patch yet elevates the risk further,” Tyagi said. “When the vendor’s guidance is to rebuild appliances rather than clean them in place, it tells you this is about persistence and control, not just a one-off exploit.” Varkey added that exploitation may not require direct internet exposure and could also occur from internal or VPN-reachable networks, advising organizations to close or restrict access to affected management ports temporarily. ## Rebuild guidance and operational tradeoffs Cisco has said that wiping and rebuilding appliances is currently required in cases where compromise has been confirmed. “From a security standpoint, it is indeed the right call,” Tyagi said. “When there’s a risk that attackers have embedded themselves deep in a system, patching alone won’t solve the issue. Rebuilding is the only way to be confident the threat is fully removed.” But Varkey said that this may not be a viable option for many organizations, as it introduces business risks, including downtime, misconfiguration, and the potential reintroduction of persistence through contaminated backups. Enterprises will need to balance remediation speed with business continuity while relying on compensating controls to limit exposure. 
“Cisco Secure Email Gateway, Cisco Secure Email, and Web Manager are critical components of the email infrastructure,” Prabhu said. “Organizations would need to plan this activity in a way that minimizes downtime, but at the same time reduces the time window of compromise. In the interim, they could use other security measures like blocking ports on the firewall to limit exposure.”
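As a quick way to act on that interim advice, the sketch below checks whether a quarantine or management interface answers from an untrusted vantage point (run it from outside the perimeter, for example from a cloud VM). The hostnames and ports are placeholders; substitute the interfaces and ports actually configured on your appliances.

```python
"""Check whether an email-security quarantine or management interface is
reachable from an untrusted network.

Minimal sketch: TARGETS below are placeholders, not Cisco defaults.
"""
import socket

TARGETS = [
    ("mail-gateway.example.com", 443),   # placeholder: quarantine web UI
    ("mail-gateway.example.com", 8443),  # placeholder: admin interface
]

def is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host, port in TARGETS:
    state = "EXPOSED (reachable)" if is_open(host, port) else "not reachable"
    print(f"{host}:{port} -> {state}")
```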
www.networkworld.com
December 18, 2025 at 11:34 PM
The state of open-source networking: The foundations and technologies driving today’s networks
### **3 key facts about open-source networking** > * 92% of organizations view open source networking as critical to their future infrastructure plans. * Projects like SONiC enable hardware independence, helping organizations achieve up to 50% reduction in TCO. * eBPF-based CNI implementations like Cilium are replacing older protocols, leading to better performance in Kubernetes environments. Two decades ago, Linux emerged as a mainstream operating system and was perhaps the most well-known open source technology. What has emerged around Linux as part of the broader networking ecosystem are a series of open source technologies and projects that have become foundational to modern connectivity. Open source networking has gone mainstream: 92% of organizations now view it as critical to their future, according to the Linux Foundation’s “2025 Open Source Networking Study: The Role and Value of Open Source in the Networking Industry’s Software Stack.” Understanding this landscape is no longer optional for network professionals. This Linux Foundation guide maps the active open source projects that power modern infrastructure — from hyperscale cloud data centers to 5G networks. These projects collectively represent millions of dollars in shared innovation and are deployed in production at the world’s largest networks. ## Why open source networking matters Three forces are encouraging organizations to use open source networking software: **Breaking vendor lock-in.** Projects like SONiC let organizations run the same software stack across hardware from hundreds of vendors. Organizations report 40-50% reductions in total cost of ownership by eliminating licensing fees and using commodity hardware. **Faster innovation.** There are over 1,300 contributors across major projects, which means that those projects deliver early access to emerging technologies like AI integration and 5G without waiting for vendor roadmaps. 74% of organizations see open source as foundational to AI success in networks. **Real interoperability.** Standardized APIs and interfaces mean you can mix and match components from different projects. ## Open source foundations drive development and governance While there can be individual efforts led by companies, the most influential and impactful open-source networking efforts tend to fit into one of the major open source foundations. The Linux Foundation itself is a ‘foundation of foundations’ and is home to multiple groups including: **Linux Foundation Networking (LFN)** hosts the largest portfolio of projects. Formed in 2018 by merging the OPNFV, ONAP, OpenDaylight and FD.io projects, it now includes over a dozen projects spanning software-defined networking (SDN) controllers, orchestration, data plane acceleration, and emerging technologies. Recent 2025 additions include Duranta (Open RAN), Essedum (AI networking) and Project Salus (responsible AI). LFN provides the governance structure for projects like OpenDaylight, ONOS, ONAP, DPDK, VPP and Nephio. **Cloud Native Computing Foundation (CNCF)** focuses on containerized applications. With over 700 member organizations, CNCF graduated the projects that now run most Kubernetes networking: Cilium, Istio, Linkerd, Envoy, and CoreDNS. The foundation’s incubating and sandbox projects include CNI (container networking interface) implementations like Contour, Kube-OVN, and Antrea. **OpenInfra Foundation** (formerly OpenStack Foundation) maintains OpenStack Neutron, the network-as-a-service component for cloud platforms.
Neutron continues serving enterprise private cloud deployments worldwide. **SONiC Foundation** governs the data center network operating system running Microsoft Azure’s entire global network. Launched in 2020 when SONiC moved to Linux Foundation governance, the foundation coordinates development across over 850 community members and ensures hardware compatibility with hundreds of vendors. SONiC represents the most widely deployed open source network OS in hyperscale environments. There are a few efforts outside of the Linux Foundation, including the following: **Open Compute Project (OCP)** drives open hardware specifications for data center infrastructure. Founded by Facebook in 2011, OCP created the hardware ecosystem that enables software-defined networking at scale. The project’s switch specifications provide the foundation for SONiC and other network operating systems, with contributions from Microsoft, Google, Meta, and hundreds of hardware vendors. **Apache Software Foundation (ASF)** hosts a number of critical projects, including the **Apache HTTP Server**, the original widely used open source web server for providing HTTP services. ## Key projects by use case Open source networking projects span the entire landscape of networking deployment with multiple core use cases: ### **Data center infrastructure** * **SONiC** dominates modern data center deployments as the leading open source network operating system, running on white-box switches from hundreds of vendors through the Switch Abstraction Interface (SAI). The containerized microservices architecture enables hardware independence while maintaining production-grade reliability. * **DPDK** (data plane development kit) bypasses the Linux kernel’s network stack by moving packet processing into userspace, eliminating context switches and enabling direct memory access to network interfaces. This architecture delivers order-of-magnitude performance improvements compared to traditional kernel-based networking. * **VPP** (vector packet processing) from FD.io builds on DPDK with a graph-based processing model that handles multiple packets simultaneously through the same code path rather than processing packets individually. * **Open vSwitch** remains the production standard for virtual switching. ### **Kubernetes networking** * **Cilium** leads the Container Network Interface implementations with an eBPF-based architecture that replaces iptables for better performance. * **Service mesh options** include Istio and Linkerd. Istio offers comprehensive features built on the Envoy proxy with zero-trust networking, mTLS encryption, advanced traffic management and extensive observability. Linkerd emphasizes simplicity with a lightweight Rust-based proxy. * **Envoy** serves as the foundational L7 proxy underlying most service meshes and API gateways. ### **Automating network operations** * **NAPALM** provides the industry-standard abstraction layer for multivendor network automation (a short example follows the project list below). * **NetBox** serves as the network source of truth combining IPAM and DCIM functionality. * **Batfish** analyzes configurations pre-deployment by building behavioral models. It catches errors before they hit production without needing device access. ### **Operating service provider networks** * **ONAP** orchestrates complete NFV and cloud-native infrastructure for service providers. * **FRRouting** provides production routing capabilities with full protocol support.
* **Akraino Edge Stack** addresses edge computing requirements. We break down more than 40 open source projects in these categories: SDN and orchestration, data plane and performance, cloud-native networking, routing and switching, network automation, testing and validation, network operations, edge and telco, and OpenStack.
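As promised above, here is a short sketch of the NAPALM abstraction layer in use. It is a minimal example rather than a production script: the hostname, credentials, and the “eos” driver are placeholders, and the same calls work against other vendors by swapping the driver name.

```python
"""Minimal NAPALM example: the same few calls work across vendors.

Hostname, credentials, and the "eos" driver below are placeholders.
"""
from napalm import get_network_driver  # pip install napalm

driver = get_network_driver("eos")  # or "ios", "nxos_ssh", "junos", ...
device = driver(hostname="198.51.100.1", username="admin", password="secret")

device.open()
facts = device.get_facts()            # normalized fields across vendors
interfaces = device.get_interfaces()  # per-interface state, also normalized
device.close()

print(facts["hostname"], facts["os_version"])
print(f"{len(interfaces)} interfaces discovered")
```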
www.networkworld.com
December 17, 2025 at 8:49 PM
Cisco defines AI security framework for enterprise protection
Cisco has rolled out an AI Security and Safety Framework it hopes will help customers and the industry get out in front of what is expected to be a potential flood of adversarial threats, content safety failures, model and supply chain compromise, and agentic behavior problems as AI becomes an integral part of the enterprise network. With AI, humans, organizations, and governments cannot adequately comprehend or respond to the implications of such rapidly evolving technology and the threats that ensue, wrote Amy Chang, leader, threat and security research in Cisco’s AI Software and Platform group, in a blog about the new Integrated AI Security and Safety Framework. “Organizations are deploying systems whose behavior evolves, whose modes of failure are not fully understood, and whose interactions with their environment are dynamic and sometimes unpredictable,” Chang stated. The framework is Cisco’s bid to define the common language for AI risk before attackers and regulators do, according to the vendor. The framework represents one of the first holistic attempts to classify, integrate, and operationalize the full range of AI risks. This vendor-agnostic framework provides a structure for understanding how modern AI systems fail, how adversaries exploit them, and how organizations can build defenses that evolve alongside capability advancements, Chang wrote. The AI Security and Safety Framework is built on five elements that comprise an evolving AI threat landscape: the integration of AI threats and content harms, development lifecycle awareness, multi-agent coordination, multimodality, and audience-aware utility. Further detail includes: **Threats and harms** : Adversaries exploit vulnerabilities across both domains, and oftentimes, link content manipulation with technical exploits to achieve their objectives. A security attack, such as injecting malicious instructions or corrupting training data, often culminates in a safety failure, such as generating harmful content, leaking confidential information, or producing unwanted or harmful outputs, Chang stated. The AI Security and Safety Framework’s taxonomy brings these elements into a single structure that organizations can use to understand risk holistically and build defenses that address both the mechanism of attack and the resulting impact. **AI lifecycle** : Vulnerabilities that are irrelevant during model development may become critical once the model gains access to tooling or interacts with other agents. The AI Security and Safety Framework follows the model across this entire journey, making it clear where different categories of risk emerge and how they may evolve, and letting organizations implement defense-in-depth strategies that account for how risks evolve as AI systems progress from development to production. **Multi-agent orchestration** : The AI Security and Safety Framework can also account for the risks that emerge when AI systems work together, encompassing orchestration patterns, inter-agent communication protocols, shared memory architectures, and collaborative decision-making processes, Chang stated. **Multimodal threats** : Threats can emerge from text prompts, audio commands, maliciously constructed images, manipulated video, corrupted code snippets, or even embedded signals in sensor data, Chang stated. 
As we continue to research how multimodal threats can manifest, treating these pathways consistently is essential, especially as organizations adopt multimodal systems in robotics and autonomous vehicle deployments, customer experience platforms, and real-time monitoring environments, Chang stated. **Audience-aware** : Finally, the framework is intentionally designed for multiple audiences. Executives can operate at the level of attacker objectives, security leaders can focus on techniques, while engineers and researchers can dive deeper into sub-techniques. Drilling down even further, AI red teams and threat intelligence teams can build, test, and evaluate procedures. All of these groups can share a single conceptual model, creating alignment that has been missing from the industry, Chang stated. The framework includes the supporting infrastructure, complex supply chains, organizational policies, and human-in-the-loop interactions that collectively determine security outcomes. This enables clearer communication between AI developers, AI end-users, business functions, security practitioners, and governance and compliance entities, Chang stated. The framework is already integrated into the Cisco AI Defense package, Chang stated. Cisco’s AI Defense package offers protection to enterprise customers developing AI applications across models and cloud services. It includes four key components: AI Access, AI Cloud Visibility, AI Model and Application Validation, and AI Runtime Protection. There are additional model context protocol (MCP), agentic, and supply chain threat taxonomies embedded within the AI Security Framework. Protocols like MCP and A2A govern how LLMs interpret tools, prompts, metadata, and execution environments, and when these components are tampered with, impersonated, or misused, benign agent operations can be redirected toward malicious goals, Chang stated. “The MCP taxonomy (which currently covers 14 threat types) and our A2A taxonomy (which currently covers 17 threat types) are both standalone resources that are also integrated into AI Defense and in Cisco’s open source tools: MCP Scanner and A2A Scanner. Finally, supply chain risk is also a core dimension of lifecycle-aware AI security. We’ve developed a taxonomy that covers 22 distinct threats and is simple,” Chang said. Cisco isn’t the only vendor to offer an AI security framework. AWS, Microsoft Azure, Palo Alto Networks, and others have frameworks as well, but Cisco says they are missing key coverage areas. “For years, organizations that attempted to secure AI pieced together guidance from disparate sources. MITRE ATLAS helped define adversarial tactics in machine learning systems. NIST’s Adversarial Machine Learning taxonomy described attack primitives. OWASP published Top 10 lists for LLM and agentic risks. Frontier AI labs like Google, OpenAI, and Anthropic shared internal safety practices and principles. Yet each of these efforts focused on a particular slice of the risk landscape, offering pieces of the puzzle but stop short of providing a unified, end-to-end understanding of AI risk,” Chang wrote. Chang stated that no existing framework covers content harms, agentic risks, supply chain threats, multimodal vulnerabilities, and lifecycle-level exposure with the completeness needed for enterprise-grade deployment. The real world does not segment these domains, and adversaries certainly do not either, Chang stated.
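To illustrate the structural idea behind the framework (without reproducing Cisco’s actual taxonomy; the entries and field names below are invented), the sketch shows how a single record can tie an attack mechanism, the resulting harm, and a lifecycle stage together so that different audiences can query one shared model.

```python
"""Purely illustrative data structure; not Cisco's taxonomy.

Each entry links the attack mechanism (security view) to the resulting
impact (safety view) and the lifecycle stage where it appears.
"""
from dataclasses import dataclass

@dataclass(frozen=True)
class ThreatEntry:
    technique: str        # mechanism of attack (security view)
    resulting_harm: str   # impact if it succeeds (safety view)
    lifecycle_stage: str  # where in the AI lifecycle it emerges
    audience_notes: str   # extra detail for red teams / engineers

TAXONOMY = [
    ThreatEntry(
        technique="prompt injection via retrieved documents",
        resulting_harm="leakage of confidential data in model output",
        lifecycle_stage="production / inference",
        audience_notes="test with adversarial retrieval corpora",
    ),
    ThreatEntry(
        technique="poisoned dependency in the model supply chain",
        resulting_harm="backdoored or degraded model behavior",
        lifecycle_stage="development / training",
        audience_notes="verify artifact provenance and hashes",
    ),
]

# An executive-level view might simply list harms by lifecycle stage.
for entry in TAXONOMY:
    print(f"{entry.lifecycle_stage}: {entry.technique} -> {entry.resulting_harm}")
```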
www.networkworld.com
December 17, 2025 at 8:51 PM
Kubernetes 1.35 enables zero-downtime resource scaling for production cloud workloads
The open-source Kubernetes cloud native platform is getting its last major release of 2025 today. Kubernetes 1.35 comes nearly four months after the Kubernetes 1.34 update, which integrated a host of enhancements for networking. Kubernetes has become the default cloud technology for containers and is supported by every major cloud platform. It powers everything from traditional web applications to distributed AI training clusters. As adoption expands, the platform faces pressure to eliminate technical debt while advancing capabilities that enterprises demand. The new Kubernetes 1.35 release addresses both imperatives. The release graduates in-place pod resource adjustments to general availability, enabling administrators to modify CPU and memory allocations without downtime. At the same time, the project deprecates IP Virtual Server proxy mode, pushing networking forward to a more modern architecture. The release also strengthens certificate lifecycle automation and enhances security policy controls. As with every release, the Kubernetes community comes up with a codename that is intended to be symbolic of both the specific release and the Kubernetes community. For 1.35, the community selected “Treenetes” as the codename, based on World Tree mythology. The symbolism reflects both the project’s maturity and its diverse contributor base. “The project keeps growing into branches, and the product is rooting itself to be a very mature foundation for things like AI and edge going into the future,” Drew Hagen, the Kubernetes 1.35 release lead, told _Network World_. ## In-place pod resource adjustments reach production The headline feature in Kubernetes 1.35 is general availability for in-place pod resource adjustments. It’s a feature that is tracked in the project as Kubernetes Enhancement Proposal (KEP) 1287 and was first proposed back in 2019. The capability fundamentally changes how administrators manage container resources in production clusters. “This has the capability of updating the resources and the resource requests and limits on a pod, which is just really powerful, because now we don’t have to actually restart a pod to expand the resources that are getting allocated to it,” Hagen explained. Previously, modifying resource requests or limits required destroying the pod and creating a new one with updated specifications. Applications went offline during the transition.
The feature particularly benefits AI training workloads and edge computing deployments. Training jobs can now scale vertically without restarts. Edge environments gain resource flexibility without the complexity of pod recreation. “For AI, that’s a really big training job that can be scaled and adjusted vertically, and then for edge computing, that’s really big to where there’s added complexity and actually adjusting those workloads,” Hagen said. The feature requires cgroups v2 on the underlying Linux nodes. Kubernetes 1.35 deprecates cgroups v1 support. Most current enterprise Linux distributions include cgroups v2, but older deployments may need OS upgrades before using in-place resource adjustments. ## Gang scheduling supports distributed AI workloads Among the preview features in the new release is a capability known as gang scheduling. The feature (tracked as KEP-4671) is intended to help distributed applications that require multiple pods to start simultaneously. “It’s adding a new workload that can be deployed through the cluster that will group a bunch of pods together, and they either all get started together, or none of them do,” Hagen explained. “It’s kind of keeping a better way of packaging certain dependencies of distributed apps that have run together.” The implementation adds a new workload object deployed through the cluster. Pods in the group either all start together or none start at all. This eliminates the complexity of ensuring distributed application dependencies come online in the correct order. Hagen noted that gang scheduling is a particularly good fit for AI workloads in which multiple instances work together on training data. Version 1.35 also includes a preview of node-declared features (KEP-5328), allowing nodes to advertise their capabilities. Pods won’t schedule on nodes lacking required features, preventing runtime failures from capability mismatches. ## Security enhancements target node impersonation and pod identity Kubernetes 1.35 advances several security features aimed at preventing cluster compromise and enabling zero-trust architectures. Constrained impersonation (KEP-5284) enters alpha status in this release. The feature blocks malicious machines from impersonating legitimate nodes to extract sensitive information from running applications and pods. “This helps with preventing a machine to come into the cluster and impersonate itself as a node and pull sensitive information from running applications and pods,” Hagen said. Pod certificates (KEP-4317) reach beta, enabling mutual TLS authentication between pods. The capability supports zero-trust networking models where pod-to-pod communication requires cryptographic verification. The release also includes OCI (Open Container Initiative) image volume source improvements (KEP-4639) for edge computing and storage. The feature allows attaching read-only data volumes as OCI artifacts, simplifying data distribution in edge deployments. ## IPVS proxy mode deprecated in favor of nftables for networking The new Kubernetes release isn’t just about adding features; it’s also about shedding old ones.
Kubernetes 1.35 deprecates IP Virtual Server (IPVS) proxy mode for service load balancing. The decision forces network teams to migrate to nftables-based implementations. IPVS has been a core networking option since Kubernetes 1.8. The mode leverages the Linux kernel’s IPVS load balancer for distributing service traffic. Many production deployments adopted IPVS because it outperformed the original iptables-based kube-proxy, especially in clusters with thousands of services. Nftables represents the modern Linux packet filtering framework. It replaced iptables in the kernel networking stack and provides better performance with more flexible rule management. The framework consolidates packet filtering, NAT and load balancing into a unified interface. Network administrators need to test nftables compatibility with existing service mesh implementations and network policies. The deprecation timeline spans multiple releases, giving teams time to plan migrations. “It seems as though Kubernetes is a very mature project, and we’re getting to a point or a place where we aren’t afraid to shed technical debt to sort of enable us to move forward with some of these big features,” Hagen said.
www.networkworld.com
December 17, 2025 at 8:50 PM
Enterprises to prioritize infrastructure modernization in 2026
Readying enterprise infrastructure for AI and other resource-heavy applications is high on the to-do list for businesses looking to stay competitive in 2026. The rise of AI has heightened the importance of IT modernization, as many organizations are still reliant on outdated, legacy infrastructure that is ill-equipped to handle modern workload requirements, says tech solutions provider World Wide Technology (WWT). “A key aspect of any refresh initiative is gaining better visibility and control over the existing asset base. Too often, organizations don’t have a clear understanding of what hardware and software they have deployed, which maintenance contracts are in place, or how all the pieces fit together. This lack of visibility makes it extremely difficult to plan meaningful modernization beyond reactive ‘rip-and-replace’ cycles,” WWT stated in its report, IT Infrastructure Modernization Priorities for 2026. “When you look at the traditional data center or infrastructure modernization that’s going on, I don’t know that there are wildly new trends, but there are some things that are accelerating, like addressing of technical debt to keep the enterprise agile, efficient and capable of supporting cutting-edge innovations — particularly AI-powered applications,” Neil Anderson, vice president and CTO of cloud, infrastructure, and AI solutions for WWT, told _Network World_. Application modernization is one area Anderson sees accelerating. “If you kind of intersect what’s going on with AI software coding assistance and the problem of that modernization, it starts to become much more feasible to do app modernization at scale,” Anderson said. “You can translate languages, you can re-platform and re-architect, all with the assistance of these AI tools. Some apps are still written in COBOL. This is a once-in-a-generation opportunity to kind of catch up on some of those problems.” A move to modernize data center infrastructure has many organizations looking at private cloud models, according to the WWT report: “The drive toward private cloud is fueled by several needs, with one primary driver being greater data security and privacy. Industries like finance and government, which handle sensitive information, often find private cloud architectures better suited for meeting strict compliance requirements. Additionally, private clouds offer more customization, allowing organizations to tailor their environment to specific workloads and performance needs, which is difficult to achieve in one-size-fits-all public clouds.” WWT reports the rise of specialized private clouds for AI and high-performance computing—for example, neocloud providers that offer GPU-as-a-service. “These on-premises environments can be optimized for performance characteristics and cost management, whereas public cloud offerings, while often a quick entry point to start AI/ML experimentation, can become prohibitively expensive at scale for certain workloads,” WWT stated. There is also a move to build up network and compute abilities at the edge, Anderson noted. “Customers are not going to be able to home run all that AI data to their data center and in real time get the answers they need. They will have to have edge compute, and to make that happen, it’s going to be agents sitting out there that are talking to other agents in your central cluster. It’s going to be a very distributed hybrid architecture, and that will require a very high speed network,” Anderson said.
Real-time AI traffic going from agent to agent is also going to require a high level of access control and security, Anderson said. “You need policy control in the middle of that AI agent environment to say ‘is that agent authorized to be talking to that other agent? And are they entitled to access these applications?’” That’s a big problem on the horizon, Anderson said. “If a company has 100,000 employees, they have 100,000 identities and 100,000 policies about what those people can do and not do. There’s going to be 10x or 100x AI agents out there, each one is going to have to have an identity. Each one is going to have an entitlement in a policy about what data they are allowed to access. That’s going to take upgrades that don’t exist today. The AI agent issue is growing rapidly,” Anderson said. In addition, the imperative to run AI workloads on-premises, often dubbed “private AI,” continues to grow, fueled by the need for greater control over data, enhanced performance, predictable costs and compliance with increasingly strict regulatory requirements, WWT stated. It cited IDC data projecting that by 2028, 75% of enterprise AI workloads are expected to run on fit-for-purpose hybrid infrastructure, which includes on-premises components. “This reflects a shift toward balancing performance, cost and compliance, especially for private AI deployments,” WWT wrote, noting that Grand View Research is predicting the global AI infrastructure market will reach $223.45 billion by 2030, growing at a 30.4% CAGR, “with on-premises deployments expected to remain a significant portion of this growth, particularly in regulated industries like healthcare, finance, and defense.” “Implementing private AI is not simply a matter of deploying new software or adding a few servers. The complexity and scale of modern AI workloads, ranging from machine learning model training and inferencing to real-time analytics, require a comprehensive modernization of the underlying infrastructure,” WWT wrote. Such modernization needs to take power and cooling into consideration much more than ever, Anderson said. “Most of our customers are not sitting there with a lot of excess data center power; rather, most people are out of power or need to be doing more power projects to prepare for the near future,” he said. Steps for building AI-ready infrastructure should include implementing efficient cooling technologies, WWT recommends: “Given the significant heat output of dense AI clusters, traditional air cooling may be insufficient. [Customers should] investigate advanced technologies such as direct-to-chip liquid cooling, immersion cooling tanks or rear-door heat exchangers. These methods can enhance thermal efficiency, lower energy consumption and help control data center operating costs. Partner with vendors who provide integrated solutions and ongoing support, and consider deploying environmental sensors throughout your facility to monitor temperature gradients and airflow in real time,” WWT wrote. “What we’ve found by working with some of the leading manufacturers, like Nvidia on liquid cooling, is that if you cool the GPUs properly, you actually require less power,” Anderson said.
www.networkworld.com
December 17, 2025 at 8:50 PM
Will Google throw gasoline on the AI chip arms race?
Google caused two significant disruptions in the AI chip field last month. The first one was the release of its 7th generation tensor processing unit (TPU), codenamed Ironwood. The chip offered a significant improvement in inference processing, for which it was custom built. Ironwood also offered massive memory scale and bandwidth, something needed in AI processing. The second came a few weeks later and was much more significant. Word spread quickly that Meta was considering making a significant purchase – reportedly 100,000 units — of the Google TPUs for its own hyperscale facilities. There was also speculation that Google would seek out other customers for its TPU. This sent ripples through the AI silicon market, particularly impacting Nvidia. The company is so dominant now that people are looking for any excuse to take it down a peg, and a legitimate competitor is a good one. It would also mark a significant departure from the direction hyperscalers have been taking up until now. Virtually every hyperscaler is building its own custom silicon, some for general computing and others specifically for AI processing. But up until now, hyperscalers have kept their homegrown silicon for themselves. Google going into the business of selling AI processors would be a marked departure from the way things have been done. So, would Google kick off a new arms race in silicon, where Nvidia and AMD face competition from their biggest customers: Google, AWS, and Microsoft? Analysts say maybe, but not likely. “Are they likely to sell the TPUs? Yes. Are they going to compete directly with Nvidia? No. Because the TPUs are not meant to compete directly with Nvidia. The TPUs really are meant to be more targeted at doing smaller scale, less intense model processing,” said Jack Gold, president of J.Gold & Associates. The Nvidia processors, he explains, are for processing massive large language models (LLMs), while the Google TPU is used for inferencing, the next step after processing the LLM. So the two chips don’t compete with each other; they complement each other, according to Gold. Selling and supporting processors may not be Google’s core competence, but it has the skills and experience to do this, said Alvin Nguyen, senior analyst with Forrester Research. “They have had their TPUs available, from what I understand, to some outside companies already, mainly startups from ex-Googlers or Google-sponsored startups,” he said. As to the rumored Meta purchase, the question is what Meta wants to do with the chips, said Gold. “If they’ve already built out a model and they’re running inference workloads, then Nvidia B100s and B200s are overkill,” he said. “And so what are the options there now? There are a number of startups that are trying to do inference-based chips as well, and Intel and AMD are moving in that direction as well. So it really is a function of getting a chip that’s optimized for their environment, and again, Google’s TPUs are optimized for a hyperscaler cloud type of environment.” Nguyen says it’s one thing for Google to make chips for its own use and another to sell them; that requires infrastructure and a competency the company doesn’t have, and Intel, AMD, and Nvidia are way ahead of it in that regard. “Yes, they know how to do it for themselves, as long as you’re talking about it as a service or as a cloud service. For on premises, or for people who want to take it for themselves, that’s a muscle memory they have to develop,” he said.
For that reason, he doubts that other hyperscalers with their own custom silicon will go into the chip-selling business themselves. “There’s nothing to stop them, but each of them has their own challenges,” said Nguyen. Microsoft, AWS and OpenAI all have multiple partnerships and would inevitably end up in competition with somebody. Gold says he can’t see AWS and Microsoft going into the chips business. “I can’t see that happening. I really can’t. It’s just not a business model for those guys that makes a lot of sense in my mind,” he said.
www.networkworld.com
December 17, 2025 at 6:30 AM
Enterprise reactions to cloud and internet outages
Let’s face it, the last couple of months haven’t been great for the cloud. In October, both Amazon’s AWS and Microsoft’s Azure had widely publicized, highly impactful failures. In November, a Cloudflare outage took down a big chunk of websites, effectively closing some businesses. I even had problems getting a haircut because the salon website was down with Cloudflare-itis and I couldn’t join a waitlist. Of course for enterprises, haircuts were the least of their worries. The cloud was too much of a risk, some said. The internet itself was at risk from cloud problems, and from problems of its own. Those in the C-suite, not surprisingly, “examined” or “explored” or “assessed” their companies’ vulnerability to cloud and internet problems after the news. So what did they find? Are enterprises fleeing the cloud they now see as risky instead of protective? A total of 147 enterprises have offered comments to me, of which 83 offered highly technical suggestions. Here’s how they shook out. First, none of the enterprises said they planned to abandon the cloud, and everyone thought any notion of leaving the internet world was too silly to even assess. This, despite the fact that 129 said that they believed both the cloud and the internet to be less reliable than they had been, and 24 said they did plan to take measures to reduce their vulnerability to outages. All the enterprises thought the dire comments they’d read about cloud abandonment were exaggerations, or reflected an incomplete understanding of the cloud and alternatives to cloud dependence. And the internet? “What’s our alternative there?” one executive asked me. “Do we go back to direct mail or build a bunch of retail stores?” What caused the reliability of the cloud and internet to decline, in the perception of those who thought it had? A majority said that they believed the providers, to improve their profit margins, were running their servers too close to the edge. The next top problem cited was “an imbalance between the quality of staff tools and their mission,” and the remainder just thought that infrastructure was getting more complex, without offering exactly how or why that was impacting reliability. Of those who offered technical comments, all but two said the issue really is complexity. Virtualization in any form is more operationally intensive because it’s a multi-layer concept, and a secure and reliable global internet depends on layers of technology and administration. More layers, more disconnected things to go wrong. A little issue with something so many depend on necessarily has a massive impact. Virtualization, in a data center of an enterprise or a cloud provider, is a three-layer process. The bottom layer is the resource pool, the servers and platform software. There’s a set of management tools associated with them. The top layer is a “mapping” layer that creates the virtual elements from the resource pool and exposes them for applications and application management. Astride both, in parallel, is the network layer, which provides connectivity throughout. This layer is run by different people, and in fact different teams. The enterprise experts pointed out that the network piece of this cake had special challenges.
It’s critical to keep the two other layers separated, at least to ensure that nothing from the user-facing layer could see the resource layer, which of course would be supporting other applications and, in the case of the cloud, other companies. It’s also critical in exposing the features of the cloud to customers. The network layer, of course, includes the Domain Name System (DNS) that converts our familiar URLs to actual IP addresses for traffic routing; it’s the system that played a key role in the AWS problem, and as I’ve noted, it’s run by a different team. The internet is layered, too. We have ISPs who offer access, a mixture of commercial and government players who provide and manage global connectivity, physical and logical (URL) addressing, and security offerings at the content end, the consumer end, and (as Cloudflare shows) as an intermediary process. It’s like a global dance, and even if everyone has the steps right, they can still trip over each other. (See also: Why cloud and AI projects take longer and how to fix the holdups) Complexity is increasing in every one of our layers in both the internet and cloud, because the business case depends on efficient use of resources and reliable quality of experience. Operations is more likely to be a problem in each of the layers, and given their interdependence, cooperation in operations is essential. Yet we separate the people involved. All the layer-people dance, and sometimes trip up on the crowded floor. Why not combine the groups? Enterprises ask: “What good is a ‘single pane of glass’ when three or four groups are all trying to see through it, looking for different things?” Enterprises don’t see the notion of a combined team, or an overlay, every-layer team, as the solution. None of the enterprises had a view of what would be needed to fix the internet, and only a quarter of even the virtualization experts expressed an opinion on what the answer is for the cloud. That group agrees with the limited comments I’ve gotten from people I know in the cloud provider world. The answer is **templates, simulations, and world models**, and I think that could work for the internet too, since many of our major internet issues, including Cloudflare’s issue, really come down to software configuration and operations. The idea here is to prevent issues from developing by modeling the entire system, using real-world machine learning to add in experience with usage, traffic, conditions, and QoE, then using the model to come up with a list of steps needed to implement a desired change or respond to a problem. This template of steps would be simulated on the model, and the template and simulation results reviewed by the operations team or teams involved. Then, when the steps are executed, the result of each step would be checked against the simulation; any discrepancy would halt the process, reverse the step if needed, and alert all the teams to convene for a review. Simulation, used to forecast reaction to potential rather than real issues, might even help in software-error-driven problems like the recent Cloudflare outage by pointing out risks of cascade faults and identifying remedies.
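To make that halt-and-reverse loop concrete, here is a rough sketch of the logic as described, in Python; the Step objects and the simulate and alert hooks are illustrative stand-ins, not any vendor’s product.

```python
# Conceptual sketch of the template-plus-simulation workflow: each planned step
# is simulated against a model of the system, then executed and checked against
# the simulated expectation. A mismatch halts the rollout, reverses what has
# been done, and flags the teams for review. All names here are illustrative.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Step:
    name: str
    apply: Callable[[], Any]      # executes the change, returns observed state
    revert: Callable[[], None]    # undoes the change if verification fails

def run_change(template: list[Step],
               simulate: Callable[[Step], Any],
               alert: Callable[[str], None]) -> bool:
    executed: list[Step] = []
    for step in template:
        expected = simulate(step)   # what the world model predicts
        observed = step.apply()     # what actually happened
        executed.append(step)
        if observed != expected:    # discrepancy: halt, roll back, convene review
            alert(f"step '{step.name}' diverged: expected {expected}, got {observed}")
            for done in reversed(executed):
                done.revert()
            return False
    return True
```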
It might seem that this approach keeps people, the people who perhaps made the mistakes in the first place, too involved. Everyone disagrees with that, though. One operations manager said, “It’s my *** if things go wrong, so I’m going to be sure I have the last word on anything.” In general, operations types in all the verticals, and in the cloud provider and even telco worlds, agree that a purely automated strategy, if it goes wrong, would likely create something even operations professionals would be unable to fix. The risk of a massive and persistent outage, an AI disaster created at superhuman speed, is simply too great. Overall, among enterprises, less than ten percent of operations specialists think we’ll ever get to the point where purely automated processes will run the network, a virtualized data center, or the cloud. Among cloud providers and telcos, it’s less than half that number. So what are enterprises going to do about cloud and internet problems? I think the answer may be found in (of all places) my haircut. Did I resort to home hair-cutting, let my hair grow long, or change barbers to avoid the risk of having to wait? No, I went to the salon with no appointment, ready to stand in line. Instead I found that nobody else had come, and I walked in. Does that mean that enterprises should go back to the old way and forget the internet? Or does it mean that they should just work through an outage, knowing that it will end and they’ll survive it? It means that sometimes there isn’t really any choice, because there really aren’t any options. Businesses say that using the cloud, or the internet, may mean accepting some major headaches, but they survived them in the past, so they’re betting they can survive them in the future. Business as usual is safer and easier to justify to your boss. And, if you’re worried, my haircut looks fine, but I’ll be online to join the waitlist next time I need one.
www.networkworld.com
December 16, 2025 at 2:56 PM
Nvidia moves deeper into AI infrastructure with SchedMD acquisition
Nvidia has taken a strategic step deeper into the AI software stack with its acquisition of SchedMD, the developer of Slurm, a widely used open-source workload manager for high-performance computing and AI clusters. Slurm plays a central role in scheduling large, resource-intensive jobs across thousands of servers and GPUs, shaping how AI workloads are distributed in modern data centers. “Nvidia will continue to develop and distribute Slurm as open-source, vendor-neutral software, making it widely available to and supported by the broader HPC and AI community across diverse hardware and software environments,” Nvidia said in a blog post. The deal underscores Nvidia’s push to strengthen its open software ecosystem while ensuring Slurm remains vendor-neutral and broadly available to users navigating increasingly complex AI workloads. The acquisition also follows Nvidia’s announcement of a new family of open-source artificial intelligence models, highlighting how the company is pairing model development with deeper investments in the software and infrastructure layers needed to run AI at scale. ## Why Slurm matters As AI clusters scale in size and complexity, workload scheduling is increasingly tied to network performance, affecting east-west traffic flows, GPU utilization, and the ability to keep high-speed fabrics operating efficiently. “Slurm excels at orchestrating multi-node distributed training, where jobs span hundreds or thousands of GPUs,” said Lian Jye Su, chief analyst at Omdia. “The software can optimize data movement within servers by deciding where jobs should be placed based on resource availability. With strong visibility into the network topology, Slurm can direct traffic to areas with high-speed links, minimizing network congestion and thereby improving GPU utilization.” Charlie Dai, principal analyst at Forrester, said Slurm’s scheduling logic plays a significant role in shaping how traffic moves within AI clusters. “Slurm orchestrates GPU allocation and job scheduling and directly influences east-west traffic patterns in AI clusters,” Dai said. “Efficient scheduling reduces idle GPUs and minimizes inter-node data transfers, while improving throughput for GPU-to-GPU communication, which is critical for large-scale AI workloads.” While Slurm does not manage network traffic directly, its placement decisions can have a substantial impact on network behavior, said Manish Rawat, analyst at TechInsights. “If GPUs are placed without network topology awareness, cross-rack and cross-spine traffic rises sharply, increasing latency and congestion,” Rawat said. Taken together, these analyst views underscore why bringing Slurm closer to Nvidia’s GPU and networking stack could give the company greater influence over how AI infrastructure is orchestrated end-to-end. ## Enterprise impact and tradeoffs For enterprises, the acquisition reinforces Nvidia’s broader push to strengthen networking capabilities across its AI stack, spanning GPU topology awareness, NVLink interconnects, and high-speed network fabrics. “The acquisition signals a push toward co-design between GPU scheduling and fabric behavior, not immediate lock-in,” Rawat said. 
“Combining Slurm’s job-level intent with GPU and interconnect telemetry enables smarter placement decisions.” That said, Su noted that while Slurm will remain open source and vendor-neutral, Nvidia’s investment is likely to steer development toward features such as tighter NCCL integration, more dynamic network resource allocation, and greater awareness of Nvidia’s networking fabrics, including more optimized scheduling for InfiniBand and RoCE environments. This means that the move could nudge enterprises running mixed-vendor AI clusters to migrate toward Nvidia’s ecosystem in pursuit of better networking performance. Organizations that prefer to avoid deeper alignment may instead evaluate alternative frameworks, such as Ray, Su added. ## What customers should expect For existing Slurm users, analysts expect the transition to be largely smooth, with limited disruption to current deployments, especially because Slurm is expected to remain open source and vendor-neutral. “Continual community contributions are expected and should help mitigate bias,” Su added. “Enterprises and cloud providers that already have Nvidia-powered servers can expect faster rollout of features optimized for Nvidia hardware and higher overall performance.” Still, Dai cautioned that deeper integration with Nvidia’s AI stack is likely to bring operational changes that enterprises will need to plan for. “Enterprises and cloud providers should anticipate enhanced GPU-aware scheduling features and deeper telemetry integration with Nvidia tools,” Dai said. “This may require updates to monitoring workflows and network optimization strategies, particularly for Ethernet fabrics.”
www.networkworld.com
December 16, 2025 at 2:57 PM
Cloud providers continue to push EU court to undo Broadcom-VMware merger
CISPE, the association of Cloud Infrastructure Providers in Europe, is continuing its legal fight to unwind Broadcom’s acquisition of VMware, citing the harm it is doing to VMware customers. It has just filed a response in a case it brought before a top European court earlier this year, seeking to annul a 2023 decision to approve the merger by the European Commission, the European Union’s antitrust authority. The Commission was one of numerous competition regulators around the world weighing in on the decision. “The Commission looked at this merger through half-closed eyes and declared it safe. By rubber stamping the deal, Brussels handed Broadcom a blank cheque to raise prices, lock-in and squeeze customers. This was a failure of oversight by the regulator with real world costs for Europe’s cloud sector and every organization that depends upon it,” CISPE secretary general Francisco Mingorance said in a written statement. Broadcom’s acquisition of VMware has attracted fierce criticism from VMware customers, leading as it has to some significant price rises. In an October report on the effects of the merger, CISPE’s European Cloud Competition Observatory said members had reported experiencing price increases of between 800% and 1500%. CISPE filed its case with the General Court of the Court of Justice of the European Union in July. The Commission has since filed its defense, and it was to that document that CISPE responded last week. In its new filing, CISPE claimed that “the Commission had not examined the risk that Broadcom would use VMware’s dominance to drive substantial price increases and tighten contractual lock-in and adopted no safeguards under EU merger rules.” CISPE director of communications Ben Maynard dismissed fears that any action by the Commission could lead to a fine on VMware that would be passed on to its users, increasing prices even further. “I’m not sure that a fine is a likely consequence. This isn’t an action against Broadcom; this is against the European Commission’s decision,” he said. ## Time is relative If the action is successful, he said, VMware users “may well see a reduction in prices and a return to the terms before the acquisition. I certainly can’t see the action leading to higher prices for VMware users. Most users are already paying a higher price.” Cases at the General Court have a reputation for being long, drawn-out affairs, but Maynard said he hoped the action would be resolved relatively quickly before adding, “We’re talking ‘quickly’ in European terms. We hope to have it referred back to the General Court within the first half of next year, with a decision made within a couple of years.” Broadcom strongly disagreed with CISPE’s allegations. “The European Commission, along with twelve other jurisdictions around the world, approved our acquisition of VMware following a thorough merger review process, and we will uphold the commitments made to the Commission at that time,” a company spokesperson said.
www.networkworld.com
December 13, 2025 at 3:14 AM
FinOps Foundation sharpens FOCUS to reduce cloud cost chaos
A cloud challenge that hampers many organizations is how to normalize billing data across disparate platforms that include multi-cloud and hybrid infrastructure deployments. Enterprises are spending significant resources building custom integrations to reconcile cost data from different cloud providers, SaaS platforms and on-premises infrastructure. That’s the domain of FinOps, a practice that has gained traction in recent years. FinOps combines finance, operations and engineering teams to manage cloud and technology spending. Among the leading organizations in the movement is the FinOps Foundation, a Linux Foundation project. The FinOps Foundation is responsible for the FinOps Open Cost and Usage Specification (FOCUS), which was first released in 2023. The primary goal of FOCUS is to help standardize how providers format billing data, so organizations can compare costs across multiple platforms without building custom integrations for each one. More than a dozen providers now support FOCUS, including major cloud vendors (Google, Oracle, Microsoft, AWS, Alibaba, Huawei, Tencent) and platform providers (Databricks, Grafana). The specification has evolved beyond its original cloud-only focus to encompass SaaS, data center and emerging AI infrastructure spending. This week, the FinOps Foundation announced the release of the FOCUS 1.3 specification. The update aims to address three persistent technical challenges:
* Splitting shared resource costs with transparent allocation methodology
* Tracking contract commitments in a structured format
* Verifying data freshness and completeness through standardized metadata
“FOCUS is really meant to be a language of cloud and technology value,” J.R. Storment, executive director of the FinOps Foundation, told _Network World_. ## From cloud-only to hybrid infrastructure management The FinOps practice has undergone significant expansion since the FOCUS specification first launched in 2023. What began as a cloud cost management discipline has evolved into a comprehensive approach for managing technology value across diverse infrastructure types. “The big change that’s really started to happen in late 2024, early 2025 is that the FinOps practice started to expand past the cloud,” Storment said. “A lot of organizations got really good at using FinOps to manage the value of cloud, and then their organizations went, ‘oh, hey, we’re living in this happily hybrid state now where we’ve got cloud, SaaS, data center. Can you also apply the FinOps practice to our SaaS? Or can you apply it to our Snowflake? Can you apply it to our data center?’” The FinOps Foundation’s community has grown to approximately 100,000 practitioners. The organization now includes major cloud vendors, hardware providers like Nvidia and AMD, data center operators and data cloud platforms like Snowflake and Databricks. Some 96 of the Fortune 100 now participate in FinOps Foundation programs. The practice itself has shifted in two directions. It has moved left into earlier architectural and design processes, becoming more proactive rather than reactive. It has also moved up organizationally, from director-level cloud management roles to SVP and COO positions managing converged technology portfolios spanning multiple infrastructure types. This expansion has driven the evolution of FOCUS beyond its original cloud billing focus. Enterprises are implementing FOCUS as an internal standard for chargeback reporting even when their providers don’t generate native FOCUS data.
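To make concrete what a shared schema buys, here is a minimal sketch assuming two providers’ exports already use FOCUS-style columns such as ProviderName, ServiceCategory and BilledCost; the rows and figures are invented for illustration, not taken from the specification or the article.

```python
# Once two providers' billing exports share one schema, a single concat and
# groupby produces a cross-provider chargeback view with no per-provider
# translation layer. Column names follow FOCUS conventions; data is made up.
import pandas as pd

aws = pd.DataFrame({
    "ProviderName": ["AWS", "AWS"],
    "ServiceCategory": ["Compute", "Storage"],
    "BilledCost": [1250.40, 310.75],
})
gcp = pd.DataFrame({
    "ProviderName": ["Google Cloud", "Google Cloud"],
    "ServiceCategory": ["Compute", "Databases"],
    "BilledCost": [980.10, 215.00],
})

combined = pd.concat([aws, gcp], ignore_index=True)
print(combined.groupby(["ServiceCategory", "ProviderName"])["BilledCost"].sum())
```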
Some newer cloud providers, particularly those focused on AI infrastructure, are using the FOCUS specification to define their billing data structures from the ground up rather than retrofitting existing systems. The FOCUS 1.3 release reflects this maturation, addressing technical gaps that have emerged as organizations apply cost management practices across increasingly complex hybrid environments. ## FOCUS 1.3 exposes cost allocation logic for shared infrastructure The most significant technical enhancement in FOCUS 1.3 addresses a gap in how shared infrastructure costs are allocated and reported. Current implementations force practitioners to either accept provider-determined allocations without visibility into methodology or build custom logic to redistribute costs. FOCUS 1.3 introduces allocation-specific columns that expose the methodology providers use to split costs across workloads. Rather than simply receiving a final allocated cost figure, practitioners can now see both the allocation approach and the underlying calculation method. This change particularly benefits organizations running multi-tenant Kubernetes clusters or shared database instances. Platform engineering teams can verify that provider allocation methods align with their internal cost models and chargeback systems. The specification provides a standardized way for providers to document whether they’re using resource-based allocation, usage-based allocation or hybrid approaches. ## Separating contract data from usage records Organizations tracking reserved instances, savings plans and committed use discounts across multiple cloud providers face a data structure problem. Current billing exports embed contract details within usage rows. A single cost record might include both the hourly usage charge and fragments of commitment information scattered across multiple columns. This structure makes basic queries difficult. Asking “what are all my active commitments, and when do they expire?” requires parsing usage data, deduplicating contract references and reconstructing commitment terms from partial information across thousands of billing rows. FOCUS 1.3 creates a dedicated Contract Commitment dataset separate from cost and usage data. The dataset includes start dates, end dates, committed units and contract descriptions in a queryable format. A single SELECT statement returns all active commitments without touching usage records. The separation enables different access controls. Finance teams can view contract terms and commitment status while operations teams access only the usage data they need for capacity planning. This addresses compliance requirements in financial services and healthcare where contract terms must be restricted to specific roles. “We didn’t have the opportunity for them to express that through the FOCUS metadata before, and so we wanted to be able to close that gap for providers that support it,” Matt Cowsert, principal product manager at the FinOps Foundation, told _Network World_. This represents the first time FOCUS has defined a dataset beyond cost and usage. The pattern establishes a framework for future adjacent datasets covering invoicing and price lists. ## Flagging incomplete data before it breaks workflows Automated cost reconciliation workflows fail when they process incomplete billing data. A common scenario: Finance triggers month-end close based on available data, then discovers two days later that the cloud provider revised usage records and added previously unreported charges. 
FOCUS 1.3 requires providers to include metadata indicating whether data is complete or subject to revision. The specification defines timestamp fields and completeness flags in a structured format that applications can check programmatically. Organizations can now build logic that checks completeness status before triggering dependent processes. If the metadata indicates incomplete data, automated workflows can wait rather than processing partial information and requiring manual corrections later. The metadata also documents data delivery SLAs. Providers specify when usage records for specific services have been finalized. This replaces informal knowledge about which providers deliver complete data within 24 hours versus which take three to five days to finalize records. ## FOCUS in the real world The FinOps Foundation releases FOCUS updates twice annually. Providers choose implementation timing based on their development cycles rather than following a lockstep upgrade path. Enterprises are also implementing FOCUS as an internal standard independent of external provider support. Organizations use FOCUS language for internal chargeback systems and finance reporting even when aggregating data from providers that don’t generate native FOCUS output. “Our goal is to ensure that each release for FOCUS has material benefit for practitioners,” Cowsert said.
www.networkworld.com
December 13, 2025 at 3:15 AM
P4 programming: Redefining what’s possible in network infrastructure
Network engineers have spent decades working within rigid constraints. Your switch vendor decides what protocols you can use, what features you get, and when you get them. Need something custom? You’re out of luck. That’s changing, and P4 is a primary driver. P4 lets you program the data plane, the part of switches and SmartNICs that actually moves packets. This isn’t theoretical. Organizations are running P4 in production today, handling real traffic for applications that can’t wait years for vendor feature requests to materialize. If you’re planning network infrastructure for the next five to ten years, understanding P4 isn’t optional anymore. ## What P4 actually does The core idea is simple: separate the control plane (decides where packets go) from the data plane (moves packets there), then make the data plane programmable. OpenFlow did the first part. P4 takes it further by letting you define how packets get processed, not just where they go. Think about traditional network hardware. It knows Ethernet, IP, TCP, UDP, maybe VXLAN if you’re lucky. Send it a packet with a custom header format? The device treats everything after the outer headers as opaque payload. You can’t route based on your custom fields. You can’t modify them. You’re stuck. With P4, you write the parser yourself. You tell the switch or SmartNIC exactly what your custom protocol looks like: where each field starts, how long it is, what values matter. Then you define match action rules. If this field equals X, do Y. The device compiles your program and executes it on every packet at line rate. Here’s what makes this powerful: you’re not limited to protocols that existed when the hardware shipped. Need to support a new encapsulation format next month? Write the parser, compile, deploy. No firmware update. No vendor involvement. No waiting. ## Real problems P4 solves ### Visibility that actually tells you something Traditional monitoring gives you SNMP counters (updated every 30 seconds, way too slow) or NetFlow samples (statistically useful but incomplete). Neither tells you what happened to a specific transaction at a specific moment. P4 changes this completely. Your switches and SmartNICs can add metadata to packets as they flow through: timestamps, queue depths and congestion indicators. The application receiving the packet gets real data about what happened in the network. A database query that normally takes 5ms suddenly takes 50ms? You know exactly which device had congestion, when it happened, and how bad it was. Real example: A retail company deployed P4 telemetry on both their switches and server SmartNICs before Black Friday. Their traditional monitoring showed everything looked normal. Average latency within bounds, no packet loss. But P4 telemetry revealed that 2% of shopping cart transactions were hitting 500ms delays. Turned out specific switch ports had misconfigured buffers that only showed up under bursty traffic. They found and fixed it before it became a revenue problem. Their old monitoring system would’ve completely missed this. ### Security at every layer Most networks handle DDoS protection with dedicated appliances. Expensive boxes positioned at chokepoints. P4 moves that protection everywhere, from the network fabric to the server edge. Simple example: DNS amplification attacks. A P4 program on a SmartNIC tracks query-to-response ratios per source IP. See 1 query and 50 responses? That’s amplification. Drop the responses automatically before they even reach the server CPU. The SmartNIC maintains state, makes decisions, and acts, all at wire speed while forwarding legitimate traffic normally.
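The real thing runs as compiled P4 keeping state in registers on the NIC, but the ratio logic itself is simple enough to sketch host-side in Python; the threshold and per-source bookkeeping below are illustrative choices, not values from any real deployment.

```python
# Illustration of the query-to-response ratio check described above: count DNS
# queries and responses per peer IP and flag a peer once responses massively
# outnumber the queries it actually sent. Threshold and reset policy are
# illustrative; a production P4 program would keep this state in registers.
from collections import defaultdict

counts = defaultdict(lambda: {"queries": 0, "responses": 0})

def observe(peer_ip: str, is_response: bool, ratio_limit: int = 10) -> bool:
    """Record one DNS packet; return True if traffic from peer_ip should be dropped."""
    c = counts[peer_ip]
    c["responses" if is_response else "queries"] += 1
    # One query answered by dozens of responses is the amplification signature.
    return c["responses"] > ratio_limit * max(c["queries"], 1)

# Example: a single query followed by a flood of responses trips the check.
observe("198.51.100.7", is_response=False)
verdicts = [observe("198.51.100.7", is_response=True) for _ in range(50)]
print(verdicts[-1])  # True once the ratio threshold is exceeded
```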
More advanced implementations get really interesting. One financial services company uses P4 on SmartNICs to enforce API call sequences at the server edge. You must call their authentication endpoint first, then data endpoints, then logout. Try to grab data without authenticating? The P4 program drops your packets immediately at the NIC, before consuming any server resources. It’s maintaining per-connection state machines, something very hard to achieve with traditional fixed-function switches and NICs. ### Offload and acceleration SmartNICs running P4 can offload network functions from server CPUs. Encryption, encapsulation, load balancing and traffic shaping are all handled at the NIC before packets reach the host. This frees up CPU cycles for actual application workloads. One cloud provider deployed P4 SmartNICs across their compute fleet to handle VXLAN encapsulation and security policy enforcement. Result: 30% reduction in CPU overhead for networking, which translated directly into more capacity for customer workloads. The same hardware, just programmed differently. ### Deploy new protocols in months, not years Large cloud operators have implemented custom congestion control protocols optimized for their data center traffic patterns. Rolling that out with traditional hardware would take years. You need switches and NICs that understand the new packet format. With P4, they wrote the parser and forwarding logic, compiled it, and pushed it to existing hardware. Design to production: months. This pattern applies broadly. Custom load balancing schemes, experimental transport protocols, new overlay formats. All deployable on hardware you already own through P4 programming. ## The parts nobody talks about (until something breaks) ### Hardware doesn’t have infinite resources P4 programs run on ASICs and FPGAs with real physical constraints. Match action tables hold thousands to maybe a few million entries, not billions. Stateful operations have size limits. Packet modifications must complete in nanoseconds, not microseconds. I’ve seen engineers design beautiful table hierarchies that look perfect on paper, then discover their target hardware doesn’t have enough TCAM. The program compiles fine. It just won’t load. That’s a bad day. This applies whether you’re programming a top-of-rack switch or a server SmartNIC. Best approach: know your hardware intimately before you write code. Understand table sizes, match types (exact vs. ternary vs. LPM), action complexity limits. Design within those bounds from the start. Vendor data sheets and P4 target documentation should be reviewed early to avoid late surprises. ### Testing isn’t optional, it’s survival A buggy P4 program drops packets. Or worse, forwards them incorrectly. You absolutely cannot “try it and see” in production. Testing infrastructure is mandatory. The P4 behavioral model (BMv2) lets you run your program in software. Send test packets through, verify behavior, before touching real hardware. Your test cases need to cover normal traffic, edge cases, malformed packets, and attack scenarios. Add negative tests for parser error paths and table miss behavior; these are common sources of field issues. One company I know runs 10,000+ test cases on every P4 program change. Sounds excessive until you hear they caught 43 bugs in one update, any of which would’ve caused an outage. Testing saved them.
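Here is a minimal sketch of that kind of negative test with Scapy, assuming BMv2 is attached to a veth pair; the interface name, destination address, custom UDP port and header length are placeholders for your own test topology.

```python
# Craft a well-formed packet and a truncated one, then send both toward a BMv2
# instance listening on a veth interface. The parser should forward the good
# packet and cleanly hit its error path (or a table miss) on the bad one.
# Requires Scapy and root privileges; all names and values are placeholders.
from scapy.all import Ether, IP, UDP, Raw, sendp

IFACE = "veth0"       # host side of the veth pair feeding the BMv2 switch
CUSTOM_PORT = 9555    # UDP port the P4 parser keys on for the custom header

# Normal case: a full 8-byte custom header after UDP.
good = Ether() / IP(dst="10.0.0.2") / UDP(dport=CUSTOM_PORT) / Raw(load=b"\x01" * 8)

# Negative case: header truncated to 3 bytes.
bad = Ether() / IP(dst="10.0.0.2") / UDP(dport=CUSTOM_PORT) / Raw(load=b"\x01" * 3)

for pkt in (good, bad):
    sendp(pkt, iface=IFACE, verbose=False)

# Capture on the switch's egress veth (scapy.sniff or tcpdump) and assert that
# the good packet was forwarded and the truncated one was dropped.
```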
### Portability takes real work Different hardware targets support different P4 features. Your program might use 32 match action stages, but some devices only support 16. Hash functions vary. Packet modification capabilities differ. Supported protocols aren’t consistent. Perfect portability is a fantasy. Instead, maintain a core P4 program with target-specific adaptations. Use compiler directives and modular design so platform differences stay isolated in small sections. Accept that some advanced features won’t work everywhere. A switch ASIC and a SmartNIC FPGA will have different capabilities. Where feasible, align control plane integration on P4Runtime to reduce vendor lock-in at the API layer. ## How to actually deploy this ### Start small and specific Don’t try to replace your entire network on day one. Pick one use case where P4 delivers clear value. Deploy capable hardware in targeted locations. Maybe SmartNICs for critical application servers, or ToR switches for specific traffic patterns, or edge routers needing custom traffic engineering. Pattern that works well: deploy P4 hardware in monitoring mode initially. SmartNICs and switches watch traffic and generate telemetry, but don’t affect forwarding. Operations teams build confidence with low risk. Then, gradually add forwarding logic and policy enforcement. Track success metrics such as latency percentiles, CPU offload, and incident mean time to resolution to justify expansion. ### Design for hybrid deployments Not everything needs programmable processing. Run P4 capable hardware for traffic requiring custom logic. Use conventional devices for high-volume standard traffic. Example: equip database servers with P4 SmartNICs that implement custom congestion control and security policies. Standard web servers use regular NICs. Machine learning training clusters get P4 switches with specialized flow handling. Standard office traffic uses regular switches. You get P4’s benefits precisely where they matter, while controlling cost and complexity. ### Think about control planes P4 programs implement the data plane. Something else has to populate those match action tables. That’s your control plane. Options include traditional routing protocols, SDN controllers, or custom applications. Many deployments use SDN controllers that translate high-level policies into table entries pushed to switches and SmartNICs. The controller understands topology and requirements. The P4 program executes forwarding efficiently. Separating concerns keeps complexity manageable. Standardizing on P4Runtime for table programming and using gNMI for device telemetry and configuration can simplify multi-vendor control plane design. ## Building the team skills ### It’s not just network engineers P4 programming needs hybrid expertise: deep protocol knowledge plus software development skills. Network engineers have to learn programming. Software developers have to learn networking internals. Training should cover P4 language basics, hardware architectures (both switches and SmartNICs), testing methods, and debugging. Hands-on labs with BMv2 and real hardware are essential. Budget 4 to 6 months for engineers to become productive. Early on, consider cross-functional teams: network architects who understand requirements paired with developers who write clean code. Over time, people develop both skillsets. ### Treat it like real software Use version control. Do code reviews. Run automated testing. Deploy in stages. 
One company’s workflow: develop in BMv2, test on lab hardware, deploy to staging environment, monitor for 48 hours, then production rollout to switches and SmartNICs. Keep rollback procedures ready. P4 programs update without hardware changes, but you need to reverse quickly if problems emerge. Blue-green deployments or canary strategies work well for P4 rollouts in production. ## Where this goes next Hardware support is expanding rapidly. More switch vendors and SmartNIC manufacturers are shipping P4-capable platforms. Tooling is maturing. We’ll see tighter integration with intent-based networking. High-level business policies automatically generate P4 programs deployed across the infrastructure. Machine learning will consume P4 telemetry from switches and SmartNICs to optimize traffic in real time. New protocols will emerge that assume P4’s flexibility instead of fighting hardware constraints. Server-side processing will increasingly leverage SmartNIC offload for network-intensive workloads. For network architects, the question isn’t whether to adopt P4. It’s when and how. Organizations building P4 capability now gain real competitive advantage: faster feature deployment, better visibility, stronger security, networks that adapt to business needs instead of constraining them. Yes, this requires investment. Hardware, skills, development processes. But the alternative means staying constrained by vendor roadmaps in an era where network agility increasingly determines business success. P4 offers a way out of those constraints, if you’re willing to rethink how network infrastructure works. The transition won’t be easy. Nothing this fundamental ever is. But the organizations making this shift now, deploying P4 on both switches and SmartNICs across their infrastructure, will help define what “modern networking” means for the next decade. The rest will spend that decade catching up. **This article is published as part of the Foundry Expert Contributor Network. Want to join?**
www.networkworld.com
December 13, 2025 at 3:15 AM
Aetherflux joins the race to launch orbital data centers by 2027
Aetherflux says it will launch its first solar-powered orbital data center satellite in the first quarter of 2027, joining SpaceX, Amazon, Google, and Starcloud in a race to move computing infrastructure off-planet as AI’s energy demands outpace terrestrial data center capacity. “Aetherflux’s first data center node for commercial use is targeted for Q1 2027; subsequent satellite launches will build a constellation of nodes to scale capacity,” Aetherflux said in an announcement of the project, dubbed “Galactic Brain.” The push reflects AI’s mounting pressure on data center infrastructure. Data center energy consumption is expected to double by 2030, according to World Economic Forum estimates. Goldman Sachs projects that power demand will be even higher, surging 160% by 2030. Yet fewer than one in ten CIOs have included orbital computing in their three-to-five-year roadmaps, even though more than six in ten cite power, land, and permitting as top constraints on AI infrastructure, said Sanchit Vir Gogia, chief analyst at Greyhound Research, citing the firm’s data. ## Company outlines technical approach Aetherflux’s first node will provide “multi-gigabit level bandwidth,” with availability comparable to terrestrial servers by leveraging optical inter-satellite links and emerging relay networks, a company spokesperson told _Network World_. “Our roadmap begins with the deployment of teraflop-class systems in 2027,” the spokesperson said. “We are designing the architecture to scale rapidly to petaflop-class constellations as we increase the number of deployed nodes.” Enterprises will connect to and manage orbital workloads “the same way they manage cloud workloads today,” using optical links, the spokesperson added. The company’s approach is to “continuously launch new hardware and quickly integrate the latest architectures,” with older systems running lower-priority tasks to serve out the full useful lifetime of their high-end GPUs. The company declined to disclose pricing. Aetherflux plans to launch about 30 satellites at a time on SpaceX Falcon 9 rockets. Before the data center launch, the company will launch a power-beaming demonstration satellite in 2026 to test transmission of one kilowatt of energy from orbit to ground stations, using infrared lasers. Competition in the sector has intensified in recent months. In November, Starcloud launched its Starcloud-1 satellite carrying an Nvidia H100 GPU, which is 100 times more powerful than any previous GPU flown in space, according to the company, and demonstrated running Google’s Gemma AI model in orbit. In the same month, Google announced Project Suncatcher, with a 2027 demonstration mission planned. ## Analysts see limited near-term applications Despite the competitive activity, orbital data centers won’t replace terrestrial cloud regions for general hosting through 2030, said Ashish Banerjee, senior principal analyst at Gartner. Instead, they suit specific workloads, including meeting data sovereignty requirements for jurisdictionally complex scenarios, offering disaster recovery immune to terrestrial risks, and providing asynchronous high-performance computing, he said. “Orbital centers are ideal for high-compute, low-I/O batch jobs,” Banerjee said. “Think molecular folding simulations for pharma, massive Monte Carlo financial simulations, or training specific AI model weights.
If the job takes 48 hours, the 500ms latency penalty of LEO is irrelevant.” One immediate application involves processing satellite-generated data in orbit, he said. Earth observation satellites using synthetic aperture radar generate roughly 10 gigabytes per second, but limited downlink bandwidth creates bottlenecks. Processing data in orbit and transmitting only results could reduce latency and communication costs, he said. Space-suitable workloads share three characteristics, Banerjee said: low data transfer requirements, latency tolerance, and high energy intensity. Gogia echoed the limited scope. “Core banking, e-commerce, ERP, collaboration, and most analytics will remain resolutely terrestrial,” he said. “Those systems are tightly coupled to user interaction, regulatory controls and existing data gravity.” ## Cost barriers remain substantial The notion that the cost of energy in space is almost zero “has merit, but is economically dangerous if taken in isolation,” Banerjee said. Terrestrial data centers have high operational expenses for power and cooling but moderate capital expenses. In orbit, energy operational costs approach zero but capital expenses are astronomical, he said. “You must build the power plant—solar arrays—and build the cooling tower—radiators—and launch all of it at thousands of dollars per kilogram,” Banerjee explained. Launch costs pose a major hurdle. Google’s research indicated costs must fall below $200 per kilogram by the mid-2030s for orbital data centers to match terrestrial facilities on cost. SpaceX’s Falcon 9 currently charges around $2,500 per kilogram. Hardware refresh compounds the challenge. On Earth, enterprises refresh hardware every three to five years. In space, hardware can’t be upgraded once launched, Banerjee said. “If Aetherflux launches H100 GPUs in 2027, by 2030 they’re obsolete artifacts,” he said. “Orbital providers must treat satellites as disposable—launching new clusters annually—which drastically inflates total cost of ownership.” Cooling adds another cost layer. In vacuum, cooling occurs only through radiation, requiring massive radiator panels. Every kilogram of radiator adds to launch costs, eroding savings from free solar energy, he said. Given these challenges, more than 70% of CIOs said they need at least a 30% to 40% total cost advantage before considering orbital options, Gogia said, citing Greyhound Research data. Despite the obstacles, market projections remain bullish. BIS Research projects the in-orbit data center market will reach $1.77 billion in 2029, and $39.09 billion by 2035. ## Timeline expectations differ sharply “The timelines being marketed are best read as ambitious markers, not planning anchors for CIOs,” Gogia said. “Realistic enterprise pilots for production-grade workloads are more likely toward the very end of this decade.” For network architects planning infrastructure, orbital computing should be treated as an ultra-remote specialist region loosely coupled to existing systems, he added, since response times remain in tens of milliseconds—acceptable for batch AI training but unsuitable for transaction systems. Banerjee framed the value proposition differently. “IT leaders must treat orbital compute not as a cost-savings play, but as a sustainability and availability play,” he said. “You’re not going to space to save money on your cloud bill. You’re going there to access 500 megawatts of green power that your state’s grid simply cannot give you.” For now, both analysts recommended caution. 
“For the next three to five years, orbital data centers should sit firmly in scenario planning, not as a dependency in core transformation programs,” Gogia said.
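To put the launch-cost figures quoted above in perspective, the back-of-the-envelope below uses only the numbers cited in this article (roughly $2,500 per kilogram on Falcon 9 today versus the sub-$200 per kilogram threshold Google's research identified) plus the 48-hour batch-job example; it is an illustration of the gap, not a cost model.

```python
# Back-of-the-envelope using only figures quoted in the article; illustrative only.
current_cost_per_kg = 2_500   # approximate Falcon 9 launch price, $/kg
target_cost_per_kg = 200      # threshold cited by Google's research, $/kg

gap_factor = current_cost_per_kg / target_cost_per_kg
reduction_needed = 1 - target_cost_per_kg / current_cost_per_kg
print(f"Launch costs must fall ~{gap_factor:.1f}x (a {reduction_needed:.0%} reduction).")
# -> Launch costs must fall ~12.5x (a 92% reduction).

# Why LEO latency barely matters for batch work: compare a 500 ms penalty
# to the 48-hour simulation example Banerjee gives.
job_seconds = 48 * 3600
latency_penalty_s = 0.5
print(f"The latency penalty is {latency_penalty_s / job_seconds:.1e} of the job's runtime.")
# -> about 2.9e-06, i.e. negligible
```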
www.networkworld.com
December 13, 2025 at 3:15 AM
Here’s what Oracle’s soaring infrastructure spend could mean for enterprises
Oracle’s aggressive AI-driven data center build-out has pushed its free cash flow from a modest deficit of $2 billion in the quarter ended August 31 to a staggering $10 billion shortfall in the quarter ended November 30, creating structural financial pressure that could translate into higher subscription costs and stricter contract terms for customers, analysts say. “Oracle customers face a clear and escalating risk of price increases because the company has entered a capital cycle where spending has significantly outpaced monetization,” said Sanchit Vir Gogia, CEO of Greyhound Research. The bigger deficit is not the product of temporary timing issues but the result of $12 billion of capital expenditure on data centers, GPU superclusters, sovereign cloud regions, specialized networking, and high-density cooling infrastructure, Gogia added. However, Oracle co-CEOs Clay Magouyrk and Mike Sicilia, along with other top executives on Wednesday’s quarterly earnings call with analysts, framed the free cash flow deficit not as a structural weakness but as a strategic investment phase, one they expect to pay dividends as cloud and infrastructure revenues scale. Oracle is not incurring expenses for new data centers until they are actually up and running, said principal financial officer Douglas Kehring, while Magouyrk said that the time period for a data center to start generating revenue after becoming operational is “not material.” “We’ve highly optimized a process… which means that the period of time where we’re incurring expenses without that kind of revenue and the gross margin profile that we talked about is really on the order of a couple of months… So a couple of months is not a long time,” Magouyrk said during the call. He said he had earlier told analysts in a separate call that margins for AI workloads in these data centers would be in the 30% to 40% range over the life of a customer contract. Kehring reassured analysts that there would be demand for the data centers when they were completed, pointing to Oracle’s remaining performance obligations, or services contracted but not yet delivered, which grew by $68 billion over the previous quarter, and noting that Oracle has been seeing unprecedented demand for AI workloads driven by the likes of Meta and Nvidia. ## Rising debt and margin risks raise flags for CIOs For analysts, though, the swelling debt load is hard to dismiss, even with Oracle’s attempts to de-risk its spend and squeeze more efficiency out of its buildouts. Gogia sees Oracle already under pressure, with the financial ecosystem around the company pricing in the risk of one of the largest debt loads in corporate history (crossing $100 billion even before this quarter’s capex spend), as evidenced by the rising cost of insuring the debt and the shift in credit outlook. “The combination of heavy capex, negative free cash flow, increasing financing cost and long-dated revenue commitments forms a structural pressure that will invariably find its way into the commercial posture of the vendor,” Gogia said, hinting at an “eventual” increase in pricing of the company’s offerings. He was equally unconvinced by Magouyrk’s assurances about the margin profile of AI workloads, as he believes that AI infrastructure, particularly GPU-heavy clusters, delivers significantly lower margins in the early years because utilization takes time to ramp. “These weaker early-year margins widen the gap between Oracle’s profitability model and the economic reality of its AI business.
To bridge this, vendors typically turn toward subscription uplifts, stricter renewal structures, more assertive minimum consumption terms and intensified enforcement of committed volumes,” Gogia said. HFS Research CEO Phil Fersht expects Oracle customers to have “tougher renewal discussions” if the company decides to increase pricing. “Oracle has one of the strongest enterprise lock-in positions in the industry,” Fersht said, adding that the company offers many core products that are hard to unwind. ## Make ready to leave CIOs should start acting even before Oracle makes the changes explicit, the analysts advised. Gogia sees developing architectural optionality as a critical step for CIOs, meaning that they should identify which Oracle workloads are genuinely immovable because of regulatory, operational or data gravity reasons, and which can be diversified or redesigned. “It is commercial leverage. A CIO who can genuinely demonstrate the technical feasibility of reducing dependency will experience an entirely different negotiation dynamic to one whose estate is structurally trapped,” Gogia said, adding that developing optionality is not the same as migration intent. The second safeguard, Gogia said, is locking in multi-year price protections that are explicit, measurable, and legally enforceable. “This protection must be written at the unit level, not in blended percentage terms that can be reinterpreted during renewal,” Gogia said. “Ambiguity is a risk factor that customers cannot afford.” Fersht cautioned that CIOs should be wary of Oracle trying to bundle services such as database automation and AI, as “every large tech vendor gravitates toward higher-margin and higher-control services” as margins slip. Gogia, too, sees this as a threat and advised CIOs to demand complete separation between AI infrastructure pricing and core cloud or database services. ## Is there a silver lining? Despite the risk of price rises, there might be a strategic upside for CIOs, especially if they can use time to their advantage. “Oracle’s need to demonstrate utilization and revenue conversion over the next several quarters creates windows of disproportionate buyer leverage,” Gogia said, adding that CIOs who come to the table now can secure far more favorable economic outcomes than those who wait until Oracle’s cash flow stabilizes and its bargaining power returns. He also sees this as an opportunity for enterprises to reshape the governance of their Oracle estates. “CIOs can use this moment to renegotiate the terms that have historically disadvantaged them, such as restrictive lock-in conditions, aggressive audit rights and opaque consumption commitments,” Gogia concluded. This article first appeared on CIO.com.
www.networkworld.com
December 11, 2025 at 9:23 PM
New Nvidia software gives data centers deeper visibility into GPU thermals and reliability
Nvidia has released new open-source software that gives data center operators deeper visibility into the thermal and overall health of its AI GPUs, aiming to help enterprises manage heat and reliability challenges as power-hungry accelerators push cooling systems to their limits. The update arrives as the industry weighs the growing impact of thermal stress on the lifespan and performance of modern AI hardware, making granular telemetry an increasingly important part of large-scale infrastructure planning. The new software gives operators a dashboard to monitor power use, utilization, memory bandwidth, airflow issues, and other key indicators across entire GPU fleets, helping them spot bottlenecks and reliability risks earlier. “The offering is an opt-in, customer-installed service that monitors GPU usage, configuration and errors,” Nvidia said in a statement. “It will include an open-source client software agent — part of Nvidia’s ongoing support of open, transparent software that helps customers get the most from their GPU-powered systems.” The importance of such monitoring is underscored by a recent report from Princeton University’s Center for Information Technology Policy, which warns that high thermal and electrical stress can cut the usable lifespan of AI chips to one or two years, much shorter than the broader one-to-three-year range often assumed. Nvidia emphasized that the service provides read-only telemetry that customers control, and that its GPUs do not include any hardware tracking features, kill switches, or backdoors. ## Addressing the challenge Modern AI accelerators now draw more than 700W per GPU, and multi-GPU nodes can reach 6kW, creating concentrated heat zones, rapid power swings, and a higher risk of interconnect degradation in dense racks, according to Manish Rawat, semiconductor analyst at TechInsights. Traditional cooling methods and static power planning increasingly struggle to keep pace with these loads. “Rich vendor telemetry covering real-time power draw, bandwidth behavior, interconnect health, and airflow patterns shifts operators from reactive monitoring to proactive design,” Rawat said. “It enables thermally aware workload placement, faster adoption of liquid or hybrid cooling, and smarter network layouts that reduce heat-dense traffic clusters.” Rawat added that the software’s fleet-level configuration insights can also help operators catch silent errors caused by mismatched firmware or driver versions. This can improve training reproducibility and strengthen overall fleet stability. “Real-time error and interconnect health data also significantly accelerates root-cause analysis, reducing MTTR and minimizing cluster fragmentation,” Rawat said. These operational pressures can shape budget decisions and infrastructure strategy at the enterprise level. ## Enterprise impact Analysts say tools like Nvidia’s can play a growing role as AI reshapes the economics and operating models of modern data centers. “Modern AI is a power-hungry and heat-emitting beast, disrupting the very economics and operational principles of data centers,” said Naresh Singh, senior director analyst at Gartner. “Enterprises need monitoring and management tools and practices to ensure things do not get out of hand, while also enabling greater agility and dynamism in operating data centers. There is no escape here; this will become mandatory in the coming years.” He added that better fleet-level visibility is becoming essential for justifying rising AI infrastructure budgets. 
“Such tools are critical for optimizing the very high datacenter and infrastructure capex, and opex outlays planned for the next few years,” Singh said. “As the value and the practical organizational use of AI come under scrutiny, such high investments need to be backed by effective utilization, with every dollar and watt being accounted for in terms of effective tokens served.”
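As a concrete illustration of the kind of read-only, per-GPU signals a fleet dashboard like the one described above would aggregate, here is a minimal sketch using the separately available NVML Python bindings (the pynvml package). It is not the new Nvidia monitoring service itself, just an assumption-laden example of polling power, temperature, and utilization on a single node.

```python
# Minimal sketch of per-node GPU telemetry via NVML's Python bindings (pynvml).
# Illustrative only; this is not the Nvidia fleet-monitoring service described
# above, but it reads the same kind of read-only health signals locally.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # reported in milliwatts
        temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)        # .gpu and .memory are %
        print(f"GPU {i} ({name}): {power_w:.0f} W, {temp_c} C, "
              f"{util.gpu}% compute, {util.memory}% memory")
finally:
    pynvml.nvmlShutdown()
```

In a fleet deployment, a lightweight agent would export samples like these to a central time-series store so operators can correlate thermal trends, error counts, and utilization across racks, which is the visibility the analysts quoted above are pointing to.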
www.networkworld.com
December 11, 2025 at 2:03 PM
Arista goes big with campus wireless tech
Arista Networks is making moves in campus mobility, announcing architectural updates designed to help enterprises build large-scale wireless networks. Arista’s new Virtual Ethernet Segment with Proxy ARP (VESPA) for WLAN mobility will let enterprise customers design massive Wi-Fi roaming domain networks with more than 500,000 multivendor clients and 30,000 Arista wireless access points. VESPA is part of Arista’s Cognitive Campus/Cognitive Wi-Fi package, which is based on the cloud-based wireless technology it bought from Mojo Networks in 2018. That technology bypasses the traditional on-premises Wi-Fi controller used to manage APs and instead lets the wireless control and management plane run in the cloud, offering scalability and flexibility, according to Sriram Venkiteswaran, Arista’s director of product management. It also provides the ability to run unified wired and wireless network management, in this case via Arista’s CloudVision management package, Venkiteswaran said. Arista VESPA uses data center-style technologies to solve two big Wi-Fi problems, Venkiteswaran said. The first is scale. With a network for a large university, or a large campus with multiple buildings, “one of the biggest challenges with the legacy architecture is that there is no concept of a seamless, roaming domain across the entire campus,” he said. “So the current architecture would require the network to be broken down to multiple mobility domains because of how it’s architected, with the limitations of traditional controller-based wireless architecture.” The second problem is resiliency. “When you deploy these wireless clients and controllers, you have this concept of failover, and when there is a failover, it takes an order of minutes, or you have an outage. And for customers that are running mission-critical or business-critical applications, any downtime can be massively disrupting,” Venkiteswaran said. With VESPA, Arista has applied high-scale, data-center principles like Virtual Extensible LAN (VXLAN) and Ethernet VPN (EVPN) to wireless campus networks to enable a single, massive roaming domain supporting over 500,000 clients with high resiliency and fast failover, Venkiteswaran said. The system also supports zero-touch provisioning that lets customers rapidly activate and configure access points after connecting to the cloud. In a white paper describing how VESPA works, Arista wrote: > The first component of VESPA involves Arista access points creating VXLAN tunnels to Arista switches serving as WLAN Gateways…. Second, as device packets arrive via the AP, it dynamically creates an Ethernet Segment Identifier (Type 6 ESI) based on the AP’s VTEP IP address. These dynamically created tunnels can scale to 30K ESI’s spread across paired switches in the cluster which provide active/active load sharing (performance+HA) to the APs. Third, the gateway switches use Type 2 EVPN NLRI (Network Layer Reachability Information) to learn and exchange end point MAC addresses across the cluster. … With this architecture, adding more EVPN WLAN gateways scales both AP and user connections, to tens of thousands of end points. > > To manage the forwarding information for hundreds of thousands of clients (e.g: FIB next hop and rewrite) would prove very complex and expensive if using conventional networking solutions. Arista’s innovation is to distribute this function across the WiFi access points with a unique MAC Rewrite Offload feature (MRO).
With MRO, the access point is responsible for servicing mobile client ARP requests (using its own MAC address), building a localized MAC-IP binding table, and forwarding client IP addresses to the WLAN gateways with the AP’s MAC address. The WLAN gateways therefore learn only one MAC address for all the clients associated with the AP. This improves the gateway’s scaling by 10X to 100X, allowing these cost-effective gateways to support hundreds of thousands of clients attached to the APs. ### AVA system gets a boost In addition to the new wireless technology, Arista is also bolstering the capabilities of its natural-language, generative AI-based Autonomous Virtual Assist (AVA) system for delivering network insights and AIOps. AVA is aimed at providing an intelligent assistant that’s not there to replace people but rather help them do their job better, said Jeff Raymond, vice president, EOS product management and services at Arista. “AVA is a chat interface into our data lake. But instead of just providing you easy search responses, like, ‘tell me where this user is sitting,’ the chat interface uses an LLM on the back end to rationalize and reason with the telemetry information and get the end user to something that’s much more precise in terms of troubleshooting, in terms of being able to just find an issue more quickly, or be able to even prevent an issue,” Raymond said. The LLM component is new and has been in product preview until now. With the AVA insights component, “we’re looking at being able to use our best practices, our technical expertise, but codified,” he said. Then we’re able “to run those as agents, so that we can look for problems in a network and be able to proactively alert a user before they happen.” Arista has also added the ability to handle multi-domain event correlation across wired, wireless, data center, and security to isolate a root cause of a problem. AVA can then perform continuous monitoring and automated root cause analysis for proactive issue identification, Raymond said. AVA will also add support for new agents and components in the network, such as VESPA and the VeloCloud SD-WAN technology the company recently purchased from Broadcom, Raymond said. ### Ruggedized switches on tap Lastly, Arista rolled out its first ruggedized switches targeting industrial networking edge applications. The switches include a 20-port DIN Rail switch used in industrial control panels and a 1RU 24-port switch. They support multi-gigabit ports and high-power PoE (90W). Both are engineered for harsh industrial conditions and can tolerate extreme temperatures, vibrations, shocks, and more demanding physical conditions than standard enterprise switches, Arista stated. VESPA and the ruggedized switches are expected to be available by the first quarter of 2026.
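To make the MAC Rewrite Offload idea above more tangible, here is a toy Python sketch (entirely hypothetical numbers and data structures, not Arista code) showing why a gateway that learns one MAC address per access point, rather than one per client, holds far less forwarding state.

```python
# Toy illustration of the MAC Rewrite Offload (MRO) idea described above.
# Hypothetical figures only; this is not Arista software or real scaling data.

APS = 30_000            # access points in the roaming domain (figure from the article)
CLIENTS_PER_AP = 17     # assumed average, giving roughly 500,000 clients overall

# Without MRO: the WLAN gateway would need a forwarding entry per client MAC.
entries_without_mro = APS * CLIENTS_PER_AP

# With MRO: each AP answers client ARPs itself, keeps the local MAC-IP binding
# table, and presents only its own MAC to the gateway, so the gateway holds
# one entry per AP regardless of how many clients roam onto it.
entries_with_mro = APS

print(f"Gateway MAC entries without MRO: {entries_without_mro:,}")            # 510,000
print(f"Gateway MAC entries with MRO:    {entries_with_mro:,}")               # 30,000
print(f"State reduction: {entries_without_mro / entries_with_mro:.0f}x")      # 17x at this density
```

The reduction factor scales with client density per AP, which is consistent with the 10X-to-100X range the white paper claims.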
www.networkworld.com
December 11, 2025 at 2:03 PM
Cybersecurity skills matter more than headcount in an AI era: ISC2 study
Cybersecurity teams are navigating a shift as skills shortages overtake headcount as the primary concern, according to ISC2’s 2025 Cybersecurity Workforce Study. The research, based on responses from 16,029 cybersecurity professionals globally, reveals that while budget cuts and layoffs have leveled off after last year’s surge, the pressure on security teams has intensified. ISC2, a nonprofit member organization for cybersecurity professionals, found that cybersecurity workforce budget limitations remain a key driver of staff shortages, with 33% of respondents stating that their organizations do not have enough resources to “adequately” staff their teams. Another 29% of respondents said they cannot afford to hire staff with the skills they need to “adequately secure their organizations,” this year’s study found. And nearly three-fourths (72%) of respondents said that they believe reducing security personnel “significantly increases the risk of a breach in their organizations,” according to ISC2. Economic conditions affecting cybersecurity budgets showed signs of stabilizing in 2025, according to ISC2, with reports of budget cuts dropping to 36% (down one percentage point from 2024) and layoffs declining to 24% (also down one point). Still, underlying workforce challenges remain. “Based on what we’re seeing in the data and the sentiment of cybersecurity professionals globally, there is no indication that budget cuts or layoffs will accelerate significantly in 2026,” says Casey Marks, chief operating officer at ISC2. “Economic conditions will always play an important role in workforce development and enablement. However, the overall outlook does not suggest a worsening trend in 2026.” ## Skills gaps drive security consequences The study highlights a critical trend: Nearly 90% of respondents (88%) have experienced at least one significant cybersecurity event in their organizations due to skills shortages, with 69% reporting more than one event. The severity of skills needs has grown substantially, with 95% of respondents reporting at least one skill need (up 5% from 2024) and 59% citing critical or significant skills gaps (a 15% increase from the previous year). “A shift is happening. This year’s data makes it clear that the most pressing concern for cybersecurity teams isn’t headcount but skills,” said Debra Taylor, ISC2 acting CEO and CFO, in a statement. “Skills deficits raise cybersecurity risk levels and challenge business resilience.” Organizations have experienced oversights in cybersecurity processes and procedures (26%), been forced to put underqualified or inexperienced people into roles to cover them (25%), lacked the time or resources to train cybersecurity staff (25%), and dealt with misconfigured systems (24%), according to this year’s study. “Another commonly cited (24%) outcome of skills shortages is that parts of the organization are left under-secured and staff are unable to take advantage of emerging cybersecurity technologies (24% each),” the report states. While the study doesn’t tie security consequences to specific technical domains, the number of consequences shows how capability development has become more critical than simply adding headcount, Marks says. “AI and cloud security continue to stand out as the most urgent skills needs from both hiring managers and cybersecurity professionals. Nearly everyone in the study reports at least one skills need, and most report significant ones,” Marks says.
“That tells us capability development has become more critical than simply adding headcount.” ## AI adoption accelerates The research found that AI adoption is accelerating quickly, with 28% of respondents reporting that they have already integrated AI tools into their operations and 69% involved in some level of adoption, through integration, active testing, or early evaluation. “What stands out is how fast AI has moved from experimentation into day-to-day operations. More than two-thirds of respondents are already using, testing, or actively evaluating AI tools in their security programs,” Marks explains. “For those who are using them today, the majority are already seeing measurable productivity gains. That tells us that AI is quickly becoming a practical part of how security work gets done, not a future concept.” The study shows that cybersecurity professionals view AI technology as a career accelerator. It found that 73% believe AI will create more specialized cybersecurity skills, 72% say it will necessitate more strategic cybersecurity mindsets, and 66% said they believe it will require broader skillsets across the workforce. AI remains one of the top skills needed for the second consecutive year, with 41% of respondents to the 2025 study citing it as a critical skill, followed by cloud security at 36%. According to the report, 48% of respondents are already working to gain generalized AI knowledge and skills, while “35% are educating themselves on AI solutions at risk to better understand vulnerabilities and exploits.” “The use of AI tools and the perception that AI will be a career-booster in the cybersecurity industry are prompting professionals to take proactive steps to develop and expand their knowledge and skill base to future-proof their careers,” Marks says. “They see it as a driver of new and more specialized skills, more strategic responsibilities, and broader career pathways.” ## High cybersecurity job satisfaction The research found that 87% believe there will always be a need for cybersecurity professionals, 81% are confident the profession will remain strong, and 68% are satisfied in their current job (up two percentage points from 2024). Another 80% report feeling passionate about their work. “While satisfaction with organizations and leadership varies, confidence in the profession itself remains high, and that sense of purpose is a powerful stabilizing force. Cybersecurity is a mission-driven field, and 80% reported feeling passionate about their work, while 71% are satisfied with their day-to-day experience. A large majority believe the profession will remain essential in the long term and will continue to feel passionate about their role,” Marks says. Almost half (48%) of respondents feel exhausted from trying to stay current on the latest cybersecurity threats and emerging technologies, and 47% feel overwhelmed by workload, according to the study. ISC2’s findings suggest that sustained investment in skills development (especially related to AI), realistic workload expectations, and support for continuous learning during working hours are essential. The study also found that career development is important to cybersecurity professionals. Nearly one-third (31%) of respondents said they consider advancement opportunities critical, and 23% cited unplanned financial or benefit rewards as key drivers. According to the 2025 study, 75% are likely to stay at their current organization for the next year, but that number drops to 66% when considering the next two years.
The study’s findings show that organizations must rethink their approach to cybersecurity workforce development, according to ISC2’s Marks. “The data shows tremendous energy at the individual level around AI upskilling. Nearly half of respondents are already building AI skills on their own, and many plan to pursue AI-focused qualifications,” Marks says. “Organizations are investing in development through training budgets, internal education and cross-training, but the scale of demand for AI skills is significant. Our research shows widespread individual and organizational investment in AI upskilling, with demand continuing to grow.”
www.networkworld.com
December 11, 2025 at 2:05 PM
Most significant networking acquisitions of 2025
This year is shaping up to be an active one for mergers and acquisitions. Goldman Sachs says 2025 is on pace to become the second-biggest year in history for announced M&As, Reuters reports. In the networking arena, some of the biggest deals of 2025 were a long time coming — it took more than 18 months for HPE to finally close the Juniper Networks deal, for example. Some come with blockbuster price tags (like Palo Alto Networks’ $25 billion CyberArk buy), while others are less costly but still impactful. Many of the deals revolve around AI capabilities and enabling vendors to develop more robust systems for accessing and securing distributed resources. Here are more than a dozen of this year’s acquisitions, organized alphabetically by acquirer, that will help shape the future of enterprise networking. #### Akamai acquires Fermyon This month Akamai announced plans to acquire WebAssembly startup Fermyon for an undisclosed sum as the company looks to expand its own edge capabilities. Fermyon helped to develop Wasm beyond its browser foundation for server-side and edge deployments. The deal could bring more users to Wasm, which is gaining momentum as the WebAssembly System Interface (WASI) specification nears standardization. #### Arista buys VeloCloud Arista Networks acquired Broadcom’s VeloCloud SD-WAN business in July for an undisclosed sum. For Arista, the SD-WAN buy fills one of the few networking gaps the company had and boosts its SD-WAN, SASE and branch networking plans. And those plans are big: CEO Jayshree Ullal said Arista’s campus and WAN business is expected to grow from the current $750 million to $1.25 billion by the end of 2026. #### AT&T buys Lumen AT&T in May announced plans to acquire Lumen’s Mass Markets fiber business. This deal, worth $5.75 billion, is an example of how important fiber optic technology is to carriers, particularly as they look to handle the expected traffic increases spurred by AI. The Lumen Mass Markets fiber assets included in the deal total about 1 million fiber subscribers across more than 4 million fiber locations, according to AT&T. #### Cisco makes two AI deals: EzDubs and NeuralFabric Last month Cisco completed its acquisition of EzDubs, a privately held AI software company with speech-to-speech translation technology. EzDubs translates conversations across 31 languages and will accelerate Cisco’s delivery of next-generation features, such as live voice translation that preserves the characteristics of speech, the vendor stated. Cisco plans to incorporate EzDubs’ technology in its Cisco Collaboration portfolio. Also in November, Cisco bought AI platform company NeuralFabric, which offers a generative AI platform that lets organizations develop domain-specific small language models using their own proprietary data. #### CoreWeave buys Core Scientific Nvidia-backed AI cloud provider CoreWeave acquired crypto miner Core Scientific for about $9 billion, giving it access to 1.3 gigawatts of contracted power to support growing demand for AI and high-performance computing workloads. CoreWeave said the deal augments its vertical integration by expanding its owned and operated data center footprint, allowing it to scale GPU-powered services for enterprise and research customers. #### F5 picks up three: CalypsoAI, Fletch and MantisNet F5 acquired Dublin, Ireland-based CalypsoAI for $180 million. CalypsoAI’s platform creates what the company calls an Inference Perimeter that provides protection across models, vendors, and environments.
F5 says it will integrate CalypsoAI’s adaptive AI security capabilities into its F5 Application Delivery and Security Platform (ADSP). F5’s ADSP also stands to gain from the company’s acquisition of agentic AI and threat management startup Fletch. Fletch’s technology turns external threat intelligence and internal logs into real-time, prioritized insights; its agentic AI capabilities will be integrated into ADSP, according to F5. Lastly, F5 grabbed startup MantisNet to enhance cloud-native observability in F5’s ADSP. MantisNet leverages extended Berkeley Packet Filter (eBPF)-powered, kernel-level telemetry to provide real-time insights into encrypted protocol activity and allow organizations “to gain visibility into even the most elusive traffic, all without performance overhead,” according to an F5 blog post. #### HPE makes it official with Juniper Finalized in July, this $13.4 billion deal basically doubled HPE’s networking business while bolstering its AI technologies. The transaction set the stage for offering a combined portfolio spanning enterprise campus, data center, service provider, and cloud networking segments, according to the Futurum Group. “The deal creates opportunities for integrated network security offerings spanning firewall, service edge, and zero-trust architectures,” the analyst firm wrote after the close of the deal. “The combined entity will compete in both ‘AI for networks’ and ‘networks for AI’ market opportunities.” #### IBM finalizes HashiCorp deal IBM’s $6.4 billion buy of HashiCorp, finalized in February, will infuse HashiCorp automation and security technology in every data center possible, Big Blue said. IBM plans to integrate HashiCorp’s automation technology into its Red Hat, watsonx, data security, IT automation, and consulting businesses. HashiCorp’s products include its flagship Terraform package, which lets customers provision infrastructure, network, and virtual components across multiple cloud providers and on-premises environments. #### Netgear acquires Exium In June 2025, Netgear acquired the privately held security vendor Exium to expand its SASE offerings. Known as a networking hardware vendor for consumers, Netgear is increasingly focused on delivering enterprise-grade security solutions for SMEs. “What I see as an opportunity, uniquely for Netgear, given what our roots are, is to address the needs of small and medium enterprise customers,” Pramod Badjate, president and general manager of Netgear for Business, told _Network World_. “They have a unique need where they want the same level of reliability as a large enterprise expects [and] they also expect support.” #### Nokia purchases Infinera This $2.3 billion deal brought Nokia a ton more optical and dense wavelength-division multiplexing (DWDM) technology that it will use to bolster its hyperscaler and carrier-class offerings. #### Palo Alto Networks grabs CyberArk Announced in July, this $25 billion deal gives Palo Alto a significant boost for its network access and identity management portfolio. “Palo Alto is positioning this acquisition as the ultimate leap toward securing machine and agent identities – one of the hottest frontiers in the rapidly emerging era of AI-driven threats,” the Everest Group wrote in a blog post about the acquisition.
“It signals a seismic shift in how the biggest cybersecurity providers hope to position themselves as indispensable partners for the AI-powered enterprise.” #### Qualcomm takes Alphawave Semi Looking to expand its data center networking and compute offerings, Qualcomm grabbed British hardware maker Alphawave Semi for $2.4 billion. Alphawave Semi has a variety of wired connectivity and compute technologies, including custom silicon, chiplets, ASICs, and semiconductor intellectual property. The goal is to pair Qualcomm processors with Alphawave’s high-speed connectivity and compute technologies to support increasingly intense AI workloads. “If you wanted a super strong indicator that Qualcomm was serious about playing in the datacenter CPU market, this is it,” said Matt Kimball, vice president and principal analyst at Moor Insights & Strategy, when the deal was announced in June.
www.networkworld.com
December 11, 2025 at 2:04 PM