runc container breakouts via procfs writes: CVE-2025-31133, CVE-2025-52565, and CVE-2025-52881

Summary: There are three high-severity vulnerabilities in runc. As maintainers of the OCI runtime, we strongly recommend updating.

| NOTE: This advisory was sent to security-announce@opencontainers.org
| on 2025-10-16. If you ship any Open Container Initiative software, we
| highly recommend that you subscribe to our security-announce list in
| order to receive more timely disclosures of future security issues.
| The procedure for subscribing to security-announce is outlined here:
| https://github.com/opencontainers/.github/blob/main/SECURITY.md#disclosure-distribution-list

Hello,

This is a notification to vendors that use or ship runc about THREE (3)
high-severity vulnerabilities (CVE-2025-31133, CVE-2025-52565, and
CVE-2025-52881). All three vulnerabilities ultimately allow (through
different methods) for full container breakouts by bypassing runc's
restrictions for writing to arbitrary /proc files.

Today we have released the following runc versions, which include more
than 20 patches to resolve these issues:

We strongly recommend you update as soon as possible. For your own
reference I have attached a tarball of the patches (which apply cleanly
on top of runc v1.2.7, v1.3.2 and v1.4.0-rc.2).

Unfortunately, the patches are quite large, as they required a lot of
development work in github.com/cyphar/filepath-securejoin along with
quite deep changes to runc. I would recommend just going with the
released versions.

Note that these patches have not been split into per-CVE patches, as the
resolutions for each issue overlap and so some patches help resolve more
than one CVE on the list. We strongly recommend simply applying all of
the provided patches (we have included a squashed single-patch version
for your convenience -- see v1.[234].patch).

| NOTE:
| Some vendors were given a pre-release version of this release.
| These public releases include two extra patches to fix regressions
| discovered very late during the embargo period and were thus not
| included in the pre-release versions. Please update to this version.
| The above tarball includes these extra patches as well.

/ Vulnerabilities /

Below is a break-down of the key points of each issue. Once these
vulnerabilities are made public on the embargo date, the linked advisory
pages will contain some more information about the issues.

Please note that while these issues are generally related, the available
mitigations (if any) vary from issue to issue. However, all of these
attacks rely on starting containers with custom mount configurations --
if you do not run untrusted container images from unknown or unverified
sources then these attacks would not be possible to exploit. Note that
Dockerfiles support custom mount configurations (with RUN --mount=...)
and so these issues are also exploitable from Dockerfiles.

Also please note that the below CVSS scores are based on the threat
model from runc's point of view. If you were to analyse the same
vulnerability from the perspective of network-enabled systems like
Docker or Kubernetes you would likely end up with a much higher
severity.

/ CVE-2025-31133 /

"container escape via 'masked path' abuse due to mount race conditions"

CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:A/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H (7.3)

https://github.com/opencontainers/runc/security/advisories/GHSA-9493-h29p-rfm2

CVE-2025-31133 exploits an issue with how masked paths are implemented
in runc. When masking files, runc will bind-mount the container's
/dev/null inode on top of the file. However, if an attacker can replace
/dev/null with a symlink to some other procfs file, runc will instead
bind-mount the symlink target read-write. This issue affects all known
runc versions.

This stage happens after pivot_root(2) and so cannot be used to
bind-mount host files directly. However, an attacker can still target
paths like /proc/sys/kernel/core_pattern, which can be used to break out
of a container entirely (coredump helpers are spawned as upcalls, which
are not namespaced and have full host privileges). /proc/sysrq-trigger
can also be used by an attacker to cause the host system to crash or
halt. (This is "Attack 1".)
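Attack 1 works because the mask source is resolved by path at mount time. The Python sketch below shows one kind of check that closes that window, assuming Linux. It opens the candidate path once without following a trailing symlink. It then confirms, through the resulting file descriptor, that the object really is the 1:3 character device. Only after that would a caller use the descriptor, rather than the path, for the mount. The helper name open_verified_devnull is hypothetical. This is an illustration of the idea, not runc's Go implementation. It also does not protect intermediate path components.

```python
import os
import stat

# On Linux, /dev/null is the character device with major 1, minor 3.
DEVNULL_MAJOR, DEVNULL_MINOR = 1, 3


def open_verified_devnull(path="/dev/null"):
    """Return an fd for path only if it is the real /dev/null device, else None.

    O_NOFOLLOW stops a trailing symlink from being followed. Combined with
    O_PATH, the kernel returns an fd for the symlink itself, which then fails
    the S_ISCHR test below. Intermediate path components are NOT protected.
    """
    flags = getattr(os, "O_PATH", os.O_RDONLY) | os.O_NOFOLLOW | os.O_CLOEXEC
    try:
        fd = os.open(path, flags)
    except OSError:
        return None  # e.g. ENOENT (deleted) or ELOOP (symlink, no O_PATH)
    st = os.fstat(fd)  # inspect what was actually opened, not the path again
    if (stat.S_ISCHR(st.st_mode)
            and os.major(st.st_rdev) == DEVNULL_MAJOR
            and os.minor(st.st_rdev) == DEVNULL_MINOR):
        return fd
    os.close(fd)
    return None
```

A caller would treat None as a hard error and abort, rather than silently skipping the mask.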

While developing a fix for this issue, we also discovered that if the
attacker instead deleted /dev/null, runc would purposefully ignore the
error and thus make maskedPath a no-op. This is slightly less serious,
but it would permit some information disclosure through masked files
like /proc/kcore and /proc/timer_list. (This is "Attack 2".)
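The root cause of Attack 2 is that a single "ignore ENOENT" rule covered two different situations. If the masked path is missing, skipping it is harmless. If the mask source is missing, the operation must not continue. The Python sketch below keeps the two cases apart. mask_paths and the injected bind_mount callable are hypothetical stand-ins for the real mount(2) call, so the logic can be exercised without privileges. The lstat-based source check is itself path-based and racy. A real fix would validate and mount through a file descriptor.

```python
import os
import stat


def mask_paths(paths, bind_mount, source="/dev/null"):
    """Bind-mount source over every existing entry in paths, failing closed.

    bind_mount(src, dst) is a caller-supplied stand-in for mount(2) with
    MS_BIND. Returns the list of paths that were masked.
    """
    # A missing or substituted mask source must abort the whole operation,
    # instead of surfacing later as an ENOENT that looks like "nothing to mask".
    try:
        st = os.lstat(source)
    except FileNotFoundError:
        raise RuntimeError(f"mask source {source!r} is missing") from None
    if not stat.S_ISCHR(st.st_mode):
        raise RuntimeError(f"mask source {source!r} is not a character device")

    masked = []
    for path in paths:
        if not os.path.lexists(path):
            continue  # the file to hide does not exist, so skipping is safe
        bind_mount(source, path)  # any mount error propagates to the caller
        masked.append(path)
    return masked
```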

Potential mitigations for this issue include:

  • Using user namespaces, with the host root user not mapped into the
    container's namespace. procfs file permissions are managed using Unix
    DAC and thus user namespaces stop a container process from being able
    to write to them.

  • Not running as a root user in the container (this includes disabling
    setuid binaries with noNewPrivileges). As above, procfs file
    permissions are managed using Unix DAC and thus non-root users cannot
    write to them.

  • Depending on the maskedPath configuration (the default configuration
    only masks paths in /proc and /sys), using an AppArmor profile that blocks
    unexpected writes to any maskedPaths (as is the case with the default
    profile used by Docker and Podman) will block attempts to exploit
    this issue. However, CVE-2025-52881 allows an attacker to bypass LSM
    labels, and so this mitigation is not helpful when considered in
    combination with CVE-2025-52881.

  • Based on our analysis, SELinux will NOT help mitigate this issue --
    the /dev/null bind-mount used for maskedPaths gets re-labeled to the
    container context and thus the container will have access to them.
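The Python sketch below gives a rough way to check whether a parsed OCI config.json already uses the first two mitigations. It reads the runtime-spec fields linux.namespaces, linux.uidMappings (containerID/hostID/size), process.user.uid and process.noNewPrivileges. The function name and the warning strings are hypothetical. The check covers only these fields, not LSM settings. The mitigations are alternatives, so a configuration does not need to clear every warning.

```python
def audit_oci_config(config):
    """Return mitigation warnings for a parsed OCI config.json (a dict)."""
    warnings = []
    linux = config.get("linux", {})
    process = config.get("process", {})

    namespaces = {ns.get("type") for ns in linux.get("namespaces", [])}
    if "user" not in namespaces:
        warnings.append("no user namespace")
    else:
        for m in linux.get("uidMappings", []):
            # A mapping covers host IDs in [hostID, hostID + size).
            if m["hostID"] <= 0 < m["hostID"] + m["size"]:
                warnings.append("host root (uid 0) is mapped into the container")
                break

    # process.user.uid is the uid inside the container; 0 means "root there".
    if process.get("user", {}).get("uid", 0) == 0:
        warnings.append("container process runs as uid 0")
    if not process.get("noNewPrivileges", False):
        warnings.append("noNewPrivileges is not set")
    return warnings
```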

Thanks to Lei Wang (@ssst0n3 from Huawei) for finding and reporting the
original vulnerability (Attack 1), and Li Fubang (@lifubang from
acmcoder.com, CIIC) for discovering another attack vector (Attack 2)
based on @ssst0n3's initial findings.

/ CVE-2025-52565 /

"container escape with malicious config due to /dev/console mount and related races"

CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:A/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H (7.3)

https://github.com/opencontainers/runc/security/advisories/GHSA-qw9x-cqr3-wc7r

CVE-2025-52565 is very similar in concept and application to
CVE-2025-31133, except that it exploits a flaw in /dev/console
bind-mounts. When creating the /dev/console bind-mount (to /dev/pts/$n),
if an attacker replaces /dev/pts/$n with a symlink then runc will
bind-mount the symlink target over /dev/console. This issue affects all
versions of runc >= 1.0.0-rc3.

As with CVE-2025-31133, this happens after pivot_root(2) and so cannot
be used to bind-mount host files directly, but an attacker can trick
runc into creating a read-write bind-mount of
/proc/sys/kernel/core_pattern or /proc/sysrq-trigger, leading to a
complete container breakout (as with CVE-2025-31133).
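This race has the same shape as the masked-path one. The pty is already open as a file descriptor, but the bind-mount source is re-resolved from the path /dev/pts/$n. The Python sketch below validates the descriptor itself, assuming Linux, where Unix98 pty slaves use character-device majors 136 to 143. The helper is_pty_slave_fd is hypothetical and not runc's code.

```python
import os
import stat

# Linux allocates Unix98 pty slaves (/dev/pts/N) on majors 136..143.
PTY_SLAVE_MAJORS = range(136, 144)


def is_pty_slave_fd(fd):
    """True if fd refers to a Unix98 pty slave character device."""
    st = os.fstat(fd)  # examine the open file, not whatever the path names now
    return stat.S_ISCHR(st.st_mode) and os.major(st.st_rdev) in PTY_SLAVE_MAJORS
```

The name that os.ttyname(slave) reports for such a descriptor should not be re-opened later. Re-resolving that name is the step an attacker can race.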

While developing a fix for this issue, we also found some potentially
concerning issues with os.Create usage (which may have allowed for host
files to be truncated by an attacker) -- though we deemed these issues
to not be exploitable, we have provided fixes for them. In addition,
some previously known issues with /dev/pts/$n race conditions were
re-analysed and we have included mitigations for them too (even though
we still feel these are mostly hypothetical issues).
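For context on the os.Create concern: Go's os.Create opens with O_RDWR|O_CREATE|O_TRUNC and follows symlinks. If a symlink can be planted at the target path, the file it points to is truncated. The Python sketch below shows a safer pattern for the case where the caller wants a brand-new file. It uses O_CREAT|O_EXCL, which makes open(2) fail on any existing entry. Symlinks are never followed in that mode, even dangling ones. The helper name create_new_file is hypothetical.

```python
import os


def create_new_file(path, data, mode=0o644):
    """Create path and write data, refusing to reuse any existing entry.

    O_EXCL together with O_CREAT fails with EEXIST if path already exists,
    including when it is a symlink, so no existing file is ever truncated.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_CLOEXEC, mode)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
```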

Potential mitigations for this issue include:

  • Using user namespaces, with the host root user not mapped into the
    container's namespace. procfs file permissions are managed using Unix
    DAC and thus user namespaces stop a container process from being able
    to write to them.

  • Not running as a root user in the container (this includes disabling
    setuid binaries with noNewPrivileges). As above, procfs file
    permissions are managed using Unix DAC and thus non-root users cannot
    write to them.

  • The default SELinux policy should mitigate this issue, as the
    /dev/console bind-mount does not re-label the mount and so the
    container process should not be able to write to unsafe procfs files.
    However, CVE-2025-52881 allows an attacker to bypass LSM labels, and
    so this mitigation is not helpful when considered in combination with
    CVE-2025-52881.

  • The default AppArmor profile used by most runtimes will NOT help
    mitigate this issue, as /dev/console access is permitted. You could
    create a custom profile that blocks access to /dev/console, but such
    a profile might break regular containers. In addition, CVE-2025-52881
    allows an attacker to bypass LSM labels, and so that mitigation is
    not helpful when considered in combination with CVE-2025-52881.

Known Issues:

  • We are aware of an issue with our mitigation for this attack and certain configurations

Thanks to Lei Wang (@ssst0n3 from Huawei) and Li Fubang (@lifubang from
acmcoder.com, CIIC) for discovering and reporting the main /dev/console
bind-mount vulnerability, and to Aleksa Sarai (@cyphar from SUSE) for
discovering the related issues mentioned above and for the original
research into these classes of issues several years ago.

/ CVE-2025-52881 /

"container escape and denial of service due to arbitrary write gadgets and procfs write redirects"

CVSS:4.0/AV:L/AC:L/AT:P/PR:L/UI:A/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H (7.3)

https://github.com/opencontainers/runc/security/advisories/GHSA-cgrx-mc8f-2prm

This attack is a more sophisticated variant of CVE-2019-16884 (and the
related CVE-2019-19921), which was a flaw that allowed an attacker to
trick runc into writing the LSM process labels for a container process
into a dummy tmpfs file and thus not apply the correct LSM labels to the
container process. The mitigation we applied for CVE-2019-19921 was
fairly limited and effectively only caused runc to verify, when writing
LSM labels, that the target files are actual procfs files. This issue
affects all known runc versions.

Rather than using a fake tmpfs file for /proc/self/attr/, an
attacker could instead (through various means) make
/proc/self/attr/ reference a real procfs file, but one that would
still be a no-op (such as /proc/self/sched). This would have the same
effect but would clear the "is a procfs file" check.
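To make the limitation concrete: a check of this kind answers only "is this file on procfs?". It typically compares the filesystem magic returned by fstatfs(2) with PROC_SUPER_MAGIC (0x9fa0). The Python sketch below assumes Linux with glibc or musl. It also assumes that f_type is the first field of struct statfs and has the width of a C long. Under those assumptions, any procfs file passes the check, including /proc/self/status or /proc/self/sched, so the check cannot tell which procfs file a write will land in. fd_is_procfs is a hypothetical helper, not runc's Go code.

```python
import ctypes
import ctypes.util
import os

PROC_SUPER_MAGIC = 0x9FA0  # value from <linux/magic.h>

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def fd_is_procfs(fd):
    """True if fd lives on a procfs mount. Says nothing about WHICH file."""
    buf = ctypes.create_string_buffer(256)  # comfortably larger than struct statfs
    if _libc.fstatfs(fd, buf) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # Assumption: f_type is the first member and is a C long.
    return ctypes.c_long.from_buffer(buf).value == PROC_SUPER_MAGIC
```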

We were aware that this kind of attack would be possible (even going so
far as to discuss this publicly as "future work" at conferences), and we
were working on a far more comprehensive mitigation of this attack, but
this security issue was disclosed before we could complete this work.

This attack pairs well with CVE-2025-31133 and CVE-2025-52565, as the
most basic version described above acts as an LSM bypass that makes it
easy for an attacker to write to procfs files and break out of a
container.

However, rather than just making the write a no-op, the attacker could
instead redirect the write to a more malicious target (such as
/proc/sysrq-trigger to crash the host machine). In addition, sysctl
writes could be similarly redirected, so it is plausible an attacker
would be able to provide a custom payload to write, allowing for a
/proc/sys/kernel/core_pattern-based full container breakout.

This led us to do a complete audit of all write operations in runc, as
any write operation could potentially be redirected in a similar way --
we did not find any more problematic writes in our analysis but we are
still investigating the possibility of using lints or static analysis to
detect this kind of issue.

Potential mitigations for this issue include:

  • Using rootless containers, as doing so will block most of the
    inadvertent writes (runc would run with reduced privileges, making
    attempts to write to procfs files ineffective).

  • Based on our analysis, neither AppArmor nor SELinux can protect
    against the full version of the redirected write attack. The
    container runtime is generally privileged enough to write to
    arbitrary procfs files, which is more than sufficient to cause a
    container breakout.

    With SELinux, it is possible that the container_runtime_t label
    applied to runc will restrict how much runc can do with the no-op
    variant of the attack, but it seems to us that the
    /proc/sysrq-trigger host crash and /proc/sys/kernel/core_pattern
    container breakout attacks would still work.

Thanks to Li Fubang (@lifubang from acmcoder.com, CIIC) and Tõnis Tiigi
(@tonistiigi from Docker) for both independently discovering this
vulnerability, as well as Aleksa Sarai (@cyphar from SUSE) for the
original research into this class of security issues and solutions over
the past few years.

/ Other Container Runtimes /

These issues are all very easy-to-make logic flaws, and as such we
contacted several other container runtimes to alert them of these issues
and provide them our analysis.

Our current understanding is that youki and crun have similar flaws and
are working on patches to be released in co-ordination with this
advisory. LXC appears to have some similar bugs but their security
policy is (understandably) that non-user-namespaced containers are
fundamentally insecure and thus such exploits are not security issues.

If you use a container runtime other than runc, please check whether
upstream has released a security update addressing these (or similar)
issues once this issue becomes public.

If you are a container runtime author that we did not contact, please
get in touch with me at cyphar@cyphar.com to get added to the
cross-runtime security group. Please note that this group is intended
for low-level container runtime upstream maintainers only.

/ Extra Patches /

There were three issues with these patches which we became aware of
quite late in the embargo process. We have included new patches in the
released versions linked above to address two of them, but these patches
were not included in the pre-release tarballs provided to vendors:

  • 00-openat2-improve-resilience-on-busy-systems.patch
  • 00-rootfs-re-allow-dangling-symlinks-in-mount-targets.patch

Note that these are NOT security issues; they are usability regressions
that may affect some users depending on what images they use and what
kind of systems they run their containers on.

Below is the description provided to vendors, for your own reference,
but the issues listed have been fixed (with the exception of the last
issue, which is still being investigated).

/ openat2 EAGAIN Retry Failures /

openat2 will return -EAGAIN if there was a racing rename or mount when
trying to walk into ".." during a scoped lookup. On systems with heavy
load, this can happen fairly frequently. In the version of the patches
we merged, runc would retry every openat2 operation up to 32 times
before failing with an error in order to mitigate this while also
avoiding denial-of-service attacks.

Unfortunately, it seems this number was too conservative and some
vendors have reported seeing this error:

runc run failed: unable to start container process: error during container init: error mounting "$source" to rootfs at "$destination": create mountpoint for $destination mount: lookup mountpoint target: securejoin.OpenInRoot $destination: openat2 $destination: possible attack detected

Based on my testing, the worst-case failure rate for this is probably
around 3% (this is based on figures from me running very aggressive
rename loops on all 16 cores of my laptop). It is probably lower for
production deployments that have less aggressive rename and mount churn,
but it was a detectable regression for some downstreams.

00-openat2-improve-resilience-on-busy-systems.patch is a patch that
resolves this issue. The simplest mitigation is to just bump the retry
number (which this patch does), but I have also included some additional
retries with a time-based deadline that in my testing should be
virtually impossible to hit even in very high load scenarios (I was
unable to hit the error even after running >50k tests in a tight loop).

Some vendors have reported that this reduced the failure rate to
effectively 0 after 3-4 days of heavy load testing.
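The retry policy described above can be sketched as follows. EAGAIN is retried up to a fixed count. Once that budget is spent, retrying continues only until a wall-clock deadline passes. This Python sketch is illustrative only. retry_on_eagain and the op callable, which stands in for a single openat2 call, are hypothetical. The parameter values are up to the caller and are not the values chosen in the patch. The advisory gives only the original limit of 32.

```python
import errno
import time


def retry_on_eagain(op, max_tries, deadline_s=0.0, clock=time.monotonic):
    """Call op() until it stops failing with EAGAIN.

    Phase 1 allows up to max_tries attempts. Phase 2 (only if deadline_s > 0)
    keeps retrying until deadline_s seconds have passed since the first call.
    Other errors, or EAGAIN after both budgets are spent, are re-raised.
    """
    start = clock()
    tries = 0
    while True:
        tries += 1
        try:
            return op()
        except OSError as e:
            if e.errno != errno.EAGAIN:
                raise
            if tries < max_tries:
                continue  # count-based budget
            if deadline_s > 0 and clock() - start < deadline_s:
                continue  # time-based budget
            raise
```

With max_tries=32 and no deadline, this reproduces the original behaviour. Adding a deadline means a burst of contention is retried for a bounded time rather than a fixed number of attempts.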

/ Dangling Symlink Mount Targets /

Due to the hardening work done for mounts in the provided patchsets, it
was necessary to block certain configurations that could not be done
safely in a reasonable way. One of these configurations is mount targets
that contain symlinks to non-existent paths (otherwise known as
"dangling symlinks"). With these patches, such configurations will
result in the following error:

runc create failed: unable to start container process: error during container init: error mounting "$source" to rootfs at "$destination": create mountpoint for $destination mount: make mountpoint "$destination": file exists

The workaround is to either change the symlink to point to a real path
or create the target of the dangling symlink (previously, runc would do
this for you). A survey of public images indicates that this pattern is
incredibly rare (the one example I've been given is of a broken
/etc/resolv.conf symlink), and in addition these kinds of symlinks are
quite hard to deal with in a sane and safe manner.

This change in behaviour was intentional, but after receiving reports
from more than one downstream, I took another look and wrote a hotfix
that should allow us to continue to support these broken symlinks.
00-rootfs-re-allow-dangling-symlinks-in-mount-targets.patch is that
patch.

However, we still strongly suggest users refrain from creating images
with such broken symlinks.
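Image authors who want to follow that advice can use the Python sketch below, which lists dangling symlinks in an unpacked image rootfs. Symlinks are resolved as if the rootfs were "/". An absolute target such as the /etc/resolv.conf example is therefore looked up inside the image, and ".." cannot climb out of it. This is loosely in the spirit of the scoped lookups that github.com/cyphar/filepath-securejoin provides, but it is not that library's implementation. The helper names are hypothetical. The 40-link limit mirrors the Linux kernel's own limit. The walk is not race-free, so it only suits auditing a rootfs that nothing else is modifying.

```python
import errno
import os

MAX_SYMLINKS = 40  # same limit the Linux kernel applies during path lookup


def resolve_in_root(root, path):
    """Resolve path inside root as if root were '/'. The result may not exist."""
    pending = [p for p in path.split("/") if p]
    resolved = []  # components already resolved, relative to root
    links = 0
    while pending:
        part = pending.pop(0)
        if part == ".":
            continue
        if part == "..":
            if resolved:
                resolved.pop()  # '..' at the rootfs root stays at the root
            continue
        candidate = os.path.join(root, *resolved, part)
        if os.path.islink(candidate):
            links += 1
            if links > MAX_SYMLINKS:
                raise OSError(errno.ELOOP, "too many levels of symbolic links", candidate)
            target = os.readlink(candidate)
            if target.startswith("/"):
                resolved = []  # absolute targets restart at the rootfs root
            pending = [p for p in target.split("/") if p] + pending
            continue
        resolved.append(part)
    return os.path.join(root, *resolved)


def find_dangling_symlinks(root):
    """Return in-image paths (e.g. '/etc/resolv.conf') of broken symlinks."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            host_path = os.path.join(dirpath, name)
            if not os.path.islink(host_path):
                continue
            image_path = "/" + os.path.relpath(host_path, root)
            try:
                ok = os.path.lexists(resolve_in_root(root, image_path))
            except OSError:  # symlink loops count as broken too
                ok = False
            if not ok:
                broken.append(image_path)
    return sorted(broken)
```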

/ Issues with "-v /dev:/dev" /

At SUSE, we found an example of a developer tool creating a bind-mount
of the host /dev into the container. For reasons that are not entirely
clear to me yet, this setup appears to have worked previously but can
now lead to permission issues with rootless containers with our
mitigating patches, with typical errors looking like:

exec failed: unable to start container process: reopen ptmx to get new pty pair: reopen fd 11: permission denied

I have not yet been able to root-cause this issue (I suspect that
ptmxmode=000 has some part to play here), but I would argue that such
setups are not particularly safe nor recommended, and users should
instead be doing --mount type=devpts,... if they have a strong need to
configure the /dev/pts mount (which is what our tool was trying to do
and had already been patched in newer versions to do properly).

If you have seen this issue or have any other information, feel free to
open a bug report.

/ Credits /

Thanks again to the following researchers for helping discover and
report these vulnerabilities:

  • Lei Wang (@ssst0n3 from Huawei)
  • Li Fubang (@lifubang from acmcoder.com, CIIC)
  • Tõnis Tiigi (@tonistiigi from Docker)
  • Aleksa Sarai (@cyphar from SUSE)

Additional thanks go to Tõnis Tiigi for showing that Dockerfiles can be
used to exploit these issues, and thus providing us with some very
useful exploit templates for these kinds of race attacks.

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
