ksvcreate: Process(m000) creation failed

简介:

一测试服务器数据库(Oracle Database 10g Release 10.2.0.5.0 - 64bit Production)突然访问不了,检查发现数据库处于挂起模式(hang mode),检查告警日志,发现有“ksvcreate: Process(m000) creation failed”,”kkjcre1p: unable to spawn jobq slave process“之类的错误信息。具体如下所示:

Sun Jan 17 09:56:05 CST 2016
Thread 1 advanced to log sequence 1729 (LGWR switch)
  Current log# 1 seq# 1729 mem# 0: /u01/oradata/SCM2/redo01.log
Sun Jan 17 21:34:01 CST 2016
Thread 1 advanced to log sequence 1730 (LGWR switch)
  Current log# 2 seq# 1730 mem# 0: /u01/oradata/SCM2/redo02.log
Mon Jan 18 09:06:00 CST 2016
ksvcreate: Process(m000) creation failed
Mon Jan 18 09:12:50 CST 2016
WARNING: inbound connection timed out (ORA-3136)
Mon Jan 18 09:37:13 CST 2016
Thread 1 advanced to log sequence 1731 (LGWR switch)
  Current log# 3 seq# 1731 mem# 0: /u01/oradata/SCM2/redo03.log
Mon Jan 18 09:43:10 CST 2016
kkjcre1p: unable to spawn jobq slave process 
Mon Jan 18 09:43:10 CST 2016
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_586.trc:

clip_image001

当时开发人员急着测试,没时间给我研究具体原因,所以就重启了数据库实例(不能通过shutdown immeidate关闭,只能通过shutdown abort关闭)。

关于告警日志里面的错误信息,我们看出m000进程创建失败,PMON进程无法启动该进程。一般情况下,PMON无法启动进程原因有下面一些:

1、Oracle连接数超过进程数限制。(正是由于Oracle达到了进程数限制,进而PMON无法创建m000进程)

2、进程死锁。

Bug 8426816 PMON may hang cleaning up a dead process (rare)

clip_image002

3、Bug引起的

Database hangs With Message 'Ksvcreate: Process(M001) Creation Failed' (文档 ID 1233079.1)

clip_image003

事后我检查了一下v$resource_limit,发现会话连接数、进程数并没有超。那么完全可以排除这个因素,那么现在就有可能是进程死锁或bug造成的

clip_image004

同事在检查过程中发现Physic memory资源严重不足,引起了Swap频繁读写。继续检查SGA参数发现sga_max_size、sga_target设置过大(这台测试服务器是 虚拟机做的克隆,生产环境的RAM为64G,SGA也设置较大,克隆过后ORACLE实例启动不了,调整了SGA_TARGET、 SGA_MAX_SIZE等参数后才启动成功,但是不知为什么sga_max_size设置了成了11264M(11G),有可能是当时要设置为1G多, 因为物理内存才3G多,但是不知是手抖了还是搞晕了,当然也不排除后面被人改掉,居然设置成了11264M大小,汗颜啊。居然运行了这么久直到最近才出现 问题,测试数据库基本不会做巡检)

clip_image005

clip_image001[6]

然后在Troubleshooting Guide (TSG) - Ksvcreate: Process(xxxx) Creation Failed / ORA-00445: Background Process "xxxx" Did Not Start After n Seconds (文档 ID 1379200.1) 里面发现当OS的资源或设置不正确时,尤其是物理内存或swap不足时,将会导致不能生成新的进程。英文原文如下:

OS Configuration Checks

This error may be observed due to lack of OS resources or incorrect configuration, typically memory or swap may be insufficient to spawn a new process. Please check the list below to verify the OS settings and configuration

当然关于这点我和同事有些争议。不过我认为是这些导致数据库出现这些问题的。修改SGA相关参数应该能解决这个问题,不过还需观察一段时间。

 

另外,关于kswapd0进程,在博客调整linux内核尽量用内存,而不用swap里面有较详细介绍,摘抄部分内容如下所示:

Linux uses kswapd for virtual memory management such that pages that havebeen recently accessed are kept in memory and less active pages are paged outto disk.

(what is a page?)…Linux uses manages memory in units called pages.

So,the kswapd process regularly decreases the ages of unreferencedpages…and at the end they are paged out(moved out) to disk

kswapd0进程的作用:它是虚拟内存管理中,负责换页的,操作系统 每过一定时间就会唤醒kswapd ,看看内存是否紧张,如果不紧张,则睡眠,在 kswapd 中,有2 个阀值,pages_hige 和 pages_low,当空闲内存页的数量低于 pages_low的时候,kswapd进程就会扫描内存并且每次释放出32 个free pages,直到 free page 的数量到达pages_high。

physical mem 不足,引起 swap 频繁读写。kswapd0 是系统的虚拟内存管理程序,如果物理内存不够用,系统就会唤醒 kswapd0 进程,由 kswapd0 分配磁盘交换空间作缓存,因而占用大量的 CPU 资源。

  

相关文章
|
6月前
|
JSON 数据格式
【ERROR】Error: transaction invalidated with status (ENDORSEMENT_POLICY_FAILURE)
【ERROR】Error: transaction invalidated with status (ENDORSEMENT_POLICY_FAILURE)
49 0
|
7月前
|
安全
Error:Execution failed for task ':transformClassesAndResourcesWithProguardForRelease'
Error:Execution failed for task ':transformClassesAndResourcesWithProguardForRelease'
61 0
|
安全 Windows
CRITICAL_PROCESS_DIED
CRITICAL_PROCESS_DIED
4543 2
|
Java 应用服务中间件
Unable to start ServletWebServerApplicationContext due to missing ServletWebServerFactory bean.
Unable to start ServletWebServerApplicationContext due to missing ServletWebServerFactory bean.
Unable to start ServletWebServerApplicationContext due to missing ServletWebServerFactory bean.
Error:svn:E155037:Previous operation has not finished; run ‘cleanup‘ if it was interrupted(完美解决)
Error:svn:E155037:Previous operation has not finished; run ‘cleanup‘ if it was interrupted(完美解决)
428 0
Error:svn:E155037:Previous operation has not finished; run ‘cleanup‘ if it was interrupted(完美解决)
|
Go iOS开发
The operation couldn’t be completed. Unable to log in with account 'myappleid'. An unexpected failure occurred while logging in (Underlying error code 1100).解决方法
The operation couldn’t be completed. Unable to log in with account 'myappleid'. An unexpected failure occurred while logging in (Underlying error code 1100).解决方法
474 0
Starting a Gradle Daemon, 5 busy and 1 incompatible and 1 stopped Daemons could not be reused, use --status for details FAILURE: Build failed with an
执行gradle build出的问题,查看hs_err_pid11064.log日志文件发现,是电脑的RAM不足导致
4153 0
|
Java
Unable to start EmbeddedWebApplicationContext due to missing EmbeddedServletContainerFactory bean.
自己新建的Maven 项目,然后通过修改 pom.xml 转为 Spring Boot 项目,出现此问题。 启动日志如下: org.springframework.context.ApplicationContextException: Unable to start embedded container; nested exception is org.
2051 0