问题描述
项目组某应用在HSF Consumer调用HSF Provider的时候,在HSF Consumer端收到了如下异常信息:
01 2024-08-09 09:13:48.553 ERROR [http-bio-8080-exec-132:t.hsf] [f947c77c-999e-4219-9dd6-44b0c6577b51] [] [] RPC invoke Service[xx.xx.xx.xx.xx.xx.xx.xx:1.0.0.daily]#[xx]occur exception: target server address:[hsf://xxx.xx.xx.xx:12200?_p=hessian2&APP=0af06301-e0db-4066-a5e9-8ec42f9c6969&st=xx-xx-xx-xx-a&_ENV=DEFAULT&v=2.0&_TIMEOUT=3000&id=81241a9f-f8a0-4a8e-a5d8-4921cc7895b0&_TID=46cb6ab7-5ea5-4606-a293-78d9c2a67eee&_CONNECTTIMEOUT=1000&_SERIALIZETYPE=hessian&ig=default&_auth=y] TraceId=0a048cf417231660282283137e1a25, RpcId=0.1.2 com.taobao.hsf.exception.HSFException: 84 error message : Serialize response error on provider side.Please make sure your DO is serializable and your dependency is latest. at com.taobao.hsf.invocation.AsyncToSyncInvocationHandler.judgeException(AsyncToSyncInvocationHandler.java:387) at com.taobao.hsf.invocation.AsyncToSyncInvocationHandler.getRPCResult(AsyncToSyncInvocationHandler.java:290) at com.taobao.hsf.invocation.AsyncToSyncInvocationHandler.invokeType(AsyncToSyncInvocationHandler.java:238) at com.taobao.hsf.invocation.AsyncToSyncInvocationHandler.invoke(AsyncToSyncInvocationHandler.java:52) at com.taobao.hsf.profiler.ProfilerSyncInvocationHandler.invoke(ProfilerSyncInvocationHandler.java:35) at com.taobao.hsf.rpc.client.ErrorLogSyncInvocationHandler.invoke(ErrorLogSyncInvocationHandler.java:47) at com.taobao.hsf2dubbo.DubboClientFilterSyncInvocationHandlerInterceptor.invoke(DubboClientFilterSyncInvocationHandlerInterceptor.java:56) at com.taobao.hsf.rpc.client.ClientConcurrencyLimiter.invoke(ClientConcurrencyLimiter.java:41) at com.taobao.hsf.InvocationUtil.invoke(InvocationUtil.java:51) at com.taobao.hsf.proxy.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:31) at com.taobao.hsf.proxy.JdkProxyFactory$JdkProxyInvocationHandler.invoke(JdkProxyFactory.java:99) at com.sun.proxy.$Proxy177.recommenLessPriceCar(Unknown Source) at xx.xx.xx.xx.xx.xx.xx.xx.xx(xx.java:992) at sun.reflect.GeneratedMethodAccessor6394.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221) at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136) at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:114) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738) at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:963) at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:897) at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970) at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:861) at javax.servlet.http.HttpServlet.service(HttpServlet.java:621) at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846) at javax.servlet.http.HttpServlet.service(HttpServlet.java:728) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at com.opensymphony.sitemesh.webapp.SiteMeshFilter.obtainContent(SiteMeshFilter.java:129) at com.opensymphony.sitemesh.webapp.SiteMeshFilter.doFilter(SiteMeshFilter.java:77) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61) at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
问题分析
通过线程栈,首先看一下栈顶相关类的实现:
private void judgeException(HSFResponse hsfResp, int timeout) { if (hsfResp.isError()) { if (ResponseStatus.CLIENT_TIMEOUT == hsfResp.getStatus()) throw new HSFTimeOutException(getErrorLog(hsfResp.getErrorMsg()), String.valueOf(timeout)); if (hsfResp.getAppResponse() instanceof Throwable) throw new HSFException(hsfResp.getErrorMsg(), (Throwable)hsfResp.getAppResponse()); throw new HSFException(String.valueOf(hsfResp.getStatus().getCode()), hsfResp.getErrorMsg()); } }
其中错误码:84和error message是从hsfResp中获取的,接着看一下ResponseStatus定义:
SERVER_GETCODER(10, "server send coders"), OK(20, "HSF Provider process succeed."), CLIENT_TIMEOUT(30, "client timeout"), SERVER_TIMEOUT(31, "server timeout"), CLIENT_WRITE_ERROR(32, "client write error"), BAD_REQUEST(40, "bad request"), BAD_RESPONSE(50, "Some unknown exception happened in hsf provider during process IO protocol.Please check hsf.log in provider's machine."), SERVICE_NOT_FOUND(60, "service not found"), METHOD_NOT_FOUND(61, "method not found"), HTTP_INTERCEPTOR_NOT_FOUND(62, "HSF http cannot find a interceptor to process this request"), TOP_SERVICE_TRANSFORMER_NOT_FOUND(63, "Top transformer not found"), TOP_TRANSFORMER_ERROR(64, "exception happened in top transformer"), SERVICE_ERROR(70, "HSF Provider catch some exception when invoke biz method."), SERVER_ERROR(80, "server error"), THREADPOOL_BUSY(81, "Provider's HSF thread pool is full."), COMM_ERROR(82, "Communication error"), UNKNOWN_CODE(83, "Unknown code"), SERVER_SERIALIZE_ERROR(84, "Deserialize request error on provider side.Please make sure the arguments of request are serializable and your dependency is latest"), PERMISSION_VERIFY_REJECT(85, "HSF Provider reject your request because spas verify your permission failed."), UNIT_VERIFY_REJECT(86, "HSF Provider reject your request because of wrong target unit."), TPS_VERIFY_REJECT(87, "HSF Provider reject your request because of exceeding tps limit."), SERVER_CLOSING(88, "server will close soon"), CLIENT_ERROR(90, "client error"), UNKNOWN_ERROR(91, "Some unknown exception happened in hsf provider.Please check hsf.log in provider's machine."), WHITELIST_VERIFY_REJECT(92, "HSF Provider reject your request because of the consumer is not in whitelist."), CLIENT_DESERIALIZE_ERROR(100, "server send coders"), REGION_VERIFY_REJECT(101, "HSF Provider reject your request because of wrong target region."), FILTER_EXCEPTION(102, "HSF Filter exception");
其中84状态码对应的message与响应的不一致,接着看看服务端是如何处理响应的:
public ResponsePacket serverCreate(RPCResult rpcResult, Stream serverStream) { byte[] bodyBytes; long requestId = rpcResult.getResponseContext().getRequestId(); byte serializeType = rpcResult.getResponseContext().getSerializeType(); Serializer serializer = SerializerSelector.getInstance().select(serializeType); byte[] responseMapBody = null; byte status = rpcResult.getHsfResponse().getStatus().getCode(); try { if (rpcResult.isError()) { if (rpcResult.getErrorMsg() != null) { bodyBytes = rpcResult.getErrorMsg().getBytes(RemotingConstants.DEFAULT_CHARSET); } else { bodyBytes = "unknown error".getBytes(RemotingConstants.DEFAULT_CHARSET); } } else { if (status == ResponseStatus.SERVICE_ERROR.getCode()) status = ResponseStatus.OK.getCode(); bodyBytes = serializer.serialize(rpcResult.getAppResponse(), serverStream); } if (rpcResult.getResponseContext().isResponseAttachmentSupported()) { Map responseAttachmentMap = rpcResult.getAttachments(); if (responseAttachmentMap != null && !responseAttachmentMap.isEmpty()) responseMapBody = serializer.serialize(rpcResult.getAttachments(), serverStream); } } catch (Exception e) { String errorMsg = "Serialize response error on provider side.Please make sure your DO is serializable and your dependency is latest."; String errorCodeStr = LoggerHelper.getErrorCodeStr("HSF", "HSF-0037", "BIZ", errorMsg); log.error("HSF-0037", errorCodeStr + serverStream, e); bodyBytes = errorMsg.getBytes(RemotingConstants.DEFAULT_CHARSET); status = ResponseStatus.SERVER_SERIALIZE_ERROR.getCode(); } return (ResponsePacket)new HSFResponsePacket(requestId, serializeType, status, bodyBytes, responseMapBody); }
从以上代码可知,在HSF Provider序列化响应的时候,出现了异常,最终将异常信息返回给了HSF Consumer。
为了监控服务端发生了什么异常,可以通过arthas watch命令来分析:
watch com.taobao.hsf.serialize.hessian2.Hessian2Serializer serialize "{throwExp,params[0]}" -e -x 3 > /home/admin/tools/exp1.txt &
通过分析,异常发生在调用com.taobao.hsf.com.caucho.hessian.io.ContextSerializerFactory.getCustomSerializer,接下来可以通过arthas watch该方法进行分析:
watch com.taobao.hsf.com.caucho.hessian.io.ContextSerializerFactory getCustomSerializer "{throwExp,params[0]}" -e -x 3 > /home/admin/tools/exp2.txt &
通过watch,抓取到的信息如下:
method=com.taobao.hsf.com.caucho.hessian.io.ContextSerializerFactory.getCustomSerializer location=AtExceptionExit ts=2024-08-12 13:10:23; [cost=29.631753ms] result=@ArrayList[ com.taobao.hsf.com.caucho.hessian.HessianException: java.lang.NullPointerException at com.taobao.hsf.com.caucho.hessian.io.ContextSerializerFactory.getCustomSerializer(ContextSerializerFactory.java:201) at com.taobao.hsf.com.caucho.hessian.io.SerializerFactory.loadSerializer(SerializerFactory.java:292) at com.taobao.hsf.com.caucho.hessian.io.SerializerFactory.getSerializer(SerializerFactory.java:242) at com.taobao.hsf.com.caucho.hessian.io.SerializerFactory.getObjectSerializer(SerializerFactory.java:215) at com.taobao.hsf.com.caucho.hessian.io.Hessian2Output.writeObject(Hessian2Output.java:418) at com.taobao.hsf.com.caucho.hessian.io.CollectionSerializer.writeObject(CollectionSerializer.java:102) at com.taobao.hsf.com.caucho.hessian.io.Hessian2Output.writeObject(Hessian2Output.java:421) at com.taobao.hsf.com.caucho.hessian.io.MapSerializer.writeObject(MapSerializer.java:100) at com.taobao.hsf.com.caucho.hessian.io.Hessian2Output.writeObject(Hessian2Output.java:421) at com.taobao.hsf.serialize.hessian2.Hessian2Serializer.serialize(Hessian2Serializer.java:39) at com.taobao.hsf.io.remoting.hsf.HSFPacketFactory.serverCreate(HSFPacketFactory.java:145) at com.taobao.hsf.io.stream.AbstractStream.write(AbstractStream.java:102) at com.taobao.hsf.io.RpcOutput.flush(RpcOutput.java:38) at com.taobao.hsf.remoting.provider.ProviderProcessor$OutputCallback.operationComplete(ProviderProcessor.java:172) at com.taobao.hsf.remoting.provider.ProviderProcessor$OutputCallback.operationComplete(ProviderProcessor.java:147) at com.taobao.hsf.util.concurrent.AbstractListener.run(AbstractListener.java:18) at com.taobao.hsf.invocation.AbstractContextAwareRPCCallback.access$001(AbstractContextAwareRPCCallback.java:12) at com.taobao.hsf.invocation.AbstractContextAwareRPCCallback$1.run(AbstractContextAwareRPCCallback.java:27) at com.taobao.hsf.util.concurrent.WrappedListener.run(WrappedListener.java:34) at com.taobao.hsf.invocation.AbstractContextAwareRPCCallback.run(AbstractContextAwareRPCCallback.java:36) at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:456) at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:817) at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:595) at com.taobao.hsf.util.concurrent.DefaultListenableFuture.addListener(DefaultListenableFuture.java:32) at com.taobao.hsf.remoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:72) at com.taobao.hsf.io.remoting.hsf.message.HSFServerHandler$1.run(HSFServerHandler.java:189) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException , @Class[ ANNOTATION=@Integer[8192], ENUM=@Integer[16384], SYNTHETIC=@Integer[4096], cachedConstructor=null, newInstanceCallerCache=null, name=@String[xx.xx.xx.xx.xx.xx.xx.xx], allPermDomain=@ProtectionDomain[ codesource=null, classloader=null, principals=@Principal[][isEmpty=true;size=0], permissions=@Permissions[java.security.Permissions@7b6bdca4 ( ("java.security.AllPermission" "<all permissions>" "<all actions>") ) ], hasAllPerm=@Boolean[true], staticPermissions=@Boolean[true], key=@Key[java.security.ProtectionDomain$Key@8cd49e], debug=null, ], useCaches=@Boolean[true], reflectionData=@SoftReference[ clock=@Long[151555156332], timestamp=@Long[151555156332], referent=@ReflectionData[java.lang.Class$ReflectionData@6882ecdf], queue=@Null[java.lang.ref.ReferenceQueue$Null@2a4b46b1], next=null, discovered=null, lock=@Lock[java.lang.ref.Reference$Lock@12928257], pending=null, ], classRedefinedCount=@Integer[0], genericInfo=@ClassRepository[ NONE=@ClassRepository[sun.reflect.generics.repository.ClassRepository@663b63b1], superclass=null, superInterfaces=null, typeParams=null, factory=null, tree=@ClassSignature[sun.reflect.generics.tree.ClassSignature@685833e7], ], serialVersionUID=@Long[3206093459760846163], serialPersistentFields=@ObjectStreamField[][isEmpty=true;size=0], reflectionFactory=@ReflectionFactory[ initted=@Boolean[true], reflectionFactoryAccessPerm=@RuntimePermission[("java.lang.RuntimePermission" "reflectionFactoryAccess")], soleInstance=@ReflectionFactory[sun.reflect.ReflectionFactory@7009fb06], langReflectAccess=@ReflectAccess[java.lang.reflect.ReflectAccess@480cc99e], noInflation=@Boolean[false], inflationThreshold=@Integer[15], ], initted=@Boolean[true], enumConstants=null, enumConstantDirectory=null, annotationData=@AnnotationData[ annotations=@EmptyMap[isEmpty=true;size=0], declaredAnnotations=@EmptyMap[isEmpty=true;size=0], redefinedCount=@Integer[0], ], annotationType=null, classValueMap=null, ], ]
接下来看看com.taobao.hsf.com.caucho.hessian.io.ContextSerializerFactory.getCustomSerializer的代码:
public Serializer getCustomSerializer(Class cl) { Serializer serializer = this._customSerializerMap.get(cl.getName()); if (serializer == AbstractSerializer.NULL) return null; if (serializer != null) return serializer; try { Class<?> serClass = Class.forName(cl.getName() + "HessianSerializer", false, cl.getClassLoader()); Serializer ser = (Serializer)serClass.newInstance(); this._customSerializerMap.put(cl.getName(), ser); return ser; } catch (ClassNotFoundException e) { log.log(Level.ALL, e.toString(), e); } catch (Exception e) { throw new HessianException(e); } this._customSerializerMap.put(cl.getName(), AbstractSerializer.NULL); return null; }
从arthas抓取的数据分析,该方法中在加载类的时候应该会抛出ClassNotFoundException,结果抛出了NullPointerException。
接下来需要思考如何复现异常及定位try...catch块中哪行代码导致了NullPointerException异常,我们可以在arthas中添加一个自定义的Command来实现,避免对程序的影响。代码如下:
"hessian") ("HSF Hessian Test") (Constants.EXAMPLE + ( " hessian cn.chinapost.jdpt.tms.api.transport.dto.ContractCollectVolumnCapacity \n" + Constants.WIKI + Constants.WIKI_HOME + "hessian") public class HSFHessianCommand extends AnnotatedCommand { private static final Logger logger = LoggerFactory.getLogger(HSFHessianCommand.class); private String classPattern; argName = "class-pattern", index = 0) ( "Class name pattern, use either '.' or '/' as separator") ( public void setClassPattern(String classPattern) { this.classPattern = classPattern; } public void process(final CommandProcess process) { RowAffect affect = new RowAffect(); Instrumentation inst = process.session().getInstrumentation(); String cp = "com.taobao.hsf.com.caucho.hessian.io.ContextSerializerFactory"; boolean isRegEx = false; String hashCode = null; List<Class<?>> matchedClasses = new ArrayList<Class<?>>(SearchUtils.searchClass(inst, cp, isRegEx, hashCode)); Class contextSerializerFactoryCls = matchedClasses.get(0); logger.info(contextSerializerFactoryCls + ",classLoader is " + contextSerializerFactoryCls.getClassLoader()); matchedClasses = new ArrayList<Class<?>>(SearchUtils.searchClass(inst, classPattern, isRegEx, hashCode)); Class ccvcCls = matchedClasses.get(0); ClassLoader ccvcCll = ccvcCls.getClassLoader(); logger.info(ccvcCls + ",classLoader is " + ccvcCll); Object contextSerializserFactory = null; try { contextSerializserFactory = MethodUtils.invokeStaticMethod(contextSerializerFactoryCls,"create",ccvcCll); Object serializer = MethodUtils.invokeMethod(contextSerializserFactory,"getCustomSerializer",ccvcCls); logger.info("=========== serializer " + serializer); }catch (Exception e){ logger.error("ContextSerializserFactory.getCustomSerializer",e); } Class<?> serClass = null; String clsName = ccvcCls.getName() + "HessianSerializer"; try{ serClass = Class.forName(clsName, false, ccvcCll); }catch (ClassNotFoundException e){ logger.error( "1_" + clsName,e); } catch (NullPointerException e){ logger.error("1_" + clsName,e); } catch (Exception e){ logger.error("1_" + clsName,e); } if(serClass != null){ try{ Object ser = serClass.newInstance(); logger.info( clsName + " instance is " + ser); }catch (Exception e){ logger.error("newInstance error",e); } } affect.rCnt(matchedClasses.size()); process.appendResult(new RowAffectModel(affect)); process.end(); } public void complete(Completion completion) { if (!CompletionUtils.completeClassName(completion)) { super.complete(completion); } } }
通过以上代码:
- 可以复现线上的空指针问题
- 可以确定是Class.forName抛出了空指针
为什么Class.forName会抛出空指针呢?Class.forName是通过native方式实现的,代码主要分布在Class.c、jvm.cpp、systemDictionary.cpp、classLoader.cpp中,也没分析出造成空指针的原因。
怀疑是不是该台机器JVM环境有问题,于是在该应用的其他机器上执行以上代码,结果抛出的是ClassNotFoundException。
总结
- 通过arthas定位出出现异常的方法
- 通过定制arthas Command定位出出现异常的具体代码