自己在测试环境部署了RocketMQ,发现namesrv很容易挂掉,于是就想着监控,挂了就发邮件通知。 查看了rocketmq-dashboard项目,发现只能监控Broker,遂放弃这一路径。 于是就从报错的日志入手,发现最终可以根据RocketMQTemplate获得可活动的NameServer。 12月 25 13:59:22 192.168.240.65 java[59571]: 2023-12-25 13:59:22.598  INFO 59571 --- [tWorkerThread_2] RocketmqRemoting                         : NETTY CLIENT PIPELINE: CLOSE 192.168.240.86:9876
12月 25 13:59:22 192.168.240.65 java[59571]: 2023-12-25 13:59:22.598  INFO 59571 --- [tWorkerThread_2] RocketmqRemoting                         : closeChannel: the channel[192.168.240.86:9876] was removed from channel table
12月 25 13:59:22 192.168.240.65 java[59571]: 2023-12-25 13:59:22.598  INFO 59571 --- [tWorkerThread_2] RocketmqRemoting                         : NETTY CLIENT PIPELINE: CLOSE 192.168.240.86:9876
12月 25 13:59:22 192.168.240.65 java[59571]: 2023-12-25 13:59:22.598  INFO 59571 --- [tWorkerThread_2] RocketmqRemoting                         : eventCloseChannel: the channel[null] has been removed from the channel table before
12月 25 13:59:22 192.168.240.65 java[59571]: 2023-12-25 13:59:22.598  INFO 59571 --- [lientSelector_1] RocketmqRemoting                         : closeChannel: close the connection to remote address[192.168.240.86:9876] result: true
12月 25 13:59:25 192.168.240.65 java[59571]: 2023-12-25 13:59:25.597  INFO 59571 --- [ntScan_thread_1] RocketmqRemoting                         : createChannel: begin to connect remote host[192.168.240.86:9876] asynchronously
12月 25 13:59:25 192.168.240.65 java[59571]: 2023-12-25 13:59:25.597  INFO 59571 --- [tWorkerThread_3] RocketmqRemoting                         : NETTY CLIENT PIPELINE: CONNECT  UNKNOWN => 192.168.240.86:9876
12月 25 13:59:25 192.168.240.65 java[59571]: 2023-12-25 13:59:25.598  WARN 59571 --- [ntScan_thread_1] RocketmqRemoting                         : createChannel: connect remote host[192.168.240.86:9876] failed, AbstractBootstrap$PendingRegistrationPromise@f2a3fc5(failure: io.netty.channel.AbstractChannel$AnnotatedConnectException: 拒绝连接: /192.168.240.86:9876)根据日志可以发现是NettyRemotingClient类在做监控,持续调用,具体核心方法: org.apache.rocketmq.remoting.netty.NettyRemotingClient#createChannel
private  Channel  createChannel ( String  addr)  throws  InterruptedException  { NettyRemotingClient. ChannelWrapper  cw =  ( NettyRemotingClient. ChannelWrapper ) this . channelTables. get ( addr) ; if  ( cw !=  null  &&  cw. isOK ( ) )  { return  cw. getChannel ( ) ; }  else  { if  ( this . lockChannelTables. tryLock ( 3000L ,  TimeUnit . MILLISECONDS ) )  { try  { cw =  ( NettyRemotingClient. ChannelWrapper ) this . channelTables. get ( addr) ; boolean  createNewConnection; if  ( cw !=  null )  { if  ( cw. isOK ( ) )  { Channel  var4 =  cw. getChannel ( ) ; return  var4; } if  ( ! cw. getChannelFuture ( ) . isDone ( ) )  { createNewConnection =  false ; }  else  { this . channelTables. remove ( addr) ; createNewConnection =  true ; } }  else  { createNewConnection =  true ; } if  ( createNewConnection)  { ChannelFuture  channelFuture =  this . bootstrap. connect ( RemotingHelper . string2SocketAddress ( addr) ) ; LOGGER . info ( "createChannel: begin to connect remote host[{}] asynchronously" ,  addr) ; cw =  new  NettyRemotingClient. ChannelWrapper ( channelFuture) ; this . channelTables. put ( addr,  cw) ; } }  catch  ( Exception  var8)  { LOGGER . error ( "createChannel: create channel exception" ,  var8) ; }  finally  { this . lockChannelTables. unlock ( ) ; } }  else  { LOGGER . warn ( "createChannel: try to lock channel table, but timeout, {}ms" ,  3000L ) ; } if  ( cw !=  null )  { ChannelFuture  channelFuture =  cw. getChannelFuture ( ) ; if  ( channelFuture. awaitUninterruptibly ( ( long ) this . nettyClientConfig. getConnectTimeoutMillis ( ) ) )  { if  ( cw. isOK ( ) )  { LOGGER . info ( "createChannel: connect remote host[{}] success, {}" ,  addr,  channelFuture. toString ( ) ) ; return  cw. getChannel ( ) ; } LOGGER . warn ( "createChannel: connect remote host["  +  addr +  "] failed, "  +  channelFuture. toString ( ) ) ; }  else  { LOGGER . warn ( "createChannel: connect remote host[{}] timeout {}ms, {}" ,  new  Object [ ] { addr,  this . nettyClientConfig. getConnectTimeoutMillis ( ) ,  channelFuture. toString ( ) } ) ; } } return  null ; } } 
以NettyRemotingClient类为起点,使用Debug分析,最终可以看到完整的调用链路: 那么监控开发就很容易了,注册RocketMQTemplate,使用定时任务监听即可,示例代码如下: @Slf4j 
@Component 
public  class  MQMonitorTask  { @Resource private  RocketMQTemplate  rocketMQTemplate; @Scheduled ( cron =  "0/10 * * * * ?" ) public  void  scanNameServerBroker ( )  { org. apache. rocketmq. remoting.  RemotingClient=  rocketMQTemplate. getProducer ( ) . getDefaultMQProducerImpl ( ) . getMqClientFactory ( ) . getMQClientAPIImpl ( ) . getRemotingClient ( ) ; List < String > =  remotingClient. getNameServerAddressList ( ) ; List < String > =  remotingClient. getAvailableNameSrvList ( ) ; log. info ( "nameServerAddressList:{}" ,  JSONUtil . toJsonStr ( nameServerAddressList) ) ; log. info ( "availableNameSrvList:{}" ,  JSONUtil . toJsonStr ( availableNameSrvList) ) ; } } 另外要在SprongBoot启动类加上注解@EnableScheduling来开启定时任务。