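Background
Lettuce was connected to a Redis master-replica instance. When the host of the master node lost power unexpectedly and rebooted, no RST packet was sent, so Lettuce kept reusing the old TCP connection and commands started timing out. The error io.lettuce.core.RedisCommandTimeoutException: Command timed out after
kept appearing until about 15 minutes later, when TCP stopped retransmitting and the connection was closed; only then did the client recover.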
For a detailed explanation of the cause, see:
https://juejin.cn/post/7494573414150438927#heading-5
https://www.modb.pro/db/1711557186789908480
https://github.com/lettuce-io/lettuce-core/issues/2082
https://support.huaweicloud.com/usermanual-dcs/dcs-ug-211105002.html
Solution
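The simplest fix is to lower the TCP retransmission count in the client's operating system, for example to 8, so that a dead connection no longer takes 15 minutes to fail:
sysctl -w net.ipv4.tcp_retries2=8
However, this setting applies to every TCP connection on the host, not only the Redis ones, so the option was rejected as too far-reaching.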
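The 15-minute figure follows from how Linux backs off between retransmissions. The sketch below adds up the waits under assumed kernel defaults: the retransmission timeout starts at 200 ms, doubles after each attempt, and is capped at 120 s. Real connections derive the timeout from the measured round-trip time, so this is an idealized estimate, and the class and method names are made up for illustration.

```java
final class RetransmitBudget {
    // Assumed Linux defaults: TCP_RTO_MIN = 200 ms, TCP_RTO_MAX = 120 s.
    static final long RTO_MIN_MS = 200;
    static final long RTO_MAX_MS = 120_000;

    /** Hypothetical time until the kernel gives up for a given tcp_retries2 value. */
    static long totalMillis(int retries) {
        long total = 0;
        long rto = RTO_MIN_MS;
        // One wait for the original segment plus one per retransmission.
        for (int i = 0; i <= retries; i++) {
            total += rto;
            rto = Math.min(rto * 2, RTO_MAX_MS); // exponential backoff with a cap
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("tcp_retries2=15 -> " + totalMillis(15) / 1000.0 + " s");
        System.out.println("tcp_retries2=8  -> " + totalMillis(8) / 1000.0 + " s");
    }
}
```

With the default tcp_retries2=15 this gives 924.6 s, roughly the 15 minutes observed; with 8 it gives 102.2 s.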
So I followed the officially recommended approach (see https://github.com/lettuce-io/lettuce-core/issues/2082
and https://support.huaweicloud.com/usermanual-dcs/dcs-ug-211105002.html
): upgrade the lettuce dependency to 6.3.0 and set the tcpUserTimeout parameter.
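After upgrading and setting tcpUserTimeout, the application log showed that the parameter was not being applied. Testing confirmed that it had no effect and the problem persisted.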
The following dependencies were used:
<dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>6.3.0.RELEASE</version>
</dependency>
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-transport-native-epoll</artifactId>
    <version>4.1.100.Final</version>
    <classifier>linux-x86_64</classifier>
</dependency>
The log showed:
2025-05-06 13:40:05.024 WARN 1939829 --- [nio-8888-exec-1] io.lettuce.core.ConnectionBuilder : Cannot apply TCP User Timeout options to channel type io.netty.channel.socket.nio.NioSocketChannel
2025-05-06 13:40:05.390 WARN 1939829 --- [ioEventLoop-6-1] io.lettuce.core.ConnectionBuilder : Cannot apply TCP User Timeout options to channel type io.netty.channel.socket.nio.NioSocketChannel
2025-05-06 13:40:05.394 WARN 1939829 --- [ioEventLoop-6-1] io.lettuce.core.ConnectionBuilder : Cannot apply TCP User Timeout options to channel type io.netty.channel.socket.nio.NioSocketChannel
2025-05-06 13:40:05.396 WARN 1939829 --- [ioEventLoop-6-1] io.lettuce.core.ConnectionBuilder : Cannot apply TCP User Timeout options to channel type io.netty.channel.socket.nio.NioSocketChannel
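The warning names NioSocketChannel, so the client was running on the NIO transport rather than epoll. One way to see whether Netty's native epoll transport can load in a given JVM is to ask Netty directly. The sketch below does so through reflection so that it compiles even without Netty on the classpath. Epoll.isAvailable() and Epoll.unavailabilityCause() are Netty's own methods; the class name EpollCheck and the messages it returns are made up for illustration.

```java
final class EpollCheck {
    /** Returns a one-line description of whether Netty's epoll transport is usable. */
    static String describe() {
        try {
            // Loaded reflectively so this class compiles without Netty present.
            Class<?> epoll = Class.forName("io.netty.channel.epoll.Epoll");
            boolean available = (Boolean) epoll.getMethod("isAvailable").invoke(null);
            if (available) {
                return "epoll: available";
            }
            // Netty records why the native library could not be loaded.
            Object cause = epoll.getMethod("unavailabilityCause").invoke(null);
            return "epoll: unavailable, cause: " + cause;
        } catch (ClassNotFoundException e) {
            return "epoll: netty epoll classes are not on the classpath";
        } catch (ReflectiveOperationException | LinkageError e) {
            return "epoll: check failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this inside the application's own classpath should show whether the native library is missing or fails to load, for example because of mismatched Netty module versions.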
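The channel type in the warning is NioSocketChannel, which means the native epoll transport was not in use; Lettuce can only apply TCP_USER_TIMEOUT on a native transport such as epoll. Analysis of the dependencies pointed to a Netty conflict, so I changed the dependency configuration as follows: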
<dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>6.3.0.RELEASE</version>
</dependency>
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.1.100.Final</version>
</dependency>
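The application no longer printed the warning at startup, and a reconnect appeared in the log 30 s after the failure, which shows that the tcpUserTimeout parameter had taken effect.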
2025-05-07 09:07:59.411 INFO 2112812 --- [xecutorLoop-1-6] i.l.core.protocol.ConnectionWatchdog : Reconnecting, last destination was 10.50.190.43:6379
2025-05-07 09:07:59.419 INFO 2112812 --- [llEventLoop-6-3] i.l.core.protocol.ReconnectionHandler : Reconnected to 10.50.190.43:6379
The Redis configuration class is as follows:
import io.lettuce.core.ClientOptions;
import io.lettuce.core.ReadFrom;
import io.lettuce.core.SocketOptions;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisStandaloneConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;

import java.time.Duration;

@Configuration
public class RedisConfig {

    @Value("${spring.redis.host}")
    private String redisHost;

    @Value("${spring.redis.port:6379}")
    private Integer redisPort = 6379;

    @Value("${spring.redis.database:0}")
    private Integer redisDatabase = 0;

    @Value("${spring.redis.password:}")
    private String redisPassword;

    @Value("${spring.redis.connect.timeout:2000}")
    private Integer redisConnectTimeout = 2000;

    @Value("${spring.redis.read.timeout:2000}")
    private Integer redisReadTimeout = 2000;

    /**
     * TCP keepalive settings (seconds):
     * idle time before the first keepalive probe = TCP_KEEPALIVE_TIME = 30
     * interval between keepalive probes = TCP_KEEPALIVE_TIME / 3 = 10
     * unanswered probes before the connection is closed = 3
     */
    private static final int TCP_KEEPALIVE_TIME = 30;

    /**
     * TCP_USER_TIMEOUT (seconds): how long sent data may stay unacknowledged
     * before the connection is closed. Fixes the long Lettuce timeouts.
     * refer: https://github.com/lettuce-io/lettuce-core/issues/2082
     */
    private static final int TCP_USER_TIMEOUT = 30;

    @Bean
    public LettuceConnectionFactory redisConnectionFactory(LettuceClientConfiguration clientConfiguration) {
        RedisStandaloneConfiguration standaloneConfiguration = new RedisStandaloneConfiguration();
        standaloneConfiguration.setHostName(redisHost);
        standaloneConfiguration.setPort(redisPort);
        standaloneConfiguration.setDatabase(redisDatabase);
        standaloneConfiguration.setPassword(redisPassword);
        LettuceConnectionFactory connectionFactory =
                new LettuceConnectionFactory(standaloneConfiguration, clientConfiguration);
        connectionFactory.setDatabase(redisDatabase);
        return connectionFactory;
    }

    @Bean
    public LettuceClientConfiguration clientConfiguration() {
        SocketOptions socketOptions = SocketOptions.builder()
                .keepAlive(SocketOptions.KeepAliveOptions.builder()
                        // idle time before the first keepalive probe
                        .idle(Duration.ofSeconds(TCP_KEEPALIVE_TIME))
                        // interval between keepalive probes
                        .interval(Duration.ofSeconds(TCP_KEEPALIVE_TIME / 3))
                        // unanswered probes before the connection is closed
                        .count(3)
                        // turn keepalive on
                        .enable()
                        .build())
                .tcpUserTimeout(SocketOptions.TcpUserTimeoutOptions.builder()
                        // avoids the long timeouts when the server disappears without sending an RST
                        .tcpUserTimeout(Duration.ofSeconds(TCP_USER_TIMEOUT))
                        .enable()
                        .build())
                // TCP connect timeout
                .connectTimeout(Duration.ofMillis(redisConnectTimeout))
                .build();
        ClientOptions clientOptions = ClientOptions.builder()
                .autoReconnect(true)
                .pingBeforeActivateConnection(true)
                .cancelCommandsOnReconnectFailure(false)
                .disconnectedBehavior(ClientOptions.DisconnectedBehavior.ACCEPT_COMMANDS)
                .socketOptions(socketOptions)
                .build();
        return LettuceClientConfiguration.builder()
                .commandTimeout(Duration.ofMillis(redisReadTimeout))
                .readFrom(ReadFrom.MASTER)
                .clientOptions(clientOptions)
                .build();
    }

    @Bean
    RedisTemplate<String, Object> redisTemplate(LettuceConnectionFactory redisConnectionFactory) {
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(redisConnectionFactory);
        // Print the effective TCP_USER_TIMEOUT at startup as a sanity check.
        System.out.println("SocketOptions: " + redisConnectionFactory.getClientConfiguration()
                .getClientOptions().get().getSocketOptions().getTcpUserTimeout().getTcpUserTimeout().toString());
        return template;
    }
}
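As a rough check on the keepalive values used in the class above (30 s idle, 10 s between probes, 3 probes), the time keepalive alone needs to notice a dead peer on an idle connection is the idle time plus one interval per unanswered probe. The sketch below works that out; KeepAliveBudget and detectionTime are illustrative names, not part of Lettuce. It deliberately ignores TCP_USER_TIMEOUT, which, according to the Linux tcp(7) man page, overrides keepalive when deciding when to close the connection.

```java
import java.time.Duration;

final class KeepAliveBudget {
    /** Idle time before the first probe, plus `count` unanswered probes spaced `interval` apart. */
    static Duration detectionTime(Duration idle, Duration interval, int count) {
        return idle.plus(interval.multipliedBy(count));
    }

    public static void main(String[] args) {
        Duration idle = Duration.ofSeconds(30);         // TCP_KEEPALIVE_TIME
        Duration interval = Duration.ofSeconds(30 / 3); // TCP_KEEPALIVE_TIME / 3
        System.out.println(detectionTime(idle, interval, 3).getSeconds() + " s"); // prints "60 s"
    }
}
```

Keepalive probes cover idle connections. When commands are in flight and unacknowledged, it is tcpUserTimeout that bounds the wait, which fits the reconnect seen about 30 s after the failure.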