When the SMB server reboots and the client immediately accesses the mount
point, a race condition can occur that causes operations to fail with
"Host is down" error.
Reproduction steps:
# Mount SMB share
mount -t cifs //192.168.245.109/TEST /mnt/ -o xxxx
ls /mnt
# Reboot server
ssh root@192.168.245.109 reboot
ssh root@192.168.245.109 /path/to/cifs_server_setup.sh
ssh root@192.168.245.109 systemctl stop firewalld
# Immediate access fails
ls /mnt
ls: cannot access '/mnt': Host is down
# But works if there is a delay
The issue is caused by a race condition between negotiate and reconnect.
The 20-second negotiate timeout mechanism can interfere with the normal
recovery process when both are triggered simultaneously.
ls cifsd
---------------------------------------------------
cifs_getattr
cifs_revalidate_dentry
cifs_get_inode_info
cifs_get_fattr
smb2_query_path_info
smb2_compound_op
SMB2_open_init
smb2_reconnect
cifs_negotiate_protocol
smb2_negotiate
cifs_send_recv
smb_send_rqst
wait_for_response
cifs_demultiplex_thread
cifs_read_from_socket
cifs_readv_from_socket
server_unresponsive
cifs_reconnect
__cifs_reconnect
cifs_abort_connection
mid->mid_state = MID_RETRY_NEEDED
cifs_wake_up_task
cifs_sync_mid_result
// case MID_RETRY_NEEDED
rc = -EAGAIN;
// In smb2_negotiate()
rc = -EHOSTDOWN;
The server_unresponsive() timeout triggers cifs_reconnect(), which aborts
ongoing mid requests and causes the ls command to receive -EAGAIN, leading
to -EHOSTDOWN.
Fix this by introducing a dedicated `neg_start` field to
precisely tracks when the negotiate process begins. The timeout check
now uses this accurate timestamp instead of `lstrp`, ensuring that:
1. Timeout is only triggered after negotiate has actually run for 20s
2. The mechanism doesn't interfere with concurrent recovery processes
3. Uninitialized timestamps (value 0) don't trigger false timeouts
Fixes:
7ccc1465465d ("smb: client: fix hang in wait_for_response() for negproto")
Signed-off-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
__le32 session_key_id; /* retrieved from negotiate response and send in session setup request */
struct session_key session_key;
unsigned long lstrp; /* when we got last response from this server */
+ unsigned long neg_start; /* when negotiate started (jiffies) */
struct cifs_secmech secmech; /* crypto sec mech functs, descriptors */
#define CIFS_NEGFLAVOR_UNENCAP 1 /* wct == 17, but no ext_sec */
#define CIFS_NEGFLAVOR_EXTENDED 2 /* wct == 17, ext_sec bit set */
/*
* If we're in the process of mounting a share or reconnecting a session
* and the server abruptly shut down (e.g. socket wasn't closed, packet
- * had been ACK'ed but no SMB response), don't wait longer than 20s to
- * negotiate protocol.
+ * had been ACK'ed but no SMB response), don't wait longer than 20s from
+ * when negotiate actually started.
*/
spin_lock(&server->srv_lock);
if (server->tcpStatus == CifsInNegotiate &&
- time_after(jiffies, server->lstrp + 20 * HZ)) {
+ time_after(jiffies, server->neg_start + 20 * HZ)) {
spin_unlock(&server->srv_lock);
cifs_reconnect(server, false);
return true;
server->lstrp = jiffies;
server->tcpStatus = CifsInNegotiate;
+ server->neg_start = jiffies;
spin_unlock(&server->srv_lock);
rc = server->ops->negotiate(xid, ses, server);