Julien Palard
2016-11-16 09:45:47 UTC
Package: openssh-client
Version: 1:7.3p1-3+b1
Severity: normal
Dear Maintainer,
I think that the bug
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=714526 is still
live, after upgrading to 1:7.3p1-3+b1 (should have been fixed still
1:7.2p2-6).
I straced SSH with and without `ControlMaster` and extracted revelant
bits of information, redacting it, and removing useless lines. First,
without `ControlMaster`:
```
# Here, Python 3 Popen prepares pipes for stdin, stdout, and stderr:
7880 1479228055.059186 pipe2([8, 9], O_CLOEXEC) = 0
7880 1479228055.059226 pipe2([10, 11], O_CLOEXEC) = 0
7880 1479228055.059258 pipe2([12, 13], O_CLOEXEC) = 0
# Then ssh is cloned:
7880 1479228055.060158 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f7bfd7fa9d0) = 7949
7949 1479228055.121259 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd3f2a63650) = 7954
7954 1479228055.121135 execve("/bin/sh", ["/bin/sh", "-c", "exec ssh ***@gate.redacted -W dev2.redacted:22"], [/* 16 vars */] <unfinished ...>
# Normal exit
7949 1479228055.592488 +++ exited with 0 +++
7871 1479228055.592519 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=7949, si_uid=1003, si_status=0, si_utime=0, si_stime=2} ---
7954 1479228055.592786 exit_group(129) = ?
# Python receive EOF on stdout AND stderr
7880 1479228055.694455 read(12, <unfinished ...>
7880 1479228055.694480 <... read resumed> "", 32768) = 0
7880 1479228055.738380 read(10, <unfinished ...>
7880 1479228055.738410 <... read resumed> "", 32768) = 0
# So, as the process is exited, AND stdout/stderr had their EOF, Python 3 subprocess.Popen.communicate allows itself to wait for it:
7880 1479228055.924898 wait4(7949, <unfinished ...>
7880 1479228055.925334 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 7949
```
Now, with `ControlMaster`:
```
# Same pipes for stdin, stdout, and stderr from Python Popen:
6749 1479227922.771783 pipe2([8, 9], O_CLOEXEC) = 0
6749 1479227922.771831 pipe2([10, 11], O_CLOEXEC) = 0
6749 1479227922.771865 pipe2([12, 13], O_CLOEXEC) = 0
# A bunch of clones, SSH starting its master process:
6749 1479227922.774839 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa1957fa9d0) = 6824
6824 1479227922.818219 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff4c6f11650) = 6827
6827 1479227922.820940 execve("/bin/sh", ["/bin/sh", "-c", "exec ssh ***@redacted -W redacted:22"], [/* 16 vars */] <unfinished ...>
6824 1479227923.204918 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff4c6f11650) = 6856
6856 1479227923.205441 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff4c6f11650) = 6857
6857 1479227923.206004 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff4c6f11650) = 6858
# Normal exit after work is done:
6824 1479227923.335999 +++ exited with 0 +++
6749 1479227923.336042 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=6824, si_uid=1003, si_status=0, si_utime=0, si_stime=0} ---
# Python is getting EOF on STDOUT
6749 1479227923.336086 read(10, "", 32768) = 0
# But here, process is a zombie (dead but not awaited), after like a minute,
# I send a SIGINT and a SIGQUIT to the Python process, so everyone dies and finally:
6749 1479227986.358448 read(12, "", 32768) = 0
6827 1479227986.328027 exit_group(130) = ?
6858 1479227986.329254 exit_group(255) = ?
6749 1479227986.578489 wait4(6824, <unfinished ...>
6749 1479227986.578546 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 6824
```
So here it looks like the bug is exactly the same as before: stdout is
closed but stderr is kept open. As Python's
``subprocess.Popen.communicate`` want to wait for both to be closed before
waiting for the process, it yields to stuck Python process and zombies.
-- System Information:
Debian Release: 8.6
APT prefers stable
APT policy: (900, 'stable'), (200, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.utf8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Versions of packages openssh-client depends on:
ii adduser 3.113+nmu3
ii dpkg 1.17.27
ii libc6 2.24-5
ii libedit2 3.1-20140620-2
ii libgssapi-krb5-2 1.14.3+dfsg-2
ii libselinux1 2.3-2
ii libssl1.0.2 1.0.2j-1
ii passwd 1:4.2-3+deb8u1
ii zlib1g 1:1.2.8.dfsg-2+b1
Versions of packages openssh-client recommends:
ii xauth 1:1.0.9-1
Versions of packages openssh-client suggests:
pn keychain <none>
pn libpam-ssh <none>
pn monkeysphere <none>
pn ssh-askpass <none>
-- no debconf information
Version: 1:7.3p1-3+b1
Severity: normal
Dear Maintainer,
I think that the bug
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=714526 is still
live, after upgrading to 1:7.3p1-3+b1 (should have been fixed still
1:7.2p2-6).
I straced SSH with and without `ControlMaster` and extracted revelant
bits of information, redacting it, and removing useless lines. First,
without `ControlMaster`:
```
# Here, Python 3 Popen prepares pipes for stdin, stdout, and stderr:
7880 1479228055.059186 pipe2([8, 9], O_CLOEXEC) = 0
7880 1479228055.059226 pipe2([10, 11], O_CLOEXEC) = 0
7880 1479228055.059258 pipe2([12, 13], O_CLOEXEC) = 0
# Then ssh is cloned:
7880 1479228055.060158 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f7bfd7fa9d0) = 7949
7949 1479228055.121259 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd3f2a63650) = 7954
7954 1479228055.121135 execve("/bin/sh", ["/bin/sh", "-c", "exec ssh ***@gate.redacted -W dev2.redacted:22"], [/* 16 vars */] <unfinished ...>
# Normal exit
7949 1479228055.592488 +++ exited with 0 +++
7871 1479228055.592519 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=7949, si_uid=1003, si_status=0, si_utime=0, si_stime=2} ---
7954 1479228055.592786 exit_group(129) = ?
# Python receive EOF on stdout AND stderr
7880 1479228055.694455 read(12, <unfinished ...>
7880 1479228055.694480 <... read resumed> "", 32768) = 0
7880 1479228055.738380 read(10, <unfinished ...>
7880 1479228055.738410 <... read resumed> "", 32768) = 0
# So, as the process is exited, AND stdout/stderr had their EOF, Python 3 subprocess.Popen.communicate allows itself to wait for it:
7880 1479228055.924898 wait4(7949, <unfinished ...>
7880 1479228055.925334 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 7949
```
Now, with `ControlMaster`:
```
# Same pipes for stdin, stdout, and stderr from Python Popen:
6749 1479227922.771783 pipe2([8, 9], O_CLOEXEC) = 0
6749 1479227922.771831 pipe2([10, 11], O_CLOEXEC) = 0
6749 1479227922.771865 pipe2([12, 13], O_CLOEXEC) = 0
# A bunch of clones, SSH starting its master process:
6749 1479227922.774839 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa1957fa9d0) = 6824
6824 1479227922.818219 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff4c6f11650) = 6827
6827 1479227922.820940 execve("/bin/sh", ["/bin/sh", "-c", "exec ssh ***@redacted -W redacted:22"], [/* 16 vars */] <unfinished ...>
6824 1479227923.204918 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff4c6f11650) = 6856
6856 1479227923.205441 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff4c6f11650) = 6857
6857 1479227923.206004 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7ff4c6f11650) = 6858
# Normal exit after work is done:
6824 1479227923.335999 +++ exited with 0 +++
6749 1479227923.336042 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=6824, si_uid=1003, si_status=0, si_utime=0, si_stime=0} ---
# Python is getting EOF on STDOUT
6749 1479227923.336086 read(10, "", 32768) = 0
# But here, process is a zombie (dead but not awaited), after like a minute,
# I send a SIGINT and a SIGQUIT to the Python process, so everyone dies and finally:
6749 1479227986.358448 read(12, "", 32768) = 0
6827 1479227986.328027 exit_group(130) = ?
6858 1479227986.329254 exit_group(255) = ?
6749 1479227986.578489 wait4(6824, <unfinished ...>
6749 1479227986.578546 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 6824
```
So here it looks like the bug is exactly the same as before: stdout is
closed but stderr is kept open. As Python's
``subprocess.Popen.communicate`` want to wait for both to be closed before
waiting for the process, it yields to stuck Python process and zombies.
-- System Information:
Debian Release: 8.6
APT prefers stable
APT policy: (900, 'stable'), (200, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.utf8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Versions of packages openssh-client depends on:
ii adduser 3.113+nmu3
ii dpkg 1.17.27
ii libc6 2.24-5
ii libedit2 3.1-20140620-2
ii libgssapi-krb5-2 1.14.3+dfsg-2
ii libselinux1 2.3-2
ii libssl1.0.2 1.0.2j-1
ii passwd 1:4.2-3+deb8u1
ii zlib1g 1:1.2.8.dfsg-2+b1
Versions of packages openssh-client recommends:
ii xauth 1:1.0.9-1
Versions of packages openssh-client suggests:
pn keychain <none>
pn libpam-ssh <none>
pn monkeysphere <none>
pn ssh-askpass <none>
-- no debconf information