
njahnke at gmail
Apr 5, 2009, 3:32 PM
select() hangs in sftp_server_main()
First off, a disclaimer: this is not a problem with openssh per se, as it is also occurring with other software on my server, but I was hoping someone reading this might know more about the problem than I do. Thank you very much in advance for your help.

Problem: connecting to the server via sftp results in a hang here:

    if (select(max+1, rset, wset, NULL, NULL) < 0) {

which is line 1428 from 5.2p1's sftp-server.c (main loop of sftp_server_main()). The same hang occurs when opening a data connection over e.g. vanilla FTP. I am sometimes able to get through after a number of seconds or minutes, but sometimes the connection times out on the client side before the server is able to respond. When the server does respond and I am connected, then if I issue e.g. 'ls' it will hang again at the select() for some time. ssh is OK; I can connect with no delay, issue commands, etc.

I don't think it's socket death:

    root@d:~# cat /proc/net/sockstat
    sockets: used 304
    TCP: inuse 444 orphan 302 tw 152 alloc 451 mem 5280
    UDP: inuse 4
    RAW: inuse 0
    FRAG: inuse 0 memory 0

    root@d:~# netstat -tan | awk '{print $6}' | sort | uniq -c
          2 CLOSE_WAIT
        121 CLOSING
          1 established)
        109 ESTABLISHED
         17 FIN_WAIT1
          9 FIN_WAIT2
          1 Foreign
        300 LAST_ACK
         20 LISTEN
          2 SYN_RECV
        433 TIME_WAIT

It also doesn't seem to be out of file descriptors, but I'm not 100% sure on that. And even if it were, wouldn't that produce an error, not a hang?

It does seem to be somewhat related to the number of connections lighttpd is serving. I can shut down lighttpd and the problem goes away. Having said this, lighttpd and apache are able to coexist in this state with no problem (apache never hangs). People can also connect to an IRC server on the same machine with no problem during these "episodes". So maybe it is limited to select()? What resource is lighttpd using that is not sockets/file descriptors that is causing select() to hang? I am pulling my hair out over this.
I've tried all of the usual network tuning stuff (the various settings through sysctl, reducing the timeouts), all with no effect. The problem must be elsewhere.

    Linux dl 2.6.18-6-486 #1 Sat Dec 27 08:57:46 UTC 2008 i686 GNU/Linux

It's running Debian Etch.

What might cause select() to hang checking some sockets?

Thanks,
Nathan
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev [at] mindrot
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev