Tuesday, February 5, 2013

Handling OpenSSL's SSL_ERROR_WANT_WRITE to avoid socket error:1409F07F

I spent a good part of last week working on C++ networking code that uses OpenSSL.  One of the issues I ran into was an SSL socket that stopped sending data for no obvious reason.

OpenSSL returns a fair number of error codes from its API, and depending on the code returned from an I/O operation you may be required to take specific actions.  In particular, if you write to an SSL socket and get the SSL_ERROR_WANT_WRITE error code, you really need to jump through some hoops.  The OpenSSL documentation states that when a write operation reports SSL_ERROR_WANT_WRITE, you must call the write function again at a later time, with the same parameters.  Pay close attention to that last part about the parameters, because there is a huge caveat there.

What the OpenSSL documentation doesn't properly convey is that when you retry the write after receiving an SSL_ERROR_WANT_WRITE error, the parameters must literally be the same, DOWN TO THE ADDRESS OF THE BUFFER YOU WISH TO WRITE.  My initial assumption was that only the buffer contents needed to be the same.  That may well be required too, but I found the actual buffer address had to be the same as well.

It seems OpenSSL internally remembers the address of the buffer passed in when SSL_write() reports the want-write error, and if a different address is used on the next write, the socket enters an error state.  In this error state SSL_write() fails, SSL_get_error() reports SSL_ERROR_SSL, and the SSL error queue contains this error:

"error:1409F07F:SSL routines:SSL3_WRITE_PENDING: bad write retry"
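To make the failure mode concrete, here is a toy model of the behavior described above.  This is NOT OpenSSL code: ToySslWriter, WriteResult, and socket_writable are names invented for illustration, and the check is only my approximation of what the library appears to do.  The sketch also lets a correct retry succeed after a bad one, which is an assumption, not something verified against the real library.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only, NOT OpenSSL code. All names here are hypothetical.
enum class WriteResult { Ok, WantWrite, BadWriteRetry };

class ToySslWriter {
public:
    // When false, the simulated socket cannot accept data right now.
    bool socket_writable = false;

    WriteResult write(const void* buf, std::size_t len) {
        assert(buf != nullptr && len > 0);

        // A previous write is still pending: the retry must pass the same
        // buffer address and at least the same length.
        if (pending_buf_ != nullptr &&
            (buf != pending_buf_ || len < pending_len_)) {
            return WriteResult::BadWriteRetry;  // "bad write retry"
        }

        if (!socket_writable) {
            // Remember the arguments so the retry can be checked.
            pending_buf_ = buf;
            pending_len_ = len;
            return WriteResult::WantWrite;
        }

        // Write completed: nothing is pending any more.
        pending_buf_ = nullptr;
        pending_len_ = 0;
        return WriteResult::Ok;
    }

private:
    const void* pending_buf_ = nullptr;
    std::size_t pending_len_ = 0;
};
```

In this model, a write from buffer A that returns WantWrite followed by a write from a copy of A at a different address returns BadWriteRetry, even though the bytes are identical.  For reference, OpenSSL documents a mode flag, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER (set with SSL_set_mode() or SSL_CTX_set_mode()), which relaxes the address requirement while still requiring the same contents.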

I'm not sure if this error state is recoverable.  It may be possible to recover by calling the write again with the original parameters.  However, if the original buffer is gone, because you memcpy'ed the data into a new buffer and freed the old one, then your socket is probably hosed.

In my case, we were trying to send data through the socket first, then queuing a copy of the data if the write failed.  Unfortunately the copy operation memcpy'ed the data we wished to send, so the next time we called SSL_write() we passed a pointer to a different location than in the original call.  Our solution was simply to copy and queue the data first, then try to write to the socket from the queued copy.  If the write succeeded we just removed the entry from the queue, and if it failed we were guaranteed to still have the same buffer address when we tried to write again.
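The queue-first approach can be sketched roughly as follows.  This is a minimal illustration, not our production code: QueuedSender and the TryWrite callback are hypothetical names, and the callback stands in for a wrapper around SSL_write() that returns false when SSL_get_error() reports SSL_ERROR_WANT_WRITE.  It also assumes a write either completes fully or must be retried, with no partial writes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical callback: returns true when the whole buffer was sent, false
// when the caller must retry later with the SAME pointer and length.
// In real code this would wrap SSL_write() and SSL_get_error().
using TryWrite = std::function<bool(const unsigned char*, std::size_t)>;

class QueuedSender {
public:
    explicit QueuedSender(TryWrite try_write)
        : try_write_(std::move(try_write)) {}

    // Copy the data into the queue FIRST, then write from the queued copy.
    void send(const unsigned char* data, std::size_t len) {
        assert(data != nullptr && len > 0);
        queue_.emplace_back(data, data + len);
        flush();
    }

    // Retry queued buffers in order; stop at the first one that cannot be
    // sent yet. Call this again once the socket becomes writable.
    void flush() {
        while (!queue_.empty()) {
            const std::vector<unsigned char>& front = queue_.front();
            if (!try_write_(front.data(), front.size())) {
                return;  // buffer stays queued, so its address stays the same
            }
            queue_.pop_front();
        }
    }

    std::size_t pending() const { return queue_.size(); }

private:
    TryWrite try_write_;
    // Appending to a deque does not move existing elements, and each
    // vector's storage is untouched until it is popped, so the pointer
    // handed to try_write_ is identical on every retry.
    std::deque<std::vector<unsigned char>> queue_;
};
```

The key design choice is that the only pointer ever handed to the write call belongs to the queued copy, so a retry cannot accidentally use a different address than the first attempt.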