Tuesday, May 05, 2009

Weird RVRD issue explained

Dear reader. This is a highly specific description of a problem with the TIBCO RV (Rendezvous) product, and chances are that it is of no interest to you. If that is the case, feel free to skip it. I wanted to document it here so that other people searching for this problem can find the information.

Here is the problem:

  • Let’s say that you have two environments (RVs): RV1 and RV2.
  • The environments are linked together by a pair of RVRDs (this is important: no RVRDs, no problems).
  • A server in RV1 wants to send a message to a server in RV2 on subject S1 and then expects to hear back from that server on subject S2.
  • Under some conditions, the response can be lost.

What is the cause? (The following theory has been confirmed by TIBCO support, so I’m not (entirely) pulling it out of my rear end.)

  • RVRDs try to optimize network traffic by forwarding messages only on the subjects where there are listeners on the other side.
  • The problem appears when there is a very short interval between the server in RV1 setting up its listener on S2 and sending the “request” message. In that case, the RVRD in RV2 may learn that a listener is interested in the response messages only after it has already decided not to forward them. More specifically: there is no guarantee that “normal” messages and administrative messages (those sent on the _RV.> subjects) are delivered in the order they were sent. This means that the administrative message announcing the new listener can arrive later than the messages that depend on it.
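To make the race concrete, here is a toy Python model of it. It is a hypothetical simulation, not TIBCO code and not the RV API: `Event`, `ToyRouter` and `simulate` are invented names, the timings are made-up numbers, and the only rule modelled is the one described above (the RVRD in RV2 forwards a message on a subject only if it already knows about a listener on that subject in RV1).

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    # Events are ordered only by the time they reach the RVRD in RV2.
    arrival: float
    kind: str = field(compare=False)      # "interest" (admin, _RV.>) or "data"
    subject: str = field(compare=False)
    payload: str = field(default="", compare=False)


class ToyRouter:
    """Hypothetical stand-in for the RVRD in RV2, not TIBCO's implementation.

    It forwards a data message to RV1 only if it already knows that RV1
    has a listener on that subject; otherwise the message is dropped.
    """

    def __init__(self) -> None:
        self.interest: set[str] = set()
        self.forwarded: list[str] = []
        self.dropped: list[str] = []

    def handle(self, ev: Event) -> None:
        if ev.kind == "interest":
            self.interest.add(ev.subject)      # listener announced
        elif ev.subject in self.interest:
            self.forwarded.append(ev.payload)  # known listener: forward
        else:
            self.dropped.append(ev.payload)    # no known listener: drop


def simulate(interest_arrival: float, reply_arrival: float) -> ToyRouter:
    """Feed the admin message and the reply to the router in arrival order."""
    router = ToyRouter()
    events = [
        Event(interest_arrival, "interest", "S2"),
        Event(reply_arrival, "data", "S2", "reply"),
    ]
    for ev in sorted(events):
        router.handle(ev)
    return router


# Admin message overtaken by the reply: the reply is dropped.
racy = simulate(interest_arrival=0.050, reply_arrival=0.010)
# Admin message arrives first: the reply is forwarded.
calm = simulate(interest_arrival=0.005, reply_arrival=0.010)
print(racy.dropped, calm.forwarded)  # ['reply'] ['reply']
```

The only difference between the two runs is the order in which the two messages reach the RVRD, which is exactly the ordering that is not guaranteed.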

Possible solutions:

  • Insert delays in your programs after setting up listeners on subjects or before responding to messages.
  • Start a tibrvlisten on the given subject (or on a wildcard subject that includes it) in RV1, on the machine which is running the RVRD. This keeps at least one active “listener” open at all times (of course, you can redirect its output to /dev/null). Strangely enough, it seems important to run tibrvlisten on that same machine, because running it on another machine in RV1 doesn’t seem to give the same result.
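Both workarounds can be expressed in the same kind of toy model. Again, this is a hypothetical sketch, not TIBCO code: `Timeline` and `reply_delivered` are invented names, the timings are made-up, and the model only captures the rule “the reply crosses if the RVRD in RV2 knows about the listener before the reply arrives”.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    """Hypothetical timings (seconds) as seen by the RVRD in RV2."""
    interest_arrival: float  # admin message announcing the S2 listener arrives
    reply_arrival: float     # reply on S2 arrives (with no extra delay)


def reply_delivered(t: Timeline, delay: float = 0.0,
                    standing_listener: bool = False) -> bool:
    """Toy rule: the reply crosses only if interest in S2 is known before it arrives.

    delay: pause inserted after creating the listener (before sending the
    request) or before responding; either way the reply arrives later.
    standing_listener: a long-running tibrvlisten whose interest was
    registered long ago, so interest is known before any reply.
    """
    interest_known_at = float("-inf") if standing_listener else t.interest_arrival
    return t.reply_arrival + delay > interest_known_at


racy = Timeline(interest_arrival=0.050, reply_arrival=0.010)
print(reply_delivered(racy))                          # False: reply lost
print(reply_delivered(racy, delay=0.100))             # True: delay outlasts the gap
print(reply_delivered(racy, standing_listener=True))  # True: interest already known
```

The model also shows the weakness of the delay approach: it only helps if the delay is longer than the gap, which you don’t know in advance, while a standing listener removes the timing dependency altogether. It says nothing about why tibrvlisten has to run on the RVRD’s own machine.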

Hope this helps someone.

Picture taken from psd's photostream with permission.
