8. Detection of Station Inconsistency

8.1. Introduction

In a network topology other than client-server, high packet loss can cause the connection between some stations to be lost.

Even when some stations lose their connection to each other, the remaining stations can continue to communicate, so the game can proceed while some stations are unable to communicate. If the connection between the host (the session master) and the candidate host (the station that joined second) is lost, the candidate host is promoted to host, and the session ends up with two hosts (see Figure 8.1). When this occurs, the entire session might become unstable because both hosts independently give instructions to the other stations.

Figure 8.1 Two Hosts Resulting from Communication Loss Between Hosts

To deal with these problems, NEX automatically runs a detection process and tries to recover by disconnecting stations. If the situation is not recoverable, NEX generates a fatal error, disconnects all stations, and enters a state in which it can be terminated. NEX lowers the chance of having two hosts at the same time after the connection between the host and the candidate host is lost by promoting the candidate host when the other stations are aware that the host has disconnected, or when polling the other stations confirms that a majority of them have lost their connection to the host.

The application must monitor for fatal errors using GetFatalError and, when a fatal error occurs, it must execute NetZ::Terminate and delete the NetZ object.

If you are using the host migration extension feature, handle the NotificationEvents::HostChangeEvent notification with NotificationEventHandler and execute the Session::GameOver function. (See Section 8.3.)

8.2. Actions of Detecting Station Inconsistencies

With NEX, the list of stations as recognized by the host is sent periodically to the other stations, and each station compares this list and performs the following two actions.

First, it checks each periodic notification from the host and, if the station itself is not on the station list, it generates a fatal error (QERROR(DOCore, StationInconsistency); see the GetFatalError function) and requests that the application execute the NetZ::Terminate function and delete the NetZ object. Second, it automatically disconnects from stations that the host does not recognize, which separates networks that have two different hosts.
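The two actions above can be sketched as a single comparison step. This is an illustrative model only, not NEX code: StationId, SyncResult, and CompareWithHostList are hypothetical names, and the sketch evaluates one notification in isolation, ignoring the threshold count described in Section 8.4.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical station identifier; NEX uses its own handle types.
using StationId = std::uint32_t;

// Outcome of comparing the host's station list with the local view.
struct SyncResult {
    bool fatalInconsistency = false;      // local station is missing from the host's list
    std::vector<StationId> toDisconnect;  // connected peers the host does not recognize
};

// Compare one periodic station list from the host against the peers
// this station is currently connected to.
SyncResult CompareWithHostList(const std::vector<StationId>& hostList,
                               StationId self,
                               const std::vector<StationId>& connectedPeers)
{
    auto inHostList = [&hostList](StationId id) {
        return std::find(hostList.begin(), hostList.end(), id) != hostList.end();
    };

    SyncResult result;
    if (!inHostList(self)) {
        // First action: the host no longer recognizes this station.
        result.fatalInconsistency = true;
        return result;
    }
    // Second action: collect peers that are absent from the host's list.
    for (StationId peer : connectedPeers) {
        if (!inHostList(peer)) {
            result.toDisconnect.push_back(peer);
        }
    }
    return result;
}
```

For example, with a host list of stations 1, 2, and 3, a local station 2 that is connected to 1, 3, and 4 would disconnect only from station 4, while a local station that is absent from the host list would report the fatal inconsistency instead.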

In some cases, the candidate host may not recognize that the host has been disconnected from the session, even though some of the other stations do recognize the situation. If these stations communicate with the candidate host and confirm that the candidate host has no intention of assuming the role of host, they generate a fatal error (QERROR(DOCore, FaultRecoveryJobProcessFailed); see the GetFatalError function) and request execution of the NetZ::Terminate function and the deletion of the NetZ object.

If the connection between the host and the candidate host is lost, the other stations either already know that the host has disconnected or are polled. A polled station that already knows about the disconnection reports "disconnected" immediately. A station that is still connected to the host watches for a disconnection for the period given by StreamSettings::GetMaxSilenceTime(); it reports "disconnected" if the connection is lost during that period and "connected" if it is still connected when the period ends. The candidate host is promoted to host if, as a result of polling, more than half of the stations in the session report that they are disconnected from the host. If more than half of the stations in the session report that they are connected to the host, a fatal error (QERROR(DOCore, FaultRecoveryJobProcessFailed); see the GetFatalError function) is generated, and a request is made to execute the NetZ::Terminate function and delete the NetZ object.
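The polling decision can be sketched as a simple majority count. This is a minimal model under stated assumptions, not the NEX implementation: PollReply, PollOutcome, and DecideAfterPolling are hypothetical names, the number of stations in the session is supplied by the caller, and any result short of a strict majority of "disconnected" replies (including a tie) is assumed to end in the fatal error.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reply of one polled station about its link to the host.
enum class PollReply { Connected, Disconnected };

// Hypothetical outcome of the polling round.
enum class PollOutcome { PromoteCandidate, FatalError };

// Promote the candidate host only when more than half of the stations in
// the session report that they are disconnected from the host.
// Assumption: every other result, including a tie, ends in the fatal error.
PollOutcome DecideAfterPolling(const std::vector<PollReply>& replies,
                               std::size_t stationsInSession)
{
    std::size_t disconnected = 0;
    for (PollReply reply : replies) {
        if (reply == PollReply::Disconnected) {
            ++disconnected;
        }
    }
    // "More than half" written without division to avoid rounding issues.
    return (disconnected * 2 > stationsInSession) ? PollOutcome::PromoteCandidate
                                                  : PollOutcome::FatalError;
}
```

With three stations, two "disconnected" replies lead to promotion under this model, while a single "disconnected" reply does not.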

8.3. Point to Note When Using the Host Migration Extension Feature

If the session splits into two different networks while the host migration extension feature (HostMigrationExtension) is in use, the host that was promoted later is registered with the game server. At that time, NotificationEvents::HostChangeEvent is reported to the previous host through NotificationEventHandler.

New stations never join a host for which NotificationEvents::HostChangeEvent has been generated.

However, because the two networks have the same GatheringID, if the game proceeds in this state, a host from a different network could be registered every time a host is promoted.

Because this type of problem can occur, any host that receives NotificationEvents::HostChangeEvent should execute Session::GameOver and release the network under its control.

8.4. Parameter Tuning

After some station has joined the host, the NAT traversal and join-in processing are delayed until the last station has joined. For this reason, the duration for comparing notifications from the session host and recognizing station inconsistencies must be set to a value that is larger than the time it takes for NAT traversal and the join-in process.

Specifically, set the following two values so that their product is 5000 to 10000 milliseconds larger than the value of RootTransport::GetNATTraversalTimeout: the interval at which the host periodically sends its list of recognized stations (Session::SetSyncStationListInterval), and the number of times the other stations check before detecting a station inconsistency (Session::SetDetectStationInconsitencyThresholdCount).

Generally, RootTransport::SetNATTraversalTimeout is set to the maximum number of participants multiplied by a value between 3000 and 5000. If you use a value between 3000 and 5000 for Session::SetSyncStationListInterval, set Session::SetDetectStationInconsitencyThresholdCount to the maximum number of participants plus 1 to 3.
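The arithmetic behind these guidelines can be checked with a few helper functions. This is a sketch for working out values before passing them to the NEX setters; the function names are hypothetical, and all times are assumed to be in milliseconds.

```cpp
#include <cstdint>

// Rule-of-thumb NAT traversal timeout: maximum participants multiplied by
// a per-participant budget (3000 to 5000 in the guideline above).
constexpr std::uint32_t NatTraversalTimeoutMs(std::uint32_t maxParticipants,
                                              std::uint32_t perParticipantMs)
{
    return maxParticipants * perParticipantMs;
}

// Detection window: list-sending interval multiplied by the threshold count.
constexpr std::uint32_t DetectionWindowMs(std::uint32_t syncIntervalMs,
                                          std::uint32_t thresholdCount)
{
    return syncIntervalMs * thresholdCount;
}

// Smallest threshold count whose detection window is at least
// natTimeoutMs + marginMs. syncIntervalMs must be greater than zero.
constexpr std::uint32_t MinThresholdCount(std::uint32_t natTimeoutMs,
                                          std::uint32_t syncIntervalMs,
                                          std::uint32_t marginMs)
{
    // Integer ceiling division.
    return (natTimeoutMs + marginMs + syncIntervalMs - 1) / syncIntervalMs;
}
```

For example, with 4 participants and 5000 per participant, the timeout is 20000. With a 5000 interval, a margin of 5000 needs a count of 5 (participants plus 1) and a margin of 10000 needs a count of 6 (participants plus 2). With smaller intervals the count needed for the same margin grows, so it is worth checking the product rather than relying on the "plus 1 to 3" shortcut alone.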

8.5. Verification Methods

There are two ways to generate station inconsistencies: you can run an aging test in an environment where the network emulator is set up to drop packets with high probability, or you can forcibly disconnect stations.

The EmulationDevice class can be used with NEX to implement a network emulator for debugging purposes. You can configure the loss rate with the network emulator using settings such as those in the following code. (For more information, see Section 17.1.1.) For best results, set a loss rate of 15% or higher.

Code 8.1 Setting the Loss Rate

nn::nex::OutputEmulationDevice * pOutputEmulation =
                nn::nex::RootTransport::GetInstance()->GetOutputEmulationDevice();
pOutputEmulation->Enable(); //Set the value after running Enable.
pOutputEmulation->SetPacketDropProbability(0.15);

To reduce the number of keep-alive packets sent prior to detection of a problem, use the StreamSettings::SetKeepAliveTimeout() function to set a longer timeout for keep-alive packets that are sent when there is no response.

Code 8.2 Setting a Timeout for Keep-Alive Packets

nn::nex::Stream::GetSettings().SetKeepAliveTimeout(5000);
//Setting for an easier disconnect. The default is 1000.

You can also perform verification by using the debug feature for forcibly disconnecting arbitrary stations. Call Station::SignalFault to forcibly disconnect a station.

Whichever method you choose, either an EventLog::Warning or an EventLog::Fatal level log message is output when station inconsistency occurs.

