Return-Path: NDNNET-C%FINHUTC.BITNET@FINHUT.HUT.FI
Received: from [130.233.224.3] by sunic.sunet.se (5.61+IDA/KTH/LTH/1.99)
	id AAsunic04266; Wed, 15 Nov 89 20:58:23 +0100
Message-Id: <8911151958.AAsunic04266@sunic.sunet.se>
Received: from Finhutc.HUT.FI by FINHUT.HUT.FI (IBM VM SMTP R1.2) with BSMTP id 9132; Wed, 15 Nov 89 21:58:07 EET
Received: by FINHUTC (Mailer R2.03B) id 9460; Wed, 15 Nov 89 21:57:09 EET
Date:         Wed, 15 Nov 89 16:49:00 +0100
Reply-To: NDNNET-C%FINHUTC.HUT.FI@FINHUT.HUT.FI
Sender: NORDUnet coordination group <NDNNET-C%FINHUTC@FINHUT.HUT.FI>
From: G|ran Bengtson <GOERAN%AE.CHALMERS.SE@FINHUT.HUT.FI>
Subject:      About DECnet performance on NORDUnet (and SUNET) backbone
X-To:         dnansv@dnansv.gd.chalmers.se, hwg@biovax.umdc.umu.se,
              nordunet-staff@sunic.sunet.se, ndnnet-c@finhutc.hut.fi
To: "(no name)" <boss@SUNIC.SUNET.SE>

							Göteborg 891115


To start with: this may be a long message, but I would like a response
from someone on the NORDUnet staff (Stockholm and UNIC) and from someone
managing the BITNET/JNET link between SEGBOX and DKGBOX, so please read
the last section (F) even if you dislike DECnet...

----
This is a status summary, at least as I see it, of the "open problem" of
poor DECnet performance over the NORDUnet (and SUNET) backbone, with some
recommendations based on the response we have received from DEC so far
and on some additional tests done at Chalmers.  This is not the final
analysis and solution; comments and suggestions are welcome.

We have to consider a number of facts and problems when we evaluate the
performance problems.  Some of them are inherent to the current VAX/VMS
DECnet implementation, some to the DECnet specifications, and some to the
fact that we use remote bridges.

At least the following factors are important:

1.	The quality of the leased lines.
2.	The line and datalink protocol used between the bridges.
3.	The maximum queue length (and possible queue priorities) at the
	bridges.
4.	The resolution of the round trip delay measurements.
5.	The algorithm used to estimate and update the round trip delay.
6.	The time before a message is retransmitted.
7.	The window size and algorithm at transport layer.


A.	Packets can be lost on our extended Ethernet mainly for two reasons:

	A1. Bridge congestion.  This is unavoidable, and can only be
	    totally controlled if a specific transport connection through
	    the bridge is THE ONLY one passing that bridge.

	    The current configuration of our Vitalink TransLAN bridges
	    uses a maximum queue length of about 8 kbytes (per line).

	    Note that when we are near congestion, the actual delay before
	    a small packet is forwarded through the bridge will be about
	    1 second (8 kbytes in the queue drained through a 64 kbps
	    line)!!

	A2. Loss due to bad line quality.  Note that the Vitalinks simply
	    drop packets with bad checksums; there are no retransmits at
	    the datalink layer.
	    This occurs independently of the actual load on the bridges/lines.
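	The two figures in A can be checked with a little arithmetic.  The
	Python sketch below (helper names are hypothetical) computes how long
	a full queue takes to drain and how likely a frame is to be hit by at
	least one bit error.  It assumes 8 kbytes = 8192 bytes, a 64 kbps
	line, no framing overhead and independent bit errors; real line
	errors are often bursty, so treat the loss figure as a rough guide.

```python
def queue_drain_delay_s(queue_bytes: int, line_bps: int) -> float:
    """Seconds a newly queued frame waits behind a full FIFO queue."""
    return queue_bytes * 8 / line_bps


def frame_loss_probability(frame_bytes: int, bit_error_rate: float) -> float:
    """Chance that at least one bit in the frame is corrupted.

    ASSUMPTION: bit errors are independent.  A corrupted frame fails
    its checksum and is dropped by the bridge (no datalink retransmit).
    """
    bits = frame_bytes * 8
    return 1.0 - (1.0 - bit_error_rate) ** bits


if __name__ == "__main__":
    # A1: 8 kbytes queued ahead of a small packet on a 64 kbps line.
    print(f"drain delay: {queue_drain_delay_s(8 * 1024, 64_000):.3f} s")
    # A2: a 512-byte frame at the error rates quoted in item 1 of section D.
    for ber in (1e-6, 1e-9):
        print(f"BER {ber:g}: frame loss {frame_loss_probability(512, ber):.2e}")
```

	Under these assumptions the drain delay is about 1.02 s, and at a bit
	error rate of 1e-6 roughly 0.4% of 512-byte frames are lost, against
	about four in a million at 1e-9.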

B.	DECnet implementation.

	Although DECnet's network layer is connectionless (CLNS), its
	transport layer is not optimized for our type of network, at least
	not in the VMS implementation with default parameters.  In fact, the
	resolution of the round trip delay estimates is 1 second.  This gives
	much too high estimates for a fast network, but our network is not
	always fast (remember the queues in the Vitalinks?).

	The timer whose expiration causes a retransmit is preset to the
	so-called "NSPDelay" times the current round trip delay estimate.
	The NSPDelay is called "Executor Delay Factor" on VAX/VMS (and
	Ultrix?), and has a default value of 80 sixteenths of a unit
	(i.e. 80/16 = 5).  This is a somewhat high value, probably for
	historical reasons.  (For Ultrix I think the default is 64.)
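	To make this concrete, here is a small Python sketch of how the
	retransmit timer follows from the delay factor and the round trip
	estimate.  The function names are hypothetical, and so is the
	rounding rule: we simply assume a measured delay is rounded up to
	whole seconds with a minimum of one second.  That reproduces the
	"50-500 ms is estimated as 1 s" behaviour described in C, but it is
	not taken from DEC documentation.

```python
import math


def quantized_rtt_s(measured_rtt_s: float, resolution_s: float = 1.0) -> float:
    """Round a measured round trip delay up to the timer resolution.

    ASSUMPTION: round up, never below one tick.  The exact VMS rule
    is not documented here.
    """
    ticks = max(1, math.ceil(measured_rtt_s / resolution_s))
    return ticks * resolution_s


def retransmit_timeout_s(rtt_estimate_s: float, delay_factor: int = 80) -> float:
    """Retransmit timer = (delay factor / 16) * round trip delay estimate."""
    return delay_factor / 16 * rtt_estimate_s


if __name__ == "__main__":
    estimate = quantized_rtt_s(0.25)  # a typical 250 ms round trip
    for factor in (80, 64, 28):       # VMS default, Ultrix default(?), trial value
        print(factor, retransmit_timeout_s(estimate, factor))
```

	With the VMS default of 80, a lost packet stalls the connection for
	about 5 s even though the real round trip is a quarter of a second;
	a factor of 28 brings that down to 1.75 s.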

	The algorithm used to update the delay estimate may be suboptimal,
	but according to DEC we won't gain much by changing the only
	factor in that formula that can be changed (Executor Delay Weight,
	5 on VMS, 3 on Ultrix...).
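	The update formula itself is not spelled out here.  As far as we
	understand the DECnet NSP specification, the new estimate moves a
	fraction 1/(weight + 1) of the way from the old estimate towards the
	latest sample; please check that against the specification before
	relying on it.  A Python sketch under that assumption (the function
	name is ours):

```python
def update_delay_estimate(estimate_s: float, sample_s: float,
                          delay_weight: int = 5) -> float:
    """Exponentially weighted update of the round trip delay estimate.

    ASSUMED form: estimate + (sample - estimate) / (weight + 1).
    A larger weight means the estimate reacts more slowly.
    """
    return estimate_s + (sample_s - estimate_s) / (delay_weight + 1)


if __name__ == "__main__":
    # The queues build up: samples jump from 1 s to 2 s (1 s resolution).
    for weight in (5, 3):  # VMS and Ultrix defaults
        estimate = 1.0
        for _ in range(5):
            estimate = update_delay_estimate(estimate, 2.0, weight)
        print(f"weight {weight}: estimate after 5 samples {estimate:.2f} s")
```

	In this model the estimate reaches about 1.60 s after five samples
	with weight 5 and about 1.76 s with weight 3.  The difference is
	modest, which seems to fit DEC's comment that changing the weight
	gains little.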

	The so-called Executor Pipeline Quota specifies how much
	unacknowledged data one transport connection can have outstanding.
	I'm not sure exactly how this is used (particularly how it is
	updated), but using a large value (>8000) increases the risk of
	bridge congestion, even when we have only ONE transport connection
	through the bridge.
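	A rough way to see the pipeline quota concern: if a connection's
	whole quota of unacknowledged data arrives at a bridge back to back,
	it alone can exceed the roughly 8 kbyte queue mentioned in A1.  The
	Python sketch below uses hypothetical names and assumes that worst
	case of one full-quota burst, ignoring protocol headers and any other
	traffic sharing the queue.

```python
import math

QUEUE_BYTES = 8 * 1024  # approx. Vitalink queue per line (section A1)
LINE_BPS = 64_000       # 64 kbps leased line


def burst_fits_queue(pipeline_quota: int, queue_bytes: int = QUEUE_BYTES) -> bool:
    """True if one connection's full quota fits in an empty bridge queue."""
    return pipeline_quota <= queue_bytes


def connections_to_fill_queue(pipeline_quota: int,
                              queue_bytes: int = QUEUE_BYTES) -> int:
    """Full-quota connections needed to fill the queue completely."""
    return math.ceil(queue_bytes / pipeline_quota)


def burst_drain_s(pipeline_quota: int, line_bps: int = LINE_BPS) -> float:
    """Seconds needed to send one full-quota burst over the line."""
    return pipeline_quota * 8 / line_bps


if __name__ == "__main__":
    # Old default, commonly used value, DECwindows default (section D, item 2).
    for quota in (3000, 6000, 10000):
        print(quota, burst_fits_queue(quota),
              connections_to_fill_queue(quota), burst_drain_s(quota))
```

	In this model a quota of 10000 overflows the queue on its own and
	represents 1.25 s of line time, while with 3000 it takes three such
	connections to fill the queue.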

C.	Let us consider the two main reasons for retransmits and how
	DECnet performs in those cases.

	Ia. Assume a bridge with almost no load at all and that we have
	    an interactive session active.

	    In this case, the round trip delay (assuming fast routes
	    between the end systems and the NORDUnet DECnet routers!)
	    is somewhere between 50 and 500 ms, usually below 250 ms.
	    VAX/VMS estimates this as 1 s (!).

	    If a packet is lost (in this case due to bad lines), it will be
	    retransmitted after about 5 s (80/16 * 1 s = 5 s).  The user
	    will experience a long, unexpected delay.

	    In this case, the low resolution of the round trip delay
	    measurement is one of the two dominant factors.

	    The other dominant factor is that the Delay Factor is as high
	    as 80.

	    As we already know, decreasing the delay factor will improve
	    response.  There is no hope of a higher resolution in the
	    round trip delay...

	Ib. As Ia, but the session of interest is a bulk transfer (e.g.
	    a file copy).

	    The same effect as in Ia, but with somewhat less emphasis on the
	    low resolution.  We usually have packets of at least 512 bytes,
	    so with two bridges and one-way traffic we have a round trip
	    delay of, say, 150-500 ms.

	IIa. Assume a bridge near its congestion limit, and that we study
	    an interactive session.

	    In this case we can have a round trip delay of 1-1.5 s, so
	    the low resolution of the estimate does not have such a
	    disastrous effect.  The user will experience poor performance
	    for some interactive work anyway (e.g. screen oriented editing)
	    due to the long queues.

	    The dominant factor affecting the response time in this case
	    is the delay factor!!

	IIb. As in IIa, but a bulk transfer.

	    Assuming that the bridges are near congestion even without
	    this connection, it is likely that packets will be lost due to
	    congestion during the transfer.

	    As in IIa, the dominant factor is the Delay Factor.
	    In this case, however, it is important that the algorithm
	    for estimating the round trip delay works well, to avoid
	    continuous bridge congestion caused by unnecessary retransmits.
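	The 150 ms lower bound in Ib is roughly what serialization delay
	alone gives.  The Python sketch below is only an estimate: it reads
	"two bridges" as two 64 kbps line hops (our interpretation), assumes
	a 64-byte acknowledgement frame, and ignores propagation, processing
	and queueing delays.  Function names are hypothetical.

```python
def serialization_s(frame_bytes: int, line_bps: int = 64_000) -> float:
    """Time to clock one frame onto a line."""
    return frame_bytes * 8 / line_bps


def min_round_trip_s(data_bytes: int, ack_bytes: int, hops: int,
                     line_bps: int = 64_000) -> float:
    """Data frame out over every hop, then the acknowledgement back.

    Store-and-forward bridges send each frame in full on every hop.
    """
    per_hop = serialization_s(data_bytes, line_bps) + serialization_s(ack_bytes, line_bps)
    return hops * per_hop


if __name__ == "__main__":
    # 512-byte data packets, ASSUMED 64-byte ACK frames, two line hops.
    print(f"{min_round_trip_s(512, 64, hops=2) * 1000:.0f} ms")
```

	That gives about 144 ms before any queueing at all, so even moderate
	queues in the bridges push the round trip towards the upper end of
	the 150-500 ms range.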

D.	Proposed solutions (seen from a DECnet point of view).  A couple of
	suggestions are given.  The one that can be installed immediately
	is 3, but I think 1, 2 and 5 are of technical interest.  4 is a more
	general recommendation to users.
	One thing to remember is that DECnet/OSI Phase V will use ISO TP4
	and ISO CLNS.  Is anyone sufficiently familiar with TP4 to predict
	how that will work on our backbone???


    1.	Judging from my experience at Chalmers, most retransmits occur
	because frames are lost due to checksum errors.  The fix is
	of course to make sure the lines have acceptable quality.
	(In Sweden the PTT won't guarantee ANYTHING for the leased lines,
	but unofficial information says that they consider a bit error
	rate of 1e-6 acceptable!!!  I think we all agree that in modern
	digital communication it should be better than 1e-9.)

	As an alternative, we could try to run some protocol (LAPB or
	similar) at the datalink layer between the bridges.  I've heard
	two suggestions:

	a. The Vitalink could be configured to do this.  I have never
	   seen this documented, but there are people who say it IS
	   possible...

	b. Put some sort of data compression equipment between the
	   Vitalink and the modem.  I was informed that such equipment
	   (apart from increasing the effective bandwidth) uses some
	   reliable protocol with retransmits.  At what cost (delays etc.)?
	   This of course needs further investigation.


    2.  Avoid using LARGE pipeline quotas.  The default (pre-DECwindows)
	was 3000, and a commonly used value was 6000.  Now DEC expects
	us all to use DECwindows (at least the client part), so we end
	up with a default pipeline quota of 10000.

    3.	Lower the executor delay factor.  DEC suggested that I try a
	value in the 24-32 range (NOT LESS THAN 24!!!).

	I've used 28 for a while now on a couple of systems I manage,
	and it seems to work well, so why not use 28!!

	One possible drawback is that the max retransmit counter may
	overflow in at least two cases:

	A.  When the DECnet topology changes.
	B.  When we get continuous bridge congestion.

	If the counter overflows, the active link (transport connection)
	will be aborted.  If this happens often, we must determine the
	exact cause and frequency, and possibly increase the executor
	retransmit factor (default is 10).  So keep an eye on the
	response timeout counters if you make this change!!  Another
	reason for monitoring response timeouts is that we are now
	changing parameters that, by recommendation, are normally kept
	at their defaults....

    4.	Avoid remote interactive sessions with lots of character and
	screen oriented input/output (mostly screen oriented editors).
	(Normal program input/output (not screen oriented) will NOT cause
	that many short packets, and neither will screen oriented OUTPUT,
	as long as it uses DEC standard software (SMG, FMS, TDMS etc.).)

    5.  Increase the speed of the leased lines...  This reduces the
	probability of congestion, but what about the line quality???
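	One way to look at the suggested 24-32 range in item 3 (our own
	illustration, not DEC's stated reasoning): when the queues build up
	suddenly, the estimate may still say 1 s while the real round trip
	is 1-1.5 s (see IIa).  The Python sketch below, with a hypothetical
	function name, shows how much time the retransmit timer has left
	when the acknowledgement arrives.  A negative value means a packet
	is retransmitted although it was not lost, adding to the congestion.

```python
def timer_headroom_s(delay_factor: int, rtt_estimate_s: float,
                     actual_rtt_s: float) -> float:
    """Retransmit timeout minus the real round trip delay.

    Timeout = (delay factor / 16) * estimate.  Negative headroom means
    the timer expires before the acknowledgement arrives.
    """
    return delay_factor / 16 * rtt_estimate_s - actual_rtt_s


if __name__ == "__main__":
    for factor in (16, 24, 28, 32, 80):
        headroom = timer_headroom_s(factor, rtt_estimate_s=1.0, actual_rtt_s=1.5)
        print(f"delay factor {factor:2d}: headroom {headroom:+.2f} s")
```

	Under these assumptions a factor of 16 would retransmit needlessly,
	24 is right at the edge and 28 leaves a quarter of a second of
	margin, while 80 makes the sender wait 3.5 s longer than necessary
	whenever a packet really is lost.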

E.      Another important topic for DECnet is the multicast threshold in
	the bridges.

	DECnet uses so-called hello messages to keep track of nodes that
	are up.  On Ethernet these are sent as multicasts at an interval
	specified by the circuit hello timer value (default 15 s on VMS,
	10 s on Ultrix; both somewhat low, causing a lot of multicast
	traffic!).  Slightly simplified, a system that has not been heard
	from for 3 times the hello timer THAT system used is considered
	"lost".
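	The rule above is easy to put into numbers.  A Python sketch with
	hypothetical names; the node count is made up for illustration, and
	the listener rule is the simplified "three hello intervals" rule
	from the text.

```python
def listener_timeout_s(hello_timer_s: int) -> int:
    """Silence after which a neighbour is considered lost (simplified rule)."""
    return 3 * hello_timer_s


def hello_multicasts_per_s(node_count: int, hello_timer_s: int) -> float:
    """Hello multicasts per second if every node sends one per interval."""
    return node_count / hello_timer_s


if __name__ == "__main__":
    for system, hello in (("VMS", 15), ("Ultrix", 10)):
        # 100 nodes on the extended Ethernet is a HYPOTHETICAL figure.
        print(system, listener_timeout_s(hello), hello_multicasts_per_s(100, hello))
```

	Since a bridge forwards multicasts to the other side, each of these
	hellos also has to cross the leased lines, and losing three in a
	row from one node is enough to declare it lost.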

	What can then cause a multicast message to get lost?  Of course,
	simple congestion and poor lines can cause it, BUT the Vitalink
	uses a multicast threshold lower than the max queue length, and
	therefore starts to ignore received multicasts before it gets
	congested.  This seems to be the major reason for "Adjacency
	down/up" cycles on our network.  Such a cycle will cause a delay
	for the user until the adjacency is reestablished and the routing
	change has propagated and stabilized, or until the retransmit
	factor is exceeded (hence we may have to raise it, since we lower
	the delay factor).
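	The effect of a multicast threshold below the maximum queue length
	can be sketched as an admission rule.  This Python model is not the
	Vitalink's actual logic: it assumes the threshold is a queue fill
	level in bytes, and the 4 kbyte threshold is a HYPOTHETICAL value,
	since the real setting is not given here.

```python
MAX_QUEUE_BYTES = 8 * 1024            # approx. queue per line (section A1)
MULTICAST_THRESHOLD_BYTES = 4 * 1024  # HYPOTHETICAL value, for illustration only


def admit_frame(frame_bytes: int, queued_bytes: int, is_multicast: bool) -> bool:
    """Return True if the bridge would queue the frame, False if it drops it.

    Multicasts are refused once the queue passes the (lower) multicast
    threshold; other frames are accepted until the queue is full.
    """
    limit = MULTICAST_THRESHOLD_BYTES if is_multicast else MAX_QUEUE_BYTES
    return queued_bytes + frame_bytes <= limit


if __name__ == "__main__":
    queued = 5000  # a busy, but not yet congested, queue
    print("hello multicast:", admit_frame(64, queued, is_multicast=True))
    print("512-byte unicast:", admit_frame(512, queued, is_multicast=False))
```

	In this model the hello is dropped while ordinary traffic still gets
	through, which matches the "adjacency down before congestion"
	pattern described above.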

	Cure?  I don't know.  Maybe multiple queues with priorities in the
	bridges could help (I don't think the current SQM/QOS mechanism
	in the Vitalink is sufficient).

F.	As I said, I've lowered the delay factor to 28 on some systems.
	This includes SEGBOX and DKGBOX (in the volatile database).

	I did some DTS loopback tests before and after the change
	(SEGBOX <-> DKGBOX).  (No control over IP and/or BITNET/JNET
	traffic!)

	Each test ran for 60 seconds, with 3 kbyte (application level)
	packets looped back by the application.

                        Line utilization    Response timeouts   Response timeouts
                        (max 200%, since    on SEGBOX for       on DKGBOX for
                        full duplex)        node DKGBOX         node SEGBOX

        Before change   40%                 not measured        not measured
                        47%                 not measured        not measured
                        78% + 44% (*)       6                   4

        After change    124%                4                   1
                        115% + 125% (*)     8                   4
                        130%                4                   2

        (*) Two runs reported together; the timeout counts cover both.

	So, without claiming this was a good test, I can see improvements,
	at least for bulk transfers.  Experience from other systems where I
	reduced the delay factor indicates improved interactive response,
	as expected.


	Now!  What would be interesting to know is:

	a.	Can an increase (or change) in the number or rate
		of dropped packets be observed on the NORDUnet
		Vitalinks????

	b.	Can an improvement in the BITNET/JNET performance be
		observed???

	c.	Can we see an impact on IP (or X25/LLC2)???

--------------------

		/ Göran Bengtson
