What is a distributed system?
Put simply, a distributed system is a collection of hardware and software components, running on different processors, that communicate with each other. They can communicate over a network such as a LAN, or the internet. Distributed systems are subject to a number of problems, however, such as variable latency (two messages sent between the same two computers may not take the same amount of time to arrive) and unpredictable failures, such as a message never arriving.
In this posting we’ll be investigating some of the issues of VoIP (Voice over IP), that is, transmitting voice over the internet. We’ll look at how it works, some of the problems involved with it, and how to overcome those problems.
Basics of sound and voice – Analogue
Voice is just variation in air pressure, travelling from a speaker to a listener at the speed of sound. A microphone converts this pressure into a varying voltage, which the computer can store as a sound waveform. Any such waveform can be treated as the sum of a number of sine waves of different frequencies, measured in hertz (Hz).

Humans can hear frequencies up to about 20 kHz. CDs typically have a ‘bandwidth’ (frequency range) of 50 Hz to 20 kHz, whereas phone lines typically have a bandwidth of only 300 Hz to 3.4 kHz, which explains why speech over a telephone sounds much less natural.
Old-Fashioned Phone System
The old-fashioned phone system used analogue transmission: the wires carried a voltage that varied with the air pressure at the caller’s mouthpiece. There was very little delay, about 1 millisecond per 100 miles.
In the 1960s, digital transmission started between exchanges, and most phone speech is now transmitted digitally, although analogue transmission is still used between the exchange and the house. The network still uses circuit switching, which is connection-oriented: a connection is set up between the caller and the recipient before any speech is sent, and it is held for the whole call.
Basics of sound and voice – Digital
In order to transmit something digitally, there has to be some way of representing it using 0s and 1s. To do this, the analogue signal from the microphone has to be ‘sampled’: its value is measured at regular intervals, and each measurement is converted into a binary number. The more often a signal is sampled, the better the quality, because higher frequencies can be captured; a signal has to be sampled at more than twice its highest frequency to be reproduced faithfully.
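A minimal sketch of sampling: it measures a pure 1000 Hz tone 8000 times per second and rounds each measurement to a signed 8-bit integer. The test tone, the evenly spaced levels and the function name `sample_and_quantise` are illustrative assumptions; real telephone codecs such as G.711 use a non-linear (logarithmic) scale rather than evenly spaced levels.

```python
import math

SAMPLE_RATE = 8000  # samples per second, as for telephone speech
MAX_LEVEL = 127     # signed 8-bit samples, here limited to -127..127

def sample_and_quantise(freq_hz: float, n_samples: int) -> list[int]:
    """Measure a sine wave of amplitude 1 at regular intervals and round each
    measurement to the nearest 8-bit integer level."""
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE                          # time of the n-th sample, in seconds
        value = math.sin(2 * math.pi * freq_hz * t)  # instantaneous value, between -1 and 1
        samples.append(round(value * MAX_LEVEL))     # nearest of the evenly spaced levels
    return samples

print(sample_and_quantise(1000, 8))  # [0, 90, 127, 90, 0, -90, -127, -90]
```

At 8000 samples per second a 1000 Hz tone gets eight samples per cycle, enough to follow its shape; a tone above 4000 Hz would not be captured correctly at this rate.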
CDs sample 44,100 times per second and store 16 bits per sample. Phones sample less often, only 8000 times per second, and store only 8 bits per sample, which means that the phone line is carrying 8000 × 8 = 64,000 bits per second (64 kb/s) of sound data. This is a well-known standard for digitising speech called ITU-T G.711.
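The bit rates above are simply samples per second × bits per sample × number of channels. A quick check, using a hypothetical helper `bit_rate`; the stereo figure assumes the two channels that CD audio carries:

```python
def bit_rate(samples_per_second: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed bit rate, in bits per second."""
    return samples_per_second * bits_per_sample * channels

print(bit_rate(8000, 8))                # 64000   (G.711 telephone speech)
print(bit_rate(44100, 16))              # 705600  (one CD channel)
print(bit_rate(44100, 16, channels=2))  # 1411200 (stereo CD)
```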
Unfortunately, 64 kb/s is too high a bit rate for mobile phones and for VoIP, so other standards, which compress speech to a much lower bit rate, have to be used. A mobile phone will usually use a 9.6 kb/s standard.
Buffering
The sound cards themselves control the sampling rate (if the processor controlled it, it would have to be interrupted 8000 times a second, which would slow the computer down considerably). Each sound card has its own separate clock, and uses buffers to store the inputs and outputs. A buffer is basically an array that is filled by the processor and emptied by the sound card, or vice versa.
Imagine a leaky bucket filled by a water tap. The processor tops up the bucket whenever it isn’t busy with other work. The samples then leave through a hole in the bottom of the bucket in the same order they went in, as a constant stream, even though the processor’s input is intermittent.
It’s very important that the bucket neither empties nor overflows. In terms of sound, if the bucket emptied there would be silence until more samples were put in; if it overflowed, some sound would be lost.
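The bucket can be sketched as a fixed-size ring (circular) buffer. This is an illustrative class, not how any particular sound card driver is written: `write` drops a sample when the buffer is full (overflow), and `read` returns 0, standing in for silence, when it is empty (underflow).

```python
class RingBuffer:
    """Fixed-size FIFO: samples come out in the same order they went in."""

    def __init__(self, capacity: int):
        self.data = [0] * capacity
        self.capacity = capacity
        self.read_pos = 0     # index of the oldest stored sample
        self.count = 0        # number of samples currently stored
        self.overflows = 0    # samples dropped because the buffer was full
        self.underflows = 0   # reads attempted while the buffer was empty

    def write(self, sample: int) -> None:
        if self.count == self.capacity:
            self.overflows += 1  # bucket full: this sample is lost
            return
        write_pos = (self.read_pos + self.count) % self.capacity  # wrap around the array
        self.data[write_pos] = sample
        self.count += 1

    def read(self) -> int:
        if self.count == 0:
            self.underflows += 1  # bucket empty: play silence instead
            return 0
        sample = self.data[self.read_pos]
        self.read_pos = (self.read_pos + 1) % self.capacity
        self.count -= 1
        return sample

buf = RingBuffer(4)
for s in [10, 20, 30, 40, 50]:         # five writes into a four-slot buffer
    buf.write(s)
print([buf.read() for _ in range(5)])  # [10, 20, 30, 40, 0]
print(buf.overflows, buf.underflows)   # 1 1
```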
Problems
But there’s a problem. The sampling rates are controlled by crystals, which are accurate to only about 0.01%, so a nominal 8000 Hz sampling rate could in fact be roughly 7999 Hz or 8001 Hz. If the sender’s sound card runs at 8001 Hz and the receiver’s at 7999 Hz, the receiver gets an extra 2 samples every second.
If the receiver’s clock runs faster than the sender’s, it plays samples faster than they arrive and eventually runs out (buffer underflow); if it runs slower, samples arrive faster than they are played and the buffer eventually fills up (buffer overflow). This is a fundamental problem with distributed systems: there is no global clock that defines how fast everything goes, so each component works at its own speed.
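To see how quickly drift causes trouble, here is a back-of-the-envelope sketch. It assumes a hypothetical receive buffer of 1600 samples (200 ms at 8000 Hz) that starts half full; `drift_outcome` is an illustrative helper.

```python
def drift_outcome(sender_hz: float, receiver_hz: float,
                  capacity: int, initial_fill: int) -> tuple[str, float]:
    """Return which failure happens first and after how many seconds."""
    drift = sender_hz - receiver_hz  # net samples added to the receiver's buffer per second
    if drift > 0:
        return "overflow", (capacity - initial_fill) / drift  # free space runs out
    if drift < 0:
        return "underflow", initial_fill / -drift             # stored samples run out
    return "stable", float("inf")

print(drift_outcome(8001, 7999, capacity=1600, initial_fill=800))  # ('overflow', 400.0)
print(drift_outcome(7999, 8001, capacity=1600, initial_fill=800))  # ('underflow', 400.0)
```

With a 2 Hz mismatch even this buffer fails in under seven minutes, so simply making the buffer bigger only postpones the problem.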
Adding a network introduces even more problems. Networks send data as a series of packets, and packets can be lost (never reaching the recipient) or delayed (arriving after a packet that was sent later). So we need agreed rules for how computers communicate. Let’s first look at protocol layers.
TCP/IP Protocol Layers
The TCP/IP protocol stack is divided into the layers described below, each with its own relevant protocols.

1) The top-most layer is the Application layer. Protocols here include HTTP, POP3, FTP, etc. These are protocols through which two applications can talk to each other across a distributed system.
2) The second layer is called the Transport layer. Protocols here include TCP and UDP. The Application layer protocols are built on top of these, and they are closer to how data is actually sent.
3) Lower down there is the Network layer, where the Internet Protocol (IP) is of particular relevance. This is the basis for the protocols in the Transport layer: TCP and UDP both use IP to communicate.
4 and 5) Even lower there is the Data-Link layer, and at the bottom is the Physical layer. These layers deal with getting bits across an individual link, using standards such as Ethernet (for wired networks) or IEEE 802.11 (for wireless networks). IP depends on these protocols.
So as a message travels from the Application layer down to the Physical layer, each layer hands it to the one below: a message using an Application protocol is handled by a Transport protocol, which is handled by a Network protocol, which is handled by the Data-Link and Physical protocols. Each layer typically adds its own header to whatever it receives from the layer above.
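This wrapping (encapsulation) can be illustrated with a toy sketch in which each layer adds a text label standing in for its header. The labels are purely hypothetical: real headers are binary structures, and Ethernet also adds a trailer, which is omitted here.

```python
# Hypothetical text labels standing in for real binary headers.
LAYERS = ["TCP", "IP", "ETH"]  # transport, network, data-link: the order headers are added

def encapsulate(message: bytes) -> bytes:
    """Pass a message down the stack: each layer prepends its own header."""
    packet = message
    for layer in LAYERS:
        packet = f"[{layer}]".encode() + packet
    return packet

def decapsulate(frame: bytes) -> bytes:
    """Pass a frame up the stack: each layer strips its own header, outermost first."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"expected a {layer} header")
        frame = frame[len(header):]
    return frame

frame = encapsulate(b"GET / HTTP/1.1")
print(frame)               # b'[ETH][IP][TCP]GET / HTTP/1.1'
print(decapsulate(frame))  # b'GET / HTTP/1.1'
```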
Let’s look at some of these layers in more detail.
Network layer (Internet Protocol)
The Internet Protocol, or IP for short, deals with the routing of IP packets from source to destination. A packet is a block of data that is sent as a single unit.
A packet consists of a header, usually 20 bytes long, followed by the data, whose length can vary (up to a limit of 65,535 bytes for the whole packet). The header contains information about the packet, such as the header length, the total length, a checksum, the source IP address and the destination IP address: in short, everything needed to get the packet to the right place intact.
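As an illustration, Python’s standard `struct` module can unpack the fixed 20-byte part of an IPv4 header. The sample header bytes below are an illustrative example (a UDP packet from 192.168.0.1 to 192.168.0.199), the function name is hypothetical, and any header options are ignored.

```python
import socket
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Unpack the fixed 20-byte part of an IPv4 header (any options are ignored)."""
    (ver_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,                # high 4 bits
        "header_length": (ver_ihl & 0x0F) * 4,  # low 4 bits count 32-bit words
        "total_length": total_length,           # header + data, in bytes
        "ttl": ttl,
        "protocol": protocol,                   # e.g. 6 = TCP, 17 = UDP
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }

info = parse_ipv4_header(bytes.fromhex("45000073000040004011b861c0a80001c0a800c7"))
print(info["source"], "->", info["destination"], info["total_length"])
# 192.168.0.1 -> 192.168.0.199 115
```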
A checksum is a small number of extra bits, calculated from the data being protected, that lets the receiver detect errors. (A cyclic redundancy check, or CRC, is a related but more powerful kind of check.) In IP packets the checksum covers only the header, and is calculated as follows:
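1) Split the header into 16-bit ‘words’, treating the checksum field itself as zero.
2) Add together the binary values of all of these words.
3) If the result is longer than 16 bits, take anything past 16 bits and add it back onto the lower 16 bits, repeating until the result fits in 16 bits.
4) Invert every bit of the result (take its ones’ complement).
This gives the 16-bit checksum. The receiver repeats steps 1 to 3 over the header including the checksum; if the header is intact, the result is all 1s.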
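A sketch of this checksum (the Internet checksum, described in RFC 1071). The function name and the 20-byte sample header are illustrative; the sender computes the checksum with the checksum field zeroed, and a receiver running the same function over the complete header should get 0.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum, as used for the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"                           # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold anything past 16 bits back in
    return ~total & 0xFFFF                        # finally, invert every bit

# Sample header with its checksum field (bytes 10-11) set to zero.
header = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(bytes(header))
print(hex(checksum))                              # 0xb861

header[10:12] = checksum.to_bytes(2, "big")       # sender fills in the checksum
print(internet_checksum(bytes(header)))           # 0: the receiver finds no error
```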
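A CRC is another check to make sure that the header is correct, and it is based on division. Suppose that the header is the decimal number 149. To find the check bits, divide it by an agreed number, say 7, and express the remainder in binary. Since 149 = 21 × 7 + 2, the remainder is 2, or 010 in binary (three bits are always enough, because the remainder is at most 6), so those are the check bits.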
The same division is done at the receiving end. If the remainder is different, then we know that an error has occurred. The divisor, 7, has to be agreed upon in advance.
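The remainder check can be sketched in a few lines with the same toy divisor of 7. The function names are hypothetical, and this is only the integer-division illustration, not a real CRC.

```python
DIVISOR = 7  # must be agreed in advance by sender and receiver

def check_bits(value: int) -> int:
    """Sender side: the remainder after dividing by the agreed divisor."""
    return value % DIVISOR

def looks_intact(value: int, received_check: int) -> bool:
    """Receiver side: redo the division and compare remainders."""
    return value % DIVISOR == received_check

check = check_bits(149)
print(check, format(check, "03b"))  # 2 010
print(looks_intact(149, check))     # True
print(looks_intact(148, check))     # False: the damaged value is caught
```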
This does not detect ALL errors. Any combination of errors that adds a multiple of 7 to the header, or subtracts one from it, wouldn’t be detected, since the remainder would be unchanged, but most errors will be found. In practice, a much larger divisor than 7 is used, and the division is done with binary arithmetic without carries rather than ordinary integer division.
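Here is a sketch of the binary version: long division in which subtraction is replaced by XOR, so there are no carries or borrows. The 4-bit generator `0b1011` (x^3 + x + 1) and the message are small illustrative choices; real links use much longer generators, such as the 32-bit CRC used by Ethernet.

```python
def mod2_remainder(value: int, generator: int) -> int:
    """Divide using XOR (mod-2, carry-free) arithmetic and return the remainder."""
    g_len = generator.bit_length()
    while value.bit_length() >= g_len:
        # line the generator up under the highest set bit and XOR it away
        value ^= generator << (value.bit_length() - g_len)
    return value

def crc_encode(message: int, generator: int) -> int:
    """Sender: append r zero bits, divide, then put the remainder in their place."""
    r = generator.bit_length() - 1  # number of check bits
    crc = mod2_remainder(message << r, generator)
    return (message << r) | crc

GENERATOR = 0b1011  # x^3 + x + 1, a small example generator
codeword = crc_encode(0b11010011101100, GENERATOR)
print(format(codeword & 0b111, "03b"))                    # 100 (the check bits)
print(mod2_remainder(codeword, GENERATOR))                # 0   (receiver: no error found)
print(mod2_remainder(codeword ^ 0b100000, GENERATOR) != 0)  # True (single-bit error caught)
```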
These checks allow an error in an IP header to be detected. If an error has occurred, the packet is simply discarded. Errors are rare in wired networks, but much more common in wireless networks.
The IP protocol is connectionless: it just sends a message one way and then forgets about it. It’s unreliable, with no guarantee that the message arrives. Data can easily be lost, delayed, or damaged. Even so, IP is a fundamental part of the internet and of many private networks.
Data-Link and Physical layers
The physical layer protocols just send voltage pulses (or, on wireless links, radio signals) representing 0s and 1s, along physical wires or through the air. This is where bit errors can occur.
The data link layer is responsible for finding and possibly correcting bit errors.
Transport Layer: TCP, UDP, RTP, and RTCP
The transport layer has protocols which use the IP layer to transfer data in a way that is more suitable for application layer protocols. The most important of these are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). Two others, adapted to real-time applications such as VoIP, are RTP (Real-time Transport Protocol) and RTCP (RTP Control Protocol).
TCP uses IP to create connection-oriented transmission. This is slower but more reliable, and should be used for data that must arrive intact but where delay doesn’t matter too much. Port numbers identify which application each arriving message is for, and sequence numbers let the receiver put the pieces of a multi-packet message back in the right order.
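Putting packets back in order by sequence number can be sketched like this. For simplicity the sketch numbers whole segments 1, 2, 3 and so on, whereas real TCP numbers individual bytes; `reassemble` is a hypothetical helper, not part of any socket API.

```python
def reassemble(segments: list[tuple[int, bytes]], first_seq: int = 1) -> bytes:
    """Rebuild a message from (sequence_number, data) pairs that may arrive out of order.
    Raises if a segment is missing, which is where TCP would wait for a retransmission."""
    by_seq = dict(segments)  # duplicate copies collapse onto the same key
    out = []
    for seq in range(first_seq, first_seq + len(by_seq)):
        if seq not in by_seq:
            raise ValueError(f"segment {seq} missing")
        out.append(by_seq[seq])
    return b"".join(out)

arrived = [(2, b"lo, "), (1, b"Hel"), (3, b"world")]  # segment 2 overtook segment 1
print(reassemble(arrived))                            # b'Hello, world'
```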
TCP is reliable because the receiver acknowledges the data it receives, and the sender retransmits packets that aren’t acknowledged. Waiting for retransmissions takes time and creates delays, so TCP is not really suited to VoIP.
UDP is simple, connectionless, and unreliable: you send a message and then forget about it. A UDP datagram carries the source port number, the destination port number, the length, a checksum, and the data itself. It’s used by applications that don’t need, or don’t have time to wait for, acknowledgements and retransmissions. UDP suits VoIP because voice is relatively tolerant of bit errors and lost packets, but not of long delays.
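The UDP header is just four 16-bit fields, which `struct.pack` can build directly. This is an illustrative sketch: `udp_datagram` is a hypothetical helper, the port number 5004 and the 160-byte payload (20 ms of 8-bit, 8000 Hz speech) are arbitrary choices, and the checksum is left as 0, which in IPv4 means ‘no checksum computed’.

```python
import struct

def udp_datagram(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Build a UDP datagram: an 8-byte header followed by the data."""
    length = 8 + len(payload)  # the length field covers header and data
    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)  # network byte order
    return header + payload

dgram = udp_datagram(5004, 5004, b"\x00" * 160)
print(len(dgram))       # 168
print(dgram[:8].hex())  # 138c138c00a80000
```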
RTP and RTCP are useful because together they let the transmitter know how many packets are getting through, and what the delay is. They are specifically designed for real-time applications like VoIP. RTP adds a sequence number and a timestamp to each UDP packet, and RTCP at the receiver’s end sends reports back to the sender every 5 seconds or so, giving a general idea of how many packets are being lost.
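A loss estimate of the kind an RTCP report carries can be derived from the sequence numbers that arrived during a reporting interval. This sketch is a simplification: the function is hypothetical, and real RTCP receiver reports encode the fraction lost as an 8-bit fixed-point number and handle sequence numbers wrapping around after 65535, which this ignores.

```python
def loss_report(received_seqs: list[int]) -> dict:
    """Summarise packet loss from the RTP sequence numbers seen in one interval."""
    if not received_seqs:
        return {"expected": 0, "received": 0, "lost": 0, "fraction_lost": 0.0}
    expected = max(received_seqs) - min(received_seqs) + 1  # packets the sender must have sent
    received = len(set(received_seqs))                      # ignore duplicate arrivals
    lost = expected - received
    return {"expected": expected, "received": received,
            "lost": lost, "fraction_lost": lost / expected}

seqs = [100, 101, 103, 104, 102, 106, 107, 109]  # 105 and 108 never arrived; 102 came late
print(loss_report(seqs))
# {'expected': 10, 'received': 8, 'lost': 2, 'fraction_lost': 0.2}
```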





