Summer 09 Revision COMP10412 AI Part 2: Probabilistic Reasoning

In the previous post we talked about probability theory, and talked about agents having a probability distribution to represent their degrees of belief.  We also talked about the axioms K1 and K2, that is:

K1 – If a is always true, then p(a) = 1.
K2 – If a AND b are never true at the same time, then the probability of a OR b being true is p(a) + p(b).

These axioms become more important when we talk about the idea of a ‘dutch book’.

Dutch Books – Why Use Probability Distributions?

Lets consider an artificially intelligent agent walks into a betting shop.  No, this isn’t the start of some bad joke. In this betting shop, the agent will bet on propositions such as ‘Horse A will win the horse race’.  The tickets pay out £1 if the proposition is true, and no money if the proposition is not true.  But what price should the agent buy the ticket for?  This depends upon his degree of belief in the proposition. If the agent’s degree of belief, for example, is 0.6 that the horse will win, then he will be willing to pay 60p on the ticket.

But there’s two horse races going on tomorrow. Let ‘a’ and ‘b’ be the propositions ‘Horse A will win the first race’ and ‘Horse B will win the second race’.  Our agent has been programmed with the following degrees of belief:\

p(a) = 0.6
p(b) = 0.5
p(neither a nor b) = 0.25
p(not both a and b) = 0.7

Based on these beliefs, our agent takes out the following four bets:

Bet 1 pays £1 if horse ‘A’ wins the first race.  This ticket costs 60p.
Bet 2 pays £1 if horse ‘B’ wins the second race.  This ticket costs 50p.
Bet 3 pays £1 if horses ‘A’ and ‘B’ both lose their races.  This ticket costs 25p.
Bet 4 pays £1 as long as not both ‘A’ and ‘B’ win their races.  This ticket costs 70p.

So our agent has spent a total of 60+50+25+70 = £2.05

But there’s something weird about these bets.  Lets analyse it more closely:

If both horse A and horse B win, then our agent wins bets 1 and 2, but loses bets 3 and 4.  He wins £2.
If horse A wins and horse B loses, then our agent wins bets 1 and 4, but loses bets 2 and 3.  He wins £2.
If horse A loses and horse B wins, then our agent wins bets 2 and 4, but loses bets 1 and 3.  He wins £2.
If both horse A and horse B lose, then our agent wins bets 3 and 4, but loses bets 1 and 2.  He wins £2.

In any case, he wins £2.  But that can’t be right, he spent £2.05 on the ticket, so no matter what the outcome is, he’s lose money!

We call this situation a ‘Dutch Book’. And the reason that a Dutch Book was able to be made against the agent is all because of his initial degrees of belief! They violate the axioms of probability theory, K1 and K2, in some way.

In general, if an agents degrees of belief violate K1 and K2, then a Dutch Book can be made against it.  If an agents degrees of belief conform to K1 and K2, then a Dutch Book is impossible.  This example of Dutch Books shows us why we should represent our degrees of belief as probability distributions.

Why Conditionalise Probability Distributions?

Alright, lets take another example.  Let ‘a’ be the proposition that horse A wins the first race, and ‘b’ be the proposition that horse B wins the second race.  The ticket now says:

You get a payout of £1 if both horse A and horse B win.  You get no payout if horse A loses and horse B wins.  You get your money back if horse B loses.

How much would you spend on that ticket? This number is your conditional degree of belief in ‘a’ given ‘b’.

The proposal is that if an agents degrees of belief are given by a probability distribution, p, then its conditional degree of belief in ‘a’ given ‘b’ should be given by p(a|b) when p(b) is not 0.

Now lets consider the following scenario.  ‘a’ and ‘b’ are propositions as before.  p is our initial probability distribution, and pa is the probability distribution after the first race but before the second race, if a has won.

The agent buys the following tickets:

Bet 1: £1 if a and b both win.  0 if a wins but b loses.  Money back if a loses.  This ticket costs £p(b|a).
Bet 2: £x if a wins, otherwise nothing.  This ticket costs £(x * p(a))

After the first race, if a has won then the agent has already won £x, and has the following bet that used to be Bet 1:

£1 if b wins, otherwise nothing.

The agent then sells this ticket for £pa(b)

Now the payouts for the agent are as follows:

If a has won, then the payout will be his winnings on bet 2 (x*(1-p(a))) [ie. the winnings (£x) minus the amount spent on the ticket (x*p(a)], minus the amount he spent on bet 1 (p(a|b)), plus the amount he sold bet 1 for after race 1 (pa(b)).  So his total winnings are… x – xp(a) + pa(b) – p(b|a).

If a loses, then he has lost bet 2 and got his money back on bet 1.  So his total losses are the amount that he spent on bet 2, which was x * p(a).

If a loses, then his payoff is going to be negative, ie. he’s made a loss, as long as x is greater than 0 (Which it probably will be).

If a wins, then his payoff is going to negative if:

x – xp(a) + pa(b) – p(b|a) < 0
x (1 – p(a)) + pa(b) – p(b|a) < 0
x (1 – p(a)) < p(b|a) – pa(b)
x < (p(b|a) – pa(b)) / (1 – p(a))

So if we choose a value of ‘x’ between 0 and (p(b|a) – pa(b)) / (1 – p(a)), then we have made a dutch book on this agent, because no matter whether a wins or loses, he will always have a negative payoff.

Notice the numerator: (p(b|a) – pa(b)).  If the agents degrees of belief DO conform to axioms K1 and K2, then these two terms will be equal to each other and it will read x<0.  So in order to create a dutch book, a value of x must be chosen that’s greater than 0 and less than 0.  Clearly, this isn’t possible.

In this scenario, a dutch book can be avoided by making the agents degree of belief in p(b|a) equal to pa(b). This shows why we need to conditionalise, because if we don’t, then we’re going to end up in a dutch book situation.

The Monty Hall Paradox

This is a famous paradox, that shows how you have to be careful with Bayesian Updating.

An agent is on a TV competition.  At the end of the show, the host, Monty Hall, shows you over to three doors.  Behind one of the doors is a car, and behind the other two are goats.  Clearly, the aim is to pick the door with the car.  The doors are labelled A, B, and C.

You are asked to select a door, and the prize you get is behind the door you choose.  But after you’ve chosen a door, Monty goes and opens one of the other doors.  Behind that door is a goat.  Then he gives you the opportunity to either switch your selection, or stick with your selection.  The question: Do you stick, or switch?

To answer this, lets consider the three propositions, ‘A’, ‘B’, and ‘C’, which represent the propositions that the car is behind the respective door.

The probability that the car is behind A and not B is equal to the probability that the car is behind door A, obviously, which is equal to 1/3.  The probability that the car is NOT behind door B is 1 – p(B), which, obviously, is 2/3, because it could be behind A or C, each with a 1/3 probability.

So we chose door A to begin with, then Monty went and opened door B.  We now have the information that it is NOT B.  We can use Bayes Theorem to update our probability distribution as such:

p(A|~B) = p(A AND NOT B) / p(NOT B) = (1/3) / (2/3) = 1/2.  So this is telling us that the probability that it is behind door A now is 1/2, and the probability that it is behind door C now is 1/2.

But… wait.  If we run a simulation, with the computer randomly switching or sticking, then we find that the switchers win about 2/3 of the time, and the stickers win about 1/3 of the time… Why is this?!

The solution is to get the right representation.  Lets say that K~B is the proposition ‘I KNOW the car is not behind door B’.

So, we start out, we choose A.  Now, we can work out p(K~B|A), ie. the probability that door B will be revealed to not have the car behind it, given that the car IS behind door A.  If the car is behind door A, Monty will either open door B or door C.  So the probability that we will KNOW that the car is not behind door B is 1/2.

If, on the other hand, the car ISN’T behind door A, then it will either be behind door B or door C.  Monty will open one of these two doors.  So again, p(K~B|~A), ie. the probability that door B will be revealed to now have the car behind it, given that the car is NOT behind door A, will be 1/2.

From these two pieces of data, we can work out, using Bayes Theorem, the value of p(A|K~B), as follows:

p(A|K~B) = (p(K~B|A)p(A) / (p(K~B|A)p(A) + p(K~B|~A)(1-p(A)))
p(A|K~B) = (1/2 * p(A)) / (1/2 * p(A) + 1/2 * (1 – p(A)))

Obviously p(A) will be 1/3, because it could be behind any door.

p(A|K~B) = (1/2 * 1/3) / (1/2 * 1/3 + 1/2 * 2/3)
p(A|K~B) = (1/6) / (3/6)
p(A|K~B) = 1/3

So the probability of the car being behind door A, given that we know that the car is NOT behind door B, is 1/3.  This means, because there is a partition, and we know the probability of it being behind door B is 0, then the probability of the car being behind door C must be 2/3.

If you want to know more about the Monty Hall problem, check the wikipedia page for it out here.  It’s quite cool.  And there’s some proofs on there a LOOOT more convincing than mine.

Summer 09 Revision COMP10412 AI Part 1: Robot Localisation

Artificial Intelligence is the field of computer science that concerns itself with getting better and more positive responses to the question ‘Can machines think?’.  Over the years there have been a number of approaches to artificial intelligence, the current best method that is growing in popularity is the use of probability theory in AI, and this probabilistic reasoning is spreading to almost all areas of Artificial Intelligence.

Basics of Probability Theory

A probability distribution is a function that takes a proposition and gives a probability in the range of 0 and 1.  For example, say you have a proposition that “If I toss a coin, it will come up tails”.  If you put this in the probability distribution, you should get an output of 0.5.  Or you could use a two-headed coin, which would have a different probability distribution that would give an output of 0.

There are two basic facts about probability distributions which make it a probability distribution, given propositions ‘a’ and ‘b’.

K1 – If a is always true, then p(a) = 1.
K2 – If a AND b are never true at the same time, then the probability of a OR b being true is p(a) + p(b).

p(a) can represent an agents degree of belief that a is true.  If p(a) = 1, then the agent believes a is always true.  If p(a) = 0, then the agent believes a is always false.

K1 and K2 are intuitively reasonable and can allow us to prove many more things about probabilities, such as:

The probability of a not happening is 1 – p(a).
If b is always true when a is true, then p(a) is less than or equal to p(b).
If a and b are always either true or false at the same time, then p(a) = p(b).
The probability of a OR b being true equals p(a) + p(b) – p(a AND b).

Partitions

Say you have a number of propositions, a1, a2, a3, a4, etc.

These propositions are ‘mutually exclusive’ if no two of the propositions is ever true at the same time.
These propositions are ‘jointly exhaustive’ if the sum of p(a1) and p(a2) and p(a3) etc. is 1.
These propositions form a ‘partition’ if they are both mutually exclusive and jointly exhaustive.

An example of a partition could be the propositions ‘The coin will land on heads’ and ‘The coin will land on tails’.  p(heads) = 0.5 and p(tails) = 0.5.  They can never be true at the same time so they are mutually exclusive, and they add to 1 so they are jointly exhaustive.  Therefore, these two propositions form a partition.

Conditional Probability

Conditional probability is when you have two propositions, and know that one is true, and want to know the probability of the other GIVEN that the other is true.

p(a|b) can be read as ‘the probability of a given that b is known to be true’.  It can be worked out with the formula p(a|b) = p(a AND b) / p(b).

Another way of thinking about it is that the proposition (a|b) is the degree of belief that an agent would have in a if and when it got the information that b was certainly true.  Some basic facts that follow from this are the following:

p(a|b) is always between 0 and 1, because it is a proposition.
If p(a) is 0, then p(a|b) = 0 as well, because it doesn’t matter whether b is true, a will never be true.
If b is true whenever a is true, then p(a|b) = 1, because if b is known to be true then a must be true.
If b and c are always true or false at the same time, then p(a|b) = p(a|c).

Conditionalising

So we have the probability distribution p, and we can use that to find the probability of something being true.  But we’re going to define another probability distribution, pb, to be the probability distribution assuming that b is true (given that p(b) is greater than 0).  So basically, pb(a) = p(a|b).  This is the probability distribution obtained by ‘conditionalising’ on b.  Whenever an agent receives a piece of information, it conditionalises it and works out a new probability distribution in this way.

Note that it’s also true that (pc)b(a) = pb AND c(a) = (pb)c(a), so it doesn’t matter in what order we conditionalise the information, if an agent receives more than one piece of information at a time.  The fundamentals of probabilistic reasoning in AI is that an agent keeps receiving data and keeps working out a new probability distribution based on the previous probability distribution and the information that it has received.

Bayes Theorem

The next theorem is probably the most important, but most simple, equation in probability theory.

p(b|a) = p(a|b)p(b) / p(a).

Or in its alternative, often more useful, form:

p(bi|a) = p(a|bi)p(bi) / ( p(a|b1)p(b1) + p(a|b2)p(b2) + … + p(a|bn)p(bn) )

This is typically applied when a is an observation statement and b1, b2, …, bn are all different hypotheses that can explain that observation.  We’ll see what this means in practical terms in the next section on robot localisation.  For now, just know that this is a very useful theorem.

Robot Localisation

Say we have a robot, in a room.  The room has obstacles in.  The robot has a map in its memory that tells it where the obstacles are, but the robot doesn’t know where it is in the room.  The robot has three sensors on the front, left, and right sides.  It can move forward, or it can turn left and right.

First lets talk about the surroundings.  Lets make it a coordinate system, so the robot is in a square room, from (0,0) in one corner to (99,99) in the opposite corner.  In addition to this, we have to know which direction the robot is facing, so we’ll also split up the direction into the range (0,99), where 0 is him facing one way, increasing as he turns more clockwise.

So lets let lijt be the proposition that ‘the robot is currently in location (i,j) facing in direction t’.

The collection of propositions l0,0,0, l0,0,1, l0,0,2, …, l99,99,98, l99,99,99 form a partition.  The robot can be in any one of 100×100x100 = 1,000,000 poses, but it must be in one and only one.  So these propositions all add up to 1.

We can always get less accurate information, say, li,j which represents the probability of it being in position (i,j) regardless of the direction it is facing is equal to the total sum of li,j,0, li,j,1, …, li,j,99.

Initial State of Belief

So now we know how the robot will represent its belief of what pose it is in, we now consider what the initial data should be.  For simplicity, we assume that the robot has no idea where it is in the room.  It could be in any pose, with equal probability.  However, the robot knows that there are objects in the room and it knows where the objects in the room are.  So it knows it cannot be on top of an object, obviously.  So it needs to be set up with its initial beliefs such that all positions where it knows there is an object must have a probability of 0 of it being there.  The poses that represent it NOT being on an object must all have the same probability and form a partition.

This might seem a bit complicated, but consider it smaller scale, say a room that’s only 2×2. If there are no objects, the probability of it being in any coordinate is 1/4 because there are 4 coordinates.  If there is something blocking the coordinate (1,1) then clearly it can’t  be on (1,1), and there are 3 coordinates that it can be on, so each will have a probability of 1/3.  It’s basically the same as this but on a 100×100 grid, and taking direction into account as well.  Some coordinates have a probability of 0, but all the rest have the same probability and form a partition.

Sensor Readings

So we’ve set the initial state, now we need to consider how we’re going to update the robot’s belief state so that when it gets information, it updates the probability of where it is accordingly.  We’ve already said that the robot has 3 sensors, one on the front, right, and left.  So lets say that ‘o’ represents the proposition that a certain sensor has reported a distance.  For example, ‘Left sensor reports a distance of 32 to nearest object or wall’.

But sensors are not accurate, they are noisy.  So we need to model the values that the sensors return.  We’ll say that they are representative of a normal distribution, and the sensor returns an integer between 0 and 2x (where x is the true distance), with mean x and standard deviation of about x/3.

Now we’ve been given an observation, o, we need to update our probability distribution, p, so that it takes o into account, ie. we need to work out the probability distribution po.  Remember that po(li,j,t) = p(li,j,t|o). So we need to work this value out for every single value of i, j, and t, and update the 100×100x100 matrix storing our degrees of belief of where we are.

Using the alternative form of Bayes Theorem, we can see that:

p(li,j,t|o) = p(o|li,j,t)p(li,j,t) / ∑i’,j’,t’ p(o|li’,j’,t’)p(li’,j’,t’)

And we have to work this value out for every single one of our 1,000,000 poses.  Which is a pretty ugly formula, but it makes sense.  Lets go through it term by term, starting with the top.  The top consists of p(o|li,j,t)p(li,j,t).

p(li,j,t) is the robots current belief that it is in pose i,j,t, so we can simply extract that value from the 100×100x100 matrix for each value.

p(o|li,j,t) is the probability of observation o, if we assume that the robot is currently in pose i,j,t.  This is tricky and depends on the accuracy of the sensors, but we now have a model for this.  Say, for example, that pose i,j,t is in the middle of the room facing a wall 50 squares away, and the forward sensor returns a value of 2, then this value will be very low because the probability of the sensor returning a value of 2 given that the robot is 50 squares away from the wall is very very low, on a normal distribution.

The bottom is the sum of, for all values of i,j,t, the above two values.  So first you would loop through the matrix and work out p(o|li,j,t)p(li,j,t) for every possible pose i,j,t.  Then you would sum up these numbers, and that would be the bottom of the equation every time.  ∑i’,j’,t’ p(o|li’,j’,t’)p(li’,j’,t’) is a constant for each observation o.

Robot Actions

But the robot is not static, the robot can move forward, and turn left and right, as we’ve seen.  Lets use ‘a’ to represent the proposition of the robot performing an action, such as ‘Move forward 10 squares’.  It may not be as simple as that.  For example, what if the floor is wet, or the motors are running out of power, or the robot runs into a wall.  It may run less than or greater than the 10 squares it’s supposed to.  We can model this with another normal distribution.

So if we have li,j,t representing the robots belief that it is at position (i,j,t), we can use l`i,j,t to represent the proposition “After the action, the robots new pose will be (i,j,t)”.

The engineer programming the robot should be able to estimate the probabilities p(l`i,j,t|a AND li’,j’,t’).

To update the matrix in response to an action, we can use the following formula:

p(l`i,j,t|a) = ∑i’,j’,t’ p(l`i,j,t|li’,j’,t’ AND a)p(li’,j’,t’).

Where p(l`i,j,t|a) means the probability of the robots pose being (i,j,t) given that the action a has happened, p(li’,j’,t’) is the probability of the robot being in post (i’,j’,t’), and p(l`i,j,t|li’,j’,t’ AND a) is the probability of the robot being in pose (i,j,t) given that action a has happened AND the robot used to be in position (i’,j’,t’).

Summary

So to sum up,the robot should start with an equal belief that it could be in any position.

The robot has a 3 dimensional matrix, 100×100x100, storing its x-coordinate, y-coordinate, and direction.

Sensor readings and actions are not static, they may be less or more than what it should be.

The robots belief matrix is updated every time there is an observation or the robot follows some action.

To update on an observation this formula should be followed:

p(li,j,t|o) = p(o|li,j,t)p(li,j,t) / ∑i’,j’,t’ p(o|li’,j’,t’)p(li’,j’,t’)

And to update on an action this formula should be followed:

p(l`i,j,t|a) = ∑i’,j’,t’ p(l`i,j,t|li’,j’,t’ AND a)p(li’,j’,t’).

Summer 09 Revision COMP10020 Maths Part 2: Relations

If you want more notes for COMP10020, check out my functions notes here: Functions

And you can check out Badgerati’s notes on Induction here: Part 1Part 2

What Is A Relation?

If we have two sets, A and B, we can refer to the ‘Cartesian Product’ as being A x B.  This means that if A = {1, 2, 3} and B = {a, b, c}, then A x B = {1a, 1b, 1c, 2a, 2b, 2c, 3a, 3b, 3c}.  It’s the set of all pairs of values where the first is from A and the second  is from B.

A relation from A to B is a subset of A x B.  A relation can be said to connect values together, in a way, showing a relationship between them.

As an example, lets take two sets.  A = {0, 1, 2} and B = {0, 1, 2}.  Because the two sets are the same, we can say that this is a relation on A.  Now consider the relation that is described by ‘is greater than’.  We can start by looking at A x B, which is {(0,0),(0,1),(0,2),(1,0),(1,1),(1,2)(2,0)(2,1)(2,2)}.  Now the relation R has to be a subset of this.  One way to work out what the relation is is to look at each value in turn.  So we look at the first value – (0,0).  The first value is 0 and the second value is 0, so the relation we have is ‘0 is greater than 0′.  This is not true, so (0,0) is not part of the relation.

The same is true of (0,1) which says ‘0 is greater than 1′, and (0,2) which says ‘0 is greater than 2′.  But when we get up to (1,0) we see that yes, ‘1 is greater than 0′, and that value is in the relation.

In the end we find out that the relation is {(1,0),(2,0),(2,1)} which is a subset of A x B.  We can show this as a digraph as follows, where we say ‘a and b are related by R if there is an arrow going from a to b’.

Relation Digraph 1

Relation Digraph 1

If a is related to b by relation R, we can also say aRb.  So in this case, 1R0, 2R1, and 2R0.

Reflexive, Transitive, Symmetric, Antisymmetric

There are a number of different properties that a relation can have.  Let R be a subset of A x A.

R can be said to be reflexive if it is true that for every possible value of ‘a’ where ‘a’ is in ‘A’, it is true that aRa.  This means that everything maps onto itself.  On a digraph, this can be shown by the fact that every node should have a loop attached to it that points it back on itself.  An example of a relation of this type could be ‘is the same height as’ – Everything is the same height as itself, but it can still be the same height as other things.  The key feature, though, is that aRa is always true.

R can be said to be transitive if it is true that for every possible value of a, b, and c, where they are all in A, if aRb is true and bRc is true then it must be the case that aRc is true.  An example of this kind of relation is ‘is smaller than’. If we have an object a that is smaller than b, and an object b that is smaller than c, then obviously object a must be smaller than object c.  This is hard to spot on a digraph, but basically whenever there is an arrow from a to b and an arrow from b to c then there is also an arrow from a to c.

R can be said to be symmetric if all relations work backwards.  For example if you have two nodes in your relation, a and b, and it’s true that aRb, then for the relation to be symmetric then it must also be true that bRa.  ‘is the same height as’ is a good example of a symmetric relation – If a is the same height as b, then obviously b is the same height as a, and this is true for all objects.  On a digraph, this can be shown by the fact that wherever there is an arrow between two nodes, there has to be another one going in the opposite direction.

R can be said to be antisymmetric if it is true that for all a and b in A, if aRb and bRa then a must equal b.  ie. if the only relations that work in reverse are between two same nodes.

Equivalence

An equivalence relation is a relation which is reflexive, symmetric, and transitive.  To show this we look at the three features and show this:

  • Reflexive: a is equivalent to a.  This is true.
  • Symmetric: If a is equivalent to b, then b is equivalent to a.  This is true.
  • Transitive: If a is equivalent to b and b is equivalent to c, then a must be equivalent to c.  This is true.

The idea of an equivalence relation is that if aRb then a and b are in some sense the same. This will usually divide set A into seperate ‘parts’.  The idea is that each part contains elements that are equivalent to each other, so a and b are only in the same part if aRb.

So lets assume A is the set of all natural numbers, and R is the relation that states that ‘a – b is equal to 0 (mod 2)’.  0R0 is true, 0R1 is not, 0R2 is true, 0R3 is not, etc.  1R0 is not, 1R1 is true, 1R2 is not, 1R3 is true, etc.  The digraph for this will look like this:

Relation Digraph 2

Relation Digraph 2

So we can see that if we continued, there would be a partition between the odd numbers and the even numbers.  The odd numbers are equivalent in this relation, and the even numbers are equivalent in this relation.  We say that it is divided into two R-equivalence classes.

We can talk about the R-equivalence class [a], which is the set of all nodes that are equivalent to a.  So in this case, [0] = the set {0,2,4,6,8,10,…} and [1] = the set {1,3,5,7,9,11,…}.  This way we can see that the set of all natural numbers, N, consists of all of the elements in each partition exactly once.  N = [0] \cup \!\, [1].

Similarly, we can say that the conjunction, [0] \cap \!\, [1] is always equal to the empty set, because there is no value in [0] that will also appear in [1].

In fact, for any equivalence relation, the following is true:

  • a belongs to the set [a]
  • For all values of b and c in [a], bRc is true.
  • For all values of a and b in A, either [a] is equal to [b], or [a] \cap \!\, [b] = the empty set. (ie. two values are either in the same partition or not.)
  • For all values of a and b in A, if [a] = [b] then aRb must be true. (ie. for any two values in the same set, aRb is true)

A partition is a set of subsets that make up a set.  For example, the set {[0],[1]} is a partition that makes up the set N.

Operations on Relations

There are a number of operations possible on relations, as they are sets.  Let R1 and R2 be relations on A:

The union of R1 and R2 is described as R1 \cup \!\, R2, and this is the same as it is in set theory – it’s every pair of things that is in either R1 or R2.  For example, in a family, we can take the union of the relations ‘is the mother of’ and ‘is the father of’ to get the relation ‘is the parent of’.

The intersection of R1 and R2 is described as R1 \cap \!\, R2, and this is the same as it is in set theory – it’s every pair of things that is in both R1 and R2.  For example, the intersections of the relations ‘is the mother of’ and ‘is the father of’ is empty, because nobody is both the mother and father of a child.

The complement of R1 is described as Rc1, and is basically the set of all pairs that are in A x A but are not in R1. For example, the inverse of ‘is the parent of’ is ‘is not the parent of’

The inverse of R1 is described as R-11, and is every pair backwards.  For example, if 0R3 is a relation in R1, then 3R0 is true in R-11.  For example, the inverse of ‘is parent of’ is ‘is child of’.

Composition

If we consider the two relations ‘is a parent of’ and ‘is the mother of’, then we can define another relation ‘is the grandmother of’, by combining these by finding some value where a ‘is the mother of’ b and b ‘is the parent of’ c.  Then a ‘is the grandmother of’ c.

In general, if we have two relations R1 from A to B and R2 from B to C, the relational composition of R1 and R2 can be described as R2 ∘ R1, which is a relation from A to C where there is some b in B where aR1b and bR2c.

Closure

For any relation, R, and property, P, we can say that the relation S is the P closure of R if S has property P, and if R’ is another relation with property P and R is a subset of R’ then S is a subset of R’ as well.

For example, lets assume that P is the property of transitivity.  The P closure of R is the ‘transitive closure’ of R.  When P is reflexivity then it is known as the reflexive closure, and when P is symmetry it is known as the symmetric closure.

The P closure is constructed by adding to R the minimum number of pairs (a,b) which will make a new relation with property P.

For example, let us consider the relation consisting of (0,1), (0,2), (2,0), and (2,2).

The reflexive closure is the relation consisting of (0,0), (0,1), (0,2), (1,1), (2,0), and (2,2).  All we’ve added are the minimum pairs that are required to make it reflexive – that is, (0,0) and (1,1).  All the rest has been left alone.

The transitive closure is harder to find.  But consider each pair of pairs in turn to see if there is a way to get straight to them.

(0,1), (0,2) – Not needed. 1 is not equal to 0.
(0,1), (2,0) – Not needed. 1 is not equal to 2.
(0,1), (2,2) – Not needed. 1 is not equal to 2.
(0,2), (0,1) – Not needed. 2 is not equal to 0.
(0,2), (2,0) – (0,0) is needed.
(0,2), (2,2) – (0,2) is needed, but is already there.

And so on.  The transitive closure of this relation is (0,0), (0,1), (0,2), (2,0), (2,1), (2,2).

The symmetric closure is much easier, it’s just the minimum pairs to ensure that every pair works backwards.  In this case it is (0,1), (1,0), (0,2), (2,0), (2,2).  We added (1,0).

Order Relations

Another common type of relation is the set of relations that share the properties of ‘less than or equal to’.  Such relations are called order relations.

The simplest kind of order relation is called the ‘partial order’, as follows:

A partial order is a relation R which is reflexive, antisymmetric, and transitive.

A partially ordered set, or ‘poset’, has an additional property.  Namely that for any integers, x and y, then either x is less than or equal to y, or y is less than or equal to x (or both).  We can say this as ‘two integers are comparable with the relationship ‘is less than or equal to”.

A partial order relation in which any two elements are comparable is called a total order.

Maximal and Minimal

Given an ordering of a set, we can define the maximal and minimal elements.

An element of a poset A with partial order ≪  is said to be a least element if, for all a, the element ≪ a.
An element of a poset A with partial order ≪ is said to be a greatest element if, for all a, a ≪ the element.

A minimal element for a partial order ≪ of a set A is any element a such that if b ≪ a then a = b.
A maximal element for a partial order ≪ of a set A is any element a such that if a ≪ b then a = b.

In more basic terms, a minimal element is one that can’t have another element smaller than itself, and a maximal element is an element that can’t have another element bigger than itself.  That’s not to say that the minimal element is the smallest and the maximal element is the biggest, because two elements may not be comparable in a partial order.

A partial order can have lots of minimal and maximal elements.

In general a poset A can be drawn with a partial order by placing ‘larger’ elements at the top and less ‘large’ elements further down.  Draw the minimum number of lines so that if x is  less than y then there is some sequence of lines descending from y to x.  This is called a Hasse Diagram.

And that’s basically all there is to know about relations.  I recommend reading pages 62-66 of the notes on order relations and minimal and maximal elements and partial orders and stuff because they don’t make a tiny bit of sense.

I remembered to take a Before picture this time

Before
Before

After
After

inb4 “What’s the difference?”

Thank Science that’s over

Well, the first exam is over.  It was also probably the one I was looking forward to the least.  Well, second least.  There’s always Artificial Intelligence to come… but that’s still a way off.

It went better than I expected.  I have the question paper so I’ve been going through it to see where I’ve made stupid mistakes.  I’m tentatively saying I got about…. 8/10 on the first question, 9/10 (I probably made at least one stupid mistake) on the second question, probably 7/10 and 7/10 on Thierry’s questions, and like 6/10 on Howard’s part.  Grand total of 37/50.  Which is quite better than I was hoping for.

It was 13 for a pass, or 33/50 for a total of 70% (counting coursework as well), so I’m pleased with how I’ve done.

I should also point out it’s the first exam in my life where I’ve ever actually not finished past the first hour.  Which…. is saying something.  Horrible questions.

So where to from here.  Java on Tuesday, Distributed Systems on Thursday, then maths at some point and AI at some other point.  Java should be easyish, I did the mock and it’s likely to be as hard or easier (Java 1 last semester was easier than the Java 1 mock). There’s not exactly much one can do to revise for java either, short of doing the assignments all over again.  It’s all multiple choice though, so… oh wait, how much do I need?

Java is 50% exam 50% coursework, and I got a total of 81% on the coursework, so already have 40% which is a 2ii even if I don’t go to the exam.  There’s 25 questions, so each question gives me 2% toward my final mark.  So I need to get 15/25 correct to get 70%, which is a first.  That shouldn’t be too problematic.  I got 11/13 on the mock.  I might have a quick read of my Generics notes.  Generic classes and interfaces are things I do not totally understand.  But ultimately aside reading the notes there’s nothing I can exactly do to further my understanding.

So most of my revision from now will be about Distributed Systems, which is another that I’m not certain on.  I started writing up Barry’s notes for here, and will have to finish those over the next couple of days.  In fact, what do I need for distributed systems?  I have 92.5% on the coursework, and the coursework is 25% so that’s 23.125% already.  I need 70-23.125=46.875% total.  Given that the exam is 75% and I need 46.875%, I need a total score of 66.964% on the exam. Round it up to 67%.  The exam is structured such that section A is out of 20 marks and is multiple choice, and section B is 3 questions of which we only answer one, one on Alvaro’s stuff, one on Barry’s stuff, and one on Steve’s stuff.  So totally out of 40.  67% of 40 is 27 marks out of 40.

Hrm, I’m not confident at the moment.  Hopefully revision will help.

But it should all go well.  At least, until AI.

Summer 09 COMP10042 Computation 1: Patterns

In the last blog post we looked at what the various components in the module, mostly languages, symbols, and alphabets.  Now we’re going to look in more detail at how to define a language from symbols.

To define languages more easily we’re going to look at patterns.  Patterns can also be called regular expressions.  Computers, after all, can’t recognise set notation symbols such as {, }, or ∈.

The Kleene Star

So lets say we want the computer to recognise all instances of 01 repeated, so it would match 01, 0101, 010101, 01010101, 0101010101, etc.  We notice that we need to concatenate 01 an arbitrary number of times.  We can do this with (01)*.  This is a pattern.  The * is called the Kleene Star and will apply an arbitrary number of concatentations to whatever precedes it, in this case 01.  Now the computer just has to compare: Is the first string equal to 0?  Is the next equal to 1?  And so on until the end of the string.

Other examples of patterns could be 1*0, which is the language of all strings consisting of any number of 1s, followed by a 0.  Or there may be two Kleene Stars, such as the pattern 1*0*.  This is any number of 1s followed by any number of 0s.

Alternative

So we’ve looked at using the Kleene Star to create languages.  But that’s not enough for some patterns.  For example, what if we wanted the language of all strings consisting of just 0s and just 1s, ie. the language {0, 00, 000, 1, 11, 111, 1111, 0000, …}

We can also use the alternative, |, to represent a choice.  For example, 0|1 is the language consisting of the string 0 and the string 1.  The language 0*|1* represents the language described above – All strings that are either an arbitrary number of 0s or an arbitrary number of 1s.

Building up Patterns

We can define patterns recursively in the following way:

Base cases:

The character ø is the empty pattern.
The character ϵ is a pattern consisting of the empty word.
Every letter from a specified alphabet ∑ is a pattern.

Recursive cases:

If p and q are both patterns, then so is (pq).  This is concatenation.
If p and q are both patterns, then so is (p|q).  This is the alternative.
If p is a pattern, then so is (p*).  This is the kleene star.

Matching words with Patterns

In order to match a word with a pattern, we can do it recursively, as follows:

Base cases:

The empty word ϵ matches the pattern ϵ.
The pattern p is a single character and s is that character.

Recursive cases:

The pattern p is a concatenation p=(p1p2) and there are words s1 and s2 such that s1 matches p1, s2 matches p2, and s = s1s2.
The pattern p is an alternative p=(p1|p2) and s matches p1 or p2, or both.
The pattern p is of the form p = (q*) and s can be written as s = s1s2s3…sn such that s1, s2, s3, …, sn all match q.

(Note that no word matches the empty pattern ø).

Languages of Patterns

Given a pattern, p, we can refer to the language of all words that match that pattern as L(p).  Or, again, we can define the language recursively, as follows:

Base cases:

L(ø) = ø.
L(ϵ) = {ϵ}.
L(x) = {x} for all x in ∑.

Recursive cases:

L(p1p2) = L(p1)L(p2)
L(p1|p2) = L(p1) ∪ L(p2)
L(p*) = (L(p))

A regular language is any language that can be described by a pattern. Note that there are some languages that can’t be represented by a pattern, such as the language of all strings that consist of an arbitrary number of 0s followed by the SAME number of 1s, ie. the language {01, 0011, 000111, 00001111, …}.  This is because there is no way of remembering how many 0s there were.  We will find other ways to express these languages later, in Context Free Grammars.

So from simple patterns we’ve shown how to build them up into more complex patterns, then we saw how to recursively tell if a word matches a pattern, and finally how to show and build up the languages of a pattern.  This basically concludes what patterns are, and you should by now have some idea of how to build up a pattern given a rule.  There’s a number of examples to ensure you understand in Andrea’s notes on page 19.

And of course, as usual, there are the fantastic Badgerati’s notes available at http://badgerati.wordpress.com/2009/05/11/comp10042-computation-regular-expressions/

Summer 09 COMP10042 Computation 1: Sets & Languages

In order for a computer to organise a textual input, the computer has to be able to recognise certain strings, such as, in parsing a programming language, it has to be able to find ‘if’ or ‘while’.

The first part of Computation deals with how we describe to a computer what string to look for, how to come up with the right description to solve the problem, and how to do more advanced things such as check whether a piece of code is correctly formed, making sure that opening and closing brackets match, or checking that a webpage is written in valid HTML.

Symbols, Words and Strings – Definitions

Input to a computer is given in symbols, such as letters and numbers. A collection of symbols is an alphabet.  An alphabet is usually denoted by the symbol ∑.

Symbols can be combined into strings, which can be called ‘words’.  We can have words consisting of any number of symbols, including zero.  A word with zero symbols is called the empty string, and is referred to by ϵ.

A space can be a symbol, and we can use the symbol ı_ı to show it. (Though in these blog posts, I’ll just use _ since there’s no single character for ı_ı).

The length of a string can be shown by a modulus symbol.  For example, |s| is the length of the string s.  |ϵ| = 0, obviously, because it is the empty string.

Languages

A language is a set of words that can be created with an alphabet. Some examples of languages are the empty language, {}, the language consisting of the empty word {ϵ}, or languages consisting of some number of strings, {ϵ,1,aba,abba,abbba,abbbba}.  A language is referred to with the symbol L.

We can use set notation to define a language.  For example, say we wanted the language of all words that could be made with the alphabet of the symbols ‘a’, ‘b’, and ‘c’.  We can define this language as {s | s is a word over the alphabet {a,b,c}}.  The | should be read as ‘where’, so this is ‘All languages consisting of the word ’s’, where s is a word over the alphabet {a,b,c}’

Concatenation

Two strings can be concatenated, that is, put together.  So the strings Hel and lo could be concatenated into the single string Hello.  To show this, we use a concatenation symbol as such: Hel · lo = Hello.

For multiple concatentation, we can use a power symbol.  For example, h³ = hhh. h° = ϵ (ie. the string consisting of 0 numbers of h).  We can also use n to denote an arbitrary number of concatenations.  For example, an is the language {ϵ, a, aa, aaa, aaaa, aaaaa, …}.

Using set notation, we can write it as L = {an | n ∈ N}.  This can be read as ‘The language L consists of all strings an where n is a whole number’.  We could also exclude the empty string by saying ‘where n is a positive whole number’ with the symbol N+.

We can also concatenate two languages as follows: L1 · L2 = {st | s ∈ L1 and t ∈ L2}.  ie. ‘the set of all concatenated strings where s is a string from L1 and t is a string from L2′.

Blog Update 3.1 p2

Hmm, quick checklist.

- Categorise all blog posts.  Done.
- Change the colour scheme.  Done.
- Change the widgets to ones more appropriate of my current social networking state.  Done.
- Space out the widgets.  Done. (By adding a text widget between each widget with the text “——————”)
- Make the Yearchievements page readable.  Done.
- Delete the About page.  Done.
- Work out what other pages are needed.  Pending.
- Create other pages.  Pending.

I think that’s all that’s left to do, working out pages and creating them.

Hmm, what pages are required though.. Clearly an About page, but I’d have to update it a bit more frequently.  The old About page still said I was 17 and my favourite pizza topping was pepperoni, quite clearly insane and out of date.

So basically, I hereby declare MINDEZ’S BLOG VERSION 3.1 to be officially OPEN iff (if and only if) you see an About page at the top.  If the About page isn’t there, then version 3.1 isn’t out yet.  It hardly seems worthwhile to make another blog post to announce that it’s open.  Tomorrow, COMP10042 revision notes.

In other news, the ant problem has been cleared up.  A bottle of RAID has seen to that.  Every possible entry and exit point is now shielded in a layer of poisonous material.  Mwahahahahahah.  I’m sure no crawling insects do anything at all for the environment anyway.  Of course there’s still the possibility of them getting in in other ways, attached to someone or something that comes safely into the room…. how could I get around that… some kind of airport style security at the door to ensure that no insects are being transported in?  I’ll have to look into the costs of that.

I sure hope none of this poisonous sprayed material in a confined poorly ventilated space has had any effects on me.

Splurgh.

Blog Update 3.1

Heads up.

I’m updating my blog. In colours, maybe layout, certainly how it feels and navigation. Generally make the place more purple. And finally make use of categories.

I didn’t really have much to say, so here’s a cute lolcat while you wait.

Summer 09 COMP10052 Distributed Systems 1: VoIP 1

What is a distributed system?

Put simply, a distributed system is a number of pieces of hardware and software that communicate with each other but use different processors.  They can communicate over a network such as a LAN, or the internet. Distributed systems are subject to a number of problems, however, such as variable latency (Two messages sent between two computers may not take the same amount of time to arrive), and unpredictable failures such as a message not being sent.

In this posting we’ll be investigating some of the issues of VoIP, transmitting voice over the internet.  We’ll look at how it works, some of the problems involved with it, and how to overcome those problems.

Basics of sound and voice – Analogue

Voice is just variation in air pressure, travelling from a speaker to a listener at the speed of sound. A microphone can convert this pressure into voltage, and the computer can store it as a sound waveform. This waveform is the sum of a number of sine waves of different frequencies, measured in Hz.

Sound Waveform
Humans can hear up to 20kHz. CDs typically have a ‘bandwidth’ (frequency range) of 50Hz to 20 kHz. Phone lines typically have a ‘bandwidth’ of 300Hz to 3.4kHz, which explains why speech over a telephone sounds a lot more unnatural.

Old Fashioned Phone System

The old fashioned phone system used analogue transmission. The wires carried different voltages depending on the air pressure in the receiver. There was very little delay – about 1 millisecond per 100 miles.

In the 1960s, digital transmission started between exchanges, and most phone speech is now transmitted digitally. Analogue transmission is still used between the exchange and the house. Circuit switching is still used in a connection oriented manner – setting up a connection between the sender and the recipient.

Basics of sound and voice – Digital

In order to transmit something digitally, there has to be some way it can be represented using 0s and 1s.  In order to do this, an analogue signal from the microphone has to be ’sampled’.  This means the program has to take the input regularly and convert all of the data since the last sample into binary.  The more often a signal is sampled, the better the quality.

CDs sample at around 44100 times per second, and store 16 bits per sample.  Phones sample less frequently, at only 8000 times per second, and only store 8 bits per sample, which means that the phone line is transmitting 64 kb/s of sound data. This is a famous standard for digitising speech called ITU-G711.

Unfortunately, 64 kb/s is far too high for mobile phones, and for VoIP.  This is why other standards have to be used.  A mobile phone will usually use a 9.6kb/s standard.

Picture 4

Buffering

The sound cards themselves control the sampling rates (If the processors controlled them, they’d have to be interrupted 8000 times a second, which would slow the computer down considerably).  They have a separate clock, and use buffers to store the inputs and outputs.  A buffer is basically an array that is filled by the processor and emptied by the sound card, or vice versa.

Imagine a leaky bucket filled up by a water tap.  The inputs from the processor fill up the bucket every time the processor isn’t busy.  These inputs then come out of the bottom of the bucket through a hole in the same order that they came in, in a constant stream, despite the input from the processor being intermittent.

It’s very important that the bucket doesn’t empty or overflow.  Thinking about the sound, if the bucket emptied then there would be no more sound until more was put in the bucket.  If the bucket overflowed then some sound would be lost.

Problems

But there’s a problem.  The sampling rates are controlled by crystals, and are accurate to 0.01%.  An 8000Hz sampling rate could in fact be 7999Hz or 8001 Hz.  This is a problem, because this could mean one system is getting an extra 2 samples per second.

Slower clocks can run out of samples (buffer underflow), faster clocks could get too many samples (buffer overflow).  This is a fundamental problem with distributed systems, because there is no global clock that defines how fast everything goes.  Everything works at their own speed.

If we add a network in, we introduce even more problems.  Networks send data in a number of packets, and there’s a danger of packets being lost (not making it to the recipient) or delayed (arriving after one that was sent later).  So, we need to set up a communication.  Lets first look at protocol layers.

TCP/IP Protocol Layers

The TCP/IP Protocol layers are described below, at which different protocols are relevant.

TCPIP Protocols

1) The top-most layer is the Application layer.  Protocols here include http, POP3, FTP, etc.  These are protocols in which two applications can talk to each other over a distributed system.

2) The second later is called the Transport layer.  Protocols here include TCP and UDP.  These are the basis for the protocols in the Application layer, and are closer to how data is actually sent.

3) Lower down there is the Network layer.  In this layer the IP protocol is of particular relevance.  This is the basis for the protocols in the Transport layer.  TCP and UDP have to use the IP protocol to communicate.

4 and 5) Even lower there is the Data-Link layer, and at the bottom is the Physical layer.  These layers are concerned with physical protocols, such as Ethernet (for wired networks) or IEEE802.11 (for wireless networks).  It is on these protocols that IP depends.

So as we move down from the Application layer to the Physical layer, we can see that messages are passed down.  A message using an Application protocol has to be handled by a Transport protocol, which has to be handled by a Network protocol, which has to be handled by a Physical protocol.

Lets look at some of these layers in more detail.

Network layer (Internet Protocol)

The Internet Protocol, or IP for short, deals with the routing of IP packets.  A packet is an amount of data that is sent in one part.

IPPacket

A packet consists of a header of around 20 bytes, and the data which can be any length.  The header contains information about the packet such as the header length, the data length, the ‘check-sum’, the source IP address, destination IP address, etc.  Anything that needs to be known to get the packet intact to the right place.

A checksum is a number of extra bits that can detect errors in the data.  This is sometimes referred to as a cyclic redundancy check (CRC).  In IP packets, a checksum algorithm is as follows:

1) Split up the data into 16 bit ‘words’.
2) Add together the binary values of all of these words.
3) If the result is longer than 16 bits, take anything past 16 bits and add it to the result.

This results in the 16 bit checksum.

The CRC is a check to make sure that the header is correct.  Suppose that the header is the decimal number 149.  To find the CRC, divide it by, say, 7, and express the remainder in binary.  This gives 110 (6) as the CRC, so those are the check bits.

The same division is done at the receiving end.  If the remainder is different, then we know that something is wrong and an error has occurred.  The number ‘7′ would have to be agreed upon in advance.

This does not detect ALL errors.  Any combination of errors that add or subtract multiples of 7 from the header wouldn’t be detected as they have the same remainder, but the majority of errors will be found.  In practice, a much higher number than 7 is used, and binary arithmetic is used rather than simple integer division.

These checks allow an error in an IP header to be detected.  If an error has occurred, the data is simply discarded.  Errors are rare in wired networks, but very frequent in wireless networks.

The IP protocol is connectionless – it just sends a message one way then forgets about it.  It’s unreliable, no guarantees that the message was sent.  Data can easily be lost, delayed, or damaged.  IP is a fundamental part of the internet, and many private networks.

Data-Link and Physical layers

The physical layer protocols just sends relevant voltage pulses of 0s and 1s along physical wires or wirelessly.  This is where bit errors can occur.

The data link layer is responsible for finding and possibly correcting bit errors.

Transport Layer: TCP, UDP, RTP, and RTCP

The transport layer has protocols which use the IP layer to achieve data transfer in a way that is more suitable for application layer protocols.  The most important of these are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).  Two others that are adapted to real time applications such as VoIP are RTP (Real Time Protocol) and RTCP (Real Time Control Protocol).

TCP uses IP to create a connection-oriented transmission.  This is slower, but more reliable, and should be used for data that must be sent but delay won’t matter too much.  Port numbers are used to tell the difference between messages arriving, and things such as sequence numbers if a number of packets is being sent are implemented.

TCP is reliable because of a mechanism for acknowledging receipt of a packet, and retransmitting packets if necessary. This can take time and can create delays, so TCP is not really suited to VoIP.

UDP is simple, connectionless, and unreliable.  You send a message then forget about it.  It sends the source port number, the destination port number, the length of the data, a check, and the data itself.  It’s used for applications which don’t need or don’t have time to wait for acknowledgment or retransmission.  UDP could be used in VoIP because voice is not as sensitive to bit errors and lost packets.

RTP & RTCP is useful because it allows the transmitter to know how many packets are getting through, and what the delay is.  It’s specifically designed for real time applications like VoIP.  A time stamp is added to a UDP packet.  RTCP on the receivers end can send reports back to the sender every 5 seconds or so, giving a general idea of how many packets are being lost.