Just don’t.
For realz.
Generating random numbers is quite fascinating. Today I pose the question: If you have to generate a massive amount of random numbers, what’s the fastest way to go about it. Big thanks to David Rhys Thomas for providing me with the original problem and being very supportive while I criticised every part of his code.
As an example, consider the following randomly generated 100×100 image:
This image, consisting of 10,000 pixels, has been randomly generated using a PHP script. Each pixel was assigned a random value of either 1 or 0 and that translated to either ‘black’ or ‘white’. The proposed question is: Which method of generating random numbers is the fastest to generate such an image in PHP?
WHAT ARE YOU DOING HERE?
GO OVER THERE: http://computersciencesource.wordpress.com
…WHY ARE YOU STILL READING THIS?
…SRSLY. GO. NOW. HERE’S THE LINK AGAIN IN CASE YOU MISSED IT. http://computersciencesource.wordpress.com
…
Ahhh finally they’ve gone. *puts feet up* Alone on my blog again. AND DON’T COME BACK YOU HEAR!
In the previous post we talked about probability theory, and talked about agents having a probability distribution to represent their degrees of belief. We also talked about the axioms K1 and K2, that is:
K1 – If a is always true, then p(a) = 1.
K2 – If a AND b are never true at the same time, then the probability of a OR b being true is p(a) + p(b).
These axioms become more important when we talk about the idea of a ‘dutch book’.
Lets consider an artificially intelligent agent walks into a betting shop. No, this isn’t the start of some bad joke. In this betting shop, the agent will bet on propositions such as ‘Horse A will win the horse race’. The tickets pay out £1 if the proposition is true, and no money if the proposition is not true. But what price should the agent buy the ticket for? This depends upon his degree of belief in the proposition. If the agent’s degree of belief, for example, is 0.6 that the horse will win, then he will be willing to pay 60p on the ticket.
But there’s two horse races going on tomorrow. Let ‘a’ and ‘b’ be the propositions ‘Horse A will win the first race’ and ‘Horse B will win the second race’. Our agent has been programmed with the following degrees of belief:\
p(a) = 0.6
p(b) = 0.5
p(neither a nor b) = 0.25
p(not both a and b) = 0.7
Based on these beliefs, our agent takes out the following four bets:
Bet 1 pays £1 if horse ‘A’ wins the first race. This ticket costs 60p.
Bet 2 pays £1 if horse ‘B’ wins the second race. This ticket costs 50p.
Bet 3 pays £1 if horses ‘A’ and ‘B’ both lose their races. This ticket costs 25p.
Bet 4 pays £1 as long as not both ‘A’ and ‘B’ win their races. This ticket costs 70p.
So our agent has spent a total of 60+50+25+70 = £2.05
But there’s something weird about these bets. Lets analyse it more closely:
If both horse A and horse B win, then our agent wins bets 1 and 2, but loses bets 3 and 4. He wins £2.
If horse A wins and horse B loses, then our agent wins bets 1 and 4, but loses bets 2 and 3. He wins £2.
If horse A loses and horse B wins, then our agent wins bets 2 and 4, but loses bets 1 and 3. He wins £2.
If both horse A and horse B lose, then our agent wins bets 3 and 4, but loses bets 1 and 2. He wins £2.
In any case, he wins £2. But that can’t be right, he spent £2.05 on the ticket, so no matter what the outcome is, he’s lose money!
We call this situation a ‘Dutch Book’. And the reason that a Dutch Book was able to be made against the agent is all because of his initial degrees of belief! They violate the axioms of probability theory, K1 and K2, in some way.
In general, if an agents degrees of belief violate K1 and K2, then a Dutch Book can be made against it. If an agents degrees of belief conform to K1 and K2, then a Dutch Book is impossible. This example of Dutch Books shows us why we should represent our degrees of belief as probability distributions.
Alright, lets take another example. Let ‘a’ be the proposition that horse A wins the first race, and ‘b’ be the proposition that horse B wins the second race. The ticket now says:
You get a payout of £1 if both horse A and horse B win. You get no payout if horse A loses and horse B wins. You get your money back if horse B loses.
How much would you spend on that ticket? This number is your conditional degree of belief in ‘a’ given ‘b’.
The proposal is that if an agents degrees of belief are given by a probability distribution, p, then its conditional degree of belief in ‘a’ given ‘b’ should be given by p(a|b) when p(b) is not 0.
Now lets consider the following scenario. ‘a’ and ‘b’ are propositions as before. p is our initial probability distribution, and pa is the probability distribution after the first race but before the second race, if a has won.
The agent buys the following tickets:
Bet 1: £1 if a and b both win. 0 if a wins but b loses. Money back if a loses. This ticket costs £p(b|a).
Bet 2: £x if a wins, otherwise nothing. This ticket costs £(x * p(a))
After the first race, if a has won then the agent has already won £x, and has the following bet that used to be Bet 1:
£1 if b wins, otherwise nothing.
The agent then sells this ticket for £pa(b)
Now the payouts for the agent are as follows:
If a has won, then the payout will be his winnings on bet 2 (x*(1-p(a))) [ie. the winnings (£x) minus the amount spent on the ticket (x*p(a)], minus the amount he spent on bet 1 (p(a|b)), plus the amount he sold bet 1 for after race 1 (pa(b)). So his total winnings are… x – xp(a) + pa(b) – p(b|a).
If a loses, then he has lost bet 2 and got his money back on bet 1. So his total losses are the amount that he spent on bet 2, which was x * p(a).
If a loses, then his payoff is going to be negative, ie. he’s made a loss, as long as x is greater than 0 (Which it probably will be).
If a wins, then his payoff is going to negative if:
x – xp(a) + pa(b) – p(b|a) < 0
x (1 – p(a)) + pa(b) – p(b|a) < 0
x (1 – p(a)) < p(b|a) – pa(b)
x < (p(b|a) – pa(b)) / (1 – p(a))
So if we choose a value of ‘x’ between 0 and (p(b|a) – pa(b)) / (1 – p(a)), then we have made a dutch book on this agent, because no matter whether a wins or loses, he will always have a negative payoff.
Notice the numerator: (p(b|a) – pa(b)). If the agents degrees of belief DO conform to axioms K1 and K2, then these two terms will be equal to each other and it will read x<0. So in order to create a dutch book, a value of x must be chosen that’s greater than 0 and less than 0. Clearly, this isn’t possible.
In this scenario, a dutch book can be avoided by making the agents degree of belief in p(b|a) equal to pa(b). This shows why we need to conditionalise, because if we don’t, then we’re going to end up in a dutch book situation.
This is a famous paradox, that shows how you have to be careful with Bayesian Updating.
An agent is on a TV competition. At the end of the show, the host, Monty Hall, shows you over to three doors. Behind one of the doors is a car, and behind the other two are goats. Clearly, the aim is to pick the door with the car. The doors are labelled A, B, and C.
You are asked to select a door, and the prize you get is behind the door you choose. But after you’ve chosen a door, Monty goes and opens one of the other doors. Behind that door is a goat. Then he gives you the opportunity to either switch your selection, or stick with your selection. The question: Do you stick, or switch?
To answer this, lets consider the three propositions, ‘A’, ‘B’, and ‘C’, which represent the propositions that the car is behind the respective door.
The probability that the car is behind A and not B is equal to the probability that the car is behind door A, obviously, which is equal to 1/3. The probability that the car is NOT behind door B is 1 – p(B), which, obviously, is 2/3, because it could be behind A or C, each with a 1/3 probability.
So we chose door A to begin with, then Monty went and opened door B. We now have the information that it is NOT B. We can use Bayes Theorem to update our probability distribution as such:
p(A|~B) = p(A AND NOT B) / p(NOT B) = (1/3) / (2/3) = 1/2. So this is telling us that the probability that it is behind door A now is 1/2, and the probability that it is behind door C now is 1/2.
But… wait. If we run a simulation, with the computer randomly switching or sticking, then we find that the switchers win about 2/3 of the time, and the stickers win about 1/3 of the time… Why is this?!
The solution is to get the right representation. Lets say that K~B is the proposition ‘I KNOW the car is not behind door B’.
So, we start out, we choose A. Now, we can work out p(K~B|A), ie. the probability that door B will be revealed to not have the car behind it, given that the car IS behind door A. If the car is behind door A, Monty will either open door B or door C. So the probability that we will KNOW that the car is not behind door B is 1/2.
If, on the other hand, the car ISN’T behind door A, then it will either be behind door B or door C. Monty will open one of these two doors. So again, p(K~B|~A), ie. the probability that door B will be revealed to now have the car behind it, given that the car is NOT behind door A, will be 1/2.
From these two pieces of data, we can work out, using Bayes Theorem, the value of p(A|K~B), as follows:
p(A|K~B) = (p(K~B|A)p(A) / (p(K~B|A)p(A) + p(K~B|~A)(1-p(A)))
p(A|K~B) = (1/2 * p(A)) / (1/2 * p(A) + 1/2 * (1 – p(A)))
Obviously p(A) will be 1/3, because it could be behind any door.
p(A|K~B) = (1/2 * 1/3) / (1/2 * 1/3 + 1/2 * 2/3)
p(A|K~B) = (1/6) / (3/6)
p(A|K~B) = 1/3
So the probability of the car being behind door A, given that we know that the car is NOT behind door B, is 1/3. This means, because there is a partition, and we know the probability of it being behind door B is 0, then the probability of the car being behind door C must be 2/3.
If you want to know more about the Monty Hall problem, check the wikipedia page for it out here. It’s quite cool. And there’s some proofs on there a LOOOT more convincing than mine.
Artificial Intelligence is the field of computer science that concerns itself with getting better and more positive responses to the question ‘Can machines think?’. Over the years there have been a number of approaches to artificial intelligence, the current best method that is growing in popularity is the use of probability theory in AI, and this probabilistic reasoning is spreading to almost all areas of Artificial Intelligence.
A probability distribution is a function that takes a proposition and gives a probability in the range of 0 and 1. For example, say you have a proposition that “If I toss a coin, it will come up tails”. If you put this in the probability distribution, you should get an output of 0.5. Or you could use a two-headed coin, which would have a different probability distribution that would give an output of 0.
There are two basic facts about probability distributions which make it a probability distribution, given propositions ‘a’ and ‘b’.
K1 – If a is always true, then p(a) = 1.
K2 – If a AND b are never true at the same time, then the probability of a OR b being true is p(a) + p(b).
p(a) can represent an agents degree of belief that a is true. If p(a) = 1, then the agent believes a is always true. If p(a) = 0, then the agent believes a is always false.
K1 and K2 are intuitively reasonable and can allow us to prove many more things about probabilities, such as:
The probability of a not happening is 1 – p(a).
If b is always true when a is true, then p(a) is less than or equal to p(b).
If a and b are always either true or false at the same time, then p(a) = p(b).
The probability of a OR b being true equals p(a) + p(b) – p(a AND b).
Say you have a number of propositions, a1, a2, a3, a4, etc.
These propositions are ‘mutually exclusive’ if no two of the propositions is ever true at the same time.
These propositions are ‘jointly exhaustive’ if the sum of p(a1) and p(a2) and p(a3) etc. is 1.
These propositions form a ‘partition’ if they are both mutually exclusive and jointly exhaustive.
An example of a partition could be the propositions ‘The coin will land on heads’ and ‘The coin will land on tails’. p(heads) = 0.5 and p(tails) = 0.5. They can never be true at the same time so they are mutually exclusive, and they add to 1 so they are jointly exhaustive. Therefore, these two propositions form a partition.
Conditional probability is when you have two propositions, and know that one is true, and want to know the probability of the other GIVEN that the other is true.
p(a|b) can be read as ‘the probability of a given that b is known to be true’. It can be worked out with the formula p(a|b) = p(a AND b) / p(b).
Another way of thinking about it is that the proposition (a|b) is the degree of belief that an agent would have in a if and when it got the information that b was certainly true. Some basic facts that follow from this are the following:
p(a|b) is always between 0 and 1, because it is a proposition.
If p(a) is 0, then p(a|b) = 0 as well, because it doesn’t matter whether b is true, a will never be true.
If b is true whenever a is true, then p(a|b) = 1, because if b is known to be true then a must be true.
If b and c are always true or false at the same time, then p(a|b) = p(a|c).
So we have the probability distribution p, and we can use that to find the probability of something being true. But we’re going to define another probability distribution, pb, to be the probability distribution assuming that b is true (given that p(b) is greater than 0). So basically, pb(a) = p(a|b). This is the probability distribution obtained by ‘conditionalising’ on b. Whenever an agent receives a piece of information, it conditionalises it and works out a new probability distribution in this way.
Note that it’s also true that (pc)b(a) = pb AND c(a) = (pb)c(a), so it doesn’t matter in what order we conditionalise the information, if an agent receives more than one piece of information at a time. The fundamentals of probabilistic reasoning in AI is that an agent keeps receiving data and keeps working out a new probability distribution based on the previous probability distribution and the information that it has received.
The next theorem is probably the most important, but most simple, equation in probability theory.
p(b|a) = p(a|b)p(b) / p(a).
Or in its alternative, often more useful, form:
p(bi|a) = p(a|bi)p(bi) / ( p(a|b1)p(b1) + p(a|b2)p(b2) + … + p(a|bn)p(bn) )
This is typically applied when a is an observation statement and b1, b2, …, bn are all different hypotheses that can explain that observation. We’ll see what this means in practical terms in the next section on robot localisation. For now, just know that this is a very useful theorem.
Say we have a robot, in a room. The room has obstacles in. The robot has a map in its memory that tells it where the obstacles are, but the robot doesn’t know where it is in the room. The robot has three sensors on the front, left, and right sides. It can move forward, or it can turn left and right.
First lets talk about the surroundings. Lets make it a coordinate system, so the robot is in a square room, from (0,0) in one corner to (99,99) in the opposite corner. In addition to this, we have to know which direction the robot is facing, so we’ll also split up the direction into the range (0,99), where 0 is him facing one way, increasing as he turns more clockwise.
So lets let lijt be the proposition that ‘the robot is currently in location (i,j) facing in direction t’.
The collection of propositions l0,0,0, l0,0,1, l0,0,2, …, l99,99,98, l99,99,99 form a partition. The robot can be in any one of 100×100x100 = 1,000,000 poses, but it must be in one and only one. So these propositions all add up to 1.
We can always get less accurate information, say, li,j which represents the probability of it being in position (i,j) regardless of the direction it is facing is equal to the total sum of li,j,0, li,j,1, …, li,j,99.
So now we know how the robot will represent its belief of what pose it is in, we now consider what the initial data should be. For simplicity, we assume that the robot has no idea where it is in the room. It could be in any pose, with equal probability. However, the robot knows that there are objects in the room and it knows where the objects in the room are. So it knows it cannot be on top of an object, obviously. So it needs to be set up with its initial beliefs such that all positions where it knows there is an object must have a probability of 0 of it being there. The poses that represent it NOT being on an object must all have the same probability and form a partition.
This might seem a bit complicated, but consider it smaller scale, say a room that’s only 2×2. If there are no objects, the probability of it being in any coordinate is 1/4 because there are 4 coordinates. If there is something blocking the coordinate (1,1) then clearly it can’t be on (1,1), and there are 3 coordinates that it can be on, so each will have a probability of 1/3. It’s basically the same as this but on a 100×100 grid, and taking direction into account as well. Some coordinates have a probability of 0, but all the rest have the same probability and form a partition.
So we’ve set the initial state, now we need to consider how we’re going to update the robot’s belief state so that when it gets information, it updates the probability of where it is accordingly. We’ve already said that the robot has 3 sensors, one on the front, right, and left. So lets say that ‘o’ represents the proposition that a certain sensor has reported a distance. For example, ‘Left sensor reports a distance of 32 to nearest object or wall’.
But sensors are not accurate, they are noisy. So we need to model the values that the sensors return. We’ll say that they are representative of a normal distribution, and the sensor returns an integer between 0 and 2x (where x is the true distance), with mean x and standard deviation of about x/3.
Now we’ve been given an observation, o, we need to update our probability distribution, p, so that it takes o into account, ie. we need to work out the probability distribution po. Remember that po(li,j,t) = p(li,j,t|o). So we need to work this value out for every single value of i, j, and t, and update the 100×100x100 matrix storing our degrees of belief of where we are.
Using the alternative form of Bayes Theorem, we can see that:
p(li,j,t|o) = p(o|li,j,t)p(li,j,t) / ∑i’,j’,t’ p(o|li’,j’,t’)p(li’,j’,t’)
And we have to work this value out for every single one of our 1,000,000 poses. Which is a pretty ugly formula, but it makes sense. Lets go through it term by term, starting with the top. The top consists of p(o|li,j,t)p(li,j,t).
p(li,j,t) is the robots current belief that it is in pose i,j,t, so we can simply extract that value from the 100×100x100 matrix for each value.
p(o|li,j,t) is the probability of observation o, if we assume that the robot is currently in pose i,j,t. This is tricky and depends on the accuracy of the sensors, but we now have a model for this. Say, for example, that pose i,j,t is in the middle of the room facing a wall 50 squares away, and the forward sensor returns a value of 2, then this value will be very low because the probability of the sensor returning a value of 2 given that the robot is 50 squares away from the wall is very very low, on a normal distribution.
The bottom is the sum of, for all values of i,j,t, the above two values. So first you would loop through the matrix and work out p(o|li,j,t)p(li,j,t) for every possible pose i,j,t. Then you would sum up these numbers, and that would be the bottom of the equation every time. ∑i’,j’,t’ p(o|li’,j’,t’)p(li’,j’,t’) is a constant for each observation o.
But the robot is not static, the robot can move forward, and turn left and right, as we’ve seen. Lets use ‘a’ to represent the proposition of the robot performing an action, such as ‘Move forward 10 squares’. It may not be as simple as that. For example, what if the floor is wet, or the motors are running out of power, or the robot runs into a wall. It may run less than or greater than the 10 squares it’s supposed to. We can model this with another normal distribution.
So if we have li,j,t representing the robots belief that it is at position (i,j,t), we can use l`i,j,t to represent the proposition “After the action, the robots new pose will be (i,j,t)”.
The engineer programming the robot should be able to estimate the probabilities p(l`i,j,t|a AND li’,j’,t’).
To update the matrix in response to an action, we can use the following formula:
p(l`i,j,t|a) = ∑i’,j’,t’ p(l`i,j,t|li’,j’,t’ AND a)p(li’,j’,t’).
Where p(l`i,j,t|a) means the probability of the robots pose being (i,j,t) given that the action a has happened, p(li’,j’,t’) is the probability of the robot being in post (i’,j’,t’), and p(l`i,j,t|li’,j’,t’ AND a) is the probability of the robot being in pose (i,j,t) given that action a has happened AND the robot used to be in position (i’,j’,t’).
So to sum up,the robot should start with an equal belief that it could be in any position.
The robot has a 3 dimensional matrix, 100×100x100, storing its x-coordinate, y-coordinate, and direction.
Sensor readings and actions are not static, they may be less or more than what it should be.
The robots belief matrix is updated every time there is an observation or the robot follows some action.
To update on an observation this formula should be followed:
p(li,j,t|o) = p(o|li,j,t)p(li,j,t) / ∑i’,j’,t’ p(o|li’,j’,t’)p(li’,j’,t’)
And to update on an action this formula should be followed:
p(l`i,j,t|a) = ∑i’,j’,t’ p(l`i,j,t|li’,j’,t’ AND a)p(li’,j’,t’).
If you want more notes for COMP10020, check out my functions notes here: Functions
And you can check out Badgerati’s notes on Induction here: Part 1 – Part 2
If we have two sets, A and B, we can refer to the ‘Cartesian Product’ as being A x B. This means that if A = {1, 2, 3} and B = {a, b, c}, then A x B = {1a, 1b, 1c, 2a, 2b, 2c, 3a, 3b, 3c}. It’s the set of all pairs of values where the first is from A and the second is from B.
A relation from A to B is a subset of A x B. A relation can be said to connect values together, in a way, showing a relationship between them.
As an example, lets take two sets. A = {0, 1, 2} and B = {0, 1, 2}. Because the two sets are the same, we can say that this is a relation on A. Now consider the relation that is described by ‘is greater than’. We can start by looking at A x B, which is {(0,0),(0,1),(0,2),(1,0),(1,1),(1,2)(2,0)(2,1)(2,2)}. Now the relation R has to be a subset of this. One way to work out what the relation is is to look at each value in turn. So we look at the first value – (0,0). The first value is 0 and the second value is 0, so the relation we have is ‘0 is greater than 0′. This is not true, so (0,0) is not part of the relation.
The same is true of (0,1) which says ‘0 is greater than 1′, and (0,2) which says ‘0 is greater than 2′. But when we get up to (1,0) we see that yes, ‘1 is greater than 0′, and that value is in the relation.
In the end we find out that the relation is {(1,0),(2,0),(2,1)} which is a subset of A x B. We can show this as a digraph as follows, where we say ‘a and b are related by R if there is an arrow going from a to b’.

Relation Digraph 1
If a is related to b by relation R, we can also say aRb. So in this case, 1R0, 2R1, and 2R0.
There are a number of different properties that a relation can have. Let R be a subset of A x A.
R can be said to be reflexive if it is true that for every possible value of ‘a’ where ‘a’ is in ‘A’, it is true that aRa. This means that everything maps onto itself. On a digraph, this can be shown by the fact that every node should have a loop attached to it that points it back on itself. An example of a relation of this type could be ‘is the same height as’ – Everything is the same height as itself, but it can still be the same height as other things. The key feature, though, is that aRa is always true.
R can be said to be transitive if it is true that for every possible value of a, b, and c, where they are all in A, if aRb is true and bRc is true then it must be the case that aRc is true. An example of this kind of relation is ‘is smaller than’. If we have an object a that is smaller than b, and an object b that is smaller than c, then obviously object a must be smaller than object c. This is hard to spot on a digraph, but basically whenever there is an arrow from a to b and an arrow from b to c then there is also an arrow from a to c.
R can be said to be symmetric if all relations work backwards. For example if you have two nodes in your relation, a and b, and it’s true that aRb, then for the relation to be symmetric then it must also be true that bRa. ‘is the same height as’ is a good example of a symmetric relation – If a is the same height as b, then obviously b is the same height as a, and this is true for all objects. On a digraph, this can be shown by the fact that wherever there is an arrow between two nodes, there has to be another one going in the opposite direction.
R can be said to be antisymmetric if it is true that for all a and b in A, if aRb and bRa then a must equal b. ie. if the only relations that work in reverse are between two same nodes.
An equivalence relation is a relation which is reflexive, symmetric, and transitive. To show this we look at the three features and show this:
The idea of an equivalence relation is that if aRb then a and b are in some sense the same. This will usually divide set A into seperate ‘parts’. The idea is that each part contains elements that are equivalent to each other, so a and b are only in the same part if aRb.
So lets assume A is the set of all natural numbers, and R is the relation that states that ‘a – b is equal to 0 (mod 2)’. 0R0 is true, 0R1 is not, 0R2 is true, 0R3 is not, etc. 1R0 is not, 1R1 is true, 1R2 is not, 1R3 is true, etc. The digraph for this will look like this:

Relation Digraph 2
So we can see that if we continued, there would be a partition between the odd numbers and the even numbers. The odd numbers are equivalent in this relation, and the even numbers are equivalent in this relation. We say that it is divided into two R-equivalence classes.
We can talk about the R-equivalence class [a], which is the set of all nodes that are equivalent to a. So in this case, [0] = the set {0,2,4,6,8,10,…} and [1] = the set {1,3,5,7,9,11,…}. This way we can see that the set of all natural numbers, N, consists of all of the elements in each partition exactly once. N = [0]
[1].
Similarly, we can say that the conjunction, [0]
[1] is always equal to the empty set, because there is no value in [0] that will also appear in [1].
In fact, for any equivalence relation, the following is true:
[b] = the empty set. (ie. two values are either in the same partition or not.)A partition is a set of subsets that make up a set. For example, the set {[0],[1]} is a partition that makes up the set N.
There are a number of operations possible on relations, as they are sets. Let R1 and R2 be relations on A:
The union of R1 and R2 is described as R1
R2, and this is the same as it is in set theory – it’s every pair of things that is in either R1 or R2. For example, in a family, we can take the union of the relations ‘is the mother of’ and ‘is the father of’ to get the relation ‘is the parent of’.
The intersection of R1 and R2 is described as R1
R2, and this is the same as it is in set theory – it’s every pair of things that is in both R1 and R2. For example, the intersections of the relations ‘is the mother of’ and ‘is the father of’ is empty, because nobody is both the mother and father of a child.
The complement of R1 is described as Rc1, and is basically the set of all pairs that are in A x A but are not in R1. For example, the inverse of ‘is the parent of’ is ‘is not the parent of’
The inverse of R1 is described as R-11, and is every pair backwards. For example, if 0R3 is a relation in R1, then 3R0 is true in R-11. For example, the inverse of ‘is parent of’ is ‘is child of’.
If we consider the two relations ‘is a parent of’ and ‘is the mother of’, then we can define another relation ‘is the grandmother of’, by combining these by finding some value where a ‘is the mother of’ b and b ‘is the parent of’ c. Then a ‘is the grandmother of’ c.
In general, if we have two relations R1 from A to B and R2 from B to C, the relational composition of R1 and R2 can be described as R2 ∘ R1, which is a relation from A to C where there is some b in B where aR1b and bR2c.
For any relation, R, and property, P, we can say that the relation S is the P closure of R if S has property P, and if R’ is another relation with property P and R is a subset of R’ then S is a subset of R’ as well.
For example, lets assume that P is the property of transitivity. The P closure of R is the ‘transitive closure’ of R. When P is reflexivity then it is known as the reflexive closure, and when P is symmetry it is known as the symmetric closure.
The P closure is constructed by adding to R the minimum number of pairs (a,b) which will make a new relation with property P.
For example, let us consider the relation consisting of (0,1), (0,2), (2,0), and (2,2).
The reflexive closure is the relation consisting of (0,0), (0,1), (0,2), (1,1), (2,0), and (2,2). All we’ve added are the minimum pairs that are required to make it reflexive – that is, (0,0) and (1,1). All the rest has been left alone.
The transitive closure is harder to find. But consider each pair of pairs in turn to see if there is a way to get straight to them.
(0,1), (0,2) – Not needed. 1 is not equal to 0.
(0,1), (2,0) – Not needed. 1 is not equal to 2.
(0,1), (2,2) – Not needed. 1 is not equal to 2.
(0,2), (0,1) – Not needed. 2 is not equal to 0.
(0,2), (2,0) – (0,0) is needed.
(0,2), (2,2) – (0,2) is needed, but is already there.
And so on. The transitive closure of this relation is (0,0), (0,1), (0,2), (2,0), (2,1), (2,2).
The symmetric closure is much easier, it’s just the minimum pairs to ensure that every pair works backwards. In this case it is (0,1), (1,0), (0,2), (2,0), (2,2). We added (1,0).
Another common type of relation is the set of relations that share the properties of ‘less than or equal to’. Such relations are called order relations.
The simplest kind of order relation is called the ‘partial order’, as follows:
A partial order is a relation R which is reflexive, antisymmetric, and transitive.
A partially ordered set, or ‘poset’, has an additional property. Namely that for any integers, x and y, then either x is less than or equal to y, or y is less than or equal to x (or both). We can say this as ‘two integers are comparable with the relationship ‘is less than or equal to”.
A partial order relation in which any two elements are comparable is called a total order.
Given an ordering of a set, we can define the maximal and minimal elements.
An element of a poset A with partial order ≪ is said to be a least element if, for all a, the element ≪ a.
An element of a poset A with partial order ≪ is said to be a greatest element if, for all a, a ≪ the element.
A minimal element for a partial order ≪ of a set A is any element a such that if b ≪ a then a = b.
A maximal element for a partial order ≪ of a set A is any element a such that if a ≪ b then a = b.
In more basic terms, a minimal element is one that can’t have another element smaller than itself, and a maximal element is an element that can’t have another element bigger than itself. That’s not to say that the minimal element is the smallest and the maximal element is the biggest, because two elements may not be comparable in a partial order.
A partial order can have lots of minimal and maximal elements.
In general a poset A can be drawn with a partial order by placing ‘larger’ elements at the top and less ‘large’ elements further down. Draw the minimum number of lines so that if x is less than y then there is some sequence of lines descending from y to x. This is called a Hasse Diagram.
And that’s basically all there is to know about relations. I recommend reading pages 62-66 of the notes on order relations and minimal and maximal elements and partial orders and stuff because they don’t make a tiny bit of sense.
Before

After

inb4 “What’s the difference?”
Well, the first exam is over. It was also probably the one I was looking forward to the least. Well, second least. There’s always Artificial Intelligence to come… but that’s still a way off.
It went better than I expected. I have the question paper so I’ve been going through it to see where I’ve made stupid mistakes. I’m tentatively saying I got about…. 8/10 on the first question, 9/10 (I probably made at least one stupid mistake) on the second question, probably 7/10 and 7/10 on Thierry’s questions, and like 6/10 on Howard’s part. Grand total of 37/50. Which is quite better than I was hoping for.
It was 13 for a pass, or 33/50 for a total of 70% (counting coursework as well), so I’m pleased with how I’ve done.
I should also point out it’s the first exam in my life where I’ve ever actually not finished past the first hour. Which…. is saying something. Horrible questions.
So where to from here. Java on Tuesday, Distributed Systems on Thursday, then maths at some point and AI at some other point. Java should be easyish, I did the mock and it’s likely to be as hard or easier (Java 1 last semester was easier than the Java 1 mock). There’s not exactly much one can do to revise for java either, short of doing the assignments all over again. It’s all multiple choice though, so… oh wait, how much do I need?
Java is 50% exam 50% coursework, and I got a total of 81% on the coursework, so already have 40% which is a 2ii even if I don’t go to the exam. There’s 25 questions, so each question gives me 2% toward my final mark. So I need to get 15/25 correct to get 70%, which is a first. That shouldn’t be too problematic. I got 11/13 on the mock. I might have a quick read of my Generics notes. Generic classes and interfaces are things I do not totally understand. But ultimately aside reading the notes there’s nothing I can exactly do to further my understanding.
So most of my revision from now will be about Distributed Systems, which is another that I’m not certain on. I started writing up Barry’s notes for here, and will have to finish those over the next couple of days. In fact, what do I need for distributed systems? I have 92.5% on the coursework, and the coursework is 25% so that’s 23.125% already. I need 70-23.125=46.875% total. Given that the exam is 75% and I need 46.875%, I need a total score of 66.964% on the exam. Round it up to 67%. The exam is structured such that section A is out of 20 marks and is multiple choice, and section B is 3 questions of which we only answer one, one on Alvaro’s stuff, one on Barry’s stuff, and one on Steve’s stuff. So totally out of 40. 67% of 40 is 27 marks out of 40.
Hrm, I’m not confident at the moment. Hopefully revision will help.
But it should all go well. At least, until AI.
In the last blog post we looked at what the various components in the module, mostly languages, symbols, and alphabets. Now we’re going to look in more detail at how to define a language from symbols.
To define languages more easily we’re going to look at patterns. Patterns can also be called regular expressions. Computers, after all, can’t recognise set notation symbols such as {, }, or ∈.
So lets say we want the computer to recognise all instances of 01 repeated, so it would match 01, 0101, 010101, 01010101, 0101010101, etc. We notice that we need to concatenate 01 an arbitrary number of times. We can do this with (01)*. This is a pattern. The * is called the Kleene Star and will apply an arbitrary number of concatentations to whatever precedes it, in this case 01. Now the computer just has to compare: Is the first string equal to 0? Is the next equal to 1? And so on until the end of the string.
Other examples of patterns could be 1*0, which is the language of all strings consisting of any number of 1s, followed by a 0. Or there may be two Kleene Stars, such as the pattern 1*0*. This is any number of 1s followed by any number of 0s.
So we’ve looked at using the Kleene Star to create languages. But that’s not enough for some patterns. For example, what if we wanted the language of all strings consisting of just 0s and just 1s, ie. the language {0, 00, 000, 1, 11, 111, 1111, 0000, …}
We can also use the alternative, |, to represent a choice. For example, 0|1 is the language consisting of the string 0 and the string 1. The language 0*|1* represents the language described above – All strings that are either an arbitrary number of 0s or an arbitrary number of 1s.
We can define patterns recursively in the following way:
Base cases:
The character ø is the empty pattern.
The character ϵ is a pattern consisting of the empty word.
Every letter from a specified alphabet ∑ is a pattern.
Recursive cases:
If p and q are both patterns, then so is (pq). This is concatenation.
If p and q are both patterns, then so is (p|q). This is the alternative.
If p is a pattern, then so is (p*). This is the kleene star.
In order to match a word with a pattern, we can do it recursively, as follows:
Base cases:
The empty word ϵ matches the pattern ϵ.
The pattern p is a single character and s is that character.
Recursive cases:
The pattern p is a concatenation p=(p1p2) and there are words s1 and s2 such that s1 matches p1, s2 matches p2, and s = s1s2.
The pattern p is an alternative p=(p1|p2) and s matches p1 or p2, or both.
The pattern p is of the form p = (q*) and s can be written as s = s1s2s3…sn such that s1, s2, s3, …, sn all match q.
(Note that no word matches the empty pattern ø).
Given a pattern, p, we can refer to the language of all words that match that pattern as L(p). Or, again, we can define the language recursively, as follows:
Base cases:
L(ø) = ø.
L(ϵ) = {ϵ}.
L(x) = {x} for all x in ∑.
Recursive cases:
L(p1p2) = L(p1)L(p2)
L(p1|p2) = L(p1) ∪ L(p2)
L(p*) = (L(p))
A regular language is any language that can be described by a pattern. Note that there are some languages that can’t be represented by a pattern, such as the language of all strings that consist of an arbitrary number of 0s followed by the SAME number of 1s, ie. the language {01, 0011, 000111, 00001111, …}. This is because there is no way of remembering how many 0s there were. We will find other ways to express these languages later, in Context Free Grammars.
So from simple patterns we’ve shown how to build them up into more complex patterns, then we saw how to recursively tell if a word matches a pattern, and finally how to show and build up the languages of a pattern. This basically concludes what patterns are, and you should by now have some idea of how to build up a pattern given a rule. There’s a number of examples to ensure you understand in Andrea’s notes on page 19.
And of course, as usual, there are the fantastic Badgerati’s notes available at http://badgerati.wordpress.com/2009/05/11/comp10042-computation-regular-expressions/
In order for a computer to organise a textual input, the computer has to be able to recognise certain strings, such as, in parsing a programming language, it has to be able to find ‘if’ or ‘while’.
The first part of Computation deals with how we describe to a computer what string to look for, how to come up with the right description to solve the problem, and how to do more advanced things such as check whether a piece of code is correctly formed, making sure that opening and closing brackets match, or checking that a webpage is written in valid HTML.
Input to a computer is given in symbols, such as letters and numbers. A collection of symbols is an alphabet. An alphabet is usually denoted by the symbol ∑.
Symbols can be combined into strings, which can be called ‘words’. We can have words consisting of any number of symbols, including zero. A word with zero symbols is called the empty string, and is referred to by ϵ.
A space can be a symbol, and we can use the symbol ı_ı to show it. (Though in these blog posts, I’ll just use _ since there’s no single character for ı_ı).
The length of a string can be shown by a modulus symbol. For example, |s| is the length of the string s. |ϵ| = 0, obviously, because it is the empty string.
A language is a set of words that can be created with an alphabet. Some examples of languages are the empty language, {}, the language consisting of the empty word {ϵ}, or languages consisting of some number of strings, {ϵ,1,aba,abba,abbba,abbbba}. A language is referred to with the symbol L.
We can use set notation to define a language. For example, say we wanted the language of all words that could be made with the alphabet of the symbols ‘a’, ‘b’, and ‘c’. We can define this language as {s | s is a word over the alphabet {a,b,c}}. The | should be read as ‘where’, so this is ‘All languages consisting of the word ’s’, where s is a word over the alphabet {a,b,c}’
Two strings can be concatenated, that is, put together. So the strings Hel and lo could be concatenated into the single string Hello. To show this, we use a concatenation symbol as such: Hel · lo = Hello.
For multiple concatentation, we can use a power symbol. For example, h³ = hhh. h° = ϵ (ie. the string consisting of 0 numbers of h). We can also use n to denote an arbitrary number of concatenations. For example, an is the language {ϵ, a, aa, aaa, aaaa, aaaaa, …}.
Using set notation, we can write it as L = {an | n ∈ N}. This can be read as ‘The language L consists of all strings an where n is a whole number’. We could also exclude the empty string by saying ‘where n is a positive whole number’ with the symbol N+.
We can also concatenate two languages as follows: L1 · L2 = {st | s ∈ L1 and t ∈ L2}. ie. ‘the set of all concatenated strings where s is a string from L1 and t is a string from L2′.