Last time, we learned about the basic building block of our theory: the channel, or conditional probability distribution. Now we'll learn about two special cases of this which we'll use often enough that they'll get special names and representations:
- States are plain old "probability distributions". The unconditional ones.
- Events are statements or observations about outcomes.
Functionally, events and states have opposite roles and flavours.
States
A channel gives a distribution on the target, for every point we could start from in the source. But where do we start from?
The disease→test channel we saw last time gives us the probability that someone will get a certain test result, conditional on whether they have the disease or not.
How many positive tests do we actually expect to see, in practice? That depends on how many people actually have the disease. Simply, unconditionally, with nothing left to be given.
There's a specific kind of channel we can use for that: a channel that starts from a set that has only one point in it.
It's hard to see from the histogram, even though we've made it taller here, but there's a 2% probability of having the disease. (If it were zero, we'd have used a dashed gray arrow.)
We don't need to label that one point. There are no other options to distinguish it from. It's always given. We always start from it.
This kind of channel is called a state, because it models a given state of the world.
This state gives the probability that someone will have the disease, without the answer depending on any other context.
Pushing states forward
So, what's the overall probability of a positive test?
A state is just a type of channel, and we can compose it with other channels. That's how we can transform our state on the disease outcomes into a state on the test results.
The thickness of the bottom path makes it clear that almost all the time, the person being tested doesn't actually have the disease, and the test accurately reports that they don't: we'll see mostly negative tests.
The probability of a positive test comes out noticeably higher than the 2% probability of having the disease. That's because there are two paths that reach a positive test.
There's only a 2% chance we'll have the disease, and in that case we'll test positive most of the time – that's the upper path to a positive test. But there's also a chance the test will be a false positive, in the 98% of cases where we don't have the disease – the lower path. Because healthy people vastly outnumber sick ones here, most of the positive tests are actually false positives!
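To make the multiply-along, sum-across recipe concrete, here's a small Python sketch of pushing the 2% disease state forward through a disease→test channel. The sensitivity and false-positive rate below are illustrative assumptions, not the actual figures from the diagram.

```python
# Pushing a state forward through a channel.
# The state is a distribution on {disease, healthy}; the channel gives
# channel[x][y] = P(test result is y | disease status is x).

prior = {"disease": 0.02, "healthy": 0.98}  # the 2% state from above

channel = {
    "disease": {"positive": 0.90, "negative": 0.10},  # assumed sensitivity
    "healthy": {"positive": 0.05, "negative": 0.95},  # assumed false-positive rate
}

def pushforward(state, channel):
    """Multiply along each path, then sum the paths sharing an endpoint."""
    out = {}
    for x, p_x in state.items():
        for y, p_y_given_x in channel[x].items():
            out[y] = out.get(y, 0.0) + p_x * p_y_given_x
    return out

print(pushforward(prior, channel))
# positive ≈ 6.7%, negative ≈ 93.3%
```

With these made-up numbers, positives come from two paths: 2% × 90% of true positives plus 98% × 5% of false positives – and the false positives dominate, just as the text describes.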
Events
This morning, I was still half-asleep when I wandered to my fridge to get something to drink. There were three options: coffee with milk, decaffeinated coffee with milk, and a coffee-flavoured protein shake. (I enjoy variety.)
Now it's 30 minutes later, and I can't remember which one I drank. I was half-asleep, and they all taste pretty similar.
Normally, I'd reach for the coffee about 80% of the time, but only 10% each for the decaf or the protein shake. These prior beliefs can be represented by a state.
An event is a statement, the answer to a true-false question, about a set of outcomes. Here are two events that are relevant to our current problem:
Both of these are simple functions: exactly one arrow begins at each starting point.
On the left, we map the protein shake and the decaf coffee to the "true" point, because they satisfy the claim "this lacks caffeine".
On the right, we see the opposite: only the coffee is mapped to "true", in answer to "this has caffeine".
These two events are mutually exclusive, since one can't happen when the other does: either we drank something that has caffeine, or something that doesn't.
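In code, these two events are just functions from drinks to 0 or 1 (the drink names here are my own shorthand labels):

```python
# The two events as plain functions from drinks to {0, 1}:
# 1 means the statement holds for that outcome, 0 means it doesn't.
drinks = ["coffee", "decaf", "shake"]

def lacks_caffeine(drink):
    return 1 if drink in ("decaf", "shake") else 0

def has_caffeine(drink):
    return 1 - lacks_caffeine(drink)

# Mutually exclusive (and exhaustive): for every drink,
# exactly one of the two statements is true.
for d in drinks:
    assert lacks_caffeine(d) + has_caffeine(d) == 1
```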
Fuzzy events
Like states, events are also a special type of channel.
We just saw an event, which is a simple function. Recall that simple functions are a special case of channels where all the weight is on one arrow for each starting point. That means we can re-draw our event as such a channel.
Finally, since there are only two ending points (true and false), and each starting point must form a distribution over them that sums to 1 (or 100%), we really only need to remember one of the two probabilities.
We can imagine events that are only kind of true or false, by giving weights between 0 and 1. For instance, we said our "has caffeine" event was all-or-nothing: the coffee maps to 1, and the other two drinks map to 0. But a decaf coffee still has a trace amount of caffeine, maybe around 3% of a regular coffee. We could model that as a fuzzy event instead:
On the right, the fuzzy event gives the decaf a weight of 0.03, instead of 0 in the sharp predicate on the left.
Why might fuzzy events be useful? Well, information can be unreliable, and statements about the world (not just beliefs about states of the world) can be uncertain. Fuzziness allows us to represent that.
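Side by side, the sharp and fuzzy versions of the "has caffeine" event differ only in the decaf's weight (using the rough 3% estimate from above):

```python
# Sharp vs. fuzzy version of the "has caffeine" event.
# Each weight is a per-outcome truth value in [0, 1].
sharp = {"coffee": 1.0, "decaf": 0.0, "shake": 0.0}
fuzzy = {"coffee": 1.0, "decaf": 0.03, "shake": 0.0}

# Unlike a state, an event's weights need not sum to 1 -
# events aren't distributions.
print(sum(fuzzy.values()))  # 1.03
```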
Event transformation
When fuzzy events are allowed, we can pull an event back through a channel, similarly to how we were able to push a state forward.
This doesn't work if we want to use only non-fuzzy, all-or-nothing events: if we pull a non-fuzzy event back through a channel, we generally get a fuzzy one!
This is yet another example of channel composition.
For example, this is how we would pull back an event about receiving care in the disease network. Depending on how positive or negative test results determine whether care is received or not, that event corresponds to a fuzzy event stating whether the test is positive or negative. In turn, that corresponds to a fuzzy event stating whether the patient has the disease.
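As a sketch, here's what pulling the sharp event "the test is positive" back through the disease→test channel looks like, again with illustrative channel numbers rather than the diagram's actual ones:

```python
# Pulling an event back through a channel: each source point gets the
# expected truth value of the event under the channel's distribution.

channel = {
    "disease": {"positive": 0.90, "negative": 0.10},  # assumed sensitivity
    "healthy": {"positive": 0.05, "negative": 0.95},  # assumed false-positive rate
}
test_positive = {"positive": 1.0, "negative": 0.0}  # sharp event on test results

def pullback(channel, event):
    """Multiply along, sum across - composition run in the other direction."""
    return {x: sum(p * event[y] for y, p in dist.items())
            for x, dist in channel.items()}

print(pullback(channel, test_positive))
# {'disease': 0.9, 'healthy': 0.05}
```

Note how the all-or-nothing event on test results becomes a fuzzy event on disease status: "has the disease" is now 90% true, "healthy" is 5% true.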
The probability of an event
Given my usual chances of drinking each thing, what's the probability I drank something without caffeine?
This is just another case of composition – multiply along, sum across. We're specifically interested in the probability we'll end at "true", so let's calculate just that part.
Since the event is a function, each of its arrows has weight 1, or 100%. That means nothing happens when we multiply along each path, and we can just sum over the blue arrows: 10% + 10% = 20%.
This might seem pretty obvious: I had a 10% chance of drinking each of the two caffeine-free drinks, so there's a 20% chance I drank something caffeine-free.
More generally, we can compute the probability, or validity, of a fuzzy event by composing it with a state, to get a fuzzy event with a single arrow, i.e. a statement of a single probability.
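The same calculation in code – weight each drink's probability by the event's truth value and sum (drink labels are my own shorthand):

```python
# Validity of an event: compose the state with the event -
# multiply each outcome's probability by its truth value, then sum.
prior = {"coffee": 0.8, "decaf": 0.1, "shake": 0.1}
lacks_caffeine = {"coffee": 0.0, "decaf": 1.0, "shake": 1.0}

validity = sum(prior[d] * lacks_caffeine[d] for d in prior)
print(validity)  # 0.2 - the 20% from the text
```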
States versus events
While states and fuzzy events are both special types of channels, they have different properties and serve different roles.
- States are more ontological (world model / prior beliefs), whereas events are more epistemic (statements or observations).
- States must be consistent (distributional), whereas events are potentially inconsistent (not distributional). There's nothing stopping us from having a fuzzy event that maps one outcome to 0.6 and another to 0.8.
- State information moves forwards through the network of channels and represents causality. Event information moves backwards and represents the influence of epistemic context.
Conditioning
Earlier I claimed a 20% chance I drank something caffeine-free, based on my prior knowledge of how I usually behave. That's fine… but based on how I'm feeling now, I'm sure that what I drank didn't have caffeine in it. In that case, clearly the chance that I drank one of the two caffeine-free drinks isn't just 20%. Intuitively, it should actually be 100%: 50% for the protein shake, and 50% for the decaf coffee, since we've eliminated the 80% chance that what I drank was the regular coffee.
The calculation we just intuitively described is conditioning: if I know that a certain event has occurred in a given context, I can use that information to update a state to suit the context.1
I know I drank something with no caffeine. How does that affect the prior state on ?
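Here's a sketch of that update in code: weight the prior by the event, then divide by the event's validity so the result is a distribution again.

```python
# Conditioning the prior on the event "lacks caffeine".
prior = {"coffee": 0.8, "decaf": 0.1, "shake": 0.1}
lacks_caffeine = {"coffee": 0.0, "decaf": 1.0, "shake": 1.0}

# The validity (20%) becomes the normalizing constant.
validity = sum(prior[d] * lacks_caffeine[d] for d in prior)
posterior = {d: prior[d] * lacks_caffeine[d] / validity for d in prior}

print(posterior)
# {'coffee': 0.0, 'decaf': 0.5, 'shake': 0.5}
```

The coffee's 80% is eliminated, and the two caffeine-free drinks split the remaining probability 50/50 – matching the intuitive answer above.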
- 1. This depends on the state and the event being defined on the same set of outcomes, just like channel composition only works when the target of the first channel is the same as the source of the second one, so they fit together.
