Statistical Rethinking

Notes from the book and the YouTube lectures given by McElreath.

Can I come up with a question to answer, maybe stellar formation?

Fundamentally, we’re trying to ‘construct a posterior’ based on data. The prior is the posterior with no data. McElreath views Bayesian inference through the lens of scientific questioning. In his view a parameter is a conjecture: parameters are the pieces from which we build an estimator model that ‘generates’ the data we see.

The notion is to create generative models from DAGs or process models. These can generate ‘dummy data’. We then build statistical models that can analyse the synthetic data, and only then feed them real data.

How does McElreath see Bayesian statistics?

The posterior distribution is the whole mathematical object; you can try to summarise it, for instance with intervals, but there’s nothing special about any particular interval.
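A minimal sketch of this point, assuming the globe-tossing data used later in these notes (6 waters in 9 tosses) and a flat prior, which give a Beta(7, 4) posterior; the interval widths are arbitrary choices, not part of the posterior itself:

```python
import numpy as np

# Hypothetical posterior for the globe-tossing example: flat prior plus
# 6 waters and 3 lands gives Beta(7, 4). The samples are the object of
# interest; any interval is just one summary of them.
rng = np.random.default_rng(1)
samples = rng.beta(7, 4, size=10_000)

# Two equally arbitrary summaries of the same posterior.
print(np.percentile(samples, [25, 75]))     # 50% percentile interval
print(np.percentile(samples, [5.5, 94.5]))  # 89% percentile interval
```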

It seems that with McElreath’s approach we’re combining the model and the variable-inference view: our Θ is a set of values that define a generative model for the data we have.

Tsitsiklis says that a fundamental disagreement in inference is what the ultimate mathematical object we’re trying to arrive at is. In the classical approach, the quantity we’re looking for is a constant value; we don’t know it, but if we did it would be a point. In the Bayesian approach this object should always be modelled as a random variable, a distribution across values. It doesn’t mean nature is random, but my subjective experience of it is as a random variable.

What message is the model of the garden of forking data trying to relay?

For each conjecture that one has, imagine a ‘world’ of that conjecture and see how prevalent the data you’ve observed is in that world.

The more prevalent the data is in a conjectured world the more we can scale up the plausibility of such a world. Or, the more we could say our world is that world.

Things that can happen in more ways are more plausible, what McElreath refers to as unglamorous applied probability. You want the more plausible variations of your model to come to the fore. Now that I have the intuition that the model is a construction of parameters this makes sense, but it can be hard to really ground some of these things in words.

Example: the Binomial distribution will ‘count’ paths for you. Here paths are the same kind of mathematical objects that Tsitsiklis would consider, or Feller with his letters in cells.
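A small sketch of that path counting, assuming the 6-waters-in-9-tosses globe data and a conjectured p of 0.7 (both assumptions for illustration):

```python
from math import comb
from scipy.stats import binom

# Counting paths by hand for 6 waters (W) in 9 tosses, given a conjectured p:
# comb(9, 6) counts the orderings, p**6 * (1 - p)**3 is the plausibility of
# any single ordering.
p = 0.7
manual = comb(9, 6) * p**6 * (1 - p)**3

# The binomial pmf does the same path counting for you.
print(manual, binom.pmf(6, 9, p))  # both ≈ 0.2668
```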

McElreath’s steps for Bayesian data analysis?

$\frac{P(\bar{y}|\theta)}{P(\bar{y})}$ is a scaling factor on the prior $P(\theta)$.

How does grid approximation compare to a Laplace approximation?

Grid approximation discretises the parameter θ. For instance, in the water-on-the-globe example, it was a range of values for p. A Laplace (quadratic) approximation instead approximates the posterior with a Gaussian centred at its mode.
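A minimal grid-approximation sketch, assuming the 6-waters-in-9-tosses globe data and a flat prior (the book does this in R; this is a Python translation of the same idea):

```python
import numpy as np
from scipy.stats import binom

# Grid approximation for the globe-tossing posterior, assuming 6 waters in
# 9 tosses and a flat prior over p.
p_grid = np.linspace(0, 1, 1000)           # discretise the parameter
prior = np.ones_like(p_grid)               # flat prior
likelihood = binom.pmf(6, 9, p_grid)       # P(data | p) at each grid point
unstd_posterior = likelihood * prior
posterior = unstd_posterior / unstd_posterior.sum()  # normalise

print(p_grid[posterior.argmax()])  # posterior mode, ≈ 6/9
```

A Laplace approximation would replace this grid with a Gaussian fitted around that mode.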

What are the components of Bayes Theorem?

$$P(\theta|\bar{y}) = \frac{P(\bar{y}|\theta)P(\theta)}{P(\bar{y})}$$

Bayes theorem operates within a probability model or space.

In this context, the unobservable θ is a conjecture, something that can be inferred from an observation vector.

We need:

  * the probability of the observation vector given our conjecture,
  * the probability of the observation vector.

What’s an alternative formulation of Bayes’ Rule, using an example?

The common formulation is as such:

Suppose vampires are very rare in the population, say P(V) = 0.001, and we have a test that detects vampirism with a 95% true positive rate but a 5% false positive rate. We select a random person from the population and they test positive for vampirism: what is the probability they’re actually a vampire?

$$P(V | P) = \frac{P(V) P(P|V)}{P(P)} = \frac{0.001 * 0.95}{(1 - 0.001)*0.05 + 0.001*0.95}$$

We have to normalise by all the ways you could see the data (get a positive test). In this case, you can get a positive when you’re actually a vampire (0.001 * 0.95) or when you’re not a vampire and the test falsely comes back positive ((1 - 0.001) * 0.05).
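The same arithmetic as a small sketch, using the numbers from the formula above:

```python
# Vampire test example with the numbers from the formula above.
p_vampire = 0.001          # prevalence P(V)
p_pos_given_v = 0.95       # true positive rate P(+|V)
p_pos_given_not_v = 0.05   # false positive rate P(+|not V)

# Normalise by all the ways a positive test can happen.
p_pos = p_vampire * p_pos_given_v + (1 - p_vampire) * p_pos_given_not_v
p_v_given_pos = p_vampire * p_pos_given_v / p_pos
print(p_v_given_pos)  # ≈ 0.0187: still very unlikely to be a vampire
```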

Randomness as a property of information not of the real world.

I think a good example of this is the Monty Hall problem: we search for some ontological basis for updating our beliefs, but it doesn’t exist; it’s purely an informational change.

What is the geocentric notion McElreath’s trying to get across?

This is a model of prediction without explanation. Mechanistically wrong.

Standard Error distribution

When you see standard error or standard normal distribution, think: when I measure this value, I expect some random error to affect my measurement.


Lecture 1

Lecture 3

Workflow: from a scientific question, to the development of a causal model, and from there to a Bayesian estimator.

The Gaussian is a model with very few assumptions (a mean and a variance).

  1. State a clear question: describe the association between adult height and weight.
  2. Sketch your causal assumptions. Causal model: weight is some function of height.
  3. Use the sketch to define a generative model: assume that they affect each other with no specified mechanism.
  4. Use the generative model to build an estimator: we want to estimate how the average weight changes with height.

Conceptually useful to define unobserved things that might affect height (e.g. for causality).

Generative model starts out as $W = \beta H + U$ (U = unobserved stuff).

Estimator: $E(W_i|H_i) = \alpha + \beta H_i$.
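A hedged sketch of that workflow: simulate the generative model with made-up values for β and the noise, then check that a simple estimator recovers β from the synthetic data. Ordinary least squares stands in here for the Bayesian estimator used in the lectures; the height range and noise scale are assumptions for illustration.

```python
import numpy as np

# Simulate the generative model W = beta*H + U with assumed values, then
# recover beta from the synthetic data before ever touching real data.
rng = np.random.default_rng(2)
n, beta = 200, 0.5
H = rng.uniform(130, 170, size=n)   # heights in cm (assumed range)
U = rng.normal(0, 5, size=n)        # unobserved stuff as Gaussian noise
W = beta * H + U

# Estimator for E(W_i | H_i) = alpha + beta * H_i via least squares.
X = np.column_stack([np.ones(n), H])
alpha_hat, beta_hat = np.linalg.lstsq(X, W, rcond=None)[0]
print(alpha_hat, beta_hat)  # beta_hat should land near 0.5
```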

Notes

Chapter 1

Chapter 2

“Bayesian data analysis usually means producing a story for how the data came to be”. A Bayesian model begins with a set of plausibilities for each conjecture (priors).

Components of Model

Assign a plausibility to each value of p given the data (the observables). We were able to define the ‘state of the world’ through one variable p in the marble case (that is, the proportion that were blue).

The story, as McElreath puts it, is that we have two events, W and L; nothing else can happen. We are given a string of 9 events (in this example). Out of all the possible worlds where 9 events occur, with our parameter p defining which world is the case, what is the plausibility of the string of 9 events we have?

A binomial distribution counts the paths for you: for a given proportion of water to land, it tells you how many ways a given sequence could occur. So we have some variable p that constrains our sample space, and on determining a new path, p is ever present. We have W and L, which we might consider the data, the observables.

2.4

Our story is that we want to know the plausibility of p given some observed W out of N tosses.

The binomial distribution gives us a set of plausibilities for P(W,L|p). We just want this for every p.

The initial goal was to determine which conjecture, out of a set of conjectures, was the most plausible given some data. In the marble example, we had 4 possible conjectures. Moving on to the globe example, the conjectures are all the possible states of the world (literally); each state is defined by the proportion of water to land.

Plausibility for a given conjecture is proportional to the plausibility of the data, given that the state of the world is our conjecture, times the plausibility of that world being the case (the prior).

This prior can also be thought of as the prior number of paths (for some previous data say). So it’s just counting paths.
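A tiny sketch of “the prior is just previous path counts”, assuming the four-marble bag conjectures from chapter 2 (bags with 0 to 4 blue marbles), earlier draws of B, W, B, and then one more draw of B:

```python
# Counting paths for the marble example: bags of 4 marbles with 0..4 blue.
# Prior counts come from the earlier draws B, W, B; the ways to produce the
# new draw B simply multiply them, so the prior is just previous path counts.
blue_per_bag = [0, 1, 2, 3, 4]
prior_counts = [b * (4 - b) * b for b in blue_per_bag]  # ways to see B, W, B
new_ways = blue_per_bag                                  # ways to see one more B
updated = [p * w for p, w in zip(prior_counts, new_ways)]
print(prior_counts)  # [0, 3, 8, 9, 0]
print(updated)       # [0, 3, 16, 27, 0]
```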

Chapter 2

2.2 Building the model

Likelihood function: (1) the number of ways each conjecture could produce an observation.

Chapter 3

3.3