The goal of this online supplement is to provide a complete worked example of the implementation of a Metropolis-Hastings within Gibbs sampler for a two parameter logistic (2PL) item response theory (IRT) model in R.
My approach is to mimic the steps that I went through when I wrote the sampler for the chapter. My hope is that by doing so, I will make it easier to follow along at home. I have attempted to make these pages relatively self-contained. That said, it would likely be helpful if you read the chapter first.
I have broken up the content into separate posts:
Getting started with R
Building your first sampler
Improving the sampler's performance (Coming soon)
- Post 10: Over dispersion and multi-core parallelism
- Post 11: Replacing R with C
- Post 12: Adaptive tuning of the Metropolis-Hastings proposals
Miscellany (Coming later)
- Post 13: Optimizing your computer for R.
I chose to implement this supplement as a blog because it seemed like a good way to communicate with the people who use it. Please feel free to leave a comment on any of the posts. Thanks.
R is an interpreted programming language that makes it easy to think about statistics instead of thinking about programming. Unlike other programming languages, R is commonly used by typing commands one-at-a-time in an interactive session.
RStudio is a program that facilitates this interactive use of R.
We will use both.
In this post, we define the Two-Parameter Logistic (2PL) IRT model, derive the complete conditionals that will form the basis of the sampler, and discuss our choice of prior specification.
In order to check that an estimation algorithm is working properly, it is useful to see if the algorithm can recover the true parameter values in one or more simulated "test" data sets. This post explains how to build such a test data set for a 2PL model, gives two methods to check that the data are being generated properly, and examines the robustness of the checks by intentionally introducing bugs.
In the chapter, we argue that a useful way to develop a Metropolis-Hastings (MH) within Gibbs sampler is to split the code into two levels. The top level is the "shell" of the MH within Gibbs algorithm, which sets up the data structures, implements burn-in and thinning of the chain, and calls a lower-level function to do the actual sampling from the complete conditionals. In this way, we can test the shell of the algorithm separately from testing the complete conditional samplers.
This post quotes the code presented in the chapter, discusses a data structure choice, gives an example of how to run the "empty" shell, and gives an example of how to visualize output from the shell.
The previous post outlined the general strategy of writing a MH within Gibbs sampler by breaking the code into two levels: a high level shell and a series of lower-level samplers which do the actual work.
This post discusses the first of the lower-level samplers, the sampler for the person ability parameters. The chapter gives a complete treatment of how to write the sampler, please read it for the details. Here, we summarize the development of the sampler, check that the sampler code is correct, and give two examples of how bugs might be detected. The final bug will motivate the next post where we will refactor this specific person ability sampler into a general MH sampler.
A best practice in software engineering is to re-use code instead of duplicating it. One reason for this is that it makes finding and fixing bugs easier. You only have to find the bug once instead of finding it every place you duplicated the code.
Since the person ability, item discrimination, and item difficulty samplers will all use the same Metropolis-Hastings (MH) algorithm, it would be useful to define each of those samplers in terms of a generic MH sampler.
In this post, we refactor the person ability sampler from Post 4 into: (1) a generic MH sampler, (2) a function to draw proposals for the person ability parameters, and (3) a function to calculate the complete conditional density qfor the person ability parameters. We then implement a new version of the person ability sampler and compare its output to the sampler from Post 4.
In this post we refactor the proposal function from the previous post into a generic normal proposal function. This allows us to implement normal proposals for the (yet to be developed) samplers for the item parameters without duplicating code.
Our approach is to write a function which returns a function. We begin with a toy example to explain why that would work, implement a function that returns a normal proposal function, and then check that the refactored function works.
In this post, we will build the samplers for the item parameters using the generic functions developed in Post 5 and Post 6. We will check that the samplers work by running them on the fake data from Post 2, visualizing the results, and checking that the true values are recovered.
The Bayesian 2PL IRT model we defined in Post 1 set a hyper-prior on the variance of the person ability parameters. This post implements the sampler for this parameter as a Gibbs step. We will check that Gibbs step is working by running it on the fake data from Post 2, visualizing the results, and checking that the true value is recovered.