Reading through this, I came up with more questions about the methodology. Again, I'll take the standpoint of someone reviewing this article for publication.
4. I am asking for more data than I can probably use; I'm expecting most programs (as I'm contacting them directly) to send me just the two scores. However, this study is designed based on Miloro's original paper assessing Part 1 scores and USMLE Step 1; these data are practically identical to the ones he reported. The biggest issue with this study, I think, will be power; but if I can get enough responses, I would like to look at specifics as well. To be honest, I'm not a great statistician, so once the data are obtained I'll be meeting with our statistics department to determine which tests would be most appropriate (but I'll start looking into them; thanks for the heads-up).
I would highly recommend consulting with your statistics department prior to collecting data. To keep the study design tight, you should know exactly which statistical tests and transformations you plan to perform in advance. Otherwise, given the amount of data you're gathering, you'll run into problems with multiple comparisons and "p-hacking." Fifteen variables generate 105 pairwise correlations; a Bonferroni-style correction across all of them puts the effective "significant" p-value somewhere around .0005. Conversely, if you ran all of those comparisons at a p-value of .05, there would be roughly a 99.5% chance of at least one spuriously "significant" result arising purely by chance. Basically, given the highest possible n you could get, there is no way to properly power the study as described.
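The arithmetic behind those two figures can be checked directly. A minimal sketch, assuming all pairwise correlations among 15 variables are tested and (for the familywise error estimate) treated as independent:

```python
from math import comb

alpha = 0.05
n_vars = 15
n_tests = comb(n_vars, 2)  # pairwise correlations among 15 variables: 105

# Bonferroni-corrected per-test significance threshold
bonferroni = alpha / n_tests  # ~0.000476, i.e. roughly .0005

# Familywise error rate: chance of at least one spurious "significant"
# result if all 105 tests are run at alpha = .05 (independence assumed)
fwer = 1 - (1 - alpha) ** n_tests  # ~0.9954, roughly 99.5%

print(n_tests, round(bonferroni, 6), round(fwer, 4))  # → 105 0.000476 0.9954
```

The independence assumption overstates the familywise error somewhat when the variables are correlated, but the qualitative point stands: at this many comparisons, an uncorrected .05 threshold all but guarantees spurious findings.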
In terms of repeating Miloro's study, is there a reason to do that? Even using 90+ on the old boards as a cutoff, while sensitive, is inordinately non-specific: residents who failed their first attempt were split nearly 1:1 between >90 and <90. If you move beyond the "first attempt" metric in Miloro's study, even this disappears. Given that the NBDE is a criterion-referenced test, and the "scores" handed out were highly variable (note that higher scores are even more variable in criterion-referenced tests), this is unsurprising. The same problems exist with the CBSE, along with other, new issues (content and construct validity). Basically, the problems with doing this are explained in this comic:
1. "Cutoff" I suppose is misleading; I want to know the average score of dental students taking this test (the 65 they quote is based on medical students). One of the other big metrics I want to examine is whether there is a difference in CBSE scores between dental programs that have integrated classes with medical schools and "traditional" GP programs with limited basic science. Depending on that average, we could then stratify scores into a likelihood-of-passing type report, similar to Miloro's paper.
Seeing whether the test is truly standardized for our unique applicant pool (does dental school curriculum style influence score? USMLE passage rate?) seems like a better question to ask.
3. I'm excluding 4-year programs because they don't take USMLE Step 1. Conversely, it would be interesting (and another area of research) to look at 4-year programs and the CBSE scores they consider "good." As part of a 6-year program, I wanted to start with just those; perhaps I - or someone else - could look at 4-year programs later. The issue is that there isn't a great standardized test for 4-year programs; perhaps the OMSITE (but it is very program-dependent).
There is a great standardized test: the ADAT. It's statistically constructed to differentiate applicants, tests material specific to dental school curricula (allows PDs to equate things like ranks at different dental schools, or non-ranked programs with ranked programs), and is developed specifically for dental residency admission.
Finding a silver-bullet test to distinguish who will and who will not fail the USMLE for OMS-residency is going to be nearly impossible due to the variability of both dental school and OMS-residency curricula.
2. I'll think more on this; I haven't considered anything past what the correlation between the two tests would be.
You already described them: you want to try to develop a statistical "cut-off score" to ensure Step 1 passage.