In this post I give brief instructions on how to calculate the smallest effect size of interest from G*Power output. My instructions are largely based on an excellent blog post on “The 20% Statistician”, a blog by Daniel Lakens. Mr. Lakens is an experimental psychologist at the Human-Technology Interaction group at Eindhoven University of Technology, The Netherlands.
In his post he explains how to infer the smallest effect size of interest (SESOI) from a power analysis. To do this, he visualizes a power analysis for a two-tailed t test by plotting the distribution of Cohen’s d given n participants per group, effect size d, and alpha level α = 0.05. From this it is possible to infer the critical d value, the value that is equivalent to the critical t value you know from the respective figures in G*Power. Go to his Shiny app to have a look at the graph.
Usually, I use G*Power to infer how many observations I need to detect an expected effect size d, say 0.5, with α = 0.05 and 90% power using a two-sided t test. Even though it is not advised, for good reasons, G*Power also allows you to calculate the observed (post hoc) power given the observed effect size, the number of observations, and α. The former calculation tells you that you need 86 participants per group, while the latter returns a power of 0.90 for a total sample size of 172 (86 per group) and the values above.
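If you want to reproduce these numbers outside of G*Power, here is a minimal sketch in Python using the statsmodels package; the package and its TTestIndPower class are my choice of tool, not something G*Power prescribes:

```python
from math import ceil

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# A priori: sample size per group for d = 0.5, alpha = 0.05, power = 0.90
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.9,
                                   alternative='two-sided')
print(ceil(n_per_group))  # 86, matching G*Power

# Post hoc: power given 86 participants per group, d = 0.5, alpha = 0.05
power = analysis.solve_power(effect_size=0.5, nobs1=86, alpha=0.05,
                             alternative='two-sided')
print(round(power, 3))  # ~0.903
```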
However, sometimes it is important to know what the SESOI is. If you want to register a randomized controlled trial at the AEA RCT Registry, for example, you are asked to calculate the minimum detectable effect size (MDES) of your most important analyses, which is possibly equivalent to the SESOI. G*Power does not provide the SESOI directly, but you can easily calculate it by hand from the information G*Power provides.
What you want to find, basically, is the critical d instead of the critical t. Think of the graph output by G*Power: the green line marks the critical t value, and the blue area to its left depicts the Type II error rate. When your t value lies to the right of this line, you reject the null hypothesis, either correctly or incorrectly: depending on whether there is a true effect or not, you fall into one of the four cells describing the relationship between the null hypothesis and the decision about it.*
To transform this critical t value (tcrit) to a critical d value (dcrit, or dSESOI), I came up with the following formula:
dSESOI = tcrit * d / δ
where
dSESOI is the smallest effect size of interest, i.e. dcrit;
tcrit is the critical t value that is output by an a priori or post hoc power analysis;
d is the assumed (a priori) or observed (post hoc) effect size; and
δ is the noncentrality parameter.
I derived this formula working with the examples provided by Mr. Lakens in his blog post. I do not know exactly how and why it works because I lack the statistical knowledge, so I would be happy about any useful comments and explanations. Until then, I should say that I am not sure how well this generalizes to other tests.
I provide an example from Mr. Lakens’ post:
Given a two-sided t test with effect size d = 0.5, α = 0.05, and 86 observations per group, the actual power is 90%. The critical t value is tcrit = 1.974 and the noncentrality parameter is δ = 3.279. Both are output parameters from G*Power (see figure below).
Inserting these values into the equation gives:
dSESOI = 1.974 * 0.5 / 3.279 = 0.301
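These output parameters and the resulting dSESOI are easy to reproduce outside of G*Power. Here is a minimal sketch in Python using scipy, assuming the standard noncentrality parameter of an independent-samples t test, δ = d * sqrt(n1 * n2 / (n1 + n2)); the variable names are my own:

```python
from math import sqrt

from scipy import stats

n1 = n2 = 86  # observations per group
d = 0.5       # assumed effect size
alpha = 0.05  # two-sided alpha level

df = n1 + n2 - 2                         # degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df)  # critical t value, ~1.974
delta = d * sqrt(n1 * n2 / (n1 + n2))    # noncentrality parameter, ~3.279

d_sesoi = t_crit * d / delta
print(round(d_sesoi, 3))  # 0.301
```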
This value, 0.301, is the critical d value that Mr. Lakens finds. You can calculate it yourself with his excellent Shiny app (be aware of the differences between the graph in the figure above and the one provided in the app: they look similar, but the axes differ). Effect sizes smaller than 0.301 will never be statistically significant with this type of analysis. Consequently, Mr. Lakens argues, smaller effect sizes are uninteresting to the researcher using this analysis. This is often an implicit decision a researcher makes: a researcher interested in smaller effect sizes should have used more observations. However, Mr. Lakens also correctly states that the SESOI can be based on theoretical reasoning as well. At best, the theoretical and the ‘implicit’ SESOI are identical.
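The ‘never statistically significant’ claim can be illustrated numerically. The sketch below converts an observed d into its t statistic via t = d * sqrt(n / 2), which holds for two equal groups of size n; the effect sizes around the threshold are my own illustrative choices:

```python
from math import sqrt

from scipy import stats

n = 86                                  # observations per group
t_crit = stats.t.ppf(0.975, 2 * n - 2)  # critical t value, ~1.974

for d_obs in (0.29, 0.30, 0.31):
    t_obs = d_obs * sqrt(n / 2)  # t statistic implied by the observed d
    print(d_obs, round(t_obs, 3), "significant" if t_obs > t_crit else "n.s.")
# 0.29 and 0.30 stay below t_crit; 0.31 crosses it
```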
Note that when you use G*Power to compute the required effect size given α, power, and sample size, you do not calculate the SESOI. Inserting the parameters from above yields a required effect size of d = 0.497. This, however, is not the SESOI; rather, it is the effect size you assume to be true, and it is the smallest effect size detectable with 90% power. The same applies to the equivalent calculation of effect size and experimental-group mean in Stata.
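For comparison, this G*Power computation can also be reproduced with statsmodels (again my choice of tool); note that the result, roughly 0.497, is the effect size detectable with 90% power, not the SESOI of 0.301 derived above:

```python
from statsmodels.stats.power import TTestIndPower

# Solve for the effect size detectable with 90% power
# at 86 observations per group and alpha = 0.05.
d_required = TTestIndPower().solve_power(nobs1=86, alpha=0.05, power=0.9,
                                         alternative='two-sided')
print(round(d_required, 3))  # ~0.497, not the SESOI
```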
Please note that most of this is based on the post by Daniel Lakens, and I am not providing many additional insights here. I just thought that this formula would help to get the SESOI from G*Power output quickly, and I hope that someone benefits from it. Most of the credit goes to Lakens, though.
In the future I want to find out if and how I can calculate the SESOI for other tests, for example simulated power analyses for tobit models.
* More precisely, if the null hypothesis is true, you incorrectly reject it if the t statistic falls in the red area (|tobs| > tcrit). This is a Type I error (False Positive). If the null is true and the t statistic falls in the grey area under the red distribution line (|tobs| < tcrit), you correctly fail to reject it (True Negative). If the null is false, i.e. the alternative hypothesis is true and there is a true effect, you incorrectly fail to reject the null hypothesis, i.e. you incorrectly ‘accept’ it, if your t statistic is lower than the critical t value. You then make a Type II error (False Negative); this is indicated by the blue area. If the alternative hypothesis is true and you observe a t value greater than the critical t value, you make the correct inference and reject the null hypothesis (True Positive). Note, however, that you never know the ‘true’ state of nature, i.e. whether the null is true or false. The frequentist approach only allows you to be fairly sure that, in the long run, say across 100 identical experiments, you do not commit a Type I error more than 5 times and a Type II error more than 10 times.
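If you want to see these long-run error rates emerge, here is a simulation sketch; n, d, and α follow the example above, while the number of simulated experiments and the seed are arbitrary choices of mine:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
n, d, alpha, n_sims = 86, 0.5, 0.05, 10_000

type_i = type_ii = 0
for _ in range(n_sims):
    # World 1: the null is true (no difference in means).
    a, b = rng.normal(0, 1, n), rng.normal(0, 1, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        type_i += 1  # false positive
    # World 2: the alternative is true (effect of d standard deviations).
    a, b = rng.normal(0, 1, n), rng.normal(d, 1, n)
    if stats.ttest_ind(a, b).pvalue >= alpha:
        type_ii += 1  # false negative

print(type_i / n_sims)   # close to alpha = 0.05
print(type_ii / n_sims)  # close to 1 - power = 0.10
```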