Another issue for these studies is following up too soon (e.g., checking for regret after a year).

And indeed, for the AAP adoption of the affirmative model (Rafferty et al., 2018): "Remarkably, not only did the AAP statement fail to include any of the actual outcomes literature on such cases, but it also misrepresented the contents of its citations, which repeatedly said the very opposite of what AAP attributed to them." https://www.tandfonline.com/doi/abs/10.1080/0092623X.2019.1698481 Lots of experts, but still... false!

This is really impressive, thank you so much for doing this!

I hope that this is ok - it is possible that you phrased it this way to simplify things, but I thought that the explanation regarding the p-value was not entirely accurate.

A p-value is not the probability that the null hypothesis is false (it is also not the probability that the null hypothesis is true). The p-value is the probability of getting a result as extreme as, or more extreme than, the one in the sample, assuming that the null hypothesis is true. It basically tells you how marginal your result is under that assumption, which is used as an indication of how likely it is that the null hypothesis is true, given the data we got in our sample (though this is not exactly what it is either). For example, if you are trying to see if a certain species is, on average, the same size as another similar species, you can look at the distribution of sample means under the assumption that there is no difference between the two populations, and see where your sample mean falls in that distribution - if it is far out in the tails, you will get a small p-value and reject your null hypothesis (a significant result).
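To make the species example concrete, here is a minimal sketch of a permutation test in Python. It is only an illustration: the measurements are made up, and the function name `permutation_p_value` and its parameters are my own, not anything from the article. The idea is to shuffle the group labels many times (which is what "no difference between the two populations" would allow) and count how often the shuffled difference in means is at least as large as the one actually observed.

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration: it estimates how often a
    difference at least as large as the observed one would appear if
    the group labels were assigned at random (the null hypothesis).
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_resamples + 1)

# Made-up body lengths (arbitrary units) for two similar species.
species_a = [10.1, 9.8, 10.4, 10.0, 9.9, 10.3, 10.2, 9.7]
species_b = [11.0, 11.3, 10.9, 11.2, 11.1, 10.8, 11.4, 11.0]

p = permutation_p_value(species_a, species_b)
print(f"p-value: {p:.4f}")
```

A small value here says only that a gap this large would be rare if the two species were really the same size; it is not the probability that they are the same size.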

Another, probably more important, comment is that while a significance threshold (alpha level) of .05 is commonly used, a smaller threshold is sometimes used when the risks of falsely rejecting the null hypothesis are high. For example, when a new treatment has dangerous side-effects, a researcher may use a smaller alpha level (such as .01) so that only a strong indication that the treatment is effective will lead to implementing it.

Finally, while p-values may give some indication as to whether the results were likely to be obtained if there was no effect, the p-value depends on the sample size as well as on the size of the effect, and therefore it does not tell you how strong the effect is. In other words, a treatment may be found to lead to a statistically significant improvement, but the improvement itself may be very small. To indicate how big or small the actual improvement following the treatment is, one needs to look at the effect size, not just at the p-value. So when "gender affirming treatments", which have long-term side-effects and medical risks, are offered, one would wish that the data behind them show that they don't only offer a statistically significant improvement (i.e., small p-value), but also that the improvement is big enough considering the costs (i.e., large effect size).
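A rough sketch of the sample-size point, with hypothetical numbers: the same tiny improvement (a standardized effect size, Cohen's d, of 0.05) is tested with 50 people per group and with 50,000 per group. For simplicity this assumes a two-sample z-test with equal, known standard deviations, which is a simplification and not how any particular study was analyzed. The helper name `two_sample_z_p_value` is my own.

```python
from math import erfc, sqrt

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized
    groups, assuming a known common standard deviation (z-test)."""
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / standard_error
    # For a standard normal, 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)).
    return erfc(z / sqrt(2.0))

# Hypothetical treatment: improves the outcome by 0.05 standard deviations.
mean_diff, sd = 0.5, 10.0
cohens_d = mean_diff / sd  # effect size; does not depend on sample size

p_small_study = two_sample_z_p_value(mean_diff, sd, n_per_group=50)
p_large_study = two_sample_z_p_value(mean_diff, sd, n_per_group=50_000)

print(f"Cohen's d: {cohens_d:.2f}")
print(f"p with 50 per group:     {p_small_study:.3f}")
print(f"p with 50,000 per group: {p_large_study:.2e}")
```

The effect size is identical in both cases and is very small by conventional standards; only the p-value changes, because a larger sample makes even a negligible difference easy to detect.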

Otherwise I thought that this was really incredible work, and so very needed in this area. Thank you so much!

Thank you for the clarification. My introductory statistics course was many years ago. I was trying to give a very high level view of a complex topic in limited space.

I thought you did an amazing job! I just thought that the comments may add something, but I also tend to be very nitpicky with these things, so maybe it's just me. I hope this is ok.

Tl;dr: Scientifically speaking, there's 50 different ways to be bullshitted, and if you can think of 3 of them, you're a genius.

(With apologies to Mickey Rourke and Body Heat)

Fantastic, thank you!

As always justdad7, a terrific contribution.

I’m going to share this with a couple medical professionals I’ve been speaking with, thank you!

Thanks for freely providing such a thorough piece.

There seem to be a couple of typos/missing words, unless I'm mistaken:

1) A 1 percent change in something is probably to significant but a 10 percent change would be.

2) Puberty blockers dysphoria by a clinic in Amsterdam in the 1990s and were not widely used in North America until at least 2009.

Looking forward to reading your next article.

This was incredibly useful, well thought out and written, thank you so much.

Thank you for this. Perfect title.

Very well done. Thank you.

Your SEGM link in the Control Groups section is incorrect.

Thank you. I rechecked all the links.