SELECTing for the right results
The second portion of the SELECT GLP-1 study is taking media outlets by storm. Unfortunately, there's a lot in this study that warrants scrutiny.
Earlier this week, various researchers and scientists met in Venice, Italy for the 31st European Congress on Obesity (ECO2024), a routine meeting in which people within the medical community discuss current breakthroughs and ongoing research in obesity and weight management.
Sounds like a pretty good deal to be able to spend some time in Venice! It also serves as a pretty big example of a conflict of interest, as both Lilly (maker of Tirzepatide) and Novo Nordisk (maker of Ozempic/Wegovy) were two of the major sponsors of ECO2024.
Therefore, it wouldn’t be too hard to assume that such a meeting would prop up these GLP-1 RAs as the breakthrough miracle drugs that they are made out to be.
And coincidentally, a study published in Nature Medicine¹ on Monday further reinforced the purported benefits of these medications. The study, part of the overall SELECT trial, suggests that those who take the highest available dose of Semaglutide (i.e. Wegovy) were able to maintain long-term weight loss.
Ongoing debate has questioned how long people should stay on these medications, with the consensus being that patients may have to stay on them for the rest of their lives. Of course, this assumption was based on scant data suggesting that people who stop these medications gain the weight back, leading to the strange assumption that people should just take these medications for years to come. At the same time, growing concerns about possible safety issues have cast doubt on the actual long-term use of these medications.
Given that this is a new frontier, there is not much data regarding these possible safety issues either. A study such as this one would therefore serve to encourage not only further use of GLP-1 RAs, but their use for the rest of people's lives.
Now, I have covered a portion of the SELECT trial previously, so it shouldn't come as a surprise that this study has some of its own problems.
Study Overview
Note that the participants in this study are the same as those in the prior one focused on cardiovascular events (referred to here as the "MACE-related study"). Strangely, however, this new study is the only place where a deeper demographic breakdown is provided.
For instance, there was a higher proportion of Asian participants within the lower BMI categories relative to the higher categories. Also, note that the higher BMI categories skewed more towards female even though males made up a large portion of the participants in the study:
Of note, in the lower BMI categories (<30 kg m⁻² (overweight) and 30 to <35 kg m⁻² (class I obesity)), the proportion of Asian individuals was higher (14.5% and 7.4%, respectively) compared with the proportion of Asian individuals in the higher BMI categories (BMI 35 to <40 kg m⁻² (class II obesity; 3.8%) and ≥40 kg m⁻² (class III obesity; 2.2%), respectively). As the BMI categories increased, the proportion of women was higher: in the class III BMI category, 45.5% were female, compared with 20.8%, 25.7% and 33.0% in the overweight, class I and class II categories, respectively.
It's strange that only select demographic data is provided across the two studies, and this pattern repeats in how the data is presented, making it rather difficult to get a complete picture of this study.
When the Wegovy group was compared to the placebo group, there was a noticeable decrease in body weight relative to baseline up to the 65-week mark. Afterwards, those on Wegovy appear to have reached a weight-loss plateau. In contrast, minimal weight loss occurred within the placebo group.
There are a few things here worth pointing out. Note that, for some reason, most participants appeared to stop losing weight around the 65-week mark. This effect, also known as the "Ozempic Plateau", has been reported in the media: people just seem to stop losing weight on Ozempic after some time.
The reason for this plateau is unknown, and for the most part many of the explanations don't seem particularly compelling, as they refer to things that are not pharmacological in nature. Because most participants are supposed to be on the same dosage of Semaglutide, there shouldn't be a dose-dependent reason for the plateau, although the authors note that only around 77% of participants reached the full 2.4 mg dose. Weight loss among partial-dose patients isn't reported separately, so that data isn't available.
It's strange that no explanation is put forth for why this plateau is occurring in some patients. Instead of referring to this as a plateau, the researchers describe the phenomenon as a "sustaining" of weight loss. A strange choice of words, and it shows how researchers can engage in semantic wordplay to change how data is interpreted:
For those in the semaglutide group, the weight-loss trajectory continued to week 65 and then was sustained for the study period through week 208 (−10.2% for the semaglutide group, −1.5% for the placebo group; treatment difference −8.7%; 95% CI −9.42 to −7.88; P < 0.0001).
So what some may consider a plateau worth investigating, others may see as a sustained loss of weight…
But there's more strangeness with this dataset. For some reason the number of patients fluctuates in both the Semaglutide and placebo groups, and not in a way that simply trends towards lower participant numbers.
If you didn't notice, look at the bottom of the graphs from Figure 1, which notes the number of participants whose data was used for each time plot. At specific timepoints, there are sporadic jumps in participant numbers:
For the most part, we should expect participant numbers to decline over time as participants are lost to follow-up across repeat visits. This raises some questions as to why these sudden jumps in participants are occurring. More importantly, note that these sporadic jumps occur at what the researchers refer to as "landmark visits", likely because these are the critical timepoints used for calculations.
So what exactly is happening to cause these sudden bumps in participant numbers, especially at the year-2 (week 104) mark, where the number is far higher than at the week-52 and week-78 marks?
The authors make a remark within their Methods section which raises some red flags:
Missing data at the landmark visit, for example, week 104, were imputed using a multiple imputation model and done separately for each treatment arm and included baseline value as a covariate and fit to patients having an observed data point (irrespective of adherence to randomized treatment) at week 104. The fit model is used to impute values for all patients with missing data at week 104 to create 500 complete data sets.
Now, I'm not very good with statistical analyses, so take my interpretation with a grain of salt. My (amateur) interpretation of this passage is that the authors "filled in" missing participant data at these landmark visits.
That is, the endpoints for hundreds of patients appear to have been artificially constructed.
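To make the mechanics concrete, here is a minimal sketch of what regression-based multiple imputation looks like, assuming a simple linear model with the baseline value as the covariate, as the Methods describe. The variable names and numbers are invented for illustration, and five imputed datasets stand in for the paper's 500; this is the general technique, not the authors' actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: percent weight change at week 104 for one treatment arm.
# Some participants missed the landmark visit, so their values are missing (NaN).
baseline_bmi = rng.normal(37, 5, size=200)
week104 = -0.3 * (baseline_bmi - 37) - 10 + rng.normal(0, 3, size=200)
week104[rng.random(200) < 0.15] = np.nan  # ~15% missing at the landmark visit

observed = ~np.isnan(week104)

# Fit a simple linear model on participants WITH an observed week-104 value,
# using the baseline value as a covariate (as described in the Methods).
slope, intercept = np.polyfit(baseline_bmi[observed], week104[observed], 1)
residual_sd = np.std(week104[observed] - (slope * baseline_bmi[observed] + intercept))

# "Impute" each missing endpoint by predicting it from baseline and adding noise,
# repeating the process to create several complete datasets (the paper uses 500).
estimates = []
for _ in range(5):
    completed = week104.copy()
    n_missing = int((~observed).sum())
    completed[~observed] = (slope * baseline_bmi[~observed] + intercept
                            + rng.normal(0, residual_sd, size=n_missing))
    estimates.append(completed.mean())

# The reported estimate pools the analyses across the completed datasets.
print(f"Pooled week-104 mean weight change: {np.mean(estimates):.2f}%")
```

The technique itself is commonplace in clinical research, but it illustrates the point: for participants who missed a landmark visit, the value feeding into the headline numbers is a model prediction, not a measurement.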
Even if we assume that the weight-loss plateau would persist past the 65-week mark and would therefore still lead to the same results as reported above, the inclusion of constructed data should raise some concerns, especially when these timepoints are the endpoints the researchers use to present their results. These endpoints are obviously the ones those in the media cite, and so it seems fitting that this artificially constructed data should be criticized, even if the practice is commonplace in clinical research.
In essence, this should bring into question the validity of these 104-week and 208-week endpoint values since it’s clear that some of this data was not collected from actual participants.
In a callback to the findings from the other SELECT study (the MACE-related one), recall that I commented on the subgroup analysis showing the lines of both the placebo and Semaglutide groups converging. In such a study we should expect the lines to diverge, given the assumption that the treatment group should experience fewer MACE relative to the placebo group.
It’s curious that this period also coincided with the jump in numbers boxed in red above, so were some of these values artificially constructed as well?
The serious issue here is that it almost feels as if we are looking at the exact same dataset but through different lenses and with different interpretations. The MACE-related SELECT study doesn't show the same degree of sporadic participant numbers, and this may be because reporting of MACE relies on some degree of documentation rather than lab visits alone.
But even then, why does the same dataset seem to sporadically collect different data? Why does the same dataset come with various inconsistencies in how it is being analyzed and interpreted?
And to that point, let's go back to a prior comment I made about how muddy a picture this study presents.
Remember that the MACE-related SELECT study noted that around double the number of participants within the Semaglutide group dropped out due to adverse events relative to the placebo group, and this comment is also made in the current study:
We reported in the primary outcome of the SELECT trial that adverse events (AEs) leading to permanent discontinuation of the trial product occurred in 1,461 patients (16.6%) in the semaglutide group and 718 patients (8.2%) in the placebo group (P < 0.001)21.
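As a quick sanity check, the quoted discontinuation counts are large enough that the reported P value is easy to reproduce. Here is a minimal sketch, assuming the roughly 8,803 semaglutide and 8,801 placebo participants reported in the original SELECT publication (the quoted percentages back-calculate to almost exactly those denominators):

```python
from statsmodels.stats.proportion import proportions_ztest

# Discontinuations due to adverse events, as quoted from the paper.
discontinued = [1461, 718]
# Group sizes assumed from the primary SELECT publication (~8,803 semaglutide,
# ~8,801 placebo): 1461/8803 = 16.6% and 718/8801 = 8.2% match the quoted rates.
group_sizes = [8803, 8801]

z, p = proportions_ztest(discontinued, group_sizes)
print(f"z = {z:.1f}, p = {p:.3g}")  # p falls far below 0.001, matching the paper
```

So the near-doubling of AE-related dropouts itself is not in dispute; the issue is what happens to the breakdown of those events between the two papers.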
Strangely, this remark leaves out the breakdown of the adverse events. Again, remember that the MACE-related study mentioned that 880 patients within the Semaglutide group experienced gastrointestinal problems.
Now, I'll take the conservative position that the "880 patients" did not all drop out because of gastrointestinal problems specifically (i.e. not strictly because of one AE in particular), but rather that, of the people who dropped out because of AEs, 880 reported gastrointestinal problems. Note: 880 people, not 880 individual reports of GI-related adverse events.
Since we are looking at the same dataset, wouldn't it therefore make sense that this number should also appear when analyzing adverse events for the current study? That is, shouldn't we expect that at least 880 different participants reported GI-related AEs?
This data is reported in the Supplemental material as serious adverse events stratified by BMI category, but surprisingly the numbers here are different:
When looking at the Semaglutide group, if we add up the number of participants who reported a GI-related AE (in the N column), do we get 880?
No. In fact, we get around 342 participants, less than half of what was reported in the MACE-related SELECT study.
Remember that the 880 value is a conservative estimate of people who dropped out specifically due to AEs, while the data presented in the Supplement for the current SELECT study doesn't differentiate between those who experienced an AE and discontinued the study and those who experienced an AE and still participated.
For all intents and purposes, the data from this Supplement should be more generous, and should give a number higher than the 880 participants reported in the MACE-related study.
AGAIN, we are looking at the same dataset, yet even the reporting of AEs differs between the two studies. This study doesn't even contain a category for gallbladder-related disorders, even though these were mentioned in the MACE-related study! It does have a category for hepatobiliary disorders, but that would make the inconsistencies even worse, given that such a category would include liver, pancreatic, and gallbladder-related complications, not just gallbladder issues alone.
So why so much inconsistency between the two studies? How are participants being categorized, and what data is being collected or excluded between the two studies in a way that may present an appearance of good results?
And quite frankly, how many of the people who are propping up this study took much time to assess these egregious issues? This is why we are seeing such failures in both science and journalism: outlets don't do their due diligence to assess the data they are presented with.
In Addition
I could go on about other issues with this study, but I’ll try to summarize some more points below.
Note that the subgroup analysis suggested a remarkably large decrease in weight among females:
That’d be a pretty good way of marketing these medications to women.
But if you were paying attention to my comment above, note that the breakdown of women across the BMI categories is uneven. Even though females made up around a quarter of the study population, they were nearly half of the participants within the highest BMI category (class III obesity). They also made up around a third of those within the class II obesity category.
Essentially, women were overrepresented within the higher BMI categories, and therefore were likely the ones to benefit the most from taking Semaglutide. And so it may not be that females as a subgroup see a huge benefit from this drug; rather, the fact that many of the women in this study were within the highest BMI categories may provide a better explanation.
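A toy simulation makes the point. In the invented data below, weight loss depends only on baseline BMI class and not at all on sex, yet because women are concentrated in the higher BMI classes, the naive sex-subgroup comparison still shows women losing noticeably more weight. The class proportions and effect sizes are made up for illustration and are not taken from the trial.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Invented setup: weight loss depends ONLY on BMI class (0 = overweight ...
# 3 = class III obesity), with higher classes losing more. Sex has no effect.
is_female = rng.random(n) < 0.28  # ~28% women overall, echoing the trial's skew
female_probs = [0.15, 0.20, 0.30, 0.35]  # women skewed toward higher classes
male_probs = [0.35, 0.30, 0.20, 0.15]    # men skewed toward lower classes
bmi_class = np.where(is_female,
                     rng.choice(4, size=n, p=female_probs),
                     rng.choice(4, size=n, p=male_probs))
class_effect = np.array([-7.0, -9.0, -11.0, -13.0])  # % weight change by class
weight_change = class_effect[bmi_class] + rng.normal(0, 3, n)

# The naive subgroup analysis: compare mean weight change by sex.
print(f"women: {weight_change[is_female].mean():+.1f}%")
print(f"men:   {weight_change[~is_female].mean():+.1f}%")
# Women appear to respond better, even though sex played no role in the model.
```

With these made-up numbers, women come out more than a percentage point ahead of men purely through composition, no sex-specific pharmacology required.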
It's funny that the authors don't mention this in their interpretation, and yet they include a comment that the lower weight loss from Semaglutide among Asians within this study may be because Asians were overrepresented within the lowest BMI category:
Asian patients were more likely to be in the lowest BMI category (<30 kg m⁻²), which is known to be associated with less weight loss, as discussed below.
It's not that Asians may respond less to Semaglutide, but that Asians had lower BMIs within this study and therefore would likely have seen far less weight loss overall.
This is one of the reasons why I have grown more critical of aggregate analyses of clinical data. Individual variances regress to some norm when coalesced with the other individuals of a subgroup. Coalesced subgroups lose even further nuance and variance with additional aggregation, and eventually you reach the point of "drug vs. no drug", examining whether a drug had an effect at the population level in a way that misses the individual trees for the overall forest.
Sure, maybe this is easier for clinical analysis and statistical measures, but by missing even slight variances between subgroups it becomes easy to report something that may not actually be present, or to extrapolate meaning that may be better explained by other factors, as the sketch below shows.
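As a companion to the earlier sketch, the reverse failure mode is just as easy to produce: pool a strongly responding subgroup with a non-responding one, and the aggregate "drug vs. no drug" comparison reports a moderate effect that describes neither group. Again, the numbers here are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented example: half of the treated population responds strongly (~ -12%),
# the other half barely responds at all (~ -1%, the same as placebo).
responders = rng.normal(-12.0, 3.0, 5_000)
non_responders = rng.normal(-1.0, 3.0, 5_000)
treated = np.concatenate([responders, non_responders])
placebo = rng.normal(-1.0, 3.0, 10_000)

# The aggregate comparison reports a clean "drug effect" of about -5.5%...
print(f"aggregate treatment difference: {treated.mean() - placebo.mean():+.1f}%")
# ...a number that describes neither subgroup: one lost ~12%, the other ~0%.
print(f"responders: {responders.mean():+.1f}%  "
      f"non-responders: {non_responders.mean():+.1f}%")
```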
I'll end my post with one final comment: exactly how long is "long-term"?
We've grown quite critical of the term "long-term" and how nebulous a phrase it has become. Anything more than a few days or months can be made out to be "long-term", even though the term is being used to refer to medications or procedures that are intended to last for years, even a lifetime. After all, most clinicians and journalists have suggested that you would need to be on these GLP-1 RAs for the rest of your life to prevent gaining the weight back.
So, on the grand scale of many years how meaningful is a study that only encapsulates 4 years of data?
Really, it's only 2 to 3 years of data, since patient drop-off accelerated after the 2-year mark. I'd be hard-pressed to argue that there is sufficient participant data at the 208-week mark when nearly a tenth of participants from both groups were lost by that time (and this isn't even counting whether this number may be inflated as well!).
It also shows how inconsistent the data presentation is when most of the endpoint measures presented are for the 104-week mark and not the 208-week mark.
This makes me even more skeptical towards the presentation of this study as a "4-year" study, but that hasn't stopped those in the media from running with this number without providing even a hint of scrutiny.
It's just another good reason why people should be careful in trusting the media's reporting on studies. Not many do it well, and quite frankly, not many are reading the studies they cover anyway.
Hopefully I've made it clear that there's quite a bit more going on than what is being reported. Serious questions should be raised about why two separate studies relying on the same dataset present different bits of information and appear to have some inconsistencies.
It's why we need more due diligence in assessing information rather than rushing to report on studies that can have serious social and medical ramifications. As of now, people are making a huge deal of these studies' findings even though we're nowhere near having enough long-term data, despite what is being reported.
As I have mentioned before, remember that we produce our own GLP-1, and before we go all-in on the pharmaceutical route it's important to consider what is going on with the GLP-1 we should be producing. We should understand why GLP-1 dysfunction may be happening within ourselves if we are to solve the problem that is being managed by exogenous use of a simulacrum.
That’s the real long-term study that needs to be conducted.
If you enjoyed this post and other works please consider supporting me through a paid Substack subscription or through my Ko-fi. Any bit helps, and it encourages independent creators and journalists such as myself to provide work outside of the mainstream narrative.
Ryan, D.H., Lingvay, I., Deanfield, J. et al. Long-term weight loss effects of semaglutide in obesity without diabetes in the SELECT trial. Nat Med (2024). https://doi.org/10.1038/s41591-024-02996-7