On dealing with problematic data
A case example from East Palestine on how sometimes decent concepts can still come from so-called "bad" people.
The following is another East Palestine-related post. However, the point of this post is to argue that information should not be judged by who uses or discusses it. Rather, a person who disseminates information may be criticized for using poor data. It’s the data that should drive how we look at studies, not those who may utilize flawed studies. Also, the following includes remarks from someone who reached out to explain some of the water sample data reported by Pace Analytical. I greatly appreciate that this person provided their own industry-based perspective on some of these results, which helped add some clarity.
Interpretation of data relies heavily on the accuracy of the data collected. In the case of studies, the data can only be extrapolated to the degree allowed for by the participants or samples included.
For instance, a study looking at the partying behaviors of college students isn’t going to tell us much about how people in their 30s view such proclivities, and it would be rather reckless to extrapolate that data to make assumptions about these 30-something adults.
Unfortunately, not all data is clean, and where the data is messy it becomes difficult to discern whether something is just noise or an actual signal.
This is one of the reasons I have stayed away from excess mortality data regarding the past few years- the data is just way too messy and everyone would end up interpreting it in their own way like some post-modernist piece of “art”.
In such cases where the data may be difficult to interpret, heuristics may take over, and assessment of the data may be relegated to the nature of the messenger.
Consider many federal agencies which have come under necessary scrutiny over the past few years such as the CDC, NIH, FDA, and EPA.
These agencies have lost a lot of public trust- if the public ever had any- due to their weaponization of science and policy to dictate public health and lifestyles.
In many cases COVID policies were made on the back of horrible data and false pretenses, and yet many representatives of these alphabet bureaucracies may obfuscate such criticisms in order to dictate how we all should live.
But this warranted concern can also lead to a distrust of any information that comes from these agencies, and in many cases people may choose to do whatever is opposite of what these agencies suggest.
This comment is generally made tongue-in-cheek, but some people may also take such remarks to heart.
Such reasoning runs into its own issues of shortcuts over critical thinking, in which the message is interpreted through the lens of the messenger rather than through the actual message.
That is, the way we look at data may be based on who provides the data rather than what the actual data suggests. Fauci or Walensky are not criticized merely because much of what they state is false, but because the data they rely on are heavily flawed. It’s the data that’s the problem, and these representatives serve more as a face for underlying issues in how science gets disseminated through the use of lab coats and appeals to authority.
One example of this recently occurred with respect to the East Palestine derailment.
Last month a congressional hearing was held in which Debra Shore, the Administrator for Region 5 of the U.S. EPA, was asked by Senator Capito about why the EPA did not test for dioxin sooner.
Shore’s answer was that the EPA was looking for primary indicators of combustion products from the controlled train fire, which included phosgene and hydrogen chloride.
The argument here is that the lack of concerning levels of these primary indicators would suggest that dioxin levels would be relatively much lower, as dioxin should be a secondary indicator.
Of course, such an answer would raise some comments about the EPA not wanting to test for dioxins because they would end up finding high levels (a few people in the comments suggested as much).
Although the lack of dioxin testing warrants criticisms, Shore’s answer also isn’t inherently wrong.
That is to say, if the main products of vinyl chloride combustion, which are phosgene and hydrogen chloride as remarked on in a previous post, are not present at high levels, one shouldn’t expect compounds produced in lesser amounts to somehow occur at higher levels.
Such a comment wouldn’t be met with positive feedback given these circumstances and the EPA’s overall failure to act more quickly, but the problem here is determining whether Shore’s hypothetical makes sense.
This is the problem when assuming that one’s position may warrant immediate dismissals of proposed hypotheticals or models.
It’s clear that the EPA hasn’t done enough to deal with the fallout in East Palestine, but that also doesn’t mean that all comments that come from the EPA are inherently incorrect or untrustworthy.
Now, it should also be clear that dioxin production would rely on many factors, so even this hypothetical can come under scrutiny when one raises questions about the reliability of the air monitoring being conducted and why dioxin levels weren’t tested irrespective of this hypothetical. Remember that one cannot claim that East Palestine is safe when not enough was done to ensure that such a declaration can be made.
But I also don’t expect members of Congress, some of whom grilled Mark Zuckerberg and revealed how little they know of technology, to be able to ask pertinent questions and allow for a good back and forth (I’m pretty sure not many of them know much about dioxin aside from the fact that they are toxic- how are they toxic? Don’t ask them…). This may be something worth saving for another time…
Remember that the important factor in all of this is the information, and not the person who disseminates it. So while these sorts of representatives can come under necessary scrutiny, not everything they state is inaccurate if there is evidence to back it up.
But for a clearer example we can look at some of the questions with respect to some of the water samples taken from East Palestine.
As an example, early in my reporting I raised the point that some of the water samples collected by third parties hired by Norfolk Southern had some problems, including air bubbles found in one sample vial that may interfere with sample results. Refer to the prior link for additional links to those Pace Analytical results.
Around that time I was fortunate enough to have someone who worked in the field of analytical chemistry reach out to me to provide some clarity on the report. The pertinent portion of that email can be seen below, which broke down some of the issues with the report (the Summary is bolded below, with some of the technical details included in the other sections):
ISSUES I HAVE WITH PACE DATA - but I do not find that they indicate the data is bad.
POOR BLANKS - their lab blanks are not perfect. Some common compounds are noted in Pace lab blanks. However, this does not compromise the data if these compounds were not found in the samples analyzed. If the same compound found in the laboratory blank was found in the samples from E Palestine, one would not be guaranteed the sample had that compound or if it was an artifact from the laboratory analysis. If the compound was not found in any sample - there is no issue with the E Palestine sample as the compound did not show up. It just indicates Pace is not controlling their Volatile Blank water as well as they should.
POOR DAILY CALIBRATION - similar to the blanks, the lab runs an injection daily that is at the mid value of the curve. Recovery of this data point has set values - "normally" about 70-130 percent of the true value. You never get "100%" of the value you injected, as there is a bit of fluctuation built into an analysis where you are measuring at a parts per billion level. Pace did not have every compound within this window - so you will note a CH or CL in the Qualifier column. Like the blank sample, if the compounds are not found in the sample, it has little effect on the sample results; it just indicates that Pace is a bit sloppy, not ensuring a perfect daily calibration before running samples. At my lab we never ran samples unless the daily calibration was perfect - some days it took a bit of instrument fixing to achieve this - so it cost us more money. We passed this onto the customer with higher prices - but many of our customers paid our prices due to the extremely high quality control. Pace looked at the E Palestine data, noted the compounds that had the daily calibration outliers were not found, so did not bother to run them under a perfect daily calibration as it did not affect the data if the E Palestine samples did not have that compound. Look at page 37 of the Pace report. Here you see the effect of the calibration issue - several of the samples with a CH in the Qualifier column were spiked in at 10 ppb, but detected at 14.2, 14.5 or 13.3 ppb - higher than they should have been found. The qualifier (due to the daily calibration issue) shows these are biased high due to the poor daily calibration.
MATRIX SPIKE & MATRIX SPIKE DUPLICATE - noted as MS & MSD samples.
An additional quality check is to collect 1 in 20 samples with sufficient volume to run the sample 3 times, with two of those times adding a known amount of known "problem" compounds to ensure the data is consistent. This confirms that the sample itself is not causing problems with the analytical result. Apparently the sampling team did not collect sufficient sample to allow the Pace laboratory to perform an MS/MSD analysis. This is not an issue with the laboratory - just poor technique by CTEH who collected the samples. It does not prove the sample data is bad, it just means there could be an issue.
SUMMARY
The quality control at Pace was not ideal, but did not really give bad data for the samples run as it did not affect those compounds found. CTEH did some "sloppy" water collecting - but I did not see in the report any explanation given. I was also disappointed that Pace did not give a simple summary of what was found, and a brief explanation of the items I noted above. A "great" lab would have summed up the analysis based on all the data, given a brief review, and commented on any major issues noted.
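The recovery and qualifier logic described in the email boils down to simple arithmetic. Here is a minimal, hypothetical Python sketch of how a percent-recovery check and an MS/MSD relative percent difference might be computed; the 70-130% window and the CH/CL codes come from the email above, while the function names and structure are my own illustration, not anything Pace actually uses.

```python
def percent_recovery(measured_ppb, true_ppb):
    """Recovery of a spiked compound as a percentage of the true (spiked) value."""
    return 100.0 * measured_ppb / true_ppb

def qualifier(measured_ppb, true_ppb, low=70.0, high=130.0):
    """Flag a calibration-check result outside the acceptance window.

    'CH' (biased high) and 'CL' (biased low) mirror the qualifier codes
    mentioned in the email; the 70-130% window is the "normal" range
    quoted there, not a universal rule.
    """
    recovery = percent_recovery(measured_ppb, true_ppb)
    if recovery > high:
        return "CH"
    if recovery < low:
        return "CL"
    return ""

def relative_percent_difference(ms, msd):
    """RPD between a matrix spike and its duplicate - the consistency check
    that could not be run here because too little sample was collected."""
    return 100.0 * abs(ms - msd) / ((ms + msd) / 2.0)

# The page-37 example from the email: spiked at 10 ppb, detected at 14.2 ppb
print(percent_recovery(14.2, 10.0))  # 142.0 -> above the 130% ceiling
print(qualifier(14.2, 10.0))         # CH, i.e. biased high
```

Running the page-37 numbers through this sketch reproduces the email’s conclusion: a 10 ppb spike detected at 14.2 ppb is a 142% recovery, outside the 70-130% window, hence the CH flag.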
So in this case the sample collection appears to have been poor, along with a few sloppy practices by the scientists at Pace Analytical, but overall the findings likely weren’t affected to a serious degree.
Norfolk Southern, and CTEH in particular, don’t appear to have a good track record when it comes to public perception. At the same time, this lack of trust and poor sampling doesn’t mean that the data reported is inherently flawed- it’s being able to understand what you can from the data provided that determines what can be extrapolated.
In this case, even though there were a few issues, the compounds in question don’t appear to have been affected, which may suggest that some of these questionable compounds may be within acceptable ranges.
Again, this is based on the limited data provided at the time (and the limited data available 3 months out now).
This is an important reminder that data, even when provided by people who would otherwise be untrustworthy, shouldn’t automatically be dismissed. Rather, understanding why such data may be flawed is far more critical.
I sincerely thank the person who reached out to provide the information with respect to assessing the Pace Analytical data! It helped put the results into perspective and clarify whether the data would be considered seriously unreliable. So thank you to the individual!
Substack is my main source of income and any support helps me in my daily life. If you enjoyed this post and other works please consider supporting me through a paid Substack subscription or through my Ko-fi. Any bit helps, and it encourages independent creators and journalists such as myself to provide work outside of the mainstream narrative.