Lab Series #18: The Levey-Jennings Chart

LJ chart. Source
A large part of this blog's readership is connected to health care, and many readers work in laboratories that handle medical samples. A core requirement of this kind of work is producing consistent results. But how do you actually measure whether the results you produce are consistent and reliable? One approach is to measure the variation and check whether it falls within acceptable limits. There is a long list of tests that can be used to estimate variation and decide whether the results produced are indeed acceptable. The point of this post is to talk about one such tool, the "LJ chart", simply because it is used almost universally to establish the trustworthiness of results. But before I jump into the calculations, it is important to understand some background concepts.

A Scenario to Consider...

Let us start with a scenario. Assume that you have received a blood sample for measuring HIV viral load, and the method of estimation you are using is a real-time PCR based kit. Any method of estimation uses some sort of chemistry, involving a set of steps, each of which uses reagents and some delivery devices. In this example, we aliquot several reagents of predetermined volume sequentially, using a pipette, and then look for sequence amplification by real-time PCR. Every analytical measurement has an error built into it. Consider this: the International System of Units (SI) defines a total of 7 base units and 22 named derived units. Based on these standards, a first copy is made, which is then used to calibrate instruments further downstream. For example, a copy of the international standard of 1 kg is issued, against which the national-level standard is calibrated. The process of calibration inevitably introduces a small error. The national standard, in turn, is used to calibrate regional systems and the industries that offer calibration services, and so on, with a small error added at each level. So, in the end, a well-calibrated pipette that was supposed to deliver 100 uL of water will actually deliver anywhere between, say, 99.9999 uL and 100.0001 uL (the actual numbers will vary, but you get the illustrative idea). Now consider that this process is repeated and that multiple reagents are added, each with its own set of unquantifiable errors. Thus, even in the best hands, there is an unavoidable level of error. Add to that a level of manual handling, and the variation shoots up significantly. At some point, if not well controlled, the error becomes too large and the result is a failure to achieve reproducibility and quality.

True value, Accuracy and Precision

True Value

"True value" is a hypothetical concept. This represents the actual concentration of the analyte. Let's say in this above example, the true value is 300 copies/ml. However, because of the error inbuilt to the technique, let's say the final result was 302 copies/ml. I would say that the testing of the sample was exceptionally good. But there's a problem. I don't know the true value (If I had known then why will I run the test). Since in reality, we cannot ever find the true value and thus the error cannot be actually quantified, we use "conventional true value". It is also known as "Assigned" or "Reference" value. As per the International vocabulary of basic and general terms in metrology (VIM) document,

"The true value would be obtainable by a perfect measurement, that is a measurement without measurement error. The true value is by nature unobtainable."

Accuracy

Figure 1: Accuracy and Precision. Source
The classical definition states that accuracy is the closeness of the agreement between the result of a measurement and the true value. Mathematically, accuracy can be calculated if the true value is known. In the scenario above, the error is |302 - 300| / 300, or about 0.67%, so the accuracy is about 99.33%. But since the true value is not known, we calculate this based on the reference value (we will talk more about how to find the reference value later). The nearer the result is to the true value, the better the accuracy. See Figure 1.

Precision

Precision, in contrast to true value and accuracy, is a directly quantifiable measurement. Precision refers to the agreement of a set of results among themselves. In the above example, let's say I repeated the test 20 times and got 302 copies/ml every time; my precision is then 100%. Now suppose I performed the test 20 times and got 350 copies/ml every time: my precision remains 100%, but my accuracy is far from reality. On the other hand, if the testing is accurate every time, the precision is bound to be very tight.

In reality, we don't express accuracy and precision as percentages but rather through statistical measures of central tendency and spread (Mean ± Standard deviation). In Figure 1, the central point (the bull's eye) is the true value. Measurements near the true value are accurate, and clustering of measurements indicates precision. The Levey-Jennings chart is basically a graphical, statistics-based tool that helps you decide whether the accuracy and precision of the measurements are good enough to call your results reproducible, thereby aiding quality control.

Constructing an LJ (Levey-Jennings) Chart

There is a lot of history associated with quality control charts (Read more about it here). What I want to emphasize is that Stanley Levey and Elmer Jennings adapted this method from Shewhart charts.

For constructing an LJ chart you need the following.
  1. Quality Control Material 
  2. An estimate of the True Value
  3. An estimate of the variations that are inherent to the testing.
A quality control material is a material that is in all respects similar to what is otherwise tested (in this case, patient material), except that the value of the analyte is already known. Quality control materials are preferably commercially available; for example, ready-made quality control materials with a known number of HIV RNA copies are available. In a few cases, such materials are not available, and an in-house material can be prepared by pooling a few samples. If a commercial material is used, the values provided by the manufacturer are taken as the conventional true value (as I stated above, the real true value is always unknown). For in-house materials, the average (mean) of multiple runs of the same sample is taken as the conventional true value. This rests on the assumption that all the errors are random in nature, so the average value will be extremely close to the true value.

To generate an estimate of the allowed variation, the IQC material is tested multiple times. It is recommended that the test be run 20 times, and it has to be done at least 10 times to generate a reasonably accurate standard deviation (SD). For example, let us say I have a QC material which is known to contain 300 copies/ml of HIV. I will run the same QC on 20 different occasions, and all the values obtained will be used to compute a mean and standard deviation. I have created an example here for you to follow.

Run Number   Copies/ml   Run Number   Copies/ml   Run Number   Copies/ml   Run Number   Copies/ml
1            301         6            298         11           302         16           297
2            302         7            299         12           301         17           298
3            300         8            301         13           299         18           301
4            298         9            300         14           298         19           299
5            299         10           302         15           303         20           302

Table 1: Example data for creating baseline data. Mean= 300; SD= 1.747

Now, all you need is some statistical calculation; in the simplest terms, you need to calculate the mean and SD. If you don't know how to calculate the mean and SD, please see the link. Note that the logic of using the mean here assumes that the data follow a normal distribution. If you are unsure, check whether the data are normally distributed by performing a Shapiro-Wilk test. In Table 1, the data are normally distributed. You can perform this check directly online by using this Link.
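If you prefer to script this step rather than do it by hand, in Excel, or online, here is a minimal sketch in Python (assuming numpy and scipy are installed; the variable names and layout are my own, and the values are the 20 runs from Table 1):

```python
# Minimal sketch: baseline statistics for the Table 1 QC runs
import numpy as np
from scipy import stats

# The 20 baseline runs from Table 1 (copies/ml)
runs = np.array([301, 302, 300, 298, 299, 298, 299, 301, 300, 302,
                 302, 301, 299, 298, 303, 297, 298, 301, 299, 302])

mean = runs.mean()
sd = runs.std(ddof=1)   # sample standard deviation (n - 1 in the denominator)

# Shapiro-Wilk test for normality: p > 0.05 means no evidence
# against the data being normally distributed
w_stat, p_value = stats.shapiro(runs)

print(f"Mean = {mean:.1f}, SD = {sd:.3f}")
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")
```

For the Table 1 data this reproduces the values quoted in the caption (Mean = 300.0, SD = 1.747).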

A common problem encountered while building the baseline data is an outlier. In Table 1, assume that run 19 produced a result of 315 copies. Just one value changes your entire data set (Mean = 300.8; SD = 3.76). Such a value is called an outlier. So how do you actually decide whether it is an outlier? There are many ways to detect an outlier in data (see this link). Personally, I use a statistical method called Grubbs' test to identify an outlier. You can perform this test using an online tool. If there is a significant outlier (p < 0.05), the baseline data can be constructed by omitting that outlier. Another method is to simply check whether any value lies beyond the 3SD mark and omit it; since, by definition, a value outside ± 3SD indicates an out-of-control result, such a value does not qualify for inclusion in the baseline data. As I will show later, the smaller the SD, the tighter your LJ analysis, so just by removing one outlier, the quality of your baseline data in this case will be good. For plotting the baseline graph, just take a sheet of graph paper and mark the middle line as the Mean. Then mark the upper line as Mean + 1SD and the lower line as Mean - 1SD, and next mark ± 2SD and ± 3SD. For comparison, I have shown the LJ baseline for Table 1 and how it changes when the 19th value is changed to 315 copies.
Figure 2: LJ chart baseline. The middle line indicates the Mean; the red lines indicate ± 2SD. Note the difference between the 2 charts.
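Grubbs' test is also easy enough to script if you don't want to depend on an online calculator. The sketch below is my own illustrative implementation of the two-sided, single-outlier Grubbs' test (not taken from the post or from any kit manual), applied to the Table 1 data with run 19 changed to 315 copies/ml:

```python
# Minimal sketch: two-sided Grubbs' test for a single outlier
import numpy as np
from scipy import stats

def grubbs_test(values, alpha=0.05):
    """Return (suspect_value, G, G_critical, is_outlier) for the most extreme point."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    mean, sd = x.mean(), x.std(ddof=1)
    idx = np.argmax(np.abs(x - mean))          # the point farthest from the mean
    g = abs(x[idx] - mean) / sd                # Grubbs' statistic
    # Critical value derived from the t distribution
    t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return x[idx], g, g_crit, g > g_crit

# Table 1 data with run 19 replaced by 315 copies/ml
runs = [301, 302, 300, 298, 299, 298, 299, 301, 300, 302,
        302, 301, 299, 298, 303, 297, 298, 301, 315, 302]

value, g, g_crit, outlier = grubbs_test(runs)
print(f"Suspect = {value}, G = {g:.2f}, G_crit = {g_crit:.2f}, outlier = {outlier}")
```

For this data set the suspect value 315 gives G of about 3.77 against a critical value of about 2.71, so it is flagged as an outlier and can be dropped before building the baseline.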

Now all you need to do is run your QC sample whenever you run a test, along with the unknowns. In the example scenario, along with the patient samples (unknowns) I will also run a QC, just like a sample. Whatever value I get, I will mark it on the graph. So here are the rules for interpreting the QC values.
  1. If the QC value is within ± 2SD, the QC is considered "Passed". You can be confident that the quality of testing is good.
  2. If the QC value is beyond ± 2SD but within ± 3SD, the QC is considered "Passed with warning". The testing is still good enough to be reported. However, some problem has begun to creep in; this is usually because the pipette or the equipment requires calibration, and the reagents need to be checked.
  3. If the QC value is beyond ± 3SD, the QC is considered "Failed". This indicates that the test results are not acceptable and some component of testing has failed to perform to the mark. The results obtained need to be disqualified, and the cause of the problem has to be identified through a root cause analysis. Once the problem is rectified, the test has to be repeated along with the QC. (A minimal code sketch of these rules follows this list.)
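The three rules above are simple enough to automate. The helper below is a hypothetical sketch of my own (the function name and defaults are illustrative, not part of any standard package), using the baseline mean and SD from Table 1:

```python
# Minimal sketch: classify one QC result against the baseline mean and SD
def classify_qc(value, mean=300.0, sd=1.747):
    """Return 'Passed', 'Passed with warning', or 'Failed' for a single QC run."""
    deviation = abs(value - mean)
    if deviation <= 2 * sd:
        return "Passed"
    elif deviation <= 3 * sd:
        return "Passed with warning"
    else:
        return "Failed"

for qc in (301, 304, 306):
    print(qc, "->", classify_qc(qc))
# 301 -> Passed (within ±2SD)
# 304 -> Passed with warning (between ±2SD and ±3SD)
# 306 -> Failed (beyond ±3SD, i.e. outside roughly 294.8 to 305.2)
```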
Let's now assume that I check the QC every day. Connecting all the dots on the graph gives me my LJ chart for the month. Table 2 shows 15 QC runs. Try building the LJ graph on your own; the graph should look like Figure 3.


Run Number   Copies/ml   Run Number   Copies/ml   Run Number   Copies/ml
1            299.00      6            306.00      11           302.00
2            302.00      7            302.00      12           297.00
3            303.00      8            301.00      13           301.00
4            304.00      9            299.00      14           302.00
5            305.00      10           298.00      15           297.00

Table 2: Hypothetical QC runs.
Figure 3: LJ chart 
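If you would rather script the plot than draw it on graph paper or use the Excel template, a basic LJ chart for the Table 2 runs can be produced with matplotlib. This is only an illustrative sketch (the styling and colours are my own choices); the baseline mean and SD come from Table 1:

```python
# Minimal sketch: plot an LJ chart for the Table 2 QC runs
import matplotlib.pyplot as plt

mean, sd = 300.0, 1.747                          # baseline from Table 1
qc_runs = [299, 302, 303, 304, 305, 306, 302, 301,
           299, 298, 302, 297, 301, 302, 297]    # Table 2 values
run_numbers = range(1, len(qc_runs) + 1)

plt.plot(run_numbers, qc_runs, marker="o", color="black")
plt.axhline(mean, color="green", label="Mean")
for k, colour in ((1, "gray"), (2, "red"), (3, "blue")):
    plt.axhline(mean + k * sd, color=colour, linestyle="--")
    plt.axhline(mean - k * sd, color=colour, linestyle="--")
plt.xlabel("Run number")
plt.ylabel("Copies/ml")
plt.title("LJ chart (baseline: mean 300, SD 1.747)")
plt.legend()
plt.show()
```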

Now consider this same data in the context of what would have happened if your baseline had included the outlier. The QC value of 306 copies/ml would then be considered a QC pass (not even a warning; in that case the value would have fallen within ± 2SD; see Figure 2). Remember: the lower the SD of your baseline data, the better your LJ chart and the tighter your control.


That raises the question: when do you consider your LJ chart good? There is no perfect answer. The SD is a measure of how much deviation is actually present in your data, and as a rule of thumb the SD should not be more than 10% of your mean. Consider the above scenario again: if the SD were 10% of the mean (= 30), then your ± 3SD limits would span 210 to 390 copies/ml, and only variation beyond that range would be flagged as unreliable. In that case, the LJ chart would tolerate a very high level of variation. Imagine the true value is 300 copies, but the analysis shows 215 copies and the run is still considered a pass; that would be far from accurate (about 71.6% of the true value). To simplify this check, we calculate the coefficient of variation (CV = SD / Mean). A CV of less than 0.1 indicates that the SD is within 10% of the mean. For Table 1, CV = 0.0058.
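To make the arithmetic above concrete, here is a small sketch that computes the CV and shows how the ± 3SD limits widen when the SD is allowed to reach 10% of the mean (the numbers are the ones used in this section):

```python
# Minimal sketch: CV and the effect of a large SD on the ±3SD limits
mean = 300.0

for sd in (1.747, 30.0):   # Table 1 SD vs an SD equal to 10% of the mean
    cv = sd / mean
    lower, upper = mean - 3 * sd, mean + 3 * sd
    print(f"SD = {sd:>6.3f}  CV = {cv:.4f}  ±3SD range = {lower:.1f} to {upper:.1f}")
# SD = 1.747 gives CV = 0.0058 and limits of about 294.8 to 305.2
# SD = 30    gives CV = 0.1000 and limits of 210 to 390
```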


Errors identifiable in LJ Chart

Figure 4: Random and Systematic error. Source
There are two types of errors that can be identified by looking at the LJ chart: random and systematic. A random error indicates that the deviation from the mean seen in the graph is due to an uncontrollable influence; the LJ graph will show readings scattered randomly on both sides of the mean. This error cannot be eliminated entirely, though it can be reduced with good calibration of the equipment used and good laboratory practices. In contrast, a systematic error is usually due to an error in the procedure which skews the results to one side; in this case, the error will follow a trend (a simple way to flag such a trend is sketched after Table 3). Figure 4 and Table 3 show a comparison of the error types.

Feature                                     Random Error                                                              Systematic Error
Common causes                               Environmental variation; instrumental variation; variation in reagents   Incorrect usage or protocol
Variation and directionality                Varies and is unpredictable                                               Roughly constant and unidirectional
Reproducibility                             No                                                                        Yes
Error can be eliminated with good practices No                                                                        Yes

Table 3: Comparison of Random and Systematic error.
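Since a systematic error pushes consecutive QC values to the same side of the mean, one simple way to flag a possible systematic shift is to count consecutive points on one side of the mean. The sketch below is my own illustration, not something prescribed by the post, and the threshold of six consecutive points is an arbitrary choice for the example:

```python
# Minimal sketch: flag a possible systematic shift in QC values
def consecutive_same_side(values, mean, limit=6):
    """Return True if `limit` or more consecutive values fall on the same side of the mean."""
    streak, last_side = 0, 0
    for v in values:
        side = 1 if v > mean else (-1 if v < mean else 0)
        if side != 0 and side == last_side:
            streak += 1
        else:
            streak = 1 if side != 0 else 0
        last_side = side
        if streak >= limit:
            return True
    return False

# Example: a run of QC values drifting above a baseline mean of 300
print(consecutive_same_side([301, 302, 303, 302, 304, 303, 305], mean=300.0))  # True
```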

Regularly building an LJ graph for every analyte that is quantitatively tested in the laboratory takes time and is sometimes biased by human intervention. So I have created an Excel template which can be used directly for constructing an LJ graph. Just follow the steps shown in Figure 5 to generate your data. You can download the Excel template from this Link.

Figure 5: Using the Excel to generate LJ charts.

Now that you are able to construct LJ charts, let's consider a few more points. The LJ chart is a useful tool for judging how acceptable the variation is; in other words, you are measuring precision with reference to a conventional true value. Just like any statistical tool, the quality of an LJ chart depends on the baseline data used to build it. If the baseline measurements were not precise enough (high SD), the chart becomes tolerant of a high error rate. The closer the values are to the mean, the better your measurement capability. We call this "GIGO" (garbage in, garbage out): if the quality of the input data is bad, the output will also be bad.

The LJ chart can also be used for other important purposes, for example in declaring outbreaks. Let's say you have seen DEN-2 serotype diagnoses at a rate of 15 cases/week (with an SD of 2 cases) for the past 6 months at a particular geographical location, and now you diagnose 22 cases (> +3SD) in the same setting. You can confidently declare it an outbreak scenario for that week in that locality. In any situation where you are looking for a significant change, the concept of ± 3SD can be invoked. The only limitation is that the LJ approach can be used only when the quantity of interest can be expressed in numbers.
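As a quick check of the arithmetic in the outbreak example (the numbers are the hypothetical ones from the paragraph above):

```python
# Minimal sketch: the ±3SD idea applied to weekly case counts
baseline_mean, baseline_sd = 15, 2            # historical DEN-2 cases per week
threshold = baseline_mean + 3 * baseline_sd   # 15 + 3*2 = 21 cases

this_week = 22
print("Possible outbreak signal:", this_week > threshold)   # True
```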

References

1. International vocabulary of basic and general terms in metrology (VIM). 3rd Edition. Link

2. Petros Karkalousos, Angelos Evangelopoulos. The History of Statistical Quality Control in Clinical Chemistry and Haematology (1950 – 2010). International Journal of Biomedical Laboratory Science (IJBLS). 2015. Link

3. Coskun A. Modified Levey-Jennings charts for calculated laboratory tests. Clin Chem Lab Med. 2006;44(4):387-90. Link
