Tuesday, July 26, 2016

BtB#8- Resistance, Persistence and Tolerance


It so happened that I was talking about antibiotic resistance, and the conversation turned to tolerance and persistence. Most commonly, when an organism grows in the presence of an antibiotic at a defined cutoff concentration, we simply call it "resistant". That is what we measure when performing antibiotic susceptibility testing. But not everything that survives an antibiotic is resistant. Confused? As early as 1944, it was observed that bacteria were able to survive extensive antibiotic treatment without acquiring resistance mutations. To contrast them with truly resistant organisms, these were termed "tolerant".

Antibiotic resistance is a topic I have covered many times on this blog, so it is not worth dwelling on again (interested readers, please refer to my earlier posts). The related concepts, however, are not well illustrated, and terms like "persisters", "tolerant" and "resistant" are sometimes used interchangeably.

Fig 1: Methods of phenotypic resistance. Source
Let us clarify what each term means. Resistance refers to an inherent ability of a microorganism to grow at high concentrations of an antibiotic, irrespective of the duration of treatment. It is quantified in the laboratory by measuring the MIC (minimum inhibitory concentration). Tolerance is the ability to survive transient exposure to high concentrations of an antibiotic without a change in the MIC. Persisters are not mutants but phenotypic variants produced randomly within a population. They are simply non-growing, dormant cells that survive antibiotic attack by being totally inactive (which essentially makes them un-targetable). Most of the literature agrees that persisters are an extreme case of tolerance. Another term that is often used, very inconsistently in the literature, is drug indifference: the situation where an antibiotic is effective only in a specific bacterial physiological state. Resistance requires changes in the genetic code; tolerance and persistence do not.

If persistence is an extension of tolerance, then why the different terminologies? The answer is that tolerance and resistance are phenomena of the whole population, whereas persistence is a phenomenon of a subpopulation. Persistence is observed when most of the population is rapidly killed but a residual subpopulation continues to survive, even though genetically they are clones. Tolerance is achieved by slowing down some essential step that is targeted by the antibiotic; the best example is β-lactam tolerance through slowed cell wall assembly. Persistence can be achieved by staying in lag phase for a very long time (type I) or by halting growth (type II).

Fig 2: Characteristic drug responses of resistance, tolerance
and persistence. Source
These properties can be differentiated by an assay called the MDK (minimum duration for killing). The MDK is a measure of tolerance based on the idea that a tolerant strain requires a longer exposure time to be effectively killed than a susceptible strain. It is defined as the typical duration of antibiotic treatment required to kill a given fraction of the population.

As can be seen from figure 2, MIC cannot distinguish between resistance and tolerance, but MDK can. The MIC of a tolerant strain is similar to that of a susceptible strain; however, its MDK is significantly higher. A persistent strain and a susceptible strain have similar MIC and MDK99, but the MDK99.99 is substantially higher for the persistent strain. Here MDK99 means the MDK for 99% of cells, and MDK99.99 the MDK for 99.99% of cells.
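The MDK idea can be sketched numerically. Assuming simple exponential killing, N(t) = N0·e^(-kt), the MDK for any kill fraction follows directly from the kill rate k. The rates below are illustrative, not measured values; a tolerant strain has the same MIC as a susceptible one but a much lower kill rate.

```python
import math

# Illustrative kill rates (per hour) at the same drug concentration;
# the tolerant strain has the same MIC but is killed more slowly.
kill_rate = {"susceptible": 2.0, "tolerant": 0.4}

def mdk(kill_fraction, k):
    """Time for exponential killing N(t) = N0 * exp(-k * t)
    to remove the given fraction of the population."""
    return -math.log(1.0 - kill_fraction) / k

for strain, k in kill_rate.items():
    print("%s: MDK99 = %.1f h" % (strain, mdk(0.99, k)))
```

With these numbers the tolerant strain needs five times longer exposure to reach the same 99% kill, even though an MIC assay would call the two strains identical.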

An especially interesting case is persistence mediated by a gene called hipA. In E. coli, hyperactivation of HipA through mutation leads to 100- to 1,000-fold increases in persistence. Essentially, it generates two subpopulations with very different lag-time distributions. This can be identified using a culture kinetics assay, which shows a bimodal (biphasic) killing curve. Many similar cases are reported in the literature. Bacterial division is imperfect: apart from the DNA, nothing is distributed absolutely equally between the two daughter cells. This leads to different types of daughter cells, sometimes with varying susceptibility to antibiotics. The best example of this scenario is MTB, which divides asymmetrically so that one of the two daughter cells is usually longer and faster growing than its twin. In a recent study, researchers from Tufts University showed that these longer bacteria were least affected by rifampicin.
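A biphasic killing curve of this kind is easy to reproduce in a toy model. The sketch below (all rates and fractions are invented for illustration) mixes a fast-dying majority with a small, slowly dying persister subpopulation and locates the MDK thresholds by bisection. Note how the persisters barely shift MDK99 but stretch MDK99.99 enormously, which is exactly the signature in figure 2.

```python
import math

def survival(t, k_fast=2.0, persister_frac=0.0, k_slow=0.01):
    """Biphasic survival: a fast-dying majority plus a slowly dying
    persister subpopulation (all rates illustrative, per hour)."""
    return (1 - persister_frac) * math.exp(-k_fast * t) + \
           persister_frac * math.exp(-k_slow * t)

def mdk(threshold, **kwargs):
    """Find the time at which survival falls below `threshold` by bisection."""
    lo, hi = 0.0, 10_000.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if survival(mid, **kwargs) > threshold:
            lo = mid
        else:
            hi = mid
    return hi

for label, p in [("susceptible", 0.0), ("high-persistence", 1e-3)]:
    print(label,
          "MDK99 = %.1f h" % mdk(0.01, persister_frac=p),
          "MDK99.99 = %.1f h" % mdk(1e-4, persister_frac=p))
```

With a persister fraction of only 0.1%, the time to kill 99% of cells is nearly unchanged, but the time to kill 99.99% jumps from hours to hundreds of hours, because the last survivors die at the slow persister rate.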

Most of us have the impression that antibiotic sensitivity is purely a matter of genotype. This is not always true. Antibiotic sensitivity also has contextual dependence, such as the time of exposure and the state of the bacteria. Real bacterial growth is far from the "ideal bacterial growth curve". Most bacteria exert stringent control over which machinery is in operation, and this can be regulated to advantage under antibiotic stress. One of the most effective modes is the stationary phase, which is especially resistant.

To conclude, resistance is not the only mechanism for withstanding antibiotics. You can tolerate them or persist through them.

   Brauner A, Fridman O, Gefen O, Balaban N. Distinguishing between resistance, tolerance and persistence to antibiotic treatment. Nature Reviews Microbiology. 2016;14(5):320-330. doi:10.1038/nrmicro.2016.34

Corona F, Martinez JL. Phenotypic Resistance to Antibiotics. Antibiotics. 2013;2(2):237-255. doi:10.3390/antibiotics2020237

Richardson K, Bennion OT, Tan S, Hoang AN, Cokol M, Aldridge BB. Temporal and intrinsic factors of rifampicin tolerance in mycobacteria. PNAS. 2016. doi:10.1073/pnas.1600372113

Monday, July 18, 2016

Lab series# 14- MS based proteomics


Now that I'm back from the break, let us go ahead with an unfinished discussion. I earlier talked about some basic instrumentation principles of mass spectrometry (MS) and left off with a note that I would talk about using mass spectrometry for proteomics. In essence, proteomics refers to the large-scale study of proteins. The original definition includes the detection of proteins, analysis of their structure and study of their functions. MS-based proteomics can be used to detect and quantify proteins.

Fig 1: Edman's sequencing method for peptides. Source
It's worth mentioning that before MS was used for protein sequencing, Edman degradation was the method of choice. It was a slow process and required a free amino group at the N-terminus. The method uses cyclic degradation of peptides based on the reaction of phenylisothiocyanate with the free amino group of the N-terminal residue, such that amino acids are removed one at a time and identified as their phenylthiohydantoin derivatives via HPLC. The reaction fails if the amino terminus is modified. It is notoriously slow, and interpretation requires great expertise. Edman's method is still important since it offers some specific advantages: for example, amino acids with identical molecular weights can be distinguished. Isoleucine and leucine both have a residue mass of 113.08 Da. Glutamine (128.06 Da) and lysine (128.09 Da) have nearly identical masses but different HPLC retention times; however, most modern mass spectrometers can differentiate glutamine from lysine. Reference masses of all amino acids can be found here.
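To make the mass argument concrete, here is a quick check using standard monoisotopic residue masses. It shows why no mass analyser can separate leucine from isoleucine, while a modern instrument with modest resolving power can separate glutamine from lysine.

```python
# Standard monoisotopic residue masses (Da) for the four relevant cases
residue_mass = {"L": 113.08406, "I": 113.08406,
                "Q": 128.05858, "K": 128.09496}

# Leucine and isoleucine are isobaric: mass alone cannot tell them apart.
assert residue_mass["L"] == residue_mass["I"]

# Glutamine vs lysine differ by ~0.036 Da, so a resolving power of roughly
# m / delta_m is enough for a modern instrument to separate them.
delta = residue_mass["K"] - residue_mass["Q"]
print("Q/K mass difference: %.5f Da" % delta)
print("required resolving power: ~%d" % round(residue_mass["K"] / delta))
```

Resolving power in the low thousands is routine for modern instruments, which is why the Q/K ambiguity that once required Edman chemistry is no longer a practical problem.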

Shotgun sequencing is a term borrowed from genomics. In this method, the parent protein is fragmented with an enzyme and the resulting peptides are sequenced. The peptides are then realigned using software search algorithms. For an analogy, imagine you have several copies of a hundred-page book: shred all the copies, mix up the fragments, and then reassemble the original text by finding fragments that overlap and piecing the book back together. Chances are you will end up with most of the pages in the right sequence, with some errors here and there. This method requires considerable computational power. Shotgun sequencing has its own set of problems; for example, a shared sequence can match more than one protein. There are smart algorithms to mitigate this issue, which we will discuss later. The advantage of the shotgun method is that it is rapid and high-throughput, allowing maximum coverage in a short time.

So, the first question: why not directly sequence the whole protein? Why make peptides, sequence them, and then realign them back to the original sequence, which can bring in errors? There is more than one reason.

The best way to get information about a protein is to find its sequence. MS is most efficient at obtaining sequence information from peptides that are less than about 20 residues long, far shorter than most parent proteins. Creating peptides also means that properties such as solubility become irrelevant: as long as a protein yields some peptides that can be sequenced, we can still identify the protein. Though this affects coverage, it gives us results that could otherwise never be obtained. The first step in MS-based proteomics, then, is to create a peptide library.

Once the peptides are injected into an LC-MS/MS platform, the first thing that happens is that they are read for their m/z values in the first MS. Subsequently, the peptides are broken by CID (collision-induced dissociation). In this method, the peptide ions are accelerated to high kinetic energy and then allowed to collide with a neutral gas (such as helium, nitrogen or argon). Some of the kinetic energy is converted into internal energy, which results in bond breakage and fragmentation of the molecular ion into smaller fragments. CID gives random cleavage, and different types of ions can be produced from the same peptide.

Fig 2: Peptide Fragmentation Nomenclature
The ions are named based on the bond cleaved and the fragment produced. Roepstorff P and Fohlman J proposed a nomenclature for sequence ions in peptide mass spectra, now known as the Roepstorff–Fohlman–Biemann nomenclature. There are three bonds along the amino acid backbone that can fragment under CID: (i) NH-CH, (ii) CH-CO and (iii) CO-NH.

On successful breakage, two fragments are generated, so there are six possible ion types, as shown in the diagram: the a, b and c ions retain the charge on the N-terminal fragment, and the x, y and z ions retain the charge on the C-terminal fragment. The most common cleavage site is the CO-NH bond, which gives rise to the b and y ions. With this understanding, consider the following MS spectrum for a peptide sequence.
Fig 3: CID MS/MS, many copies of the same peptide are fragmented at the peptide backbone to form b and y ions. The spectrum consists of peaks at the m/z (mass to charge) values of the corresponding fragment ions. Source
From the mass difference between the b8 ion and the parent ion we can calculate the mass of the C-terminal residue (which also corresponds to y1). In this case, the mass corresponds to K, so the first amino acid from that end is K. From the difference between b7 and b8 (which corresponds to y2) we can calculate the mass of the second residue, V; that means the sequence ends in VK. This can go on until the whole sequence is identified. The whole process can also be done the other way around, using the differences between y ions, to yield the same result.
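This ladder-walking can be written out in a few lines. The sketch below builds the singly charged b-ion series for the peptide from Fig 3 using standard monoisotopic residue masses, then reads the sequence back from consecutive mass differences (the helper names are mine, not from any library).

```python
# Standard monoisotopic residue masses (Da) for the residues in ANELLLNVK
MASS = {"A": 71.03711, "N": 114.04293, "E": 129.04259,
        "L": 113.08406, "V": 99.06841, "K": 128.09496}
PROTON = 1.00728

def b_ions(peptide):
    """Singly charged b ions: cumulative N-terminal residue mass + proton."""
    ions, total = [], 0.0
    for aa in peptide[:-1]:           # the full-length b ion is rarely observed
        total += MASS[aa]
        ions.append(total + PROTON)
    return ions

def read_sequence(b):
    """Recover residues from consecutive b-ion mass differences."""
    by_mass = {round(m, 3): aa for aa, m in MASS.items()}
    seq = [by_mass[round(b[0] - PROTON, 3)]]
    seq += [by_mass[round(hi - lo, 3)] for lo, hi in zip(b, b[1:])]
    return "".join(seq)

b = b_ions("ANELLLNVK")
print(read_sequence(b))   # recovers ANELLLNV; the C-terminal K comes from y1
```

Running this recovers "ANELLLNV" from the b-ion ladder alone; the terminal K is recovered from the parent-minus-b8 difference (the y1 ion), exactly as described above.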

So, the second question: how does the MS know whether a given ion is a b or a y ion? To the instrument they are just ions, and in reality b and y ions are produced at random. The answer lies in the difference between the cleavage products depending on which terminus is retained.

Fig 4: Generalised structure of a polypeptide. Source
As I already said, a b ion represents the fragment retaining the N-terminus and a y ion the fragment retaining the C-terminus (see Fig 4 for the general peptide terminal nomenclature). As you can make out from the structure, the mass of the ion generated differs depending on whether it carries the free amino group or the carboxyl group. The same calculation also reveals the directionality of the sequence.

For example, let's compare two peptides with different sequences.

ANELLLNVK      .........   ANELLLNV K
KANELLLNV      .........  K ANELLLNV

In both cases, a K is cleaved off, and it could have come from either end, which would totally alter the sequence. However, the correct sequence is easily identified from the mass of the K fragment: if the K came from the first peptide it carries no N-terminal component, and in the second case no C-terminal component, so the mass is different in each case. To calculate the mass of a specific b-type ion, add the mass of the N-terminal proton to the residue masses. For y-type ions, the mass of the C-terminal -OH group is added, plus two additional protons (one for the N-terminus and one to provide the charge). A very detailed explanation of the calculation is given in this link.
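As a numeric check of this b/y distinction, consider a single-residue K fragment under the formulas above (masses are standard monoisotopic values):

```python
PROTON, WATER = 1.00728, 18.01056
K = 128.09496                # monoisotopic residue mass of lysine

b1 = K + PROTON              # K retaining the N-terminal side: b-type ion
y1 = K + WATER + PROTON      # K retaining the C-terminal side: y-type ion

print("b1(K) = %.3f, y1(K) = %.3f" % (b1, y1))
```

The roughly 18 Da gap between the two forms is what lets the search software decide which terminus a fragment came from, and hence the direction in which to read the sequence.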

Under ideal conditions, all the y and b ions would be produced for every peptide. In reality this doesn't happen: only some of the ions show up in the MS data, and the challenge is to deduce the sequence anyway. Let us again consider the sequence from Fig 3, and suppose the b6 and y3 ions are not detected. Now how will you get the sequence? In this case, the mass difference between b5 and b7 doesn't correspond to any single amino acid, with or without modification. But if you insert the two amino acids L and N, the mathematics fits perfectly, in which case you can argue that LN is the right combination, giving the sequence ANELLLNVK. Note that the more ions are missing, the more prediction is involved, and hence the more easily errors occur.
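The gap-filling logic can be sketched as a small search over residue pairs (masses are standard monoisotopic values; the function name is mine). It also shows why such inferences need care: more than one pair can be isobaric with the same gap, so surrounding ions or database context must settle the choice.

```python
from itertools import product

MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
        "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259}

def pairs_filling_gap(gap, tol=0.01):
    """Unordered residue pairs whose summed mass matches a two-residue gap."""
    return sorted({"".join(sorted(p)) for p in product(MASS, repeat=2)
                   if abs(MASS[p[0]] + MASS[p[1]] - gap) < tol})

# Mass gap between b5 and b7 when b6 is missing: L + N = 227.12699 Da
print(pairs_filling_gap(227.12699))   # → ['LN', 'QV']
```

Notice that V+Q sums to exactly the same mass as L+N (Q is N plus a CH2, and L is V plus a CH2), so the mass gap alone cannot distinguish the two pairs, nor the order of the residues within a pair.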

Fig 5: Methods to identify the peptide.
It is simply a pain to search a peptide sequence against everything possible to come up with a protein identification; the search space in such a case is effectively infinite. It is always easier if you can narrow down the possibilities. So let's say I'm doing a proteomic study on E. coli: I can search the results against an E. coli protein database and identify the protein.

The exact algorithm for finding the protein differs between methods. For example, the MASCOT search program uses probability-based matching. The program first identifies the possible cleavage sites for peptide generation, which depend on the enzyme used during peptide preparation. Trypsin, a serine protease, has very specific cleavage properties: it cleaves peptides on the C-terminal side of lysine and arginine residues; if a proline residue is on the carboxyl side of the cleavage site, cleavage does not occur, and if an acidic residue is on either side of the cleavage site, the rate of hydrolysis is slow. Based on these rules, the peptides for a given protein can be hypothesised. Certain residues (such as proline) also break more easily under CID. For a given peptide sequence, the ideal spectrum can thus be computed; this hypothetically generated spectrum is the theoretical spectrum. The next step is to match the theoretical and experimental spectra: the better they match, the greater the confidence in reporting the protein identification.
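These trypsin rules translate directly into a tiny in-silico digestion routine. The sequence below is a made-up example; the acidic-residue effect is ignored here since it only slows cleavage rather than blocking it.

```python
def tryptic_digest(protein):
    """In-silico tryptic digestion: cleave after K or R,
    but not when the next residue is proline."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR" and i + 1 < len(protein) and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])
    return peptides

# Hypothetical sequence; note that no cut is made after the R preceding P
print(tryptic_digest("MKWVTFISLLLLFSSAYSRGVFRPAMK"))
```

Search engines run exactly this kind of digestion over every database entry to generate candidate peptides before computing theoretical spectra.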

Fig 6: Venn diagrams comparing A) peptide identifications
and B) protein identifications. Source
Other methods of identification include peptide sequence tags, based on the fact that fragmentation spectra usually contain at least a short run of easily interpretable sequence. Another method, autocorrelation, mathematically determines the overlap between the experimental spectrum and a theoretical spectrum derived from every sequence in the database. Different search strategies can give slightly different results, and it is usually advisable to search using more than one algorithm. A study published by Joao A. Paulo in 2013 (see Fig 6) makes it clear that there is a significant difference in identifications depending on the search strategy. The search engines have improved over the years, though significant differences remain. MASCOT and SEQUEST are probably the most commonly used tools.

Fig 7: Target decoy method for estimating FDR.
One inherent problem with shotgun sequencing is the error that creeps in through possible mistakes in aligning the sequence. This error needs to be kept as low as possible; a 1% cutoff is the near-universal academic standard. It is known as the FDR (false discovery rate) and is analogous to a Type I error. There are several reasons for mismatched identifications, such as a low-quality spectrum. In practice, it is impossible to tell which individual PSM (peptide spectrum match) is false; if there were a definitive method, we could simply design an algorithm to remove the false discoveries. Instead, the target-decoy method is commonly used to estimate the FDR: the software searches both the target database and a decoy database, and hits in the decoy database are counted as false identifications. The decoy database usually consists of reversed sequences of the database entries, though it need not always be.

FDR = Number of Decoy Hits / Number of target hits
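In code form, the estimate is a one-liner (the numbers below are illustrative):

```python
def estimate_fdr(n_target, n_decoy):
    """Target-decoy estimate: decoy hits approximate the number of
    false matches hiding among the accepted target hits."""
    return n_decoy / n_target

# e.g. 5,000 accepted target PSMs with 50 decoy hits at the same score cut-off
print("estimated FDR = %.1f%%" % (100 * estimate_fdr(5000, 50)))  # → 1.0%
```

In practice the cut-off score is swept until this ratio drops to the desired level, which is exactly the trade-off shown in Fig 8.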

Fig 8: Setting up FDR cut off.
Keeping the FDR at a very ambitious level (say, 0.1%) will bring the number of identifications down to a very low number, while keeping it high (say, 2%) will let too many wrong identifications through. It must also be understood that at a 1% FDR, out of 5,000 identifications there is a good chance that around 50 are wrong. As can be seen from Fig 8, shifting the cutoff to the left increases the number of peptides retained, which means the number of false positives increases; shifting it the other way decreases the identifications. This also explains why proteomics cannot identify every peptide present (see Fig 8 again): a lot of identifications are discarded for lack of confidence in reporting the peptide.

Let's come to the third question: can we reduce the FDR through the search strategy? Each search strategy has an inherent error, fixed at 1%. Since the strategies are different, it can be argued that the erroneous identifications differ between strategies, so identifications made by both strategies automatically overcome much of the error, reducing the effective FDR. But since the two methods have different errors, the error is actually amplified among the peptides identified exclusively by one method.

Now that you get some basic idea of how MS determines protein sequence, we can take the next step of talking about different types of proteomics experiments possible. Depending on the strategy and intent of the experiment, there are many different types of proteomics experiment possible.

I will give you some scenarios on how advanced MS is useful for studying microbiology.

Huge numbers of genomes are now known, thanks to modern sequencing capabilities. We have identified new microbial pathogens and want to study their molecular biology, how they interact with human cells, markers for infection, and so on. For all these purposes, we need to know what proteins the organism actually makes. I was amazed to hear recently that in a large majority of cases (even for well-known pathogens) it is genuinely not known whether the microbe produces a particular protein; in other words, there is no experimental evidence. For example, a proteogenomic study of MTB identified 3,176 proteins, including approximately 250 novel peptides. Even for an organism as heavily studied as MTB, it is surprising that we never knew about those 250 peptide sequences, and it opens new avenues for studying what those proteins do. Proteogenomics is useful for annotating the genome for protein-coding regions and provides experimental evidence for the existence of proteins, helping to create the database of protein sequences for that particular organism. Usually, many of the proteins are similar to ones already studied; occasionally, the newly discovered proteins are of real interest, such as a potential diagnostic marker or a previously unidentified virulence factor.

Once a protein map is developed, we can further work on protein expression profiles (using quantitative proteomics) under different conditions, or study signalling mechanisms (for example, phospho-proteomics). Quantitative proteomics has been used to identify several host factors that interact with pathogens, increasing our understanding of the infection process. In such cases, a single experiment yields data that would otherwise have required a huge number of experiments.

Proteogenomics, targeted proteomics and quantitative proteomics are each huge topics in themselves, and maybe I will cover them in future blog posts so as not to overburden this one. The idea of this post was to pair with the earlier post on MS principles and give you an idea of how MS produces protein identification data.

  Reinders J, Lewandrowski U, Moebius J, Wagner Y, Sickmann A. Challenges in mass spectrometry based proteomics. Proteomics. 2004;4(12):3686-3703.

Steen H, Mann M. The abc's (and xyz's) of peptide sequencing. Nature Reviews Molecular Cell Biology. 2004;5(9):699-711.

Microbial Proteomics. Proteomics. 2011;11(15):2941-2942.

Thursday, July 07, 2016

Break Announcement-3


Fig 1: Overview of page hits
Thanks to the readers and your interest in the blog content, the site has crossed 200,000 page views to date. As has been the custom since last year, I am taking a two-week break from blogging. I encourage you to go back and read any earlier posts you may have missed.

If you have already read every post on this site and want more, try my other recently launched site discussing science from a wide variety of fields: Science Open Source. Of course, it is still under development, but I think you would still be interested.

Will see you back in a couple of weeks.

Varun C N