

Estimating diversity in TB populations

To use genomic information to inform public health action for TB control, we first need to know how diverse TB genomes are.

The TB genome is circular and is made up of approximately 4.4 million base pairs of DNA. Current genome sequencing methods can accurately read 92% of these (around 4 million base pairs).

Researchers have collected TB genome data from four different groups of TB isolates and compared the diversity of genomes within each group:

  • Group 1: TB isolated from different body sites within the same individual in the same month (e.g. lymph node and sputum) – to measure diversity within an individual at a given time.
  • Group 2: TB isolated from the same individuals over a period of time – to measure the extent to which TB genomes change (mutate) over time.
  • Group 3: TB isolated from within families (households) – to see how similar TB genomes are among people known to be part of the same chain of transmission.
  • Group 4: TB isolated from previously described community clusters – to see how similar TB genomes are among people who are linked epidemiologically.

To compare the diversity in each of these groups, TB genome sequence data was compared at the level of Single Nucleotide Polymorphisms (SNPs). A SNP is a position in the genome sequence at which the base differs between genomes. If two strains are closely related, there will be fewer SNPs between them than between strains that are not related to each other.
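As a rough illustration of what counting SNPs between two genomes involves, the Python sketch below compares two sequences position by position. It is a minimal sketch, not the researchers' actual pipeline: it assumes the two sequences have already been aligned to the same length, and the function name `snp_distance` is hypothetical. Positions that are not a confidently read A, C, G or T in both genomes are skipped, on the assumption that unread positions are marked with a character such as `N` or `-`.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned sequences carry different bases.

    Assumption: both sequences are already aligned, so position i in one
    corresponds to position i in the other.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    readable_bases = set("ACGT")
    differences = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        # Skip positions that could not be read in one or both genomes.
        if base_a not in readable_bases or base_b not in readable_bases:
            continue
        if base_a != base_b:
            differences += 1
    return differences


# Toy sequences (invented, far shorter than a real genome).
print(snp_distance("ACGTACGT", "ACGTTCGA"))  # 2
print(snp_distance("ACGTNCGT", "ACGTTCGA"))  # 1, the N position is skipped
```

The fewer differences the function reports, the more closely related the two isolates are likely to be.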

To interpret SNP numbers, it is first necessary to sequence sets of genomes that serve as a reference for the level of genomic diversity expected in different contexts. The SNP numbers observed between the genomes under investigation are then compared with these reference observations. The four groups described above can be used as the reference for four types of situation.
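One simple way to picture this comparison is sketched below in Python. It is only an illustration under stated assumptions: the function name `matching_reference_groups` is hypothetical, the rule used (checking whether the investigated SNP number falls within the range observed in a reference group) is a simplification chosen for the example, and the numbers in `toy_reference` are invented placeholders, not measurements from the study.

```python
def matching_reference_groups(query_snps, reference_distances):
    """Return the reference groups whose observed SNP range contains query_snps.

    reference_distances maps a group name to a list of SNP numbers observed
    between pairs of genomes in that reference group.
    """
    matches = []
    for group, distances in reference_distances.items():
        if not distances:
            continue  # no observations for this group
        if min(distances) <= query_snps <= max(distances):
            matches.append(group)
    return matches


# Invented placeholder values for illustration only (not real data).
toy_reference = {
    "within individual": [0, 1, 0],
    "household": [0, 2, 3],
    "community cluster": [1, 4, 9],
}

print(matching_reference_groups(2, toy_reference))
# ['household', 'community cluster']
```

In this toy case a difference of 2 SNPs falls inside the ranges of two reference groups, which shows that a SNP number alone may be compatible with more than one type of situation.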
