Comprehensive Genomic Profiling Including CLL Susceptibility Loci by Nanopore Long Read Sequencing
The clinical course of chronic lymphocytic leukemia (CLL) is associated with somatic structural variants and with the mutation status of the B-cell receptor (BCR) heavy chain. In addition, genome-wide association studies have identified germline CLL risk alleles that help explain the increased incidence of CLL in families. Although each risk allele individually has a low odds ratio, the cumulative polygenic risk score (PRS) derived from 41 risk loci was significantly higher in CLL (8.24) than in high-count monoclonal B-cell lymphocytosis (MBL; 8.05), low-count MBL (7.84), and healthy controls (7.46; Kleinstern & Slager, Leukemia 2022).
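A PRS of this kind is commonly computed as a weighted count of risk alleles. Assuming that standard definition (the exact weighting used by Kleinstern & Slager is not restated here), the score for individual $j$ over the 41 loci would be

$$\mathrm{PRS}_j = \sum_{i=1}^{41} \beta_i \, x_{ij}, \qquad \beta_i = \ln(\mathrm{OR}_i), \qquad x_{ij} \in \{0, 1, 2\},$$

where $x_{ij}$ is the number of risk alleles individual $j$ carries at locus $i$ and $\mathrm{OR}_i$ is the per-allele odds ratio of that locus. A per-allele OR of 1.3, for example, contributes only $\ln 1.3 \approx 0.26$ per risk allele, so individually small weights accumulate across loci.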
To genotype the 41 risk loci for PRS evaluation in 5 CLL patients and their healthy siblings, we performed PCR-free long-read sequencing on the Oxford Nanopore Technologies (ONT) platform. Adaptive sampling was used to raise coverage from shallow genome-wide depth to 10-20 reads per target locus. In addition to the 41 risk loci, the targets included 22 recurrently mutated genes and 19 genomic regions with recurrent structural variants, covering a total of 35 Mb, or approximately 1% of the human genome. Freshly isolated genomic DNA from purified CLL cells and from mononuclear cells of the healthy siblings was barcoded, pooled in groups of 5 samples, and run on 2 flow cells. After basecalling and demultiplexing, reads were aligned, and the alignments were used to extract nucleotide counts at the risk loci.
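The final step above, turning nucleotide counts into risk-allele dosages and then into a score, can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes per-base read counts are already available for each locus, applies the ≤0.2 allele-fraction cutoff reported in the results, calls a locus homozygous when only the risk allele survives the cutoff and heterozygous otherwise, and weights each dosage by ln(OR) as in the weighted-sum PRS definition. The function names and toy loci are hypothetical.

```python
from __future__ import annotations

from math import log

# Alleles at or below this read fraction are discarded as noise
# (mirrors the allele frequency <= 0.2 filter described in the abstract).
MIN_ALLELE_FRACTION = 0.2


def risk_allele_dosage(counts: dict[str, int], risk_allele: str,
                       min_af: float = MIN_ALLELE_FRACTION) -> int:
    """Infer 0, 1 or 2 copies of the risk allele from per-base read counts."""
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover this locus")
    # Keep only alleles whose read fraction exceeds the noise threshold.
    kept = {base for base, n in counts.items() if n / depth > min_af}
    if risk_allele not in kept:
        return 0
    # Risk allele alone -> homozygous; alongside another allele -> heterozygous.
    return 2 if len(kept) == 1 else 1


def polygenic_risk_score(loci: list[tuple[dict[str, int], str, float]]) -> float:
    """Weighted sum of risk-allele dosages, using ln(OR) as the per-locus weight."""
    return sum(log(odds_ratio) * risk_allele_dosage(counts, risk)
               for counts, risk, odds_ratio in loci)


if __name__ == "__main__":
    # Toy loci: (base counts, risk allele, per-allele OR); not study data.
    toy_loci = [
        ({"A": 12, "G": 10}, "G", 1.3),  # heterozygous
        ({"C": 20, "T": 1}, "T", 1.2),   # single T read falls below the cutoff
        ({"T": 18}, "T", 1.5),           # homozygous for the risk allele
    ]
    print(round(polygenic_risk_score(toy_loci), 3))
```

With the toy loci, the second locus contributes nothing because its single T read is filtered out. A fixed fraction threshold is the simplest choice; a likelihood-based genotyper would handle low-depth loci more gracefully.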
The global median read depth was 25 (range 6-80), and the median read depth per locus ranged from 18 to 32. SNP alleles with an allele frequency ≤0.2 were removed. Across all 10 samples, the mean PRS was 8.23±0.66. The PRS of the healthy siblings (8.05±0.67) was lower than that of the CLL patients (8.41±0.68), but the difference was not significant. This was expected: siblings share on average half of their alleles, so the healthy donors' PRS would be expected to lie closer to that of the patients than the PRS of unrelated controls.
As a pilot for somatic variant detection, we ran the variant callers Clair3 and DeepVariant on the data. Candidate variants in targeted genes with a read depth >10 were taken from the intersection of both callers' call sets. Variants with a gnomAD allele frequency >0.001 were considered germline SNPs and removed. Variant effect prediction was then performed, and only variants predicted to be deleterious by SIFT (all confidence levels) and damaging by PolyPhen (possibly or probably damaging) were retained. In 2 CLL samples, a previously described pathogenic C-to-T substitution in TP53 was detected at chr17:7673802 (GRCh38).
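The somatic filter cascade described above maps onto a short sequence of set and threshold operations. The sketch below is hypothetical. It assumes both callers' outputs have already been normalized to identical (chrom, pos, ref, alt) keys and annotated with read depth, gnomAD allele frequency, and VEP-style SIFT/PolyPhen labels. The gene set is an illustrative subset because the 22 targeted genes are not listed, and treating variants absent from gnomAD as novel (and therefore retained) is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


@dataclass
class Annotation:
    gene: str
    depth: int
    gnomad_af: Optional[float]  # None when the variant is absent from gnomAD
    sift: str                   # VEP-style SIFT label (assumed)
    polyphen: str               # VEP-style PolyPhen label (assumed)


# Hypothetical subset of the 22 targeted genes; the abstract does not list them.
TARGET_GENES = {"TP53", "ATM", "SF3B1", "NOTCH1"}
MIN_DEPTH = 10          # keep only variants with depth > 10
MAX_GNOMAD_AF = 0.001   # gnomAD AF above this is treated as a germline SNP
SIFT_KEEP = {"deleterious", "deleterious_low_confidence"}  # all confidence levels
POLYPHEN_KEEP = {"possibly_damaging", "probably_damaging"}


def filter_candidates(clair3: set[Variant], deepvariant: set[Variant],
                      annotations: dict[Variant, Annotation]) -> list[Variant]:
    """Apply the filter cascade to the outputs of two variant callers."""
    kept = []
    # 1. Only variants reported by both callers.
    for v in sorted(clair3 & deepvariant, key=lambda v: (v.chrom, v.pos)):
        a = annotations[v]
        # 2. Restrict to targeted genes with sufficient read depth.
        if a.gene not in TARGET_GENES or a.depth <= MIN_DEPTH:
            continue
        # 3. Drop variants common in gnomAD (putative germline SNPs).
        if a.gnomad_af is not None and a.gnomad_af > MAX_GNOMAD_AF:
            continue
        # 4. Require both a deleterious SIFT and a damaging PolyPhen prediction.
        if a.sift in SIFT_KEEP and a.polyphen in POLYPHEN_KEEP:
            kept.append(v)
    return kept
```

Normalizing variant representation (left-alignment, splitting multi-allelic records) before intersecting is what makes a plain set intersection meaningful; otherwise the same variant reported differently by the two callers would be lost.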
Our data show that long-read sequencing with adaptive sampling reliably calls risk alleles and supports the detection of somatic variants. Together with ongoing analyses of structural variants, copy number variation, and the rearranged heavy chain locus, this single platform may enable a comprehensive assessment of inherited and somatic genetic determinants of CLL.
