This comprehensive study provides a detailed genome-wide analysis of the NBS-LRR (Nucleotide-Binding Site-Leucine-Rich Repeat) gene family in the medicinal plant Salvia miltiorrhiza (Danshen).
This comprehensive study provides a detailed genome-wide analysis of the NBS-LRR (Nucleotide-Binding Site-Leucine-Rich Repeat) gene family in the medicinal plant Salvia miltiorrhiza (Danshen). Utilizing the latest genomic resources and bioinformatic methodologies, we systematically identified, characterized, and classified NBS-LRR genes, exploring their chromosomal distribution, gene structures, conserved motifs, and evolutionary relationships. The research further investigates the expression patterns of these resistance genes under biotic stress and their potential link to the biosynthesis of valuable secondary metabolites like tanshinones and salvianolic acids. We present robust protocols for gene family analysis, address common troubleshooting scenarios, and offer comparative insights with model plants. This work establishes a crucial foundation for understanding disease resistance mechanisms in S. miltiorrhiza and offers strategic targets for molecular breeding to enhance both plant resilience and medicinal yield, with significant implications for pharmaceutical research and sustainable drug development.
Salvia miltiorrhiza Bunge (Danshen) is a perennial herb of the Lamiaceae family, renowned as a cornerstone of Traditional Chinese Medicine (TCM) for treating cardiovascular and cerebrovascular diseases. Its significance extends beyond traditional use, establishing it as a model medicinal plant for modern pharmacological and genomic research. This status is largely due to its biosynthesis of two major classes of bioactive compounds: the lipophilic diterpenoid tanshinones (e.g., tanshinone IIA, cryptotanshinone) and the hydrophilic phenolic acids (e.g., salvianolic acid B). These compounds exhibit well-documented antioxidant, anti-inflammatory, anti-fibrotic, and anti-tumor activities.
Within the context of genome-wide studies, particularly on the Nucleotide-Binding Site-Leucine-Rich Repeat (NBS-LRR) gene family, S. miltiorrhiza serves as a critical system. The NBS-LRR genes are the largest class of plant disease resistance (R) genes. Their identification and characterization in S. miltiorrhiza are essential for understanding the plant's innate immune system, which directly impacts yield, quality, and sustainable cultivation by conferring resistance to pathogens like root rot (caused by Fusarium spp.). Cultivation challenges, including pathogen susceptibility, soil quality demands, and genotype-dependent metabolite variation, underscore the necessity of such genetic research for breeding resilient, high-quality cultivars.
A standard bioinformatics pipeline for genome-wide identification and analysis of the NBS-LRR gene family involves several key steps.
Experimental Protocol: Genome-Wide Identification of NBS-LRR Genes
Data Acquisition:
HMMER Search:
hmmsearch --domtblout output_file.hmm PF00931.hmm S_miltiorrhiza.proteome.faBLASTP Validation:
Domain Architecture Analysis:
Gene Structure and Motif Analysis:
Phylogenetic Analysis:
Chromosomal Localization & Synteny:
Cis-Acting Element Analysis:
Title: NBS-LRR Gene Identification and Analysis Workflow in S. miltiorrhiza
Cultivation issues directly impact biomass and secondary metabolite accumulation, affecting drug source quality.
Table 1: Major Cultivation Challenges and Their Impact
| Challenge Category | Specific Issue | Impact on Plant & Metabolites | Quantitative Example/Data |
|---|---|---|---|
| Biotic Stress | Root Rot (Fusarium spp.) | Root biomass loss, reduced tanshinone content. | Yield loss up to 30-70% in severe infections. |
| Biotic Stress | Nematodes, Leaf Spot | Reduced photosynthetic capacity, stunted growth. | Variable; can reduce salvia yield by 20-40%. |
| Abiotic Stress | Drought Stress | Induces phenolic acid biosynthesis, but limits overall growth. | Salvianolic acid B may increase by 15-30% under moderate stress, but biomass decreases. |
| Abiotic Stress | Soil Nutrient Imbalance | Deficiency (e.g., K, P) reduces root yield and metabolite diversity. | Optimal N:P:K fertilizer ratio reported as 1:0.5:1.2 for balanced growth. |
| Genetic & Quality | High Genotype Variation | Significant differences in tanshinone IIA content between cultivars. | Content ranges from 0.1% to over 0.5% dry weight among different accessions. |
| Agricultural Practice | Continuous Cropping Obstacle | Soil sickness, pathogen buildup, autotoxicity. | Yield reduction of 20-50% in the second cropping year without rotation. |
Title: Link Between Cultivation Challenges and Genomic Solutions
Table 2: Essential Reagents and Materials for S. miltiorrhiza Research
| Item/Category | Specific Example/Product | Function in Research Context |
|---|---|---|
| Genomic DNA Extraction | CTAB-based Plant Genomic DNA Kits (e.g., from TIANGEN) | High-quality DNA extraction from polysaccharide/polyphenol-rich root tissue for PCR, sequencing. |
| RNA Isolation & cDNA Synthesis | RNAprep Pure Plant Plus Kit (Polysaccharides & Polyphenolics-rich) (TIANGEN); RevertAid First Strand cDNA Synthesis Kit (Thermo) | Isolation of intact total RNA for gene expression analysis (qRT-PCR) of NBS-LRR or biosynthetic pathway genes. |
| qPCR Reagents | SYBR Green PCR Master Mix (e.g., Applied Biosystems PowerUp SYBR) | Quantitative real-time PCR for expression profiling of target genes under stress treatments. |
| Cloning & Expression Vectors | pEASY-Blunt Cloning Vector (TransGen); pCAMBIA1300 series (for plant transformation); pET-28a(+) (for prokaryotic expression) | Cloning candidate NBS-LRR genes for functional validation via heterologous expression or plant transformation. |
| Plant Tissue Culture Media | MS (Murashige and Skoog) Basal Salt Mixture; specific phytohormones (e.g., 6-BA, NAA) | For micropropagation, hairy root induction (via Agrobacterium rhizogenes), and genetic transformation. |
| Metabolite Analysis Standards | Certified Reference Standards: Tanshinone IIA, Cryptotanshinone, Salvianolic Acid B (e.g., from Sigma-Aldrich, Must Bio) | Quantification of bioactive compounds via HPLC or LC-MS for phenotype correlation. |
| Antibodies for Protein Work | Custom-made polyclonal antibodies against conserved NBS domain peptides; Anti-His Tag antibodies | Detection and localization of expressed NBS-LRR proteins via Western blot or immunofluorescence. |
| Bioinformatics Software/Tools | HMMER 3.3.2, MEGA 11, TBtools, MEME Suite, MCScanX | For the entire pipeline of genome-wide identification, phylogeny, and structural analysis. |
Nucleotide-binding site leucine-rich repeat (NBS-LRR) genes constitute one of the largest and most crucial disease resistance (R) gene families in plants. They serve as intracellular immune receptors that directly or indirectly recognize pathogen effector molecules, triggering a robust defense response known as effector-triggered immunity (ETI). This in-depth technical guide explores the structure, function, and signaling mechanisms of NBS-LRR genes, framed specifically within the context of genome-wide identification and functional characterization research in the medicinal plant Salvia miltiorrhiza (Danshen). Understanding this gene family is pivotal for developing disease-resistant crops and for elucidating the molecular basis of stress response in non-model, high-value medicinal species.
Recent research has focused on the genome-wide identification of the NBS-LRR family in S. miltiorrhiza to understand its innate immune capacity and stress adaptation.
Protocol: In silico Genome-Wide Identification Pipeline
hmmsearch --domtblout output.txt profile.hmm proteome.fasta.Table 1: Genome-Wide Identification Statistics of NBS-LRR Genes in Salvia miltiorrhiza
| Category | Count | Percentage of Total Predicted Genes | Notes |
|---|---|---|---|
| Total NBS-LRR Genes | 121 | ~0.38% | From the latest genome assembly (v2.0) |
| TNL Subfamily | 54 | 44.6% | Contains TIR domain at N-terminus |
| CNL Subfamily | 67 | 55.4% | Contains Coiled-coil domain at N-terminus |
| RNL Subfamily | 0 | 0% | Not identified in current assembly |
| Genes with Full Domains | 89 | 73.6% | Intact NB-ARC and LRR regions |
| Pseudogenes | 32 | 26.4% | Truncated or fragmented sequences |
| Chromosomal Distribution | Across all 8 chromosomes | - | Clusters observed on Chr 4 and Chr 7 |
NBS-LRR proteins are modular. The N-terminal domain (TIR or CC) mediates downstream signaling and protein-protein interactions. The central NB-ARC domain is a molecular switch regulated by nucleotide (ADP/ATP) binding and hydrolysis. The C-terminal LRR domain is involved in auto-inhibition and specific ligand recognition.
Title: NBS-LRR Protein Activation from Inactive State to Resistosome
Upon activation, TNLs and CNLs generally converge on common downstream signaling hubs but initiate distinct early pathways.
Title: Downstream Signaling Pathways of TNL and CNL Receptor Activation
Protocol: Quantitative Real-Time PCR for SmNBS-LRR Genes
Protocol: Transient Expression in Nicotiana benthamiana
Protocol: TRV-Based VIGS in S. miltiorrhiza
Table 2: Essential Reagents and Materials for NBS-LRR Research
| Item | Function/Application | Example Product/Catalog |
|---|---|---|
| HMMER Software Suite | For domain-based identification of NBS-LRR genes in genome sequences. | http://hmmer.org/ |
| Plant RNA Extraction Kit | High-quality RNA isolation for expression studies. | TRIzol Reagent (Invitrogen) or Plant RNeasy Kit (Qiagen) |
| SYBR Green qPCR Master Mix | For quantitative gene expression analysis. | PowerUp SYBR Green Master Mix (Thermo) or TB Green Premix (TaKaRa) |
| Gateway or Golden Gate Cloning System | Modular cloning for vector construction for localization or transformation. | pGWBs-GFP series (for localization) |
| Agrobacterium tumefaciens GV3101 | Strain for transient expression in N. benthamiana and stable transformation. | Competent cells available from multiple vendors |
| TRV VIGS Vectors (pTRV1, pTRV2) | For virus-induced gene silencing functional studies. | Available from Arabidopsis Stock Centers (e.g., ABRC) |
| Salicylic Acid (SA) & Methyl Jasmonate (MeJA) | Phytohormones used to elicit defense response pathways for expression profiling. | Sigma-Aldrich (S7401, 392707) |
| Confocal Microscope | High-resolution imaging of subcellular protein localization. | Zeiss LSM 900, Nikon A1R |
| Ion Leakage Conductivity Meter | Quantitative measurement of hypersensitive response (HR) cell death. | Benchtop conductivity meter (e.g., Orion Star A322) |
The identification of 121 NBS-LRR genes in S. miltiorrhiza provides a genetic foundation for understanding its defense mechanisms. The absence of RNLs aligns with patterns in some asterid families. Future research must pivot from cataloging to functional characterization:
This integrated approach will not only advance plant immunity research but also offer strategies for sustainable cultivation and metabolic engineering of this economically vital medicinal plant.
Within the context of a broader thesis on the genome-wide identification of the NBS-LRR gene family in Salvia miltiorrhiza (Danshen), access to comprehensive and current genomic resources is paramount. This technical guide provides an in-depth overview of the publicly available genomes, transcriptomes, and databases essential for conducting such research, which is critical for researchers, scientists, and drug development professionals aiming to understand the genetic basis of disease resistance and secondary metabolite biosynthesis.
Multiple genome assemblies for S. miltiorrhiza provide the foundational scaffold for gene family identification and evolutionary studies.
Table 1: Available Genome Assemblies for Salvia miltiorrhiza
| Assembly Name / Accession | Release Year | Sequencing Technology | Estimated Size (Gb) | Contig N50 (kb) | Scaffold N50 (Mb) | Number of Predicted Genes | Primary Database/Platform |
|---|---|---|---|---|---|---|---|
| CRA000217 (Bunge) | 2010 | Sanger, BAC-by-BAC | ~0.641 | 38.5 | 1.01 | 30,688 | NGDC (China) |
| ASM165373v1 (v1.0) | 2015 | Illumina HiSeq 2000 | 0.538 | 26.4 | 0.56 | 34,598 | Ensembl Plants |
| ASM1812588v1 (v2.0) | 2022 | PacBio, Hi-C | 0.621 | 3,054 | 40.5 | 34,483 | NCBI, BIG Data Center |
| Danshen v3.0 | 2023/2024 | PacBio, Hi-C | ~0.62 | >4,000 | Chromosome-level | ~34,500 | Unpublished/Cited in recent studies |
Note: The Danshen v3.0 assembly represents the most recent, near-complete, chromosome-scale genome, crucial for accurate gene localization and NBS-LRR family analysis.
Transcriptomes provide evidence for gene expression, alternative splicing, and are vital for gene annotation.
Table 2: Key Transcriptomic Datasets for S. miltiorrhiza
| Tissue/Condition | SRA Accession Examples | Platform | Key Application in NBS-LRR Research |
|---|---|---|---|
| Root (Periderm, Phloem, Xylem) | SRR21713602-SRR21713607 | Illumina | Tissue-specific expression profiling of resistance genes. |
| Hairy Roots (MeJA/Elicitor treated) | SRR10149931, SRR10336445 | Illumina HiSeq | Identifying defense-responsive NBS-LRR genes. |
| Leaves, Stems, Flowers | SRR10951294-SRR10951297 | Illumina | Understanding systemic defense signaling. |
| Infected/Stress-treated samples | SRR13220031, SRR13220032 | Illumina | Direct identification of pathogen-responsive R genes. |
Integrated databases provide analytical tools and curated information beyond raw sequence data.
Table 3: Essential Databases for S. miltiorrhiza Genomics
| Database Name & URL | Core Features Relevant to NBS-LRR Identification |
|---|---|
| NCBI S. miltiorrhiza Genome Data (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_018125885.1/) | Primary repository for genome assembly v2.0; used for BLAST, genome browser viewing, and data download. |
| BIG Data Center (https://ngdc.cncb.ac.cn/search/?dbId=bioproject&q=PRJCA002312) | Hosts the chromosome-level CRA008113 genome; offers GVM browser for visualization. |
| S. miltiorrhiza Genome Database (SMGDB) (http://salvia.mpsd.org/) | Legacy database. Contains genome v1.0, BLAST, expression heatmaps, and pathway tools. Useful for historical comparisons. |
| Plant Genomics Database (PGD) (http://www.plantgdb.org/SmGDB/) | Legacy resource. Provides genome context views, EST clusters, and gene families. |
| TCM Gene Database (TCM-Gene) (http://tcm.nbscn.org/) | Integrates genomic data with traditional Chinese medicine information; useful for linking genes to traits. |
Protocol 1: Genome-Wide Identification Using HMMER and BLASTP
*.faa) from NCBI or BIG Data Center.hmmsearch). Command: hmmsearch --domtblout outfile.domtbl Pfam-A.hmm protein.fasta.Protocol 2: Transcriptomic Validation via RNA-seq Analysis
prefetch and fasterq-dump from the SRA Toolkit.NBS-LRR Identification Research Workflow
Simplified Defense Signaling Involving NBS-LRR Genes
Table 4: Essential Materials for NBS-LRR Gene Family Studies in S. miltiorrhiza
| Item / Reagent Category | Specific Example/Product | Function in Research |
|---|---|---|
| Reference Genome | S. miltiorrhiza assembly v2.0 (GCF_018125885.1) or v3.0 | Primary sequence scaffold for gene prediction, localization, and synteny analysis. |
| HMM Profile Database | Pfam (Pfam-A.hmm) | Contains hidden Markov models for NB-ARC and other domains for sensitive gene identification. |
| Sequence Alignment Tool | HMMER (v3.3.2) | Executes profile HMM searches against the proteome. |
| Local BLAST Suite | NCBI BLAST+ (v2.13.0) | Performs homology-based searches using known NBS-LRR queries. |
| Domain Analysis Tool | NCBI CD-Search Tool / SMART | Verifies domain architecture and order in candidate proteins. |
| Phylogenetic Software | MEGA (v11), IQ-TREE (v2.2.0) | Constructs phylogenetic trees to classify and analyze NBS-LRR gene evolution. |
| RNA-seq Analysis Pipeline | HISAT2 (v2.2.1), featureCounts (v2.0.3), DESeq2 (R Bioconductor) | Aligns reads, quantifies expression, and identifies differentially expressed genes. |
| Plant Growth Elicitor | Methyl Jasmonate (MeJA), Salicylic Acid (SA) | Used in experiments to treat plant materials and induce defense-related gene expression for validation. |
| PCR/QPCR Reagents | High-Fidelity DNA Polymerase (e.g., Phusion), SYBR Green qPCR Master Mix | For cloning gene sequences and validating RNA-seq expression patterns via qRT-PCR. |
The continuous advancement in S. miltiorrhiza genomic resources, particularly the latest chromosome-level assemblies and extensive transcriptomic datasets, has created a robust foundation for sophisticated genome-wide analyses. For researchers focused on the NBS-LRR gene family, leveraging these resources with the outlined experimental protocols and tools enables precise identification, evolutionary characterization, and functional inference of these critical disease resistance genes, directly contributing to the genetic improvement and sustainable cultivation of this valuable medicinal plant.
Within the broader thesis on the genome-wide identification of the NBS-LRR gene family in Salvia miltiorrhiza (danshen), the systematic in silico retrieval and validation of candidate sequences is a critical foundational step. This guide details a robust, reproducible pipeline employing profile hidden Markov models (HMMER) and sequence similarity searches (BLAST) to identify putative NBS-LRR resistance genes from genomic or transcriptomic data. The methodology is designed for researchers and scientists aiming to catalog and characterize this economically and pharmacologically important gene family in medicinal plants, with downstream applications in marker-assisted breeding and understanding plant defense mechanisms relevant to drug development.
Protocol: Genome/Transcriptome Assembly Retrieval
Sm_genome.fa) and the corresponding structural annotation file (GFF3 format, Sm_annotation.gff3).gffread (from Cufflinks package) or a custom script, extract all protein-coding sequences (CDS) and translate them into a protein sequence fasta file (Sm_proteome.fa).Protocol: HMMER Search with Pfam NBS-LRR Profiles
NB-ARC (Pfam: PF00931): Central nucleotide-binding domain.TIR (Pfam: PF01582): N-terminal domain specific to TIR-NBS-LRR (TNL) class.RPW8 (Pfam: PF05659): N-terminal domain specific to some CC-NBS-LRR (CNL) class.LRR_1 (Pfam: PF00560): Leucine-rich repeat C-terminal domain.hmmsearch from the HMMER suite to scan the S. miltiorrhiza proteome.Table 1: Example HMMER Search Results (Cutoff E-value = 1e-5)
| Pfam Domain | # Significant Hits in S. miltiorrhiza Proteome | Average Hit Score | Typical Domain Coverage |
|---|---|---|---|
| NB-ARC (PF00931) | 127 | 185.7 | >80% |
| TIR (PF01582) | 42 | 95.3 | 60-90% |
| RPW8 (PF05659) | 18 | 67.2 | 50-80% |
| LRR_1 (PF00560) | 89 | 45.8 | 30-70% (multiple repeats) |
Protocol: BLASTp against a Curated Plant R-Gene Database
Table 2: BLAST Validation Metrics for Top Candidate Classes
| Candidate Class | # Candidates | Avg. % Identity to Best Hit | Avg. Query Coverage | Typical Top Hit Species |
|---|---|---|---|---|
| TNL | 38 | 52.7% | 78% | Solanum lycopersicum |
| CNL | 71 | 48.2% | 82% | Arabidopsis thaliana |
| RNL (RPW8-NB-LRR) | 15 | 41.5% | 65% | Nicotiana benthamiana |
Protocol: Integrated Domain Analysis with MAST and Motif Scanning
MAST (from MEME suite) to search the candidate sequences with the HMMs to visualize the order and spacing of NB-ARC, TIR/RPW8/CC, and LRR domains.GLPLA) and kinase-3a (MHD) motifs within the NB-ARC domain using motif alignment or regular expression search. Variations in the MHD motif (e.g., MHE, MHV) are noted for functional prediction.Workflow for Systematic NBS-LRR Identification
Canonical NBS-LRR Domain Architecture
Table 3: Essential Computational Tools & Resources
| Tool/Resource | Primary Function | Key Parameter/Note |
|---|---|---|
| HMMER (v3.3+) | Profile HMM search for domain detection. | Use hmmsearch with curated Pfam HMMs; critical E-value cutoff (~1e-5). |
| BLAST+ (v2.12+) | Local sequence similarity search for validation. | BLASTp for proteins; use low E-value (1e-10) and assess coverage/identity. |
| Pfam Database | Repository of protein family HMMs. | Source NB-ARC (PF00931), TIR (PF01582), LRR_1 (PF00560), RPW8 (PF05659). |
| MEME/MAST Suite | Motif-based sequence analysis and domain ordering. | MAST aligns HMMs to sequences for architecture visualization. |
| Cufflinks/gffread | Manipulation of GFF annotations and sequence extraction. | Extract CDS from genome using annotation. |
| Custom Python/R Scripts | Pipeline automation, parsing HMMER/BLAST outputs, visualization. | Essential for batch processing and generating summary tables. |
| Curated R-Gene Database | Custom collection of reference NBS-LRR sequences. | Manually compiled from UniProt/NCBI of model plants; gold standard for BLAST. |
| S. miltiorrhiza Genome (v2.0) | Reference sequence for candidate retrieval. | Provides genomic context and enables primer design for downstream PCR validation. |
1. Introduction Within the context of a genome-wide identification of the NBS-LRR gene family in Salvia miltiorrhiza (Danshen), accurate classification of members into subfamilies—Toll/Interleukin-1 receptor (TNL), Coiled-coil (CNL), and RPW8 (RNL)—is a critical step. This classification informs hypotheses regarding gene function, evolutionary trajectory, and potential roles in the plant's defense mechanism, which is of direct relevance to professionals studying medicinal plant immunity and secondary metabolite production.
2. Structural Domains and Classification Criteria NBS-LRR proteins are characterized by a central nucleotide-binding site (NBS) domain and C-terminal leucine-rich repeats (LRR). Subfamily distinction is primarily based on the N-terminal domain.
Table 1: Core Characteristics of NBS-LRR Subfamilies
| Feature | TNL | CNL | RNL (Helper) |
|---|---|---|---|
| N-terminal Domain | Toll/Interleukin-1 receptor (TIR) | Coiled-coil (CC) | RPW8-like CC |
| Signaling Pathway | EDS1-PAD4/ SAG101 → Helper RNLs | NRG1/ADR1 (Helper RNLs) | Acts as common signaling node |
| Typical Effector Recognition | Direct or indirect via TIR-NBS (TN) proteins | Direct or indirect via CC-NBS (CN) proteins | Non-recognition; signaling amplification |
| Key Motifs in NBS Domain | RNBS-A (Kinase-1a: GxPGSGKT), RNBS-B (Kinase-2: FLHACF), RNBS-C (GLPL), RNBS-D (MHD) | RNBS-A (GxPGSGKTT), RNBS-B (FLHIACF), RNBS-C (GLPL), RNBS-D (MHD) | Divergent motifs; often "MHD" variant |
| Representative in A. thaliana | RPS4, RPP1 | RPS2, RPM1 | NRG1, ADR1 |
| Predicted Prevalence in S. miltiorrhiza | ~40% of NBS-LRRs | ~55% of NBS-LRRs | ~5% of NBS-LRRs |
3. Experimental Protocols for Classification
3.1. In Silico Identification and Domain Analysis
3.2. Motif-Based Validation
3.3. Structural Prediction (Advanced Validation)
4. Signaling Pathways in Plant Immunity
Diagram 1: NBS-LRR Signaling Pathways (73 chars)
5. The Scientist's Toolkit: Research Reagent Solutions
Table 2: Essential Reagents for NBS-LRR Classification & Functional Study
| Reagent/Material | Function in Research |
|---|---|
| PFAM HMM Profiles (PF00931, PF01582, PF05659) | Database of hidden Markov models for identifying NBS, TIR, and RPW8 domains in protein sequences. |
| Reference Protein Sequences (e.g., from TAIR) | Curated sequences from model plants (A. thaliana) used as benchmarks for phylogenetic clustering and motif analysis. |
| MEME Suite Software | Discovers conserved, ungapped motifs (blocks) in protein sequences to validate domain architecture and classify subfamilies. |
| DeepCoil / Ncoils Algorithm | Predicts coiled-coil domains with high specificity, crucial for distinguishing CNL from RNL subfamilies. |
| AlphaFold2 Protein Structure Database | Provides predicted protein structures for unknown N-terminal domains, aiding in visual classification and functional hypothesis generation. |
| Gene-Specific Primers (for S. miltiorrhiza NBS-LRRs) | Used for PCR amplification and cloning of candidate genes for downstream validation (e.g., subcellular localization, functional assays). |
| Anti-TAG Antibodies (e.g., Anti-GFP, Anti-FLAG) | For detecting tagged recombinant NBS-LRR proteins expressed in transient transformation systems (e.g., Nicotiana benthamiana). |
6. Workflow for Genome-Wide Classification
Diagram 2: NBS-LRR Classification Workflow (42 chars)
7. Conclusion Precise distinction between TNL, CNL, and RNL subfamilies via integrated bioinformatics and experimental protocols is foundational. For Salvia miltiorrhiza research, this enables the development of targeted functional studies to link specific NBS-LRR classes to disease resistance traits, potentially guiding strategies to enhance the yield and stability of bioactive compounds for drug development.
This whitepaper details the genomic distribution patterns of the Nucleotide-Binding Site-Leucine-Rich Repeat (NBS-LRR) gene family within the medicinal plant Salvia miltiorrhiza (Danshen). As a cornerstone of plant innate immunity, understanding the chromosomal localization, clustering, and duplication events of NBS-LRR genes is critical for elucidating disease resistance mechanisms and guiding genetic improvement for drug development.
Recent genome assembly (v3.0) reveals that NBS-LRR genes are non-randomly distributed across the eight chromosomes of S. miltiorrhiza.
Table 1: Chromosomal Distribution of NBS-LRR Genes in S. miltiorrhiza
| Chromosome | Total Genes | NBS-LRR Genes | Density (genes/Mb) | Notable Clusters |
|---|---|---|---|---|
| Chr1 | ~8,200 | 15 | 0.93 | Cluster A (3 genes) |
| Chr2 | ~7,800 | 22 | 1.45 | Cluster B (5 genes) |
| Chr3 | ~7,500 | 18 | 1.24 | - |
| Chr4 | ~6,900 | 8 | 0.72 | - |
| Chr5 | ~7,100 | 25 | 1.82 | Cluster C (7 genes) |
| Chr6 | ~6,500 | 12 | 0.98 | Cluster D (4 genes) |
| Chr7 | ~6,700 | 10 | 0.81 | - |
| Chr8 | ~6,000 | 9 | 0.79 | - |
| Total/ Avg | ~56,700 | 119 | 1.09 | 4 Major Clusters |
NBS-LRR genes frequently reside in clusters, primarily driven by tandem duplication events. A cluster is defined as ≥3 NBS-LRR genes within a 200 kb genomic region.
Table 2: Major Tandem Duplication Clusters of NBS-LRR Genes
| Cluster ID | Chromosome | Locus Range (Mb) | Number of Genes | Predicted Duplication Events | Ka/Ks Range |
|---|---|---|---|---|---|
| Cluster A | Chr1 | 12.4 - 12.7 | 3 | 2 | 0.12 - 0.25 |
| Cluster B | Chr2 | 25.1 - 25.4 | 5 | 3 | 0.08 - 0.31 |
| Cluster C | Chr5 | 18.8 - 19.3 | 7 | 5 | 0.10 - 0.45 |
| Cluster D | Chr6 | 14.5 - 14.7 | 4 | 2 | 0.15 - 0.28 |
Ka/Ks < 1 indicates strong purifying selection, suggesting functional conservation under evolutionary pressure.
Objective: Identify all NBS-LRR genes and map their chromosomal positions. Methodology:
Objective: Identify genes formed via tandem duplication within clusters. Methodology:
Workflow for NBS-LRR Genomic Distribution Analysis
Tandem Duplication and Cluster Formation on a Chromosome
Table 3: Essential Reagents and Tools for NBS-LRR Genomic Analysis
| Item/Category | Specific Product/Example | Function in Research |
|---|---|---|
| Genome Database | DanShenBase (v3.0), NCBI S. miltiorrhiza Assembly | Provides the reference genome sequence and structural annotation for gene mining and localization. |
| HMM Profile Library | Pfam (NB-ARC: PF00931; LRR profiles) | Curated protein family models for sensitive domain-based identification of NBS-LRR genes. |
| Sequence Analysis Suite | HMMER 3.3.2, MEME Suite, InterProScan | Executes HMM searches, discovers conserved motifs, and provides integrated domain architecture analysis. |
| Phylogenetic & Selection Analysis | MEGA11, KaKs_Calculator 3.0 | Constructs evolutionary trees to infer duplication relationships and calculates Ka/Ks ratios to assess selection pressure. |
| Genomic Visualization & Scripting | TBtools, R/Bioconductor (GenomicRanges, ggplot2), Python (BioPython) | Maps genes to chromosomes, defines clusters, automates analysis, and generates publication-quality figures. |
| PCR & Cloning Reagents | High-Fidelity DNA Polymerase (e.g., Phusion), TA/Blunt-End Cloning Kits | Validates gene presence/absence polymorphisms (GAPs) within clusters and clones alleles for functional study. |
| qPCR Reagents | SYBR Green Master Mix, Gene-Specific Primers | Quantifies expression levels of tandemly duplicated genes under pathogen/pathogen elicitor treatment. |
1. Introduction
This technical guide details the bioinformatic and experimental methodologies for the conserved domain and motif analysis of nucleotide-binding site leucine-rich repeat (NBS-LRR) genes. The procedures are framed within a genome-wide identification study of the NBS-LRR gene family in the medicinal plant Salvia miltiorrhiza (Danshen). Accurate identification of the canonical NBS, LRR, and variable N-terminal (Coiled-Coil or TIR) domains is critical for classifying resistance (R) genes, inferring function, and understanding their role in plant defense signaling, which directly impacts the biosynthesis of valuable pharmaceutical compounds.
2. Core Domain Architectures and Quantitative Analysis
The NBS-LRR family in plants is subdivided based on the N-terminal domain. The two primary classes are CNL (Coiled-Coil-NBS-LRR) and TNL (TIR-NBS-LRR). A third, less common class, RNL (RPW8-NBS-LRR), also exists. A genome-wide scan of the S. miltiorrhiza genome (v2.0) typically yields the following distribution, which should be summarized in a table as below.
Table 1: Typical Distribution of NBS-LRR Genes in Salvia miltiorrhiza
| Class | N-terminal Domain | Key Motif Signatures | Approximate Number in S. miltiorrhiza | Percentage |
|---|---|---|---|---|
| CNL | Coiled-Coil (CC) | P-loop, RNBS-A, RNBS-B, GLPL, RNBS-C, RNBS-D, MHD, LRR | ~60 | ~55% |
| TNL | TIR | TIR domain, P-loop, RNBS-A-D, MHD, LRR | ~45 | ~41% |
| RNL/Other | RPW8 or None | Variable | ~4 | ~4% |
| Total | ~109 | 100% |
3. Bioinformatics Pipeline for Identification
3.1. Sequence Retrieval and Initial Scan
3.2. Domain Architecture Validation
3.3. Phylogenetic Classification
4. Experimental Validation Protocols
4.1. Reverse Transcription PCR (RT-PCR) for Gene Expression
4.2. Subcellular Localization (For Candidate R Genes)
5. Signaling Pathway Context in S. miltiorrhiza
The identified NBS-LRR genes function within conserved defense pathways. TNLs often signal through ENHANCED DISEASE SUSCEPTIBILITY 1 (EDS1) and NONEXPRESSOR OF PR GENES 1 (NPR1), while CNLs typically use NDR1/HIN1-like (NHL) proteins. These converge on systemic acquired resistance (SAR), influencing the production of bioactive compounds like tanshinones and phenolic acids.
Diagram 1: NBS-LRR Signaling in S. miltiorrhiza Defense
6. Research Reagent Solutions
Table 2: Essential Research Toolkit for NBS-LRR Analysis
| Item | Function / Purpose | Example Product/Kit |
|---|---|---|
| Plant Material | Source of genomic DNA and RNA for identification and expression studies. | Salvia miltiorrhiza Bunge cultivar. |
| Genome Database | Reference for sequence retrieval and homology searches. | S. miltiorrhiza DanSenome (v2.0), NCBI Genome. |
| HMM Profile Database | Curated domain models for sensitive sequence identification. | Pfam (NB-ARC, TIR, LRR profiles). |
| HMMER Software | Executes profile HMM searches against sequence databases. | HMMER 3.3.2. |
| InterProScan | Integrates multiple databases for protein domain classification. | InterProScan 5.61-93.0. |
| Motif Discovery Suite | Identifies conserved, ungapped sequence motifs. | MEME Suite 5.5.2. |
| Phylogeny Software | Constructs evolutionary trees for classification. | MEGA11, IQ-TREE. |
| RT-PCR Kit | Converts RNA to cDNA and amplifies gene-specific fragments. | PrimeScript RT Reagent Kit, TB Green Premix Ex Taq. |
| Cloning Vector | For constructing GFP fusions for localization studies. | pCAMBIA1300-GFP. |
| Agrobacterium Strain | Mediates transient transformation in N. benthamiana. | A. tumefaciens GV3101. |
7. Experimental Workflow Diagram
Diagram 2: NBS-LRR Identification & Validation Workflow
This technical guide details a bioinformatic pipeline for the genome-wide identification of disease resistance (R) genes, with specific application to the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen). The identification of these genes is a critical component of a broader thesis aimed at understanding the genetic basis of disease resistance in this economically and medicinally important plant, ultimately informing breeding programs and pharmaceutical development focused on enhancing plant vigor and metabolite production.
The initial step involves acquiring high-quality genomic and protein sequence data for Salvia miltiorrhiza.
Experimental Protocol (Data Retrieval):
.fa or .fna extension).A targeted search begins with curating a set of known NBS-LRR protein domains to create a profile Hidden Markov Model (HMM) library.
Experimental Protocol (HMM Library Construction):
hmmbuild (from HMMER suite) to build individual HMM profiles from each seed alignment.hmmpress.Table 1: Core PFAM Domains for NBS-LRR Identification
| PFAM Accession | Domain Name | Typical e-value Cutoff | Primary Function in R-Gene |
|---|---|---|---|
| PF00931 | NB-ARC | 1e-10 | Nucleotide binding & regulatory switch |
| PF01582 | TIR | 1e-5 | Signaling domain (TNL class) |
| PF05659 | RPW8 | 1e-3 | Downstream signaling (some CNLs) |
| PF00560 | LRR_1 | 1e-3 | Pathogen recognition specificity |
| PF13855 | LRR_8 | 1e-3 | Pathogen recognition specificity |
The custom HMM library is used to scan the S. miltiorrhiza proteome.
Experimental Protocol (HMMER Scan):
hmmscan with the custom HMM library against the entire predicted proteome.
output.domtblout result file to identify protein sequences that contain at least one significant hit to the NB-ARC domain (PF00931, e-value < 1e-10).Table 2: Example HMMER Scan Results for S. miltiorrhiza Proteome
| Candidate Protein ID | NB-ARC Hit (e-value) | TIR Hit (e-value) | LRR Hit (e-value) | Putative Class |
|---|---|---|---|---|
| Smil_001734 | 2.5e-45 | 3.2e-12 | 1.8e-6 | TNL |
| Smil_005892 | 8.9e-52 | Not Detected | 4.1e-8 | CNL |
| Smil_003217 | 1.1e-40 | Not Detected | Not Detected | NBS-only |
Candidate sequences are subjected to rigorous domain analysis to confirm architecture.
Experimental Protocol (Domain Validation):
pfam_scan.pl tool against the local PFAM database to corroborate domain findings.Genome-Wide R-Gene Identification Pipeline
Following identification, standard in silico analyses characterize the gene family.
Experimental Protocol (Downstream Analyses):
Downstream Bioinformatics Analysis Workflow
Table 3: Essential Computational Tools and Resources
| Tool/Resource Name | Function in Pipeline | Key Parameter/Note |
|---|---|---|
| HMMER Suite (v3.3) | Profile HMM searches and building | Critical --domtblout flag for parsable output; e-value cutoff is key. |
| PFAM Database (v35.0) | Curated collection of protein domain HMMs | Source for seed alignments (NB-ARC, TIR, LRR). |
| SMART Web Service | Online domain architecture analysis | Set to "normal" mode to include low-complexity regions. |
| PFAM Scan Script | Local domain validation against PFAM | Ensures consistency and allows batch processing. |
| MEME Suite (v5.4.1) | Discovery of conserved protein motifs | Used to characterize non-canonical conserved regions. |
| MEGA11 / IQ-TREE2 | Phylogenetic tree construction | Bootstrap values >70% generally indicate robust clades. |
| S. miltiorrhiza Genome Assembly (v2.0) | Reference sequence and annotation | Quality of identification depends directly on assembly quality. |
| Biopython Library | Python scripts for parsing HMMER/GFF files | Essential for automating filtering and data integration steps. |
This technical guide outlines the methodological framework for phylogenetic tree construction, framed within the critical need to decipher the evolutionary relationships of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen). The identification and characterization of this expansive resistance (R) gene family at a genome-wide scale are foundational for understanding the plant's innate immune system. Constructing robust phylogenies of NBS-LRR genes is essential for classifying gene subfamilies (TNLs, CNLs, RNLs), inferring evolutionary processes (e.g., tandem duplication, birth-and-death evolution), and facilitating cross-species comparisons to identify orthologs and conserved functional motifs. This guide details the computational and statistical pipelines used to transform raw sequence data into evolutionary hypotheses, directly supporting broader research aims in plant immunity and the biosynthetic pathways of pharmacologically active compounds.
The standard workflow progresses from sequence curation to tree evaluation.
Title: Phylogenetic Analysis Workflow for NBS-LRR Genes
Table 1: Core Phylogenetic Inference Methods
| Method | Principle | Software Tools | Best Use Case for NBS-LRR |
|---|---|---|---|
| Maximum Parsimony (MP) | Minimizes total evolutionary changes (steps). | PAUP*, MEGA, PHYLIP | Initial exploration of closely related gene clades. |
| Distance-Matrix (NJ/UPGMA) | Uses pairwise genetic distances to build tree. | MEGA, PHYLIP, BioNJ | Large datasets (>1000 sequences) for initial clustering. |
| Maximum Likelihood (ML) | Finds tree maximizing probability of observed data under a model. | IQ-TREE, RAxML, PhyML | Standard method for robust, model-based inference. |
| Bayesian Inference (BI) | Estimates posterior probability of tree using models & priors. | MrBayes, BEAST2 | Dating divergence events, complex model integration. |
Detailed ML Protocol (using IQ-TREE):
Sm_NBS_LRR.phy).iqtree -s Sm_NBS_LRR.phy -m MFP -bb 1000 -alrt 1000. This selects the best-fit substitution model (e.g., LG+F+R10) and performs both ultrafast bootstrap (1000 replicates) and SH-aLRT test.Sm_NBS_LRR.phy.treefile) is produced with branch support values.Detailed Bayesian Protocol (using MrBayes block in a Nexus file):
Run the analysis until the average standard deviation of split frequencies falls below 0.01, indicating convergence.
Title: Key Phylogenetic Tree Evaluation Metrics
Table 2: Essential Resources for Phylogenetic Analysis of Plant NBS-LRR Genes
| Item / Resource | Function / Purpose | Example / Source |
|---|---|---|
| Reference Genome & Annotation | Provides the foundational sequence data for gene family identification. | Salvia miltiorrhiza Genome (NCBI BioProject: PRJNA72695) |
| Domain Profile Hidden Markov Models (HMMs) | Sensitive detection of NBS and LRR domains in protein sequences. | Pfam (NB-ARC: PF00931; LRR profiles) |
| Multiple Sequence Alignment Software | Aligns homologous sequences for phylogenetic analysis. | MAFFT, Clustal Omega, MUSCLE |
| Model Selection Tool | Identifies best-fit substitution model for likelihood methods. | ModelFinder (in IQ-TREE), jModelTest2 |
| Phylogenetic Inference Software | Core engine for tree building under different statistical criteria. | IQ-TREE, MrBayes, RAxML-NG |
| High-Performance Computing (HPC) Cluster | Provides necessary CPU power for ML/BI analyses of large gene families. | Local university cluster, Cloud computing (AWS, GCP) |
| Tree Visualization & Annotation Platform | Enables interpretation, formatting, and publication-quality figure generation. | iTOL, ggtree (R), FigTree |
Phylogenetic trees serve as scaffolds for advanced analyses:
Title: Phylogeny as a Scaffold for Integrated Analysis
1. Introduction This technical guide is presented within the framework of a genome-wide identification study of the Nucleotide-Binding Site-Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen). NBS-LRR genes are critical for plant disease resistance, and their expression is tightly regulated by promoter cis-elements in response to biotic/abiotic stresses and hormone signals. Analyzing these regulatory motifs is essential for understanding the defense mechanisms of this economically important medicinal plant and for guiding metabolic engineering for enhanced production of bioactive compounds like tanshinones.
2. Core Cis-Elements in Plant Stress and Hormone Signaling Based on current literature and plant cis-element databases (e.g., PlantCARE, PLACE), key motifs relevant to S. miltiorrhiza NBS-LRR promoters are summarized below.
Table 1: Key Stress-Responsive and Hormone-Related Cis-Elements
| Cis-Element Name | Core Sequence | Predicted Function | Associated Signal |
|---|---|---|---|
| W-box | (T)TGAC(C/T) | Binding site for WRKY transcription factors | Pathogen response, SA signaling |
| G-box | CACGTG | Light, ABA, JA, and stress responses | ABA, JA, oxidative stress |
| ABRE | ACGTG(G/T)C | ABA-responsive element | Abscisic Acid (ABA) |
| TCA-element | CCATCTTTTT | Salicylic Acid responsiveness | Salicylic Acid (SA) |
| TGACG-motif | TGACG | Jasmonic Acid responsiveness | Jasmonic Acid (JA) |
| ERE | AWTTCAAA | Ethylene responsiveness | Ethylene |
| AuxRR-core | GGTCCAT | Auxin responsiveness | Auxin |
| DRE/CRT | (A/G)CCGAC | Dehydration/Cold responsiveness | Abiotic stress (drought, cold, salt) |
| MYB/MYC | (C/T)AAC(T/G)G; CACATG | Binding sites for MYB/MYC TFs | Drought, ABA, JA |
| AS-1 | TGACG | Oxidative and pathogen stress | SA, JA, H2O2 |
3. Experimental Protocol: Promoter Cis-Element Analysis Pipeline This protocol details the steps from gene identification to motif validation.
3.1. In Silico Identification and Extraction of Promoter Sequences
bedtools getfasta) with a GFF3 annotation file to extract these sequences from the whole-genome FASTA file.>GeneID_promoter.3.2. Computational Prediction of Cis-Elements
plantcare_scan function in R/Bioconductor.3.3. Experimental Validation: Electrophoretic Mobility Shift Assay (EMSA)
Diagram Title: Cis-Element Analysis & Validation Workflow
4. Signaling Pathways Involving Predicted Motifs The predicted cis-elements integrate NBS-LRR genes into complex signaling networks.
Diagram Title: Stress/Hormone Signals to Gene Activation Pathway
5. The Scientist's Toolkit: Research Reagent Solutions
Table 2: Essential Reagents and Kits for Promoter Analysis
| Item | Function/Application | Example/Note |
|---|---|---|
| Genomic DNA Isolation Kit | High-quality gDNA extraction for promoter PCR. | DNeasy Plant Kits (QIAGEN). |
| High-Fidelity PCR Enzyme | Accurate amplification of promoter sequences from gDNA. | Phusion or KAPA HiFi Polymerase. |
| PlantCARE/PLACE Database | Core resource for in silico cis-element scanning. | Freely accessible web servers. |
| Biotin 3' End DNA Labeling Kit | For labeling EMSA probes. | Pierce Biotin 3' End DNA Labeling Kit. |
| Chemiluminescent Nucleic Acid Detection Module | Detection of biotinylated probes in EMSA. | Thermo Scientific Pierce. |
| Nuclear Extraction Kit | Isolation of nuclear proteins containing TFs for EMSA. | Plant Nuclei Isolation/Extraction Kits (e.g., from Sigma). |
| Mobility Shift Binding Buffer | Optimized buffer for TF-DNA binding reactions. | Often included in EMSA kits or prepared as 10X stock. |
| Polyacrylamide Gel Electrophoresis System | Separation of protein-DNA complexes from free probe. | Mini-PROTEAN Tetra System (Bio-Rad). |
| Positively Charged Nylon Membrane | Immobilization of EMSA complexes for detection. | Hybond-N⁺ membrane. |
| Hormone/Stress Elicitors | For treating plant materials to induce TF expression. | Methyl Jasmonate (MeJA), Salicylic Acid (SA), NaCl, PEG. |
This whitepaper provides a technical guide for analyzing gene structure and the resulting protein properties, framed within a genome-wide identification study of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen). NBS-LRR genes are central to plant innate immunity, and their characterization is critical for understanding disease resistance mechanisms and for potential drug development from this medicinal plant. Precise analysis of exon-intron architecture and derived protein physicochemical properties forms the foundational step in such genome-wide studies, enabling the classification of gene subfamilies and prediction of functional domains.
The structure of a eukaryotic gene is characterized by exons (expressed sequences) and introns (intervening sequences). In NBS-LRR genes, this architecture is highly informative:
Primary protein sequences translated from coding sequences (CDS) are analyzed for inherent properties:
Objective: To identify all NBS-LRR genes in the S. miltiorrhiza genome and delineate their exon-intron structures. Protocol:
hmmsearch --domtblout output.txt pfam.hmm proteome.faa.gggenes R package. Input the genomic coordinates and exon/intron positions from the GFF3 file to generate visual comparisons.Objective: To compute key physical and chemical parameters for the identified NBS-LRR proteins. Protocol:
protr R package or the Bio.SeqUtils module in Biopython) in batch mode.Table 1: Summary of Exon-Intron Structure in S. miltiorrhiza NBS-LRR Genes
| NBS-LRR Subfamily | Number of Genes | Average Exon Count (Range) | Average Gene Length (bp) | Conserved Intron Phase Pattern |
|---|---|---|---|---|
| TNL (TIR-NBS-LRR) | ~45* | 4.2 (3-6)* | 3450* | Phase 2 after TIR, Phase 0 before LRR* |
| CNL (CC-NBS-LRR) | ~68* | 3.1 (2-5)* | 2850* | Phase 0 dominant* |
| RNL (RPW8-NBS-LRR) | ~12* | 2.8 (2-4)* | 2500* | Variable* |
| Total | ~125* | 3.4 | ~3000* |
Table 2: Computed Physicochemical Properties of Representative S. miltiorrhiza NBS-LRR Proteins
| Gene ID | Subfamily | AA Length | Mol. Weight (kDa) | Theoretical pI | Instability Index | Aliphatic Index | GRAVY | Pred. Localization |
|---|---|---|---|---|---|---|---|---|
| SmNLR001* | TNL | 950* | 108.5* | 6.2* | 38.5 (Stable)* | 85.2* | -0.25* | Cytoplasm* |
| SmNLR045* | CNL | 820* | 93.8* | 8.1* | 45.1 (Unstable)* | 91.5* | -0.12* | Chloroplast* |
| SmNLR112* | RNL | 710* | 81.3* | 5.8* | 40.2 (Stable)* | 78.9* | -0.31* | Nucleus* |
Example data based on typical results; actual values require live genome analysis.
Table 3: Essential Materials for NBS-LRR Gene Analysis
| Item/Category | Specific Example/Product | Function in Analysis |
|---|---|---|
| Genomic Data | Salvia miltiorrhiza v2.0 Genome (NCBI) | Reference sequence for identification and mapping. |
| HMM Profiles | Pfam NB-ARC (PF00931), LRR (PF07725) | Curated domain models for sensitive sequence searching. |
| Bioinformatics Suites | HMMER 3.3, Biopython, R (tidyverse, gggenes) | Core software for sequence analysis, parsing, and visualization. |
| Sequence Analysis Web Tools | GSDS 2.0, ExPASy ProtParam, WoLF PSORT | User-friendly platforms for structure drawing and property calculation. |
| Validation Reagents (Wet-Lab) | Phire Plant Direct PCR Kit (Thermo Fisher) | For PCR amplification of candidate genes from S. miltiorrhiza gDNA. |
| Cloning & Expression Vectors | pEASY-Blunt Cloning Vector; pCAMBIA1300-GFP | For sequence verification and subcellular localization assays (transient expression). |
| Positive Control Sequences | Arabidopsis RPP1 (TNL) or RPM1 (CNL) CDS | Well-characterized NBS-LRR genes for alignment and analysis comparison. |
This technical guide outlines methodologies for analyzing RNA-Seq data to characterize expression patterns of Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) genes in Salvia miltiorrhiza (Danshen). Within the broader thesis context of genome-wide identification of the NBS-LRR family, this document provides protocols for investigating tissue-specific expression and transcriptional responses to biotic stress, crucial for understanding disease resistance mechanisms in this medicinal plant.
NBS-LRR genes constitute the largest class of plant disease resistance (R) genes. In S. miltiorrhiza, a plant valued for its roots containing bioactive tanshinones and phenolic acids, identifying and characterizing these genes is vital for breeding resilient cultivars. RNA-Seq expression profiling bridges genome-wide identification and functional validation, revealing which NBS-LRR genes are active in specific tissues or induced by pathogen/elicitor challenges.
A robust design is critical for meaningful comparative expression analysis.
2.1 Tissue-Specific Profiling:
2.2 Pathogen/Elicitor Treatment Profiling:
Protocol: Total RNA is extracted using a modified CTAB method with DNase I treatment. RNA integrity (RIN > 8.0) is verified via Bioanalyzer. Strand-specific cDNA libraries are prepared using the Illumina TruSeq Stranded mRNA LT Sample Prep Kit. Sequencing is performed on an Illumina NovaSeq 6000 platform for 150 bp paired-end reads, targeting ~40 million reads per sample.
RNA-Seq Analysis Workflow Quality control, alignment, quantification, and differential expression analysis.
Protocol: A custom list of genome-identified NBS-LRR gene IDs is used to subset the global count matrix. Normalized expression values (e.g., TPM, FPKM from StringTie or counts from DESeq2) are extracted for this gene family. Tissue-specific or induced expression is analyzed using clustering and statistical overrepresentation tests.
NBS-LRR proteins recognize pathogen effectors and trigger immune responses via SA and JA signaling networks.
NBS-LRR Triggered Immune Signaling Pathways Effector recognition leads to SA/JA pathway activation and systemic resistance.
Table 1: Expression (Mean TPM) of Selected NBS-LRR Genes Across Tissues
| Gene ID (SmNLR) | Root | Stem | Leaf | Flower | Putative Role |
|---|---|---|---|---|---|
| SmNLR001 | 12.5 | 1.2 | 0.8 | 45.7 | Floral defense |
| SmNLR045 | 85.3 | 3.4 | 2.1 | 4.5 | Root-specific |
| SmNLR128 | 15.6 | 18.9 | 22.4 | 20.1 | Constitutive |
| SmNLR201 | 2.3 | 1.5 | 32.6 | 5.4 | Leaf-enriched |
Table 2: Top NBS-LRR Genes Induced by Pathogen/Elicitor at 24h (Log2 Fold Change)
| Gene ID | Pst vs Mock | MeJA vs Mock | SA vs Mock | Likely Pathway |
|---|---|---|---|---|
| SmNLR012 | 5.8 | 1.2 | 6.5 | SA-mediated |
| SmNLR078 | 3.2 | 4.5 | 0.5 | JA-mediated |
| SmNLR155 | 4.1 | 3.8 | 2.1* | Co-induced |
| SmNLR189 | 0.5 | 0.3 | -0.8 | Not responsive |
(p-adj < 0.05, *p-adj < 0.01)
Table 3: Essential Materials for RNA-Seq Profiling of Plant Defense
| Item | Function/Benefit | Example Product |
|---|---|---|
| RNA Stabilization Agent | Immediate stabilization of RNA in harvested tissue, preventing degradation. | RNAlater, Life Technologies |
| Polysaccharide/Polyphenol RNA Kit | Optimized for plants like S. miltiorrhiza rich in secondary metabolites. | Plant RNA Kit, Zymo Research |
| Stranded mRNA Library Prep Kit | Maintains strand orientation, improving transcriptome assembly. | TruSeq Stranded mRNA, Illumina |
| ERCC RNA Spike-In Mix | External controls for normalization and assessing technical variation. | ERCC ExFold Mix, Thermo Fisher |
| Pathogen/Elicitor Standards | Defined inoculum/hormone concentrations for reproducible treatments. | P. syringae DC3000, MeJA (Sigma) |
| NBS-LRR HMM Profile | Computational probe for identifying NBS-LRRs in genome/transcriptome. | PF00931, PF00560 (Pfam) |
| Differential Expression Software | Statistical analysis of count data for robust DEG calling. | DESeq2 R package |
This whitepaper presents an in-depth technical analysis conducted within the broader framework of a doctoral thesis focused on the genome-wide identification and characterization of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen). Following the comprehensive identification of 92 NBS-LRR genes in the S. miltiorrhiza genome, this research phase aims to elucidate the functional correlation between the expression patterns of specific NBS-LRR clades and the regulation of key biosynthetic pathways for therapeutically valuable secondary metabolites: the lipophilic tanshinones and the hydrophilic phenolic acids. The core hypothesis is that pathogen-associated molecular pattern (PAMP)-triggered immunity, mediated by specific NBS-LRRs, intersects with and modulates the metabolic engineering of these bioactive compounds.
Recent studies indicate that plant resistance (R) genes, particularly NBS-LRRs, are not only central to biotic stress perception but also participate in extensive signaling crosstalk that influences downstream transcriptional reprogramming. This reprogramming often extends to the activation of defense-related secondary metabolic pathways. In S. miltiorrhiza, the biosynthesis of tanshinones (e.g., tanshinone IIA, cryptotanshinone) via the mevalonate (MVA) and methylerythritol phosphate (MEP) pathways, and of phenolic acids (e.g., rosmarinic acid, salvianolic acid B) via the phenylpropanoid pathway, is known to be induced by various elicitors, including fungal extracts and signaling molecules like jasmonic acid (JA) and salicylic acid (SA). This positions NBS-LRRs as potential upstream signaling nodes whose activation could fine-tune the expression of key biosynthetic enzyme genes such as SmCPS, SmKSL, SmPAL, and SmRAS.
A time-course experiment was designed where S. miltiorrhiza hairy root cultures were treated with the fungal elicitor Verticillium dahliae cell wall extract. Expression levels of selected NBS-LRR genes (representing TNL, CNL, and RNL subfamilies) and key biosynthetic pathway genes were quantified via qRT-PCR. Concurrently, metabolite accumulation was measured by HPLC.
Table 1: Correlation Matrix of NBS-LRR Expression with Metabolite Biosynthetic Gene Expression (Pearson's r) at 24h Post-Elicitation
| NBS-LRR Gene (Subfamily) | SmCPS (Tanshinone) | SmKSL (Tanshinone) | SmPAL (Phenolic Acid) | SmRAS (Phenolic Acid) |
|---|---|---|---|---|
| SmNBS-LRR05 (TNL) | 0.92 | 0.88 | 0.45 | 0.51 |
| SmNBS-LRR18 (CNL) | 0.78 | 0.81 | 0.67 | 0.72 |
| SmNBS-LRR45 (RNL) | 0.12 | 0.09 | 0.91 | 0.89 |
| SmNBS-LRR72 (CNL) | 0.05 | 0.10 | 0.08 | -0.03 |
Table 2: Fold-Change in Metabolite Accumulation Relative to Control at 72h Post-Elicitation
| Metabolite Class | Specific Metabolite | Fold Change (Elicited vs. Control) | HPLC Peak Area (mAU*s) ± SD |
|---|---|---|---|
| Tanshinones | Tanshinone IIA | 3.8 | 12540 ± 980 |
| Cryptotanshinone | 4.2 | 8920 ± 760 | |
| Phenolic Acids | Salvianolic Acid B | 5.1 | 28750 ± 2100 |
| Rosmarinic Acid | 3.7 | 15430 ± 1150 |
Proposed NBS-LRR Mediated Signaling to Metabolism
Experimental Workflow for Correlation Study
Table 3: Essential Reagents and Materials for the Featured Experiments
| Item/Category | Specific Product/Example | Function in the Experiment |
|---|---|---|
| Plant Culture System | S. miltiorrhiza Hairy Root Line (e.g., induced by Agrobacterium rhizogenes A4) | Provides genetically stable, fast-growing, and metabolite-producing plant material suitable for elicitation studies in controlled, sterile conditions. |
| Elicitor | Verticillium dahliae Cell Wall Extract (Custom-prepared) | Acts as a biotic stressor/PAMP to trigger the plant immune response, activating NBS-LRR and downstream defense pathways. |
| RNA Extraction Kit | Omega Bio-tek E.Z.N.A. Plant RNA Kit | Efficiently isolates high-quality, genomic DNA-free total RNA from polysaccharide and polyphenol-rich root tissues. |
| Reverse Transcription Kit | Vazyme HiScript III RT SuperMix (+gDNA wiper) | Provides highly efficient and consistent cDNA synthesis from RNA templates, including removal of genomic DNA contamination. |
| qPCR Master Mix | Vazyme ChamQ Universal SYBR qPCR Master Mix | A premixed, optimized solution for sensitive and specific quantitative real-time PCR detection of gene expression levels. |
| HPLC Solvents & Standards | Sigma-Aldrich Acetonitrile (HPLC grade), Tanshinone IIA, Salvianolic Acid B (Analytical Standards) | Essential for metabolite separation (mobile phase) and accurate quantification via external standard calibration curves. |
| HPLC Column | Agilent ZORBAX StableBond SB-C18 (4.6 x 250 mm, 5 µm) | Provides robust, high-resolution separation of both non-polar (tanshinones) and polar (phenolic acids) compound mixtures. |
| Statistical Software | R (with corrplot, ggplot2 packages) / SPSS |
Used for calculating Pearson correlation coefficients, generating correlation matrix heatmaps, and performing significance testing on experimental data. |
This guide details the practical translation of fundamental genomic research into applied breeding tools, framed within a specific thesis context: "Genome-wide identification and characterization of the NBS-LRR gene family in Salvia miltiorrhiza and its implications for disease resistance breeding." The NBS-LRR (Nucleotide-Binding Site-Leucine-Rich Repeat) genes constitute the largest class of plant disease resistance (R) genes. Their genome-wide identification provides the foundational data for selecting candidate genes linked to pathogen resistance, enabling the development of molecular markers for marker-assisted selection (MAS) in S. miltiorrhiza (Danshen) breeding programs aimed at improving yield, quality, and stability.
The process involves a multi-step pipeline from in silico analysis to wet-lab validation and application.
Objective: To identify, annotate, and preliminarily characterize all NBS-LRR genes in the S. miltiorrhiza genome.
Methodology:
Table 1: Exemplary Output from S. miltiorrhiza NBS-LRR Genome-Wide Identification
| Analysis Parameter | Exemplary Quantitative Result | Interpretation for Breeding |
|---|---|---|
| Total NBS-LRR Genes Identified | 121 | Defines the total pool of candidate R genes. |
| Subfamily Classification (TNL:CNL:RNL:Others) | 45:68:5:3 | Informs potential signaling pathways; CNLs may dominate. |
| Genes with Tandem Duplication | 32 (in 12 clusters) | Highlights genomic hotspots for rapid evolution and potential resistance diversity. |
| Genes Containing Stress-responsive Cis-elements | 89 (73.5%) | Prioritizes genes likely regulated by pathogen attack. |
Title: Computational pipeline for NBS-LRR gene identification.
Objective: To select the most promising candidate genes for functional study and develop linked molecular markers.
Prioritization Criteria:
Marker Development Protocol:
Table 2: Experimental Panel for Candidate Gene Validation
| Material Type | Example/Description | Function in Validation |
|---|---|---|
| Plant Germplasm | 30 S. miltiorrhiza accessions with known resistance/susceptibility to root rot. | Phenotypic correlation for marker-trait association. |
| Pathogen Strain | Fusarium oxysporum f. sp. miltiorrhizae (FoM), virulent isolate. | For pathogen challenge experiments. |
| Elicitors | Salicylic Acid (SA), Methyl Jasmonate (MeJA). | To simulate defense signaling and induce gene expression. |
| qPCR Reagents | SYBR Green master mix, gene-specific primers, reverse transcription kit. | To quantify candidate gene expression post-elicitation. |
| Genotyping Platform | KASP assay mix, thermal cycler with fluorescence detection, or CAPS restriction enzymes. | For high-throughput screening of molecular markers. |
Objective: To rapidly assess the function of a candidate NBS-LRR gene in disease resistance.
Methodology (Tobacco Rattle Virus-based VIGS):
Title: Functional validation workflow using Virus-Induced Gene Silencing (VIGS).
Validated markers are deployed in a Marker-Assisted Selection (MAS) pipeline. Breeders cross a donor parent carrying the resistant allele (diagnosed by the KASP/CAPS marker) with an elite, high-yielding but susceptible parent. In subsequent generations (F2 or BC1F1), seedlings are screened early with the molecular marker instead of waiting for laborious and environmentally variable pathogen bioassays. This accelerates the development of improved S. miltiorrhiza varieties with enhanced, durable resistance, ensuring stable production of bioactive compounds (tanshinones, salvianolic acids) for pharmaceutical use.
| Reagent/Material | Supplier Examples | Critical Function in the Workflow |
|---|---|---|
| HMMER Software Suite | http://hmmer.org | Core tool for initial in silico identification of NBS-LRR proteins using Pfam domain models. |
| Phusion High-Fidelity DNA Polymerase | Thermo Fisher, NEB | Ensures accurate amplification of candidate gene sequences for cloning and polymorphism analysis. |
| pTRV1 & pTRV2 VIGS Vectors | Arabidopsis Biological Resource Center (ABRC) or addgene. | Essential plant viral vectors for performing rapid loss-of-function assays via Virus-Induced Gene Silencing. |
| KASP Assay Mix & Genotyping Master Mix | LGC Biosearch Technologies | Enables high-throughput, cost-effective SNP genotyping for marker-assisted selection in breeding populations. |
| SYBR Green qRT-PCR Master Mix | Bio-Rad, Takara | For quantitative analysis of candidate gene expression patterns in response to pathogen/elicitor treatment. |
| RNA Extraction Kit (for polysaccharide-rich plants) | Qiagen RNeasy Plant Kit, or CTAB-based methods | Specialized for high-quality RNA isolation from S. miltiorrhiza, which is rich in secondary metabolites. |
| Fusarium oxysporum Specific Primers | Custom designed from ITS/EF-1α sequences | Allows precise quantification of fungal biomass in plant tissues during disease progression assays. |
Within the context of genome-wide identification of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen), Hidden Markov Model (HMM) searches are a cornerstone methodology. This technical guide details two critical, often overlooked pitfalls: the inappropriate application of default E-value thresholds and the mis-annotation of genes with incomplete domain architectures. We provide a rigorous, protocol-driven framework to optimize HMM-based discovery for complex gene families in plant genomes, directly supporting downstream pharmaceutical research into plant immune system-derived compounds.
Salvia miltiorrhiza is a model medicinal plant, with its NBS-LRR genes of significant interest for understanding disease resistance and potential bioactivity. Genome-wide identification relies on HMM profiles (e.g., Pfam: NB-ARC, PF00931; LRR, PF00560, PF07723, PF07725). However, the high copy number, diversity, and fragmentation of these genes necessitate refined search strategies to avoid both false negatives (overly strict thresholds) and false positives (overly permissive thresholds or ignoring domain integrity).
The E-value cutoff is not universal. The standard default of 0.01 or 0.001 in tools like HMMER may exclude legitimate, divergent NBS-LRR members in S. miltiorrhiza.
Data from a representative S. miltiorrhiza genome scan using the NB-ARC HMM profile illustrates the trade-off.
Table 1: Hit Retrieval at Different E-value Cutoffs in a S. miltiorrhiza Genome Scan
| E-value Threshold | Number of Candidate Sequences | Estimated False Positives | Key Characteristics of Additional Hits at Lenient Thresholds |
|---|---|---|---|
| 1e-10 | 45 | < 0.01 | Canonical, full-length NBS-LRR genes. |
| 1e-03 | 62 | ~0.5 | Includes divergent but likely functional NBS domains. |
| 1e-01 | 89 | ~5-7 | Includes highly divergent sequences and partial pseudogenes. |
| 1.0 | 127 | ~30-40 | Many partial ORFs, non-specific matches. |
hmmsearch with the NB-ARC profile against the S. miltiorrhiza proteome using a permissive E-value (e.g., 10.0). Save full results.hmmscan against the full Pfam database to identify all domains present in each hit.Many genuine NBS-LRR genes, especially in draft genomes, may be fragmented due to sequencing/assembly gaps or may belong to naturally truncated subfamilies (e.g., TN-type genes).
tblastn using a curated set of S. miltiorrhiza NBS domains against the genome assembly to find regions missed in gene annotation.The following diagram outlines the decision process integrating E-value adjustment and domain structure analysis.
Title: HMM Search Workflow for NBS-LRR ID with E-value & Domain Checks
Table 2: Essential Reagents and Tools for NBS-LRR Identification in Plants
| Item | Function/Description | Example Product/Code |
|---|---|---|
| Curated HMM Profiles | Core search models for NBS and LRR domains. | Pfam NB-ARC (PF00931); Pfam LRR_1 (PF00560). |
| HMMER Software Suite | Primary tool for sensitive sequence searches using HMMs. | HMMER 3.3.2 (http://hmmer.org/). |
| High-Fidelity Polymerase | Accurate amplification of candidate genes for validation. | KAPA HiFi HotStart ReadyMix, Phusion. |
| cDNA Synthesis Kit | Generate template from induced plant tissue for RT-PCR. | SuperScript IV Reverse Transcriptase. |
| Domain Database | For comprehensive domain architecture analysis. | Pfam, CDD, InterProScan. |
| Synteny Analysis Tool | To distinguish gene fragmentation from real truncation. | MCScanX, JCVI utility library. |
| Multiple Aligner | For assessing homology and residue conservation. | MAFFT, Clustal Omega. |
| Plant Induction Agent | To upregulate NBS-LRR gene expression pre-RNA extraction. | Salicylic Acid (100 µM). |
Accurate genome-wide identification of the NBS-LRR family in Salvia miltiorrhiza requires moving beyond default parameters. A calibrated, iterative approach to E-value thresholds combined with rigorous validation of domain architecture is essential. This strategy minimizes annotation errors, providing a reliable foundation for subsequent functional characterization and exploration of this gene family's role in plant defense and medicinal compound biosynthesis.
In the context of genome-wide identification of the NBS-LRR gene family in Salvia miltiorrhiza (Danshen), resolving ambiguous or fragmented gene models is a critical, non-trivial challenge. This medicinal plant's genome is highly repetitive and complex, leading to frequent mis-assemblies and incomplete gene structures, particularly for large, multi-domain resistance (R) genes like NBS-LRRs. Accurate gene model annotation is paramount for downstream evolutionary, expression, and functional studies aimed at elucidating the genetic basis of disease resistance and secondary metabolite production for drug development.
Ambiguities arise from inherent limitations in sequencing and assembly technologies and the biological nature of the genome itself. Key sources include:
Recent studies highlight the scale of the problem. The table below summarizes quantitative data from recent S. miltiorrhiza genome projects, illustrating how assembly and annotation strategies directly affect NBS-LRR catalog completeness.
Table 1: Impact of Assembly Strategy on NBS-LRR Gene Model Statistics in S. miltiorrhiza
| Study & Assembly Version | Assembly Technology | Annotation Method | Total Putative NBS-LRRs Identified | Percentage Fragmented/Partial Models | Key Limitation Noted |
|---|---|---|---|---|---|
| Xu et al., 2023 (v3.0) | PacBio CLR + Hi-C | MAKER2 (RNA-seq + homology) | 121 | ~18% | Fragmentation in telomeric clusters |
| Zhang et al., 2021 (v2.0) | Illumina + BioNano | BRAKER2 (RNA-seq) | 89 | ~35% | High fragmentation due to short-read gaps |
| Cui et al., 2020 (v1.0) | Illumina Only | Augustus (ab initio) | 63 | ~50% | Severe underrepresentation, most models partial |
A multi-evidence, iterative refinement pipeline is required. The following protocol is tailored for NBS-LRR gene discovery in complex plant genomes like S. miltiorrhiza.
Objective: Generate a comprehensive set of gene hints from diverse data sources.
Transcriptome Alignment:
transcript_hints.gff).Protein Homology Alignment:
protein_hints.gff).Synthetic Hint Generation with RGAugury:
rga_hints.gff).Objective: Integrate all evidence into a unified, high-confidence gene annotation.
First MAKER Run:
transcript_hints.gff, protein_hints.gff, rga_hints.gff) as evidence. Use SNAP and AUGUSTUS as ab initio predictors, trained on a related species if no S. miltiorrhiza training set exists.genome_v1.all.gff) will be an evidence-aware annotation.Predictor Training:
maker2zff and fathom (SNAP training); autoAug.pl (AUGUSTUS training).Second MAKER Run:
Objective: Resolve remaining ambiguities in NBS-LRR clusters.
Visual Inspection in Genome Browser:
Targeted PCR and Sequencing:
Title: NBS-LRR Gene Model Resolution Pipeline
Table 2: Essential Reagents and Tools for Resolving S. miltiorrhiza Gene Models
| Item | Category | Function & Rationale |
|---|---|---|
| PacBio HiFi Reads | Sequencing Reagent | Generate long (~15-20 kb), highly accurate reads to span repetitive NBS-LRR clusters, reducing assembly breaks. |
| Hi-C Sequencing Kit | Sequencing Reagent | Provides chromatin conformation data to scaffold contigs into chromosomes, placing fragmented genes in genomic context. |
| Strand-specific RNA-seq Library Prep Kit | Molecular Biology Reagent | Preserves strand information, crucial for accurate transcript boundary determination and gene orientation in clusters. |
| LongAmp Taq DNA Polymerase | Molecular Biology Reagent | Amplifies long (>5 kb) genomic fragments for PCR-based gap closure between fragmented gene model segments. |
| pGEM-T or Zero Blunt TOPO Cloning Kit | Molecular Biology Reagent | For cloning PCR products from ambiguous loci for validation via Sanger sequencing. |
| RGAugury Pipeline | Bioinformatics Tool | Specialized tool for in silico identification of R-genes; provides critical domain-based hints for annotation. |
| MAKER2 Annotation Pipeline | Bioinformatics Tool | Integrates diverse evidence (EST, protein, ab initio) into a consensus annotation, central to the iterative protocol. |
| Integrative Genomics Viewer (IGV) | Bioinformatics Tool | Enables visualization and manual curation of gene models against all supporting evidence at a genomic locus. |
The genome-wide identification of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen) is a cornerstone for understanding its disease resistance mechanisms and improving medicinal yield. This process is complicated by the presence of non-functional homologs and pseudogenes, which can inflate gene counts and mislead functional predictions. Accurate discrimination is therefore critical for downstream experimental validation and the application of this research in breeding for pathogen resistance, directly impacting the consistency and quality of bioactive compounds (e.g., tanshinones, salvianolic acids) for pharmaceutical development.
NBS-LRR genes encode key plant immune receptors. Pseudogenes and non-functional homologs arise from disruptive mutations, truncations, or frameshifts but retain sequence similarity.
Table 1: Diagnostic Features for Classification
| Feature | True NBS-LRR Gene | Pseudogene/Non-Functional Homolog |
|---|---|---|
| Open Reading Frame (ORF) | Full-length, uninterrupted | Premature stop codons, frameshifts, large indels |
| Conserved Motifs | Intact NB-ARC (P-loop, RNBS-A-D, GLPL, MHD) and LRR motifs | Degenerate or missing core motifs (especially MHD) |
| Transcript Evidence | Supported by RNA-Seq/EST data | No expression evidence or aberrant splice variants |
| Selection Pressure | Signs of purifying selection (Ka/Ks < 1) | Neutral evolution or relaxed constraint (Ka/Ks ≈ 1) |
| Domain Architecture | Typical NBS-LRR structure (TIR/CC-NBS-LRR, RPW8-NBS-LRR, etc.) | Truncated or aberrant domain order |
Method: A multi-step computational workflow is implemented after initial HMMER/PFAM searches.
Protocol: To confirm computational predictions.
Method: Calculate non-synonymous (Ka) to synonymous (Ks) substitution rates.
Title: Computational Pipeline for NBS-LRR Gene Classification
Title: Ka/Ks Analysis for Functional Assessment
Table 2: Key Research Reagent Solutions
| Item | Function/Application in NBS-LRR Discrimination |
|---|---|
| High-Fidelity DNA Polymerase (e.g., Phusion) | Accurate amplification of candidate gene sequences from gDNA/cDNA for sequencing validation. |
| RNA Isolation Kit (Plant-specific) | Extraction of high-integrity total RNA from S. miltiorrhiza under stress/control conditions for expression analysis. |
| Reverse Transcription Kit with oligo(dT)/Random Primers | Synthesis of first-strand cDNA from RNA for RT-PCR and expression confirmation. |
| NBS-LRR HMM Profile Database (Pfam) | Hidden Markov Model profiles for identifying NBS and LRR domains in silico. |
| Codon-Aware Alignment Software (e.g., MACSE) | Aligns nucleotide sequences while respecting protein translation, critical for accurate Ka/Ks calculation. |
| S. miltiorrhiza Specific Primers | Oligonucleotides designed to amplify variable and conserved regions of NBS-LRR candidates. |
| Next-Generation Sequencing (NGS) Library Prep Kit | Prepares RNA-Seq or whole-genome sequencing libraries for expression and genomic variation analysis. |
| Agarose Gel Electrophoresis System | Separates and visualizes PCR products to check for expected amplicon sizes and purity. |
| Sanger Sequencing Reagents | Provides definitive sequence data to confirm ORF integrity, mutations, and splicing events. |
This whitepaper is a technical guide framed within a broader thesis on the genome-wide identification of the NBS-LRR gene family in Salvia miltiorrhiza (Danshen). The Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) family is a major class of plant disease resistance (R) genes. Characterizing these genes in S. miltiorrhiza is crucial for understanding its defense mechanisms and has implications for improving medicinal plant resilience and secondary metabolite production. However, the high sequence divergence among NBS-LRR genes, characterized by frequent insertions/deletions (indels), tandem repeats, and variable domains, poses a significant challenge for accurate multiple sequence alignment (MSA), which is foundational for phylogenetic analysis, domain prediction, and functional annotation. This guide details optimized strategies for aligning these divergent sequences.
The intrinsic properties of NBS-LRR genes that complicate MSA include:
The following integrated workflow improves alignment accuracy for divergent NBS-LRR sequences.
Diagram 1: Optimized MSA workflow for NBS-LRRs.
PfamScan or InterProScan with the following models to identify and extract core domains:
Align each domain subset using the most appropriate algorithm.
--localpair or --genafpair) or Clustal Omega. These handle conserved global alignments well.
--localpair strategy or DIALIGN-2/T-Coffee, which are better suited for local similarities and indel-rich regions.
nbarc_aligned.fa) as input profiles.Apply final polishing to the full-length alignment using an iterative refiner.
Performance metrics were evaluated on a benchmark set of 120 divergent NBS-LRR sequences from S. miltiorrhiza and related Lamiaceae species. Reference alignments were derived from structural superposition of known 3D NB-ARC domains.
Table 1: MSA Tool Performance on NBS-LRR Benchmark Set
| Tool | Algorithm Type | Avg. Q-Score (NB-ARC) | Avg. Column Score (LRR) | Computational Speed (s) | Suitability for NBS-LRR |
|---|---|---|---|---|---|
| MAFFT (L-INS-i) | Iterative, local | 0.89 | 0.72 | 45 | High for conserved domains |
| Clustal Omega | Progressive, global | 0.85 | 0.68 | 22 | Moderate for core NB-ARC |
| T-Coffee (PSI) | Consistency-based | 0.87 | 0.78 | 320 | High for variable LRRs |
| MUSCLE | Iterative, progressive | 0.83 | 0.65 | 18 | Low for divergent regions |
| DIALIGN-2 | Local segment-based | 0.80 | 0.76 | 290 | High for indel-rich regions |
| Optimized Workflow | Hybrid, stratified | 0.91 | 0.81 | 180 | Recommended Best Practice |
Q-Score: Fraction of correctly aligned residue pairs compared to reference. Column Score: Sum-of-pairs score for alignment columns.
Table 2: Essential Reagents & Tools for NBS-LRR MSA Analysis
| Item / Solution | Function / Purpose in NBS-LRR Research |
|---|---|
| InterProScan | Integrated protein domain & family annotation. Critical for pre-alignment domain delineation (TIR, NB-ARC, LRR). |
| MAFFT Software | Primary alignment engine for its flexibility (global/local strategies) and high accuracy on conserved domains. |
| T-Coffee/ Expresso | Provides consistency-based alignments and can use structural data; ideal for aligning variable LRR regions. |
| GUIDANCE2 Server | Calculates alignment confidence scores per column and residue; identifies unreliably aligned regions. |
| Pfam HMM Profiles (PF00931, PF13855, PF01582) | Hidden Markov Models used to definitively identify NBS-LRR domains in uncharacterized sequences. |
| MEGA-CC Software | User-friendly suite for performing iterative refinement, manual alignment editing, and downstream phylogenetic analysis. |
| Jalview | Interactive alignment visualization editor for manual curation, color-coding by conservation, and trimming. |
| Python/Biopython | For custom scripting of workflow automation, parsing large sequence sets, and batch processing. |
A robust MSA enables key downstream analyses within the genome identification thesis.
Diagram 2: Downstream applications of NBS-LRR MSA.
Accurate genome-wide identification and characterization of the NBS-LRR family in Salvia miltiorrhiza are contingent upon overcoming the hurdles of sequence divergence through optimized MSA. The stratified workflow—involving domain-wise curation, tool-specific alignment, profile merging, and iterative refinement—produces significantly more reliable alignments than any single-method approach. This robust MSA forms the critical foundation for trustworthy phylogenetic classification, evolutionary analysis, and the functional prediction of R genes, ultimately guiding targeted experimental validation in plant immunity research.
Within the context of a comprehensive thesis on the genome-wide identification of the NBS-LRR (Nucleotide-Binding Site-Leucine-Rich Repeat) gene family in Salvia miltiorrhiza (Danshen), the transition from in silico prediction to experimental validation is a critical milestone. Computational pipelines can predict numerous candidate genes, but their physical existence, precise exon-intron boundaries, and sequence accuracy must be confirmed. This guide details the protocols for verifying predicted NBS-LRR genes using endpoint PCR and Sanger sequencing.
In silico identification relies on algorithms and reference data, which can introduce false positives due to sequence gaps, assembly errors, or overly sensitive domain prediction parameters. Validation ensures that the candidate genes are present in the S. miltiorrhiza genome and that their sequences are correct, forming a reliable foundation for downstream functional studies and drug development research focused on this medicinal plant's defense mechanisms.
The validation workflow begins with a curated list of in silico predicted NBS-LRR genes from the S. miltiorrhiza genome assembly. Key quantitative data from a typical verification study is summarized below.
Table 1: Summary of In Silico NBS-LRR Prediction and Validation Metrics
| Parameter | In Silico Prediction | PCR Validation | Sequencing Success Rate |
|---|---|---|---|
| Total Candidate Genes | 127 | N/A | N/A |
| Genes Selected for PCR | 30 | N/A | N/A |
| Primers Designed | 30 pairs | N/A | N/A |
| PCR Success (Clear Amplicon) | N/A | 27 genes | 90% |
| Sequence Perfect Match | N/A | N/A | 22 genes (81.5%) |
| Sequence with SNPs/Indels | N/A | N/A | 5 genes (18.5%) |
Table 2: Typical PCR Reaction Setup (25 µL)
| Component | Volume (µL) | Final Concentration |
|---|---|---|
| High-Fidelity PCR Master Mix (2X) | 12.5 | 1X |
| Forward Primer (10 µM) | 1.0 | 0.4 µM |
| Reverse Primer (10 µM) | 1.0 | 0.4 µM |
| Template Genomic DNA (50 ng/µL) | 1.0 | ~2 ng/µL |
| Nuclease-Free Water | 9.5 | N/A |
Title: PCR and Sanger Sequencing Validation Workflow
Title: NBS-LRR Domain Architecture for Primer Design
Table 3: Essential Reagents and Materials for Validation
| Item | Function/Description | Example Product/Note |
|---|---|---|
| High-Fidelity DNA Polymerase | Reduces PCR errors for accurate sequence amplification. Essential for cloning-ready products. | Phusion, KAPA HiFi |
| PCR Purification Kit | Removes primers, dNTPs, salts, and enzymes from PCR products prior to sequencing. | Qiagen QIAquick, Thermo GeneJET |
| Sanger Sequencing Kit | Dideoxy terminator-based cycle sequencing reaction. | BigDye Terminator v3.1 |
| Capillary Sequencer | Instrument for high-resolution separation and detection of fluorescently labeled sequencing fragments. | Applied Biosystems 3730xl |
| Sequence Assembly Software | Aligns forward/reverse reads, compares to reference, and identifies variants. | Geneious, SnapGene, BioEdit |
| S. miltiorrhiza Genomic DNA | High-quality, high-molecular-weight template DNA. Isolated via CTAB method, A260/280 ~1.8. | In-house preparation recommended |
| Domain Prediction Database | Used for initial in silico identification and to confirm conserved domains in sequenced amplicons. | Pfam, SMART, NCBI CDD |
This guide is framed within a doctoral thesis project aiming to conduct a genome-wide identification and functional characterization of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen). NBS-LRR genes constitute one of the largest and most complex plant resistance (R) gene families. Efficient management, analysis, and visualization of hundreds of candidate genes are critical for elucidating their roles in stress response and secondary metabolism, with implications for improving medicinal compound production.
2.1. Hierarchical Data Organization A structured directory and naming convention is essential. Data should be categorized into raw data (genome assemblies, RNA-seq reads), processed data (BLAST outputs, HMM search results), analysis files (multiple sequence alignments, phylogenetic trees), and metadata (sample information, software versions).
2.2. Utilization of Relational Databases For large-scale gene families, moving beyond spreadsheets to a lightweight relational database (e.g., SQLite) enables complex queries and integration. A suggested table schema includes:
2.3. Quantitative Data Summary
Table 1: Typical NBS-LRR Identification Pipeline Output for a Plant Genome
| Analysis Step | Software/Tool | Key Parameters | S. miltiorrhiza (Example Output) | Purpose |
|---|---|---|---|---|
| Initial Identification | HMMER | HMM profile: PF00931 (NB-ARC) | ~350 candidate genes | Retrieve sequence candidates |
| Domain Validation | NCBI CD-Search | E-value < 0.01 | ~320 genes with full NB-ARC | Confirm domain integrity |
| Architecture Classification | MEME/MAST | Motif discovery | TNL (~55%), CNL (~40%), RNL (~5%) | Classify into subfamilies |
| Chromosomal Distribution | TBtools/MCScanX | -- | 12 gene clusters identified | Visualize synteny and clusters |
| Expression Analysis | HISAT2 + StringTie | >1 TPM in any sample | ~150 genes expressed | Filter for active genes |
3.1. Protocol for NBS-LRR Gene Identification Using HMMER
hmmsearch with inclusive thresholds:
bioawk can be used.rpsblast+ to remove fragments.3.2. Protocol for Phylogenetic & Motif Analysis
4.1. Phylogeny-Integrated Domain Architecture Plot Tools like TBtools-II or Evolview allow the generation of circular phylogenetic trees with adjacent heatmaps (expression) and bar charts (domain structure), enabling multi-dimensional comparison.
4.2. Chromosomal Distribution and Synteny Visualization Advanced synteny visualization can be achieved using tools like JCVI or MCScanX, plotted with ggplot2 or TBtools to show NBS-LRR gene clusters and whole-genome duplication events.
4.3. Signaling Pathway and Workflow Diagrams
Title: NBS-LRR Gene Identification and Analysis Workflow
Title: NBS-LRR Mediated Plant Defense Signaling Pathway
Table 2: Essential Research Reagent Solutions for NBS-LRR Studies
| Item | Function / Application | Example Product / Source |
|---|---|---|
| High-Fidelity DNA Polymerase | Accurate amplification of candidate gene CDS for cloning. | Phusion HF (Thermo), KAPA HiFi. |
| Gateway Cloning System | Efficient transfer of genes into multiple expression vectors. | pDONR/Zeo vectors, LR Clonase (Thermo). |
| Plant Expression Vectors | For transient/stable expression in tobacco or Arabidopsis. | pCAMBIA1300 (CaMV 35S promoter), pEAQ-HT. |
| Anti-GFP Antibody | Detection of GFP-tagged NBS-LRR protein localization & abundance. | Anti-GFP, HRP (Abcam, #ab290). |
| DAB Staining Kit | Histochemical detection of hydrogen peroxide in HR response. | 3,3'-Diaminobenzidine (Sigma-Aldrich). |
| qPCR Master Mix (SYBR Green) | Validation of gene expression patterns from RNA-seq data. | PowerUp SYBR Green (Thermo). |
| Rapid DNA Ladder | Accurate sizing of PCR products during gene screening. | 1 kb Plus DNA Ladder (NEB). |
| Plant Total RNA Extraction Kit | High-quality RNA for RT-qPCR and RNA-seq library prep. | Plant RNeasy Kit (Qiagen). |
This whitepaper details a core experimental chapter within a broader thesis focused on the genome-wide identification of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorzhiza (Danshen). Following in silico identification and phylogenetic characterization, functional validation of candidate gene expression under stress is paramount. Quantitative reverse transcription polymerase chain reaction (qRT-PCR) is the gold standard for precise, sensitive quantification of transcript abundance. This guide provides a rigorous technical framework for employing qRT-PCR to validate the induction of key NBS-LRR genes in response to biotic and abiotic stress, linking genomic data to potential functional roles in Danshen's defense mechanisms.
Candidate NBS-LRR genes are selected based on phylogenetic clade association with known resistance (R) genes, promoter cis-element analysis revealing stress-responsive motifs, and preliminary RNA-Seq data. Plants are subjected to controlled stress treatments.
Table 1: Standardized Stress Treatment Protocols for S. miltiorrhiza Seedlings
| Stress Type | Specific Treatment | Duration | Sample Time Points (Post-treatment) | Purpose |
|---|---|---|---|---|
| Biotic Elicitor | Foliar spray with 100 µM Methyl Jasmonate (MeJA) | Single application | 3 h, 6 h, 12 h, 24 h, 48 h | Simulate pathogen attack, activate JA-mediated defense pathways. |
| Biotic Elicitor | Root drench with 1 mM Salicylic Acid (SA) | Single application | 3 h, 6 h, 12 h, 24 h, 48 h | Activate SA-mediated systemic acquired resistance (SAR) pathways. |
| Fungal Pathogen | Fusarium solani spore suspension (1×10⁶ spores/mL) root inoculation | Continuous | 1 d, 3 d, 5 d, 7 d | Direct biotic stress interaction. |
| Abiotic Stress | Drought stress (Withholding water) | Until soil moisture drops to 30% field capacity | 1 d, 3 d, 5 d, 7 d | Induce osmotic and general abiotic stress response. |
| Abiotic Stress | Cold stress (4°C) | Continuous | 3 h, 6 h, 12 h, 24 h | Induce cold-responsive signaling. |
| Control | Mock treatment (Water or solvent) | -- | Matches all treatment time points | Baseline expression reference. |
Table 2: Example qRT-PCR Primer Sequences for Candidate S. miltiorrhiza NBS-LRR Genes
| Gene ID (Hypothetical) | Primer Sequence (5'→3') | Amplicon Size (bp) | Efficiency (%) | R² |
|---|---|---|---|---|
| SmNBS-LRR05 | F: CGTCAAGAGCCTCAACAACCR: TGGATGCTGTGATGTTGAGG | 152 | 98.5 | 0.998 |
| SmNBS-LRR12 | F: AAGCCTGGTGTTGCTGTTGTR: CACCAACCCAACATCACCAT | 118 | 102.1 | 0.996 |
| SmNBS-LRR23 | F: GGAGGCTATGCTGGATTGACR: CCTTGATGCCACTTTTGGAG | 145 | 95.7 | 0.999 |
| Reference: SmActin | F: GTGTTGGATTCTGGTGATGGTGTGR: TGGCATACAGGTCCTTCCTGATAT | 187 | 99.3 | 0.997 |
Table 3: Example qRT-PCR Fold Change Data for SmNBS-LRR05 under MeJA Stress
| Time Post-Treatment | Mean ΔCq (Treatment) | Mean ΔCq (Control) | ΔΔCq | Fold Change (2^-ΔΔCq) | Significance (p-value) |
|---|---|---|---|---|---|
| 3 h | 5.2 ± 0.15 | 7.8 ± 0.12 | -2.6 | 6.1 | <0.01 |
| 6 h | 4.7 ± 0.18 | 7.9 ± 0.10 | -3.2 | 9.2 | <0.001 |
| 12 h | 6.1 ± 0.22 | 8.0 ± 0.15 | -1.9 | 3.7 | <0.05 |
| 24 h | 6.8 ± 0.19 | 7.9 ± 0.11 | -1.1 | 2.1 | 0.12 |
The validated expression of specific NBS-LRR genes can be integrated into known plant immune signaling models. Below is a simplified pathway illustrating potential roles for S. miltiorrhiza NBS-LRRs.
Diagram Title: NBS-LRR Gene Roles in Plant Immune Signaling Pathways.
Table 4: Essential Materials for NBS-LRR qRT-PCR Validation
| Item / Reagent | Function & Rationale |
|---|---|
| RNAprep Pure Plant Plus Kit (Polysaccharides & Polyphenolics-rich) | Specifically formulated for plants like S. miltiorrhiza that contain high levels of secondary metabolites which inhibit downstream reactions. |
| DNase I, RNase-free | Critical for complete genomic DNA removal to prevent false-positive amplification in qPCR. |
| RevertAid H Minus Reverse Transcriptase | Lacks RNase H activity, allowing for higher yields of full-length cDNA, ideal for long transcripts. |
| SYBR Green Master Mix (e.g., PowerUp SYBR) | Provides all components for robust, sensitive qPCR with standardized conditions. Includes ROX passive reference dye for plate normalization. |
| Nuclease-Free Water | Essential for all molecular biology reactions to prevent RNase/DNase contamination. |
| Validated Endogenous Control Primers (e.g., SmActin, SmUBQ) | Stable reference genes for S. miltiorrhiza under the studied stress conditions are mandatory for accurate ΔΔCq analysis. |
| White 96-Well Optical Reaction Plates & Seals | Ensure optimal fluorescence detection and prevent evaporation during cycling. |
Integrating precise qRT-PCR expression validation with genome-wide identification studies provides a powerful approach to transition from in silico predictions to functionally characterized candidate genes. This protocol, framed within Danshen research, establishes a reproducible method to confirm the stress-responsive nature of key NBS-LRR genes, offering critical insights for subsequent functional studies and potential applications in enhancing plant resilience or identifying novel defense-related metabolites for drug development.
This whitepaper details a core evolutionary analysis chapter within a broader thesis project focused on the genome-wide identification of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in the medicinal plant Salvia miltiorrhiza (Danshen). The NBS-LRR family constitutes the largest class of plant disease resistance (R) genes. By performing a comparative phylogenetic analysis with well-characterized NBS-LRR families from model plants (Arabidopsis thaliana, Oryza sativa) and other members of the Lamiaceae family, we can infer evolutionary patterns, classify S. miltiorrhiza NBS-LRRs, and predict their potential function in disease resistance and phytochemical biosynthesis, which is critical for pharmaceutical quality.
2.1 Data Acquisition and Sequence Identification
hmmsearch from HMMER v3.3.2 suite against all protein sequences of each species: hmmsearch --domtblout output.txt -E 1e-5 PF00931.hmm proteome.fasta.2.2 Multiple Sequence Alignment and Phylogenetic Tree Construction
mafft --globalpair --maxiterate 1000 input.fa > aligned.fa.-automated1 option: trimal -in aligned.fa -out trimmed.fa -automated1.iqtree2 -s trimmed.fa -m MFP -B 1000 -alrt 1000 -T AUTO.-m MFP enables ModelFinder Plus to select the best-fit substitution model.-B 1000 specifies 1000 ultrafast bootstrap replicates.ggtree.2.3 Evolutionary Analysis
Table 1: Genome-Wide Identification of NBS-LRR Genes in Target Species
| Species | Family | Total NBS-LRRs | TNL Subclass | CNL/RNL* Subclass | Others | Reference Genome Version |
|---|---|---|---|---|---|---|
| Arabidopsis thaliana | Brassicaceae | 167 | 82 | 85 (CNL) | 0 | TAIR10 |
| Oryza sativa (Rice) | Poaceae | 535 | 2 | 528 (CNL) | 5 | MSU v7.0 |
| Salvia miltiorrhiza | Lamiaceae | 121 | 45 | 76 (CNL) | 0 | v2.0 |
| Mentha longifolia | Lamiaceae | 98 | 32 | 66 (CNL) | 0 | ML_v1.0 |
| Scutellaria baicalensis | Lamiaceae | 113 | 41 | 72 (CNL) | 0 | ASM2071116v1 |
Note: RNL (RPW8-NB-LRR) is a specific subclass often grouped with CNLs for simplicity.
Table 2: Evolutionary Selection Pressure on NBS-LRR Genes (Ka/Ks Analysis)
| Species Comparison (Orthologous Pairs) | Average Ka/Ks Ratio | Proportion of Pairs with Ka/Ks > 1 | Implied Selection Pressure |
|---|---|---|---|
| A. thaliana vs. S. miltiorrhiza | 0.28 | 2.1% | Strong Purifying Selection |
| O. sativa vs. S. miltiorrhiza | 0.42 | 1.5% | Purifying Selection |
| Within S. miltiorrhiza (Tandem Duplicates) | 0.65 | 8.7% | Relaxed Purifying / Mild Positive |
| Within Lamiaceae (S. miltiorrhiza vs. M. longifolia) | 0.31 | 3.3% | Strong Purifying Selection |
Table 3: Essential Materials for NBS-LRR Comparative Phylogenetic Analysis
| Item / Reagent | Function / Application in this Study | Example Vendor/Software |
|---|---|---|
| HMMER Suite | Profile HMM-based search for identifying NBS-LRR protein sequences from proteomes. | http://hmmer.org/ |
| Pfam HMM Profiles | Curated seed alignments and HMMs for NBS (NB-ARC, PF00931) and TIR (PF01582) domains. | https://pfam.xfam.org/ |
| MAFFT | High-accuracy multiple sequence alignment tool for conserved protein domains. | https://mafft.cbrc.jp/ |
| IQ-TREE | Efficient software for maximum likelihood phylogenetic inference and model testing. | http://www.iqtree.org/ |
| PAML (CodeML) | Package for molecular evolution analysis, including Ka/Ks calculation (yn00). | http://abacus.gene.ucl.ac.uk/software/paml.html |
| FigTree / ggtree | Software/R package for visualizing, annotating, and exporting phylogenetic trees. | http://tree.bio.ed.ac.uk/; Bioconductor |
| Genome Databases | Sources for reference genome sequences and annotations (TAIR, RGAP, NCBI, Phytozome). | Public Repositories |
| High-Performance Computing (HPC) Cluster | Essential for running computationally intensive steps (HMM search, alignment, bootstrapping). | Institutional Resource |
This technical guide details the application of synteny and collinearity analysis for identifying conserved genomic architectures and lineage-specific rearrangements surrounding Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) genes in Salvia miltiorrhiza (Danshen). The work is framed within a broader thesis project aiming to perform a genome-wide identification and characterization of the NBS-LRR gene family—a primary class of plant disease Resistance (R) genes—in this medicinal plant. Understanding the genomic organization of these genes is crucial for elucidating plant defense mechanisms and can inform breeding strategies for disease resistance, ultimately impacting the sustainable production of its valuable bioactive compounds (e.g., tanshinones, salvianolic acids) for pharmaceutical development.
For R-gene discovery, these analyses reveal:
Objective: Create a comprehensive catalog of NBS-LRR genes as anchor points for synteny analysis. Protocol:
hmmsearch (HMMER v3.3) with an E-value cutoff of 1e-5.Objective: Identify macro-syntenic blocks between S. miltiorrhiza and related species. Protocol (Using JCVI/MCScanX Toolkit):
python -m jcvi.compara.catalog ortholog to identify orthologous gene pairs and syntenic blocks. Key parameters: --cscore=.99 (confidence score), --iter=1 for simple chaining.jcvi.graphics.karyotype to generate synteny maps. Collinearity is inferred from the uninterrupted alignment of homologous genes in the same order.Objective: Examine fine-scale conservation and rearrangement around individual R-gene loci. Protocol:
slop and getfasta).ggplot2 and gggenes.nucmer, show-coords) or read-depth analysis.Table 1: Summary of NBS-LRR Genes and Syntenic Conservation in Salvia miltiorrhiza vs. Related Species
| Species | Total NBS-LRR Genes Identified | Genes in Syntenic Blocks | Species-Specific Genes (Non-syntenic) | Predominant NBS-LRR Type | Key Syntenic Block Size Range (Mb) |
|---|---|---|---|---|---|
| Salvia miltiorrhiza (Focal) | ~150* | ~95 | ~55 | CNL | N/A |
| Salvia splendens | ~130* | ~88 | ~42 | CNL | 0.8 - 4.2 |
| Mentha longifolia | ~120* | ~72 | ~48 | CNL/TNL | 0.5 - 3.1 |
| Arabidopsis thaliana (Outgroup) | ~165 | ~45 | ~120 | TNL | 0.3 - 1.7 |
*Estimated numbers based on recent analyses; final counts require full curation.
Table 2: Example of a High-Resolution Microsynteny Analysis of a Conserved R-Gene Locus
| Genomic Feature | S. miltiorrhiza Chr4 (32.15-32.45 Mb) | S. splendens Syntenic Region | M. longifolia Syntenic Region | A. thaliana Syntenic Region (Col-0) | Conservation Notes |
|---|---|---|---|---|---|
| Anchor NBS-LRR Gene | Smi-NL34 (CNL) | Ssp-NL29 (CNL) | Mlo-NL21 (CNL) | AT4G19520 (TNL) | NB-ARC domain >85% identity |
| Flanking Gene 1 (5') | Serine/Threonine Kinase | Serine/Threonine Kinase | LRR-RLK | PPR protein | Collinearity break between mint and others |
| Flanking Gene 2 (3') | ABC Transporter | ABC Transporter | ABC Transporter | ABC Transporter | Perfect collinearity |
| Gene Orientation | -> Smi-NL34 -> | -> Ssp-NL34 -> | <- Mlo-NL21 -> | -> AT4G19520 -> | Inversion in mint lineage |
| Structural Variant | None | 5 kb insertion (TE) | 20 kb deletion | N/A | TE insertion specific to S. splendens |
Diagram 1: Synteny Analysis Workflow for R-Genes
Diagram 2: Microsynteny Conservation & Breakpoints
Table 3: Key Reagents and Tools for Synteny-Based R-Gene Analysis
| Item | Function / Purpose in Analysis | Example/Note |
|---|---|---|
| High-Quality Genome Assemblies | Reference for gene identification and synteny anchor. Chromosome-level, haplotype-resolved assemblies are ideal. | S. miltiorrhiza Smil v2.0, S. splendens v1.0 (from public databases like NCBI, Phytozome). |
| Curated Protein Family HMMs | Sensitive detection of NBS and LRR protein domains across diverse plant lineages. | Pfam profiles (PF00931, PF00560, etc.); Plant Immune Receptor Repository (PIRR) custom HMMs. |
| Orthology Detection Software | Distinguishes true orthologs (for synteny) from paralogs. | OrthoFinder, InParanoid, JCVI pipeline. |
| Synteny & Collinearity Tools | Identifies and visualizes conserved genomic blocks. | JCVI/MCScanX, GENESPACE, SynVisio, D-GENIES. |
| Structural Variant Callers | Detects insertions, deletions, inversions at synteny breakpoints. | MUMmer, DELLY, Sniffles (for long-read data). |
| Functional Annotation Databases | Annotates genes within syntenic blocks to infer potential functional conservation. | InterPro, eggNOG, KEGG, Gene Ontology (GO). |
| Visualization Libraries | Creates publication-quality synteny and microsynteny plots. | R: ggplot2, gggenes, karyoploteR; Python: matplotlib, pyGenomeViz. |
Within the broader thesis of genome-wide identification of the NBS-LRR gene family in Salvia miltiorrhiza (Danshen), this whitepaper addresses the dynamic evolutionary processes that shape its disease resistance gene repertoire. NBS-LRR genes are the largest class of plant disease resistance (R) genes, encoding nucleotide-binding site and leucine-rich repeat proteins that detect pathogen effectors and initiate immune signaling. Understanding the expansion and contraction dynamics of this repertoire in S. miltiorrhiza is critical for elucidating its adaptive immune capacity, with implications for cultivating disease-resistant varieties to secure the supply of its valuable bioactive compounds (e.g., tanshinones, salvianolic acids) for pharmaceutical use.
Recent genome assemblies and re-annotations have enabled precise identification of NBS-LRR genes in S. miltiorrhiza. The following table summarizes key quantitative data from current analyses.
Table 1: NBS-LRR Repertoire in Salvia miltiorrhiza and Comparative Species
| Species | Total NBS-LRR Genes | TNL Subfamily | CNL Subfamily | RNL Subfamily | Other/Unknown | Reference Genome/Version |
|---|---|---|---|---|---|---|
| Salvia miltiorrhiza | ~120-150 | ~40-50 | ~60-80 | ~8-12 | ~10-15 | Genome assembly Smil v2.0 / Danseq v1.0 |
| Arabidopsis thaliana | ~165 | ~55 | ~50 | ~60 | - | TAIR10 |
| Solanum lycopersicum | ~355 | ~15 | ~330 | ~10 | - | SL4.0 |
| Oryza sativa | ~480 | ~1 | ~470 | ~9 | - | IRGSP-1.0 |
Key Findings on Dynamics:
Protocol 1: Genome-Wide Identification of NBS-LRR Genes
hmmsearch --domtblout output.txt profile.hmm proteome.faa).Protocol 2: Phylogenetic and Evolutionary Dynamics Analysis
yn00 or KaKs_Calculator.Diagram 1: NBS-LRR Identification & Evolutionary Analysis Pipeline
Diagram 2: NBS-LRR Gene Birth-and-Death Evolution Model
Table 2: Essential Reagents and Resources for SmNBS-LRR Research
| Item Name / Category | Function / Application | Example Product/Source |
|---|---|---|
| High-Quality Genomic DNA Kit | Extraction of ultra-pure, high-molecular-weight DNA for genome sequencing and PCR. | DNeasy Plant Pro Kit (Qiagen), CTAB method reagents. |
| Plant RNA Preservation & Extraction Kit | Stabilization and isolation of intact RNA for expression (qRT-PCR) and transcriptome analysis. | RNAlater, RNeasy Plant Mini Kit (Qiagen). |
| HMMER Software Suite | Bioinformatics tool for identifying protein domains using hidden Markov models. | http://hmmer.org/ |
| InterProScan / NCBI CDD | Integrated database for protein domain, family, and functional site prediction. | https://www.ebi.ac.uk/interpro/, https://www.ncbi.nlm.nih.gov/cdd/ |
| Phylogeny Analysis Tools | Software for constructing and visualizing evolutionary trees. | IQ-TREE, MEGA, FigTree. |
| Synteny Analysis Tool | Identification and visualization of conserved gene blocks across genomes. | MCScanX, TBtools. |
| Positive Control NBS-LRR cDNA | Cloned, sequenced SmNBS-LRR gene for qRT-PCR assay validation. | Clone from S. miltiorrhiza cultivar cDNA or obtain from a repository. |
| Pathogen/Elicitor Preparations | To challenge plants and study NBS-LRR gene induction and function. | Fusarium spp. spores, yeast elicitor (chitin), salicylic acid. |
| Agroinfiltration Kit (for N. benthamiana) | For transient overexpression or silencing (VIGS) to assay gene function. | Agrobacterium tumefaciens GV3101, syringe infiltrators. |
This guide explores the strategic application of ortholog-based functional prediction to prioritize and design targeted functional studies, framed within a doctoral thesis focused on the genome-wide identification and characterization of the Nucleotide-Binding Site-Leucine-Rich Repeat (NBS-LRR) gene family in the medicinal plant Salvia miltiorrhiza (Danshen). As the cornerstone of plant innate immunity, NBS-LRR genes are prime targets for understanding defense mechanisms and enhancing medicinal compound production. This whitepaper provides a technical framework for leveraging established model plant data to accelerate functional discovery in non-model species like S. miltiorrhiza.
Functional prediction relies on the principle that orthologs—genes diverged after a speciation event—are more likely to retain conserved ancestral function than paralogs. For S. miltiorrhiza NBS-LRRs, key model plant orthologs are identified in Arabidopsis thaliana, Nicotiana benthamiana, and Solanum lycopersicum, which have extensive, experimentally validated immune gene databases.
| Tool/Database | Primary Use | Relevance to NBS-LRR Study |
|---|---|---|
| OrthoFinder | Clusters orthologous groups across species | Identifies NBS-LRR gene families shared between S. miltiorrhiza and models |
| Ensembl Plants | Genomic comparative platform | Retrieves pre-computed orthologs/paralogs for candidate genes |
| PLAZA | Integrative plant comparative genomics platform | Analyzes phylogenetic distribution and functional annotations |
| PhytoMine | Interrogates multiple plant genomes | Extracts gene ontology (GO) terms for ortholog clusters |
| BLASTP/DIAMOND | Sequence similarity search | Initial identification of putative NBS-LRR orthologs |
Data derived from a genome-wide scan of the S. miltiorrhiza genome (v2.0) compared to reference models.
Table 1: NBS-LRR Gene Count and Ortholog Distribution
| Species | Total NBS-LRR Genes | Genes with Ortholog in A. thaliana | Genes with Ortholog in N. benthamiana | Genes with Ortholog in S. lycopersicum |
|---|---|---|---|---|
| Salvia miltiorrhiza | 121 | 67 (55.4%) | 89 (73.6%) | 82 (67.8%) |
| Arabidopsis thaliana | 165 | — | 102 (61.8%) | 98 (59.4%) |
| Nicotiana benthamiana | 450 | 102 (22.7%) | — | 320 (71.1%) |
| Solanum lycopersicum | 355 | 98 (27.6%) | 320 (90.1%) | — |
Table 2: Enriched Functional Terms Among Conserved Ortholog Groups
| Ortholog Cluster ID | Representative S. miltiorrhiza Gene | Conserved GO Term (Biological Process) | Model Plant Ortholog (Gene ID) | Known Function in Model |
|---|---|---|---|---|
| NBSOC07 | SmiNBS017 | GO:0009617 – response to bacterium | AT4G19050 (RPS2) | Recognizes Pseudomonas AvrRpt2 |
| NBSOC12 | SmiNBS042 | GO:0009620 – response to fungus | AT4G26090 (RPP13) | Recognizes downy mildew effector |
| NBSOC25 | SmiNBS088 | GO:0006952 – defense response | NbTab2-like (Niben101Scf09881) | Mediates cell death signaling |
This protocol outlines a transient expression assay to validate the predicted function of a candidate S. miltiorrhiza NBS-LRR gene (SmiNBS017) based on its orthology to A. thaliana RPS2.
| Item | Function in NBS-LRR Study |
|---|---|
| pCAMBIA1302-GFP Vector | A plant binary vector for constitutive expression (CaMV 35S promoter) and C-terminal GFP fusion, enabling protein localization and tracking. |
| Agrobacterium strain GV3101 | A disarmed strain optimized for high-efficiency transient transformation in Nicotiana (agroinfiltration). |
| Acetosyringone | A phenolic compound that induces the Agrobacterium Vir genes, essential for T-DNA transfer during agroinfiltration. |
| Trypan Blue Stain | A vital dye that selectively stains dead plant cells, providing visual confirmation of the hypersensitive response (HR). |
| Gateway Cloning System | A rapid, high-throughput recombination-based system for transferring NBS-LRR ORFs into multiple expression vectors. |
| Phusion High-Fidelity DNA Polymerase | Used for error-free PCR amplification of full-length NBS-LRR coding sequences, which are often large and GC-rich. |
| Anti-GFP Antibody (HRP-conjugated) | Allows for immunoblot analysis to confirm expression levels of GFP-tagged NBS-LRR proteins in plant tissues. |
Integrating ortholog-based predictions provides a powerful, data-driven strategy to navigate the functional complexity of the S. miltiorrhiza NBS-LRR family. This approach directly informs the targeted selection of candidate genes for experimental validation in the thesis research, moving beyond descriptive genomics to hypothesis-driven functional characterization. The conserved immune functions predicted and validated through these methods can ultimately be linked to variations in disease resistance and metabolic profiles in Danshen, bridging plant immunity and medicinal chemistry for drug development professionals.
Comparative Assessment of NBS-LRR Family Size and Diversity Across Medicinal Plants
This analysis is framed within a broader doctoral thesis focused on the genome-wide identification and characterization of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in the medicinal plant Salvia miltiorzhiza (Danshen). The thesis posits that the expansion and diversification of NBS-LRR genes are correlated with the ecological adaptability and biotic stress resilience of medicinal plants, influencing their metabolic vigor. This guide provides a comparative framework to assess this gene family across key medicinal species, contextualizing S. miltiorrhiza findings within a wider phylogenetic landscape.
NBS-LRR genes are the largest class of plant disease resistance (R) genes. They encode intracellular immune receptors that recognize pathogen effectors and trigger a robust defense response, often culminating in the hypersensitive response (HR). They are categorized into two major subfamilies based on N-terminal domains: TIR-NBS-LRR (TNL) and CC-NBS-LRR (CNL).
A live search of recent genome databases and literature (2023-2024) reveals significant variation in NBS-LRR family size across medicinal plants. The data is summarized in Table 1.
Table 1: NBS-LRR Gene Family Size Across Selected Medicinal Plants
| Plant Species | Common Name | Approx. NBS-LRR Count | TNL:CNL Ratio | Genome Size (Gb) | Key Reference (Recent) |
|---|---|---|---|---|---|
| Salvia miltiorrhiza | Danshen | ~120 | 1:3.5 | ~0.64 | Zhang et al., 2023 |
| Panax ginseng | Ginseng | ~450 | 1:1.8 | ~3.5 | Chen et al., 2022 |
| Artemisia annua | Sweet Wormwood | ~85 | 1:4 | ~1.8 | Wang et al., 2023 |
| Camellia sinensis | Tea Plant | ~180 | 1:2.2 | ~3.0 | Liu et al., 2024 |
| Catharanthus roseus | Madagascar Periwinkle | ~70 | 1:5 | ~0.5 | Zhou et al., 2023 |
| Glycyrrhiza uralensis | Licorice | ~150 | 1:2.5 | ~0.38 | Li et al., 2023 |
Key Insight: Polyploid species (P. ginseng) show massive family expansion. S. miltiorrhiza exhibits a moderate family size but a notable bias towards the CNL subfamily, suggesting specific evolutionary paths in its immune system.
The following integrated pipeline is standard for NBS-LRR identification, as applied in the foundational S. miltiorrhiza thesis research.
4.1. Data Acquisition and Pre-processing
BLAST+ (v2.13+) and HMMER (v3.3.2).4.2. Homology-Based Identification
getorf (EMBOSS).4.3. HMM Domain Scanning
hmmsearch) against the predicted protein dataset with the NB-ARC domain (E-value < 1e-5). Retain hits.Pfam Scan or InterProScan.4.4. Phylogenetic and Motif Analysis
MAFFT or ClustalW.MEGA11 (Neighbor-Joining/Maximum Likelihood, bootstrap 1000).MEME suite (motif analysis) or COILS.4.5. Chromosomal Localization and Duplication Analysis
TBtools.Diagram 1: NBS-LRR Mediated Immune Pathway (76 chars)
Diagram 2: Genome-Wide NBS-LRR Identification Pipeline (78 chars)
Table 2: Key Reagent Solutions for NBS-LRR Functional Studies
| Reagent/Material | Function/Application | Example/Notes |
|---|---|---|
| Phusion High-Fidelity DNA Polymerase | Amplification of full-length NBS-LRR genes for cloning. | Essential for error-free PCR of large, GC-rich sequences. |
| Gateway or Golden Gate Cloning System | Modular construction of expression vectors for functional assays. | Enables high-throughput subcloning of multiple gene variants. |
| Agrobacterium tumefaciens Strain GV3101 | Transient expression in Nicotiana benthamiana (effectoromics). | Used for hypersensitive response (HR) assays via agroinfiltration. |
| Luciferase (LUC) Reporter Constructs | Quantification of immune signaling output (e.g., downstream PR gene activation). | Firefly luciferase under control of a pathogen-responsive promoter. |
| Anti-GFP/HA/Flag Antibodies | Detection of tagged NBS-LRR protein expression, localization, and complex formation. | For Western blot, co-immunoprecipitation (Co-IP), and microscopy. |
| Programmed Cell Death (PCD) Assay Kits | Quantitative measurement of HR-induced cell death. | Includes electrolytes leakage (conductivity) or Evans Blue staining kits. |
| CRISPR/Cas9 Gene Editing System | Generation of NBS-LRR knockout mutants to confirm in planta function. | Requires specific sgRNA design tools and plant transformation expertise. |
This genome-wide identification and analysis of the NBS-LRR gene family in Salvia miltiorrhiza provides a pivotal resource for the research community. We have established a foundational catalog of putative disease resistance genes, detailed robust methodological frameworks for their study, addressed key analytical challenges, and positioned these findings within a broader evolutionary context through comparative genomics. The integration of expression data suggests a potential intersection between plant defense signaling and the regulation of valuable secondary metabolite pathways, opening a promising avenue for research. Future directions should prioritize functional validation through techniques like VIGS or CRISPR-Cas9, deeper investigation into the signaling networks linking immunity and metabolism, and the application of this knowledge in developing elite, disease-resistant S. miltiorrhiza cultivars. Ultimately, this work not only advances our understanding of plant immunity in a key medicinal species but also contributes strategically to ensuring the sustainable and high-quality production of its clinically important bioactive compounds for biomedical applications.