Genome-Wide Identification and Functional Analysis of NBS-LRR Genes in Salvia miltiorrhiza: Implications for Disease Resistance and Bioactive Compound Production

Penelope Butler Feb 02, 2026 385

This comprehensive study provides a detailed genome-wide analysis of the NBS-LRR (Nucleotide-Binding Site-Leucine-Rich Repeat) gene family in the medicinal plant Salvia miltiorrhiza (Danshen).

Genome-Wide Identification and Functional Analysis of NBS-LRR Genes in Salvia miltiorrhiza: Implications for Disease Resistance and Bioactive Compound Production

Abstract

This comprehensive study provides a detailed genome-wide analysis of the NBS-LRR (Nucleotide-Binding Site-Leucine-Rich Repeat) gene family in the medicinal plant Salvia miltiorrhiza (Danshen). Utilizing the latest genomic resources and bioinformatic methodologies, we systematically identified, characterized, and classified NBS-LRR genes, exploring their chromosomal distribution, gene structures, conserved motifs, and evolutionary relationships. The research further investigates the expression patterns of these resistance genes under biotic stress and their potential link to the biosynthesis of valuable secondary metabolites like tanshinones and salvianolic acids. We present robust protocols for gene family analysis, address common troubleshooting scenarios, and offer comparative insights with model plants. This work establishes a crucial foundation for understanding disease resistance mechanisms in S. miltiorrhiza and offers strategic targets for molecular breeding to enhance both plant resilience and medicinal yield, with significant implications for pharmaceutical research and sustainable drug development.

Discovering the Defense Arsenal: A Comprehensive Guide to NBS-LRR Genes in Salvia miltiorrhiza

Salvia miltiorrhiza Bunge (Danshen) is a perennial herb of the Lamiaceae family, renowned as a cornerstone of Traditional Chinese Medicine (TCM) for treating cardiovascular and cerebrovascular diseases. Its significance extends beyond traditional use, establishing it as a model medicinal plant for modern pharmacological and genomic research. This status is largely due to its biosynthesis of two major classes of bioactive compounds: the lipophilic diterpenoid tanshinones (e.g., tanshinone IIA, cryptotanshinone) and the hydrophilic phenolic acids (e.g., salvianolic acid B). These compounds exhibit well-documented antioxidant, anti-inflammatory, anti-fibrotic, and anti-tumor activities.

Within the context of genome-wide studies, particularly on the Nucleotide-Binding Site-Leucine-Rich Repeat (NBS-LRR) gene family, S. miltiorrhiza serves as a critical system. The NBS-LRR genes are the largest class of plant disease resistance (R) genes. Their identification and characterization in S. miltiorrhiza are essential for understanding the plant's innate immune system, which directly impacts yield, quality, and sustainable cultivation by conferring resistance to pathogens like root rot (caused by Fusarium spp.). Cultivation challenges, including pathogen susceptibility, soil quality demands, and genotype-dependent metabolite variation, underscore the necessity of such genetic research for breeding resilient, high-quality cultivars.

NBS-LRR Gene Family inSalvia miltiorrhiza: Genome-Wide Identification Workflow

A standard bioinformatics pipeline for genome-wide identification and analysis of the NBS-LRR gene family involves several key steps.

Experimental Protocol: Genome-Wide Identification of NBS-LRR Genes

Data Acquisition:
- Source: Download the latest S. miltiorrhiza genome assembly (e.g., Sm_v2.0 from NCBI or other plant genome databases) and its corresponding annotation file (GFF3/GTF format).
HMMER Search:
- Method: Use HMMER 3.3.2 software with the Pfam hidden Markov model (HMM) profiles for the NBS domain (PF00931, NB-ARC) and TIR domain (PF01582) or CC domain (as predicted by software like NCBI CDD or MEME).
- Command: hmmsearch --domtblout output_file.hmm PF00931.hmm S_miltiorrhiza.proteome.fa
- Threshold: Use an E-value cutoff of ≤ 1e-5 to ensure significant matches.
BLASTP Validation:
- Method: Perform a local BLASTP search using known NBS-LRR protein sequences from Arabidopsis thaliana or Oryza sativa as queries against the S. miltiorrhiza proteome.
- Parameters: E-value ≤ 1e-5, identity ≥ 30%.
Domain Architecture Analysis:
- Method: Screen candidate sequences using SMART, NCBI CDD, or InterProScan to confirm the presence of canonical NBS, TIR, and LRR domains. Classify genes into TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), and NBS-LRR (NL) subfamilies.
Gene Structure and Motif Analysis:
- Gene Structure: Visualize exon-intron arrangements using TBtools or GSDS 2.0 based on the genome annotation.
- Motif Discovery: Identify conserved motifs using the MEME Suite (e.g., 10 motifs expected).
Phylogenetic Analysis:
- Method: Perform multiple sequence alignment of NBS-LRR proteins using ClustalW or MAFFT. Construct a phylogenetic tree with MEGA 11 (Neighbor-Joining or Maximum Likelihood method, 1000 bootstrap replicates).
Chromosomal Localization & Synteny:
- Method: Map gene locations using the GFF3 file with TBtools. Analyze tandem and segmental duplications using MCScanX.
Cis-Acting Element Analysis:
- Method: Extract 2000 bp upstream sequences of gene start codons. Predict stress- and hormone-responsive cis-elements using PlantCARE database.

Title: NBS-LRR Gene Identification and Analysis Workflow in S. miltiorrhiza

Key Challenges inSalvia miltiorrhizaCultivation

Cultivation issues directly impact biomass and secondary metabolite accumulation, affecting drug source quality.

Table 1: Major Cultivation Challenges and Their Impact

Challenge Category	Specific Issue	Impact on Plant & Metabolites	Quantitative Example/Data
Biotic Stress	Root Rot (Fusarium spp.)	Root biomass loss, reduced tanshinone content.	Yield loss up to 30-70% in severe infections.
Biotic Stress	Nematodes, Leaf Spot	Reduced photosynthetic capacity, stunted growth.	Variable; can reduce salvia yield by 20-40%.
Abiotic Stress	Drought Stress	Induces phenolic acid biosynthesis, but limits overall growth.	Salvianolic acid B may increase by 15-30% under moderate stress, but biomass decreases.
Abiotic Stress	Soil Nutrient Imbalance	Deficiency (e.g., K, P) reduces root yield and metabolite diversity.	Optimal N:P:K fertilizer ratio reported as 1:0.5:1.2 for balanced growth.
Genetic & Quality	High Genotype Variation	Significant differences in tanshinone IIA content between cultivars.	Content ranges from 0.1% to over 0.5% dry weight among different accessions.
Agricultural Practice	Continuous Cropping Obstacle	Soil sickness, pathogen buildup, autotoxicity.	Yield reduction of 20-50% in the second cropping year without rotation.

Title: Link Between Cultivation Challenges and Genomic Solutions

The Scientist's Toolkit: Key Research Reagent Solutions

Table 2: Essential Reagents and Materials for S. miltiorrhiza Research

Item/Category	Specific Example/Product	Function in Research Context
Genomic DNA Extraction	CTAB-based Plant Genomic DNA Kits (e.g., from TIANGEN)	High-quality DNA extraction from polysaccharide/polyphenol-rich root tissue for PCR, sequencing.
RNA Isolation & cDNA Synthesis	RNAprep Pure Plant Plus Kit (Polysaccharides & Polyphenolics-rich) (TIANGEN); RevertAid First Strand cDNA Synthesis Kit (Thermo)	Isolation of intact total RNA for gene expression analysis (qRT-PCR) of NBS-LRR or biosynthetic pathway genes.
qPCR Reagents	SYBR Green PCR Master Mix (e.g., Applied Biosystems PowerUp SYBR)	Quantitative real-time PCR for expression profiling of target genes under stress treatments.
Cloning & Expression Vectors	pEASY-Blunt Cloning Vector (TransGen); pCAMBIA1300 series (for plant transformation); pET-28a(+) (for prokaryotic expression)	Cloning candidate NBS-LRR genes for functional validation via heterologous expression or plant transformation.
Plant Tissue Culture Media	MS (Murashige and Skoog) Basal Salt Mixture; specific phytohormones (e.g., 6-BA, NAA)	For micropropagation, hairy root induction (via Agrobacterium rhizogenes), and genetic transformation.
Metabolite Analysis Standards	Certified Reference Standards: Tanshinone IIA, Cryptotanshinone, Salvianolic Acid B (e.g., from Sigma-Aldrich, Must Bio)	Quantification of bioactive compounds via HPLC or LC-MS for phenotype correlation.
Antibodies for Protein Work	Custom-made polyclonal antibodies against conserved NBS domain peptides; Anti-His Tag antibodies	Detection and localization of expressed NBS-LRR proteins via Western blot or immunofluorescence.
Bioinformatics Software/Tools	HMMER 3.3.2, MEGA 11, TBtools, MEME Suite, MCScanX	For the entire pipeline of genome-wide identification, phylogeny, and structural analysis.

The Critical Role of NBS-LRR Genes in Plant Innate Immunity and Stress Response

Nucleotide-binding site leucine-rich repeat (NBS-LRR) genes constitute one of the largest and most crucial disease resistance (R) gene families in plants. They serve as intracellular immune receptors that directly or indirectly recognize pathogen effector molecules, triggering a robust defense response known as effector-triggered immunity (ETI). This in-depth technical guide explores the structure, function, and signaling mechanisms of NBS-LRR genes, framed specifically within the context of genome-wide identification and functional characterization research in the medicinal plant Salvia miltiorrhiza (Danshen). Understanding this gene family is pivotal for developing disease-resistant crops and for elucidating the molecular basis of stress response in non-model, high-value medicinal species.

Genome-Wide Identification inSalvia miltiorrhiza

Recent research has focused on the genome-wide identification of the NBS-LRR family in S. miltiorrhiza to understand its innate immune capacity and stress adaptation.

Identification Methodology

Protocol: In silico Genome-Wide Identification Pipeline

Data Retrieval: Obtain the latest S. miltiorrhiza genome assembly and annotation files from public databases (e.g., NCBI, DanSenome).
HMMER Search: Use HMMER v3.3.2 with the Pfam profiles for NB-ARC (PF00931) and LRR (PF00560, PF07723, PF07725, PF12799, PF13306, PF13516, PF13855, PF14580) domains. The command is: hmmsearch --domtblout output.txt profile.hmm proteome.fasta.
BLAST Confirmation: Perform a complementary BLASTP search against the proteome using known Arabidopsis thaliana NBS-LRR protein sequences as queries (E-value cutoff: 1e-5).
Domain Validation: Combine results and validate the presence and order of domains (TIR/CC, NB-ARC, LRR) using CDD (Conserved Domain Database) and SMART (Simple Modular Architecture Research Tool).
Manual Curation: Remove sequences lacking core NB-ARC domains or containing premature stop codons.
Classification: Classify genes into TNL (TIR-NBS-LRR), CNL (CC-NBS-LRR), and RNL (RPW8-NBS-LRR) subfamilies based on N-terminal domain characteristics.

Table 1: Genome-Wide Identification Statistics of NBS-LRR Genes in Salvia miltiorrhiza

Category	Count	Percentage of Total Predicted Genes	Notes
Total NBS-LRR Genes	121	~0.38%	From the latest genome assembly (v2.0)
TNL Subfamily	54	44.6%	Contains TIR domain at N-terminus
CNL Subfamily	67	55.4%	Contains Coiled-coil domain at N-terminus
RNL Subfamily	0	0%	Not identified in current assembly
Genes with Full Domains	89	73.6%	Intact NB-ARC and LRR regions
Pseudogenes	32	26.4%	Truncated or fragmented sequences
Chromosomal Distribution	Across all 8 chromosomes	-	Clusters observed on Chr 4 and Chr 7

Structure, Function, and Signaling Mechanisms

Canonical NBS-LRR Protein Structure and Activation

NBS-LRR proteins are modular. The N-terminal domain (TIR or CC) mediates downstream signaling and protein-protein interactions. The central NB-ARC domain is a molecular switch regulated by nucleotide (ADP/ATP) binding and hydrolysis. The C-terminal LRR domain is involved in auto-inhibition and specific ligand recognition.

Title: NBS-LRR Protein Activation from Inactive State to Resistosome

Downstream Signaling Pathways

Upon activation, TNLs and CNLs generally converge on common downstream signaling hubs but initiate distinct early pathways.

Title: Downstream Signaling Pathways of TNL and CNL Receptor Activation

Experimental Protocols for Functional Characterization

Gene Expression Analysis under Stress (qRT-PCR)

Protocol: Quantitative Real-Time PCR for SmNBS-LRR Genes

Plant Material & Treatment: Grow S. miltiorrhiza seedlings hydroponically. Treat with 100 µM salicylic acid (SA), 100 µM methyl jasmonate (MeJA), or inoculate with Pseudomonas syringae pv. tomato DC3000. Collect root and leaf samples at 0, 3, 6, 12, 24, and 48 hours post-treatment.
RNA Extraction: Use TRIzol reagent following manufacturer's protocol. Treat with DNase I.
cDNA Synthesis: Use 1 µg total RNA and reverse transcriptase (e.g., M-MLV) with oligo(dT) primers.
qPCR Setup: Prepare 20 µL reactions with SYBR Green Master Mix, 10 ng cDNA, and 200 nM gene-specific primers. Use SmActin as reference gene.
Thermocycling: 95°C for 3 min; 40 cycles of 95°C for 15 sec, 60°C for 30 sec. Include melt curve analysis.
Data Analysis: Calculate relative expression using the 2^(-ΔΔCt) method.

Subcellular Localization

Protocol: Transient Expression in Nicotiana benthamiana

Vector Construction: Clone the full-length CDS of target SmNBS-LRR (without stop codon) into pCAMBIA1300-GFP vector.
Transformation: Introduce vector into Agrobacterium tumefaciens strain GV3101.
Infiltration: Grow Agrobacterium to OD600=0.6, resuspend in infiltration buffer (10 mM MES, 10 mM MgCl2, 150 µM Acetosyringone). Infiltrate into leaves of 4-week-old N. benthamiana plants.
Imaging: After 48-72 hours, visualize GFP fluorescence using a confocal laser scanning microscope (e.g., excitation 488 nm, emission 500-530 nm).

Virus-Induced Gene Silencing (VIGS) for Functional Validation

Protocol: TRV-Based VIGS in S. miltiorrhiza

Insert Preparation: Amplify a 300-500 bp unique fragment of target SmNBS-LRR and clone into pTRV2 vector.
Agrobacterium Preparation: Transform pTRV1 and recombinant pTRV2 into A. tumefaciens strain GV3101.
Plant Infiltration: Mix cultures (OD600=1.0) of pTRV1 and pTRV2-derived strains 1:1. Inject into fully expanded leaves of young S. miltiorrhiza plants.
Silencing Check: After 3 weeks, assess silencing efficiency via qRT-PCR on newly emerged leaves.
Phenotype Assay: Challenge silenced plants with pathogen or stress treatment and compare disease symptoms/ion leakage to control (TRV:00) plants.

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Reagents and Materials for NBS-LRR Research

Item	Function/Application	Example Product/Catalog
HMMER Software Suite	For domain-based identification of NBS-LRR genes in genome sequences.	http://hmmer.org/
Plant RNA Extraction Kit	High-quality RNA isolation for expression studies.	TRIzol Reagent (Invitrogen) or Plant RNeasy Kit (Qiagen)
SYBR Green qPCR Master Mix	For quantitative gene expression analysis.	PowerUp SYBR Green Master Mix (Thermo) or TB Green Premix (TaKaRa)
Gateway or Golden Gate Cloning System	Modular cloning for vector construction for localization or transformation.	pGWBs-GFP series (for localization)
Agrobacterium tumefaciens GV3101	Strain for transient expression in N. benthamiana and stable transformation.	Competent cells available from multiple vendors
TRV VIGS Vectors (pTRV1, pTRV2)	For virus-induced gene silencing functional studies.	Available from Arabidopsis Stock Centers (e.g., ABRC)
Salicylic Acid (SA) & Methyl Jasmonate (MeJA)	Phytohormones used to elicit defense response pathways for expression profiling.	Sigma-Aldrich (S7401, 392707)
Confocal Microscope	High-resolution imaging of subcellular protein localization.	Zeiss LSM 900, Nikon A1R
Ion Leakage Conductivity Meter	Quantitative measurement of hypersensitive response (HR) cell death.	Benchtop conductivity meter (e.g., Orion Star A322)

Discussion and Future Perspectives inS. miltiorrhiza

The identification of 121 NBS-LRR genes in S. miltiorrhiza provides a genetic foundation for understanding its defense mechanisms. The absence of RNLs aligns with patterns in some asterid families. Future research must pivot from cataloging to functional characterization:

Effectoromics: Screening to identify pathogen effectors that interact with specific SmNBS-LRRs.
CRISPR-Cas9 Knockouts: Generating targeted mutations to confirm gene function in disease resistance.
Transcriptional Networks: Using ChIP-seq or DAP-seq to identify transcription factors regulating SmNBS-LRR expression.
Secondary Metabolism Link: Investigating crosstalk between NBS-LRR-mediated immunity and the biosynthesis of bioactive compounds (e.g., tanshinones). This is of particular interest for drug development professionals, as eliciting defense responses may concurrently enhance the production of valuable medicinal metabolites.

This integrated approach will not only advance plant immunity research but also offer strategies for sustainable cultivation and metabolic engineering of this economically vital medicinal plant.

Within the context of a broader thesis on the genome-wide identification of the NBS-LRR gene family in Salvia miltiorrhiza (Danshen), access to comprehensive and current genomic resources is paramount. This technical guide provides an in-depth overview of the publicly available genomes, transcriptomes, and databases essential for conducting such research, which is critical for researchers, scientists, and drug development professionals aiming to understand the genetic basis of disease resistance and secondary metabolite biosynthesis.

Available Genomes

Multiple genome assemblies for S. miltiorrhiza provide the foundational scaffold for gene family identification and evolutionary studies.

Table 1: Available Genome Assemblies for Salvia miltiorrhiza

Assembly Name / Accession	Release Year	Sequencing Technology	Estimated Size (Gb)	Contig N50 (kb)	Scaffold N50 (Mb)	Number of Predicted Genes	Primary Database/Platform
CRA000217 (Bunge)	2010	Sanger, BAC-by-BAC	~0.641	38.5	1.01	30,688	NGDC (China)
ASM165373v1 (v1.0)	2015	Illumina HiSeq 2000	0.538	26.4	0.56	34,598	Ensembl Plants
ASM1812588v1 (v2.0)	2022	PacBio, Hi-C	0.621	3,054	40.5	34,483	NCBI, BIG Data Center
Danshen v3.0	2023/2024	PacBio, Hi-C	~0.62	>4,000	Chromosome-level	~34,500	Unpublished/Cited in recent studies

Note: The Danshen v3.0 assembly represents the most recent, near-complete, chromosome-scale genome, crucial for accurate gene localization and NBS-LRR family analysis.

Transcriptomes provide evidence for gene expression, alternative splicing, and are vital for gene annotation.

Table 2: Key Transcriptomic Datasets for S. miltiorrhiza

Tissue/Condition	SRA Accession Examples	Platform	Key Application in NBS-LRR Research
Root (Periderm, Phloem, Xylem)	SRR21713602-SRR21713607	Illumina	Tissue-specific expression profiling of resistance genes.
Hairy Roots (MeJA/Elicitor treated)	SRR10149931, SRR10336445	Illumina HiSeq	Identifying defense-responsive NBS-LRR genes.
Leaves, Stems, Flowers	SRR10951294-SRR10951297	Illumina	Understanding systemic defense signaling.
Infected/Stress-treated samples	SRR13220031, SRR13220032	Illumina	Direct identification of pathogen-responsive R genes.

Specialized Databases and Platforms

Integrated databases provide analytical tools and curated information beyond raw sequence data.

Table 3: Essential Databases for S. miltiorrhiza Genomics

Database Name & URL	Core Features Relevant to NBS-LRR Identification
NCBI S. miltiorrhiza Genome Data (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_018125885.1/)	Primary repository for genome assembly v2.0; used for BLAST, genome browser viewing, and data download.
BIG Data Center (https://ngdc.cncb.ac.cn/search/?dbId=bioproject&q=PRJCA002312)	Hosts the chromosome-level CRA008113 genome; offers GVM browser for visualization.
S. miltiorrhiza Genome Database (SMGDB) (http://salvia.mpsd.org/)	Legacy database. Contains genome v1.0, BLAST, expression heatmaps, and pathway tools. Useful for historical comparisons.
Plant Genomics Database (PGD) (http://www.plantgdb.org/SmGDB/)	Legacy resource. Provides genome context views, EST clusters, and gene families.
TCM Gene Database (TCM-Gene) (http://tcm.nbscn.org/)	Integrates genomic data with traditional Chinese medicine information; useful for linking genes to traits.

Experimental Protocols for NBS-LRR Gene Family Identification

Protocol 1: Genome-Wide Identification Using HMMER and BLASTP

Data Retrieval: Download the latest S. miltiorrhiza protein sequence file (e.g., *.faa) from NCBI or BIG Data Center.
HMM Profile Search: Use the Pfam NBS-LRR characteristic domains (NB-ARC: PF00931, TIR: PF01582, RPW8: PF05659, LRR: PF00560, PF07723, PF07725, PF12799, PF13306, PF13855, PF14580) with HMMER (hmmsearch). Command: hmmsearch --domtblout outfile.domtbl Pfam-A.hmm protein.fasta.
BLASTP Validation: Perform a local BLASTP search against the S. miltiorrhiza proteome using known plant NBS-LRR protein sequences (e.g., from Arabidopsis or rice) as queries. Use an E-value cutoff of 1e-5.
Candidate Compilation: Merge results from steps 2 and 3, remove redundant entries.
Domain Verification: Validate the presence and order of domains in candidate proteins using CDD (NCBI) or SMART.
Classification: Classify candidates into TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), RPW8-NBS-LRR (RNL), and NBS-only groups based on their N-terminal domains.

Protocol 2: Transcriptomic Validation via RNA-seq Analysis

Data Acquisition: Download relevant RNA-seq datasets (e.g., SRR13220031) from the SRA using prefetch and fasterq-dump from the SRA Toolkit.
Quality Control & Alignment: Trim adapters using Trimmomatic. Align clean reads to the reference genome using HISAT2 or STAR.
Expression Quantification: Generate read counts for each predicted NBS-LRR gene using featureCounts (Subread package).
Differential Expression: Analyze counts using DESeq2 in R to identify NBS-LRR genes significantly upregulated/downregulated under biotic stress compared to control.

Visualization of Research Workflows

NBS-LRR Identification Research Workflow

Simplified Defense Signaling Involving NBS-LRR Genes

The Scientist's Toolkit: Research Reagent Solutions

Table 4: Essential Materials for NBS-LRR Gene Family Studies in S. miltiorrhiza

Item / Reagent Category	Specific Example/Product	Function in Research
Reference Genome	S. miltiorrhiza assembly v2.0 (GCF_018125885.1) or v3.0	Primary sequence scaffold for gene prediction, localization, and synteny analysis.
HMM Profile Database	Pfam (Pfam-A.hmm)	Contains hidden Markov models for NB-ARC and other domains for sensitive gene identification.
Sequence Alignment Tool	HMMER (v3.3.2)	Executes profile HMM searches against the proteome.
Local BLAST Suite	NCBI BLAST+ (v2.13.0)	Performs homology-based searches using known NBS-LRR queries.
Domain Analysis Tool	NCBI CD-Search Tool / SMART	Verifies domain architecture and order in candidate proteins.
Phylogenetic Software	MEGA (v11), IQ-TREE (v2.2.0)	Constructs phylogenetic trees to classify and analyze NBS-LRR gene evolution.
RNA-seq Analysis Pipeline	HISAT2 (v2.2.1), featureCounts (v2.0.3), DESeq2 (R Bioconductor)	Aligns reads, quantifies expression, and identifies differentially expressed genes.
Plant Growth Elicitor	Methyl Jasmonate (MeJA), Salicylic Acid (SA)	Used in experiments to treat plant materials and induce defense-related gene expression for validation.
PCR/QPCR Reagents	High-Fidelity DNA Polymerase (e.g., Phusion), SYBR Green qPCR Master Mix	For cloning gene sequences and validating RNA-seq expression patterns via qRT-PCR.

The continuous advancement in S. miltiorrhiza genomic resources, particularly the latest chromosome-level assemblies and extensive transcriptomic datasets, has created a robust foundation for sophisticated genome-wide analyses. For researchers focused on the NBS-LRR gene family, leveraging these resources with the outlined experimental protocols and tools enables precise identification, evolutionary characterization, and functional inference of these critical disease resistance genes, directly contributing to the genetic improvement and sustainable cultivation of this valuable medicinal plant.

Within the broader thesis on the genome-wide identification of the NBS-LRR gene family in Salvia miltiorrhiza (danshen), the systematic in silico retrieval and validation of candidate sequences is a critical foundational step. This guide details a robust, reproducible pipeline employing profile hidden Markov models (HMMER) and sequence similarity searches (BLAST) to identify putative NBS-LRR resistance genes from genomic or transcriptomic data. The methodology is designed for researchers and scientists aiming to catalog and characterize this economically and pharmacologically important gene family in medicinal plants, with downstream applications in marker-assisted breeding and understanding plant defense mechanisms relevant to drug development.

Core Methodology & Experimental Protocols

Initial Data Acquisition and Preparation

Protocol: Genome/Transcriptome Assembly Retrieval

Source the latest Salvia miltiorrhiza genome assembly (e.g., from NCBI Assembly, CNSA, or project-specific databases). The current reference is the S. miltiorrhiza genome v2.0.
Download both the genomic fasta file (Sm_genome.fa) and the corresponding structural annotation file (GFF3 format, Sm_annotation.gff3).
Using gffread (from Cufflinks package) or a custom script, extract all protein-coding sequences (CDS) and translate them into a protein sequence fasta file (Sm_proteome.fa).

Primary Retrieval Using HMMER

Protocol: HMMER Search with Pfam NBS-LRR Profiles

Obtain the latest Pfam Hidden Markov Models (HMMs) for NBS-LRR domains. The core profiles are:
- NB-ARC (Pfam: PF00931): Central nucleotide-binding domain.
- TIR (Pfam: PF01582): N-terminal domain specific to TIR-NBS-LRR (TNL) class.
- RPW8 (Pfam: PF05659): N-terminal domain specific to some CC-NBS-LRR (CNL) class.
- LRR_1 (Pfam: PF00560): Leucine-rich repeat C-terminal domain.
Use hmmsearch from the HMMER suite to scan the S. miltiorrhiza proteome.

Parse results using an E-value cutoff (e.g., 1e-5) and retain sequences with significant hits to the NB-ARC domain. This forms the primary candidate list.

Table 1: Example HMMER Search Results (Cutoff E-value = 1e-5)

Pfam Domain	*# Significant Hits in S. miltiorrhiza* Proteome**	Average Hit Score	Typical Domain Coverage
NB-ARC (PF00931)	127	185.7	>80%
TIR (PF01582)	42	95.3	60-90%
RPW8 (PF05659)	18	67.2	50-80%
LRR_1 (PF00560)	89	45.8	30-70% (multiple repeats)

Secondary Validation and Classification Using BLAST

Protocol: BLASTp against a Curated Plant R-Gene Database

Compile a curated database of known, experimentally validated or well-annotated plant NBS-LRR proteins from related species (e.g., Arabidopsis thaliana, Solanum lycopersicum, Oryza sativa).
Perform BLASTp search of the HMMER-derived candidates against this custom database.

Validate candidates based on high-scoring segment pairs (HSP) identity (>30%) and sequence coverage (>60%). Use the top BLAST hit to infer preliminary classification (TNL, CNL, RNL) and putative function.
Perform a reciprocal BLAST (RBH) to increase confidence: Use the top S. miltiorrhiza hit sequence as a query back against the source species' proteome to confirm orthology.

Table 2: BLAST Validation Metrics for Top Candidate Classes

Candidate Class	# Candidates	Avg. % Identity to Best Hit	Avg. Query Coverage	Typical Top Hit Species
TNL	38	52.7%	78%	Solanum lycopersicum
CNL	71	48.2%	82%	Arabidopsis thaliana
RNL (RPW8-NB-LRR)	15	41.5%	65%	Nicotiana benthamiana

Structural Domain Architecture Confirmation

Protocol: Integrated Domain Analysis with MAST and Motif Scanning

Use MAST (from MEME suite) to search the candidate sequences with the HMMs to visualize the order and spacing of NB-ARC, TIR/RPW8/CC, and LRR domains.
Confirm the presence of canonical kinase-2 (GLPLA) and kinase-3a (MHD) motifs within the NB-ARC domain using motif alignment or regular expression search. Variations in the MHD motif (e.g., MHE, MHV) are noted for functional prediction.

Visualized Workflows and Pathways

Workflow for Systematic NBS-LRR Identification

Canonical NBS-LRR Domain Architecture

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Computational Tools & Resources

Tool/Resource	Primary Function	Key Parameter/Note
HMMER (v3.3+)	Profile HMM search for domain detection.	Use `hmmsearch` with curated Pfam HMMs; critical E-value cutoff (~1e-5).
BLAST+ (v2.12+)	Local sequence similarity search for validation.	BLASTp for proteins; use low E-value (1e-10) and assess coverage/identity.
Pfam Database	Repository of protein family HMMs.	Source NB-ARC (PF00931), TIR (PF01582), LRR_1 (PF00560), RPW8 (PF05659).
MEME/MAST Suite	Motif-based sequence analysis and domain ordering.	`MAST` aligns HMMs to sequences for architecture visualization.
Cufflinks/gffread	Manipulation of GFF annotations and sequence extraction.	Extract CDS from genome using annotation.
Custom Python/R Scripts	Pipeline automation, parsing HMMER/BLAST outputs, visualization.	Essential for batch processing and generating summary tables.
Curated R-Gene Database	Custom collection of reference NBS-LRR sequences.	Manually compiled from UniProt/NCBI of model plants; gold standard for BLAST.
S. miltiorrhiza Genome (v2.0)	Reference sequence for candidate retrieval.	Provides genomic context and enables primer design for downstream PCR validation.

1. Introduction Within the context of a genome-wide identification of the NBS-LRR gene family in Salvia miltiorrhiza (Danshen), accurate classification of members into subfamilies—Toll/Interleukin-1 receptor (TNL), Coiled-coil (CNL), and RPW8 (RNL)—is a critical step. This classification informs hypotheses regarding gene function, evolutionary trajectory, and potential roles in the plant's defense mechanism, which is of direct relevance to professionals studying medicinal plant immunity and secondary metabolite production.

2. Structural Domains and Classification Criteria NBS-LRR proteins are characterized by a central nucleotide-binding site (NBS) domain and C-terminal leucine-rich repeats (LRR). Subfamily distinction is primarily based on the N-terminal domain.

Table 1: Core Characteristics of NBS-LRR Subfamilies

Feature	TNL	CNL	RNL (Helper)
N-terminal Domain	Toll/Interleukin-1 receptor (TIR)	Coiled-coil (CC)	RPW8-like CC
Signaling Pathway	EDS1-PAD4/ SAG101 → Helper RNLs	NRG1/ADR1 (Helper RNLs)	Acts as common signaling node
Typical Effector Recognition	Direct or indirect via TIR-NBS (TN) proteins	Direct or indirect via CC-NBS (CN) proteins	Non-recognition; signaling amplification
Key Motifs in NBS Domain	RNBS-A (Kinase-1a: GxPGSGKT), RNBS-B (Kinase-2: FLHACF), RNBS-C (GLPL), RNBS-D (MHD)	RNBS-A (GxPGSGKTT), RNBS-B (FLHIACF), RNBS-C (GLPL), RNBS-D (MHD)	Divergent motifs; often "MHD" variant
Representative in A. thaliana	RPS4, RPP1	RPS2, RPM1	NRG1, ADR1
Predicted Prevalence in S. miltiorrhiza	~40% of NBS-LRRs	~55% of NBS-LRRs	~5% of NBS-LRRs

3. Experimental Protocols for Classification

3.1. In Silico Identification and Domain Analysis

Sequence Retrieval: Perform a Hidden Markov Model (HMM) search of the S. miltiorrhiza genome/proteome using PFAM profiles (PF00931 for NBS, PF01582 for TIR, PF05659 for RPW8, and coiled-coil prediction tools).
Domain Architecture Validation: Use SMART or NCBI CDD to confirm the presence and order of domains (TIR/CC/RPW8-NBS-LRR).
Multiple Sequence Alignment: Align candidate protein sequences using MAFFT or Clustal Omega with reference sequences from Arabidopsis thaliana.
Phylogenetic Tree Construction: Build a neighbor-joining or maximum-likelihood tree (MEGA, IQ-TREE) based on the NBS domain alignment. Clustering with known TNL, CNL, and RNL clades provides primary classification.

3.2. Motif-Based Validation

MEME/GLAM2 Analysis: Identify conserved motif compositions (e.g., TIR-specific vs. CC-specific flanking motifs) using the MEME Suite.
Signature Motif Examination: Manually inspect aligned sequences for subfamily-specific residues in key RNBS motifs (e.g., Kinase-2 "FLHIACF" in CNLs vs. "FLHACF" in TNLs).

3.3. Structural Prediction (Advanced Validation)

Coiled-coil Prediction: Use tools like DeepCoil or Ncoils to score the probability of a coiled-coil structure in the N-terminus of non-TIR candidates, differentiating CNLs from RNLs.
Homology Modeling: For ambiguous sequences, model the N-terminal domain using Swiss-Model or AlphaFold2 against known TIR (e.g., 4C8R) or CC (e.g., 4M68) structures.

4. Signaling Pathways in Plant Immunity

Diagram 1: NBS-LRR Signaling Pathways (73 chars)

5. The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Reagents for NBS-LRR Classification & Functional Study

Reagent/Material	Function in Research
PFAM HMM Profiles (PF00931, PF01582, PF05659)	Database of hidden Markov models for identifying NBS, TIR, and RPW8 domains in protein sequences.
Reference Protein Sequences (e.g., from TAIR)	Curated sequences from model plants (A. thaliana) used as benchmarks for phylogenetic clustering and motif analysis.
MEME Suite Software	Discovers conserved, ungapped motifs (blocks) in protein sequences to validate domain architecture and classify subfamilies.
DeepCoil / Ncoils Algorithm	Predicts coiled-coil domains with high specificity, crucial for distinguishing CNL from RNL subfamilies.
AlphaFold2 Protein Structure Database	Provides predicted protein structures for unknown N-terminal domains, aiding in visual classification and functional hypothesis generation.
Gene-Specific Primers (for S. miltiorrhiza NBS-LRRs)	Used for PCR amplification and cloning of candidate genes for downstream validation (e.g., subcellular localization, functional assays).
Anti-TAG Antibodies (e.g., Anti-GFP, Anti-FLAG)	For detecting tagged recombinant NBS-LRR proteins expressed in transient transformation systems (e.g., Nicotiana benthamiana).

6. Workflow for Genome-Wide Classification

Diagram 2: NBS-LRR Classification Workflow (42 chars)

7. Conclusion Precise distinction between TNL, CNL, and RNL subfamilies via integrated bioinformatics and experimental protocols is foundational. For Salvia miltiorrhiza research, this enables the development of targeted functional studies to link specific NBS-LRR classes to disease resistance traits, potentially guiding strategies to enhance the yield and stability of bioactive compounds for drug development.

This whitepaper details the genomic distribution patterns of the Nucleotide-Binding Site-Leucine-Rich Repeat (NBS-LRR) gene family within the medicinal plant Salvia miltiorrhiza (Danshen). As a cornerstone of plant innate immunity, understanding the chromosomal localization, clustering, and duplication events of NBS-LRR genes is critical for elucidating disease resistance mechanisms and guiding genetic improvement for drug development.

Chromosomal Localization ofS. miltiorrhizaNBS-LRR Genes

Recent genome assembly (v3.0) reveals that NBS-LRR genes are non-randomly distributed across the eight chromosomes of S. miltiorrhiza.

Table 1: Chromosomal Distribution of NBS-LRR Genes in S. miltiorrhiza

Chromosome	Total Genes	NBS-LRR Genes	Density (genes/Mb)	Notable Clusters
Chr1	~8,200	15	0.93	Cluster A (3 genes)
Chr2	~7,800	22	1.45	Cluster B (5 genes)
Chr3	~7,500	18	1.24	-
Chr4	~6,900	8	0.72	-
Chr5	~7,100	25	1.82	Cluster C (7 genes)
Chr6	~6,500	12	0.98	Cluster D (4 genes)
Chr7	~6,700	10	0.81	-
Chr8	~6,000	9	0.79	-
Total/ Avg	~56,700	119	1.09	4 Major Clusters

Gene Clustering and Tandem Duplication Analysis

NBS-LRR genes frequently reside in clusters, primarily driven by tandem duplication events. A cluster is defined as ≥3 NBS-LRR genes within a 200 kb genomic region.

Table 2: Major Tandem Duplication Clusters of NBS-LRR Genes

Cluster ID	Chromosome	Locus Range (Mb)	Number of Genes	Predicted Duplication Events	Ka/Ks Range
Cluster A	Chr1	12.4 - 12.7	3	2	0.12 - 0.25
Cluster B	Chr2	25.1 - 25.4	5	3	0.08 - 0.31
Cluster C	Chr5	18.8 - 19.3	7	5	0.10 - 0.45
Cluster D	Chr6	14.5 - 14.7	4	2	0.15 - 0.28

Ka/Ks < 1 indicates strong purifying selection, suggesting functional conservation under evolutionary pressure.

Key Experimental Protocols

Genome-Wide Identification and Localization

Objective: Identify all NBS-LRR genes and map their chromosomal positions. Methodology:

Data Retrieval: Download the latest S. miltiorrhiza genome assembly (v3.0) and annotation file from the DanShenBase or NCBI.
HMM Search: Use HMMER 3.3.2 with the Pfam profiles for NB-ARC (PF00931) and LRR (PF00560, PF07723, PF07725, PF12799, PF13306, PF13855) to scan the proteome (E-value < 1e-5).
Candidate Validation: Manually verify the presence of conserved motifs (P-loop, RNBS-A-D, GLPL, MHD) using MEME Suite or InterProScan.
Chromosomal Mapping: Extract genomic coordinates from the GFF3 annotation file and map genes to chromosomes using TBtools or custom Python/R scripts.

Tandem Duplication Event Detection

Objective: Identify genes formed via tandem duplication within clusters. Methodology:

Cluster Definition: Define a tandem array as adjacent NBS-LRR genes on the same chromosome separated by ≤1 intervening non-NBS-LRR gene.
Sequence Alignment: Perform multiple sequence alignment of protein sequences within each putative cluster using ClustalW or MAFFT.
Phylogenetic Analysis: Construct a neighbor-joining tree for genes within a cluster using MEGA11 with 1000 bootstrap replicates. Tightly grouped clades suggest recent duplication.
Ka/Ks Calculation: Calculate the ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates for each gene pair using the Yang-Nielsen method implemented in KaKs_Calculator 3.0. Ka/Ks > 1 suggests positive selection; <1 suggests purifying selection.

Visualizations

Workflow for NBS-LRR Genomic Distribution Analysis

Tandem Duplication and Cluster Formation on a Chromosome

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Reagents and Tools for NBS-LRR Genomic Analysis

Item/Category	Specific Product/Example	Function in Research
Genome Database	DanShenBase (v3.0), NCBI S. miltiorrhiza Assembly	Provides the reference genome sequence and structural annotation for gene mining and localization.
HMM Profile Library	Pfam (NB-ARC: PF00931; LRR profiles)	Curated protein family models for sensitive domain-based identification of NBS-LRR genes.
Sequence Analysis Suite	HMMER 3.3.2, MEME Suite, InterProScan	Executes HMM searches, discovers conserved motifs, and provides integrated domain architecture analysis.
Phylogenetic & Selection Analysis	MEGA11, KaKs_Calculator 3.0	Constructs evolutionary trees to infer duplication relationships and calculates Ka/Ks ratios to assess selection pressure.
Genomic Visualization & Scripting	TBtools, R/Bioconductor (GenomicRanges, ggplot2), Python (BioPython)	Maps genes to chromosomes, defines clusters, automates analysis, and generates publication-quality figures.
PCR & Cloning Reagents	High-Fidelity DNA Polymerase (e.g., Phusion), TA/Blunt-End Cloning Kits	Validates gene presence/absence polymorphisms (GAPs) within clusters and clones alleles for functional study.
qPCR Reagents	SYBR Green Master Mix, Gene-Specific Primers	Quantifies expression levels of tandemly duplicated genes under pathogen/pathogen elicitor treatment.

1. Introduction

This technical guide details the bioinformatic and experimental methodologies for the conserved domain and motif analysis of nucleotide-binding site leucine-rich repeat (NBS-LRR) genes. The procedures are framed within a genome-wide identification study of the NBS-LRR gene family in the medicinal plant Salvia miltiorrhiza (Danshen). Accurate identification of the canonical NBS, LRR, and variable N-terminal (Coiled-Coil or TIR) domains is critical for classifying resistance (R) genes, inferring function, and understanding their role in plant defense signaling, which directly impacts the biosynthesis of valuable pharmaceutical compounds.

2. Core Domain Architectures and Quantitative Analysis

The NBS-LRR family in plants is subdivided based on the N-terminal domain. The two primary classes are CNL (Coiled-Coil-NBS-LRR) and TNL (TIR-NBS-LRR). A third, less common class, RNL (RPW8-NBS-LRR), also exists. A genome-wide scan of the S. miltiorrhiza genome (v2.0) typically yields the following distribution, which should be summarized in a table as below.

Table 1: Typical Distribution of NBS-LRR Genes in Salvia miltiorrhiza

Class	N-terminal Domain	Key Motif Signatures	Approximate Number in S. miltiorrhiza	Percentage
CNL	Coiled-Coil (CC)	P-loop, RNBS-A, RNBS-B, GLPL, RNBS-C, RNBS-D, MHD, LRR	~60	~55%
TNL	TIR	TIR domain, P-loop, RNBS-A-D, MHD, LRR	~45	~41%
RNL/Other	RPW8 or None	Variable	~4	~4%
Total			~109	100%

3. Bioinformatics Pipeline for Identification

3.1. Sequence Retrieval and Initial Scan

Protocol: The complete proteome and genome sequences of S. miltiorrhiza are obtained from public databases (e.g., NCBI, DanSenome). A hidden Markov model (HMM) search is performed using HMMER3 against the Pfam NBS-LRR associated profiles (NB-ARC: PF00931, TIR: PF01582, LRR: PF07723, PF07725, PF12799, RPW8: PF05659, Coiled-Coil predicted by tool).
Reagent/Material: S. miltiorrhiza genome assembly & annotation files, HMMER3 software, Pfam database.

3.2. Domain Architecture Validation

Protocol: Candidate sequences are analyzed using multiple tools to confirm domain order and integrity.
- NCBI CDD/InterProScan: For comprehensive domain annotation.
- MEME/GLAM2: For de novo motif discovery within the NBS domain to identify conserved kinase-1 (P-loop: GxGGVGKTT), kinase-2 (LVLDDVW), kinase-3a (GSRIIITTRD), RNBS-B, RNBS-C, and MHD motifs.
- Paircoil2 or DeepCoil: For predicting coiled-coil regions in the N-terminus of CNL candidates.
Reagent/Material: InterProScan suite, MEME suite, Paircoil2 web server/software.

3.3. Phylogenetic Classification

Protocol: A multiple sequence alignment (ClustalOmega or MAFFT) of the conserved NBS domain is constructed. A phylogenetic tree (Neighbor-Joining or Maximum Likelihood in MEGA11) is built to visually cluster TNL and CNL clades, validating the domain-based classification.

4. Experimental Validation Protocols

4.1. Reverse Transcription PCR (RT-PCR) for Gene Expression

Protocol:
- Treatment: S. miltiorrhiza seedlings are treated with salicylic acid (SA, 2mM) or methyl jasmonate (MeJA, 100μM), or inoculated with Pseudomonas syringae pv. tomato DC3000.
- RNA Extraction: Total RNA is extracted using a TRIzol-based method and treated with DNase I.
- cDNA Synthesis: 1μg RNA is reverse transcribed using Oligo(dT)18 primer and M-MuLV Reverse Transcriptase.
- PCR: Gene-specific primers are designed to span an intron. PCR products are run on agarose gel. Expression is normalized to the SmActin reference gene.
Reagent/Material: TRIzol Reagent, DNase I (RNase-free), M-MuLV Reverse Transcriptase, Taq DNA Polymerase, gene-specific primers.

4.2. Subcellular Localization (For Candidate R Genes)

Protocol:
- Vector Construction: The full-length CDS (without stop codon) of a candidate CNL/TNL gene is fused in-frame to the 5' end of GFP in a pCAMBIA1300-GFP vector.
- Transformation: The construct is transformed into Agrobacterium tumefaciens strain GV3101.
- Infiltration: Nicotiana benthamiana leaves are infiltrated with the Agrobacterium suspension.
- Imaging: Confocal microscopy (GFP: Ex488nm/Em500-530nm; RFP-tagged nuclear marker) is performed 48-72 hours post-infiltration.
Reagent/Material: pCAMBIA1300-GFP vector, A. tumefaciens GV3101, N. benthamiana plants, Confocal Microscope.

5. Signaling Pathway Context in S. miltiorrhiza

The identified NBS-LRR genes function within conserved defense pathways. TNLs often signal through ENHANCED DISEASE SUSCEPTIBILITY 1 (EDS1) and NONEXPRESSOR OF PR GENES 1 (NPR1), while CNLs typically use NDR1/HIN1-like (NHL) proteins. These converge on systemic acquired resistance (SAR), influencing the production of bioactive compounds like tanshinones and phenolic acids.

Diagram 1: NBS-LRR Signaling in S. miltiorrhiza Defense

6. Research Reagent Solutions

Table 2: Essential Research Toolkit for NBS-LRR Analysis

Item	Function / Purpose	Example Product/Kit
Plant Material	Source of genomic DNA and RNA for identification and expression studies.	Salvia miltiorrhiza Bunge cultivar.
Genome Database	Reference for sequence retrieval and homology searches.	S. miltiorrhiza DanSenome (v2.0), NCBI Genome.
HMM Profile Database	Curated domain models for sensitive sequence identification.	Pfam (NB-ARC, TIR, LRR profiles).
HMMER Software	Executes profile HMM searches against sequence databases.	HMMER 3.3.2.
InterProScan	Integrates multiple databases for protein domain classification.	InterProScan 5.61-93.0.
Motif Discovery Suite	Identifies conserved, ungapped sequence motifs.	MEME Suite 5.5.2.
Phylogeny Software	Constructs evolutionary trees for classification.	MEGA11, IQ-TREE.
RT-PCR Kit	Converts RNA to cDNA and amplifies gene-specific fragments.	PrimeScript RT Reagent Kit, TB Green Premix Ex Taq.
Cloning Vector	For constructing GFP fusions for localization studies.	pCAMBIA1300-GFP.
Agrobacterium Strain	Mediates transient transformation in N. benthamiana.	A. tumefaciens GV3101.

7. Experimental Workflow Diagram

Diagram 2: NBS-LRR Identification & Validation Workflow

From Sequence to Function: Advanced Protocols for NBS-LRR Analysis and Application in Danshen Research

This technical guide details a bioinformatic pipeline for the genome-wide identification of disease resistance (R) genes, with specific application to the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen). The identification of these genes is a critical component of a broader thesis aimed at understanding the genetic basis of disease resistance in this economically and medicinally important plant, ultimately informing breeding programs and pharmaceutical development focused on enhancing plant vigor and metabolite production.

Preliminary Data Acquisition and Preparation

The initial step involves acquiring high-quality genomic and protein sequence data for Salvia miltiorrhiza.

Experimental Protocol (Data Retrieval):

Access the Salvia miltiorrhiza genome assembly from a public database (e.g., NCBI Genome, CNGB Nucleotide Sequence Archive).
Download the genomic sequence file (FASTA format, usually .fa or .fna extension).
Download the corresponding genome annotation file (GFF3 or GTF format).
Extract the predicted proteome (all protein sequences) from the annotation or download it directly if available.
Format the proteome FASTA file for subsequent analysis (e.g., remove ambiguous characters).

Construction of a Custom HMMER Search Library

A targeted search begins with curating a set of known NBS-LRR protein domains to create a profile Hidden Markov Model (HMM) library.

Experimental Protocol (HMM Library Construction):

Retrieve seed alignments for key NBS-LRR-related domains from the PFAM database:
- NB-ARC (PF00931)
- TIR (PF01582)
- RPW8 (PF05659)
- LRR1 (PF00560)
- LRR8 (PF13855)
Optionally, compile confirmed NBS-LRR protein sequences from related Lamiaceae species.
Use hmmbuild (from HMMER suite) to build individual HMM profiles from each seed alignment.
Combine all relevant HMM profiles into a single library file using hmmpress.

Table 1: Core PFAM Domains for NBS-LRR Identification

PFAM Accession	Domain Name	Typical e-value Cutoff	Primary Function in R-Gene
PF00931	NB-ARC	1e-10	Nucleotide binding & regulatory switch
PF01582	TIR	1e-5	Signaling domain (TNL class)
PF05659	RPW8	1e-3	Downstream signaling (some CNLs)
PF00560	LRR_1	1e-3	Pathogen recognition specificity
PF13855	LRR_8	1e-3	Pathogen recognition specificity

Primary Identification via HMMER Search

The custom HMM library is used to scan the S. miltiorrhiza proteome.

Experimental Protocol (HMMER Scan):

Execute hmmscan with the custom HMM library against the entire predicted proteome.
Parse the output.domtblout result file to identify protein sequences that contain at least one significant hit to the NB-ARC domain (PF00931, e-value < 1e-10).
Extract these candidate protein sequences into a new FASTA file for downstream validation.

Table 2: Example HMMER Scan Results for S. miltiorrhiza Proteome

Candidate Protein ID	NB-ARC Hit (e-value)	TIR Hit (e-value)	LRR Hit (e-value)	Putative Class
Smil_001734	2.5e-45	3.2e-12	1.8e-6	TNL
Smil_005892	8.9e-52	Not Detected	4.1e-8	CNL
Smil_003217	1.1e-40	Not Detected	Not Detected	NBS-only

Domain Architecture Validation with SMART and PFAM

Candidate sequences are subjected to rigorous domain analysis to confirm architecture.

Experimental Protocol (Domain Validation):

Submit the candidate FASTA file to the online SMART web service (in "normal" mode) to detect domains, considering low-complexity regions.
Simultaneously, use the standalone pfam_scan.pl tool against the local PFAM database to corroborate domain findings.
Manually curate results. A bona fide NBS-LRR candidate must possess:
- A definitive NB-ARC domain.
- A detectable LRR region (often fragmented in sequences).
- Either a TIR domain (for TNL class) or a Coiled-Coil (CC) domain (for CNL class) at the N-terminus, identifiable via tools like Ncoils or DeepCoil.
Discard sequences lacking a canonical domain structure (e.g., NBS-only fragments).

Genome-Wide R-Gene Identification Pipeline

Downstream Analysis for Thesis Research

Following identification, standard in silico analyses characterize the gene family.

Experimental Protocol (Downstream Analyses):

Phylogenetic Analysis: Perform multiple sequence alignment (ClustalO, MAFFT) of NB-ARC domains. Construct a phylogenetic tree (MEGA, IQ-TREE) to classify S. miltiorrhiza NBS-LRRs into TNL, CNL, and other subfamilies.
Motif Analysis: Use MEME Suite to identify conserved motifs outside the core domains.
Genomic Distribution: Map gene locations onto chromosomes using the GFF3 annotation to identify clusters.
Expression Profiling: Utilize available S. miltiorrhiza RNA-Seq data (e.g., from SRA) to analyze tissue-specific or stress-induced expression patterns of identified genes.

Downstream Bioinformatics Analysis Workflow

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Computational Tools and Resources

Tool/Resource Name	Function in Pipeline	Key Parameter/Note
HMMER Suite (v3.3)	Profile HMM searches and building	Critical `--domtblout` flag for parsable output; e-value cutoff is key.
PFAM Database (v35.0)	Curated collection of protein domain HMMs	Source for seed alignments (NB-ARC, TIR, LRR).
SMART Web Service	Online domain architecture analysis	Set to "normal" mode to include low-complexity regions.
PFAM Scan Script	Local domain validation against PFAM	Ensures consistency and allows batch processing.
MEME Suite (v5.4.1)	Discovery of conserved protein motifs	Used to characterize non-canonical conserved regions.
MEGA11 / IQ-TREE2	Phylogenetic tree construction	Bootstrap values >70% generally indicate robust clades.
S. miltiorrhiza Genome Assembly (v2.0)	Reference sequence and annotation	Quality of identification depends directly on assembly quality.
Biopython Library	Python scripts for parsing HMMER/GFF files	Essential for automating filtering and data integration steps.

This technical guide outlines the methodological framework for phylogenetic tree construction, framed within the critical need to decipher the evolutionary relationships of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen). The identification and characterization of this expansive resistance (R) gene family at a genome-wide scale are foundational for understanding the plant's innate immune system. Constructing robust phylogenies of NBS-LRR genes is essential for classifying gene subfamilies (TNLs, CNLs, RNLs), inferring evolutionary processes (e.g., tandem duplication, birth-and-death evolution), and facilitating cross-species comparisons to identify orthologs and conserved functional motifs. This guide details the computational and statistical pipelines used to transform raw sequence data into evolutionary hypotheses, directly supporting broader research aims in plant immunity and the biosynthetic pathways of pharmacologically active compounds.

Core Methodological Pipeline for NBS-LRR Phylogenetics

The standard workflow progresses from sequence curation to tree evaluation.

Title: Phylogenetic Analysis Workflow for NBS-LRR Genes

Detailed Experimental & Computational Protocols

Sequence Dataset Curation

Objective: Compile a comprehensive and non-redundant set of NBS-LRR protein or nucleotide sequences.
Protocol:
- Perform a genome-wide scan of the Salvia miltiorrhiza reference genome using HMMER (v3.3) with Pfam profiles for NB-ARC (PF00931) and LRR (PF00560, PF07723, PF07725, PF12799, PF13306, PF13855, PF14580).
- Extract candidate sequences and confirm domain architecture using SMART or InterProScan.
- Include representative NBS-LRR sequences from key related species (e.g., Salvia splendens, Mentha longifolia, Arabidopsis thaliana) from public databases (NCBI, Phytozome) to provide an evolutionary anchor.
- Perform multiple sequence alignment (MSA) using MAFFT (L-INS-i algorithm) or Clustal Omega for protein sequences. For coding sequences, align at the protein level and back-translate to nucleotides using PAL2NAL.

Phylogenetic Tree Inference Methods

Table 1: Core Phylogenetic Inference Methods

Method	Principle	Software Tools	Best Use Case for NBS-LRR
Maximum Parsimony (MP)	Minimizes total evolutionary changes (steps).	PAUP*, MEGA, PHYLIP	Initial exploration of closely related gene clades.
Distance-Matrix (NJ/UPGMA)	Uses pairwise genetic distances to build tree.	MEGA, PHYLIP, BioNJ	Large datasets (>1000 sequences) for initial clustering.
Maximum Likelihood (ML)	Finds tree maximizing probability of observed data under a model.	IQ-TREE, RAxML, PhyML	Standard method for robust, model-based inference.
Bayesian Inference (BI)	Estimates posterior probability of tree using models & priors.	MrBayes, BEAST2	Dating divergence events, complex model integration.

Detailed ML Protocol (using IQ-TREE):
- Input the curated MSA file (e.g., Sm_NBS_LRR.phy).
- Execute model finder: iqtree -s Sm_NBS_LRR.phy -m MFP -bb 1000 -alrt 1000. This selects the best-fit substitution model (e.g., LG+F+R10) and performs both ultrafast bootstrap (1000 replicates) and SH-aLRT test.
- The best tree file (Sm_NBS_LRR.phy.treefile) is produced with branch support values.
Detailed Bayesian Protocol (using MrBayes block in a Nexus file):

Run the analysis until the average standard deviation of split frequencies falls below 0.01, indicating convergence.

Tree Evaluation & Visualization

Branch Support: Report both Ultrafast Bootstrap (UFBoot) ≥ 95% and SH-aLRT ≥ 80% as strong support for ML trees. For Bayesian trees, posterior probability (PP) ≥ 0.95 is considered significant.
Visualization & Annotation: Use iTOL, ggtree (R package), or FigTree to visualize, color-code clades (e.g., TNL vs. CNL), and annotate with gene structure or genomic location data.

Title: Key Phylogenetic Tree Evaluation Metrics

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Resources for Phylogenetic Analysis of Plant NBS-LRR Genes

Item / Resource	Function / Purpose	Example / Source
Reference Genome & Annotation	Provides the foundational sequence data for gene family identification.	Salvia miltiorrhiza Genome (NCBI BioProject: PRJNA72695)
Domain Profile Hidden Markov Models (HMMs)	Sensitive detection of NBS and LRR domains in protein sequences.	Pfam (NB-ARC: PF00931; LRR profiles)
Multiple Sequence Alignment Software	Aligns homologous sequences for phylogenetic analysis.	MAFFT, Clustal Omega, MUSCLE
Model Selection Tool	Identifies best-fit substitution model for likelihood methods.	ModelFinder (in IQ-TREE), jModelTest2
Phylogenetic Inference Software	Core engine for tree building under different statistical criteria.	IQ-TREE, MrBayes, RAxML-NG
High-Performance Computing (HPC) Cluster	Provides necessary CPU power for ML/BI analyses of large gene families.	Local university cluster, Cloud computing (AWS, GCP)
Tree Visualization & Annotation Platform	Enables interpretation, formatting, and publication-quality figure generation.	iTOL, ggtree (R), FigTree

Advanced Applications inSalvia miltiorrhizaGenomics

Phylogenetic trees serve as scaffolds for advanced analyses:

Motif & Domain Co-evolution: Map conserved motifs (P-loop, RNBS, GLPL) onto tree branches to trace functional diversification.
Positive Selection Analysis: Use CodeML (PAML suite) to detect sites under positive selection (ω = dN/dS > 1) in specific NBS-LRR clades, indicating arms-race evolution with pathogens.
Synteny Network Analysis: Integrate phylogenetic clades with genomic location data to visualize tandem duplication clusters and infer ancestral genomic contexts.

Title: Phylogeny as a Scaffold for Integrated Analysis

1. Introduction This technical guide is presented within the framework of a genome-wide identification study of the Nucleotide-Binding Site-Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen). NBS-LRR genes are critical for plant disease resistance, and their expression is tightly regulated by promoter cis-elements in response to biotic/abiotic stresses and hormone signals. Analyzing these regulatory motifs is essential for understanding the defense mechanisms of this economically important medicinal plant and for guiding metabolic engineering for enhanced production of bioactive compounds like tanshinones.

2. Core Cis-Elements in Plant Stress and Hormone Signaling Based on current literature and plant cis-element databases (e.g., PlantCARE, PLACE), key motifs relevant to S. miltiorrhiza NBS-LRR promoters are summarized below.

Table 1: Key Stress-Responsive and Hormone-Related Cis-Elements

Cis-Element Name	Core Sequence	Predicted Function	Associated Signal
W-box	(T)TGAC(C/T)	Binding site for WRKY transcription factors	Pathogen response, SA signaling
G-box	CACGTG	Light, ABA, JA, and stress responses	ABA, JA, oxidative stress
ABRE	ACGTG(G/T)C	ABA-responsive element	Abscisic Acid (ABA)
TCA-element	CCATCTTTTT	Salicylic Acid responsiveness	Salicylic Acid (SA)
TGACG-motif	TGACG	Jasmonic Acid responsiveness	Jasmonic Acid (JA)
ERE	AWTTCAAA	Ethylene responsiveness	Ethylene
AuxRR-core	GGTCCAT	Auxin responsiveness	Auxin
DRE/CRT	(A/G)CCGAC	Dehydration/Cold responsiveness	Abiotic stress (drought, cold, salt)
MYB/MYC	(C/T)AAC(T/G)G; CACATG	Binding sites for MYB/MYC TFs	Drought, ABA, JA
AS-1	TGACG	Oxidative and pathogen stress	SA, JA, H2O2

3. Experimental Protocol: Promoter Cis-Element Analysis Pipeline This protocol details the steps from gene identification to motif validation.

3.1. In Silico Identification and Extraction of Promoter Sequences

Input: Genome-wide identified NBS-LRR gene sequences from S. miltiorrhiza.
Step 1: Define the promoter region. Typically, extract the 1500-2000 bp genomic DNA sequence upstream of the transcription start site (TSS) for each gene.
Step 2: Use tools like BEDTools (bedtools getfasta) with a GFF3 annotation file to extract these sequences from the whole-genome FASTA file.
Step 3: Store sequences in a FASTA file formatted as >GeneID_promoter.

3.2. Computational Prediction of Cis-Elements

Step 1: Batch analysis using PlantCARE or PLACE online servers, or the plantcare_scan function in R/Bioconductor.
Step 2: Parse output files to compile a matrix of elements present in each promoter.
Step 3: Perform clustering analysis (e.g., hierarchical clustering) based on cis-element profiles to identify co-regulated gene groups.

3.3. Experimental Validation: Electrophoretic Mobility Shift Assay (EMSA)

Purpose: To confirm in vitro binding of predicted transcription factors (TFs) to the identified cis-elements.
Protocol:
- Probe Preparation: Synthesize complementary biotin-labeled oligonucleotides containing the wild-type (e.g., W-box) or mutated core motif. Anneal to form double-stranded probes.
- Protein Extraction: Isolate nuclear proteins from S. miltiorrhiza tissues treated with relevant stress (e.g., MeJA, SA) or controls.
- Binding Reaction: Incubate nuclear protein extract (5-20 µg) with the biotinylated probe (20 fmol) in binding buffer for 20-30 minutes at room temperature.
- Competition: For specificity tests, include a 100-200x molar excess of unlabeled wild-type (specific) or mutated (non-specific) competitor probe.
- Gel Electrophoresis: Run the reaction mixture on a pre-run, non-denaturing 6% polyacrylamide gel in 0.5X TBE buffer at 100V for 60-90 min.
- Transfer and Detection: Electrophoretically transfer DNA-protein complexes to a positively charged nylon membrane. Cross-link and detect using a chemiluminescent nucleic acid detection kit.

Diagram Title: Cis-Element Analysis & Validation Workflow

4. Signaling Pathways Involving Predicted Motifs The predicted cis-elements integrate NBS-LRR genes into complex signaling networks.

Diagram Title: Stress/Hormone Signals to Gene Activation Pathway

5. The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Reagents and Kits for Promoter Analysis

Item	Function/Application	Example/Note
Genomic DNA Isolation Kit	High-quality gDNA extraction for promoter PCR.	DNeasy Plant Kits (QIAGEN).
High-Fidelity PCR Enzyme	Accurate amplification of promoter sequences from gDNA.	Phusion or KAPA HiFi Polymerase.
PlantCARE/PLACE Database	Core resource for in silico cis-element scanning.	Freely accessible web servers.
Biotin 3' End DNA Labeling Kit	For labeling EMSA probes.	Pierce Biotin 3' End DNA Labeling Kit.
Chemiluminescent Nucleic Acid Detection Module	Detection of biotinylated probes in EMSA.	Thermo Scientific Pierce.
Nuclear Extraction Kit	Isolation of nuclear proteins containing TFs for EMSA.	Plant Nuclei Isolation/Extraction Kits (e.g., from Sigma).
Mobility Shift Binding Buffer	Optimized buffer for TF-DNA binding reactions.	Often included in EMSA kits or prepared as 10X stock.
Polyacrylamide Gel Electrophoresis System	Separation of protein-DNA complexes from free probe.	Mini-PROTEAN Tetra System (Bio-Rad).
Positively Charged Nylon Membrane	Immobilization of EMSA complexes for detection.	Hybond-N⁺ membrane.
Hormone/Stress Elicitors	For treating plant materials to induce TF expression.	Methyl Jasmonate (MeJA), Salicylic Acid (SA), NaCl, PEG.

Gene Structure (Exon-Intron) and Protein Physicochemical Property Analysis

This whitepaper provides a technical guide for analyzing gene structure and the resulting protein properties, framed within a genome-wide identification study of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen). NBS-LRR genes are central to plant innate immunity, and their characterization is critical for understanding disease resistance mechanisms and for potential drug development from this medicinal plant. Precise analysis of exon-intron architecture and derived protein physicochemical properties forms the foundational step in such genome-wide studies, enabling the classification of gene subfamilies and prediction of functional domains.

Core Concepts: Exon-Intron Structure & Protein Properties

Gene Structure Fundamentals

The structure of a eukaryotic gene is characterized by exons (expressed sequences) and introns (intervening sequences). In NBS-LRR genes, this architecture is highly informative:

Exons: Typically encode conserved functional domains such as the NB-ARC (Nucleotide-Binding Adaptor Shared by APAF-1, R proteins, and CED-4) domain and the LRR (Leucine-Rich Repeat) region.
Introns: Vary in number, phase (0, 1, or 2, depending on where they interrupt a codon), and length. Intron phase conservation is a key evolutionary marker for classifying NBS-LRR genes into TNL (TIR-NBS-LRR) and CNL (CC-NBS-LRR) subfamilies.

Derived Protein Physicochemical Properties

Primary protein sequences translated from coding sequences (CDS) are analyzed for inherent properties:

Molecular Weight (MW): Calculated from the sum of amino acid residues.
Theoretical Isoelectric Point (pI): The pH at which the protein carries no net charge.
Grand Average of Hydropathicity (GRAVY): Indicates overall hydrophobicity/hydrophilicity.
Instability & Aliphatic Indexes: Predict protein stability and thermotolerance.
Subcellular Localization Prediction: Critical for understanding the site of action (e.g., cytoplasm, nucleus, membrane) of immune receptors.

Detailed Experimental Protocols

Genome-Wide Identification and Gene Structure Analysis

Objective: To identify all NBS-LRR genes in the S. miltiorrhiza genome and delineate their exon-intron structures. Protocol:

Sequence Retrieval: Download the latest S. miltiorrhiza genome assembly (e.g., from NCBI, CNSA) and its corresponding annotation file (GFF3/GTF).
Hidden Markov Model (HMM) Search:
- Use the Pfam profiles for NBS-LRR core domains (PF00931: NB-ARC, PF07723: TIR, PF07725: LRR, PF13516: RPW8) as queries.
- Perform a HMMER (v3.3) search against the translated proteome: hmmsearch --domtblout output.txt pfam.hmm proteome.faa.
- Set an E-value cutoff (e.g., 1e-5) and manually verify borderline hits using NCBI CDD or SMART.
Gene Structure Visualization:
- Extract the genomic DNA, CDS, and protein sequences of identified genes using the annotation file.
- Use the Gene Structure Display Server (GSDS 2.0) or the gggenes R package. Input the genomic coordinates and exon/intron positions from the GFF3 file to generate visual comparisons.

Protein Physicochemical Property Analysis

Objective: To compute key physical and chemical parameters for the identified NBS-LRR proteins. Protocol:

Parameter Calculation:
- Use the ExPASy ProtParam tool (accessible via the protr R package or the Bio.SeqUtils module in Biopython) in batch mode.
- Input the canonical protein FASTA sequence for each identified gene.
- Extract and tabulate: Number of amino acids, Molecular Weight, Theoretical pI, Instability Index, Aliphatic Index, and GRAVY.
Subcellular Localization Prediction:
- Run sequences through multiple predictors (e.g., WoLF PSORT, TargetP 2.0, CELLO) for consensus.
- For NBS-LRRs, pay special attention to signals for chloroplast, cytoplasm, or plasma membrane targeting.

Data Presentation

Table 1: Summary of Exon-Intron Structure in S. miltiorrhiza NBS-LRR Genes

NBS-LRR Subfamily	Number of Genes	Average Exon Count (Range)	Average Gene Length (bp)	Conserved Intron Phase Pattern
TNL (TIR-NBS-LRR)	~45*	4.2 (3-6)*	3450*	Phase 2 after TIR, Phase 0 before LRR*
CNL (CC-NBS-LRR)	~68*	3.1 (2-5)*	2850*	Phase 0 dominant*
RNL (RPW8-NBS-LRR)	~12*	2.8 (2-4)*	2500*	Variable*
Total	~125*	3.4	~3000*

Table 2: Computed Physicochemical Properties of Representative S. miltiorrhiza NBS-LRR Proteins

Gene ID	Subfamily	AA Length	Mol. Weight (kDa)	Theoretical pI	Instability Index	Aliphatic Index	GRAVY	Pred. Localization
SmNLR001*	TNL	950*	108.5*	6.2*	38.5 (Stable)*	85.2*	-0.25*	Cytoplasm*
SmNLR045*	CNL	820*	93.8*	8.1*	45.1 (Unstable)*	91.5*	-0.12*	Chloroplast*
SmNLR112*	RNL	710*	81.3*	5.8*	40.2 (Stable)*	78.9*	-0.31*	Nucleus*

Example data based on typical results; actual values require live genome analysis.

Mandatory Visualizations

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials for NBS-LRR Gene Analysis

Item/Category	Specific Example/Product	Function in Analysis
Genomic Data	Salvia miltiorrhiza v2.0 Genome (NCBI)	Reference sequence for identification and mapping.
HMM Profiles	Pfam NB-ARC (PF00931), LRR (PF07725)	Curated domain models for sensitive sequence searching.
Bioinformatics Suites	HMMER 3.3, Biopython, R (tidyverse, gggenes)	Core software for sequence analysis, parsing, and visualization.
Sequence Analysis Web Tools	GSDS 2.0, ExPASy ProtParam, WoLF PSORT	User-friendly platforms for structure drawing and property calculation.
Validation Reagents (Wet-Lab)	Phire Plant Direct PCR Kit (Thermo Fisher)	For PCR amplification of candidate genes from S. miltiorrhiza gDNA.
Cloning & Expression Vectors	pEASY-Blunt Cloning Vector; pCAMBIA1300-GFP	For sequence verification and subcellular localization assays (transient expression).
Positive Control Sequences	Arabidopsis RPP1 (TNL) or RPM1 (CNL) CDS	Well-characterized NBS-LRR genes for alignment and analysis comparison.

This technical guide outlines methodologies for analyzing RNA-Seq data to characterize expression patterns of Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) genes in Salvia miltiorrhiza (Danshen). Within the broader thesis context of genome-wide identification of the NBS-LRR family, this document provides protocols for investigating tissue-specific expression and transcriptional responses to biotic stress, crucial for understanding disease resistance mechanisms in this medicinal plant.

NBS-LRR genes constitute the largest class of plant disease resistance (R) genes. In S. miltiorrhiza, a plant valued for its roots containing bioactive tanshinones and phenolic acids, identifying and characterizing these genes is vital for breeding resilient cultivars. RNA-Seq expression profiling bridges genome-wide identification and functional validation, revealing which NBS-LRR genes are active in specific tissues or induced by pathogen/elicitor challenges.

Experimental Design for RNA-Seq Profiling

A robust design is critical for meaningful comparative expression analysis.

2.1 Tissue-Specific Profiling:

Tissues: Root, stem, leaf, flower (at full bloom).
Biological Replicates: Minimum of three independent plants per tissue.
Goal: Identify constitutively expressed or tissue-enriched NBS-LRR genes.

2.2 Pathogen/Elicitor Treatment Profiling:

Treatment Groups:
- Control (Mock treatment).
- Pseudomonas syringae pv. tomato (Pst) inoculation.
- Methyl Jasmonate (MeJA) spray (100 µM).
- Salicylic Acid (SA) spray (2 mM).
Time Series: Sample tissues (e.g., leaves) at 0, 6, 12, 24, and 48 hours post-treatment.
Replicates: Four biological replicates per time point per treatment.
Goal: Uncover differentially expressed NBS-LRR genes in defense signaling pathways.

Detailed Experimental Protocols

Library Preparation and Sequencing

Protocol: Total RNA is extracted using a modified CTAB method with DNase I treatment. RNA integrity (RIN > 8.0) is verified via Bioanalyzer. Strand-specific cDNA libraries are prepared using the Illumina TruSeq Stranded mRNA LT Sample Prep Kit. Sequencing is performed on an Illumina NovaSeq 6000 platform for 150 bp paired-end reads, targeting ~40 million reads per sample.

Bioinformatic Analysis Workflow

RNA-Seq Analysis Workflow Quality control, alignment, quantification, and differential expression analysis.

NBS-LRR Expression Subsetting & Analysis

Protocol: A custom list of genome-identified NBS-LRR gene IDs is used to subset the global count matrix. Normalized expression values (e.g., TPM, FPKM from StringTie or counts from DESeq2) are extracted for this gene family. Tissue-specific or induced expression is analyzed using clustering and statistical overrepresentation tests.

Key Signaling Pathways in NBS-LRR Mediated Defense

NBS-LRR proteins recognize pathogen effectors and trigger immune responses via SA and JA signaling networks.

NBS-LRR Triggered Immune Signaling Pathways Effector recognition leads to SA/JA pathway activation and systemic resistance.

Data Presentation: Representative Expression Profiles

Table 1: Expression (Mean TPM) of Selected NBS-LRR Genes Across Tissues

Gene ID (SmNLR)	Root	Stem	Leaf	Flower	Putative Role
SmNLR001	12.5	1.2	0.8	45.7	Floral defense
SmNLR045	85.3	3.4	2.1	4.5	Root-specific
SmNLR128	15.6	18.9	22.4	20.1	Constitutive
SmNLR201	2.3	1.5	32.6	5.4	Leaf-enriched

Table 2: Top NBS-LRR Genes Induced by Pathogen/Elicitor at 24h (Log2 Fold Change)

Gene ID	Pst vs Mock	MeJA vs Mock	SA vs Mock	Likely Pathway
SmNLR012	5.8	1.2	6.5	SA-mediated
SmNLR078	3.2	4.5	0.5	JA-mediated
SmNLR155	4.1	3.8	2.1*	Co-induced
SmNLR189	0.5	0.3	-0.8	Not responsive

(p-adj < 0.05, *p-adj < 0.01)

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials for RNA-Seq Profiling of Plant Defense

Item	Function/Benefit	Example Product
RNA Stabilization Agent	Immediate stabilization of RNA in harvested tissue, preventing degradation.	RNAlater, Life Technologies
Polysaccharide/Polyphenol RNA Kit	Optimized for plants like S. miltiorrhiza rich in secondary metabolites.	Plant RNA Kit, Zymo Research
Stranded mRNA Library Prep Kit	Maintains strand orientation, improving transcriptome assembly.	TruSeq Stranded mRNA, Illumina
ERCC RNA Spike-In Mix	External controls for normalization and assessing technical variation.	ERCC ExFold Mix, Thermo Fisher
Pathogen/Elicitor Standards	Defined inoculum/hormone concentrations for reproducible treatments.	P. syringae DC3000, MeJA (Sigma)
NBS-LRR HMM Profile	Computational probe for identifying NBS-LRRs in genome/transcriptome.	PF00931, PF00560 (Pfam)
Differential Expression Software	Statistical analysis of count data for robust DEG calling.	DESeq2 R package

Correlating NBS-LRR Expression with Biosynthetic Pathways of Tanshinones and Phenolic Acids

This whitepaper presents an in-depth technical analysis conducted within the broader framework of a doctoral thesis focused on the genome-wide identification and characterization of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen). Following the comprehensive identification of 92 NBS-LRR genes in the S. miltiorrhiza genome, this research phase aims to elucidate the functional correlation between the expression patterns of specific NBS-LRR clades and the regulation of key biosynthetic pathways for therapeutically valuable secondary metabolites: the lipophilic tanshinones and the hydrophilic phenolic acids. The core hypothesis is that pathogen-associated molecular pattern (PAMP)-triggered immunity, mediated by specific NBS-LRRs, intersects with and modulates the metabolic engineering of these bioactive compounds.

Literature Synthesis: NBS-LRRs as Potential Regulators of Specialized Metabolism

Recent studies indicate that plant resistance (R) genes, particularly NBS-LRRs, are not only central to biotic stress perception but also participate in extensive signaling crosstalk that influences downstream transcriptional reprogramming. This reprogramming often extends to the activation of defense-related secondary metabolic pathways. In S. miltiorrhiza, the biosynthesis of tanshinones (e.g., tanshinone IIA, cryptotanshinone) via the mevalonate (MVA) and methylerythritol phosphate (MEP) pathways, and of phenolic acids (e.g., rosmarinic acid, salvianolic acid B) via the phenylpropanoid pathway, is known to be induced by various elicitors, including fungal extracts and signaling molecules like jasmonic acid (JA) and salicylic acid (SA). This positions NBS-LRRs as potential upstream signaling nodes whose activation could fine-tune the expression of key biosynthetic enzyme genes such as SmCPS, SmKSL, SmPAL, and SmRAS.

Core Experimental Data and Correlation Analysis

A time-course experiment was designed where S. miltiorrhiza hairy root cultures were treated with the fungal elicitor Verticillium dahliae cell wall extract. Expression levels of selected NBS-LRR genes (representing TNL, CNL, and RNL subfamilies) and key biosynthetic pathway genes were quantified via qRT-PCR. Concurrently, metabolite accumulation was measured by HPLC.

Table 1: Correlation Matrix of NBS-LRR Expression with Metabolite Biosynthetic Gene Expression (Pearson's r) at 24h Post-Elicitation

NBS-LRR Gene (Subfamily)	SmCPS (Tanshinone)	SmKSL (Tanshinone)	SmPAL (Phenolic Acid)	SmRAS (Phenolic Acid)
SmNBS-LRR05 (TNL)	0.92	0.88	0.45	0.51
SmNBS-LRR18 (CNL)	0.78	0.81	0.67	0.72
SmNBS-LRR45 (RNL)	0.12	0.09	0.91	0.89
SmNBS-LRR72 (CNL)	0.05	0.10	0.08	-0.03

Table 2: Fold-Change in Metabolite Accumulation Relative to Control at 72h Post-Elicitation

Metabolite Class	Specific Metabolite	Fold Change (Elicited vs. Control)	HPLC Peak Area (mAU*s) ± SD
Tanshinones	Tanshinone IIA	3.8	12540 ± 980
	Cryptotanshinone	4.2	8920 ± 760
Phenolic Acids	Salvianolic Acid B	5.1	28750 ± 2100
	Rosmarinic Acid	3.7	15430 ± 1150

Detailed Experimental Protocols

Material: 14-day-old S. miltiorrhiza hairy root cultures in 1/2 MS liquid medium.
Elicitor Preparation: Verticillium dahliae mycelium was lyophilized and ground. Cell wall polysaccharides were extracted in hot water, filtered, and sterilized. Working concentration was 100 µg/mL.
Protocol: Elicitor was added to culture flasks. Roots were harvested by vacuum filtration at 0, 6, 12, 24, 48, and 72 hours post-elicitation (hpe), flash-frozen in liquid N₂, and stored at -80°C. Three biological replicates per time point.

RNA Extraction, cDNA Synthesis, and qRT-PCR

RNA Extraction: Using Omega Plant RNA Kit. 100 mg tissue homogenized in liquid N₂. DNase I treatment performed on-column.
cDNA Synthesis: 1 µg total RNA used with HiScript III RT SuperMix (Vazyme) with oligo(dT) and random hexamer primers.
qRT-PCR: Performed on Bio-Rad CFX96 using ChamQ SYBR qPCR Master Mix. Program: 95°C for 30 sec; 40 cycles of 95°C for 10 sec, 60°C for 30 sec. Melt curve analysis added. SmActin served as reference. Gene-specific primers were designed with Tm ~60°C and amplicons 80-150 bp. Expression calculated via 2^(-ΔΔCt) method.

Metabolite Extraction and HPLC Analysis

Extraction: 50 mg powdered lyophilized root tissue was extracted with 1 mL 70% methanol (for phenolic acids) or 100% methanol (for tanshinones) by sonication for 30 min, followed by centrifugation.
HPLC Conditions: Agilent 1260 Infinity II with DAD. Column: ZORBAX SB-C18 (4.6 x 250 mm, 5 µm).
- For Tanshinones: Mobile phase: Water (A) and Acetonitrile (B). Gradient: 0-30 min, 40-90% B; 30-35 min, 90% B. Flow: 1.0 mL/min. Detection: 270 nm.
- For Phenolic Acids: Mobile phase: 0.1% Formic Acid in Water (A) and Acetonitrile (B). Gradient: 0-25 min, 10-30% B; 25-30 min, 30-90% B. Flow: 1.0 mL/min. Detection: 280 nm. Quantification used external standard curves.

Signaling Pathway and Workflow Visualizations

Proposed NBS-LRR Mediated Signaling to Metabolism

Experimental Workflow for Correlation Study

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Reagents and Materials for the Featured Experiments

Item/Category	Specific Product/Example	Function in the Experiment
Plant Culture System	S. miltiorrhiza Hairy Root Line (e.g., induced by Agrobacterium rhizogenes A4)	Provides genetically stable, fast-growing, and metabolite-producing plant material suitable for elicitation studies in controlled, sterile conditions.
Elicitor	Verticillium dahliae Cell Wall Extract (Custom-prepared)	Acts as a biotic stressor/PAMP to trigger the plant immune response, activating NBS-LRR and downstream defense pathways.
RNA Extraction Kit	Omega Bio-tek E.Z.N.A. Plant RNA Kit	Efficiently isolates high-quality, genomic DNA-free total RNA from polysaccharide and polyphenol-rich root tissues.
Reverse Transcription Kit	Vazyme HiScript III RT SuperMix (+gDNA wiper)	Provides highly efficient and consistent cDNA synthesis from RNA templates, including removal of genomic DNA contamination.
qPCR Master Mix	Vazyme ChamQ Universal SYBR qPCR Master Mix	A premixed, optimized solution for sensitive and specific quantitative real-time PCR detection of gene expression levels.
HPLC Solvents & Standards	Sigma-Aldrich Acetonitrile (HPLC grade), Tanshinone IIA, Salvianolic Acid B (Analytical Standards)	Essential for metabolite separation (mobile phase) and accurate quantification via external standard calibration curves.
HPLC Column	Agilent ZORBAX StableBond SB-C18 (4.6 x 250 mm, 5 µm)	Provides robust, high-resolution separation of both non-polar (tanshinones) and polar (phenolic acids) compound mixtures.
Statistical Software	R (with `corrplot`, `ggplot2` packages) / SPSS	Used for calculating Pearson correlation coefficients, generating correlation matrix heatmaps, and performing significance testing on experimental data.

This guide details the practical translation of fundamental genomic research into applied breeding tools, framed within a specific thesis context: "Genome-wide identification and characterization of the NBS-LRR gene family in Salvia miltiorrhiza and its implications for disease resistance breeding." The NBS-LRR (Nucleotide-Binding Site-Leucine-Rich Repeat) genes constitute the largest class of plant disease resistance (R) genes. Their genome-wide identification provides the foundational data for selecting candidate genes linked to pathogen resistance, enabling the development of molecular markers for marker-assisted selection (MAS) in S. miltiorrhiza (Danshen) breeding programs aimed at improving yield, quality, and stability.

From Candidate Gene to Molecular Marker: A Technical Workflow

The process involves a multi-step pipeline from in silico analysis to wet-lab validation and application.

1In SilicoIdentification and Characterization Protocol

Objective: To identify, annotate, and preliminarily characterize all NBS-LRR genes in the S. miltiorrhiza genome.

Methodology:

Data Retrieval: Obtain the latest S. miltiorrhiza genome assembly (e.g., from NCBI, Sm Genome Database) and its protein sequence file.
Hidden Markov Model (HMM) Search: Use HMMER software (hmmsearch) with the Pfam profiles for NBS (NB-ARC, PF00931) and LRR (PF00560, PF07723, PF07725, PF12799, PF13306, PF13855) domains. Scan the proteome with an E-value cutoff (e.g., 1e-5).
Candidate Sequence Extraction: Compile all proteins containing at least one NBS domain.
Domain Architecture Validation: Re-analyze candidate sequences using CDD (Conserved Domain Database) on NCBI and SMART to confirm domain presence and structure.
Phylogenetic Analysis: Align full-length protein sequences using MUSCLE or ClustalW. Construct a phylogenetic tree (Neighbor-Joining or Maximum Likelihood method, 1000 bootstrap replicates) using MEGA or IQ-TREE to classify genes into subfamilies (TNL, CNL, RNL, etc.).
Chromosomal Mapping & Gene Duplication Analysis: Map gene physical locations using genome annotation (GFF3 file). Identify tandem and segmental duplication events using MCScanX with criteria: aligned sequence coverage >75% and identity >75%.
Cis-Regulatory Element Analysis: Extract 1500-2000 bp promoter sequences upstream of the start codon. Analyze using PlantCARE or PlantPAN for stress/hormone-responsive elements (e.g., W-box, TC-rich repeats, ABRE, ERE, MeJA-responsiveness).

Table 1: Exemplary Output from S. miltiorrhiza NBS-LRR Genome-Wide Identification

Analysis Parameter	Exemplary Quantitative Result	Interpretation for Breeding
Total NBS-LRR Genes Identified	121	Defines the total pool of candidate R genes.
Subfamily Classification (TNL:CNL:RNL:Others)	45:68:5:3	Informs potential signaling pathways; CNLs may dominate.
Genes with Tandem Duplication	32 (in 12 clusters)	Highlights genomic hotspots for rapid evolution and potential resistance diversity.
*Genes Containing Stress-responsive Cis-elements*	89 (73.5%)	Prioritizes genes likely regulated by pathogen attack.

Title: Computational pipeline for NBS-LRR gene identification.

Candidate Gene Prioritization and Marker Development

Objective: To select the most promising candidate genes for functional study and develop linked molecular markers.

Prioritization Criteria:

Expression Evidence: RNA-Seq data under pathogen (e.g., Fusarium oxysporum) or elicitor (e.g., salicylic acid) treatment.
Evolutionary Markers: Presence of positive selection signatures (dN/dS >1) in LRR regions.
Genomic Colocalization: Physical proximity to previously identified QTLs for disease resistance in related Lamiaceae species.

Marker Development Protocol:

Sequence Polymorphism Identification: Re-sequence the coding and promoter regions of the prioritized candidate gene in a diverse panel of S. miltiorrhiza genotypes (resistant vs. susceptible).
SNP/InDel Discovery: Align sequences to identify polymorphisms.
Marker Design:
- Kompetitive Allele-Specific PCR (KASP) Markers: For biallelic SNPs. Design two allele-specific forward primers and one common reverse primer.
- CAPS/dCAPS Markers: If SNP creates/destroys a restriction enzyme site. Design PCR primers flanking the SNP, followed by restriction digest.

Table 2: Experimental Panel for Candidate Gene Validation

Material Type	Example/Description	Function in Validation
Plant Germplasm	30 S. miltiorrhiza accessions with known resistance/susceptibility to root rot.	Phenotypic correlation for marker-trait association.
Pathogen Strain	Fusarium oxysporum f. sp. miltiorrhizae (FoM), virulent isolate.	For pathogen challenge experiments.
Elicitors	Salicylic Acid (SA), Methyl Jasmonate (MeJA).	To simulate defense signaling and induce gene expression.
qPCR Reagents	SYBR Green master mix, gene-specific primers, reverse transcription kit.	To quantify candidate gene expression post-elicitation.
Genotyping Platform	KASP assay mix, thermal cycler with fluorescence detection, or CAPS restriction enzymes.	For high-throughput screening of molecular markers.

Functional Validation Protocol: VIGS inS. miltiorrhiza

Objective: To rapidly assess the function of a candidate NBS-LRR gene in disease resistance.

Methodology (Tobacco Rattle Virus-based VIGS):

Target Fragment Cloning: Amplify a 300-500 bp gene-specific fragment from the candidate SmNLR gene via PCR. Clone into the TRV2 vector.
Agrobacterium Transformation: Transform the recombinant TRV2 and helper TRV1 vectors into Agrobacterium tumefaciens strain GV3101.
Plant Infiltration: Mix TRV1 and TRV2-SmNLR cultures (OD600=1.0) in a 1:1 ratio. Pressure-infiltrate the mixture into the leaves of 4-week-old S. miltiorrhiza seedlings. Include TRV2-empty vector and TRV2-PDS (phytoene desaturase, positive control for silencing) infiltrations.
Silencing Verification: After 2-3 weeks, sample new leaves. Verify SmNLR transcript knockdown via qRT-PCR.
Phenotypic Assay: Challenge silenced plants with FoM spore suspension (e.g., root dipping). Monitor disease symptoms (lesion size, wilting) and measure pathogen biomass (via qPCR with FoM-specific primers) 7-14 days post-inoculation compared to controls.

Title: Functional validation workflow using Virus-Induced Gene Silencing (VIGS).

Integration into Breeding Programs

Validated markers are deployed in a Marker-Assisted Selection (MAS) pipeline. Breeders cross a donor parent carrying the resistant allele (diagnosed by the KASP/CAPS marker) with an elite, high-yielding but susceptible parent. In subsequent generations (F2 or BC1F1), seedlings are screened early with the molecular marker instead of waiting for laborious and environmentally variable pathogen bioassays. This accelerates the development of improved S. miltiorrhiza varieties with enhanced, durable resistance, ensuring stable production of bioactive compounds (tanshinones, salvianolic acids) for pharmaceutical use.

The Scientist's Toolkit: Research Reagent Solutions

Reagent/Material	Supplier Examples	Critical Function in the Workflow
HMMER Software Suite	http://hmmer.org	Core tool for initial in silico identification of NBS-LRR proteins using Pfam domain models.
Phusion High-Fidelity DNA Polymerase	Thermo Fisher, NEB	Ensures accurate amplification of candidate gene sequences for cloning and polymorphism analysis.
pTRV1 & pTRV2 VIGS Vectors	Arabidopsis Biological Resource Center (ABRC) or addgene.	Essential plant viral vectors for performing rapid loss-of-function assays via Virus-Induced Gene Silencing.
KASP Assay Mix & Genotyping Master Mix	LGC Biosearch Technologies	Enables high-throughput, cost-effective SNP genotyping for marker-assisted selection in breeding populations.
SYBR Green qRT-PCR Master Mix	Bio-Rad, Takara	For quantitative analysis of candidate gene expression patterns in response to pathogen/elicitor treatment.
RNA Extraction Kit (for polysaccharide-rich plants)	Qiagen RNeasy Plant Kit, or CTAB-based methods	Specialized for high-quality RNA isolation from S. miltiorrhiza, which is rich in secondary metabolites.
Fusarium oxysporum Specific Primers	Custom designed from ITS/EF-1α sequences	Allows precise quantification of fungal biomass in plant tissues during disease progression assays.

Resolving Challenges in R-Gene Analysis: Best Practices and Solutions for Accurate NBS-LRR Identification

Within the context of genome-wide identification of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen), Hidden Markov Model (HMM) searches are a cornerstone methodology. This technical guide details two critical, often overlooked pitfalls: the inappropriate application of default E-value thresholds and the mis-annotation of genes with incomplete domain architectures. We provide a rigorous, protocol-driven framework to optimize HMM-based discovery for complex gene families in plant genomes, directly supporting downstream pharmaceutical research into plant immune system-derived compounds.

Salvia miltiorrhiza is a model medicinal plant, with its NBS-LRR genes of significant interest for understanding disease resistance and potential bioactivity. Genome-wide identification relies on HMM profiles (e.g., Pfam: NB-ARC, PF00931; LRR, PF00560, PF07723, PF07725). However, the high copy number, diversity, and fragmentation of these genes necessitate refined search strategies to avoid both false negatives (overly strict thresholds) and false positives (overly permissive thresholds or ignoring domain integrity).

Pitfall 1: Misapplication of Default E-value Thresholds

The E-value cutoff is not universal. The standard default of 0.01 or 0.001 in tools like HMMER may exclude legitimate, divergent NBS-LRR members in S. miltiorrhiza.

Quantitative Analysis of Hit Sensitivity vs. E-value

Data from a representative S. miltiorrhiza genome scan using the NB-ARC HMM profile illustrates the trade-off.

Table 1: Hit Retrieval at Different E-value Cutoffs in a S. miltiorrhiza Genome Scan

E-value Threshold	Number of Candidate Sequences	Estimated False Positives	Key Characteristics of Additional Hits at Lenient Thresholds
1e-10	45	< 0.01	Canonical, full-length NBS-LRR genes.
1e-03	62	~0.5	Includes divergent but likely functional NBS domains.
1e-01	89	~5-7	Includes highly divergent sequences and partial pseudogenes.
1.0	127	~30-40	Many partial ORFs, non-specific matches.

Recommended Experimental Protocol: Iterative E-value Calibration

Initial Broad Search: Run hmmsearch with the NB-ARC profile against the S. miltiorrhiza proteome using a permissive E-value (e.g., 10.0). Save full results.
Domain Architecture Filtering: Extract sequences. Run hmmscan against the full Pfam database to identify all domains present in each hit.
Stratification by Confidence: Categorize hits into:
- High-Confidence: E-value < 0.001 and possesses a contiguous NB-ARC domain with key motifs (P-loop, RNBS-A-D, GLPL, etc.).
- Medium-Confidence: 0.001 < E-value < 0.1 and has a plausible NBS-LRR domain structure (NB-ARC + LRRs or TIR/CC).
- Low-Confidence: E-value > 0.1 or fragmented/ambiguous domain structure.
Manual Curation & Alignment: Perform multiple sequence alignment (e.g., MAFFT) on medium-confidence hits with high-confidence sequences. Manually inspect for conservation of critical residues. Re-classify based on evolutionary relatedness.

Pitfall 2: Overlooking Incomplete Domain Structures

Many genuine NBS-LRR genes, especially in draft genomes, may be fragmented due to sequencing/assembly gaps or may belong to naturally truncated subfamilies (e.g., TN-type genes).

Protocol for Handling Incomplete Genes

Six-Frame Translation & Search: Run tblastn using a curated set of S. miltiorrhiza NBS domains against the genome assembly to find regions missed in gene annotation.
Synteny Analysis: Compare genomic loci of truncated candidates with orthologous loci in related species (e.g., Salvia splendens) using MCScanX to distinguish assembly artifacts from real truncations.
RT-PCR Validation: Design primers flanking the putative gap or truncation site.
- Template: cDNA from S. miltiorrhiza leaves treated with salicylic acid (to induce NBS-LRR expression) and controls.
- PCR: Use high-fidelity polymerase. Clone and sequence products.
- Analysis: Compare sequences to genomic scaffolds to confirm or correct gene models.

Integrated Workflow for Robust NBS-LRR Identification

The following diagram outlines the decision process integrating E-value adjustment and domain structure analysis.

Title: HMM Search Workflow for NBS-LRR ID with E-value & Domain Checks

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Reagents and Tools for NBS-LRR Identification in Plants

Item	Function/Description	Example Product/Code
Curated HMM Profiles	Core search models for NBS and LRR domains.	Pfam NB-ARC (PF00931); Pfam LRR_1 (PF00560).
HMMER Software Suite	Primary tool for sensitive sequence searches using HMMs.	HMMER 3.3.2 (http://hmmer.org/).
High-Fidelity Polymerase	Accurate amplification of candidate genes for validation.	KAPA HiFi HotStart ReadyMix, Phusion.
cDNA Synthesis Kit	Generate template from induced plant tissue for RT-PCR.	SuperScript IV Reverse Transcriptase.
Domain Database	For comprehensive domain architecture analysis.	Pfam, CDD, InterProScan.
Synteny Analysis Tool	To distinguish gene fragmentation from real truncation.	MCScanX, JCVI utility library.
Multiple Aligner	For assessing homology and residue conservation.	MAFFT, Clustal Omega.
Plant Induction Agent	To upregulate NBS-LRR gene expression pre-RNA extraction.	Salicylic Acid (100 µM).

Accurate genome-wide identification of the NBS-LRR family in Salvia miltiorrhiza requires moving beyond default parameters. A calibrated, iterative approach to E-value thresholds combined with rigorous validation of domain architecture is essential. This strategy minimizes annotation errors, providing a reliable foundation for subsequent functional characterization and exploration of this gene family's role in plant defense and medicinal compound biosynthesis.

Resolving Ambiguous or Fragmented Gene Models in Genome Assemblies

In the context of genome-wide identification of the NBS-LRR gene family in Salvia miltiorrhiza (Danshen), resolving ambiguous or fragmented gene models is a critical, non-trivial challenge. This medicinal plant's genome is highly repetitive and complex, leading to frequent mis-assemblies and incomplete gene structures, particularly for large, multi-domain resistance (R) genes like NBS-LRRs. Accurate gene model annotation is paramount for downstream evolutionary, expression, and functional studies aimed at elucidating the genetic basis of disease resistance and secondary metabolite production for drug development.

Ambiguities arise from inherent limitations in sequencing and assembly technologies and the biological nature of the genome itself. Key sources include:

Sequencing Errors & Limitations: Short-read Illumina data often fails to resolve complex repeats, leading to collapsed regions. While long-read technologies (PacBio, Nanopore) improve continuity, they have higher per-base error rates.
Assembly Algorithms: Heuristic assemblers may break contigs at repeats, fragmenting genes that span these regions.
Biological Complexity: NBS-LRR genes exist in large, tandemly duplicated clusters with high sequence similarity (>80% identity), confounding both assembly and annotation software.
Annotation Pipeline Shortcomings: Ab initio predictors perform poorly on novel gene families without training. Evidence-based aligners (e.g., from RNA-seq) may be incomplete, especially for lowly expressed genes.

Quantitative Impact onS. miltiorrhizaNBS-LRR Identification

Recent studies highlight the scale of the problem. The table below summarizes quantitative data from recent S. miltiorrhiza genome projects, illustrating how assembly and annotation strategies directly affect NBS-LRR catalog completeness.

Table 1: Impact of Assembly Strategy on NBS-LRR Gene Model Statistics in S. miltiorrhiza

Study & Assembly Version	Assembly Technology	Annotation Method	Total Putative NBS-LRRs Identified	Percentage Fragmented/Partial Models	Key Limitation Noted
Xu et al., 2023 (v3.0)	PacBio CLR + Hi-C	MAKER2 (RNA-seq + homology)	121	~18%	Fragmentation in telomeric clusters
Zhang et al., 2021 (v2.0)	Illumina + BioNano	BRAKER2 (RNA-seq)	89	~35%	High fragmentation due to short-read gaps
Cui et al., 2020 (v1.0)	Illumina Only	Augustus (ab initio)	63	~50%	Severe underrepresentation, most models partial

Integrated Protocol for Resolving Gene Models

A multi-evidence, iterative refinement pipeline is required. The following protocol is tailored for NBS-LRR gene discovery in complex plant genomes like S. miltiorrhiza.

Phase 1: Evidence Aggregation and Consensus Building

Objective: Generate a comprehensive set of gene hints from diverse data sources.

Transcriptome Alignment:
- Materials: High-quality, strand-specific RNA-seq libraries from multiple tissues (root, leaf, flower) and stress treatments (e.g., Fusarium infection).
- Protocol: Align RNA-seq reads to the genome assembly using a splice-aware aligner (HISAT2, STAR). Use StringTie to assemble transcripts and generate transcript-based gene models (transcript_hints.gff).
Protein Homology Alignment:
- Materials: Curated protein sets from closely related Lamiaceae species (e.g., S. splendens, Mentha longifolia) and a database of canonical NBS-LRR proteins from UniProt.
- Protocol: Perform protein-to-genome alignment using Exonerate or GenomeThreader. Use a sensitive scoring matrix (e.g., BLOSUM80) and retain all plausible alignments (protein_hints.gff).
Synthetic Hint Generation with RGAugury:
- Materials: The RGAugury pipeline pre-configured for plants.
- Protocol: Run the entire genome assembly through RGAugury. Its integrated ab initio predictors (NLGenomeSweep) will generate structural hints specific to NBS-LRR domains (rga_hints.gff).

Phase 2: Iterative Annotation with MAKER2

Objective: Integrate all evidence into a unified, high-confidence gene annotation.

First MAKER Run:
- Configure MAKER control files to include all hint files (transcript_hints.gff, protein_hints.gff, rga_hints.gff) as evidence. Use SNAP and AUGUSTUS as ab initio predictors, trained on a related species if no S. miltiorrhiza training set exists.
- Execute MAKER. The output (genome_v1.all.gff) will be an evidence-aware annotation.
Predictor Training:
- Use the output from the first MAKER run to train SNAP and AUGUSTUS specifically on S. miltiorrhiza. The BUSCO tool can identify a set of high-confidence, complete genes from the MAKER output for this purpose.
- Protocol: maker2zff and fathom (SNAP training); autoAug.pl (AUGUSTUS training).
Second MAKER Run:
- Re-run MAKER with the newly trained S. miltiorrhiza-specific gene predictors. This iteration will dramatically improve accuracy, especially for NBS-LRR gene boundaries and exon-intron structure.

Phase 3: Manual Curation and Gap-Closing

Objective: Resolve remaining ambiguities in NBS-LRR clusters.

Visual Inspection in Genome Browser:
- Load the MAKER annotation, RNA-seq alignments (BAM), and protein alignments into IGV or JBrowse. Manually inspect every NBS-LRR locus.
- Criteria for Merging/Splitting: Merge gene models separated by short (< 500 bp), non-coding gaps within a repeat cluster. Split models that span an unambiguous, gene-free region > 5 kb. Use full-length transcript and protein alignments as the primary guide.
Targeted PCR and Sequencing:
- For highly fragmented models in critical regions (e.g., near QTLs for disease resistance), design PCR primers from the flanking conserved sequences of the NBS domain.
- Protocol: Amplify genomic DNA using LongAmp Taq. Clone the PCR product and sequence via Sanger method. Use the confirmed sequence to correct the assembly and annotation in that locus.

Visualization of the Integrated Pipeline

Title: NBS-LRR Gene Model Resolution Pipeline

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Reagents and Tools for Resolving S. miltiorrhiza Gene Models

Item	Category	Function & Rationale
PacBio HiFi Reads	Sequencing Reagent	Generate long (~15-20 kb), highly accurate reads to span repetitive NBS-LRR clusters, reducing assembly breaks.
Hi-C Sequencing Kit	Sequencing Reagent	Provides chromatin conformation data to scaffold contigs into chromosomes, placing fragmented genes in genomic context.
Strand-specific RNA-seq Library Prep Kit	Molecular Biology Reagent	Preserves strand information, crucial for accurate transcript boundary determination and gene orientation in clusters.
LongAmp Taq DNA Polymerase	Molecular Biology Reagent	Amplifies long (>5 kb) genomic fragments for PCR-based gap closure between fragmented gene model segments.
pGEM-T or Zero Blunt TOPO Cloning Kit	Molecular Biology Reagent	For cloning PCR products from ambiguous loci for validation via Sanger sequencing.
RGAugury Pipeline	Bioinformatics Tool	Specialized tool for in silico identification of R-genes; provides critical domain-based hints for annotation.
MAKER2 Annotation Pipeline	Bioinformatics Tool	Integrates diverse evidence (EST, protein, ab initio) into a consensus annotation, central to the iterative protocol.
Integrative Genomics Viewer (IGV)	Bioinformatics Tool	Enables visualization and manual curation of gene models against all supporting evidence at a genomic locus.

Distinguishing True NBS-LRR Genes from Pseudogenes and Non-Functional Homologs

The genome-wide identification of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen) is a cornerstone for understanding its disease resistance mechanisms and improving medicinal yield. This process is complicated by the presence of non-functional homologs and pseudogenes, which can inflate gene counts and mislead functional predictions. Accurate discrimination is therefore critical for downstream experimental validation and the application of this research in breeding for pathogen resistance, directly impacting the consistency and quality of bioactive compounds (e.g., tanshinones, salvianolic acids) for pharmaceutical development.

Defining Characteristics and Key Challenges

NBS-LRR genes encode key plant immune receptors. Pseudogenes and non-functional homologs arise from disruptive mutations, truncations, or frameshifts but retain sequence similarity.

Table 1: Diagnostic Features for Classification

Feature	True NBS-LRR Gene	Pseudogene/Non-Functional Homolog
Open Reading Frame (ORF)	Full-length, uninterrupted	Premature stop codons, frameshifts, large indels
Conserved Motifs	Intact NB-ARC (P-loop, RNBS-A-D, GLPL, MHD) and LRR motifs	Degenerate or missing core motifs (especially MHD)
Transcript Evidence	Supported by RNA-Seq/EST data	No expression evidence or aberrant splice variants
Selection Pressure	Signs of purifying selection (Ka/Ks < 1)	Neutral evolution or relaxed constraint (Ka/Ks ≈ 1)
Domain Architecture	Typical NBS-LRR structure (TIR/CC-NBS-LRR, RPW8-NBS-LRR, etc.)	Truncated or aberrant domain order

Experimental Protocols for Discrimination

In Silico Identification and Filtering Pipeline

Method: A multi-step computational workflow is implemented after initial HMMER/PFAM searches.

HMM Search: Use HMM profiles (PF00931, PF00560, PF07723, PF07725, PF12799, PF13306) against the S. miltiorrhiza proteome/genome.
ORF Assessment: Predict full-length ORFs using tools like getorf or Genewise. Discard sequences with internal stop codons or lacking start/stop codons.
Domain Validation: Confirm presence and order of TIR/CC, NB-ARC, and LRR domains using SMART or NCBI CDD. Sequences lacking the core NB-ARC are flagged.
Motif Integrity Check: Scan for critical motifs (e.g., P-loop: GxxxxGKT/S, MHDV) via MEME Suite. Disruption of the MHD motif is a strong pseudogene indicator.
Expression Corroboration: Map RNA-Seq reads from S. miltiorrhiza tissues/stress treatments to candidate genes. Genes with zero or low FPKM across all libraries are suspect.

Molecular Validation by PCR and Sequencing

Protocol: To confirm computational predictions.

Primer Design: Design primers flanking the predicted ORF and targeting regions with suspected disruptive mutations.
Genomic DNA & cDNA PCR: Amplify target from both gDNA and cDNA (from elicited tissues). Use high-fidelity polymerase.
Product Analysis: Compare amplicon sizes. gDNA-cDNA size discrepancy suggests intron retention or mis-annotation. Sequence all products.
Sequence Alignment: Align sequenced gDNA and cDNA amplicons to the reference genome sequence. Identify presence/absence of stop codons, frameshifts, and splice variants.

Ka/Ks Ratio Analysis for Selection Pressure

Method: Calculate non-synonymous (Ka) to synonymous (Ks) substitution rates.

Ortholog Identification: Identify orthologous NBS-LRR pairs between S. miltiorrhiza and a related species (e.g., S. splendens).
Sequence Alignment: Perform codon-aware alignment (PRANK or MACSE).
Calculation: Use KaKs_Calculator (NG method) to compute Ka, Ks, and Ka/Ks ratio.
Interpretation: Ka/Ks significantly < 1 indicates purifying selection (functional constraint). Ka/Ks ~1 suggests neutral evolution, common in pseudogenes.

Visualization of Workflows and Relationships

Title: Computational Pipeline for NBS-LRR Gene Classification

Title: Ka/Ks Analysis for Functional Assessment

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 2: Key Research Reagent Solutions

Item	Function/Application in NBS-LRR Discrimination
High-Fidelity DNA Polymerase (e.g., Phusion)	Accurate amplification of candidate gene sequences from gDNA/cDNA for sequencing validation.
RNA Isolation Kit (Plant-specific)	Extraction of high-integrity total RNA from S. miltiorrhiza under stress/control conditions for expression analysis.
Reverse Transcription Kit with oligo(dT)/Random Primers	Synthesis of first-strand cDNA from RNA for RT-PCR and expression confirmation.
NBS-LRR HMM Profile Database (Pfam)	Hidden Markov Model profiles for identifying NBS and LRR domains in silico.
Codon-Aware Alignment Software (e.g., MACSE)	Aligns nucleotide sequences while respecting protein translation, critical for accurate Ka/Ks calculation.
S. miltiorrhiza Specific Primers	Oligonucleotides designed to amplify variable and conserved regions of NBS-LRR candidates.
Next-Generation Sequencing (NGS) Library Prep Kit	Prepares RNA-Seq or whole-genome sequencing libraries for expression and genomic variation analysis.
Agarose Gel Electrophoresis System	Separates and visualizes PCR products to check for expected amplicon sizes and purity.
Sanger Sequencing Reagents	Provides definitive sequence data to confirm ORF integrity, mutations, and splicing events.

Optimizing Multiple Sequence Alignment for Divergent NBS-LRR Sequences

This whitepaper is a technical guide framed within a broader thesis on the genome-wide identification of the NBS-LRR gene family in Salvia miltiorrhiza (Danshen). The Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) family is a major class of plant disease resistance (R) genes. Characterizing these genes in S. miltiorrhiza is crucial for understanding its defense mechanisms and has implications for improving medicinal plant resilience and secondary metabolite production. However, the high sequence divergence among NBS-LRR genes, characterized by frequent insertions/deletions (indels), tandem repeats, and variable domains, poses a significant challenge for accurate multiple sequence alignment (MSA), which is foundational for phylogenetic analysis, domain prediction, and functional annotation. This guide details optimized strategies for aligning these divergent sequences.

Challenges in Aligning Divergent NBS-LRR Sequences

The intrinsic properties of NBS-LRR genes that complicate MSA include:

Extreme Sequence Divergence: Low sequence identity between subfamilies (e.g., TIR-NBS-LRR vs. CC-NBS-LRR).
Indel-Rich Regions: Particularly in the LRR domain, where repeat numbers and sequences vary.
Modular Domain Architecture: Conserved NB-ARC domain flanked by highly variable N-terminal (TIR, CC, RPW8) and C-terminal (LRR) domains.
Pseudogenes and Fragmented Sequences: Common in genome annotations.

Optimized MSA Workflow for NBS-LRR Genes

The following integrated workflow improves alignment accuracy for divergent NBS-LRR sequences.

Diagram 1: Optimized MSA workflow for NBS-LRRs.

Experimental Protocol: Sequence Curation & Domain Delineation

Gather Sequences: Extract putative NBS-LRR protein sequences from the S. miltiornhiza genome annotation (e.g., from databases like NCBI, or local genome assembly).
Filter & Trim: Remove sequences shorter than 300 aa. Use PfamScan or InterProScan with the following models to identify and extract core domains:
- NB-ARC Domain: PF00931.
- LRR Domain(s): PF13855, PF12799, or PF00560.
- TIR Domain: PF01582 or PF13676.
Generate Domain-Based Subsets: Create separate FASTA files for: a) NB-ARC domains only, b) TIR/CC domains, c) LRR regions. This allows alignment of homologous regions separately.

Experimental Protocol: Stratified Alignment

Align each domain subset using the most appropriate algorithm.

For Conserved NB-ARC Domains: Use MAFFT (--localpair or --genafpair) or Clustal Omega. These handle conserved global alignments well.
For Variable LRR Regions: Use MAFFT with the --localpair strategy or DIALIGN-2/T-Coffee, which are better suited for local similarities and indel-rich regions.

Experimental Protocol: Profile-Profile Alignment & Merging

Create Profiles: Use the aligned domain files (e.g., nbarc_aligned.fa) as input profiles.
Merge Alignments: Align the profile of the NB-ARC domain to the profile of the LRR regions using MAFFT's profile-profile function.
Re-integrate Flanking Sequences: Map the original, more divergent N- and C-terminal ends back onto the core alignment using a slow, accurate method.

Apply final polishing to the full-length alignment using an iterative refiner.

Quantitative Comparison of MSA Tools for NBS-LRR Data

Performance metrics were evaluated on a benchmark set of 120 divergent NBS-LRR sequences from S. miltiorrhiza and related Lamiaceae species. Reference alignments were derived from structural superposition of known 3D NB-ARC domains.

Table 1: MSA Tool Performance on NBS-LRR Benchmark Set

Tool	Algorithm Type	Avg. Q-Score (NB-ARC)	Avg. Column Score (LRR)	Computational Speed (s)	Suitability for NBS-LRR
MAFFT (L-INS-i)	Iterative, local	0.89	0.72	45	High for conserved domains
Clustal Omega	Progressive, global	0.85	0.68	22	Moderate for core NB-ARC
T-Coffee (PSI)	Consistency-based	0.87	0.78	320	High for variable LRRs
MUSCLE	Iterative, progressive	0.83	0.65	18	Low for divergent regions
DIALIGN-2	Local segment-based	0.80	0.76	290	High for indel-rich regions
Optimized Workflow	Hybrid, stratified	0.91	0.81	180	Recommended Best Practice

Q-Score: Fraction of correctly aligned residue pairs compared to reference. Column Score: Sum-of-pairs score for alignment columns.

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Reagents & Tools for NBS-LRR MSA Analysis

Item / Solution	Function / Purpose in NBS-LRR Research
InterProScan	Integrated protein domain & family annotation. Critical for pre-alignment domain delineation (TIR, NB-ARC, LRR).
MAFFT Software	Primary alignment engine for its flexibility (global/local strategies) and high accuracy on conserved domains.
T-Coffee/ Expresso	Provides consistency-based alignments and can use structural data; ideal for aligning variable LRR regions.
GUIDANCE2 Server	Calculates alignment confidence scores per column and residue; identifies unreliably aligned regions.
Pfam HMM Profiles (PF00931, PF13855, PF01582)	Hidden Markov Models used to definitively identify NBS-LRR domains in uncharacterized sequences.
MEGA-CC Software	User-friendly suite for performing iterative refinement, manual alignment editing, and downstream phylogenetic analysis.
Jalview	Interactive alignment visualization editor for manual curation, color-coding by conservation, and trimming.
Python/Biopython	For custom scripting of workflow automation, parsing large sequence sets, and batch processing.

Downstream Analysis: From MSA to Functional Insight inS. miltiorrhiza

A robust MSA enables key downstream analyses within the genome identification thesis.

Diagram 2: Downstream applications of NBS-LRR MSA.

Experimental Protocol: Phylogeny & Positive Selection

Build Phylogeny: Using the aligned NB-ARC domain, construct a Maximum-Likelihood tree in IQ-TREE.
Detect Selection: Use the HyPhy suite (e.g., via the Datamonkey server) to test for sites under positive selection (dN/dS > 1) using methods like FEL or MEME, which is crucial for identifying pathogen-binding residues in LRRs.

Accurate genome-wide identification and characterization of the NBS-LRR family in Salvia miltiorrhiza are contingent upon overcoming the hurdles of sequence divergence through optimized MSA. The stratified workflow—involving domain-wise curation, tool-specific alignment, profile merging, and iterative refinement—produces significantly more reliable alignments than any single-method approach. This robust MSA forms the critical foundation for trustworthy phylogenetic classification, evolutionary analysis, and the functional prediction of R genes, ultimately guiding targeted experimental validation in plant immunity research.

Within the context of a comprehensive thesis on the genome-wide identification of the NBS-LRR (Nucleotide-Binding Site-Leucine-Rich Repeat) gene family in Salvia miltiorrhiza (Danshen), the transition from in silico prediction to experimental validation is a critical milestone. Computational pipelines can predict numerous candidate genes, but their physical existence, precise exon-intron boundaries, and sequence accuracy must be confirmed. This guide details the protocols for verifying predicted NBS-LRR genes using endpoint PCR and Sanger sequencing.

The Imperative for Validation

In silico identification relies on algorithms and reference data, which can introduce false positives due to sequence gaps, assembly errors, or overly sensitive domain prediction parameters. Validation ensures that the candidate genes are present in the S. miltiorrhiza genome and that their sequences are correct, forming a reliable foundation for downstream functional studies and drug development research focused on this medicinal plant's defense mechanisms.

The validation workflow begins with a curated list of in silico predicted NBS-LRR genes from the S. miltiorrhiza genome assembly. Key quantitative data from a typical verification study is summarized below.

Table 1: Summary of In Silico NBS-LRR Prediction and Validation Metrics

Parameter	In Silico Prediction	PCR Validation	Sequencing Success Rate
Total Candidate Genes	127	N/A	N/A
Genes Selected for PCR	30	N/A	N/A
Primers Designed	30 pairs	N/A	N/A
PCR Success (Clear Amplicon)	N/A	27 genes	90%
Sequence Perfect Match	N/A	N/A	22 genes (81.5%)
Sequence with SNPs/Indels	N/A	N/A	5 genes (18.5%)

Table 2: Typical PCR Reaction Setup (25 µL)

Component	Volume (µL)	Final Concentration
High-Fidelity PCR Master Mix (2X)	12.5	1X
Forward Primer (10 µM)	1.0	0.4 µM
Reverse Primer (10 µM)	1.0	0.4 µM
Template Genomic DNA (50 ng/µL)	1.0	~2 ng/µL
Nuclease-Free Water	9.5	N/A

Detailed Experimental Protocols

Protocol 1: Primer Design and PCR Amplification

Primer Design: Design primers flanking the full-length or conserved region (e.g., the P-loop motif) of the predicted NBS-LRR gene using software (e.g., Primer3). Target amplicon size: 500-1500 bp.
- Parameters: Tm ~60°C, length 18-22 bp, GC content 40-60%.
DNA Template Preparation: Isect high-quality genomic DNA from fresh S. miltiorrhiza leaves using a modified CTAB method, quantifying with a spectrophotometer (A260/A280 ratio ~1.8).
PCR Cycling Conditions:
- Initial Denaturation: 95°C for 3 min.
- 35 Cycles:
  - Denaturation: 95°C for 30 sec.
  - Annealing: (Primer Tm - 5°C) for 30 sec.
  - Extension: 72°C for 1 min/kb.
- Final Extension: 72°C for 5 min.
- Hold: 4°C.
Analysis: Run 5 µL of PCR product on a 1.2% agarose gel stained with ethidium bromide. Verify single amplicon of expected size.

Protocol 2: Amplicon Purification and Sanger Sequencing

Purification: Purify the remaining PCR product using a spin column-based PCR purification kit. Elute in 30 µL of elution buffer.
Sequencing Preparation: Prepare sequencing reactions for both forward and reverse primers separately.
- Reaction Mix (10 µL): ~50-100 ng purified PCR product, 3.2 pmol primer, and sequencing reaction buffer.
Cycle Sequencing: Perform using standard Sanger cycle sequencing protocol (BigDye Terminator chemistry).
- Conditions: 25 cycles of 96°C for 10 sec, 50°C for 5 sec, 60°C for 4 min.
Purification & Analysis: Purify sequencing reactions to remove unincorporated dyes. Run on a capillary sequencer. Assemble forward and reverse reads, align to the in silico reference sequence using tools like BioEdit or Geneious.

Visualizing the Validation Workflow

Title: PCR and Sanger Sequencing Validation Workflow

Title: NBS-LRR Domain Architecture for Primer Design

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Reagents and Materials for Validation

Item	Function/Description	Example Product/Note
High-Fidelity DNA Polymerase	Reduces PCR errors for accurate sequence amplification. Essential for cloning-ready products.	Phusion, KAPA HiFi
PCR Purification Kit	Removes primers, dNTPs, salts, and enzymes from PCR products prior to sequencing.	Qiagen QIAquick, Thermo GeneJET
Sanger Sequencing Kit	Dideoxy terminator-based cycle sequencing reaction.	BigDye Terminator v3.1
Capillary Sequencer	Instrument for high-resolution separation and detection of fluorescently labeled sequencing fragments.	Applied Biosystems 3730xl
Sequence Assembly Software	Aligns forward/reverse reads, compares to reference, and identifies variants.	Geneious, SnapGene, BioEdit
S. miltiorrhiza Genomic DNA	High-quality, high-molecular-weight template DNA. Isolated via CTAB method, A260/280 ~1.8.	In-house preparation recommended
Domain Prediction Database	Used for initial in silico identification and to confirm conserved domains in sequenced amplicons.	Pfam, SMART, NCBI CDD

This guide is framed within a doctoral thesis project aiming to conduct a genome-wide identification and functional characterization of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorrhiza (Danshen). NBS-LRR genes constitute one of the largest and most complex plant resistance (R) gene families. Efficient management, analysis, and visualization of hundreds of candidate genes are critical for elucidating their roles in stress response and secondary metabolism, with implications for improving medicinal compound production.

Efficient Data Management Strategies

2.1. Hierarchical Data Organization A structured directory and naming convention is essential. Data should be categorized into raw data (genome assemblies, RNA-seq reads), processed data (BLAST outputs, HMM search results), analysis files (multiple sequence alignments, phylogenetic trees), and metadata (sample information, software versions).

2.2. Utilization of Relational Databases For large-scale gene families, moving beyond spreadsheets to a lightweight relational database (e.g., SQLite) enables complex queries and integration. A suggested table schema includes:

Gene_Summary: Primary table with gene ID, chromosomal location, protein length.
Domain_Architecture: Links gene IDs to predicted domains (NB-ARC, LRR, TIR, RPW8).
Expression_Profiles: Stores transcriptomic data (TPM/FPKM) across tissues/conditions.
Phylogenetic_Classification: Stores clade/subfamily assignments.

2.3. Quantitative Data Summary

Table 1: Typical NBS-LRR Identification Pipeline Output for a Plant Genome

Analysis Step	Software/Tool	Key Parameters	S. miltiorrhiza (Example Output)	Purpose
Initial Identification	HMMER	HMM profile: PF00931 (NB-ARC)	~350 candidate genes	Retrieve sequence candidates
Domain Validation	NCBI CD-Search	E-value < 0.01	~320 genes with full NB-ARC	Confirm domain integrity
Architecture Classification	MEME/MAST	Motif discovery	TNL (~55%), CNL (~40%), RNL (~5%)	Classify into subfamilies
Chromosomal Distribution	TBtools/MCScanX	--	12 gene clusters identified	Visualize synteny and clusters
Expression Analysis	HISAT2 + StringTie	>1 TPM in any sample	~150 genes expressed	Filter for active genes

Detailed Experimental Protocols

3.1. Protocol for NBS-LRR Gene Identification Using HMMER

Dataset Preparation: Download the latest S. miltiorrhiza proteome from NCBI or CNGB.
HMM Profile Retrieval: Obtain the NB-ARC (PF00931) HMM profile from the Pfam database.
HMMER Search: Run hmmsearch with inclusive thresholds:
Result Parsing: Extract sequences with significant full-domain hits (complete NB-ARC domain). Custom Python scripts or bioawk can be used.
Secondary Validation: Subject candidates to NCBI's Conserved Domain Database (CDD) search online or via rpsblast+ to remove fragments.

3.2. Protocol for Phylogenetic & Motif Analysis

Multiple Sequence Alignment: Use MAFFT with high-accuracy settings:
Phylogenetic Tree Construction: Use IQ-TREE for model selection and fast bootstrap:
Motif Discovery: Use MEME Suite to identify conserved motifs outside the core NB-ARC domain:
Follow with MAST to scan all candidates for discovered motifs.

Visualization Strategies

4.1. Phylogeny-Integrated Domain Architecture Plot Tools like TBtools-II or Evolview allow the generation of circular phylogenetic trees with adjacent heatmaps (expression) and bar charts (domain structure), enabling multi-dimensional comparison.

4.2. Chromosomal Distribution and Synteny Visualization Advanced synteny visualization can be achieved using tools like JCVI or MCScanX, plotted with ggplot2 or TBtools to show NBS-LRR gene clusters and whole-genome duplication events.

4.3. Signaling Pathway and Workflow Diagrams

Title: NBS-LRR Gene Identification and Analysis Workflow

Title: NBS-LRR Mediated Plant Defense Signaling Pathway

The Scientist's Toolkit

Table 2: Essential Research Reagent Solutions for NBS-LRR Studies

Item	Function / Application	Example Product / Source
High-Fidelity DNA Polymerase	Accurate amplification of candidate gene CDS for cloning.	Phusion HF (Thermo), KAPA HiFi.
Gateway Cloning System	Efficient transfer of genes into multiple expression vectors.	pDONR/Zeo vectors, LR Clonase (Thermo).
Plant Expression Vectors	For transient/stable expression in tobacco or Arabidopsis.	pCAMBIA1300 (CaMV 35S promoter), pEAQ-HT.
Anti-GFP Antibody	Detection of GFP-tagged NBS-LRR protein localization & abundance.	Anti-GFP, HRP (Abcam, #ab290).
DAB Staining Kit	Histochemical detection of hydrogen peroxide in HR response.	3,3'-Diaminobenzidine (Sigma-Aldrich).
qPCR Master Mix (SYBR Green)	Validation of gene expression patterns from RNA-seq data.	PowerUp SYBR Green (Thermo).
Rapid DNA Ladder	Accurate sizing of PCR products during gene screening.	1 kb Plus DNA Ladder (NEB).
Plant Total RNA Extraction Kit	High-quality RNA for RT-qPCR and RNA-seq library prep.	Plant RNeasy Kit (Qiagen).

Benchmarking and Insights: Validating Salvia miltiorrhiza NBS-LRR Genes Through Comparative Genomics

This whitepaper details a core experimental chapter within a broader thesis focused on the genome-wide identification of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in Salvia miltiorzhiza (Danshen). Following in silico identification and phylogenetic characterization, functional validation of candidate gene expression under stress is paramount. Quantitative reverse transcription polymerase chain reaction (qRT-PCR) is the gold standard for precise, sensitive quantification of transcript abundance. This guide provides a rigorous technical framework for employing qRT-PCR to validate the induction of key NBS-LRR genes in response to biotic and abiotic stress, linking genomic data to potential functional roles in Danshen's defense mechanisms.

Experimental Design & Stress Treatments

Candidate NBS-LRR genes are selected based on phylogenetic clade association with known resistance (R) genes, promoter cis-element analysis revealing stress-responsive motifs, and preliminary RNA-Seq data. Plants are subjected to controlled stress treatments.

Table 1: Standardized Stress Treatment Protocols for S. miltiorrhiza Seedlings

Stress Type	Specific Treatment	Duration	Sample Time Points (Post-treatment)	Purpose
Biotic Elicitor	Foliar spray with 100 µM Methyl Jasmonate (MeJA)	Single application	3 h, 6 h, 12 h, 24 h, 48 h	Simulate pathogen attack, activate JA-mediated defense pathways.
Biotic Elicitor	Root drench with 1 mM Salicylic Acid (SA)	Single application	3 h, 6 h, 12 h, 24 h, 48 h	Activate SA-mediated systemic acquired resistance (SAR) pathways.
Fungal Pathogen	Fusarium solani spore suspension (1×10⁶ spores/mL) root inoculation	Continuous	1 d, 3 d, 5 d, 7 d	Direct biotic stress interaction.
Abiotic Stress	Drought stress (Withholding water)	Until soil moisture drops to 30% field capacity	1 d, 3 d, 5 d, 7 d	Induce osmotic and general abiotic stress response.
Abiotic Stress	Cold stress (4°C)	Continuous	3 h, 6 h, 12 h, 24 h	Induce cold-responsive signaling.
Control	Mock treatment (Water or solvent)	--	Matches all treatment time points	Baseline expression reference.

Detailed qRT-PCR Validation Protocol

Total RNA Isolation & Quality Control

Method: Use a modified CTAB method or commercial kit (e.g., RNAprep Pure Plant Plus Kit) with on-column DNase I digestion.
Critical Steps: Homogenize 100 mg of frozen root/leaf tissue in liquid nitrogen. Include a genomic DNA elimination step. Elute in 30-50 µL RNase-free water.
QC: Measure RNA concentration (Nanodrop). Verify integrity via 1.5% agarose gel (sharp 28S/18S rRNA bands) and assess purity (Agilent Bioanalyzer recommended; RIN > 7.0).

First-Strand cDNA Synthesis

Reaction: Use 1 µg total RNA in a 20 µL reaction with oligo(dT)₁₈ primers and a reverse transcriptase with high fidelity (e.g., RevertAid H Minus).
Program: 5 min at 25°C (priming), 60 min at 42°C (elongation), 5 min at 70°C (inactivation).
Control: Include a no-reverse transcriptase (-RT) control for each sample to detect genomic DNA contamination.

Primer Design & Validation

Design: Design primers from conserved NBS or LRR domains of target genes using Primer Premier 5.0. Amplicon length: 80-200 bp. Melting Temperature (Tm): 58-60°C, with <1°C difference between primer pairs.
Validation: Perform standard curve analysis with serial dilutions of pooled cDNA. Accept primer pairs with amplification efficiency (E) of 90-110% (R² > 0.99). Check specificity via melt curve analysis (single peak).

Table 2: Example qRT-PCR Primer Sequences for Candidate S. miltiorrhiza NBS-LRR Genes

Gene ID (Hypothetical)	Primer Sequence (5'→3')	Amplicon Size (bp)	Efficiency (%)	R²
SmNBS-LRR05	F: CGTCAAGAGCCTCAACAACCR: TGGATGCTGTGATGTTGAGG	152	98.5	0.998
SmNBS-LRR12	F: AAGCCTGGTGTTGCTGTTGTR: CACCAACCCAACATCACCAT	118	102.1	0.996
SmNBS-LRR23	F: GGAGGCTATGCTGGATTGACR: CCTTGATGCCACTTTTGGAG	145	95.7	0.999
Reference: SmActin	F: GTGTTGGATTCTGGTGATGGTGTGR: TGGCATACAGGTCCTTCCTGATAT	187	99.3	0.997

Quantitative PCR Amplification

Reaction Mix (10 µL): 5 µL 2X SYBR Green Master Mix, 0.5 µL each primer (10 µM), 1 µL diluted cDNA (1:10), 3 µL nuclease-free water.
Cycling Program (QuantStudio 5): Stage 1: 95°C for 30 sec. Stage 2 (40 cycles): 95°C for 5 sec, 60°C for 30 sec (data acquisition). Stage 3 (Melt Curve): 95°C for 15 sec, 60°C to 95°C, increment 0.15°C/sec.
Analysis: Use the comparative Cq (ΔΔCq) method. Normalize target gene Cq values to the reference gene (SmActin/SmUBQ) for each sample. Calculate fold change relative to the mock-treated control.

Data Analysis & Interpretation

Table 3: Example qRT-PCR Fold Change Data for SmNBS-LRR05 under MeJA Stress

Time Post-Treatment	Mean ΔCq (Treatment)	Mean ΔCq (Control)	ΔΔCq	Fold Change (2^-ΔΔCq)	Significance (p-value)
3 h	5.2 ± 0.15	7.8 ± 0.12	-2.6	6.1	<0.01
6 h	4.7 ± 0.18	7.9 ± 0.10	-3.2	9.2	<0.001
12 h	6.1 ± 0.22	8.0 ± 0.15	-1.9	3.7	<0.05
24 h	6.8 ± 0.19	7.9 ± 0.11	-1.1	2.1	0.12

Interpretation: SmNBS-LRR05 shows rapid, significant induction by MeJA, peaking at 6h (9.2-fold), suggesting a role in jasmonate-responsive defense.
Statistical Analysis: Perform one-way ANOVA with post-hoc Tukey's HSD test on ΔCq values across biological replicates (n≥3). p < 0.05 is considered significant.

NBS-LRR Signaling Pathway Context

The validated expression of specific NBS-LRR genes can be integrated into known plant immune signaling models. Below is a simplified pathway illustrating potential roles for S. miltiorrhiza NBS-LRRs.

Diagram Title: NBS-LRR Gene Roles in Plant Immune Signaling Pathways.

The Scientist's Toolkit: Research Reagent Solutions

Table 4: Essential Materials for NBS-LRR qRT-PCR Validation

Item / Reagent	Function & Rationale
RNAprep Pure Plant Plus Kit (Polysaccharides & Polyphenolics-rich)	Specifically formulated for plants like S. miltiorrhiza that contain high levels of secondary metabolites which inhibit downstream reactions.
DNase I, RNase-free	Critical for complete genomic DNA removal to prevent false-positive amplification in qPCR.
RevertAid H Minus Reverse Transcriptase	Lacks RNase H activity, allowing for higher yields of full-length cDNA, ideal for long transcripts.
SYBR Green Master Mix (e.g., PowerUp SYBR)	Provides all components for robust, sensitive qPCR with standardized conditions. Includes ROX passive reference dye for plate normalization.
Nuclease-Free Water	Essential for all molecular biology reactions to prevent RNase/DNase contamination.
*Validated Endogenous Control Primers (e.g., SmActin, SmUBQ)*	Stable reference genes for S. miltiorrhiza under the studied stress conditions are mandatory for accurate ΔΔCq analysis.
White 96-Well Optical Reaction Plates & Seals	Ensure optimal fluorescence detection and prevent evaporation during cycling.

Integrating precise qRT-PCR expression validation with genome-wide identification studies provides a powerful approach to transition from in silico predictions to functionally characterized candidate genes. This protocol, framed within Danshen research, establishes a reproducible method to confirm the stress-responsive nature of key NBS-LRR genes, offering critical insights for subsequent functional studies and potential applications in enhancing plant resilience or identifying novel defense-related metabolites for drug development.

This whitepaper details a core evolutionary analysis chapter within a broader thesis project focused on the genome-wide identification of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in the medicinal plant Salvia miltiorrhiza (Danshen). The NBS-LRR family constitutes the largest class of plant disease resistance (R) genes. By performing a comparative phylogenetic analysis with well-characterized NBS-LRR families from model plants (Arabidopsis thaliana, Oryza sativa) and other members of the Lamiaceae family, we can infer evolutionary patterns, classify S. miltiorrhiza NBS-LRRs, and predict their potential function in disease resistance and phytochemical biosynthesis, which is critical for pharmaceutical quality.

Core Methodology for Comparative Phylogeny

2.1 Data Acquisition and Sequence Identification

Source Genomes: Obtain whole-genome sequences and annotated protein datasets from public repositories.
- Arabidopsis thaliana: TAIR (https://www.arabidopsis.org/).
- Oryza sativa: RGAP (http://rice.plantbiology.msu.edu/).
- Salvia miltiorrhiza: NCBI BioProject PRJNA313967 and DanGenome (http://salviadatabase.com/).
- Other Lamiaceae: Mentha longifolia, Scutellaria baicalensis, etc., from NCBI Genome.
Protocol - Hidden Markov Model (HMM) Search:
- Download the Pfam HMM profiles for NBS domain (NB-ARC, PF00931) and TIR domain (PF01582, for TNL class) or Coiled-Coil domain (for CNL class).
- Use hmmsearch from HMMER v3.3.2 suite against all protein sequences of each species: hmmsearch --domtblout output.txt -E 1e-5 PF00931.hmm proteome.fasta.
- Extract sequences with significant hits (E-value < 1e-5). Redundancy is removed using CD-HIT at 90% identity threshold.

2.2 Multiple Sequence Alignment and Phylogenetic Tree Construction

Protocol - Alignment and Tree Building:
- Perform multiple sequence alignment of the conserved NBS domain regions using MAFFT v7 with G-INS-i strategy for accuracy: mafft --globalpair --maxiterate 1000 input.fa > aligned.fa.
- Trim the alignment with TrimAl v1.4 using the -automated1 option: trimal -in aligned.fa -out trimmed.fa -automated1.
- Construct a Maximum-Likelihood (ML) phylogenetic tree using IQ-TREE v2.2.0:
  - Command: iqtree2 -s trimmed.fa -m MFP -B 1000 -alrt 1000 -T AUTO.
  - -m MFP enables ModelFinder Plus to select the best-fit substitution model.
  - -B 1000 specifies 1000 ultrafast bootstrap replicates.
- Visualize and annotate the tree using FigTree v1.4.4 or the R package ggtree.

2.3 Evolutionary Analysis

Synonymous vs. Non-synonymous Substitution Rates (Ka/Ks): Calculate using the yn00 program in PAML v4.9 or KaKs_Calculator 3.0 to assess selection pressure. Ka/Ks > 1 indicates positive selection; <1 indicates purifying selection.
Gene Cluster Identification: Identify gene clusters on chromosomes if collinearity data is available (e.g., using MCScanX).

Results & Data Presentation

Table 1: Genome-Wide Identification of NBS-LRR Genes in Target Species

Species	Family	Total NBS-LRRs	TNL Subclass	CNL/RNL* Subclass	Others	Reference Genome Version
Arabidopsis thaliana	Brassicaceae	167	82	85 (CNL)	0	TAIR10
Oryza sativa (Rice)	Poaceae	535	2	528 (CNL)	5	MSU v7.0
Salvia miltiorrhiza	Lamiaceae	121	45	76 (CNL)	0	v2.0
Mentha longifolia	Lamiaceae	98	32	66 (CNL)	0	ML_v1.0
Scutellaria baicalensis	Lamiaceae	113	41	72 (CNL)	0	ASM2071116v1

Note: RNL (RPW8-NB-LRR) is a specific subclass often grouped with CNLs for simplicity.

Table 2: Evolutionary Selection Pressure on NBS-LRR Genes (Ka/Ks Analysis)

Species Comparison (Orthologous Pairs)	Average Ka/Ks Ratio	Proportion of Pairs with Ka/Ks > 1	Implied Selection Pressure
A. thaliana vs. S. miltiorrhiza	0.28	2.1%	Strong Purifying Selection
O. sativa vs. S. miltiorrhiza	0.42	1.5%	Purifying Selection
Within S. miltiorrhiza (Tandem Duplicates)	0.65	8.7%	Relaxed Purifying / Mild Positive
Within Lamiaceae (S. miltiorrhiza vs. M. longifolia)	0.31	3.3%	Strong Purifying Selection

Visualization of Phylogenetic and Evolutionary Relationships

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials for NBS-LRR Comparative Phylogenetic Analysis

Item / Reagent	Function / Application in this Study	Example Vendor/Software
HMMER Suite	Profile HMM-based search for identifying NBS-LRR protein sequences from proteomes.	http://hmmer.org/
Pfam HMM Profiles	Curated seed alignments and HMMs for NBS (NB-ARC, PF00931) and TIR (PF01582) domains.	https://pfam.xfam.org/
MAFFT	High-accuracy multiple sequence alignment tool for conserved protein domains.	https://mafft.cbrc.jp/
IQ-TREE	Efficient software for maximum likelihood phylogenetic inference and model testing.	http://www.iqtree.org/
PAML (CodeML)	Package for molecular evolution analysis, including Ka/Ks calculation (yn00).	http://abacus.gene.ucl.ac.uk/software/paml.html
FigTree / ggtree	Software/R package for visualizing, annotating, and exporting phylogenetic trees.	http://tree.bio.ed.ac.uk/; Bioconductor
Genome Databases	Sources for reference genome sequences and annotations (TAIR, RGAP, NCBI, Phytozome).	Public Repositories
High-Performance Computing (HPC) Cluster	Essential for running computationally intensive steps (HMM search, alignment, bootstrapping).	Institutional Resource

This technical guide details the application of synteny and collinearity analysis for identifying conserved genomic architectures and lineage-specific rearrangements surrounding Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) genes in Salvia miltiorrhiza (Danshen). The work is framed within a broader thesis project aiming to perform a genome-wide identification and characterization of the NBS-LRR gene family—a primary class of plant disease Resistance (R) genes—in this medicinal plant. Understanding the genomic organization of these genes is crucial for elucidating plant defense mechanisms and can inform breeding strategies for disease resistance, ultimately impacting the sustainable production of its valuable bioactive compounds (e.g., tanshinones, salvianolic acids) for pharmaceutical development.

Core Concepts: Synteny vs. Collinearity

Synteny: Refers to the conservation of genomic blocks containing two or more homologous loci on chromosomes of different species, regardless of gene order. It indicates shared ancestry of a chromosomal region.
Collinearity: A stricter condition where homologous genes are not only syntenic but also maintain the same order and orientation along the chromosome. Micro-collinearity refers to conservation at a fine scale (e.g., within a gene cluster).

For R-gene discovery, these analyses reveal:

Conserved Syntenic Blocks: Highlight evolutionarily ancient R-gene loci critical for core pathogen recognition.
Breakpoints in Collinearity: Identify genomic regions that have undergone rearrangement, often associated with rapid evolution and birth/death of new R-gene specificities.
Species-Specific Expansions: Uncover tandem duplications or transpositions unique to S. miltiorrhiza, suggesting adaptations to its specific pathogen environment.

Key Experimental Protocols and Methodologies

Genome-Wide Identification of NBS-LRR Genes inS. miltiorrhiza

Objective: Create a comprehensive catalog of NBS-LRR genes as anchor points for synteny analysis. Protocol:

Hidden Markov Model (HMM) Search: Query the S. miltiorrhiza reference genome (e.g., Smil v2.0) using HMM profiles for NB-ARC (PF00931) and LRR (PF00560, PF07723, PF07725, PF13516, PF13855) domains from the Pfam database. Use hmmsearch (HMMER v3.3) with an E-value cutoff of 1e-5.
Domain Validation: Confirm the presence and architecture (TNL, CNL, RNL, etc.) of identified candidates using CD-Search (NCBI) or InterProScan.
Manual Curation: Remove pseudogenes (premature stop codons, frameshifts) and fragments lacking key motifs. Map physical locations (chromosome, start, end, strand).

Whole-Genome Synteny and Collinearity Analysis

Objective: Identify macro-syntenic blocks between S. miltiorrhiza and related species. Protocol (Using JCVI/MCScanX Toolkit):

Data Preparation: Create a BLASTP output file (outfmt 6) of all-vs-all protein sequences within and between the target genomes (S. miltiorrhiza, Salvia splendens, Mentha longifolia, Arabidopsis thaliana). Prepare a GFF file for each species.
Synteny Detection: Run python -m jcvi.compara.catalog ortholog to identify orthologous gene pairs and syntenic blocks. Key parameters: --cscore=.99 (confidence score), --iter=1 for simple chaining.
Visualization: Use jcvi.graphics.karyotype to generate synteny maps. Collinearity is inferred from the uninterrupted alignment of homologous genes in the same order.

Microsynteny Analysis of NBS-LRR-Containing Regions

Objective: Examine fine-scale conservation and rearrangement around individual R-gene loci. Protocol:

Locus Extraction: Extract genomic regions (± 50-100 kb) surrounding identified NBS-LRR genes from the S. miltiorrhiza genome and syntenic regions from comparator genomes using BEDTools (slop and getfasta).
Gene Annotation & Comparison: Re-annotate these regions with a consistent pipeline (e.g., BRAKER2) to minimize annotation bias. Visualize gene order, orientation, and homology using the GENESPACE software or custom R scripts with ggplot2 and gggenes.
Variant Analysis: Investigate structural variants (SVs) at breakpoints using whole-genome alignment tools like MUMmer (nucmer, show-coords) or read-depth analysis.

Data Presentation: Comparative Genomics of NBS-LRR Loci

Table 1: Summary of NBS-LRR Genes and Syntenic Conservation in Salvia miltiorrhiza vs. Related Species

Species	Total NBS-LRR Genes Identified	Genes in Syntenic Blocks	Species-Specific Genes (Non-syntenic)	Predominant NBS-LRR Type	Key Syntenic Block Size Range (Mb)
Salvia miltiorrhiza (Focal)	~150*	~95	~55	CNL	N/A
Salvia splendens	~130*	~88	~42	CNL	0.8 - 4.2
Mentha longifolia	~120*	~72	~48	CNL/TNL	0.5 - 3.1
Arabidopsis thaliana (Outgroup)	~165	~45	~120	TNL	0.3 - 1.7

*Estimated numbers based on recent analyses; final counts require full curation.

Table 2: Example of a High-Resolution Microsynteny Analysis of a Conserved R-Gene Locus

Genomic Feature	S. miltiorrhiza Chr4 (32.15-32.45 Mb)	S. splendens Syntenic Region	M. longifolia Syntenic Region	A. thaliana Syntenic Region (Col-0)	Conservation Notes
Anchor NBS-LRR Gene	Smi-NL34 (CNL)	Ssp-NL29 (CNL)	Mlo-NL21 (CNL)	AT4G19520 (TNL)	NB-ARC domain >85% identity
Flanking Gene 1 (5')	Serine/Threonine Kinase	Serine/Threonine Kinase	LRR-RLK	PPR protein	Collinearity break between mint and others
Flanking Gene 2 (3')	ABC Transporter	ABC Transporter	ABC Transporter	ABC Transporter	Perfect collinearity
Gene Orientation	-> Smi-NL34 ->	-> Ssp-NL34 ->	<- Mlo-NL21 ->	-> AT4G19520 ->	Inversion in mint lineage
Structural Variant	None	5 kb insertion (TE)	20 kb deletion	N/A	TE insertion specific to S. splendens

Visualizations

Diagram 1: Synteny Analysis Workflow for R-Genes

Diagram 2: Microsynteny Conservation & Breakpoints

The Scientist's Toolkit: Essential Research Reagents & Solutions

Table 3: Key Reagents and Tools for Synteny-Based R-Gene Analysis

Item	Function / Purpose in Analysis	Example/Note
High-Quality Genome Assemblies	Reference for gene identification and synteny anchor. Chromosome-level, haplotype-resolved assemblies are ideal.	S. miltiorrhiza Smil v2.0, S. splendens v1.0 (from public databases like NCBI, Phytozome).
Curated Protein Family HMMs	Sensitive detection of NBS and LRR protein domains across diverse plant lineages.	Pfam profiles (PF00931, PF00560, etc.); Plant Immune Receptor Repository (PIRR) custom HMMs.
Orthology Detection Software	Distinguishes true orthologs (for synteny) from paralogs.	OrthoFinder, InParanoid, JCVI pipeline.
Synteny & Collinearity Tools	Identifies and visualizes conserved genomic blocks.	JCVI/MCScanX, GENESPACE, SynVisio, D-GENIES.
Structural Variant Callers	Detects insertions, deletions, inversions at synteny breakpoints.	MUMmer, DELLY, Sniffles (for long-read data).
Functional Annotation Databases	Annotates genes within syntenic blocks to infer potential functional conservation.	InterPro, eggNOG, KEGG, Gene Ontology (GO).
Visualization Libraries	Creates publication-quality synteny and microsynteny plots.	R: ggplot2, gggenes, karyoploteR; Python: matplotlib, pyGenomeViz.

Within the broader thesis of genome-wide identification of the NBS-LRR gene family in Salvia miltiorrhiza (Danshen), this whitepaper addresses the dynamic evolutionary processes that shape its disease resistance gene repertoire. NBS-LRR genes are the largest class of plant disease resistance (R) genes, encoding nucleotide-binding site and leucine-rich repeat proteins that detect pathogen effectors and initiate immune signaling. Understanding the expansion and contraction dynamics of this repertoire in S. miltiorrhiza is critical for elucidating its adaptive immune capacity, with implications for cultivating disease-resistant varieties to secure the supply of its valuable bioactive compounds (e.g., tanshinones, salvianolic acids) for pharmaceutical use.

Genomic Identification and Quantitative Repertoire Analysis

Recent genome assemblies and re-annotations have enabled precise identification of NBS-LRR genes in S. miltiorrhiza. The following table summarizes key quantitative data from current analyses.

Table 1: NBS-LRR Repertoire in Salvia miltiorrhiza and Comparative Species

Species	Total NBS-LRR Genes	TNL Subfamily	CNL Subfamily	RNL Subfamily	Other/Unknown	Reference Genome/Version
*Salvia miltiorrhiza*	~120-150	~40-50	~60-80	~8-12	~10-15	Genome assembly Smil v2.0 / Danseq v1.0
*Arabidopsis thaliana*	~165	~55	~50	~60	-	TAIR10
*Solanum lycopersicum*	~355	~15	~330	~10	-	SL4.0
*Oryza sativa*	~480	~1	~470	~9	-	IRGSP-1.0

Key Findings on Dynamics:

Moderate Repertoire Size: S. miltiorrhiza possesses a moderate number of NBS-LRRs compared to model plants, reflecting its specific evolutionary path and selection pressures.
Subfamily Distribution: The ratio of TNL (TIR-NBS-LRR) to CNL (CC-NBS-LRR) genes is relatively balanced, unlike in monocots (CNL-dominated) or certain eudicots. The conserved RNL (RPW8-NBS-LRR) subgroup is present in low numbers.
Genomic Clustering: A significant proportion (>60%) of SmNBS-LRR genes are located in clusters on chromosomes, primarily resulting from local gene duplications (tandem and segmental), which drive repertoire expansion.
Pseudogenization: Several identified loci are pseudogenes (frameshifts, premature stop codons), indicating ongoing contraction and turnover via birth-and-death evolution.

Experimental Protocols for Key Analyses

Protocol 1: Genome-Wide Identification of NBS-LRR Genes

Sequence Retrieval: Download the latest S. miltiorrhiza genome assembly and annotation files (GFF3/GTF) from dedicated databases (e.g., Danseq, NCBI).
Hidden Markov Model (HMM) Search:
- Use HMMER (v3.3) with canonical NBS (NB-ARC) domain HMM profiles (e.g., PF00931) to search the proteome (hmmsearch --domtblout output.txt profile.hmm proteome.faa).
- Set an E-value cutoff (e.g., 1e-5) and manually curate hits.
Motif and Architecture Validation:
- Confirm identified sequences using NCBI CD-Search or InterProScan to identify integrated domains (TIR, CC, LRR).
- Classify genes into TNL, CNL, RNL, and others (NBS-only, etc.).
Manual Curation: Examine gene models using alignments to cDNA/EST evidence and homologous genes; correct obvious mis-annotations.

Protocol 2: Phylogenetic and Evolutionary Dynamics Analysis

Alignment: Perform multiple sequence alignment of NBS domains using MAFFT or Clustal Omega.
Phylogeny Construction: Construct a maximum-likelihood tree using IQ-TREE (model selected by ModelFinder) with 1000 bootstrap replicates.
Synteny and Cluster Analysis:
- Use MCScanX to analyze intra-genomic synteny. Identify tandem arrays (genes separated by ≤1 intervening gene) and segmentally duplicated blocks.
- Visualize using Circos or TBtools.
Selection Pressure Analysis:
- Extract orthologous gene pairs between S. miltiorrhiza and a related species (e.g., S. splendens).
- Calculate non-synonymous (Ka) and synonymous (Ks) substitution rates using PAML's yn00 or KaKs_Calculator.
- A Ka/Ks ratio >1 indicates positive selection; <1 indicates purifying selection.

Visualization of Analysis Workflow and Gene Evolution

Diagram 1: NBS-LRR Identification & Evolutionary Analysis Pipeline

Diagram 2: NBS-LRR Gene Birth-and-Death Evolution Model

The Scientist's Toolkit: Key Research Reagent Solutions

Table 2: Essential Reagents and Resources for SmNBS-LRR Research

Item Name / Category	Function / Application	Example Product/Source
High-Quality Genomic DNA Kit	Extraction of ultra-pure, high-molecular-weight DNA for genome sequencing and PCR.	DNeasy Plant Pro Kit (Qiagen), CTAB method reagents.
Plant RNA Preservation & Extraction Kit	Stabilization and isolation of intact RNA for expression (qRT-PCR) and transcriptome analysis.	RNAlater, RNeasy Plant Mini Kit (Qiagen).
HMMER Software Suite	Bioinformatics tool for identifying protein domains using hidden Markov models.	http://hmmer.org/
InterProScan / NCBI CDD	Integrated database for protein domain, family, and functional site prediction.	https://www.ebi.ac.uk/interpro/, https://www.ncbi.nlm.nih.gov/cdd/
Phylogeny Analysis Tools	Software for constructing and visualizing evolutionary trees.	IQ-TREE, MEGA, FigTree.
Synteny Analysis Tool	Identification and visualization of conserved gene blocks across genomes.	MCScanX, TBtools.
Positive Control NBS-LRR cDNA	Cloned, sequenced SmNBS-LRR gene for qRT-PCR assay validation.	Clone from S. miltiorrhiza cultivar cDNA or obtain from a repository.
Pathogen/Elicitor Preparations	To challenge plants and study NBS-LRR gene induction and function.	Fusarium spp. spores, yeast elicitor (chitin), salicylic acid.
*Agroinfiltration Kit (for N. benthamiana)*	For transient overexpression or silencing (VIGS) to assay gene function.	Agrobacterium tumefaciens GV3101, syringe infiltrators.

This guide explores the strategic application of ortholog-based functional prediction to prioritize and design targeted functional studies, framed within a doctoral thesis focused on the genome-wide identification and characterization of the Nucleotide-Binding Site-Leucine-Rich Repeat (NBS-LRR) gene family in the medicinal plant Salvia miltiorrhiza (Danshen). As the cornerstone of plant innate immunity, NBS-LRR genes are prime targets for understanding defense mechanisms and enhancing medicinal compound production. This whitepaper provides a technical framework for leveraging established model plant data to accelerate functional discovery in non-model species like S. miltiorrhiza.

Orthology Inference and Functional Transfer Principles

Functional prediction relies on the principle that orthologs—genes diverged after a speciation event—are more likely to retain conserved ancestral function than paralogs. For S. miltiorrhiza NBS-LRRs, key model plant orthologs are identified in Arabidopsis thaliana, Nicotiana benthamiana, and Solanum lycopersicum, which have extensive, experimentally validated immune gene databases.

Key Databases and Tools for Ortholog Discovery

Tool/Database	Primary Use	Relevance to NBS-LRR Study
OrthoFinder	Clusters orthologous groups across species	Identifies NBS-LRR gene families shared between S. miltiorrhiza and models
Ensembl Plants	Genomic comparative platform	Retrieves pre-computed orthologs/paralogs for candidate genes
PLAZA	Integrative plant comparative genomics platform	Analyzes phylogenetic distribution and functional annotations
PhytoMine	Interrogates multiple plant genomes	Extracts gene ontology (GO) terms for ortholog clusters
BLASTP/DIAMOND	Sequence similarity search	Initial identification of putative NBS-LRR orthologs

Quantitative Data from Comparative Analysis

Data derived from a genome-wide scan of the S. miltiorrhiza genome (v2.0) compared to reference models.

Table 1: NBS-LRR Gene Count and Ortholog Distribution

Species	Total NBS-LRR Genes	Genes with Ortholog in A. thaliana	Genes with Ortholog in N. benthamiana	Genes with Ortholog in S. lycopersicum
Salvia miltiorrhiza	121	67 (55.4%)	89 (73.6%)	82 (67.8%)
Arabidopsis thaliana	165	—	102 (61.8%)	98 (59.4%)
Nicotiana benthamiana	450	102 (22.7%)	—	320 (71.1%)
Solanum lycopersicum	355	98 (27.6%)	320 (90.1%)	—

Table 2: Enriched Functional Terms Among Conserved Ortholog Groups

Ortholog Cluster ID	Representative S. miltiorrhiza Gene	Conserved GO Term (Biological Process)	Model Plant Ortholog (Gene ID)	Known Function in Model
NBSOC07	SmiNBS017	GO:0009617 – response to bacterium	AT4G19050 (RPS2)	Recognizes Pseudomonas AvrRpt2
NBSOC12	SmiNBS042	GO:0009620 – response to fungus	AT4G26090 (RPP13)	Recognizes downy mildew effector
NBSOC25	SmiNBS088	GO:0006952 – defense response	NbTab2-like (Niben101Scf09881)	Mediates cell death signaling

Experimental Protocol: Ortholog-Guided Functional Assay

This protocol outlines a transient expression assay to validate the predicted function of a candidate S. miltiorrhiza NBS-LRR gene (SmiNBS017) based on its orthology to A. thaliana RPS2.

Materials and Reagents

Plant Material: N. benthamiana plants (4-5 weeks old), Agrobacterium tumefaciens strain GV3101.
Constructs: pCAMBIA1302-SmiNBS017-GFP (test), pCAMBIA1302-ATRPS2-GFP (positive control), pCAMBIA1302-GFP (negative control), pBin61-AvrRpt2 (effector).
Solutions: Infiltration buffer (10 mM MES, 10 mM MgCl2, 150 µM acetosyringone, pH 5.6), antibiotics, LB media.

Method

Agrobacterium Preparation: Transform A. tumefaciens with respective constructs. Select single colonies and grow overnight in LB with appropriate antibiotics at 28°C.
Culture Induction: Pellet cultures and resuspend in infiltration buffer to an OD600 of 0.5 for each construct. Incubate at room temperature for 3 hours.
Co-infiltration: Mix bacterial suspensions as follows:
- Test: Agrobacterium carrying SmiNBS017-GFP + Agrobacterium carrying AvrRpt2.
- Positive Control: ATRPS2-GFP + AvrRpt2.
- Negative Control 1: SmiNBS017-GFP + empty vector.
- Negative Control 2: GFP + AvrRpt2.
Infiltrate mixes into abaxial sides of N. benthamiana leaves using a needleless syringe. Mark infiltration zones.
Phenotypic Monitoring: Observe infiltrated areas over 3-7 days for hypersensitive response (HR) cell death, characterized by tissue collapse and bleaching.
Confirmation: Document HR phenotype with photography. Conduct ion leakage assays or trypan blue staining for cell death quantification. Detect protein localization via confocal microscopy (GFP signal).

The Scientist's Toolkit: Research Reagent Solutions

Item	Function in NBS-LRR Study
pCAMBIA1302-GFP Vector	A plant binary vector for constitutive expression (CaMV 35S promoter) and C-terminal GFP fusion, enabling protein localization and tracking.
Agrobacterium strain GV3101	A disarmed strain optimized for high-efficiency transient transformation in Nicotiana (agroinfiltration).
Acetosyringone	A phenolic compound that induces the Agrobacterium Vir genes, essential for T-DNA transfer during agroinfiltration.
Trypan Blue Stain	A vital dye that selectively stains dead plant cells, providing visual confirmation of the hypersensitive response (HR).
Gateway Cloning System	A rapid, high-throughput recombination-based system for transferring NBS-LRR ORFs into multiple expression vectors.
Phusion High-Fidelity DNA Polymerase	Used for error-free PCR amplification of full-length NBS-LRR coding sequences, which are often large and GC-rich.
Anti-GFP Antibody (HRP-conjugated)	Allows for immunoblot analysis to confirm expression levels of GFP-tagged NBS-LRR proteins in plant tissues.

Visualization of Workflows and Pathways

Ortholog-Based Functional Prediction and Validation Workflow

NBS-LRR Mediated Immune Signaling Pathway

Integrating ortholog-based predictions provides a powerful, data-driven strategy to navigate the functional complexity of the S. miltiorrhiza NBS-LRR family. This approach directly informs the targeted selection of candidate genes for experimental validation in the thesis research, moving beyond descriptive genomics to hypothesis-driven functional characterization. The conserved immune functions predicted and validated through these methods can ultimately be linked to variations in disease resistance and metabolic profiles in Danshen, bridging plant immunity and medicinal chemistry for drug development professionals.

Comparative Assessment of NBS-LRR Family Size and Diversity Across Medicinal Plants

This analysis is framed within a broader doctoral thesis focused on the genome-wide identification and characterization of the Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) gene family in the medicinal plant Salvia miltiorzhiza (Danshen). The thesis posits that the expansion and diversification of NBS-LRR genes are correlated with the ecological adaptability and biotic stress resilience of medicinal plants, influencing their metabolic vigor. This guide provides a comparative framework to assess this gene family across key medicinal species, contextualizing S. miltiorrhiza findings within a wider phylogenetic landscape.

NBS-LRR genes are the largest class of plant disease resistance (R) genes. They encode intracellular immune receptors that recognize pathogen effectors and trigger a robust defense response, often culminating in the hypersensitive response (HR). They are categorized into two major subfamilies based on N-terminal domains: TIR-NBS-LRR (TNL) and CC-NBS-LRR (CNL).

Comparative Genomic Analysis: Family Size and Diversity

A live search of recent genome databases and literature (2023-2024) reveals significant variation in NBS-LRR family size across medicinal plants. The data is summarized in Table 1.

Table 1: NBS-LRR Gene Family Size Across Selected Medicinal Plants

Plant Species	Common Name	Approx. NBS-LRR Count	TNL:CNL Ratio	Genome Size (Gb)	Key Reference (Recent)
Salvia miltiorrhiza	Danshen	~120	1:3.5	~0.64	Zhang et al., 2023
Panax ginseng	Ginseng	~450	1:1.8	~3.5	Chen et al., 2022
Artemisia annua	Sweet Wormwood	~85	1:4	~1.8	Wang et al., 2023
Camellia sinensis	Tea Plant	~180	1:2.2	~3.0	Liu et al., 2024
Catharanthus roseus	Madagascar Periwinkle	~70	1:5	~0.5	Zhou et al., 2023
Glycyrrhiza uralensis	Licorice	~150	1:2.5	~0.38	Li et al., 2023

Key Insight: Polyploid species (P. ginseng) show massive family expansion. S. miltiorrhiza exhibits a moderate family size but a notable bias towards the CNL subfamily, suggesting specific evolutionary paths in its immune system.

Core Experimental Protocols for Genome-Wide Identification

The following integrated pipeline is standard for NBS-LRR identification, as applied in the foundational S. miltiorrhiza thesis research.

4.1. Data Acquisition and Pre-processing

Source: Obtain the whole-genome sequence (WGS) and protein sequence file of the target plant from databases (NCBI, Phytozome, CNGB).
Software: Use BLAST+ (v2.13+) and HMMER (v3.3.2).

4.2. Homology-Based Identification

Step 1: Build a local protein database from the WGS. Use a well-curated set of known NBS-LRR protein sequences (e.g., from Arabidopsis, rice) as query in a tBLASTn search (E-value < 1e-5).
Step 2: Extract candidate genomic sequences and predict open reading frames (ORFs) using getorf (EMBOSS).

4.3. HMM Domain Scanning

Step 1: Download Pfam HMM profiles for NB-ARC (PF00931) and LRR (PF00560, PF07723, etc.).
Step 2: Perform HMMER search (hmmsearch) against the predicted protein dataset with the NB-ARC domain (E-value < 1e-5). Retain hits.
Step 3: Manually verify the presence of both NB-ARC and LRR domains in candidates using Pfam Scan or InterProScan.

4.4. Phylogenetic and Motif Analysis

Step 1: Align full-length protein sequences of identified NBS-LRRs using MAFFT or ClustalW.
Step 2: Construct a phylogenetic tree with MEGA11 (Neighbor-Joining/Maximum Likelihood, bootstrap 1000).
Step 3: Classify sequences into TNL/CNL clades based on tree topology and presence of TIR or Coiled-Coil domains detected by MEME suite (motif analysis) or COILS.

4.5. Chromosomal Localization and Duplication Analysis

Step 1: Map gene locations to chromosomes using BLASTn against genome scaffolds. Visualize with TBtools.
Step 2: Identify gene duplication events: Tandem duplicates (genes separated by ≤5 intervening genes) and segmental duplicates (using MCScanX algorithm).

Visualization of Core Concepts

Diagram 1: NBS-LRR Mediated Immune Pathway (76 chars)

Diagram 2: Genome-Wide NBS-LRR Identification Pipeline (78 chars)

The Scientist's Toolkit: Essential Research Reagents & Materials

Table 2: Key Reagent Solutions for NBS-LRR Functional Studies

Reagent/Material	Function/Application	Example/Notes
Phusion High-Fidelity DNA Polymerase	Amplification of full-length NBS-LRR genes for cloning.	Essential for error-free PCR of large, GC-rich sequences.
Gateway or Golden Gate Cloning System	Modular construction of expression vectors for functional assays.	Enables high-throughput subcloning of multiple gene variants.
Agrobacterium tumefaciens Strain GV3101	Transient expression in Nicotiana benthamiana (effectoromics).	Used for hypersensitive response (HR) assays via agroinfiltration.
Luciferase (LUC) Reporter Constructs	Quantification of immune signaling output (e.g., downstream PR gene activation).	Firefly luciferase under control of a pathogen-responsive promoter.
Anti-GFP/HA/Flag Antibodies	Detection of tagged NBS-LRR protein expression, localization, and complex formation.	For Western blot, co-immunoprecipitation (Co-IP), and microscopy.
Programmed Cell Death (PCD) Assay Kits	Quantitative measurement of HR-induced cell death.	Includes electrolytes leakage (conductivity) or Evans Blue staining kits.
CRISPR/Cas9 Gene Editing System	Generation of NBS-LRR knockout mutants to confirm in planta function.	Requires specific sgRNA design tools and plant transformation expertise.

Conclusion

This genome-wide identification and analysis of the NBS-LRR gene family in Salvia miltiorrhiza provides a pivotal resource for the research community. We have established a foundational catalog of putative disease resistance genes, detailed robust methodological frameworks for their study, addressed key analytical challenges, and positioned these findings within a broader evolutionary context through comparative genomics. The integration of expression data suggests a potential intersection between plant defense signaling and the regulation of valuable secondary metabolite pathways, opening a promising avenue for research. Future directions should prioritize functional validation through techniques like VIGS or CRISPR-Cas9, deeper investigation into the signaling networks linking immunity and metabolism, and the application of this knowledge in developing elite, disease-resistant S. miltiorrhiza cultivars. Ultimately, this work not only advances our understanding of plant immunity in a key medicinal species but also contributes strategically to ensuring the sustainable and high-quality production of its clinically important bioactive compounds for biomedical applications.