Template Configuration Viewer
View and understand all template configurations for phenotype prediction
Purpose
Templates define how MicrobeLLM interacts with language models to extract bacterial phenotype and knowledge information. Each template consists of:
- System Template: Sets the AI assistant's role and instructions
- User Template: Defines the query format with placeholders for species names
- Validation Config: Specifies expected response format and normalization rules
Key Insight: Different templates can yield different results from the same model. Use this viewer to understand each template's design and purpose.
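Each template is stored as three files. The file paths listed for each template below all follow one layout: `templates/system/<name>.txt`, `templates/user/<name>.txt`, and `templates/validation/<name>.json`. The following is a minimal Python sketch of loading them together. It assumes only that layout. The `Template` dataclass and `load_template` function are hypothetical names for illustration and are not part of MicrobeLLM.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Template:
    """Hypothetical container for the three files that make up one template."""
    name: str
    system: str       # contents of templates/system/<name>.txt
    user: str         # contents of templates/user/<name>.txt
    validation: dict  # parsed templates/validation/<name>.json


def load_template(root: Path, name: str) -> Template:
    """Read the system, user and validation files for `name` under `root`.

    The directory layout is inferred from the file paths shown on this page.
    """
    root = Path(root)
    return Template(
        name=name,
        system=(root / "system" / f"{name}.txt").read_text(encoding="utf-8"),
        user=(root / "user" / f"{name}.txt").read_text(encoding="utf-8"),
        validation=json.loads(
            (root / "validation" / f"{name}.json").read_text(encoding="utf-8")
        ),
    )
```

For example, `load_template(Path("templates"), "template1_knowlege")` would read the first template below. Note that the on-disk name uses the spelling `knowlege`.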
Template Types:
template1_knowlege Templates
System Template
Defines the assistant's role and instructions
Classify the knowledge level for the binomial species name:
- limited: Minimal to basic information available, challenging to make accurate predictions
- moderate: Moderate information available, including some phenotypic, morphological, genetic, or physiological characteristics
- extensive: Wealth of comprehensive information available, enabling highly accurate predictions and assessment
User Template
Defines the user's query format with placeholders
Respond with a JSON object for {binomial_name} with the knowledge level category in lowercase in this format:
{
"knowledge_group": "<limited|moderate|extensive>"
}
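The user template mixes one placeholder, `{binomial_name}`, with literal JSON braces. Python's `str.format()` would therefore raise an error on this text, because it parses every `{...}` as a replacement field. The following is a minimal sketch that substitutes only the exact placeholder token. The function name `fill_user_template` is hypothetical, and we are assuming that MicrobeLLM fills the placeholder by plain text substitution.

```python
USER_TEMPLATE = """Respond with a JSON object for {binomial_name} with the knowledge level category in lowercase in this format:
{
"knowledge_group": "<limited|moderate|extensive>"
}"""


def fill_user_template(template: str, binomial_name: str) -> str:
    # Replace only the exact placeholder token so the literal JSON braces
    # in the example response format are left untouched.
    return template.replace("{binomial_name}", binomial_name)


prompt = fill_user_template(USER_TEMPLATE, "Escherichia coli")
print(prompt)
```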
Validation Config
Defines expected response structure and validation rules
{
"template_info": {
"name": "template1_knowledge",
"type": "knowledge",
"description": "Basic knowledge level assessment template (limited, moderate, extensive)",
"version": "1.0",
"purpose": "This template evaluates the breadth of scientific knowledge available for bacterial species. It asks the LLM to categorize organisms into three knowledge levels based on how much research, literature, and data exists about them.",
"usage_context": {
"when_to_use": "Use this template when you need to assess which organisms are well-studied versus poorly understood in the scientific literature.",
"typical_workflow": "The template is typically used as a first-pass filter to identify organisms that warrant deeper investigation or to understand research gaps in microbiology."
},
"interpretation_guide": {
"limited": "Organisms with minimal scientific literature, often newly discovered or understudied species. These may have basic taxonomic information but lack detailed phenotypic or genomic characterization.",
"moderate": "Organisms with a reasonable body of research including some genomic data, basic phenotypic characterization, and presence in multiple studies. Not model organisms but reasonably well-documented.",
"extensive": "Well-studied model organisms or pathogens with comprehensive literature, complete genomes, extensive phenotypic data, and often used in research. Examples include E. coli, B. subtilis, or major pathogens."
},
"quality_indicators": {
"high_quality_response": "The model provides a clear categorization with implicit reasoning based on actual knowledge availability",
"low_quality_response": "The model fails to categorize, provides 'NA', or shows no correlation with actual research availability"
}
},
"expected_response": {
"format": "json",
"required_fields": [
"knowledge_group"
],
"optional_fields": []
},
"field_definitions": {
"knowledge_group": {
"type": "string",
"required": true,
"description": "Knowledge level category for the organism",
"allowed_values": [
"limited",
"moderate",
"extensive"
],
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"limited": ["limited", "minimal", "basic", "low", "little", "poor"],
"moderate": ["moderate", "medium", "intermediate", "fair", "some"],
"extensive": ["extensive", "comprehensive", "detailed", "high", "full", "complete", "thorough"]
}
},
"validation_error_messages": {
"missing": "Required field 'knowledge_group' is missing from response",
"invalid_value": "Invalid knowledge level. Expected one of: limited, moderate, extensive",
"wrong_type": "Field 'knowledge_group' must be a string"
}
}
},
"parsing_instructions": {
"json_extraction": {
"method": "regex",
"pattern": "\\{.*\\}",
"flags": ["DOTALL"]
},
"fallback_parsing": {
"enabled": true,
"method": "keyword_search",
"keywords": ["knowledge_group", "knowledge level", "level"]
}
},
"success_criteria": {
"minimum_required_fields": 1,
"require_all_mandatory": true,
"allow_extra_fields": false
},
"error_handling": {
"on_parse_failure": "return_null",
"on_validation_failure": "return_errors",
"on_missing_required": "return_errors"
}
}
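The `parsing_instructions` above describe two steps. The first is a greedy regex extraction (`\{.*\}` with `DOTALL`). If that fails, a keyword fallback runs. The following is a minimal Python sketch of that two-step flow. In the fallback, we are assuming that the value appears as `keyword: value`, `keyword = value`, or `keyword is value`. That capturing regex and the function name `extract_knowledge` are hypothetical and are not MicrobeLLM's actual parser.

```python
import json
import re
from typing import Optional

# "json_extraction": greedy match from the first "{" to the last "}";
# DOTALL lets the match span newlines.
JSON_PATTERN = re.compile(r"\{.*\}", re.DOTALL)

# "fallback_parsing" keywords from the config; the value-capturing regex built
# around them below is an assumption, not the project's actual rule.
FALLBACK_KEYWORDS = ["knowledge_group", "knowledge level", "level"]


def extract_knowledge(response: str) -> Optional[dict]:
    """Return a dict holding the raw knowledge_group value, or None on failure."""
    match = JSON_PATTERN.search(response)
    if match:
        try:
            data = json.loads(match.group(0))
            if isinstance(data, dict):
                return data
        except json.JSONDecodeError:
            pass  # malformed JSON: fall through to the keyword search
    for keyword in FALLBACK_KEYWORDS:
        # e.g. 'knowledge level is moderate' or '"knowledge_group": "limited"'
        found = re.search(
            re.escape(keyword) + r'"?\s*(?:[:=]|is)\s*"?([A-Za-z/]+)',
            response,
            re.IGNORECASE,
        )
        if found:
            return {"knowledge_group": found.group(1)}
    return None  # "on_parse_failure": "return_null"
```

Because the regex is greedy, a response containing two separate JSON objects produces an unparseable match. This sketch then falls back to the keyword search.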
About This Template
This template evaluates the breadth of scientific knowledge available for bacterial species. It asks the LLM to categorize organisms into three knowledge levels based on how much research, literature, and data exists about them.
Usage Context
When to use: Use this template when you need to assess which organisms are well-studied versus poorly understood in the scientific literature.
Typical workflow: The template is typically used as a first-pass filter to identify organisms that warrant deeper investigation or to understand research gaps in microbiology.
Template Configuration Files
Template Information
- System template file: /net/llm-bioeval/demo/llm-bioeval/templates/system/template1_knowlege.txt
- User template file: /net/llm-bioeval/demo/llm-bioeval/templates/user/template1_knowlege.txt
- Validation config file: /net/llm-bioeval/demo/llm-bioeval/templates/validation/template1_knowlege.json
- Template type: Knowledge
- Character count: System: 391, User: 172, Validation: 3511
Usage Notes
- The system template sets the context and instructions for the AI model
- The user template contains placeholders such as {binomial_name} that are replaced with actual values
- The validation config defines the expected response structure and automatically normalizes LLM outputs
- All three files work together to ensure consistent, validated results from the language model
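The normalization step can be read directly off the `validation_rules` in the config above. The rules are `case_sensitive: false` and `trim_whitespace: true`, and the `normalize_mapping` maps synonyms to canonical values. The following is a minimal sketch of those rules. It assumes exact-match lookup after trimming and lowercasing; the real normalizer may also do substring or fuzzy matching. The helper names are hypothetical.

```python
from typing import Dict, List, Optional

# normalize_mapping copied from the template1_knowledge validation config.
NORMALIZE_MAPPING: Dict[str, List[str]] = {
    "limited": ["limited", "minimal", "basic", "low", "little", "poor"],
    "moderate": ["moderate", "medium", "intermediate", "fair", "some"],
    "extensive": ["extensive", "comprehensive", "detailed", "high", "full", "complete", "thorough"],
}


def build_lookup(mapping: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert canonical -> synonyms into a lowercase synonym -> canonical table."""
    return {syn.lower(): canonical for canonical, syns in mapping.items() for syn in syns}


def normalize(value: object, lookup: Dict[str, str]) -> Optional[str]:
    """Trim and lowercase, then map to a canonical value.

    Returns None for non-strings ('wrong_type') and for values outside the
    mapping ('invalid_value').
    """
    if not isinstance(value, str):
        return None
    return lookup.get(value.strip().lower())


LOOKUP = build_lookup(NORMALIZE_MAPPING)
```

The lookup is keyed on lowercase synonyms, so the same code would also cover the `NA` entry of template2_knowledge. There, inputs such as `"n/a"` or `"Unknown"` map to the canonical uppercase `NA`.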
Validation Details
- Description: Basic knowledge level assessment template (limited, moderate, extensive)
template1_phenotype Templates
System Template
Defines the assistant's role and instructions
Given the binomial species name, predict the following phenotypic characteristics: gram staining, motility, aerophilicity, extreme environment tolerance, biofilm formation, animal pathogenicity, biosafety level, health association, host association, plant pathogenicity, spore formation, hemolysis, and cell shape. Provide the predictions in a structured JSON format, including only the most likely category for each characteristic, except for aerophilicity where multiple categories can be predicted.
Allowed categories:
- Gram Staining: gram stain negative, gram stain positive, gram stain variable
- Motility: TRUE, FALSE
- Aerophilicity: aerobic, aerotolerant, anaerobic, facultatively anaerobic
- Extreme Environment Tolerance: TRUE, FALSE
- Biofilm Formation: TRUE, FALSE
- Animal Pathogenicity: TRUE, FALSE
- Biosafety Level: biosafety level 1, biosafety level 2, biosafety level 3
- Health Association: TRUE, FALSE
- Host Association: TRUE, FALSE
- Plant Pathogenicity: TRUE, FALSE
- Spore Formation: TRUE, FALSE
- Hemolysis: alpha, beta, gamma, non-hemolytic
- Cell Shape: bacillus, coccus, spirillum, tail
Provide the predictions in a structured JSON format, including only the most likely category for each characteristic, except for aerophilicity where multiple categories can be predicted.
User Template
Defines the user's query format with placeholders
Respond with a JSON object for {binomial_name} in this format:
{
"gram_staining": "<gram stain negative|gram stain positive|gram stain variable>",
"motility": "<TRUE|FALSE>",
"aerophilicity": [
"<aerobic|aerotolerant|anaerobic|facultatively anaerobic>",
"<aerobic|aerotolerant|anaerobic|facultatively anaerobic>",
...
],
"extreme_environment_tolerance": "<TRUE|FALSE>",
"biofilm_formation": "<TRUE|FALSE>",
"animal_pathogenicity": "<TRUE|FALSE>",
"biosafety_level": "<biosafety level 1|biosafety level 2|biosafety level 3>",
"health_association": "<TRUE|FALSE>",
"host_association": "<TRUE|FALSE>",
"plant_pathogenicity": "<TRUE|FALSE>",
"spore_formation": "<TRUE|FALSE>",
"hemolysis": "<alpha|beta|gamma|non-hemolytic>",
"cell_shape": "<bacillus|coccus|spirillum|tail>"
}
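In the user template, `aerophilicity` is the only field requested as a list. This template's validation config declares it as `type: array` with `allow_single_value: true` and `max_values: 4`. The following is a minimal sketch of normalizing that field. It uses the synonyms from the config. Three behaviors are assumptions rather than documented rules: unrecognized entries are dropped silently, duplicates are removed, and an empty result becomes `None`. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Union

# Synonyms from the aerophilicity "normalize_mapping" in the validation config.
AEROPHILICITY_SYNONYMS: Dict[str, List[str]] = {
    "aerobic": ["aerobic", "aerobe", "oxygen-requiring"],
    "aerotolerant": ["aerotolerant", "aerotolerance"],
    "anaerobic": ["anaerobic", "anaerobe", "oxygen-free"],
    "facultatively anaerobic": ["facultatively anaerobic", "facultative anaerobic",
                                "facultative", "facultatively"],
}
LOOKUP = {s: canon for canon, syns in AEROPHILICITY_SYNONYMS.items() for s in syns}
MAX_VALUES = 4  # "max_values": 4


def normalize_aerophilicity(raw: Union[str, List[str], None]) -> Optional[List[str]]:
    """Return a de-duplicated list of canonical values, or None if nothing is usable."""
    if raw is None:
        return None
    items = [raw] if isinstance(raw, str) else raw  # "allow_single_value": true
    result: List[str] = []
    for item in items:
        if not isinstance(item, str):
            continue
        canon = LOOKUP.get(item.strip().lower())
        if canon is not None and canon not in result:
            result.append(canon)
    # With de-duplication there can be at most 4 values; the cap mirrors the config.
    return result[:MAX_VALUES] or None
```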
Validation Config
Defines expected response structure and validation rules
{
"template_info": {
"name": "template1_phenotype",
"type": "phenotype",
"description": "Comprehensive phenotype prediction template",
"version": "1.0",
"purpose": "This template extracts detailed phenotypic predictions for bacterial species across 13 different characteristics. It tests the model's ability to infer biological properties from species names and any embedded knowledge.",
"usage_context": {
"when_to_use": "Use this template when you need comprehensive phenotypic predictions including metabolic, pathogenic, and morphological characteristics.",
"typical_workflow": "This is the primary phenotype template, providing the most complete set of predictions. Results can be compared against known phenotypic data to evaluate model accuracy."
},
"interpretation_guide": {
"gram_staining": "Fundamental cell wall property: positive (thick peptidoglycan), negative (thin peptidoglycan with outer membrane), or variable",
"motility": "Whether the organism can move independently, typically via flagella or other mechanisms",
"aerophilicity": "Oxygen requirements - can be multiple values (e.g., facultatively anaerobic organisms)",
"extreme_environment_tolerance": "Ability to survive in harsh conditions (high/low pH, temperature extremes, high salt, etc.)",
"biosafety_level": "CDC/WHO classification based on pathogenic risk (BSL-1: minimal risk, BSL-2: moderate risk, BSL-3: serious risk)",
"pathogenicity": "Animal/plant pathogenicity indicates disease-causing potential in respective hosts"
},
"quality_indicators": {
"high_quality_response": "Predictions align with known biological constraints (e.g., obligate anaerobes shouldn't be aerobic), internally consistent responses",
"low_quality_response": "Biologically impossible combinations, missing critical fields for well-known organisms, or excessive uncertainty"
}
},
"expected_response": {
"format": "json",
"required_fields": [],
"optional_fields": [
"gram_staining", "motility", "aerophilicity", "extreme_environment_tolerance",
"biofilm_formation", "animal_pathogenicity", "biosafety_level",
"health_association", "host_association", "plant_pathogenicity",
"spore_formation", "hemolysis", "cell_shape"
]
},
"field_definitions": {
"gram_staining": {
"type": "string",
"required": false,
"description": "Gram staining result",
"allowed_values": ["gram stain positive", "gram stain negative", "gram stain variable"],
"visualization": {
"color_mapping": {
"gram stain positive": {"label": "Positive", "background": "#d4edda", "color": "#155724"},
"gram stain negative": {"label": "Negative", "background": "#f8d7da", "color": "#721c24"},
"gram stain variable": {"label": "Variable", "background": "#fff3cd", "color": "#856404"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"gram stain positive": ["gram stain positive", "gram positive", "gram+", "positive"],
"gram stain negative": ["gram stain negative", "gram negative", "gram-", "negative"],
"gram stain variable": ["gram stain variable", "variable"]
}
}
},
"motility": {
"type": "string",
"required": false,
"description": "Motility capability",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "motile", "positive", "1"],
"FALSE": ["false", "no", "non-motile", "nonmotile", "immobile", "negative", "0"]
}
}
},
"aerophilicity": {
"type": "array",
"required": false,
"description": "Oxygen requirements (can have multiple values)",
"allowed_values": ["aerobic", "aerotolerant", "anaerobic", "facultatively anaerobic"],
"visualization": {
"color_mapping": {
"aerobic": {"label": "Aerobic", "background": "#cce5ff", "color": "#004085"},
"anaerobic": {"label": "Anaerobic", "background": "#e2e3e5", "color": "#383d41"},
"facultatively anaerobic": {"label": "Facultative", "background": "#d1ecf1", "color": "#0c5460"},
"aerotolerant": {"label": "Aerotolerant", "background": "#e7e8ea", "color": "#495057"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"aerobic": ["aerobic", "aerobe", "oxygen-requiring"],
"aerotolerant": ["aerotolerant", "aerotolerance"],
"anaerobic": ["anaerobic", "anaerobe", "oxygen-free"],
"facultatively anaerobic": ["facultatively anaerobic", "facultative anaerobic", "facultative", "facultatively"]
},
"allow_single_value": true,
"max_values": 4
}
},
"extreme_environment_tolerance": {
"type": "string",
"required": false,
"description": "Tolerance to extreme environmental conditions",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "tolerant", "positive", "1"],
"FALSE": ["false", "no", "intolerant", "negative", "0"]
}
}
},
"biofilm_formation": {
"type": "string",
"required": false,
"description": "Ability to form biofilms",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "biofilm-forming", "positive", "1"],
"FALSE": ["false", "no", "non-biofilm-forming", "negative", "0"]
}
}
},
"animal_pathogenicity": {
"type": "string",
"required": false,
"description": "Pathogenic to animals",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "pathogenic", "positive", "1"],
"FALSE": ["false", "no", "non-pathogenic", "negative", "0"]
}
}
},
"biosafety_level": {
"type": "string",
"required": false,
"description": "Biosafety classification level",
"allowed_values": ["biosafety level 1", "biosafety level 2", "biosafety level 3"],
"visualization": {
"color_mapping": {
"biosafety level 1": {"label": "BSL-1", "background": "#d4edda", "color": "#155724"},
"biosafety level 2": {"label": "BSL-2", "background": "#fff3cd", "color": "#856404"},
"biosafety level 3": {"label": "BSL-3", "background": "#ffeaa7", "color": "#b8860b"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"biosafety level 1": ["biosafety level 1", "bsl-1", "bsl1", "level 1"],
"biosafety level 2": ["biosafety level 2", "bsl-2", "bsl2", "level 2"],
"biosafety level 3": ["biosafety level 3", "bsl-3", "bsl3", "level 3"]
}
}
},
"health_association": {
"type": "string",
"required": false,
"description": "Association with human health",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "health-associated", "positive", "1"],
"FALSE": ["false", "no", "not health-associated", "negative", "0"]
}
}
},
"host_association": {
"type": "string",
"required": false,
"description": "Association with a host organism",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "host-associated", "positive", "1"],
"FALSE": ["false", "no", "free-living", "negative", "0"]
}
}
},
"plant_pathogenicity": {
"type": "string",
"required": false,
"description": "Pathogenic to plants",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "phytopathogenic", "plant pathogenic", "positive", "1"],
"FALSE": ["false", "no", "non-phytopathogenic", "negative", "0"]
}
}
},
"spore_formation": {
"type": "string",
"required": false,
"description": "Ability to form spores",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "spore-forming", "sporulating", "positive", "1"],
"FALSE": ["false", "no", "non-spore-forming", "vegetative", "negative", "0"]
}
}
},
"hemolysis": {
"type": "string",
"required": false,
"description": "Hemolytic activity",
"allowed_values": ["alpha", "beta", "gamma", "non-hemolytic"],
"visualization": {
"color_mapping": {
"alpha": {"label": "Alpha", "background": "#cce5ff", "color": "#004085"},
"beta": {"label": "Beta", "background": "#f8d7da", "color": "#721c24"},
"gamma": {"label": "Gamma", "background": "#d1ecf1", "color": "#0c5460"},
"non-hemolytic": {"label": "Non-hemolytic", "background": "#e2e3e5", "color": "#6c757d"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"alpha": ["alpha", "α", "alpha-hemolytic"],
"beta": ["beta", "β", "beta-hemolytic"],
"gamma": ["gamma", "γ", "gamma-hemolytic"],
"non-hemolytic": ["non-hemolytic", "non", "none", "no hemolysis"]
}
}
},
"cell_shape": {
"type": "string",
"required": false,
"description": "Cellular morphology",
"allowed_values": ["bacillus", "coccus", "spirillum", "tail"],
"visualization": {
"color_mapping": {
"bacillus": {"label": "Rod", "background": "#f3e5f5", "color": "#4a148c"},
"coccus": {"label": "Spherical", "background": "#e1f5fe", "color": "#01579b"},
"spirillum": {"label": "Spiral", "background": "#fff8e1", "color": "#e65100"},
"tail": {"label": "Tail", "background": "#e8f5e8", "color": "#2e7d32"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"bacillus": ["bacillus", "rod", "rod-shaped", "bacilli"],
"coccus": ["coccus", "sphere", "spherical", "cocci"],
"spirillum": ["spirillum", "spiral", "helical", "spirilla"],
"tail": ["tail", "appendage", "flagellar"]
}
}
}
},
"parsing_instructions": {
"json_extraction": {
"method": "regex",
"pattern": "\\{.*\\}",
"flags": ["DOTALL"]
},
"fallback_parsing": {
"enabled": true,
"method": "line_based",
"keywords": [
"gram_staining", "motility", "aerophilicity", "extreme_environment_tolerance",
"biofilm_formation", "animal_pathogenicity", "biosafety_level",
"health_association", "host_association", "plant_pathogenicity",
"spore_formation", "hemolysis", "cell_shape"
]
}
},
"success_criteria": {
"minimum_required_fields": 0,
"require_all_mandatory": false,
"allow_extra_fields": true
},
"error_handling": {
"on_parse_failure": "return_null",
"on_validation_failure": "return_partial",
"on_missing_required": "return_errors"
}
}
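Unlike the knowledge template, this config names a `line_based` fallback, and its keywords are the 13 field names. The following is a minimal sketch of what line-based parsing could look like when a model answers with `field: value` lines instead of JSON. The accepted line shapes are hypothetical, and so are the regex and the function name `parse_lines`. These are assumptions, not the project's implementation.

```python
import re
from typing import Dict, List

# Keywords listed under "fallback_parsing" in the template1_phenotype config.
FIELDS: List[str] = [
    "gram_staining", "motility", "aerophilicity", "extreme_environment_tolerance",
    "biofilm_formation", "animal_pathogenicity", "biosafety_level",
    "health_association", "host_association", "plant_pathogenicity",
    "spore_formation", "hemolysis", "cell_shape",
]

# One "field: value" (or "field = value") pair per line, optionally preceded by
# bullets or quotes and followed by a trailing comma.
LINE_RE = re.compile(
    r'^[\s*"\'-]*(' + "|".join(FIELDS) + r')["\']?\s*[:=]\s*(.*?)[\s,]*$',
    re.IGNORECASE,
)


def parse_lines(response: str) -> Dict[str, object]:
    """Collect field/value pairs line by line when no valid JSON object is found."""
    parsed: Dict[str, object] = {}
    for line in response.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        field = m.group(1).lower()
        value = m.group(2).strip().strip("\"'")
        if field == "aerophilicity":
            # Accept "aerobic, anaerobic" as well as '["aerobic", "anaerobic"]'.
            values = [p.strip().strip("\"'") for p in value.strip("[]").split(",")]
            values = [v for v in values if v]
            if values:
                parsed[field] = values
        elif value:
            parsed[field] = value
    return parsed
```

The raw values returned here would still need the per-field `normalize_mapping` and `allowed_values` checks.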
About This Template
This template extracts detailed phenotypic predictions for bacterial species across 13 different characteristics. It tests the model's ability to infer biological properties from species names and any embedded knowledge.
Usage Context
When to use: Use this template when you need comprehensive phenotypic predictions including metabolic, pathogenic, and morphological characteristics.
Typical workflow: This is the primary phenotype template, providing the most complete set of predictions. Results can be compared against known phenotypic data to evaluate model accuracy.
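Comparing predictions against known phenotypic data reduces, at its simplest, to per-field accuracy. The following is a minimal sketch. It assumes that predictions and ground truth are already normalized dicts in the same species order. It also assumes two scoring rules: list-valued fields (`aerophilicity`) are compared as sets, and fields the model left out are skipped rather than counted as wrong. That second rule is a design choice that can inflate accuracy. Coverage, meaning how often each field is present, could be reported separately. The function names are hypothetical.

```python
from typing import Dict, List


def _as_list(v: object) -> List[object]:
    return v if isinstance(v, list) else [v]


def field_accuracy(predictions: List[Dict[str, object]],
                   truth: List[Dict[str, object]]) -> Dict[str, float]:
    """Per-field accuracy over species where both a prediction and a known value exist."""
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for pred, known in zip(predictions, truth):
        for field, expected in known.items():
            if field not in pred or expected is None:
                continue  # missing prediction or unknown ground truth: not scored
            got = pred[field]
            if isinstance(expected, list) or isinstance(got, list):
                ok = set(_as_list(got)) == set(_as_list(expected))  # e.g. aerophilicity
            else:
                ok = got == expected
            total[field] = total.get(field, 0) + 1
            correct[field] = correct.get(field, 0) + int(ok)
    return {f: correct[f] / total[f] for f in total}
```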
Template Configuration Files
Template Information
- System template file: /net/llm-bioeval/demo/llm-bioeval/templates/system/template1_phenotype.txt
- User template file: /net/llm-bioeval/demo/llm-bioeval/templates/user/template1_phenotype.txt
- Validation config file: /net/llm-bioeval/demo/llm-bioeval/templates/validation/template1_phenotype.json
- Template type: Phenotype
- Character count: System: 1304, User: 813, Validation: 14010
Usage Notes
- The system template sets the context and instructions for the AI model
- The user template contains placeholders such as {binomial_name} that are replaced with actual values
- The validation config defines the expected response structure and automatically normalizes LLM outputs
- All three files work together to ensure consistent, validated results from the language model
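In the knowledge templates, a single invalid value makes the whole response fail. This template is more lenient, with `minimum_required_fields: 0`, `allow_extra_fields: true`, and `on_validation_failure: "return_partial"`. The following is a minimal sketch of partial validation, using a subset of the `allowed_values` from the config. It assumes that "partial" means keeping the valid fields and reporting the invalid ones. It also assumes that unknown keys are ignored rather than flagged. Synonym mapping is omitted to keep the sketch short. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

# A subset of allowed_values from the template1_phenotype validation config.
ALLOWED: Dict[str, List[str]] = {
    "gram_staining": ["gram stain negative", "gram stain positive", "gram stain variable"],
    "motility": ["TRUE", "FALSE"],
    "hemolysis": ["alpha", "beta", "gamma", "non-hemolytic"],
}


def validate_partial(response: Dict[str, object]) -> Tuple[Dict[str, str], List[str]]:
    """Keep valid fields and collect error messages for invalid ones."""
    valid: Dict[str, str] = {}
    errors: List[str] = []
    for field, value in response.items():
        allowed = ALLOWED.get(field)
        if allowed is None:
            continue  # "allow_extra_fields": true, so unknown keys are not errors
        if not isinstance(value, str):
            errors.append(f"{field}: expected a string")
            continue
        # case_sensitive=false and trim_whitespace=true: compare case-insensitively
        # but store the canonical spelling from allowed_values.
        match = next((a for a in allowed if a.lower() == value.strip().lower()), None)
        if match is None:
            errors.append(f"{field}: {value!r} not in {allowed}")
        else:
            valid[field] = match
    return valid, errors
```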
Validation Details
- Description: Comprehensive phenotype prediction template
template2_knowlege Templates
System Template
Defines the assistant's role and instructions
Determine the knowledge level for the binomial strain name based on the extent and depth of available scientific literature and understanding:
- limited: Strains with minimal to basic information available, including newly discovered or poorly studied strains. These strains have limited data on their fundamental characteristics, making it challenging to make accurate predictions about their properties and behavior. The lack of extensive research hinders the ability to draw meaningful conclusions or make reliable assessments across various domains.
- moderate: Strains with a moderate amount of information available, including phenotypic, morphological, and some genetic or physiological characteristics. While these strains have been studied more comprehensively than those in the Limited category, the available data may still have some gaps in understanding their full metabolic functions, ecological roles, and potential applications in various contexts.
- extensive: Strains with a wealth of comprehensive information available, including extensive research on their phenotypic, morphological, genetic, physiological, and ecological characteristics. The in-depth knowledge available for these strains enables highly accurate predictions and assessments of their properties, behavior, and potential applications across various contexts. The scientific literature covers a wide range of aspects, providing a holistic understanding of these well-studied strains.
If the strain name is not a real or recognized bacterial strain, or if there is no information available to determine the knowledge level, respond with NA.
User Template
Defines the user's query format with placeholders
Respond with a JSON object for {binomial_name} with the knowledge level category in lowercase in this format:
{
"knowledge_group": "<limited|moderate|extensive|NA>"
}
Validation Config
Defines expected response structure and validation rules
{
"template_info": {
"name": "template2_knowledge",
"type": "knowledge",
"description": "Knowledge level assessment template with NA support",
"version": "1.0",
"purpose": "This template evaluates the scientific knowledge available for bacterial species with explicit support for 'NA' responses. It's designed to handle cases where LLMs cannot assess knowledge levels, which is particularly important for testing model calibration.",
"usage_context": {
"when_to_use": "Use this template when you want to allow models to explicitly state when they cannot determine the knowledge level, providing a more nuanced view of model confidence.",
"typical_workflow": "This template is useful for distinguishing between species the model believes are poorly studied versus species the model simply cannot assess."
},
"interpretation_guide": {
"limited": "Organisms with minimal scientific literature, often newly discovered or understudied species. These may have basic taxonomic information but lack detailed phenotypic or genomic characterization.",
"moderate": "Organisms with a reasonable body of research including some genomic data, basic phenotypic characterization, and presence in multiple studies. Not model organisms but reasonably well-documented.",
"extensive": "Well-studied model organisms or pathogens with comprehensive literature, complete genomes, extensive phenotypic data, and often used in research. Examples include E. coli, B. subtilis, or major pathogens.",
"NA": "The model cannot assess the knowledge level or is uncertain. This is a valuable response indicating model calibration and awareness of its limitations."
},
"quality_indicators": {
"high_quality_response": "The model appropriately uses 'NA' for uncertain cases while providing clear categorizations for well-known species",
"low_quality_response": "The model never uses 'NA' (overconfident) or uses 'NA' excessively (underconfident)"
}
},
"expected_response": {
"format": "json",
"required_fields": [
"knowledge_group"
],
"optional_fields": []
},
"field_definitions": {
"knowledge_group": {
"type": "string",
"required": true,
"description": "Knowledge level category for the organism",
"allowed_values": [
"limited",
"moderate",
"extensive",
"NA"
],
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"limited": ["limited", "minimal", "basic", "low", "little", "poor"],
"moderate": ["moderate", "medium", "intermediate", "fair", "some"],
"extensive": ["extensive", "comprehensive", "detailed", "high", "full", "complete", "thorough"],
"NA": ["na", "n/a", "n.a.", "not available", "not applicable", "unknown", "unavailable", "none", "null", "no data", "no information"]
}
},
"validation_error_messages": {
"missing": "Required field 'knowledge_group' is missing from response",
"invalid_value": "Invalid knowledge level. Expected one of: limited, moderate, extensive, NA",
"wrong_type": "Field 'knowledge_group' must be a string"
}
}
},
"parsing_instructions": {
"json_extraction": {
"method": "regex",
"pattern": "\\{.*\\}",
"flags": ["DOTALL"]
},
"fallback_parsing": {
"enabled": true,
"method": "keyword_search",
"keywords": ["knowledge_group", "knowledge level", "level"]
}
},
"success_criteria": {
"minimum_required_fields": 1,
"require_all_mandatory": true,
"allow_extra_fields": false
},
"error_handling": {
"on_parse_failure": "return_null",
"on_validation_failure": "return_errors",
"on_missing_required": "return_errors"
}
}
About This Template
This template evaluates the scientific knowledge available for bacterial species with explicit support for 'NA' responses. It's designed to handle cases where LLMs cannot assess knowledge levels, which is particularly important for testing model calibration.
Usage Context
When to use: Use this template when you want to allow models to explicitly state when they cannot determine the knowledge level, providing a more nuanced view of model confidence.
Typical workflow: This template is useful for distinguishing between species the model believes are poorly studied versus species the model simply cannot assess.
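The system prompt asks for `NA` when a name is not a recognized strain. This suggests a simple calibration check: mix real species names with invented decoy names and compare how often each group receives `NA`. Decoy names are an assumption here, since this page does not describe how calibration is actually measured. The following is a minimal sketch. The function name `na_rates` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def na_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of 'NA' answers for real species versus invented (decoy) names.

    `results` holds (normalized knowledge_group, is_real_species) pairs.
    A well-calibrated model should answer NA often for decoys and rarely
    for real species; NaN is returned for a group with no entries.
    """
    na = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for label, is_real in results:
        total[is_real] += 1
        na[is_real] += int(label == "NA")
    return {
        "na_rate_real": na[True] / total[True] if total[True] else float("nan"),
        "na_rate_decoy": na[False] / total[False] if total[False] else float("nan"),
    }
```

A real-species NA rate near zero combined with a low decoy NA rate would match the "overconfident" pattern in the quality indicators. High NA rates in both groups would match the "underconfident" one.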
Template Configuration Files
Template Information
- System template file: /net/llm-bioeval/demo/llm-bioeval/templates/system/template2_knowlege.txt
- User template file: /net/llm-bioeval/demo/llm-bioeval/templates/user/template2_knowlege.txt
- Validation config file: /net/llm-bioeval/demo/llm-bioeval/templates/validation/template2_knowlege.json
- Template type: Knowledge
- Character count: System: 1628, User: 171, Validation: 3863
Usage Notes
- The system template sets the context and instructions for the AI model
- The user template contains placeholders such as {binomial_name} that are replaced with actual values
- The validation config defines the expected response structure and automatically normalizes LLM outputs
- All three files work together to ensure consistent, validated results from the language model
Validation Details
- Description: Knowledge level assessment template with NA support
template2_phenotype Templates
System Template
Defines the assistant's role and instructions
Given the gene list of an organism, predict the following phenotypic characteristics: gram staining, motility, aerophilicity, extreme environment tolerance, biofilm formation, animal pathogenicity, biosafety level, health association, host association, plant pathogenicity, spore formation, hemolysis, and cell shape. Provide the predictions in a structured JSON format, including only the most likely category for each characteristic, except for aerophilicity where multiple categories can be predicted.
Allowed categories:
- Gram Staining: gram stain negative, gram stain positive, gram stain variable
- Motility: TRUE, FALSE
- Aerophilicity: aerobic, aerotolerant, anaerobic, facultatively anaerobic
- Extreme Environment Tolerance: TRUE, FALSE
- Biofilm Formation: TRUE, FALSE
- Animal Pathogenicity: TRUE, FALSE
- Biosafety Level: biosafety level 1, biosafety level 2, biosafety level 3
- Health Association: TRUE, FALSE
- Host Association: TRUE, FALSE
- Plant Pathogenicity: TRUE, FALSE
- Spore Formation: TRUE, FALSE
- Hemolysis: alpha, beta, gamma, non-hemolytic
- Cell Shape: bacillus, coccus, spirillum, tail
Provide the predictions in a structured JSON format, including only the most likely category for each characteristic, except for aerophilicity where multiple categories can be predicted.
User Template
Defines the user's query format with placeholders
Respond with a JSON object for {binomial_name} in this format:
{
"gram_staining": "<gram stain negative|gram stain positive|gram stain variable>",
"motility": "<TRUE|FALSE>",
"aerophilicity": [
"<aerobic|aerotolerant|anaerobic|facultatively anaerobic>",
"<aerobic|aerotolerant|anaerobic|facultatively anaerobic>",
...
],
"extreme_environment_tolerance": "<TRUE|FALSE>",
"biofilm_formation": "<TRUE|FALSE>",
"animal_pathogenicity": "<TRUE|FALSE>",
"biosafety_level": "<biosafety level 1|biosafety level 2|biosafety level 3>",
"health_association": "<TRUE|FALSE>",
"host_association": "<TRUE|FALSE>",
"plant_pathogenicity": "<TRUE|FALSE>",
"spore_formation": "<TRUE|FALSE>",
"hemolysis": "<alpha|beta|gamma|non-hemolytic>",
"cell_shape": "<bacillus|coccus|spirillum|tail>"
}
Validation Config
Defines expected response structure and validation rules
{
"template_info": {
"name": "template2_phenotype",
"type": "phenotype",
"description": "Alternative phenotype prediction template with gene-focused approach",
"version": "1.0",
"purpose": "This template provides an alternative approach to phenotype prediction, potentially using different prompt structures or emphasizing genetic/genomic aspects. It tests how different formulations affect prediction accuracy and completeness.",
"usage_context": {
"when_to_use": "Use this template to compare phenotype prediction consistency across different prompt formulations, or when you want to emphasize genetic/genomic information in predictions.",
"typical_workflow": "Often used alongside template1_phenotype to evaluate prompt sensitivity and identify which formulation yields more accurate or complete phenotypic predictions."
},
"interpretation_guide": {
"consistency_check": "Compare results with template1_phenotype to assess model reliability across different prompt formulations",
"genetic_emphasis": "This template may elicit responses that focus more on genetically-determined traits versus environmentally-influenced characteristics",
"validation_approach": "Cross-validate predictions from both phenotype templates against known bacterial databases"
},
"quality_indicators": {
"high_quality_response": "Biologically consistent predictions that align with known phenotypic constraints, minimal contradictions with template1 results",
"low_quality_response": "Frequent contradictions with template1, biologically impossible trait combinations, or significantly different response patterns without clear justification"
}
},
"expected_response": {
"format": "json",
"required_fields": [],
"optional_fields": [
"gram_staining", "motility", "aerophilicity", "extreme_environment_tolerance",
"biofilm_formation", "animal_pathogenicity", "biosafety_level",
"health_association", "host_association", "plant_pathogenicity",
"spore_formation", "hemolysis", "cell_shape"
]
},
"field_definitions": {
"gram_staining": {
"type": "string",
"required": false,
"description": "Gram staining result",
"allowed_values": ["gram stain negative", "gram stain positive", "gram stain variable"],
"visualization": {
"color_mapping": {
"gram stain positive": {"label": "Positive", "background": "#d4edda", "color": "#155724"},
"gram stain negative": {"label": "Negative", "background": "#f8d7da", "color": "#721c24"},
"gram stain variable": {"label": "Variable", "background": "#fff3cd", "color": "#856404"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"gram stain positive": ["gram stain positive", "gram positive", "gram+", "positive"],
"gram stain negative": ["gram stain negative", "gram negative", "gram-", "negative"],
"gram stain variable": ["gram stain variable", "variable"]
}
}
},
"motility": {
"type": "string",
"required": false,
"description": "Motility capability",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "motile", "positive", "1"],
"FALSE": ["false", "no", "non-motile", "nonmotile", "immobile", "negative", "0"]
}
}
},
"aerophilicity": {
"type": "array",
"required": false,
"description": "Oxygen requirements (can have multiple values)",
"allowed_values": ["aerobic", "aerotolerant", "anaerobic", "facultatively anaerobic"],
"visualization": {
"color_mapping": {
"aerobic": {"label": "Aerobic", "background": "#cce5ff", "color": "#004085"},
"anaerobic": {"label": "Anaerobic", "background": "#e2e3e5", "color": "#383d41"},
"facultatively anaerobic": {"label": "Facultative", "background": "#d1ecf1", "color": "#0c5460"},
"aerotolerant": {"label": "Aerotolerant", "background": "#e7e8ea", "color": "#495057"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"aerobic": ["aerobic", "aerobe", "oxygen-requiring"],
"aerotolerant": ["aerotolerant", "aerotolerance"],
"anaerobic": ["anaerobic", "anaerobe", "oxygen-free"],
"facultatively anaerobic": ["facultatively anaerobic", "facultative anaerobic", "facultative", "facultatively"]
},
"allow_single_value": true,
"max_values": 4
}
},
"extreme_environment_tolerance": {
"type": "string",
"required": false,
"description": "Tolerance to extreme environmental conditions",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "tolerant", "positive", "1"],
"FALSE": ["false", "no", "intolerant", "negative", "0"]
}
}
},
"biofilm_formation": {
"type": "string",
"required": false,
"description": "Ability to form biofilms",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "biofilm-forming", "positive", "1"],
"FALSE": ["false", "no", "non-biofilm-forming", "negative", "0"]
}
}
},
"animal_pathogenicity": {
"type": "string",
"required": false,
"description": "Pathogenic to animals",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "pathogenic", "positive", "1"],
"FALSE": ["false", "no", "non-pathogenic", "negative", "0"]
}
}
},
"biosafety_level": {
"type": "string",
"required": false,
"description": "Biosafety classification level",
"allowed_values": ["biosafety level 1", "biosafety level 2", "biosafety level 3"],
"visualization": {
"color_mapping": {
"biosafety level 1": {"label": "BSL-1", "background": "#d4edda", "color": "#155724"},
"biosafety level 2": {"label": "BSL-2", "background": "#fff3cd", "color": "#856404"},
"biosafety level 3": {"label": "BSL-3", "background": "#ffeaa7", "color": "#b8860b"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"biosafety level 1": ["biosafety level 1", "bsl-1", "bsl1", "level 1"],
"biosafety level 2": ["biosafety level 2", "bsl-2", "bsl2", "level 2"],
"biosafety level 3": ["biosafety level 3", "bsl-3", "bsl3", "level 3"]
}
}
},
"health_association": {
"type": "string",
"required": false,
"description": "Association with human health",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "health-associated", "positive", "1"],
"FALSE": ["false", "no", "not health-associated", "negative", "0"]
}
}
},
"host_association": {
"type": "string",
"required": false,
"description": "Association with a host organism",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "host-associated", "positive", "1"],
"FALSE": ["false", "no", "free-living", "negative", "0"]
}
}
},
"plant_pathogenicity": {
"type": "string",
"required": false,
"description": "Pathogenic to plants",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "phytopathogenic", "plant pathogenic", "positive", "1"],
"FALSE": ["false", "no", "non-phytopathogenic", "negative", "0"]
}
}
},
"spore_formation": {
"type": "string",
"required": false,
"description": "Ability to form spores",
"allowed_values": ["TRUE", "FALSE"],
"visualization": {
"color_mapping": {
"TRUE": {"label": "True", "background": "#d4edda", "color": "#155724"},
"FALSE": {"label": "False", "background": "#f8d7da", "color": "#721c24"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"TRUE": ["true", "yes", "spore-forming", "sporulating", "positive", "1"],
"FALSE": ["false", "no", "non-spore-forming", "vegetative", "negative", "0"]
}
}
},
"hemolysis": {
"type": "string",
"required": false,
"description": "Hemolytic activity",
"allowed_values": ["alpha", "beta", "gamma", "non-hemolytic"],
"visualization": {
"color_mapping": {
"alpha": {"label": "Alpha", "background": "#cce5ff", "color": "#004085"},
"beta": {"label": "Beta", "background": "#f8d7da", "color": "#721c24"},
"gamma": {"label": "Gamma", "background": "#d1ecf1", "color": "#0c5460"},
"non-hemolytic": {"label": "Non-hemolytic", "background": "#e2e3e5", "color": "#6c757d"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"alpha": ["alpha", "α", "alpha-hemolytic"],
"beta": ["beta", "β", "beta-hemolytic"],
"gamma": ["gamma", "γ", "gamma-hemolytic"],
"non-hemolytic": ["non-hemolytic", "non", "none", "no hemolysis"]
}
}
},
"cell_shape": {
"type": "string",
"required": false,
"description": "Cellular morphology",
"allowed_values": ["bacillus", "coccus", "spirillum", "tail"],
"visualization": {
"color_mapping": {
"bacillus": {"label": "Rod", "background": "#f3e5f5", "color": "#4a148c"},
"coccus": {"label": "Spherical", "background": "#e1f5fe", "color": "#01579b"},
"spirillum": {"label": "Spiral", "background": "#fff8e1", "color": "#e65100"},
"tail": {"label": "Tail", "background": "#e8f5e8", "color": "#2e7d32"}
}
},
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"bacillus": ["bacillus", "rod", "rod-shaped", "bacilli"],
"coccus": ["coccus", "sphere", "spherical", "cocci"],
"spirillum": ["spirillum", "spiral", "helical", "spirilla"],
"tail": ["tail", "appendage", "flagellar"]
}
}
}
},
"parsing_instructions": {
"json_extraction": {
"method": "regex",
"pattern": "\\{.*\\}",
"flags": ["DOTALL"]
},
"fallback_parsing": {
"enabled": true,
"method": "line_based",
"keywords": [
"gram_staining", "motility", "aerophilicity", "extreme_environment_tolerance",
"biofilm_formation", "animal_pathogenicity", "biosafety_level",
"health_association", "host_association", "plant_pathogenicity",
"spore_formation", "hemolysis", "cell_shape"
]
}
},
"success_criteria": {
"minimum_required_fields": 0,
"require_all_mandatory": false,
"allow_extra_fields": true
},
"error_handling": {
"on_parse_failure": "return_null",
"on_validation_failure": "return_partial",
"on_missing_required": "return_errors"
}
}
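Below is a minimal Python sketch of how `normalize_mapping` could be applied. It rests on three assumptions:

- Matching is a case-insensitive, whitespace-trimmed lookup, following `case_sensitive: false` and `trim_whitespace: true`.
- `on_validation_failure: return_partial` means "keep the fields that normalize, drop the rest".
- A bare string is accepted for the `aerophilicity` array, following `allow_single_value: true`.

Only three of the thirteen field definitions are copied in. `normalize_value` and `normalize_response` are hypothetical names, and the real validator may behave differently.

```python
# Excerpt of "field_definitions" from the validation config above.
FIELD_RULES = {
    "gram_staining": {
        "type": "string",
        "normalize_mapping": {
            "gram stain positive": ["gram stain positive", "gram positive", "gram+", "positive"],
            "gram stain negative": ["gram stain negative", "gram negative", "gram-", "negative"],
            "gram stain variable": ["gram stain variable", "variable"],
        },
    },
    "motility": {
        "type": "string",
        "normalize_mapping": {
            "TRUE": ["true", "yes", "motile", "positive", "1"],
            "FALSE": ["false", "no", "non-motile", "nonmotile", "immobile", "negative", "0"],
        },
    },
    "aerophilicity": {
        "type": "array",
        "max_values": 4,
        "normalize_mapping": {
            "aerobic": ["aerobic", "aerobe", "oxygen-requiring"],
            "aerotolerant": ["aerotolerant", "aerotolerance"],
            "anaerobic": ["anaerobic", "anaerobe", "oxygen-free"],
            "facultatively anaerobic": ["facultatively anaerobic", "facultative anaerobic",
                                        "facultative", "facultatively"],
        },
    },
}


def normalize_value(raw, mapping):
    """Map one raw value to its canonical form, or None if nothing matches."""
    # str() also turns JSON booleans (True/False) into matchable text.
    key = str(raw).strip().lower()
    for canonical, synonyms in mapping.items():
        if key in (s.lower() for s in synonyms):
            return canonical
    return None


def normalize_response(response, rules=FIELD_RULES):
    """Keep only fields that normalize cleanly (assumed "return_partial" behavior)."""
    result = {}
    for field, rule in rules.items():
        if field not in response:
            continue
        mapping = rule["normalize_mapping"]
        if rule["type"] == "array":
            raw = response[field]
            values = raw if isinstance(raw, list) else [raw]  # allow_single_value
            normalized = []
            for value in values:
                canonical = normalize_value(value, mapping)
                if canonical is not None and canonical not in normalized:
                    normalized.append(canonical)
            if normalized:
                result[field] = normalized[: rule["max_values"]]
        else:
            canonical = normalize_value(response[field], mapping)
            if canonical is not None:
                result[field] = canonical
    return result


print(normalize_response({"gram_staining": " Gram Positive ", "motility": True,
                          "aerophilicity": "Facultative"}))
# {'gram_staining': 'gram stain positive', 'motility': 'TRUE', 'aerophilicity': ['facultatively anaerobic']}
```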
About This Template
This template provides an alternative approach to phenotype prediction, potentially using different prompt structures or emphasizing genetic/genomic aspects. It tests how different formulations affect prediction accuracy and completeness.
Usage Context
When to use: Use this template to compare phenotype prediction consistency across different prompt formulations, or when you want to emphasize genetic/genomic information in predictions.
Typical workflow: Often used alongside template1_phenotype to evaluate prompt sensitivity and identify which formulation yields more accurate or complete phenotypic predictions.
Template Configuration Files
Template Information
- System template file: /net/llm-bioeval/demo/llm-bioeval/templates/system/template2_phenotype.txt
- User template file: /net/llm-bioeval/demo/llm-bioeval/templates/user/template2_phenotype.txt
- Validation config file: /net/llm-bioeval/demo/llm-bioeval/templates/validation/template2_phenotype.json
- Template type: Phenotype
- Character count: System: 1307, User: 813, Validation: 13782
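The three files share one layout under a single `templates` directory, with `system/`, `user/`, and `validation/` subdirectories. The short Python sketch below derives the paths from a template name, assuming that layout holds for every template. The root is copied from the listing above; `template_paths` and `character_counts` are hypothetical helpers. Counting decoded characters with `len()` is only a guess at how the character counts shown here are computed.

```python
from pathlib import Path

# Root copied from the file listing above; adjust for your own checkout.
TEMPLATES_ROOT = Path("/net/llm-bioeval/demo/llm-bioeval/templates")


def template_paths(name: str, root: Path = TEMPLATES_ROOT) -> dict:
    """Return the system, user and validation file paths for a template name."""
    return {
        "system": root / "system" / f"{name}.txt",
        "user": root / "user" / f"{name}.txt",
        "validation": root / "validation" / f"{name}.json",
    }


def character_counts(paths: dict) -> dict:
    """Number of decoded characters in each file (assumed counting method)."""
    return {kind: len(path.read_text(encoding="utf-8")) for kind, path in paths.items()}


print(template_paths("template2_phenotype")["user"])
# /net/llm-bioeval/demo/llm-bioeval/templates/user/template2_phenotype.txt
```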
Usage Notes
- The system template sets the context and instructions for the AI model
- The user template contains placeholders like {binomial_name} that are replaced with actual values
- The validation config defines expected response structure and automatically normalizes LLM outputs
- All three files work together to ensure consistent, validated results from the language model
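The user templates contain literal JSON braces next to the `{binomial_name}` placeholder. Python's `str.format()` would treat those braces as replacement fields and raise an error, so a plain string replacement is safer. The sketch below uses an abridged copy of this template's user prompt. `fill_template` is a hypothetical helper, not necessarily how MicrobeLLM substitutes placeholders.

```python
# Abridged copy of this template's user prompt (two of the thirteen fields).
USER_TEMPLATE = (
    "Respond with a JSON object for {binomial_name} in this format:\n"
    "{\n"
    '"gram_staining": "<gram stain negative|gram stain positive|gram stain variable>",\n'
    '"motility": "<TRUE|FALSE>"\n'
    "}"
)


def fill_template(template: str, **values: str) -> str:
    """Substitute {name} placeholders, leaving all other braces untouched."""
    for name, value in values.items():
        template = template.replace("{" + name + "}", value)
    return template


prompt = fill_template(USER_TEMPLATE, binomial_name="Escherichia coli")
print(prompt.splitlines()[0])
# Respond with a JSON object for Escherichia coli in this format:
```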
Validation Details
- Description: Alternative phenotype prediction template with gene-focused approach
template3_knowlege Templates
System Template
Defines the assistant's role and instructions
Determine the knowledge level for the the binomial species name based on the extent of available data and research:
- limited: Species with minimal data and research, typically with few strains or subspecies (<5 strains, <2 subspecies), little genetic information (<10 scientific articles), no complete genome sequences, and limited presence in culture collections (absent or very few strains). This level indicates a lack of comprehensive studies, making it challenging to draw reliable conclusions about the species' characteristics and behavior. Examples of bacteria in this category might include newly discovered species or rare isolates, such as Chryseobacterium solincola or Bacillus eiseniae.
- moderate: Species with moderate data and research, with more strains or subspecies (5-10 strains, 2-4 subspecies), some genome sequencing (partial or one complete genome), moderate presence in culture collections, and a fair amount of scientific literature (10-50 articles). This level indicates a reasonable amount of study, but there might be gaps in understanding the full range of characteristics and applications. Examples of bacteria in this category could include species like Lactobacillus plantarum or Pseudomonas putida, which have been studied to some extent but may not have extensive research available.
- extensive: Species with comprehensive data and extensive research, having numerous strains or subspecies (>10 strains, >4 subspecies), multiple complete genome sequences, widespread presence in culture collections, and a wealth of scientific literature (>50 articles). This level indicates a vast amount of knowledge, allowing for highly accurate predictions and a thorough understanding of the species' characteristics and potential applications. Examples of bacteria in this category would include well-studied species such as Escherichia coli, Bacillus subtilis, or Streptococcus pneumoniae, which have been extensively researched and have a wealth of information available.
If the species name is not a real or recognized species, or if there is no information available to determine the knowledge level, respond with NA.
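The prompt defines each level through several numeric cut-offs for strains, subspecies, and articles. These cut-offs can be encoded to produce a rough reference label for comparison with model answers. This is a hypothetical sketch with three assumptions:

- The ranges are inclusive, so for example 10 and 50 articles both count as moderate.
- The genome and culture-collection criteria are ignored.
- The three buckets are combined by taking their median, a rule the prompt itself does not specify.

```python
LEVELS = ("limited", "moderate", "extensive")

# (low, high) bounds of the "moderate" range for each criterion, read from the prompt.
THRESHOLDS = {
    "strains": (5, 10),
    "subspecies": (2, 4),
    "articles": (10, 50),
}


def bucket(value: int, low: int, high: int) -> int:
    """0 = limited (< low), 1 = moderate (low..high inclusive), 2 = extensive (> high)."""
    if value < low:
        return 0
    if value <= high:
        return 1
    return 2


def reference_level(strains: int, subspecies: int, articles: int) -> str:
    """Median of the per-criterion buckets (a hypothetical combination rule)."""
    counts = {"strains": strains, "subspecies": subspecies, "articles": articles}
    buckets = sorted(bucket(counts[k], *THRESHOLDS[k]) for k in THRESHOLDS)
    return LEVELS[buckets[1]]


print(reference_level(strains=20, subspecies=1, articles=30))
# moderate
```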
User Template
Defines the user's query format with placeholders
Respond with a JSON object for {binomial_name} with the knowledge level category in lowercase in this format:
{
"knowledge_group": "<limited|moderate|extensive|NA>"
}
Validation Config
Defines expected response structure and validation rules
{
"template_info": {
"name": "template3_knowledge",
"type": "knowledge",
"description": "Alternative knowledge level assessment template with NA support",
"version": "1.0",
"purpose": "This template provides an alternative prompt structure for evaluating scientific knowledge about bacterial species. It includes explicit NA support to test how different prompt formulations affect model responses and calibration.",
"usage_context": {
"when_to_use": "Use this template to compare how different prompt structures influence knowledge assessment consistency across models.",
"typical_workflow": "Often used alongside template1 and template2 to evaluate prompt sensitivity and identify the most reliable formulation for knowledge assessment."
},
"interpretation_guide": {
"limited": "Organisms with minimal scientific literature, often newly discovered or understudied species. These may have basic taxonomic information but lack detailed phenotypic or genomic characterization.",
"moderate": "Organisms with a reasonable body of research including some genomic data, basic phenotypic characterization, and presence in multiple studies. Not model organisms but reasonably well-documented.",
"extensive": "Well-studied model organisms or pathogens with comprehensive literature, complete genomes, extensive phenotypic data, and often used in research. Examples include E. coli, B. subtilis, or major pathogens.",
"NA": "The model cannot assess the knowledge level or is uncertain. This response helps evaluate model calibration and self-awareness."
},
"quality_indicators": {
"high_quality_response": "Consistent categorizations across different prompt formulations, with appropriate use of NA for uncertain cases",
"low_quality_response": "Highly variable responses to different prompts for the same species, or inconsistent NA usage"
}
},
"expected_response": {
"format": "json",
"required_fields": [
"knowledge_group"
],
"optional_fields": []
},
"field_definitions": {
"knowledge_group": {
"type": "string",
"required": true,
"description": "Knowledge level category for the organism",
"allowed_values": [
"limited",
"moderate",
"extensive",
"NA"
],
"validation_rules": {
"case_sensitive": false,
"trim_whitespace": true,
"normalize_mapping": {
"limited": ["limited", "minimal", "basic", "low", "little", "poor"],
"moderate": ["moderate", "medium", "intermediate", "fair", "some"],
"extensive": ["extensive", "comprehensive", "detailed", "high", "full", "complete", "thorough"],
"NA": ["na", "n/a", "n.a.", "not available", "not applicable", "unknown", "unavailable", "none", "null", "no data", "no information"]
}
},
"validation_error_messages": {
"missing": "Required field 'knowledge_group' is missing from response",
"invalid_value": "Invalid knowledge level. Expected one of: limited, moderate, extensive, NA",
"wrong_type": "Field 'knowledge_group' must be a string"
}
}
},
"parsing_instructions": {
"json_extraction": {
"method": "regex",
"pattern": "\\{.*\\}",
"flags": ["DOTALL"]
},
"fallback_parsing": {
"enabled": true,
"method": "keyword_search",
"keywords": ["knowledge_group", "knowledge level", "level"]
}
},
"success_criteria": {
"minimum_required_fields": 1,
"require_all_mandatory": true,
"allow_extra_fields": false
},
"error_handling": {
"on_parse_failure": "return_null",
"on_validation_failure": "return_errors",
"on_missing_required": "return_errors"
}
}
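The Python sketch below validates `knowledge_group` against this config. It assumes case-insensitive, trimmed matching against `normalize_mapping` and reuses the config's `validation_error_messages`. `NA` is the only uppercase allowed value, even though the user template asks for lowercase. Case-insensitive matching maps `na`, `NA`, or `n/a` to `NA` either way. The helper name is hypothetical, and the sketch does not enforce `allow_extra_fields: false`.

```python
# "normalize_mapping" and "validation_error_messages" copied from the config above.
NORMALIZE = {
    "limited": ["limited", "minimal", "basic", "low", "little", "poor"],
    "moderate": ["moderate", "medium", "intermediate", "fair", "some"],
    "extensive": ["extensive", "comprehensive", "detailed", "high", "full", "complete", "thorough"],
    "NA": ["na", "n/a", "n.a.", "not available", "not applicable", "unknown",
           "unavailable", "none", "null", "no data", "no information"],
}

ERRORS = {
    "missing": "Required field 'knowledge_group' is missing from response",
    "invalid_value": "Invalid knowledge level. Expected one of: limited, moderate, extensive, NA",
    "wrong_type": "Field 'knowledge_group' must be a string",
}


def validate_knowledge(response: dict):
    """Return (value, errors); value is None whenever errors is non-empty."""
    if "knowledge_group" not in response:
        return None, [ERRORS["missing"]]
    raw = response["knowledge_group"]
    if not isinstance(raw, str):
        return None, [ERRORS["wrong_type"]]
    key = raw.strip().lower()  # case_sensitive: false, trim_whitespace: true
    for canonical, synonyms in NORMALIZE.items():
        if key in synonyms:
            return canonical, []
    return None, [ERRORS["invalid_value"]]


print(validate_knowledge({"knowledge_group": " Extensive "}))
# ('extensive', [])
```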
About This Template
This template provides an alternative prompt structure for evaluating scientific knowledge about bacterial species. It includes explicit NA support to test how different prompt formulations affect model responses and calibration.
Usage Context
When to use: Use this template to compare how different prompt structures influence knowledge assessment consistency across models.
Typical workflow: Often used alongside template1 and template2 to evaluate prompt sensitivity and identify the most reliable formulation for knowledge assessment.
Template Configuration Files
Template Information
- System template file: /net/llm-bioeval/demo/llm-bioeval/templates/system/template3_knowlege.txt
- User template file: /net/llm-bioeval/demo/llm-bioeval/templates/user/template3_knowlege.txt
- Validation config file: /net/llm-bioeval/demo/llm-bioeval/templates/validation/template3_knowlege.json
- Template type: Knowledge
- Character count: System: 2149, User: 171, Validation: 3782
Usage Notes
- The system template sets the context and instructions for the AI model
- The user template contains placeholders like {binomial_name} that are replaced with actual values
- The validation config defines expected response structure and automatically normalizes LLM outputs
- All three files work together to ensure consistent, validated results from the language model
Validation Details
- Description: Alternative knowledge level assessment template with NA support