IMG: Integrated Microbial Genomes

What's New

IMG 2.4 is the 12th release of the Integrated Microbial Genomes (IMG) genomic data management and analysis system. IMG 2.4 was released on Dec 1st, 2007.

IMG 2.4 Content

Genomes

IMG 2.4 content has been updated with new microbial genomes available in RefSeq version 25 (September 14, 2007).

IMG 2.4 contains a total of 3,637 genomes consisting of 818 bacterial, 50 archaeal, 40 eukaryotic genomes, 2,042 viruses (including bacterial phages), and 687 plasmids that did not come from a specific microbial genome sequencing project. Among these genomes:

Plasmid names were curated by adding the strain name to the organism name when it was available from publications or other sources. For example, see plasmid NC_001520, available at NCBI. The original name from NCBI/RefSeq is "Acidithiobacillus ferrooxidans plasmid pTF4.1". The name was changed in IMG to "Acidithiobacillus ferrooxidans MAL4-1 plasmid pTF4.1". In total, 423 plasmid names were modified by adding strain information that was not available in the Host name. More information is available on the Plasmids Page.

Note that 32 microbial genomes from IMG 2.3 were replaced in IMG 2.4 because (1) a "Draft" genome was replaced by its "Finished" version or (2) the composition of the genome changed through the addition of new replicons, that is, plasmids or chromosomes. For replaced genomes, the gene object identifiers (gene OIDs) of the protein-coding genes (CDS) were mapped, whenever possible, to their new versions in IMG 2.4. See IMG Data Evolution History for details.
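As an illustration of what "mapped whenever possible" means for a user holding a list of IMG 2.3 gene OIDs, the lookup can be sketched as below. This is a minimal sketch, not IMG's implementation: the function name, the dictionary-based mapping table, and the OID values are all hypothetical.

```python
def map_gene_oids(old_oids, oid_map):
    """Split old gene OIDs into those with a new OID and those without one.

    old_oids -- iterable of gene OIDs from the previous release
    oid_map  -- dict of old OID -> new OID (hypothetical mapping table)
    """
    mapped = {}
    unmapped = []
    for oid in old_oids:
        if oid in oid_map:
            mapped[oid] = oid_map[oid]
        else:
            # No counterpart could be established in the new release.
            unmapped.append(oid)
    return mapped, unmapped


# Made-up OIDs, for illustration only.
example_map = {1001: 2001, 1002: 2002}
mapped, unmapped = map_gene_oids([1001, 1002, 1003], example_map)
```

Genes that cannot be mapped are reported separately rather than dropped, so a caller can decide how to handle them.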

Annotations

IMG Terms and Pathways

IMG’s native controlled vocabularies for gene function (IMG Terms) and organism-independent functional hierarchies (IMG Pathways and Parts Lists) have been extended. IMG 2.4 has 4,148 IMG Terms (1,526 new terms compared to IMG 2.3) and 524 IMG Pathways and Parts Lists (29 new pathways compared to IMG 2.3). In addition, 546,169 IMG genes are associated with IMG Terms (105,063 new gene-term associations compared to IMG 2.3).

Small RNAs

Small RNA genes that are missing from the original RefSeq genome files are added based on the Rfam database (http://www.sanger.ac.uk/Software/Rfam/). In IMG 2.4, 2,962 small RNA genes were detected in 162 genomes.

rRNAs and tRNAs

tRNA genes and rRNA genes (23S, 16S, and 5S) that are missing from the original RefSeq genome files are added using tRNAscan-SE v1.23 for tRNA genes and similarity comparisons to existing RNA genes for rRNA genes. In IMG 2.4, 3,687 tRNA and 1,555 rRNA genes were added in 149 genomes.

Pfam and TIGRfam

Starting with IMG 2.4, genes are associated with Pfam and TIGRfam annotations using hmmsearch, a tool in the HMMER package (http://hmmer.janelia.org).

To speed up data processing at a slight cost to sensitivity, BLAST was used as a pre-filter to narrow down the candidate HMM models, instead of scanning every gene against all available HMM models (which can run very slowly). A non-stringent e-value cutoff was used for BLAST so that any sub-sequence matching a seed sequence of an HMM model would make that model a candidate for full HMM scoring. The BLAST e-value cutoff was set to 10 for the Pfam seed sequence database and 1 for the TIGRfam seed sequence database. hmmsearch was then applied to the candidate models with the per-family noise cutoff (--cut_nc). Scores for domain-level hits were recorded for Pfam, while the score for the full model was recorded for TIGRfam.
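The two-stage search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not IMG's actual pipeline code: the function names, the tuple layout of the BLAST hits, and the seed-to-model lookup table are hypothetical. Only the e-value cutoffs (10 for Pfam, 1 for TIGRfam) and the --cut_nc option are taken from the text.

```python
from collections import defaultdict

# BLAST pre-filter e-value cutoffs, as stated in the text.
BLAST_EVALUE_CUTOFF = {"pfam": 10.0, "tigrfam": 1.0}


def candidate_models(blast_hits, seed_to_model, database):
    """Return {gene_id: set of model ids} that qualify for full HMM scoring.

    blast_hits    -- iterable of (gene_id, seed_seq_id, evalue) tuples
                     (hypothetical layout)
    seed_to_model -- dict mapping a seed sequence id to its HMM model id
                     (hypothetical lookup table)
    database      -- "pfam" or "tigrfam"
    """
    cutoff = BLAST_EVALUE_CUTOFF[database]
    candidates = defaultdict(set)
    for gene_id, seed_id, evalue in blast_hits:
        # Keep any hit at or below the non-stringent cutoff.
        if evalue <= cutoff and seed_id in seed_to_model:
            candidates[gene_id].add(seed_to_model[seed_id])
    return dict(candidates)


def hmmsearch_command(model_file, sequence_file):
    """Build an hmmsearch argument list using the per-family noise cutoff."""
    return ["hmmsearch", "--cut_nc", model_file, sequence_file]
```

Only the models returned by candidate_models for a gene would then be scored with hmmsearch, which is where the speed-up comes from. A true family member whose BLAST hits against the seed sequences all fall above the cutoff is never scored, which is the sensitivity cost mentioned above.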

Fusions

A gene is defined as a fusion if it is formed from the composition (fusion) of two or more previously separate genes (component genes). The identification of such genes is based on the computation described in the IMG Content/Fusions section of About IMG.

Only genes from finished genomes were considered as putative components, in order to avoid false predictions from fragmented genes in draft genomes. Furthermore, pseudogenes and genes that frequently appear fragmented in finished genomes, such as transposases and integrases, were excluded from fusion calculations.
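The exclusion rules above amount to a simple filter over candidate component genes. The sketch below is illustrative only: the Gene record, its field names, and the keyword match on product names are assumptions, since the text does not say how IMG identifies transposases and integrases.

```python
from dataclasses import dataclass

# Product-name keywords for gene types that are often fragmented
# (matching by keyword is an assumption made for this sketch).
EXCLUDED_KEYWORDS = ("transposase", "integrase")


@dataclass
class Gene:
    gene_oid: int
    product_name: str
    genome_status: str  # "Finished" or "Draft"
    is_pseudogene: bool = False


def is_candidate_component(gene):
    """Return True if the gene may be used as a putative fusion component."""
    if gene.genome_status != "Finished":
        return False  # draft genomes may contain fragmented genes
    if gene.is_pseudogene:
        return False
    name = gene.product_name.lower()
    return not any(keyword in name for keyword in EXCLUDED_KEYWORDS)


# Made-up genes, for illustration only.
genes = [
    Gene(1, "DNA polymerase I", "Finished"),
    Gene(2, "IS4 family transposase", "Finished"),
    Gene(3, "ABC transporter permease", "Draft"),
    Gene(4, "hypothetical protein", "Finished", is_pseudogene=True),
]
components = [g.gene_oid for g in genes if is_candidate_component(g)]
```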

IMG 2.4 User Interface

The User Interface has been reorganized and extended in order to improve its overall usability.

Main Menu

The UI main menu was changed as follows (see UI Map):

Genomes

Find Genomes

Old genome (taxon) identifiers can be looked up and are automatically mapped to their new identifiers, both in Genome Search (with the Taxon Object ID filter) and in Quick Genome Search.

Genome Details

At the top of the Organism Details page there are links to the main parts within the page: Organism Information, Genome Statistics, Exploration Tools, and Export Genome Data.

Genes

Gene Details

At the top of the Gene Details page there are links to the main parts within the page: Gene Information, Evidence for Function Predictions, Sequence Search Tools, and Homolog Selection. The Homolog Toolkit has been renamed Customized Homolog Display and is part of Homolog Selection.

Functions

Find Functions
  • Pfam Browser: Pfam domains are organized, when possible, into Pfam Categories that are based on COG categories.
  • External links to individual COGs, Pfams and TIGRfams have been provided in the COG Browser, Pfam Browser, and TIGRfam Browser, respectively.
  • IMG Terms, IMG Pathways, IMG Parts List, and IMG Networks are consolidated under IMG Terms & Pathways.

Function Details

External links to COG, Pfam, and TIGRfam have been provided in the COG Category Details, Pfam Category Details, and TIGRfam Role Details pages, respectively.

Analysis Carts

Gene Cart

At the top of the Gene Cart page there are links to the main parts within the page: Gene List, Upload & Export, Comparison Tools, and Profile Tools.

Function Cart

At the top of the Function Cart page there are links to the main parts within the page: Function List, Upload & Export, and Profile Tools.