IMG: Integrated Microbial Genomes

Architecture

IMG has a multi-tier architecture, as shown below.

[Figure: IMG system architecture diagram]

A web browser, such as Mozilla, Internet Explorer, Safari, or Firefox, can be used to access IMG. The browser connects to a remote Apache web server running the IMG Web Data Explorer application, which is implemented in Perl 5.8.x and uses the GD package for graphics.

The Exploration Viewers and Tools component handles data exploration operations, such as gene search and genome browsing, and supports running external tools such as BLAST, ClustalW, and JalView.

The User File Handler manages files that list genes and genomes of interest to users. These files can be generated using IMG's data export capabilities or created locally as tab-delimited files, and can subsequently be loaded into IMG for further analysis. For registered users, the User File Handler also supports loading user-specific gene annotations from local tab-delimited files into the IMG warehouse.
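As a rough illustration of the tab-delimited annotation files described above, the following Python sketch parses such a file into (gene, annotation) pairs. The column names gene_oid and annotation, and the function name parse_user_annotations, are assumptions made for this example; the actual IMG file layout is not specified here, and IMG itself is implemented in Perl.

```python
import csv
import io


def parse_user_annotations(text):
    """Parse tab-delimited text into a list of (gene_id, annotation) pairs.

    Assumes a header row with the hypothetical columns 'gene_oid' and
    'annotation'; rows without a gene identifier are skipped.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    records = []
    for row in reader:
        gene_id = (row.get("gene_oid") or "").strip()
        annotation = (row.get("annotation") or "").strip()
        if not gene_id:
            continue  # a row with no gene identifier cannot be loaded
        records.append((gene_id, annotation))
    return records


# Hypothetical sample content; the second data row has no gene identifier.
sample = (
    "gene_oid\tannotation\n"
    "1001\tputative kinase\n"
    "\tmissing id\n"
    "1002\tABC transporter\n"
)
print(parse_user_annotations(sample))
```

Validating rows before loading keeps malformed lines out of the warehouse; a real loader would presumably also check that each gene identifier exists in IMG.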

The IMG back end (data server) consists of the IMG warehouse, implemented with the Oracle 9i database management system; BLAST databases for similarity searches against NR, SwissProt, Pfam, and IMG genes; and auxiliary data files that contain scaffold DNA sequences and KEGG map images. The IMG back end also includes pre-computed statistics and phylogenetic profiles, BLAST homolog results, and other cached data for improving performance, such as pre-computed gene/scaffold/COG mapping data for the ortholog neighborhood viewer and the genome line positions in the phylogenetic genome browser.
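The idea behind the pre-computed mapping data can be sketched as follows: build the gene/scaffold/COG mapping once, write it to a cache file, and have a viewer read the cache instead of recomputing it on each request. This is only a minimal sketch in Python; the JSON format, the field names, and the functions build_gene_mapping, save_cache, load_cache, and neighbors are all hypothetical and do not describe IMG's actual cache files.

```python
import json
import os
import tempfile


def build_gene_mapping(genes):
    """Group genes by scaffold, ordered by start coordinate, with COG ids."""
    mapping = {}
    for gene in genes:
        mapping.setdefault(gene["scaffold"], []).append(
            {"gene": gene["gene"], "start": gene["start"], "cog": gene["cog"]}
        )
    for entries in mapping.values():
        entries.sort(key=lambda entry: entry["start"])
    return mapping


def save_cache(mapping, path):
    """Write the pre-computed mapping to a cache file (JSON is an assumption)."""
    with open(path, "w") as handle:
        json.dump(mapping, handle)


def load_cache(path):
    """Read the pre-computed mapping back at request time."""
    with open(path) as handle:
        return json.load(handle)


def neighbors(mapping, scaffold, gene_id, window=1):
    """Return the gene and up to `window` genes on each side of it."""
    entries = mapping.get(scaffold, [])
    for index, entry in enumerate(entries):
        if entry["gene"] == gene_id:
            first = max(0, index - window)
            return entries[first:index + window + 1]
    return []


# Hypothetical example data.
genes = [
    {"gene": "g2", "scaffold": "s1", "start": 500, "cog": "COG0002"},
    {"gene": "g1", "scaffold": "s1", "start": 100, "cog": "COG0001"},
    {"gene": "g3", "scaffold": "s1", "start": 900, "cog": "COG0003"},
]
cache_path = os.path.join(tempfile.mkdtemp(), "gene_mapping.json")
save_cache(build_gene_mapping(genes), cache_path)
print(neighbors(load_cache(cache_path), "s1", "g2"))
```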

A Perl-based ETL (extract-transform-load) toolkit is employed for extracting, cleaning, integrating, and loading data from external resources into the IMG warehouse. In addition, custom tools compute gene relationships and clusters and load these data into the warehouse.
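The extract, clean, and load stages can be sketched in a few lines. The example below is an illustration only, not the IMG toolkit: it is written in Python rather than Perl, uses an in-memory SQLite database as a stand-in for the Oracle warehouse, and assumes a hypothetical two-column input (gene identifier, product name) and a hypothetical gene table. Integration is reduced here to de-duplication on the gene identifier.

```python
import sqlite3


def extract(lines):
    """Extract: split raw tab-delimited lines into fields, skipping blanks."""
    return [line.rstrip("\n").split("\t") for line in lines if line.strip()]


def clean(rows):
    """Clean: trim whitespace, drop rows without an id, drop duplicate ids."""
    seen = set()
    cleaned = []
    for row in rows:
        if len(row) < 2:
            continue
        gene_id, product = row[0].strip(), row[1].strip()
        if not gene_id or gene_id in seen:
            continue
        seen.add(gene_id)
        cleaned.append((gene_id, product))
    return cleaned


def load(connection, records):
    """Load: insert the cleaned records into the (hypothetical) gene table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS gene (gene_id TEXT PRIMARY KEY, product TEXT)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO gene (gene_id, product) VALUES (?, ?)", records
    )
    connection.commit()


def run_etl(lines, connection):
    """Run the three stages in order and return the number of rows loaded."""
    records = clean(extract(lines))
    load(connection, records)
    return len(records)


# Hypothetical input with a blank line, a duplicate id, and a row without an id.
raw = ["1001 \tkinase\n", "\n", "1001\tkinase (duplicate)\n", "\tno id\n", "1002\ttransporter\n"]
warehouse = sqlite3.connect(":memory:")  # stand-in for the Oracle warehouse
print(run_etl(raw, warehouse))
```

Keeping each stage a separate function makes it possible to test the cleaning rules without a database connection.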