Object Identifiers
In IMG, objects such as Taxon, Chromosome, Scaffold, Transcript, Feature, Gene and Protein are identified by unique numeric object identifiers (OIDs). Unique numeric OIDs are generated automatically for each new genome loaded into the system. For ORFs (gene objects with locus type 'CDS'), stable internal identifiers are maintained across IMG releases.
When a draft genome is replaced with a finished version, new identifiers are generated for its ORFs; the old identifiers are mapped to the new ones and recorded as alternate identifiers in IMG. The mapping uses the taxon name, the amino acid sequence checksum (AA_checksum) and, to distinguish paralogs, the flanking gene neighbors. Only the AA sequence signature (checksum) and the taxon information can be relied on for this mapping, because the locus tag, product name and start/end coordinates may change between the draft and finished versions. For some draft genomes, old ORF identifiers cannot be mapped to new identifiers; in such cases the old identifiers are marked as obsolete and deleted.
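The mapping step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not IMG's implementation: the checksum is taken to be an MD5 digest of the upper-cased protein sequence, each genome is represented as a single gene-ordered list of ORFs, and the names `Orf`, `aa_checksum`, `neighbor_signature` and `map_draft_to_finished` are hypothetical. Paralogs, which share a checksum, are disambiguated by comparing the checksums of their immediate left and right neighbors; a draft ORF that still has zero or several candidates is treated as obsolete.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    oid: int      # numeric object identifier
    taxon: str    # taxon name
    aa_seq: str   # translated protein sequence


def aa_checksum(seq: str) -> str:
    # Assumption: MD5 of the upper-cased AA sequence; IMG's real checksum may differ.
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()


def neighbor_signature(orfs: list[Orf], i: int) -> tuple:
    # Checksums of the left and right flanking genes (None at either end).
    left = aa_checksum(orfs[i - 1].aa_seq) if i > 0 else None
    right = aa_checksum(orfs[i + 1].aa_seq) if i + 1 < len(orfs) else None
    return (left, right)


def map_draft_to_finished(draft: list[Orf], finished: list[Orf]):
    """Map draft ORF OIDs to finished ORF OIDs.

    Both lists are assumed to be in gene order for one genome (a simplification:
    real assemblies have several scaffolds). Returns (mapping, obsolete_oids).
    """
    # Index finished ORFs by (taxon name, AA checksum).
    index = defaultdict(list)
    for j, orf in enumerate(finished):
        index[(orf.taxon, aa_checksum(orf.aa_seq))].append(j)

    mapping, obsolete = {}, []
    for i, orf in enumerate(draft):
        candidates = index.get((orf.taxon, aa_checksum(orf.aa_seq)), [])
        if len(candidates) > 1:
            # Paralogs share a checksum: keep those whose flanking neighbors match.
            sig = neighbor_signature(draft, i)
            candidates = [j for j in candidates if neighbor_signature(finished, j) == sig]
        if len(candidates) == 1:
            mapping[orf.oid] = finished[candidates[0]].oid
        else:
            # No unambiguous match: the old identifier becomes obsolete.
            obsolete.append(orf.oid)
    return mapping, obsolete
```

In this sketch, two draft copies of the same protein each map to the finished copy whose neighbors match, and a draft ORF whose sequence is absent from the finished genome is reported as obsolete. A fuller version would also need to handle multiple scaffolds and prevent two draft ORFs from claiming the same finished ORF.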
When IMG is queried with an old ORF identifier, it looks up the alternate identifiers and transparently maps the old identifier to the new one whenever such a mapping exists.
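A lookup of this kind might look like the sketch below, assuming the alternate identifiers are available as a dictionary from old OID to new OID and that the set of current OIDs is known. The function name `resolve_orf_oid` and both data structures are hypothetical, and following a chain of replacements (for a genome replaced more than once) is an assumption added for illustration.

```python
from typing import Optional


def resolve_orf_oid(query_oid: int, current_oids: set[int],
                    alternate_ids: dict[int, int]) -> Optional[int]:
    """Return the current OID for query_oid, or None if it is obsolete or unknown.

    current_oids and alternate_ids (old OID -> new OID) are hypothetical
    stand-ins for IMG's internal tables.
    """
    if query_oid in current_oids:
        return query_oid  # already a current identifier
    new_oid = alternate_ids.get(query_oid)
    seen = set()
    # Assumption: follow old -> new links until a current OID is reached.
    while new_oid is not None and new_oid not in current_oids:
        if new_oid in seen:
            return None  # guard against a cyclic mapping
        seen.add(new_oid)
        new_oid = alternate_ids.get(new_oid)
    return new_oid
```

An identifier with no entry in the alternate-identifier table resolves to `None`, which corresponds to an obsolete identifier that was deleted.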
Note that the ORF identification mechanism in IMG is similar to GI numbers in GenBank, where new versions of a sequence are assigned new GI numbers.
