Computational Immunology - Immunological Database

Immunological Database

Following recent advances in sequencing and proteomics technology, there has been a many-fold increase in the generation of molecular and immunological data. The data are so diverse that they can be categorized into different databases according to their use in research. To date, a total of 31 different immunological databases are listed in the Nucleic Acids Research (NAR) Database Collection; they are given in the following table. The information in the table is taken from the database descriptions in the NAR Database Collection.

Database	Description
ALPSbase	Autoimmune lymphoproliferative syndrome database
AntigenDB	Sequence, structure, and other data on pathogen antigens.
AntiJen	Quantitative binding data for peptides and proteins of immunological interest.
BCIpep	This database stores information on all experimentally determined B-cell epitopes of antigenic proteins. It is a curated database in which detailed information about the epitopes is collected and compiled from published literature and existing databases. It covers a wide range of pathogenic organisms such as viruses, bacteria, protozoa and fungi. Each entry in the database provides full information about a B-cell epitope, including amino acid sequence, source of the antigenic protein, immunogenicity, model organism and antibody generation/neutralization test.
dbMHC	dbMHC provides access to HLA sequences, tools to support genetic testing of HLA loci, HLA allele and haplotype frequencies of over 90 populations worldwide, as well as clinical datasets on hematopoietic stem cell transplantation, insulin-dependent diabetes mellitus (IDDM), rheumatoid arthritis (RA), narcolepsy and spondyloarthropathy. For more information, see http://www.oxfordjournals.org/nar/database/summary/604
DIGIT	Database of ImmunoGlobulin sequences and Integrated Tools.
FIMM	FIMM is an integrated database of functional molecular immunology that focuses on the T-cell response to disease-specific antigens. FIMM provides fully referenced information integrated with data retrieval and sequence analysis tools on HLA, peptides, T-cell epitopes, antigens and diseases, and constitutes one backbone of future computational immunology research. Antigen protein data have been enriched with more than 27,000 sequences derived from the non-redundant SwissProt-TREMBL-TREMBL_NEW (SPTR) database of antigens similar or related to FIMM antigens across various species, to facilitate a comprehensive analysis of conserved or variable T-cell epitopes.
GPX-Macrophage Expression Atlas	The GPX Macrophage Expression Atlas (GPX-MEA) is an online resource for expression-based studies of a range of macrophage cell types following treatment with pathogens and immune modulators. GPX-MEA follows the MIAME standard and includes an objective quality score with each experiment. It places special emphasis on rigorously capturing the experimental design and enables the statistical analysis of expression data from different microarray experiments. This is the first example of a focussed macrophage gene expression database that allows efficient identification of transcriptional patterns, which provide novel insights into the biology of this cell system.
HaptenDB	HaptenDB is a comprehensive database of hapten molecules. It is a curated database in which information is collected and compiled from published literature and web resources. At present the database has more than 1700 entries, each providing comprehensive detail about a hapten molecule, including: i) nature of the hapten; ii) methods of anti-hapten antibody production; iii) information about the carrier protein; iv) coupling method; v) assay method (used for characterization) and vi) specificities of antibodies. HaptenDB covers a wide array of haptens, ranging from antibiotics of biomedical importance to pesticides. The database is useful for studying serological reactions and the production of antibodies.
HPTAA	HPTAA is a database of potential tumor-associated antigens that uses expression data from various expression platforms, including carefully chosen publicly available microarray expression data, GEO SAGE data and Unigene expression data.
IEDB-3D	Structural data within the Immune Epitope Database.
IL2Rgbase	X-linked severe combined immunodeficiency mutations.
IMGT	IMGT is an integrated knowledge resource specialized in IG, TR, MHC, IG superfamily, MHC superfamily and related proteins of the immune system of human and other vertebrate species. IMGT® comprises 6 databases, 15 on-line tools for sequence, gene and 3D structure analysis, and more than 10,000 pages of Web resources. Data standardization, based on IMGT-ONTOLOGY, has been approved by WHO/IUIS.
IMGT/GENE-DB	IMGT/GENE-DB is the IMGT® comprehensive genome database for immunoglobulin (IG) and T cell receptor (TR) genes from human and mouse, and, in development, from other vertebrate species (e.g. rat). IMGT/GENE-DB is part of IMGT®, the international ImMunoGeneTics information system®, the high-quality integrated knowledge resource specialized in IG, TR, major histocompatibility complex (MHC) of human and other vertebrate species, and related proteins of the immune system (RPI) that belong to the immunoglobulin superfamily (IgSF) and to the MHC superfamily (MhcSF).
IMGT/HLA	There are currently over 1600 officially recognised HLA alleles, and these sequences are made available to the scientific community through the IMGT/HLA database. The IMGT/HLA database was publicly released in 1998. Since then, the database has grown and is the primary source of information for the study of sequences of the human major histocompatibility complex. The initial release of the database contained allele reports, alignment tools and submission tools, as well as detailed descriptions of the source cells. The database is updated quarterly with all the new and confirmatory sequences submitted to the WHO Nomenclature Committee, and on average an additional 75 new and confirmatory sequences are included in each quarterly release. The IMGT/HLA database provides a centralized resource for everybody interested, either centrally or peripherally, in the HLA system.
IMGT/LIGM-DB	IMGT/LIGM-DB is the IMGT® comprehensive database of immunoglobulin (IG) and T cell receptor (TR) nucleotide sequences, from human and other vertebrate species, with translation for fully annotated sequences. It was created in 1989 by LIGM (http://www.imgt.org/textes/IMGTinformation/LIGM.html), Montpellier, France, and has been on the Web since July 1995. IMGT/LIGM-DB is the first and the largest database of IMGT®, the international ImMunoGeneTics information system®, the high-quality integrated knowledge resource specialized in IG, TR, major histocompatibility complex (MHC) of human and other vertebrate species, and related proteins of the immune system (RPI) that belong to the immunoglobulin superfamily (IgSF) and to the MHC superfamily (MhcSF). IMGT/LIGM-DB sequence data are identified by the EMBL/GenBank/DDBJ accession number. The unique source of data for IMGT/LIGM-DB is EMBL, which shares data with GenBank and DDBJ.
Interferon Stimulated Gene Database	Interferons (IFN) are a family of multifunctional cytokines that activate transcription of a subset of genes. The gene products induced by IFN are responsible for the antiviral, antiproliferative and immunomodulatory properties of this cytokine. In order to obtain a more comprehensive understanding of the genes regulated by IFNs, we have used different microarray formats to identify over 400 interferon stimulated genes (ISG). To facilitate the dissemination of these data, we have compiled a database comprising the ISGs assigned into functional categories. The database is fully searchable and contains links to sequence and Unigene information. The database and the array data are accessible via the World Wide Web at http://www.lerner.ccf.org/labs/williams/. We intend to add published ISG sequences and those discovered by further transcript profiling to the database to eventually compile a complete list of ISGs.
IPD-ESTDAB	The Immuno Polymorphism Database (IPD) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD-ESTDAB is a database of immunologically characterised melanoma cell lines. The database works in conjunction with the European Searchable Tumour Cell Line Database (ESTDAB) cell bank, which is housed in Tübingen, Germany and provides immunologically characterised tumour cells.
IPD-HPA - Human Platelet Antigens	Human platelet antigens are alloantigens expressed only on platelets, specifically on platelet membrane glycoproteins. These platelet-specific antigens are immunogenic and can result in pathological reactions to transfusion therapy. The IPD-HPA section contains nomenclature information and additional background material about human platelet antigens. The different genes in the HPA system have not been sequenced to the same level as some of the other projects, and so currently only single nucleotide polymorphisms (SNPs) are used to determine alleles. This information is presented in a grid of SNPs for each gene. The IPD and HPA nomenclature committee hope to expand this to provide full sequence alignments when possible.
IPD-KIR - Killer-cell Immunoglobulin-like Receptors	The Killer-cell Immunoglobulin-like Receptors (KIR) are members of the immunoglobulin super family (IgSF) formerly called Killer-cell Inhibitory Receptors. KIRs have been shown to be highly polymorphic both at the allelic and haplotypic levels. They are composed of two or three Ig-domains, a transmembrane region and cytoplasmic tail, which can in turn be short (activatory) or long (inhibitory). The Leukocyte Receptor Complex (LRC), which encodes KIR genes, has been shown to be polymorphic, polygenic and complex in a manner similar to the MHC. The IPD-KIR Sequence Database contains the most up to date nomenclature and sequence alignments.
IPD-MHC	The MHC sequences of many different species have been reported, along with different nomenclature systems used in the naming and identification of new genes and alleles in each species. The sequences of the major histocompatibility complex from a number of different species are highly conserved. By bringing together the work of different nomenclature committees and the sequences of different species, it is hoped to provide a central resource that will facilitate further research on the MHC of each species and on their comparison. The first release of the IPD-MHC database involved the work of groups specialising in non-human primates, canines (DLA) and felines (FLA), and incorporated all data previously available in the IMGT/MHC database. This release included data from five species of ape, sixteen species of new world monkey and seventeen species of old world monkey, as well as data on different canines and felines. Since the first release, sequences from cattle (BoLA), swine (SLA) and rats (RT1) have been added, and work to include MHC sequences from chickens and horses (ELA) is ongoing.
MHCBN	MHCBN is a comprehensive database comprising over 23000 peptide sequences whose binding affinity with MHC or TAP molecules has been assayed experimentally. It is a curated database in which entries are compiled from published literature and public databases. Each entry provides full information (sequence, MHC or TAP binding specificity, source protein) about a peptide whose binding affinity (IC50) and T cell activity have been experimentally determined. MHCBN has a number of web-based tools for the analysis and retrieval of information. All database entries are hyperlinked to major databases such as SWISS-PROT, PDB, IMGT/HLA-DB, PubMed and OMIM to provide information beyond the scope of MHCBN. The current version of MHCBN contains 1053 entries of TAP binding peptides. Information about the diseases associated with various MHC alleles is also included in this version.
MHCPEP	This database contains a list of MHC-binding peptides.
MPID-T2	MPID-T2 (http://biolinfo.org/mpid-t2/) is a highly curated database for sequence-structure-function information on MHC-peptide interactions. It contains all structures of major histocompatibility complex proteins (MHC) containing bound peptides, with emphasis on the structural characterization of these complexes. Database entries have been grouped into fully referenced redundant and non-redundant categories. The MHC-peptide interactions have been presented in terms of a set of sequence and structural parameters representative of molecular recognition. MPID will facilitate the development of algorithms to predict whether a query peptide sequence will bind to a specific MHC allele. MPID data has been sorted primarily on the basis of MHC Class, followed by organism (MHC source), next by allele type and finally by the length of peptide in the binding groove (peptide residues within 5 Å of the MHC). Data on inter-molecular hydrogen bonds, gap volume and gap index available in MPID are pre-computed and the interface area due to complex formation is calculated based on accessible surface area calculations. The available MHC-peptide databases have addressed sequence information as well as binding (or the lack thereof) of peptide sequences.
MUGEN Mouse Database	Murine models of immune processes and immunological diseases.
Protegen	Protective antigen database and analysis system.
SuperHapten	SuperHapten is a manually curated hapten database integrating information from literature and web resources. The current version of the database compiles 2D/3D structures, physicochemical properties and references for about 7,500 haptens and 25,000 synonyms. The commercial availability is documented for about 6,300 haptens and 450 related antibodies, enabling experimental approaches on cross-reactivity. The haptens are classified regarding their origin: pesticides, herbicides, insecticides, drugs, natural compounds, etc. Queries allow identification of haptens and associated antibodies according to functional class, carrier protein, chemical scaffold, composition or structural similarity.
The Immune Epitope Database (IEDB)	The Immune Epitope Database (IEDB, www.iedb.org), provides a catalog of experimentally characterized B and T cell epitopes, as well as data on MHC binding and MHC ligand elution experiments. The database represents the molecular structures recognized by adaptive immune receptors and the experimental contexts in which these molecules were determined to be immune epitopes. Epitopes recognized in humans, non-human primates, rodents, pigs, cats and all other tested species are included. Both positive and negative experimental results are captured. Over the course of four years, the data from 180,978 experiments were curated manually from the literature, covering about 99% of all publicly available information on peptide epitopes mapped in infectious agents (excluding HIV) and 93% of those mapped in allergens.
TmaDB	To analyse TMA output a relational database (known as TmaDB) has been developed to collate all aspects of information relating to TMAs. These data include the TMA construction protocol, experimental protocol and results from the various immunocytological and histochemical staining experiments including the scanned images for each of the TMA cores. Furthermore, the database contains pathological information associated with each of the specimens on the TMA slide, the location of the various TMAs and the individual specimen blocks (from which cores were taken) in the laboratory and their current status. TmaDB has been designed to incorporate and extend many of the published common data elements and the XML format for TMA experiments and is therefore compatible with the TMA data exchange specifications developed by the Association for Pathology Informatics community.
VBASE2	VBASE2 is an integrative database of germ-line V genes from the immunoglobulin loci of human and mouse. It presents V gene sequences from the EMBL database and Ensembl together with the corresponding links to the source data. The VBASE2 dataset is generated in an automatic process based on a BLAST search of V genes against EMBL and the Ensembl dataset. The BLAST hits are evaluated with the DNAPLOT program, which allows immunoglobulin sequence alignment and comparison, RSS recognition and analysis of the V(D)J-rearrangements. As a result of the BLAST hit evaluation, the VBASE2 entries are classified into 3 different classes. Class 1 holds sequences for which a genomic reference and a rearranged sequence are known. Class 2 contains sequences that have not been found in a rearrangement, thus lacking evidence of functionality. Class 3 contains sequences that have been found in different V(D)J rearrangements but lack a genomic reference. All VBASE2 sequences are compared with the datasets from the VBASE, IMGT and KABAT databases (latest published versions), and the respective references are provided in each VBASE2 sequence entry. The VBASE2 database can be accessed either by a text-based query form or by a sequence alignment with the DNAPLOT program. A DAS server shows the VBASE2 dataset within the Ensembl Genome Browser and links to the database.
Epitome	Epitome is a database of all known antigenic residues and the antibodies that interact with them, including a detailed description of the residues involved in the interaction and their sequence/structure environments. Each entry in the database describes one interaction between a residue on an antigenic protein and a residue on an antibody chain. Every interaction is described using the following parameters: PDB identifier, antigen chain ID, PDB position of the antigenic residue, type of antigenic residue and its sequence environment, antigen residue secondary structure state, antigen residue solvent accessibility, antibody chain ID, type of antibody chain (heavy or light), CDR number, PDB position of the antibody residue, and type of antibody residue and its sequence environment. Additionally, interactions can be visualized using an interface to Jmol.
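
The three-class scheme in the VBASE2 description above can be read as a simple decision rule on two kinds of evidence. The following is a minimal sketch only: the function name and its boolean inputs are hypothetical and are not part of any VBASE2 interface. It also assumes that class 2 entries do have a genomic reference, which the description implies but does not state, and it returns None for a sequence with neither kind of evidence, a case the description does not cover.

```python
from typing import Optional


def vbase2_class(has_genomic_reference: bool,
                 found_in_rearrangement: bool) -> Optional[int]:
    """Return the class (1, 2 or 3) suggested by the VBASE2 description.

    Hypothetical helper for illustration; the inputs are assumptions,
    not fields of the real database.
    """
    if has_genomic_reference and found_in_rearrangement:
        return 1  # genomic reference and rearranged sequence both known
    if has_genomic_reference:
        return 2  # not found in a rearrangement: no evidence of functionality
    if found_in_rearrangement:
        return 3  # seen in V(D)J rearrangements but no genomic reference
    return None   # neither kind of evidence: the description defines no class


# Example: a germ-line sequence never observed in a rearrangement.
print(vbase2_class(has_genomic_reference=True, found_in_rearrangement=False))
```

Under these assumptions, the example call corresponds to a class 2 entry.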

Online resources for allergy information are also available at http://www.allergen.org. Such data are valuable for investigating cross-reactivity between known allergens and for analysing potential allergenicity in proteins. The Structural Database of Allergen Proteins (SDAP) stores information on allergenic proteins. The Food Allergy Research and Resource Program (FARRP) Protein Allergen-Online Database contains sequences of known and putative allergens derived from scientific literature and public databases. Allergome emphasizes the annotation of allergens that result in an IgE-mediated disease.
