The initial phase of this project will focus on rapidly developing a database platform, built on our existing framework, for the integration of diverse experimental data.

The current database core includes key human and model-organism genomic data, along with gene function and structure information from online resources such as the Gene Ontology (GO) and InterPro. Gene homology datasets are also included, so that data can be mapped across species. To extend the current framework effectively, it will be important to understand fully the nature of the available biological data, so that they can be accurately modelled and mapped to a common core dataset. For most types of data the core dataset will be gene-centric, but in some cases, such as epidemiological or pathological datasets, data will be mapped to a relevant tissue or cancer type rather than to a gene.

Mapping of data to a core dataset will enable simple cross-referencing that will underpin all subsequent analysis.
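
As an illustration only, the mapping described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the framework's actual schema: the names CoreIndex, Record, add_gene_record, add_annotation_record and cross_reference are hypothetical, the dataset names and values are invented placeholders, and the choice of human genes as the common anchor is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    """One experimental observation attached to a core entity (hypothetical model)."""
    source: str       # name of the originating dataset (placeholder values below)
    anchor_type: str  # "gene", "tissue" or "cancer_type"
    anchor_id: str    # identifier of the core entity the record is mapped to
    value: Any


class CoreIndex:
    """Toy in-memory stand-in for the common core dataset."""

    def __init__(self, homologs: Dict[Tuple[str, str], str]):
        # Assumption: homologs maps (species, gene id) to a human gene id,
        # so that human genes act as the common anchor.
        self._homologs = homologs
        self._by_anchor: Dict[Tuple[str, str], List[Record]] = defaultdict(list)

    def add_gene_record(self, source: str, species: str, gene_id: str, value: Any) -> bool:
        """Map a gene-level record to the core; return False if it cannot be mapped."""
        if species == "human":
            core_id = gene_id
        else:
            core_id = self._homologs.get((species, gene_id))
        if core_id is None:
            return False  # no known homolog, so the record is left unmapped
        self._by_anchor[("gene", core_id)].append(Record(source, "gene", core_id, value))
        return True

    def add_annotation_record(self, source: str, anchor_type: str, anchor_id: str, value: Any) -> None:
        """Map a non-gene record (e.g. epidemiological) to a tissue or cancer type."""
        if anchor_type not in ("tissue", "cancer_type"):
            raise ValueError(f"unsupported anchor type: {anchor_type}")
        self._by_anchor[(anchor_type, anchor_id)].append(
            Record(source, anchor_type, anchor_id, value)
        )

    def cross_reference(self, anchor_type: str, anchor_id: str) -> List[Record]:
        """Return every record, from any dataset, mapped to the same core entity."""
        return list(self._by_anchor.get((anchor_type, anchor_id), []))


# Usage with placeholder data: a human and a mouse record meet at the same core gene.
index = CoreIndex({("mouse", "Trp53"): "TP53"})
index.add_gene_record("expression_study", "human", "TP53", 2.4)
index.add_gene_record("knockout_screen", "mouse", "Trp53", "reduced viability")
index.add_annotation_record("incidence_survey", "cancer_type", "breast", 0.12)
sources = [r.source for r in index.cross_reference("gene", "TP53")]
```

In this sketch, cross-referencing reduces to a lookup on the shared anchor, which is the property the common core dataset is meant to provide; a production system would hold the same relationships in database tables rather than in memory.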