Emory Integrated Computational Core
The EICC, one of the Emory Integrated Core Facilities (EICF), provides computational and bioinformatics services to Emory investigators and is the “digital hub” for the EICF. The EICC has 500 sq ft of dedicated office space on the 7th floor of the Woodruff Memorial Research Building, which is used for meeting customers, for weekly meetings with members of the Emory Integrated Genomics Core, and for monthly meetings of computational service providers from other cores within the EICF. Servers and storage are located on the Emory campus in a secure, climate-controlled data center. The EICC infrastructure and services include:
EICC Cluster: The EICC offers comprehensive computational services and bioinformatics pipelines for the analysis of -omics data. The EICC operates a small HPC system for short computational jobs.
The cluster serves multiple functions related to core projects, including running NGS analysis pipelines, high-performance and parallel computing for disciplines such as proteomics, metabolomics, and imaging, and hosting personal data analysis platforms such as Galaxy and GenePattern. It is composed of 1 head node, 5 high-memory compute nodes, 2 load-balanced web server nodes, and 1 database node.
The cluster has a 104 TB local storage array and offers access to PBNAS, a 2.0 PB storage system, as well as to Emory Isilon storage (1 PB of research-grade storage and 500 TB of HIPAA-compliant storage). A 10 Gbps ethernet switch provides a high-speed Storage Area Network (SAN) fabric, and all storage arrays and compute nodes are configured to use it for data transfer.
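To give a rough sense of what the 10 Gbps fabric means for data movement, the following minimal sketch computes a lower bound on transfer time. It assumes decimal units (1 TB = 10^12 bytes) and an idealized link; the function name and the `efficiency` parameter are illustrative assumptions, and real throughput will be lower because of protocol overhead and disk speed.

```python
def min_transfer_seconds(size_tb: float, link_gbps: float = 10.0,
                         efficiency: float = 1.0) -> float:
    """Return the idealized time in seconds to move size_tb terabytes.

    efficiency is a hypothetical fraction (0 < efficiency <= 1) of the
    nominal line rate that is actually achieved.
    """
    if size_tb < 0 or link_gbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("size must be >= 0, rate > 0, and 0 < efficiency <= 1")
    bits = size_tb * 1e12 * 8                 # terabytes -> bits (decimal units)
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits / bits_per_second


if __name__ == "__main__":
    for label, size_tb in [("1 TB dataset", 1), ("104 TB local array", 104)]:
        hours = min_transfer_seconds(size_tb) / 3600
        print(f"{label}: at least {hours:.2f} h at line rate")
```

At line rate, 1 TB needs at least 800 seconds, so a sequencing run of a few terabytes moves between storage and compute nodes in well under an hour under these idealized assumptions.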
The cluster is connected to the Internet2 high-speed network for large data transfers to and from external systems. All nodes run the 64-bit Scientific Linux 7 operating system, and Slurm is used for job submission and management.
Configuration:
- One head node: 2 x 3.3 GHz 8-core CPUs, 64 GB RAM.
- Five compute nodes: 4 x 2.2 GHz 16-core CPUs, 512 GB RAM, 10 Gbps ethernet.
- Two web servers: 2 x 3.3 GHz 8-core CPUs, 128 GB RAM, 10 Gbps ethernet.
- One database server: 2 x 3.3 GHz 8-core CPUs, 128 GB RAM, 10 Gbps ethernet.
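As an illustration of how a job might be sized against the configuration listed above, the following sketch generates a Slurm batch script for a single compute node. It assumes the compute-node figures are per node (64 cores, 512 GB RAM); the function name, default values, and job command are hypothetical, and site-specific settings such as partitions or accounts are deliberately omitted because the source does not describe them.

```python
CORES_PER_NODE = 4 * 16   # assumption: "4 x 2.2 GHz 16-core CPUs" is per compute node
MEM_GB_PER_NODE = 512     # assumption: 512 GB RAM is per compute node


def make_sbatch_script(job_name: str, command: str, cpus: int = 8,
                       mem_gb: int = 64, hours: int = 4) -> str:
    """Build a single-node Slurm batch script that fits within one compute node."""
    if not 1 <= cpus <= CORES_PER_NODE:
        raise ValueError(f"cpus must be between 1 and {CORES_PER_NODE}")
    if not 1 <= mem_gb <= MEM_GB_PER_NODE:
        raise ValueError(f"mem_gb must be between 1 and {MEM_GB_PER_NODE}")
    if hours < 1:
        raise ValueError("hours must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={hours:02d}:00:00",
        f"#SBATCH --output={job_name}-%j.out",   # %j expands to the Slurm job ID
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_sbatch_script("example", "echo running on $(hostname)",
                             cpus=16, mem_gb=128, hours=2))
```

The generated text can be saved to a file and submitted with `sbatch`. In practice the schedulable memory per node is usually somewhat below the installed 512 GB, so requests near the limit may need to be reduced.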
Amazon Web Services (AWS) Cloud Computing
Amazon Web Services (AWS) provides on-demand delivery of IT resources in the cloud with pay-as-you-go pricing. The AWS infrastructure is highly durable, available, elastic, and scalable. The Emory AWS environment is configured according to Emory business, security, and compliance practices.
Access to the Emory AWS Console must be authenticated with Emory Single Sign-On. The virtual private cloud within the environment is protected by the Emory firewall. A secure connection (SSH or RDP) to an EC2 instance must be made from a workstation already located on the Emory network or via a VPN tunnel that is authenticated by 2-factor authentication.
All AWS services have been reviewed by the Emory Security Team, and specific guidelines about utilizing these services for HIPAA or identifiable health information will be published.
Emory University provides access to the Emory AWS environment to researchers as part of the overall IT support for research. AWS computing and data storage expenses, however, are not covered by the university, and must be budgeted for in grant applications. The EICC works with LITS to provide guidance on AWS usage and optimization.
We divide computational services into two main categories.
- The first enables expert users to access existing pipelines or develop their own custom analyses.
- The second category provides investigators the ability to have analyses performed by an EICC computational/bioinformatics expert for a set fee per project.
Galaxy provides a broad set of bioinformatic tools for the analysis, manipulation, and visualization of large genome-wide datasets from many platforms, including microarrays and next-generation sequencing instruments. The EICC also supports an enterprise HIPAA-compliant LabKey server for Emory investigators. Collaborators outside Emory can also access this LabKey server infrastructure when collaborating with Emory investigators.
Standard analysis pipelines using other open-source software packages are implemented for DNA/RNA-seq/ChIP-seq/16S microbiome sequencing projects for human, animal, and microbial genomes. We have implemented the QIIME 2 pipeline for microbiome data analyses.
Custom tools or other pipelines (such as mothur) are also available. For the analysis of RNA-seq data, we have implemented a STAR and HTSeq-count pipeline. Custom tools and pipelines can be developed for specialized projects such as fusion transcript detection. For targeted sequencing, exome sequencing, and whole genome sequencing, we use a custom PEMapper and PECaller pipeline. For variant annotation, we use the bystro.io software package.
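For readers unfamiliar with the RNA-seq step mentioned above, the following sketch shows how alignment with STAR followed by read counting with htseq-count can be assembled as command lines. It is not the EICC's actual pipeline configuration: the file paths, thread count, strandedness setting, and function names are hypothetical placeholders, and the commands are only constructed and printed, not executed.

```python
import shlex


def star_command(genome_dir: str, fastq1: str, fastq2: str,
                 out_prefix: str, threads: int = 8) -> list[str]:
    """Build a STAR command for paired-end, gzip-compressed reads."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq1, fastq2,
        "--readFilesCommand", "zcat",                  # decompress .gz input on the fly
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


def star_sorted_bam(out_prefix: str) -> str:
    """Name of the coordinate-sorted BAM that STAR writes for a given prefix."""
    return out_prefix + "Aligned.sortedByCoord.out.bam"


def htseq_count_command(bam: str, gtf: str, stranded: str = "no") -> list[str]:
    """Build an htseq-count command; counts are written to standard output."""
    if stranded not in {"yes", "no", "reverse"}:
        raise ValueError("stranded must be 'yes', 'no', or 'reverse'")
    return ["htseq-count", "--format", "bam", "--order", "pos",
            "--stranded", stranded, bam, gtf]


if __name__ == "__main__":
    prefix = "sample1_"                                # hypothetical sample prefix
    align = star_command("/path/to/star_index", "sample1_R1.fastq.gz",
                         "sample1_R2.fastq.gz", prefix)
    count = htseq_count_command(star_sorted_bam(prefix), "/path/to/genes.gtf")
    print(shlex.join(align))
    print(shlex.join(count) + " > sample1_counts.txt")
```

The correct `--stranded` value depends on the library preparation protocol, so it should be confirmed for each project rather than left at the placeholder shown here.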
We have also implemented other mapping and variant identification pipelines (BWA, GATK) and use them to analyze data sets. Substantial capacity exists for these integrated computing resources to support computational/bioinformatic analyses for EICC users.