
Cool Data

By Maryalice Yakutchik

Down in the basement of the Bloomberg School’s Wolfe Street building, past a wall of antiquated mailboxes, near dubious signage indicating the whereabouts of a fax, an incessant hum emanates from a secure room monitored by a surveillance camera.

Inside that room, cold air blasts from two 10-ton air conditioners, each the size of two old telephone booths and capable of cooling about 10 average-sized homes.

“Computers like it cool,” explains Fernando Pineda, PhD, director of the Department of Biostatistics’ High Performance Scientific Computing Core (HPSCC), half of which is now housed here after the facility outgrew a server room on the third floor.

Consuming more than 30 kilowatts of power, these machines generate lots of heat and depend on prodigious air conditioning to avoid meltdowns.

That the facility—which provides large-scale research computing and storage capabilities for Johns Hopkins researchers in biostatistics, statistical genetics, genomics, computational biology and bioinformatics—burst out of its original physical space not five years after it was established was no great surprise to Pineda, an associate professor in Molecular Microbiology and Immunology.

“Genomics is very computationally intensive,” Pineda says. “Our computing and storage capacity has doubled every year for the past six years, and there’s no end in sight. It’s a chronic problem; not a problem you solve, but a problem you manage.”


Some of the HPSCC’s best customers are genetic epidemiologists in the Department of Epidemiology. They analyze the 3 billion or so base pairs that constitute the human genome to identify and characterize genes that might be linked to complex diseases such as hypertension and cancer as well as schizophrenia and autism.

“Biologists used to rely on test tubes, Petri plates and lab notebooks,” Pineda observes. “Now they use next-generation sequencing machines that spew out massive data sets requiring complex analyses involving mammoth calculations.”

For instance, just one of the HPSCC’s prolific customers (Andy Feinberg’s lab in the Center for Epigenetics at the Johns Hopkins School of Medicine) can generate as much as one terabyte of data every week. That’s 10 to the 12th bytes. To put this amount of data in material-world terms, you can think of one terabyte as 50,000 trees made into paper and printed. Ten terabytes equals the entire printed collection of the U.S. Library of Congress. And the National Archives of Britain holds more than 900 years of written material, which amounts to about 60 terabytes of data.

“We currently have capacity for 95 terabytes on spinning disk and another 50 on tape,” Pineda says, indicating racks of storage devices, each of which someone has affectionately labeled with a name: There’s Fran and Stan, and Thumper I and II. Luckily, there’s room for the inevitable Fran II and Stan II, and Thumper III and IV.

“There’s always going to be more data,” Pineda says, explaining that sequencing costs are soon bound to drop below the costs of analysis and storage.

Surprisingly, computing and storage capacity isn’t the issue that keeps Pineda awake at night. It’s the cooling, he says: “The power and the cooling. And of those two, the cooling is the big headache.”