Raw scientific data is something like gold ore — tons of rock containing a few precious nuggets. Daniela Witten, an assistant professor of biostatistics at the University of Washington in Seattle, is developing artificial intelligence programs to sort the slurry, helping researchers develop more personalized and effective treatments for cancer and other diseases.
“In the last 10 years, the field of biology has totally transformed,” says Witten, who made the Forbes 30 Under 30 list last year at age 27, with time to spare.
While a biologist a generation ago might have spent a career studying a single protein, leaps in technology now make it possible to measure thousands of proteins or map the DNA sequence of a cancer cell.
“A single experiment can generate a gigabyte of data — if not more,” says Witten.
But such data dumps can be overwhelming, and they are of little use sitting idle on a hard drive. The expanding study of genes, proteins and other components of cells promises personalized treatments, but a gap now separates the research lab from the clinic, in part because of the difficulty of making sense of massive amounts of information.
Witten is using statistical machine learning to find the nuggets of gold in the ore of biological research. Machine learning, a branch of artificial intelligence, relies on algorithms that learn patterns from data, such as those Google uses to guide online searches. A statistical analysis of the 3 billion base pairs of DNA that make up the genome of a cancer cell may be able to identify the pairs, or combinations of pairs, responsible for certain characteristics of the cancer. That analysis, Witten says, narrows the focus of further research from 3 billion candidates to a more manageable number.
Witten, the daughter of two Princeton physicists, stumbled into statistics while at Stanford.
“Before I got to college, I was planning to study foreign languages,” Witten said in an interview with the blog Simply Statistics. “Like most undergrads, I changed my mind, and eventually I majored in biology and math. I spent a summer in college doing experimental biology, but quickly discovered that I had neither the hand-eye coordination nor the patience for lab work.”
Statistics, Witten says, has broad applications, from cancer research to technology to finance.
“If you’re a statistician, you get to play in everyone’s backyard,” says Witten, who got a Ph.D. in statistics from Stanford University in 2010.
Witten is now co-writing a textbook on statistical machine learning. Intended for advanced undergraduates, the book is also aimed at professionals panning for gold in rivers of data.
Get inspired: Learn about others who are making a difference with MNN's Innovation Generation project.