Yan, Donghui

Dr. Donghui Yan

Research experience

PhD in Statistics, University of California, Berkeley

My research interests lie broadly in statistical methodology and
machine learning algorithms as well as applied statistics in various
domains.

Research projects

Some of my past and current applied projects include:

  • web log mining for anomaly detection (@Bell Labs)
  • analysis of remote sensing images for land use and land cover applications
  • analysis of tissue images and health analytics
  • personalization and recommendation for e-commerce (@WalMart Labs)
  • sentiment analysis of user reviews, salient comments and summary generation
  • Covid-19 data analysis

More information about my research projects

Potential research projects for students

(1) Recognition of poisonous spiders (toxicity level)
Spiders are the largest order of arachnids and rank seventh in total species diversity among all orders of organisms. Any given region of the world may be home to hundreds, possibly thousands, of spider species. While many are harmless to humans, some are very poisonous. It would be useful to tell whether a spider is poisonous (or extremely poisonous) from an image, such as a photo taken on the spot.

— Data source: Spider Identification Community

— May need to crawl the web for spider images and for labels indicating whether each spider is poisonous.

(2) Analysis of tree rings of California redwoods (giant sequoias)

    Giant sequoias are among the longest-living species in the world. According to the Los Angeles Times, the oldest coastal redwood is 2,520 years old and the oldest giant sequoia is about 3,200 years old. Their growth rings keep a faithful record of the climate over a long geological span, and for this reason the trees are called “living fossils”. They can be an ideal proxy for studying climate change or historical weather anomalies.

— Data source: International Tree Ring Data Bank

(3) Solar flare prediction

“…scientific output of solar research can be greatly enhanced by better exploitation of the existing solar/heliosphere space-data products jointly with ground-based observations”