Making decisions ahead of the curve requires extracting maximum information, knowledge, and even wisdom from minimal data in the shortest possible time. This study aims to uncover the most critical information in limited datasets in order to predict future trends or steer the development of events. Traditional big data research focuses primarily on analyzing large datasets, yet in practice many fields face the challenge of small sample sizes and high-dimensional data. For instance, the semiconductor and biomedical industries often need to identify correlations among numerous variables from limited samples. The research focuses on two key areas:
1. Decision Tree Models: Develop sample-efficient regression trees (SERT) and multi-layer classifiers (MLC) that learn more effectively from small datasets and yield more interpretable models. These models have demonstrated strong performance in semiconductor yield analysis and bioinformatics.
2. Variable Selection: Propose a new method for evaluating the relative importance of variables, enabling more accurate selection of key variables from high-dimensional data. This is particularly valuable for screening key genes from large-scale genomic data.
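To make the small-sample, high-dimensional setting concrete, the following is a minimal sketch (not the SERT or MLC algorithms themselves, whose details are not given here) of tree-style variable screening: each variable is scored by the error reduction of the best single axis-aligned split on it, and variables are ranked by that score. All data and names are illustrative assumptions.

```python
import random

random.seed(0)

# Synthetic small-sample, high-dimensional data: 20 samples, 50 variables.
# Only variable 0 actually drives the response (purely illustrative setup).
n, p = 20, 50
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] + random.gauss(0, 0.1) for row in X]

def sse(values):
    """Sum of squared errors around the mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split_gain(xj, y):
    """Largest SSE reduction achievable by one split on variable xj,
    i.e. the gain a regression stump (depth-1 tree) would obtain."""
    base = sse(y)
    pairs = sorted(zip(xj, y))
    best = 0.0
    for k in range(1, len(pairs)):
        left = [t for _, t in pairs[:k]]
        right = [t for _, t in pairs[k:]]
        best = max(best, base - sse(left) - sse(right))
    return best

# Rank variables by how much a single stump split on each one explains.
gains = [best_split_gain([row[j] for row in X], y) for j in range(p)]
ranking = sorted(range(p), key=lambda j: gains[j], reverse=True)
print(ranking[:5])  # the informative variable should rank first
```

Even with only 20 samples against 50 candidate variables, the informative variable stands out because its split gain reflects real structure rather than noise; this is the kind of importance signal that variable-selection methods exploit on genomic-scale data.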
Through this research, we aim to pioneer new directions in data science, especially in the analysis of small-sample, high-dimensional data. The findings not only improve prediction accuracy but also help researchers better understand the mechanisms of complex systems.