HaiYing Wang    王海鹰

Department of Statistics
University of Connecticut

Room 319 Philip E. Austin Building
215 Glenbrook Rd. U-4120
Storrs, CT 06269-4120
Phone: (860) 486-6142

About Me

Research Interests

  • Incomplete data analysis
  • Model selection and model averaging
  • Nonparametric and semi-parametric regression
  • Optimum experimental design
  • Sub-sample methods for big data

Work in progress

  1. Wang, H. and Kim, J. K. (2021+). Maximum sampled conditional likelihood for informative subsampling.   pdf

Publications


  1. Yao, Y. and Wang, H. (2020+). A review on optimal subsampling methods for massive datasets. Journal of Data Science. Accepted   pdf
  2. Zuo, L., Zhang, H., Wang, H., and Liu, L. (2020). Sampling-based estimation for massive survival data with additive hazards model. Statistics in Medicine. DOI:10.1002/sim.8783   pdf
  3. Zhang, H. and Wang, H. (2020). Distributed subdata selection for big data via sampling-based approach. Computational Statistics & Data Analysis. DOI:10.1016/j.csda.2020.107072   pdf
  4. Pronzato, L. and Wang, H. (2020). Sequential online subsampling for thinning experimental designs. Journal of Statistical Planning and Inference. Accepted pdf
  5. Yao, Y. and Wang, H. (2020). A selective review on statistical techniques for big data. In Modern Statistical Methods for Health Research, accepted. Springer   pdf
  6. Wang, H. (2020). Logistic Regression for Massive Data with Rare Events. The 37th International Conference on Machine Learning (ICML-2020). Accepted.  pdf   julia code
  7. Yu, J., Wang, H., Ai, M., and Zhang, H. (2020). Optimal distributed subsampling for maximum quasi-likelihood estimators with massive data. Journal of the American Statistical Association. DOI:10.1080/01621459.2020.1773832  pdf
  8. Cheng, Q., Wang, H., and Yang, M. (2020). Information-based optimal subdata selection for big data logistic regression. Journal of Statistical Planning and Inference. DOI:10.1016/j.jspi.2020.03.004   pdf
  9. Lee, J., Wang, H., and Schifano, E. (2020). Online updating method to correct for measurement error in big data streams. Computational Statistics and Data Analysis. DOI:10.1016/j.csda.2020.106976   pdf
  10. Hu, G. and Wang, H. (2020). Most likely optimal subsampled Markov Chain Monte Carlo. Journal of Systems Science and Complexity. Accepted. pdf
  11. Wang, H. and Ma, Y. (2020). Optimal subsampling for quantile regression in big data. Biometrika. Accepted. pdf   julia&R code
  12. Zhou, Y., Qiu, L., Wang, H., and Chen, X. (2020). Induction of Activity Synchronization among Primed Hippocampal Neurons out of Random Dynamics is Key for Trace Memory Formation and Retrieval. The FASEB Journal. 34: 3658 - 3676. DOI:10.1096/fj.201902274R
  13. Xue, Y., Wang, H., Yan, J., and Schifano, E. (2019). An Online Updating Approach for Testing the Proportional Hazards Assumption with Streams of Survival Data. Biometrics, 171-182. DOI:10.1111/biom.13137   pdf
  14. Wang, H. (2019). More Efficient Estimation for Logistic Regression with Optimal Subsample. Journal of Machine Learning Research, 20(132):1−59.  pdf  Code and Data
  15. Ai, M., Yu, J., Zhang, H. and Wang, H. (2019). Optimal Subsampling Algorithms for Big Data Generalized Linear Models. Statistica Sinica. DOI: 10.5705/ss.202018.0439. pdf
  16. Wang, H. (2019). Divide-and-Conquer Information-Based Optimal Subdata Selection Algorithm. Journal of Statistical Theory and Practice. DOI: 10.1007/s42519-019-0048-5.  pdf   julia Code
  17. Zhou, Y., Qiu, L., Sterpka, A., Wang, H., Chu, F., and Chen, X. (2019). Comparative Phosphoproteomic Profiling of Type III Adenylyl Cyclase Knockout and Control, Male, and Female Mice. Frontiers in Cellular Neuroscience, 13, 34. https://doi.org/10.3389/fncel.2019.00034.
  18. Yao, Y. and Wang, H. (2019). Optimal subsampling for softmax regression. Statistical Papers, 60(2):235-249.  pdf
  19. Wang, H., Yang, M. and Stufken, J. (2019) Information-Based Optimal Subdata Selection for Big Data Linear Regression. Journal of the American Statistical Association, 393-405.  pdf   R Package  Updated R code
  20. Wang, H., Zhu, R. and Ma, P. (2018) Optimal Subsampling for Large Sample Logistic Regression. Journal of the American Statistical Association, 829-844.  pdf  R package
  21. Stang, S., Wang, H., Gardner, K., and Mo, W. (2018). Influences of water quality and climate on the water-energy nexus: A spatial comparison of two water systems. Journal of Environmental Management, 218, 613-621.
  22. Zhang, X., Wang, H., Ma, Y. and Carroll, R. J. (2017) Linear Model Selection when Covariates Contain Errors. Journal of the American Statistical Association, 1553-1561.  pdf  Supplementary
  23. Li, Y., He, X., Wang, H. and Sun, J. (2016) Joint Analysis of Longitudinal Data and Informative Observation Times with Time-Dependent Random Effects. New Developments in Statistical Modeling, Inference and Application, 37-51.   pdf
  24. Lane, A., Wang, H. and Flournoy, N. (2016) Conditional inference in two-stage response-adaptive experiments via the bootstrap. MoDa 11 - Advances in Model-Oriented Design and Analysis, 173-181. Springer.  pdf
  25. Mo, W., Wang, H. and Jacobs, J. (2016). Understanding the influence of climate change on the embodied energy of water supply. Water Research, 220-229.
  26. Wang, H., Schaeben, H. and Keidel, F. (2015) Optimized Subsampling for Logistic Regression with Imbalanced Large Datasets. Proceedings of the 17th annual conference of the International Association for Mathematical Geosciences, 1113-1119.
  27. Li, Y., He, X., Wang, H. and Sun, J. (2015) Semiparametric Regression of Multivariate Panel Count Data with Informative Observation Times. Journal of Multivariate Analysis, 140, 209-219. pdf
  28. Li, Y., He, X., Wang, H. and Sun, J. (2016) Regression Analysis of Longitudinal Data with Correlated Censoring and Observation Times. Lifetime Data Analysis, 22, 343-362. pdf
  29. Wang, H. and Flournoy, N. (2015) On the consistency of the maximum likelihood estimation for the three parameter lognormal distribution. Statistics and Probability Letters, 105, 57-64. pdf
  30. Wang, H., Li, Y. and Sun, J. (2015). Focused and Model Average Estimation for Panel Count Data. Scandinavian Journal of Statistics, 42, 732-745. pdf
  31. Wang, H., Chen, X. and Flournoy, N. (2016). The Focus Information Criterion and Model Averaging for Varying-Coefficient Partially Linear Measurement Error Models. Statistical Papers, 57, 99-113. pdf
  32. Wang, H., Flournoy, N., and Kpamegan, E. (2014). A new bounded log-linear regression model. Metrika, 77(5), 695–720.  pdf
  33. Wang, H., Pepelyshev, A. and Flournoy, N. (2013). Optimal design for a new bounded log-linear regression model. MoDa 10 - Advances in Model-Oriented Design and Analysis, 237-245. Springer. pdf
  34. Wang, H. and Zhou, S. Z. F. (2013). Interval Estimation by Frequentist Model Averaging. Communications in Statistics - Theory and Methods, 42, 4342-4356. pdf
  35. Wang, H., Zou, G. and Wan, A. T. K. (2013). Adaptive Lasso for Varying-Coefficient Partially Linear Measurement Error Models. Journal of Statistical Planning and Inference, 143, 40-54. pdf
  36. Wang, H. and Sun, D. (2012). Objective Bayesian analysis of a truncated model. Statistics and Probability Letters, 82, 2125-2135. pdf
  37. Wang, H., Zou, G. and Wan, A. T. K. (2012). Model Averaging for Varying-Coefficient Partially Linear Measurement Error Models. Electronic Journal of Statistics, 6, 1017-1039.
  38. Wang, H. and Zou, G. (2012). Frequentist Model Averaging Estimation for Linear Errors-in-Variables Models. Journal of Systems Science and Mathematical Sciences, 32 (2), 1-14. pdf
  39. Kozak, M., Wang, H. (2010). On stochastic optimization in sample allocation among strata. METRON, LXVIII n.1, pp. 95-103. pdf
  40. Wang, H., Zhang, X. and Zou, G. (2009). Frequentist model averaging estimation: a review. Journal of Systems Science and Complexity, 22 (4), 732-748. pdf
  41. Feng, S., Ding, W., Wang, H., Yu, Z., Chen, Y., Zhang, Y. and Xiao, H. (2008). Sampling procedures for inspection by attributes-Part 3: Skip-lot sampling procedures. In Chinese National Standard, GB/T2828.3-2008.

Teaching


  • At the University of Missouri
    • Statistics 1200 - Introductory Statistical Reasoning, Fall 2010, Spring 2011, Fall 2011 (3cr.)
    • Statistics 2500 - Introduction to Probability and Statistics I, Spring 2012 (3cr.)
    • Statistics 3500 - Introduction to Probability and Statistics II, Fall 2012, Spring 2013 (3cr.)
  • At the University of New Hampshire
    • Math 539 - Introduction to Statistical Analysis, Fall 2014 (4cr.)
    • Math 644 - Statistics for Engineers and Scientists, Fall 2013, Spring 2014, Fall 2014 (4cr.)
    • Math 736/836 - Advanced Statistical Methods for Research, Spring 2014, Spring 2015, Spring 2016 (4cr.)
    • Math 739/839 - Applied Regression Analysis, Fall 2016 (4cr.)
    • Math 755/855 - Probability with Applications, Fall 2015, Fall 2016 (4cr.)
    • Math 756/856 - Principles of Statistical Inference, Spring 2016, Spring 2017 (4cr.)
    • Math 969 - Topics in Probability and Statistics, Spring 2017 (3cr.)
  • At the University of Connecticut
    • BIST/STAT 5505 - Applied Statistics I, Fall 2017, 2018, 2019 (3cr.)
    • STAT 3115Q - Analysis of Experiments, Spring 2018 (3cr.)
    • BIST/STAT 6494 - Statistical Inference for Big Data, Spring 2018 (3cr.)
    • BIST/STAT 5535 - Nonparametric Methods, Fall 2018, 2020 (3cr., taught using Julia)
    • BIST/STAT 5605 - Applied Statistics II, Spring 2019, 2020 (3cr.)