1. Random-walk Domination in Large Graphs
Jeffrey Xu Yu, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, China
In this talk, we discuss two random-walk domination problems in graphs that are motivated by a number of practical applications, including the item-placement problem in online social networks, the ad-placement problem in advertisement networks, and the resource-placement problem in P2P networks. Consider a graph G. The goal of the first type of random-walk domination problem is to target k nodes such that the total hitting time of an L-length random walk starting from the remaining nodes to the targeted nodes is minimized. The goal of the second type of random-walk domination problem is to find k nodes to maximize the expected number of nodes that hit any one targeted node through an L-length random walk. We show that these problems are two special instances of the problem of submodular set function maximization with a cardinality constraint. We discuss a dynamic-programming (DP) based greedy algorithm with a near-optimal performance guarantee. The DP-based greedy algorithm, however, is not very efficient due to the expensive marginal gain evaluation. To further speed up the algorithm, we propose an approximate greedy algorithm with linear time complexity w.r.t. the graph size and also with a near-optimal performance guarantee. The approximate greedy algorithm is based on carefully designed random walk sampling and sample-materialization techniques. Our extensive experiments demonstrate the effectiveness, efficiency and scalability of the proposed algorithms.
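As a rough illustration of the second problem, the sketch below samples L-length random walks from every node, stores them, and greedily picks the node that appears in the most walks not yet covered. It is a minimal sketch under stated assumptions, not the speaker's algorithm: the function names, the `walks_per_node` parameter, the adjacency-list input, and the convention that a walk counts as hitting its own start node are all assumptions made for this example.

```python
import random

def sample_walks(adj, length, walks_per_node, seed=0):
    """Sample random walks and keep, for each walk, the set of nodes it visits.

    adj maps each node to a list of neighbours (an assumed input format).
    The start node is included, so a targeted node counts as hitting itself.
    """
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            node, visited = start, {start}
            for _ in range(length):
                if not adj[node]:
                    break  # dead end: the walk stops early
                node = rng.choice(adj[node])
                visited.add(node)
            walks.append(visited)
    return walks

def greedy_targets(adj, k, length, walks_per_node=50, seed=0):
    """Greedily choose up to k target nodes using the stored walk samples.

    Returns the chosen nodes and an estimate of the expected number of
    nodes whose walk hits at least one chosen node.
    """
    walks = sample_walks(adj, length, walks_per_node, seed)
    covered = [False] * len(walks)
    chosen = []
    for _ in range(k):
        # Marginal gain of v = number of uncovered walks that visit v.
        gain = {v: 0 for v in adj if v not in chosen}
        if not gain:
            break
        for i, visited in enumerate(walks):
            if not covered[i]:
                for v in visited:
                    if v in gain:
                        gain[v] += 1
        best = max(gain, key=gain.get)
        chosen.append(best)
        for i, visited in enumerate(walks):
            if best in visited:
                covered[i] = True
    return chosen, sum(covered) / walks_per_node
```

On a star graph with L = 1, every sampled walk visits the centre, so the sketch selects the centre first. Reusing the stored walks for every marginal-gain evaluation is what keeps each greedy round cheap in this toy version.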
Dr. Jeffrey Xu Yu is a Professor in the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. His current main research interests include graph mining, graph query processing, graph pattern matching, keyword search in databases, and online social networks. Dr. Yu served as an Information Director and a member of the ACM SIGMOD executive committee (2007-2011), an associate editor of IEEE Transactions on Knowledge and Data Engineering (2004-2008), and an associate editor of the VLDB Journal (2007-2013). Currently he serves as an associate editor of PVLDB (2014), WWW Journal, the International Journal of Cooperative Information Systems, and the Journal on Health Information Science and Systems (HISS). Dr. Yu has served on over 300 organization committees and program committees of international conferences and workshops, including as PC Co-chair of APWeb'04, WAIM'06, APWeb/WAIM'07, WISE'09, PAKDD'10, DASFAA'11, ICDM'12, and NDBC'13, and Conference Co-Chair of APWeb'13. He has published papers in reputed journals and major international conferences including ACM TODS, VLDB J, IEEE TKDE, ACM SIGMOD, ACM SIGKDD, VLDB, IEEE ICDE, and IEEE ICDM.
2. Intelligent Data Analysis: From Mining to Science
Xiaohui Liu, Brunel University, UK (Xiaohui.Liu@brunel.ac.uk)
Making good sense of data is challenging at the best of times, often seen as “work of art” when analysing big data. This talk will look at the big data phenomenon from an evolutionary process viewpoint and examine some of the key issues affecting the transition of data analytics from informal mining activities to solid scientific practices. Intelligent data analysis is an interdisciplinary study concerned with the effective analysis of data, which can help facilitate this transition.
Professor Xiaohui Liu is Director of the Centre for Intelligent Data Analysis at Brunel University. He has advised funding agencies on interdisciplinary research programs in genomics, security and data analytics as well as the Royal Statistical Society on the training of next generation data scientists in light of big data challenges. Professor Liu has published more than 100 papers in leading journals, and his H-index is over 40 on Google Scholar, Scopus, and the Web of Science.
3. Graph based Analytics
Wei Wang, professor in the Department of Computer Science at University of California at Los Angeles and the director of the Scalable Analytics Institute (ScAI)
Graphs are ubiquitous in real-life applications. A large volume of graph data has been generated, such as social networks, biological interaction networks, program flows, and molecular structures. There is a strong need for efficient subgraph clustering and classification. I will present our recent work in this area, which includes an efficient algorithm for discriminative subgraph mining for graph classification, and a multi-domain graph clustering model. We have demonstrated that these models and methods can achieve better clustering quality and higher classification accuracy on various real and synthetic datasets. I will also outline some ongoing efforts at the newly established UCLA Scalable Analytics Institute.
Wei Wang is a professor in the Department of Computer Science at University of California at Los Angeles (UCLA) and the inaugural director of the Scalable Analytics Institute. She received her PhD degree in Computer Science from UCLA in 1999. She was a professor in Computer Science at the University of North Carolina at Chapel Hill from 2002 to 2012, and was a research staff member at the IBM T. J. Watson Research Center between 1999 and 2002. Dr. Wang's research interests include big data, data mining, bioinformatics and computational biology, and databases. Dr. Wang received the IBM Invention Achievement Awards in 2000 and 2001. She was the recipient of an NSF Faculty Early Career Development (CAREER) Award and named a Microsoft Research New Faculty Fellow in 2005. She was honored with the 2007 Phillip and Ruth Hettleman Prize for Artistic and Scholarly Achievement at UNC. She received the 2012 IEEE ICDM Outstanding Service Award and an Okawa Research Award in 2013. Dr. Wang has been an associate editor of the IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Knowledge Discovery in Data, Journal of Knowledge and Information Systems, International Journal of Knowledge Discovery in Bioinformatics, and an editorial board member of the International Journal of Data Mining and Bioinformatics and the Open Artificial Intelligence Journal. She serves on the organization and program committees of international conferences including ACM SIGMOD, ACM SIGKDD, ACM BCB, VLDB, ICDE, EDBT, ACM CIKM, IEEE ICDM, SIAM DM, SSDBM, BIBM.
4. Nonconvex Regularized Optimization for Sparse Approximations
Xiaojun Chen, Department of Applied Mathematics, The Hong Kong Polytechnic University, China
Minimization problems with nonsmooth, nonconvex, perhaps even non-Lipschitz regularization terms have wide applications in image restoration, signal reconstruction and variable selection. On non-Lipschitz regularized minimization, we show that finding a global optimal solution is strongly NP-hard. On the other hand, we present lower bounds of nonzero entries in every local optimal solution without assumptions on the data matrix. Such lower bounds can be used to classify zero and nonzero entries in local optimal solutions and select regularization parameters for desirable sparsity of local optimal solutions. Moreover, we introduce several efficient algorithms including reweighted minimization algorithms, smoothing quadratic regularization algorithms, smoothing trust region Newton methods and interior point algorithms. Examples with six widely used nonsmooth nonconvex regularization terms are presented to illustrate the theory and algorithms.
Professor Xiaojun Chen received her PhD degree in Computational Mathematics from Xi'an Jiaotong University, China in 1987 and a PhD degree in Applied Mathematics from Okayama University of Science, Japan in 1991. She was a postdoctoral fellow at the University of Delaware, an Australia Research Fellow at the University of New South Wales and a Professor at Hirosaki University, Japan. She joined the Hong Kong Polytechnic University as a Professor in 2007, and became a Chair Professor of Applied Mathematics and Head of the Department of Applied Mathematics in 2013. Her current research interests include nonsmooth nonconvex optimization in high dimension, and stochastic and dynamic equilibrium problems with important applications in engineering and economics. She has published over 100 papers in major international journals in operations research and computational mathematics. Prof. Chen has won many grants as a principal investigator from several government funding agencies and organized several important international conferences. She serves on the editorial boards of eight mathematical journals including SIAM Journal on Numerical Analysis.
5. Big Data Analytics and Modeling for Healthcare
Jiming Liu, Department of Computer Science, Hong Kong Baptist University, China
Healthcare around the world is in the midst of transformation. The global challenges it faces amongst many include: (1) effective surveillance and prevention of diseases, and (2) efficient and optimal utilization of services. In this talk, I am going to shed light on how big data analytics and modeling will play an instrumental role in solving complex healthcare problems. I will present case studies from our recent work on data-driven modeling and hidden-cause discovery in dealing with the above two healthcare challenges.
Jiming Liu is Chair Professor in Computer Science and Associate Dean of Faculty of Science (Research) at Hong Kong Baptist University. He is a Fellow of the IEEE. Prof. Liu received his M.Eng. and Ph.D. degrees from McGill University, Canada. His current research focuses on Data Mining and Data Analytics, Health Informatics, Computational Epidemiology, Complex Systems, Multi-Agent Computing, and Collective Intelligence. Prof. Liu has served as the Editor-in-Chief of Brain Informatics: Brain Data Computing and Health Studies (Springer) and Web Intelligence and Agent Systems (IOS), and an Associate Editor of IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Cybernetics, Big Data and Information Analytics (AIMS), Computational Intelligence (Wiley), and Neuroscience and Biomedical Engineering (Bentham), among others.
6. Ranking Fraud Detection for Mobile Apps: A Holistic View
Hui Xiong, Rutgers, the State University of New Jersey, USA
Ranking fraud in the mobile App market refers to fraudulent or deceptive activities intended to bump Apps up the popularity list. Indeed, it is becoming more and more frequent for App developers to use shady means, such as inflating their Apps' sales or posting phony App ratings, to commit ranking fraud. While the importance of preventing ranking fraud has been widely recognized, there is limited understanding and research in this area. In this talk, we provide a holistic view of ranking fraud and introduce a ranking fraud detection system for mobile Apps. Specifically, we investigate two types of evidence, ranking-based evidence and rating-based evidence, by modeling Apps' ranking and rating behaviors through statistical hypothesis tests. In addition, we propose an optimization-based aggregation method to integrate all the evidence for fraud detection. Finally, we evaluate the proposed system with real-world App data collected from the iOS App Store over a long time period. In the experiments, we validate the effectiveness of the proposed system, and show the scalability of the detection algorithm as well as some patterns of ranking fraud activities.
Dr. Hui Xiong is currently an Associate Professor and the Vice Chair of the Management Science and Information Systems Department, and the Director of Rutgers Center for Information Assurance at Rutgers, the State University of New Jersey, where he received a two-year early promotion/tenure (2009), the Rutgers University Board of Trustees Research Fellowship for Scholarly Excellence (2009), and the ICDM-2011 Best Research Paper Award (2011). Dr. Xiong received his Ph.D. in Computer Science from the University of Minnesota (UMN), USA, in 2005, the B.E. degree in Automation from the University of Science and Technology of China (USTC), China, and the M.S. degree in Computer Science from the National University of Singapore (NUS), Singapore. His general area of research is data and knowledge engineering, with a focus on developing effective and efficient data analysis techniques for emerging data intensive applications. He has published prolifically in refereed journals and conference proceedings (3 books, 50+ journal papers, and 60+ conference papers). He is the co-Editor-in-Chief of Encyclopedia of GIS by Springer, and an Associate Editor of IEEE Transactions on Knowledge and Data Engineering (TKDE) as well as the Knowledge and Information Systems (KAIS) journal. He has served regularly on the organization and program committees of numerous conferences, including as a Program Co-Chair of the Industrial and Government Track for the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), a Program Co-Chair for the IEEE 2013 International Conference on Data Mining (ICDM). He is a senior member of the ACM and the IEEE.
7. Training Data Scientists
Yangyong Zhu, Shanghai Key Lab of Data Science, School of Computer Science, Fudan University, China
Nowadays, almost all disciplines generate huge amounts of data for their applications, such as business analytics, healthcare, and hazard monitoring. Even worse, those data come from interdisciplinary sources at such high velocity that few existing theories or methods can be used to analyze and process them. The data bring big challenges as well as big opportunities for research and applications in the real world and have swept into every corner of industry and business, and even into government administration and research areas. Therefore, it is urgent to boost research and education in the new theories and methods that are at the core of the emerging discipline, i.e., Data Science. Alongside the new discipline is the new job of data scientist, “the sexiest job” of the 21st century. Data scientist teams have been built up by almost all of the data giants to strengthen their competitiveness in the market for data analytics. However, there is a lack of the talent necessary for industries, government, research institutes and universities to take advantage of big data. In the talk, we recommend steps to train data scientists to meet the challenges and bridge the talent gap.
Yangyong Zhu is a Professor of Computer Science at Fudan University, Shanghai, China. He received a Ph.D. degree in Computer and Software Theory from Fudan University, China, in 1994. His research interests include databases, data mining and their applications in finance, economics, insurance, bioinformatics and sociology. His research has been supported in part by the National High-Tech Research and Development Plan (863) of China, the National Natural Science Foundation of China (NSFC), the Development Fund of Shanghai Science and Technology Commission, and Shanghai Leading Academic Discipline Project under Grant. He is a Doctoral Supervisor in the School of Computer Science, Fudan University, the director of the Dataology and DataScience Research Center, and a member of the ACM Computer Society and the Chinese Computer Federation.
8. Ranking and Total Ordering for Fuzzy Data
Zhenyuan Wang, a tenured full professor in the Department of Mathematics, University of Nebraska at Omaha, USA.
Decision making and data mining in fuzzy environment need to deal with fuzzy quantities. The most popular fuzzy quantities are fuzzy numbers. The concept of fuzzy number is a generalization of real numbers. Ranking or total ordering as a necessary means to compare the magnitude of fuzzy numbers is essential in fuzzy data mining. Up to now, there are more than 30 different ranking methods presented in literature. Though choosing one from those various ranking methods depends on the given real problem, some criteria are still needed to measure the reasonability and their goodness. Of course, total orderings are preferred, by which various rankings can be generated. In fact, we have recently developed a method to define total orderings on the set of all fuzzy numbers based on a new Decomposition Theorem. This new decomposition theorem can also lead to an important conclusion on the cardinality of the set of all fuzzy numbers.
Professor Zhenyuan Wang graduated from Fudan University in 1962. He received his Ph.D. from the Department of Systems Science, State University of New York at Binghamton in 1991. He taught various mathematical courses at Hebei University for many years from 1962, supervised graduate students from 1978, and served as the Chair of the Mathematics Department there from 1985 to 1990. He was a visiting scholar, visiting professor, or research fellow at University Paris VI, Binghamton University (SUNY), the Chinese University of Hong Kong, New Mexico State University, and University of Texas at El Paso during the period from 1979 to 2008. Currently, he is a tenured full professor in the Department of Mathematics, University of Nebraska at Omaha, USA. He received a number of honors and awards including the title of National Expert from the Chinese National Scientific and Technological Commission in 1986 and the Citation Classic Award from the Institute for Scientific Information (USA) in 2000. His research interests are nonadditive measures, nonlinear integrals, probability and statistics, optimization, soft computing, and data mining. He is the author or a co-author of more than 150 research papers and three monographs: “Fuzzy Measure Theory” (1992), “Generalized Measure Theory” (2008), and “Nonlinear Integrals and Their Applications in Data Mining” (2010). Currently, he is serving as an associate editor or editorial board member for four international journals.
9. Big Data Analytics and Data Science
Yong Shi, the Executive Deputy Director, Chinese Academy of Sciences Research Center on Fictitious Economy & Data Science.
At present, Big Data has become a reality that no one can ignore. Big Data is our environment whenever we need to make a decision. Big Data is a buzzword that makes everyone understand how important it is. Big Data presents a big opportunity for academia, industry and government. Big Data is also a big challenge for all parties. This talk will discuss some fundamental issues of Big Data problems, such as data heterogeneity vs. decision heterogeneity, data stream research and data-driven decision management. Furthermore, this talk will present a number of real-life Big Data applications. In conclusion, the talk suggests a number of open research problems in Data Science, which is a growing field beyond Big Data.
10. Brain Informatics: Brain Big Data Computing and Health Studies
Ning Zhong, Department of Life Science and Informatics, Maebashi Institute of Technology, Japan, and The International WIC Institute, Beijing University of Technology, China
Brain Informatics (BI) is a new interdisciplinary and multidisciplinary field that focuses on studying the mechanisms underlying the human information processing system. It brings together researchers and practitioners from diverse fields to explore the main research problems that lie in the interplay between the studies of human brain and the research of informatics, by using powerful equipment, including functional magnetic resonance imaging (fMRI), electroencephalogram (EEG), positron emission tomography (PET), and eye-tracking as well as various wearable, portable, micro and nano devices. The systematic BI methodology has resulted in brain big data, including various raw brain data, data-related information, extracted data features, found domain knowledge related to human intelligence, and so forth. In this talk, I demonstrate a systematic approach to an integrated understanding of macroscopic and microscopic level working principles of the brain by means of experimental, computational, and cognitive neuroscience studies, as well as utilizing advanced Web intelligence centric information technologies. I discuss research issues and challenges with respect to brain data computing from three aspects of Brain Informatics studies that deserve closer attention: systematic investigations for complex brain science problems, new information technologies for supporting systematic brain science studies, and Brain Informatics studies based on Web intelligence research needs. These three aspects offer different ways to study traditional cognitive science, neuroscience, mental health and artificial intelligence.
Ning Zhong received the Ph.D. degree from the University of Tokyo. He is currently head of Knowledge Information Systems Laboratory, and a professor in Department of Life Science and Informatics at Maebashi Institute of Technology, Japan. He is also director and an adjunct professor in the International WIC Institute (WICI), Beijing University of Technology. Prof. Zhong's present research interests include Web Intelligence (WI), Brain Informatics (BI), Data Mining, Granular Computing, and Intelligent Information Systems. In 2000 and 2004, Zhong and colleagues introduced WI and BI as new research directions, respectively. Currently, he is focusing on "WI meets BI" research and brain data computing with three aspects: (1) systematic investigations for complex brain science problems; (2) new information technologies for supporting systematic brain science studies; and (3) BI studies based on WI research needs. The synergy between WI and BI advances our ways of analyzing and understanding of data, information, knowledge, wisdom, as well as their interrelationships, organizations, and creation processes, to achieve human-level Web intelligence reality. In 2010, Zhong and colleagues extended such a vision to develop Wisdom Web of Things (W2T) as a holistic framework for computing and intelligence in the big data era. Such interdisciplinary studies make up the field of brain informatics and its applications in brain big data computing, health studies, ICT for smart-city, brain-inspired intelligent systems among others.
11. Hashing for Big Data Mining
Xingquan Zhu, Associate professor in the Department of Computer & Electrical Engineering and Computer Science, Florida Atlantic University.
Big data applications commonly feature large-scale unstructured data with complex relationships and dynamically increasing volumes, where finding similarities between unstructured data and building predictive models for data with complex relationships and dynamic volumes are two fundamental challenges. In big data environments, while data volumes, feature dimensions, and data relationships are continuously evolving, hashing provides an effective way to convert unbounded data input into a constrained data representation, through which we can build effective data mining models. In this talk, we will propose to use hashing to tackle two fundamental challenges in big data mining: (1) how to characterize the similarity between large-scale text documents by taking semantic context information into consideration; and (2) how to build effective classification models for dynamically changing networks.
In the first part of the talk, we will propose context-preserving hashing to calculate similarities between texts while preserving context information. While many methods have been proposed that use min-wise hashing, random projection, feature hashing, etc. to calculate text similarities, all these methods use a “flat-set” data representation, so they cannot preserve content information and semantic hierarchy in the text. To take into account semantic hierarchy in the texts, we consider a notion of “multi-level exchangeability” which can be applied at word level, sentence level, paragraph level, etc. A multi-level exchangeable object is represented by a nested set, with recursive min-wise hashing being used to calculate the similarities between texts in a very effective and efficient way.
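For readers unfamiliar with the flat-set baseline mentioned above, the following is a minimal sketch of plain min-wise hashing for estimating Jaccard similarity between two token sets. It does not implement the nested-set, recursive variant described in the talk. The helper names, the use of SHA-1 as the hash family, and the default of 256 hash functions are assumptions chosen for this illustration, and the token sets are assumed to be non-empty.

```python
import hashlib

def _hash(seed, token):
    """One member of an assumed hash family: SHA-1 over 'seed:token'."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_signature(tokens, num_hashes=256):
    """Signature = the minimum hash value of the set under each hash function."""
    tokens = set(tokens)  # flat-set view: order and nesting are discarded
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Because the signature depends only on the set of tokens, two documents with the same words in a different order receive identical signatures, which is the loss of context that the abstract attributes to flat-set methods.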
In the second part of the talk, we will use graph hashing and factorization to build graph classification models for large-scale dynamic networks with evolving topological structures. Because network structures are continuously changing, with new nodes and edges being included in (or excluded from) the network, we propose to use node hashing to convert a dynamic network to a compressed network, and then use cliques as sub-graph features to represent graphs. A graph factorization approach is also proposed to ensure that the selected sub-graph features incur minimum information loss during graph representation. As a result, our model can effectively adapt to the dynamic changes in the networks for classification.
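One simple way to picture node hashing is to map every node identifier into a fixed number of buckets and to accumulate edge counts between buckets, so the compressed graph keeps a bounded size however many nodes arrive. The sketch below follows that reading; it is an assumption-laden illustration rather than the method presented in the talk, and the function names, the SHA-1 based bucket function, and the edge-list input are all hypothetical choices.

```python
import hashlib
from collections import Counter

def bucket(node_id, num_buckets):
    """Map an arbitrary node identifier to one of num_buckets buckets."""
    digest = hashlib.sha1(str(node_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def compress_edges(edges, num_buckets, weights=None):
    """Fold a stream of undirected edges into a weighted bucket-level graph.

    Pass the Counter returned by an earlier call as `weights` to keep
    updating the same compressed graph as new edges arrive.
    """
    weights = Counter() if weights is None else weights
    for u, v in edges:
        a, b = sorted((bucket(u, num_buckets), bucket(v, num_buckets)))
        # Edges whose endpoints collide in one bucket become self-loops (a == b).
        weights[(a, b)] += 1
    return weights
```

New nodes never enlarge the compressed graph beyond `num_buckets` vertices, at the cost of merging distinct nodes that share a bucket.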
Xingquan Zhu is an associate professor in the Department of Computer & Electrical Engineering and Computer Science, Florida Atlantic University. His research interests mainly include data mining, machine learning, and multimedia systems. Since 2000, he has published more than 170 refereed journal and conference papers in these areas, including two Best Paper Awards and one Best Student Paper Award. Dr. Zhu was an associate editor of the IEEE Transactions on Knowledge and Data Engineering (2008-2012), and is currently serving on the Editorial Board of the International Journal of Social Network Analysis and Mining SNAM (2010-date) and the Network Modeling Analysis in Health Informatics and Bioinformatics Journal (2014-date). He served or is serving as a program committee co-chair for the 14th IEEE International Conference on Bioinformatics and BioEngineering (BIBE-2014), IEEE International Conference on Granular Computing (GRC-2013), 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI-2011), and the 9th International Conference on Machine Learning and Applications (ICMLA-2010). He also served as a conference co-chair for ICMLA-2012.
12. Medical Big Data: Medical Data Mining and Innovative Applications
Yanchun Zhang, Director, Centre for Applied Informatics, Victoria University, Australia
Due to the recent development or maturation of database, data storage, data capturing, patient monitoring and sensor technologies, huge volumes of medical and health data have been generated at hospitals and medical organizations at unprecedented speed. Those data are a very valuable resource for improving health delivery, health care and decision making, and for better risk analysis and diagnosis. Health care and medical services are now becoming more data-intensive and evidence-based since electronic health records are used to track individuals' and communities' health information (particularly changes). These substantially motivate and advance the emergence and the progress of data-centric health data and knowledge management research and practice. In this talk, we will introduce several innovative data mining techniques and case studies to address the challenges encountered in e-health and medical big data. This includes techniques and developments on data streams, data clustering, correlation analysis, pattern recognition, abnormality detection and risk prediction.
Yanchun Zhang is a full Professor and Director of the Centre for Applied Informatics at Victoria University. Dr. Zhang obtained a PhD degree in Computer Science from The University of Queensland in 1991. His research interests include databases, data mining, web services and e-health. He has published over 220 research papers in international journals and conference proceedings including top journals such as ACM Transactions on Computer and Human Interaction (TOCHI) and IEEE Transactions on Knowledge and Data Engineering (TKDE), and a dozen books and journal special issues in the related areas. Dr. Zhang is a founding editor and editor-in-chief of World Wide Web Journal (Springer) and Health Information Science and Systems Journal (BioMed Central), and also the founding editor of the Web Information Systems Engineering Book Series and the Health Information Science Book Series. He is Chairman of the International Web Information Systems Engineering Society (WISE). He was a member of the Australian Research Council's College of Experts (2008-2010), and is one of the National "Thousand Talents Program" Experts in China (2010-- ).
13. Ensemble Learning for Cross-Selling Using Multitype Multiway Data
Minghe Sun, the University of Texas at San Antonio, USA
Cross-selling is an integral component of customer relationship management. Using relevant information to improve the customer response rate is a challenging task in cross-selling. Incorporating multitype multiway customer behavioral data, including related product, similar customer and historical promotion data, into cross-selling models is helpful in improving the classification performance. Customer behavioral data can be represented by multiple high-order tensors. Most existing supervised tensor learning methods cannot directly deal with heterogeneous and sparse multiway data in cross-selling. In this study, two novel ensemble learning methods, multiple kernel support tensor machine (MK-STM) and multiple support vector machine ensemble (M-SVM-E), are proposed for cross-selling using multitype multiway data. The MK-STM and the M-SVM-E can also perform feature selection from large sparse multitype multiway data. Based on these two methods, collaborative and non-collaborative ensemble learning frameworks are developed. In these frameworks, many existing classification and ensemble methods can be combined for classification using multitype multiway data. Computational experiments are conducted on two databases extracted from open access databases. The experimental results show that the MK-STM exhibits the best performance and outperforms existing supervised tensor learning methods.
Minghe Sun is a professor at the University of Texas at San Antonio. He received his Ph.D. in Management Science and Information Technology from the University of Georgia in 1992, his MBA from the Chinese University of Hong Kong in 1987 and his BS degree in metallurgy from Northeastern University in China in 1982. His work has appeared in almost all major journals in Management Science/Operations Research/Decision Sciences. He received the prestigious Decision Sciences Institute Elwood S. Buffa Doctoral Dissertation Competition Award in 1993 and the Decision Sciences Institute Best Theoretical/Empirical Research Paper Award twice, once in 2003 and once in 2006. He also received the University of Texas System Chancellor’s Council Outstanding Teaching Award in 1999.
14. Big data research in business application: Some thoughts
Wei Huang, Professor, Xi’an Jiaotong University, China, and Ohio University, USA
This talk shares some thoughts on the latest research in business applications of big data, including the big data industry, CDO and data quality.
Dr. W. Huang is a tenured full professor in the MIS department at Ohio University. He has worked as a faculty member and/or visiting fellow/professor in global research universities such as University of New South Wales (Australia), Tsinghua University, Harvard University, and Xi’an Jiaotong University. His main research interests include using emerging IS to support organizational/business decision making, social media, Big Data/information quality, etc. He has been awarded research grants from research universities in Australia, Hong Kong, China, Singapore and USA (with other research collaborators) in the past twenty years. Wayne has had more than 25 years of full-time teaching/research experience in universities as well as years of IT industrial working experience. He has published more than 120 refereed research papers in international journals, book chapters, and conference proceedings, including leading international IS journals such as MIS Quarterly; IEEE Transactions; Journal of Management Information Systems (JMIS); Communications of ACM (CACM); ACM Transaction on Information Technology (ACM TOIT). His research work and published papers have been cited by the top-tier international IS journals, including MIS Quarterly (MISQ), Journal of MIS (JMIS), Information Systems Research (ISR), and IEEE transactions.
15. Heterogeneous Information Modeling towards Multimedia Knowledge Discovery
Qingming Huang, Professor with the University of Chinese Academy of Sciences (CAS), China, and an Adjunct Research Professor with the Institute of Computing Technology, CAS.
With the prevalence of online multimedia services and social media services, the amount of heterogeneous online media (e.g., text, images, videos and audio from different online platforms and social networks) is growing at an explosive rate. Since the heterogeneous media data are endowed with rich structural context and social attributes, methodologies built on single media are not able to deal with wild, open content in heterogeneous media and satisfy the diversified requirements of online users. To effectively and efficiently discover knowledge from large online multimedia data corpora, we introduce recent technical advances in heterogeneous media information modeling. First, we discuss several machine learning based approaches which are able to detect meaningful and semantically consistent information (e.g., semantic labels and dense clusters) from noisy high-dimensional multimedia data. Second, we give a brief technical insight into research on multi-modal topic detection, which is a challenging issue in Web multimedia data mining. We discuss several cross-modal and cross-domain correlation learning models which construct compact representations of heterogeneous media of different modalities and different platforms. Last, we present an overview of heterogeneous information modeling and discuss some potential research directions for future study in the multimedia research domain.
Qingming Huang (SM’08) received the B.S. degree in computer science and Ph.D. degree in computer engineering from Harbin Institute of Technology, Harbin, China, in 1988 and 1994, respectively. He is currently a Professor with the University of Chinese Academy of Sciences (CAS), China, and an Adjunct Research Professor with the Institute of Computing Technology, CAS. His research areas include multimedia computing, image processing, computer vision, pattern recognition and machine learning. He has published more than 200 academic papers in prestigious international journals including IEEE Trans. on Multimedia, IEEE Trans. on CSVT, IEEE Trans. on Image Processing, etc., and top-level conferences such as ACM Multimedia, ICCV, CVPR and ECCV. He is an associate editor of Acta Automatica Sinica and a reviewer for various international journals including IEEE Trans. on Multimedia, IEEE Trans. on CSVT, IEEE Trans. on Image Processing, etc. He has served as program chair, track chair and TPC member for various conferences, including ACM Multimedia, CVPR, ICCV, ICME, PSIVT, etc.
16. Storage and Automatic Recognition Techniques for Big Data of Fingerprint
Tiande Guo, Professor, School of Mathematical Sciences, University of Chinese Academy of Sciences
The aims of this presentation are to introduce storage and automatic recognition techniques for big data of fingerprint. Introductory material is provided on all components/modules of a fingerprint recognition system. Recent advances in fingerprint recognition are presented in detail, including fingerprint compression, feature extraction, matching and classification techniques for big data of fingerprint. For storage of fingerprint images, a new fingerprint compression algorithm based on sparse representation is introduced. Obtaining an overcomplete dictionary from a set of fingerprint patches allows us to represent them as sparse linear combinations of dictionary atoms. In the algorithm, we first construct a dictionary for predefined fingerprint image patches. For a new fingerprint image, we represent its patches according to the dictionary by computing l0-minimization and then quantize and encode the representation. The experiments demonstrate that our algorithm is efficient compared with several competing compression techniques (JPEG, JPEG 2000, and WSQ), especially at high compression ratios. The experiments also illustrate that the proposed algorithm is robust with respect to minutiae extraction.
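The patch-coding step described above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: exact l0-minimization is intractable in general, so the sketch swaps in Orthogonal Matching Pursuit (a standard greedy approximation), and the function names `omp` and `quantize`, the random dictionary, and the quantization step size are all assumptions made for the example.

```python
import numpy as np

def omp(dictionary, signal, max_atoms):
    """Greedy stand-in for l0-minimization (Orthogonal Matching Pursuit).

    dictionary: (patch_dim, n_atoms) array whose columns are atoms.
    signal:     (patch_dim,) flattened image patch.
    Returns a coefficient vector with at most max_atoms nonzeros.
    """
    signal = np.asarray(signal, dtype=float)
    residual = signal.copy()
    support = []                      # indices of the atoms chosen so far
    solution = np.zeros(0)
    for _ in range(max_atoms):
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = 0.0   # never pick the same atom twice
        best = int(np.argmax(correlations))
        if correlations[best] < 1e-12:    # residual already explained
            break
        support.append(best)
        # Re-fit all chosen atoms jointly by least squares.
        solution, *_ = np.linalg.lstsq(dictionary[:, support], signal, rcond=None)
        residual = signal - dictionary[:, support] @ solution
    coeffs = np.zeros(dictionary.shape[1])
    if support:
        coeffs[support] = solution
    return coeffs

def quantize(coeffs, step):
    """Uniform scalar quantization of the coefficients (hypothetical step size)."""
    return np.round(np.asarray(coeffs) / step).astype(int)

# Toy usage: a random overcomplete dictionary stands in for one learned from patches.
rng = np.random.default_rng(0)
atoms = rng.standard_normal((64, 128))
atoms /= np.linalg.norm(atoms, axis=0)          # unit-norm atoms
patch = 2.0 * atoms[:, 3] - 1.5 * atoms[:, 17]  # a patch that is 2-sparse by construction
codes = quantize(omp(atoms, patch, max_atoms=2), step=0.25)
print(np.count_nonzero(codes), "nonzero quantized coefficients")
```

Only the nonzero quantized coefficients and their atom indices would need to be entropy-coded, which is where the compression comes from in a scheme of this kind.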
Prof. Tiande Guo received his master's and doctoral degrees from the Chinese Academy of Sciences in 1992 and 1998 respectively, majoring in Operations Research. He is currently a professor and executive director of the School of Mathematical Sciences of GUCAS. Prof. Guo is the vice director of the Optimization and Application Research Center of the Academy of Mathematics and Systems Sciences of CAS, director of Mathematical Society and Academic Committee of the Operations Research Society of China. He is a member of the executive committee of the Chinese Mathematical Society and a member of the executive committee of the Operations Research Society of China. He also serves as a member of the editorial board for several journals such as “Acta Mathematicae Applicatae Sinica”, “Journal of Systems Science and Mathematical Sciences”, and “Journal of the Graduate School of the Chinese Academy of Sciences”. He has been devoted to research on fingerprint recognition algorithms and automated fingerprint identification systems for many years and has participated in a number of fingerprint identification and fingerprint compression standard-setting efforts of the Ministry of Public Security of China.
17. Ten rules of data analyst
Criteria data and the test data; segmented data and sampling data; upper and lower data volume; behavioral data and results data; micro data and macro data; higher-order data and low-class data; high-dimensional data and variable-dimensional data; dimension data and time series data; frequency data and low-frequency data.
18. Spatial estimation of industrial water pollution emission at prefectural level
Minjun Shi, Professor, University of Chinese Academy of Sciences (UCAS), Deputy-director of the Center on Fictitious Economy and Data Science (FEDS), Chinese Academy of Sciences, Director of the National Center on Regional Development of UCAS
There is a gap in observations of industrial water pollution emission at prefectural level in China. Existing data from the Ministry of Environmental Protection and provincial environmental protection bureaus reveal only the nationwide pollution situation and the pollution emissions of provinces and 109 key prefecture-level cities, without a breakdown by industrial sector. This paper aims to estimate China’s yearly COD discharge and ammonia nitrogen discharge from industrial waste water at provincial and prefectural level, broken down by major industrial sector, using a data mining approach.
The objective is to apportion the nationwide pollution emissions of each industrial sector among regions. The technique developed in this paper begins with an apportionment by output structure. Considering that pollution emission intensity differs between regions, this paper selects plant scale, the start time of production and the region’s treatment level as the three most important factors influencing pollution emission, and builds corresponding coefficients to correct the initial estimates. Finally, RAS is applied to make the sum of sectoral emissions equal to the sum of sub-regional emissions.
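The final balancing step can be illustrated with a small sketch of the RAS (biproportional scaling) procedure. This is a generic version under stated assumptions, not the paper's code: the function name `ras_balance`, the tolerance, and the toy numbers are hypothetical, rows are taken to be sectors and columns regions, and every row and column of the initial estimate is assumed to have a positive sum.

```python
import numpy as np

def ras_balance(initial, row_totals, col_totals, tol=1e-9, max_iter=1000):
    """Alternately rescale rows and columns of `initial` until its margins
    match the known totals (RAS, also known as iterative proportional fitting).

    initial:    (sectors, regions) first-pass emission estimates.
    row_totals: known emission total per sector.
    col_totals: known emission total per region.
    """
    matrix = np.asarray(initial, dtype=float).copy()
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)
    if not np.isclose(row_totals.sum(), col_totals.sum()):
        raise ValueError("sector totals and regional totals must add up to the same sum")
    for _ in range(max_iter):
        matrix *= (row_totals / matrix.sum(axis=1))[:, None]   # fit sector totals
        matrix *= (col_totals / matrix.sum(axis=0))[None, :]   # fit regional totals
        if np.allclose(matrix.sum(axis=1), row_totals, rtol=0.0, atol=tol):
            break   # rows still fit after the column step, so both margins match
    return matrix

# Toy usage: 2 sectors x 2 regions with made-up numbers.
estimate = [[1.0, 2.0], [3.0, 4.0]]
balanced = ras_balance(estimate, row_totals=[40.0, 60.0], col_totals=[50.0, 50.0])
print(balanced.sum(axis=1), balanced.sum(axis=0))
```

Because RAS only multiplies cells by row and column factors, cells that start at zero stay at zero, and the relative structure of the corrected first-pass estimate is preserved as far as the margins allow.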
Insight into the spatial estimation accuracy (uncertainty) is gained by means of theoretical considerations and numerical validations involving real data. On the test set, which consists of 5 provincial pollution emissions in 7 sectors and 109 prefectural total pollution emissions, the average relative error is less than 15% at provincial level by sector, and less than 30% at prefectural level by sector.
Further spatial analysis of pollution can follow from the spatial estimation of industrial water pollution emission.
19. Machine Learning Algorithm Based on Hypersurfaces
Qing He, Professor at the Institute of Computing Technology, Chinese Academy of Sciences (CAS), and Professor at the University of Chinese Academy of Sciences (UCAS). He is also the Vice Secretary of the Chinese Association for Artificial Intelligence.
The talk presents a series of machine learning algorithms based on hypersurfaces proposed by Qing He, including classification based on hypersurfaces, the relation between the minimal sample set and classification accuracy, parallel classification algorithms, a big data distribution discovery algorithm, an outlier detection method, and a clustering algorithm, all based on hypersurfaces. Beyond these algorithms, and more importantly, the minimal sample set has a decisive impact on classification accuracy and plays an important role in sampling big data with uncertain distributions.
Qing He is a member of the China Computer Federation Artificial Intelligence and Pattern Recognition Committee and a member of the Chinese Institute of Electronics Cloud Computing and Big Data Experts Committee. He received the B.S. degree from Hebei Normal University, Shijiazhuang, P. R. C., in 1985, and the M.S. degree from Zhengzhou University, Zhengzhou, P. R. C., in 1987, both in Mathematics. He received the Ph.D. degree in Fuzzy Mathematics and Artificial Intelligence from Beijing Normal University, Beijing, P. R. C., in 2000. From 1987 to 1997, he taught at Hebei University of Science and Technology. He is currently a doctoral supervisor at the Institute of Computing Technology, CAS. His interests include data mining, machine learning, classification, fuzzy clustering, cloud computing, and big data. He has made a series of contributions in fuzzy information processing, fuzzy clustering, knowledge representation, text information processing, and big data mining based on cloud computing. He has published more than 100 journal papers, 30 of which are SCI indexed and 66 of which are EI indexed. He has organized and developed the multi-strategy data mining platform MSMiner, the Web intelligent information processing software GHunt, and the hypersurface classifier HSC. Recently, in cloud computing and big data mining applications, Qing He led his machine learning and data mining team (http://mldm.ict.ac.cn/Home.html), commissioned by the China Mobile Research Institute, in developing a cloud-based parallel data mining platform by the end of 2008 for mining TB-level real data, achieving high-performance, low-cost data mining. Through this innovation, the country obtained proprietary cloud-based data mining techniques. He was invited to give technical reports at the second, third and sixth China Cloud Computing Conferences.
He has led and completed a number of data mining projects supported by the National Natural Science Foundation and the 863 Program, and these projects were rated excellent. He has proposed a series of effective data mining algorithms and multiple parallel machine learning algorithms, and organized his team to develop forty parallel machine learning algorithms. Several big data mining software packages developed by his team, such as PDMiner, COMS, CWMS and WMCS, have obtained software copyrights and have been applied in telecommunications, electricity, information security, environmental protection, financial insurance, and dozens of companies and enterprises, with considerable economic and social benefits.
20. Machine Learning Algorithm Based on Hypersurfaces
Shicong Feng, VP of Technology at Miaozhen Systems.
Opening with a brief introduction to Miaozhen Systems, this talk will mainly present several representative applications in digital marketing based on big data technologies, such as audience targeting (demographic targeting and social targeting), product recommendation, fraud detection and cross-screen unique visitor identification. This talk will also share Miaozhen's university relationship building and two government-sponsored projects.
Shicong Feng received his Ph.D. degree in computer science from Peking University in 2003. Shicong was one of the key developers of the "Tianwang" search engine, the first and the best search engine in China. Shicong worked as a senior software engineer at Bell Labs China from 2003 to 2006, and as a senior researcher at HP Labs China from 2006 to 2011. Shicong now serves as the VP of Technology at Miaozhen Systems, where his responsibilities focus on research and development of novel big data applications.
21. Compatible Targeting Performance Measures for Mixed-type Data
Wenxue Huang, Professor of Guangzhou University.
Assume we work on a multi-target project with a structured big data set in which each entry is treated equally. A big data set often contains both numerical and categorical variables. A large data modeling project may have target variables of different data types. An overall targeting performance may be needed for machine learning or for overall project evaluation. This then calls for compatible targeting performance measures. Relevant issues, concepts and specific measures are discussed.
Dr. Wenxue Huang is a Full Professor in the School of Mathematics and Information Sciences at Guangzhou University, and an adjunct professor at both York University, Canada (2003--), and Nanchang University (2005--). He received his Ph.D. in Mathematics at the University of Western Ontario in 1995. Before joining Guangzhou University, he had served as a full professor at Shantou University for two and a half years. He has over 11 years of data mining research and development experience in industry and served as Chief Scientist at Generation 5 Math Tech Inc for 8 years. He is the author or co-author of more than 20 peer-reviewed publications in the areas of algebraic groups and monoids, associative algebras, data mining, differential topology, and statistical theory and methods.
He has delivered more than 30 invited talks on algebra and data mining at international academic conferences and at Canadian and Chinese research institutes and universities. Dr. Huang writes reviews for Mathematical Reviews and Zentralblatt für Mathematik, and sits on the editorial boards of International J. of Data Mining, Modelling and Management, and of International J. of Data Analysis Techniques and Strategies. Dr. Huang chairs an NSFC general project 2012-2015.
22. A Statistical Perspective on Algorithmic Leveraging
Ping Ma, Associate Professor, Department of Statistics, University of Georgia.
One popular method for dealing with large-scale data sets is sampling. Using the empirical statistical leverage scores as an importance sampling distribution, the method of algorithmic leveraging samples and rescales rows/columns of data matrices to reduce the data size before performing computations on the subproblem. Existing work has focused on algorithmic issues, but none of it addresses statistical aspects of this method. Here, we provide an effective framework to evaluate the statistical properties of algorithmic leveraging in the context of estimating parameters in a linear regression model. In particular, for several versions of leverage-based sampling, we derive results for the bias and variance, both conditional and unconditional on the observed data. We show that from the statistical perspective of bias and variance, neither leverage-based sampling nor uniform sampling dominates the other. This result is particularly striking, given the well-known result that, from the algorithmic perspective of worst-case analysis, leverage-based sampling provides uniformly superior worst-case algorithmic results, when compared with uniform sampling. Based on these theoretical results, we propose and analyze two new leveraging algorithms: one constructs a smaller least-squares problem with “shrinked” leverage scores (SLEV), and the other solves a smaller and unweighted (or biased) least-squares problem (LEVUNW). The empirical results indicate that our theory is a good predictor of practical performance of existing and new leverage-based algorithms and that the new algorithms achieve improved performance.
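A minimal sketch of leverage-based subsampling for least squares may make the three variants concrete. It rests on assumptions the abstract does not spell out: "shrinked" leverage scores are modeled here as a convex combination of the leverage distribution and the uniform distribution with a hypothetical mixing weight `alpha`, and all function names are illustrative rather than taken from the authors' code.

```python
import numpy as np

def leverage_scores(X):
    """Statistical leverage scores: squared row norms of the left singular vectors."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def sampling_probs(X, alpha=1.0):
    """alpha=1: pure leverage sampling; alpha=0: uniform sampling;
    0 < alpha < 1: leverage scores shrunk toward uniform (assumed SLEV form)."""
    scores = leverage_scores(X)
    n = X.shape[0]
    return alpha * scores / scores.sum() + (1.0 - alpha) / n

def subsampled_least_squares(X, y, r, probs, weighted=True, seed=None):
    """Draw r rows with replacement according to probs and solve the small problem.

    weighted=True rescales each sampled row by 1/sqrt(r * p_i);
    weighted=False solves the unweighted subproblem (the LEVUNW idea).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=r, replace=True, p=probs)
    Xs, ys = X[idx], y[idx]
    if weighted:
        w = 1.0 / np.sqrt(r * probs[idx])
        Xs, ys = Xs * w[:, None], ys * w
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta

# Toy usage on synthetic data (all numbers are made up for illustration).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(500)
for name, alpha, weighted in [("LEV", 1.0, True), ("SLEV", 0.9, True), ("LEVUNW", 1.0, False)]:
    beta = subsampled_least_squares(X, y, r=50, probs=sampling_probs(X, alpha),
                                    weighted=weighted, seed=0)
    print(name, np.round(beta, 3))
```

Repeating the draw over many seeds and comparing the spread of `beta` across variants is one simple way to look at the bias and variance trade-offs the abstract refers to.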
Ping Ma is an associate professor in the Department of Statistics at the University of Georgia. He was a Beckman Fellow of the Center for Advanced Study at the University of Illinois at Urbana-Champaign, a Faculty Fellow at the National Center for Supercomputing Applications, and a recipient of the National Science Foundation CAREER Award. His paper won the best paper award of the Canadian Journal of Statistics in 2011. He serves on multiple editorial boards including Journal of the American Statistical Association, Journal of Statistical Planning and Inference, and Statistical Applications in Genetics and Molecular Biology.