The award from the Association for Computing Machinery recognizes outstanding work in the areas of data science, machine learning and data mining.
Ren (MS ’16, PhD ’18), who is now an assistant professor of computer science at USC, points out that many real-world applications rely on being able to quickly understand and analyze text data – from news, medical texts, and any number of other sources – at volumes he says are “almost impossible for human to digest and curate.” But data about different subjects is often expressed in ways specific to that subject, or in language unique to the individual author – in other words, it is messy.
Systems that Ren and his collaborators have developed have been adopted by a number of companies and institutions.
He and collaborators, for instance, have built a system that turns data from literally millions of medical papers into a searchable knowledge graph being used by Stanford and UCLA medical schools.
The flood of online information now being produced has provided a rich environment for data mining and discovery, but that near-endless flow of text is difficult to make sense of.
As an Illinois Computer Science PhD student, Xiang Ren devoted his work to finding better ways to sort and categorize that jumble of text, and his dissertation was recognized this month with the 2018 SIGKDD Doctoral Dissertation Award.
The applications of this research are far-reaching for the fields of data mining, information retrieval, profiling and demographic inference, online advertising and fraud detection.
Perozzi, whose advisor was Stony Brook Professor Steven Skiena, defended his thesis in May 2016. His work involves graph embeddings: ways of representing the knowledge encoded in the structure of networks to make them accessible for machine learning models. Focused on developing scalable algorithms and models for attributed graphs, Perozzi presented an online learning algorithm that uses recent advances in deep learning to produce rich graph embeddings.
SIGKDD is the ACM’s Special Interest Group on Knowledge Discovery and Data Mining. SIGKDD selects one winner and two runners-up each year to receive the award.