<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en" xml:lang="en">
<head>
<title>Knowledge Graphs</title>
<meta content="text/html; charset=utf-8" http-equiv="default-style"/>
<link href="../styles/stylesheet.css" rel="stylesheet" type="text/css"/>
<meta content="urn:uuid:531250ec-8629-4bbe-b4be-eb1eb0e84538" name="Adept.expected.resource"/>
</head>
<body epub:type="frontmatter">
<div class="body">
<p class="SP"> </p>
<section aria-label="Preface" role="doc-preface">
<header>
<h1 class="FMH"><span aria-label="xxv" id="pg_xxv" role="doc-pagebreak"/><b>Preface</b></h1>
</header>
<p class="noindent">Graphs have had a tremendous influence on both mathematics and artificial intelligence (AI), as evidenced by the large body of work on graph theory, as well as by the pervasiveness of graph models in subcommunities such as planning and knowledge representation. Knowledge graphs (KGs), a term that has become popular only in the last decade due to industrial-scale projects such as Google’s Knowledge Graph and Amazon’s Product Graph, may not seem particularly novel in this broader context. Certainly, <i>knowledge bases</i> (KBs), the predecessor to KGs in the Natural Language Processing (NLP) and other AI communities, have been around for a very long time, dating back several decades. Automatic construction and population of KBs, a prominent topic in this book, witnessed an explosion of research in the 1990s. Other techniques, such as instance matching (IM) and statistical relational learning (SRL), the focus of part IV of this book, have had an equally storied history. The first papers on what we designate as IM today were published more than a half-century ago, in the context of linking patient records.</p>
<p>The question might then arise: why is <i>now</i> the time to publish a textbook on KGs? Our answer to this (necessarily one of opinion rather than fact) is that the last decade has seen an astounding confluence of technologies and circumstances that have made KGs a relevant and popular branch of AI. Activities in both research and industry bear this observation out. Major conferences and journals regularly feature workshops, sessions, and special issues on KGs. The authors themselves have been heavily involved in many of these. Since the publication of encyclopedic KGs such as DBpedia, many KGs have been openly published on the web. Some already existed as KBs or general-purpose resources but are being repurposed or rebranded as KGs due to their relational properties. In fact, it is not uncommon to find KG “ecosystems,” some in high-growth mode and others relatively mature. We dedicate an entire part of this book to describing these communities. Without a doubt, the emergence of data-driven decision science and the popularity currently enjoyed by AI, machine learning, and neural networks in the media, industry, and academia alike have both contributed to the flourishing of such ecosystems.</p>
<p><span aria-label="xxvi" id="pg_xxvi" role="doc-pagebreak"/>At the current time, the COVID-19 pandemic has further revealed how far KGs have come. Within a short time (just a couple of months), practitioners and researchers have taken publicly released data (such as an academic corpus) and set up public-facing KGs using toolkits that were well-established in the community for building domain-specific KGs. This not only illustrates that KGs offer a value proposition that is increasingly seeing mainstream adoption, but also demonstrates the maturity of the tools and algorithms themselves. Only a few short years ago, it would have required enormous personnel and resources to set up a KG from raw, heterogeneous web data.</p>
<p>Here, we attempt to provide a relatively comprehensive treatment of KGs. We recognize, however, that not every important piece of work can be covered (or covered in full) without making the book unwieldy. Our approach has been to pay careful attention to the fundamentals and techniques that have withstood the test of time thus far. We start from the very beginning and do not assume that the reader has a strong background in machine learning, NLP, or the Semantic Web (the main areas that are allied with research in KGs). Nevertheless, having a basic background in these areas would make reading the text easier and more insightful. Hence, we believe that the book would be particularly suitable for courses taught at the upper undergraduate and graduate levels. We also hope that it will provide value to researchers who have some background in other areas of AI but are looking for exposure to KGs as a way of enriching their own work.</p>
<p>We have attempted to make the chapters as independent and cohesive as possible, such that even a beginning researcher in the area could use individual chapters as short surveys or background on the subject matter. However, the book is best read in sequence, and in particular, chapters 1 and 2 are important for laying the foundation for the rest of the book. We also recommend reviewing chapter 2 prior to beginning part IV, as many of the models in that chapter will be reconsidered and supplemented in chapters 11 and 12.</p>
<p>In teaching a course on the subject, we recommend a strong focus on the first two chapters, and depending on the course, selective focus on a mixture of the chapters in parts II, III, and IV. For example, if the instructor is particularly oriented toward the Semantic Web, less time may be allocated to the material in part II, and it may be more worthwhile to have multiple lectures on chapters 8, 12, and 14. A course more focused on text and natural language, on the other hand, would want to have multiple lectures on the chapters in part II (especially chapters 4 and 6), and chapters 11 and 13. A course focusing on advanced methods for NLP students may want to give significant attention to the chapters on nontraditional information extraction (IE) and KG embeddings (KGEs). Undergraduate classes may want to cover all the chapters in parts I–IV, not unlike introductory courses in machine learning and AI, where the goal is a broad exposure (with appropriate homework and assignments) rather than deep coverage. If the course is a seminar or at the graduate level, some of the key papers noted in the “Bibliographic Notes” section in each chapter should be assigned as required reading in addition to the chapter itself. Where possible, we <span aria-label="xxvii" id="pg_xxvii" role="doc-pagebreak"/>recommend one classic and one recent paper. For many chapters, this approach is doable and provides valuable insight both into the research as it is relevant today and into its origins as a problem statement.</p>
<p>Every chapter concludes with a “Software and Resources” section, where we do our best to provide links and pointers to resources that (so far as we know) have been robustly adopted in their respective communities. We hope that these sections will be useful to practitioners and companies (especially companies not traditionally known for rapid technological adoption) that are looking to <i>do</i> more with KGs. As a caveat, the very nature of such resources (almost always web-based) and the current fast pace of KG research mean that we can never guarantee the persistence or relevance of the resource beyond the time that this book comes out. To mitigate the effects of such transience, we have given more weight to packages that already have a proven record of standing the test of time.<sup><a href="preface.xhtml#fn1x0" id="fn1x0-bk">1</a></sup> However, we recognize that it is inevitable that some resources that should have been included in the relevant chapter were omitted, despite our best efforts. The same goes for the “Bibliographic Notes” sections, where we have attempted to provide referential citations for almost all the material that we directly cover in the main body of the text. Because much of the material we cover has had a long history and is often the work of tens, if not hundreds, of researchers, we also extensively cite surveys on these topics that have inspired our own interpretation and rendition of the subject. This is also the main reason why we opted for a relatively comprehensive “Bibliographic Notes” section at the end of each chapter, rather than crowd the main text with many citations, as would be the norm in proper academic research papers.</p>
<p>Additionally, the book has a set of exercises at the end of every chapter. In total, we have provided almost 130 exercises. We have tried to be consistent with our own pedagogical styles by having exercises at various levels of abstraction and difficulty. Toward the end of the book, as we delve deeper into part V, the exercises steadily become more project-oriented and thought-based, as would befit the chapters (on KG ecosystems) there. In earlier chapters, the exercises are more conceptual, challenging students and other readers to demonstrate a degree of mastery that would be expected after becoming acquainted with that material. Even in those chapters, though, we have attempted to provide some abstract questions to provoke deeper thought from the (particularly attentive) reader.</p>
<p class="TXR"><i>Mayank Kejriwal, Craig A. Knoblock, and Pedro Szekely</i><br/><i>April 2020</i></p>
<ol class="footnotes">
<li><p class="FN" role="doc-footnote"><sup><a href="preface.xhtml#fn1x0-bk" id="fn1x0">1</a></sup> The reader clicking on URL links directly in digital versions of the book should beware of “hanging” hyphens (i.e., at the ends of lines, when the URL runs long), because these may not be part of the true URL. If removing the offending hyphen or hyphens still does not yield a valid URL (or the desired resource), we recommend searching for that resource by its name using a search engine.</p></li>
</ol>
</section>
</div>
</body>
</html>