<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en" xml:lang="en">
<head>
<title>Knowledge Graphs</title>
<meta content="text/html; charset=utf-8" http-equiv="default-style"/>
<link href="../styles/stylesheet.css" rel="stylesheet" type="text/css"/>
<meta content="urn:uuid:531250ec-8629-4bbe-b4be-eb1eb0e84538" name="Adept.expected.resource"/>
</head>
<body epub:type="frontmatter">
<div class="body">
<p class="SP"> </p>
<section epub:type="loi">
<header>
<h1 class="FMH"><span aria-label="xv" id="pg_xv" role="doc-pagebreak"/><b>List of Figures</b></h1>
</header>
<p class="LOF"> <b><a href="chapter_1.xhtml#fig1-1">1.1</a></b> A version of the Königsberg Bridge Problem [as illustrated by Euler (1741) himself in his figure 1 from <i>Solutio problematis ad geometriam situs pertinentis</i>, Eneström 53]. On the right, the problem is modeled as a graph, with divisions as nodes, and bridges as edges. Figure obtained from MAA Euler Archive (copyright expired).</p>
<p class="LOF"> <b><a href="chapter_1.xhtml#fig1-2">1.2</a></b> Representing knowledge as a graph: A simple KG fragment.</p>
<p class="LOF"> <b><a href="chapter_1.xhtml#fig1-3">1.3</a></b> Results displayed by Google in response to queries such as “places to visit in Los Angeles,” which include not just a ranked list of webpages, but actual entities corresponding to user intent.</p>
<p class="LOF"> <b><a href="chapter_1.xhtml#fig1-4">1.4</a></b> A knowledge panel describing the first movie in the <i>Lord of the Rings</i> series, in response to the keyword query “lord of the rings.”</p>
<p class="LOF"> <b><a href="chapter_1.xhtml#fig1-5">1.5</a></b> An illustration of an academic KG, showing two publications with overlapping authors. While oval nodes represent resources or entities, rectangles are used to represent literals, such as strings and numbers.</p>
<p class="LOF"> <b><a href="chapter_1.xhtml#fig1-6">1.6</a></b> An illustration of a <i>Product</i> KG, showing two different products.</p>
<p class="LOF"> <b><a href="chapter_1.xhtml#fig1-7">1.7</a></b> An illustration of a social network, represented as a KG.</p>
<p class="LOF"> <b><a href="chapter_1.xhtml#fig1-8">1.8</a></b> A fragment of an <i>Event</i> KG expressing distinct geopolitical phenomena.</p>
<p class="LOF"> <b><a href="chapter_2.xhtml#fig2-1">2.1</a></b> An example of a KG triple (:mayank_kejriwal, foaf:name, “Mayank Kejriwal”) represented in RDF. The prefix <i>foaf:</i> is shorthand for <a href="http://xmlns.com/foaf/0.1/">http://xmlns.com/foaf/0.1/</a> (i.e., the predicate in the triple is actually “http://xmlns.com/foaf/0.1/name,” and similarly, the prefix <i>:</i> before mayank_kejriwal is meant to express that mayank_kejriwal lies in the <i>default</i> namespace). In this case, the “object” is a literal; by convention, literals are represented as rectangles, while URI nodes and blank nodes are elliptical.</p>
<p class="LOF"> <b><a href="chapter_2.xhtml#fig2-2">2.2</a></b> A KG fragment in its conceptual graph representation (above), and N-triples representation (below). The <i>rdf</i> and <i>foaf</i> prefixes represent the namespaces indicated in the N-triples fragment. The authors’ URIs are for indicative purposes only. Prefixes are not permitted in the N-triples representation.</p>
<p class="LOF"> <span aria-label="xvi" id="pg_xvi" role="doc-pagebreak"/> <b><a href="chapter_2.xhtml#fig2-3">2.3</a></b> A KG fragment expressed in Turtle as a predicate list.</p>
<p class="LOF"> <b><a href="chapter_2.xhtml#fig2-4">2.4</a></b> A triple as it would be represented in a property graph model. Note how both the nodes and the edges are key-value data structures rather than URIs or literals.</p>
<p class="LOF"> <b><a href="chapter_2.xhtml#fig2-5">2.5</a></b> One approach to represent a set of triples or a triplestore as a property table. In a property-centric representation such as this, properties are elevated to the status of metadata as column headers (or even tables in themselves for multivalued properties) rather than values (the values in the cells of the second column in the original triplestore representation). In the derived tables, note that objects (or type) will always be the cell values, not including the first column, which is always the subject.</p>
<p class="LOF"> <b><a href="chapter_2.xhtml#fig2-6">2.6</a></b> Infoboxes for “The Joker” in the English and French Wikipedias, retrieved from <i>en.wikipedia.org/wiki/Joker_(character)</i> and <i>fr.wikipedia.org/wiki/Joker_(comics)</i>, respectively.</p>
<p class="LOF"> <b><a href="chapter_2.xhtml#fig2-7">2.7</a></b> A KG fragment from Wikidata for “The Joker.”</p>
<p class="LOF"> <b><a href="chapter_2.xhtml#fig2-8">2.8</a></b> A simplified illustration of the Semantic Web Layer Cake.</p>
<p class="LOF"> <b><a href="chapter_3.xhtml#fig3-1">3.1</a></b> An illustration of the semantic information provided by the WordNet lexical resource for a common noun such as “politician.”</p>
<p class="LOF"> <b><a href="chapter_3.xhtml#fig3-2">3.2</a></b> A context graph (with two layers) of a target document.</p>
<p class="LOF"> <b><a href="chapter_3.xhtml#fig3-3">3.3</a></b> The interface of the NYU DDT, used for discovering relevant webpages over the web for an Ebola-related domain.</p>
<p class="LOF"> <b><a href="chapter_4.xhtml#fig4-1">4.1</a></b> Two contrasting versions and applications of IE for constructing KGs over raw sources. Web IE operates over webpages, and attempts to extract a KG with entities, relations, or even events, while NER extracts instances of concepts such as PERSON or LOCATION. Concepts come from an ontology that can be domain-specific. To supplement the instances, and interconnect them with relations, relation extraction has to be executed as another step.</p>
<p class="LOF"> <b><a href="chapter_4.xhtml#fig4-2">4.2</a></b> A simple concept ontology, named instances of which need to be extracted by an NER system. The ontology fragment is based on the real-world DBpedia ontology. RDFS and DBO, which stand for “Resource Description Framework Schema” and “DBpedia Ontology,” respectively, indicate that the “vocabulary” terms (e.g., dbo:writer) lie in these namespaces.</p>
<p class="LOF"> <b><a href="chapter_4.xhtml#fig4-3">4.3</a></b> A practical architecture for NER can depend heavily on other elements of the pipeline, such as preprocessing and tokenization, as well as the availability of external resources such as lexicons and pretrained word embeddings (described in subsequent sections of this chapter). A complete description of these modules is beyond the scope of this book, but it may be found in any standard text or survey on Natural Language Processing.</p>
<p class="LOF"> <span aria-label="xvii" id="pg_xvii" role="doc-pagebreak"/> <b><a href="chapter_4.xhtml#fig4-4">4.4</a></b> A CRF as applied to the task of NER. Unlike ordinary supervised classification, which would make an i.i.d. assumption and try to classify each term in the input sequence independently, CRFs (and other models like it) model dependencies to get better accuracy without necessarily becoming intractable. The dark nodes are output nodes that technically produce a probability over the full set of labels [which includes all concepts, and also a <i>Not Applicable</i> (NA)-type concept indicating that no named entity is present]. There are standard mechanisms for handling multiword extractions like “Bay Area.”</p>
<p class="LOF"> <b><a href="chapter_4.xhtml#fig4-5">4.5</a></b> Representation learning (“embedding”) over words, given a sufficiently large corpus of documents (sets of sequences of words). We show words in the neighborhood of “politics.” For visualization purposes, the vectors have been projected to two dimensions. The mechanics behind representation learning are detailed in chapter 10, which describes how to embed KGs, including the actual neural network architectures that tend to be used for these embeddings.</p>
<p class="LOF"> <b><a href="chapter_4.xhtml#fig4-6">4.6</a></b> A character-based RNN architecture as applied to NER for an input sentence such as “Michael’s birthday is coming.”</p>
<p class="LOF"> <b><a href="chapter_5.xhtml#fig5-1">5.1</a></b> A practical illustration of the web IE problem. We use bounding boxes to illustrate the elements that would need to be extracted. In some cases, the elements (like the “Contact us” link) may not be visible on the page itself, but they are obtained from the HTML via an <i>&lt;a href&gt;</i> tag or property. The KG fragment is illustrated in <a href="chapter_5.xhtml#fig5-2">figure 5.2</a>. The original webpage was taken from <i>lawyers.findlaw.com/lawyer/firm/auto-dealer-fraud/los-angeles/california</i>.</p>
<p class="LOF"> <b><a href="chapter_5.xhtml#fig5-2">5.2</a></b> A KG fragment containing the extracted information from the webpage illustrated in <a href="chapter_5.xhtml#fig5-1">figure 5.1</a>.</p>
<p class="LOF"> <b><a href="chapter_5.xhtml#fig5-3">5.3</a></b> An EC model (above) of a hypothetical webpage describing reviews for academic papers, and some Stalker rules (below) for extracting reviewers and their accept/reject decisions.</p>
<p class="LOF"> <b><a href="chapter_5.xhtml#fig5-4">5.4</a></b> An overview of how RoadRunner performs unsupervised web IE.</p>
<p class="LOF"> <b><a href="chapter_5.xhtml#fig5-5">5.5</a></b> Illustrative examples of the <i>horizontal listing</i>, <i>attribute/value</i>, and <i>navigational</i> web table types.</p>
<p class="LOF"> <b><a href="chapter_5.xhtml#fig5-6">5.6</a></b> Illustrative examples of the <i>vertical listing</i>, <i>matrix</i>, and <i>matrix calendar</i> web table types.</p>
<p class="LOF"> <b><a href="chapter_5.xhtml#fig5-7">5.7</a></b> An “other” table type illustration that does not fit neatly into the taxonomy described in the text.</p>
<p class="LOF"> <b><a href="chapter_5.xhtml#fig5-8">5.8</a></b> A tabular data source (containing polling information from a UK general election, accessed on Wikipedia on October 18, 2019) for possible KG construction.</p>
<p class="LOF"> <span aria-label="xviii" id="pg_xviii" role="doc-pagebreak"/> <b><a href="chapter_7.xhtml#fig7-1">7.1</a></b> An illustration of domain-specific IE (<i>natural disaster</i> domain) over social media data. Instances that might be extracted in such domains are underlined and include the natural disaster itself (<i>#CaliforniaFires</i>), support events (<i>Bake sale</i>) that themselves have arguments (<i>Student Leadership Council</i>), span of the disaster (<i>1.8 million acres</i>), and even causal information (<i>prohibiting controlled burns</i>).</p>
<p class="LOF"> <b><a href="chapter_8.xhtml#fig8-1">8.1</a></b> A fragment of a bibliographic KG illustrating the IM problem.</p>
<p class="LOF"> <b><a href="chapter_8.xhtml#fig8-2">8.2</a></b> The two-step template that is often used for efficiently tackling real-world IM problems. The workflow can be customized in numerous ways, depending on both data modality and assumption about the underlying IM methods. For example, instead of linking entities between two KGs (say, Freebase and DBpedia), instances within a single KG may have to be resolved. For unsupervised methods, training sets may not be required (or the training set may be used only in blocking or similarity, not in both). If the KGs are modeled according to different ontologies and the IM or blocking is sensitive to this, then ontology matching may be necessary as a “step 0.”</p>
<p class="LOF"> <b><a href="chapter_8.xhtml#fig8-3">8.3</a></b> An illustration of SN blocking. On the left is a table with instance IDs and the instances’ BKVs. The table is sorted according to the BKVs. A window of size 3 is slid over the table, and instances within the window are paired and added to the candidate set (initialized as empty). The final candidate set (sent on to the similarity step in the two-step IM workflow) is shown at the bottom right.</p>
<p class="LOF"> <b><a href="chapter_8.xhtml#fig8-4">8.4</a></b> An illustration of feature extraction from a mention-pair of instances (X,Y) in the candidate set, given a library of <i>m</i> feature extraction functions. We have not drawn the property edges for clarity. Different concepts in the ontology are intuitively expressed by the text (e.g., the node “American” is of concept type “Cuisine” in the ontology). Not every feature function in the library may be applicable to every pair of attributes. Usually, a dummy value of –1 is used in the feature vector to express when one of the <i>m</i> functions is inapplicable to a concept. Given <i>k</i> attributes (in this case, <i>k</i> = 4) for each instance (of a given type), this ensures that each feature vector is of a dimensionality of exactly <i>km</i> unless a conscious decision is made to compare attributes across types. We use <span class="ellipsis">…</span> to indicate the presence of multiple unknown features and feature values.</p>
<p class="LOF"> <span aria-label="xix" id="pg_xix" role="doc-pagebreak"/> <b><a href="chapter_8.xhtml#fig8-5">8.5</a></b> A naive hard-clustering method, based on connected components, for recovering instance clusters from thresholded, scored pairwise outputs obtained from a two-step IM pipeline. The nodes represent instances, the circles represent the clusters (in this case, equivalent to connected components), and an edge exists only between two instances if the similarity between them was above the threshold (and assuming also that they were compared in the first place—that is, were placed in the candidate set). A single edge between the two components (say, D and F) would “collapse” both components into one cluster. Soft clustering approaches take a more sophisticated approach. Other approaches do not apply thresholding but rather work directly with weighted edges in the graph.</p>
<p class="LOF"> <b><a href="chapter_8.xhtml#fig8-6">8.6</a></b> An ENS as a representation of IM outputs. The top image illustrates an ENS population given both a clustering scheme and entity linking, while the bottom image only assumes basic pairwise outputs from the similarity step. A population given either clustering or entity linking (but not both) is similarly feasible. Applications could directly query the ENS, sometimes in a pay-as-you-go fashion. Using additional RDF machinery, the <i>owl:sameAs</i> (or other ontologically equivalent) links could be further annotated with provenance, confidence, and other meta-attributes.</p>
<p class="LOF"> <b><a href="chapter_8.xhtml#fig8-7">8.7</a></b> An illustration of ETL for database-centric workflows. When applied to KGs, the “extract” phase is equivalent to the methods described in part II of this book, while the “transform” phase would include steps like IM, SRL (described in chapter 9), and other approaches that lead to a single, clean KG. The “load” phase would upload the KG to an infrastructure where it can be queried, accessed, and used to facilitate analytics, a subject that will be covered in part IV.</p>
<p class="LOF"> <b><a href="chapter_8.xhtml#fig8-8">8.8</a></b> A taxonomy of data-cleaning problems, as originally conceived of in the database community.</p>
<p class="LOF"> <b><a href="chapter_9.xhtml#fig9-1">9.1</a></b> A visual representation of a Markov Network as an undirected graph (above) and a factor graph (below).</p>
<p class="LOF"> <b><a href="chapter_9.xhtml#fig9-2">9.2</a></b> A ground Markov Network, assuming that the constants {Mary, Bob} are applied to the first two formulas in <a href="chapter_9.xhtml#fig9-1">figure 9.1</a>.</p>
<p class="LOF"> <b><a href="chapter_9.xhtml#fig9-3">9.3</a></b> An illustration of KG identification.</p>
<p class="LOF"> <b><a href="chapter_10.xhtml#fig10-1">10.1</a></b> An illustration of Firth’s hypothesis over a common corpus like Wikipedia pages. Words that are in the same semantic class (such as “cat” and “dog”) tend to share similar contexts and are clustered close together in the vector space. Because the projection of the vectors (which are usually in the tens, if not hundreds, of real-valued dimensions) is in 2D, the “clusters” appear closer together.</p>
<p class="LOF"> <span aria-label="xx" id="pg_xx" role="doc-pagebreak"/><b><a href="chapter_10.xhtml#fig10-2">10.2</a></b> Comparative illustrations of the skip-gram and CBOW neural models, which have become extremely popular in word representation learning (“embedding”) due to their speed and good performance on word analogy tasks. Here, <i>x</i><sub><i>t</i></sub> is the target word, and <i>x</i><sub>1</sub><i>, <span class="ellipsis">…</span>, x</i><sub><i>k</i></sub> are the “context” words for some predetermined window size <i>k</i>. While CBOW takes the context words as input and predicts the target word, skip-gram operates in the opposite way.</p>
<p class="LOF"> <b><a href="chapter_10.xhtml#fig10-3">10.3</a></b> An illustration of DeepWalk, a network-embedding approach that relies on word embeddings as an underlying mechanism, on the Zachary Karate Club network. The concept is relatively simple: first, a “corpus” of random walks is constructed, with each random walk interpreted by the word-embedding algorithm (e.g., CBOW or skip-gram, but more modern embedding implementations could also be applied) as a sentence, and with nodes as words. The result is an embedding for each word (in this case, node). The algorithm can be extended in multiple ways (e.g., for directed graphs) and at this time, more sophisticated embeddings of this type have been proposed in the network community as well (LINE, node2vec, and several others). However, the algorithm continues to be reasonably popular, probably owing to its simplicity.</p>
<p class="LOF"> <b><a href="chapter_10.xhtml#fig10-4">10.4</a></b> Illustration of the architecture of a Siamese neural network.</p>
<p class="LOF"> <b><a href="chapter_10.xhtml#fig10-5">10.5</a></b> An illustration of basic translation (in the context of KGEs) that is exploited by all of the Trans* algorithms in increasingly sophisticated ways. In this example, an entity such as “London” can be translated into “United Kingdom” using the relation “capital_of,” which is a single vector that allows entities from one class (in this example) to be translated to entities from another class. In practice, translation tends to work well when entities can be (at least implicitly) clustered in such semantically meaningful ways, although more sophisticated variants are able to learn very general translations for relations that may not be between entities belonging to such well-defined classes.</p>
<p class="LOF"> <b><a href="chapter_11.xhtml#fig11-1">11.1</a></b> The Protégé interface in a biomedical domain (taken from the Wiki available at <i>protegewiki.stanford.edu/wiki/Main_Page</i>).</p>
<p class="LOF"> <b><a href="chapter_12.xhtml#fig12-1">12.1</a></b> A simple SPARQL query, the elements of which are described in the text.</p>
<p class="LOF"> <b><a href="chapter_12.xhtml#fig12-2">12.2</a></b> A partial SPARQL query illustrating an aggregation.</p>
<p class="LOF"> <b><a href="chapter_12.xhtml#fig12-3">12.3</a></b> An example of a SPARQL query with a subquery.</p>
<p class="LOF"> <b><a href="chapter_12.xhtml#fig12-4">12.4</a></b> Relational RDF store representations (for all three categories described in the text) for the same RDF data set.</p>
<p class="LOF"> <b><a href="chapter_12.xhtml#fig12-5">12.5</a></b> Illustrative (i.e., conceptual) example of an Elasticsearch boolean tree query. Here, <i>gte</i> and <i>lte</i> stand for the symbols ≥ and ≤, respectively. The other leaf nodes are key-value pairs.</p>
<p class="LOF"> <b><a href="chapter_12.xhtml#fig12-6">12.6</a></b> Graph representation of the Cypher snippet described in the text.</p>
<p class="LOF"> <b><a href="chapter_14.xhtml#fig14-1">14.1</a></b> Wikipedia infoboxes have been used to automatically populate KBs such as DBpedia on a large scale.</p>
<p class="LOF"> <span aria-label="xxi" id="pg_xxi" role="doc-pagebreak"/><b><a href="chapter_14.xhtml#fig14-2">14.2</a></b> An illustration of microdata (Schema.org) for a movie website (the <i>Rotten Tomatoes</i> page for the 2019 <i>Lion King</i> remake). Elements such as the rating are embedded in the source HTML as Schema.org (shown using the dark, solid oval) and could be extracted into a KG based on syntax. Search engines find it easier to work with such semantically rich data for precisely this reason, leading to more informed search results for a querying user. The manifestations of Schema.org snippets (including the ratings both on the <i>Rotten Tomatoes</i> page and in a Google search) are shown using the two dashed ovals.</p>
<p class="LOF"> <b><a href="chapter_15.xhtml#fig15-1">15.1</a></b> An example of semantics being incorporated into modern search engines like Google to satisfy user intent without requiring even a click. In this case, a search for an entity like “Leonardo da Vinci” yields a knowledge panel powered by the Google Knowledge Graph, and it provides some of the core information about the entity most likely to be useful to a typical user conducting the search. We also illustrate a magnified version of the panel (right).</p>
<p class="LOF"> <b><a href="chapter_15.xhtml#fig15-2">15.2</a></b> An example of an OGP snippet embedded in the HTML source of the IMDB page associated with the <i>Lion King</i> (2019). By embedding these snippets, developers can turn their web objects into graph objects. The protocol has been extremely popular with developers catering to social media companies like Facebook. Accessed on Nov. 17 at <i><a href="http://www.imdb.com/title/tt6105098/">www.imdb.com/title/tt6105098/</a></i>.</p>
<p class="LOF"> <b><a href="chapter_16.xhtml#fig16-1">16.1</a></b> A snapshot of changes to the GO resource webpage from five years ago (top) to April 2019 (bottom).</p>
<p class="LOF"> <b><a href="chapter_16.xhtml#fig16-2">16.2</a></b> An illustration of ChEBI search results for the molecular entity “ecgonine benzoate.”</p>
<p class="LOF"> <b><a href="chapter_16.xhtml#fig16-3">16.3</a></b> Relationships between the SWEET ontologies (integrative and faceted).</p>
<p class="LOF"> <b><a href="chapter_17.xhtml#fig17-1">17.1</a></b> A typical workflow showing the working of the DIG system.</p>
<p class="LOF"> <b><a href="chapter_17.xhtml#fig17-2">17.2</a></b> An illustration of the DIG application, involving several interleaved steps that a nontechnical subject matter (or domain) expert could take to set up their own domain-specific KG and search engine. The precise details and text on the dashboard are less important than the steps shown here and described in the text (and are being updated periodically).</p>
<p class="LOF"> <b><a href="chapter_17.xhtml#fig17-3">17.3</a></b> Examples of three clusters, each containing structurally and contextually similar webpages.</p>
<p class="LOF"> <b><a href="chapter_17.xhtml#fig17-4">17.4</a></b> An illustration (from the <i>Securities Fraud</i> domain) of the semantic typing facility in Inferlink. To create this screen, semantic typing, in terms of the defined set of fields, has already been done. For example, the second column has been typed with “age” semantics and other elements.</p>
<p class="LOF"> <b><a href="chapter_17.xhtml#fig17-5">17.5</a></b> An illustration of the search capabilities offered by the DIG system, with HT investigations as the use-case. Critical details have been blurred or obfuscated due to the illicit nature of some of the material.</p>
<p class="LOF"> <span aria-label="xxii" id="pg_xxii" role="doc-pagebreak"/><b><a href="chapter_17.xhtml#fig17-6">17.6</a></b> A structured search form in the DIG system. This form can be used for an ordinary <i>product</i> domain, but in this case, it was built for a prototype used to investigate the <i>counterfeit electronics</i> domain.</p>
<p class="LOF"> <b><a href="chapter_17.xhtml#fig17-7">17.7</a></b> Facets and filtering in the DIG system (see the gray sidebar). By “checking” off certain boxes or adding more search terms (top of the sidebar), a user can try to make the search more precise.</p>
<p class="LOF"> <b><a href="chapter_17.xhtml#fig17-8">17.8</a></b> Entity-centric search (also called <i>dossier generation</i>) in the DIG system from the <i>illegal firearms sales</i> domain. In this case, an investigator could look up all information extracted for the entity “glock 26.”</p>
<p class="LOF"> <b><a href="chapter_17.xhtml#fig17-9">17.9</a></b> Provenance in the DIG system for an extracted entity “square enix,” which is a game. In addition to the document ID, the provenance shows that a single extraction algorithm, based on dictionaries (or glossaries), was used. For the third method, the source (on which the algorithm was executed) is the raw HTML, while for the other two, it is the scraped text. Here, <i>content_relaxed</i> means that the text was extracted in a recall-friendly way from the HTML (meaning that some irrelevant elements, such as ad and code snippets embedded in the HTML, were extracted in addition to the relevant content), while <i>content_strict</i> implies a more precision-friendly approach.</p>
<p class="LOF"><b><a href="chapter_17.xhtml#fig17-10">17.10</a></b> The THOR system developed for providing situational awareness to responders in the aftermath of natural disasters and other crises.</p>
<p class="LOF"><b><a href="chapter_17.xhtml#fig17-11">17.11</a></b> The THOR dashboard over a real data set collected from Twitter in the aftermath of a Nepal earthquake, which devastated the region in 2015. In general, THOR is capable of working over myriad kinds of data and disasters, especially in those regions of the world where the native language is not English.</p>
</section>
</div>
</body>
</html>