Keywords:
Natural language processing (Computer science) ; Electronic books.
Type of Medium:
Online Resource
Pages:
1 online resource (281 pages)
Edition:
1st ed.
ISBN:
9783030448301
URL:
https://ebookcentral.proquest.com/lib/geomar/detail.action?docID=6229434
Language:
English
Note:
Intro -- Foreword -- "Don't Read This Book. Use It!" by Ken Barker -- "Most of the Knowledge in the World Is Encoded Not in Knowledge Graphs but in Natural Language" by Denny Vrandečić -- Preface -- Purpose of the Book -- Overview of the Chapters in This Book -- Materials -- Relation to Other Books in the Area -- Acknowledgements -- Contents
Part I Preliminaries and Building Blocks
1 Hybrid Natural Language Processing: An Introduction -- 1.1 A Brief History of Knowledge Graphs, Embeddings, and Language Models -- 1.2 Combining Knowledge Graphs and Neural Approaches for NLP
2 Word, Sense, and Graph Embeddings -- 2.1 Introduction -- 2.2 Distributed Word Representations -- 2.3 Word Embeddings -- 2.4 Sense and Concept Embeddings -- 2.5 Knowledge Graph Embeddings -- 2.6 Conclusion
3 Understanding Word Embeddings and Language Models -- 3.1 Introduction -- 3.2 Language Modeling -- 3.2.1 Statistical Language Models -- 3.2.2 Neural Language Models -- 3.3 Fine-Tuning Pre-trained Language Models for Transfer Learning in NLP -- 3.3.1 ELMo -- 3.3.2 GPT -- 3.3.3 BERT -- 3.4 Fine-Tuning Pre-trained Language Models for Bot Detection -- 3.4.1 Experiment Results and Discussion -- 3.4.2 Using the Transformers Library to Fine-Tune BERT -- 3.4.2.1 The Transformers Library -- 3.4.2.2 Download the Dataset -- 3.4.2.3 BERT Tokenization -- 3.4.2.4 Fine-Tuning the Model -- 3.4.2.5 Other Evaluation Metrics -- 3.4.2.6 Model Inference -- 3.5 Conclusion
4 Capturing Meaning from Text as Word Embeddings -- 4.1 Introduction -- 4.2 Download a Small Text Corpus -- 4.3 An Algorithm for Learning Word Embeddings (Swivel) -- 4.4 Generate Co-occurrence Matrix Using Swivel prep -- 4.5 Learn Embeddings from Co-occurrence Matrix -- 4.5.1 Convert tsv Files to bin File -- 4.6 Read Stored Binary Embeddings and Inspect Them -- 4.6.1 Compound Words -- 4.7 Exercise: Create Word Embeddings from Project Gutenberg -- 4.7.1 Download and Pre-process the Corpus -- 4.7.2 Learn Embeddings -- 4.7.3 Inspect Embeddings -- 4.8 Conclusion
5 Capturing Knowledge Graph Embeddings -- 5.1 Introduction -- 5.2 Knowledge Graph Embeddings -- 5.3 Creating Embeddings for WordNet -- 5.3.1 Choose Embedding Algorithm: HolE -- 5.3.1.1 Install scikit-kge -- 5.3.1.2 Install and Inspect holographic_embeddings -- 5.3.2 Convert WordNet KG to the Required Input -- 5.3.2.1 KG Input Format Required by SKGE -- 5.3.2.2 Converting WordNet 3.0 into the Required Input Format from Scratch -- 5.3.3 Learn the Embeddings -- 5.3.4 Inspect the Resulting Embeddings -- 5.3.4.1 skge Output File Format -- 5.3.4.2 Converting Embeddings to a More Manageable Format -- 5.4 Exercises -- 5.4.1 Exercise: Train Embeddings on Your Own KG -- 5.4.2 Exercise: Inspect WordNet 3.0 Pre-calculated Embeddings -- 5.5 Conclusion
Part II Combining Neural Architectures and Knowledge Graphs
6 Building Hybrid Representations from Text Corpora, Knowledge Graphs, and Language Models -- 6.1 Introduction -- 6.2 Preliminaries and Notation -- 6.3 What Is Vecsigrafo and How to Build It -- 6.4 Implementation -- 6.5 Training Vecsigrafo -- 6.5.1 Tokenization and Word-Sense Disambiguation -- 6.5.1.1 Disambiguators -- 6.5.1.2 Tokenizations -- 6.5.1.3 Example WordNet -- 6.5.1.4 Cogito Example Tokenization -- 6.5.2 Vocabulary and Co-occurrence Matrix -- 6.5.2.1 Standard Swivel Prep -- 6.5.2.2 Joint-subtoken Prep -- 6.5.3 Learn Embeddings from Co-occurrence Matrix -- 6.5.3.1 Convert tsv Files to bin File -- 6.5.4 Inspect the Embeddings -- 6.6 Exercise: Explore a Pre-computed Vecsigrafo -- 6.7 From Vecsigrafo to Transigrafo -- 6.7.1 Setup -- 6.7.2 Training Transigrafo -- 6.7.3 Extend the Coverage of the Knowledge Graph -- 6.7.4 Evaluating a Transigrafo -- 6.7.5 Inspect Sense Embeddings in Transigrafo -- 6.7.6 Exploring the Stability of the Transigrafo Embeddings -- 6.7.6.1 How Often Do Senses Occur in SemCor? -- 6.7.6.2 What Do Frequent Sense rcosim Plots Look Like? -- 6.7.7 Additional Reflections -- 6.8 Conclusion
7 Quality Evaluation -- 7.1 Introduction -- 7.2 Overview of Evaluation Methods -- 7.2.1 Recommended Papers in This Area -- 7.3 Practice: Evaluating Word and Concept Embeddings -- 7.3.1 Visual Exploration -- 7.3.2 Intrinsic Evaluation -- 7.3.2.1 Compute a Relatedness Score -- 7.3.2.2 Conclusions for Intrinsic Evaluation -- 7.3.3 Word Prediction Plots -- 7.3.3.1 Conclusion for Word Prediction -- 7.3.4 Extrinsic Evaluation -- 7.4 Practice 2: Assessing Relational Knowledge Captured by Embeddings -- 7.4.1 Download the embrela Project -- 7.4.2 Download Generated Datasets -- 7.4.3 Load the Embeddings to Be Evaluated -- 7.4.4 Learn the Models -- 7.4.5 Analyzing Model Results -- 7.4.5.1 Load and Aggregate Evaluation Results for the Trained Models -- 7.4.5.2 Loading Pre-aggregated Results -- 7.4.6 Data Pre-processing: Combine and Add Fields -- 7.4.7 Calculate the Range Thresholds and Biased Dataset Detection -- 7.4.8 Finding Statistically Significant Models -- 7.4.8.1 Combining Filters -- 7.4.9 Conclusion of Assessing Relational Knowledge -- 7.5 Case Study: Evaluating and Comparing Vecsigrafo Embeddings -- 7.5.1 Comparative Study -- 7.5.1.1 Embeddings -- 7.5.1.2 Word Similarity Results -- 7.5.1.3 Inter-Embedding Agreement -- 7.5.1.4 Word-Concept Prediction -- 7.5.1.5 Relation Prediction -- 7.5.2 Discussion -- 7.5.2.1 Vecsigrafo (and SW2V) Compared to Conventional Word Embeddings -- 7.5.2.2 Vecsigrafo Compared to KG Embeddings -- 7.6 Conclusion
8 Capturing Lexical, Grammatical, and Semantic Information with Vecsigrafo -- 8.1 Introduction -- 8.2 Approach -- 8.2.1 Vecsigrafo: Corpus-Based Word-Concept Embeddings -- 8.2.2 Joint Embedding Space -- 8.2.3 Embeddings Evaluation -- 8.3 Evaluation -- 8.3.1 Dataset -- 8.3.2 Word Similarity -- 8.3.3 Analogical Reasoning -- 8.3.4 Word Prediction -- 8.3.5 Classification of Scientific Documents -- 8.4 Discussion -- 8.5 Practice: Classifying Scientific Literature Using Surface Forms -- 8.5.1 Import the Required Libraries -- 8.5.2 Download Surface Form Embeddings and SciGraph Papers -- 8.5.2.1 Downloading Data from Google Drive -- 8.5.3 Read and Prepare the Classification Dataset -- 8.5.4 Surface Form Embeddings -- 8.5.5 Create the Embeddings Layer -- 8.5.6 Train a Convolutional Neural Network -- 8.6 Conclusion
9 Aligning Embedding Spaces and Applications for Knowledge Graphs -- 9.1 Introduction -- 9.2 Overview and Possible Applications -- 9.2.1 Knowledge Graph Completion -- 9.2.2 Beyond Multi-Linguality: Cross-Modal Embeddings -- 9.3 Embedding Space Alignment Techniques -- 9.3.1 Linear Alignment -- 9.3.2 Non-linear Alignment -- 9.4 Exercise: Find Correspondences Between Old and Modern English -- 9.4.1 Download a Small Text Corpus -- 9.4.2 Learn the Swivel Embeddings over the Old Shakespeare Corpus -- 9.4.2.1 Calculating the Co-occurrence Matrix -- 9.4.2.2 Learning the Embeddings from the Matrix -- 9.4.2.3 Read Stored Binary Embeddings and Inspect Them -- 9.4.3 Load Vecsigrafo from UMBC over WordNet -- 9.4.4 Exercise Conclusion -- 9.5 Conclusion
Part III Applications
10 A Hybrid Approach to Disinformation Analysis -- 10.1 Introduction -- 10.2 Disinformation Detection -- 10.2.1 Definition and Background -- 10.2.2 Technical Approach -- 10.3 Application: Build a Database of Claims -- 10.3.1 Train a Semantic Claim Encoder -- 10.3.1.1 Training Dataset: STS-B -- 10.3.1.2 Load the BERT Model -- 10.3.1.3 Define Training Method -- 10.3.2 Create a Semantic Index of Embeddings and Explore It -- 10.3.2.1 Define a Method to Populate the Index -- 10.3.3 Populate Index with STS-B dev -- 10.3.4 Create Another Index for a Claims Dataset -- 10.3.5 Load Dataset into a Pandas DataFrame -- 10.3.5.1 Create Claim Iterator -- 10.3.5.2 Populate a claim_index -- 10.3.5.3 Explore Dataset -- 10.3.6 Conclusion of Building a Database of Claims -- 10.4 Application: Fake News and Deceptive Language Detection -- 10.4.1 Basic Document Classification Using Deep Learning -- 10.4.1.1 Dataset: Deceptive Language (Fake Hotel Reviews) -- 10.4.1.2 Tokenize and Index the Dataset -- 10.4.1.3 Define the Experiment to Run -- 10.4.1.4 Discussion -- 10.4.2 Using HolE Embeddings -- 10.4.2.1 Download the Embeddings -- 10.4.2.2 Load the Embeddings and Convert to the Format Expected by clsion -- 10.4.2.3 Tokenize and Index the Dataset -- 10.4.2.4 Define the Experiment and Run -- 10.4.2.5 Discussion -- 10.4.3 Using Vecsigrafo UMBC WNet Embeddings -- 10.4.3.1 Download the Embeddings -- 10.4.3.2 Tokenize and Index Dataset -- 10.4.3.3 Define the Experiment and Run -- 10.4.4 Combine HolE and UMBC Embeddings -- 10.4.4.1 Combine the Embeddings -- 10.4.5 Discussion and Results -- 10.5 Propagating Disinformation Scores Through a Knowledge Graph -- 10.5.1 Data Commons ClaimReview Knowledge Graph -- 10.5.1.1 KG Schema and Credibility Injections -- 10.5.1.2 Data Commons ClaimReview Knowledge Graph Instances -- 10.5.1.3 Data Commons ClaimReview Discredibility Scores -- 10.5.2 Discredibility Scores Propagation -- 10.5.2.1 Discredibility Propagation at Schema Level -- 10.5.2.2 Discredibility Propagation at Instance Level -- 10.5.2.3 Discussion and Further Improvements -- 10.6 Conclusion
11 Jointly Learning Text and Visual Information in the Scientific Domain -- 11.1 Introduction -- 11.2 Figure-Caption Correspondence Model and Architecture.