NLP: Part III

Having moved on from the series of books that used third-person narration and didn't title chapters with the POV character, I had little reason to continue the project, and I lacked the forward momentum that would have kept me intrigued. Still, what follows is where I left off, and where NLP fell short for my use case.

Enter OpenNLP

OpenNLP was much more biddable for me than CoreNLP was. The information I'd have expected on the objects returned to me (the sentence a token belongs to, the token's part of speech, the probability that a coreference resolution prediction is correct) was more readily available in OpenNLP. Here's an example.
[Image: RealNLP, OpenNLP output with an NNP tag circled]
The circled NNP is important. In the Penn Treebank part-of-speech tag set, NNP means "proper noun, singular", which matters when we're trying to determine who is speaking. My code ended up using this as a key determinant in its prediction model.
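The post doesn't show how the NNP tag fed into the prediction model, so here is a minimal sketch under stated assumptions: given tokens already tagged with Penn Treebank tags (by OpenNLP or any other tagger), count the tokens tagged `NNP` and return the most frequent one as the POV guess. `TaggedToken`, `mostFrequentNnp`, and the sample tokens are hypothetical, not part of OpenNLP; the record type needs Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: guess the POV character from POS-tagged tokens. */
public class NnpSpeakerGuess {

    /** A token paired with its Penn Treebank tag (hypothetical type, not an OpenNLP class). */
    record TaggedToken(String text, String tag) {}

    /** Returns the most frequent singular proper noun (NNP), or null if there is none. */
    static String mostFrequentNnp(List<TaggedToken> tokens) {
        // LinkedHashMap keeps first-seen order, so ties go to the earliest name.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (TaggedToken t : tokens) {
            if ("NNP".equals(t.tag())) {
                counts.merge(t.text(), 1, Integer::sum);
            }
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Invented example tokens, standing in for real tagger output.
        List<TaggedToken> tokens = List.of(
            new TaggedToken("Malta", "NNP"),
            new TaggedToken("looked", "VBD"),
            new TaggedToken("at", "IN"),
            new TaggedToken("Reyn", "NNP"),
            new TaggedToken(".", "."),
            new TaggedToken("Malta", "NNP"),
            new TaggedToken("smiled", "VBD"));
        System.out.println(mostFrequentNnp(tokens)); // prints Malta
    }
}
```

The post calls NNP a key determinant, not the only one, so treat this as one signal among several rather than a complete predictor.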

Why NLP Falls Short in My Case

Coreference resolution, as I understand it, relies on a larger branch of NLP research: Named Entity Recognition, or NER. NER classifies an entity as a person, location, organization, amount, date, time, and so on. The problem for my use case is that the characters' names are not in the name dictionaries of either of the two packages I tried.
Worse, some of the characters in the book I used to benchmark my implementation have names that appear in the wrong dictionary (e.g., Malta, a person, is recognized as a country by CoreNLP).
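To make the dictionary problem concrete, here is a toy sketch that reduces NER to a plain lookup table. This is a simplification: CoreNLP and OpenNLP also rely on trained statistical models, and the `GAZETTEER` entries below are invented. It still shows both failure modes described above: fantasy names come back unknown, and a name that doubles as a place comes back with the wrong type.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: NER reduced to a dictionary (gazetteer) lookup. */
public class GazetteerLookup {

    enum EntityType { PERSON, LOCATION, ORGANIZATION }

    // Toy dictionary standing in for a library's name lists (invented entries).
    static final Map<String, EntityType> GAZETTEER = Map.of(
        "Robin", EntityType.PERSON,
        "Malta", EntityType.LOCATION,
        "London", EntityType.LOCATION);

    /** Returns the dictionary type for a token, or empty if the token is unknown. */
    static Optional<EntityType> classify(String token) {
        return Optional.ofNullable(GAZETTEER.get(token));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Malta", "Kennit", "Wintrow"}) {
            System.out.println(name + " -> " + classify(name).map(EntityType::name).orElse("UNKNOWN"));
        }
        // Malta -> LOCATION   (wrong: Malta is a character)
        // Kennit -> UNKNOWN   (not in the dictionary)
        // Wintrow -> UNKNOWN  (not in the dictionary)
    }
}
```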

Wrapping Up

In the end, having left the Robin Hobb series behind almost a month ago, I think Frequency Counts were the better choice. The runtime was about 1 second per book, compared to 10 minutes for the NLP-based approach. Sure, it would've been cool to get it working, but I don't think NLP is cut out for determining the POV character of a section of a book.
So here was the final score (correct predictions in bold):
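For comparison, a frequency-count approach can be as simple as counting whole-word mentions of each known character name in a section and picking the most frequent. The sketch below assumes a hand-supplied list of candidate names; the class, method names, and sample text are hypothetical, not the code from earlier in this series. Scanning text with a regex like this is cheap next to a full NLP pipeline, which fits the runtime gap described above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: guess the POV character by counting known names. */
public class FrequencyCountGuess {

    /** Counts whole-word occurrences of each candidate name in the section text. */
    static Map<String, Integer> countNames(String section, List<String> names) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : names) {
            // \b on both sides so "Kennit" does not match inside a longer word.
            Matcher m = Pattern.compile("\\b" + Pattern.quote(name) + "\\b").matcher(section);
            int n = 0;
            while (m.find()) {
                n++;
            }
            counts.put(name, n);
        }
        return counts;
    }

    /** Returns the most-mentioned name (earlier names win ties), or null if none appear. */
    static String guessPov(String section, List<String> names) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : countNames(section, names).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Invented sample text.
        String section = "Althea climbed the rigging. Wintrow watched Althea from the deck.";
        System.out.println(guessPov(section, List.of("Althea", "Wintrow", "Kennit"))); // prints Althea
    }
}
```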

Frequency Counts 14/15

Prologue: [She Who Remembers]
Chapter 1: [Malta, Keffria, Reyn]
Chapter 2: [Ronica, Serilla]
Chapter 3: [Kennit, Kennit, Wintrow, Kennit, Wintrow, Althea]
Chapter 4: [Tintaglia, Reyn]
Chapter 5: [Althea]

CoreNLP 7/15

Prologue: [She Who Remembers]
Chapter 1: [The Satrap, Jani, Some]
Chapter 2: [Ronica Vestrit, Davad Restart]
Chapter 3: [Captain Kennit, Wintrow, Wintrow, Wintrow, Ah, No]
Chapter 4: [Beautiful, Reyn Khuprus]
Chapter 5: [Althea]

OpenNLP 9/15

Prologue: [Not every serpent]
Chapter 1: [Malta, Keffria, Reyn]
Chapter 2: [Ronica, Bingtown]
Chapter 3: [Captain Kennit, Wintrow, This voice, Wintrow, A moment, The serpent]
Chapter 4: [Tintaglia, Reyn]
Chapter 5: [Jek]
Here is a code sample from this last attempt:
[Image: Capture, screenshot of the code sample]

