I found this paper on neural summarization while searching for neural network approaches to sentence compression. Mirella Lapata has co-authored influential papers on compression using classical methods, so I was curious to read her deep learning approach. The paper also uses pointer networks, which I have seen in other recent summarization papers, another reason I chose to read this one in depth.
I am not quite sure what to make of this paper. The authors present a giant neural network (an LSTM on top of a CNN), which is broadly in line with much recent work in other areas of NLP. The major contributions seem to be:
The authors collect gold-standard summaries by downloading article/highlight pairs from the DailyMail, a commonly used dataset. They then use a series of hand-written rules to identify which sentences from an article "match a highlight". They also filter out articles whose highlights contain words not drawn from the document itself, since their model generates "by extraction".
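The paper's exact matching rules aren't reproduced here, but the general idea of aligning highlights to article sentences can be sketched with a simple word-overlap heuristic. Everything below (function name, the unigram-overlap score, the 0.5 threshold) is my own assumption, not the authors' procedure:

```python
def match_highlight(article_sentences, highlight, threshold=0.5):
    """Pick the article sentence that best 'matches' a highlight.

    Uses unigram overlap as a stand-in for the paper's hand-written rules:
    score each sentence by the fraction of highlight words it covers, and
    accept the best sentence only if that fraction clears a threshold.
    """
    def tokens(text):
        return set(text.lower().split())

    h = tokens(highlight)
    best_idx, best_score = None, 0.0
    for i, sent in enumerate(article_sentences):
        score = len(h & tokens(sent)) / max(len(h), 1)
        if score > best_score:
            best_idx, best_score = i, score
    # Only treat it as a match if enough of the highlight is covered.
    return best_idx if best_score >= threshold else None
```

A filter like the one the authors describe would then discard any article where some highlight returns `None`, i.e. where no sentence covers enough of the highlight's words.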
Using this DailyMail dataset for supervision, the authors present a somewhat complex, multi-component neural network model which:
The authors argue that step 4 is a hybrid of extractive and true abstractive summarization. In many respects, step 4 (section 4.3 in the paper) represents the major contribution of the work.
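To make the "generation by extraction" idea concrete: rather than a softmax over a fixed output vocabulary, a pointer-style word extractor scores every word position in the input document against the decoder state and normalizes over those positions, so the only words the model can emit are words of the document. This is a toy sketch with dot-product scoring; the paper's actual scorer is learned, so treat all names and details here as my own illustration:

```python
import math

def point_to_word(doc_word_vecs, query_vec):
    """Pointer-style word selection: softmax over *input positions*.

    doc_word_vecs: one vector per word in the input document.
    query_vec: the current decoder state.
    Returns a probability distribution over document word positions,
    so the output vocabulary is exactly the input document's words.
    """
    # Toy dot-product scores; a real model would use a learned scorer.
    scores = [sum(q * w for q, w in zip(query_vec, vec))
              for vec in doc_word_vecs]
    # Numerically stable softmax over positions.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Under this framing, "abstractive" summaries are possible only to the extent that recombining document words yields new sentences, which is why the authors filter out highlights containing out-of-document words.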
This model appears to generate a word in the summary instead of a label for each sentence in the input document. Doesn't this imply that the summary must contain exactly as many word tokens as the document has sentences? If so, that is a strange constraint to build into a model.