Licensed under: https://creativecommons.org/licenses/by-nc/4.0/
HI-LING
LINGUISTICS IN THE HIGH SCHOOL
Lesson 4: Stylometry
Definition: Stylometry is the study of writing styles and authorship, by which we can identify authors based on linguistic and structural features.
Key Concepts
- Stylometry
- Identifying Plagiarism
- Identifying Intent or Motivation
UNIT 1: What is Stylometry?
A forensic linguist may encounter many types of written evidence, from suicide notes and anonymous hate mail to witness statements and even suspected plagiarism. In stylometry, we analyze these texts to identify an author's unique writing style. This can include elements like word choice, sentence structure, and punctuation. Identifying these features can help us determine who wrote a text.
​
In plagiarism cases, stylometry is used for authorship attribution. This process involves comparing an unknown or disputed text with a known sample of an author's work to identify similarities and inconsistencies, which can point to potential plagiarism.
​
For instance, you can examine word choice to uncover similarities between suspected plagiarized content and its source material. Stylometry also looks at punctuation patterns, sentence structures and the use of function words (small grammatical words such as "the", "of" and "and"). These elements can reveal departures from an author's characteristic style. Plagiarized content often lacks the genuine emotional tone or sentiment present in the original text, and stylometry can detect differences in how emotions are conveyed.
​
Advanced plagiarism detection tools utilize machine learning and statistical analysis to systematically compare suspected plagiarized text with a database of known sources. By setting specific thresholds for similarity, these tools can flag potential plagiarism. Combining stylometric analysis with contextual information and an understanding of the author's intent further refines the detection process, helping to differentiate between unintentional similarities and deliberate plagiarism.
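To make the idea of a similarity threshold concrete, here is a minimal Python sketch. It does not show how any particular detection tool works. It simply measures how many three-word sequences two texts share (Jaccard overlap) and flags any source whose score reaches a threshold. The function names, the sample texts, and the threshold of 0.3 are all assumptions chosen for illustration.

```python
import re

def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams (word sequences) in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(text_a, text_b, n=3):
    """Shared n-grams divided by all n-grams in either text (0.0 to 1.0)."""
    a, b = word_ngrams(text_a, n), word_ngrams(text_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def flag_possible_plagiarism(suspect, sources, threshold=0.3, n=3):
    """Return (source_name, score) pairs whose score meets the threshold."""
    flagged = []
    for name, source_text in sources.items():
        score = jaccard_similarity(suspect, source_text, n)
        if score >= threshold:
            flagged.append((name, score))
    # Highest-scoring sources first
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Hypothetical example data, invented for this sketch
sources = {
    "source_a": "The quick brown fox jumps over the lazy dog near the river bank.",
    "source_b": "Stylometry is the study of writing styles and authorship.",
}
suspect = "The quick brown fox jumps over the lazy dog near the old mill."
print(flag_possible_plagiarism(suspect, sources))
```

A real tool would compare against a much larger database and use a threshold tuned on known examples. A high score only signals that a human should take a closer look. It does not prove plagiarism.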
Now we're going to look at a real-life court case in which one author was accused of plagiarizing another. First, we'll decide together whether you think the text has been plagiarized.
Below are extracts from both books to get you started.
​
In pairs, discuss which elements of these texts stand out to you as potentially indicative of an author's style. Consider criteria such as word choice, sentence length, and any unique characteristics. We will discuss this all together afterwards.
Activity 1: Detecting Plagiarism
Extract 1:
​
"Ridgeway and Zoe looked silently about them. The room was the size of a luxury hotel room and furnished in much the same way. Besides the sofa and chairs, there was a television set, a rack of current magazines, a small computer terminal displaying financial quotes, and a wet bar stocked with liquor. Ridgeway went to the wet bar, set the wrapped painting down on the counter, and filled a tumbler with water from a chilled bottle of Perrier."
Extract 2:
​
"Langdon and Sophie stepped into another world. The small room before them looked like a lavish sitting room at a fine hotel. Gone were the metal and rivets, replaced with oriental carpets, dark oak furniture and cushioned chairs. On the broad desk in the middle of the room, two crystal glasses sat beside an opened bottle of Perrier, its bubbles still fizzing. A pewter pot of coffee steamed beside it."
So is the extract plagiarized? This case was brought by Lewis Perdue, author of "Daughter of God" (2000) and "The Da Vinci Legacy" (1983), who claimed that Dan Brown had plagiarized his copyrighted work, arguing that the extent of copying of plot, settings and characters was excessive even for a novel in the same genre. Extract 1 comes from "Daughter of God" by Lewis Perdue, and Extract 2 is from "The Da Vinci Code" by Dan Brown. Ultimately, the judge ruled that it was not a case of copyright infringement; however, the forensic linguist working on the case felt differently.
UNIT 2: How to Identify Authorship
Now, let's dig deeper into how we can identify authorship. This can be crucial in various contexts, from identifying potential suspects in a crime to ensuring that a will is valid and was written by the right person. Here are several factors we can look at to learn about an author's background and motivations:
Terminology and Vocabulary
Specific word choices: Pay attention to the words an author uses. Their vocabulary can reveal their level of education, profession, or interests.
Domain-specific terminology: If the author employs jargon or specialized terminology related to a particular field, it may indicate their background in that domain.
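Vocabulary features like these can be counted by a computer. The short Python sketch below shows two simple measures: the type-token ratio (the number of different words divided by the total number of words, a rough indicator of how varied the vocabulary is) and a check for specialist terms. The function names and the list of legal terms are made up for this example.

```python
import re

def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text):
    """Different words divided by total words (0.0 for an empty text)."""
    words = tokenize(text)
    if not words:
        return 0.0
    return len(set(words)) / len(words)

def jargon_hits(text, term_list):
    """Return the specialist terms from term_list that appear in the text."""
    return sorted(set(tokenize(text)) & {term.lower() for term in term_list})

# Hypothetical list of legal terms, invented for this sketch
legal_terms = {"plaintiff", "affidavit", "tort"}

print(type_token_ratio("the cat saw the dog"))
print(jargon_hits("The plaintiff filed an affidavit.", legal_terms))
```

Keep in mind that the type-token ratio drops as texts get longer, so it is only fair to compare texts of similar length.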
Punctuation and Grammar
Grammatical correctness: Proper grammar usage can indicate a level of education or attention to detail.
​
Punctuation choices: The use of ellipses, exclamation marks, or other punctuation marks can reveal the author's emotional state or personal writing style.
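Punctuation habits can also be turned into numbers. As a rough sketch, the Python code below counts how often a few punctuation marks appear per 100 words, so that texts of different lengths can be compared. The list of marks and the sample note are assumptions made for this illustration.

```python
import re

# Marks to track; this selection is an illustrative assumption.
MARKS = ["...", "!", "?", ",", ";"]

def punctuation_profile(text):
    """Return how often each mark occurs per 100 words."""
    # Treat the single-character ellipsis as three dots
    text = text.replace("\u2026", "...")
    n_words = len(re.findall(r"[A-Za-z']+", text))
    if n_words == 0:
        return {mark: 0.0 for mark in MARKS}
    return {mark: 100 * text.count(mark) / n_words for mark in MARKS}

# Hypothetical note, invented for this sketch
note = "Wait... what?! No, no, no!"
print(punctuation_profile(note))
```

Two writers with the same vocabulary can still have very different punctuation profiles, which is why this feature is useful alongside word choice.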
Audience
Pronoun usage: The choice of pronouns such as "we," "you," or "they" can provide clues about the author's relationship with the audience and their intent.
​
Addressing the reader: How the author addresses the reader (e.g., using imperative commands or asking questions) can also reveal their intent (e.g., instructing, persuading, engaging).
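Pronoun usage is one of the easiest audience features to count. The sketch below groups a few English pronouns and counts how often each group appears in a text. The grouping, the function name and the sample sentence are assumptions for illustration, and the pronoun lists are not complete.

```python
import re

# Illustrative pronoun groups; not an exhaustive list.
PRONOUN_GROUPS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second": {"you", "your", "yours", "yourself", "yourselves"},
    "third_plural": {"they", "them", "their", "theirs", "themselves"},
}

def pronoun_counts(text):
    """Count how many words in the text fall into each pronoun group."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        group: sum(1 for word in words if word in members)
        for group, members in PRONOUN_GROUPS.items()
    }

# Hypothetical sentence, invented for this sketch
print(pronoun_counts("We know you took our cheese. They saw you."))
```

A text full of "we" and "our" positions the author as part of a group, while frequent "you" speaks directly to the reader. The counts are only a starting point for interpretation.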
Tone and Style
Formality vs. Informality: The formality of language and style can indicate the author's intent. A formal tone might suggest professional or academic writing, while an informal one could be more casual or personal.
​
Emotive language: Emotional language or vivid descriptions can hint at the author's emotional state or intent, such as persuasion or storytelling.
Cultural References
Cultural references: References to cultural elements, events, or idiomatic expressions can indicate the author's cultural background.
​
Regional language variations: Regional dialects or language variations might give away the author's geographic origin.
In Summary
In practice, the process of identifying an author's intent and background is often a combination of these linguistic elements. It requires careful reading, critical thinking, and contextual awareness to draw meaningful conclusions about the author's purpose and background based on their written communication.
Sentence Structure
Sentence length: Short, concise sentences might indicate a more straightforward and direct communication style, while longer, complex sentences could suggest a more formal or academic approach.
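Sentence length is straightforward to measure. The following Python sketch splits a text into sentences at full stops, exclamation marks and question marks, then counts the words in each one. The function names and the sample text are assumptions for this example.

```python
import re
from statistics import mean

def sentence_lengths(text):
    """Return the number of words in each sentence of the text."""
    # Split wherever one or more sentence-ending marks appear
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]

def average_sentence_length(text):
    """Return the mean number of words per sentence (0.0 for an empty text)."""
    lengths = sentence_lengths(text)
    return mean(lengths) if lengths else 0.0

# Hypothetical text, invented for this sketch
sample = "Stop. I said stop right now! Why?"
print(sentence_lengths(sample))
print(average_sentence_length(sample))
```

Splitting on punctuation is a simplification: abbreviations such as "Dr." would be treated as the end of a sentence, so results on real texts should be checked by hand.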
Information Gaps and Omission
What the author chooses not to say or intentionally omits can be as revealing as what they include. Understanding these gaps can provide insights into their intent and background.
So Detective X is still hard at work trying to solve the break-in at the cheese factory, and we'll explore how we can use those factors that we just learned about to analyze the note that was left behind by the robber.
​
As you can see, it is a grocery list, and Detective X has gathered three other handwriting samples for us to review. In small groups, closely examine the samples. Look for patterns or clues that could help us determine which sample matches the robber's note (the first image on the left).
​​
Afterward, we'll discuss your findings. Let's get those detective minds working!
Activity 2: Solving the Break-In
Do any of the samples match? Although the original and #4 are very similar, unfortunately they are not a match. So Detective X will keep hunting for the robber.
Final thoughts for this lesson:
​​
- We explored the concept of stylometry, which helps us identify writing styles and authorship.
- We delved into factors that can help us determine authorship and how they are used to solve cases of plagiarism and identify potential suspects.
​
Now, we want you to reflect on how these concepts can be applied in your own lives. How can they be useful in research, communication, or online interactions?
​
​
Sources
Bredthauer, S. (2013). Verstellungen in inkriminierten Schreiben. Kölner Wissenschaftsverlag.
Fobbe, E. (2011). Forensische Linguistik: Eine Einführung. Narr Francke Attempto.
Olsson, J., & Luchjenbroers, J. (2014). Forensic linguistics. Bloomsbury.
Olsson, J. (2017). Wordcrime: Solving crime through forensic linguistics. Bloomsbury Academic.