Events are held via MS Teams and are free to attend. The MS Teams link is sent the day before the event. If you have problems registering, or have any questions, please contact the organiser, Costas Gabrielatos ([email protected]).

—————————————————————————————————————————————————————

MEETING #13: Friday 15 November 2024, 2-4 pm (GMT) – Two presentations

Registration is free. CLICK HERE TO REGISTER.

Topic: Discourse-Oriented Corpus Studies

2-3 pm
Katia Adimora (Edge Hill University)
Mexican immigration/immigrants in American and Mexican newspapers

Abstract

The study employed Corpus-Assisted Discourse Studies (CADS) methodology to analyse discourse prosody and reveal hidden attitudes towards Mexican immigration/immigrants in the American and Mexican press. Two corpora were compiled: an American immigration corpus (AIC) and a Mexican immigration corpus (MIC). The AIC includes 12,595 articles (16,619,925 words) from The New York Times, The Washington Post, USA Today, Los Angeles Times, The Arizona Republic, and Chicago Tribune. The MIC includes 20,865 articles (12,258,123 words) from El Universal, Elimparcial.com, Reforma, El Norte, Lacronica.com, and Mural.

The results suggest that positive attitudes towards Mexican immigration/immigrants surpass negative attitudes in both corpora, with MIC newspapers being more positive than AIC newspapers, which does not always coincide with public opinion. The attitudes fluctuated during the study period and seemed to correlate with socio-political events and the political leaning of newspapers. In addition, while AIC newspapers were more prone to use impersonal thematic frames to describe immigration issues, MIC newspapers were more likely to use personal episodic frames, which might contribute to more empathy towards Mexican immigrants among the Mexican readership.

The most frequent positive attitudes in both corpora were opposition to Trump’s and Republican anti-immigration policies, and favourable comments towards the rights of immigrants and pro-immigration laws and regulations.

On the other hand, portraying immigrants as criminals was the most frequent negative attitude in both corpora. Also, common negative attitudes in both corpora expressed support for Trump’s anti-immigrant policies, opposed pro-immigrant rules and regulations, and criticised the (perceived) high number of immigrants.

In AIC newspapers, ‘illegal immigrant(s)’ was used with negative discourse prosody, whereas in MIC newspapers, the term ‘inmigrante(s) ilegal(es)’ expressed neutral (positive and negative) attitudes.

3-4 pm
Dan Malone (Edge Hill University)
When is the extreme also typical? Using prototypicality to investigate representations of the lone-wolf terrorist

Abstract

The term lone wolf figuratively conjures an image of an individual acting in isolation, perhaps motivated by a desire to break from societal norms. When applied to terrorism, lone wolf draws attention to the perceived aloneness of the perpetrator. However, evidence from the Lone Wolf Corpus (Malone, 2020) reveals that representations in the British press showed notable diachronic trends in how the lone-wolf terrorist’s (LWT) aloneness was (re)presented, which in turn indexed broader discursive shifts.

In this presentation, I report on my approach to investigating representations of the LWT by adopting a prototypical categorisation framework to analyse discourse prosodies (i.e., implicit and explicit attitudes) (Stubbs, 2001: 66) of connection. This categorisation hinges on four key attributes identified during manual corpus annotation: (1) perpetration, (2) ideological motivation, (3) logistical support, and (4) resource provision. These attributes address whether the LWT was represented as operating in complete isolation or receiving some form of assistance, either direct or indirect, from individuals or organisations.

Five distinct connection types emerged from the data, reflecting different combinations of these attributes: the Prototypical Lone Wolf Terrorist, depicted as ideologically self-driven and operationally independent; Assisted by Non-Affiliated Individual(s); Inspired by Organisation; Informed by Organisation; Directed by Organisation; and Member of Organisation. Each connection type was quantified, and its frequency was statistically analysed to trace diachronic discursive shifts.

The findings reveal a discursive reconstruction of the LWT over time. In the early period (2010-2014), the LWT was more frequently presented as a solitary actor, but later portrayals (particularly during 2015-2017) increasingly associated the lone wolf with broader, often Islamist, networks. This shift resulted in the LWT being depicted not as a fully independent individual, but rather as institutionalised and depersonalised—a faceless agent acting on behalf of extremist organisations.

References

Malone, D. (2020). Developing a complex query to build a specialised corpus: Reducing the issue of polysemous query terms. Paper presented at Corpora and Discourse International Conference 2020, University of Sussex, UK.

Stubbs, M. (2001). Words and phrases: Corpus studies of lexical semantics. Blackwell.

—————————————————————————————————————————————————————

MEETING #14: Friday 24 January 2025, 2 pm (GMT)
Topic: Corpus Methodology, Multi-Dimensional Analysis
Elen Le Foll (University of Cologne, Germany)
Modelling Textbook English using a Modified Multi-Feature/Dimensional Analysis (MDA) Framework

Abstract

English as it is represented in secondary school English as a Foreign Language (EFL) textbooks is often perceived as somehow different from ‘real-life’, ‘authentic’ English. Indeed, previous studies have shown that individual lexico-grammatical features are often misrepresented (see Le Foll 2024 for a synthesis of the literature). This is problematic given that textbooks are an important and highly influential vector of foreign language input in secondary education. It is therefore worth asking: Does Textbook English constitute a special variety of English? And, if so, in what ways does it differ from ‘real-life’, extra-curricular English?

This talk focuses on the modified version of the multi-feature/multi-dimensional analysis (see Biber 1988; Berber Sardinha & Veirano Pinto 2014; 2019: 19) framework used to answer these questions in Le Foll (2024). MDA is used to compare the language of nine series of EFL textbooks used in lower secondary education in Germany, France and Spain with three target language reference corpora. Inspired by Diwersy et al. (2014) and Neumann & Evert (2021), this modified MDA framework is based on principal component analysis (PCA) and extensive multi-dimensional visualisations. The framework further incorporates additional steps designed to increase both the reproducibility and replicability of the results.

Following a theoretical introduction to both the research questions at hand and the MDA framework, the open-source tools used to conduct MDAs in this study are presented from a practical point of view. Together, we examine the functionalities of the Multi-Feature Tagger of English (MFTE; Le Foll 2021; see also Le Foll & Shakir 2023) and a number of useful R libraries. To this end, we draw on the RMarkdown scripts that are part of the Online Supplements of Le Foll (2024; https://elenlefoll.github.io/TextbookMDA). Finally, we discuss the steps taken to improve the reproducibility and replicability of the results, in line with the principles of Open Science.

References

Berber Sardinha, Tony & Marcia Veirano Pinto (eds.). 2014. Multi-Dimensional Analysis, 25 Years on: A Tribute to Douglas Biber (Studies in Corpus Linguistics 60). Amsterdam: John Benjamins.

Berber Sardinha, Tony, Marcia Veirano Pinto, Cristina Mayer, Maria Carolina Zuppardi & Carlos Henrique Kauffmann. 2019. Adding Registers to a Previous Multi-Dimensional Analysis. In Tony Berber Sardinha & Marcia Veirano Pinto (eds.), Multi-Dimensional Analysis: Research Methods and Current Issues, 165–188. New York, NY: Bloomsbury.

Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511621024.

Diwersy, Sascha, Stephanie Evert & Stella Neumann. 2014. A weakly supervised multivariate approach to the study of language variation. In Benedikt Szmrecsanyi & Bernhard Wälchli (eds.), Aggregating dialectology, typology, and register analysis: Linguistic variation in text and speech, 174–204. Berlin: De Gruyter.

Le Foll, Elen. 2021. Introducing the Multi-Feature Tagger of English (MFTE). Perl. Osnabrück University. https://github.com/elenlefoll/MultiFeatureTaggerEnglish. (5 January, 2022).

Le Foll, Elen. 2024. Textbook English: A Multi-Dimensional Approach (Studies in Corpus Linguistics 116). Amsterdam: John Benjamins.

Le Foll, Elen & Muhammad Shakir. 2023. Introducing a New Open-Source Corpus-Linguistic Tool: The Multi-Feature Tagger of English (MFTE). Presented at the ICAME44, NWU Vanderbijlpark (South Africa).

Neumann, Stella & Stephanie Evert. 2021. A register variation perspective on varieties of English. In Elena Seoane & Douglas Biber (eds.), Corpus-based approaches to register variation (Studies in Corpus Linguistics 103), 144–178. Amsterdam: John Benjamins.

—————————————————————————————————————————————————————

MEETING #15: Thursday 6 or Friday 7 March 2025
Speaker and Title TBC

—————————————————————————————————————————————————————

MEETING #16: Friday 2 May 2025, 2-3 pm
Topic: LLMs and Lexical Priming Theory
Michael Pace-Sigge (University of Eastern Finland)
Large-Language-Model Tools and the Theory of Lexical Priming: Where technology and human cognition meet and diverge

Abstract

This paper revisits Michael Hoey’s Lexical Priming Theory (2005) in the light of recent discussions of Large Language Models as forms of machine learning (commonly referred to as AI), which have been the centre of a lot of publicity in the wake of tools like OpenAI’s ChatGPT or Google’s BARD/Gemini. Historically, theories of language have faced inherent difficulties, given language’s exclusive use by humans and the complexities involved in studying language acquisition and processing. The intersection between Hoey’s theory and Machine Learning tools, particularly those employing Large Language Models (LLMs), has been highlighted by several researchers. Hoey’s theory relies on the psychological concept of priming, aligning with approaches dating back to M. Ross Quillian’s 1960s proposal for a “Teachable Language Comprehender.” The theory posits that every word is primed for discourse based on cumulative effects, a concept mirrored in how LLMs are trained on vast corpora of text data.

This paper tests LLM-produced samples against naturally (human-)produced material in the light of a number of language usage situations, investigates results from AI research and compares the results with how Hoey describes his theory. While LLMs can display a high degree of structural integrity and coherence, they still appear to fall short of meeting human-language criteria, which include grounding and the objective to meet a communicative need.

References

Hoey, M. (2005). Lexical Priming. London: Routledge.

Hoey, M. (2009). Corpus-driven approaches to grammar. In Römer, U. & Schulze, R. (eds.), Exploring the lexis-grammar interface. Amsterdam/Philadelphia: John Benjamins, pp. 33-47.

Pace-Sigge, M. & Sumakul, T. (2022). What Teaching an Algorithm Teaches When Teaching Students How to Write Academic Texts. In Jantunen, Jarmo Harri, et al. Diversity of Methods and Materials in Digital Human Sciences. Proceedings of the Digital Research Data and Human Sciences DRDHum Conference 2022.

Quillian, M. R. (1967). Word concepts: A theory and simulation of some basic semantic capabilities. Behavioral Science, 12(5), 410-430. https://doi.org/10.1002/bs.3830120511

Tools

Brezina, V. & Platt, W. (2023). #LancsBox X, Lancaster University, http://lancsbox.lancs.ac.uk.

Google [2023] (2024). BARD/Gemini. https://BARD.google.com/chat

OpenAI [2022] (2024). ChatGPT (GPT 3.5). https://chat.openai.com/

Scott, M. (2023). WordSmith Tools version 8, Stroud: Lexical Analysis Software.