Weblog Alex Reuneker

Bug fixes and a new feature for the n-gram generator

— Posted in Taal by

Unfortunately, some bugs slipped in during my work on large-file loading, causing the n-gram generator to present incorrect results. Luckily, one of the users alerted me to this problem, and over the last few days I have fixed a number of related bugs. On top of that, I have implemented a number of checks to prevent incorrect results in the future.

Finally, I have added an option to remove the possessive ’s, so now you can choose whether you’d like ‘Harry’s’ to be counted as ‘Harrys’ or ‘Harry’. Some general statistics (word totals, TTR) were added too.
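
To make the possessive option and the TTR statistic concrete, here is a minimal Python sketch. It is not the code behind the tool: the names tokenize and type_token_ratio are hypothetical, the tokenizer is deliberately naive, and TTR is taken in its usual sense of distinct word types divided by total word tokens.

```python
import re

# Hypothetical helpers for illustration; not taken from the actual tool.
POSSESSIVE = re.compile(r"['’]s\b")  # matches 's with a straight or curly apostrophe


def tokenize(text, strip_possessive=True):
    """Return lowercase word tokens, handling possessive 's in one of two ways."""
    if strip_possessive:
        text = POSSESSIVE.sub("", text)    # Harry's -> Harry
    else:
        text = re.sub(r"['’]", "", text)   # Harry's -> Harrys (drops every apostrophe)
    # Letters only: digits and underscores are not treated as part of a word here.
    return re.findall(r"[^\W\d_]+", text.lower())


def type_token_ratio(tokens):
    """Type-token ratio: number of distinct tokens divided by number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


sample = "Harry's owl and Harry"
print(tokenize(sample, strip_possessive=True))
print(tokenize(sample, strip_possessive=False))
print(type_token_ratio(tokenize(sample, strip_possessive=True)))
```

In this sketch the choice also affects the statistics: with the possessive removed, ‘Harry’s’ and ‘Harry’ count as one type, which lowers the TTR of the sample.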

To try the new version, head over to https://www.reuneker.nl/files/ngram.

Digital Humanities Small Grant 2023-2024

— Posted in Taal by

Recently, I was awarded the Digital Humanities Small Grant 2023-2024 by the Leiden University Centre for Digital Humanities. This grant enables me to appoint two student-assistants to participate in the project, as described in the excerpt from the grant proposal below.

In this interdisciplinary project, combining the disciplines of Dutch Linguistics and Digital Humanities, two student assistants will search, index and read available literature on Dutch verb spelling, and they will use and evaluate methodologies from the domain of Digital Humanities to explore a dataset of 6 million verb-spelling answers collected by the first supervisor through the non-profit website Gespeld.nl since 2013.

I’m grateful to the Centre for Digital Humanities and look forward to working with two students on the project. In doing so, I hope to improve our understanding of the difficulties in Dutch verb spelling, using data-driven techniques and large-scale statistics on data from Gespeld.

Updates for the N-gram generator

— Posted in Taal by

Once in a while I receive emails from researchers all over the world with thanks and/or suggestions for the scripts I provide online, such as the frequency list and n-gram generators. About the latter tool, I had a nice email conversation with a researcher from overseas, which led to the following enhancements and updates. I really enjoy these kinds of exchanges, so if you have any suggestions or feedback – you know where to find me.

  • Slight efficiency rewrite of output rendering. (2024-01-26)
  • Added feature for respecting or ignoring sentence boundaries. (2024-01-25)
  • Added feature for including or excluding numbers. (2024-01-25)
  • Added top limits above 1.000 (2.000, 3.000, 4.000, 5.000, 10.000) to respect or ignore sentence boundaries. (2024-01-25)
  • Added feature for (virtually) unlimited results. (2024-01-22)
  • Added feature for unigrams. (2024-01-22)
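
The options in this list can be illustrated with a short Python sketch. It is a simplified stand-in, not the tool's own implementation: the function name ngrams and its parameters are assumptions, the sentence splitter is naive (it breaks on ., ! and ?, so it would also split a number such as 1.000), and excluding numbers here simply drops them, which makes their neighbours adjacent.

```python
import re
from collections import Counter


def ngrams(text, n=2, respect_sentences=True, include_numbers=True, top=None):
    """Count n-grams in text (hypothetical helper, for illustration only).

    n=1 yields unigrams; top=None returns all results instead of a top limit.
    """
    # Naive sentence split; with respect_sentences=False the text is one chunk,
    # so n-grams may span sentence boundaries.
    chunks = re.split(r"[.!?]+", text) if respect_sentences else [text]
    # \w+ keeps digits; the second pattern keeps letters only.
    pattern = r"\w+" if include_numbers else r"[^\W\d_]+"
    counts = Counter()
    for chunk in chunks:
        tokens = re.findall(pattern, chunk.lower())
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common(top)


text = "The cat sat. The cat ran 2 times."
print(ngrams(text, n=2, top=3))                    # bigrams within sentences
print(ngrams(text, n=2, respect_sentences=False))  # also contains "sat the"
print(ngrams(text, n=1, include_numbers=False))    # unigrams without "2"
```

Respecting sentence boundaries keeps the last word of one sentence and the first word of the next from being counted as an n-gram, which is usually what you want for collocation-style analyses.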

Talk on English loan verbs at VIOT 2024

— Posted in Taal by

At the end of January 2024, I will present at VIOT 2024 (Vereniging Interuniversitair Overleg Taalbeheersing) at the University of Twente. The talk is about the spelling of (mainly English) loan verbs, such as updaten and netflixen. In the talk, I present quantitative analyses showing that loan verbs are conjugated incorrectly significantly more often than non-loan verbs. In addition, the results show that a limited number of verb types, defined by the ending of the stem, account for most of the errors.

For more information, see the abstract and VIOT 2024.

Review of 'Connecting Conditionals'

— Posted in Taal by

The latest issue of Nederlandse Taalkunde (Dutch Linguistics) includes a review of my dissertation 'Connecting Conditionals', written by Timothy Colleman. If you'd like an overview of what my thesis is about, and a critical appreciation of it, head on over to AUP.
