Could LLM AI Technology Be Leveraged in Corpus Linguistic Analysis?

In our previous four posts we’ve argued that LLM AIs should not be in the driver’s seat of ordinary meaning inquiries. In so stating, we don’t deny that AI tools have certain advantages over most current corpus tools: Their front-end interface is more intuitive to use and they can process data faster than human coders.

These are two-edged swords for reasons we discussed yesterday. Without further refinements, the user-friendliness of the interface and speed of the outputs could cut against the utility of LLM AIs in the empirical inquiry into ordinary meaning—by luring the user into thinking that a sensible-sounding answer generated by an accessible, tech-driven tool must be rooted in empiricism.

That said, we see two means of leveraging LLM AIs’ advantages while minimizing these risks. One is for corpus linguists to learn from the AI world, building the above advantages into the tools of corpus linguistics. Another is for AI developers to learn from corpus linguists, building tools that open the door to truly empirical analysis of ordinary language.

Corpus linguistics could take a page from the LLM AI playbook

Corpus linguists could learn from the chatbot interface. The front-end interfaces of widely used corpora have a number of limitations—most notably, they are unintuitive, especially for non-linguists. The software requires users to implement search methods and terms that are not always intuitive or natural: drop-down buttons that presuppose a technical understanding of the interface, and terminology like collocate, KWIC, and association measure. In sharp contrast, AIs like ChatGPT produce a response to a simple query written in conversational English.
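For readers unfamiliar with those terms, a minimal sketch of what a basic KWIC ("key word in context") query looks like in code, here using Python's NLTK library with sample sentences we invented for illustration:

```python
# A minimal KWIC ("key word in context") sketch using Python's NLTK.
# The sample sentences are invented for illustration only.
from nltk.text import Text

sample = (
    "The contract required landscaping around the pool deck . "
    "She hired a landscaping crew to plant shrubs and trees ."
)
tokens = sample.split()

# Print each occurrence of the key word with its surrounding context.
Text(tokens).concordance("landscaping", width=79, lines=5)
```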

Maybe chatbot technology could be incorporated into corpus software—allowing the use of conversational language in place of buttons and drop-down menus. A step in that direction has been taken in at least one widely used corpus software tool that now allows users to prompt ChatGPT (or another LLM) to perform post-processing on corpus results.
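For illustration, the sketch below shows roughly what that kind of post-processing hand-off involves under the hood. The model name, prompt wording, and sample lines are our own assumptions, not the actual tool's implementation:

```python
# Hypothetical sketch of handing corpus concordance lines to an LLM
# for post-processing. Model name, prompt, and lines are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

concordance_lines = [
    "... the permit covers landscaping , including a retaining wall ...",
    "... spent the weekend landscaping the yard with native plants ...",
]

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{
        "role": "user",
        "content": "Summarize how 'landscaping' is used in these "
                   "concordance lines:\n" + "\n".join(concordance_lines),
    }],
)
print(response.choices[0].message.content)
```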

This is a step in an interesting direction. But there are at least four barriers to the use of this tool in empirical textualism. The user has no way to know (1) what language to use in order to prompt the chatbot to carry out an analysis of interest, (2) how the chatbot operationalized the constructs mentioned in the query, (3) what methods the chatbot used to process the concordance lines and determine a result, or (4) whether the same query will produce the same results in the future. For these reasons, we believe this approach has too much AI and not enough corpus linguistics. But we are intrigued by the attempt to make corpus linguistics more accessible and user-friendly.

We anticipate a middle ground between existing corpus interfaces, which can be technical and unintuitive, and highly user-friendly chatbots, which lack transparency and replicability. We imagine a future in which users can import their own corpus and type queries into a chatbot interface. Instead of immediately delivering a result based on black-box operationalizations and methods, the chatbot might reply with clarifying questions to confirm exactly what the user wants to search for. Once the user confirms that the chatbot will perform the desired search, the chatbot could produce results along with a detailed description of the exact operational definitions and methods used, allowing the user to transparently report both methods and results. As a final step, the chatbot might let users save the search settings so that other researchers can confirm that the same search in the same corpus generates the same results.
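That final step could be as simple as freezing the confirmed query into a plain, machine-readable record that anyone can rerun. The schema below is purely our own illustration of what such a saved specification might contain; none of the field names reflect an existing standard:

```python
# Illustrative sketch of a saved, replayable search specification.
# Every field name and value here is hypothetical, not a real standard.
import json

search_spec = {
    "corpus": "user_imported_corpus_v1",
    "corpus_checksum": "sha256:...",   # pins the exact corpus version
    "query": "landscaping",
    "operational_definitions": {
        "collocate_window": 4,         # words counted on each side
        "association_measure": "logDice",
        "minimum_frequency": 5,
    },
    "software_version": "1.0.0",
}

# Saving the spec lets other researchers rerun the identical search and
# confirm that the same corpus yields the same results.
with open("search_spec.json", "w") as f:
    json.dump(search_spec, f, indent=2)
```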

This type of tool would rely on best practices from the field of corpus linguistics while letting users interact with it conversationally, giving them access to those analyses without extensive training in corpus linguistics methods.

LLM AIs could take a page from the corpus linguistics playbook

We can imagine a future in which AIs allow users to search for and receive empirical data on ordinary language usage—not by outsourcing the ultimate question of ordinary meaning to the AI (as in Snell and DeLeon), but by preserving the transparency and falsifiability of the corpus inquiry while making the process faster, larger in scale, and more accessible to non-linguists.

It’s plausible that an AI could be trained to apply a coding framework (developed by humans) to the results of a corpus linguistics search—analyzing terms as they appear in the concordance lines to determine whether and to what extent they are used in a certain way. Human intervention would be necessary to check for accuracy. But the process could be streamlined in a manner aimed at increasing speed and accessibility.
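Here is a minimal sketch of what that step might look like, assuming an OpenAI-style chat API. The model choice, prompt scaffolding, and function name are our own illustrative assumptions, not a tested pipeline:

```python
# Hypothetical sketch: an LLM applies a human-written coding framework
# to one concordance line at a time. Model and prompt are illustrative.
from openai import OpenAI

client = OpenAI()

def code_line(rubric: str, line: str) -> str:
    """Return the label the model assigns to one concordance line."""
    response = client.chat.completions.create(
        model="gpt-4o",   # illustrative model choice
        temperature=0,    # favor repeatable outputs
        messages=[{
            "role": "user",
            "content": f"{rubric}\n\nConcordance line: {line}",
        }],
    )
    return response.choices[0].message.content.strip()

# Humans write the rubric and audit a sample of outputs for accuracy;
# the model only applies the rubric, line by line.
```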

To use our landscaping example, researchers could train a chatbot to apply the framework we developed in our draft article, coding each instance of “landscaping” according to whether the language referred to botanical elements, non-botanical elements, or both. Again, the chatbot’s performance on a sample could be evaluated for accuracy against the standard set by human coders who applied the framework to the same sample. The coding framework and prompting language could then be refined with the goal of improving the AI’s accuracy. If the AI never achieved satisfactory accuracy, it would be abandoned, and researchers would revert to human coding.
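A sketch of that evaluation step, assuming the code_line function sketched above plus a small human-coded gold-standard sample. The lines, labels, and the idea of a pre-set agreement threshold are all invented here for illustration:

```python
# Hypothetical evaluation of the model's coding against human coders,
# reusing code_line from the sketch above. Sample data is invented.
from sklearn.metrics import accuracy_score, cohen_kappa_score

RUBRIC = (
    "Code the use of 'landscaping' in the line below as exactly one of: "
    "BOTANICAL, NON-BOTANICAL, or BOTH. Answer with the label only."
)

# A small human-coded gold-standard sample (lines and labels invented).
gold_sample = [
    ("... planted hedges and sod as part of the landscaping ...", "BOTANICAL"),
    ("... landscaping included a retaining wall and patio ...", "NON-BOTANICAL"),
    ("... new landscaping : lawn , stone path , and lighting ...", "BOTH"),
]

human_labels = [label for _, label in gold_sample]
model_labels = [code_line(RUBRIC, line) for line, _ in gold_sample]

print("accuracy:", accuracy_score(human_labels, model_labels))
print("kappa:   ", cohen_kappa_score(human_labels, model_labels))
# If agreement never reaches a pre-set threshold after refining the
# rubric and prompts, abandon the AI step and revert to human coding.
```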

Drawing the line

Some researchers may be tempted to propose a third step in which they ask the AI to analyze the quantitative results of the coding and report whether the ordinary meaning of “landscaping” includes non-botanical elements. For us, this is a step too far in the direction of Snell-like AI outsourcing—a step toward robo-judging. It would violate our principles of transparency, replicability, and empiricism. And it would outsource crucial decisions about what ordinary meaning is, how much evidence is enough to decide that non-botanical elements are included, and how the data should be used and weighted as part of answering the larger question about meaning. In short, it would outsource judging.

Judges don’t need to be told the ordinary meaning of a word or phrase—by a human or a computer. They need empirical evidence of how words and phrases are commonly used so they can discern the ordinary meaning of the law by means that are transparent and empirical.

Corpus tools can do that. LLM AIs, as currently constituted, cannot. But we look forward to a future in which the strengths of both sets of tools can be leveraged in a single inquiry that is simple, accessible, and transparent and that produces falsifiable evidence of ordinary meaning.
