Research
Research Areas
Computational Linguistics
Current Projects
DECCA: Detection of Errors and Correction in Corpus Annotation
The success of data-driven approaches and stochastic modeling in computational linguistic research and applications is rooted in the availability of electronic natural language corpora. Despite the central role that annotated corpora play in this work, the question of how errors in corpus annotation can be detected and corrected has received little attention. The project addresses this gap by exploring an error detection and correction method that is applicable to a wide range of corpus annotations.
- Project Web page (publications, code): http://decca.osu.edu
- People involved: Markus Dickinson, Adriane Boyd
- Funded by: National Science Foundation IIS program
- Duration: April 2006-June 2007
Broad-coverage dictionaries and ontologies for natural language processing (NLP) are difficult and costly to create and maintain by hand. It is therefore desirable to learn them from distributional information, such as can be obtained from unlabeled or sparsely labeled text corpora. Many linguistic and psycholinguistic theories are distributional, but they emphasize local neighborhood structure more than previous NLP approaches do. Successful visualization techniques such as keyword-in-context also rely on the preservation of neighborhood structure. A similar emphasis is present in emerging techniques for data reduction, such as locally linear embedding (LLE) and min-cut algorithms, whose application to language data the project is investigating.
While the immediate goal of the project is to gain a better understanding of lexical tuning and acquisition, the resulting dictionaries, ontologies and mapping techniques have the potential to help information professionals (such as librarians, translators, patent examiners and paralegal researchers) to navigate through corpora, to understand the significance of the data that they see, and to incorporate insights derived from the data into their working practice.
We are integrating computational linguistics into the undergraduate curriculum of the Department of Linguistics, creating new courses designed primarily to appeal to students majoring in the humanities, and to offer such students fresh options in meeting the scientific, mathematical and quantitative components of the university's breadth requirement.
- Funded by: National Science Foundation HLT program
- Duration: Feb 2004-Jan 2009
- People involved: Chris Brew, Kirk Baker, Jianguo Li and Anna Feldman (with considerable gratitude to Jiri Hana)
Publications:
- Jiri Hana, Anna Feldman and Chris Brew. A Resource-light Approach to Russian Morphology: Tagging Russian using Czech resources. Empirical Methods in Natural Language Processing (EMNLP). July 2004. Barcelona, Spain.
- Jiri Hana and Anna Feldman. Portable Language Technology: Russian via Czech. Midwest Computational Linguistics Colloquium. June 2004. Bloomington, Indiana.
- Jiri Hana, Anna Feldman and Chris Brew. Tagging Russian using Czech resources. Invited talk. Cognitive Science Colloquium. May 14, 2004.
- Jianguo Li and Chris Brew: Automatic extraction of subcategorisation frames from spoken corpora, to be presented at Verb Workshop 2005, Saarland, February 28 - March 1, 2005
- Kirk Baker: Regular and irregular pseudoverb classification using XMOD, presented at MCWOP-10
- Arts and Humanities Grant for Innovation
- Funded by: Ohio State University, Office of Research
- People involved: Glaucia Silva, Luiz Alexandre Amaral
- Duration: March 2005 to September 2007
- Project page: http://tagarela.osu.edu
- Research Network with partners in 5 universities
- Funded by: German Research Foundation (DFG)
- People involved at OSU: Kordula De Kuthy
- Duration: January 2005 to December 2006
Related Publications:
- Kordula De Kuthy & Detmar Meurers (2003): "The secret life of focus exponents, and what it tells us about fronted verbal projections". In Proceedings of the Tenth Int. Conference on HPSG. Stanford: CSLI Publications.
- Kordula De Kuthy & Detmar Meurers: "Dealing with Optional Complements in HPSG-Based Grammar Implementations". In Proceedings of the Tenth Int. Conference on HPSG. Stanford: CSLI Publications.
- Robert D. Levine & Detmar Meurers (2006): "Head-Driven Phrase Structure Grammar: Linguistic Approach, Formal Foundations, and Computational Realization". Keith Brown (Ed.): Encyclopedia of Language and Linguistics, Second Edition. Oxford: Elsevier.
- Robert D. Levine & Detmar Meurers (2006): "Declarative Models of Syntax". Keith Brown (Ed.): Encyclopedia of Language and Linguistics, Second Edition. Oxford: Elsevier.
- Detmar Meurers, Kordula De Kuthy & Vanessa Metcalf (2003). "Modularity of grammatical constraints in HPSG-based grammar implementations". Proceedings of the ESSLLI 2003 Workshop "Ideas and strategies for multilingual grammar Engineering". Vienna, Austria.
Completed projects
MiLCA: Media-intensive teaching modules in the computational linguistics curriculum
- People involved: Mike Daniels, Kordula De Kuthy, Vanessa Metcalf, in cooperation with Erhard Hinrichs and his colleagues at the Seminar für Sprachwissenschaft of the Universität Tübingen and Gerald Penn at the Department of Computer Science of the University of Toronto.
- Part of the MiLCA Consortium, Module A4 "Grammar Formalisms and Parsing".
- Funded by: German Federal Ministry of Education and Research.
- Duration: July 2001 to December 2003
Related Publications:
- Detmar Meurers, Kordula De Kuthy & Vanessa Metcalf: "Modularity of grammatical constraints in HPSG-based grammar implementations". Proceedings of the ESSLLI 2003 Workshop "Ideas and strategies for multilingual grammar Engineering". Vienna, Austria.
- Mike Daniels and Detmar Meurers: "Improving the Efficiency of Parsing with Discontinuous Constituents". In Shuly Wintner (ed.): Proceedings of NLULP'02: The 7th International Workshop on Natural Language Understanding and Logic Programming.
- W. Detmar Meurers, Gerald Penn & Frank Richter: "A Web-Based Instructional Platform for Constraint-Based Grammar Formalisms and Parsing" In Dragomir Radev and Chris Brew (eds): Proceedings of the Workshop "Effective Tools and Methodologies for Teaching NLP and CL" held at the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002).
- Frank Richter, Ekaterina Ovchinnikova, Beata Trawinski, Detmar Meurers: "Interactive Graphical Software for Teaching the Formal Foundations of Head-Driven Phrase Structure Grammar" In Gerhard Jäger, Paola Monachesi, Gerald Penn and Shuly Wintner (eds): Proceedings of Formal Grammar 2002. July 2002. pp. 137-148
- People involved: Markus Dickinson.
- Funded by: OSU College of Humanities Seed Grant
- Duration: 2002
Related Publications:
- Detmar Meurers: On the use of electronic corpora for theoretical linguistics. Lingua.
- Markus Dickinson and Detmar Meurers: Detecting Errors in Part-of-Speech Annotation. Proceedings of EACL.
- People involved: Vanessa Metcalf
- Funded by: Faculty Innovator Grant awarded by the Instructional Technologies Advisory Committee of Technology Enhanced Learning and Research (TELR)
- Duration: 2002
Related Publication:
- The ConTroll System as Large Grammar Development Platform
- People involved: Chris Brew, Martin Jansche, and Pauline Welby
- Collaboration with G-Data Software GmbH
- People involved: Paul Davis and Chris Brew
- Funded by: Motorola
Related Publications:
- Stone Soup Translation: The Linked Automata Model (dissertation)
- Stone Soup Translation (shorter paper, from TMI 2002)
- People involved: Bob Kasper, Craige Roberts, and Paul Davis
- Funded by: Motorola
- Duration: 1998-2000
Related Publications:
- An Integrated Approach to Reference and Presupposition Resolution
- Presupposition Resolution with Discourse Information Structures
- People involved: Bob Kasper, Don Sylvan (Political Science), Rick Herrmann (Political Science), Paul Davis, and Jon Pevehouse (Political Science)
- Collaboration with the Mershon Center and the Center for Cognitive Science
- Duration: 1998
- Project Description
- People involved: Bob Kasper, Mike Calcagno, and Paul Davis
- Duration: 1997
- Project Description
Related Publication:
- Know When to Hold 'Em: Shuffling Deterministically in a Parser for Nonconcatenative Grammar

