To enable researchers to search for specific patterns across collections of data, CLARIN offers a search engine that connects to the local text collections that are available in the centres. The data itself stays at the centre where it is hosted – which is why the underlying technique is called federated content search.
The search engine summarises and displays what is available; no login is required. From there, an easy next step is to go to the centre's specialised search interface to perform a more sophisticated query.
The technology behind the federated content search is SRU (Search/Retrieve via URL) with CQL (Contextual Query Language) as its query language, together with a CLARIN-specific extension to this protocol.
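To give a rough idea of what an SRU request looks like, the sketch below builds a searchRetrieve URL carrying a CQL query. It assumes SRU 1.2 style parameters (operation, version, query, startRecord, maximumRecords); the endpoint address and the function name are hypothetical placeholders, and the CLARIN-specific extension is not shown.

```python
from urllib.parse import urlencode


def build_sru_search_url(endpoint: str, cql_query: str,
                         max_records: int = 10, start_record: int = 1) -> str:
    """Build an SRU 1.2 style searchRetrieve URL for a CQL query.

    The endpoint is supplied by the caller; nothing is sent over the network.
    """
    params = {
        "operation": "searchRetrieve",  # standard SRU operation name
        "version": "1.2",               # assumed SRU version for this sketch
        "query": cql_query,             # the CQL query string
        "startRecord": start_record,    # 1-based offset into the result set
        "maximumRecords": max_records,  # page size requested from the server
    }
    # urlencode percent-encodes the query so quotes and spaces are URL-safe.
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint; a real centre publishes its own address.
url = build_sru_search_url("https://fcs.example.org/sru", '"language resource"')
print(url)
```

Keeping the URL construction separate from the HTTP call makes it easy to inspect the request before sending it to a real endpoint.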
Federated Content Search vs. Metadata Search
The federated content search approach differs from metadata search as performed, for example, in the Virtual Language Observatory, where all metadata is first harvested (copied to a single server) and then centrally indexed. Content search takes the federated route for several reasons:
- Legal issues make it impossible for some resources to be copied to another location.
- The size of many datasets makes decentralised indexing the most viable option.
- Most language resources are annotated in a collection-specific manner, which makes it hard to use or develop one single search engine that can cope with all of them.
Although more scalable, federated content search is less powerful than a local search, and certain features, such as ranking, are absent.
Federated content search is therefore particularly useful as a first step to discover where interesting language resources are hosted and at which centre(s) a more specialised search could be useful.
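The federated pattern described in this section can be illustrated with a small sketch: the same query is sent to every centre, each centre searches its own data, and the hits are grouped per centre without any global ranking. Everything here is a hypothetical stand-in (the centre names, the in-memory "collections", the function names); a real deployment would talk to each centre over SRU/CQL rather than call local functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A centre is modelled as a function from a query string to a list of hits.
SearchFn = Callable[[str], List[str]]


def make_centre(texts: List[str]) -> SearchFn:
    """Create a toy centre that searches only its own local texts."""
    def search(query: str) -> List[str]:
        return [t for t in texts if query.lower() in t.lower()]
    return search


def federated_search(query: str, centres: Dict[str, SearchFn]) -> Dict[str, List[str]]:
    """Send the query to all centres in parallel and group hits per centre.

    The data never leaves the centres; only the hits come back.
    No cross-centre ranking is attempted.
    """
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(centres))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in centres.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=10)
            except Exception:
                # A failing or slow centre should not break the overall search.
                results[name] = []
    return results


centres = {
    "centre_a": make_centre(["A corpus of spoken Dutch", "Newspaper corpus"]),
    "centre_b": make_centre(["Historical letters", "Parliamentary corpus"]),
}
hits = federated_search("corpus", centres)
for name, found in hits.items():
    print(name, len(found))
```

The per-centre grouping mirrors the discovery use described above: the result shows where matches exist, and a more specialised query can then be run at the relevant centre.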
Learn More
- Tutorial: How to use the Content Search
- Use Case: The German research data consortium, Text+, extended the CLARIN-FCS specification to allow the search and retrieval of lexical resources, such as dictionaries, encyclopedias, normative data, and terminological databases.
- Eckart, Herold, Körner, and Wiegand (2023): A Federated Search and Retrieval Platform for Lexical Resources in Text+ and CLARIN (eLex 2023, Brno, PDF, video)
- Körner and Kretschmer (2024): LexFCS - Extending the Federated Content Search for Lexical Resources (Bazaar@CLARIN2024, Barcelona, PDF)
- For more technical details, see the For Infrastructure Developers section.