Link between focal.ie and the New Corpus for Ireland

Whenever the corpus icon appears next to an Irish term on focal.ie, you can click it to search for usage examples of that term in the New Corpus for Ireland.

An ceangal idir focal.ie agus Nua-Chorpas na hÉireann

Gach uair a fheiceann tú an deilbhín seo in aice le téarma Gaeilge ar focal.ie, is féidir leat cliceáil ar an deilbhín chun samplaí úsáide an téarma sin a chuardach i Nua-Chorpas na hÉireann.

What is the New Corpus for Ireland?

The New Corpus for Ireland is a large collection of texts in Irish with approximately 30 million words. It contains a wide range of texts including works of fiction, informative texts, news reports, official documents and much more. The corpus is designed to be used for linguistic research – for example, to find examples of words being used in context or to investigate word frequency.

The New Corpus for Ireland was developed by Foras na Gaeilge as part of the New English-Irish Dictionary project. In addition, the corpus is available to the public on a dedicated website, corpas.focloir.ie. The website was built by Lexical Computing Limited in collaboration with Fiontar, DCU. The corpus website is interlinked with the focal.ie terminology database so that usage examples for Irish terms can easily be searched in the corpus by clicking the icon next to the term. (This applies to Irish terms only. Irish terms that do not have an icon are not available in the corpus.)

Cad é Nua-Chorpas na hÉireann?

Is bailiúchán mór téacsanna Gaeilge é Nua-Chorpas na hÉireann a bhfuil timpeall is 30 milliún focal ann. Tá réimse leathan téacsanna ar fáil sa chorpas: saothair ficsin, téacsanna faisnéise, tuairiscí nuachta, cáipéisí oifigiúla agus eile. Tá an corpas cóirithe sa chaoi gur féidir taighde teangeolaíoch a dhéanamh air – mar shampla, samplaí a aimsiú d'fhocail faoi leith agus iad á n-úsáid i gcomhthéacs, nó minicíocht focal a fhiosrú.

Foras na Gaeilge a thionscain Nua-Chorpas na hÉireann mar chuid de thionscadal an Fhoclóra Nua Béarla-Gaeilge. Sa bhreis air sin, tá an corpas ar fáil don phobal ar shuíomh faoi leith, corpas.focloir.ie. An comhlacht Lexical Computing Limited a chruthaigh an suíomh sin i gcomhar le Fiontar, DCU. Tá suíomh an chorpais nasctha leis an mbunachar téarmaíochta focal.ie sa tslí gur féidir samplaí úsáide na dtéarmaí Gaeilge a lorg sa chorpas go saoráideach ach cliceáil ar dheilbhín in aice leis na téarmaí. (Téarmaí Gaeilge amháin atá i gceist. Téarmaí nach bhfuil deilbhín in aice leo, níl siad le fáil sa chorpas.)

Access to the New Corpus for Ireland

When you first click the icon, you will see a pop-up window asking for a name and password. This is because you must register before you can access the corpus: a text that is included in a corpus does not lose any of the legal protection offered by copyright law, so access is restricted to registered users. If you are not registered yet, cancel the pop-up window and you will get instructions on how to register. Registration involves filling in a form online and then waiting for an e-mail confirmation from Foras na Gaeilge. The confirmation should normally arrive within one working day. You will be able to use the corpus after that.

If you are already registered, type your name and password in the pop-up window. This only needs to be done once; the website will not ask you again while you remain logged in.

Rochtain ar Nua-Chorpas na hÉireann

An chéad uair a chliceálann tú ar an deilbhín, feicfidh tú preabfhuinneog ag iarraidh ainm agus pasfhocal ort. Tá sé sin amhlaidh de bharr go gcaithfear clárú roimh ré chun an corpas a úsáid. Nuair a chuimsítear téacs mar chuid de chorpas ní chailleann an téacs aon chuid den chosaint dhlíthiúil atá aige faoi dhlí an chóipchirt. Is mar gheall air seo a chaithfear clárú chun rochtain a fháil ar Nua-Chorpas na hÉireann. Mura bhfuil tú cláraithe cheana, cuir an phreabfhuinneog ar ceal agus gheobhaidh tú treoracha maidir leis an modh cláraithe. Níl le déanamh ach foirm a líonadh ar líne agus fanacht go bhfaighidh tú deimhniú ar an ríomhphost ó Fhoras na Gaeilge. De ghnáth, ba cheart go bhfaighfeá an deimhniú sin taobh istigh de lá oibre. Beidh tú ábalta an corpas a úsáid ansin.

Má tá tú cláraithe cheana, scríobh d’ainm agus do phasfhocal sa phreabfhuinneog sin. Ní gá é seo a dhéanamh ach uair amháin; ní iarrfaidh an suíomh arís ort é má tá tú logáilte isteach cheana.

How to use the corpus

When you have come to the corpus from focal.ie, you will get a page with three kinds of results:

  1. Concordance: You will see a list of examples in which the word or term you searched for is used in a sentence. These examples were selected from the corpus automatically. The home page lists approximately ten examples and you can get more by clicking the “more” link.
  2. Collocations: The box on the right-hand side gives the ten most frequent words that co-occur with the word you searched for. For example, if you search for doras ‘door’, you will get words such as oscail ‘open’, dúnta ‘closed’, plab ‘slam’ and others. Once again, you can see more of these words by clicking the “more” link. This list of collocates has been extracted from the corpus automatically.
  3. Statistics: At the bottom of the page, you will see some statistical data about how the word you searched for is used, broken down by criteria such as genre and dialect. For example, if you search for fata, one of the words for ‘potato’, you will see that this word is used almost exclusively in the Connacht dialect. Once again, these statistics have been extracted from the corpus automatically. You can see more statistics by clicking the “more” link.
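
To make the first two result types more concrete, here is a minimal Python sketch of a keyword-in-context concordance and a simple co-occurrence count. The four sentences, the function names and the window size are all illustrative assumptions; this is a toy model of the idea, not a description of how the corpus website computes its results.

```python
from collections import Counter

# Toy corpus: invented sentences for illustration only,
# not taken from the New Corpus for Ireland.
SENTENCES = [
    "oscail an doras",
    "tá an doras dúnta",
    "dún an doras",
    "oscail an fhuinneog",
]

def concordance(sentences, keyword):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[:i])
                right = " ".join(tokens[i + 1:])
                hits.append((left, token, right))
    return hits

def collocates(sentences, keyword, window=2):
    """Count the words found within `window` tokens of the keyword."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != keyword:
                continue
            lo, hi = max(0, i - window), i + window + 1
            # Count every neighbour in the window except the keyword itself.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts

# Print the hits with the keyword aligned in a column, as a concordance does.
for left, kw, right in concordance(SENTENCES, "doras"):
    print(f"{left:>12} [{kw}] {right}")

print(collocates(SENTENCES, "doras").most_common(3))
```

Even in this tiny example the most frequent neighbour is the article an, which suggests why a raw frequency count alone is a crude measure of collocation.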

If you want, you can use the corpus independently of focal.ie by going to corpas.focloir.ie, logging in and typing a word or term in the search box on the home page.

Conas an corpas a úsáid

Tar éis duit teacht chuig an gcorpas ó focal.ie, gheobhaidh tú leathanach agus trí chineál sonraí ann:

  1. Comhchordacht: Feicfidh tú liosta samplaí den fhocal nó den téarma á úsáid in abairtí a roghnaíodh go huathoibríoch as an gcorpas. Beidh timpeall is deich sampla á léiriú ar an leathanach seo agus is féidir teacht ar a thuilleadh díobh ach cliceáil ar an nasc “tuilleadh”.
  2. Comhlogaíochtaí: Sa bhosca ar dheis, feicfidh tú liosta de na deich bhfocal is minice a úsáidtear in éineacht leis an bhfocal a chuardaigh tú. Mar shampla, má chuardaíonn tú doras, gheobhaidh tú focail ar nós oscail, dúnta, plab agus eile. Arís, is féidir tuilleadh díobh a fháil ach cliceáil ar an nasc “tuilleadh”. Tá liosta na gcomhlogaíochtaí seo bunaithe ar eolas a baineadh as an gcorpas go huathoibríoch.
  3. Staitisticí: Ag bun an leathanaigh, gheobhaidh tú eolas staitistiúil i dtaobh úsáid an fhocail de réir critéar áirithe, mar shampla de réir seánra nó canúna. Mar shampla, má chuardaíonn tú fata, feicfidh tú gur i gConnachta is coitianta é. Arís, tá na staitisticí seo bunaithe ar eolas a baineadh as an gcorpas go huathoibríoch. Is féidir teacht ar a thuilleadh staitisticí ach cliceáil ar an nasc “tuilleadh”.

Ina theannta sin, is féidir an corpas a úsáid go neamhspleách ar focal.ie ach dul chuig corpas.focloir.ie, logáil isteach agus focal nó téarma a scríobh isteach sa bhosca cuardaigh ar an leathanach baile.

Advanced searches

If you want to perform more complicated searches on the corpus, you can use the options in the menu on the left-hand side. Here is a summary of the options available.

Concordance: This is where you can search for and list sentences from the corpus based on the words that occur within them. This search is more powerful than the one on the home page; for example, if you select “lemma” in the drop-down box, you can search for all forms of a word: type fuinneog ‘window’ and you will get sentences where any inflected or mutated form of the word occurs: fuinneoige, bhfuinneog and so on. You can also sort and filter the results in several ways.
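
The idea behind a lemma search can be sketched in a few lines of Python. The form-to-lemma table below is a hypothetical, hand-written stand-in for a real morphological analysis, and the example sentences are invented; only the forms fuinneoige and bhfuinneog come from the description above.

```python
# Hypothetical form-to-lemma table. A real system would rely on a full
# morphological analysis rather than a hand-written dictionary.
LEMMA_OF = {
    "fuinneog": "fuinneog",
    "fuinneoige": "fuinneog",   # inflected form
    "bhfuinneog": "fuinneog",   # mutated form
}

# Invented example sentences, for illustration only.
EXAMPLES = [
    "ar leac na fuinneoige",
    "ag an bhfuinneog",
    "oscail an doras",
]

def lemma_search(sentences, lemma):
    """Return the sentences containing any form that maps to `lemma`."""
    matches = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Unknown forms fall back to themselves as their own lemma.
        if any(LEMMA_OF.get(token, token) == lemma for token in tokens):
            matches.append(sentence)
    return matches

print(lemma_search(EXAMPLES, "fuinneog"))
```

A plain word-form search for fuinneog would match neither of the first two sentences, which is the gap the “lemma” option is meant to close.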

Word List: This is where you can extract various word lists from the corpus, such as a list of the most frequently occurring words in Irish.

Word Sketch: This section gives you an opportunity to see which words are most frequently used along with the word you are looking for. The results are presented in several lists according to the grammatical relation that exists between the two words. For example, if you search for a verb, you will get one list of its direct objects, another list of its subjects, and so on. Remember that this information was extracted from the corpus automatically and so it may not always be accurate.

Thesaurus: This section allows you to type a word and get a list of other words that are similar to it with respect to their patterns of usage. For example, if you search for the adjective folláin ‘healthy, wholesome’, you will get a list that includes sláintiúil ‘healthy’, sábháilte ‘safe’ and others. In essence, this is a list of words that look like synonyms because they are used in similar ways. But again, remember that this information was extracted from the corpus automatically and the words you receive may not necessarily be synonyms.

Sketch-Diff: This is a tool for investigating the difference between two words, based on other words that occur with them. If you type two words which are close to each other in meaning, for example leanbh ‘baby’ and páiste ‘child’, you will get information that may help you understand the difference between them: you will see that the words used mainly with leanbh ‘baby’ include saolaigh ‘give birth’ and baist ‘baptize’ while the words used mainly with páiste ‘child’ include múin ‘teach’ and foghlaim ‘learn’.
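
As a rough illustration of the comparison described above, the following Python sketch contrasts the collocates of two words. The co-occurrence counts, the function name and the ratio threshold are invented for the example; the corpus website's own scoring is not documented here and may work quite differently.

```python
from collections import Counter

# Invented co-occurrence counts, for illustration only.
LEANBH = Counter({"saolaigh": 8, "baist": 6, "múin": 1, "beag": 5})
PAISTE = Counter({"múin": 9, "foghlaim": 7, "baist": 1, "beag": 5})

def mainly_with(a, b, ratio=3.0):
    """Collocates of `a` at least `ratio` times more frequent than with `b`."""
    # Adding 1 to the count for `b` avoids division by zero for
    # collocates that never occur with the second word.
    return sorted(w for w, n in a.items() if n / (b[w] + 1) >= ratio)

print("leanbh:", mainly_with(LEANBH, PAISTE))
print("páiste:", mainly_with(PAISTE, LEANBH))
```

A collocate shared equally by both words, such as beag in these invented counts, appears in neither list, so only the words that distinguish the two remain.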

Cuardaigh níos casta

Más maith leat cuardaigh níos casta a dhéanamh sa chorpas ná mar is féidir ar an leathanach baile, is féidir leas a bhaint as na roghanna sa roghchlár ar chlé. Seo achoimre ar na roghanna atá ar fáil ann.

Comhchordacht: Anseo, is féidir leat abairtí a aimsiú agus a liostú as an gcorpas bunaithe ar na focail a thagann chun cinn iontu. Tá an cuardach seo níos cumhachtaí ná an ceann atá ar fáil ar an leathanach baile; mar shampla, má roghnaíonn tú “leama” sa bhosca aníos, beidh tú ábalta cuardach a dhéanamh beag beann ar fhoirm an fhocail: clóscríobh fuinneog agus gheobhaidh tú abairtí ina bhfuil foirm infhillte nó chlaochlaithe ar bith den fhocal: fuinneoige, bhfuinneog agus mar sin de. Sa bhreis air sin, is féidir an liosta abairtí a shórtáil agus a scagadh de réir critéar éagsúil.

Liosta Focal: Anseo, is féidir liostaí éagsúla focal a bhaint as an gcorpas, mar shampla liosta na bhfocal is coitianta a thagann chun cinn sa Ghaeilge, agus mar sin de.

Achoimre Focal: Rannóg é seo a thugann deis duit féachaint ar na focail eile a úsáidtear go minic in éineacht leis an bhfocal a chuardaigh tú. Cuirtear na torthaí i láthair i liostaí de réir an choibhnis ghramadaí atá idir an dá fhocal. Mar shampla, má chuardaíonn tú briathar, gheobhaidh tú liosta amháin de na cuspóirí is minice a ghabhann leis an mbriathar, liosta eile de na hainmnithe is minice a ghabhann leis, agus mar sin de. Cuimhnigh go mbaintear an t-eolas seo as an gcorpas go huathoibríoch agus, dá bhrí sin, ní gá go mbeidh sé go hiomlán cruinn i gcónaí.

Teasáras: Is féidir focal a chlóscríobh anseo agus liosta a fháil de na focail eile atá cosúil leis ó thaobh a bpatrún úsáide. Mar shampla, má chuardaíonn tú an aidiacht folláin, gheobhaidh tú roinnt aidiachtaí eile ar nós sláintiúil agus sábháilte. Go bunúsach, is liosta é seo d’fhocail ar dócha gur comhchiallaigh iad de bharr go n-úsáidtear ar bhealach cosúil iad. Ach arís, cuimhnigh gur baineadh an t-eolas as an gcorpas go huathoibríoch agus, dá bhrí sin, ní gá go mbeidh na focail seo ina gcomhchiallaigh i gcónaí.

Difríocht: Is áis í seo chun an difríocht idir dhá fhocal a fhiosrú, bunaithe ar na focail a bhíonn in aice leo. Má scríobhann tú dhá fhocal atá gar dá chéile ó thaobh brí de, abair leanbh agus páiste, gheobhaidh tú eolas a chuideoidh leat an difríocht eatarthu a thuiscint: feicfidh tú gur focail ar nós saolaigh agus baist a úsáidtear de ghnáth le leanbh, agus gur focail ar nós múin agus foghlaim a úsáidtear de ghnáth le páiste.

More information

The corpus website is based on a corpus query system called Sketch Engine created by Lexical Computing Limited. If you need more information on using all the features available on the website, a detailed guide is available in Sketch Engine’s own help section.

Breis eolais

Tá suíomh an chorpais bunaithe ar Sketch Engine, bogearra cuardaigh corpais de chuid Lexical Computing Limited. Má tá breis eolais uait maidir leis na háiseanna atá ar fáil ar an suíomh, tá treoir chuimsitheach le fáil i gcomhaid chabhrach Sketch Engine féin.

Terminology data: © 2006 Foras na Gaeilge
Technical solution: © 2006 Fiontar