

26.07.2012 02:31

euskara.dic


In this thread we can discuss our future Basque word list, i.e. our Scrabble3D euskara.dic!

Welcome!

• I OpenSource!
• Scrabble3D Download: Sourceforge.net | • Scrabble3D Help: Wiki | • Scrabble3D News: Twitter | • Scrabble3D Fanship: Facebook
• Scrabble3D in Italia: Sezione Scrabble3D sul Forum della Federazione Italiana Gioco Scrabble


26.07.2012 12:12

#2 RE: euskara.dic

Euskarbel

Thanks a lot, Joan!

#3 RE: euskara.dic

Heh, nice to see all those digraphs :)

Where's the dictionary coming from? Are we using a GNU spellchecker?


02.08.2012 00:24

#4 RE: euskara.dic

We have no Basque dictionary.

Our Catalan OpenSource friend jmontane has only posted the letter set so far.
But he has no dictionary either.

Could you maybe help us with a basque.dic, akerbeltzalba?

#5 RE: euskara.dic

Hi akerbeltzalba and all,

I'm busy these days, but I was planning to submit a word list based on LibreOffice spellchecker (XUXEN dictionary).

What do you think?


02.08.2012 12:20

#6 RE: euskara.dic

As we (we, that means Linhart) are working on a Latin dictionary right now, for example, we know that lists such as spell checker lists contain proper names and other words that are not valid in Scrabble. They often contain grammatically wrong forms as well, for example passive forms of intransitive verbs. If such a list is used only for spell checking, this is not really a problem, but in Scrabble games, and especially in Scrabble3D, it would be a big problem: when you play against the computer, it would keep placing lots of wrong, invalid words.

Of course, it is a good start to begin with a GNU spell checker list, but if we want a dictionary of good quality, that spell checker list ought to be thoroughly revised and adapted for Scrabble3D purposes, which might take years of very, very careful work. Gero has worked for several years on our deutsch.dic as well, because in special cases the grammar is not as easy as it seems. And I know what a huge amount of work Linhart is doing on our future latin.dic right now. So we really know what we are talking about...

Of course, a Basque spell checker list is better than no list at all...

But, jmontane, you wrote: "a word list based on LibreOffice spellchecker (XUXEN dictionary)". So what do you mean exactly by saying "based"? Have you already revised/adapted that list for Scrabble purposes?

#7 RE: euskara.dic

Quote from Bussinchen in post #6
But, jmontane, you wrote: "a word list based on LibreOffice spellchecker (XUXEN dictionary)". So what do you mean exactly by saying "based"? Have you already revised/adapted that list for Scrabble purposes?

I know the differences between a spell-checker word list and a Scrabble word list :)

LibreOffice/OpenOffice/hunspell/myspell dictionaries use two files: eu_ES.dic and eu_ES.aff (for Basque). The first one (.dic) is a list of entries, each followed by the affixes that can be applied to it. The second one (.aff) holds the affixes (prefixes and suffixes to apply to the entries).
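As a rough illustration of that layout, here is a minimal Python sketch that splits .dic entries into a stem and its affix flags. It assumes the usual hunspell conventions (the first line of the .dic file holds the entry count, and an entry looks like `word/flags`) and numeric flags as in the Xuxen entry `aarondar/60`, which hunspell enables with `FLAG num` (several flags separated by commas). The function names are hypothetical.

```python
def parse_dic_line(line: str, numeric_flags: bool = True) -> tuple[str, list[str]]:
    """Split a hunspell .dic entry such as 'aarondar/60' into (stem, flags)."""
    line = line.strip()
    # Optional morphological fields follow the entry after whitespace; ignore them.
    entry = line.split(None, 1)[0] if line else ""
    # Note: an escaped slash ('\/') inside a word is not handled by this sketch.
    stem, sep, flag_str = entry.partition("/")
    if not sep or not flag_str:
        return stem, []
    if numeric_flags:
        # Assumed FLAG num: decimal flags separated by commas.
        return stem, flag_str.split(",")
    # Default hunspell flags are single characters.
    return stem, list(flag_str)


def read_dic(lines):
    """Yield (stem, flags) pairs, skipping the leading entry-count line."""
    it = iter(lines)
    next(it, None)  # first line: (approximate) number of entries
    for line in it:
        if line.strip():
            yield parse_dic_line(line)
```

For example, `parse_dic_line("aarondar/60")` returns `("aarondar", ["60"])`.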

So, some (easy) steps to adapt a Basque spell-checker dictionary for Scrabble purposes are:

1st: remove entries from eu_ES.dic
A.- Words starting with an uppercase letter (proper nouns, trademarks, ...)
B.- Words with hyphens (compound words)
C.- Words with characters not present in the Scrabble tile distribution (ñ)
D.- Words ending with a dot (.) (abbreviations)
E.- Symbols: m kg cm mm ...

2nd: remove affixes that generate undesired word forms.

3rd: generate all inflected forms, using the unmunch command from hunspell.

That's the general idea.
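Steps 1 and 2 could be sketched as two small Python filters over the (stem, flags) pair of each entry. This is only a sketch: the symbol stoplist and the flag set passed to `drop_flags` are hypothetical placeholders, since step 1-E is hard to automate and which affix classes to drop depends on Basque morphology.

```python
# Hypothetical stoplist for step 1-E: unit symbols are hard to detect by rule,
# so this sketch assumes they are listed by hand.
SYMBOLS = {"m", "kg", "cm", "mm"}

# Letters without a Scrabble3D tile; only ñ is named in the thread.
FORBIDDEN_CHARS = set("ñÑ")


def keep_entry(stem: str) -> bool:
    """Step 1: return False for entries that should not reach the Scrabble list."""
    if not stem:
        return False
    if stem[0].isupper():             # A: proper nouns, trademarks, ...
        return False
    if "-" in stem:                   # B: hyphenated compounds
        return False
    if FORBIDDEN_CHARS & set(stem):   # C: letters without a tile
        return False
    if stem.endswith("."):            # D: abbreviations
        return False
    if stem in SYMBOLS:               # E: symbols
        return False
    return True


def drop_flags(flags: list[str], unwanted: set[str]) -> list[str]:
    """Step 2: strip affix classes judged to produce unwanted word forms."""
    return [f for f in flags if f not in unwanted]
```

An entry is kept only if `keep_entry(stem)` is true; its flags are then passed through `drop_flags` before the forms are generated in step 3.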

Some remarks:

Step 1 (A, B, C, D) is easy to do. 1-E is a little harder, but these are usually words of 2 or 3 characters.
As for step 2, I don't know anything about Basque morphology, so any advice?
As for step 3, Basque is a highly inflected language. My first attempt to generate all inflected words failed :(. I have found an alternative to unmunch that copes better with very large word lists.


02.08.2012 18:46

#8 RE: euskara.dic

Linhart has worked on such lists before, for example when he created our persian.dic. Since he already has that experience, I think he can give us good advice here.

Akerbeltzalba knows Basque (!), he speaks Basque (!) --> http://en.wikipedia.org/wiki/User:Akerbeltz. So I think he is well qualified to give us some advice here as well!

Scotty


02.08.2012 19:04

#9 RE: euskara.dic

Quote from Bussinchen in post #8
Akerbeltzalba speaks Basque

Unbelievable! But I'd be more interested in his experiences in Chinese.

Download: Sourceforge.net | Help:Scrabble3D Wiki | Discussion: Forum | News: Twitter | Fanship: Facebook


02.08.2012 19:40

#10 RE: euskara.dic

We have gathered a truly impressive amount of expertise here in our beloved Scrabble3D project indeed!!!

You all are really great!!!

#11 RE: euskara.dic

Bussinchen writes that I have experience with spellchecker lists. This is true, but unfortunately I will not be able to give you any advice on how to handle such lists with the hunspell program, since I wrote my own program for this purpose, and it is only suited to the Latin word list.

The Persian word list has a different structure: it does not consist of two lists (.aff and .dic), but of only one.

Sorry!

#12 RE: euskara.dic

It depends very much on the language, I'd say. Gaelic and Irish were relatively easy to build on the back of the spellchecker files. I haven't seen the Xuxen files; perhaps jmontane can post a sample of the words he has generated? On the whole I'm slightly concerned about autogenerating anything in Basque because it is a polysynthetic language, but on the other hand, most software the Basques have produced is good, so it could work. I know that the Basque Regional Government had a hand in the development of Xuxen anyway, so a guarded "maybe" at this stage :)

Quote from Scotty
Unbelievable! But I'd be more interested in his experiences in Chinese

As far as Scrabble goes, no chance. You would need a "letterset" that's about 6,000 "units" big. Not really feasible I'm afraid.


06.08.2012 14:54

#13 RE: euskara.dic

Adminchen:

Please let's continue our discussion about Scrabble3D games in Chinese language in this thread:

Chinese Scrabble3D


#14 RE: euskara.dic

OK, I grabbed the Xuxen oxt file from here http://extensions.services.openoffice.org/en/project/xuxen and looked at the .dic and .aff files (you can get them by renaming the extension from .oxt to .zip and then unzipping).
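Instead of renaming the file, Python's standard `zipfile` module can open the .oxt directly, since it is a ZIP archive. A minimal sketch: the member names inside the Xuxen extension are not given here, so it simply extracts every .dic and .aff file it finds. The function name is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_hunspell_files(oxt_path: str, out_dir: str) -> list[Path]:
    """Extract every .dic and .aff member of an .oxt extension into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # An .oxt file is a ZIP archive, so no renaming is needed.
    with zipfile.ZipFile(oxt_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".dic", ".aff")):
                # Flatten any folder structure inside the archive.
                target = out / Path(name).name
                # Copy raw bytes so the file's own encoding is preserved.
                target.write_bytes(archive.read(name))
                written.append(target)
    return written
```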

It's kinda as I expected. It's a dictionary file and a massive affix file. But it looks very well done so if someone could come up with some fancy code that generates all the words that the .aff file is supposed to generate, then we'd have a dictionary we can use.

What I mean by that is this: if you look in the .dic file, you'll see (for example)

aarondar/60

This means that the word aarondar can take all the affixes in the .aff file that begin with SFX 60, for example:

SFX 60  0  ra                .
SFX 60  0  rago/243          .
SFX 60  0  ragoa/243         .
SFX 60  0  ragoak/243        .
SFX 60  0  ragoarekin/243    .
SFX 60  0  ragoarendako/243  .
SFX 60  0  ragoarentzat/243  .
SFX 60  0  ragoaren/238      .

So this creates aarondarra, aarondarrago, aarondarragoak and so on. What I haven't quite figured out is what the /243 is for; I'll have to ask, but I think it's there to prevent recursion, i.e. it tells you what you may not stick onto the word once you have generated it, so aarondarrago + rago is illegal.
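Here is a minimal Python sketch of that kind of code, not a replacement for hunspell's own tools. Assumptions: only SFX rules are handled (no PFX, no cross products, other .aff options ignored), flags are numeric as in the Xuxen files, and rule conditions are valid Python regular expressions. It reads the `/243` part the way hunspell documents affix continuation classes: a word produced with that suffix may take further suffixes of class 243, and hunspell supports twofold suffixation, so at most two suffix levels are applied. The function names are hypothetical.

```python
import re
from collections import defaultdict


def parse_sfx_rules(aff_lines):
    """Collect SFX rules as {flag: [(strip, add, cont_flags, condition_regex)]}."""
    rules = defaultdict(list)
    for line in aff_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] != "SFX":
            continue
        if len(fields) == 4 and fields[2] in ("Y", "N"):
            continue  # header line: SFX flag cross_product count
        flag, strip, add = fields[1], fields[2], fields[3]
        condition = fields[4] if len(fields) > 4 else "."
        # 'rago/243' -> suffix 'rago' plus continuation class(es) '243'
        add, _, cont = add.partition("/")
        rules[flag].append((
            "" if strip == "0" else strip,   # '0' means: strip nothing
            "" if add == "0" else add,       # '0' means: empty suffix
            cont.split(",") if cont else [],
            re.compile("(?:" + condition + ")$"),  # condition on the word end
        ))
    return rules


def expand(stem, flags, rules, depth=2):
    """Yield the stem and every suffixed form reachable from its flags."""
    yield stem
    if depth == 0:
        return
    for flag in flags:
        for strip, add, cont, cond in rules.get(flag, ()):
            if cond.search(stem) and stem.endswith(strip):
                base = stem[: len(stem) - len(strip)]
                # The new word may take suffixes from its continuation classes.
                yield from expand(base + add, cont, rules, depth - 1)
```

Because `expand` yields one form at a time, the forms can be written straight to a file instead of being held in memory, which may matter for a heavily inflected language like Basque; duplicates can be removed afterwards, e.g. by collecting each entry's forms in a `set`.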

If someone is capable of doing this sort of code, then yes, we could use the Xuxen dictionary.

Do, or do not. There is no try.

#15 RE: euskara.dic