Bussinchen

Posts: 17

Thu Jul 26, 2012 2:31 am
euskara.dic



euskara.dic!


In this thread we can discuss our future Basque word list, i.e. our Scrabble3D euskara.dic!

Welcome!


I OpenSource!
• Scrabble3D Download: Sourceforge.net | • Scrabble3D Help: Wiki | • Scrabble3D News: Twitter | • Scrabble3D Fanship: Facebook
• Scrabble3D in Italia: Sezione Scrabble3D sul Forum della Federazione Italiana Gioco Scrabble


Bussinchen

Posts: 17

Thu Jul 26, 2012 12:12 pm
#2 RE: euskara.dic

Euskarbel

Thanks a lot, Joan!





akerbeltzalba

Posts: 142

Thu Aug 02, 2012 12:09 am
#3 RE: euskara.dic

Heh, nice to see all those digraphs :)

Where's the dictionary coming from? Are we using a GNU spellchecker?


Bussinchen

Posts: 17

Thu Aug 02, 2012 12:24 am
#4 RE: euskara.dic

We have no Basque dictionary.

Our Catalan OpenSource friend jmontane has only posted the letter set so far.
But he has no dictionary either.

Could you maybe help us with a basque.dic, akerbeltzalba?




jmontane

Posts: 63

Thu Aug 02, 2012 8:26 am
#5 RE: euskara.dic

Hi akerbeltzalba, and all,

I'm busy these days, but I plan to submit a word list based on the LibreOffice spellchecker (XUXEN dictionary).

What do you think?


Bussinchen

Posts: 17

Thu Aug 02, 2012 12:20 pm
#6 RE: euskara.dic

As we (that is, Linhart) are currently working on a Latin dictionary, for example, we know that lists like spell checker lists contain proper names and other words that are not valid in Scrabble. Often there are grammatically wrong forms too, for example passive forms of intransitive verbs. If such a list is used only for spell checking, this is not really a problem, but in Scrabble games, and especially in Scrabble3D, it would be a big problem: when you play against the computer, it would keep placing lots of invalid words.

Of course, a GNU spell checker list is a good place to start, but if we want a dictionary of good quality, that list has to be thoroughly revised and adapted for Scrabble3D purposes, which might take years of very careful work. Gero has also worked for several years on our deutsch.dic, because in special cases the grammar is not as easy as it seems. And I know how much work Linhart is putting into our future latin.dic right now. So we really know what we are talking about...

Of course, a Basque spell checker list is better than no list at all...

But, jmontane, you wrote: "a word list based on LibreOffice spellchecker (XUXEN dictionary)". So what exactly do you mean by "based"? Have you already revised/adapted that list for Scrabble purposes?




jmontane

Posts: 63

Thu Aug 02, 2012 6:27 pm
#7 RE: euskara.dic

Quote from Bussinchen in post #6
But, jmontane, you wrote: "a word list based on LibreOffice spellchecker (XUXEN dictionary)". So what do you mean exactly by saying "based"? Have you already revised/adapted that list for Scrabble purposes?


I know the differences between a spell-checker word list and a Scrabble word list :)

LibreOffice/OpenOffice/hunspell/myspell dictionaries use two files: eu_ES.dic and eu_ES.aff (for Basque). The first one (.dic) is a list of entries, each followed by the affixes that can be applied to it. The second one (.aff) holds the affixes (prefixes and suffixes to apply to the entries).
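To make the .dic side concrete, here is a minimal Python sketch of reading it. It assumes the Xuxen .aff declares numeric flags (`FLAG num`), where one entry may carry several comma-separated flags; that is what entries like `aarondar/60` would require, but check the .aff header before relying on it. The first line of a hunspell .dic file is only an approximate entry count, so it is skipped. The function names are hypothetical.

```python
from typing import List, Tuple


def parse_dic_entry(line: str) -> Tuple[str, List[str]]:
    """Split a .dic entry such as 'aarondar/60' into ('aarondar', ['60']).

    Assumes numeric flags (FLAG num), e.g. 'word/60,243' -> ['60', '243'].
    """
    fields = line.split()  # anything after whitespace is optional morphological data
    if not fields:
        return "", []
    word, slash, flag_field = fields[0].partition("/")
    flags = flag_field.split(",") if slash and flag_field else []
    return word, flags


def read_dic(lines: List[str]) -> List[Tuple[str, List[str]]]:
    """Parse a whole .dic file; line 1 is the approximate entry count."""
    return [parse_dic_entry(line) for line in lines[1:] if line.strip()]
```

Entries without a slash simply get an empty flag list, meaning no affixes apply to them.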

So, some (easy) steps to adapt a Basque spell-checker dictionary for Scrabble purposes are:

1st: remove entries from eu_ES.dic:
A. Words starting with an uppercase letter (proper nouns, trademarks, ...)
B. Words with hyphens (compound words)
C. Words with characters not present in the Scrabble tile distribution (ñ)
D. Words ending with a dot (.) (abbreviations)
E. Symbols: m kg cm mm ...

2nd: remove affixes that generate undesired word forms.

3rd: generate all inflected forms, using the unmunch command from hunspell.

That's the general idea.
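Step 1 is mostly simple string tests, so it could be sketched in Python roughly like this. This is a hypothetical helper, not jmontane's actual script. `forbidden_chars` defaults to the ñ mentioned in 1-C, and `symbols` stands for a hand-made list for 1-E (m, kg, cm, mm, ...), since unit symbols can't be reliably detected by pattern.

```python
from typing import Container, Iterable, List


def keep_entry(word: str, forbidden_chars: str = "ñ",
               symbols: Container[str] = frozenset()) -> bool:
    """Return True if a .dic headword survives the removals 1-A to 1-E."""
    if not word:
        return False
    if word[0].isupper():                                  # A: proper nouns, trademarks
        return False
    if "-" in word:                                        # B: hyphenated compounds
        return False
    if any(ch in forbidden_chars for ch in word.lower()):   # C: letters with no tile
        return False
    if word.endswith("."):                                 # D: abbreviations
        return False
    if word in symbols:                                    # E: unit symbols, listed by hand
        return False
    return True


def filter_dic(lines: Iterable[str], **kwargs) -> List[str]:
    """Keep whole .dic lines (word plus flags) whose headword passes keep_entry."""
    kept = []
    for line in lines:
        fields = line.split()
        if fields and keep_entry(fields[0].partition("/")[0], **kwargs):
            kept.append(line)
    return kept
```

Usage would be something like `filter_dic(lines[1:], symbols=frozenset({"m", "kg", "cm", "mm"}))`, skipping the count on the first line of the .dic file. Keeping the whole line (not just the word) preserves the affix flags for step 3.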

Some remarks:

The 1st step (A, B, C, D) is easy to do. 1-E is a little harder, but these are usually words of 2 or 3 characters.
About the 2nd step, I don't know anything about Basque morphology, so any advice?
About the 3rd step, Basque is a highly inflected language. My first attempt to generate all inflected words failed :(. I have found an alternative to unmunch that copes better with very large word lists.


Bussinchen

Posts: 17

Thu Aug 02, 2012 6:46 pm
#8 RE: euskara.dic

Linhart has worked on such lists before, for example when he created our persian.dic. Since he already has that experience, I think he can give us good advice here.

Akerbeltzalba knows Basque (!), he speaks Basque (!) --> http://en.wikipedia.org/wiki/User:Akerbeltz. So I think he is well qualified to give us some advice here as well!




Scotty

Administrator

Posts: 3.609

Thu Aug 02, 2012 7:04 pm
#9 RE: euskara.dic

Quote from Bussinchen in post #8
Akerbeltzalba speaks Basque
Unbelievable! But I'd be more interested in his experiences in Chinese.


Download: Sourceforge.net | Help:Scrabble3D Wiki | Discussion: Forum | News: Twitter | Fanship: Facebook


Bussinchen

Posts: 17

Thu Aug 02, 2012 7:40 pm
#10 RE: euskara.dic



We have gathered a very, very, very high level of expertise here in our beloved Scrabble3D project indeed!!!

You all are really great!!!




linhart

Posts: 2.484

Fri Aug 03, 2012 9:58 pm
#11 RE: euskara.dic

Bussinchen writes that I have experience with spellchecker lists. This is true, but unfortunately I can't give you any advice on how to handle such lists with the hunspell program, since I wrote my own program for this purpose, and it is only suited to the Latin word list.

The Persian word list has a different structure, it does not consist of two lists (.aff and .dic), but only of one.

Sorry!


akerbeltzalba

Posts: 142

Mon Aug 06, 2012 1:02 am
#12 RE: euskara.dic

It depends very much on the language, I'd say. Gaelic and Irish were relatively easy to build on the back of the spellchecker files. I haven't seen the Xuxen files; perhaps jmontane can post a sample of the words he has generated? On the whole, I'm slightly concerned about autogenerating anything in Basque because it is a polysynthetic language, but on the other hand, most software the Basques have produced is good, so it could work. I know that the Basque Regional Government had a hand in the development of Xuxen anyway, so a guarded "maybe" at this stage :)

Quote from Scotty
Unbelievable! But I'd be more interested in his experiences in Chinese



As far as Scrabble goes, no chance. You would need a "letterset" that's about 6,000 "units" big. Not really feasible, I'm afraid.


Bussinchen

Posts: 17

Mon Aug 06, 2012 2:54 pm
#13 RE: euskara.dic

Adminchen:

Please let's continue our discussion about Scrabble3D games in Chinese language in this thread:


Chinese Scrabble3D


Welcome!




akerbeltzalba

Posts: 142

Mon Sep 17, 2012 9:55 pm
#14 RE: euskara.dic

OK, I grabbed the Xuxen .oxt file from here: http://extensions.services.openoffice.org/en/project/xuxen and looked at the .dic and .aff files (you can get them by renaming the extension from .oxt to .zip and then unzipping).
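Since an .oxt extension is an ordinary ZIP archive, Python's `zipfile` can also open it directly, without renaming. A minimal sketch follows; the function name is hypothetical, and the default `encoding="utf-8"` is an assumption: the real encoding is whatever the `SET` line in the .aff file declares.

```python
import io
import zipfile
from typing import BinaryIO, Dict, Union


def read_hunspell_members(oxt: Union[str, BinaryIO],
                          encoding: str = "utf-8") -> Dict[str, str]:
    """Return {file name: text} for every .dic and .aff file inside an .oxt."""
    found = {}
    with zipfile.ZipFile(oxt) as archive:
        for member in archive.namelist():
            if member.lower().endswith((".dic", ".aff")):
                # Decode the member as text, keeping only its base name.
                with io.TextIOWrapper(archive.open(member), encoding=encoding) as text:
                    found[member.rsplit("/", 1)[-1]] = text.read()
    return found
```

For example, `read_hunspell_members("xuxen.oxt")` would return the contents of the .dic and .aff files keyed by name (the .oxt file name here is just a placeholder).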

It's kinda what I expected: a dictionary file and a massive affix file. But it looks very well done, so if someone could come up with some fancy code that generates all the words the .aff file is supposed to generate, then we'd have a dictionary we can use.

What I mean is this: if you look in the .dic file, you'll see (for example)

aarondar/60

This means that the word aarondar can take all affixes from the .aff file that begin with SFX 60, for example:

SFX 60 0 ra .
SFX 60 0 rago/243 .
SFX 60 0 ragoa/243 .
SFX 60 0 ragoak/243 .
SFX 60 0 ragoarekin/243 .
SFX 60 0 ragoarendako/243 .
SFX 60 0 ragoarentzat/243 .
SFX 60 0 ragoaren/238 .



So this creates aarondarra, aarondarrago, aarondarragoak and so on. What I haven't quite figured out is what the /243 is for. I'll have to ask, but I think it's to prevent recursion, i.e. it tells you what you may not stick onto the word once you have generated it, so aarondarrago + rago is illegal.
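The expansion described above could be sketched in Python as follows. This is a minimal, hypothetical sketch, not hunspell's own code. It assumes each rule line has the form `SFX flag strip affix condition`, that `0` means "nothing" for strip and affix, and that hunspell's simple conditions (such as `.` or `[^aeiou]`) can be used as regular expressions anchored at the end of the word. It applies only one suffix level and just returns the `/243`-style flags attached to each new form.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One suffix rule: (text to strip, text to append, condition, flags after '/').
Rule = Tuple[str, str, re.Pattern, List[str]]


def parse_sfx_rules(aff_lines: Iterable[str]) -> Dict[str, List[Rule]]:
    """Collect 'SFX flag strip affix condition' lines, grouped by flag."""
    rules: Dict[str, List[Rule]] = defaultdict(list)
    for line in aff_lines:
        fields = line.split()
        # Header lines such as 'SFX 60 Y 8' have four fields; rules have five or more.
        if len(fields) < 5 or fields[0] != "SFX":
            continue
        _, flag, strip, affix_field, condition = fields[:5]
        affix, _, cont = affix_field.partition("/")
        rules[flag].append((
            "" if strip == "0" else strip,    # '0' means strip nothing
            "" if affix == "0" else affix,    # '0' means append nothing
            re.compile(condition + "$"),      # condition tests the end of the word
            cont.split(",") if cont else [],  # e.g. ['243'] from 'rago/243'
        ))
    return rules


def apply_suffixes(word: str, flags: List[str],
                   rules: Dict[str, List[Rule]]) -> List[Tuple[str, List[str]]]:
    """Apply one level of suffix rules; return (new form, attached flags) pairs."""
    forms = []
    for flag in flags:
        for strip, affix, condition, cont in rules.get(flag, []):
            if word.endswith(strip) and condition.search(word):
                forms.append((word[: len(word) - len(strip)] + affix, cont))
    return forms
```

With the SFX 60 lines quoted above, `apply_suffixes("aarondar", ["60"], parse_sfx_rules(aff_lines))` yields aarondarra, aarondarrago, aarondarragoa, and so on, each paired with its flags.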

If someone is capable of doing this sort of code, then yes, we could use the Xuxen dictionary.

Do, or do not. There is no try.


jmontane

Posts: 63

Tue Sep 18, 2012 8:34 pm
#15 RE: euskara.dic

Hi,

I have good news. A month ago, a Scrabble-like game for smartphones and Facebook called Apalabrados (Angry Words) added Basque :)))

The letter distribution is pretty similar. I think they changed it to avoid a Mattel lawsuit.

About the word list, I've contacted the player who provided it. I hope she will give us the word list under an open-source license.

So I think we can wait a few weeks.

About the /243 feature... In the hunspell dictionary format, it's possible to chain affixes. In your example, suffix number 60 is applied first, and then suffix number 243 is applied to the word generated by the first one.
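This chaining can be sketched as a small recursive generator in Python. It is a minimal, hypothetical sketch: rule conditions are left out (every rule behaves as if its condition were `.`), and the default `depth=2` reflects my understanding that hunspell supports at most twofold suffix stripping. Yielding forms lazily also means a very large word list can be written out as it is produced instead of being collected in memory first.

```python
from typing import Dict, Iterator, List, Tuple

# flag -> list of (strip, affix, continuation flags). Conditions are omitted,
# so every rule behaves as if its condition were '.' (matches any word).
Rules = Dict[str, List[Tuple[str, str, List[str]]]]


def generate(word: str, flags: List[str], rules: Rules, depth: int = 2) -> Iterator[str]:
    """Yield `word` and every form reachable by chaining suffix rules.

    The continuation flags of a rule (the '/243' part) are applied to the
    form that rule produced, up to `depth` suffixes in total.
    """
    yield word
    if depth == 0:
        return
    for flag in flags:
        for strip, affix, cont in rules.get(flag, []):
            if word.endswith(strip):
                stem = word[: len(word) - len(strip)]
                yield from generate(stem + affix, cont, rules, depth - 1)
```

For instance, with `rules = {"60": [("", "ra", []), ("", "rago", ["243"])], "243": [("", "ko", []), ("", "tik", [])]}` (the contents of class 243 are made up here; the real Xuxen rules will differ), `generate("aarondar", ["60"], rules)` yields aarondar, aarondarra, aarondarrago, and then aarondarrago with each 243 suffix attached.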

