If there is a Japanese Scrabble, but in romaji, and the Japanese Scrabble Championships are played in English, there must be a good reason. There are millions of potential players/buyers. Kana words are too short for the board of a game designed for the English language.
One easy solution to make an interesting game in hiragana/katakana: play in 3D, with a small board, like 7 or 8 squares.
Only 52 items contain katakana only (chars from ァ to ヶ). When I allow hiragana too (ぁ to ヶ), it is 152. Obviously I made a mistake with the allowed letters. Which should be allowed? And by the way: what happens with words that contain spaces? Exclude them completely or remove the space?
I made a sample database file from the hiragana index (hiragana-kanji-english) from 1998 that shows my idea of what Japanese Scrabble could look like. I left out katakana entries. The letterset is modified by hiragana frequency in the dictionary. Attached are some screenshots. The real dictionary should be made from the current EDICT files, if it is OK to use them, which it appears to be.
One way to make human games better would be to play with inflected word forms, which of course needs a reasonable knowledge of Japanese.
The dictionary contains many phrases, but they do not need to be taken out. They can be given a different category and excluded from standard play. Then anyone who wants can play against or with the whole dictionary.
Japanese (the dictionary) doesn't use spaces between words, so it is not even possible to easily remove multi-word entries.
If you want to use the real hiragana frequencies as the base for the letterset, you need about 1000 tiles in the letterset. The ten most frequent hiragana occur on average 12 000 times, while the rarest occur only some hundreds of times. My letterset contains only 206 tiles, so it cannot mirror the frequencies properly, and rare words are probably played more often.
To do this well, it would be great to have a native Japanese linguist who plays in Scrabble competitions. Since Japanese Scrabble championships exist, maybe one could be found.
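To illustrate why a small letterset distorts the frequencies, here is a minimal Python sketch that scales kana counts down to a fixed number of tiles. The function name `scale_letterset`, the one-tile minimum per kana, the largest-remainder rounding and the sample counts are all assumptions made for illustration; this is not how the attached letterset was built.

```python
def scale_letterset(freqs, total_tiles, min_tiles=1):
    """Split total_tiles across kana in proportion to freqs.

    Every kana gets at least min_tiles (assumption: a kana with zero
    tiles could never be played), and rounding uses largest remainders.
    """
    if total_tiles < min_tiles * len(freqs):
        raise ValueError("not enough tiles for the minimum per kana")
    spare = total_tiles - min_tiles * len(freqs)
    total_freq = sum(freqs.values())
    # Exact (fractional) share of the spare tiles for each kana.
    exact = {k: spare * f / total_freq for k, f in freqs.items()}
    tiles = {k: min_tiles + int(share) for k, share in exact.items()}
    leftover = total_tiles - sum(tiles.values())
    # Give the tiles lost to rounding to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - int(exact[k]), reverse=True)
    for k in by_remainder[:leftover]:
        tiles[k] += 1
    return tiles


# Made-up counts, only roughly in the spirit of "thousands vs. hundreds".
sample_freqs = {"い": 12000, "う": 11000, "ん": 9000, "ぬ": 300, "ぺ": 200}
sample_tiles = scale_letterset(sample_freqs, total_tiles=20)
print(sample_tiles)
```

In this toy example ぬ makes up under 1% of the counted kana but ends up with 1 tile in 20, which is the over-representation of rare kana described above.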
Attached is a test dictionary made from the old EDICT file from the year 1998 containing hiragana-english, katakana-english and english-japanese dictionaries in one file. It's possible to change between hiragana-katakana-english by changing the letterset and selecting the right dictionary category.
The hiragana letterset (more than 200 tiles), which is loosely based on the frequencies, is included. The dictionary is not cleaned; the idea is just to show it works. The final dictionary (or separate dictionaries) should be made from up-to-date dictionary files.
The English-language part doesn't seem to play too well. A large number of entries have spaces, and there are lots of short words (2 letters) or abbreviations. Instead of, or in addition to, English, romaji could be added so any player could choose whichever Japanese script is best for him/her.
In my opinion there is no need to use only one script. In this way all the words that a player plays, sees or learns are "correct" Japanese.
Playing with hiragana (and English) is greatly improved with at least 9 tiles in the rack.
Scotty, how do you define characters during filtering? UTF code range I'm guessing? I think it's probably just a character range definition issue. And yes, all lines which contain at least one space character need to go.
xyz, there's an issue with the Romaji version - it contains a lot of acronyms by the looks of it, you played LL, RF, CM etc. And the Hiragana still has an over-abundance of 2 kana words.
I'm not saying I'm necessarily right but I would like to try the Katakana version first before we decide on a way forward.
******************************* Do, or do not. There is no try.
The screenshots are from a demo game. There's no romaji, it's English. The dictionary didn't have romaji. It's just to show that you can play with multiple character sets in one dictionary. If it is not good, you can play with separate katakana/hiragana/romaji dictionaries.
There are probably a lot of two-syllable words, but it is a property of the language. That's why romaji would work more like English Scrabble. Still, it's better to use kana.
I think a katakana-only version will not be played by native Japanese speakers or anyone fluent in Japanese, because they probably don't want to use the incorrect script for words. On the other hand, someone who wants to practise or learn Japanese doesn't want to learn to read and write a word in the wrong script. Before making such a dictionary, it might be a good idea to ask those people first.
The up-to-date EDICT files contain a large number of entries in both hiragana and katakana.
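How dominant the short words really are in a given word list can be checked with a quick count. The following is a rough Python sketch; the helper name `length_distribution` and the sample words are made up, and it assumes every kana character (including small kana such as ゃ) counts as one tile.

```python
from collections import Counter


def length_distribution(words):
    """Return {length: share of words}, counting each kana character as one tile."""
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: counts[n] / total for n in sorted(counts)}


# Made-up sample, not taken from EDICT.
sample_words = ["いぬ", "ねこ", "やま", "さくら", "おとこのこ"]
dist = length_distribution(sample_words)
print(dist)
```

Run against the real hiragana and katakana lists, the share at length 2 would give a number to compare the two scripts with.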
Look, I don't want to get into a heated argument when we don't even have a working prototype. Fact is, Japanese people DO use Katakana for words normally written in Hiragana, both historically (remember Hiragana started out as women's writing) and today - mainly for emphasis. Just open any old Manga and look for the whoosh-bang-ouch type words. All in Katakana.
The primary concern just now is to see if using either form of Kana or some sort of hybrid we can create a playable game. The initial outcome was that Hiragana alone produces too many ladders and blocks because of the frequency of two-Kana words which means it's not really playable. Yes, it's a property of the language, I'm not suggesting the Japanese language be changed to fit the game but it may mean that Kana is simply not suited to a Scrabble-type game. So since we're experimenting, I'm trying to see if Katakana produces something more workable, alright? I'm not suggesting we force anyone to do anything. I think you'll find Japanese people a bit more flexible in relation to Kana than you might think they are...
Take this mag cover:
where the word otokonoko 'boy' (the big yellow Kana), which is normally written in Kanji and Hiragana 男の子 or Hiragana おとこのこ, is written in Katakana オトコノコ. In fact, though it's a bit hard to read, I can't see much in the way of Hiragana ANYwhere on the cover where you'd normally expect it.
When I started experimenting with Gaelic many years ago, I didn't think it would work either, because I started out treating h as a letter, which made the game unplayable. But when I reverted to the historic digraphs, suddenly it worked great. Sometimes thinking outside the box is a good idea. And yes, sometimes even thinking outside the box fails :)
Now regarding the range issue... it should go as far as 30FC (the ー length mark) - not including that will have removed a lot of words. I'm not entirely sure about the rest - is it possible that there is an end-of-line space character in a lot of lines? That would account for the small number of words in the output.
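For what it's worth, the range and whitespace points can be sketched in a few lines of Python. This is only a guess at what the filter could look like, not Scotty's actual code: the name `katakana_words` and the sample lines are invented, and leaving out ・ (U+30FB, the middle dot, which sits inside the range) is my own assumption.

```python
import re

# Katakana from small ァ (U+30A1) up to ヺ (U+30FA), plus ー (U+30FC).
# Assumption: the middle dot ・ (U+30FB) is skipped, since it is punctuation.
KATAKANA_ONLY = re.compile(r"[\u30A1-\u30FA\u30FC]+")


def katakana_words(lines):
    """Yield entries made up solely of katakana once surrounding whitespace is trimmed."""
    for line in lines:
        word = line.strip()  # removes trailing spaces, tabs, CR and LF
        if word and KATAKANA_ONLY.fullmatch(word):
            yield word


# Invented test lines: trailing space, length mark, hiragana, inner space, CRLF.
sample_lines = ["テーブル \n", "コーヒー\n", "おとこのこ\n", "アイス クリーム\n", "オトコノコ\r\n"]
kept = list(katakana_words(sample_lines))
print(kept)
```

Entries with a space in the middle are still dropped, while a stray end-of-line space no longer throws the word out.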