Friday, March 21, 2008
Random name generation
I've been poking around with some random name generation stuff. Nothing uber fancy or anything.
The first thing I did was collect the top 100 boys' and girls' names over several years. Various web sites have lists of these things, so it's a pretty easy cut and paste to mine them.
I then used a Ruby script to collate, sort and remove dupes.
This gave me 196 (of 500) unique boys' names and 162 (of 500) unique girls' names. You can see most are duplicates. I should add that this is, of course, very English/American centric.
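For illustration, the collate/sort/dedupe step might look something like the sketch below. The original script isn't shown, so the method name `collate` and the sample lists are assumptions, not the actual code or data.

```ruby
# Hypothetical helper: merge several yearly top-100 lists into one
# sorted list of unique, lower-cased names.
def collate(lists)
  lists.flatten
       .map { |name| name.strip.downcase } # normalise case and whitespace
       .reject(&:empty?)                   # drop blank lines from the paste
       .uniq
       .sort
end

# Made-up sample input standing in for the pasted lists.
yearly_lists = [
  ["Jacob", "Michael", "Joshua"],
  ["jacob", "Ethan ", "Michael"]
]

p collate(yearly_lists) # => ["ethan", "jacob", "joshua", "michael"]
```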
The next task was to break the names down into c/v/cc/vv markers (consonant, vowel, consonant pair, vowel pair) of English speech.
Some examples:
vcvcvccvc => alexander == (v, c, v, c, v, cc, v, c) => (a, l, e, x, a, nd, e, r)
vcccvc => andrew == (v, cc, c, v, c) => (a, nd, r, e, w)
vccvc => angel == (v, cc, v, c) => (a, ng, e, l)
vcccvcc => anthony == (v, cc, c, v, cc) => (a, nt, h, o, ny)
vccvcvv => antonio == (v, cc, v, c, vv) => (a, nt, o, n, io)
vcccvc => ashton == (v, cc, c, v, c) => (a, sh, t, o, n)
vvccvc => austin == (vv, cc, v, c) => (au, st, i, n)
cvccvcv => barbara == (c, v, cc, v, c, v) => (b, a, rb, a, r, a)
cvccc => betty == (c, v, cc, c) => (b, e, tt, y)
cvcvccc => beverly == (c, v, c, v, cc, c) => (b, e, v, e, rl, y)
ccvvccv => brianna == (cc, vv, cc, v) => (br, ia, nn, a)
ccvccvcc => brittany == (cc, v, cc, v, cc) => (br, i, tt, a, ny)
ccvvcv => brooke == (cc, vv, c, v) => (br, oo, k, e)
ccvvcccc => brooklyn == (cc, vv, cc, cc) => (br, oo, kl, yn)
cvcvccv => cadence == (c, v, c, v, cc, v) => (c, a, d, e, nc, e)
cvcccc => camryn == (c, v, cc, cc) => (c, a, mr, yn)
cvccc => carly == (c, v, cc, c) => (c, a, rl, y)
cvcvc => carol == (c, v, c, v, c) => (c, a, r, o, l)
cvcvcvcv => caroline == (c, v, c, v, c, v, c, v) => (c, a, r, o, l, i, n, e)
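The breakdown in these examples can be reproduced with a short Ruby sketch. The rule used here is inferred from the examples, not taken from the original script: 'y' counts as a consonant, and each run of vowels or consonants is cut into chunks of at most two letters, starting from the left. The method names `pattern_for` and `chunks_for` are made up for illustration, and names are assumed to contain letters only.

```ruby
VOWELS = "aeiou".freeze

# Map every letter to "v" or "c" ('y' is treated as a consonant).
def pattern_for(name)
  name.downcase.each_char.map { |ch| VOWELS.include?(ch) ? "v" : "c" }.join
end

# Split into runs of vowels / non-vowels, then cut each run into
# chunks of one or two letters: "ndr" becomes "nd", "r".
def chunks_for(name)
  name.downcase
      .scan(/[aeiou]+|[^aeiou]+/)
      .flat_map { |run| run.scan(/.{1,2}/) }
end

p pattern_for("alexander") # => "vcvcvccvc"
p chunks_for("andrew")     # => ["a", "nd", "r", "e", "w"]
p chunks_for("brooklyn")   # => ["br", "oo", "kl", "yn"]
```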
Of the 196 boys' names, this is what I got:
ccvcc = 4
ccvccvc = 7
ccvcvc = 4
cvc = 3
cvcc = 9
cvccc = 5
cvcccvc = 3
cvccvc = 21
cvccvcc = 6
cvccvvc = 3
cvcv = 4
cvcvc = 20
cvcvcc = 7
cvcvvc = 5
cvvc = 6
vcvc = 6
vvcvc = 3
Girls' names were more varied; this is the output for those 162:
ccvccvc = 4
cvcc = 5
cvccc = 8
cvccv = 7
cvccvc = 8
cvccvcc = 4
cvccvcv = 4
cvccvcvv = 3
cvccvv = 4
cvcv = 4
cvcvc = 10
cvcvccc = 4
cvcvccv = 7
cvcvcv = 3
cvcvcvcv = 3
cvcvv = 4
cvvc = 3
cvvcv = 4
cvvcvc = 4
vccv = 3
vcvcv = 4
vcvcvv = 3
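Tables like the two above come from counting how many names share each pattern. A minimal sketch of that tally follows; the method name `pattern_counts` is hypothetical, and the four sample names are borrowed from the examples earlier rather than from the full lists.

```ruby
# Count how many names fall under each c/v pattern.
def pattern_counts(names)
  counts = Hash.new(0)
  names.each do |name|
    # 'y' is treated as a consonant, as in the examples.
    pattern = name.downcase.each_char.map { |ch| "aeiou".include?(ch) ? "v" : "c" }.join
    counts[pattern] += 1
  end
  counts.sort.to_h # sorted by pattern, like the tables
end

pattern_counts(%w[carol betty carly angel]).each do |pattern, count|
  puts "#{pattern} = #{count}"
end
# cvccc = 2
# cvcvc = 1
# vccvc = 1
```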
Using another script that reads the above tables, I randomly generated some names. For example:
pattern = cvcvc : pupet
pattern = ccvvcc : stoush
pattern = cvcvcvc : ronafut
pattern = cvcvcvc : nodibol
pattern = cvcvc : tunen
pattern = cvcvcvc : lejacas
pattern = cvccvcc : kolfift
pattern = ccvvc : fraes
pattern = cvccvcc : hangold
pattern = ccvvcc : gruath
pattern = ccvvcc : skeenk
pattern = ccvvcc : thouft
pattern = cvccvc : tolfaj
pattern = cvccvcc : sontunt
pattern = cvcvvc : toboum
pattern = ccvvc : skaed
pattern = cvcvcvc : lekajel
pattern = cvccvcc : fathort
pattern = cvcvvc : gekuat
pattern = cvccvc : mankos
Mmm, nothing of any real value, or even remotely close to the existing names from the original list. (I was only testing against patterns of 5 or more letters.)
I should note that I weeded out a lot of things. For example, there are no 'q' or 'qu' pairings, and letters like 'z' and 'x' have been removed. Consonant pairings are restricted to things like 'st', 'ch', 'dr', 'sh' etc. and vowel pairings to 'ae', 'ou' etc.
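A rough Ruby sketch of that generation step, with the restrictions just described, could look like this. The letter tables are assumptions: only the pairs actually named above are listed, the single-consonant list is simply the alphabet minus vowels, 'q', 'x' and 'z', and `generate_name` is a made-up method name, so output from this sketch will not match the names shown.

```ruby
# Hypothetical letter tables (the real ones were larger).
TABLES = {
  "c"  => %w[b c d f g h j k l m n p r s t v w y], # no q, x or z
  "v"  => %w[a e i o u],
  "cc" => %w[st ch dr sh],
  "vv" => %w[ae ou]
}.freeze

# Turn a pattern such as "ccvvc" into markers ["cc", "vv", "c"],
# then pick a random entry from the matching table for each marker.
def generate_name(pattern, rng = Random.new)
  markers = pattern.scan(/c+|v+/).flat_map { |run| run.scan(/.{1,2}/) }
  markers.map { |marker| TABLES.fetch(marker).sample(random: rng) }.join
end

%w[cvcvc ccvvcc cvccvc].each do |pattern|
  puts "pattern = #{pattern} : #{generate_name(pattern)}"
end
```

Passing a seeded `Random` (for example `Random.new(42)`) makes a run repeatable, which helps when comparing tweaks to the tables.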
Needs a lot more work. I think what I need to do is weight the combinations, and make a dictionary of the first consonant/vowel/combos and then one of the rest of the combos.
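One way that idea might be sketched in Ruby: count how often each chunk appears at the start of a name versus anywhere else, then pick chunks with probability proportional to those counts. This is only one reading of the plan, nothing here has been tried against the real lists, and the names `split_chunks`, `build_weights` and `weighted_pick` are hypothetical.

```ruby
# Split a name into one- or two-letter vowel/consonant chunks.
def split_chunks(name)
  name.downcase.scan(/[aeiou]+|[^aeiou]+/).flat_map { |run| run.scan(/.{1,2}/) }
end

# Marker for a chunk: "br" -> "cc", "oo" -> "vv", "k" -> "c".
def marker_for(chunk)
  chunk.each_char.map { |ch| "aeiou".include?(ch) ? "v" : "c" }.join
end

# Build counts keyed by [position, marker], where position is
# :first for the opening chunk of a name and :rest otherwise.
def build_weights(names)
  weights = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  names.each do |name|
    split_chunks(name).each_with_index do |chunk, index|
      position = index.zero? ? :first : :rest
      weights[[position, marker_for(chunk)]][chunk] += 1
    end
  end
  weights
end

# Pick a chunk with probability proportional to its count.
# Assumes counts is non-empty with a positive total.
def weighted_pick(counts, rng = Random.new)
  target = rng.rand(counts.values.sum)
  counts.each do |chunk, count|
    return chunk if target < count
    target -= count
  end
end

weights = build_weights(%w[brooke brianna betty])
p weights[[:first, "cc"]] # => {"br"=>2}
p weights[[:rest, "vv"]]  # => {"oo"=>1, "ia"=>1}
```

Generation would then look up `[:first, marker]` for the first marker of a pattern and `[:rest, marker]` for every later one, so that openings like 'br' only show up where real names use them.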