What Is a Language?
Hidden Wonders
Programming · Technology
Published: 2022-09-25
Lastmod: 2023-08-15

Whisper

I’ve been playing around with Whisper, “a general-purpose speech recognition model.” You can install it in one line:

    pip install git+https://github.com/openai/whisper.git

It can be used as a Python library, but it’s also usable as a command line program. The following line takes a file, music.mp3, and outputs music.mp3.txt and music.mp3.vtt files containing the transcription:

    whisper music.mp3 --model base.en

There are several models to choose from with varying VRAM requirements, and most of them have an English-only model.en variant for better performance with English. If you have an Nvidia GPU, definitely give it a try; it’s impressive how good it is with English.

Especially exciting to me is that the model can handle other languages and translate them to English as well. The translation part doesn’t work especially well yet, but someday this will be a valid way of getting subtitles for any media in another language:

    whisper japanese_cartoons.mkv --language Japanese --task translate --model base

The translation stuff is really cool, but it’s the ability of a computer to take audio and transcribe it into text that made me wonder: what is a language anyway?

What is a Language Anyway?

In particular, Whisper made me question whether language is something that can be quantified such that a computer can understand it perfectly. For instance, you can feed the model some people with strange accents and it will often get the transcription correct, but for fun I tried giving it an English Vocaloid song [watch?v=vW9_5giCK1I] and the model did horribly, getting only a few sentences in the whole song correct. It made me wonder whether the human speech patterns the AI was trained on have some quality that the highly synthesized sound of Vocaloids lacks.

Here’s one definition of the word language:

Language, n.
Communication of thoughts and feelings through a system of arbitrary signals, such as voice sounds, gestures, or written symbols.

And here’s another:

Language, n.
Such a system as used by a nation, people, or other distinct community; often contrasted with dialect.

In the first definition, the usage of the term “arbitrary” is incredibly interesting to me. Can I say uuga booga duuga and call that language? No one would know what it means. I would have to assign a meaning to the phrase, but does my being the only one who knows the phrase’s meaning imply I’ve created my own language?

The second definition implies a different understanding of language: a medium of communication officially established and widely understood by some community. This sounds like something an AI or computer would someday be capable of understanding, but I’m not sure whether language as some sort of arbitrary, non-standardized method of communication can ever be understood by one.

Human Made Languages

-. .

Oh, sorry, I was writing in Morse code, the standardized method of communication for telegrams. At least, I assume that’s what it is, since I used an online Morse code translator to get it: a translator which uses a computer to read in English text and output the equivalent Morse code.

Arbitrary symbols are easy for humans to create to represent all kinds of things. Just look at math: ∃x ∈ {0,1} -> x=0 ∨ x=1. In English, this means “there exists x, an element of the set {0, 1}, which implies x is equal to 0 or x is equal to 1.” It’s been a while since I’ve written in that notation, so the right side of the implication might not make any sense, but this is language: a mathematical language for getting thoughts about math across to other mathematicians.

Programming languages are languages too. From obvious stuff like:

    /* C */
    #include <stdio.h>

    int main() {
        printf("Hello World\n");
    }

To more unfamiliar languages solving less trivial problems like:

    /* Prolog */
    ackermann(0, N, X) :- X is N + 1, !.
    ackermann(M, 0, X) :- !, M > 0, M1 is M - 1, ackermann(M1, 1, X).
    ackermann(M, N, X) :- N > 0, M > 0, M1 is M - 1, N1 is N - 1, ackermann(M, N1, Y), ackermann(M1, Y, X).

Programming languages in particular are interesting because while they are designed so that other programmers can understand the code, their initial and still primary intent is to provide instructions to computers. In contrast, it must be much harder for computers to parse the English language.

There are also languages like Elvish. J.R.R. Tolkien played around with artificially made languages throughout his childhood, which eventually led him to create Elvish. Elvish uses its own unique symbols. Tolkien gave that language meaning by building the characters and words himself, and then spread the language through his books.

Whisper Does not “Understand” the Language

That all said, Whisper in its current state cannot “understand” language in the truest sense. While I understand that certain Google employees who want some media publicity like to claim their AI is sentient, there is a fine line between being able to translate spoken text into its written equivalent and being able to discern the meaning of that text. The “thoughts and feelings” that language is meant to portray are lost on a computer, and there’s no real way for that problem to be solved anytime soon.

Even with programming languages, the computer doesn’t know the end goal of what it is being instructed to do. For instance, the computer doesn’t understand that the previously mentioned C code is a hello world program; rather, the computer sees it as storing certain values in certain registers and computing certain values.
In case you don’t believe me, here is the output of gcc -S hello.c, which converts the C code into assembly:

        .file   "hello.c"
        .text
        .section        .rodata
    .LC0:
        .string "Hello World"
        .text
        .globl  main
        .type   main, @function
    main:
    .LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        leaq    .LC0(%rip), %rax
        movq    %rax, %rdi
        call    puts@PLT
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE0:
        .size   main, .-main
        .ident  "GCC: (GNU) 12.2.0"
        .section        .note.GNU-stack,"",@progbits

Beneath that main: line, you can see that the computer only gets a string from us (“Hello World”) and the location in memory of the function we want to call. That call line calls puts rather than printf: GCC replaces a printf call whose only argument is a constant string ending in a newline with the equivalent puts call, which is also why the stored string has lost its trailing \n. The point being: there is no implicit understanding by the computer of the higher level operation that we are performing. The computer just moves stuff around in memory.

If an AI were to become sentient, I’d imagine it would first understand the higher-level purpose of this code before it could even consider trying to understand the thoughts and emotions of English literature or speech. GitHub Copilot is a step closer to this, but it’s still not understanding the code, merely providing the code that is likely to come next based on its analysis of billions of other lines of code.

In fact, I might have been too generous to the computer. Really, the output of gcc -S hello.c just shows the assembly code, not what the computer understands. What the computer sees is too long to put here, but here are the first 90 lines of the binary output from xxd:

    00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
    00000010: 0300 3e00 0100 0000 4010 0000 0000 0000  ..>.....@.......
    00000020: 4000 0000 0000 0000 0847 0000 0000 0000  @........G......
    00000030: 0000 0000 4000 3800 0d00 4000 2500 2400  ....@.8...@.%.$.
    [remaining rows of the dump are not recoverable from this copy; the only readable strings in them are /lib64/ld-linux-x86-64.so.2, GNU, and puts]
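Each row of that xxd output is nothing more than sixteen bytes printed in hex next to whatever printable ASCII they happen to contain. Here is a minimal Python sketch of that formatting, not xxd's actual implementation. The helper names hexdump_row and hexdump are made up for illustration, and the sample bytes are the sixteen ELF identification bytes from the first row of the dump, with the trailing padding assumed to be zero as the ELF format specifies:

```python
def hexdump_row(offset: int, chunk: bytes) -> str:
    """Format up to 16 bytes in the style of xxd's default output."""
    digits = chunk.hex()
    # xxd groups the hex digits two bytes (four digits) at a time.
    groups = [digits[i:i + 4] for i in range(0, len(digits), 4)]
    # Printable ASCII is shown as-is; every other byte becomes a dot.
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    # Pad the hex column to 39 characters so short final rows still line up.
    return f"{offset:08x}: {' '.join(groups):<39}  {text}"


def hexdump(data: bytes) -> str:
    """Dump data sixteen bytes per row."""
    rows = (hexdump_row(i, data[i:i + 16]) for i in range(0, len(data), 16))
    return "\n".join(rows)


# ELF magic, 64-bit class, little-endian, version 1, then zero padding.
elf_ident = bytes.fromhex("7f454c46020101000000000000000000")
print(hexdump(elf_ident))
# 00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
```

The only reason the row reads as “.ELF” to us is that three of those bytes land in the printable ASCII range. The computer attaches no such meaning to them.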