Experiment: Morse vs optimal binary encoding [oc]

Inspired by the Adventures with Morse Code post by u/vk6flab, I decided to code some experiments comparing Morse with an ASCII-based encoding.

Results and disclaimer: Morse is optimized for humans. A 5-bit ASCII-based protocol needed roughly 40-55% fewer bits than Morse on my sample texts (about 1.7x to 2.2x more efficient), but it would be too difficult to use without a machine interpreting the stream on each end.


Morse can be converted to binary to measure its efficiency. According to the spec, a dit is the shortest unit, equivalent to a single bit (1). A dah has the duration of three dits, so requires three binary bits (111). The gap between the dits and dahs of a single letter has a minimum silence duration of one dit (0), three between letters (000), and seven between words (0000000).
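As a minimal sketch of these timing rules in Python (assuming patterns are written with `.` for a dit and `–` or `-` for a dah, as in the listings below; `letter_to_bits` and `words_to_bits` are hypothetical names, not taken from the linked gist):

```python
# Bit patterns for the Morse timing rules, at one bit per dit-length of time.
KEY = {".": "1", "-": "111", "–": "111"}  # dit = 1 unit on, dah = 3 units on
ELEMENT_GAP = "0"    # 1 unit off between dits/dahs within a letter
LETTER_GAP = "000"   # 3 units off after each letter
WORD_EXTRA = "0000"  # 4 more units off, making 7 in total between words


def letter_to_bits(pattern):
    """Encode one letter pattern such as '.–' and append the letter gap."""
    return ELEMENT_GAP.join(KEY[mark] for mark in pattern) + LETTER_GAP


def words_to_bits(words):
    """Encode a list of words, where each word is a list of letter patterns.

    Every letter already ends with a 3-unit gap, so only 4 extra zeros are
    inserted between words to reach the 7-unit word gap.
    """
    return WORD_EXTRA.join(
        "".join(letter_to_bits(pattern) for pattern in word) for word in words
    )
```

For example, `letter_to_bits(".–")` returns `10111000`, matching the A row in the table below.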

ASCII is a 7-bit code normally stored in 8-bit bytes, but upper-case letters and basic punctuation fit in only 5 bits, which accommodate 32 characters: 26 letters and 6 punctuation marks (space . , ! ? '). See the bottom of this post for a note about how digits 0-9 are handled. I did this by truncating the normal ASCII ordinals to their 5 least significant bits and manually defining punctuation in the unused slots. Because this encoding is fixed-width, no extra silence bits are needed between characters or words. The space between words is encoded as a regular character, included in the set of punctuation.
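A minimal Python sketch of this 5-bit scheme. Truncating with `ord(ch) & 0b11111` gives A=1 through Z=26, and space (ASCII 32) lands on 00000 by itself. Which unused slot each of `. , ! ? '` occupies isn't spelled out here, so the `PUNCTUATION` table below is an assumption, and its punctuation codes may not match the bit listings further down; `encode_5bit` is a hypothetical name.

```python
import string

# Space is ASCII 32, whose low 5 bits are 00000. The slots 27-31 chosen for the
# other marks are a hypothetical assignment, not necessarily the author's.
PUNCTUATION = {" ": 0, ".": 27, ",": 28, "!": 29, "?": 30, "'": 31}


def encode_5bit(text):
    """Encode text as fixed-width 5-bit codes, space-separated for readability."""
    codes = []
    for ch in text.upper():
        if ch in string.ascii_uppercase:
            codes.append(ord(ch) & 0b11111)  # keep the 5 least significant bits
        elif ch in PUNCTUATION:
            codes.append(PUNCTUATION[ch])
        else:
            raise ValueError(f"no 5-bit code for {ch!r}")
    return " ".join(format(code, "05b") for code in codes)
```

`encode_5bit("hello world")` reproduces the letter and space codes in the HELLO WORLD listing below; only the final period would differ, depending on the slot chosen for it.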

Morse tried to optimize by giving the most frequent letters the shortest codes, but that was done long before computers could ingest entire libraries to determine true letter frequencies. In order of frequency, the most common letters in English text are ETAOINSHR.
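Measuring letter frequency is just counting. A minimal Python sketch using `collections.Counter` (the function name `frequency_order` is hypothetical) that ranks letters by how often they occur in whatever text you feed it; on a large English corpus the ranking should start roughly like the one above, though the exact order depends on the corpus:

```python
from collections import Counter


def frequency_order(text):
    """Return the letters A-Z found in `text`, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "".join(letter for letter, _ in ranked)
```

For example, `frequency_order("Hello, World")` returns `LODEHRW`.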

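| Letter | Morse | Morse binary | Morse duration (bits) | 8-bit ASCII | 5-ASCII binary |
|---|---|---|---|---|---|
| E | . | 1000 | 4 | 01000101 | 00101 |
| T | – | 111000 | 6 | 01010100 | 10100 |
| A | .– | 10111000 | 8 | 01000001 | 00001 |
| O | ––– | 11101110111000 | 14 | 01001111 | 01111 |
| I | .. | 101000 | 6 | 01001001 | 01001 |
| N | –. | 11101000 | 8 | 01001110 | 01110 |
| S | ... | 10101000 | 8 | 01010011 | 10011 |
| H | .... | 1010101000 | 10 | 01001000 | 01000 |
| R | .–. | 1011101000 | 10 | 01010010 | 10010 |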

And the last few.

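| Letter | Morse | Morse binary | Morse duration (bits) | 8-bit ASCII | 5-ASCII binary |
|---|---|---|---|---|---|
| J | .––– | 1011101110111000 | 16 | 01001010 | 01010 |
| X | –..– | 11101010111000 | 14 | 01011000 | 11000 |
| Q | ––.– | 1110111010111000 | 16 | 01010001 | 10001 |
| Z | ––.. | 11101110101000 | 14 | 01011010 | 11010 |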
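Every Morse duration in these tables follows one formula. For a letter with d dits and h dahs, key-down time is d + 3h units, the d + h - 1 gaps inside the letter take one unit each, and the letter gap adds three:

```latex
T(d, h) = (d + 3h) + (d + h - 1) + 3 = 2d + 4h + 2
```

For example, E has d = 1 and h = 0, so T = 4; J and Q have d = 1 and h = 3, so T = 16. A word gap adds 4 more units on top of the letter gap, which is why HELLO WORLD. costs 130 bits of letters plus 4 bits of word gap, 134 in total.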

The 5-ASCII binary is, of course, 5 bits in duration in each case. The "Morse binary" I've written here is not just "kind of" equivalent to Morse; it's exactly equivalent. If you fed it into a signal generator that stepped through each bit slowly enough for a human to hear and interpret, it would generate a perfect Morse signal.
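To illustrate the signal-generator idea, here is a minimal Python sketch using only the standard library (`wave`, `struct`, `math`, `io`) that keys a sine tone on for each `1` and off for each `0`. The 60 ms dit (about 20 WPM under the PARIS timing convention), the 600 Hz tone and the 8 kHz sample rate are arbitrary choices, and `bits_to_wav` is a hypothetical name:

```python
import io
import math
import struct
import wave


def bits_to_wav(bits, dit_ms=60, tone_hz=600.0, rate=8000):
    """Return mono 16-bit WAV bytes: tone on for '1', silence otherwise.

    Spaces in `bits` are ignored, so readability-spaced strings work as-is.
    """
    samples_per_bit = rate * dit_ms // 1000
    frames = bytearray()
    n = 0  # running sample index, so the tone's phase stays continuous
    for bit in bits.replace(" ", ""):
        for _ in range(samples_per_bit):
            level = math.sin(2 * math.pi * tone_hz * n / rate) if bit == "1" else 0.0
            frames += struct.pack("<h", int(12000 * level))  # little-endian PCM
            n += 1
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()
```

Writing the bytes from `bits_to_wav("1000")` to a `.wav` file, for example, should give the letter E followed by its letter gap. The abrupt on/off edges will click audibly; a real keyer would shape them.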

Here are some practical efficiency comparisons. Spaces in the binary are included for reading convenience, but are not part of the data stream.


"HELLO WORLD."

  • Morse: .... . .–.. .–.. ––– .–– ––– .–. .–.. –.. .–.–.–
  • Morse binary, 134 bits: 1010101000 1000 101110101000 101110101000 11101110111000 0000 101110111000 11101110111000 1011101000 101110101000 1110101000 10111010111010111000
  • 5-ASCII, 60 bits: 01000 00101 01100 01100 01111 00000 10111 01111 10010 01100 00100 00001

Article 1 of the Universal Declaration of Human Rights: "ALL HUMAN BEINGS ARE BORN FREE AND EQUAL IN DIGNITY AND RIGHTS. THEY ARE ENDOWED WITH REASON AND CONSCIENCE AND SHOULD ACT TOWARDS ONE ANOTHER IN A SPIRIT OF BROTHERHOOD."

  • Morse: .– .–.. .–.. .... ..– –– .– –. –... . .. –. ––. ... .– .–. . –... ––– .–. –. ..–. .–. . . .– –. –.. . ––.– ..– .– .–.. .. –. –.. .. ––. –. .. – –.–– .– –. –.. .–. .. ––. .... – ... .–.–.– – .... . –.–– .– .–. . . –. –.. ––– .–– . –.. .–– .. – .... .–. . .– ... ––– –. .– –. –.. –.–. ––– –. ... –.–. .. . –. –.–. . .– –. –.. ... .... ––– ..– .–.. –.. .– –.–. – – ––– .–– .– .–. –.. ... ––– –. . .– –. ––– – .... . .–. .. –. .– ... .––. .. .–. .. – ––– ..–. –... .–. ––– – .... . .–. .... ––– ––– –.. .–.–.–
  • Morse binary, 1422 bits: 10111000 101110101000 101110101000 0000 1010101000 1010111000 1110111000 10111000 11101000 0000 111010101000 1000 101000 11101000 111011101000 10101000 0000 10111000 1011101000 1000 0000 111010101000 11101110111000 1011101000 11101000 0000 101011101000 1011101000 1000 1000 0000 10111000 11101000 1110101000 0000 1000 1110111010111000 1010111000 10111000 101110101000 0000 101000 11101000 0000 1110101000 101000 111011101000 11101000 101000 111000 1110101110111000 0000 10111000 11101000 1110101000 0000 1011101000 101000 111011101000 1010101000 111000 10101000 10111010111010111000 0000 111000 1010101000 1000 1110101110111000 0000 10111000 1011101000 1000 0000 1000 11101000 1110101000 11101110111000 101110111000 1000 1110101000 0000 101110111000 101000 111000 1010101000 0000 1011101000 1000 10111000 10101000 11101110111000 11101000 0000 10111000 11101000 1110101000 0000 11101011101000 11101110111000 11101000 10101000 11101011101000 101000 1000 11101000 11101011101000 1000 0000 10111000 11101000 1110101000 0000 10101000 1010101000 11101110111000 1010111000 101110101000 1110101000 0000 10111000 11101011101000 111000 0000 111000 11101110111000 101110111000 10111000 1011101000 1110101000 10101000 0000 11101110111000 11101000 1000 0000 10111000 11101000 11101110111000 111000 1010101000 1000 1011101000 0000 101000 11101000 0000 10111000 0000 10101000 10111011101000 101000 1011101000 101000 111000 0000 11101110111000 101011101000 0000 111010101000 1011101000 11101110111000 111000 1010101000 1000 1011101000 1010101000 11101110111000 11101110111000 1110101000 10111010111010111000
  • 5-ASCII, 850 bits: 00001 01100 01100 00000 01000 10101 01101 00001 01110 00000 00010 00101 01001 01110 00111 10011 00000 00001 10010 00101 00000 00010 01111 10010 01110 00000 00110 10010 00101 00101 00000 00001 01110 00100 00000 00101 10001 10101 00001 01100 00000 01001 01110 00000 00100 01001 00111 01110 01001 10100 11001 00000 00001 01110 00100 00000 10010 01001 00111 01000 10100 10011 00001 00000 10100 01000 00101 11001 00000 00001 10010 00101 00000 00101 01110 00100 01111 10111 00101 00100 00000 10111 01001 10100 01000 00000 10010 00101 00001 10011 01111 01110 00000 00001 01110 00100 00000 00011 01111 01110 10011 00011 01001 00101 01110 00011 00101 00000 00001 01110 00100 00000 10011 01000 01111 10101 01100 00100 00000 00001 00011 10100 00000 10100 01111 10111 00001 10010 00100 10011 00000 01111 01110 00101 00000 00001 01110 01111 10100 01000 00101 10010 00000 01001 01110 00000 00001 00000 10011 10000 01001 10010 01001 10100 00000 01111 00110 00000 00010 10010 01111 10100 01000 00101 10010 01000 01111 01111 00100 00001
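The bit counts above can be reproduced with a short Python sketch. It writes dahs as ASCII `-` for convenience, covers only A-Z and the period (the characters the samples use), assumes words are separated by single spaces, and only counts lengths; it is not the linked gist's code, and the function names are hypothetical:

```python
# International Morse patterns for A-Z and the period ("-" stands for a dah).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..", ".": ".-.-.-",
}


def morse_bits(text):
    """Length of `text` in bits when Morse is keyed at one bit per dit."""
    words = text.split(" ")
    total = 0
    for word in words:
        for ch in word:
            pattern = MORSE[ch]
            key_down = sum(1 if mark == "." else 3 for mark in pattern)
            total += key_down + (len(pattern) - 1) + 3  # inner gaps + letter gap
    return total + 4 * (len(words) - 1)  # 4 extra units per 7-unit word gap


def fixed_width_bits(text, width=5):
    """Length of `text` in bits for a fixed-width code such as 5-ASCII."""
    return width * len(text)


def reduction(text, width=5):
    """Fraction of Morse bits saved by the fixed-width code."""
    return 1 - fixed_width_bits(text, width) / morse_bits(text)
```

`reduction("HELLO WORLD.")` comes out at about 0.55, and the longer Article 1 sample at about 0.40.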

Note about digits: In my 5-bit ASCII-based protocol, a number would be represented by a ! character, followed by one 5-bit value per digit (e.g. 1 = 00001, 2 = 00010), and a final ! to switch back to normal mode and automatically insert a space after the number. Optional dots and commas within the number would use their normal encoding but with a 1 as the first bit, so they aren't interpreted as digits. This is proposed but not implemented in the code linked below, since it wasn't needed for my sample texts.
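The proposed digit mode could be sketched in Python roughly as follows. This is an interpretation, not code from the post: it assumes hypothetical punctuation slots 27-31 (so `.` and `,` already start with a 1 bit and need no change), reserves `!` purely for switching modes (the note doesn't say how a literal `!` would be sent), treats a period right after a digit as part of the number, and has the encoder drop the space after a number because the decoder re-inserts it. `encode_with_digits` and `decode_with_digits` are hypothetical names.

```python
import string

# Hypothetical slot assignment: space = 0, A-Z = 1-26, punctuation = 27-31.
PUNCT = {" ": 0, ".": 27, ",": 28, "!": 29, "?": 30, "'": 31}
SYMBOL = {code: ch for ch, code in PUNCT.items()}
BANG = PUNCT["!"]  # toggles number mode


def encode_with_digits(text):
    codes, i = [], 0
    while i < len(text):
        ch = text[i].upper()
        if ch in string.digits:
            codes.append(BANG)  # enter number mode
            while i < len(text) and (text[i] in string.digits or text[i] in ".,"):
                # digits map to 00000-01001; . and , keep their 1xxxx codes
                codes.append(int(text[i]) if text[i] in string.digits else PUNCT[text[i]])
                i += 1
            codes.append(BANG)  # leave number mode; the decoder adds a space
            if i < len(text) and text[i] == " ":
                i += 1  # that space is implied, so it is not sent
        elif ch in string.ascii_uppercase:
            codes.append(ord(ch) & 0b11111)
            i += 1
        else:
            codes.append(PUNCT[ch])
            i += 1
    return " ".join(format(code, "05b") for code in codes)


def decode_with_digits(bits):
    out, number_mode = [], False
    for chunk in bits.split():
        code = int(chunk, 2)
        if code == BANG:
            number_mode = not number_mode
            if not number_mode:
                out.append(" ")  # leaving number mode inserts a space
        elif number_mode and code < 10:
            out.append(str(code))
        elif not number_mode and 1 <= code <= 26:
            out.append(chr(ord("A") + code - 1))
        else:
            out.append(SYMBOL[code])
    return "".join(out)
```

With this sketch, "ON 1,500 DAYS" round-trips through `encode_with_digits` and `decode_with_digits` unchanged.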

My rough, proof-of-concept code:

https://gist.github.com/serif/68fa1b389e90072d9c1b377de123a92d

edit: updated the code and post after receiving this from u/vk6flab:

FYI, the spacing between words is inclusive of the space at the end of a symbol.

The Morse encoding of Article 1 of the Universal Declaration of Human Rights is now 1422 bits, down from 1509, and the HELLO WORLD count is similarly fixed. That makes 5-ASCII's 850 bits a 40% reduction.

I've also written new tests for 6-ASCII, whose 64 code points (no shift codes needed) put it more directly on par with modern International Morse, and for standard 8-bit ASCII, to better answer the original Morse vs. ASCII question. Results for the Universal Declaration of Human Rights sample:

  • 5-ASCII: 850 bits
  • 6-ASCII: 1020 bits
  • 8-ASCII: 1360 bits
  • Morse: 1422 bits
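Because a fixed-width code spends the same number of bits on every character, the three ASCII figures are just the sample's 170 characters (850 / 5) multiplied by the code width. A small Python check of that arithmetic against the 1422-bit Morse figure (`fixed_width_sizes` is a hypothetical helper):

```python
MORSE_BITS = 1422  # Morse length of the sample, as reported above
CHARS = 170        # characters in the sample, spaces and punctuation included


def fixed_width_sizes(char_count, widths=(5, 6, 8)):
    """Bits needed to send `char_count` characters at each fixed code width."""
    return {width: width * char_count for width in widths}


for width, bits in fixed_width_sizes(CHARS).items():
    print(f"{width}-ASCII: {bits} bits, {1 - bits / MORSE_BITS:.1%} fewer than Morse")
```

This prints reductions of 40.2%, 28.3% and 4.4% for the 5-, 6- and 8-bit codes, so even byte-per-character ASCII edges out Morse on this sample.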
