
160 Characters

Every text message you send is an act of digital smuggling. Your words are sent through infrastructure that was never designed to carry them.

01

The Smuggling Route

To understand why SMS exists at all, we first need to get into telephone networks.
Key Concept

Separate Channels: SS7 protocol sends voice (user data) and control-talk on different lines

Telephone networks have two components to transfer information:

User data: This carries the actual user data to be transferred, your voice on a phone call, for instance.

Control-talk: A set of instructions shared between your device and network. Before your voice travels anywhere, the network needs to find your friend's phone, check if they're available, calculate the best route through potentially dozens of switches, set up billing, and reserve bandwidth.

In the 1980s, telephone networks were rolling out a signaling protocol called SS7 (Signaling System No. 7). This protocol carries the voice (user data) and the control-talk on separate channels. Formally, this is called common channel signaling.

[Figure: Phone A and Phone B connected by two channels, User Data (Voice) and Control-Talk (SS7). SS7: Separate channels for voice and control messages]
“SS7 sends data in the form of packets that have fixed sizes. This creates an interesting problem: what happens when your control message doesn't fill the container?”

02

The 140-Byte Void

In 1985, GSM engineers discovered something remarkable.
Key Discovery

140 Bytes of Unused Space: Found in every SS7 packet after accounting for headers

Friedhelm Hillebrand and Bernard Ghillebaert were examining SS7 packet structures. After accounting for all the headers like message type indicators, originating point codes, destination point codes, and circuit identification codes, they found something remarkable: 140 bytes of unused space in every packet.

[Figure: SS7 Packet Structure. Headers (Message Type Indicator, Point Codes) followed by 140 bytes of unused space, the void that is perfect for SMS. 140 bytes of empty space found in every SS7 control packet]
“That is when an idea struck them: what if we use this idle space to smuggle messages between devices?”

And that brings us to our 160-character limitation for SMS messages, or to be more technically precise, 140 bytes.


Exercise 1

Encoding Calculator

Type a message and see how different encoding schemes affect byte usage in real-time.

GSM-7 Encoding

Uses 7 bits per character. Supports basic Latin alphabet, numbers, and common symbols. Allows up to 160 characters in 140 bytes.

UCS-2 Encoding

Uses 16 bits per character. Required for emojis and non-Latin scripts. Reduces capacity to just 70 characters in the same 140 bytes.

Try it: Start with plain text, then add an emoji. Watch the encoding switch and character limit drop from 160 to 70!

Characters: 12
Encoding: GSM-7
Bits/Char: 7
Bytes Used: 11

140-Byte Limit: 11 / 140

Standard text. GSM-7 allows 160 chars.
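
The calculator's logic can be approximated in a few lines of Python. This is a minimal sketch, not the code behind the exercise: the function name `measure` is hypothetical, and the GSM-7 alphabet below is transcribed by hand from GSM 03.38 (with the escape code left out), so treat the table as an assumption and check it against the specification before relying on it.

```python
import math

# GSM 03.38 basic alphabet, transcribed by hand (an assumption; verify
# against the specification). The escape code 0x1B is left out on purpose.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension characters cost two septets each (escape code + character code).
GSM7_EXTENDED = "^{}\\[~]|€"


def measure(text):
    """Return the encoding a message needs and the bytes it occupies."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        septets = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        return {
            "encoding": "GSM-7",
            "bits_per_char": 7,
            "bytes_used": math.ceil(septets * 7 / 8),
        }
    # A single character outside GSM-7 switches the whole message to
    # 16-bit units.
    return {
        "encoding": "UCS-2",
        "bits_per_char": 16,
        "bytes_used": len(text.encode("utf-16-be")),
    }
```

Calling `measure("Hello world!")` reports GSM-7 and 11 bytes for 12 characters, matching the reading above. Appending an emoji to the same text flips the result to UCS-2.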

03

Math Behind the Limit

So, we have 140 bytes of space. How does that become 160 characters?
Key Calculation

GSM-7 Encoding: Uses 7 bits per character instead of 8

As we know, 1 byte = 8 bits. Thus, 140 bytes translate into 1120 bits:

140 × 8 = 1,120 bits

GSM-7 is one of the character encodings used for SMS. It uses 7 bits for each character, so:

1120 bits ÷ 7 = 160 characters
“This same 160-character limit shrinks to 70 characters when we use emojis or alphabets from other languages.”

This happens because when we use non-GSM characters, we need a different encoding like UCS-2. UCS-2 uses 16 bits for each character so:

1120 bits ÷ 16 = 70 characters
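
The 160-in-140 arithmetic only works if the 7-bit codes are packed back to back, with no padding bit per character. Here is a sketch of that packing, assuming the usual GSM convention of filling each byte from its least significant bit. The function name `pack_septets` is ours, and using `ord()` for the character codes is a shortcut that only holds for characters whose GSM-7 code equals their ASCII code (basic Latin letters, digits, space).

```python
def pack_septets(septets):
    """Pack 7-bit values into bytes, least significant bits first."""
    out = bytearray()
    bits = 0       # bit buffer holding not-yet-emitted bits
    nbits = 0      # number of valid bits currently in the buffer
    for s in septets:
        bits |= (s & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(bits & 0xFF)
            bits >>= 8
            nbits -= 8
    if nbits:
        out.append(bits & 0xFF)    # final partial byte, zero-padded
    return bytes(out)


# Shortcut: ord() matches the GSM-7 code for plain letters only.
packed = pack_septets(ord(c) for c in "hello")
full = pack_septets([ord("A")] * 160)
```

Here `packed` is 5 bytes long, and `full` comes out at exactly 140 bytes: 160 characters × 7 bits = 1,120 bits.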

04

The Encoding Wars

Character encoding is mapping human characters to numbers a computer can understand.
Key Trade-off

GSM-7 vs UCS-2: More characters vs more character types

GSM-7 uses seven bits to represent a character. 7 bits give 2^7 = 128 different combinations of 0s and 1s, i.e. we can have 128 different characters.

GSM-7 is very limited and can't encode emojis or most non-European languages. To solve this problem, SMS falls back on a second encoding called UCS-2, a fixed-width 16-bit encoding closely related to UTF-16.

The '2' in UCS-2 stands for 2 bytes, because it uses 2 bytes to store each character i.e. now we have 16 bits instead of 7.

16 bits → 2^16 combinations → 65,536 characters
“If a message contains even a single non-GSM character, the entire message is encoded in UCS-2 instead.”
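
A quick way to check these numbers is to count 16-bit units directly. The sketch below uses Python's `utf-16-be` codec as a stand-in for UCS-2, which is an assumption: the two agree for characters inside the 65,536-code range, while characters beyond it (most emojis) are written as two units, so they count double against the 70-unit limit. The helper name `ucs2_units` is hypothetical.

```python
def ucs2_units(text):
    """Count the 16-bit units a message occupies in a UCS-2/UTF-16 SMS."""
    return len(text.encode("utf-16-be")) // 2


GSM7_CODES = 2 ** 7      # combinations available with 7 bits
UCS2_CODES = 2 ** 16     # combinations available with 16 bits
MAX_UNITS = 140 // 2     # 16-bit units that fit in one 140-byte segment

latin = ucs2_units("A")
han = ucs2_units("\u6c49")          # the Chinese character 汉
emoji = ucs2_units("\U0001F60A")    # the emoji 😊
```

Both `latin` and `han` take one unit, while `emoji` takes two, which is why an emoji-heavy message runs out of room sooner than the 70-character figure suggests.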

05

Concatenation: A Hidden Transport Layer

The funny thing is, most of the time we never notice the SMS character limit.
Key Innovation

Message Splitting: Long messages divided into 140-byte segments

Behind the scenes, concatenation provides this seamless experience. A message longer than 160 characters is divided into segments of at most 140 bytes each, which are sent over the network (SS7) one by one. The receiver's device then joins the segments back together in the correct order.

The UDH (user data header) is used to facilitate proper concatenation. Each segment's header contains some metadata which informs the receiver of the correct order.

Long Message Split into Segments
Segment 1/3: UDH (6 bytes) + Message Data (134 bytes → 153 chars)
Segment 2/3: UDH (6 bytes) + Message Data (134 bytes → 153 chars)
Segment 3/3: UDH (6 bytes) + Message Data (134 bytes → 153 chars)
Each segment reduced from 160 to 153 characters due to 6-byte UDH overhead
“The UDH takes up 6 bytes of every segment, which leaves 134 bytes for the message itself.”

Since SMS pricing is based on segments, every extra segment is billed as another message.
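
To make the splitting concrete, here is a minimal sketch for GSM-7 text. It assumes the common 6-byte concatenation header described in 3GPP TS 23.040 (a header length byte, an element identifier, an element length, then reference number, total parts, and part number). The function name `split_gsm7` is ours, and the sketch ignores details such as two-septet extension characters landing on a segment boundary.

```python
def split_gsm7(text, ref):
    """Split GSM-7 text into (udh, chunk) pairs for a concatenated SMS."""
    if len(text) <= 160:
        return [(b"", text)]       # fits in a single SMS, no UDH needed
    chunks = [text[i:i + 153] for i in range(0, len(text), 153)]
    total = len(chunks)
    if total > 255:
        raise ValueError("too many segments for a one-byte counter")
    parts = []
    for seq, chunk in enumerate(chunks, start=1):
        # header length 5, element id 0x00 (concatenation), element
        # length 3, then reference number, total parts, part number
        udh = bytes([0x05, 0x00, 0x03, ref & 0xFF, total, seq])
        parts.append((udh, chunk))
    return parts


parts = split_gsm7("x" * 246, ref=0x2A)
```

For this 246-character input, `parts` holds two segments of 153 and 93 characters. Both headers carry the same reference number, which is how the receiver knows they belong to the same message.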


Exercise 2

Message Splitting

Watch how long messages get split into multiple SMS segments with UDH headers.

Single Message

Messages up to 160 characters fit in a single SMS packet using all 140 bytes.

Concatenation

Longer messages are split into segments. Each segment includes a 6-byte UDH (User Data Header) for reassembly, reducing capacity to 153 characters per segment.

The Cost

SMS pricing is per segment. Taking $0.0083 per SMS as an example rate, a 300-character message costs 2× as much as a single SMS because it requires 2 segments to send. At scale, these costs add up quickly.

Try it: Type more than 160 characters and watch how the message splits into segments. Notice how concatenation impacts the total cost!

Characters: 246
Segments: 2
Bytes: 280
Cost per SMS: $0.0083
Total Cost: $0.0166 (2 × $0.0083)
If it fit in 1 SMS: $0.0083
Extra cost: +$0.0083

Split into 2 segments. 6-byte UDH per segment.

Segments

Segment 1/2 (153 chars, 6-byte UDH): This is a longer message that will demonstrate how SMS concatenation works. When you type more than 160 characters, your message gets split into multiple
Segment 2/2 (93 chars, 6-byte UDH): segments, each with its own User Data Header (UDH). Keep typing to see more segments appear!

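The segment and cost figures follow from a small calculation, sketched here. The function names are hypothetical and the $0.0083 price is simply the example rate used in this exercise. The UCS-2 limits of 70 and 67 characters come from dividing 140 and 134 bytes by 2.

```python
import math


def segment_count(n_chars, ucs2=False):
    """Number of SMS segments needed for a message of n_chars characters."""
    single, per_part = (70, 67) if ucs2 else (160, 153)
    if n_chars <= single:
        return 1                       # no UDH needed, full capacity
    return math.ceil(n_chars / per_part)


def total_cost(n_chars, price_per_segment=0.0083, ucs2=False):
    """Cost of a message when every segment is billed separately."""
    return round(segment_count(n_chars, ucs2) * price_per_segment, 4)
```

For a 246-character message, `segment_count(246)` returns 2 and `total_cost(246)` returns 0.0166, matching the reading above. The same text with one emoji added would be counted with `ucs2=True` and would need more segments.
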
06

The Message Journey

An SMS is sent over a network in two stages.
Key Process

Two-Stage Delivery: Lookup (finding recipient) → Delivery (sending message)

Stage 1: The Lookup

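The sender's phone encodes the text into an SMS-TPDU carrying up to 140 bytes of user data (GSM-7 for plain text) and submits it to the SMSC (Short Message Service Center). The SMSC then uses the MAP protocol to send a query over the SS7 network to the HLR (Home Location Register), asking, "Where is phone number [Recipient's Number] right now?"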

[Figure: Stage 1, The Lookup: Sender → SMSC → (Query: Where?) → HLR → (Response: Location). Stage 2, The Delivery: SMSC → (SMS Data) → MSC → Receiver. Two-stage process: First locate the recipient, then deliver the message]
“The HLR checks its database and sends a response packet back to the sender's SMSC over the SS7 network.”

Stage 2: The Delivery

Now that the SMSC knows where the recipient is, it sends the actual message. The sender's SMSC wraps the message in an SS7 packet and sends it to the receiver's current MSC (Mobile Switching Center) over the SS7 control plane. The MSC receives the packet, unwraps it, and delivers the message to the receiver's phone.
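
The two stages can be modelled with a toy simulation. Everything below is illustrative: the class and method names are made up, the HLR is reduced to a dictionary, and none of the real MAP or SS7 message formats are represented.

```python
class HLR:
    """Toy Home Location Register: phone number -> name of current MSC."""

    def __init__(self, locations):
        self.locations = dict(locations)

    def lookup(self, number):
        return self.locations.get(number)    # None if the number is unknown


class MSC:
    """Toy Mobile Switching Center that hands messages to phones."""

    def __init__(self, name):
        self.name = name
        self.inbox = {}

    def deliver(self, number, payload):
        self.inbox.setdefault(number, []).append(payload)


class SMSC:
    """Toy Short Message Service Center running the two-stage process."""

    def __init__(self, hlr, mscs):
        self.hlr = hlr
        self.mscs = mscs

    def send(self, recipient, payload):
        if len(payload) > 140:
            raise ValueError("payload exceeds one 140-byte segment")
        msc_name = self.hlr.lookup(recipient)             # Stage 1: lookup
        if msc_name is None:
            return False
        self.mscs[msc_name].deliver(recipient, payload)   # Stage 2: delivery
        return True


msc = MSC("msc-east")
smsc = SMSC(HLR({"+15550100": "msc-east"}), {"msc-east": msc})
delivered = smsc.send("+15550100", b"hi")
```

After the call, `delivered` is `True` and the payload sits in `msc.inbox["+15550100"]`. Sending to a number the HLR does not know returns `False` without touching any MSC.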


07

The Modern Shift: UTF-8 Over IP

The 140 byte limit was quite restrictive, and thus we innovated once again.
Key Evolution

From SMS to RCS/iMessage: Moving from control-plane to data-plane (IP)

This new stack bypasses the old GSM signaling path entirely and rests on two key revolutions:

1. The "Over IP" Revolution

The new way (RCS/iMessage) travels on the Data Plane (the 'IP' part, which stands for Internet Protocol). This is your 4G, 5G, or Wi-Fi connection. RCS turns messaging from a telephony service into an internet service.

2. The "UTF-8" Revolution

UTF-8 is a variable-length encoding. It stores A (a simple ASCII character) in 1 byte, é (a common European character) in 2 bytes, 汉 (a Chinese character) in 3 bytes, and 😊 (an emoji) in 4 bytes.
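
These byte counts are easy to confirm with Python's built-in UTF-8 codec. The sketch below writes the four example characters as escape sequences so the code points are unambiguous. The dictionary keys are just labels chosen for this example.

```python
samples = {
    "A": "A",                   # simple ASCII character
    "e-acute": "\u00e9",        # é
    "han": "\u6c49",            # 汉
    "emoji": "\U0001F60A",      # 😊
}

# Number of bytes each character occupies once encoded as UTF-8.
utf8_sizes = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
```

The resulting `utf8_sizes` maps the four samples to 1, 2, 3, and 4 bytes respectively.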

“When you combine the 'superhighway' (IP) with the 'universal language' (UTF-8), you get all the features we now expect from a modern chat app.”

RCS and iMessage freed messaging from the ancient phone network. They abandoned the old approach on the SS7 control channel and turned messaging into what it is today: a modern, fast, and flexible internet application.

Sources

GSM Technical Specification 03.38

3GPP TS 23.040 - Technical realization of the Short Message Service (SMS)

ITU-T Recommendation Q.700 - Introduction to CCITT Signalling System No. 7

