2026-01-07 correction: I incorrectly asserted that the 5 codepoints took 6 bytes to represent. Brainfog moment; thank you Peter for pointing this out!

Back in 2018, I was an intern at facebook[1]. I was also very early into my transition, so I was very in-the-loop with my trans colleagues - and I heard about how one of the messaging teams (messenger? whatsapp? maybe both?) was going to be rolling out a trans pride flag emoji, and was encouraging early testing.

My first and immediate find? Copy-and-pasting the emoji into any other text field would result in a singular unknown glyph (like, say, �) - these clowns wanted to map it to a private use area codepoint! Fair enough, I suppose, for a corporation that wishes to ship on their own timelines - so I did some digging.

some background

I found no good prior art to go off of. I believe that the eventually-accepted proposal was circulating as a draft at the time, but I haven't been able to find any references or records of it from that period. I did find a reference to the unicode consortium rejecting a proposal for a trans flag that year - it might have been an earlier version of that proposal, or it might have been an alternate one. I do recall seeing requests for a new codepoint to be issued for a trans flag emoji glyph, and a proposal without all of the requisite U+FE0F's.

The initial approach and codepoint mapping taken a day or so later was 🏳‍⚧ - [U+1F3F3, U+200D, U+26A7]. That's pretty close to what the approved codepoints ended up being, and it likely displays correctly now, but it's lacking a couple U+FE0F's that were necessary at the time. Here's where things get into the weeds.

U+FE0F: Emoji Presentation Mode

Nowadays, if you go look at the listed codepoints for 🏳️‍⚧️, you'll see five codepoints[2] (characters) that make it up, with two of them being U+FE0F. U+FE0F can be slapped after a symbol - without a ZWJ! - to indicate that the preceding codepoint should be rendered in emoji presentation; her sister, U+FE0E, asks for text presentation instead. If you want to see this in action: [⚧︎⚧️] is [U+26A7 U+FE0E U+26A7 U+FE0F]. Same codepoint, two render modes!
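To poke at this yourself, here is a minimal Python sketch that builds both presentations of U+26A7 and lists the codepoints in the five-codepoint flag. The helper name `codepoints` is made up for illustration; whether each string actually renders as text or emoji depends entirely on your fonts and terminal.

```python
VS15 = "\uFE0E"  # text presentation selector
VS16 = "\uFE0F"  # emoji presentation selector

def codepoints(s):
    """Return the codepoints of a string in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in s]

symbol = "\u26A7"
text_style = symbol + VS15   # asks for the text glyph
emoji_style = symbol + VS16  # asks for the emoji glyph

# White flag, VS16, zero-width joiner, U+26A7, VS16
flag = "\U0001F3F3" + VS16 + "\u200D" + symbol + VS16

print(codepoints(text_style + emoji_style))
print(codepoints(flag))
```

The two selectors never change the base codepoint; they are just a hint to the renderer about which glyph to pick.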

There was some prior art on how to approach the trans flag codepoints - emoji presentation white flag, zero width joiner, and a rainbow emoji. So, just swap the rainbow emoji for U+26A7 and we're good to go, right?

U+200D: Zero-width joiner my beloved

There are a lot of technical details about what is and is not technically valid to do with zero-width joiners. I'm not here to say that there is a wrong way to use the ZWJ to combine codepoints, because language is always changing, but I was there to ensure that there was no technical reason to deny the trans flag proposal. The existing and eventually accepted proposal, L2/19-080, which I linked earlier, proposed a four-codepoint approach, but it lacked the trailing U+FE0F for ⚧.

You might see where I'm going with this: at the time, U+26A7 was a text-mode-only glyph. Including it in a ZWJ sequence was not valid without a following U+FE0F:

fully-qualified emoji zwj sequence — An emoji zwj sequence in which every default text presentation character (ED-7) is either followed by an emoji modifier or followed by an emoji presentation selector, and there are no other emoji or text presentation selectors in the sequence.
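As a rough sketch of how that rule plays out, the Python below checks a ZWJ sequence against the quoted definition. It leans on loud assumptions: the `DEFAULT_TEXT_PRESENTATION` set is a tiny hand-written table covering only the characters in this post (real code would read the Emoji_Presentation property from the Unicode data files), emoji modifiers such as skin tones are ignored entirely, and the function name is made up.

```python
ZWJ = 0x200D
VS15 = 0xFE0E  # text presentation selector
VS16 = 0xFE0F  # emoji presentation selector

# ASSUMPTION: hand-written table for this post only, not real Unicode data.
# White flag and U+26A7 default to text presentation.
DEFAULT_TEXT_PRESENTATION = {0x1F3F3, 0x26A7}

def is_fully_qualified_zwj(seq):
    """Simplified check of the quoted rule (emoji modifiers are not handled)."""
    if ZWJ not in seq:
        return False  # not a ZWJ sequence at all
    i = 0
    while i < len(seq):
        cp = seq[i]
        nxt = seq[i + 1] if i + 1 < len(seq) else None
        if cp in DEFAULT_TEXT_PRESENTATION:
            # Every text-default character must carry its own U+FE0F.
            if nxt != VS16:
                return False
            i += 2
        elif cp in (VS15, VS16):
            # A selector that is not attached to a text-default character.
            return False
        else:
            i += 1
    return True

five = [0x1F3F3, 0xFE0F, 0x200D, 0x26A7, 0xFE0F]
four = [0x1F3F3, 0xFE0F, 0x200D, 0x26A7]
three = [0x1F3F3, 0x200D, 0x26A7]
rainbow = [0x1F3F3, 0xFE0F, 0x200D, 0x1F308]

for name, seq in [("five", five), ("four", four), ("three", three), ("rainbow", rainbow)]:
    print(name, is_fully_qualified_zwj(seq))
```

Under these assumptions only the five-codepoint sequence and the rainbow flag pass: the rainbow itself defaults to emoji presentation, so it needs no selector, while a text-default U+26A7 does.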

At the time[3], ⚧ was a text-mode-only glyph; thus, it had to be followed by an emoji presentation selector. I suggested the 5-codepoint mapping, and also advised that there should be an emoji-rendering glyph for U+26A7 U+FE0F[4]. To my utter amazement, my feedback was followed!

The end result of this was that the 5-codepoint version of 🏳️‍⚧️ was the first one to be out in the wild. You could send it on fb messenger, or whatsapp, and it'd render as a nice emoji; you could copy/paste it to another client, and you'd get the fallback render of a flag and a symbol, much like during the lag a couple of years prior between 🏳️‍🌈 becoming an accepted sequence and vendors supporting its rendering. The next year, the trans flag emoji proposal was accepted, citing the sequence already existing on whatsapp and rendering as a first-class emoji - no new codepoints or allocations needed; it only needed the blessing of its codepoint sequence, and an update to U+26A7 saying that it is valid to use it in emoji presentation mode (and emoji presentation sequences).

This does actually mean that the four-codepoint sequence for 🏳️‍⚧️ is now valid: since ⚧︎ can now be displayed in an emoji presentation, it doesn't have to be specified as needing emoji presentation in a ZWJ sequence. It is unfortunate that the transgender approach to acceptance feels like it necessitates being technically correct beyond all reproach in order to force acceptance. Looking back, I think it's a bit of a shame that the advocates pushing for 🏳️‍⚧️ through the appropriate mediums were being rebuffed for not being totally technically accurate (as they weren't including any details about U+26A7 needing emoji presentation), and that no one technical would actually work with them on getting those details corrected earlier. It wasn't until a corporation, a vendor, whatever, broke ranks and just had a technically unassailable implementation that it was included into the fold.

if there is one mark i am glad to have left on this earth, it is this 5-codepoint sequence. 🩷🏳️‍⚧️🩷

1

early transition, included health insurance, enough money to unshackle me from my family, and also the only SRE-type internship offer I could secure at the time. I spent the whole summer being repeatedly misgendered by my manager :)

2

a unicode codepoint is just a single character, glyph, etc that can take up a variable number of bytes to represent. the tl;dr is that utf-8 accomplishes this in a really cute way, using up to 7 bits per byte for the codepoint and the remaining high bits to mark how the bytes fit together. Conveniently, this means that all 128 ascii values (0-127) are valid utf-8, giving us a convenient 1-byte codepoint set that's backwards compatible. Every codepoint in 🏳️‍⚧️ takes a minimum of 3 bytes to encode in UTF-8, with the first codepoint taking 4 bytes, giving us a total count of 16 bytes to encode 🏳️‍⚧️ into utf-8.
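If you want to check the arithmetic yourself, this small Python sketch encodes each codepoint of the five-codepoint sequence on its own and then the whole string. The variable names are just for illustration.

```python
# White flag, U+FE0F, zero-width joiner, U+26A7, U+FE0F
flag = "\U0001F3F3\uFE0F\u200D\u26A7\uFE0F"

# UTF-8 byte length of each codepoint, encoded individually
sizes = [(f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in flag]
total = len(flag.encode("utf-8"))

for cp, n in sizes:
    print(cp, n)
print("total bytes:", total)  # 4 + 3 + 3 + 3 + 3 = 16

# ASCII stays one byte per character
print(len("A".encode("utf-8")))  # 1
```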

3

now, U+26A7 has an emoji presentation mode. If my understanding of the current writing is correct, then [1F3F3 FE0F 200D 26A7] would be a minimally-qualified emoji, since the 26A7 is unqualified: it still defaults to text presentation.

4

The thought here was that if you sent 🏳️‍⚧️ to someone on a device that doesn't support rendering it, they should see 🏳️⚧︎, but if they copy/paste that ⚧︎ back to someone on a device that can render ⚧︎ as an emoji, it should. And also that it just simply makes sense on a technical level, to have an emoji rendering of every character making up a ZWJ sequence in case the ZWJ mapping isn't available yet - there just wasn't prior art of a new sequence mapping and new presentation mode of an existing character at the same time before.