As someone who grew up in libraries, and then went on to work on the YouTube CDN for a while, my opinions on digital media organization have evolved pretty significantly. I'll be outlining my personal media organization principles, some observations on the differing nuances between media types, physical media ripping & tagging, and a few other things.
The hope here is to show people that their own local media server is achievable with just a bit of intentional organization underpinning it. As a treat, I'll also cover a few miscellaneous playback hurdles with UHD Blu-rays and audio formats.
Music
My general thought here is that most users turn to streaming music providers due to the scaling limitations of the "traditional" music organization structure: {artist}/{album}/{track}. Once you get into the 1,000-10,000 artist range, it becomes difficult to interact with the library outside of a specific few artists. The first problem to tackle is: how do you organize the music files themselves? Certainly not like that.
Organization Principles
I'd been lugging around a folder full of .mp3s, .wavs and .oggs for upwards of a decade when I made the mistake of using beets to try and manage it. After beets crashed mid-run, I found myself needing to rebuild my library from scratch. The easiest way to do this 'data warehousing' was to organize all my music by "where did this come from?". My directory structure looks something like this:
- audio
  - physical
    - bd
    - cassette
    - cd
    - vinyl
  - digital
    - archive.org
    - bandcamp
    - indie
      - nielcic
    - msx
    - rips
    - sets
    - steam
    - takeouts
      - itunes
  - misc
Astute readers will note the load-bearing misc folder as the catch-all for all the old/legacy files. The goal is to gradually de-duplicate it against the other, better-organized source directories over time. At the very least, dumping your entire existing library into a misc folder is a great starting point.
Broadly, this enables a few different long-term strategies:
- rebuilding after data loss becomes easier to batch
- you can handle duplicates/overlaps cleanly, especially by choosing at the top-level "source" level, e.g. preferring bandcamp over steam, as Steam releases are sometimes poorly tagged mp3s.
- source/file provenance is usually not tagged, but this way the information is persisted
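As a minimal sketch of picking between duplicates at the "source" level, assuming you've already grouped paths you believe are the same track (e.g. by MusicBrainz recording ID), something like the following ranks each path by a hypothetical SOURCE_PRIORITY list of top-level directories and keeps the best one. The priority order and function names are illustrative assumptions.

```python
# Hypothetical preference order (earlier = more trusted), as paths relative to audio/.
SOURCE_PRIORITY = [
    "physical/cd",
    "physical/vinyl",
    "digital/bandcamp",
    "digital/indie",
    "digital/steam",
    "misc",
]


def source_rank(rel_path: str) -> int:
    """Return the index of the first matching source prefix; unknown sources rank last."""
    for rank, prefix in enumerate(SOURCE_PRIORITY):
        if rel_path == prefix or rel_path.startswith(prefix + "/"):
            return rank
    return len(SOURCE_PRIORITY)


def pick_preferred(duplicates: list[str]) -> str:
    """Keep the copy from the most trusted source; ties keep the first one listed."""
    return min(duplicates, key=source_rank)
```

Because the ranking only looks at the directory prefix, it works even when the duplicate copies have completely different tags or filenames.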
The flip side is that the relatively flat {artist}/{album}/{track} structure is useful as a 'browsing catalogue' of a library. Thankfully, this is also cheaply accomplished with hard-links.
Hard-linking and libraries
If you're not familiar with hard links and inodes, the gist is that an inode is the underlying "file" full of bytes, while a "link" is just a full path in the filesystem that points at it, e.g. /mnt/pool/library/soundtracks/Takeharu Ishimoto/すばらしきこのせかい + The World Ends with You/13. Three Minutes Clapping.flac. Hard-linking is just a way to provide access to the one file at two (or more) paths in a filesystem. Symbolic (soft) links are a weaker version of this: they point at another path rather than at the inode, so they break if the original path moves or is deleted.
I ended up writing my own lil solution, which lets me configure a directory (and its descendants) to be hard-link deployed to one or more libraries, e.g. my steam directory goes to Soundtracks, while bandcamp and indie go straight to Music (with some soundtrack releases overridden manually).
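This isn't my actual tool, just a minimal sketch of the same idea under some assumptions: a hypothetical DEPLOYMENTS mapping from source directories to library names, and a deploy step that mirrors each source's relative layout into the library via os.link, skipping files that are already linked (same inode). Hard links only work within one filesystem, so both trees must live on the same mount.

```python
import os
from pathlib import Path

# Hypothetical mapping: source dir (relative to the audio root) -> library names.
DEPLOYMENTS = {
    "digital/steam": ["soundtracks"],
    "digital/bandcamp": ["music"],
    "digital/indie": ["music"],
}

AUDIO_EXTS = {".flac", ".mp3", ".ogg", ".opus", ".m4a", ".wav"}


def deploy(source: Path, library: Path) -> int:
    """Hard-link every audio file under `source` into `library`, mirroring relative paths.

    Returns the number of new links created. Raises OSError (EXDEV) if the two
    trees are on different filesystems.
    """
    created = 0
    for src in sorted(source.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in AUDIO_EXTS:
            continue
        dst = library / src.relative_to(source)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.exists():
            if os.path.samefile(src, dst):
                continue  # already deployed: both paths point at the same inode
            raise FileExistsError(f"{dst} exists and is a different file")
        os.link(src, dst)
        created += 1
    return created


def deploy_all(audio_root: Path, library_root: Path) -> int:
    """Run every configured deployment; missing source dirs are skipped."""
    return sum(
        deploy(audio_root / src, library_root / lib)
        for src, libs in DEPLOYMENTS.items()
        for lib in libs
        if (audio_root / src).is_dir()
    )
```

This mirrors the source layout as-is. Re-laying files out as {artist}/{album}/{track} would additionally require reading tags, which is left out here.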
Ripping and Tagging
I really like Navidrome's tagging guidelines, particularly the parts around multi-value tags.
Ripping CDs is pretty straightforward in this era: MusicBrainz exists, and you can look up discs by hashing their Table of Contents (ToC), or by fuzzy-matching the track lengths the ToC encodes. Whipper has been my preferred solution here; on the occasion a disc doesn't have a MusicBrainz match, Whipper gives me a nice URL to go do some data entry at. Some discs already have releases on MusicBrainz (but no ToC hash attached), while others need full data entry.
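For the curious, the ToC hash is the MusicBrainz disc ID. Here is a minimal sketch of that calculation as described on the MusicBrainz wiki: SHA-1 over the uppercase-hex first and last track numbers, the lead-out offset, and 99 track-offset slots (unused slots are zero), then base64 with '.', '_' and '-' substituted for '+', '/' and '='. Whipper/libdiscid do this for you when reading the drive. The function name is hypothetical, and offsets are in sectors including the standard 150-sector lead-in.

```python
import base64
import hashlib


def musicbrainz_disc_id(first_track: int, last_track: int,
                        leadout: int, offsets: list[int]) -> str:
    """Compute a MusicBrainz-style disc ID from a CD's table of contents.

    `offsets` holds each track's start sector (first..last), lead-in included.
    """
    if len(offsets) != last_track - first_track + 1:
        raise ValueError("need exactly one offset per track")
    parts = [f"{first_track:02X}", f"{last_track:02X}", f"{leadout:08X}"]
    slots = [0] * 99  # slot k holds the offset of track k+1
    for i, off in enumerate(offsets):
        slots[first_track - 1 + i] = off
    parts.extend(f"{x:08X}" for x in slots)
    digest = hashlib.sha1("".join(parts).encode("ascii")).digest()
    # URL-safe-ish base64 variant used by MusicBrainz.
    return base64.b64encode(digest).decode("ascii").translate(str.maketrans("+/=", "._-"))
```

Usage: musicbrainz_disc_id(1, 3, 95000, [150, 20000, 50000]) for a made-up three-track disc.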
For digital releases, things get a bit less clean. Chromaprint exists as a way to roughly fingerprint audio files, and AcoustID exists as a service for mapping those fingerprints to MusicBrainz recording IDs. However, this is insufficient for confidently identifying a track: a single recording may appear on numerous releases (e.g. tracks appearing on compilation albums), so the appropriate pipeline looks something like:
- chromaprint all files locally
- run them through AcoustID for MusicBrainz Recording IDs
- for every Release that our recordings might be part of, run Kuhn–Munkres (the Hungarian algorithm) over the files in the same directory (or neighboring directories, if a release has multiple discs/mediums/whatever you want to call them), forming the optimal packing of files into each candidate release
- do some rounds of graph-theoretic maximum independent set (MIS) selection over those candidates to achieve the most coverage of all of our files
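As a rough sketch of the Kuhn–Munkres step, assume each candidate release gives you a list of (recording ID, duration) tracks, and each local file carries the set of recording IDs AcoustID returned plus its duration. Under those assumptions, an O(n^3) Hungarian algorithm can assign files to tracks at minimum total cost. The cost function (duration difference plus a flat penalty when AcoustID disagrees) and the miss_penalty value are illustrative assumptions. In practice you'd likely reach for scipy.optimize.linear_sum_assignment rather than hand-rolling this.

```python
import math


def hungarian(cost: list[list[float]]) -> list[int]:
    """Min-cost assignment of each row to a distinct column (requires rows <= cols)."""
    n, m = len(cost), len(cost[0])
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: 1-based row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [math.inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the recorded path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    result = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            result[p[j] - 1] = j - 1
    return result


def match_release(tracks, files, miss_penalty=1000.0):
    """tracks: [(recording_id, seconds)]; files: [(set_of_recording_ids, seconds)].

    Returns, per track, the index of the matched file, or None if no good match.
    """
    n = len(tracks)
    m = max(len(files), n)  # pad with dummy "missing file" columns
    cost = []
    for rec_id, t_len in tracks:
        row = []
        for j in range(m):
            if j >= len(files):
                row.append(miss_penalty)  # leaving the track unmatched
                continue
            f_recs, f_len = files[j]
            c = abs(t_len - f_len)
            if rec_id not in f_recs:
                c += miss_penalty  # AcoustID doesn't think this file is that recording
            row.append(c)
        cost.append(row)
    assign = hungarian(cost)
    return [j if j < len(files) and cost[i][j] < miss_penalty else None
            for i, j in enumerate(assign)]
```

The total cost of each release's packing is then a natural score to carry into the selection step.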
There's a good bit there I'm glossing over. The takeaway is that, as far as I'm aware, no off-the-shelf software besides MusicBrainz Picard does all of that pipelining anywhere close to properly.
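For the final selection step, exact maximum (weight) independent set is NP-hard in general, so a minimal sketch might use a greedy heuristic instead. Treat each candidate release packing as a node, and consider two candidates in conflict if they claim the same file. Then repeatedly take the largest (then highest-scoring) candidate that doesn't conflict with anything already chosen. The candidate tuple shape and scoring are assumptions. This is an approximation, not necessarily what any particular tool does.

```python
def pick_releases(candidates):
    """candidates: [(release_id, files, score)], where `files` is an iterable of paths.

    Greedy approximation of a maximum independent set in the conflict graph
    (edges between candidates that share a file), favoring coverage, then score.
    """
    chosen, claimed = [], set()
    ordered = sorted(candidates, key=lambda c: (len(set(c[1])), c[2]), reverse=True)
    for release_id, files, _score in ordered:
        files = set(files)
        if claimed.isdisjoint(files):  # no conflict with anything already picked
            chosen.append(release_id)
            claimed |= files
    return chosen
```

Running a few rounds (re-scoring leftover files against the remaining candidates after each pass) gets closer to the "most coverage" goal than a single greedy pass.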
Local MusicBrainz & Sonic Analysis
I've found that a local MusicBrainz replica is both really easy to set up and generally critical for doing any local tag-matching. The rate limits on the public API are pretty low (about one request per second, due to scrapers etc., afaik).
My disc-ripping pipeline is still set up to do the disc ToC-hash lookups against the public instance. That way I can do data entry on previously unmatched discs and then re-rip without needing to wait for my local replica's hourly sync. The rest of the tag-matching pipeline works off of the local replica after getting AcoustID matches. Unfortunately, last I checked, there's no easy way to run a local AcoustID replica.
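Here is a minimal sketch of that split. It assumes a local mirror listening on http://localhost:5000 (the musicbrainz-docker default as far as I know; adjust to yours). Disc ID lookups go to the public ws/2 API behind a roughly one-request-per-second throttle, while recording lookups hit the local replica unthrottled. The helper names and USER_AGENT string are hypothetical placeholders. The /ws/2/discid and /ws/2/recording endpoints and their inc/fmt parameters are the standard MusicBrainz web service ones, and the public API expects a descriptive User-Agent.

```python
import json
import time
import urllib.request

PUBLIC = "https://musicbrainz.org"
LOCAL = "http://localhost:5000"  # assumption: address of your local mirror
USER_AGENT = "my-tagger/0.1 ( you@example.com )"  # placeholder: use your own contact info


class Throttle:
    """Ensure at least `min_interval` seconds elapse between calls to wait()."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = None

    def wait(self) -> None:
        if self._last is not None:
            delay = self._last + self.min_interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
        self._last = time.monotonic()


public_throttle = Throttle(1.0)


def discid_url(disc_id: str) -> str:
    """Disc ID lookups go to the public server, so fresh data entry is visible immediately."""
    return f"{PUBLIC}/ws/2/discid/{disc_id}?inc=recordings&fmt=json"


def recording_url(mbid: str) -> str:
    """Recording -> releases lookups go to the local replica."""
    return f"{LOCAL}/ws/2/recording/{mbid}?inc=releases&fmt=json"


def fetch_json(url: str) -> dict:
    if url.startswith(PUBLIC):
        public_throttle.wait()  # only the public API is rate-limited
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```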
On the sonic analysis side of things, I use AudioMuse-AI for local analysis, with its Navidrome plugin to complete the loop and glue things together with the instant mix/radio mix results. It's pretty alright: the musiCNN and LAION-CLAP stuff is okay. I'm not going to pretend to know the most about either, but I've been thinking about trying to hack something together using MERT or MuQ to see how they stack up.
For now, the AudioMuse parts are totally fine for giving me a Spotify-like shuffle through my Very Large Music Library. Beyond that, it's a bit much for my purposes: I found the playlist generation etc. to be of poor-to-mid quality, and at some point they enabled a local Whisper lyrics-transcription feature that massively slowed down my analysis steps.
Misc Notes
For all the non-MusicBrainz-matched files (e.g. the long tail of indie music, bootlegs, mixtapes, live set mp3s, etc.), there's still plenty of work to do. The classic problem I encountered was tags differing by a small Levenshtein distance, e.g. DragonForce vs Dragonforce, but there's a handful of locally bulk-fixable tag issues:
- Emitting Navidrome-style multi-value ARTISTS tags from a single 'flat' ARTIST tag (e.g. keeping ARTIST=deadmau5 vs Melleefresh and adding ARTISTS=deadmau5 and ARTISTS=Melleefresh)
- Same thing for GENRE tags, as a lot of released music or other tag managers will mis-render Electronic Metal as GENRE=Electronic, Metal rather than GENRE=Electronic and GENRE=Metal
- Albums with inconsistent ALBUMARTIST tags between tracks, preventing them from rendering as one album in clients
- For the above, Navidrome-style ALBUMARTISTS (for individual credits) vs ALBUMARTIST (for the overall rendered string) once again comes to the rescue
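Here is a minimal sketch of the first two bulk fixes plus the DragonForce/Dragonforce problem. It splits flat ARTIST/GENRE strings into multi-value lists on a hypothetical set of separators, then folds near-identical artist spellings onto the most common one by case-insensitive Levenshtein distance. Reading and writing the actual tags is left out. The separator patterns and max_distance threshold are assumptions to tune. "&" is deliberately not a separator (Simon & Garfunkel), and a distance of 1 will happily merge Tool and Toto, so review the mapping before writing anything back.

```python
import re
from collections import Counter

# Hypothetical separators; extend carefully for your library.
ARTIST_SEPS = re.compile(r"\s+(?:vs\.?|feat\.?|ft\.?)\s+", re.IGNORECASE)
GENRE_SEPS = re.compile(r"\s*[,;/]\s*")


def split_artists(flat: str) -> list[str]:
    """'deadmau5 vs Melleefresh' -> ['deadmau5', 'Melleefresh'] (for ARTISTS=...)."""
    return [a for a in ARTIST_SEPS.split(flat.strip()) if a]


def split_genres(flat: str) -> list[str]:
    """'Electronic, Metal' -> ['Electronic', 'Metal'] (for repeated GENRE=...)."""
    return [g for g in GENRE_SEPS.split(flat.strip()) if g]


def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def canonical_spellings(names: list[str], max_distance: int = 1) -> dict[str, str]:
    """Map each spelling to the most frequent spelling within `max_distance` (case-insensitive)."""
    reps: list[str] = []
    canon: dict[str, str] = {}
    for name, _count in Counter(names).most_common():
        key = name.casefold()
        match = next((r for r in reps if levenshtein(key, r.casefold()) <= max_distance), None)
        if match is None:
            reps.append(name)
            match = name
        canon[name] = match
    return canon
```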
Identifying genuine duplicates is a whole other can of worms. Again, the directory-level organization strategy helps immensely in tackling these problems at scale.
Videos
Videos are the relatively easy category when it comes to organizing and tagging, as most of the cost comes from the storage overhead itself. Filenames alone are largely sufficient for mapping to a given movie or TV show episode, as the production floor is much higher and there's a good amount of data available in TMDB, TVDB, etc. The problem is actually getting those files.
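As an illustration of filenames carrying enough information, here is a minimal sketch that pulls a show name, season and episode out of a conventional SxxEyy-style filename (e.g. Sonarr-style or scene-style names). The regex is an assumption that covers common patterns, not every naming scheme. A real pipeline would then look the show up in TMDB/TVDB.

```python
import re

# Assumption: "<show><separator>SxxEyy..." naming; case-insensitive.
EPISODE_RE = re.compile(
    r"^(?P<show>.+?)[\s._-]+S(?P<season>\d{1,2})E(?P<episode>\d{1,3})",
    re.IGNORECASE,
)


def parse_episode(filename: str):
    """Return (show, season, episode), or None if the name doesn't look like an episode."""
    m = EPISODE_RE.search(filename)
    if not m:
        return None
    show = re.sub(r"[._]+", " ", m["show"]).strip(" -")  # dots/underscores -> spaces
    return show, int(m["season"]), int(m["episode"])
```

Usage: parse_episode("The.Expanse.S02E05.1080p.mkv") gives ("The Expanse", 2, 5).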
The organization side of this house is much easier: again, organize by provenance (dvd, blu-ray, uhd-blu-ray, web), and Sonarr/Radarr are pretty reasonable for managing the hard-links side of things.
Ripping and Naming
The hard part here is producing correctly organized rips from local media. DVDs and Blu-ray discs only really give you a disc label and 'titles' to work with, and the list of titles isn't necessarily in any sensible order related to episode order, nor is the first title necessarily the full movie, etc. Thankfully, identifying the correct title to rip on a movie disc is pretty easy, as duration alone tends to be sufficient.
There are a couple of ways to consistently identify and match titles to episodes:
- compare the title's existing subtitles against reference subs pulled down (and cached) from OpenSubtitles
- do local transcription if (and only if) the title has no subs, but reference subs exist
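Here is a minimal sketch of the subtitle comparison. It assumes you've already extracted plain dialogue text (no timestamps) from both the title's subs and the cached reference subs. It scores each reference episode by Jaccard similarity over word 3-grams and takes the best match. The shingle size, scoring, and function names are illustrative choices. For a whole disc, you'd probably feed these scores into an assignment step rather than taking each title's best match independently.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of lowercase word k-grams, which tolerates small wording/timing differences."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_episode(title_text: str, reference: dict[str, str], k: int = 3) -> tuple[str, float]:
    """reference: {episode_label: dialogue_text}. Returns (best_label, similarity)."""
    probe = shingles(title_text, k)
    score, label = max((jaccard(probe, shingles(ref, k)), lbl) for lbl, ref in reference.items())
    return label, score
```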
Generally, the harder part is determining which titles to rip in the first place. TV episodes are usually of a relatively consistent length, but double-length feature episodes (season premieres, season finales, etc.) are both plentiful and typically inconsistently tagged on external DBs. Additionally, particularly in the case of my BSG box set, there are uncut/non-broadcast editions of episodes, which don't show up in online listings at all. My solution here was to simply rip the uncut editions and use those as 'upgrades', rather than trying to rip both and fit them into the existing season structure.
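Here is a rough sketch of title selection, assuming you have each title's duration in seconds (e.g. from a disc scan). For movies, just take the longest title. For TV, classify each title against the median title length as a regular episode (~1x), a double-length episode (~2x), or something to skip. The tolerance is a hypothetical knob. This heuristic breaks down on discs where extras outnumber episodes, and a "play all" title on a two-episode disc will look like a double-length episode. Treat it as a starting point for review rather than a final answer.

```python
from statistics import median


def pick_movie_title(durations: dict[int, float]) -> int:
    """Movie discs: the longest title is usually the feature."""
    return max(durations, key=durations.get)


def classify_titles(durations: dict[int, float], tolerance: float = 0.2) -> dict[int, str]:
    """Label each title 'episode', 'double' or 'skip' relative to the median length."""
    med = median(durations.values())
    labels = {}
    for title, secs in durations.items():
        ratio = secs / med
        if abs(ratio - 1) <= tolerance:
            labels[title] = "episode"
        elif abs(ratio - 2) <= 2 * tolerance:
            labels[title] = "double"  # premiere/finale candidates
        else:
            labels[title] = "skip"    # extras, menus, play-all titles, etc.
    return labels
```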
A note on formats
Playback of video formats tends to have two main problem cases: hardware decode of video codecs, and audio format licenses.
The video playback side of things is pretty fine. AV1 (especially paired with TrueHD) might cause some problems down the line, but I suspect most BD players only barely support HEVC (H.265) as it is. H.264 (AVC) has thankfully achieved widespread compatibility. HDR and Dolby Vision passthrough have largely just worked without any consideration.
As it turns out, most/all Blu-rays use TrueHD for their 'best' audio stream, which is especially relevant for Dolby Atmos. Both my smart TV and Apple TV refused to (or couldn't) simply pass the TrueHD audio bitstream through to my AVR. From what I can tell, this is purely due to licensing, since they could pass through E-AC-3 and basically every other format available. My old Nvidia Shield TV (2019) does the direct passthrough to my AVR just fine, though, as does any sensible Linux HTPC, as far as I'm aware.
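To find which files will hit this problem on a given client, here is a minimal sketch that uses ffprobe's JSON output to list streams whose codecs fall outside a hypothetical client capability set. The CLIENT_CAPS contents are an assumption you'd fill in per device (e.g. a streaming box that won't bitstream TrueHD). The codec names are ffmpeg's (truehd, eac3, hevc, av1, and so on).

```python
import json
import subprocess

# Hypothetical capability set for one client; adjust per device.
CLIENT_CAPS = {
    "video": {"h264", "hevc"},
    "audio": {"ac3", "eac3", "aac", "flac"},
}


def probe_streams(path: str) -> list[dict]:
    """Return ffprobe's per-stream index/type/codec info for a media file."""
    out = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "stream=index,codec_type,codec_name",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["streams"]


def problem_streams(streams: list[dict], caps: dict = CLIENT_CAPS) -> list[tuple[int, str]]:
    """(index, codec) for video/audio streams the client can't play or pass through."""
    return [
        (s["index"], s["codec_name"])
        for s in streams
        if s.get("codec_type") in caps and s.get("codec_name") not in caps[s["codec_type"]]
    ]
```

Usage: problem_streams(probe_streams("movie.mkv")) on a UHD rip would typically flag the TrueHD track for a client like the one described above.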
Swapping down to the basic 5.1 E-AC-3 track will usually work fine, but I have noticed the mixing just isn't as good, and dialogue volume is distinctly quieter when I switch from the 5.1.2 mix to a 5.1 mix on the same disc. To put on my "please be patient I have autism" hat for a moment, I've regrettably kinda found the full 5.1.2 to be necessary for a good home listening experience, partially due to the amount of audio that is "lifted" out of the main sound plane, and partially due to the resulting volume mixing being, well, better.
The thing about Dolby Atmos is that it is unfortunately the only really meaningful improvement beyond 2.1.0 mixing that isn't just about "having an audible dialogue channel in the center channel". The whole jazz about how individual snippets of audio get spatially placed into the room is a pretty meaningful advancement, in my opinion. Dialogue comes from the middle of the room, while weather noise is overhead, and crowd audio is a bit more distant or otherwise comes from the edges of the room. I usually struggle a bit with auditory processing on media and am incredibly reliant upon subtitles to follow along, but Game of Thrones, for example, is much easier to audibly parse.
The frustrating part is that a 5.1.2 setup requires 8 speakers (or 7 if you just, don't have a subwoofer?), with two of them ceiling-mounted or upfiring. The floor for entry here is tremendous, and then you hit the fun problem of "the Apple TV and the TV itself only support the Netflix-tier codecs and not the UHD Blu-ray raw TrueHD". Proper HTPCs shouldn't have an issue with any of this, though, afaik, since their HDMI drivers pretty consistently handle bitstreaming the audio formats out if the AVR advertises the capability, which, largely, they do.
Closing Notes
Trying to combat tag-and-ID entropy feels like an impossible struggle, but I believe it's reasonably achievable in this day and age. TV/movie organizing is easiest (with the hardest part being the actual rip-from-disc step), especially thanks to Jellyfin's front-end "Identify" button for re-classifying anything that was mis-classified on the first pass.
Music remains the hardest, especially when it comes to genre tags and identifying the exact correct release. Thankfully, we live in an era where bulk local computation is cheap, so I think the hardest part is the UX of tag-correction tooling more than anything else.