One of the most interesting (and vexing) things about the audio hobby, to me, is not necessarily its inherently subjective nature, but rather how that subjectivity extends to terminology and the resulting lack of structure in said terminology. Nearly everyone has their own way of articulating what they hear, and most do so by making use of jargon. Jargon, mind you, whose meaning frequently overlaps with, but also differs (sometimes significantly) from, others’ interpretations. Heck, you could be arguing with someone about what each of you is hearing when, in fact, you both hear something very similar!
This presents a larger problem in itself; however, it’s not exactly a fixable one given, again, the subjective nature of this hobby. And of course, there are times when what you hear differs so significantly from someone else’s impressions that you simply have to chalk it up to a difference of opinion. But frankly, it’s disconcerting when you don’t have context for the words others are using. So in this post, my aim is to better explain what I’m hearing, how I define what I hear, and what I attribute it to. As usual, I need to disclaim that for all intents and purposes, these are just my opinions. You’re totally welcome to agree or disagree with them as you see fit!
I mostly prioritize more balanced, resolving sound signatures. However, I’ve also found that these types of signatures generally don’t give me the “musicality” I’m looking for, and in such cases I tend to shift closer towards an L-shaped sound signature. I do like my bass, after all, and I’m not much of a treble-head. More decay in the bass is almost always a plus for me. I like my vocals to be on the slightly thicker side of things, and for them to be imaged further back on the soundstage – yeah, I’ll talk about that below. I also enjoy a moderate amount of coloration, or warmth, in my IEMs.
I mainly rotate through about five genres of music: 1) Weeby K-Pop/J-Pop, 2) Pop, 3) Country music (yeah, bet you didn’t see that one coming), 4) EDM, and 5) Instrumentals and music scores.
Six months ago, I couldn’t have articulated what I was hearing with IEMs to save my life; after all, I was still a newcomer to the hobby. And hey, I still am. I think it’s important to recognize personal limitations, so I won’t be going crazy in-depth with these explanations, nor do I pretend to understand all the science behind what I hear. In general, I understand technical ability (what I’ll be covering here) to be everything beyond the scope of what a frequency response graph reflects.
Bass texture: This is a sort of micro-splicing, a granular quality in the decay of the bass. To me, it’s part of what gives a good DD bass response that raw, natural quality. As such, I find it to be one of the key distinctions between a DD and a BA driver; BAs often decay too quickly to produce it and have a “smoothed” quality to their bass. However, this isn’t a hard-and-fast rule; there are certainly exceptions. The CA Solaris 2020 is a good example of a hybrid whose DD bass has very little texture; conversely, the 64 Audio U12t is an all-BA IEM that has a decent amount of texture. While texture isn’t the be-all and end-all of what defines a good bass response in my eyes, it is a large component of it.
BA timbre, note density, and timbral coloration: Ah man, timbre. There are different types of timbre that can generally be attributed to the pattern of decay. However, I tend to just group everything under a catch-all: timbral coloration. The way I understand timbral coloration might also differ from canon. I like to simply visualize everything I’m hearing as being in a photo. Anything that obscures what I’m seeing, or rather, hearing, is what I define as “coloration”. But there’s good and bad coloration. Warm coloration can be a good thing, akin to throwing, say, a lens filter over the photo. It keeps things musical and keeps them from getting too clinical. IEMs that I think do this well include the 64 Audio Nio, Sony IER-M9, and Moondrop KXXS. Of course, you can also have excessive amounts of warmth, and in the context of my photo example, this would be like slapping on filter after filter. With IEMs, this can come at the expense of technicalities; the CA Solaris 2020 and Jomo Audio Trinity Brass exemplify this. On the other hand, the Vision Ears VE8 is a very warm IEM that sidesteps this pitfall, but it comes with the plain bad type of coloration: BA timbre.
I’m not going to debate whether BA timbre is a result of the frequency response or the driver type itself; what matters most to me is that it’s most definitely present. So what is it exactly? It’s a type of “plastickiness” that presents itself as a lack of density to notes. Oftentimes, I notice it most in bass responses. When you hear a drum kick or a drop smack down, you expect to hear some semblance of weight behind it, but BAs have a tendency to neuter this density as well as the decay function. I think a lot of people notice it in the treble too; this matters because the treble contributes a lot to the overall character of an IEM, i.e. whether it comes across as bright, warm, or dark. And it doesn’t help that BAs tend to roll off in the treble. Sometimes, this dives straight into smothered, BA “artifact” territory; the Thieaudio Legacy 3 and Empire Ears Wraith (yikes, this IEM is a meme) are good examples of this. Other types of coloration can include “metallic” and “plucked” qualities to the timbre. The Dunu Luna is characterized by the former, the Audeze LCD-i4 by the latter.
Coherency vs. Cohesiveness: I’ve mixed these up and used them interchangeably in the past, so it’s about time I set concrete definitions. Coherency often comes up when talking about hybrid IEMs, i.e. IEMs that combine different driver types. Because you’re mixing and matching driver types, perfect coherency is very difficult to achieve. As such, for me, a lack of coherency means being able to discern the different drivers handling their respective parts of the frequency response. Examples of IEMs with poor coherency include the 64 Audio Nio, the Vision Ears Elysium, and the Jomo Audio Trinity Brass.
Conversely, cohesiveness is more of a catch-all term that represents how everything syncs together in practice. AKA, while I don’t quite know how to explain it exactly, dang, I really like something about this IEM – or vice versa. And here, I need to eat my words about the CA Solaris 2020. It is indeed very coherent for a hybrid, in the sense that I can’t discern the separate driver types; however, I would not consider it cohesive at all. The textureless bass, gritty midrange, and sparkly treble are all over the place and give the impression of something less than refined.
Imaging: There’s a lot to talk about here; I often see this thrown around hand-in-hand with soundstage. As I understand it, though, imaging is how an IEM’s two channels create a three-dimensional sense of space, shaping the “room” around you. Thus, soundstage is actually a derivative of imaging. IEMs that are able to convincingly shape said room are what I qualify as “holographic”; the Campfire IEMs and the Sony IER-M9 are good examples. Personally, I also extend imaging to a couple of other points:
1) Positional cues. This represents the clarity (not to be confused with resolution) with which one is able to discern where on the stage specific instruments are coming from. Most top-tier IEMs have very good positional cues, even if I would not consider them to have holographic imaging or the following point, projection.
2) Projection or diffusion capability. This one is a little trickier, and represents the amount of distance a transducer is able to perceptually create between the stage and the listener. To an extent, this goes hand-in-hand with soundstage size, but for me, it often ties into vocal placement. IEMs that image the vocals further back are able to create a greater sense of depth; the 64 Audio U12t and Etymotic ER2XR are prime examples of this. I would not, however, go so far as to say either has particularly holographic imaging, or even great imaging for that matter, particularly the ER2XR. The ER2XR’s staging is very narrow, and the U12t lacks soundstage height to my ears. Conversely, IEMs I would consider to project vocals poorly include the Sony IER-M9, qdc Anole VX, and Etymotic ER3XR. All of these have that “in-your-head” effect for vocals specifically, to my ears, despite the VX having decent stage size. You’ll also note that I do consider the IER-M9 to have holographic imaging; again, these are distinct components of imaging to me.
Macro-dynamics: These are the decibel shifts in a recording, from the quiet build-ups to the loud peaks, and how well an IEM scales them. Think of riding a roller-coaster, for example. You want to move quickly, but you also want to hit all those build-ups, peaks, and free-falls, or the fun is lost. IEMs that do this well include the Dunu Luna, 64 Audio U12t, and Moondrop Blessing 2. Most BA IEMs are indeed quite fast, but I find they fail to scale these shifts, resulting in something that I call “compression”. Examples of IEMs that I would consider compressed to varying extents include the CA Andromeda 2020, qdc Anole VX, and Apple AirPods Pro (to make the point that this isn’t limited to BA IEMs).
Resolving capability: For me this is a joint term that takes into account pure resolution, layering capability, and detail retrieval.
1) Pure resolution is how crisp a note is, how cleanly it is articulated. This (and resolving capability as a whole) generally goes hand-in-hand with transient speed, that is to say, an IEM’s attack function. In general, BAs tend to be quicker than their DD counterparts, and IEMs with a quicker attack are thus able to better flesh out notes. This is not at all a concrete rule, nor is it necessarily a good thing, however. Being too fast can result in an unpleasant grittiness to notes, as with the Fearless Audio S8P’s midrange, or in straight-up fatigue, as with the qdc Anole VX’s transients (to my ears), despite high levels of resolution. Then you have IEMs like the Sony IER-Z1R and IER-M9, which seem to be slower yet have surprisingly good resolution.
2) Layering capability is essentially the equivalent of separation, and it has some overlap with positional cues. Basically, do instruments ever jumble together or smear? IEMs with more open soundstages and “air” between notes generally do this better, since there’s more room to place each instrument. The CA Andromeda 2020, Audeze LCD-i4, and qdc Anole VX all excel at this. However, some IEMs with smaller stages, like the 64 Audio U12t and Dunu Luna, also do this well.
3) Detail retrieval is a bit different from the other two, and simply reflects an IEM’s capacity to, well, hammer out smaller details in a track and pull them to the forefront of the sound. Pretty self-explanatory, I think.
These are working definitions, and I’ll likely adjust them in the future as my understanding develops or more nuances pop up. Hopefully you guys find this helpful, as I know I’ve thrown around some of these terms before without much context. Heck, I’m not even sure I fully understood the context myself at times, and I had to do some thinking about how I wanted to explain some of this stuff.