Something you might note below is my copious use of “does not significantly influence”. I am not willing to completely rule out some popular audiophile remedies even if, by the science, they should not make meaningful differences. This is 1) based on my personal experience, and 2) a matter of principle that stems from the belief that the current science doesn’t hold all the answers to how we perceive sound.
It follows that my stance on the objective vs. subjective debate is a more pragmatic one. Allow me to state that I am 100% onboard with learning about the science behind audio and what we hear. But simultaneously, I don’t see why it has to devolve into the ever-pervasive “us vs them” mentality. For example, it is a common sentiment in the objectivist community that every aspect of a headphone’s sound is based in frequency response because headphones are minimum-phase transducers. This often results in bashing of reviews that talk about subjective qualities of sound such as “dynamics”, “slam”, or “speed”. This is problematic because these are qualities that a good number of listeners can actually perceive! Perceive is the key word here. There is nothing wrong with assigning subjective descriptors and using lingo – with the proper context – to better relate the listening experience of a headphone to readers. Heck, that’s what a good reviewer should be able to do! The goal in my writing is a marriage of subjective and objective elements to paint a more nuanced picture.
In any case, here are my beliefs on some popular audiophile topics:
- I do not believe burn-in significantly influences an IEM’s sound. One of my main issues is that the phenomenon is always reported as a positive. On a mechanical level, yes, drivers should “break-in” simply because they are a moving part, but one would expect the phenomenon to be applicable both ways if such is the case (think wear and tear). It’s worth noting that changes in frequency response for IEMs have been measured over time; however, they are extremely minor. Thus, the extent of my belief in burn-in is mostly limited to one’s brain and ears getting used to the sound. More burn-in time, at least for me, really just means more time to get familiar with a given transducer’s sound and critique it more harshly; you’ll notice that my full reviews are almost universally more critical than the first impressions.
- I do not believe that cables significantly influence an IEM’s sound. I think that to a small extent they may audibly shift frequency response, often via impedance, or affect intangibles via better shielding. But swapping cables to improve an already poor-sounding IEM is mostly an exercise in futility.
- I do not believe that sources or “more power” significantly influence an IEM’s sound. I absolutely believe in differences of sound between sources; however, they tend to be very minor to my ears. Generally, the only instance in which I have found “more power” to make a meaningful difference with IEMs is with some that use ESTs. This is simply because these IEMs do not have the ESTs implemented correctly.
- I do believe that tips affect an IEM’s sound. Unequivocally. However, those differences tend to be minor; this is measurable. So for me, it’s more along the lines of “I don’t care enough to use anything outside what the manufacturer provides for reasons of consistency”. When you introduce aftermarket tips into the equation, there’s suddenly way more things to deal with! At what point do you stop testing tips? Someone is always going to tell you, “Oh, you didn’t test this, so your impressions are invalid”. And what if there is an IEM that’s make it or break it with a certain pair of tips? Am I supposed to tell people, “Well, shoot, you need to buy X tips with this IEM or else you effectively have a paper weight”. I’d certainly hope not. And if you did need to, I’d argue that’s a problem with the IEM itself more than anything. It’s the same idea with sources which, again, I absolutely believe make an audible difference. While tips are lower hanging fruit, there’s only a certain amount of leeway that I can afford an IEM, and (for me) this stops at the assortment of tips that a manufacturer provides.
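To put a rough number on the impedance point from the cable and source bullets above, here is a minimal sketch. It assumes the cable or source behaves as a plain series resistance and treats the IEM's impedance as purely resistive in each band, which is a simplification. The impedance figures are hypothetical and not measurements of any real IEM.

```python
import math

def level_change_db(iem_impedance_ohm, series_resistance_ohm):
    """Level change in dB from a simple resistive voltage divider."""
    ratio = iem_impedance_ohm / (iem_impedance_ohm + series_resistance_ohm)
    return 20 * math.log10(ratio)

def tilt_db(impedance_by_band, series_resistance_ohm):
    """Spread between the least and most attenuated band, in dB."""
    changes = [level_change_db(z, series_resistance_ohm)
               for z in impedance_by_band.values()]
    return max(changes) - min(changes)

# Hypothetical multi-driver IEM whose impedance varies with frequency.
impedance_by_band = {"bass": 30.0, "midrange": 15.0, "treble": 8.0}

# Hypothetical series resistances (cable plus source output), in ohms.
for series_resistance in (0.2, 2.0):
    tilt = tilt_db(impedance_by_band, series_resistance)
    print(f"{series_resistance} ohm in series -> {tilt:.2f} dB spread across bands")
```

If an IEM's impedance were flat across frequency, every band would drop by the same amount and the only change would be overall volume. Under this model, the frequency response shifts only when the impedance varies with frequency.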
I score on a subjective scale from 1 to 10 with the latter being the best sound I’ve ever heard. I am primarily taking into account tonality and technicalities. However, there is also a catch-all bias score so as to account for my subjective preferences.
- Most of my listening for reviews is done over a period of roughly ten hours. I rarely – if ever – hear audible differences after my first couple hours with an IEM. If I really like an IEM, or it’s mine to keep, then I’ll often throw on upwards of 50-100 hours. Impressions are done over anywhere from 10 minutes to a couple hours of listening.
- All listening is done using the stock cable and stock ear tips (if provided). If not, I default to silicone tips and a generic cable.
- All listening is done off of an iBasso DX300 or iPhone 13 Mini with lossless files.
- I rarely listen louder than 75dB, so take that for what you will if you’re a head-banger.
- I almost always try to perform direct A/B comparisons. If I do a comparison from memory, I will note so.
The Test Tracks
I think it’s pretty well-established that I do not particularly believe in transducer-to-genre synergy. My logic is that because there are so many possible variations and subsets of songs within a given genre, anyone who says that a transducer works well with a certain genre is effectively generalizing (something I’m no doubt guilty of doing nonetheless). I would prefer to avoid doing that when possible. However, I definitely believe that certain songs sound better with certain transducers, and hopefully this can lend more context to where I’m coming from when it comes to my writing. Something you’ll notice is that the moment I am listening for in a given track tends to be near the beginning. The main reason for this is that aural memory is highly fallible. Especially when I’m running A/B comparisons, I don’t want to be wasting time moving around the slider to a specific part of the song on top of swapping the IEMs.
Disclaimer: I do not always use these tracks. These tracks are mainly used for when I want to get a quick feel for frequency response and I am on the clock (for example, demoing at a show).
Trivecta, Wooli, Seven Lions – Island (feat. Nevve): The part that I am listening for here is from 2:23 to 2:50 which features Wooli’s drops. His drops have a very distinctive characteristic to them. They lean dark, rich, and with a touch of messiness. They are some of the hardest hitting drops that I have heard in EDM, and it is very easy to tell when a transducer is not slamming properly on them. The drops hit in quick succession too. A characteristic that I will often listen for is how a transducer is able to transition from drop to drop. Some transducers are able to scale the nuance of the drops, but they don’t sink as deeply or with as much aplomb as they should.
Dreamcatcher – Can’t get you out of my mind: The bassline from 0:20 onwards is a good indicator of bass texture and dynamics (this is not a particularly dynamic track as a whole, though). The bassline of this track has a thick, almost “moist” characteristic with good bounciness. A lot of transducers with poor bass texture will sound dry on this track; transducers with poor dynamics will not have this bouncy characteristic and will quickly make the bassline sound monotonous and possibly fatiguing.
FareOh – Run Away: This track opens with deep, slow bass hits from 0:06 onwards. This is a good test of bass decay, or how long a transducer allows a note to bloom. I feel that these bass notes also capture the sub-bass frequencies pretty closely, so this is a good test of extension.
LIGHTSUM – You, jam: This track is a good test of bass extension. It has a number of moments that hit 30-40Hz and where you should be able to really “feel” the bass tickle your eardrums in the left channel if a transducer is extending adequately. The most obvious times at which this occurs are at 0:09, 0:19, 1:08, 1:18, 2:08, 2:18, etc. If you listen closely, you can even hear it simmering at around 25Hz at 1:01.
Joe Nichols – Sunny & 75: I grew up listening to Nichols’ music; he has a great voice and, thankfully, he doesn’t sound like a robot on this track (I really liked his hit single in 2009, “Gimmie that Girl,” but the mastering was horrible). All I’m really listening for here is the lower-midrange. Transducers with lots of sub-1kHz presence will usually make his voice sound overly thick and blobby. This might also indicate a lack of pinna compensation or upper-midrange, although that’s more difficult to tell with just this track.
Keith Urban – Kiss a Girl: The only part I’m concerned with here is the opening electric guitar in the right channel, and then the one shortly after in the left channel. This is a very easy way to tell if there is a 3-4kHz recession or lack of pinna compensation. Transducers that recess this region too much will lose bite and the guitars’ tasteful sense of crunch.
Tim McGraw – Don’t Take the Girl / Alison Krauss – When You Say Nothing At All: I think McGraw’s voice has changed over the years, but you can definitely hear traces of nasality in this track. Transducers that have a bump at 1kHz tend to sound boxy and exacerbate this nasal quality. On the other hand, Alison Krauss has a really nice voice that has often been described as “angelic”. BA IEMs, for example, tend to make her voice sound gritty. Generally speaking, these are also both darker, slower tracks which might work well with the tonality of certain transducers but not others. IU – Blueming is another example, with a vocalist who has a smooth taper to the way her voice decays.
Loona – Eclipse: I’m really only concerned with the opening. The vocalist, Kim Lip, has a tendency to exhibit sibilance on the lyrics, “I feel sparks,” with the last word’s “s” consonant. Too much lisp to this consonant indicates to me that I’m dealing with an upper-midrange oriented IEM that doesn’t slope off adequately at ~3kHz. This is also what I usually use to evaluate the transition from 3-5kHz (upper-midrange to lower-treble for me). Kim Lip has a tendency to sound like she’s stuck in head-voice when a transducer’s transition from 3-5kHz shifts upwards. Danielle Bradbery & Thomas Rhett – Goodbye Summer is another track that I often use to test this transition. Danielle Bradbery’s voice has a very light, delicate balance that’s easy to throw off.
Girls’ Generation – Into the New World: This is a classic K-Pop track with lots of brightness. From 0:10 to 0:27, you can hear plenty of stick impact in the center channel. This is a quick way for me to assess if there might be a peak from 5-6kHz. A peak at 5kHz often represents itself with a certain “chalkiness” to the way the hit is articulated, like chalk being scraped upon a blackboard. A peak at 6kHz tends to sound more correct, just sharper. A wide-band elevation from 5-6kHz will generally present itself with a good sense of “weight” to these percussive hits. This track is also a way for me to assess transient speed and layering. The percussive hits from 0:10 to 0:27 hit in quick succession, and a transducer should have good distinction between each hit. There’s also good amounts of shimmer and sparkle to the backdrop going on which a transducer should be capturing; this is indicative of both extension and detail retrieval. A good example of micro-dynamics is when Soo-Young enters after Taeyeon at 2:23; her voice sounds noticeably more subtle and quiet contrasted to the volume of Taeyeon’s.
Galaxy Supernova is another test track by Girls’ Generation that I’ve used for treble in the past. I’ve mostly stopped using it, though, because most of the track’s treble brightness is based on high-pitched electronic resonances. This can make it difficult to pinpoint specific peaks and valleys in a transducer’s treble response, as there’s little basis for what these sounds are like in real-life. It’s mainly just a tolerance test for “does this IEM’s treble make my ears hurt?” at this point.
Tim McGraw – Thought About You: There is a persistent ~15kHz whine in the right channel all throughout the opening of this track. Don’t ask me what they were thinking when they mastered it like this, but it’s a quick way to tell if a transducer has adequate extension. Some IEMs that peak here will make this sound overly fatiguing too.
Soundstage & Imaging
Sawano Hiroyuki – Binary Star (ft. Uru): Sawano Hiroyuki’s tracks frequently play with staging, and there is a lot of ambiance to the opening of this track; it sounds oh-so-open. This is mainly useful for A/B purposes because it sounds more open than your average track on most transducers you listen to anyways. As it progressively gets busier, though, it makes for a good test of general layering ability. The violins from 2:58 to 3:20 are another good indicator of a transducer’s presence regions. If they fall back into the mix too much, that usually indicates a recession at 3-4kHz.
Taeyeon – Fine: I’ve talked about this track a lot before, but this remains a personal favorite for testing center image distinction and layering. Taeyeon’s voice should be forward and upfront in this track. But 2:38 onwards into the last chorus has a series of vocal overdubs with which Taeyeon’s voice comes from all parts of the center image. If you listen closely on a good transducer, you’ll even notice that her voice comes from varying depths too. Transducers with poor imaging will often cluster these overdubs and make it impossible to tell where they are coming from in the center image. Listening for the sense of space between these overdubs is also a good way for me to discern separation.
Okay, this is somewhat awkward: I actually do not listen to a lot of music with high dynamic range. If I’m being perfectly honest, this is mainly because I do not think that the dynamic range of a track has much bearing on my enjoyment. As far as I understand it (obligatory “not an expert”), dynamic range simply represents the ratio between a given track’s loudest and quietest points; it is not necessarily a representation of the actual quality of the track’s mastering. I would go so far as to argue that some measure of compression is enjoyable, as it gives tracks a sense of fullness when done correctly. And yes, I’m well aware that this might be ironic given how heavily I index for dynamics in my writing. ¯\_(ツ)_/¯
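For anyone who wants that ratio spelled out, here is a rough sketch of dynamic range as the loudest-to-quietest ratio expressed in decibels. To be clear, this is not how the DR Meter or any loudness standard computes its figure; the function name and the amplitude values are made up purely for illustration.

```python
import math

def dynamic_range_db(levels):
    """Ratio of the loudest to the quietest non-silent level, in dB.

    `levels` are linear amplitudes (for example short-term RMS values),
    where 1.0 would be digital full scale.
    """
    audible = [abs(x) for x in levels if x != 0]
    if not audible:
        raise ValueError("need at least one non-zero level")
    return 20 * math.log10(max(audible) / min(audible))

# Hypothetical short-term levels for two imaginary tracks.
wide_track = [0.01, 0.05, 0.2, 0.5]        # quiet intro, loud chorus
compressed_track = [0.25, 0.3, 0.4, 0.5]   # everything sits near the top

print(f"wide: {dynamic_range_db(wide_track):.1f} dB")
print(f"compressed: {dynamic_range_db(compressed_track):.1f} dB")
```

Note that the number says nothing about whether either track sounds good. It only describes how far apart the extremes are.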
Yiruma – River Flows in You (Orchestra Version): Here, I’m listening from 2:25 to 3:34. At 2:25, the song recedes into a quiet section; this is followed by a steady rise in volume until it peaks with the violins at 3:34. Ideally, a transducer should sound quiet at the quietest part and loud at the loudest part. A transducer that does this properly has good dynamic contrast, particularly for crescendo/decrescendo (gradual) dynamic swings.
Kenny Chesney – There Goes My Life: Kenny’s older stuff tends to be pretty well mastered and with robust dynamic range. I really like how the cadence of this track builds from 1:47 to 3:15. There’s also a lot of tasteful detail that’s been left in the mastering. At 2:19, for example, you can hear that particular hit on the drum jump in volume.
Sawano Hiroyuki – e of s (2V-ALK Version): The re-mastered version actually has less dynamic range than the original, which is this one. Parts I’m looking for here are the explosions into loudness, particularly the one from 2:40 to 2:55. At 2:40, the track is at peak loudness, at 2:49 the vocalist Mizuki’s voice drops slightly, then the cadence of the track itself drops shortly before exploding into peak loudness again. This is a good way, at least for me, to tell if a transducer scales dynamic swings quickly (or with what some might call sforzando).
Sawano Hiroyuki – Tranquility / A/Z (Remastered): I’m not as fond of the original versions of these tracks. You can tell they are smoothed over in the air frequencies and that they sound somewhat congested; this leads me to prefer the remastered versions even if they might technically be more compressed on something like the DR Meter. That aside, these tracks are good for getting a general sense of a transducer’s ability to carry “weight” or a sense of physicality between dynamic swings. Tranquility has transitions at 0:50, 2:04, and 3:38. A/Z has transitions at 1:10 and 2:55, plus sports a good amount of peak loudness.
Taeyeon – I: Taeyeon has a vibrant timbre to her voice in the opening. She should sound like her voice is almost glowing. Verbal Jint’s rap verses from 0:23 to 0:44 should hit with authority and confidence. To be clear, I am not necessarily listening for frequency response on this track. This is a good test for what I usually associate with vibrancy and engagement factor; some transducers might have great resolution, but they will sound etched and boring on this track nonetheless.
One of the most interesting (and vexing) things about the audio hobby to me is not necessarily its inherently subjective nature, but rather how it extends to terminology and said terminology’s subsequent lack of structure. Nearly everyone has their own way of articulating what they hear, and most do so by making use of jargon. Jargon, mind you, that frequently has overlap with but also differs – sometimes significantly – from others’ interpretations. Heck, you could be arguing with someone about what you guys hear versus each other, when in fact you both hear something very similar!
This presents a larger problem in itself; however, it’s not exactly a fixable one given, again, the subjective nature of this hobby. And of course, there are times when what you hear differs so significantly from someone else’s impressions that you simply have to chalk it up to a difference of opinion. But frankly, it’s disconcerting when you don’t have context for the words others are using. So in this post, my aim is to better explain what I’m hearing, how I define what I hear, and what I attribute it to. As usual, I need to disclaim that for all intents and purposes, these are just my opinions. You’re totally welcome to agree or disagree with them as you see fit!
Coherency: This is something that often comes up when talking about hybrid IEMs, or IEMs that make use of separate driver types. Because these IEMs mix-and-match driver types that are attacking and decaying at different speeds, perfect coherency is very difficult to achieve. As such, a lack of coherency means being able to discern between different drivers handling their respective parts of the frequency response. This perception might also be baked into the frequency response. For example, a recession at 200Hz combined with an elevated sub-bass shelf can often beget a perceptibly disjointed characteristic in the bass response.
Dynamic contrast: Let’s first establish what “dynamics” are in music. I often see this word mistaken for the equivalent of how hard a transducer slams (or just thrown around as some sort of catch-all term for bass); however, the word really has a quite different meaning! Dynamics are the variations in loudness in a given track. It’s probably no more complicated than what you’re already thinking. A “dynamic swing” is simply a transition between a decibel peak (loud section) and valley (quiet section), or vice versa. Then as the prefix “macro” implies, macro-dynamics are large-scale swings. They encompass a song exploding into loudness or suddenly shifting into a quiet section; of course, these decibel shifts can also occur more gradually (crescendo vs. decrescendo). On the other hand, micro-dynamics are more intimate swings, so the nuance of individual instrument lines and, say, vocal inflections.
It follows that dynamic contrast is the extent to which a transducer is able to scale the difference between a track’s peak loudness and minimum amplitude. Of course, not every transducer is able to do this well. Some transducers sound like they’re always on peak loudness, some skew in the opposite direction, and some don’t seem to go either way entirely! The end result is what I refer to as dynamic compression. Only a transducer that does none of the above – that is, scales decibel gradations anywhere from low to high – can be considered to have good dynamic contrast. A pro tip if you want to find out for yourself? A hallmark of a transducer with good dynamic contrast is that you find yourself turning up the volume on quiet sections of tracks and, conversely, turning down the volume on louder parts of tracks.
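If it helps to see "dynamic swings" as numbers, here is a toy sketch that labels the transitions between consecutive sections of a track. It only looks at the size of each level change, so it ignores the instrument-level nuance that micro-dynamics really refers to. The 6 dB cutoff and the section levels are arbitrary assumptions of mine, not any kind of standard or measurement.

```python
def dynamic_swings(section_levels_db, macro_threshold_db=6.0):
    """Return (change_in_db, label) for each transition between sections.

    `macro_threshold_db` is an arbitrary illustrative cutoff: changes at
    or above it are labeled "macro", smaller ones "micro".
    """
    swings = []
    for before, after in zip(section_levels_db, section_levels_db[1:]):
        change = after - before
        label = "macro" if abs(change) >= macro_threshold_db else "micro"
        swings.append((change, label))
    return swings

# Hypothetical section loudness in dB relative to full scale:
# quiet verse, slight lift, chorus, small dip, quiet bridge.
levels_db = [-30.0, -28.0, -12.0, -14.0, -26.0]

for change, label in dynamic_swings(levels_db):
    print(f"{change:+.1f} dB -> {label}")
```

A positive change is a swing toward a peak and a negative one is a swing toward a valley. How well a transducer conveys those swings is the part that still comes down to listening.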
Imaging: This is a piece of audio jargon that’s frustrated me to no end, and it’s not hard to see why. Like many terms slung around in the audio world, “imaging” has as many definitions as there are opinions, and one won’t find a concrete definition of what it constitutes. If we look up the Wikipedia definition (probably the most official one we’re going to find) of the word, we see that imaging “refers to the aspect of sound recording and reproduction of stereophonic sound concerning the perceived spatial locations of the sound source(s), both laterally and in depth”. It’s not hard to see why, for most listeners, soundstage – a more straightforward term, the distance a transducer is able to create between the listener and the instruments in a track – is a distinct characteristic of sound. But give it a little more thought. By definition, the perception of soundstage is crafted by positioning; therefore, soundstage is a derivative of imaging. Likewise, the following terms can be considered to fall under imaging:
- Positional accuracy is a term most readers are probably more familiar with because it aligns closely with the colloquial definition of imaging. This is the degree to which a transducer is able to localize instruments on the soundstage; then, the degree to which a listener can pinpoint them.
- Layering, often used interchangeably with the term separation, is the sense of physicality and space between instruments on the stage. Generally, it is also indicative of the extent to which a transducer is able to deconstruct individual instrument lines and give them a defined spot on the stage. You can see this has overlap with positional accuracy.
- Holographic imaging is a term that’s thrown around far too generously in my opinion. This is the perception with which instruments – usually percussive ones – “float” on the soundstage. By extension, this plays into soundstage height and the way a transducer shapes the walls of the stage.
Here’s an article by Crinacle that defines how he breaks down transients into their attack and decay functions. In terms of how I perceive notes, I would define them the same way. But I think we might disagree on what results in these perceptions and what constitutes detail: I don’t really subscribe to the idea that resolution (or “true” detail and any other similar descriptor) is separate – at least certainly not entirely – from frequency response these days.
In fact, clarity, the nature of how sharply a note’s attack is articulated, is mostly just a product of frequency response to me. I’ve heard cheap IEMs (e.g. CCA CRA) that have terrific clarity from boosting the upper-midrange and treble regions. However, when you boost these regions, it sounds unnatural relative to what you’d hear in real life, so many people perceive this as “fake” detail.
Outside of the attack function, I think there’s a gray area where some people associate graininess in decay with being “true” detail. Here, I refer to grain, or texture, being the idea that transients are decaying in quick succession. I mostly disagree with this interpretation of detail. If you listen to studio monitors, or better yet singers in real life, this grain doesn’t exist – or at least it doesn’t to my ears. So what might this grain really be? Well, when you examine the frequency response of IEMs with this grain, they almost universally sport either a lack of air or a dipped region in the treble.
That brings us to the question of what is true detail? I think that “true detail” is 1) when FR closely mimics what we’d hear in real-life, or 2) when aberrations in FR are tasteful and don’t detract – but enhance – what we’d hear in real-life. When details in a track pop out at us on a given IEM, for example, I think that’s mostly just the product of aberrations in FR. This is why, even when you hear something new in a track you’ve never heard before on your new IEM, you’ll notice you now hear the same detail on your previous IEM(s) if you listen closely – unless those other IEMs are completely dipped in that region of the FR. Of course, there’s always case examples that contradict this idea. I’ve heard IEMs and headphones that have excellent frequency responses (and stellar extension on both ends) but that don’t sound remotely detailed. That’s where I’m inclined to say there’s more to detail than just frequency response.
Some measure of ambiguity is always going to be present concerning detail. Everyone perceives it differently. To me, it’s mostly a gut feeling when it comes down to two very good IEMs. For example, it’s difficult to draw a distinction between the U12t and the Helios. The Helios has a leaner frequency response and more linear treble, but the U12t has more upper-treble. How can I possibly say with certainty which is more detailed? For some people who cannot hear the U12t’s upper-treble peaks, the Helios will be more detailed even if I hear otherwise. In any case, the idea that two IEMs that use the same driver will have the same resolution doesn’t have much merit to me because there could be significant differences in how the drivers are implemented and their resulting frequency responses.
Slam vs. Impact: My thoughts on the distinction between these words align very closely with Fc-Constructs; however, I do have a tendency to use them more interchangeably. Both are most applicable within the context of bass. Slam has more to do with the sharpness of transient attack and how explosive a note sounds. Impact, to me, is more indicative of the amount of sheer air that is being pushed and the sense of “weight” one feels when bass notes hit and when a dynamic swing is articulated.