Debugging Story: The Case of Rogue Reflection
Generics + Reflection + Type Inference = 🤯
Intro
Square is known for its open-source libraries and you've probably read about the challenges of building and supporting them.
A lesser known fact is that we also build a lot of SDKs that let developers use Square's platform to create amazing apps for sellers. I want to talk about a particular issue I was debugging recently in one of our Android SDKs - Reader SDK.
Ok, without further ado let's dive in.
The report
Two days ago I got a message from one of our developers who uses the Reader SDK. They said that the latest version of the SDK was crashing when pairing a Bluetooth reader.
Caused by: kotlin.reflect.jvm.internal.KotlinReflectionInternalError: Cannot create type for an unsupported classifier: T (class kotlin.jvm.internal.TypeParameterReference)
at kotlin.reflect.full.KClassifiers.createType(KClassifiers.kt:45)
at kotlin.reflect.jvm.internal.ReflectionFactoryImpl.typeOf(ReflectionFactoryImpl.java:123)
at kotlin.jvm.internal.Reflection.typeOf(Reflection.java:122)
at com.squareup.cardreaders.BleBackendWorkflow.onWorkerNoop(BleBackendWorkflow.kt:162)
My first reaction "Yay, it's not a Bluetooth issue!" was quickly replaced with "Oh no, we messed up the type inference somewhere!". This is going to be fun to debug...
The bug
Let’s start by enjoying the luxury of having a clear stack trace. It has a few lines of Kotlin internals that are followed by a call from “our” method - BleBackendWorkflow.onWorkerNoop()
. Its implementation and call site are pretty straightforward:
private fun <T> RenderContext.onWorkerNoop(
worker: Worker<T>,
key: String,
handler: (T) -> Unit
) = runningWorker(worker, key) {
handler.invoke(it)
noAction()
}
...
val bytes: Observable<ByteArray> = stateMachine.events.data
context.onWorkerNoop(bytes.asWorker(), "") { ... }
And here are the runningWorker and asWorker implementations:
public inline fun <T, reified W : Worker<T>, PropsT, StateT, OutputT>
BaseRenderContext<PropsT, StateT, OutputT>.runningWorker(
worker: W,
key: String = "",
noinline handler: (T) -> WorkflowAction<PropsT, StateT, OutputT>
) {
runningWorker(worker, typeOf<W>(), key, handler)
}
public inline fun <reified T : Any> Observable<out T?>.asWorker(): Worker<T> =
this.toFlowable(BUFFER).asWorker()
I don't see any problems with the onWorkerNoop
method. Android Studio is not complaining, unit tests pass, "works on my machine". Time to ask for some help!
Luckily for me, Aidan and Eric came for rescue and quickly found the problem in onWorkerNoop
: T
is not reified and the function should be inline
, otherwise type erasure makes T
as good as Any
.
Yay! Oh wait... How on Earth does it compile and run?
The reflection upon reflection
Fun fact about reflection in Kotlin on JVM, is that it is present in two different components: kotlin-reflect
and kotlin-stdlib
.
On the JVM platform, the runtime component required for using the reflection features is distributed as a separate artifact
kotlin-reflect.jar
in the Kotlin compiler distribution. This is done to reduce the required size of the runtime library for applications that do not use reflection features.On the JVM platform, the standard library contains extensions for reflection classes that provide a mapping to and from Java reflection objects (see package kotlin.reflect.jvm).
As a result, if kotlin-reflect
is present in your runtime, its ReflectionFactoryImpl will be used instead of stdlib's ReflectionFactory
Instantiation of the factory happens during runtime in Reflection.java (yup, .java
!)
ReflectionFactory impl;
try {
Class<?> implClass = Class.forName("kotlin.reflect.jvm.internal.ReflectionFactoryImpl");
impl = (ReflectionFactory) implClass.newInstance();
}
...
factory = impl != null ? impl : new ReflectionFactory();
Ok, so now we know that typeOf()
has two different implementations and based on the stack trace, the app was using the one from kotlin-reflect
. So what, the Reader SDK doesn’t depend on kotlin-reflect
anyway? How does it help us understand the problem?
Well, here's the thing. While, the Reader SDK does NOT depend on the kotlin-reflect
, the latter IS definitely being brought in by some other library transitively.
Transitive dependencies... diamond dependencies... we've bumped Kotlin to 1.4 in this release... type inference changes happened around 1.4…
Is it a non-matching lib version issue?
The culprit
The developer confirmed they're using Kotlin 1.4.10
. Reader SDK uses 1.4.32
, so we are overriding the older version during a dependency resolution. This means that kotlin-stdlib
is also 1.4.32
(or even newer).
But we're not the ones bringing in kotlin-reflect
, it's coming from somewhere else. Let's check the dependencies list and find all the kotlin
things:
implementation "com.squareup.moshi:moshi-kotlin:1.9.3"
implementation "org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.4.10"
implementation "androidx.room:room-ktx:2.3.0"
implementation 'androidx.navigation:navigation-fragment-ktx:2.3.5'
implementation 'androidx.navigation:navigation-ui-ktx:2.3.5'
moshi-kotlin
... My good old friend. Oh, the memories of getting rid of you and your transitive kotlin-reflect dependency from our SDK. You came back to make my day more interesting.
Let's see which libraries does moshi-kotlin:1.9.3
bring along to the app.
Bingo! We've got kotlin-reflect:1.3.50
running along the kotlin-stdlib:1.4.32
!
The difference
Once the difference in minor version numbers is found, everything starts making sense. Running unit tests with the kotlin-reflect:1.3.50
quickly leads to a consistent crash. Attaching a debugger, we can find the problem in our call to createType:
val descriptor = (this as? KClassifierImpl)?.descriptor
?: throw KotlinReflectionInternalError("Cannot create type for an unsupported classifier: $this (${this.javaClass})")
this
is TypeParameterReference
with name="T"
instead of being a KTypeParameterImpl
for T
. That’s not good, it means that code took different paths somewhere along the way.
Putting a breakpoint on constructors on both of those classes will make this difference obvious.
1.3.50
1.4.0
See how Reflection.typeParameter:174
is same in both stack traces, but next method - typeParameter
- is called from ReflectionFactory
in 1.3.50
, but in 1.4.0
it's now in ReflectionFactoryImpl
.
My first guess would be that this is an override of a method. Comparing code between 1.3.50 and 1.4.0 confirms this theory - typeParameter
method was overridden in 1.4.0
and now is "smarter" about the type of the parameter.
Fine, the behavior got smarter in new version. But what about backward compatibility? Why doesn't ReflectionFactory.typeParameter()
implementation from kotlin-stdlib:1.4.0
work with kotlin-reflect:1.3.50
?
The answer is pretty simple. Take a look at ReflectionFactory.typeParameter()
:
@SinceKotlin(version = "1.4")
public KTypeParameter typeParameter(Object container, String name, KVariance variance, boolean isReified) {
return new TypeParameterReference(container, name, variance, isReified);
}
Here @SinceKotlin(version = "1.4")
points at the right direction. The method doesn't exist in ReflectionFactory 1.3.50
! Furthermore TypeParameterReference
was also added to kotlin-stdlib
in 1.4.0
, so there is very little chance that kotlin-reflect:1.3.50
would know how to handle it!
At this point we've got enough evidence that kotlin-stdlib:1.4.0
ain't going to work with kotlin-reflect:1.3.50
. So all we need to do is bump the kotlin-reflect
version. It's as simple as updating its "parent" dependency - moshi-kotlin
- to the latest (as of writing) version 1.12.0
. Developer confirmed this resolved the issue.
Case closed?
The actual fix
Not yet. onWorkerNoop
should still be updated to capture the type properly by having T
marked as reified and inlining the whole method. Without this change, T
won't get inferred correctly and we will be losing information about the caller worker's type.
private inline fun <reified T> RenderContext.onWorkerNoop(
worker: Worker<T>,
key: String,
crossinline handler: (T) -> Unit
) = runningWorker(worker, key) {
handler.invoke(it)
noAction()
}
Now once we hit the breakpoint inside KClassifier.createType
we can see that our classifier is aKClassImpl
for ByteArray
instead of a vague KTypeParameterImpl
for T
.
⬇️
Finally, we're done! Worker<T>
became a Worker<ByteArray>
, which means that we are inferring the types inside onWorkerNoop
properly. This fix goes into the next release.
The afterword
As I was writing this blog post I suddenly realized that I had the unit tests around the aforementioned code fail when I refactored it into a onWorkerNoop
this spring (of course it was me who introduced this bug). But back then I didn't understand the failure and wrote it off as another Workflow quirk. Here's what happened:
Workflow uses the type of Worker
to distinguish different workers. If workers have the same type, one should provide a unique key
. I was creating three workers via onWorkerNoop
, all with different types, yet because of the missing reified
marker they all ended up the same - Worker<T>
. The error said exactly that:
Caused by: java.lang.IllegalArgumentException: Expected keys to be unique for worker com.squareup.workflow1.Worker<T>: key=""
at com.squareup.workflow1.internal.SubtreeManager.render(SubtreeManager.kt:125)
at com.squareup.workflow1.internal.RealRenderContext.renderChild(RealRenderContext.kt:61)
I failed to figure it out back then, and just added unique keys to each of the workers. I ignored this red flag and it ended up causing a crash in release. I wish I could say that this won't happen again, but it probably will. However, I'll try to pay more attention to seemingly insignificant failures.
There’s still an open question remaining: why does IDE allow signature reified W : Worker<T>
without forcing T
to also be reified
? typeOf<W>()
should be requiring both W
and T
to be reified
IMO. Let me know if you have an answer to this!
That's all for today, see you later!