Great Job!!
I did something similar a while back (albeit without a controller) using the native FBTFT kernel module for the framebuffer and the SDL abstraction layer. This allowed me to run chocolate-doom for Doom 1/2 etc., with ALSA audio.
I had the same issues with performance, although I was using an SPI connection.
Here's a link if you're interested.
https://communities.intel.com/thread/57693?start=0&tstart=0
You can add the panel you used to FBTFT (if it's not already there). It would be interesting to see if you get the performance you're after, since FBTFT is kernel module based and uses GPIO directly.
Well done again!
Stew