Computational science and engineering applications such as partial differential equation (PDE) solvers have demanded high-performance computing for decades. Such applications are often implemented using message-passing paradigms such as MPI. MPI scales well on distributed memory machines, and libraries such as PETSc and OP2 focus on improving the ease with which PDE solvers can be developed with MPI. The problem is that each node in a distributed memory machine now has multicore or manycore parallelism, where shared memory parallelism must be specified explicitly and data locality is the main bottleneck to scalable performance. In this talk, I will overview a number of projects in my research group where we work toward juggling parallelism, data locality, expressivity, ease of use, and correctness in the context of high-performance PDE solvers. Our aim is to provide higher-level programming abstractions for algorithms and implementation details, and to have the compiler automate much of the complex juggling needed.
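
To make the hybrid-parallelism challenge concrete, below is a minimal sketch, not taken from the talk, of the distributed-plus-shared-memory pattern the abstract describes: a 1D Jacobi-style stencil where MPI handles ghost-cell exchange between nodes and OpenMP threads update each node's local slice. The problem size `N_LOCAL`, the initial condition, and the iteration count are all illustrative assumptions.

```c
/* Illustrative hybrid MPI+OpenMP sketch of a 1D Jacobi-style stencil.
 * Sizes, initial data, and iteration count are hypothetical. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N_LOCAL 1000000   /* points owned by each rank (assumed size) */

int main(int argc, char **argv) {
    int provided, rank, nranks;
    /* Request thread support so OpenMP threads can coexist with MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Local slice plus two ghost cells for neighbor values. */
    double *u    = calloc(N_LOCAL + 2, sizeof(double));
    double *unew = calloc(N_LOCAL + 2, sizeof(double));
    u[N_LOCAL / 2] = 1.0;  /* arbitrary initial spike */

    int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < nranks - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int iter = 0; iter < 100; ++iter) {
        /* Distributed-memory part: exchange ghost cells via MPI. */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[N_LOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[N_LOCAL], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Shared-memory part: OpenMP threads update the local slice.
         * The loop schedule determines how work, and hence data,
         * is laid out across the node's cores. */
        #pragma omp parallel for schedule(static)
        for (int i = 1; i <= N_LOCAL; ++i)
            unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

        double *tmp = u; u = unew; unew = tmp;  /* swap buffers */
    }

    if (rank == 0) printf("done: u[1] = %g\n", u[1]);
    free(u); free(unew);
    MPI_Finalize();
    return 0;
}
```

Even in this toy kernel, the programmer juggles domain decomposition, ghost exchange, thread scheduling, and buffer reuse by hand; the projects overviewed in the talk aim to lift exactly these kinds of details behind higher-level abstractions that a compiler can then optimize.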