The HPC community is facing a highly disruptive situation with the advent of the manycore era. Meeting this challenge requires redesigning the entire software stack, from the expression of the numerical algorithms down to their execution support. A promising approach consists in describing a numerical algorithm as a graph of tasks, where each vertex corresponds to a numerical kernel to be executed and the edges define the dependencies between the computational kernels. This graph is then processed by a runtime system, a software component that supports the execution of an algorithm written at a relatively high level of abstraction. It hides the complexity of the architecture from the user by scheduling the tasks on all the computational units available on the node and by managing the data transfers. In this talk we will describe the Sequential Task Flow (STF) model, one of the task-based programming models implemented in the StarPU runtime system, developed at Inria Bordeaux Sud-Ouest. After introducing its distributed-memory support (StarPU-MPI), we will show that the STF model scales up to 144 GPU-accelerated computing nodes of the TERA100 cluster of the CEA on a task-based dense Cholesky factorization. Then, we will present how we tackled the distributed memory consumption problem when scaling up dense linear algebra solvers. Finally, we will present prospective work on dynamic load balancing with StarPU, illustrated on a simple task-based stencil application.