The remarkable empirical success of Bayesian additive regression trees (BART) has raised considerable interest in understanding why and when this method produces good results. Since its inception nearly 20 years ago, BART has become widely used in practice, and yet theoretical justifications have been unavailable. To narrow this yawning gap, we study estimation properties of Bayesian trees and tree ensembles in nonparametric regression (such as the speed of posterior concentration, reluctance to overfit, variable selection, and adaptation in high-dimensional settings). Our approach rests upon a careful analysis of recursive partitioning schemes and associated sieves of approximating step functions. We develop several useful tools for analyzing additive regression trees, showing their optimal performance in both additive and non-additive regression. Our results constitute a missing piece of the broader theoretical puzzle as to why Bayesian machine learning methods like BART have been so successful in practice.