The Itatiaia National Park (INP), located in the south-eastern region of Brazil, was designated a conservation unit in 1937 because of its unique biodiversity and landscape. However, partly because of access issues, information about natural resources such as soil attributes is missing. This information is fundamental for the management of the area, in particular the soil carbon content, which underpins soil functions, ecosystem services and environmental vulnerability assessment. The main aim of this study was to model the vertical and horizontal distribution of soil carbon. Different methods of variable selection were tested to obtain better predictions and a more parsimonious model. Generalized additive models (GAM) with a 3D smoother were used to predict the carbon distribution in 3D space. A total of 90 soil profiles with 346 horizons were available. A leave-one-out cross-validation (LOO-CV) approach was used to evaluate the performance of the models. The results indicate that the best performance was obtained using an approach that combines expert knowledge and modelling. The selected model presented the best performance while being the most parsimonious, although the results were similar among the models tested. This model combines spatial information in 3D space (X, Y and Z [depth]), geology, remote sensing data (RapidEye images) and attributes derived from the digital elevation model. The model tends to underestimate carbon values at depths of more than 30 centimetres in areas with low carbon contents, e.g. mineral soils, especially in pastures. The altitude fields are the areas with the highest carbon content, i.e. they have a greater capacity to store carbon, nutrients and water. On the other hand, these are sensitive areas that should be given special attention in an environmental analysis.
The areas predicted to have lower carbon content were those bordering small farms north of the park, which are still under pressure from farmers and where pasture is still common. In addition to evaluating models with metrics such as R², RMSE and MSE, it is of paramount importance to evaluate uncertainty, especially in areas with limited access such as the INP, since areas with low accessibility, and consequently low sample density, may have high associated uncertainty, that is, a wide range of credible values.
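To make the evaluation step concrete, the following is a minimal sketch of leave-one-out cross-validation with the three metrics named above (R², RMSE and MSE). It is illustrative only and is not the study's code: an ordinary least-squares fit on X, Y and depth stands in for the GAM with a 3D smoother, the data are synthetic, and the function names `loo_cv_predictions` and `regression_metrics` are hypothetical. The abstract does not state whether single horizons or whole profiles were held out; the sketch holds out one observation at a time.

```python
import numpy as np


def loo_cv_predictions(X, y):
    """Predict each observation from a model fitted to all the others.

    Assumption: a linear model with intercept replaces the GAM used in the study.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # intercept + covariates
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop observation i from the training set
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        preds[i] = design[i] @ coef  # predict the held-out observation
    return preds


def regression_metrics(y, preds):
    """Return R2, RMSE and MSE computed on held-out predictions."""
    resid = y - preds
    mse = float(np.mean(resid ** 2))
    rmse = float(np.sqrt(mse))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2))
    return {"R2": r2, "RMSE": rmse, "MSE": mse}


# Synthetic stand-in data (not from the study): columns are X, Y and depth,
# with carbon decreasing with depth plus random noise.
rng = np.random.default_rng(0)
n_obs = 60
coords = rng.uniform(0.0, 1.0, size=(n_obs, 3))
carbon = 5.0 + 1.0 * coords[:, 0] - 3.0 * coords[:, 2] + rng.normal(0.0, 0.3, n_obs)

preds = loo_cv_predictions(coords, carbon)
scores = regression_metrics(carbon, preds)
print(scores)
```

Because every prediction comes from a model that never saw the corresponding observation, these scores estimate out-of-sample performance. They do not replace the uncertainty assessment discussed above, since a single summary value hides how the range of credible predictions varies across poorly sampled areas.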