MaxHop

As part of our efforts to model climate-related impacts on food security in Africa, we developed and open-sourced a state-of-the-art locust hopper presence prediction model: https://github.com/jataware/maxhop.

The model uses maximum entropy (maxent), a method commonly used for species distribution modeling. Although there are many options for species distribution modeling, maxent performs better than others on presence-only data: data where only the species' presence has been recorded, not its absence. The locust hopper data used to train and test our model is presence-only, which is the main reason we chose maxent over other machine learning models. In machine learning terms, our training data contains only positive examples (confirmed presences) and no negative examples (confirmed absences).

More information about maxent can be found here: https://biodiversityinformatics.amnh.org/open_source/maxent/

How the maxent model works:

The maxent model uses environmental data along with species presence data to predict the probability of a species' presence across a region. “From a set of environmental (e.g., climatic) grids and georeferenced occurrence localities, the model expresses a probability distribution where each grid cell has a predicted suitability of conditions for the species. Under particular assumptions about the input data and biological sampling efforts that led to occurrence records, the output can be interpreted as predicted probability of presence (cloglog transform), or as predicted local abundance (raw exponential output).”
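To make the idea concrete, here is a minimal sketch of the core maxent fit in Python with NumPy. It is not the code used in MaxHop or in the Maxent software: the function name fit_maxent, the toy temperature and rainfall grids, and the plain gradient-ascent loop with a small L2 penalty are all assumptions chosen for illustration (the Maxent software uses richer feature types and a different regularization scheme). The sketch only shows the principle: find a distribution over grid cells whose expected environmental features match the average features observed at presence locations.

```python
import numpy as np

def fit_maxent(features, presence_idx, l2=1e-3, lr=0.1, n_iter=5000):
    """Fit a Gibbs distribution p(cell) proportional to exp(w . z(cell)).

    features:     (n_cells, n_features) array, one row per grid cell
    presence_idx: indices of the cells where the species was recorded
    Returns the fitted weights and the raw output (sums to 1 over cells).
    """
    # Standardize covariates so a single learning rate suits all of them.
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    # Empirical feature means at the presence cells.
    target = z[presence_idx].mean(axis=0)
    w = np.zeros(z.shape[1])
    for _ in range(n_iter):
        logits = z @ w
        p = np.exp(logits - logits.max())  # subtract max for numerical stability
        p /= p.sum()
        expected = p @ z  # feature means under the current model
        # Gradient of the mean log-likelihood of the presences, minus L2 penalty.
        w += lr * (target - expected - l2 * w)
    logits = z @ w
    p = np.exp(logits - logits.max())
    return w, p / p.sum()

# Hypothetical toy landscape: 500 grid cells with two covariates.
rng = np.random.default_rng(0)
n_cells = 500
temperature = rng.uniform(15, 40, n_cells)
rainfall = rng.uniform(0, 200, n_cells)
features = np.column_stack([temperature, rainfall])

# Pretend hoppers were only ever recorded in warm cells (presence-only data).
presence_idx = np.flatnonzero(temperature > 30)[:40]

weights, suitability = fit_maxent(features, presence_idx)
```

Note that no absence records are needed: every grid cell acts as background, and the presences only enter through their average feature values.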

In our use case, we are most interested in the complementary log-log (cloglog) output raster.
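As a rough illustration of how the cloglog output relates to the raw output, the sketch below applies the transform as we understand it from the Maxent literature: cloglog = 1 - exp(-exp(H) * raw), where H is the entropy of the raw distribution. The function name raw_to_cloglog is hypothetical, and the formula should be checked against the Maxent documentation before relying on it.

```python
import numpy as np

def raw_to_cloglog(raw):
    """Map raw maxent output to the cloglog scale (values in [0, 1)).

    Assumes cloglog = 1 - exp(-exp(H) * raw), with H the entropy of
    the raw distribution over all grid cells.
    """
    raw = np.asarray(raw, dtype=float)
    raw = raw / raw.sum()  # make sure the raw output sums to 1
    nonzero = raw[raw > 0]  # skip zeros to avoid log(0)
    entropy = -np.sum(nonzero * np.log(nonzero))
    return 1.0 - np.exp(-np.exp(entropy) * raw)

# With a uniform raw output every cell maps to 1 - exp(-1), about 0.632.
uniform = raw_to_cloglog(np.ones(4))

# A skewed raw output keeps its ranking but is rescaled to [0, 1).
skewed = raw_to_cloglog([0.7, 0.2, 0.1])
```

The transform is monotonic, so the ranking of grid cells is the same in the raw and cloglog rasters; only the scale changes.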

If you are interested in a deeper understanding of the maxent model, we recommend reading https://web.stanford.edu/~hastie/Papers/maxent_explained.pdf.