Installation

We provide a few methods to install the pipeline:

conda is the preferred method; it installs the pipeline in an isolated environment with all the dependencies that you need.
pip can also be used with a virtual environment manager like virtualenv or pyenv.
Docker is the most robust option, as the container is completely self-contained.

Getting the repository

You can clone the repository using git; we recommend cloning via SSH:

git clone git@github.com:restor-foundation/tcd

CUDA

As with most deep learning models, our tree crown detection pipeline will run much faster if you have some kind of ML accelerator installed on your system. Typically this is an NVIDIA GPU with CUDA drivers installed. If you don't have a GPU, you can still run the pipeline on a CPU but it will be slower.

For the installation process below, we typically install a specific version of CUDA inside a virtual environment to avoid conflicts with other things on your system. Thus, while you probably already have CUDA installed if you own a capable GPU, the conda method below installs a standalone CUDA toolkit which should not interfere with whatever's on your system. The pip method is the exception: it relies on a system-wide CUDA install.

Dependency overview

An up-to-date PyTorch install. We officially support version 2 or later, but version 1 will probably work.
GDAL for reading and writing geospatial data
Dependencies as listed in the requirements.txt file
For model support:
- Detectron2, which is required to run instance segmentation models
- The transformers and datasets libraries, which are used for model + data hosting and semantic segmentation models
- Segmentation Models Pytorch for UNet and other CNN-based segmentation architectures
For training, PyTorch Lightning

Of these, the most challenging to get "right" is a working Detectron2 install.
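
To see at a glance which of these dependencies are already importable, a small check like the one below may help. It is only a sketch: the import names in the list are assumptions about how these packages are usually imported (for example pytorch_lightning for PyTorch Lightning, and rasterio as a stand-in for a GDAL-backed library), not names defined by the pipeline.

```python
import importlib.util

# Import names are assumptions: the usual module names for these packages.
# Adjust the list to match your own environment.
DEPENDENCIES = [
    "torch",
    "rasterio",  # GDAL-backed raster I/O
    "detectron2",
    "transformers",
    "datasets",
    "segmentation_models_pytorch",
    "pytorch_lightning",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DEPENDENCIES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

A module being importable does not guarantee that it was built correctly (Detectron2 in particular), so treat this as a first check rather than a full verification.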

If you're using a GPU, you should make sure that PyTorch is installed with CUDA support. If you're using a CPU, you can install PyTorch without CUDA support. If you're using an ARM Mac then you can also use the mps backend.
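
The backend choice described above can be sketched as a small helper. This is an illustration, not the pipeline's own device-selection logic: the function names pick_device and detect_device are hypothetical, and the order of preference (CUDA, then mps, then CPU) is an assumption.

```python
def pick_device(cuda_available, mps_available):
    """Prefer CUDA, then Apple's mps backend, then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Ask PyTorch which backends are usable; report 'cpu' if torch is absent."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # missing in old PyTorch releases
    mps_available = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_available)


if __name__ == "__main__":
    print(detect_device())
```

If this prints cpu on a machine with a GPU, PyTorch was most likely installed without CUDA support.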

Conda

Conda is an environment and package manager that is widely used within the Python community. It can create a fully isolated environment that includes system packages, and not just Python libraries. You can install conda using the instructions here.

We provide a frozen conda environment file that you can install (on Linux) as follows:

conda env create --name tcd --file environment.yml

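then activate the environment and install the pipeline (the quotes stop shells such as zsh from interpreting the square brackets):

conda activate tcd
pip install -e ".[test]"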

If installing Detectron2 fails because nvcc isn't detected (typically because the wrong version of CUDA is being located), try the following:

conda activate tcd
export CUDA_HOME=$CONDA_PREFIX
conda env update --name tcd --file environment.yml

Building your own conda env

If you need to create your own environment, or you want to try different versions of libraries, this is the process we use to create the environment.yml file. First, create and activate an environment:

conda create -n tcd python=3.12
conda activate tcd

Using conda on OS X

On Macs with ARM processors (e.g. M1/M2/M3), you should make sure that conda uses packages appropriate for your architecture:

CONDA_SUBDIR=osx-arm64 conda create -n tcd python=3.12
conda activate tcd
conda config --env --set subdir osx-arm64
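
To confirm which architecture your interpreter actually reports, a short check like this can help. It is a sketch: the conda_subdir helper is hypothetical, and the table only covers a handful of common platforms. On an ARM Mac, a Python running under Rosetta reports x86_64, which is a sign that Intel packages are in use.

```python
import platform

# Common conda platform subdirectories, keyed by (system, machine) as
# reported by the platform module. Not an exhaustive list.
SUBDIRS = {
    ("Darwin", "arm64"): "osx-arm64",
    ("Darwin", "x86_64"): "osx-64",
    ("Linux", "x86_64"): "linux-64",
    ("Linux", "aarch64"): "linux-aarch64",
    ("Windows", "AMD64"): "win-64",
}


def conda_subdir(system, machine):
    """Return the conda subdir for a platform, or None if it is not listed."""
    return SUBDIRS.get((system, machine))


if __name__ == "__main__":
    system, machine = platform.system(), platform.machine()
    print(system, machine, "->", conda_subdir(system, machine))
```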

For Linux/Windows users, install PyTorch and CUDA:

conda install pytorch torchvision pytorch-cuda=12.1 cuda=12.1 -c pytorch -c nvidia
conda install gcc=12.3.0 gxx=12.3.0 libstdcxx-ng=12.3.0 -c conda-forge

At this point, you should check that the CUDA home path is inside the conda environment:

which nvcc # should return something like /home/josh/miniconda3/envs/tcd/bin/nvcc
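
The same check can be scripted, which is handy in setup scripts. This is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the active environment is exposed through the CONDA_PREFIX variable, as it is after conda activate.

```python
import os
import shutil
from pathlib import Path


def is_inside(path, prefix):
    """Return True if `path` is located under the directory `prefix`."""
    path = Path(os.path.abspath(path))
    prefix = Path(os.path.abspath(prefix))
    return prefix == path or prefix in path.parents


def nvcc_in_conda_env(environ=os.environ):
    """Check that the first nvcc on PATH belongs to the active conda env."""
    prefix = environ.get("CONDA_PREFIX")
    nvcc = shutil.which("nvcc", path=environ.get("PATH"))
    if prefix is None or nvcc is None:
        return False
    return is_inside(nvcc, prefix)


if __name__ == "__main__":
    print("nvcc inside conda env:", nvcc_in_conda_env())
```

A False result means either that nvcc is missing or that a system-wide copy is found first on your PATH.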

Installing torch on OS X

On Macs you don't need to worry about CUDA:

conda install pybind11 pytorch torchvision -c pytorch

pybind11 is another dependency of Detectron2, and on OS X you may sometimes need to install it manually in order to build the library.

We're also aware of some issues with pyarrow and apache-arrow on newer versions of OS X. If you run into this issue, try to install everything with conda directly instead of with Homebrew.

Wait, isn't pytorch-cuda enough?

The reason that cuda is included here is that it's a requirement to build Detectron2, which depends on nvcc, the CUDA compiler. It's important that you install the same version of CUDA that PyTorch is expecting, so here we install everything in one step so that conda solves the dependencies for us. You may not need to specify the version of the cuda package, as long as you install it at the same time as pytorch.
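
If you want to confirm that the two versions agree, one option is to compare the release reported by nvcc with the CUDA version PyTorch was built against. The sketch below assumes nvcc prints a line containing "release X.Y", as current CUDA toolkits do; the function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Extract the 'major.minor' release from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", version_text)
    return match.group(1) if match else None


def cuda_versions():
    """Return (nvcc release, CUDA version PyTorch was built with)."""
    nvcc_release = None
    if shutil.which("nvcc"):
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True
        )
        nvcc_release = parse_nvcc_release(result.stdout)
    try:
        import torch
        torch_cuda = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_cuda = None
    return nvcc_release, torch_cuda


if __name__ == "__main__":
    nvcc_release, torch_cuda = cuda_versions()
    print("nvcc:", nvcc_release, "| torch built with CUDA:", torch_cuda)
```

If the two values differ, building Detectron2 is likely to fail or to produce a broken install.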

We are working on a release of the pipeline that doesn't require Detectron2 to be installed, as it's a relatively heavy dependency.

Assuming you've cloned the repository, you can then install the requirements using pip (in principle you can also do this directly with conda, but pip is generally a lot faster).

pip install -r requirements.txt

Now you can export the environment to a file:

conda env export -f environment.yml

and install the pipeline itself:

pip install -e ".[test]"

Note

You will need to adjust the detectron2 requirement in the exported environment, as it won't be a git path any more. If you freeze the environment after installing the pipeline, you will also need to remove the pipeline's own entry, because pip won't know where to locate it when you re-install from the environment file. This will be fixed once we release a pip-installable package for the pipeline.
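
Removing an entry from the exported file can be done by hand, or with a few lines of Python. The sketch below is an illustration only: it assumes entries appear as "- name==version" (pip) or "- name=version=build" (conda) lines, and the package name tcd-pipeline in the usage example is hypothetical. Check the name that actually appears in your environment.yml.

```python
import re


def drop_entries(lines, package_names):
    """Remove '- name=...' style dependency lines for the given packages."""
    names = {name.lower() for name in package_names}
    kept = []
    for line in lines:
        entry = line.strip()
        if entry.startswith("- "):
            # The package name is whatever precedes the first '=', '@' or space.
            name = re.split(r"[=@ ]", entry[2:].strip(), maxsplit=1)[0].lower()
            if name in names:
                continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    with open("environment.yml") as handle:
        original = handle.read().splitlines()
    # "tcd-pipeline" is a hypothetical name used for illustration.
    cleaned = drop_entries(original, ["tcd-pipeline"])
    print("\n".join(cleaned))
```

The script prints the cleaned file rather than overwriting it, so you can review the result before saving it.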

Using pip, pyenv, etc.

Warning

It is strongly recommended that you use pip inside a virtual or conda environment. If you're on Linux, do not install the pipeline into your system environment (recent versions of Debian/Ubuntu will warn against this). That said, there are some times when this may be appropriate.
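
If you are unsure whether your current interpreter is isolated, a check along these lines can tell you before you run pip. It is a sketch with a hypothetical function name; note that CONDA_PREFIX is also set when conda's base environment is active, so a True result does not distinguish base from a dedicated environment.

```python
import os
import sys


def in_isolated_env(prefix=sys.prefix, base_prefix=sys.base_prefix,
                    environ=os.environ):
    """Return True inside a venv/virtualenv or an activated conda env."""
    in_venv = prefix != base_prefix  # venv and virtualenv change sys.prefix
    in_conda = bool(environ.get("CONDA_PREFIX"))
    return in_venv or in_conda


if __name__ == "__main__":
    if in_isolated_env():
        print("Running inside a virtual or conda environment.")
    else:
        print("This looks like the system Python.")
```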

conda isn't always appropriate, and it can be difficult to work with inside containers or other lean environments. Note that for this method to work, you need a system-wide CUDA install so that nvcc can be found. This is the approach that we use on GitHub for automated testing and within Docker.

Here's how to install the pipeline using "plain" Python with virtualenv (we also install GDAL which is required for some libraries like rasterio):

sudo apt install python3 python3-venv python3-virtualenv gdal-bin
python3 -m venv tcd_env
source tcd_env/bin/activate
pip install --upgrade pip setuptools wheel

Getting GDAL on Mac

If you run into dependency problems on Mac, Homebrew is sometimes your friend:

brew install python virtualenv gdal

Getting things to install on OS X can be interesting. Some people find that they need Homebrew to help out; others find that it conflicts with everything they're trying to do and that it's best to ignore it (e.g. you can call unset DYLD_LIBRARY_PATH to temporarily remove brew from your library path, so that your conda env gets used instead). For example, if you start to see library version mismatches or other arcane-looking errors, try letting conda deal with everything. This seems to be particularly relevant on newer versions of OS X like Sequoia.

Warning

Don't be tempted to omit the pip install --upgrade pip setuptools wheel command above. wheel is required to install Detectron2, and it turns out that it's not always included in base Python environments. If you see the following error when trying to install the requirements, it's probably because you didn't install wheel first.

ModuleNotFoundError: No module named 'torch'

Then, install torch using the latest instructions from PyTorch's website. Torch should detect your system CUDA installation:

pip install torch torchvision

Check that torch is installed with CUDA support, if you were expecting it:

$ python
>>> import torch
>>> torch.cuda.is_available()
True

Then, as above, install the requirements and the pipeline:

pip install -r requirements.txt
pip install -e ".[test]"

Docker

We provide Dockerfiles that have the pipeline pre-installed. We use the pytorch/pytorch base image which comes with CUDA support and has torch already installed. The Dockerfile simply adds the library dependencies on top and contains a clone of the repository.

Pulling from the GitHub Container Registry

TBD.

Building containers:

cd docker
./build.sh Dockerfile

On ARM64 (e.g. M-series Macs with Apple silicon):

cd docker
./build.sh Dockerfile_arm

Verifying the install

The most comprehensive way to check that you've installed everything is to run the test suite from the root directory of the repository:

pytest

In the process of running the tests, the training dataset will be downloaded and cached, as well as most of the models. This will take between 5 and 10 GB of disk space. When we release updates, it's important that we check that the dataset can be automatically obtained, but if you don't want or need it, you can skip to the next section and try to run some predictions instead.
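
Before starting the download, you may want to confirm that enough disk space is free. The sketch below uses only the standard library; the function name is hypothetical, and the path you check should be the drive where the dataset and model cache will be written, which depends on your setup and is not assumed here.

```python
import shutil

GB = 10 ** 9  # decimal gigabytes


def has_free_space(path=".", required_gb=10):
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GB


if __name__ == "__main__":
    if has_free_space(".", required_gb=10):
        print("Enough free space for the test data and models.")
    else:
        print("Less than 10 GB free; consider skipping the full test suite.")
```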

Building docs

The documentation pages you're reading now are also included in the repository. To build them, run:

pip install -e ".[docs]"

and then

mkdocs serve

We use MkDocs with the beautiful Material theme which provides a lot of very nice features for markup.

What next?

Once you've installed the pipeline, it's time to try out some models! Head on over to the prediction documentation.