Installation
We provide a few methods to install the pipeline:
- conda is the preferred method: it installs the pipeline in an isolated environment with all the dependencies that you need.
- pip can also be used with a virtual environment manager like virtualenv or pyenv.
- A docker container is the most robust option, as it's completely self-contained.
Getting the repository
You can clone the repository using git. We recommend cloning via SSH:
git clone git@github.com:restor-foundation/tcd
CUDA
As with most deep learning models, our tree crown detection pipeline will run much faster if you have some kind of ML accelerator installed on your system. Typically this is an NVIDIA GPU with CUDA drivers installed. If you don't have a GPU, you can still run the pipeline on a CPU but it will be slower.
For the installation process below, we typically install a specific version of CUDA inside a virtual environment to avoid conflicts with other things on your system. Thus while you probably have CUDA installed if you own a capable GPU, the methods below will install standalone CUDA runtimes which should not interfere with whatever's on your system.
Dependency overview
- An up-to-date PyTorch install. We officially support version 2 or later, but version 1 will probably work.
- GDAL for reading and writing geospatial data
- Dependencies as listed in the requirements.txt file
- For model support:
  - Detectron2, which is required to run instance segmentation models
  - The transformers and datasets libraries, which are used for model and data hosting and for semantic segmentation models
  - Segmentation Models Pytorch for UNet and other CNN-based segmentation architectures
- For training, Pytorch Lightning
Of these, the most challenging to get "right" is a working Detectron2 install.
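As a quick sanity check on the list above, the following sketch reports which of these packages Python can currently find. The import names are assumptions based on how each project is usually imported (for example, pytorch_lightning for Pytorch Lightning, and rasterio standing in for the GDAL-backed libraries); adjust them if your install differs.

```python
from importlib.util import find_spec

# Display name -> top-level import name.
# These import names are assumptions, not taken from the pipeline itself.
DEPENDENCIES = {
    "PyTorch": "torch",
    "rasterio (uses GDAL)": "rasterio",
    "Detectron2": "detectron2",
    "transformers": "transformers",
    "datasets": "datasets",
    "Segmentation Models Pytorch": "segmentation_models_pytorch",
    "Pytorch Lightning": "pytorch_lightning",
}


def missing_dependencies(modules=DEPENDENCIES):
    """Return display names of packages whose top-level module cannot be found."""
    return [name for name, module in modules.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Not found: " + ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

This only checks that the modules can be located; it does not import them, so a broken build (for example a Detectron2 compiled against the wrong CUDA version) would still need to be caught by running the tests.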
If you're using a GPU, make sure that PyTorch is installed with CUDA support. If you're using a CPU, you can install PyTorch without CUDA support. If you're using an ARM Mac, you can also use the mps backend.
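To see which of these backends your PyTorch install can actually use, a minimal sketch like the one below may help. The helper names pick_device and detect_device are hypothetical and not part of the pipeline; the preference order (CUDA, then mps, then CPU) is also an assumption.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device name, preferring CUDA, then Apple's mps backend, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query the installed PyTorch for available backends (requires torch)."""
    import torch  # imported here so pick_device stays usable without torch

    # torch.backends.mps is missing in older PyTorch releases, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    mps_available = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)


if __name__ == "__main__":
    print(f"PyTorch will use: {detect_device()}")
```

If this prints cpu on a machine with a GPU, PyTorch was most likely installed without CUDA support.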
Conda
Conda is an environment and package manager that is widely used within the Python community. It can create a fully isolated environment that includes system packages, and not just Python libraries. You can install conda using the instructions here.
We provide a frozen conda environment file that you can install (on Linux) as follows:
conda env create --name tcd --file environment.yml
and then install the pipeline:
pip install -e ".[test]"
If installing Detectron2 fails because nvcc isn't detected (typically because the wrong version of CUDA is being located), try the following:
conda activate tcd
export CUDA_HOME=$CONDA_PREFIX
conda env update --name tcd --file environment.yml
Building your own conda env
If you need to create your own environment, or you want to try different versions of libraries, this is the process we use to create the environment.yml file. First, create and activate an environment:
conda create -n tcd python=3.12
conda activate tcd
Using conda on OSX
On Macs with ARM processors (e.g. M1/M2/M3), you should make sure that conda uses packages appropriate for your architecture:
CONDA_SUBDIR=osx-arm64 conda create -n tcd python=3.12
conda activate tcd
conda config --env --set subdir osx-arm64
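To confirm that the resulting environment really runs a native ARM64 Python rather than an x86_64 build under Rosetta, you can check what the interpreter reports. This is a small sketch using only the standard library; is_native_arm_mac is a hypothetical helper name.

```python
import platform


def is_native_arm_mac(system: str, machine: str) -> bool:
    """True when the interpreter reports macOS on an ARM64 (Apple silicon) CPU."""
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    system, machine = platform.system(), platform.machine()
    if is_native_arm_mac(system, machine):
        print("Python is running natively on ARM64.")
    elif system == "Darwin":
        print(f"Python reports '{machine}'; the environment may be using x86_64 packages.")
    else:
        print(f"Not macOS ({system}); this check does not apply.")
```

Run it with the tcd environment activated, since the result depends on which Python binary is on your path.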
For Linux/Windows users, install pytorch and CUDA:
conda install pytorch torchvision pytorch-cuda=12.1 cuda=12.1 -c pytorch -c nvidia
conda install gcc=12.3.0 gxx=12.3.0 libstdcxx-ng=12.3.0 -c conda-forge
At this point, you should check that the CUDA home path is inside the conda environment:
which nvcc # should return something like /home/josh/miniconda3/envs/tcd/bin/nvcc
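The same check can be scripted. The sketch below compares the nvcc found on your PATH with the CONDA_PREFIX variable that conda sets when an environment is active; nvcc_inside_env is a hypothetical helper, not part of the pipeline.

```python
import os
import shutil
from pathlib import Path


def nvcc_inside_env(nvcc_path, env_prefix) -> bool:
    """Return True if nvcc_path lies somewhere under env_prefix."""
    if not nvcc_path or not env_prefix:
        return False
    nvcc = Path(os.path.abspath(nvcc_path))
    prefix = Path(os.path.abspath(env_prefix))
    return prefix in nvcc.parents


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")                # None if nvcc is not on PATH
    prefix = os.environ.get("CONDA_PREFIX")    # None if no conda env is active
    if nvcc_inside_env(nvcc, prefix):
        print(f"OK: {nvcc} is inside {prefix}")
    else:
        print(f"nvcc is {nvcc!r}, which is not inside the active env {prefix!r}")
```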
Installing torch on OSX
On Macs you don't need to worry about CUDA:
conda install pybind11 pytorch torchvision -c pytorch
pybind11 is another dependency of Detectron2, and on OS X you may sometimes need to install it manually in order to build the library.
We're also aware of some issues with pyarrow and apache-arrow on newer versions of OS X. If you run into this issue, try to install everything with conda directly instead of with Homebrew.
Wait, isn't pytorch-cuda enough?
The reason that cuda is included here is that it's a requirement to build Detectron2, which depends on nvcc, the CUDA compiler. It's important that you install the same version of CUDA that PyTorch is expecting, so here we install everything in one step so that conda solves the dependencies for us. You may not need to specify the version of the cuda package, as long as you install it at the same time as pytorch.
We are working on a release of the pipeline that doesn't require Detectron2 to be installed, as it's a relatively heavy dependency.
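If you want to confirm that the CUDA toolkit on your path matches the version PyTorch was built against, a sketch along these lines compares the output of nvcc --version with torch.version.cuda. The helper names are hypothetical, and requiring an exact major.minor match is an assumption that may be stricter than a given build actually needs.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output: str):
    """Extract (major, minor) from text such as 'release 12.1, V12.1.105'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def versions_match(torch_cuda, nvcc_release) -> bool:
    """Compare torch.version.cuda (e.g. '12.1', or None on CPU builds) with nvcc."""
    if torch_cuda is None or nvcc_release is None:
        return False
    major, minor = torch_cuda.split(".")[:2]
    return (int(major), int(minor)) == nvcc_release


if __name__ == "__main__":
    import torch  # requires PyTorch to be installed

    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    release = parse_nvcc_release(result.stdout)
    print(f"torch CUDA: {torch.version.cuda}, nvcc release: {release}")
    print("match" if versions_match(torch.version.cuda, release) else "MISMATCH")
```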
Assuming you've cloned the repository, you can then install the requirements using pip (in principle you can also do this directly with conda, but pip is generally a lot faster).
pip install -r requirements.txt
Now you can export the environment to a file:
conda env export -f environment.yml
and install the pipeline itself:
pip install -e ".[test]"
Note
You will need to adjust the detectron2 requirement in the environment file, as it won't be a git path any more. If you freeze the environment after installing the pipeline, you will also need to remove the pipeline's own entry, because when you re-install from the environment file, pip won't know where to locate it. This will be fixed once we release a pip-installable package for the pipeline.
Using pip, pyenv, etc.
Warning
It is strongly recommended that you use pip inside a virtual or conda environment. If you're on Linux, do not install the pipeline into your system environment (recent versions of Debian/Ubuntu will warn against this). That said, there are some times when this may be appropriate.
conda isn't always appropriate, and it can be difficult to work with inside containers or other lean environments. Note that for this method to work, you need a system-wide CUDA install so that nvcc can be found. This is the approach that we use on GitHub for automated testing and within Docker.
Here's how to install the pipeline using "plain" Python with virtualenv (we also install GDAL, which is required for some libraries like rasterio):
sudo apt install python3 python3-virtualenv gdal-bin
python3 -m venv tcd_env
source tcd_env/bin/activate
pip install --upgrade pip setuptools wheel
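Before installing anything further, you may want to confirm that the Python you're running really belongs to an isolated environment. The sketch below uses the standard sys.prefix / sys.base_prefix comparison for virtual environments; checking for a conda-meta directory to recognise conda environments is an assumption about conda's layout, and both helper names are hypothetical.

```python
import os
import sys


def in_virtualenv(prefix: str, base_prefix: str) -> bool:
    """Inside a venv/virtualenv, sys.prefix differs from sys.base_prefix."""
    return prefix != base_prefix


def in_conda_env(prefix: str) -> bool:
    """Assumption: conda environments contain a 'conda-meta' directory."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


if __name__ == "__main__":
    if in_virtualenv(sys.prefix, sys.base_prefix):
        print(f"Virtual environment active: {sys.prefix}")
    elif in_conda_env(sys.prefix):
        print(f"Conda environment active: {sys.prefix}")
    else:
        print("This looks like a system Python; consider activating an environment first.")
```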
Getting GDAL on Mac
If you run into dependency problems on Mac, Homebrew is sometimes your friend:
brew install python virtualenv gdal
Getting things to install on OS X can be interesting. Some people find that they need Homebrew to help out; others find that it conflicts with everything they're trying to do and that it's best to ignore it (e.g. you can call unset DYLD_LIBRARY_PATH to temporarily remove brew from your library path, so that your conda env gets used instead). For example, if you start to see library version mismatches or other arcane-looking errors, try to let conda deal with everything. This seems to be particularly relevant on newer versions of OS X like Sequoia.
Warning
Don't be tempted to omit the last pip install command. Having wheel is critical to install Detectron2, and it turns out that it's not always included in base Python environments. If you see the following error when trying to install the requirements, it's probably because you didn't install wheel first.
ModuleNotFoundError: No module named 'torch'
Then, install torch using the latest instructions from PyTorch's website. Torch should detect your system CUDA installation:
pip install torch torchvision
Check that torch is installed with CUDA support, if you were expecting it:
$ python
>>> import torch
>>> torch.cuda.is_available()
True
Then, as above, install the requirements and the pipeline:
pip install -r requirements.txt
pip install -e ".[test]"
Docker
We provide Dockerfiles that have the pipeline pre-installed. We use the pytorch/pytorch base image, which comes with CUDA support and has torch already installed. The Dockerfile simply adds the library dependencies on top and contains a clone of the repository.
Pulling from GitHub Container Registry
TBD.
Building containers:
cd docker
./build.sh Dockerfile
On ARM64 (e.g. M-series Macs with ARM silicon):
cd docker
./build.sh Dockerfile_arm
Verifying the install
The most comprehensive way to check that you've installed everything is to run the test suite from the root directory of the repository:
pytest
In the process of running the tests, the training dataset will be downloaded and cached, as well as most of the models. This will take between 5 and 10 GB of disk space. When we release updates, it's important that we check that the dataset can be obtained automatically, but if you don't want or need it, you can skip to the next section and try to run some predictions instead.
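Since the downloads are fairly large, it can be worth checking free disk space before starting the tests. This sketch checks the disk that holds the current directory against the 10 GB upper estimate; the cache may live on a different disk on your system, so treat the path as an assumption. has_enough_space is a hypothetical helper.

```python
import shutil

GB = 10**9  # decimal gigabytes


def has_enough_space(free_bytes: int, required_gb: float = 10.0) -> bool:
    """Return True if free_bytes covers the required number of gigabytes."""
    return free_bytes >= required_gb * GB


if __name__ == "__main__":
    # Assumption: the dataset and model caches end up on the same disk as ".".
    free = shutil.disk_usage(".").free
    print(f"Free space: {free / GB:.1f} GB")
    if not has_enough_space(free):
        print("Less than 10 GB free; the test downloads may not fit.")
```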
Building docs
The documentation pages you're reading now are also included in the repository. To build them, run:
pip install -e ".[docs]"
and then
mkdocs serve
We use MkDocs with the beautiful Material theme which provides a lot of very nice features for markup.
What next?
Once you've installed the pipeline, it's time to try out some models! Head on over to the prediction documentation.