Apply image classification (supervised or unsupervised)

Modified on Thu, 11 Jan 2024 at 04:53 PM

In Earth observation, one of the most common tasks applied to satellite data is "classification": grouping similar features within a satellite image into tangible thematic classes. In Earth Blox, the focus is on classifying a satellite image collection based on the (spectral) properties of each pixel and how they relate to ground cover classes of interest, such as woodland, agricultural land and urban areas.


If you already understand the principles of classification, jump to the Step-by-Step Guide.





Background: A quick primer on classification


The term “classification” refers to identifying groups of pixels that have similar image properties, with the aim of mapping landscapes and identifying areas (rather than just individual pixels) that are similar. In civilian remote sensing, this is best exemplified by crop mapping: the task is to place each field into a discrete category, such as wheat, barley or potatoes. Each pixel in the image is assigned to one of these discrete classes.


When a classification approach is used to characterise something that actually varies on a continuous numeric (ratio) scale, the method is more appropriately called thresholding. For example, the boundary between natural forest and woodland can be poorly defined and difficult to measure, even on the ground. Mapping it uses a classification approach (a discrete set of output classes) to draw arbitrary boundaries within an otherwise continuous variable. Here, we will stick to the example of crop mapping, where classification has a strong track record: agricultural fields have clear boundaries and discrete classes.
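To make the idea of thresholding concrete, here is a minimal Python sketch that places a continuous value into discrete classes using fixed cut-offs. The variable (canopy cover in percent), the cut-off values and the class names are hypothetical, chosen only for illustration; they are not definitions used by Earth Blox.

```python
from bisect import bisect_right

# Hypothetical cut-offs on a continuous variable (canopy cover, in percent).
# Values below 10 are "open land", 10 up to (but not including) 60 are
# "woodland", and 60 or more are "forest".
CUTOFFS = [10.0, 60.0]
LABELS = ["open land", "woodland", "forest"]

def threshold(cover_percent: float) -> str:
    """Place a continuous value into one of len(CUTOFFS) + 1 discrete classes."""
    return LABELS[bisect_right(CUTOFFS, cover_percent)]

for cover in [5.0, 59.9, 60.0, 85.0]:
    print(cover, threshold(cover))
```

Moving a cut-off (say from 60 to 50) changes the class of every value in between, which is exactly the arbitrariness described above.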


Below are three images from Sentinel 2 over a rural location west of Regina in Saskatchewan, Canada.  Sentinel 2 has 13 spectral bands, but we are just showing three to keep the description easy to follow.  From left to right, these are green, red and NIR bands.


 


Each band is slightly different. The NIR looks the brightest, which is what we would expect for an area covered in vegetation (healthy vegetation reflects strongly in the NIR part of the spectrum). But each field has a different shade, and if you look closely, you will see that the fields do not keep the same relative brightness across the bands. Let us zoom in on a small area to see this in detail.



These three pictures are a close-up of the above images.  Now we can look at individual fields and consider how they vary across each spectral band. Note how these three fields all exhibit different patterns across the bands:

  • Field 1 is comparatively dark in all three images
  • Field 2 is comparatively light in green and NIR, but dark in the red channel.
  • Field 3 is comparatively light in all three images.

These differences can have several causes. Perhaps the crops are all the same but at different stages of growth, or perhaps some are fully grown and others have been harvested. Ideally, the differences will only reflect different crop types, and we can use this information to map (and quantify) the area of each crop.


To illustrate how classification works, let us simplify even further and look only at the red and NIR bands, plotting the pixel values on a pair of axes, one for each band (the left-hand figure below). Fields 1 and 2 are both dark in the red, but they are separated in the NIR, where Field 2 is brighter than Field 1. And while Fields 2 and 3 are both bright in the NIR, Field 3 is brighter than Field 2 in the red. This is the principle by which we group the pixels: every pixel close to the centre of the Field 1 cluster is classed the same as Field 1, every pixel close to the centre of the Field 2 cluster as Field 2, and so on.
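The nearest-centre rule can be sketched in a few lines of Python. The (red, NIR) centres below are made-up reflectance values that only mimic the pattern described for Fields 1 to 3; they are not measured from the images.

```python
import math

# Hypothetical (red, NIR) reflectance centres mimicking the pattern in the text.
CENTRES = {
    "Field 1": (0.04, 0.20),  # dark in red, darker in NIR
    "Field 2": (0.04, 0.45),  # dark in red, bright in NIR
    "Field 3": (0.15, 0.45),  # brighter in red, bright in NIR
}

def nearest_centre(red: float, nir: float) -> str:
    """Return the field whose centre is closest to the pixel in (red, NIR) space."""
    return min(CENTRES, key=lambda name: math.dist((red, nir), CENTRES[name]))

print(nearest_centre(0.05, 0.22))  # Field 1
print(nearest_centre(0.03, 0.43))  # Field 2
print(nearest_centre(0.14, 0.40))  # Field 3
```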




The challenge is that real data are not so neat. The second figure shows a more realistic scenario with a wide spread of pixel values. The pixels are colour-coded by the class they should belong to, but some of them lie in overlapping zones, and this is where classification errors happen. The overlap may come from imperfections in the image, such as cloud contamination, or from variation within the fields, such as different crop densities or pixels that straddle the boundary between two classes.


Classification is well suited to landscapes with well-defined, discrete areas that can each be allocated to a category. In more natural landscapes that vary continuously, classification works less well: there are no natural groupings, so the pixels do not clump together either.


With data such as Sentinel 2 or Landsat, we are not constrained to two channels; we have several, and each additional band can improve the separation between fields. This is difficult to draw in three dimensions and impossible to draw in four or more, which is what we have when we classify with multiple bands.
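The same nearest-centre idea works in any number of dimensions, because Euclidean distance does not care how many bands there are. This sketch uses two hypothetical crops that are identical in red and NIR and differ only in green: with two bands the pixel is equally close to both, while adding the third band separates them. All values are invented for illustration.

```python
import math

# Hypothetical class centres as (green, red, NIR).
# Crops A and B are identical in red and NIR and differ only in green.
CENTRES = {
    "Crop A": (0.06, 0.05, 0.40),
    "Crop B": (0.10, 0.05, 0.40),
}

def distances(pixel, bands):
    """Euclidean distance from the pixel to each centre, using only the chosen band indices."""
    return {
        name: math.dist([pixel[i] for i in bands], [centre[i] for i in bands])
        for name, centre in CENTRES.items()
    }

pixel = (0.095, 0.06, 0.41)
print(distances(pixel, bands=[1, 2]))     # red + NIR only: a tie, no way to choose
print(distances(pixel, bands=[0, 1, 2]))  # green + red + NIR: Crop B is closer
```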


Supervised vs. unsupervised classification

There are two broad types of classification of satellite images. Unsupervised classification considers only the data, looking for common patterns across the multiple image dimensions (bands). Pixels that clump together in terms of their spectral properties are allocated to the same class.


Supervised classification is used when you have some knowledge of “the right answers”: you know which classes you are trying to map, and you can select some known areas to “train” the classification algorithm. Again, this is best illustrated with agricultural examples (although both methods can be used in other applications).


Note that classification needn’t just use spectral information. It can also include temporal variation, texture, or (for radar sensors) polarisation channels.
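One way temporal variation can enter a classification is to treat each date as an extra dimension, much like an extra band. The sketch below assumes monthly compositing by averaging the cloud-free observations in each month (None marks a masked observation); the values are hypothetical, and other composite statistics, such as the median, are equally possible.

```python
from statistics import mean

# Hypothetical NIR observations for one pixel, grouped by month.
# None marks an observation removed by the cloud mask.
observations = {
    "2019-04": [0.18, None, 0.20],
    "2019-05": [0.25, 0.27],
    "2019-06": [None, 0.41, 0.43],
}

def monthly_feature_vector(obs):
    """Average the unmasked values of each month, in chronological order.

    A month with no cloud-free observations yields None.
    """
    vector = []
    for month in sorted(obs):  # "YYYY-MM" keys sort chronologically
        valid = [v for v in obs[month] if v is not None]
        vector.append(mean(valid) if valid else None)
    return vector

# One value per month: the pixel's NIR "growth curve" becomes three features.
print(monthly_feature_vector(observations))
```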


Unsupervised Classification

If we do not know what is in the image scene, we have to use unsupervised classification. The analysis looks for statistical groupings of the pixels across the bands being used. The only user input is the number of groupings to find; the groups themselves are defined by the data, not the user.
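The sketch below is a minimal version of K Means (Lloyd's algorithm), one of the unsupervised methods listed later in this article. It is not the Weka implementation that Earth Blox uses; it only illustrates the principle that the user chooses the number of clusters k and the data decide where the clusters lie. The pixel values are hypothetical (red, NIR) pairs.

```python
import math
import random

def k_means(pixels, k, iterations=20, seed=0):
    """Group pixels (tuples of band values) into k clusters with Lloyd's k-means."""
    rng = random.Random(seed)
    centres = rng.sample(pixels, k)  # start from k randomly chosen pixels
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in pixels]
        # Update step: move each centre to the mean of its member pixels.
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:  # keep the old centre if a cluster ends up empty
                centres[j] = tuple(sum(vals) / len(members) for vals in zip(*members))
    return labels, centres

pixels = [(0.04, 0.20), (0.05, 0.21), (0.04, 0.19),
          (0.15, 0.45), (0.16, 0.44), (0.14, 0.46)]
labels, centres = k_means(pixels, k=2)
print(labels)  # the first three pixels share one cluster id, the last three another
```

The cluster numbers themselves carry no meaning; it is up to you to interpret what each cluster represents on the ground.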

 

Supervised Classification

When we already know what some of the features on the ground are, we can use supervised classification. This lets you use a small area with field data to “train” the classification algorithm to recognise the same categories over a much larger area. For example, you may know which crop is being grown in some of the fields. In supervised classification, we use this information to label the spectral groupings in the data based on the pixels we know. The other pixels in the same grouping are then given the same class.
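The labelling step described above can be sketched as a majority vote: each spectral grouping takes the name of the most common known class among its pixels. This is a simplified illustration with hypothetical cluster numbers and crop names. The supervised methods in Earth Blox (CART, Random Forest and so on) instead learn decision rules directly from the training pixels.

```python
from collections import Counter

def label_clusters(cluster_ids, known_labels):
    """Name each cluster after the most common known class among its pixels.

    cluster_ids:  cluster number for every pixel (e.g. from a clustering step)
    known_labels: dict of pixel index -> class name, for pixels with field data
    """
    votes = {}
    for index, crop in known_labels.items():
        votes.setdefault(cluster_ids[index], Counter())[crop] += 1
    names = {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}
    # Pixels in clusters that contain no known pixels stay unlabelled (None).
    return [names.get(c) for c in cluster_ids]

cluster_ids = [0, 0, 1, 1, 1, 2]
known = {0: "wheat", 3: "barley"}
print(label_clusters(cluster_ids, known))
# ['wheat', 'wheat', 'barley', 'barley', 'barley', None]
```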

 


Classification on Earth Blox - Step-by-Step Guide


Step 1: Find the Classification Blocks

  • In Earth Blox, the classification blocks are under the Classify section of the Toolbox.
  • There are two main blocks: one for supervised and one for unsupervised classification. We will look at the unsupervised first, as that is much simpler.


Step 2: Download the example area files

The area we are going to explore is just west of Regina, Saskatchewan. It is a good example because the fields are large and regular, and because we were able to find corroborating data on which crops are grown in which fields.


To follow this example, use the area-of-interest files that are attached to this article (at the bottom of this page).  You should download the following two files:


  • crop-classification-Area1-Regina-Canada.geojson
  • Regina-area-collection.geojson


The first file contains the area that we are going to classify. The second is an area collection that contains the training areas for 5 different crops. Both are GeoJSON files, a standard text format that stores the coordinates of each area's boundary. Save them somewhere handy.
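GeoJSON is plain JSON, with each position stored as longitude then latitude, so you can inspect the files with any JSON reader. This minimal sketch computes the bounding box of the polygons in a FeatureCollection. The inline example uses hypothetical coordinates, not those of the attached file, and the function skips any geometry that is not a simple Polygon.

```python
import json

# A tiny stand-in for an AOI file: one rectangular polygon west of Regina.
# The coordinates are hypothetical, not those of the attached file.
EXAMPLE = json.loads("""
{"type": "FeatureCollection",
 "features": [{"type": "Feature", "properties": {"name": "Area 1"},
   "geometry": {"type": "Polygon", "coordinates": [[
     [-105.1, 50.4], [-104.9, 50.4], [-104.9, 50.5], [-105.1, 50.5], [-105.1, 50.4]]]}}]}
""")

def bounding_box(feature_collection):
    """Return (min_lon, min_lat, max_lon, max_lat) over all Polygon features."""
    lons, lats = [], []
    for feature in feature_collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # this sketch skips other geometry types (e.g. MultiPolygon)
        for ring in geometry["coordinates"]:  # outer boundary first, then any holes
            for position in ring:
                lons.append(position[0])  # GeoJSON stores longitude first...
                lats.append(position[1])  # ...then latitude
    return min(lons), min(lats), max(lons), max(lats)

print(bounding_box(EXAMPLE))  # (-105.1, 50.4, -104.9, 50.5)

# With the downloaded file instead:
# with open("crop-classification-Area1-Regina-Canada.geojson") as f:
#     print(bounding_box(json.load(f)))
```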



Unsupervised Classification


Step 1: Upload the area of interest

  • Upload the area of interest. In a new project, click Add Area of Interest in the box in the top left of the map window.
  • Upload the file: crop-classification-Area1-Regina-Canada.geojson.
  • When you open this file, the map will zoom to a box in the middle of Canada, so you will need to zoom out to see the location in context.
  • The area is labelled Area 1 by default, but you can rename it if you wish.
  • If you switch on the Satellite view (from the basemap drop-down menu) and look at the area of interest, you will see that it is mostly agricultural fields, with a couple of rivers running through the area. We are not going to classify this background satellite image, as it only has the visible bands. Instead, we will use Sentinel 2 and take advantage of its NIR bands, which we know are useful for picking out details about vegetation.

Step 2: Import the Sentinel 2 data and make it available to use

  • In Earth Blox, we always need to have an input data workflow first. The input data workflow is created using the Use This Dataset container block. Follow the instructions for Building A Workflow to create a data input block for Sentinel 2, and ensure you:
    • Add an area block and choose Area 1 from the map. 
    • Select dates from 1 April 2019 to 30 Sept 2019 (to avoid any risk of the winter snow having an impact).
    • Add a Mask Out Clouds block, and choose the custom option, but don't change any of the parameters.
    • Aggregate the data by month.
    • If you want to see the data as a time series, also include an Add Map Layer block.
  • The final important step for the data workflow is to include a Save Data for Re-Use block (available in OUTPUT->IMAGE OUTPUT). Choose a sensible name to save this dataset.
  • Your data workflow should then look like this:

Step 3: Use the Unsupervised Classification block

  • Bring the Unsupervised classification block into the workspace. This block performs a cluster analysis using machine learning: it examines all the available input data and looks for places where pixels cluster together with similar properties.
  • Now add the Re-use This Saved Dataset block into the top space of the classification block (available from INPUT->INSERT DATASET).  The drop-down menu should automatically give the option of choosing the named dataset from your data workflow.
  • Finally, to view the results of the classification, add an Add Map Layer block to the lower space in the classification block.  
  • The classification block should now look like this:


  • When you click RUN WORKFLOW it will now analyse the data to look for “clumps” (clusters) of pixels that have similar spectral properties (the Earth Blox classification only uses spectral response).  Each of these clusters is assigned a class.
  • In the result, you will see that some fields have a consistent enough spectral response that their pixels end up in the same class.
  • This is a useful approach when investigating data that you haven’t seen before, and are looking to see patterns in the data.
  • The Earth Blox blocks use the Weka clustering algorithms available in Earth Engine to implement unsupervised classification. Weka is a collection of machine learning algorithms developed at the University of Waikato.
  • The classification methods available are:
    • Cascade K Means
    • Cobweb
    • K Means
    • LVQ (Learning Vector Quantization)
    • X Means
  • Each of these methods has one or more parameters that can refine the algorithm in some way.  Most Earth Blox users will be happy to use the defaults for most values, but for the "number of classes" variable it is often worth experimenting.
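When experimenting with the number of classes, it helps to have a score for comparing runs. One common choice is the Calinski-Harabasz index, the ratio of between-cluster to within-cluster dispersion (higher is better); as we understand it, Weka's Cascade K Means uses this criterion to pick the number of clusters automatically. The sketch below computes it for two hypothetical labellings of the same invented (red, NIR) pixels.

```python
import math

def calinski_harabasz(pixels, labels):
    """Between-cluster vs within-cluster dispersion ratio (higher = better separated)."""
    n, clusters = len(pixels), sorted(set(labels))
    k = len(clusters)
    if not 2 <= k < n:
        raise ValueError("need at least 2 clusters and more pixels than clusters")
    dims = range(len(pixels[0]))
    overall = [sum(p[d] for p in pixels) / n for d in dims]
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(pixels, labels) if lab == c]
        centre = [sum(p[d] for p in members) / len(members) for d in dims]
        between += len(members) * math.dist(centre, overall) ** 2
        within += sum(math.dist(p, centre) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))

# Three hypothetical groups of (red, NIR) pixels.
pixels = [(0.04, 0.20), (0.05, 0.21), (0.04, 0.19),
          (0.04, 0.45), (0.05, 0.44), (0.04, 0.46),
          (0.15, 0.45), (0.16, 0.44), (0.14, 0.46)]
three = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # one class per group
two = [0, 0, 0, 1, 1, 1, 1, 1, 1]    # last two groups merged
print(f"3 classes: {calinski_harabasz(pixels, three):.1f}")
print(f"2 classes: {calinski_harabasz(pixels, two):.1f}")
```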


Step 4: Display the results on the map and as a table

  • Choose the output band and a suitable colour table on the Add Map Layer block (the default colour table will separate all the classes in different colours).
  • The map output shows the classified image. 




Supervised Classification


Step 1: Upload the area of interest

  • First, upload the area of interest. With a freshly opened Earth Blox window, click Add Area of Interest+ in the box in the top left of the map window.
  • Upload the file: crop-classification-Area1-Regina-Canada.geojson.
  • When you open this file, the map will zoom to a box in the middle of Canada, so you will need to zoom out to see the location in context.
  • The area is labelled Area 1 by default, but you can rename it if you wish.
  • If you switch on the SATELLITE view (from the basemap drop-down menu) and look at the area of interest, you will see that it is mostly agricultural fields, with a couple of rivers running through the area. We are not going to classify this background satellite image, as it only has the visible bands. Instead, we will use Sentinel 2 and take advantage of its NIR bands, which we know are useful for picking out details about vegetation.


Step 2a: Build the area collection that includes the class sample areas

  • Click +Add new area collection, then click the downward arrow to get the option ADD NEW AREA. Click it and upload the second file you downloaded, Regina-area-collection.geojson (available at the bottom of this article).
  • The upload window will give you the option to split the dataset. Do this, choosing name as the attribute to split by. Splitting ensures that the sample areas for each crop type appear as a separate area in Earth Blox.
  • Once you have loaded and split the data, you will see the five crop types in the Area Collection box, and the map window will show boxes in 5 different colours.
  • Each colour on the map corresponds to one area in the collection, named after the crop that we know was planted in those boxes.
  • These are the sample areas that act as the training data for the supervised classification. It is called "supervised" because we know where some of the crops are and use that knowledge to guide the algorithm.
  • Your map window should now look like this:
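Splitting by an attribute amounts to grouping the features of the GeoJSON file by the value of one property. Here is a minimal Python sketch of that grouping, assuming each feature carries a name property as in the attached collection; the crop names, coordinates and the training_box helper are hypothetical. It illustrates the idea rather than Earth Blox's own upload code.

```python
from collections import defaultdict

def split_by_attribute(feature_collection, attribute="name"):
    """Group features into one FeatureCollection per value of `attribute`."""
    groups = defaultdict(list)
    for feature in feature_collection["features"]:
        groups[feature["properties"][attribute]].append(feature)
    return {
        value: {"type": "FeatureCollection", "features": features}
        for value, features in groups.items()
    }

def training_box(name, lon, lat, size=0.01):
    """Hypothetical helper: a small square Polygon feature labelled with a crop name."""
    ring = [[lon, lat], [lon + size, lat], [lon + size, lat + size], [lon, lat + size], [lon, lat]]
    return {"type": "Feature", "properties": {"name": name},
            "geometry": {"type": "Polygon", "coordinates": [ring]}}

collection = {"type": "FeatureCollection", "features": [
    training_box("wheat", -105.05, 50.42),
    training_box("barley", -105.00, 50.45),
    training_box("wheat", -104.95, 50.47),
]}
areas = split_by_attribute(collection)
for name, area in areas.items():
    print(name, len(area["features"]), "polygon(s)")
# wheat 2 polygon(s)
# barley 1 polygon(s)

# With the downloaded file instead:
# import json
# with open("Regina-area-collection.geojson") as f:
#     areas = split_by_attribute(json.load(f))
```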



Step 2b: Adding your own training data

  • To add your own training data, first, select an Area of Interest (AOI) on the map (or upload your area file). 
  • Then click on +Add New Area Collection, and click on the downward arrow. 
  • You can then click ADD NEW AREA+ for each training area you want to add. Remember that one "area" can include many different polygons on the map (just as, in the map above, many of the boxes belong to the same blue area).
  • Rename the area to keep track of each sample. 
  • When you are done, you can also Export Collection and save all of the training areas in one file (with their label). 


It is important that your training polygons lie within the boundary of your Area of Interest. Training polygons outside of your AOI will cause an error.


You must have training polygons for all classes. For example, in a binary classification you must have training polygons for both classes.
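Both rules can be checked before you run the workflow. This sketch assumes a rectangular AOI (like the box in this example) given as (min_lon, min_lat, max_lon, max_lat), and training polygons as lists of (lon, lat) vertices; because a rectangle is convex, a polygon lies inside it exactly when all its vertices do. The function name, class names and coordinates are hypothetical.

```python
def check_training_areas(aoi_box, training, required_classes):
    """Return a list of problems found in the training data (empty if none).

    aoi_box:          (min_lon, min_lat, max_lon, max_lat) of a rectangular AOI
    training:         dict of class name -> list of polygons, each a list of (lon, lat)
    required_classes: every class the classification should produce
    """
    min_lon, min_lat, max_lon, max_lat = aoi_box
    problems = []
    for name in required_classes:
        if not training.get(name):
            problems.append(f"no training polygons for class '{name}'")
    for name, polygons in training.items():
        for i, polygon in enumerate(polygons):
            # A rectangle is convex, so a polygon lies inside it if all its vertices do.
            if not all(min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
                       for lon, lat in polygon):
                problems.append(f"polygon {i} of class '{name}' extends outside the AOI")
    return problems

aoi = (-105.1, 50.4, -104.9, 50.5)  # hypothetical rectangular AOI
training = {
    "crop": [[(-105.0, 50.45), (-104.99, 50.45), (-104.99, 50.46), (-105.0, 50.45)]],
    "not crop": [[(-104.95, 50.48), (-104.85, 50.48), (-104.85, 50.49), (-104.95, 50.48)]],
}
for problem in check_training_areas(aoi, training, ["crop", "not crop"]):
    print(problem)
# polygon 0 of class 'not crop' extends outside the AOI
```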



Step 3: Import the Sentinel 2 data and make it available to use for the classification block

  • In Earth Blox, we always need to have an input data workflow first.  The input data workflow is created using the Use This Dataset container block.  Follow the instructions for Building A Workflow to create a data input block for Sentinel 2 and ensure you:
    • Add an area block and choose Area 1 from the map. 
    • Select dates from 1 April 2019 to 30 Sept 2019 (to avoid any risk of the winter snow having an impact).
    • Add a Mask Out Clouds block, and choose the custom option, but don't change any of the parameters.
    • Aggregate the data by month.
    • If you want to see the data as a time series, also include an Add Map Layer block.
  • The final important step for the data workflow is to include a Save Data for Re-Use block (available in OUTPUT->IMAGE OUTPUT). Choose a sensible name to save this dataset.
  • Your data workflow should then look like this:



Step 4: Build the supervised classification workflow

  • Bring the Supervised classification block into the workspace. 
  • This block trains a machine learning classifier on the training data that you supply. It looks at the pixels associated with each class in the training data, learns which data properties they have in common, and then assigns every other pixel to the class it most resembles. The number of classes is defined by the training data.
  • Now add the Re-use This Saved Dataset block into the top space of the classification block (available from INPUT->INSERT DATASET).  The drop-down menu should automatically give the option of choosing the named dataset from your data workflow.
  • To select the training data in the Area Collection, use the drop-down menu to select the name of your collection. If there is only one collection available, it will automatically choose that one. 
  • Finally, to view the results of the classification, add an Add Map Layer block to the lower space in the classification block.  
  • The classification block should now look like this:


  • Now click RUN WORKFLOW (or Run on the block itself). A map of the classified image will appear in the map window.
  • The classification methods available are:
    • CART (Classification and Regression Trees) is a form of decision tree model.
    • Random Forest is another decision tree method. It builds many decision trees, each from a random sample of the training data, and assigns each pixel the class chosen by the majority of the trees. See Breiman (2001).
    • Minimum Distance is the simplest method: it assigns each pixel to the class whose mean spectral signature in the training data is closest.
    • SVM (Support Vector Machine) is a supervised learning model that finds the boundaries that best separate the classes in feature space. See Burges (1998).
  • Each of these methods has one or more parameters that can refine the algorithm in some way.  Most Earth Blox users will be happy to use the defaults for most values, but you can experiment by selecting the options icon. Without a priori information about the physical nature of the problem, optimal parameters are difficult to identify in advance.
  • In most cases, the results from these 4 methods will be broadly similar (since it is mostly the data, not the method, that constrains the solution), but we recommend comparing the outputs to find the best solution for your particular area.
  • You can find out more detail about what is going on "under the hood" by looking at the Earth Engine description:  https://developers.google.com/earth-engine/guides/classification.
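A simple way to compare the outputs of two methods is to count how many pixels they agree on and cross-tabulate where they differ. In this sketch the two classified maps are hypothetical and flattened into lists of class labels covering the same pixels.

```python
from collections import Counter

def compare_maps(map_a, map_b):
    """Compare two classified images pixel by pixel.

    Returns the fraction of pixels with the same class, and a Counter of
    (class in map_a, class in map_b) pairs, i.e. a cross-tabulation.
    """
    if len(map_a) != len(map_b):
        raise ValueError("maps must cover the same pixels")
    pairs = Counter(zip(map_a, map_b))
    agree = sum(count for (a, b), count in pairs.items() if a == b)
    return agree / len(map_a), pairs

random_forest = ["wheat", "wheat", "barley", "barley", "potatoes", "potatoes"]
min_distance = ["wheat", "barley", "barley", "barley", "potatoes", "wheat"]
agreement, pairs = compare_maps(random_forest, min_distance)
print(f"{agreement:.0%} of pixels agree")  # 67% of pixels agree
print(pairs.most_common())
```

Agreement between methods is not the same as accuracy: two methods can agree and both be wrong. Measuring accuracy requires independent reference data that were not used for training.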


Step 5: Display the results on the map and as a table

  • Choose the output band and a suitable colour table on the Add Map Layer block (the default colour table will separate all the classes in different colours).
  • The map output shows the classified image.  Each class is the same colour as the legend in the Area of Interest window.
  • A table of the classification results will be automatically generated and will be available for view in the DASHBOARD tab (which lies behind the map window tab).  
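If you want to turn a classified image into areas yourself, multiply the pixel count of each class by the area of one pixel. This sketch assumes 10 m pixels (the resolution of Sentinel 2's green, red and NIR bands); the classified pixels are hypothetical, and the sketch does not describe how Earth Blox computes its own table.

```python
from collections import Counter

PIXEL_SIZE_M = 10  # assumed pixel size in metres

def area_per_class(classified, pixel_size_m=PIXEL_SIZE_M):
    """Return the hectares covered by each class in a flattened classified image."""
    hectares_per_pixel = pixel_size_m ** 2 / 10_000  # 1 ha = 10,000 square metres
    return {cls: n * hectares_per_pixel for cls, n in Counter(classified).items()}

# 300 wheat pixels and 120 barley pixels at 10 m: roughly 3.0 ha and 1.2 ha.
print(area_per_class(["wheat"] * 300 + ["barley"] * 120))
```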


Supervised Classification with SAR data (and other data)

The good news is that you can apply the above steps for classification using any image data as input, including SAR (e.g. Sentinel 1).  You can also use elevation data such as SRTM to help classify your data.  To include more than one dataset in your classification do the following steps:

  • Build a dataset workflow and a classification workflow as described above.
  • Create a new dataset workflow for each dataset you want to add. 
  • Make sure you choose the same area.
  • Use the Save Data for Re-use block to make each new dataset available to the classification block. 
  • Within the classification block add a Re-use This Saved Dataset block for each of the datasets you want to use in the classification.
  • Now click on RUN WORKFLOW
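Conceptually, combining datasets gives each pixel a longer feature vector: its Sentinel 2 bands, followed by its SAR backscatter, followed by its elevation. This minimal sketch shows that stacking, assuming every dataset already covers the same pixels in the same order (which is why the same area matters). The band names follow the sensors' conventions (B4 red and B8 NIR for Sentinel 2, VV and VH for Sentinel 1), but all values are hypothetical.

```python
def stack_features(*datasets):
    """Combine per-pixel band values from several datasets into one feature vector per pixel.

    Each dataset is a dict of band name -> list of values, one value per pixel,
    and every dataset must cover the same pixels in the same order.
    """
    n_pixels = {len(values) for data in datasets for values in data.values()}
    if len(n_pixels) != 1:
        raise ValueError("all bands must cover the same pixels (same area and grid)")
    names = [name for data in datasets for name in data]
    columns = [values for data in datasets for values in data.values()]
    return names, [list(row) for row in zip(*columns)]

sentinel2 = {"B4": [0.05, 0.12], "B8": [0.41, 0.30]}      # red, NIR reflectance
sentinel1 = {"VV": [-11.2, -14.8], "VH": [-17.5, -21.0]}  # backscatter in dB
srtm = {"elevation": [571.0, 574.0]}                      # metres
names, features = stack_features(sentinel2, sentinel1, srtm)
print(names)     # ['B4', 'B8', 'VV', 'VH', 'elevation']
print(features)  # [[0.05, 0.41, -11.2, -17.5, 571.0], [0.12, 0.3, -14.8, -21.0, 574.0]]
```

Because the features are on very different scales (reflectance, decibels, metres), distance-based methods such as Minimum Distance can be dominated by the largest-valued feature, whereas decision tree methods such as CART and Random Forest are unaffected by the scale of individual features.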




