The Integration of Remote Sensing Data
into Geographic Information Systems

by Aubrey Weese
November 20, 2002
George Mason University


Introduction - the importance of remote sensing data to GIS

What is GIS?

A geographic information system (GIS) is any computerized tool used for characterizing landscapes that allows for quantitative analysis of environments. Applications of GIS technology are numerous and have been used in fields such as forestry, agriculture, water resource management, urban planning, facility siting, and environmental protection. GIS software development is now a multi-billion dollar international industry.

GIS provides many functions that make it such a widely used tool. For instance, it provides functions for precisely aligning map features so that maps of different projections, datums, and scales can be used together. The characteristics of each individual map can also be manipulated by categorizing features and applying mathematical operations. Proximity and distance to other features can be analyzed and quantified, providing new information. For example, elevation and distance can be combined to determine the local aspect and slope of the landscape.
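As a rough illustration of how slope and aspect can be derived from elevation and distance, the sketch below uses finite differences on a raster of elevations. The function name slope_aspect, the north-up grid orientation, and the use of NumPy are assumptions made for this example and do not come from any particular GIS package.

```python
import numpy as np

def slope_aspect(dem, cell_size):
    """Return slope (degrees) and aspect (degrees clockwise from north).

    Assumes row 0 is the northern edge and column 0 is the western edge,
    and that elevation and cell_size share the same units.
    """
    # np.gradient returns the derivative along rows first, then columns.
    dz_drow, dz_dcol = np.gradient(dem.astype(float), cell_size)
    dz_dx = dz_dcol      # x increases eastward with the column index
    dz_dy = -dz_drow     # y increases northward, opposite to the row index
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the compass bearing of steepest descent.
    # On perfectly flat cells it is undefined; this sketch reports 0 there.
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect

# A surface that rises 1 unit per cell toward the east faces west.
dem = np.tile(np.arange(5.0), (4, 1))
slope, aspect = slope_aspect(dem, cell_size=1.0)
```

Production GIS packages typically use a 3x3 neighborhood method rather than this simple central difference, so results near edges and on rough terrain may differ.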

Perhaps the most useful tool of all in GIS, however, is its ability to perform overlay operations between layers. Maps with nominal categories (e.g. water, corn field, forest, urban area) can be combined using boolean logic to find the intersection and union between features on different layers. Maps with numerical values (such as elevation or slope) can also be combined using mathematical operations. For example, a GIS might be used to find a good site for a power plant by recoding map layers for soils, slope, and proximity to cooling water and markets into suitability scores. These suitability maps could then be combined mathematically to create a map indicating the relative suitability for building a power plant throughout the entire region.
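The two kinds of overlay described above can be sketched on small raster layers. The class codes, the zoning layer, the scores, and the helper weighted_overlay are all hypothetical values and names chosen for illustration.

```python
import numpy as np

# Hypothetical nominal layer: 1 = water, 2 = corn field, 3 = forest, 4 = urban.
FOREST = 3
land_cover = np.array([[1, 2, 2],
                       [3, 3, 2],
                       [4, 3, 3]])
# Hypothetical second layer: cells where building is permitted.
zoning_ok = np.array([[True, True, False],
                      [True, False, True],
                      [True, True, True]])

# Boolean overlay: intersection (AND) and union (OR) of two conditions.
is_forest = land_cover == FOREST
forest_and_allowed = is_forest & zoning_ok
forest_or_allowed = is_forest | zoning_ok

def weighted_overlay(scores, weights):
    """Combine equally shaped suitability layers into a weighted average."""
    stacked = np.stack([np.asarray(s, dtype=float) for s in scores])
    w = np.asarray(weights, dtype=float)
    # Normalizing the weights keeps the result on the input score scale.
    return np.tensordot(w / w.sum(), stacked, axes=1)

# Arithmetic overlay of two made-up suitability layers scored 0 to 10.
soil_score = np.array([[10, 0], [5, 5]])
slope_score = np.array([[0, 10], [5, 5]])
suitability = weighted_overlay([soil_score, slope_score], weights=[1, 1])
```

A weighted average is only one way to combine scores; multiplying layers, for instance, would let a score of zero in any layer rule a cell out entirely.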

Where remote sensing comes in

Remote sensing data usually forms a critical component of this process. In fact, almost all the maps created by the US government are based on satellite remote sensing or aerial photography. More and more of this imagery is becoming available to users due to technological advances and legislative changes. The repeal of the Landsat Commercialization Act significantly lowered the price of satellite imagery (Smith 1). Also, remote sensing has now been around long enough to allow the study of temporal changes on the land surface (such as land-use, vegetation, and urban change) over a period of 25+ years. Because of these factors, remotely sensed imagery is today one of the fastest growing sources of raster GIS data. The merging of remote sensing data into GIS creates a synergy in which the GIS makes it easier to extract information from remotely sensed data, and the remotely sensed data keeps the GIS up to date with actual environmental conditions. Examples of the types of information remote sensing can contribute to a GIS include:

· Land cover
· Land Use
· Digital Elevation Models
· Vegetation Type
· Habitat Characteristics
· Biophysical Parameters
· Surficial Geology
· Flood Extent (McGwire 1)

Figure 1. Satellite Image Layers in GIS
NASA Remote Sensing Tutorial, Section 15

Figure 1 shows how remote sensing data might be incorporated with other data layers in GIS to formulate new information in a final product.

Remote sensing data is typically used to create the original map used in a GIS. It provides the most cost effective and timely method of collecting environmental data over large areas. This is significant because the data entry required to create a GIS database is often the most expensive part of the process, accounting for as much as 70 percent of the total cost.

Remotely sensed images imported into GIS can be either simple photographs or images taken from satellites in wavelengths beyond just visible light. Most satellite imagery is of the second type, and the sensors record in many different wavelengths (or "bands") at the same time, yielding multiple images of the same location on the ground. For example, the sensor on Landsat 7 has seven multispectral bands spanning the visible and infrared regions plus one panchromatic band that covers a broad range of wavelengths (Smith 2). Each of these bands creates an image that can be treated as a separate raster layer in GIS.

Visible and infrared data are used in a variety of GIS applications, the most common of which is vegetation and land-use classification. When using remote sensing data, there is always a tradeoff between spatial resolution and temporal coverage. Most satellites are polar orbiting, meaning that they circle the planet in a north-south orbit while the Earth rotates below them. That means there are only certain times when a particular place on the ground will be imaged. In order to have frequent temporal coverage, the satellite must cover a wide swath on the ground, which results in coarse resolution. (Resolution is the size of the smallest imaged element on the ground.) Depending on the needs of the project, GIS users can choose data from a variety of different sensors. For example, the Advanced Very High Resolution Radiometer (AVHRR) has 1.1 km pixels, and its images are 2400 km wide and collected every 12 hours. Landsat 7 has 15-30 meter pixels, but its images are only about 200 km wide and collected every 16 days. The IKONOS satellite has 4 meter pixels, but its images are only 11 km wide and are collected infrequently or by special request (Smith 3).

There are many software programs designed to import and analyze these images. The images can be imported into mainstream GIS packages such as ArcInfo and IDRISI, or they can be manipulated in software specially designed to handle remote sensing imagery, such as PCI, ENVI, ERDAS IMAGINE, and ERMapper. All of these programs are really just a specialized form of raster GIS.

For the remainder of my paper I will discuss some algorithms used to handle remote sensing data in a GIS and then I will discuss several GIS software packages that can be used to analyze remotely sensed images.


Part 2: Algorithms used to manipulate remotely sensed images

Remote sensing data is represented digitally in the form of matrices of pixels or cells, where each cell holds a number. The number represents the intensity of energy reflected or emitted by the Earth's surface within the ground area that the cell covers. These matrices of cells form raster images where the digital number is represented by a brightness level (in a grayscale image) or a color (if a color palette is used). Before these images can be used in GIS they require a number of processing operations. I will discuss some of the most common algorithms used to process satellite images below. The algorithms fall into 4 main categories, which are as follows:

1. Preprocessing
2. Image Enhancement
3. Image Transformation
4. Image Classification

Preprocessing

Preprocessing refers to operations that are required prior to the main data analysis. The two main types are radiometric correction and geometric correction. Radiometric correction is the removal of distortions in the amount of electromagnetic energy received by the satellite so that the reading reflects the true intensity reflected or emitted by the surface. This kind of correction is needed because of atmospheric attenuation of energy before it reaches the sensor, and because of sensor irregularities such as striping, scan line dropping, and random noise. Often this kind of correction is done on board the satellite or it is done by the data warehouse before the data is released to the public.

Geometric correction is the removal of distortions in the shape of the image due to sensor-earth geometry variations. For example, the skew correction is needed to account for the fact that the earth is rotating while the image is being captured. The scanner distortion correction is needed to account for the fact that the instantaneous field of view covers more territory at the ends of scan lines than in the middle.

Certain types of these errors are systematic (such as scan skew, mirror-scan velocity variance, panoramic distortion, platform velocity and perspective geometry) and they can easily be corrected using sensor characteristics and ephemeris data. On the other hand, unsystematic errors (such as roll, pitch and yaw of the platform and/or altitude variance) must be corrected by more difficult methods using ground control points.

Errors due to altitude variance become more dramatic the more topographically diverse the landscape is. A photo taken over a flat field would have little or no distortion, whereas a photo taken over a mountain range would have a high amount of distortion. This is because in places where there are steep slopes, real distances are not displayed correctly in the image. An inch measured in a steep area would correspond to a much longer distance than an inch measured in a flat area.

Ground control points (GCPs) can be used to correct these kinds of image distortions. GCPs are reference points whose correct coordinates are known and that can be matched to points in the distorted image. The difference in location between each pair of points is used to compute the transformation matrix needed to rectify the image. This process is called image to ground geocorrection (if the GCPs are from maps or ground GPS reconnaissance) or image to image geocorrection (if the GCPs are from a reference digital image). If a digital elevation model or digital terrain model for the area exists, it is an especially effective source to use for ground control points. The geocorrection algorithm follows 3 basic steps:

1. The user identifies the X,Y coordinates of several ground control points. Ideally, these points should be spread evenly throughout the image, especially concentrating on the edges.

2. The software solves equations that describe the relationship between the two coordinate systems so it can produce an equation for the conversion of the X, Y coordinates from the old reference system to the new one. The user picks the order of equation the software should use. Linear equations are ideal for smaller images while quadratic and cubic equations should be used for larger ones.

3. Rubber-sheet transformation is used to convert the image to the new reference system. The process involves both spatial interpolation and brightness interpolation.

(Ramsey 5)
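Step 2 of the list above can be sketched as a least-squares fit of a first-order (linear) equation to the ground control points. The function names and the sample coordinates are hypothetical, and the sketch covers only the coordinate conversion, not the resampling in step 3.

```python
import numpy as np

def _design(xy):
    """Design matrix [1, x, y] for a first-order polynomial."""
    xy = np.asarray(xy, dtype=float)
    return np.column_stack([np.ones(len(xy)), xy[:, 0], xy[:, 1]])

def fit_first_order(src_xy, dst_xy):
    """Fit x' = a0 + a1*x + a2*y and y' = b0 + b1*x + b2*y by least squares.

    src_xy are control point coordinates in the uncorrected image and
    dst_xy are the matching known coordinates. At least three points that
    are not on one line are needed; more points average out placement error.
    """
    coeffs, *_ = np.linalg.lstsq(_design(src_xy),
                                 np.asarray(dst_xy, dtype=float), rcond=None)
    return coeffs  # shape (3, 2): one column per output coordinate

def apply_first_order(coeffs, xy):
    """Convert coordinates from the old reference system to the new one."""
    return _design(xy) @ coeffs

def rms_error(coeffs, src_xy, dst_xy):
    """Root-mean-square distance between predicted and known positions."""
    residuals = apply_first_order(coeffs, src_xy) - np.asarray(dst_xy, dtype=float)
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))

# Made-up control points: image (column, row) and map (easting, northing).
image_pts = [(0, 0), (10, 0), (0, 10), (10, 10)]
map_pts = [(100, 500), (120, 500), (100, 480), (120, 480)]
coeffs = fit_first_order(image_pts, map_pts)
center = apply_first_order(coeffs, [(5, 5)])
```

A higher-order fit would add columns such as x*y, x^2, and y^2 to the design matrix and would need correspondingly more control points.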

This process is called rubber-sheet transformation because it can be visualized as printing the incorrectly oriented image onto a sheet of rubber, laying this sheet over the correctly oriented one, and then stretching and pulling it until the two line up. Figure 2 helps to illustrate this:

Figure 2. Resampling to a new orientation
Remote Sensing Core Curriculum, Volume 3, Module 5.32

In other words, the new values are estimated for each cell by looking at the corresponding cells underneath them in the image. One of three interpolation schemes can be used to calculate these new values. If the nearest neighbor rule is used, the nearest old cell determines the value of the new cell. If the bilinear interpolation method is used, a distance weighted average of the four nearest old cells determines the value of the new cell. If the cubic convolution method is used, a distance weighted average of the sixteen nearest old cells determines the value of the new cell. The nearest neighbor rule is best for data values that cannot be changed (such as qualitative data about soil types). For quantitative data such as remotely sensed imagery, the bilinear or cubic convolution schemes are ideal (Ramsey 21).
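To make the difference between the first two schemes concrete, the sketch below samples an old image at a fractional position using the nearest neighbor rule and bilinear interpolation. The function names are hypothetical, the image is assumed to be at least 2x2 cells, and the position is assumed to lie inside it.

```python
import numpy as np

def sample_nearest(image, row, col):
    """Value of the old cell closest to the fractional position (row, col)."""
    r = int(np.clip(np.rint(row), 0, image.shape[0] - 1))
    c = int(np.clip(np.rint(col), 0, image.shape[1] - 1))
    return image[r, c]

def sample_bilinear(image, row, col):
    """Distance-weighted average of the four old cells around (row, col)."""
    r0 = int(np.clip(np.floor(row), 0, image.shape[0] - 2))
    c0 = int(np.clip(np.floor(col), 0, image.shape[1] - 2))
    fr, fc = row - r0, col - c0  # fractional offsets inside the 2x2 block
    top = (1 - fc) * image[r0, c0] + fc * image[r0, c0 + 1]
    bottom = (1 - fc) * image[r0 + 1, c0] + fc * image[r0 + 1, c0 + 1]
    return (1 - fr) * top + fr * bottom

old = np.array([[10.0, 20.0],
                [30.0, 40.0]])
nearest_value = sample_nearest(old, 0.2, 0.8)    # copies an existing value
bilinear_value = sample_bilinear(old, 0.5, 0.5)  # creates a new, blended value
```

The nearest neighbor result is always a value that already exists in the old image, which is why it suits categorical data, while the bilinear result can be a value that appears nowhere in the input.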

Georectification, the conversion of the data to real-world coordinates (such as latitude and longitude), is also a geometric correction. It is essential if one wants to import data layers from other sources to be used with the satellite imagery.

Image Enhancement

Image enhancement is a procedure used to improve the appearance of the imagery to make visual interpretation and analysis easier. The most widely used of these is contrast enhancement. Contrast is the range of brightness values present in an image. These values can theoretically range from 0 to 255, but in many remotely sensed images the initial range is much less than this. If an image starting with a range from only 40-90 is stretched so that it ranges from 0-255, the difference between features becomes accentuated. There are two types of contrast enhancement: linear contrast enhancement and nonlinear contrast enhancement.

Linear contrast enhancement is also known as contrast stretching. It is best used on remotely sensed images with a gaussian or near-gaussian histogram (meaning that all the brightness values fall within a narrow range and there is only one mode). The simplest form of contrast stretching is the minimum-maximum linear contrast stretch. In this method, the minimum brightness value in the image is replaced with 0, the maximum brightness value is replaced with 255, and all the intermediate values are scaled proportionately (Jensen 3). See figure 3 for an example of how this would change the histogram.

Figure 3. Histogram before and after min-max contrast stretch
Remote Sensing Core Curriculum, Volume 3, Module 6.3
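The minimum-maximum linear contrast stretch described above can be sketched in a few lines. The function name minmax_stretch and the 8-bit output range are assumptions made for this example.

```python
import numpy as np

def minmax_stretch(band, out_max=255):
    """Map the smallest input value to 0 and the largest to out_max."""
    band = np.asarray(band, dtype=float)
    lo, hi = band.min(), band.max()
    if hi == lo:
        # A constant image has no contrast to stretch.
        return np.zeros(band.shape, dtype=np.uint8)
    stretched = (band - lo) / (hi - lo) * out_max
    return np.rint(stretched).astype(np.uint8)

# Brightness values confined to 40-90 are spread over 0-255.
stretched = minmax_stretch(np.array([40, 50, 90]))
```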

If, on the other hand, a percentage linear contrast stretch is used, different portions of the histogram can be stretched as needed, instead of the entire thing. This can be used to increase the contrast of an image only in specific portions of the brightness range, and is useful if the analyst is interested in seeing more detail in specific features (such as ocean, vegetation, or urban features) (Jensen 4). Figure 4 shows what the results of these kinds of stretches can look like. The first image is the raw data, the second is a min-max linear stretch and the third is a percentage linear stretch.


Figure 4. Original image, min-max stretch, percentage stretch
Remote Sensing Core Curriculum, Volume 3, Module 6.3
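One common way to implement a percentage linear stretch is to pick the lower and upper limits from percentiles of the image and saturate everything outside them. The sketch below follows that approach; the function name percent_stretch and the default 2 and 98 percent limits are assumptions for illustration, not values from the source.

```python
import numpy as np

def percent_stretch(band, low_pct=2.0, high_pct=98.0, out_max=255):
    """Stretch only the part of the histogram between two percentiles."""
    band = np.asarray(band, dtype=float)
    lo, hi = np.percentile(band, [low_pct, high_pct])
    if hi == lo:
        return np.zeros(band.shape, dtype=np.uint8)
    # Values outside [lo, hi] saturate at 0 or out_max.
    clipped = np.clip(band, lo, hi)
    return np.rint((clipped - lo) / (hi - lo) * out_max).astype(np.uint8)

# Stretch the middle 80 percent of a ramp of values from 0 to 100.
ramp = np.arange(101)
out = percent_stretch(ramp, low_pct=10, high_pct=90)
```

Choosing the limits around the brightness range of one feature type, such as water, devotes the whole output range to that feature at the cost of saturating everything else.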

One of the most common forms of nonlinear image contrast enhancement is histogram equalization. This method redistributes all the pixel values of an image so that approximately the same number of pixels falls in each of the user-specified output gray scale classes. This technique increases contrast in the most populated range of brightness values (the peaks of the histogram) and reduces the contrast in the very light or dark parts of the image (the tails of the histogram). This method can often provide an image with the most contrast of any enhancement technique. However, it has one significant drawback. Several distinct values in the input image can end up sharing a single value in the output image, so that objects in the original scene lose their correct relative brightness values. For instance, pixels that are very dark or very bright will be compressed into just a few gray levels. So, histogram equalization is not a good choice if there are clouds in the data or if one is trying to see information located in terrain shadows (Jensen 9). Figure 5 shows a histogram equalization run on the same image as above.


Figure 5. Original Image, Histogram Equalization
Remote Sensing Core Curriculum, Volume 3, Module 6.3
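Histogram equalization is usually implemented as a lookup table built from the cumulative histogram. The sketch below is a minimal version under the assumptions that the input holds non-negative integers below the number of gray levels and that there are at most 256 levels; the function name equalize is hypothetical.

```python
import numpy as np

def equalize(band, levels=256):
    """Remap gray levels so that pixels spread evenly over the output range."""
    band = np.asarray(band)
    # Count how many pixels have each input gray level.
    hist = np.bincount(band.ravel(), minlength=levels)
    # Fraction of pixels at or below each gray level.
    cdf = np.cumsum(hist) / band.size
    # The lookup table sends each input level to its cumulative rank.
    lut = np.rint(cdf * (levels - 1)).astype(np.uint8)
    return lut[band]

# Three dark pixels and one bright pixel.
small = np.array([10, 10, 10, 200], dtype=np.uint8)
equalized = equalize(small)
```

Because the mapping depends only on cumulative pixel counts, sparsely populated tails of the histogram are squeezed into a few output levels, which is the loss of detail in very dark or bright areas noted above.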

Image Transformation

Image transformation refers to operations that are applied to multiple spectral bands within the image. Basically, it is another term for spectral math. Layers representing different bands can be added, subtracted, multiplied or divided to produce a new image that will better highlight certain features in the scene. For example, image subtraction can be used to identify changes between data collected on different dates.
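As a small example of image subtraction for change detection, the sketch below differences two co-registered images of the same band taken on different dates. The function name change_mask and the threshold value are hypothetical choices for illustration.

```python
import numpy as np

def change_mask(earlier, later, threshold):
    """Return the signed difference and a mask of cells that changed."""
    # Convert to signed integers first: subtracting unsigned 8-bit values
    # directly would wrap around instead of going negative.
    diff = np.asarray(later).astype(int) - np.asarray(earlier).astype(int)
    return diff, np.abs(diff) >= threshold

earlier = np.array([100, 50], dtype=np.uint8)
later = np.array([90, 120], dtype=np.uint8)
diff, changed = change_mask(earlier, later, threshold=20)
```

In practice both images need the same geometric and radiometric corrections first, otherwise misregistration and illumination differences show up as false change.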

Image division or spectral ratioing is one of the most common transforms applied, because it helps to highlight variations in the spectral responses of surface covers. For example, vegetation indices are spectral ratios that are used to map the presence of vegetation and to measure the amount or condition of vegetation within each pixel. This is done by exploiting the unique spectral signature of vegetation, particularly in the red and near infrared portions of the spectrum. Plants reflect a very low amount of red energy because this energy is absorbed by chlorophyll in photosynthetic leaves. At the same time they reflect a very high amount of near infrared energy because of the scattering process in healthy, turgid leaves. The most common vegetation index is the NDVI (Normalized Difference Vegetation Index), which can be calculated according to the formula NDVI = (Xnir - Xred) / (Xnir + Xred), where Xnir and Xred are the reflectances in the near infrared and red bands. This ratio can range from -1 to +1 but normal values fall between 0 and 0.8 (Huete 4).
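The NDVI formula translates directly into array arithmetic on the two band layers. In this sketch the function name ndvi and the choice to return NaN where both bands are zero are assumptions; the sample reflectances are made up.

```python
import numpy as np

def ndvi(nir, red):
    """Compute (nir - red) / (nir + red) cell by cell."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    # Start from NaN so cells with a zero denominator stay undefined.
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Made-up reflectances: vegetation, bare ground, and a no-data cell.
index = ndvi(nir=[0.5, 0.1, 0.0], red=[0.1, 0.1, 0.0])
```

Healthy vegetation, with high near infrared and low red reflectance, gives values well above zero, while surfaces that reflect both bands about equally give values near zero.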

Image Classification

Image classification is a technique used to generalize remote sensing data, where cells are assigned to one of a number of surface cover groups based on their reflectance. To do this the computer can use either an unsupervised or a supervised classification algorithm. Unsupervised classification distinguishes patterns in the reflectance data and groups them into a pre-defined number of classes without any prior knowledge of the image. Supervised classification is a technique where the user trains the computer, telling it what type of spectral characteristics to look for and what type of land cover they represent. Usually this is done by the user simply specifying a mean and range of digital values for each class, though sometimes more sophisticated statistical methods are used that take into account the prior probability that each class might exist in an area.
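The simple supervised approach, in which the user gives a mean and range for each class, can be sketched as a box test in spectral space. The function name classify_by_range, the class definitions, and the sample pixels below are hypothetical.

```python
import numpy as np

def classify_by_range(pixels, class_means, class_half_ranges, unclassified=-1):
    """Assign each pixel to the first class whose box contains it.

    pixels has shape (n_pixels, n_bands); class_means and
    class_half_ranges have shape (n_classes, n_bands).
    """
    px = np.asarray(pixels, dtype=float)[:, None, :]
    means = np.asarray(class_means, dtype=float)[None, :, :]
    half = np.asarray(class_half_ranges, dtype=float)[None, :, :]
    # A pixel is inside a class box if every band is within mean +/- half range.
    inside = np.all(np.abs(px - means) <= half, axis=2)
    labels = np.argmax(inside, axis=1)  # index of the first matching class
    labels[~inside.any(axis=1)] = unclassified
    return labels

# Two made-up classes described in (red, near infrared) reflectance.
means = [(0.05, 0.02),   # class 0: water
         (0.05, 0.50)]   # class 1: vegetation
half_ranges = [(0.05, 0.05),
               (0.05, 0.20)]
pixels = [(0.04, 0.03), (0.06, 0.45), (0.50, 0.50)]
labels = classify_by_range(pixels, means, half_ranges)
```

Pixels that fall in no box are left unclassified, and overlapping boxes are resolved here by class order, which is one reason the statistical methods mentioned above are often preferred.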

Part 3: Software Packages that Integrate Remote Sensing with GIS

NWGISS

The first software program I will discuss is the NASA Web GIS Software Suite (NWGISS), which was developed in our own George Mason University School of Computational Sciences. The purpose of this software suite is to allow GIS users to access remote sensing images created by NASA's Earth Observing System (EOS) satellites. These images are output as HDF (Hierarchical Data Format) files, a common scientific file format that is unreadable by standard GIS software packages. NWGISS has been developed to convert these files into several widely used formats such as binary, GeoTIFF, NITF and GIF, so that they can be imported into GIS. In addition, NWGISS provides on-the-fly georectification that can extract swath data and georectify it to any user-specified spatial resolution. It is the only OGC-compliant software suite available that provides such a function. OGC is the Open GIS Consortium, a non-profit organization founded in 1994 with the mission to address the lack of interoperability between geospatial processing systems, with the goal of making georeferenced data behave just like another standard data type in systems of all kinds.

NWGISS uses two algorithms when performing on-the-fly georectification of HDF-EOS swath data, depending on what kind of data it is dealing with. The first method is called bivariate polynomial regression. This approach is ideal when image distortions can be corrected by low-order polynomial equations (second to fourth order) and when an adequate number of ground control points is available. It works best with images that have low distortion and small bounding boxes (such as ASTER data or MODIS data with a 1/4 to 1/6 scan frame) (Yang 3).

The second method is called piecewise bilinear interpolation. This method is ideal for images that have very dense geolocated pixels, such as AVHRR data. The algorithm works by transferring pixels in a data field from a raw image coordinate to a map/earth coordinate. A data value interpolation function then fills in blank pixels in the map/earth coordinate system. In many cases, it produces more accurate georectification results, with a position error of less than one tenth of a pixel (Yang 3).

Future versions of NWGISS are planned to include conversion capabilities for even more file formats, such as ARC/INFO grid and shapefiles.

IDRISI

IDRISI is a standard and commonly used GIS program that can easily incorporate certain kinds of remote sensing data. The software was developed by the Clark Labs in the Graduate School of Geography at Clark University, and is especially formulated for working with raster GIS data such as satellite images. It supports the input of TIFF and BMP images.

The software can be used to calculate surface parameters from the data using spectral math between image layers (as I discussed above in the section on image transformation). In fact, scientific studies have been done to develop algorithms to correct for atmospheric distortion of remote sensing data caused by haze, dust and water vapor. The scientists then developed MAP ALGEBRA modules in IDRISI to apply these algorithms to the data.

For instance, in the infrared region, radiation emitted by the surface is partially absorbed by water vapor in the atmosphere and re-emitted at a colder temperature. This results in the sensor detecting a colder temperature than the temperature of the actual surface. The most common way of correcting this attenuation is by using a split window algorithm, which uses the linear difference between the last two infrared channels of the AVHRR sensor to estimate the magnitude of the atmospheric moisture effect. The algorithm takes the form Ts = c1 + c2*T4 + c3*T5, where Ts is the actual surface temperature, T4 and T5 are the brightness temperatures in the 4th and 5th AVHRR channels, and c1, c2 and c3 are coefficients derived from ground temperature measurements by regression. If the coefficients are known, all the steps of the algorithm can be done using the IDRISI routines OVERLAY and SCALAR in the map algebra module (Shalina 7).
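The split window formula is a per-cell linear combination of two layers, which is why it maps onto scalar and overlay operations. The sketch below shows the same arithmetic with arrays; the function name split_window is hypothetical, and the coefficients and temperatures are placeholder numbers, not calibrated values.

```python
import numpy as np

def split_window(t4, t5, c1, c2, c3):
    """Estimate surface temperature as Ts = c1 + c2*T4 + c3*T5."""
    t4 = np.asarray(t4, dtype=float)
    t5 = np.asarray(t5, dtype=float)
    # Scale each brightness temperature layer by its coefficient,
    # then add the layers and the constant cell by cell.
    return c1 + c2 * t4 + c3 * t5

# Placeholder coefficients and brightness temperatures (kelvin).
surface_temp = split_window(t4=[290.0, 285.0], t5=[288.0, 284.0],
                            c1=1.0, c2=3.0, c3=-2.0)
```

With a positive c2 and a negative c3, the correction grows with the difference between the two channels, which is the quantity the algorithm uses as a proxy for atmospheric moisture.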

In addition, IDRISI comes preprogrammed with many algorithms commonly used in analyzing remote sensing data. For instance, it will automatically calculate vegetation indices such as the NDVI through the use of the VEGINDEX function shown in Figure 6.


Figure 6. Vegetation Indices
Remote Sensing Data Processing with the use of GIS IDRISI

Theoretically, a GIS user could take NASA satellite HDF files, run them through the NWGISS software described above to convert them into a TIFF image, then import them into IDRISI for analysis and combination with other types of digital map layers.

ArcView Image Analysis

The final software package I would like to discuss is an extension to ESRI's popular ArcView GIS program called Image Analysis. This extension allows ArcView users to go beyond using digital imagery as a backdrop for vector maps. With Image Analysis, users can do data visualization, data extraction/creation and analysis on satellite imagery, aerial photography, orthoimagery, and other remotely sensed data. The extension was created as part of a collaborative effort between ESRI (Environmental Systems Research Institute) and Leica Geosystems (the creators of ERDAS IMAGINE, a geographic imaging package that is used worldwide by remote sensing analysts to produce maps). Image Analysis is designed to work in tandem with ArcView GIS, Spatial Analyst, and ERDAS IMAGINE, giving direct and transparent data flow between them all.

Image Analysis can import TIFF files (along with a variety of other file formats), meaning it could also import files converted by NWGISS just as IDRISI can. It also has functions to automatically apply many of the algorithms I have discussed previously. For example, Image Analysis can perform three different kinds of contrast stretches: standard deviation, histogram equalization, and minimum-maximum. In addition, it can georectify images to shapefiles, GPS-collected ground control points, or reference images, using the Image Align tool. This tool will automatically calculate the best 1st or 2nd order polynomial equation needed to adjust the image. It also has built-in features to correct for systematic errors by calibrating the image with known satellite ephemeris data. Finally, Image Analysis has tools for classifying areas of the image based on spectral properties, and for performing several common spectral math calculations such as NDVI.

There are many other software programs that can integrate remote sensing data with GIS, but the principles behind them are roughly the same. The use of satellite imagery in GIS projects is already common and will continue to grow because it is, as I have stated before, the most cost effective and efficient way of importing large amounts of data at one time, and because it forms an ideal base map to which other layers can be added.


Works Cited

"ArcView Image Analysis" ESRI Website. Last Updated: Aug 30, 2002. Accessed: Nov 7, 2002. http://www.esri.com/software/arcview/extensions/imageext.html

"ArcView Image Analysis Extension" ERDAS Website. Accessed: Nov 19, 2002. http://www.erdas.com/admin/ProductGroups/Files/ArcViewWhitePaper.pdf

Huete, Alfredo and Justice, Chris. MODIS Vegetation Index Algorithm Theoretical Basis Document. Version 3. NASA. April 30, 1999

Jensen, John R. and Schill, Steven R. "Contrast Enhancement." Remote Sensing Core Curriculum, Volume 3, Module 6.3. Accessed: Nov 7, 2002. http://www.cla.sc.edu/geog/rslab/rscc/

McGwire, Dr. Kenneth. "GIS Input and Update" Remote Sensing Core Curriculum, Volume 3, Module 9.3. Last Updated: May 18, 1998. Accessed: Nov 7, 2002. http://www.cla.sc.edu/geog/rslab/rscc/

"Orthophotos and GIS" GIS Lounge. Accessed: Nov 7, 2002. http://gislounge.com/features/aa031300.shtml

Ramsey, Douglas R. "Geometric Correction of Remotely Sensed Data." Remote Sensing Core Curriculum, Volume 3, Module 5.2. Accessed: Nov 7, 2002. http://www.cla.sc.edu/geog/rslab/rscc/

"Remote Sensing Data Processing with the use of GIS IDRISI." Last Updated: Nov 3, 1999. Accessed: Nov 7, 2002. http://lee1.en.a.u-tokyo.ac.jp/Rs&gis/Engl/c3.htm

Shalina, E.V. Atmospheric correction modules for IDRISI. Nansen International Environmental and Remote Sensing Center, Ltd. (NIERSC), Korpusnaya 18, 197042. St. Petersburg, Russia

Short, Nicholas M. "NASA Remote Sensing Tutorial." Section 15: Geographic Information Systems. Last Updated: July 3, 2002. Accessed: Nov 19, 2002. http://rst.gsfc.nasa.gov/Front/tofc.html

Smith, Dr. Laurence. "Remote-Sensing Technologies." GIS Lounge. http://gislounge.com/features/aa121900.shtml

Yang, Wenli and Di, Liping. Serving NASA HDF-EOS Data through NWGISS Coverage Server. NASA 2002 Conference Papers. Center for Earth Observing and Space Research, School of Computational Sciences, George Mason University.