GIS refers to the software, data, and use of a Geographic Information System: a system for acquiring, storing, processing, analyzing, and visualizing spatial data, that is, data that has real-world geographic locations (coordinates) associated with it. Broadly defined, a GIS can consist of a full desktop application, such as QGIS or ESRI ArcGIS; a spatially enabled database, such as PostGIS or SpatiaLite; web-based solutions, such as Google Maps or Mapbox; or GIS scripts and modules written in a programming language such as Python or R.
There are two primary forms of spatial data: vector and raster. Vector data, at its most basic level, consists of discrete point locations (coordinates), for example, the location of an individual tree. A series of points can be ordered and joined in a “dot-to-dot” fashion to form a linestring (also known as a polyline), for example, a line representing a footpath. When a linestring's first and last points coincide, it forms a closed linear ring, and the area enclosed by a linear ring defines a polygon, for example, the outline of a building. Common vector file formats include GeoJSON, KML/KMZ, and the ESRI Shapefile.
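To make the vector geometry types more concrete, here is a minimal plain-Python sketch that represents them as lists of coordinate tuples. The coordinates and the helper names (`linestring_length`, `is_closed`, `polygon_area`) are hypothetical; in practice a library such as Shapely provides these geometry types and operations.

```python
import math

# Hypothetical projected coordinates in metres (x, y). A real layer would also
# carry a spatial reference so these numbers map to places on Earth.
tree = (2.0, 3.0)                                   # point
footpath = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]     # linestring
building = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0),
            (0.0, 5.0), (0.0, 0.0)]                 # closed linear ring

def linestring_length(coords):
    """Sum of straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def is_closed(coords):
    """A linear ring needs at least 4 positions, with first == last."""
    return len(coords) >= 4 and coords[0] == coords[-1]

def polygon_area(ring):
    """Shoelace formula for the area enclosed by a closed linear ring."""
    if not is_closed(ring):
        raise ValueError("polygon exterior must be a closed linear ring")
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0

print(linestring_length(footpath))               # 10.0
print(is_closed(footpath), is_closed(building))  # False True
print(polygon_area(building))                    # 50.0
```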
The second form is raster data: rectangular grids of evenly sized cells (i.e. images). A raster is anchored to the real world by the coordinate of its origin (often the upper-left corner), and its cell size is usually measured in real-world units, so the location of every cell can be derived from the origin and the cell size. The value assigned to each cell represents the measured, modelled, or categorical value of the area covered by that cell, and raster data may consist of multiple spectral bands. For example, NASA Landsat satellite imagery is available at a 30 m × 30 m resolution in various bands, including red, green, blue, and infrared wavelengths. Common raster file formats include GeoTIFF, ESRI GRID, and netCDF.
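The link between a raster's origin, its cell size, and the real-world location of each cell can be sketched as an affine transform. The six-value tuple below follows the geotransform layout used by GDAL; the origin coordinates and function names are hypothetical, and the inverse function assumes an unrotated, north-up grid.

```python
# GDAL-style geotransform: (x_origin, cell_width, row_rotation,
#                           y_origin, col_rotation, cell_height)
# For a north-up raster the rotations are 0 and cell_height is negative,
# because row numbers increase downwards while y coordinates increase upwards.
# Hypothetical values: a 30 m grid whose upper-left corner is (500000, 4200000).
GEOTRANSFORM = (500000.0, 30.0, 0.0, 4200000.0, 0.0, -30.0)

def cell_to_coord(gt, row, col, centre=True):
    """Return the (x, y) of a cell's centre, or of its upper-left corner."""
    offset = 0.5 if centre else 0.0
    px, ln = col + offset, row + offset
    x = gt[0] + px * gt[1] + ln * gt[2]
    y = gt[3] + px * gt[4] + ln * gt[5]
    return x, y

def coord_to_cell(gt, x, y):
    """Inverse for a north-up raster (no rotation): which cell holds (x, y)?"""
    col = int((x - gt[0]) // gt[1])
    row = int((y - gt[3]) // gt[5])
    return row, col

print(cell_to_coord(GEOTRANSFORM, 0, 0, centre=False))  # (500000.0, 4200000.0)
print(cell_to_coord(GEOTRANSFORM, 2, 3))                # (500105.0, 4199925.0)
print(coord_to_cell(GEOTRANSFORM, 500105.0, 4199925.0)) # (2, 3)
```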
Both forms of spatial data can be displayed in a GIS as a map made up of overlapping layers; “layer” is also frequently used as a general term for a spatial dataset. Layers often have attributes associated with them: for a buildings layer, these might be the name, address, and area of each building, while in raster data each cell typically holds a single measured, calculated, or modelled value, e.g. sea surface temperature. In the image below, the black crosses are points, the red lines are polylines, and the grey rectangles are polygons, all of which overlie a raster.
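As an illustration of how attributes travel with geometries, the snippet below stores a small, entirely hypothetical buildings layer as GeoJSON (with geometries simplified to points for brevity) and runs a simple attribute query using only Python's standard `json` module.

```python
import json

# A hypothetical buildings layer as a GeoJSON FeatureCollection. Each feature
# pairs a geometry with a "properties" object holding its attributes.
# GeoJSON coordinates are ordered longitude, latitude (WGS 84).
buildings_geojson = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-0.1276, 51.5072]},
     "properties": {"name": "Town Hall", "address": "1 High St", "area_m2": 1200}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-0.1300, 51.5080]},
     "properties": {"name": "Library", "address": "5 High St", "area_m2": 450}}
  ]
}
"""

layer = json.loads(buildings_geojson)

# Attribute query: names of buildings larger than 1000 square metres.
large = [f["properties"]["name"]
         for f in layer["features"]
         if f["properties"]["area_m2"] > 1000]
print(large)  # ['Town Hall']
```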
All spatial data needs to be assigned a spatial reference (also known as a coordinate reference system, or CRS): the system by which the coordinates of the data are matched to real-world locations. Geographic coordinate systems express locations as latitude and longitude on a 3D model of the Earth (an ellipsoid, tied to the planet by a datum), while projected coordinate systems convert those coordinates to a flat 2D representation using a map projection. There are many, many spatial reference systems, used for various purposes: some are best suited to specific locations, e.g. a county, country, or continent, while others may be chosen for their advantages in measuring areas or distances, or in displaying data on a map. They are defined by a text string (such as Well-Known Text, or WKT) specifying the units, datum, origin, and other metadata, and are often referred to by numeric codes (known as EPSG, SRID, or WKID codes) as shorthand; for example, the spatial reference used by Google Maps and many other web mapping services is EPSG:3857 (Web Mercator). When performing GIS operations between multiple layers, care should be taken to ensure their spatial references match, so that the coordinates coincide as intended and the results are as expected.
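To show what a change of spatial reference does to coordinates, here is a minimal sketch of the spherical Mercator formula behind EPSG:3857, converting longitude/latitude in EPSG:4326 (WGS 84) degrees into metres. The same location ends up with very different numbers in each system, which is why layers with mismatched spatial references fail to line up. The function name is hypothetical.

```python
import math

# Semi-major axis of the WGS 84 ellipsoid, used by Web Mercator as the
# radius of a sphere.
R = 6378137.0

def wgs84_to_web_mercator(lon, lat):
    """Project EPSG:4326 longitude/latitude (degrees) to EPSG:3857 metres."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# Roughly central London: degrees in, metres out.
x, y = wgs84_to_web_mercator(-0.1276, 51.5072)
print(round(x), round(y))
```

For real projects, a library such as pyproj (e.g. `Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)`) handles this and thousands of other spatial references. Note also that Web Mercator is undefined at the poles, so web maps typically clip it at about ±85.05° latitude.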
Aside from being visualized as maps, spatial data can also be queried, edited, modelled, and analyzed in a variety of ways to derive useful insights about geographic locations. Some common real-world applications are: finding the shortest route between two points on a road network; modelling flood risk from terrain, hydrology, and rainfall data; and determining deforestation rates from a time series of forest layers. These are typical problems solved by GIS analysts and spatial scientists, and they can be tackled using desktop GIS applications or by developing bespoke code in programming languages like Python and R, the subject of this blog.
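The first of these applications, shortest-route finding, is classically solved with Dijkstra's algorithm. The sketch below runs it on a small, hypothetical road network held in a plain Python dictionary; real analyses would usually build the graph from a road layer and use a library such as NetworkX or a dedicated routing engine, so treat this as an illustration of the idea rather than a production router.

```python
import heapq

# Hypothetical road network: junctions are nodes, road segments are edges
# weighted by length in metres (all treated as two-way roads).
ROADS = [
    ("A", "B", 400), ("A", "C", 250), ("C", "B", 100),
    ("B", "D", 300), ("C", "D", 600), ("D", "E", 150),
]

def build_graph(edges):
    """Adjacency list: node -> list of (neighbour, length)."""
    graph = {}
    for u, v, length in edges:
        graph.setdefault(u, []).append((v, length))
        graph.setdefault(v, []).append((u, length))
    return graph

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: returns (total_length, [nodes on route])."""
    queue = [(0, start, [start])]  # priority queue ordered by distance so far
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (dist + length, neighbour, path + [neighbour]))
    return float("inf"), []  # goal unreachable

print(shortest_route(build_graph(ROADS), "A", "E"))  # (800, ['A', 'C', 'B', 'D', 'E'])
```

The direct road from A to B is longer (400 m) than the detour via C (350 m), so the algorithm correctly prefers the detour.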