
GitHub - Microsoft/USBuildingFootprints: Computer generated building footprints...

source link: https://github.com/Microsoft/USBuildingFootprints

README.md

Introduction

This dataset contains 124,885,597 computer generated building footprints in all 50 US states. This data is freely available for download and use.

License

This data is licensed by Microsoft under the Open Data Commons Open Database License (ODbL).

FAQ

What the data include:

Approximately 125 million building footprint polygon geometries in all 50 US States in GeoJSON format.
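As a rough illustration of working with this format, the sketch below counts polygon features and computes an overall bounding box using only Python's standard json module. The sample FeatureCollection and its coordinates are made up for the example, and the exact layout of the downloadable state files is an assumption here, so inspect a real file before relying on it.

```python
import json

# Hypothetical sample data. The real state files are assumed (not confirmed
# here) to be GeoJSON FeatureCollections of Polygon features.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {},
     "geometry": {"type": "Polygon", "coordinates": [[
       [-86.0, 32.0], [-85.999, 32.0], [-85.999, 32.001],
       [-86.0, 32.001], [-86.0, 32.0]]]}},
    {"type": "Feature", "properties": {},
     "geometry": {"type": "Polygon", "coordinates": [[
       [-87.0, 33.0], [-86.999, 33.0], [-86.999, 33.001],
       [-87.0, 33.001], [-87.0, 33.0]]]}}
  ]
}
"""


def summarize(collection):
    """Return (polygon_count, (min_lon, min_lat, max_lon, max_lat))."""
    count = 0
    min_lon = min_lat = float("inf")
    max_lon = max_lat = float("-inf")
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Polygon":
            continue
        count += 1
        # GeoJSON positions are [longitude, latitude, ...]; ring 0 is the
        # exterior ring of the polygon.
        for position in geometry["coordinates"][0]:
            lon, lat = position[0], position[1]
            min_lon, max_lon = min(min_lon, lon), max(max_lon, lon)
            min_lat, max_lat = min(min_lat, lat), max(max_lat, lat)
    bbox = (min_lon, min_lat, max_lon, max_lat) if count else None
    return count, bbox


count, bbox = summarize(json.loads(SAMPLE))
print(count, bbox)
```

Loading a whole state file with json.loads keeps everything in memory; for the larger states a streaming parser may be a better fit.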

Creation Details:

The building extraction is done in two stages:

  1. Semantic Segmentation – Recognizing building pixels on the aerial image using DNNs
  2. Polygonization – Converting building pixel blobs into polygons
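To make the hand-off between the two stages concrete, the sketch below groups the foreground pixels of a binary mask into connected blobs, which is the kind of input a polygonization step would start from. It is a generic flood fill with 4-connectivity, not the method used for this dataset; the function name find_blobs and the tiny mask are hypothetical.

```python
from collections import deque


def find_blobs(mask):
    """Group 4-connected pixels with value 1 into lists of (row, col)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


# Hypothetical 3x4 prediction mask with two separate building blobs.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]
blobs = find_blobs(mask)
print([len(b) for b in blobs])
```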

Semantic Segmentation

DNN architecture

The network foundation is ResNet34. To produce pixel prediction output, we have appended RefineNet upsampling layers. The model is fully convolutional, meaning that it can be applied to an image of any size (constrained by GPU memory, 4096x4096 in our case).

Training details

The training set consists of 5 million labeled images. The majority of the satellite images cover diverse residential areas in the US. For the sake of good set representation, we have enriched the set with samples from various areas covering mountains, glaciers, forests, deserts, beaches, coasts, etc. Images in the set are 256x256 pixels in size with 1 ft/pixel resolution. The training is done with the CNTK toolkit using 32 GPUs.

Metrics

These are the intermediate stage metrics we use to track DNN model improvements, and they are pixel based. The pixel error on the evaluation set is 1.15%. Pixel recall and precision are both 94.5%.
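For readers unfamiliar with pixel-based metrics, here is a minimal sketch of how pixel error, precision and recall can be computed from a predicted mask and a label mask. The function name and the 2x2 masks are hypothetical, and this uses the usual textbook definitions rather than the evaluation code behind the numbers above.

```python
def pixel_metrics(pred, truth):
    """Return (error, precision, recall) for two binary masks of equal shape."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # building predicted, building labeled
            elif p and not t:
                fp += 1      # building predicted, background labeled
            elif t:
                fn += 1      # background predicted, building labeled
            else:
                tn += 1      # background in both
    total = tp + fp + fn + tn
    error = (fp + fn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return error, precision, recall


# Hypothetical masks: 1 = building pixel, 0 = background.
pred = [[1, 1],
        [0, 0]]
truth = [[1, 0],
         [1, 0]]
error, precision, recall = pixel_metrics(pred, truth)
print(error, precision, recall)
```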

Polygonization

Method description

We developed a method that approximates the prediction pixels into polygons, making decisions based on the whole prediction feature space. This is very different from standard approaches, e.g. the Douglas-Peucker algorithm, which are greedy in nature. The method tries to impose some a priori building properties, which are, at the moment, manually defined and automatically tuned. Some of these a priori properties are:

  1. The building edge must be of at least some length, both relative and absolute, e.g. 3 meters
  2. Consecutive edge angles are likely to be 90 degrees
  3. Consecutive angles cannot be very sharp, smaller by some auto-tuned threshold, e.g. 30 degrees
  4. Building angles likely have very few dominant angles, meaning all building edges form an angle of (dominant angle ± nπ/2)

In the near future, we will be looking to deduce this automatically from existing building information.
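The first and third properties in the list above can be expressed as simple geometric checks. The sketch below flags edges shorter than a minimum length and corners sharper than a minimum angle, using the example values of 3 meters and 30 degrees from the list. It only checks a finished polygon; it does not reproduce the polygonization method itself, and the function names and coordinates (assumed to be in meters) are hypothetical.

```python
import math


def edge_lengths(poly):
    """Length of each edge of a closed polygon given as [(x, y), ...]."""
    n = len(poly)
    return [math.dist(poly[i], poly[(i + 1) % n]) for i in range(n)]


def corner_angles(poly):
    """Angle in degrees (0..180) between the two edges meeting at each vertex."""
    n = len(poly)
    angles = []
    for i in range(n):
        px, py = poly[i - 1]
        cx, cy = poly[i]
        nx, ny = poly[(i + 1) % n]
        v1 = (px - cx, py - cy)
        v2 = (nx - cx, ny - cy)
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        angles.append(math.degrees(math.atan2(abs(cross), dot)))
    return angles


def satisfies_priors(poly, min_edge=3.0, min_angle=30.0):
    """True if no edge is shorter than min_edge and no corner is sharper
    than min_angle. Both thresholds are illustrative defaults."""
    return (all(length >= min_edge for length in edge_lengths(poly))
            and all(angle >= min_angle for angle in corner_angles(poly)))


# Hypothetical footprints, coordinates in meters.
rectangle = [(0, 0), (10, 0), (10, 5), (0, 5)]
sliver = [(0, 0), (10, 0), (10, 1)]
print(satisfies_priors(rectangle), satisfies_priors(sliver))
```

The rectangle passes both checks, while the sliver triangle fails on both its 1 meter edge and its narrow corner at the origin.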

Metrics

Building matching metrics:

Metric      Value
Precision   99.3%
Recall      93.5%

We track various metrics to measure the quality of the output:

  1. Intersection over Union – This is the standard metric measuring the overlap quality against the labels
  2. Shape distance – With this metric we measure the polygon outline similarity
  3. Dominant angle rotation error – This measures the polygon rotation deviation

Our evaluation set contains ~15k buildings. The metrics on the set are:

  • IoU is 0.85, Shape distance is 0.33, Average rotation error is 1.6 degrees
  • The metrics are better or similar compared to OSM building metrics against the labels
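As a worked example of the first metric, the sketch below computes Intersection over Union for two polygons. It swaps in Sutherland-Hodgman clipping for the intersection, which is only valid when the clipping polygon is convex and listed counter-clockwise, so it is a simplified illustration rather than the evaluation code behind the figures above. The two squares are hypothetical.

```python
def polygon_area(pts):
    """Absolute area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_convex(subject, clip):
    """Sutherland-Hodgman: clip `subject` by a convex, counter-clockwise `clip`."""
    def inside(p, a, b):
        # Left of (or on) the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of line p-q with line a-b (lines are not parallel here).
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        if not current:
            break
        prev = current[-1]
        for cur in current:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def iou(subject, clip):
    """Intersection over Union of two polygons (clip must be convex, CCW)."""
    overlap = clip_convex(subject, clip)
    inter = polygon_area(overlap) if len(overlap) >= 3 else 0.0
    union = polygon_area(subject) + polygon_area(clip) - inter
    return inter / union if union else 0.0


# Hypothetical predicted and labeled footprints: 2x2 squares offset by 1.
predicted = [(0, 0), (2, 0), (2, 2), (0, 2)]
label = [(1, 0), (3, 0), (3, 2), (1, 2)]
score = iou(predicted, label)
print(round(score, 3))
```

Real building footprints are often concave, in which case a general polygon clipping library would be needed instead of this convex-only routine.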

Data Vintage

The vintage of the footprints depends on the vintage of the underlying imagery. Because Bing Imagery is a composite of multiple sources, it is difficult to know the exact dates for individual pieces of data.

How good is the data?

Our metrics show that in the vast majority of cases the quality is at least as good as hand digitized buildings in OpenStreetMap. It is not perfect, particularly in dense urban areas, but it is still awesome.

Will Microsoft be open sourcing the models?

Yes. We are working through the internal process to open source the segmentation models and polygonization algorithms.

Will there be more data coming for other geographies?

Maybe. This is a work in progress.

Why are the data being released?

Microsoft has a continued interest in supporting a thriving OpenStreetMap ecosystem.

Should we import the data into OpenStreetMap?

Maybe. Never overwrite the hard work of other contributors or blindly import data into OSM without first checking the local quality. While our metrics show that this data meets or exceeds the quality of hand drawn building footprints, the data does vary in quality from place to place, between rural and urban, mountains and plains, and so on. Inspect quality locally and discuss an import plan with the community. Always follow the OSM import community guidelines.

State                  Number of Buildings   Unzipped MB
Alabama                2,392,171             711.76
Alaska                 232,159               123.06
Arizona                2,492,999             773.50
Arkansas               1,499,025             443.99
California             10,556,550            3,240
Colorado               2,043,866             617.68
Connecticut            1,156,638             350.88
Delaware               331,654               99.91
District Of Columbia   58,330                18.00
Florida                6,532,545             1960
Georgia                3,801,461             1100
Hawaii                 252,894               75.79
Idaho                  883,618               268.31
Illinois               4,783,021             1380
Indiana                3,224,996             961.83
Iowa                   2,013,085             584.85
Kansas                 1,564,845             460.28
Kentucky               2,363,324             685.64
Louisiana              2,005,341             608.33
Maine                  736,346               218.32
Maryland               1,590,655             467.61
Massachusetts          1,982,583             596.01
Michigan               4,854,138             1410
Minnesota              2,792,296             838.22
Mississippi            1,470,285             438.99
Missouri               3,096,410             904.84
Montana                762,428               226.78
Nebraska               1,135,526             330.05
Nevada                 847,575               261.59
New Hampshire          558,850               169.91
New Jersey             2,370,475             701.58
New Mexico             985,820               304.70
New York               4,788,312             1390
North Carolina         4,504,348             1290
North Dakota           557,809               165.65
Ohio                   5,343,670             1550
Oklahoma               2,056,402             624.71
Oregon                 1,781,820             544.30
Pennsylvania           4,801,561             1390
Rhode Island           348,566               103.63
South Carolina         2,134,688             629.38
South Dakota           649,233               189.85
Tennessee              2,964,339             875.41
Texas                  9,638,970             2830
Utah                   980,745               298.24
Vermont                346,038               104.33
Virginia               3,020,994             880.71
Washington             2,910,981             888.41
West Virginia          1,020,048             295.15
Wisconsin              3,010,755             897.09
Wyoming                376,912               111.03

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation in this repository under the Creative Commons Attribution 4.0 International Public License, see the LICENSE file, and grant you a license to any code in the repository under the MIT License, see the LICENSE-CODE file.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.

