Artificial intelligence for urban tree species identification

No matter which part of the world you live in, a diverse range of tree species is planted around our urban areas. Urban trees serve many functions: they provide habitat for wildlife, clean the air and water, bring significant health and social benefits, and improve property values. Imagine waking up on a beautiful morning to birds singing outside your apartment because so many beautiful trees grow around your home. How awesome is that!

However, tree planting, surveying, and species identification require an enormous amount of work that has literally taken generations of input and care. What if we could identify tree species from satellite imagery? How much faster and how much better could we identify tree species and record their geolocations as well?

A city has its own tree selection and planting plan, but homeowners have their own tree preferences, which makes the identification work a bit more complicated.

chicagoTrees

(Photo from Google Earth Pro June 2010 in Chicago area)

It’s hard to tell how many tree species are planted in the image above. But we can zoom in and see that these trees have slightly different crown shapes, colors, and textures. From here, I only need a valid dataset that tells me which tree I am looking at, namely the city’s tree survey and tree geolocation records. With that, I can teach a computer to pick out similar features for the species I’m interested in identifying.

GreeAsh

These are Green Ash trees (marked here as green dots).

LittleleafLiden.png

These are Littleleaf Linden trees, marked as orange dots.

Let me train a deep learning model in Caffe (a neural network, which is one kind of artificial intelligence model) for image classification on these two species, and see whether the computer can separate them using my training and test datasets.

The great news is that the model can actually tell the difference between these two species. I ran the model for 300 epochs (passes over the training data), with the learning rate going from 0.01 to 0.001, on about 200 images of the two species. 75% of the images were used to train the model and 25% to test it. The result is not bad: around 90% accuracy (orange line) and a loss below 0.1 on the training dataset.
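As a rough illustration of the setup described above, here is a minimal Python sketch of a 75/25 train/test split and a learning rate that falls from 0.01 to 0.001 over 300 epochs. The file names are hypothetical stand-ins for the screenshots, and the step-decay schedule is only an assumption, since the post does not say how the learning rate was lowered. It is not the Caffe configuration that was actually used.

```python
import random


def split_train_test(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of items and split it into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def step_learning_rate(epoch, base_lr=0.01, gamma=0.1, step_size=150):
    """Assumed step decay: multiply the rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Hypothetical file names standing in for the ~200 tree-crown screenshots.
images = [f"tree_{i:03d}.png" for i in range(200)]
train_set, test_set = split_train_test(images)

print(len(train_set), "training images,", len(test_set), "test images")
print("learning rate at epoch 0:", step_learning_rate(0))
print("learning rate at epoch 299:", step_learning_rate(299))
```

Fixing the random seed keeps the split reproducible, so the same 25% of images is held out each time the experiment is rerun.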

nvidia_d_modeltest

I gave the model a random test image (a Green Ash screenshot in this case), and it returned its prediction.

test_trees2

Next time, I will work on identifying 20 other tree species and their geolocations.

Let’s find out which trees are planted in the Chicago area and how they relate to property values (an interesting question to ask), and also what ecological benefits and functions these trees provide (I’ll leave that to urban ecologists, if my cloud computer can identify the species). Check out my future work ;-).

 

PV Solar Installation in the US: who has installed solar panels and who will be next?

Project idea

Photovoltaic (PV) solar panels, which convert solar energy into electricity, are one of the most attractive options for homeowners. Studies have shown that by 2015 about 4.8 million homeowners had installed solar panels in the United States of America. Meanwhile, the solar energy market continues to grow rapidly. The estimated cost and potential savings of solar are the questions homeowners care about most. There is also tremendous commercial potential for the solar energy business, and visualizing the long-term trend of the market is vital for solar energy companies’ survival. This visualization could be built by examining the following aspects:

  1. Who has installed PV panels, and what are the characteristics of those households, e.g. age, household income, education level, current utility rate, race, home location, current PV resource, and existing incentives and tax credits?
  2. What does the pattern of solar panel installation look like across the nation, and at what rate are panels being installed? Which households are the most likely to install solar panels in the future?

The expected primary output from this proposal is a web map application with two major functions. The first shows the cost and returned benefit for households according to their home geolocation. The second provides interactive maps for companies, showing the geolocations of their future customers and the growth trends.

Initial outputs


The cost and payback period for the PV solar installation: Why not go solar!

NetCost

Incentive programs and tax credits bring down the cost of solar panel installation. This is the average cost for each state.

Monthly Saving

Going solar would reduce homeowners’ spending on electricity bills.

Payback Years

Payback years vary from state to state, depending on incentives and costs. High cost does not necessarily mean a longer payback period because it also depends on the state’s current electricity rate and state subsidy/incentive schemes. The higher the current electricity rate, the sooner you would recoup the costs of solar panel installation. The higher the incentives from the state, the sooner you will recoup the installation cost.
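The reasoning above can be sketched as a small calculation. This is only an illustration: it assumes the payback period is simply the net cost (after incentives) divided by the yearly saving on the electricity bill, and the two states and their numbers are made up, not taken from my maps.

```python
def payback_years(net_cost, monthly_saving):
    """Years needed to recoup net_cost given a constant monthly saving (same currency)."""
    if monthly_saving <= 0:
        raise ValueError("monthly_saving must be positive")
    return net_cost / (monthly_saving * 12)


# Hypothetical figures: (net cost after incentives in USD, monthly bill saving in USD).
states = {
    "State A": (12000, 100),  # cheaper installation, low electricity rate
    "State B": (15000, 150),  # costlier installation, high electricity rate
}

for name, (cost, saving) in states.items():
    print(f"{name}: {payback_years(cost, saving):.1f} years")
```

In this made-up example, State B has the higher cost but the shorter payback period, because its higher electricity rate means larger monthly savings.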

How many PV panels have been installed and where?

Number of Solar Installation

This map shows the number of solar panel installations in each state that have been registered on NREL’s Open PV Project. I was able to collect about 500,000 installations from the Open PV Project. The data is ZIP-code-based, so I was able to merge it with the “zip code” package in R. My R code is available in my GitHub project.

Other statistical facts: American homeowners who installed solar panels generally have a household income $25,301.5 higher than the national household income. Their homes are located in places with higher electricity rates, about 4 cents/kW above the national average, and with a higher solar energy resource, about 1.42 kW/m2 above the national average.

Two interactive maps were produced in RStudio with “leaflet”

Solar Installation_screen shot1

An overview of the solar panel installation in the United States.

Solar Installation_screen shot2

According to the data registered on the Open PV Project, residents on the West Coast have installed about 32,000 solar panels, most of them in California. Zooming in, one can easily browse the details of the installation locations around San Francisco.

Solar Installation_screen shot3

Another good location is the District of Columbia (Washington, D.C.) area. The East Coast has a lower solar energy resource (kW/m2) than the West Coast, especially California. However, the number of installations by homeowners around the DC area is very high too. From the maps above, we know this is because the cost of installation is much lower and the payback period is much shorter than in other parts of the country. It would be fascinating to dig out more of the factors behind their motivation to install. We can also zoom in to see much more detailed locations for each installation on this interactive map.

However, some areas, like DC and San Francisco, have much larger populations than other parts of the US, which means there will be many more installations. An installation rate per 10,000 people is a more appropriate measure. Therefore, I produced another interactive map of the installation rate per 10,000 people: the bigger the circle, the higher the installation rate.
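The normalization is simple enough to sketch in a few lines of Python. The ZIP codes and counts below are hypothetical placeholders, not values from the Open PV data, and my actual processing was done in R.

```python
def rate_per_10k(installations, population):
    """Installations per 10,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return installations * 10_000 / population


# Hypothetical ZIP-code records: (number of installations, population).
zip_records = {
    "00001": (500, 1_000_000),  # big city: many installations, low rate
    "00002": (50, 25_000),      # small town: few installations, high rate
}

for zip_code, (count, people) in zip_records.items():
    print(zip_code, rate_per_10k(count, people))
```

The small town has a tenth of the installations but four times the rate, which is why the rate map highlights different places than the raw-count map.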

Solar Installation_screen shot4

The highest installation rate in the country is in Ladera Ranch, located in South Orange County, California. The reason behind it is not yet clear, and more analysis is needed.

Solar Installation_screen shot5

Buckland, MA has the highest installation rate on the East Coast. I can’t yet explain the motivation behind it either. Further analysis of the household characteristics would be helpful. These two interactive maps were uploaded to my GitHub repository, where you can also see the R code I wrote to process the data.

Public Data Sources

To answer these two questions, 1,670 MB (1.67 GB) of datasets were downloaded and scraped from multiple sources:
(1). Electricity rates by ZIP code;

(2). A 10 km resolution map of solar energy resources, as an ESRI shapefile, was downloaded from the National Renewable Energy Laboratory (NREL). It was later extracted by ZIP code polygons downloaded from ESRI ArcGIS Online.

(3). Current solar panel installation data was scraped from the Open PV website, a collection of installations by ZIP code that is part of NREL. Accessing the data requires registration. The dataset includes the ZIP code of each installation, its cost, its size, and its state.

(4). Household income, education, and population for each ZIP code were obtained from the US Census.

(5). The average cost of solar installation for each state was scraped from the websites “Current cost of solar panels” and “Why Solar Energy?”. More datasets for this proposal will be downloaded from the Department of Energy on GitHub via API.

Note: I cannot guarantee the accuracy of the analysis. My results are based on two days of data mining, wrangling, and analysis. The quality of the analysis depends heavily on the quality of the data and on how well I understood the datasets in such limited time. Further validation of the analysis and datasets is needed.

To contact the author, please find me at https://geoyi.org or email me at geospatialanalystyi@gmail.com.

My web map application is online

Hi friends,

I’ve been working on a web application for the Chinese Ministry of Commerce on rubber cultivation and its risks, which will be out soon, and I just want to share the simplified version of the web map API with you here. It only has layers for now, though; more to come.

Screen shot for web map application

Web map application by Zhuangfang Yi: Current rubber cultivation area (ha) in tropical Asia

This web map API aims to tell investors that rubber cultivation is not just about clearing land/forests, planting trees, and then waiting to tap the trees and sell the latex. There are many more risks in planting and cultivating rubber trees, including several natural disasters and cultural and economic conflicts between foreign investors and host countries.

We also found that the minimum price of rubber latex for livelihood sustainability is as high as 3 USD/kg. I define the minimum price as the price at which an investor/household can cover the costs of establishing and managing their rubber plantations. When the actual rubber price is lower than the minimum price, there is no profit in having rubber plantations. The minimum price for running a rubber plantation varies from country to country. I ran the analysis for 8 countries in Asia: China, Laos, Myanmar, Cambodia, Vietnam, Malaysia and Indonesia. The minimum price depends on the minimum wage, labour availability, the costs of plantation establishment and management, and the average rubber latex productivity throughout the life span of the rubber trees. The cut-off price ranges from 1.2 USD/kg to 3.6 USD/kg.

For example, if the rubber price in the market is now 2 USD/kg, investors in a country whose cut-off price is 3 USD/kg won’t make any profit; instead, they lose at least 1 USD for every kg of rubber latex they sell.
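The same comparison can be written as a short Python sketch. The country names and cut-off prices below are hypothetical values chosen inside the 1.2 to 3.6 USD/kg range mentioned above; they are not the figures from my analysis.

```python
def margin_per_kg(market_price, minimum_price):
    """Profit (positive) or loss (negative) per kg of latex sold, in USD."""
    return market_price - minimum_price


# Hypothetical cut-off prices (USD/kg); placeholders, not the study's results.
cutoff_prices = {"Country A": 1.2, "Country B": 3.0, "Country C": 3.6}
market_price = 2.0

for country, cutoff in cutoff_prices.items():
    margin = margin_per_kg(market_price, cutoff)
    status = "profit" if margin > 0 else "loss or break-even"
    print(f"{country}: {margin:+.2f} USD/kg ({status})")
```

At a market price of 2 USD/kg, only the country with the lowest cut-off price still earns anything from each kg sold.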

 

The natural rubber value chain and foreign investments in Thailand: how can we achieve sustainable and responsible rubber cultivation and investment?

I had the opportunity to work for the Chinese Ministry of Commerce with ICRAF last fall, and have been studying the natural rubber value chain since then. I led four technical reports on the natural rubber value chain: the first is on the Thailand natural rubber value chain (please see the title); the second is about the natural rubber value chain, foreign investments and land conflicts in Cambodia; the third is a comparative study of Thailand and Cambodia, the biggest natural rubber producer and an emerging rubber producer; the last will concentrate on the risks of natural rubber cultivation and investment in Asia from a geospatial perspective. As I mentioned in the reports, there is no winner in the natural rubber value chain: we lose biodiversity and ecosystem services by converting natural forests to rubber monoculture (upstream of the value chain); rubber processing emits millions of tons of polluted air and water, and carbon dioxide, back into nature (the midstream); and in the end the poor who grow rubber are left without a sustainable livelihood, while the end products have limited competitiveness in the market (the downstream). We should go back to the source and really think about how we can improve the whole value chain, and why.

The following is the English abstract of the Thailand report. The reports are currently in Chinese; if you are interested in the content, please contact Dr. Zhuang-Fang Yi at geospatialanalystyi@gmail.com or yizhuangfang@mail.kib.ac.cn.

Upper Mekong Region

Figure 1. The Greater Mekong region and the global natural rubber producers.

Asia supplies 93% of global natural rubber demand. As the world’s No. 1 natural rubber producer, Thailand exports nearly 40% of global rubber demand, which is 87% of its domestic rubber production. Thailand’s production growth depends not only on its biophysical suitability for growing rubber but also on policy support and subsidies for millions of upstream rubber farmers. Thailand spent about 21.3 billion Baht (586 million USD) from Sep. 2013 to Mar. 2014 to subsidize its rubber farmers while the price of natural rubber went down. However, lacking manufacturing and financial support for the midstream and downstream of its natural rubber value chain, Thailand depends heavily on exporting rubber to other countries, e.g. China, the US, the EU and Japan.

The long history of natural rubber cultivation and support from the Thai government has led Thai rubber farmers to develop a more economically resilient cultivation system: rubber agroforestry. Rubber agroforestry is a rather complex intercropping system compared to rubber monoculture. Rubber monoculture refers to plantations that contain only rubber trees, where other plant species are constantly killed and removed with herbicide and manual clearance. Rubber agroforestry sustains better ecosystem services and also brings higher economic returns. But the labour requirements and knowledge gaps in moving from rubber monoculture to rubber agroforestry are the main constraints on a greener cultivation system. In a monoculture system, rubber farmers only need to take intensive care of the rubber trees, whereas rubber agroforestry demands additional knowledge and time. Nevertheless, there are about 21 intercropping systems, and more than 300 farms practice intercropped rubber agroforestry, run by rubber farmers without the kind of support from the authorities that rubber monoculture receives in Thailand. Research and institutional support are urgently needed for rubber agroforestry in Thailand and globally.

The emerging economies and natural rubber producing countries in the Mekong region, e.g. Vietnam, Cambodia, Laos, and Myanmar, are following in Thailand’s footsteps: they practice only rubber monoculture and strongly support the upstream value chain, but lack rubber manufacturing and supporting financing systems for the midstream and downstream. This leads to heavy dependence on rubber demand from China and the rest of the world, and to very weak economic resilience for millions of smallholder rubber farmers when the price goes down. In the Chinese market, the rubber price dropped from 6.3 USD/kg to less than a dollar in 2014. China, the biggest natural rubber importer, consumes nearly 40% of global rubber supply. On the other hand, a 20% import tax is charged, which has dramatically increased the cost of rubber end products and eroded their global competitiveness in the natural rubber market. There is no winner in the natural rubber value chain: we lose biodiversity and ecosystem services by converting natural forests to rubber monoculture (upstream of the value chain); rubber processing emits millions of tons of polluted air and water, and carbon dioxide, back into nature (the midstream); and in the end the poor who grow rubber are left without a sustainable livelihood, while the end products have limited competitiveness in the market (the downstream). We should go back to the source and really think about how we can improve the whole value chain, and why.

More and more Chinese state-owned and private enterprises are following the Chinese central government’s “Go Global” strategy and have invested heavily outside of China. The natural rubber end products sector, especially the tire industry, is one of them. In this report, we scrutinize the natural rubber value chain in Thailand and its foreign investments, especially Chinese investments. We try to answer:

  1. Whether there are rubber cultivation systems that best combine economic returns with better support for ecosystem services;
  2. The relationship between Chinese investors and the Thai natural rubber value chain;
  3. The possible routes to sustainable and responsible rubber cultivation and investment.

Coming reports in Chinese

泰国橡胶种植面积.jpg

Figure 2. Thailand, the biggest rubber producer, produces 4.5 million tons of natural rubber, and 80% of Thailand’s domestic natural rubber comes from Southern Thailand. Each polygon on the map represents a province, and the darker the color, the larger the area of rubber cultivation.

Everything is spatial and can be mapped, and my Geo-Case 3

I majored in Geography for my bachelor’s degree and did Ecological Economics for my PhD. People often ask whether I am more of an economist or an ecologist; I would say I am more of a geographer. “Oh, I never knew Geography is a major” or “What is that?” If you have ever asked me this kind of question, please don’t feel bad about it, because you’re not the only one, and I bet my peers get these questions and responses all the time.

One thing you might not know is that Geospatial Technology has been listed by the United States Department of Labor as the top emerging industry for the coming decades. To me, Geospatial Technology is a combination of art, science and engineering. It’s the art of data visualization, and the thought and skills the analyst/cartographer puts behind it. Friends and family often ask me what my ideal job is; since I moved to the States, I would say it’s probably Geographic Information System (GIS) analyst or geospatial analyst within Geospatial Technology. It’s difficult to become a good GIS analyst in China, even though China has a huge group of GIS analysts and good geo-programmers, because geo-data is harder to access there, the field has developed only recently, and the fundamental data sources are very limited. You have to work for top research institutes or universities to get the data for further geospatial analysis, and of course you first have to build up a network with state agencies to gain access to the data. The story is very different in the States, where there are so many open databases run by government departments, and people are so willing to share their data and analysis. I got too excited, I guess, lol, and decided to be one of them... This is a link to more than 300 free GIS datasets shared by Prof. Robin Wilson: http://freegisdata.rtwilson.com. Of course, as you’ve probably guessed, most free datasets are run and shared by research institutes, NGOs, state labs and universities in the States.

You would probably ask, ‘What exactly is geospatial technology?’ I would say it is just the maps you use every day: for example, the fastest or least congested driving route that Google Maps gives you to get to the airport; the map of the best Thai/Mediterranean/Mexican restaurants that Yelp shows you; the housing prices around your neighborhood that your real estate agent tells you; or the crime map for the police station in your area, where some areas attract more serious crime and others less. Actually, you only need very basic mapping skills and a good platform to map the data I mentioned above. Of course, you could do more sophisticated analysis with the geo-datasets you have access to. For example, why is Medicare spending in some states (countries) higher than in others? Is it because they have less access to good, clean water and food? Or do more health facilities just happen to be there? Do people become healthier after they spend more on Medicare and health? Or why put a protected area in this location and not a thousand miles away? How much do we need to spend on protecting this wildlife? And where would be the most cost-efficient place to fund it?

Everything is spatial and can be mapped. A map is so powerful that you can definitely tell a story from the data presented on it. Even though a lot of data is open in the States, becoming a GIS analyst who works for a government department requires a security clearance, which makes it impossible for a non-US citizen. It’s such a dilemma, right? However, there are already such good, open data sources in the States, and I’ve seen so many cool analyses presented. Therefore, work ethics are highly valued in the field, including what message you present in your maps and who you present it to.

Back to my Geo-Case 3: I had the chance to work with a national nature reserve (state protected area) in China this July. I tried to quantify habitat quality for future biodiversity conservation under human disturbance and climate change, and to target locations that avoid the impacts of both. We’ve been working with the nature reserve for years and have been able to obtain the road network, village locations, socioeconomic and demographic data, historical land-use maps, a soil map, vegetation types and so on. My colleagues at the World Agroforestry Centre ran climate change models and their impacts on land-use change, which provided the land-use change simulation rules for me to run the land-use scenarios.

Data description: Running this geospatial model of wildlife habitat degradation under human disturbance requires recent land-use maps, climate change data, soil and vegetation type data, socioeconomic and demographic data, and so on. I ran the habitat degradation models through InVEST (you can get more information here: http://www.naturalcapitalproject.org). It is a platform that allows you to value natural capital.

Knowledge needed: remote sensing imagery processing for land-use mapping; geostatistics/geospatial analysis in ESRI ArcGIS; InVEST, especially habitat quality and habitat risk management. (Leave me a comment if you need more detailed information.)

Result maps:

land-use

hibitat degredation

Unsurprisingly, human disturbance has the greater impact on habitat quality in the nature reserve: the degraded areas lie along the road network, while the core area of the reserve, which has the least human disturbance, maintains the highest habitat quality (the yellow area in the left corner of the map).

*** I am now open to a Geospatial Analyst/GIS Analyst position around the DC area (or telecommuting). I am able to:

  • Utilize advanced modeling techniques
  • Apply fundamental spatial statistics
  • Perform advanced vector and raster analysis
  • Perform surface modeling and analysis
  • Understand and apply python scripting
  • Perform fundamentals of spatial database design
  • Determine and define an appropriate coordinate system
  • Demonstrate proficiency in application configuration
  • Apply advanced visualization techniques
  • Understand fundamental network analysis concepts
  • Perform some basic network analysis

I mainly use ESRI ArcGIS 10.2, ArcGIS Online and Server, and have experience with open-source GIS software and platforms such as QGIS and GRASS, and with spatial modeling scripts in R and Python. I will demonstrate the knowledge above in the geo-cases in my future blog posts. English is my third language, and any comments, proofreading, advice on language polishing and even content revisions are welcome.

Geo-Case 2: analyzing household hazardous waste participants for an NGO

Case description: The Pennsylvania Resources Council (PRC) is a non-profit organization (NGO) that works on environmental protection. Recently, PRC has been helping homeowners in many Pennsylvania counties collect and dispose of common household products such as paint, solvents, automotive fluids, pesticides, insecticides, and cleaning chemicals, which cannot be recycled easily. To better develop education and outreach materials for this work, they want to map the participants who are volunteering for the program.

Geospatial skill needed: Geocoding.

Data: In this study, only the ZIP codes of the household participants were recorded.

Result: the location

Geocode household hazardous waste

In this case study, only participants around Allegheny County are presented. The result shows that, at this point, people with easier access to PRC are more willing to participate in the program.

I will try to write more blog posts here. The purposes are:

  1. to get better at writing in English;
  2. to sell my geospatial analysis skills to potential employers and customers;
  3. to put up the geospatial cases I’ve done and am passionate about, which will help me build my own GEO case file in the future.

I am doing case studies instead of showing the programming, coding and mapping skills, because I believe that mapping is kind of an art and you won’t really be interested in how I made the maps. If you are interested in these skills, you can contact me for lectures, mapping and analysis work at yi.zhuangfang@gmail.com. Any comments on language editing, case presentation and cartography are welcome.

Geo-Case 1: how to promote your art activities for the potential market of your company

(In my blog, I will use a lot of abbreviations; one of the most frequent is GIS, which stands for Geographic Information System.)

GIS knowledge needed: Geocoding;

Data source: Pennsylvania (GIS Tutorial BOOK 1)

Analysis purpose: An arts organization in Allison Park, Pittsburgh, sponsors an art event called FLUX. The FLUX event planners would like to know where attendees live, to inform future planning and marketing. Only the attendees’ ZIP codes were recorded. So, simply put, we are going to produce a map for FLUX that uses the ZIP codes attendees provided to show where in the area they come from.

Data summary: 1,487 attendees, about 86% of them from Pennsylvania;
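For readers curious about what ZIP-code geocoding boils down to, here is a minimal Python sketch: count the attendees per ZIP code, then attach a point location to each ZIP code from a lookup table. I made the actual map in GIS software, so this is only an illustration; the function name, the sample attendee list and the centroid table (with rough, approximate coordinates) are all hypothetical.

```python
from collections import Counter

# Hypothetical lookup table: ZIP code -> approximate (latitude, longitude) centroid.
ZIP_CENTROIDS = {
    "15101": (40.57, -79.96),
    "15213": (40.44, -79.95),
}


def geocode_zip_counts(zip_codes, centroids):
    """Count records per ZIP code and attach a centroid to each matched ZIP code."""
    counts = Counter(z.strip() for z in zip_codes)
    located, unmatched = [], []
    for zip_code, n in counts.items():
        if zip_code in centroids:
            lat, lon = centroids[zip_code]
            located.append({"zip": zip_code, "lat": lat, "lon": lon, "attendees": n})
        else:
            unmatched.append(zip_code)  # keep track of ZIP codes we could not place
    return located, unmatched


# Hypothetical attendee ZIP codes; "99999" is not in the lookup table.
attendee_zips = ["15101", "15213", "15101", "99999"]
points, missing = geocode_zip_counts(attendee_zips, ZIP_CENTROIDS)

for point in points:
    print(point)
print("unmatched ZIP codes:", missing)
```

Each resulting point can then be drawn on a map with a symbol sized by the attendee count. Reporting the unmatched ZIP codes matters because mistyped or out-of-area codes would otherwise silently disappear from the map.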

Map result:

Tutorial8-1

Conclusion: FLUX needs more planning and marketing for Pittsburgh, especially the southeast corner, since more attendees come from that area, and those who drove far north for the event could be the main customers when marketing future art events. There is always more information that could be visualized, for example age, occupation and so on!