How to use the online map tool for investing in sustainable rubber cultivation in tropical Asia

Please go ahead and play with the full-screen map here.

This map Application is developed to support the Guidelines for Sustainable Development of Natural Rubber, which led by China Chamber of Commerce of Metals, Minerals & Chemicals Importers & Exporters with supports from World Agroforestry Centre, East and Center Asia Office (ICRAF). Asia produces >90% of global natural rubber primarily in monoculture for highest yield in limited growing areas. Rubber is largely harvested by smallholders in remote, undeveloped areas with limited access to markets, imposing substantial labor and opportunity costs. Typically, rubber plantations are introduced in high productivity areas, pushed onto marginal lands by industrial crops and uses and become marginally profitable for various reasons.

Rubberplantation

Fig. 1. Rubber plantations in tropical Asia. It brings good fortune for millions of smallholder rubber farmers, but it also causes negative ecological and environmental damages.

The online map tool is developed for smallholder rubber farmers, foreign and domestic natural rubber investors as well as different level of governments.  

The online map tool entitled “Sustainable and Responsible Rubber Cultivation and Investment in Asia”, and it includes two main sections: “Rubber Profits and Biodiversity Conservation” and “Risks, SocioEconomic Factors, and Historical Rubber Price”.

The main user interface looks like the graph (Fig 2). There are 4 theme graphs and maps.

p1_section intro

Fig. 2. The main user interface of the online map tool.

. Section 1

This graph tells the correlation between “Minimum Profitable Rubber (USD/kg)” (the x-axis of the graph, and “Biodiversity (total species number)” in 2736 county that planted natural rubber trees in eight countries in tropical Asia.  There are 4312 counties in total, and in this map tool, we only present county that has the natural rubber cultivated.

p1_section intro_high

Fig. 3. How to read and use the data from the first graph. Each dot/circle represents a county, the color, and size of it indicates the area of natural rubber are planted. When you move your mouse closer to the dot, you will see “(2.34, 552) 400000 ha @ Xishuangbanna, China”, 2.34 is the minimum profitable rubber price (USD/kg), 552 is the total wildlife species including amphibians, reptiles, mammals, and birds.  “400000 ha” is the total area of planted natural rubber plantation from satellite images between 2010 and 2013. “@ Xishuangbanna, China” is the geolocation of the county. 

Don’t be shy, please go ahead and play with the full-screen map here. The minimum profitable rubber price is the market price for national standard dry rubber products that would help you to start makes profits. For example, if the market price of natural rubber is 2.0 USD/kg in the county your rubber plantation located, but your minimum profitable rubber price is 2.5 USD/kg means you will lose money by just producing rubber products. However, if your minimum profitable rubber price is 1.5 USD/kg means you will still make about 0.5 USD/kg profit from your plantation.

The county that has a lower minimum profitable price for natural rubber is generally going to make better rubber profit in the global natural rubber market. However, as scientists behind this research, we hope that when you rush to invest and plant rubber in a certain county, please also think about other risks, e.g. biodiversity loss, topographic, tropical storm, frost as well as drought risks. They are going to be shown later in this demonstration. 

p2_section intro_high.gif

Fig. 4.  The first map is the “Rubber Cultivation Area”, which shows the each county that has rubber trees from low to high in colors from yellow to red. The second map “Minimum Profitable Rubber Price”(USD/kg), again the higher the minimum profitable price is the fewer rubber profits that farmers and investors are going to receive. The third map is ” Biodiversity (Amphibians, Reptiles, Mammals, and Birds)”,  data was aggregated from IUCN-Redlist and BirdLife International.

. Section 2

We also demonstrated different types of risks that investors and smallholder farmers would face when they invest and plant rubber trees. Rubber tree doesn’t produce rubber latex before 7 years old, and the tree owners won’t make any profit until the tree is around 10 years old in general. In this section, we presented “Topographic Risk”, ” Tropical Storm”, “Drought Risk”,  and “Frost Risk”.

p3_section intro_high.gif

Fig. 5. Section 2 ” Risks, SocioEconomic Factors and Historical Rubber Price” has seven different theme maps and interactive graphs. They are “Topographic Risk”, ” Tropical Storm”, “Drought Risk”,  and “Frost Risk”, “Average Natural Rubber Yield (kg/ha.year)”, “Minimum Wage for the 8 Countries (USD/day)”, and ” 10 years Rubber price”.

If you are interested in how the risk theme maps were produced, Dr. Antje Ahrends and her other coauthors have a peer-reviewed article published in Global Environmental Change in 2015.  “Average Natural Rubber Yield (kg/ha.year)” and “Minimum Wage for the 8 Countries (USD/day)” dataset was obtained from  International Labour Organization (ILO, 2014)  and FAO.” 10 years Rubber price” was scraped from  IndexMudi Natural Rubber Price.

Dr. Chuck Cannon and I are wrapping up a peer-reviewed journal article to explain the data collection, analysis, and policy recommendations based on the results, and we will share the link to the article once it’s available. Dr. Xu Jianchu and Su Yufang have shaped and provided guidance to shape the online map tool development. We could not gather the datasets and put insights to see how we could cultivate, manage, and invest in natural rubber responsibly without other scientists and researchers study and contribute to field for years. We appreciated Wildlife Conservation Society, many other NGOs and national department of rubber research in Thailand and Cambodia for their supports during our field investigation in 2015 and 2016.

We have two country reports for natural rubber in Thailand, and natural rubber and land conflict in Cambodia, a report support this online map tool is finalizing and we will share the link soon when it’s ready.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Technical sides

The research and analysis were done in R, and you could find my code here.

The visualization is purely coded in R too, isn’t R is such an awesome language? You could see my code for the visualization here.

To render geojson format of multi-polygon, you should use:

library(rmapshaper)
county_json_simplified <- ms_simplify(<your geojson file>)

My original geojson for 4000+ county weights about 100M but this code have help to reduce it to 5M, and it renders much faster on Rpubs.com.

I learnt a lot from this blog on manipulating geojson with R and another blog on using flexdashboard in R for visualization. Having an open source and general support from R users are great.

Global Zika virus epidemic from 2015 to 2016: A big data problem- 大数据分析全球Zika病毒传染

Centers for Disease Control and Prevention (CDC) provided Zika virus epidemic from 2015 to 2016,  about 107250 observed cases globally, to kaggle.com. Kaggle is a platform that data scientists compete on data cleaning, wrangling, analysis and provide the best solution for big data problems.

美国疾病传染防控中心 (CDC) 给大数据分析师们提供了一个记录有十多万个全球Zika病毒传染案例。这个数据传到了Kaggle网站上,Kaggle网站是一个大数据分析比赛和数据共享平台。

Zika virus epidemic problem is an interesting problem, so I took the challenge and coded an analysis in RStudio.  However, after finishing a rough analysis, I found that this could be an example of big data analysis instead of a perfect example for CDC on Zika virus epidemic. Because the raw data has not been cleaned and clarified yet, and the raw data description could be seen here.

我觉得这个挑战还蛮有意思的,所以也下载了数据来分析看看。这个博客里头提供的是我初始分析的一些结果。但是必须提前申明的一点是:由于CDC提供的原始数据本身还是满粗糙也有很多记录不明晰的地方,所以我的这个分析以其说是一个解决方案不如说是一个纯粹的大数据分析案例。

A bit of background of Zika and Zika virus epidemic from CDC.

  • Zika is spread mostly by the bite of an infected Aedes species mosquito (Ae. aegypti and Ae. albopictus). These mosquitoes are aggressive daytime biters. They can also bite at night.
  • Zika can be passed from a pregnant woman to her fetus. Infection during pregnancy can cause certain birth defects, e.g. Microcephaly.  Microcephaly is a rare nervous system disorder that causes a baby’s head to be small and not fully developed.
  • There is no vaccine or medicine for Zika yet.

关于Zika和Zika病毒传染的一些背景知识:

  • Zika由通过Aedes蚊虫叮咬传播(主要是该蚊子的两个分种:Ae. aegypti 和Ae. albopictus 传播)。该蚊虫叮咬主要发生在白天,当然也会发生在晚上。
  • Zika的危险之处是病毒可以通过怀孕的母亲传给其腹中的婴孩。病毒可以影响胎儿正常的神经发育而引起生育缺陷,包括现在被发现和报道的小头症。
  • 目前可预防Zika的药物和预防针还没有。

Initiative outputs from the data analysis 初始的分析结果

Firstly, let see the animation of the Zika virus observations globally. The cases observations were started recorded from Nov. 2015 to July 2016. At least from the documented cases during the period, it started from Mexico and El Salvador, and it spread to South American countries and the USA. The gif animation makes the data visualization looks fancy, but while I looked deeply, the dataset need a serious cleaning and wrangling.

CDC提供的数据采集于2015年11月到2016年7月份之间。从下图动画中可以看出这段时间之内Zika的传播是从墨西哥和萨尔瓦多两个国家开始传播的。虽然这个动图让传染病从一个国家到另一个国家的传播速度更为明了,但是其实仔细看下来CDC提供的这个原始的数据却还是需要特别清理的。换句话来说就是数据采集,和记录挺混乱。

Zika_ani.gif

Raw data 原始数据用Excel表格打开的样子

dataset screenshot

The raw data was organized by report date, case locations, location type, data field/data category,  the field code, period, its types, value (how observations/cases), the unit.

原始数据的记录记录是每一个Zika案列发生的时间,地点,地点类型(是区域还是省级的),案例类型,类型代码,发生的时段,发生的类型,以及案列数等等。

Rplot

While I plotted the cases by counties from 2015 to 2016, we could see most of Zika epidemic cases were observed much more in 2016 especially in South American countries. Colombia had by far the most reported Zika cases. Puerto Rico, New York, Florida and Virgin Islands of USA have reported Zika cases so far.  During this data recorded period 12 countries were reported had Zika virus cases, from most reported cases to the least these countries are: Colombia (86,889 reported cases), Dominican Republic (5,716), Brazil (4,253),  USA(2,962), Mexico (2894),  Argentina (2,091 ), Salvador (1,000), Ecuador(796), Guatemala (516), El   Panama(148) , Nicaragua (125) and Haiti (52). See the below map.

把原始数据按照记录直接用来作图的话就会发现Zika传染病被报道的案例从2015年到2016年有一个数量级的爆发。换句话来说就是2016年的数量比2015年要多很多(不过2015年的数据记录才从11月份开始,所以其实也不足以说明问题)。哥伦比亚这个国家Zika被报道的案例在2016年是全球最高的。美国的话也有近3000个案例被记录在案,其中波多黎各,纽约,佛罗里达和维京各岛屿相继都有Zika案例报道。从全球传播来看亚洲欧洲被报道的案例数没有被包括在这个数据之中,而有12个北美,中美和南美的国家被大量报道Zika病毒的传播。这12个国家和这些国家被记录的Zika案例数量从最高到最低来看分别是:哥伦比亚 (86889 报道案例),多米尼加共和国(5716),巴西(4253),美国(2962),墨西哥(2894),阿根廷(2091 ),萨尔瓦多(1000),厄瓜多尔(796),危地马拉(516),巴拿马(148),尼加拉瓜 (125)和海地(52)。请看一下地图。

Rplot01

However, while I went back to organize the reported Zika cases for each country, I found the data recorded for each country was not consistent. It’s oblivious that the each country has their strengths and different constraints for tracking Zika epidemic. Let’s see some examples:

所以我接下来想要看的就是每个国家记录的Zika案列都可以怎么分类。但是其实从下图就可以看出来每个国家对于案例的追踪和记录还是有所差别的,可能和每个国家负责记录数据,追踪案例的机构都不同有关系。大家可以通过以下各图来了解一个究竟:

Rplot14Rplot13Rplot12Rplot11Rplot10Rplot09Rplot08Rplot07Rplot06Rplot05Rplot04Rplot03Rplot02

In the states, most of the reported cases are from travel. But I am confused that aren’t the confirmed fever, eye pain, headache cases overlapped with zika reported, and zika_reported travel were included in yearly_reported_travel_cases. If so, were the cases were overestimated for most of the countries. Probably only CDC could explain the data much better from medical conditions and epidemic perspective.

就比如在美国被报道最多的案例类型中,其实是旅游相关的,就是病毒传染者去过病毒传播比较猖狂的国家。但是数据记录类型来看有症状相关的记录比如确定发烧,眼睛疼和头疼的案列,难道这些案列不是和已经怀疑或者的确诊的案列是重合的吗?难道眼睛疼和发烧是两个独立的案例和症状?所以有此就可以看出CDC提供的原始数据本身在分析之前是需要好好的理解也需要好好的清理一下的。或者数据记录都正确,但很多让人不解的地方似乎也只有CDC自己出来解释了。

From the reported cases that Microcephaly cases caused by Zika virus were only founded in Brazil and Dominic Republic.  Microcephaly is a rare nervous system disorder that causes a baby’s head to be small and not fully developed. The child’s brain stops growing as it should. People get infected with Zika through the bite of an infected Aedes species mosquito (Aedes aegypti and Aedes albopictus). A man with Zika can pass it to sex partners but there was a case that a woman who infected with Zika virus has been found passed the virus to her partner too.

从发生的Zika案例来看Zika病毒感染引起的小头症(Microcephaly )目前只有在多米尼加共和国和巴西这两个国家被确诊和报道过。小头症是一种病毒感染而阻止婴孩神经系统正常发育,而引起的不正常头部发育。小头症顾名思义就是婴孩脑子的发育比正常发育的头要小,婴孩的脑子停止发育造成的。所以准备怀孕和已经怀孕的妇女其实应该避免到这些国家履行。现在已经被报道Zika病毒除了通过蚊虫叮咬传播其实通过性交也是可以传播的。之前报道只发现感染病毒的男性通过性交会把病毒传给其女伴,但是最近有一个案例也说明感染病毒的女性同样也可以通过性交传播病毒给其男伴。

My original R codes could be accessed here; first gif animation graph was originally coded by a UK-based data scientist Rob Harrand, and I only edit the data presented interval and image resolution.

这也算是一个非常粗糙的分析,但是如果大家对我的原始分析程序感兴趣,请移步这里。这个博客中使用的动图原始程序是英国大数据分析师Rob Harrand做的,我只是改了他的参数还有生成的动态图的尺寸。当然除了动图之外其他程序都是我写的,如果有需要请注明出于geoyi.org.

Note: Again, this is an example of big data analysis instead of a perfect example for CDC on Zika virus epidemic, because the raw data from CDC still need seriously cleaning. For more insight, please follow CDC’s reports and cases recorded.

注明:再一次重申这个大数据分析以其说是给CDC做的完整的分析不如说是一个纯粹的大数据分析案例。因为大家可以看到其实这个原始数据是需要特别清理的,而且部分数据应该只有CDC他们自己才能够解释清楚的。如果大家感兴趣可以去看看CDC相继的报道以及数据记录。