I believe all of us have watched James Cameron's Titanic (1997) more than once, and after a good sob, let's find out whether we could all have survived the Titanic. The Titanic dataset is also a superstar dataset in data science that people use for all sorts of crazy survival machine learning. Today we are going to use R to find out who actually survived, and what their age, sex, and social status were.
The sinking of the RMS Titanic occurred on the night of 14 April through to the morning of 15 April 1912 in the North Atlantic Ocean, four days into the ship’s maiden voyage from Southampton to New York City.
1. What is in the dataset.
We have 1308 passengers in the data. The data includes:
survival: Survival (0 = No; 1 = Yes);
pclass: Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd);
sibsp: Number of Siblings/Spouses Aboard;
parch: Number of Parents/Children Aboard;
ticket: Ticket Number;
fare: Passenger Fare;
embarked: Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton).
What the dataset looks like.
2. Running R and packages.
I have uploaded my R code to my GitHub account, so you can find all of it there.
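The actual scripts live on GitHub and are not reproduced here. As a minimal sketch of getting started, the snippet below uses only base R and its built-in `Titanic` table. That table is an assumption made for illustration: it is a different, aggregated dataset (2201 people including crew, with age recorded only as Child/Adult), not the 1308-passenger file used in this post.

```r
# Assumption: base R's built-in `Titanic` table stands in for the post's data file.
# It is a 4-way contingency table: Class x Sex x Age x Survived.
data("Titanic")

# Flatten the table into a data frame: one row per combination, plus a Freq count.
titanic_df <- as.data.frame(Titanic)

str(titanic_df)   # column names and types
head(titanic_df)  # first few rows
```

With the real passenger file, the same `str()` and `head()` calls would show the columns listed in the data description instead.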
This graph shows who was on the Titanic: there were more male passengers than female, especially in third class.
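A chart of this kind can be sketched in base R as follows. This is not necessarily how the original graph was made. It again assumes the built-in `Titanic` table, which includes crew and so will not match the passenger counts in this post exactly.

```r
# Assumption: built-in `Titanic` table (includes crew), not the post's dataset.
# Sum over Age and Survived to get head counts by Class and Sex.
by_class_sex <- margin.table(Titanic, c(1, 2))
print(by_class_sex)

# Grouped bar chart: one group per class, one bar per sex.
barplot(t(by_class_sex), beside = TRUE, legend.text = TRUE,
        xlab = "Class", ylab = "Number of people",
        main = "People aboard by class and sex")
```

The transpose `t()` puts sex in the rows, so `barplot()` draws the male and female bars side by side within each class.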
This graph compares deaths and survivals. The left panel shows the people who did not survive, and the right panel shows the survival counts (how many people survived). The death rate for third-class passengers was super high :-(. Female passengers had a high survival rate, especially in first class.
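To put numbers on rates like these, one option is to compute the share of survivors within each class and sex. The sketch below does that in base R, again assuming the built-in `Titanic` table (crew included) rather than the dataset used here, so the exact rates will differ from the graph.

```r
# Assumption: built-in `Titanic` table (includes crew), not the post's dataset.
# Collapse Age, keeping Class x Sex x Survived.
counts <- margin.table(Titanic, c(1, 2, 4))

# Divide each cell by its Class x Sex total, then keep the "Yes" (survived) slice.
survival_rate <- prop.table(counts, margin = c(1, 2))[, , "Yes"]
print(round(survival_rate, 2))
```

Each cell of `survival_rate` is the fraction of that class and sex group who survived, which makes groups of very different sizes directly comparable.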
This is also a death and survival comparison, but with age added on the y-axis. Going back to the question of who the survivors were, you can see that female passengers had the highest survival rate overall, but the third-class women who survived tended to be much younger. Now you know why Jack did not survive in the movie: the Titanic wasn’t just a tragedy in itself, he also faced a higher risk of losing his life in the sinking.
Data visualization is very straightforward, isn’t it? Here is a TED talk I found, ‘The beauty of data visualization’ by David McCandless. It’s really inspiring if you are ever interested in data visualization.