The next concept we're going to talk through is the idea of data visualization and data exploration. So before we do fancy statistical models, typically we want to just check the data out. Often the exploration techniques we have include visualization, and the tool that we're going to use is called ggplot2, which is part of the tidyverse inside of R. What's nice about it is that the techniques that work for simple plots also make it easy to make more complicated or fancier plots that we might want to look at later.

So the main idea to start with is that when we're thinking about what kind of graph to make, we want to start by thinking about what kind of data we're going to look at and how we want to look at it. The short answer is, we often want to look at categorical data, which are categories, and numerical data, which are numbers. Of course, there are different variations within that. But typically for categorical variables, in the one-variable case we think about bar charts, and for multiple variables we'd make a two-way table or a chart. For numbers, however, we would either make a box plot or a histogram to start with. And again, hopefully you remember those from your intro stats courses.

The question, though, is what happens when you start to look at more than one variable at a time. For numerical data, if we have two numerical variables, then a scatter plot is typically going to be the visualization we want. When we're looking at categorical variables against each other, stacked bar graphs are what we want. And when we mix the two, we're often going to make box plots to look at how a numerical variable looks for different categories. Later on we're going to talk about what we do when we get three variables. Do we want to make even more box plots? Do we want to add a third variable to a scatter plot, like the color or size of the dot, that kind of thing? But for most simple investigations, we're going to make either one-variable or two-variable charts.

The other thing we want to think about is the kind of compelling stories that we could have. A good visualization tells a story, and the idea is that the mapping should help explain what's going on rather than make it more complicated. So in general, just like when you write a story or when you write a paper, you're told to have one thesis and make everything tie into your one big idea. The same thing is true of visualization. And even looking historically, there are some really good and some really interesting cases of graphs, and these were done by hand, well before anyone was thinking about the idea of doing big data or any of that kind of thing.

This is one of the first big charts that came out, and this is from Dr. John Snow. Before he was in Game of Thrones, John Snow was one of the founders of epidemiology, or the idea that we could study public health through statistics. There was a cholera outbreak, and what he did is he took a map of London and he started putting dots where everyone who had cholera lived. He realized that there was a water pump right here in the middle. And sure enough, that pump had been poisoned accidentally, and everybody who had the contamination got it because they were taking water from that water pump.

Another famous graph, from around the same time, is a graph of how Napoleon marched. This is a mix of a geography map, moving toward Moscow, with this many troops. You can see that as the band gets narrower, there are fewer troops, and by the time he got to Moscow, Napoleon had lost a big chunk of his troops. And as they retreated, you can see how it got even worse. It turns out that when you're invading somewhere, burning the crops behind you is not actually a good idea. This graph is actually sort of famous, and it comes from Charles Joseph Minard, who's one of the famous [unclear] people. Now, Dr.
[name unclear] includes one other graph here, and this is one that he found on the internet, on the idea of political parties. This graph is very complicated. It's confusing, and it's hard to really tell what's going on. Now, if you are looking for good public graphs, Gapminder (we'll talk about this a little bit later) has a tool specifically to make simple population and public health visualizations. Some of you may have heard of Tableau, which is also software that's specifically designed for visualization purposes.

Now, thinking about how ggplot2 is going to do it: gg stands for the grammar of graphics, and the idea is that we're going to use kind of a common language to make graphs, and as we try to make different graphs, we're going to try to keep the terminology the same. Just like before, there's actually a very nice cheat sheet on the RStudio site that helps with ggplot2, and I'll link to that in the Google Classroom site.

But the idea is we're going to start by talking about what data set we're using. Then we're going to use geom to talk about the geometric graph, the kind of visualization we're going to use. And then we're going to use a mapping, sometimes also called the aesthetics, to decide how the variables are going to be assigned. Then everything else is added on later.

And what's nice is, if we're going to use that Titanic survival data that we used before, we start by just saying, here's the data we're going to use. Of course, at this point there's no graph made. Then we use the plus sign, which works like a pipe as a way to connect to the next line, and we use the command geom_point, and a point just makes the point graph. And then mapping just says what variables we're going to be using. Now, mappings can either go here in with the geom or they can go at the top. Either works. And as you get better at it, you can actually leave a lot of these out.
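The three pieces just described (data, geom, mapping) can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the course's actual Titanic data set and its column names are not shown in the transcript, so `titanic_demo` and its columns `survived`, `age`, and `pclass` are made-up stand-ins.

```r
library(ggplot2)

# Hypothetical stand-in for the Titanic data used in class.
# The column names and values are assumptions made for illustration only.
titanic_demo <- data.frame(
  survived = factor(c("no", "yes", "yes", "no", "no", "yes", "no", "yes")),
  age      = c(22, 38, 26, 35, 54, 4, 40, 27),
  pclass   = factor(c(3, 1, 3, 1, 1, 3, 2, 2))
)

# Step 1: name the data. On its own this draws an empty panel, no graph yet.
p_empty <- ggplot(data = titanic_demo)

# Step 2: connect the next line with `+`, pick a geom, and put the mapping
# (the aesthetics) inside the geom.
p_geom <- ggplot(data = titanic_demo) +
  geom_point(mapping = aes(x = survived, y = age))

# The same plot with the mapping placed at the top instead.
p_top <- ggplot(data = titanic_demo, mapping = aes(x = survived, y = age)) +
  geom_point()
```

Printing `p_geom` or `p_top` in an R session should draw the same point graph; the only difference is where the mapping is written.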
So you can actually just go geom_point with x equals survived, y equals age, but seeing how the grammar works in its fullest sense makes it easy to see how we can go on from there. Now, individual value plots like this dot plot are kind of nice, but you could probably make a box plot in this case. And what's cool is we can just switch geom_point to geom_boxplot, and it's going to make the same graph, now as box plots. You can see that everything else in the code is literally the same. So geom_boxplot is what we use to make box plots.

We can start to add more options to make fancier box plots. So we do fill equals passenger class, so the first, second, and third class, and you can see how those variables go on from there. Now, if we start to look at that chart, what you see is that we can't actually make it more complicated just by doing the simple thing, but we can add stat_summary. What stat_summary does is take a function, called fun.y, which here is just the mean. Now we can color that mean red, and we're going to use a point to show it. So now we have our box plot, and we've put a dot with the mean right there on it.

So you can see that the grammar of graphics starts simply, and we just start stacking things on top of it. As we get towards the end of the semester, you'll be able to make really cool, really interesting charts using this simple notation, and like I said, they'll get more and more complicated. We'll have a lab and a homework assignment where we'll talk about more of the specifics of the terms that we're going to use.
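As a rough sketch of the steps in this part of the lecture (short form, switching the geom, adding a fill, stacking a mean on top), something like the following should work. The data frame `titanic_demo` and its column names are hypothetical stand-ins, not the class data. Note also that even in the short form the variables still go inside `aes()`, and that recent ggplot2 releases (3.3.0 and later) name the `stat_summary` argument `fun`, where older versions used `fun.y`.

```r
library(ggplot2)

# Hypothetical stand-in for the Titanic data; names and values are made up.
titanic_demo <- data.frame(
  survived = factor(c("no", "yes", "yes", "no", "no", "yes",
                      "no", "yes", "no", "yes", "no", "yes")),
  age      = c(22, 38, 26, 35, 54, 4, 40, 27, 19, 31, 47, 58),
  pclass   = factor(c(3, 1, 3, 1, 1, 3, 2, 2, 3, 2, 2, 1))
)

# Short form: argument names dropped, but the variables still sit in aes().
p_short <- ggplot(titanic_demo, aes(survived, age)) + geom_point()

# Swap geom_point for geom_boxplot; everything else stays the same.
p_box <- ggplot(titanic_demo, aes(survived, age)) + geom_boxplot()

# Fancier: fill each box by passenger class.
p_fill <- ggplot(titanic_demo, aes(survived, age)) +
  geom_boxplot(mapping = aes(fill = pclass))

# Stack a summary layer on the basic box plot: the mean age of each
# survival group, drawn as a red point.
p_mean <- p_box +
  stat_summary(fun = mean, geom = "point", color = "red")
```

Printing `p_mean` should show each box with a red dot at that group's mean age, which is the "start simple and keep stacking" idea from the lecture.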