Note: This workflow is covered in greater detail through the Introduction to Data Visualization workshop, offered periodically throughout the year. Find it in the library events calendar. Feel free to sign up to learn more!
One way to approach a data visualization task is to go through a general process of:
1. Identify your audience and purpose
2. Select, prepare and explore your data
3. Select the broad form of the visualization based on audience, purpose, and data
4. Select the detailed visual elements within the visualization, considering the data/variables to display and how, what text to include, how you will lay out the elements, etc.
5. Share and receive feedback
The process usually doesn’t end there. Instead, it is iterative: you keep developing the visualization until you are happy with it and it meets your needs. (Adapted from Visual Insights by Katy Börner and David Polley.)
When you start thinking about visualizing your data, it is important to first consider your audience and purpose. You should be asking yourself the following questions:
- Who is your audience for your visualization? What level of familiarity do they have with your topic? Are there any accessibility concerns?
- What is the purpose of your visualization? Is it to communicate a finding or is it exploratory, for your own analysis?
- If you are using your visualization for communication, what is the main idea you're trying to communicate? Is there an important trend or comparison that you will need to highlight?
- Where is your visualization going to be used? Is this for a presentation, a poster, an article, or a website? How will the visualization add value to and support the presentation/poster/article/website? Are there any submission guidelines or requirements?
- Will your visualization be interactive? If so, what options will you give your users so that they can interact with your visualization to gain their own insights?
Your next step is to gather data, often from multiple sources. When you do this, you will also have to clean or normalize your data. You may have to consider missing data or outliers, converting units, aggregating/summarizing data, etc.
Some tools/languages you can use to "clean" your data:
- MS Excel (Add-ons to help with cleaning in Excel can be found here.)
- OpenRefine (Free, open source software to help clean data. Check out the documentation to help you get started.)
- Trifacta Wrangler (Free tool to help you clean data. Trifacta Online Training will help you get started.)
- R (Try RStudio or Anaconda to work with R. Here's a document to help you get started cleaning data with R.)
- Python (Try Anaconda to work with Python. Here's a tutorial to help you get started cleaning data with Python.)
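As a rough illustration of the cleaning steps mentioned above (missing data, unit conversion, outliers, aggregation), here is a minimal Python sketch that uses only the standard library. The records, the field names `city` and `temp_f`, and the plausible range of -90 to 60 °C are all made up for the example. In practice you would pick rules that fit your own data, and many people would reach for a dedicated data library instead.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: temperature readings in Fahrenheit, stored as text.
raw_rows = [
    {"city": "A", "temp_f": "68.0"},
    {"city": "A", "temp_f": ""},        # missing value
    {"city": "A", "temp_f": "71.6"},
    {"city": "B", "temp_f": "50.0"},
    {"city": "B", "temp_f": "500.0"},   # implausible, probably a typo
    {"city": "B", "temp_f": "53.6"},
]


def clean_and_summarize(rows, low_c=-90.0, high_c=60.0):
    """Drop missing values, convert F to C, drop implausible values,
    then aggregate to a mean per city.

    low_c and high_c are an assumed plausible range, not a standard rule.
    """
    by_city = defaultdict(list)
    dropped = {"missing": 0, "outlier": 0}
    for row in rows:
        text = row["temp_f"].strip()
        if not text:
            dropped["missing"] += 1
            continue
        temp_c = (float(text) - 32.0) * 5.0 / 9.0  # unit conversion
        if not low_c <= temp_c <= high_c:
            dropped["outlier"] += 1
            continue
        by_city[row["city"]].append(temp_c)
    # Aggregate: one rounded mean per city.
    summary = {city: round(mean(vals), 1) for city, vals in by_city.items()}
    return summary, dropped


summary, dropped = clean_and_summarize(raw_rows)
print(summary)
print(dropped)
```

Keeping a count of what was dropped, and why, makes it easier to describe your cleaning decisions when you share the visualization later.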
Then you have to consider the characteristics of your data, as this will inform the type of visualization you choose and how to visualize individual variables within the broader visualization. Questions to ask yourself:
- Roughly speaking, is your dataset mainly quantitative (numbers), qualitative (text), or both?
- In general, what type of dataset is it (could be more than 1)? Geographical, topical (i.e., textual), temporal (i.e., over time), network, and/or statistical/numeric?
- Is your dataset microdata or aggregated/summarized data? (Microdata means unaggregated, where each row is an observation.)
- How large is your dataset? How many variables do you have?
- What types are your variables within your data? Categorical (nominal), ordinal, interval, or ratio? (Here is a helpful link to clarify these types.)
- Are there any other special characteristics of your dataset? Privacy concerns, etc?
Note: These questions are broad and the terms used may vary in use and meaning depending on discipline.
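A quick profile of your dataset can help you answer some of the questions above, such as how large it is and whether each variable is mainly quantitative or qualitative. The sketch below is one simple way to do that in Python with the standard library. The function name `profile_dataset` and the sample survey rows are hypothetical. The check is deliberately crude: it only tests whether values parse as numbers, so it cannot tell nominal from ordinal, or interval from ratio. You still need to make that call yourself.

```python
def _is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_dataset(rows):
    """Summarize size and a rough kind for each column of a list of dict rows."""
    columns = list(rows[0].keys()) if rows else []
    profile = {"n_rows": len(rows), "n_variables": len(columns), "columns": {}}
    for col in columns:
        raw = [row.get(col) for row in rows]
        # Treat None and blank strings as missing.
        values = [str(v).strip() for v in raw if v is not None and str(v).strip()]
        numeric = bool(values) and all(_is_number(v) for v in values)
        profile["columns"][col] = {
            "kind": "quantitative" if numeric else "qualitative",
            "distinct": len(set(values)),
            "missing": len(rows) - len(values),
        }
    return profile


# Hypothetical microdata: each row is one survey response.
survey = [
    {"age": "34", "province": "ON", "satisfaction": "high"},
    {"age": "27", "province": "BC", "satisfaction": "low"},
    {"age": "", "province": "ON", "satisfaction": "medium"},
]

for name, info in profile_dataset(survey)["columns"].items():
    print(name, info)
```

Note that a column such as `satisfaction` is reported as qualitative, but it is really ordinal (low, medium, high), which is exactly the kind of detail a profile like this will miss.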
The better you understand your dataset and what it is trying to tell you, the more effective your visualizations will be.
When thinking about the form of your visualization, it may be helpful to decide if the data you’re going to visualize is geographic (so you might want a map), relationship-based (network diagram), temporal (line graph), numeric (bar chart), etc. That is why you were asking yourself questions about your data in the previous section. Another question to ask yourself is: what are you trying to show: comparisons, relationships, etc.?
Once you have answers to these questions and the questions from section 2, if you are not sure what visualization form works for your data or the story you want to tell, check out these links:
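The pairings mentioned above can be written down as a simple lookup. This is only a sketch of a starting point, not a rule: the dictionary `SUGGESTED_FORM` and the function `suggest_forms` are hypothetical names, and the pairings are just the four examples given in this section.

```python
# Starting-point pairings taken from the examples in this section.
SUGGESTED_FORM = {
    "geographic": "map",
    "relationship": "network diagram",
    "temporal": "line graph",
    "numeric": "bar chart",
}


def suggest_forms(data_kinds):
    """Return a suggested broad form for each kind of data you listed."""
    fallback = "no default here; try a chart chooser"
    return {kind: SUGGESTED_FORM.get(kind.lower(), fallback) for kind in data_kinds}


# A dataset can be more than one kind at once.
print(suggest_forms(["temporal", "geographic"]))
```

Because a dataset can be several kinds at once, the function returns one suggestion per kind. Your audience and purpose then help you decide which form to lead with.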
- Choosing a Good Chart
- The Data Visualisation Catalogue (Try selecting Search by Function option. This site also has a description about each form with pros and cons.)
- Chart Chooser
- Tableau - Which chart or graph is right for you?
- A Periodic Table of Visualization Methods
- Data Viz Project
- Interactive Chart Chooser
- Text Visualization Browser
- The TimeViz Browser
- A Visual Bibliography of Tree Visualization 2.0
- From Data to Viz
Once you have the broad form chosen, you can start to decide how to add and visualize individual variables. In essence, you vary the marks on a page to convey the data. This is where the questions you asked yourself about your variables in section 2 become so important. For example, categorical variables can work well when visualized by varying colour or shape. Quantitative variables, such as ratio variables, work better when visualized by varying position.
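To make "varying the marks" more concrete, here is a small Python sketch of how a plotting tool might encode data behind the scenes. It maps two quantitative variables to position with a linear scale and one categorical variable to marker shape. The names `linear_scale`, `encode_points`, and `SHAPES`, the canvas size, and the sample points are all assumptions for illustration. Real tools handle many more cases.

```python
SHAPES = ["circle", "square", "triangle", "diamond"]


def linear_scale(value, domain, pixel_range):
    """Map a value from the data domain to a pixel position."""
    d0, d1 = domain
    r0, r1 = pixel_range
    if d1 == d0:  # all values identical: place the mark in the middle
        return (r0 + r1) / 2
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


def encode_points(points, width=400, height=300):
    """Encode x and y as position and the 'group' category as shape."""
    xs = [p["x"] for p in points]
    ys = [p["y"] for p in points]
    categories = sorted({p["group"] for p in points})
    if len(categories) > len(SHAPES):
        raise ValueError("more categories than distinguishable shapes")
    shape_for = dict(zip(categories, SHAPES))
    marks = []
    for p in points:
        marks.append({
            "px": linear_scale(p["x"], (min(xs), max(xs)), (0, width)),
            # Screen coordinates grow downward, so the y range is flipped.
            "py": linear_scale(p["y"], (min(ys), max(ys)), (height, 0)),
            "shape": shape_for[p["group"]],
        })
    return marks


sample = [
    {"x": 0, "y": 0, "group": "a"},
    {"x": 10, "y": 5, "group": "b"},
    {"x": 5, "y": 10, "group": "a"},
]
for mark in encode_points(sample):
    print(mark)
```

The sketch refuses to continue when there are more categories than shapes. That is a deliberate choice: a categorical encoding stops working once the reader cannot tell the marks apart.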
Here are a few articles, with helpful charts/lists, to read to understand this better and help with your visualization choices:
- InfoViz Wiki Visual Variables (This is a great overview presenting classic research by Jacques Bertin and Jock Mackinlay.)
- “Automating the design of graphical presentations of relational information” by Jock Mackinlay (Here's the full article, where fig. 15 is discussed in the InfoViz article above.)
- "Considering Visual Variables as a Basis for Information Visualization" by M. S. T. Carpendale (This is a report building upon Jacques Bertin's visual variables.)
- "Properties and Best Uses of Visual Encodings" by Noah Iliinsky (Another summary table condensing and drawing upon this information discussed above.)
- "Visual Variables" in the Cartography Guide by Axis Maps (An excellent overview describing Bertin's visual variables and their properties - especially with how it relates to mapping.)
Another particular aspect of your visualization is colour. Choosing appropriate colours for your visualization can be a challenge. It depends, again, on your variable type. So to ensure that your visualizations are effective and well-understood by your audience (watch out for colour blindness or cultural meanings associated with a certain colour!), consider some of these tools and websites to help you choose colours:
- ColorBrewer 2.0
- Coblis — Color Blindness Simulator
- 5 tips on designing colorblind-friendly visualizations
- Picking a Colour Scale for Scientific Graphics
Also note that some visualization programs, such as Tableau Desktop, include colour palettes that are appropriate for wide audiences.
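Since colour choice depends on variable type, ordered data (ordinal or quantitative) usually calls for a palette that runs evenly from one colour to another, which ColorBrewer calls a sequential scheme. The Python sketch below builds such a ramp by blending two hex colours in equal steps. The function names and the example colours are hypothetical. Blending in plain RGB is a simplification and is not perceptually uniform, so for real work a vetted palette from one of the tools above is a safer choice.

```python
def hex_to_rgb(hex_colour):
    """Convert '#rrggbb' to a tuple of three integers (0 to 255)."""
    h = hex_colour.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert a tuple of three integers (0 to 255) to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sequential_palette(start_hex, end_hex, n):
    """Return n colours stepping evenly from start_hex to end_hex."""
    if n < 2:
        raise ValueError("a sequential palette needs at least 2 colours")
    start = hex_to_rgb(start_hex)
    end = hex_to_rgb(end_hex)
    palette = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the start colour, 1.0 at the end colour
        blended = tuple(round(s + (e - s) * t) for s, e in zip(start, end))
        palette.append(rgb_to_hex(blended))
    return palette


# Example: a five-step ramp from a light to a dark blue (arbitrary colours).
print(sequential_palette("#deebf7", "#08519c", 5))
```

For categorical variables, a ramp like this is a poor fit, because it implies an order that is not there. Distinct hues work better in that case.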
Finally, you should also keep in mind that general design principles apply, whether you're creating a static poster or an interactive visualization. See the Design Principles section for more details.
Creating a visualization can be iterative. So remember to share your visualizations and get feedback to constantly improve them.
Also, remember that it is important to make sure that your audience understands how to read your visualization. Basic types, such as a line graph, are taught in school, but more complex visualizations may not be so obvious. Sometimes a legend is enough. Other times, you may need to include some information on how to read/interpret your visualization.
Make sure you are allowed to use the data in your visualization, and don't forget to include citations for your data sources! Not only is this good academic practice, but it lends more credibility and authority to your data visualizations. Check out our Citing Data page for more information.