Data visualization tools help to house, cleanse, process and display the vast, complex and rapidly growing data that we generate and collect every moment. Data visualization supports decision-making, allowing businesses to compete and adapt quickly to changing environments. The ever-increasing flow of data, both real-time and near real-time, requires dynamic dashboards, which drives demand for sophisticated tools, platforms and applications. The most effective visualizations combine data cleansing and analytic techniques, and often require a competent data scientist. Nevertheless, a number of open source data visualization tools can help businesses visualize their data easily.
Open-Source Data Visualization Tools
Research conducted for this post revealed more than 50 data visualization tools that can be considered “open source”. The term is sometimes confused with “free”, but the two are not synonymous. The most basic definition of open source in the context of software is “software with source code that anyone can inspect, modify, and enhance”. Open source data visualization tools typically require the user to have some programming ability, whereas free visualization tools may not.
This post profiles both free and open source data visualization tools, and includes a comparison matrix that can be used to compare and contrast each tool.
Charted is perhaps one of the easiest data visualization tools around, as it simply requires a link to a .csv file or a Google Sheets location; hit GO and Charted displays the data as a bar or line chart. According to its developers (the Product Science Team at Medium), the tool was built around three principles: it does not store data, does not transform data, and is not a formatting tool. It re-fetches the data on a regular cadence (every 30 minutes), so changes made to the underlying sheet are reflected in the chart. It also supports tab-delimited files and Dropbox links. Training? Non-existent, though neither is it required.
Datawrapper has been in existence since 2011 and is primarily used by journalists, though it is comprehensive enough to be useful to any data scientist or researcher. In contrast to most of the tools profiled here, Datawrapper has free and paid versions. It is also profiled here as a free tool rather than an open source one in the sense described above, because no coding skills are needed. As the site home page explains, you simply cut & paste, visualize, and publish. Charts are interactive, meaning viewers can see underlying values, and the visualizations can also be embedded on a website. There is a wide range of charting options, from simple bar charts to scatter plots, as well as mapping functionality.
Similar in some respects to Charted and Datawrapper, RawGraphs, whose tagline is “the missing link between spreadsheets and data visualizations”, simply requires the user to paste data, upload a file, or provide a link in order to create a wide variety of charts. One feature that differentiates RawGraphs is its set of unconventional visualization models (e.g. sunburst, alluvial diagrams, and dendrograms for hierarchical clustering). Don’t fret, novices: the usual suspects (bar, line, pie, scatter) are also included. Advanced users can also create new chart types. Visual creations can be exported as vector or raster images for display on your website, and the tutorials, while not extensive, can be completed quickly so you can get right to work on that visual magnum opus.
In the category of upload and create, OpenHeatMap is a fairly basic tool that allows users to upload a CSV, Excel, or Google Sheets file and create a map instantly. OpenHeatMap can also be used by developers (as a jQuery plugin) to provide mapping functionality within their own website. Users uploading a file for rendering are advised to include a full street address in one field, with the values to be mapped in another field (for instance, housing value, sales price, or number of employees). Geographies can be point-based (i.e. one address) or aggregates such as city, county, or state.
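As a rough illustration of the file layout described above, the following Python sketch writes a small CSV with one full street address per row and a numeric value alongside it. The column names `address` and `value`, and the sample rows themselves, are assumptions made for illustration; the post does not specify the exact headers the tool expects, so check its documentation before uploading.

```python
import csv
import io

# Hypothetical sample records: one full street address and one value per row.
# The addresses and numbers are made up for illustration only.
records = [
    {"address": "123 Main St, Springfield, IL 62701", "value": 250000},
    {"address": "45 Oak Ave, Madison, WI 53703", "value": 310000},
    {"address": "9 Elm Rd, Columbus, OH 43215", "value": 195000},
]


def to_csv(rows):
    """Serialize rows to CSV text with an address column and a value column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["address", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records)
print(csv_text)
```

Because each address contains commas, the `csv` module quotes that field automatically, which keeps the whole address in a single column.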
The dygraphs library claims as one of its primary features the ability to handle huge data sets, plotting millions of data points without “getting bogged down”. Another feature, for those who consider themselves stats nerds, is the ability to display error bars and/or confidence intervals. To use these, a standard deviation must be supplied alongside each value in the data file. The tutorial demonstrations are fairly basic but should be enough to get someone creating their own visualizations fairly quickly.
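Since error bars depend on a standard deviation being present in the data file, it may help to see one way of preparing such a file. The Python sketch below computes the mean and sample standard deviation for each x value and formats them as CSV rows. The measurements are made up, and the `x,mean,stddev` header and column order are assumptions for illustration; consult the charting library's documentation for the exact layout it expects.

```python
import statistics

# Hypothetical repeated measurements for each x value (made-up numbers).
samples = {
    1: [10.0, 12.0, 14.0],
    2: [20.0, 20.0, 20.0],
    3: [5.0, 7.0],
}


def summarize(samples_by_x):
    """Return CSV lines holding x, the mean, and the sample standard deviation."""
    lines = ["x,mean,stddev"]  # assumed header, not taken from any tool's docs
    for x in sorted(samples_by_x):
        values = samples_by_x[x]
        mean = statistics.mean(values)
        stddev = statistics.stdev(values)  # sample (n - 1) standard deviation
        lines.append(f"{x},{mean:.3f},{stddev:.3f}")
    return lines


csv_lines = summarize(samples)
print("\n".join(csv_lines))
```

Note that `statistics.stdev` needs at least two measurements per x value; with a single measurement there is no spread to estimate, so no error bar can be derived this way.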