GitHub - nshiab/simple-data-analysis.js: Easy-to-use JavaScript library for most...
source link: https://github.com/nshiab/simple-data-analysis.js
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Simple data analysis (SDA) in JavaScript
This repository is maintained by Nael Shiab, senior data producer at CBC/Radio-Canada.
If you use the library, show off your work and tag me on Twitter or LinkedIn! :)
These project's goals are:
-
To ease the way for non-coders (especially journalists) into the beautiful world of data analysis and data visualization in JavaScript.
-
To standardize and accelerate frontend/backend workflows with a simple-to-use library working both in the browser and with NodeJS.
We are always trying to improve it. Feel free to start a conversation or open an issue, and check how you can contribute.
The documentation is available here and more demos here.
Table of contents
Core principles
Under the hood, SDA is mainly based on D3 modules, Observable Plot and Lodash. The focus is on providing code that is easy to use and understand.
The library expects tabular data stored in CSV/TSV files or arrays of objects stored in JSON files. It works best when the data is tidy:
-
Every key (or column) is a variable
-
Every item (or row) is an observation
-
Every value (or cell) is a single value
For more about tidy data, you can read this great article.
The easiest way to use the library
If you don't want to install anything, a great platform is Observable. Check this demo of the library in an Observable's notebook.
Simple example from the HTML
If you want to add the library directly to your webpage, you can use the minified bundle from a npm-based CDN like jsDelivr and call sda.
Here's an example.
<script src="https://cdn.jsdelivr.net/npm/simple-data-analysis@latest">
// If you have a source map warning in the console,
// you can use src="https://cdn.jsdelivr.net/npm/simple-data-analysis@latest/dist/simple-data-analysis.min.js"
</script>
<div id="viz"></div>
<script>
async function main() {
const simpleData = await new sda.SimpleData()
// We retrieve some data
.loadDataFromUrl({
url: "https://raw.githubusercontent.com/nshiab/simple-data-analysis/main/data/employees.csv",
autoType: true // CSV files are text. Automatically convert numbers.
})
simpleData
// We remove duplicate items
.removeDuplicates()
// We compute the mean of
// the salaries for each job
.summarize({
keyValue: "Salary",
keyCategory: "Job",
summary: "mean"
})
// We remove items with missing values
.excludeMissingValues()
// We log the table in the console
.showTable()
// We select our div with the id "viz"
// and we add a chart in it.
document.querySelector("#viz").innerHTML =
simpleData
// getChart() returns SVG
// or HTML elements
.getChart({
x: "mean",
y: "Job",
color: "Job",
type: "barHorizontal",
marginLeft: 100
})
}
main()
</script>
And here's the result in the browser!
As you can see below, SDA is a lightweight library optimized for the web (98kb ≈ 12ko).
Working with NodeJS and JavaScript Bundlers
First, make sure that your NodeJS version is 16 or higher. To check it, write node
in your terminal and press Enter.
You should see something like this.
If the version is less than 16, update NodeJS with the latest LTS (long-term support) version .
To install the library with npm, type this command in your terminal:
npm i simple-data-analysis
Once installed, you can import what you need.
import { SimpleData } from "simple-data-analysis"
const someData = [...] // An array of objects
const simpleData = new SimpleData({ data: someData })
If you are using NodeJS and want to read or write local files, use SimpleDataNode instead.
import { SimpleDataNode } from "simple-data-analysis"
const simpleData = new SimpleDataNode()
.loadDataFromLocalFile({
path: "./someFile.csv"
})
Using it with React
You can use SDA with React as well. Put the relevant code inside a useEffect or useMemo. The example below was created inside a Next.js app.
import { useEffect, useRef } from "react"
import { SimpleData } from "simple-data-analysis"
export default function Home() {
const ref = useRef()
useEffect(() => {
SimpleDataFromUrl()
async function SimpleDataFromUrl() {
const simpleData = await new SimpleData()
.loadDataFromUrl({
url: "https://raw.githubusercontent.com/nshiab/simple-data-analysis/main/data/employees.csv",
autoType: true
})
ref.current.innerHTML =
simpleData
.getChart({
x: "Departement or unit",
y: "Salary",
type: "dot",
marginLeft: 50,
trend: true,
showTrendEquation: true
})
}
}, [])
return <div ref={ref}>
</div>
}
Here's the result.
Using it with D3
D3 is a powerful library widely used to create stunning data visualizations.
It works best with the data structured as an array of objects, exactly like SDA. The two libraries complement each other very well.
// Use SimpleData to prepare your data
const simpleData = new SimpleData({
data: arrayOfObjects
})
// Chain methods to filter,
// clean, summarize, etc.
// Then use D3 to visualize
const svg = d3.select("#dataviz")
svg.selectAll("circle")
.data(
simpleData.getData()
// getData() returns the data as
// an array of objects. Easy!
)
.join("circle")
// Keep on doing your D3 magic.
Using it with ThreeJS / React Three Fiber (shaders)
ThreeJS is general purpose 3D library. Under the hood, it sends instructions to the GPU, which allows for high-performance visualizations in 2D and 3D.
React Three Fiber is a React renderer for ThreeJS.
To visualize hundreds of thousands of data points, you can use custom shaders with these wonderful libraries. To do so, you need to pass your data as a BufferAttribute.
Here's how to display points while passing custom data - from a SimpleData instance of course! - to the shaders.
// Use SimpleData to manipulate your data
const simpleData = new SimpleData({
data: arrayOfObjects
})
// Let's imagine that you transformed your
// data to look like this.
// [
// {r: 0.1, g: 0.2, b: 0.3, x: 1, y: 2, z: 3},
// {r: 0.4, g: 0.5, b: 0.6, x: 4, y: 5, z: 6},
// {r: 0.7, g: 0.8, b: 0.9, x: 7, y: 8, z: 9},
// ...
// ]
// To pass the positions and colors as BufferAttributes,
// you need to restructure your data as one array
// Let's start with the positions
const positionsKeys = ["x", "y", "z"];
const positions = simpleData.getArray({
key: positionsKeys
});
// The returned array looks like this
// [1, 2, 3, 4, 5, 6, 7, 8, 9, ...]
// And now the colors.
const colorsKeys = ["r", "g", "b"]
const colors = simpleData.getArray({
key: colorsKeys
})
// The returned array looks like this
// [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9...]
// Now you can create your BufferAttributes
const positionsAttribute = new THREE.BufferAttribute(
new Float32Array(positions),
positionsKeys.length
)
const colorsAttribute = new THREE.BufferAttribute(
new Float32Array(colors),
colorsKeys.length
)
// Create a BufferGeometry and add your attributes
const geometry = new THREE.BufferGeometry()
geometry.setAttribute("position", positionsAttribute);
geometry.setAttribute("color", colorsAttribute);
const material = new THREE.ShaderMaterial({
vertexShader: yourVertexShader,
fragmentShader: yourFragmentShader
})
// If you don't want to mess with shaders,
// you can use the PointsMaterial and
// set the vertexColors to true like so
// new THREE.PointsMaterial({vertexColors: true})
const mesh = new THREE.Points(geometry, material)
// You now have acces to your data as
// an attribute for each vertice
// in your shaders. Here we added
// positions and colors but it can be anything!
// PS: Don't forget to add your mesh to your scene. :)
If you work with React, you can use React Three Fiber. Here's how to do the same thing, but the React way!
// Let's imagine that we are inside a React component
// nested inside the React Three Fiber Canvas element.
// We create everything needed
// for our BufferAttributes
const attributes = useMemo(() => {
// Use SimpleData to manipulate your data
const simpleData = new SimpleData({
data: arrayOfObjects
})
// Let's imagine that you transformed your
// data to look like this.
// [
// {r: 0.1, g: 0.2, b: 0.3, x: 1, y: 2, z: 3},
// {r: 0.4, g: 0.5, b: 0.6, x: 4, y: 5, z: 6},
// {r: 0.7, g: 0.8, b: 0.9, x: 7, y: 8, z: 9},
// ...
// ]
const positionsKeys = ["x", "y", "z"];
const positions = simpleData.getArray({
key: positionsKeys
});
// The returned array looks this
// [1, 2, 3, 4, 5, 6, 7, 8, 9, ...]
const colorsKeys = ["r", "g", "b"]
const colors = simpleData.getArray({
key: colorsKeys
})
// The returned array looks this
// [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9...]
return {
positions: new Float32Array(positions),
positionsItemSize: positionsKeys.length,
colors: new Float32Array(colors),
colorsItemSize: colorsKeys.length,
count: simpleData.getLength()
}
}, [arrayOfObjects])
// We setup and return our points here.
return <points>
<bufferGeometry>
<bufferAttribute
attach="attributes-position"
count={attributes.count}
itemSize={attributes.positionsItemSize}
array={attributes.positions}
/>
<bufferAttribute
attach="attributes-color"
count={attributes.count}
itemSize={attributes.colorsItemSize}
array={attributes.colors}
/>
</bufferGeometry>
<shaderMaterial
vertexShader={yourVertexShader}
fragmentShader={yourFragmentShader}
/>
</points>
SimpleData class
The SimpleData class is the core of the library. Chaining methods allow you to clean, analyze, and visualize your data easily.
When you chain methods, the data is updated at each step and sent to the next one.
You also have special properties to facilitate your work. If you create a SimpleData with verbose to true (like this new SimpleDataNode({ verbose: true })
), extra information will be logged on the console at each step, like a table of your data. You can also log methods parameters with logParameters: true
.
If, for some reason, you want to chain a method but not overwrite the data, you can pass overwrite: false
to the method (like this simpleData.summarize({ overwrite: false })
). The result of the method will be logged in the console (even if verbose is set to false), but the data passed to the next chained method will not be modified.
If you are curious about how much time everything took, you can use the showDuration method (like this simpleData.showDuration()
) to log this information. After logging, this method returns the SimpleData instance, so you can chain it anywhere you want, just like the showTable method. If you want to retrieve the duration and put it inside a variable, use getDuration (like this simpleData.getDuration()
) which will return this information in milliseconds.
For a description of all methods available, check this Observable notebook or the automatically generated documentation.
SimpleDataNode class
If you use the library with NodeJS, you can import SimpleDataNode instead of SimpleData. It will give you extra methods to load local files, save files and save charts.
import { SimpleDataNode } from "simple-data-analysis";
new SimpleDataNode()
.loadDataFromLocalFile({
path: "../simple-data-analysis/data/employees.csv",
autoType: true
})
// You can load TSV and JSON files as well
.summarize({
keyValue: "Salary",
keyCategory: "Job",
summary: "mean"
})
.excludeMissingValues()
.selectKeys({ keys: ["Job", "mean"] })
.showTable()
.saveData({ path: "./employees.json" })
// You can save CSV and TSV files as well
// When saving JSON files, you can restructure
// the data as arrays by adding dataAsArrays : true
.saveChart({
path: "./chart.html",
type: "barHorizontal",
x: "mean",
y: "Job",
color: "Job",
marginLeft: 100
})
// You need to save the charts
// as HTML files.
And here's the result in VS Code!
SimpleDocument class (experimental, for NodeJS only)
While working on your analysis, it's sometimes helpful to build a document that you'll be able to share with your results.
The SimpleDocument allows you to do that. You can pass JSX expressions, React components and SVG to it, and it will render everything as an HTML file or React component.
Note that this class is still under heavy development.
import React from "react"
import {SimpleData, SimpleDocument, Table} from "simple-data-analysis"
import { Typography } from "@mui/material"
const someData = [...]
// Let's say it's some employees information again.
const simpleData = new SimpleData({data: someData})
// or SimpleDataNode
const simpleDocument = new SimpleDocument()
simpleDocument
.add(<h1>Some JSX!</h1>)
.add(<Typography>
An MUI component!
</Typography>)
.add(<Table
keys={simpleData.getKeys()}
data={simpleData.getData()}
/>)
.add(simpleData.getChart({
type: "dot",
x: "job",
y: "salary",
color: "union"
}))
.saveDocument('somePath/analysis.html')
.saveDocument('somePath/AnalysisComponent.js')
// saveDocument use ReactDOMServer.renderToString
// on everything that has been added
All functions and methods
The documentation is automatically generated with TypeDoc and available here: https://nshiab.github.io/simple-data-analysis.js/.
For a description of all methods and how to use them, you can also check this Observable notebook: https://observablehq.com/@nshiab/simple-data-analysis?collection=@nshiab/simple-data-analysis-in-javascript.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK