Data Science
https://probability4datascience.com/
https://github.com/czekster/markov
http://harelba.github.io/q/ Run SQL directly on CSV or TSV files
https://github.com/BurntSushi/xsv A fast CSV command line toolkit written in Rust
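The idea behind running SQL on a CSV file can be sketched with Python's standard library alone: load the rows into an in-memory SQLite table, then query it. This is only an illustration of the concept, not a description of how q or xsv work internally. The helper `query_csv`, the table name `t`, and the inlined sample rows are all made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sample data, inlined so the sketch is self-contained.
CSV_TEXT = """route_id,origin,destination,price
paris-marseille,Paris,Marseille,65
montreal-toronto,Montreal,Toronto,45
montreal-ottawa,Montreal,Ottawa,35
ottawa-toronto,Ottawa,Toronto,30
"""


def query_csv(csv_text, sql, table="t"):
    """Load CSV text into an in-memory SQLite table and run one SQL query."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    try:
        cols = ", ".join(f'"{name}"' for name in header)
        marks = ", ".join("?" for _ in header)
        # Columns are created without a declared type, so values stay text.
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


rows = query_csv(
    CSV_TEXT,
    "SELECT origin, COUNT(*) FROM t GROUP BY origin ORDER BY origin",
)
print(rows)
```

Because every value is loaded as text, numeric comparisons need an explicit cast, for example `WHERE CAST(price AS INTEGER) < 40`.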
vd {filename}  open file
[  ascending sort
]  descending sort
F  create a histogram for that column
=  (equal sign) create a new Python column
J  move row down
K  move row up
H  move column left
L  move column right
#  treat column as integer
https://github.com/alexhallam/tv 📺(tv) Tidy Viewer is a cross-platform CLI csv pretty printer that uses column styling to maximize viewer enjoyment.
https://pauljuliusmartinez.github.io/ new command-line JSON viewer
https://roapi.github.io/docs/index.html ROAPI automatically spins up read-only APIs for static datasets without requiring you to write a single line of code.
https://datasette.io/ Datasette is a tool for exploring and publishing data. It helps people take data of any shape or size, analyze and explore it, and publish it as an interactive website and accompanying API. https://docs.datasette.io/en/stable/ecosystem.html https://architecturenotes.co/datasette-simon-willison/
https://facebook.github.io/prophet/ Prophet is a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts.
https://meltano.com/ ELT for the DataOps era. Meltano is open source, self-hosted, CLI-first, debuggable, and extensible.
airflow webserver -p {port_number}
airflow scheduler
CSV: line-by-line text file, one record per line, with columns separated by commas
Parquet: "columnar" file, much lighter than CSV, with a data type for each column
import pandas as pd
# Read the CSV file into a DataFrame and display it
print(pd.read_csv("~/tmp/train_routes.csv"))
           route_id     origin  destination  price
0   paris-marseille      Paris    Marseille    $65
1   marseille-paris  Marseille        Paris    $65
2  montreal-toronto   Montreal      Toronto    $45
3  toronto-montreal    Toronto     Montreal    $45
4   montreal-ottawa   Montreal       Ottawa    $35
5   ottawa-montreal     Ottawa     Montreal    $35
6    ottawa-toronto     Ottawa      Toronto    $30
7    toronto-ottawa    Toronto       Ottawa    $30
import pandas as pd
# Convert the CSV file to Parquet
df = pd.read_csv("~/tmp/train_routes.csv")
df.to_parquet("~/tmp/train_routes.parquet")
import pandas as pd
# Read the Parquet file back into a DataFrame and display it
print(pd.read_parquet("~/tmp/train_routes.parquet"))
           route_id     origin  destination  price
0   paris-marseille      Paris    Marseille    $65
1   marseille-paris  Marseille        Paris    $65
2  montreal-toronto   Montreal      Toronto    $45
3  toronto-montreal    Toronto     Montreal    $45
4   montreal-ottawa   Montreal      Ottawa    $35
5   ottawa-montreal     Ottawa     Montreal    $35
6    ottawa-toronto     Ottawa      Toronto    $30
7    toronto-ottawa    Toronto       Ottawa    $30
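The notes above describe CSV as a line-by-line format and Parquet as a columnar one with a data type per column. A toy sketch of that difference in plain Python follows. It only illustrates the two layouts and is not the Parquet file format itself. The records, with prices as plain integers, and the helper `to_columns` are assumptions made for the example.

```python
# Row layout: one dict per record, like one line per record in a CSV file.
records = [
    {"route_id": "paris-marseille", "origin": "Paris", "price": 65},
    {"route_id": "montreal-toronto", "origin": "Montreal", "price": 45},
    {"route_id": "ottawa-toronto", "origin": "Ottawa", "price": 30},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(records)

# Row layout: averaging the price walks through every full record.
avg_from_rows = sum(r["price"] for r in records) / len(records)

# Column layout: only the "price" list is read.
avg_from_columns = sum(columns["price"]) / len(columns["price"])

# Each column holds values of a single type, which is what lets a
# columnar format store one data type per column.
column_types = {name: type(values[0]).__name__ for name, values in columns.items()}
print(avg_from_rows, avg_from_columns, column_types)
```

Storing values of the same type next to each other is also a common reason columnar files compress better than text files, although the actual size difference depends on the data.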
f(x) = sin(x)
plot f(x)
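For comparison, the same sine curve can be drawn roughly in a terminal with plain Python. This is a minimal sketch, not a replacement for a real plotting tool. The function `ascii_plot`, the 60 by 15 character grid, and the range from 0 to 2π are arbitrary choices for the example.

```python
import math


def ascii_plot(f, x_min, x_max, width=60, height=15):
    """Return a text rendering of f over [x_min, x_max] as a list of lines."""
    xs = [x_min + (x_max - x_min) * i / (width - 1) for i in range(width)]
    ys = [f(x) for x in xs]
    y_min, y_max = min(ys), max(ys)
    span = (y_max - y_min) or 1.0  # avoid dividing by zero for a constant f
    grid = [[" "] * width for _ in range(height)]
    for col, y in enumerate(ys):
        # Map y to a row index; row 0 is the top of the plot.
        row = round((y_max - y) / span * (height - 1))
        grid[row][col] = "*"
    return ["".join(line) for line in grid]


lines = ascii_plot(math.sin, 0.0, 2 * math.pi)
print("\n".join(lines))
```

Each column of the grid gets exactly one `*`, so the vertical resolution is limited by `height`.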
https://github.com/red-data-tools/YouPlot
A command-line tool that draws plots on the terminal.
https://github.com/microsoft/Data-Science-For-Beginners
https://harvard-iacs.github.io/2021-CS109A/ https://harvard-iacs.github.io/2021-CS109A/pages/materials.html
by Martin Kleppmann
by Roger D. Peng, and others
by Tural Sadigov, William Thistleton
2nd Edition by Ani Adhikari, John DeNero, David Wagner