- A database is a set of related information
- Single-parent hierachy: Data is structured as a tree with nodes haveing 0 or 1 parent nodes and having 0,1, or many child nodes.
- Network database system: Sets of recrods and sets of links that define relationships between those records. Navigation of the database is like navigating a multi-parent hierarchy
- Dr. E. F. Codd, IBM, 1970
- Data represented as sets of tables
- Instead of pointers to navigate between relationships, redundant data is used to link records across different tables
- Each table has infomration that unqiuely identifies a row in that table (primary key)
- Choosing fnane/lname as the primary key is defined as a natural key while using cust_id is referred to as a surrogate key. This decision is important for the database designer
- It is important that important pieces of information only resides in one place inside the database. If you needed to change the last name of a customer you would only need to change it in one location of the database
- The process of ensuring that eaech independent piece of information is in only one place is known as normalization
- Entity: Something of interest
- Column: INdividual piece of data
- Row: Set of columns to describe an entity. Record
- Table: Set of Rows
- Result Set: Nonpersistent table
- Primary Key: Unique identifiers
- Foreign Key: Used to identify a single row in a table
- Types of SQL statements
- SQL schema statements: define data structures in the database
- SQL data statements: manipulate data structures inside the database
- SQL transaction statements: being, end, and roll back transactions
- All the data elements created bia SQL schema statements is stored in a special set of tables called the data dictionary
- This data about the data(base) is called the metadata
- You can hardcode the names of the columsn in the account table or you can query the data dictionary to determine the current set of columns and dynamically generate the report every time
- A procedural language defines both the desired results and mechanisms by which those results are achieved
- Nonprocedural langauges define the desired results but not the mechanisms of achieving those results
- Your database's optimizer decides the most efficient execution path to output the results that you desire. You can influence the optimiser by providing optimiser hints.
- When constructing your query, first you determin the tables you need, then add conditionals to filter data, then decide the columns you want to add to the results set
- MySQL and ProstgresSQL are open source database servers
- This book will use mySQL 8.0 and the mysql cli to format query results
- New database technologies have emerged to meet the needs of large companies like Amazon and Google
- Hadoop, Spark, NoSQL, NewSQL
- Distrbuted, Scalable systems typically deployed on clusters of commodity servers
- Because companies like to store data in mulitple technologies, software tools like Apache Drill can bring together data from multiple places - like Oracle, Hadoop, JSON files, CSV, Unix log files.