SQL Basic
Learn how to use the basic SQL statement to query data from a database.
SELECT
Understanding the SELECT Statement
SELECT is the primary SQL statement for retrieving data from one or more database tables. Every SELECT statement requires two essential components:
- What to select (columns or expressions)
- Where to select from (table or view name)
Basic SELECT Syntax:
SELECT column_name(s)
FROM table_name;
Key Concepts:
- Keywords: Reserved SQL words like SELECT, FROM, WHERE
- Case insensitive: SELECT = select = Select
- Whitespace flexible: Can span multiple lines
- Semicolon termination: Required for multiple statements
Retrieving Individual Columns
Single Column Selection:
-- Retrieve product names only
SELECT prod_name
FROM Products;
Output:
prod_name
--------------------
Fish bean bag toy
Bird bean bag toy
Rabbit bean bag toy
8 inch teddy bear
12 inch teddy bear
18 inch teddy bear
Raggedy Ann
King doll
Queen doll
Important Notes:
- Unsorted by default: Results appear in no guaranteed order
- All rows returned: No filtering unless specified
- Raw data: No formatting applied by SQL
Best Practices for Single Columns:
-- ✅ Good: Clear, readable formatting
SELECT prod_name
FROM Products;
-- ✅ Also acceptable: Single line for simple queries
SELECT prod_name FROM Products;
-- ✅ Multi-line for complex queries
SELECT
prod_name
FROM
Products;
Retrieving Multiple Columns
Multiple Column Selection:
-- Retrieve product ID, name, and price
SELECT prod_id, prod_name, prod_price
FROM Products;
Output:
prod_id prod_name prod_price
--------- -------------------- ----------
BNBG01 Fish bean bag toy 3.49
BNBG02 Bird bean bag toy 3.49
BNBG03 Rabbit bean bag toy 3.49
BR01 8 inch teddy bear 5.99
BR02 12 inch teddy bear 8.99
BR03 18 inch teddy bear 11.99
RGAN01 Raggedy Ann 4.99
RYL01 King doll 9.49
RYL02 Queen doll 9.49
Column Selection Rules:
- Comma separation: Each column separated by comma
- No trailing comma: Don't add comma after last column
- Order matters: Columns appear in specified order
- Column names: Must exist in the table
Advanced Multiple Column Examples:
-- Customer information
SELECT cust_id, cust_name, cust_email, cust_phone
FROM Customers;
-- Order details
SELECT order_num, order_date, cust_id, order_total
FROM Orders;
-- Product catalog
SELECT prod_id, prod_name, prod_price, vend_id, prod_desc
FROM Products;
Common Mistakes to Avoid:
-- ❌ Missing comma between columns
SELECT prod_id prod_name, prod_price
FROM Products;
-- ❌ Trailing comma after last column
SELECT prod_id, prod_name, prod_price,
FROM Products;
-- ❌ Non-existent column name
SELECT prod_id, product_name, prod_price
FROM Products;
Retrieving All Columns
Using the Asterisk (*) Wildcard:
-- Retrieve all columns from Products table
SELECT *
FROM Products;
*When to Use SELECT :
✅ Good Use Cases | ❌ Avoid When |
---|---|
Exploratory queries | Production applications |
Ad-hoc analysis | Performance-critical queries |
Unknown table structure | Specific column needs |
Quick data inspection | Network bandwidth concerns |
Practical Examples:
-- ✅ Good: Exploring new table structure
SELECT * FROM NewCustomerTable;
-- ✅ Good: Quick data verification
SELECT * FROM Orders WHERE order_num = 20005;
-- ❌ Avoid: Production application
-- This retrieves unnecessary data and hurts performance
SELECT * FROM Products WHERE prod_price > 10;
-- ✅ Better: Specify needed columns
SELECT prod_id, prod_name, prod_price
FROM Products
WHERE prod_price > 10;
Advantages and Disadvantages:
Advantages:
- Quick exploration: See all available data
- Unknown columns: Retrieve columns you don't know exist
- Simple syntax: Easy to write and remember
Disadvantages:
- Performance impact: Retrieves unnecessary data
- Network overhead: More data transferred
- Application brittleness: Breaks if table structure changes
- Security concerns: May expose sensitive columns
Retrieving Distinct Rows
The Problem: Duplicate Values
-- This returns duplicate vendor IDs
SELECT vend_id
FROM Products;
Output:
vend_id
----------
BRS01
BRS01
BRS01
DLL01
DLL01
DLL01
DLL01
FNG01
FNG01
The Solution: DISTINCT Keyword
-- This returns only unique vendor IDs
SELECT DISTINCT vend_id
FROM Products;
Output:
vend_id
----------
BRS01
DLL01
FNG01
DISTINCT with Multiple Columns:
-- DISTINCT applies to the combination of ALL columns
SELECT DISTINCT vend_id, prod_price
FROM Products;
Important DISTINCT Rules:
- Applies to all columns: Can't be partially applied
- Combination uniqueness: Looks at entire row combination
- Performance impact: Requires sorting/grouping internally
- Position matters: Must be first keyword after SELECT
Practical DISTINCT Examples:
-- Find all states where customers are located
SELECT DISTINCT cust_state
FROM Customers;
-- Find all unique product categories
SELECT DISTINCT prod_category
FROM Products;
-- Find unique combinations of state and city
SELECT DISTINCT cust_state, cust_city
FROM Customers
ORDER BY cust_state, cust_city;
-- Count unique customers who placed orders
SELECT COUNT(DISTINCT cust_id) AS unique_customers
FROM Orders;
DISTINCT vs GROUP BY:
-- These queries produce similar results
SELECT DISTINCT cust_state FROM Customers;
SELECT cust_state FROM Customers GROUP BY cust_state;
-- But GROUP BY allows aggregation
SELECT cust_state, COUNT(*) AS customer_count
FROM Customers
GROUP BY cust_state;
Limiting Results
Database-Specific Syntax Variations:
SQL Server - TOP:
-- Get first 5 products
SELECT TOP 5 prod_name
FROM Products;
-- Get top 10 percent
SELECT TOP 10 PERCENT prod_name, prod_price
FROM Products
ORDER BY prod_price DESC;
-- With ties (include rows with same values)
SELECT TOP 5 WITH TIES prod_name, prod_price
FROM Products
ORDER BY prod_price DESC;
MySQL/PostgreSQL/SQLite - LIMIT:
-- Get first 5 products
SELECT prod_name
FROM Products
LIMIT 5;
-- Get 5 products starting from row 6 (pagination)
SELECT prod_name
FROM Products
LIMIT 5 OFFSET 5;
-- MySQL shorthand: LIMIT offset, count
SELECT prod_name
FROM Products
LIMIT 5, 5; -- Skip 5, take 5
Oracle - ROWNUM and FETCH:
-- Oracle 11g and earlier
SELECT prod_name
FROM Products
WHERE ROWNUM <= 5;
-- Oracle 12c and later (preferred)
SELECT prod_name
FROM Products
FETCH FIRST 5 ROWS ONLY;
-- With offset (Oracle 12c+)
SELECT prod_name
FROM Products
OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY;
DB2 - FETCH FIRST:
-- Get first 5 rows
SELECT prod_name
FROM Products
FETCH FIRST 5 ROWS ONLY;
Pagination Examples:
-- Page 1: First 10 products
SELECT prod_id, prod_name, prod_price
FROM Products
ORDER BY prod_name
LIMIT 10 OFFSET 0;
-- Page 2: Next 10 products
SELECT prod_id, prod_name, prod_price
FROM Products
ORDER BY prod_name
LIMIT 10 OFFSET 10;
-- Page 3: Next 10 products
SELECT prod_id, prod_name, prod_price
FROM Products
ORDER BY prod_name
LIMIT 10 OFFSET 20;
Performance Considerations:
-- ✅ Good: Use with ORDER BY for consistent results
SELECT prod_name, prod_price
FROM Products
ORDER BY prod_price DESC
LIMIT 10;
-- ❌ Problematic: Without ORDER BY, results are unpredictable
SELECT prod_name, prod_price
FROM Products
LIMIT 10;
-- ✅ Good: Efficient pagination with indexed columns
SELECT cust_id, cust_name
FROM Customers
ORDER BY cust_id -- Assuming cust_id is indexed
LIMIT 20 OFFSET 100;
Using Comments
Inline Comments with --:
SELECT prod_name, -- Product name for display
prod_price, -- Current selling price
vend_id -- Vendor identifier
FROM Products; -- Main product catalog
Line Comments with #:
# This query retrieves all product information
# for products priced above $10
SELECT prod_name, prod_price
FROM Products
WHERE prod_price > 10;
Block Comments with / /:
/*
* Product Analysis Query
* Author: Database Team
* Date: 2024-01-15
* Purpose: Retrieve high-value products for marketing analysis
*/
SELECT prod_id, prod_name, prod_price
FROM Products
WHERE prod_price > 15;
/* Temporarily disabled - alternative query
SELECT prod_id, prod_name, prod_price, vend_id
FROM Products
WHERE prod_price BETWEEN 10 AND 20;
*/
Comment Best Practices:
-- ✅ Good: Explain complex logic
SELECT
prod_name,
prod_price,
prod_price * 0.9 AS discounted_price -- 10% discount for promotion
FROM Products
WHERE prod_price > 10; -- Only products above minimum threshold
-- ✅ Good: Document business rules
/*
* Customer Status Classification:
* - Premium: Orders > $1000 total
* - Standard: Orders $100-$1000
* - Basic: Orders < $100
*/
SELECT
cust_id,
cust_name,
CASE
WHEN total_orders > 1000 THEN 'Premium'
WHEN total_orders >= 100 THEN 'Standard'
ELSE 'Basic'
END AS customer_status
FROM CustomerSummary;
-- ✅ Good: Temporary debugging
SELECT prod_id, prod_name, prod_price
FROM Products
-- WHERE prod_price > 10 -- Temporarily disabled for testing
ORDER BY prod_name;
Advanced SELECT Techniques
Column Aliases:
-- Make output more readable
SELECT
prod_id AS "Product ID",
prod_name AS "Product Name",
prod_price AS "Price ($)"
FROM Products;
-- Calculate derived values
SELECT
prod_name,
prod_price,
prod_price * 1.1 AS price_with_tax,
prod_price * 0.9 AS sale_price
FROM Products;
Expressions and Calculations:
-- Mathematical operations
SELECT
order_num,
quantity,
item_price,
quantity * item_price AS line_total
FROM OrderItems;
-- String operations
SELECT
cust_name,
cust_city + ', ' + cust_state AS location -- SQL Server
-- CONCAT(cust_city, ', ', cust_state) AS location -- MySQL/PostgreSQL
FROM Customers;
-- Date operations
SELECT
order_num,
order_date,
DATEDIFF(day, order_date, GETDATE()) AS days_since_order -- SQL Server
FROM Orders;
Conditional Logic:
-- CASE statements for conditional output
SELECT
prod_name,
prod_price,
CASE
WHEN prod_price < 5 THEN 'Budget'
WHEN prod_price < 15 THEN 'Standard'
ELSE 'Premium'
END AS price_category
FROM Products;
-- NULL handling
SELECT
cust_name,
ISNULL(cust_email, 'No email provided') AS email_status -- SQL Server
-- COALESCE(cust_email, 'No email provided') AS email_status -- Standard SQL
FROM Customers;
Performance Optimization Tips
Efficient Column Selection:
-- ✅ Good: Select only needed columns
SELECT cust_id, cust_name
FROM Customers
WHERE cust_state = 'CA';
-- ❌ Inefficient: Selecting unnecessary data
SELECT *
FROM Customers
WHERE cust_state = 'CA';
Index-Friendly Queries:
-- ✅ Good: Use indexed columns in WHERE clauses
SELECT prod_name, prod_price
FROM Products
WHERE prod_id = 'BR01'; -- Assuming prod_id is indexed
-- ✅ Good: Limit results for large tables
SELECT TOP 100 cust_name, cust_email
FROM Customers
ORDER BY cust_name;
Challenge Solutions
Challenge 1: Retrieve all customer IDs
-- Solution: Basic SELECT for single column
SELECT cust_id
FROM Customers;
Challenge 2: Unique products ordered
-- Solution: Use DISTINCT to eliminate duplicates
SELECT DISTINCT prod_id
FROM OrderItems;
-- This should return 7 unique product IDs
Challenge 3: All columns vs customer ID with comments
-- Solution: Use comments to switch between queries
/* All columns version */
SELECT *
FROM Customers;
-- Single column version
-- SELECT cust_id FROM Customers;
/* Alternative approach - comment out the other query
SELECT cust_id
FROM Customers;
*/
-- All columns version (commented out)
-- SELECT * FROM Customers;
Common SELECT Patterns
Data Exploration:
-- Quick table overview
SELECT * FROM Products LIMIT 5;
-- Check data types and sample values
SELECT
prod_id, -- Character data
prod_name, -- Variable text
prod_price, -- Numeric data
vend_id -- Foreign key reference
FROM Products
LIMIT 3;
Data Quality Checks:
-- Find missing data
SELECT COUNT(*) AS total_rows,
COUNT(cust_email) AS rows_with_email,
COUNT(*) - COUNT(cust_email) AS missing_emails
FROM Customers;
-- Check for duplicates
SELECT cust_email, COUNT(*) AS duplicate_count
FROM Customers
WHERE cust_email IS NOT NULL
GROUP BY cust_email
HAVING COUNT(*) > 1;
Reporting Queries:
-- Summary information
SELECT
COUNT(*) AS total_products,
MIN(prod_price) AS lowest_price,
MAX(prod_price) AS highest_price,
AVG(prod_price) AS average_price
FROM Products;
-- Categorized data
SELECT
vend_id,
COUNT(*) AS product_count,
AVG(prod_price) AS avg_price
FROM Products
GROUP BY vend_id
ORDER BY product_count DESC;
Data Sorting
Understanding Data Sorting
Without explicit sorting, data appears in unpredictable order. The sequence depends on how the database stores and manages data internally, which can change due to updates, deletions, and storage optimization.
Why Sorting Matters:
-- Without ORDER BY: Unpredictable order
SELECT prod_name
FROM Products;
Output (unpredictable):
prod_name
--------------------
Fish bean bag toy
Bird bean bag toy
Rabbit bean bag toy
8 inch teddy bear
12 inch teddy bear
18 inch teddy bear
Raggedy Ann
King doll
Queen doll
Key Principles:
- No guaranteed order: Never assume data will appear in any particular sequence
- Storage dependent: Order may reflect insertion sequence, but this can change
- Explicit control required: Use ORDER BY to ensure predictable results
- Relational theory: Unordered data is the default state in relational databases
Basic ORDER BY Syntax
Single Column Sorting:
-- Sort products alphabetically by name
SELECT prod_name
FROM Products
ORDER BY prod_name;
Output (predictable alphabetical order):
prod_name
--------------------
12 inch teddy bear
18 inch teddy bear
8 inch teddy bear
Bird bean bag toy
Fish bean bag toy
King doll
Queen doll
Rabbit bean bag toy
Raggedy Ann
Essential ORDER BY Rules:
- Last clause: ORDER BY must be the final clause in SELECT statements
- Column specification: Can sort by selected or non-selected columns
- Case sensitivity: Depends on database configuration
- Required for predictability: Only way to guarantee result order
ORDER BY Clause Position:
-- ✅ Correct: ORDER BY at the end
SELECT prod_name, prod_price
FROM Products
WHERE prod_price > 5
ORDER BY prod_name;
-- ❌ Error: ORDER BY not at the end
SELECT prod_name, prod_price
FROM Products
ORDER BY prod_name
WHERE prod_price > 5; -- This will cause an error
Sorting by Non-Selected Columns:
-- Sort by a column not in the SELECT list
SELECT prod_name, prod_price
FROM Products
ORDER BY vend_id; -- vend_id not selected but used for sorting
-- Practical example: Display customer names sorted by registration date
SELECT cust_name, cust_email
FROM Customers
ORDER BY cust_registration_date;
Sorting by Multiple Columns
Multi-Column Sort Logic:
-- Sort by price first, then by name within each price group
SELECT prod_id, prod_price, prod_name
FROM Products
ORDER BY prod_price, prod_name;
Output:
prod_id prod_price prod_name
------- ---------- --------------------
BNBG02 3.4900 Bird bean bag toy
BNBG01 3.4900 Fish bean bag toy
BNBG03 3.4900 Rabbit bean bag toy
RGAN01 4.9900 Raggedy Ann
BR01 5.9900 8 inch teddy bear
BR02 8.9900 12 inch teddy bear
RYL01 9.4900 King doll
RYL02 9.4900 Queen doll
BR03 11.9900 18 inch teddy bear
Sort Priority Understanding:
- Primary sort: First column (prod_price) determines main order
- Secondary sort: Second column (prod_name) sorts within groups of equal primary values
- Tertiary sort: Additional columns continue the hierarchy
Complex Multi-Column Examples:
-- Customer report: Sort by state, then city, then name
SELECT cust_name, cust_city, cust_state
FROM Customers
ORDER BY cust_state, cust_city, cust_name;
-- Sales analysis: Sort by date, then customer, then amount
SELECT order_date, cust_id, order_total
FROM Orders
ORDER BY order_date, cust_id, order_total DESC;
-- Product catalog: Sort by category, then price (high to low), then name
SELECT prod_category, prod_price, prod_name
FROM Products
ORDER BY prod_category, prod_price DESC, prod_name;
When Secondary Sorting Matters:
-- Example showing why multiple columns matter
SELECT cust_state, cust_name
FROM Customers
ORDER BY cust_state, cust_name;
-- Without secondary sort, customers in same state appear randomly
-- With secondary sort, customers in same state are alphabetical
Sorting by Column Position
Using Numeric Positions:
-- Sort by column positions instead of names
SELECT prod_id, prod_price, prod_name
FROM Products
ORDER BY 2, 3; -- Sort by 2nd column (prod_price), then 3rd (prod_name)
Position Reference:
- 1: First column in SELECT list (prod_id)
- 2: Second column in SELECT list (prod_price)
- 3: Third column in SELECT list (prod_name)
Advantages and Disadvantages:
✅ Advantages | ❌ Disadvantages |
---|---|
Shorter syntax | Error prone |
Less typing | Hard to maintain |
Quick for simple queries | Unclear intent |
Best Practices for Column Positions:
-- ✅ Good: Use for simple, stable queries
SELECT prod_name, prod_price
FROM Products
ORDER BY 2 DESC, 1; -- Price descending, name ascending
-- ❌ Avoid: Complex queries with many columns
SELECT prod_id, prod_name, prod_price, vend_id, prod_desc, prod_category
FROM Products
ORDER BY 3, 6, 2; -- Hard to understand what's being sorted
-- ✅ Better: Use explicit column names for clarity
SELECT prod_id, prod_name, prod_price, vend_id, prod_desc, prod_category
FROM Products
ORDER BY prod_price, prod_category, prod_name;
Mixed Approach:
-- Combine column names and positions
SELECT prod_name, prod_price, vend_id
FROM Products
ORDER BY 2 DESC, vend_id, prod_name; -- Mix positions and names
Specifying Sort Direction
ASC vs DESC Keywords:
-- Ascending order (default)
SELECT prod_name, prod_price
FROM Products
ORDER BY prod_price ASC; -- ASC is optional (default)
-- Descending order
SELECT prod_name, prod_price
FROM Products
ORDER BY prod_price DESC;
Descending Sort Example:
-- Most expensive products first
SELECT prod_id, prod_price, prod_name
FROM Products
ORDER BY prod_price DESC;
Output:
prod_id prod_price prod_name
------- ---------- --------------------
BR03 11.9900 18 inch teddy bear
RYL01 9.4900 King doll
RYL02 9.4900 Queen doll
BR02 8.9900 12 inch teddy bear
BR01 5.9900 8 inch teddy bear
RGAN01 4.9900 Raggedy Ann
BNBG01 3.4900 Fish bean bag toy
BNBG02 3.4900 Bird bean bag toy
BNBG03 3.4900 Rabbit bean bag toy
Multiple Columns with Different Directions:
-- Price descending, name ascending within each price
SELECT prod_id, prod_price, prod_name
FROM Products
ORDER BY prod_price DESC, prod_name ASC;
Output:
prod_id prod_price prod_name
------- ---------- --------------------
BR03 11.9900 18 inch teddy bear
RYL01 9.4900 King doll
RYL02 9.4900 Queen doll
BR02 8.9900 12 inch teddy bear
BR01 5.9900 8 inch teddy bear
RGAN01 4.9900 Raggedy Ann
BNBG02 3.4900 Bird bean bag toy
BNBG01 3.4900 Fish bean bag toy
BNBG03 3.4900 Rabbit bean bag toy
Important Direction Rules:
-- ❌ Wrong: DESC applies only to preceding column
SELECT prod_name, prod_price, vend_id
FROM Products
ORDER BY prod_price, prod_name DESC; -- Only prod_name is descending
-- ✅ Correct: Specify DESC for each column needing it
SELECT prod_name, prod_price, vend_id
FROM Products
ORDER BY prod_price DESC, prod_name DESC; -- Both columns descending
Practical Sorting Examples
Business Reporting:
-- Sales report: Top customers by total orders
SELECT cust_name, COUNT(*) AS order_count, SUM(order_total) AS total_spent
FROM Customers c
JOIN Orders o ON c.cust_id = o.cust_id
GROUP BY cust_name
ORDER BY total_spent DESC, order_count DESC;
-- Product performance: Best sellers first
SELECT p.prod_name, SUM(oi.quantity) AS total_sold, AVG(oi.item_price) AS avg_price
FROM Products p
JOIN OrderItems oi ON p.prod_id = oi.prod_id
GROUP BY p.prod_name
ORDER BY total_sold DESC, avg_price DESC;
Data Analysis:
-- Customer segmentation: Sort by state, then by order value
SELECT cust_state, cust_name,
COALESCE(SUM(order_total), 0) AS lifetime_value
FROM Customers c
LEFT JOIN Orders o ON c.cust_id = o.cust_id
GROUP BY cust_state, cust_name
ORDER BY cust_state, lifetime_value DESC;
-- Inventory management: Low stock items first
SELECT prod_name, quantity_in_stock, reorder_level
FROM Products
ORDER BY quantity_in_stock ASC, reorder_level DESC;
Date and Time Sorting:
-- Recent orders first
SELECT order_num, order_date, cust_id, order_total
FROM Orders
ORDER BY order_date DESC, order_total DESC;
-- Chronological order with time
SELECT order_num, order_datetime, cust_id
FROM Orders
ORDER BY order_datetime ASC;
-- Monthly sales summary
SELECT YEAR(order_date) AS order_year,
MONTH(order_date) AS order_month,
SUM(order_total) AS monthly_total
FROM Orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY order_year DESC, order_month DESC;
Advanced Sorting Techniques
Conditional Sorting:
-- Sort by custom business logic
SELECT prod_name, prod_price, prod_category
FROM Products
ORDER BY
CASE prod_category
WHEN 'Electronics' THEN 1
WHEN 'Clothing' THEN 2
WHEN 'Books' THEN 3
ELSE 4
END,
prod_price DESC;
NULL Value Handling:
-- Control NULL sorting behavior
SELECT cust_name, cust_email
FROM Customers
ORDER BY
CASE WHEN cust_email IS NULL THEN 1 ELSE 0 END, -- NULLs last
cust_email,
cust_name;
-- Database-specific NULL handling
-- PostgreSQL: NULLS FIRST or NULLS LAST
SELECT cust_name, cust_email
FROM Customers
ORDER BY cust_email NULLS LAST, cust_name;
String Sorting Considerations:
-- Case-insensitive sorting (database dependent)
SELECT cust_name
FROM Customers
ORDER BY UPPER(cust_name); -- Convert to uppercase for sorting
-- Numeric sorting of string data
SELECT prod_code
FROM Products
ORDER BY CAST(SUBSTRING(prod_code, 5, 10) AS INTEGER); -- Sort by numeric part
Performance Considerations
Index-Friendly Sorting:
-- ✅ Good: Sort by indexed columns
SELECT cust_name, cust_email
FROM Customers
ORDER BY cust_id; -- Assuming cust_id is indexed (primary key)
-- ✅ Good: Composite index usage
CREATE INDEX idx_orders_date_customer ON Orders(order_date, cust_id);
SELECT order_num, order_date, cust_id
FROM Orders
ORDER BY order_date, cust_id; -- Uses the composite index
Avoiding Performance Issues:
-- ❌ Slow: Sorting by calculated columns
SELECT prod_name, prod_price * 1.1 AS price_with_tax
FROM Products
ORDER BY prod_price * 1.1; -- Calculation for every row
-- ✅ Better: Use column alias
SELECT prod_name, prod_price * 1.1 AS price_with_tax
FROM Products
ORDER BY price_with_tax; -- Some databases optimize this
-- ❌ Slow: Sorting large result sets without LIMIT
SELECT *
FROM LargeTable
ORDER BY some_column; -- Sorts millions of rows
-- ✅ Better: Limit results when possible
SELECT *
FROM LargeTable
ORDER BY some_column
LIMIT 100; -- Only sort what you need
Challenge Solutions
Challenge 1: Customer names Z to A
-- Solution: Use DESC for descending alphabetical order
SELECT cust_name
FROM Customers
ORDER BY cust_name DESC;
Challenge 2: Customer ID and order number sorted
-- Solution: Sort by customer ID first, then order date descending
SELECT cust_id, order_num
FROM Orders
ORDER BY cust_id, order_date DESC;
Challenge 3: Quantity and price, highest first
-- Solution: Both columns need DESC for highest first
SELECT quantity, item_price
FROM OrderItems
ORDER BY quantity DESC, item_price DESC;
Challenge 4: What's wrong with this SQL?
-- ❌ Problem: Missing BY keyword and trailing comma
SELECT vend_name,
FROM Vendors
ORDER vend_name DESC;
-- ✅ Corrected version:
SELECT vend_name
FROM Vendors
ORDER BY vend_name DESC;
Errors in the original:
- Trailing comma after vend_name in SELECT
- Missing BY keyword in ORDER clause
Common ORDER BY Mistakes
Syntax Errors:
-- ❌ Wrong: ORDER BY not at end
SELECT prod_name
FROM Products
ORDER BY prod_name
WHERE prod_price > 10;
-- ❌ Wrong: Missing BY keyword
SELECT prod_name
FROM Products
ORDER prod_name;
-- ❌ Wrong: Invalid column reference
SELECT prod_name
FROM Products
ORDER BY prod_description; -- Column not in table
Logic Errors:
-- ❌ Confusing: Inconsistent direction expectations
SELECT prod_name, prod_price
FROM Products
ORDER BY prod_price DESC, prod_name DESC; -- Both descending might not be intended
-- ❌ Performance issue: Sorting without purpose
SELECT *
FROM LargeTable
ORDER BY random_column; -- Unnecessary sorting of large dataset
Best Practices Summary
1. Always Use ORDER BY for Predictable Results:
-- ✅ Good: Explicit ordering
SELECT cust_name, cust_email
FROM Customers
ORDER BY cust_name;
-- ❌ Risky: Relying on default order
SELECT cust_name, cust_email
FROM Customers; -- Order unpredictable
2. Choose Meaningful Sort Orders:
-- ✅ Good: Business-relevant sorting
SELECT prod_name, prod_price
FROM Products
ORDER BY prod_price DESC, prod_name; -- Expensive first, then alphabetical
-- ✅ Good: User-friendly sorting
SELECT cust_name, cust_city, cust_state
FROM Customers
ORDER BY cust_state, cust_city, cust_name; -- Geographic then alphabetical
3. Consider Performance:
-- ✅ Good: Use indexed columns when possible
SELECT order_num, order_date
FROM Orders
ORDER BY order_date DESC; -- If order_date is indexed
-- ✅ Good: Limit large result sets
SELECT TOP 50 prod_name, prod_price
FROM Products
ORDER BY prod_price DESC;
4. Be Explicit About Direction:
-- ✅ Good: Clear intent
SELECT prod_name, prod_price
FROM Products
ORDER BY prod_price DESC, prod_name ASC;
-- ✅ Acceptable: Default ASC is clear
SELECT prod_name, prod_price
FROM Products
ORDER BY prod_price DESC, prod_name;
Data Filtering
Understanding Data Filtering
The WHERE clause specifies search criteria to filter data and retrieve only the rows that match your conditions. Without filtering, you retrieve all rows from a table, which is rarely what you need.
Why Filter Data:
Benefit | Description | Impact |
---|---|---|
Performance | Retrieve only needed data | Faster queries, less network traffic |
Relevance | Get specific information | More useful results |
Efficiency | Reduce processing overhead | Better resource utilization |
Accuracy | Focus on relevant records | More precise analysis |
Database vs Application Filtering:
-- ✅ Good: Database-level filtering (efficient)
SELECT prod_name, prod_price
FROM Products
WHERE prod_price > 10;
-- ❌ Bad: Application-level filtering (inefficient)
-- Retrieve all data, then filter in application code
SELECT prod_name, prod_price
FROM Products;
-- Then filter in application: if (prod_price > 10) { ... }
Basic WHERE Clause Syntax
Standard WHERE Structure:
SELECT column_list
FROM table_name
WHERE condition;
Simple Equality Example:
-- Find products with specific price
SELECT prod_name, prod_price
FROM Products
WHERE prod_price = 3.49;
Output:
prod_name prod_price
------------------- ----------
Fish bean bag toy 3.49
Bird bean bag toy 3.49
Rabbit bean bag toy 3.49
Important WHERE Rules:
- Position: WHERE comes after FROM, before ORDER BY
- Required for filtering: Only way to limit rows returned
- Case sensitivity: Depends on database configuration
- Data type matching: Use appropriate comparison methods
Clause Order in SELECT Statements:
-- ✅ Correct order
SELECT prod_name, prod_price
FROM Products
WHERE prod_price > 5
ORDER BY prod_price;
-- ❌ Wrong order - will cause error
SELECT prod_name, prod_price
FROM Products
ORDER BY prod_price
WHERE prod_price > 5;
WHERE Clause Operators
Comparison Operators:
Operator | Description | Example |
---|---|---|
= | Equality | WHERE price = 10 |
<> | Not equal | WHERE price {'<>'} 10 |
!= | Not equal (alternative) | WHERE price != 10 |
< | Less than | WHERE price {'<'} 10 |
<= | Less than or equal | WHERE price {'<='} 10 |
> | Greater than | WHERE price {'>'} 10 |
>= | Greater than or equal | WHERE price {'>='} 10 |
!< | Not less than | WHERE price !{'<'} 10 |
!> | Not greater than | WHERE price !{'>'} 10 |
Special Operators:
Operator | Description | Example |
---|---|---|
BETWEEN | Range check | WHERE price BETWEEN 5 AND 10 |
IS NULL | NULL value check | WHERE email IS NULL |
IS NOT NULL | Not NULL check | WHERE email IS NOT NULL |
Checking Against Single Values
Equality Testing:
-- Find specific product
SELECT prod_id, prod_name, prod_price
FROM Products
WHERE prod_id = 'BR01';
-- Find customers in specific state
SELECT cust_name, cust_city
FROM Customers
WHERE cust_state = 'CA';
-- Find orders on specific date
SELECT order_num, cust_id, order_total
FROM Orders
WHERE order_date = '2020-01-01';
Less Than Comparisons:
-- Products under $10
SELECT prod_name, prod_price
FROM Products
WHERE prod_price < 10;
Output:
prod_name prod_price
------------------- ----------
Fish bean bag toy 3.49
Bird bean bag toy 3.49
Rabbit bean bag toy 3.49
8 inch teddy bear 5.99
12 inch teddy bear 8.99
Raggedy Ann 4.99
King doll 9.49
Queen doll 9.49
Less Than or Equal:
-- Products $10 or less
SELECT prod_name, prod_price
FROM Products
WHERE prod_price <= 10;
-- Orders from last 30 days
SELECT order_num, order_date, cust_id
FROM Orders
WHERE order_date >= DATEADD(day, -30, GETDATE()); -- SQL Server
-- WHERE order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY); -- MySQL
Greater Than Comparisons:
-- Expensive products
SELECT prod_name, prod_price
FROM Products
WHERE prod_price > 15;
-- Large orders
SELECT order_num, order_total, cust_id
FROM Orders
WHERE order_total > 1000;
-- Recent customers
SELECT cust_name, cust_registration_date
FROM Customers
WHERE cust_registration_date > '2023-01-01';
Checking for Nonmatches
Not Equal Operators:
-- Products not from specific vendor
SELECT vend_id, prod_name
FROM Products
WHERE vend_id <> 'DLL01';
Output:
vend_id prod_name
---------- ------------------
BRS01 8 inch teddy bear
BRS01 12 inch teddy bear
BRS01 18 inch teddy bear
FNG01 King doll
FNG01 Queen doll
Alternative Not Equal Syntax:
-- Same result using != operator
SELECT vend_id, prod_name
FROM Products
WHERE vend_id != 'DLL01';
-- Customers not in specific state
SELECT cust_name, cust_state
FROM Customers
WHERE cust_state != 'CA';
-- Orders not from specific customer
SELECT order_num, order_date, cust_id
FROM Orders
WHERE cust_id <> 'CUST001';
String vs Numeric Comparisons:
-- ✅ Correct: String values in quotes
SELECT cust_name
FROM Customers
WHERE cust_state = 'NY';
-- ✅ Correct: Numeric values without quotes
SELECT prod_name, prod_price
FROM Products
WHERE prod_price = 9.49;
-- ❌ Wrong: Numeric value in quotes (may work but inefficient)
SELECT prod_name, prod_price
FROM Products
WHERE prod_price = '9.49';
-- ❌ Wrong: String value without quotes (will cause error)
SELECT cust_name
FROM Customers
WHERE cust_state = NY;
Checking for Range of Values
BETWEEN Operator:
-- Products in price range
SELECT prod_name, prod_price
FROM Products
WHERE prod_price BETWEEN 5 AND 10;
Output:
prod_name prod_price
------------------- ----------
8 inch teddy bear 5.99
12 inch teddy bear 8.99
King doll 9.49
Queen doll 9.49
BETWEEN Characteristics:
- Inclusive: Includes both boundary values
- Requires AND: Must use AND between values
- Order matters: Lower value first, then higher value
Date Range Examples:
-- Orders in date range
SELECT order_num, order_date, order_total
FROM Orders
WHERE order_date BETWEEN '2020-01-01' AND '2020-12-31';
-- Recent orders (last 7 days)
SELECT order_num, order_date, cust_id
FROM Orders
WHERE order_date BETWEEN DATEADD(day, -7, GETDATE()) AND GETDATE(); -- SQL Server
-- Quarterly sales
SELECT order_num, order_date, order_total
FROM Orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';
Numeric Range Examples:
-- Medium-priced products
SELECT prod_name, prod_price, prod_category
FROM Products
WHERE prod_price BETWEEN 10 AND 50;
-- Order quantities in range
SELECT order_num, prod_id, quantity
FROM OrderItems
WHERE quantity BETWEEN 5 AND 20;
-- Customer age range (if age column exists)
SELECT cust_name, cust_age
FROM Customers
WHERE cust_age BETWEEN 25 AND 65;
Alternative to BETWEEN:
-- BETWEEN equivalent using AND
SELECT prod_name, prod_price
FROM Products
WHERE prod_price >= 5 AND prod_price <= 10;
-- Sometimes more readable for complex conditions
SELECT order_num, order_total
FROM Orders
WHERE order_total >= 100 AND order_total < 1000;
Checking for NULL Values
Understanding NULL:
- NULL means "no value" - different from 0, empty string, or spaces
- Cannot use = or != with NULL - special operators required
- NULL comparisons always return unknown - not true or false
IS NULL Operator:
-- Find products with no price
SELECT prod_name
FROM Products
WHERE prod_price IS NULL;
-- Find customers with no email
SELECT cust_name
FROM Customers
WHERE cust_email IS NULL;
Output:
cust_name
----------
Kids Place
The Toy Store
IS NOT NULL Operator:
-- Find customers with email addresses
SELECT cust_name, cust_email
FROM Customers
WHERE cust_email IS NOT NULL;
-- Find products with descriptions
SELECT prod_name, prod_desc
FROM Products
WHERE prod_desc IS NOT NULL;
-- Complete customer records only
SELECT cust_name, cust_phone, cust_email
FROM Customers
WHERE cust_phone IS NOT NULL
AND cust_email IS NOT NULL;
NULL Behavior in Filtering:
-- ❌ Wrong: This won't find NULL values
SELECT cust_name
FROM Customers
WHERE cust_email = NULL; -- Returns no results
-- ❌ Wrong: This also won't find NULL values
SELECT cust_name
FROM Customers
WHERE cust_email != 'something'; -- NULLs not included
-- ✅ Correct: Explicitly check for NULL
SELECT cust_name
FROM Customers
WHERE cust_email IS NULL;
-- ✅ Correct: Include NULLs in non-match queries
SELECT cust_name
FROM Customers
WHERE cust_email != 'gmail.com' OR cust_email IS NULL;
Advanced Filtering Examples
Business Logic Filtering:
-- Active customers with recent orders
SELECT c.cust_name, c.cust_email
FROM Customers c
WHERE c.cust_status = 'ACTIVE'
AND c.last_order_date >= DATEADD(month, -6, GETDATE());
-- High-value products in stock
SELECT prod_name, prod_price, quantity_in_stock
FROM Products
WHERE prod_price > 50
AND quantity_in_stock > 0;
-- Overdue orders
SELECT order_num, order_date, ship_date
FROM Orders
WHERE ship_date IS NULL
AND order_date < DATEADD(day, -7, GETDATE());
Data Quality Checks:
-- Find incomplete customer records
SELECT cust_id, cust_name
FROM Customers
WHERE cust_email IS NULL
OR cust_phone IS NULL
OR cust_address IS NULL;
-- Find products with pricing issues
SELECT prod_id, prod_name, prod_price
FROM Products
WHERE prod_price IS NULL
OR prod_price <= 0;
-- Find orders without items
SELECT o.order_num, o.order_date
FROM Orders o
WHERE NOT EXISTS (
SELECT 1 FROM OrderItems oi
WHERE oi.order_num = o.order_num
);
Performance Optimization
Index-Friendly Filtering:
-- ✅ Good: Filter on indexed columns
SELECT cust_name, cust_email
FROM Customers
WHERE cust_id = 'CUST001'; -- Primary key lookup
-- ✅ Good: Range queries on indexed dates
SELECT order_num, order_total
FROM Orders
WHERE order_date >= '2024-01-01'; -- If order_date is indexed
-- ❌ Slow: Functions in WHERE clause
SELECT cust_name
FROM Customers
WHERE UPPER(cust_name) = 'JOHN DOE'; -- Can't use index
-- ✅ Better: Store data in searchable format
SELECT cust_name
FROM Customers
WHERE cust_name = 'John Doe'; -- Can use index if data is consistent
Efficient Filtering Patterns:
-- ✅ Good: Selective filters first
SELECT prod_name, prod_price
FROM Products
WHERE prod_category = 'Electronics' -- Selective filter
AND prod_price > 100; -- Further refinement
-- ✅ Good: Use LIMIT with filtering
SELECT TOP 10 cust_name, order_total
FROM Customers c
JOIN Orders o ON c.cust_id = o.cust_id
WHERE order_total > 1000
ORDER BY order_total DESC;
Challenge Solutions
Challenge 1: Products with price 9.49
-- Solution: Simple equality filter
SELECT prod_id, prod_name
FROM Products
WHERE prod_price = 9.49;
Challenge 2: Products with price 9 or more
-- Solution: Greater than or equal filter
SELECT prod_id, prod_name
FROM Products
WHERE prod_price >= 9;
Challenge 3: Order numbers with 100+ items
-- Solution: Combine filtering with DISTINCT
SELECT DISTINCT order_num
FROM OrderItems
WHERE quantity >= 100;
Challenge 4: Products priced between 3 and 6, sorted
-- Solution: BETWEEN with ORDER BY
SELECT prod_name, prod_price
FROM Products
WHERE prod_price BETWEEN 3 AND 6
ORDER BY prod_price;
-- Alternative solution using comparison operators
SELECT prod_name, prod_price
FROM Products
WHERE prod_price >= 3 AND prod_price <= 6
ORDER BY prod_price;
Common WHERE Clause Mistakes
Syntax Errors:
-- ❌ Wrong: Missing quotes for string
SELECT cust_name
FROM Customers
WHERE cust_state = CA;
-- ❌ Wrong: Using = with NULL
SELECT cust_name
FROM Customers
WHERE cust_email = NULL;
-- ❌ Wrong: Incorrect BETWEEN syntax
SELECT prod_name
FROM Products
WHERE prod_price BETWEEN 5, 10; -- Missing AND
Logic Errors:
-- ❌ Wrong: Not considering NULL values
SELECT cust_name
FROM Customers
WHERE cust_email != 'gmail.com'; -- Excludes NULLs
-- ✅ Correct: Include NULLs if intended
SELECT cust_name
FROM Customers
WHERE cust_email != 'gmail.com' OR cust_email IS NULL;
Best Practices Summary
1. Use Database-Level Filtering:
-- ✅ Good: Filter at database level
SELECT cust_name, cust_email
FROM Customers
WHERE cust_state = 'CA';
-- ❌ Avoid: Application-level filtering
-- SELECT cust_name, cust_email FROM Customers;
-- Then filter in application code
2. Choose Appropriate Operators:
-- ✅ Good: Use BETWEEN for ranges
SELECT prod_name, prod_price
FROM Products
WHERE prod_price BETWEEN 10 AND 50;
-- ✅ Good: Use IS NULL for NULL checks
SELECT cust_name
FROM Customers
WHERE cust_email IS NULL;
3. Consider Performance:
-- ✅ Good: Filter on indexed columns
SELECT order_num, order_date
FROM Orders
WHERE cust_id = 'CUST001'; -- If cust_id is indexed
-- ✅ Good: Use selective filters
SELECT prod_name
FROM Products
WHERE prod_category = 'Electronics' -- Selective
AND prod_price > 1000; -- Further refinement
Advanced Data Filtering
Understanding Advanced Filtering
Advanced filtering allows you to combine multiple conditions to create sophisticated search criteria that go beyond simple single-condition filters. This enables precise data retrieval for complex business requirements.
Why Use Advanced Filtering:
Benefit | Description | Example Use Case |
---|---|---|
Precision | Multiple criteria for exact matches | Find customers in specific region with recent orders |
Flexibility | Alternative conditions with OR | Products from multiple vendors or categories |
Efficiency | Single query vs multiple queries | Better performance than separate lookups |
Business Logic | Complex rules in database | Pricing tiers, eligibility criteria, status combinations |
Logical Operators Overview:
| Operator | Purpose | Syntax | Example
|-----|-----|-----
| AND | All conditions must be true | condition1 AND condition2
| price > 10 AND category = 'Electronics'
| OR | Any condition can be true | condition1 OR condition2
| state = 'CA' OR state = 'NY'
| IN | Match any value in list | column IN (value1, value2)
| status IN ('Active', 'Pending')
| NOT | Negate a condition | NOT condition
| NOT category = 'Discontinued'
Combining WHERE Clauses with AND
The AND operator requires all conditions to be true for a row to be included in the results. This creates more restrictive filters.
Basic AND Syntax:
SELECT column_list
FROM table_name
WHERE condition1 AND condition2 AND condition3;
Simple AND Example:
-- Products from specific vendor under $4
SELECT prod_id, prod_price, prod_name
FROM Products
WHERE vend_id = 'DLL01' AND prod_price <= 4;
Output:
prod_id prod_price prod_name
------- ---------- --------------------
BNBG02 3.4900 Bird bean bag toy
BNBG01 3.4900 Fish bean bag toy
BNBG03 3.4900 Rabbit bean bag toy
Multiple AND Conditions:
-- Customers in specific state with recent orders
SELECT cust_name, cust_city, cust_state
FROM Customers
WHERE cust_state = 'CA'
AND cust_country = 'USA'
AND last_order_date >= '2024-01-01';
-- High-value orders from premium customers
SELECT order_num, order_date, order_total, cust_id
FROM Orders
WHERE order_total > 1000
AND cust_status = 'Premium'
AND order_date >= DATEADD(month, -3, GETDATE());
-- In-stock products in specific price range
SELECT prod_name, prod_price, quantity_in_stock
FROM Products
WHERE prod_price BETWEEN 10 AND 50
AND quantity_in_stock > 0
AND prod_category = 'Electronics';
Date and Time AND Filtering:
-- Orders from specific time period and customer type
SELECT order_num, order_date, cust_id, order_total
FROM Orders
WHERE order_date >= '2024-01-01'
AND order_date < '2024-04-01'
AND cust_type = 'Business'
AND order_total > 500;
-- Recent active customers
SELECT cust_name, cust_email, registration_date
FROM Customers
WHERE registration_date >= DATEADD(year, -1, GETDATE())
AND cust_status = 'Active'
AND email_verified = 1;
Combining WHERE Clauses with OR
The OR operator includes rows where any condition is true. This creates more inclusive filters and provides alternative matching criteria.
Basic OR Syntax:
SELECT column_list
FROM table_name
WHERE condition1 OR condition2 OR condition3;
Simple OR Example:
-- Products from either vendor
SELECT prod_id, prod_price, prod_name
FROM Products
WHERE vend_id = 'DLL01' OR vend_id = 'BRS01';
Output:
prod_name prod_price
------------------- ----------
Fish bean bag toy 3.4900
Bird bean bag toy 3.4900
Rabbit bean bag toy 3.4900
8 inch teddy bear 5.9900
12 inch teddy bear 8.9900
18 inch teddy bear 11.9900
Raggedy Ann 4.9900
Multiple OR Conditions:
-- Customers from multiple states
SELECT cust_name, cust_city, cust_state
FROM Customers
WHERE cust_state = 'CA'
OR cust_state = 'NY'
OR cust_state = 'TX';
-- Orders with various status conditions
SELECT order_num, order_status, order_date
FROM Orders
WHERE order_status = 'Pending'
OR order_status = 'Processing'
OR order_status = 'Shipped';
-- Products in multiple categories
SELECT prod_name, prod_category, prod_price
FROM Products
WHERE prod_category = 'Electronics'
OR prod_category = 'Computers'
OR prod_category = 'Software';
OR with Different Columns:
-- Flexible customer search
SELECT cust_name, cust_email, cust_phone
FROM Customers
WHERE cust_city = 'Los Angeles'
OR cust_email LIKE '%@gmail.com'
OR order_total > 5000;
-- Product availability options
SELECT prod_name, prod_price, quantity_in_stock
FROM Products
WHERE quantity_in_stock > 100
OR prod_price < 10
OR prod_category = 'Clearance';
Understanding Order of Evaluation
SQL processes AND operators before OR operators (operator precedence), which can lead to unexpected results if not properly managed with parentheses.
Problematic Example Without Parentheses:
-- ❌ Problem: Incorrect grouping due to operator precedence
SELECT prod_name, prod_price
FROM Products
WHERE vend_id = 'DLL01' OR vend_id = 'BRS01'
AND prod_price >= 10;
What SQL Actually Processes:
-- SQL interprets this as:
WHERE vend_id = 'DLL01' OR (vend_id = 'BRS01' AND prod_price >= 10)
-- This returns:
-- 1. ALL products from DLL01 (regardless of price)
-- 2. Products from BRS01 that cost $10 or more
Output (Incorrect):
prod_name prod_price
------------------- ----------
Fish bean bag toy 3.4900 -- DLL01, under $10 (unexpected)
Bird bean bag toy 3.4900 -- DLL01, under $10 (unexpected)
Rabbit bean bag toy 3.4900 -- DLL01, under $10 (unexpected)
18 inch teddy bear 11.9900 -- BRS01, over $10 (expected)
Raggedy Ann 4.9900 -- DLL01, under $10 (unexpected)
Correct Solution with Parentheses:
-- ✅ Correct: Explicit grouping with parentheses
SELECT prod_name, prod_price
FROM Products
WHERE (vend_id = 'DLL01' OR vend_id = 'BRS01')
AND prod_price >= 10;
What SQL Processes:
-- SQL interprets this as:
WHERE (vend_id = 'DLL01' OR vend_id = 'BRS01') AND prod_price >= 10
-- This returns:
-- Products from EITHER vendor that cost $10 or more
Output (Correct):
prod_name prod_price
------------------- ----------
18 inch teddy bear 11.9900
Complex Grouping Examples:
-- Multiple condition groups
SELECT cust_name, cust_state, order_total
FROM Customers c
JOIN Orders o ON c.cust_id = o.cust_id
WHERE (cust_state = 'CA' OR cust_state = 'NY')
AND (order_total > 1000 OR cust_type = 'Premium')
AND order_date >= '2024-01-01';
-- Business logic with proper grouping
SELECT prod_name, prod_price, prod_category
FROM Products
WHERE (prod_category = 'Electronics' OR prod_category = 'Computers')
AND (prod_price < 100 OR quantity_in_stock > 50)
AND prod_status = 'Active';
-- Date range with multiple criteria
SELECT order_num, order_date, cust_id, order_total
FROM Orders
WHERE (order_date >= '2024-01-01' AND order_date < '2024-04-01')
AND (order_total > 500 OR cust_type = 'VIP')
AND (order_status = 'Completed' OR order_status = 'Shipped');
Best Practices for Operator Precedence:
-- ✅ Always use parentheses for clarity
SELECT *
FROM Products
WHERE (category = 'A' OR category = 'B')
AND (price > 10 AND price < 100)
AND status = 'Active';
-- ✅ Group related conditions
SELECT *
FROM Customers
WHERE (state = 'CA' OR state = 'NY' OR state = 'TX')
AND (customer_type = 'Premium' OR total_orders > 10)
AND registration_date >= '2023-01-01';
-- ❌ Avoid relying on default precedence
SELECT *
FROM Products
WHERE category = 'A' OR category = 'B' AND price > 10; -- Unclear intent
Using the IN Operator
The IN operator provides a clean way to test against multiple values, functioning as a more readable alternative to multiple OR conditions.
Basic IN Syntax:
SELECT column_list
FROM table_name
WHERE column_name IN (value1, value2, value3, ...);
Simple IN Example:
-- Products from specific vendors
SELECT prod_name, prod_price
FROM Products
WHERE vend_id IN ('DLL01', 'BRS01')
ORDER BY prod_name;
Output:
prod_name prod_price
------------------- ----------
12 inch teddy bear 8.9900
18 inch teddy bear 11.9900
8 inch teddy bear 5.9900
Bird bean bag toy 3.4900
Fish bean bag toy 3.4900
Rabbit bean bag toy 3.4900
Raggedy Ann 4.9900
IN vs OR Comparison:
-- Using IN (cleaner, more readable)
SELECT prod_name, prod_price
FROM Products
WHERE vend_id IN ('DLL01', 'BRS01', 'FNG01', 'ABC01')
ORDER BY prod_name;
-- Equivalent using OR (more verbose)
SELECT prod_name, prod_price
FROM Products
WHERE vend_id = 'DLL01'
OR vend_id = 'BRS01'
OR vend_id = 'FNG01'
OR vend_id = 'ABC01'
ORDER BY prod_name;
Advanced IN Examples:
-- Multiple status values
SELECT order_num, order_status, order_date
FROM Orders
WHERE order_status IN ('Pending', 'Processing', 'Shipped', 'Delivered');
-- Specific customer types
SELECT cust_name, cust_type, cust_city
FROM Customers
WHERE cust_type IN ('Premium', 'VIP', 'Corporate')
AND cust_state IN ('CA', 'NY', 'TX');
-- Product categories
SELECT prod_name, prod_category, prod_price
FROM Products
WHERE prod_category IN ('Electronics', 'Computers', 'Software', 'Hardware')
AND prod_price BETWEEN 50 AND 500;
-- Numeric values
SELECT emp_name, emp_department, emp_salary
FROM Employees
WHERE emp_department_id IN (10, 20, 30, 40)
AND emp_salary > 50000;
IN with Subqueries (Preview):
-- Products from vendors in specific states
SELECT prod_name, prod_price
FROM Products
WHERE vend_id IN (
SELECT vend_id
FROM Vendors
WHERE vend_state IN ('CA', 'NY')
);
-- Orders from customers with high total purchases
SELECT order_num, order_date, order_total
FROM Orders
WHERE cust_id IN (
SELECT cust_id
FROM Customers
WHERE total_purchases > 10000
);
Advantages of IN Operator:
Advantage | Description | Example |
---|---|---|
Readability | Cleaner syntax than multiple ORs | status IN ('A', 'B', 'C') vs status = 'A' OR status = 'B' OR status = 'C' |
Performance | Often faster than OR chains | Better optimization by database engine |
Maintainability | Easier to modify value lists | Add/remove values in single location |
Subquery Support | Can use with SELECT statements | Dynamic value lists from other tables |
Evaluation Order | Clearer precedence with other operators | Less ambiguity in complex WHERE clauses |
Using the NOT Operator
The NOT operator negates conditions, allowing you to find rows that don't match specified criteria. It's particularly powerful when combined with other operators.
Basic NOT Syntax:
SELECT column_list
FROM table_name
WHERE NOT condition;
Simple NOT Example:
-- Products NOT from specific vendor
SELECT prod_name
FROM Products
WHERE NOT vend_id = 'DLL01'
ORDER BY prod_name;
Output:
prod_name
------------------
12 inch teddy bear
18 inch teddy bear
8 inch teddy bear
King doll
Queen doll
NOT with Different Operators:
-- NOT with equality (same as <> or !=)
SELECT prod_name
FROM Products
WHERE NOT vend_id = 'DLL01';
-- Equivalent using <> operator
SELECT prod_name
FROM Products
WHERE vend_id <> 'DLL01';
-- NOT with IN operator
SELECT cust_name, cust_state
FROM Customers
WHERE cust_state NOT IN ('CA', 'NY', 'TX');
-- NOT with BETWEEN
SELECT prod_name, prod_price
FROM Products
WHERE prod_price NOT BETWEEN 10 AND 50;
-- NOT with LIKE (pattern matching)
SELECT cust_name, cust_email
FROM Customers
WHERE cust_email NOT LIKE '%@gmail.com';
Advanced NOT Examples:
-- Customers without recent orders
SELECT c.cust_name, c.cust_email
FROM Customers c
WHERE c.cust_id NOT IN (
SELECT DISTINCT cust_id
FROM Orders
WHERE order_date >= DATEADD(month, -6, GETDATE())
);
-- Products not in specific categories
SELECT prod_name, prod_category, prod_price
FROM Products
WHERE prod_category NOT IN ('Discontinued', 'Clearance', 'Seasonal')
AND quantity_in_stock > 0;
-- Orders not from specific regions
SELECT order_num, order_date, cust_id
FROM Orders o
JOIN Customers c ON o.cust_id = c.cust_id
WHERE c.cust_state NOT IN ('CA', 'NY', 'FL')
AND o.order_total > 100;
NOT with Complex Conditions:
-- NOT with grouped conditions
SELECT prod_name, prod_price, vend_id
FROM Products
WHERE NOT (vend_id = 'DLL01' AND prod_price > 10);
-- This is equivalent to:
SELECT prod_name, prod_price, vend_id
FROM Products
WHERE vend_id != 'DLL01' OR prod_price <= 10;
-- NOT with multiple conditions
SELECT cust_name, cust_state, cust_type
FROM Customers
WHERE NOT (cust_state = 'CA' AND cust_type = 'Individual')
AND cust_status = 'Active';
NOT vs Alternative Operators:
-- NOT with equality
SELECT * FROM Products WHERE NOT prod_category = 'Electronics';
-- Same as:
SELECT * FROM Products WHERE prod_category <> 'Electronics';
-- NOT with IN
SELECT * FROM Customers WHERE cust_state NOT IN ('CA', 'NY');
-- Same as:
SELECT * FROM Customers WHERE cust_state <> 'CA' AND cust_state <> 'NY';
-- NOT with BETWEEN
SELECT * FROM Products WHERE prod_price NOT BETWEEN 10 AND 50;
-- Same as:
SELECT * FROM Products WHERE prod_price < 10 OR prod_price > 50;
Complex Filtering Combinations
Real-world scenarios often require combining multiple logical operators to create sophisticated filtering logic that matches business requirements.
Business Logic Examples:
-- E-commerce product search
SELECT prod_name, prod_price, prod_category, vend_name
FROM Products p
JOIN Vendors v ON p.vend_id = v.vend_id
WHERE (prod_category IN ('Electronics', 'Computers', 'Software'))
AND (prod_price BETWEEN 50 AND 500)
AND (quantity_in_stock > 0)
AND NOT (prod_status = 'Discontinued')
ORDER BY prod_price;
-- Customer segmentation
SELECT cust_name, cust_type, total_orders, total_spent
FROM Customers
WHERE (cust_type = 'Premium' OR total_spent > 5000)
AND (total_orders >= 10 OR registration_date >= '2023-01-01')
AND cust_status = 'Active'
AND cust_email IS NOT NULL;
-- Order fulfillment analysis
SELECT o.order_num, o.order_date, o.order_total, c.cust_name
FROM Orders o
JOIN Customers c ON o.cust_id = c.cust_id
WHERE (o.order_status IN ('Processing', 'Shipped'))
AND (o.order_total > 100 OR c.cust_type = 'VIP')
AND o.order_date >= DATEADD(day, -30, GETDATE())
AND NOT (o.shipping_method = 'Standard' AND o.order_total < 50);
Data Quality and Validation:
-- Find incomplete customer records
SELECT cust_id, cust_name, cust_email, cust_phone
FROM Customers
WHERE (cust_email IS NULL OR cust_email = '')
OR (cust_phone IS NULL OR cust_phone = '')
OR (cust_address IS NULL OR cust_address = '')
AND cust_status = 'Active';
-- Identify problematic orders
SELECT order_num, order_date, order_total, order_status
FROM Orders
WHERE (order_total <= 0 OR order_total IS NULL)
OR (order_date > GETDATE())
OR (order_status NOT IN ('Pending', 'Processing', 'Shipped', 'Delivered', 'Cancelled'))
OR (cust_id IS NULL);
-- Product inventory issues
SELECT prod_id, prod_name, quantity_in_stock, prod_price
FROM Products
WHERE (quantity_in_stock < 0)
OR (prod_price <= 0 OR prod_price IS NULL)
OR (prod_name IS NULL OR prod_name = '')
AND prod_status = 'Active';
Performance Optimization Patterns:
-- ✅ Good: Most selective conditions first
SELECT cust_name, order_total
FROM Customers c
JOIN Orders o ON c.cust_id = o.cust_id
WHERE o.order_date >= '2024-01-01' -- Date range (selective)
AND c.cust_type = 'Premium' -- Customer type (selective)
AND (o.order_total > 1000 OR c.total_lifetime_value > 10000);
-- ✅ Good: Use indexes effectively
SELECT prod_name, prod_price
FROM Products
WHERE prod_category = 'Electronics' -- Indexed column
AND prod_price BETWEEN 100 AND 500 -- Range on indexed column
AND quantity_in_stock > 0;
-- ❌ Avoid: Functions in WHERE clause
SELECT cust_name
FROM Customers
WHERE UPPER(cust_name) LIKE 'JOHN%'; -- Can't use index
-- ✅ Better: Store data consistently
SELECT cust_name
FROM Customers
WHERE cust_name LIKE 'John%'; -- Can use index
Challenge Solutions
Challenge 1: California vendors
-- Find vendors in California, USA
SELECT vend_name
FROM Vendors
WHERE vend_country = 'USA' AND vend_state = 'CA';
Challenge 2: Large orders for specific products
-- Orders with 100+ items of BR01, BR02, or BR03
SELECT order_num, prod_id, quantity
FROM OrderItems
WHERE prod_id IN ('BR01', 'BR02', 'BR03')
AND quantity >= 100;
-- Alternative with proper grouping
SELECT order_num, prod_id, quantity
FROM OrderItems
WHERE (prod_id = 'BR01' OR prod_id = 'BR02' OR prod_id = 'BR03')
AND quantity >= 100;
Challenge 3: Price range with AND
-- Products between $3 and $6 using AND
SELECT prod_name, prod_price
FROM Products
WHERE prod_price >= 3 AND prod_price <= 6
ORDER BY prod_price;
Challenge 4: SQL statement error
-- ❌ Wrong: ORDER BY before WHERE
SELECT vend_name
FROM Vendors
ORDER BY vend_name
WHERE vend_country = 'USA' AND vend_state = 'CA';
-- ✅ Correct: WHERE before ORDER BY
SELECT vend_name
FROM Vendors
WHERE vend_country = 'USA' AND vend_state = 'CA'
ORDER BY vend_name;
Error Explanation: The WHERE clause must come before the ORDER BY clause. SQL processes clauses in a specific order: FROM → WHERE → ORDER BY → SELECT.
Common Advanced Filtering Mistakes
Operator Precedence Errors:
-- ❌ Wrong: Unclear precedence
SELECT * FROM Products
WHERE category = 'A' OR category = 'B' AND price > 10;
-- ✅ Correct: Clear grouping
SELECT * FROM Products
WHERE (category = 'A' OR category = 'B') AND price > 10;
NULL Handling Issues:
-- ❌ Wrong: NOT doesn't handle NULLs as expected
SELECT cust_name FROM Customers
WHERE NOT cust_email = 'test@example.com'; -- Excludes NULLs
-- ✅ Correct: Explicitly handle NULLs
SELECT cust_name FROM Customers
WHERE cust_email != 'test@example.com' OR cust_email IS NULL;
IN vs OR Performance:
-- ✅ Good: Use IN for multiple values
SELECT * FROM Products
WHERE category IN ('A', 'B', 'C', 'D', 'E');
-- ❌ Less efficient: Long OR chains
SELECT * FROM Products
WHERE category = 'A' OR category = 'B' OR category = 'C' OR category = 'D' OR category = 'E';
Best Practices Summary
1. Always Use Parentheses:
-- ✅ Clear and unambiguous
SELECT * FROM Products
WHERE (category = 'Electronics' OR category = 'Computers')
AND (price > 100 AND price < 1000)
AND status = 'Active';
2. Choose Appropriate Operators:
-- ✅ Use IN for multiple values
WHERE status IN ('Active', 'Pending', 'Processing');
-- ✅ Use BETWEEN for ranges
WHERE price BETWEEN 10 AND 100;
-- ✅ Use NOT for negation
WHERE category NOT IN ('Discontinued', 'Clearance');
3. Consider Performance:
-- ✅ Most selective conditions first
WHERE date_column >= '2024-01-01' -- Selective
AND category = 'Electronics' -- Selective
AND (price > 100 OR status = 'Premium'); -- Less selective
4. Handle NULLs Explicitly:
-- ✅ Account for NULL values
WHERE (email != 'spam@example.com' OR email IS NULL)
AND status = 'Active';
Wildcard Filtering
Understanding Wildcard Filtering
Wildcard filtering enables pattern-based searching when you need to find data that matches partial or flexible criteria, rather than exact values. This is essential for text searches, data exploration, and flexible filtering scenarios.
When to Use Wildcards:
Scenario | Example | Wildcard Solution |
---|---|---|
Partial Text Search | Find products containing "bean bag" | LIKE '%bean bag%' |
Prefix Matching | Names starting with "John" | LIKE 'John%' |
Suffix Matching | Files ending with ".pdf" | LIKE '%.pdf' |
Pattern Matching | Phone numbers like "555-xxxx" | LIKE '555-____' |
Character Sets | Names starting with J or M | LIKE '[JM]%' |
Wildcard vs Standard Operators:
-- ✅ Standard operators: Known exact values
SELECT * FROM Products WHERE prod_price = 9.99;
SELECT * FROM Customers WHERE cust_state = 'CA';
-- ✅ Wildcards: Pattern-based matching
SELECT * FROM Products WHERE prod_name LIKE '%toy%';
SELECT * FROM Customers WHERE cust_email LIKE '%@gmail.com';
Key Concepts:
- Wildcards: Special characters with pattern-matching meanings
- Search Patterns: Combinations of literal text and wildcards
- LIKE Operator: Required for wildcard pattern matching
- Text Fields Only: Wildcards work only with string/text data types
The LIKE Operator
LIKE is the foundation of wildcard filtering, instructing the database to perform pattern matching rather than exact equality comparisons.
Basic LIKE Syntax:
SELECT column_list
FROM table_name
WHERE column_name LIKE 'pattern';
LIKE vs Equality Comparison:
-- Exact match (equality)
SELECT prod_name FROM Products
WHERE prod_name = 'Fish bean bag toy'; -- Only exact matches
-- Pattern match (LIKE)
SELECT prod_name FROM Products
WHERE prod_name LIKE 'Fish%'; -- Starts with "Fish"
Important LIKE Characteristics:
Characteristic | Description | Example |
---|---|---|
Case Sensitivity | Depends on database configuration | 'fish%' may not match 'Fish bean bag toy' |
Text Fields Only | Cannot use with numeric/date fields | WHERE price LIKE '10%' is invalid |
Pattern Required | Must include wildcard characters | LIKE 'exact' same as = 'exact' |
NULL Handling | Wildcards don't match NULL values | LIKE '%' excludes NULL rows |
The Percent Sign (%) Wildcard
The % wildcard matches zero, one, or more characters at the specified position, making it the most versatile and commonly used wildcard.
Basic % Usage:
-- Products starting with "Fish"
SELECT prod_id, prod_name
FROM Products
WHERE prod_name LIKE 'Fish%';
Output:
prod_id prod_name
------- ------------------
BNBG01 Fish bean bag toy
% Wildcard Positions:
-- Starts with pattern
SELECT prod_name FROM Products
WHERE prod_name LIKE 'Fish%';
-- Matches: "Fish bean bag toy", "Fishing rod", "Fish"
-- Ends with pattern
SELECT prod_name FROM Products
WHERE prod_name LIKE '%toy';
-- Matches: "Fish bean bag toy", "Teddy bear toy", "toy"
-- Contains pattern anywhere
SELECT prod_name FROM Products
WHERE prod_name LIKE '%bean bag%';
-- Matches: "Fish bean bag toy", "Large bean bag chair"
-- Multiple patterns
SELECT prod_name FROM Products
WHERE prod_name LIKE 'F%y';
-- Matches: "Fish bean bag toy", "Funny", "Fly"
Advanced % Examples:
-- Email domain filtering
SELECT cust_name, cust_email
FROM Customers
WHERE cust_email LIKE '%@gmail.com';
-- Product code patterns
SELECT prod_id, prod_name
FROM Products
WHERE prod_id LIKE 'BR%'; -- All products with codes starting "BR"
-- File extension matching
SELECT file_name, file_size
FROM Documents
WHERE file_name LIKE '%.pdf';
-- Multiple word search
SELECT prod_name, prod_desc
FROM Products
WHERE prod_desc LIKE '%wireless%'
AND prod_desc LIKE '%bluetooth%';
% Wildcard with Multiple Patterns:
-- Products containing "bean bag" anywhere
SELECT prod_id, prod_name
FROM Products
WHERE prod_name LIKE '%bean bag%';
Output:
prod_id prod_name
-------- --------------------
BNBG01 Fish bean bag toy
BNBG02 Bird bean bag toy
BNBG03 Rabbit bean bag toy
Complex % Patterns:
-- Names with middle initial pattern
SELECT cust_name
FROM Customers
WHERE cust_name LIKE '% _. %'; -- "John A. Smith"
-- Product codes with specific format
SELECT prod_id, prod_name
FROM Products
WHERE prod_id LIKE '__-____%'; -- "AB-1234X", "CD-5678Y"
-- Partial email search
SELECT cust_name, cust_email
FROM Customers
WHERE cust_email LIKE 'b%@forta.com'; -- Emails starting with 'b' at forta.com
% Wildcard Characteristics:
Feature | Description | Example |
---|---|---|
Zero Characters | Matches empty string | 'Fish%' matches 'Fish' |
One Character | Matches single character | 'Fish%' matches 'Fishy' |
Multiple Characters | Matches any length | 'Fish%' matches 'Fish bean bag toy' |
Position Flexible | Can be anywhere in pattern | '%fish%' , 'fish%' , '%fish' |
The Underscore (_) Wildcard
The _ wildcard matches exactly one character - no more, no less. This provides precise control over pattern length and character positions.
Basic _ Usage:
-- Products with exactly 2 characters before "inch teddy bear"
SELECT prod_id, prod_name
FROM Products
WHERE prod_name LIKE '__ inch teddy bear';
Output:
prod_id prod_name
-------- --------------------
BR02 12 inch teddy bear
BR03 18 inch teddy bear
Note: "8 inch teddy bear" doesn't match because it has only 1 character before "inch", but the pattern requires exactly 2.
_ vs % Comparison:
-- Underscore: exactly one character
SELECT prod_id, prod_name
FROM Products
WHERE prod_name LIKE '__ inch teddy bear';
-- Matches: "12 inch teddy bear", "18 inch teddy bear"
-- Doesn't match: "8 inch teddy bear" (only 1 character)
-- Percent: any number of characters
SELECT prod_id, prod_name
FROM Products
WHERE prod_name LIKE '% inch teddy bear';
-- Matches: "8 inch teddy bear", "12 inch teddy bear", "18 inch teddy bear"
Output for % version:
prod_id prod_name
-------- --------------------
BR01 8 inch teddy bear
BR02 12 inch teddy bear
BR03 18 inch teddy bear
Practical _ Examples:
-- Phone number patterns
SELECT cust_name, cust_phone
FROM Customers
WHERE cust_phone LIKE '555-____'; -- 555- followed by exactly 4 digits
-- Product codes with fixed format
SELECT prod_id, prod_name
FROM Products
WHERE prod_id LIKE 'BR__'; -- BR followed by exactly 2 characters
-- Date patterns (if stored as text)
SELECT order_num, order_date
FROM Orders
WHERE order_date LIKE '2024-__-01'; -- First day of any month in 2024
-- License plate patterns
SELECT vehicle_id, license_plate
FROM Vehicles
WHERE license_plate LIKE '___-____'; -- 3 chars, dash, 4 chars
Multiple _ Wildcards:
-- Exact character count matching
SELECT prod_name
FROM Products
WHERE prod_name LIKE '___ inch ____'; -- 3 chars + " inch " + 4 chars
-- Fixed-length codes
SELECT cust_id, cust_name
FROM Customers
WHERE cust_id LIKE 'C____'; -- C followed by exactly 4 characters
-- Pattern with mixed wildcards
SELECT prod_name
FROM Products
WHERE prod_name LIKE '_% inch %'; -- 1 char + any chars + " inch " + any chars
_ Wildcard Characteristics:
Feature | Description | Example |
---|---|---|
Exact Count | Always matches exactly one character | '_' never matches 0 or 2+ characters |
Position Specific | Each _ represents one position | '__' = exactly 2 characters |
Cannot Skip | Must have character at each _ position | 'A_C' won't match 'AC' |
Combine with % | Mix for flexible patterns | '__%' = at least 2 characters |
The Brackets ([]) Wildcard
The [] wildcard specifies a character set where any single character from the set can match at that position. This provides precise control over allowed characters.
Database Support:
Database | Support | Notes |
---|---|---|
SQL Server | ✅ Full support | Includes negation with ^ |
MySQL | ❌ Not supported | Use REGEXP instead |
Oracle | ❌ Not supported | Use REGEXP_LIKE instead |
PostgreSQL | ❌ Not supported | Use SIMILAR TO or REGEXP |
SQLite | ❌ Not supported | Limited pattern matching |
Basic [] Usage (SQL Server):
-- Names starting with J or M
SELECT cust_contact
FROM Customers
WHERE cust_contact LIKE '[JM]%'
ORDER BY cust_contact;
Output:
cust_contact
-----------------
Jim Jones
John Smith
Michelle Green
Character Set Examples:
-- Specific letters
SELECT cust_contact
FROM Customers
WHERE cust_contact LIKE '[ABCD]%'; -- Starts with A, B, C, or D
-- Numeric ranges
SELECT prod_id
FROM Products
WHERE prod_id LIKE 'BR[0-9][0-9]'; -- BR followed by 2 digits
-- Mixed character sets
SELECT cust_name
FROM Customers
WHERE cust_name LIKE '[A-Z][a-z]%'; -- Uppercase + lowercase + any chars
-- Multiple character sets
SELECT prod_code
FROM Products
WHERE prod_code LIKE '[A-Z][A-Z][0-9][0-9]'; -- 2 letters + 2 digits
Negated Character Sets with ^:
-- Names NOT starting with J or M
SELECT cust_contact
FROM Customers
WHERE cust_contact LIKE '[^JM]%'
ORDER BY cust_contact;
-- Product codes NOT starting with specific letters
SELECT prod_id, prod_name
FROM Products
WHERE prod_id LIKE '[^AB]%'; -- Doesn't start with A or B
-- Non-numeric first character
SELECT cust_id
FROM Customers
WHERE cust_id LIKE '[^0-9]%'; -- Doesn't start with digit
Alternative to [] with NOT:
-- Using NOT operator instead of ^
SELECT cust_contact
FROM Customers
WHERE NOT cust_contact LIKE '[JM]%'
ORDER BY cust_contact;
-- Multiple NOT conditions
SELECT cust_contact
FROM Customers
WHERE cust_contact NOT LIKE 'J%'
AND cust_contact NOT LIKE 'M%'
ORDER BY cust_contact;
Cross-Database Alternatives:
-- MySQL: Using REGEXP
SELECT cust_contact
FROM Customers
WHERE cust_contact REGEXP '^[JM]';
-- PostgreSQL: Using SIMILAR TO
SELECT cust_contact
FROM Customers
WHERE cust_contact SIMILAR TO '[JM]%';
-- Oracle: Using REGEXP_LIKE
SELECT cust_contact
FROM Customers
WHERE REGEXP_LIKE(cust_contact, '^[JM]');
Advanced Wildcard Combinations
Real-world scenarios often require combining multiple wildcards to create sophisticated pattern matching for complex business requirements.
Complex Pattern Examples:
-- Email validation patterns
SELECT cust_name, cust_email
FROM Customers
WHERE cust_email LIKE '%_@_%._%'; -- Basic email format
-- Product code validation
SELECT prod_id, prod_name
FROM Products
WHERE prod_id LIKE '[A-Z][A-Z][0-9][0-9][0-9]'; -- 2 letters + 3 digits
-- Phone number patterns
SELECT cust_name, cust_phone
FROM Customers
WHERE cust_phone LIKE '([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]';
-- File naming conventions
SELECT file_name, file_type
FROM Documents
WHERE file_name LIKE 'DOC_____-____.pdf'; -- DOC + 5 chars + dash + 4 chars + .pdf
Business Logic Patterns:
-- Customer ID format validation
SELECT cust_id, cust_name
FROM Customers
WHERE cust_id LIKE 'CUST[0-9][0-9][0-9][0-9]' -- CUST + 4 digits
OR cust_id LIKE 'VIP[0-9][0-9][0-9]'; -- VIP + 3 digits
-- Product search with multiple criteria
SELECT prod_name, prod_desc
FROM Products
WHERE (prod_name LIKE '%wireless%' OR prod_desc LIKE '%wireless%')
AND (prod_name LIKE '%mouse%' OR prod_desc LIKE '%mouse%');
-- Address format checking
SELECT cust_name, cust_address
FROM Customers
WHERE cust_address LIKE '[0-9]% % St' -- Number + street name + St
OR cust_address LIKE '[0-9]% % Ave' -- Number + street name + Ave
OR cust_address LIKE '[0-9]% % Blvd'; -- Number + street name + Blvd
Data Quality and Validation:
-- Find inconsistent data formats
SELECT cust_id, cust_phone
FROM Customers
WHERE cust_phone NOT LIKE '([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'
AND cust_phone IS NOT NULL;
-- Identify potential data entry errors
SELECT prod_id, prod_name
FROM Products
WHERE prod_name LIKE '% %' -- Double spaces
OR prod_name LIKE ' %' -- Leading space
OR prod_name LIKE '% '; -- Trailing space
-- Check for incomplete records
SELECT cust_id, cust_name, cust_email
FROM Customers
WHERE cust_email NOT LIKE '%@%.%' -- Invalid email format
OR cust_email LIKE '%@%@%'; -- Multiple @ symbols
Performance Considerations
Wildcard searches can be expensive, especially when wildcards appear at the beginning of patterns. Understanding performance implications helps optimize queries.
Performance Impact by Pattern Type:
| Pattern Type | Performance | Index Usage | Example
|-----|-----|-----
| Prefix (suffix %) | ✅ Fast | Can use index | 'John%'
| Suffix (prefix %) | ❌ Slow | Cannot use index | '%Smith'
| Contains (% both sides) | ❌ Slowest | Cannot use index | '%John%'
| Fixed position (_) | ⚠️ Moderate | Limited index use | 'J_hn'
Optimization Strategies:
-- ✅ Good: Prefix matching (can use index)
SELECT cust_name
FROM Customers
WHERE cust_name LIKE 'Smith%';
-- ❌ Slow: Leading wildcard (full table scan)
SELECT cust_name
FROM Customers
WHERE cust_name LIKE '%Smith';
-- ✅ Better: Use additional criteria to limit scope
SELECT cust_name
FROM Customers
WHERE cust_state = 'CA' -- Indexed column first
AND cust_name LIKE '%Smith%';
-- ✅ Good: Combine with selective filters
SELECT prod_name, prod_price
FROM Products
WHERE prod_category = 'Electronics' -- Selective filter
AND prod_name LIKE '%wireless%'; -- Then wildcard search
Alternative Approaches for Better Performance:
-- Full-text search (if available)
SELECT prod_name, prod_desc
FROM Products
WHERE CONTAINS(prod_desc, 'wireless bluetooth');
-- Separate search columns
SELECT cust_name
FROM Customers
WHERE cust_name_search LIKE '%SMITH%'; -- Pre-processed uppercase column
-- Use LIMIT to control result size
SELECT TOP 100 prod_name
FROM Products
WHERE prod_name LIKE '%toy%'
ORDER BY prod_name;
Common Wildcard Pitfalls
Trailing Spaces Issue:
-- ❌ Problem: Trailing spaces prevent matches
SELECT prod_name
FROM Products
WHERE prod_name LIKE 'F%y'; -- Won't match "Fishy " (with trailing spaces)
-- ✅ Solution: Add trailing wildcard
SELECT prod_name
FROM Products
WHERE prod_name LIKE 'F%y%'; -- Matches with trailing spaces
-- ✅ Better: Use TRIM function
SELECT prod_name
FROM Products
WHERE TRIM(prod_name) LIKE 'F%y';
Case Sensitivity Issues:
-- ❌ May not work: Case sensitive search
SELECT cust_name
FROM Customers
WHERE cust_name LIKE 'john%'; -- Won't match "John Smith"
-- ✅ Solution: Use UPPER/LOWER functions
SELECT cust_name
FROM Customers
WHERE UPPER(cust_name) LIKE 'JOHN%';
-- ✅ Alternative: Case-insensitive collation (SQL Server)
SELECT cust_name
FROM Customers
WHERE cust_name COLLATE SQL_Latin1_General_CP1_CI_AS LIKE 'john%';
NULL Value Handling:
-- ❌ Problem: Wildcards don't match NULL
SELECT cust_name
FROM Customers
WHERE cust_email LIKE '%'; -- Excludes rows where cust_email IS NULL
-- ✅ Solution: Explicitly handle NULLs
SELECT cust_name
FROM Customers
WHERE cust_email LIKE '%' OR cust_email IS NULL;
-- ✅ Alternative: Use COALESCE
SELECT cust_name
FROM Customers
WHERE COALESCE(cust_email, '') LIKE '%';
Challenge Solutions
Challenge 1: Products with "toy" in description
-- Find products containing "toy" in description
SELECT prod_name, prod_desc
FROM Products
WHERE prod_desc LIKE '%toy%';
Challenge 2: Products WITHOUT "toy" in description
-- Find products not containing "toy" in description
SELECT prod_name, prod_desc
FROM Products
WHERE prod_desc NOT LIKE '%toy%'
ORDER BY prod_name;
-- Alternative handling NULLs
SELECT prod_name, prod_desc
FROM Products
WHERE (prod_desc NOT LIKE '%toy%' OR prod_desc IS NULL)
ORDER BY prod_name;
Challenge 3: Products with both "toy" AND "carrots"
-- Find products containing both "toy" and "carrots"
SELECT prod_name, prod_desc
FROM Products
WHERE prod_desc LIKE '%toy%'
AND prod_desc LIKE '%carrots%';
Challenge 4: "toy" before "carrots" in description
-- Find products with "toy" appearing before "carrots"
SELECT prod_name, prod_desc
FROM Products
WHERE prod_desc LIKE '%toy%carrots%';
Best Practices Summary
1. Choose the Right Wildcard:
-- ✅ Use % for variable-length matching
WHERE prod_name LIKE 'Fish%';
-- ✅ Use _ for exact character count
WHERE prod_id LIKE 'BR__';
-- ✅ Use [] for character sets (where supported)
WHERE cust_name LIKE '[A-M]%';
2. Optimize for Performance:
-- ✅ Avoid leading wildcards when possible
WHERE cust_name LIKE 'Smith%'; -- Good
-- ✅ Combine with selective filters
WHERE cust_state = 'CA' AND cust_name LIKE '%Smith%'; -- Better
-- ✅ Use LIMIT for large result sets
SELECT TOP 100 * FROM Products WHERE prod_name LIKE '%toy%';
3. Handle Edge Cases:
-- ✅ Account for trailing spaces
WHERE TRIM(prod_name) LIKE 'Fish%';
-- ✅ Handle case sensitivity
WHERE UPPER(cust_name) LIKE 'JOHN%';
-- ✅ Consider NULL values
WHERE cust_email LIKE '%@gmail.com' OR cust_email IS NULL;
4. Use Appropriate Alternatives:
-- ✅ Full-text search for complex text queries
WHERE CONTAINS(prod_desc, 'wireless AND bluetooth');
-- ✅ Regular expressions for complex patterns
WHERE cust_email REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$';
Calculated Fields
What Are Calculated Fields?
Calculated fields are virtual columns created on-the-fly within SQL SELECT statements. They don't exist in database tables but are computed from existing data during query execution.
Why Use Calculated Fields?
Common scenarios where calculated fields are essential:
- Combining data: Company name + location from separate columns
- Formatting: City, state, ZIP into mailing label format
- Case conversion: Mixed case data to uppercase for reports
- Calculations: Price × quantity for invoice totals
- Aggregations: Totals, averages, and other computations
Client vs Server Processing
Best Practice: Perform data transformations on the database server rather than in client applications for better performance.
Field Concatenation
Basic Concatenation Syntax
Concatenation operators vary by DBMS:
DBMS | Operator | Example |
---|---|---|
SQL Server | + | column1 + column2 |
DB2, Oracle, PostgreSQL, SQLite | ` | |
MySQL, MariaDB | CONCAT() | CONCAT(column1, column2) |
Example: Vendor Name with Location
SQL Server
SELECT vend_name + '(' + vend_country + ')'
FROM Vendors
ORDER BY vend_name;
Oracle/PostgreSQL/SQLite
SELECT vend_name || '(' || vend_country || ')'
FROM Vendors
ORDER BY vend_name;
MySQL/MariaDB
SELECT CONCAT(vend_name, ' (', vend_country, ')')
FROM Vendors
ORDER BY vend_name;
Output:
Bear Emporium (USA )
Bears R Us (USA )
Doll House Inc. (USA )
Fun and Games (England )
Handling Padded Spaces
Many databases pad text values to column width. Use RTRIM()
to remove trailing spaces:
SQL Server
SELECT RTRIM(vend_name) + ' (' + RTRIM(vend_country) + ')'
FROM Vendors
ORDER BY vend_name;
Oracle/PostgreSQL/SQLite
SELECT RTRIM(vend_name) || ' (' || RTRIM(vend_country) || ')'
FROM Vendors
ORDER BY vend_name;
Clean Output:
Bear Emporium (USA)
Bears R Us (USA)
Doll House Inc. (USA)
Fun and Games (England)
TRIM Functions
| Function | Purpose
|-----|-----|-----
| RTRIM()
| Removes spaces from right side
| LTRIM()
| Removes spaces from left side
| TRIM()
| Removes spaces from both sides
Using Aliases
Calculated fields need names to be referenced by client applications. Aliases provide alternate names using the AS
keyword.
Basic Alias Syntax
SELECT RTRIM(vend_name) + ' (' + RTRIM(vend_country) + ')'
AS vend_title
FROM Vendors
ORDER BY vend_name;
Output:
vend_title
-----------------------------------------------------------
Bear Emporium (USA)
Bears R Us (USA)
Doll House Inc. (USA)
MySQL/MariaDB Example
SELECT CONCAT(RTRIM(vend_name), ' (', RTRIM(vend_country), ')')
AS vend_title
FROM Vendors
ORDER BY vend_name;
Alias Best Practices
✅ Good Practices:
- Use single-word aliases
- Use descriptive names
- Include
AS
keyword (even if optional) - Use consistent naming conventions
❌ Avoid:
- Multi-word aliases (create client application issues)
- Special characters in alias names
- Reserved SQL keywords as aliases
Other Uses for Aliases
- Rename columns with illegal characters (spaces, special chars)
- Expand cryptic column names for clarity
- Resolve ambiguous column names in joins
- Create meaningful names for complex calculations
Mathematical Calculations
Calculated fields excel at performing mathematical operations on retrieved data.
Basic Example: Order Item Totals
-- Show individual columns
SELECT prod_id, quantity, item_price
FROM OrderItems
WHERE order_num = 20008;
Output:
prod_id quantity item_price
---------- ----------- ---------------------
RGAN01 5 4.9900
BR03 5 11.9900
BNBG01 10 3.4900
-- Add calculated total
SELECT prod_id,
quantity,
item_price,
quantity * item_price AS expanded_price
FROM OrderItems
WHERE order_num = 20008;
Output:
prod_id quantity item_price expanded_price
---------- ----------- ------------ -----------------
RGAN01 5 4.9900 24.9500
BR03 5 11.9900 59.9500
BNBG01 10 3.4900 34.9000
SQL Mathematical Operators
Operator | Description | Example |
---|---|---|
+ | Addition | price + tax |
- | Subtraction | price - discount |
* | Multiplication | quantity * price |
/ | Division | total / count |
Order of Operations
Use parentheses to control calculation order:
SELECT prod_id,
quantity,
item_price,
(quantity * item_price) * 0.9 AS discounted_total,
quantity * (item_price - 1.00) AS bulk_discount
FROM OrderItems;
Testing Calculations
Use SELECT
without FROM
to test expressions:
-- Test mathematical operations
SELECT 3 * 2; -- Returns: 6
-- Test string functions
SELECT TRIM(' abc '); -- Returns: abc
-- Test date functions (MySQL/MariaDB)
SELECT CURDATE(); -- Returns: current date
Practical Examples
Example 1: Full Customer Address
SELECT cust_name,
cust_address || ', ' || cust_city || ', ' || cust_state || ' ' || cust_zip
AS full_address
FROM Customers;
Example 2: Sales Tax Calculation
SELECT prod_id,
prod_name,
prod_price,
prod_price * 0.08 AS sales_tax,
prod_price * 1.08 AS total_with_tax
FROM Products;
Example 3: Profit Margin Analysis
SELECT prod_id,
prod_price,
prod_cost,
prod_price - prod_cost AS profit,
((prod_price - prod_cost) / prod_price) * 100 AS profit_margin_pct
FROM Products;
Challenge Solutions
Challenge 1: Column Renaming
SELECT vend_id,
vend_name AS vname,
vend_address AS vaddress,
vend_city AS vcity
FROM Vendors
ORDER BY vname; -- Can use original or alias name
Challenge 2: Sale Price Calculation
SELECT prod_id,
prod_price,
prod_price * 0.9 AS sale_price
FROM Products;
Advanced Techniques
Complex String Formatting
-- Create formatted product description
SELECT UPPER(RTRIM(prod_name)) || ' - ' ||
LOWER(RTRIM(prod_desc)) || ' ($' ||
CAST(prod_price AS VARCHAR(10)) || ')'
AS formatted_product
FROM Products;
Conditional Calculations
-- Different discount rates based on quantity
SELECT prod_id,
quantity,
item_price,
CASE
WHEN quantity >= 10 THEN item_price * 0.9
WHEN quantity >= 5 THEN item_price * 0.95
ELSE item_price
END AS discounted_price
FROM OrderItems;
Date Calculations
-- Calculate days since order
SELECT order_num,
order_date,
CURRENT_DATE - order_date AS days_since_order
FROM Orders;
Data Manipulation Functions
What Are SQL Functions?
Functions are operations performed on data to facilitate conversion and manipulation. They're essential tools in your SQL toolkit for transforming and processing data efficiently.
Example: RTRIM()
removes trailing spaces from strings.
The Portability Challenge
⚠️ Major Limitation: Unlike SQL statements (SELECT, INSERT, etc.), functions are highly DBMS-specific.
Function Differences Across DBMSs
Function Type | DBMS Variations |
---|---|
Extract substring | DB2, Oracle, PostgreSQL, SQLite: SUBSTR()``<br> MariaDB, MySQL, SQL Server: SUBSTRING() |
Data type conversion | Oracle: Multiple specific functions<br> DB2, PostgreSQL, SQL Server: CAST()``<br> MariaDB, MySQL, SQL Server: CONVERT() |
Current date | DB2, PostgreSQL: CURRENT_DATE``<br> MariaDB, MySQL: CURDATE()``<br> Oracle: SYSDATE``<br> SQL Server: GETDATE()``<br> SQLite: DATE() |
Should You Use Functions?
Pros:
- Better performance (DBMS handles operations efficiently)
- Cleaner, more readable code
- Built-in optimizations
Cons:
- Reduced code portability
- DBMS lock-in
- Migration complexity
Best Practice: If you use functions, document your code thoroughly and specify which DBMS you're targeting.
Types of SQL Functions
1. Text Functions
Manipulate strings of text (trimming, padding, case conversion)
2. Numeric Functions
Perform mathematical operations on numeric data
3. Date and Time Functions
Manipulate date/time values and extract components
4. Formatting Functions
Generate user-friendly outputs (localized dates, currencies)
5. System Functions
Return DBMS-specific information (user login, system info)
Text Manipulation Functions
Basic Example: UPPER()
SELECT vend_name, UPPER(vend_name) AS vend_name_upcase
FROM Vendors
ORDER BY vend_name;
Output:
vend_name vend_name_upcase
Bear Emporium BEAR EMPORIUM
Bears R Us BEARS R US
Doll House Inc. DOLL HOUSE INC.
Fun and Games FUN AND GAMES
Common Text Functions
Function | Description |
---|---|
LEFT() | Returns characters from left of string |
LENGTH() / LEN() / DATALENGTH() | Returns string length |
LOWER() | Converts to lowercase |
LTRIM() | Trims whitespace from left |
RIGHT() | Returns characters from right of string |
RTRIM() | Trims whitespace from right |
SUBSTR() / SUBSTRING() | Extracts part of string |
SOUNDEX() | Returns phonetic representation |
UPPER() | Converts to uppercase |
SOUNDEX() Function
Purpose: Matches strings by how they sound, not exact spelling.
-- This returns no results
SELECT cust_name, cust_contact
FROM Customers
WHERE cust_contact = 'Michael Green';
-- This finds "Michelle Green" because it sounds similar
SELECT cust_name, cust_contact
FROM Customers
WHERE SOUNDEX(cust_contact) = SOUNDEX('Michael Green');
Output:
cust_name cust_contact
Kids Place Michelle Green
Note: SOUNDEX() is not supported in PostgreSQL and requires special compilation in SQLite.
Date and Time Manipulation Functions
Date/time functions are among the most important but least portable SQL functions.
Extracting Year from Date
SQL Server
SELECT order_num
FROM Orders
WHERE DATEPART(yy, order_date) = 2020;
PostgreSQL
SELECT order_num
FROM Orders
WHERE DATE_PART('year', order_date) = 2020;
Oracle
SELECT order_num
FROM Orders
WHERE EXTRACT(year FROM order_date) = 2020;
MySQL/MariaDB/DB2
SELECT order_num
FROM Orders
WHERE YEAR(order_date) = 2020;
SQLite
SELECT order_num
FROM Orders
WHERE strftime('%Y', order_date) = '2020';
Alternative: Using BETWEEN
-- Oracle example
SELECT order_num
FROM Orders
WHERE order_date BETWEEN to_date('2020-01-01', 'yyyy-mm-dd')
AND to_date('2020-12-31', 'yyyy-mm-dd');
Numeric Manipulation Functions
Numeric functions are the most consistent across different DBMSs.
Common Numeric Functions
Function | Description |
---|---|
ABS() | Returns absolute value |
COS() | Returns trigonometric cosine |
EXP() | Returns exponential value |
PI() | Returns value of PI |
SIN() | Returns trigonometric sine |
SQRT() | Returns square root |
TAN() | Returns trigonometric tangent |
Examples
-- Calculate absolute values
SELECT product_id, price, ABS(price - 10) AS price_difference
FROM Products;
-- Mathematical calculations
SELECT PI() AS pi_value, SQRT(16) AS square_root;
Best Practices
1. Code Documentation
-- SQL Server specific function
SELECT DATEPART(yy, order_date) AS order_year
FROM Orders;
2. Function Naming Consistency
Choose a case style and stick with it:
UPPER()
✅upper()
✅Upper()
✅- Mixing styles ❌
3. Performance Considerations
-- Efficient: Use functions in SELECT
SELECT cust_name, UPPER(cust_name) AS name_upper
FROM Customers;
-- Less efficient: Functions in WHERE (may prevent index usage)
WHERE UPPER(cust_name) = 'JOHN DOE'
4. Portability Planning
- Document DBMS dependencies
- Consider abstraction layers
- Plan migration strategies
- Test across target platforms
Practical Challenges
Challenge 1: User Login Generation
Create a login combining first 2 characters of contact name + first 3 characters of city (uppercase):
SELECT
cust_id,
cust_name,
UPPER(
CONCAT(
LEFT(cust_contact, 2),
LEFT(cust_city, 3)
)
) AS user_login
FROM Customers;
Challenge 2: Date Range Filtering
Find all orders from January 2020:
-- SQL Server
SELECT order_num, order_date
FROM Orders
WHERE DATEPART(mm, order_date) = 1
AND DATEPART(yy, order_date) = 2020
ORDER BY order_date;
-- MySQL/MariaDB
SELECT order_num, order_date
FROM Orders
WHERE MONTH(order_date) = 1
AND YEAR(order_date) = 2020
ORDER BY order_date;
Aggregate Functions
What Are Aggregate Functions?
Aggregate functions operate on a set of rows to calculate and return a single value. They're essential for data analysis and reporting, allowing you to:
- Count rows in a table
- Calculate sums, averages, minimums, and maximums
- Summarize data efficiently without retrieving all records
The Five Core Aggregate Functions
Function | Description |
---|---|
AVG() | Returns a column's average value |
COUNT() | Returns the number of rows in a column |
MAX() | Returns a column's highest value |
MIN() | Returns a column's lowest value |
SUM() | Returns the sum of a column's values |
AVG() Function
Returns the average value of a numeric column.
Basic Usage
SELECT AVG(prod_price) AS avg_price
FROM Products;
Output: 6.823333
With WHERE Clause
SELECT AVG(prod_price) AS avg_price
FROM Products
WHERE vend_id = 'DLL01';
Output: 3.8650
Important: AVG() only works with numeric columns and ignores NULL values.
COUNT() Function
Counts rows in a table or column.
Two Usage Patterns
COUNT(*) - Count All Rows
SELECT COUNT(*) AS num_cust
FROM Customers;
Output: 5
COUNT(column) - Count Non-NULL Values
SELECT COUNT(cust_email) AS num_cust
FROM Customers;
Output: 3
(only customers with email addresses)
Note: COUNT(*) includes NULL values, COUNT(column) excludes them.
MAX() Function
Returns the highest value in a column.
SELECT MAX(prod_price) AS max_price
FROM Products;
Output: 11.9900
Works with Text Data
MAX() can also work with textual columns, returning the value that would appear last when sorted alphabetically.
Note: NULL values are ignored by MAX().
MIN() Function
Returns the lowest value in a column.
SELECT MIN(prod_price) AS min_price
FROM Products;
Output: 3.4900
** Works with Text Data**
Like MAX(), MIN() can work with textual columns, returning the value that would appear first when sorted alphabetically.
Note: NULL values are ignored by MIN().
** SUM() Function**
Returns the total sum of values in a numeric column.
Basic Sum
SELECT SUM(quantity) AS items_ordered
FROM OrderItems
WHERE order_num = 20005;
Output: 200
Calculated Sum
SELECT SUM(item_price * quantity) AS total_price
FROM OrderItems
WHERE order_num = 20005;
Output: 1648.0000
Note: SUM() can perform calculations on multiple columns and ignores NULL values.
Using DISTINCT with Aggregates
Aggregate functions can work with unique values only by using the DISTINCT keyword.
Default Behavior (ALL)
SELECT AVG(prod_price) AS avg_price
FROM Products
WHERE vend_id = 'DLL01';
With DISTINCT
SELECT AVG(DISTINCT prod_price) AS avg_price
FROM Products
WHERE vend_id = 'DLL01';
Output: 4.2400
(higher because duplicate lower prices are excluded)
DISTINCT Rules
- ✅ Can use with COUNT(column)
- ❌ Cannot use with COUNT(*)
- ⚠️ No practical benefit with MIN() and MAX()
Combining Multiple Aggregate Functions
You can use multiple aggregate functions in a single query:
SELECT COUNT(*) AS num_items,
MIN(prod_price) AS price_min,
MAX(prod_price) AS price_max,
AVG(prod_price) AS price_avg
FROM Products;
Output:
num_items | price_min | price_max | price_avg
----------|-----------|-----------|----------
9 | 3.4900 | 11.9900 | 6.823333
Best Practices
Alias Naming
- Use descriptive aliases for aggregate results
- Avoid using actual column names as aliases
- This prevents potential conflicts and obscure error messages
NULL Handling
- All aggregate functions ignore NULL values (except COUNT(*))
- COUNT(*) includes rows with NULL values
- COUNT(column) excludes rows with NULL values
Performance Tips
- Use aggregate functions instead of retrieving all data for analysis
- Combine multiple aggregates in one query when possible
- Use WHERE clauses to filter data before aggregation
Common Use Cases
Data Analysis
-- Sales summary
SELECT
COUNT(*) AS total_orders,
SUM(order_total) AS total_revenue,
AVG(order_total) AS avg_order_value,
MAX(order_total) AS largest_order
FROM Orders
WHERE order_date >= '2024-01-01';
Quality Checks
-- Data completeness check
SELECT
COUNT(*) AS total_records,
COUNT(email) AS records_with_email,
COUNT(phone) AS records_with_phone
FROM Customers;
Reporting
-- Product performance by category
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS avg_price,
MIN(price) AS min_price,
MAX(price) AS max_price
FROM Products
GROUP BY category;
Data Grouping
Understanding Data Grouping
Data grouping allows you to divide data into logical sets and perform aggregate calculations on each group, rather than on the entire dataset.
Why Group Data?
Instead of calculating aggregates for all data:
-- Count all products from vendor DLL01
SELECT COUNT(*) AS num_prods
FROM Products
WHERE vend_id = 'DLL01';
Output: 4
You might want to:
- Count products offered by each vendor
- Find vendors offering only one product
- Identify vendors with more than 10 products
This is where GROUP BY becomes essential.
Creating Groups with GROUP BY
Basic GROUP BY Syntax
SELECT vend_id, COUNT(*) AS num_prods
FROM Products
GROUP BY vend_id;
Output:
vend_id num_prods
------- ---------
BRS01 3
DLL01 4
FNG01 2
Analysis: The query groups products by vendor ID and counts products in each group automatically.
GROUP BY Rules
1. Multiple Columns Allowed
SELECT vend_id, prod_type, COUNT(*) AS num_prods
FROM Products
GROUP BY vend_id, prod_type;
2. Nested Grouping
- Data summarized at the last specified group level
- All columns evaluated together when grouping
3. Column Requirements
- Every non-aggregate column in SELECT must be in GROUP BY
- Use actual column names or expressions, not aliases
4. Expression Consistency
-- If using expression in SELECT...
SELECT UPPER(vend_id), COUNT(*)
FROM Products
GROUP BY UPPER(vend_id); -- Must use same expression in GROUP BY
5. Data Type Limitations
- Most implementations don't allow variable-length types (TEXT, MEMO)
6. NULL Handling
- NULL values grouped together
- Multiple NULLs form one group
7. Clause Order
SELECT columns
FROM table
WHERE conditions -- Before GROUP BY
GROUP BY columns -- After WHERE
ORDER BY columns -- After GROUP BY
Filtering Groups with HAVING
WHERE vs HAVING
WHERE | HAVING |
---|---|
Filters rows | Filters groups |
Applied before grouping | Applied after grouping |
Works on individual records | Works on aggregate results |
Basic HAVING Example
SELECT cust_id, COUNT(*) AS orders
FROM Orders
GROUP BY cust_id
HAVING COUNT(*) >= 2;
Output:
cust_id orders
---------- -----------
1000000001 2
Analysis: Returns only customers with 2 or more orders.
HAVING Supports All WHERE Operators
All WHERE clause techniques work with HAVING:
- Comparison operators (
>=
,<
,=
, etc.) - Wildcard patterns (
LIKE
) - Multiple conditions (
AND
,OR
) - IN and BETWEEN operators
Combining WHERE and HAVING
Complex Filtering Example
SELECT vend_id, COUNT(*) AS num_prods
FROM Products
WHERE prod_price >= 4 -- Filter rows first
GROUP BY vend_id
HAVING COUNT(*) >= 2; -- Filter groups second
Output:
vend_id num_prods
------- -----------
BRS01 3
FNG01 2
Analysis:
- WHERE filters products with price ≥ 4
- GROUP BY groups remaining products by vendor
- HAVING keeps only vendors with ≥ 2 products
Without WHERE Clause
SELECT vend_id, COUNT(*) AS num_prods
FROM Products
GROUP BY vend_id
HAVING COUNT(*) >= 2;
Output:
vend_id num_prods
------- -----------
BRS01 3
DLL01 4 -- Included because it has 4 products total
FNG01 2
GROUP BY vs ORDER BY
Key Differences
GROUP BY | ORDER BY |
---|---|
Groups rows | Sorts output |
Output may not be in group order | Always sorts output |
Only selected columns/expressions | Any columns (even non-selected) |
Required with aggregates | Optional |
Best Practice: Always Use ORDER BY
-- Good: Explicit sorting
SELECT order_num, COUNT(*) AS items
FROM OrderItems
GROUP BY order_num
HAVING COUNT(*) >= 3
ORDER BY items, order_num;
Output:
order_num items
--------- -----
20006 3
20009 3
20007 5
20008 5
Important: Never rely on GROUP BY to sort data. Always include ORDER BY for guaranteed sort order.
SELECT Clause Order
Required Sequence
| Clause | Description | Required |-----|----- | SELECT | Columns or expressions | Yes | FROM | Table to retrieve from | Only if selecting from table | WHERE | Row-level filtering | No | GROUP BY | Group specification | Only with aggregates by group | HAVING | Group-level filtering | No | ORDER BY | Output sort order | No
Complete Example
SELECT vend_id, AVG(prod_price) AS avg_price
FROM Products
WHERE prod_price > 0
GROUP BY vend_id
HAVING AVG(prod_price) >= 5
ORDER BY avg_price DESC;
Practical Examples
Example 1: Sales Analysis by Region
SELECT region,
COUNT(*) AS total_orders,
SUM(order_total) AS total_sales,
AVG(order_total) AS avg_order_value
FROM Orders
WHERE order_date >= '2024-01-01'
GROUP BY region
HAVING SUM(order_total) > 10000
ORDER BY total_sales DESC;
Example 2: Product Performance by Category
SELECT category,
COUNT(*) AS product_count,
MIN(price) AS min_price,
MAX(price) AS max_price,
AVG(price) AS avg_price
FROM Products
GROUP BY category
HAVING COUNT(*) >= 5
ORDER BY avg_price DESC;
Example 3: Customer Segmentation
SELECT
CASE
WHEN COUNT(*) >= 10 THEN 'High Value'
WHEN COUNT(*) >= 5 THEN 'Medium Value'
ELSE 'Low Value'
END AS customer_segment,
COUNT(*) AS customer_count
FROM (
SELECT cust_id, COUNT(*) AS order_count
FROM Orders
GROUP BY cust_id
) customer_orders
GROUP BY customer_segment
ORDER BY customer_count DESC;
Challenge Solutions
Challenge 1: Order Lines Count
SELECT order_num, COUNT(*) AS order_lines
FROM OrderItems
GROUP BY order_num
ORDER BY order_lines;
Challenge 2: Cheapest Item by Vendor
SELECT vend_id, MIN(prod_price) AS cheapest_item
FROM Products
GROUP BY vend_id
ORDER BY cheapest_item;
Challenge 3: Large Orders (≥100 items)
SELECT order_num
FROM OrderItems
GROUP BY order_num
HAVING SUM(quantity) >= 100;
Challenge 4: High-Value Orders (≥$1000)
SELECT order_num
FROM OrderItems
GROUP BY order_num
HAVING SUM(item_price * quantity) >= 1000
ORDER BY order_num;
Challenge 5: What's Wrong?
-- INCORRECT:
SELECT order_num, COUNT(*) AS items
FROM OrderItems
GROUP BY items -- ❌ Can't group by alias
HAVING COUNT(*) >= 3
ORDER BY items, order_num;
-- CORRECT:
SELECT order_num, COUNT(*) AS items
FROM OrderItems
GROUP BY order_num -- ✅ Group by actual column
HAVING COUNT(*) >= 3
ORDER BY items, order_num;
Problem: You cannot use aliases in GROUP BY clause.
Advanced Grouping Techniques
Multiple Grouping Levels
SELECT vend_id,
EXTRACT(YEAR FROM order_date) AS order_year,
COUNT(*) AS orders,
SUM(total_amount) AS yearly_sales
FROM Orders
GROUP BY vend_id, EXTRACT(YEAR FROM order_date)
ORDER BY vend_id, order_year;
Conditional Aggregation
SELECT vend_id,
COUNT(*) AS total_products,
COUNT(CASE WHEN prod_price > 10 THEN 1 END) AS expensive_products,
COUNT(CASE WHEN prod_price <= 10 THEN 1 END) AS affordable_products
FROM Products
GROUP BY vend_id;
Subqueries
Understanding Subqueries
Subqueries are SQL queries embedded inside other queries. They enable you to break down complex data retrieval into logical steps and create more flexible, powerful SQL statements.
Why Use Subqueries?
Consider this scenario: Find all customers who ordered item 'RGAN01'.
Traditional approach (3 separate queries):
- Find orders containing 'RGAN01'
- Find customer IDs for those orders
- Get customer details for those IDs
Subquery approach: Combine all steps into one statement.
Filtering with Subqueries
Step-by-Step Example
Step 1: Find Orders with Specific Product
SELECT order_num
FROM OrderItems
WHERE prod_id = 'RGAN01';
Output:
order_num
-----------
20007
20008
Step 2: Find Customers for Those Orders
SELECT cust_id
FROM Orders
WHERE order_num IN (20007, 20008);
Output:
cust_id
----------
1000000004
1000000005
Step 3: Combine with Subquery
SELECT cust_id
FROM Orders
WHERE order_num IN (SELECT order_num
FROM OrderItems
WHERE prod_id = 'RGAN01');
Output:
cust_id
----------
1000000004
1000000005
Complete Nested Subquery
SELECT cust_name, cust_contact
FROM Customers
WHERE cust_id IN (SELECT cust_id
FROM Orders
WHERE order_num IN (SELECT order_num
FROM OrderItems
WHERE prod_id = 'RGAN01'));
Output:
cust_name cust_contact
----------------------------- --------------------
Fun4All Denise L. Stephens
The Toy Store Kim Howard
How Subqueries Execute
Processing Order: Always innermost to outermost
-
Innermost:
SELECT order_num FROM OrderItems WHERE prod_id = 'RGAN01'
-
Returns:
20007, 20008
-
Middle:
SELECT cust_id FROM Orders WHERE order_num IN (20007, 20008)
-
Returns:
1000000004, 1000000005
-
Outermost:
SELECT cust_name, cust_contact FROM Customers WHERE cust_id IN (1000000004, 1000000005)
-
Returns: Customer details
** Subqueries as Calculated Fields**
Counting Related Records
Goal: Show each customer with their total number of orders.
SELECT cust_name,
cust_state,
(SELECT COUNT(*)
FROM Orders
WHERE Orders.cust_id = Customers.cust_id) AS orders
FROM Customers
ORDER BY cust_name;
Output:
cust_name cust_state orders
------------------------- ---------- ------
Fun4All IN 1
Fun4All AZ 1
Kids Place OH 0
The Toy Store IL 1
Village Toys MI 2
Fully Qualified Column Names
Critical: Use table.column
syntax when column names exist in multiple tables.
** ✅ Correct (Fully Qualified)**
WHERE Orders.cust_id = Customers.cust_id
❌ Incorrect (Ambiguous)
WHERE cust_id = cust_id -- Compares column to itself!
Wrong Result:
cust_name cust_state orders
------------------------- ---------- ------
Fun4All IN 5
Fun4All AZ 5
Kids Place OH 5
The Toy Store IL 5
Village Toys MI 5
Subquery Rules and Best Practices
Essential Rules
- Single Column Only: Subqueries in WHERE clauses can only return one column
- Proper Nesting: No limit on nesting depth, but performance degrades
- Fully Qualify: Always use
table.column
when ambiguity exists - Parentheses Required: Subqueries must be enclosed in parentheses
Formatting Best Practices
-- Good: Well-formatted subquery
SELECT cust_name, cust_contact
FROM Customers
WHERE cust_id IN (
SELECT cust_id
FROM Orders
WHERE order_num IN (
SELECT order_num
FROM OrderItems
WHERE prod_id = 'RGAN01'
)
);
Performance Considerations
- Subqueries: Can be less efficient for complex operations
- JOINs: Often more efficient alternative (covered in next lesson)
- Testing: Always test performance with real data volumes
Practical Examples
Example 1: High-Value Product Customers
-- Find customers who bought expensive items
SELECT DISTINCT cust_name
FROM Customers
WHERE cust_id IN (
SELECT cust_id
FROM Orders
WHERE order_num IN (
SELECT order_num
FROM OrderItems
WHERE item_price >= 10
)
);
Example 2: Product Sales Summary
-- Show each product with total quantity sold
SELECT prod_name,
(SELECT SUM(quantity)
FROM OrderItems
WHERE OrderItems.prod_id = Products.prod_id) AS total_sold
FROM Products
ORDER BY total_sold DESC;
Example 3: Customer Spending Analysis
-- Calculate total spending per customer
SELECT cust_name,
(SELECT SUM(item_price * quantity)
FROM OrderItems oi
JOIN Orders o ON oi.order_num = o.order_num
WHERE o.cust_id = c.cust_id) AS total_spent
FROM Customers c
ORDER BY total_spent DESC;
Challenge Solutions
Challenge 1: Customers with High-Value Items
SELECT DISTINCT cust_name
FROM Customers
WHERE cust_id IN (
SELECT cust_id
FROM Orders
WHERE order_num IN (
SELECT order_num
FROM OrderItems
WHERE item_price >= 10
)
);
Challenge 2: BR01 Order Dates
SELECT cust_id, order_date
FROM Orders
WHERE order_num IN (
SELECT order_num
FROM OrderItems
WHERE prod_id = 'BR01'
)
ORDER BY order_date;
Challenge 3: BR01 Customer Emails
SELECT cust_email
FROM Customers
WHERE cust_id IN (
SELECT cust_id
FROM Orders
WHERE order_num IN (
SELECT order_num
FROM OrderItems
WHERE prod_id = 'BR01'
)
);
Challenge 4: Customer Order Totals
SELECT cust_id,
(SELECT SUM(item_price * quantity)
FROM OrderItems oi
JOIN Orders o ON oi.order_num = o.order_num
WHERE o.cust_id = Orders.cust_id) AS total_ordered
FROM Orders
GROUP BY cust_id
ORDER BY total_ordered DESC;
Challenge 5: Product Sales Quantities
SELECT prod_name,
(SELECT COALESCE(SUM(quantity), 0)
FROM OrderItems
WHERE OrderItems.prod_id = Products.prod_id) AS quant_sold
FROM Products
ORDER BY quant_sold DESC;
Advanced Subquery Techniques
Correlated Subqueries
-- Find customers with above-average order values
SELECT cust_name
FROM Customers c
WHERE (
SELECT AVG(item_price * quantity)
FROM OrderItems oi
JOIN Orders o ON oi.order_num = o.order_num
WHERE o.cust_id = c.cust_id
) > (
SELECT AVG(item_price * quantity)
FROM OrderItems
);
EXISTS vs IN
-- Using EXISTS (often more efficient)
SELECT cust_name
FROM Customers c
WHERE EXISTS (
SELECT 1
FROM Orders o
WHERE o.cust_id = c.cust_id
AND o.order_date >= '2024-01-01'
);
-- Using IN (traditional approach)
SELECT cust_name
FROM Customers
WHERE cust_id IN (
SELECT cust_id
FROM Orders
WHERE order_date >= '2024-01-01'
);
Subqueries in SELECT, FROM, and HAVING
-- Subquery in SELECT (calculated field)
SELECT prod_name,
(SELECT COUNT(*) FROM OrderItems WHERE prod_id = p.prod_id) AS times_ordered
FROM Products p;
-- Subquery in FROM (derived table)
SELECT avg_price_category, COUNT(*) as product_count
FROM (
SELECT
CASE
WHEN prod_price < 5 THEN 'Low'
WHEN prod_price < 15 THEN 'Medium'
ELSE 'High'
END AS avg_price_category
FROM Products
) price_categories
GROUP BY avg_price_category;
-- Subquery in HAVING
SELECT vend_id, COUNT(*) as product_count
FROM Products
GROUP BY vend_id
HAVING COUNT(*) > (
SELECT AVG(product_count)
FROM (
SELECT COUNT(*) as product_count
FROM Products
GROUP BY vend_id
) vendor_counts
);
Table Joins
Understanding Relational Tables
Why Split Data Across Tables?
Consider a product database. Instead of storing all information in one table:
❌ Poor Design (Single Table)
Products Table:
prod_id | prod_name | prod_price | vend_name | vend_address | vend_contact
--------|-----------|------------|-----------|--------------|-------------
P001 | Toy Bear | 9.99 | Bears R Us| 123 Main St | John Smith
P002 | Toy Car | 5.99 | Bears R Us| 123 Main St | John Smith
Problems:
- Wasted space: Vendor info repeated for each product
- Update anomalies: Change vendor address in multiple places
- Data inconsistency: Risk of typos in repeated data
✅ Good Design (Related Tables)
Vendors Table:
vend_id | vend_name | vend_address | vend_contact
--------|-----------|--------------|-------------
V001 | Bears R Us| 123 Main St | John Smith
Products Table:
prod_id | prod_name | prod_price | vend_id
--------|-----------|------------|--------
P001 | Toy Bear | 9.99 | V001
P002 | Toy Car | 5.99 | V001
Benefits:
- No data duplication
- Single point of update
- Data consistency guaranteed
- Better scalability
Why Use Joins?
Challenge: Data is in multiple tables, but you need it in one result set.
Solution: Joins associate tables within a SELECT statement to return combined data.
Key Point: Joins are created by the DBMS on-the-fly during query execution - they don't exist as physical entities.
Creating Basic Joins
Simple Join Syntax
SELECT vend_name, prod_name, prod_price
FROM Vendors, Products
WHERE Vendors.vend_id = Products.vend_id;
Output:
vend_name prod_name prod_price
-------------------- -------------------- ----------
Doll House Inc. Fish bean bag toy 3.4900
Doll House Inc. Bird bean bag toy 3.4900
Bears R Us 8 inch teddy bear 5.9900
Bears R Us 12 inch teddy bear 8.9900
Fun and Games King doll 9.4900
Key Components
- FROM clause: Lists all tables to join
- WHERE clause: Defines the relationship between tables
- Fully qualified names:
table.column
format required for ambiguous columns
The Critical WHERE Clause
Without WHERE (Cartesian Product)
-- ❌ DANGEROUS: Missing WHERE clause
SELECT vend_name, prod_name, prod_price
FROM Vendors, Products;
Result: Every vendor paired with every product (54 rows instead of 9!)
vend_name prod_name prod_price
---------------- ---------------------------- ----------
Bears R Us 8 inch teddy bear 5.99
Bears R Us Fish bean bag toy 3.49 -- Wrong!
Bears R Us King doll 9.49 -- Wrong!
Bear Emporium 8 inch teddy bear 5.99 -- Wrong!
Bear Emporium Fish bean bag toy 3.49 -- Wrong!
...
With Proper WHERE Clause
-- ✅ CORRECT: Proper join condition
SELECT vend_name, prod_name, prod_price
FROM Vendors, Products
WHERE Vendors.vend_id = Products.vend_id;
Critical Rule: Always include WHERE clauses in joins to avoid Cartesian products.
Inner Joins (ANSI Syntax)
Modern INNER JOIN Syntax
SELECT vend_name, prod_name, prod_price
FROM Vendors
INNER JOIN Products ON Vendors.vend_id = Products.vend_id;
Advantages of ANSI Syntax:
- Clearer intent: Join logic separated from filtering
- Less error-prone: Harder to forget join conditions
- Industry standard: Preferred by SQL purists
Syntax Comparison
Traditional | ANSI INNER JOIN |
---|---|
FROM table1, table2 WHERE table1.id = table2.id | FROM table1 INNER JOIN table2 ON table1.id = table2.id |
Join condition in WHERE | Join condition in ON |
Mixing joins and filters | Clear separation |
Joining Multiple Tables
Three-Table Join Example
SELECT prod_name, vend_name, prod_price, quantity
FROM OrderItems, Products, Vendors
WHERE Products.vend_id = Vendors.vend_id
AND OrderItems.prod_id = Products.prod_id
AND order_num = 20007;
Output:
prod_name vend_name prod_price quantity
--------------- ------------- ---------- --------
18 inch teddy bear Bears R Us 11.9900 50
Fish bean bag toy Doll House Inc. 3.4900 100
Bird bean bag toy Doll House Inc. 3.4900 100
ANSI Syntax for Multiple Tables
SELECT prod_name, vend_name, prod_price, quantity
FROM OrderItems
INNER JOIN Products ON OrderItems.prod_id = Products.prod_id
INNER JOIN Vendors ON Products.vend_id = Vendors.vend_id
WHERE order_num = 20007;
Joins vs Subqueries
Subquery Approach (from Lesson 11)
-- Find customers who ordered product RGAN01
SELECT cust_name, cust_contact
FROM Customers
WHERE cust_id IN (
SELECT cust_id
FROM Orders
WHERE order_num IN (
SELECT order_num
FROM OrderItems
WHERE prod_id = 'RGAN01'
)
);
Join Approach (More Efficient)
-- Same result using joins
SELECT cust_name, cust_contact
FROM Customers, Orders, OrderItems
WHERE Customers.cust_id = Orders.cust_id
AND OrderItems.order_num = Orders.order_num
AND prod_id = 'RGAN01';
Output:
cust_name cust_contact
----------------------------- --------------------
Fun4All Denise L. Stephens
The Toy Store Kim Howard
Performance Comparison
Approach | Pros | Cons |
---|---|---|
Subqueries | Easier to understand | Often slower, more resource-intensive |
Joins | Usually faster, more efficient | Can be more complex to write |
Best Practices
1. Always Use Fully Qualified Column Names
-- ✅ Good: Clear and unambiguous
WHERE Customers.cust_id = Orders.cust_id
-- ❌ Bad: Ambiguous, may cause errors
WHERE cust_id = cust_id
2. Performance Considerations
- Join only necessary tables
- Use appropriate indexes on join columns
- Test both joins and subqueries for your specific use case
- Monitor performance with large datasets
3. Formatting for Readability
-- ✅ Well-formatted join
SELECT c.cust_name,
o.order_num,
oi.prod_id
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id
INNER JOIN OrderItems oi ON o.order_num = oi.order_num
WHERE oi.prod_id = 'RGAN01';
Challenge Solutions
Challenge 1: Customer Orders (Both Syntaxes)
Traditional Syntax:
SELECT cust_name, order_num
FROM Customers, Orders
WHERE Customers.cust_id = Orders.cust_id
ORDER BY cust_name, order_num;
ANSI Syntax:
SELECT cust_name, order_num
FROM Customers
INNER JOIN Orders ON Customers.cust_id = Orders.cust_id
ORDER BY cust_name, order_num;
Challenge 2: Customer Orders with Totals
SELECT c.cust_name,
o.order_num,
SUM(oi.item_price * oi.quantity) AS OrderTotal
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id
INNER JOIN OrderItems oi ON o.order_num = oi.order_num
GROUP BY c.cust_name, o.order_num
ORDER BY c.cust_name, o.order_num;
Challenge 3: BR01 Order Dates (Join Version)
SELECT o.order_date
FROM Orders o, OrderItems oi
WHERE o.order_num = oi.order_num
AND oi.prod_id = 'BR01'
ORDER BY o.order_date;
Challenge 4: BR01 Customer Details (ANSI Syntax)
SELECT c.cust_email
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id
INNER JOIN OrderItems oi ON o.order_num = oi.order_num
WHERE oi.prod_id = 'BR01';
Challenge 5: High-Value Customers
SELECT c.cust_name,
SUM(oi.item_price * oi.quantity) AS total_price
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id
INNER JOIN OrderItems oi ON o.order_num = oi.order_num
GROUP BY c.cust_name
HAVING SUM(oi.item_price * oi.quantity) >= 1000
ORDER BY c.cust_name;
Advanced Join Concepts
Table Aliases for Cleaner Code
SELECT c.cust_name, o.order_num, p.prod_name
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id
INNER JOIN OrderItems oi ON o.order_num = oi.order_num
INNER JOIN Products p ON oi.prod_id = p.prod_id;
Self-Joins (Preview)
-- Find employees and their managers
SELECT e.emp_name AS employee, m.emp_name AS manager
FROM Employees e
INNER JOIN Employees m ON e.manager_id = m.emp_id;
Advanced Joins
Table Aliases
Why Use Table Aliases?
Two primary reasons:
- Shorten SQL syntax for better readability
- Enable multiple uses of the same table in one query
Basic Table Alias Syntax
-- Without aliases (verbose)
SELECT Customers.cust_name, Orders.order_num
FROM Customers, Orders
WHERE Customers.cust_id = Orders.cust_id;
-- With aliases (concise)
SELECT c.cust_name, o.order_num
FROM Customers AS c, Orders AS o
WHERE c.cust_id = o.cust_id;
Complete Example with Multiple Tables
SELECT c.cust_name, c.cust_contact
FROM Customers AS c, Orders AS o, OrderItems AS oi
WHERE c.cust_id = o.cust_id
AND oi.order_num = o.order_num
AND prod_id = 'RGAN01';
Important Notes
- Oracle: Don't use
AS
keyword (Customers c
instead ofCustomers AS c
) - Scope: Table aliases only exist during query execution
- Usage: Can be used in SELECT, WHERE, ORDER BY, and other clauses
Self Joins
Problem Scenario
Find all customers who work for the same company as Jim Jones.
Subquery Approach
SELECT cust_id, cust_name, cust_contact
FROM Customers
WHERE cust_name = (SELECT cust_name
FROM Customers
WHERE cust_contact = 'Jim Jones');
Self Join Approach (More Efficient)
SELECT c1.cust_id, c1.cust_name, c1.cust_contact
FROM Customers AS c1, Customers AS c2
WHERE c1.cust_name = c2.cust_name
AND c2.cust_contact = 'Jim Jones';
Output:
cust_id cust_name cust_contact
------- ----------- --------------
1000000003 Fun4All Jim Jones
1000000004 Fun4All Denise L. Stephens
How Self Joins Work
- Same table used twice with different aliases (
c1
,c2
) - c2 finds Jim Jones's company
- c1 finds all customers from that company
- JOIN condition matches company names
Performance Advantage
Tip: Self joins are often processed faster than subqueries by most DBMSs.
Natural Joins
Purpose
Eliminate duplicate columns that appear in multiple joined tables.
Standard Join (with duplicates)
-- This would return cust_id from both tables
SELECT *
FROM Customers c, Orders o
WHERE c.cust_id = o.cust_id;
Natural Join (no duplicates)
SELECT c.*, o.order_num, o.order_date,
oi.prod_id, oi.quantity, oi.item_price
FROM Customers AS c, Orders AS o, OrderItems AS oi
WHERE c.cust_id = o.cust_id
AND oi.order_num = o.order_num
AND prod_id = 'RGAN01';
Key Points:
- Use
*
for one table only - Explicitly list columns from other tables
- Prevents duplicate columns in results
Note: Most inner joins you create are actually natural joins.
Outer Joins
Purpose
Include rows that have no related rows in the other table.
Use Cases
- Count orders per customer (including customers with no orders)
- List products with quantities (including unordered products)
- Calculate averages including customers who haven't purchased
Inner Join vs Outer Join
Inner Join (Excludes Unmatched Rows)
SELECT c.cust_id, o.order_num
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id;
Output:
cust_id order_num
---------- ---------
1000000001 20005
1000000001 20009
1000000003 20006
1000000004 20007
1000000005 20008
Left Outer Join (Includes All Left Table Rows)
SELECT c.cust_id, o.order_num
FROM Customers c
LEFT OUTER JOIN Orders o ON c.cust_id = o.cust_id;
Output:
cust_id order_num
---------- ---------
1000000001 20005
1000000001 20009
1000000002 NULL -- Customer with no orders
1000000003 20006
1000000004 20007
1000000005 20008
Types of Outer Joins
1. LEFT OUTER JOIN
-- Include ALL rows from left table (Customers)
SELECT c.cust_id, o.order_num
FROM Customers c
LEFT OUTER JOIN Orders o ON c.cust_id = o.cust_id;
2. RIGHT OUTER JOIN
-- Include ALL rows from right table (Orders)
SELECT c.cust_id, o.order_num
FROM Customers c
RIGHT OUTER JOIN Orders o ON c.cust_id = o.cust_id;
3. FULL OUTER JOIN
-- Include ALL rows from BOTH tables
SELECT c.cust_id, o.order_num
FROM Customers c
FULL OUTER JOIN Orders o ON c.cust_id = o.cust_id;
DBMS Support
Join Type | MySQL/MariaDB | PostgreSQL | SQL Server | Oracle | SQLite |
---|---|---|---|---|---|
LEFT OUTER | ✅ | ✅ | ✅ | ✅ | ✅ |
RIGHT OUTER | ✅ | ✅ | ✅ | ✅ | ❌ |
FULL OUTER | ❌ | ✅ | ✅ | ✅ | ❌ |
Converting Between Left and Right Joins
-- These are equivalent:
-- LEFT OUTER JOIN
SELECT c.cust_id, o.order_num
FROM Customers c
LEFT OUTER JOIN Orders o ON c.cust_id = o.cust_id;
-- RIGHT OUTER JOIN (tables reversed)
SELECT c.cust_id, o.order_num
FROM Orders o
RIGHT OUTER JOIN Customers c ON c.cust_id = o.cust_id;
Joins with Aggregate Functions
Count Orders per Customer (Inner Join)
SELECT c.cust_id,
COUNT(o.order_num) AS num_ord
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id
GROUP BY c.cust_id;
Output:
cust_id num_ord
---------- --------
1000000001 2
1000000003 1
1000000004 1
1000000005 1
Count Orders per Customer (Including Zero Orders)
SELECT c.cust_id,
COUNT(o.order_num) AS num_ord
FROM Customers c
LEFT OUTER JOIN Orders o ON c.cust_id = o.cust_id
GROUP BY c.cust_id;
Output:
cust_id num_ord
---------- -------
1000000001 2
1000000002 0 -- Customer with no orders included
1000000003 1
1000000004 1
1000000005 1
Advanced Aggregate Examples
Total Sales per Customer
SELECT c.cust_name,
COALESCE(SUM(oi.item_price * oi.quantity), 0) AS total_sales
FROM Customers c
LEFT OUTER JOIN Orders o ON c.cust_id = o.cust_id
LEFT OUTER JOIN OrderItems oi ON o.order_num = oi.order_num
GROUP BY c.cust_id, c.cust_name
ORDER BY total_sales DESC;
Product Performance Analysis
SELECT p.prod_name,
COUNT(oi.order_num) AS times_ordered,
COALESCE(SUM(oi.quantity), 0) AS total_quantity
FROM Products p
LEFT OUTER JOIN OrderItems oi ON p.prod_id = oi.prod_id
GROUP BY p.prod_id, p.prod_name
ORDER BY times_ordered DESC;
Challenge Solutions
Challenge 1: Customer Orders (Inner Join)
SELECT c.cust_name, o.order_num
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id
ORDER BY c.cust_name;
Challenge 2: All Customers (Including No Orders)
SELECT c.cust_name, o.order_num
FROM Customers c
LEFT OUTER JOIN Orders o ON c.cust_id = o.cust_id
ORDER BY c.cust_name;
Challenge 3: Products and Order Numbers
SELECT p.prod_name, oi.order_num
FROM Products p
LEFT OUTER JOIN OrderItems oi ON p.prod_id = oi.prod_id
ORDER BY p.prod_name;
Challenge 4: Product Order Counts
SELECT p.prod_name,
COUNT(oi.order_num) AS order_count
FROM Products p
LEFT OUTER JOIN OrderItems oi ON p.prod_id = oi.prod_id
GROUP BY p.prod_id, p.prod_name
ORDER BY order_count DESC;
Challenge 5: Vendor Product Counts
SELECT v.vend_id,
COUNT(p.prod_id) AS product_count
FROM Vendors v
LEFT OUTER JOIN Products p ON v.vend_id = p.vend_id
GROUP BY v.vend_id
ORDER BY product_count DESC;
Best Practices and Key Points
1. Choose the Right Join Type
- INNER JOIN: When you only want matching rows
- LEFT/RIGHT OUTER JOIN: When you need all rows from one table
- FULL OUTER JOIN: When you need all rows from both tables
2. Always Provide Join Conditions
-- ❌ Wrong: Creates Cartesian product
SELECT * FROM Customers, Orders;
-- ✅ Correct: Proper join condition
SELECT * FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id;
3. Test Joins Separately
-- Test each join individually before combining
-- 1. First join
SELECT * FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id;
-- 2. Add second join
SELECT * FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id
INNER JOIN OrderItems oi ON o.order_num = oi.order_num;
4. Use Meaningful Aliases
-- ✅ Good: Clear, meaningful aliases
SELECT cust.name, ord.date, item.price
FROM Customers cust
INNER JOIN Orders ord ON cust.id = ord.customer_id
INNER JOIN OrderItems item ON ord.num = item.order_num;
5. Handle NULLs in Outer Joins
-- Use COALESCE for better NULL handling
SELECT c.cust_name,
COALESCE(COUNT(o.order_num), 0) AS order_count
FROM Customers c
LEFT OUTER JOIN Orders o ON c.cust_id = o.cust_id
GROUP BY c.cust_id, c.cust_name;
Combined Queries (UNION)
Understanding Combined Queries
Combined queries (also called unions or compound queries) allow you to execute multiple SELECT statements and return results as a single result set.
When to Use Combined Queries
Two primary scenarios:
-
Different tables with similar structure
-
Combine customer data from multiple regional databases
-
Merge current and archived order data
-
Multiple queries against same table
-
Complex filtering that's easier with separate queries
-
Alternative to complex WHERE clauses
UNION vs Multiple WHERE Clauses
Most combined queries on the same table can be written as single queries with multiple WHERE conditions, but UNION offers more flexibility and sometimes better readability.
Creating Combined Queries with UNION
Basic UNION Syntax
SELECT column1, column2, column3
FROM table1
WHERE condition1
UNION
SELECT column1, column2, column3
FROM table2
WHERE condition2;
Practical Example: Customer Report
Requirement: Get all customers in IL, IN, MI, plus all Fun4All locations regardless of state.
Individual Queries
Query 1: Customers in specific states
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state IN ('IL','IN','MI');
Output:
cust_name cust_contact cust_email
----------- ------------- ------------
Village Toys John Smith sales@villagetoys.com
Fun4All Jim Jones jjones@fun4all.com
The Toy Store Kim Howard NULL
Query 2: All Fun4All locations
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_name = 'Fun4All';
Output:
cust_name cust_contact cust_email
------------ ----------- -------------
Fun4All Jim Jones jjones@fun4all.com
Fun4All Denise L. Stephens dstephens@fun4all.com
Combined with UNION
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state IN ('IL','IN','MI')
UNION
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_name = 'Fun4All';
Output:
cust_name cust_contact cust_email
----------- ----------- ----------------
Fun4All Denise L. Stephens dstephens@fun4all.com
Fun4All Jim Jones jjones@fun4all.com
Village Toys John Smith sales@villagetoys.com
The Toy Store Kim Howard NULL
Notice: Only 4 rows returned (not 5) because UNION automatically removes duplicates.
Equivalent WHERE Clause Approach
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state IN ('IL','IN','MI')
OR cust_name = 'Fun4All';
UNION Rules and Requirements
Essential Rules
- Two or more SELECT statements separated by UNION
- Same number of columns in each SELECT
- Compatible data types (must be convertible)
- Column order matters (first query determines column names)
Column Compatibility
-- ✅ Valid: Same columns, compatible types
SELECT prod_id, prod_name, prod_price
FROM Products
WHERE prod_price < 5
UNION
SELECT prod_id, prod_name, prod_price
FROM Products
WHERE vend_id = 'DLL01';
-- ❌ Invalid: Different number of columns
SELECT prod_id, prod_name
FROM Products
UNION
SELECT prod_id, prod_name, prod_price -- Extra column
FROM Products;
Column Names in Results
-- First query determines column names
SELECT prod_name, prod_price -- These names used in result
FROM Products
WHERE prod_price < 5
UNION
SELECT productname, price -- These names ignored
FROM Products
WHERE vend_id = 'DLL01';
-- Result columns: prod_name, prod_price
Using Aliases for Column Names
SELECT prod_name AS product, prod_price AS cost
FROM Products
WHERE prod_price < 5
UNION
SELECT prod_name, prod_price
FROM Products
WHERE vend_id = 'DLL01';
-- Result columns: product, cost
UNION vs UNION ALL
UNION (Default - Removes Duplicates)
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state IN ('IL','IN','MI')
UNION
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_name = 'Fun4All';
-- Returns 4 rows (duplicate removed)
UNION ALL (Keeps All Rows)
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state IN ('IL','IN','MI')
UNION ALL
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_name = 'Fun4All';
Output:
cust_name cust_contact cust_email
----------- ------------- ------------
Village Toys John Smith sales@villagetoys.com
Fun4All Jim Jones jjones@fun4all.com
The Toy Store Kim Howard NULL
Fun4All Jim Jones jjones@fun4all.com -- Duplicate kept
Fun4All Denise L. Stephens dstephens@fun4all.com
When to Use Each
Use UNION when | Use UNION ALL when |
---|---|
You want unique results | You need all occurrences |
Duplicates should be eliminated | Duplicates are meaningful |
Performance isn't critical | Performance is critical |
Performance Tip: UNION ALL is faster because it doesn't need to check for duplicates.
Sorting Combined Results
Single ORDER BY Rule
Only one ORDER BY clause allowed, and it must come after the final SELECT.
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state IN ('IL','IN','MI')
UNION
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_name = 'Fun4All'
ORDER BY cust_name, cust_contact; -- Sorts entire result set
Output:
cust_name cust_contact cust_email
----------- ------------- -------------
Fun4All Denise L. Stephens dstephens@fun4all.com
Fun4All Jim Jones jjones@fun4all.com
The Toy Store Kim Howard NULL
Village Toys John Smith sales@villagetoys.com
Column References in ORDER BY
-- ✅ Valid: Use column names from first SELECT
ORDER BY cust_name
-- ✅ Valid: Use column positions
ORDER BY 1, 2
-- ❌ Invalid: Use column names not in first SELECT
ORDER BY customer_name -- If first SELECT uses cust_name
Advanced UNION Techniques
Multiple Table UNION
-- Combine data from different tables
SELECT 'Customer' AS type, cust_name AS name, cust_city AS city
FROM Customers
UNION
SELECT 'Vendor' AS type, vend_name AS name, vend_city AS city
FROM Vendors
ORDER BY type, name;
Complex Filtering with UNION
-- High-value orders OR recent orders
SELECT order_num, order_date, 'High Value' AS category
FROM Orders o
WHERE (SELECT SUM(item_price * quantity)
FROM OrderItems oi
WHERE oi.order_num = o.order_num) > 100
UNION
SELECT order_num, order_date, 'Recent' AS category
FROM Orders
WHERE order_date >= DATEADD(day, -30, GETDATE())
ORDER BY order_date DESC;
Data Type Conversion
-- Mixing compatible types
SELECT prod_id, CAST(prod_price AS VARCHAR(10)) AS value
FROM Products
UNION
SELECT CAST(cust_id AS VARCHAR(10)), cust_name
FROM Customers;
Challenge Solutions
Challenge 1: Product Quantities
SELECT prod_id, quantity
FROM OrderItems
WHERE quantity = 100
UNION
SELECT prod_id, quantity
FROM OrderItems
WHERE prod_id LIKE 'BNBG%'
ORDER BY prod_id;
Challenge 2: Single SELECT Alternative
SELECT prod_id, quantity
FROM OrderItems
WHERE quantity = 100
OR prod_id LIKE 'BNBG%'
ORDER BY prod_id;
Challenge 3: Products and Customers
SELECT prod_name AS name
FROM Products
UNION
SELECT cust_name AS name
FROM Customers
ORDER BY name;
Challenge 4: What's Wrong?
-- ❌ INCORRECT: ORDER BY in wrong position
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state = 'MI'
ORDER BY cust_name; -- ❌ ORDER BY before UNION
UNION
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state = 'IL'
ORDER BY cust_name; -- ❌ Multiple ORDER BY clauses
-- ✅ CORRECT:
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state = 'MI'
UNION
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state = 'IL'
ORDER BY cust_name; -- ✅ Single ORDER BY at end
Other UNION Types (Advanced)
EXCEPT/MINUS
-- Rows in first query but NOT in second
SELECT prod_id FROM Products
EXCEPT
SELECT prod_id FROM OrderItems;
-- Products never ordered
INTERSECT
-- Rows that exist in BOTH queries
SELECT prod_id FROM Products
INTERSECT
SELECT prod_id FROM OrderItems;
-- Products that have been ordered
Note: EXCEPT and INTERSECT are less commonly supported and used than UNION.
Performance Considerations
Query Optimizer Impact
- Most DBMSs optimize UNION queries internally
- Performance difference between UNION and complex WHERE is often minimal
- Always test both approaches with your specific data
Best Practices
- Use UNION ALL when duplicates don't matter (faster)
- Index join columns used in WHERE clauses
- Test performance with realistic data volumes
- Consider alternatives like JOINs for complex scenarios
Data Insertion (INSERT)
Understanding Data Insertion
The INSERT statement is used to add new rows to database tables. It's one of the three most frequently used SQL statements alongside SELECT and UPDATE/DELETE.
Three main INSERT methods:
- Inserting a single complete row
- Inserting a single partial row
- Inserting results of a query
Security Note: INSERT operations may require special privileges in client/server DBMSs.
Inserting Complete Rows
Basic INSERT Syntax (Not Recommended)
-- ❌ Unsafe: Depends on column order
INSERT INTO Customers
VALUES(1000000006,
'Toy Land',
'123 Any Street',
'New York',
'NY',
'11111',
'USA',
NULL,
NULL);
Problems with basic syntax:
- Depends on column order in table definition
- Breaks if table structure changes
- No guarantee of column order consistency
Safe INSERT Syntax (Recommended)
-- ✅ Safe: Explicitly specifies columns
INSERT INTO Customers(cust_id,
cust_name,
cust_address,
cust_city,
cust_state,
cust_zip,
cust_country,
cust_contact,
cust_email)
VALUES(1000000006,
'Toy Land',
'123 Any Street',
'New York',
'NY',
'11111',
'USA',
NULL,
NULL);
Advantages of explicit column specification:
- Order independent: Columns can be in any order
- Future-proof: Works even if table structure changes
- Self-documenting: Clear what data goes where
- Error prevention: Reduces mistakes
Flexible Column Ordering
-- Columns can be specified in any order
INSERT INTO Customers(cust_id,
cust_contact, -- Different order
cust_email, -- Different order
cust_name,
cust_address,
cust_city,
cust_state,
cust_zip,
cust_country)
VALUES(1000000007,
NULL, -- Matches cust_contact
NULL, -- Matches cust_email
'Game Store',
'456 Game Ave',
'Chicago',
'IL',
'60601',
'USA');
Primary Key Constraints
-- ❌ This will fail if cust_id already exists
INSERT INTO Customers(cust_id, cust_name, ...)
VALUES(1000000006, 'Duplicate Store', ...);
-- Error: Primary key violation
Important: Primary key values must be unique. Attempting to insert duplicate primary keys will result in an error.
Inserting Partial Rows
You can omit columns when certain conditions are met:
Valid Partial Row Insert
INSERT INTO Customers(cust_id,
cust_name,
cust_address,
cust_city,
cust_state,
cust_zip,
cust_country)
VALUES(1000000008,
'Quick Shop',
'789 Fast Lane',
'Miami',
'FL',
'33101',
'USA');
-- cust_contact and cust_email omitted
When You Can Omit Columns
Condition | Description |
---|---|
NULL allowed | Column definition allows NULL values |
Default value | Column has a default value specified |
When You Cannot Omit Columns
-- ❌ This will fail if cust_name is required
INSERT INTO Customers(cust_id,
cust_address,
cust_city)
VALUES(1000000009,
'321 Main St',
'Boston');
-- Error: cust_name cannot be NULL
Inserting Retrieved Data (INSERT SELECT)
INSERT SELECT allows you to insert data from another table or query result.
Basic INSERT SELECT
INSERT INTO Customers(cust_id,
cust_contact,
cust_email,
cust_name,
cust_address,
cust_city,
cust_state,
cust_zip,
cust_country)
SELECT cust_id,
cust_contact,
cust_email,
cust_name,
cust_address,
cust_city,
cust_state,
cust_zip,
cust_country
FROM CustNew;
Key Points:
- Column positions matter, not names
- Number of columns must match
- Data types must be compatible
- Inserts multiple rows in single operation
INSERT SELECT with Filtering
-- Insert only active customers from another table
INSERT INTO Customers(cust_id, cust_name, cust_city, cust_state)
SELECT cust_id, cust_name, cust_city, cust_state
FROM CustImport
WHERE status = 'ACTIVE'
AND cust_city IS NOT NULL;
INSERT SELECT with Transformations
-- Insert with data transformations
INSERT INTO Customers(cust_id, cust_name, cust_email, cust_country)
SELECT customer_number,
UPPER(customer_name), -- Transform to uppercase
LOWER(email_address), -- Transform to lowercase
'USA' -- Constant value
FROM ImportTable
WHERE import_date >= '2024-01-01';
Copying Tables (CREATE SELECT)
CREATE SELECT creates a new table and copies data in one operation.
Standard Syntax
-- Most DBMSs
CREATE TABLE CustCopy AS
SELECT * FROM Customers;
SQL Server Syntax
-- SQL Server specific
SELECT * INTO CustCopy
FROM Customers;
Selective Column Copying
-- Copy only specific columns
CREATE TABLE CustSummary AS
SELECT cust_id, cust_name, cust_city, cust_state
FROM Customers
WHERE cust_country = 'USA';
Advanced CREATE SELECT Examples
-- Copy with joins and calculations
CREATE TABLE CustomerOrders AS
SELECT c.cust_id,
c.cust_name,
COUNT(o.order_num) AS total_orders,
COALESCE(SUM(oi.item_price * oi.quantity), 0) AS total_spent
FROM Customers c
LEFT JOIN Orders o ON c.cust_id = o.cust_id
LEFT JOIN OrderItems oi ON o.order_num = oi.order_num
GROUP BY c.cust_id, c.cust_name;
CREATE SELECT Features
Feature | Supported |
---|---|
WHERE clauses | ✅ Yes |
GROUP BY | ✅ Yes |
ORDER BY | ✅ Yes |
JOINs | ✅ Yes |
Multiple target tables | ❌ No |
DBMS Support: DB2 does not support CREATE SELECT syntax.
Best Practices
1. Always Specify Columns
-- ✅ Good: Explicit and safe
INSERT INTO Customers(cust_id, cust_name, cust_city)
VALUES(1000000010, 'Best Store', 'Seattle');
-- ❌ Bad: Depends on table structure
INSERT INTO Customers
VALUES(1000000010, 'Best Store', NULL, 'Seattle', ...);
2. Handle NULL Values Properly
-- Explicit NULL handling
INSERT INTO Customers(cust_id, cust_name, cust_contact, cust_email)
VALUES(1000000011, 'New Store', NULL, NULL);
-- Or omit NULL columns entirely
INSERT INTO Customers(cust_id, cust_name)
VALUES(1000000011, 'New Store');
3. Use Transactions for Multiple Inserts
BEGIN TRANSACTION;
INSERT INTO Customers(cust_id, cust_name) VALUES(1000000012, 'Store A');
INSERT INTO Customers(cust_id, cust_name) VALUES(1000000013, 'Store B');
INSERT INTO Customers(cust_id, cust_name) VALUES(1000000014, 'Store C');
COMMIT;
4. Validate Data Before Insertion
-- Check for duplicates before inserting
INSERT INTO Customers(cust_id, cust_name, cust_email)
SELECT 1000000015, 'Unique Store', 'unique@store.com'
WHERE NOT EXISTS (
SELECT 1 FROM Customers
WHERE cust_email = 'unique@store.com'
);
Performance Considerations
Single Row vs Bulk Operations
| Method | Performance | Use Case |-----|----- | Single INSERT | Slower | One-off additions | INSERT SELECT | Faster | Bulk data migration | Multiple INSERTs | Medium | Batch processing
Optimizing INSERT Performance
-- Disable indexes temporarily for large bulk inserts
-- (Database-specific syntax)
-- Use INSERT SELECT for bulk operations
INSERT INTO TargetTable(col1, col2, col3)
SELECT col1, col2, col3
FROM SourceTable
WHERE condition;
-- Batch multiple single inserts in transactions
Challenge Solutions
Challenge 1: Add Yourself to Customers
INSERT INTO Customers(cust_id,
cust_name,
cust_address,
cust_city,
cust_state,
cust_zip,
cust_country,
cust_contact,
cust_email)
VALUES(1000000020,
'John Doe',
'123 Main Street',
'Anytown',
'CA',
'90210',
'USA',
'John Doe',
'john.doe@email.com');
Challenge 2: Backup Tables
-- Backup Orders table
CREATE TABLE OrdersBackup AS
SELECT * FROM Orders;
-- Backup OrderItems table
CREATE TABLE OrderItemsBackup AS
SELECT * FROM OrderItems;
-- Or using SQL Server syntax:
SELECT * INTO OrdersBackup FROM Orders;
SELECT * INTO OrderItemsBackup FROM OrderItems;
Common Errors and Solutions
1. Primary Key Violations
-- Error: Duplicate primary key
-- Solution: Check existing values first
SELECT MAX(cust_id) FROM Customers; -- Find next available ID
2. NULL Constraint Violations
-- Error: Column doesn't allow NULL
-- Solution: Provide value or check table definition
DESCRIBE Customers; -- Check which columns allow NULL
3. Data Type Mismatches
-- Error: Invalid data type
-- Solution: Cast or convert data types
INSERT INTO Customers(cust_id, cust_name)
VALUES(CAST('1000000021' AS INTEGER), 'Type Safe Store');
Data Updating and Deleting (UPDATE & DELETE)
Understanding Data Modification
The UPDATE and DELETE statements are essential for maintaining and modifying database content. Both are powerful but potentially dangerous operations that require careful handling.
Critical Warning: Both UPDATE and DELETE can affect every row in a table if used without proper WHERE clauses.
Updating Data (UPDATE Statement)
UPDATE modifies existing data in table rows and can be used in two ways:
- Update specific rows (with WHERE clause)
- Update all rows (without WHERE clause - dangerous!)
Basic UPDATE Syntax
UPDATE table_name
SET column1 = value1,
column2 = value2
WHERE condition;
Three essential components:
- Table to update
- Column names and new values
- Filter condition (WHERE clause)
Single Column Update
-- Update email for specific customer
UPDATE Customers
SET cust_email = 'kim@thetoystore.com'
WHERE cust_id = 1000000005;
How it works:
- UPDATE Customers: Specifies target table
- SET cust_email = 'value': Assigns new value to column
- WHERE cust_id = 1000000005: Filters to specific row
Multiple Column Update
-- Update contact and email for specific customer
UPDATE Customers
SET cust_contact = 'Sam Roberts',
cust_email = 'sam@toyland.com'
WHERE cust_id = 1000000006;
Key points:
- Single SET command for multiple columns
- Comma-separated column = value pairs
- No comma after last column
Setting Values to NULL
-- Remove email address (set to NULL)
UPDATE Customers
SET cust_email = NULL
WHERE cust_id = 1000000005;
NULL vs Empty String:
- NULL: No value at all
- Empty string (''): Actual value that happens to be empty
Advanced UPDATE Techniques
Using Subqueries in UPDATE
-- Update customer's state based on zip code lookup
UPDATE Customers
SET cust_state = (
SELECT state_code
FROM ZipCodes
WHERE ZipCodes.zip = Customers.cust_zip
)
WHERE cust_country = 'USA';
Conditional Updates
-- Update prices with different logic
UPDATE Products
SET prod_price = CASE
WHEN prod_price < 10 THEN prod_price * 1.1 -- 10% increase
WHEN prod_price < 50 THEN prod_price * 1.05 -- 5% increase
ELSE prod_price * 1.02 -- 2% increase
END
WHERE vend_id = 'DLL01';
UPDATE with Calculations
-- Apply discount based on order quantity
UPDATE OrderItems
SET item_price = item_price * 0.9 -- 10% discount
WHERE quantity >= 10;
UPDATE with JOINs (Some DBMSs)
-- Update customer info from another table (SQL Server syntax)
UPDATE Customers
SET cust_email = ci.email_address
FROM Customers c
INNER JOIN CustomerImport ci ON c.cust_id = ci.customer_id
WHERE ci.email_address IS NOT NULL;
Deleting Data (DELETE Statement)
DELETE removes entire rows from tables and can be used in two ways:
- Delete specific rows (with WHERE clause)
- Delete all rows (without WHERE clause - very dangerous!)
Basic DELETE Syntax
DELETE FROM table_name
WHERE condition;
Single Row Deletion
-- Delete specific customer
DELETE FROM Customers
WHERE cust_id = 1000000006;
Key characteristics:
- DELETE FROM: Specifies table
- No column names: Deletes entire rows
- WHERE clause: Critical for targeting specific rows
Multiple Row Deletion
-- Delete all customers from specific state
DELETE FROM Customers
WHERE cust_state = 'XX';
-- Delete old orders
DELETE FROM Orders
WHERE order_date < '2020-01-01';
DELETE vs TRUNCATE
Operation | DELETE | TRUNCATE |
---|---|---|
Speed | Slower | Much faster |
WHERE clause | Supported | Not supported |
Logging | Full logging | Minimal logging |
Rollback | Can rollback | Limited rollback |
Triggers | Fires triggers | May not fire triggers |
-- Delete all rows (slow, logged)
DELETE FROM TempTable;
-- Delete all rows (fast, minimal logging)
TRUNCATE TABLE TempTable;
Safety Guidelines and Best Practices
1. Always Use WHERE Clauses
-- ✅ Safe: Targets specific rows
UPDATE Customers
SET cust_email = 'new@email.com'
WHERE cust_id = 1000000005;
-- ❌ Dangerous: Updates ALL rows
UPDATE Customers
SET cust_email = 'new@email.com';
2. Test with SELECT First
-- Step 1: Test your WHERE clause
SELECT cust_id, cust_name, cust_email
FROM Customers
WHERE cust_state = 'CA';
-- Step 2: If results look correct, then update
UPDATE Customers
SET cust_email = LOWER(cust_email)
WHERE cust_state = 'CA';
3. Use Primary Keys When Possible
-- ✅ Best: Use primary key for precision
DELETE FROM Customers
WHERE cust_id = 1000000006;
-- ✅ Good: Use unique constraints
UPDATE Customers
SET cust_contact = 'New Contact'
WHERE cust_email = 'unique@email.com';
4. Backup Before Major Operations
-- Create backup before bulk operations
CREATE TABLE CustomersBackup AS
SELECT * FROM Customers;
-- Perform bulk update
UPDATE Customers
SET cust_state = UPPER(cust_state)
WHERE cust_country = 'USA';
5. Use Transactions for Safety
-- Start transaction
BEGIN TRANSACTION;
-- Perform updates
UPDATE Customers
SET cust_email = LOWER(cust_email)
WHERE cust_country = 'USA';
-- Check results
SELECT COUNT(*) FROM Customers WHERE cust_email LIKE '%@%';
-- If satisfied, commit; otherwise rollback
COMMIT;
-- OR
-- ROLLBACK;
Foreign Key Protection
Referential Integrity Benefits:
-- This will fail if customer has orders
DELETE FROM Customers
WHERE cust_id = 1000000001;
-- Error: Cannot delete referenced row
-- This protects against orphaned records
DELETE FROM Vendors
WHERE vend_id = 'DLL01';
-- Error: Products reference this vendor
Foreign keys prevent:
- Orphaned records in child tables
- Accidental deletion of referenced data
- Data integrity violations
Common Patterns and Examples
Bulk Data Cleanup
-- Remove inactive customers (no orders in 2 years)
DELETE FROM Customers
WHERE cust_id NOT IN (
SELECT DISTINCT cust_id
FROM Orders
WHERE order_date >= DATEADD(year, -2, GETDATE())
);
Data Standardization
-- Standardize phone number format
UPDATE Customers
SET cust_phone = REPLACE(REPLACE(REPLACE(cust_phone, '(', ''), ')', ''), '-', '')
WHERE cust_phone IS NOT NULL;
-- Standardize state codes to uppercase
UPDATE Customers
SET cust_state = UPPER(cust_state)
WHERE cust_country = 'USA';
Conditional Deletion
-- Delete duplicate records (keep newest)
DELETE c1 FROM Customers c1
INNER JOIN Customers c2
WHERE c1.cust_email = c2.cust_email
AND c1.cust_id < c2.cust_id;
Error Handling and Recovery
Common Errors:
- Constraint Violations
-- Error: Foreign key constraint
DELETE FROM Vendors WHERE vend_id = 'DLL01';
-- Solution: Delete dependent records first
- Missing WHERE Clause
-- Accidentally updates all rows
UPDATE Products SET prod_price = 9.99;
-- Solution: Use transactions and test first
- Data Type Mismatches
-- Error: Invalid data type
UPDATE Customers SET cust_id = 'ABC123';
-- Solution: Ensure compatible data types
Recovery Strategies:
-- 1. Restore from backup
RESTORE TABLE Customers FROM BACKUP;
-- 2. Rollback transaction
ROLLBACK;
-- 3. Reverse operation (if possible)
UPDATE Customers
SET cust_email = old_email_backup
WHERE cust_id IN (affected_ids);
Challenge Solutions
Challenge 1: Uppercase State Abbreviations
-- Update vendor states to uppercase
UPDATE Vendors
SET vend_state = UPPER(vend_state)
WHERE vend_country = 'USA';
-- Update customer states to uppercase
UPDATE Customers
SET cust_state = UPPER(cust_state)
WHERE cust_country = 'USA';
Challenge 2: Delete Yourself from Customers
-- Step 1: Test the WHERE clause first
SELECT cust_id, cust_name, cust_email
FROM Customers
WHERE cust_name = 'Your Name' -- Replace with your actual name
AND cust_email = 'your.email@domain.com'; -- Replace with your email
-- Step 2: If results look correct, then delete
DELETE FROM Customers
WHERE cust_name = 'Your Name'
AND cust_email = 'your.email@domain.com';
Advanced Safety Techniques
1. Row Count Validation
-- Check how many rows will be affected
SELECT COUNT(*) AS rows_to_update
FROM Customers
WHERE cust_state = 'ca';
-- If count looks reasonable, proceed
UPDATE Customers
SET cust_state = UPPER(cust_state)
WHERE cust_state = 'ca';
2. Staged Updates
-- Update in small batches
UPDATE Customers
SET cust_email = LOWER(cust_email)
WHERE cust_id BETWEEN 1000000001 AND 1000000010;
-- Check results before continuing
SELECT * FROM Customers WHERE cust_id BETWEEN 1000000001 AND 1000000010;
3. Audit Trail
-- Log changes before making them
INSERT INTO ChangeLog (table_name, operation, old_values, new_values, change_date)
SELECT 'Customers', 'UPDATE',
CONCAT('email:', cust_email),
CONCAT('email:', LOWER(cust_email)),
GETDATE()
FROM Customers
WHERE cust_email != LOWER(cust_email);
-- Then perform the update
UPDATE Customers
SET cust_email = LOWER(cust_email)
WHERE cust_email != LOWER(cust_email);
Table Creation and Manipulation
Understanding Table Operations
SQL provides powerful statements for table structure management:
- CREATE TABLE: Create new tables
- ALTER TABLE: Modify existing table structure
- DROP TABLE: Delete entire tables
- RENAME TABLE: Change table names (DBMS-specific)
Important: These operations affect table structure, not just data, and should be used with extreme caution.
Creating Tables (CREATE TABLE)
Two approaches to table creation:
- Interactive tools: GUI-based database management tools
- SQL statements: Direct programmatic control
Basic CREATE TABLE Syntax
CREATE TABLE table_name
(
column1_name datatype constraints,
column2_name datatype constraints,
column3_name datatype constraints
);
Essential Information Required:
- Table name (must be unique in database)
- Column names and definitions
- Data types for each column
- Constraints (NULL/NOT NULL, defaults, etc.)
Complete Table Creation Example
CREATE TABLE Products
(
prod_id CHAR(10) NOT NULL,
vend_id CHAR(10) NOT NULL,
prod_name CHAR(254) NOT NULL,
prod_price DECIMAL(8,2) NOT NULL,
prod_desc VARCHAR(1000) NULL
);
Column Definition Components:
- prod_id: Column name
- CHAR(10): Data type with length
- NOT NULL: Constraint (required field)
Data Types Overview
Category | Examples | Description |
---|---|---|
Character | CHAR(n), VARCHAR(n) | Fixed/variable length text |
Numeric | INTEGER, DECIMAL(p,s) | Whole numbers, decimals |
Date/Time | DATE, DATETIME, TIMESTAMP | Date and time values |
Boolean | BOOLEAN, BIT | True/false values |
Working with NULL Values
NULL vs NOT NULL Constraints
CREATE TABLE Orders
(
order_num INTEGER NOT NULL, -- Required field
order_date DATETIME NOT NULL, -- Required field
cust_id CHAR(10) NOT NULL -- Required field
);
Mixed NULL/NOT NULL Example
CREATE TABLE Vendors
(
vend_id CHAR(10) NOT NULL, -- Required
vend_name CHAR(50) NOT NULL, -- Required
vend_address CHAR(50) , -- Optional (NULL allowed)
vend_city CHAR(50) , -- Optional
vend_state CHAR(5) , -- Optional
vend_zip CHAR(10) , -- Optional
vend_country CHAR(50) -- Optional
);
NULL Value Rules:
- NOT NULL: Column must have a value
- NULL (default): Column can be empty
- Primary keys: Must always be NOT NULL
Important Distinctions:
Value Type | Description | Example |
---|---|---|
NULL | No value at all | NULL |
Empty string | Valid empty text | '' |
Zero | Numeric value | 0 |
-- These are all different:
INSERT INTO Products (prod_id, prod_name, prod_price)
VALUES ('P001', '', 0); -- Empty string, zero price
-- vs
VALUES ('P002', NULL, NULL); -- NULL name and price
Specifying Default Values
DEFAULT keyword provides automatic values when none specified.
Basic Default Values
CREATE TABLE OrderItems
(
order_num INTEGER NOT NULL,
order_item INTEGER NOT NULL,
prod_id CHAR(10) NOT NULL,
quantity INTEGER NOT NULL DEFAULT 1, -- Default to 1
item_price DECIMAL(8,2) NOT NULL
);
System Date Defaults
Different DBMSs use different functions for current date/time:
DBMS | Function | Example |
---|---|---|
DB2 | CURRENT_DATE | DEFAULT CURRENT_DATE |
MySQL | CURRENT_DATE(), NOW() | DEFAULT CURRENT_DATE() |
Oracle | SYSDATE | DEFAULT SYSDATE |
PostgreSQL | CURRENT_DATE | DEFAULT CURRENT_DATE |
SQL Server | GETDATE() | DEFAULT GETDATE() |
SQLite | date('now') | DEFAULT date('now') |
Date Default Example
CREATE TABLE Orders
(
order_num INTEGER NOT NULL,
order_date DATETIME NOT NULL DEFAULT GETDATE(), -- SQL Server
cust_id CHAR(10) NOT NULL,
status CHAR(10) DEFAULT 'PENDING' -- Text default
);
Updating Table Structure (ALTER TABLE)
ALTER TABLE modifies existing table structure, but capabilities vary significantly between DBMSs.
Common ALTER TABLE Limitations:
- Adding columns: Usually supported
- Removing columns: Often restricted
- Changing data types: Limited support
- Populated tables: More restrictions
Adding Columns
-- Add new column to existing table
ALTER TABLE Vendors
ADD vend_phone CHAR(20);
Adding Multiple Columns
-- Add several columns at once (some DBMSs)
ALTER TABLE Vendors
ADD vend_email VARCHAR(100),
ADD vend_website VARCHAR(200);
Adding Columns with Constraints
-- Add column with default value
ALTER TABLE Products
ADD prod_weight DECIMAL(5,2) DEFAULT 0.00;
-- Add required column (be careful with existing data!)
ALTER TABLE Customers
ADD cust_type CHAR(10) NOT NULL DEFAULT 'REGULAR';
Dropping Columns (Limited Support)
-- Remove column (not supported by all DBMSs)
ALTER TABLE Vendors
DROP COLUMN vend_phone;
Complex Table Changes Process
For major structural changes, use this manual process:
-- 1. Create new table with desired structure
CREATE TABLE Vendors_New
(
vend_id CHAR(10) NOT NULL,
vend_name CHAR(50) NOT NULL,
vend_contact VARCHAR(100), -- New column
vend_city CHAR(50),
vend_country CHAR(50)
-- Note: removed vend_address, vend_state, vend_zip
);
-- 2. Copy data from old table
INSERT INTO Vendors_New (vend_id, vend_name, vend_city, vend_country)
SELECT vend_id, vend_name, vend_city, vend_country
FROM Vendors;
-- 3. Verify data integrity
SELECT COUNT(*) FROM Vendors; -- Original count
SELECT COUNT(*) FROM Vendors_New; -- New count should match
-- 4. Rename tables
DROP TABLE Vendors; -- Remove old table
ALTER TABLE Vendors_New RENAME TO Vendors; -- Rename new table
Deleting Tables (DROP TABLE)
DROP TABLE permanently removes entire tables and all data.
Basic DROP TABLE Syntax
-- Permanently delete table
DROP TABLE CustCopy;
Safety Considerations:
- No confirmation prompt
- No undo capability
- All data permanently lost
- All indexes, triggers, constraints removed
Safe Deletion Process
-- 1. Create backup first
CREATE TABLE CustCopy_Backup AS
SELECT * FROM CustCopy;
-- 2. Verify backup
SELECT COUNT(*) FROM CustCopy;
SELECT COUNT(*) FROM CustCopy_Backup;
-- 3. Drop original table
DROP TABLE CustCopy;
Foreign Key Protection
-- This may fail if table is referenced by foreign keys
DROP TABLE Vendors;
-- Error: Cannot drop table referenced by foreign key
-- Solution: Drop dependent tables first, or remove foreign keys
ALTER TABLE Products DROP FOREIGN KEY fk_products_vendors;
DROP TABLE Vendors;
Renaming Tables
Table renaming syntax varies significantly between DBMSs:
MySQL, MariaDB, Oracle, PostgreSQL:
RENAME TABLE old_name TO new_name;
-- or
ALTER TABLE old_name RENAME TO new_name;
SQL Server:
EXEC sp_rename 'old_name', 'new_name';
SQLite:
ALTER TABLE old_name RENAME TO new_name;
DB2:
RENAME TABLE old_name TO new_name;
Best Practices and Safety Guidelines
1. Always Backup Before Changes
-- Create complete backup
CREATE TABLE Products_Backup AS
SELECT * FROM Products;
-- Or export to file (DBMS-specific)
2. Test on Development Environment
-- Test structure changes on copy first
CREATE TABLE Products_Test AS
SELECT * FROM Products;
ALTER TABLE Products_Test
ADD new_column VARCHAR(50);
-- Verify results before applying to production
3. Plan Table Design Carefully
-- Good initial design reduces need for ALTER TABLE
CREATE TABLE Customers
(
cust_id INTEGER NOT NULL PRIMARY KEY,
cust_name VARCHAR(100) NOT NULL,
cust_email VARCHAR(255), -- Plan for long emails
cust_phone VARCHAR(20), -- Plan for international formats
cust_address TEXT, -- Flexible address storage
cust_created DATETIME DEFAULT GETDATE(),
cust_status CHAR(10) DEFAULT 'ACTIVE',
cust_notes TEXT -- Future expansion
);
4. Use Transactions for Safety
BEGIN TRANSACTION;
-- Perform table operations
ALTER TABLE Products
ADD prod_category VARCHAR(50) DEFAULT 'General';
-- Verify results
SELECT COUNT(*) FROM Products WHERE prod_category IS NULL;
-- If satisfied, commit; otherwise rollback
COMMIT;
-- OR ROLLBACK;
Common Patterns and Examples
Audit Table Creation
CREATE TABLE Customer_Audit
(
audit_id INTEGER NOT NULL PRIMARY KEY,
cust_id INTEGER NOT NULL,
operation CHAR(10) NOT NULL, -- INSERT, UPDATE, DELETE
old_values TEXT,
new_values TEXT,
changed_by VARCHAR(50) NOT NULL,
changed_date DATETIME DEFAULT GETDATE()
);
Lookup Table Creation
CREATE TABLE Product_Categories
(
category_id INTEGER NOT NULL PRIMARY KEY,
category_name VARCHAR(50) NOT NULL UNIQUE,
category_desc TEXT,
is_active BOOLEAN DEFAULT TRUE,
created_date DATETIME DEFAULT GETDATE()
);
Temporary Table Creation
-- Create temporary table for data processing
CREATE TABLE #TempSales -- SQL Server syntax
(
sale_id INTEGER,
sale_amount DECIMAL(10,2),
sale_date DATE
);
Challenge Solutions
Challenge 1: Add Website Column
-- Add website column to Vendors table
ALTER TABLE Vendors
ADD vend_web VARCHAR(255); -- Large enough for URLs
Challenge 2: Update Vendor Websites
-- Update vendors with website information
UPDATE Vendors
SET vend_web = 'https://www.bearsrus.com'
WHERE vend_id = 'BRS01';
UPDATE Vendors
SET vend_web = 'https://www.dollhouse.com'
WHERE vend_id = 'DLL01';
UPDATE Vendors
SET vend_web = 'https://www.funandgames.co.uk'
WHERE vend_id = 'FNG01';
-- Or update multiple at once with CASE
UPDATE Vendors
SET vend_web = CASE vend_id
WHEN 'BRS01' THEN 'https://www.bearsrus.com'
WHEN 'DLL01' THEN 'https://www.dollhouse.com'
WHEN 'FNG01' THEN 'https://www.funandgames.co.uk'
ELSE vend_web
END
WHERE vend_id IN ('BRS01', 'DLL01', 'FNG01');
Advanced Table Management
Partitioned Tables
-- Create partitioned table (SQL Server example)
CREATE TABLE Sales_Partitioned
(
sale_id INTEGER NOT NULL,
sale_date DATE NOT NULL,
sale_amount DECIMAL(10,2) NOT NULL
)
PARTITION BY RANGE (YEAR(sale_date))
(
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026)
);
Computed Columns
-- Table with computed column (SQL Server)
CREATE TABLE OrderItems_Enhanced
(
order_num INTEGER NOT NULL,
quantity INTEGER NOT NULL,
item_price DECIMAL(8,2) NOT NULL,
line_total AS (quantity * item_price) -- Computed column
);
Views
Understanding Views
Views are virtual tables that contain queries rather than data. Unlike physical tables, views dynamically retrieve data when accessed.
Key Characteristics:
- No data storage: Views contain only query definitions
- Dynamic results: Data retrieved from underlying tables in real-time
- Virtual abstraction: Appear and behave like regular tables
- Query encapsulation: Hide complex SQL behind simple interfaces
Real-World Example
Instead of this complex query every time:
-- Complex join query
SELECT cust_name, cust_contact
FROM Customers, Orders, OrderItems
WHERE Customers.cust_id = Orders.cust_id
AND OrderItems.order_num = Orders.order_num
AND prod_id = 'RGAN01';
Create a view once:
CREATE VIEW ProductCustomers AS
SELECT cust_name, cust_contact, prod_id
FROM Customers, Orders, OrderItems
WHERE Customers.cust_id = Orders.cust_id
AND OrderItems.order_num = Orders.order_num;
Then use simple queries:
-- Simple query using view
SELECT cust_name, cust_contact
FROM ProductCustomers
WHERE prod_id = 'RGAN01';
Why Use Views
1. Reuse SQL Statements
- Write complex queries once
- Use repeatedly without rewriting
- Maintain consistency across applications
2. Simplify Complex Operations
- Hide join complexity from users
- Abstract complicated business logic
- Provide intuitive data access
3. Expose Partial Data
- Show only relevant columns
- Filter sensitive information
- Create focused data subsets
4. Secure Data Access
- Grant access to views, not tables
- Control what users can see
- Implement row-level security
5. Format and Transform Data
- Present data in user-friendly formats
- Apply consistent calculations
- Standardize data representation
View Rules and Restrictions
General Rules:
- Unique names: Cannot duplicate table or view names
- No limit: Create as many views as needed
- Security required: Need appropriate permissions
- Nesting allowed: Views can reference other views
Common Restrictions:
Restriction | Description | Impact |
---|---|---|
No ORDER BY | Many DBMSs prohibit ORDER BY in views | Sort in SELECT instead |
Named columns | Calculated fields must have aliases | Use AS keyword |
No indexes | Views cannot be indexed | Performance considerations |
Read-only | Some DBMSs (SQLite) are read-only | No INSERT/UPDATE/DELETE |
No triggers | Views cannot have triggers | Limited automation |
Performance Considerations:
- Complex views: Multiple joins can slow performance
- Nested views: Each level adds processing overhead
- Real-time processing: No data caching in views
Creating Views
Basic CREATE VIEW Syntax
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE conditions;
View Management:
-- Create view
CREATE VIEW ViewName AS SELECT ...;
-- Drop view
DROP VIEW ViewName;
-- Update view (drop and recreate)
DROP VIEW ViewName;
CREATE VIEW ViewName AS SELECT ...;
Using Views to Simplify Complex Joins
Complex Join Example
-- Create view for product customers
CREATE VIEW ProductCustomers AS
SELECT cust_name, cust_contact, prod_id
FROM Customers, Orders, OrderItems
WHERE Customers.cust_id = Orders.cust_id
AND OrderItems.order_num = Orders.order_num;
Using the View
-- Find customers who ordered specific product
SELECT cust_name, cust_contact
FROM ProductCustomers
WHERE prod_id = 'RGAN01';
Output:
cust_name cust_contact
------------------- ------------------
Fun4All Denise L. Stephens
The Toy Store Kim Howard
Benefits:
- Simplified queries: No need to remember join syntax
- Reusable: Works for any product ID
- Maintainable: Change join logic in one place
Using Views to Reformat Data
Data Formatting View
-- SQL Server syntax
CREATE VIEW VendorLocations AS
SELECT RTRIM(vend_name) + ' (' + RTRIM(vend_country) + ')'
AS vend_title
FROM Vendors;
-- Standard SQL syntax
CREATE VIEW VendorLocations AS
SELECT RTRIM(vend_name) || ' (' || RTRIM(vend_country) || ')'
AS vend_title
FROM Vendors;
Using the Formatting View
SELECT * FROM VendorLocations;
Output:
vend_title
-----------------------------------------------------------
Bear Emporium (USA)
Bears R Us (USA)
Doll House Inc. (USA)
Fun and Games (England)
Furball Inc. (USA)
Jouets et ours (France)
Advantages:
- Consistent formatting: Same format everywhere
- Easy maintenance: Change format in one place
- User-friendly: Hide complex concatenation logic
Using Views to Filter Data
Filtering View Example
-- Create view for customers with email addresses
CREATE VIEW CustomerEMailList AS
SELECT cust_id, cust_name, cust_email
FROM Customers
WHERE cust_email IS NOT NULL;
Using the Filter View
SELECT * FROM CustomerEMailList;
Output:
cust_id cust_name cust_email
---------- ------------ ---------------------
1000000001 Village Toys sales@villagetoys.com
1000000003 Fun4All jjones@fun4all.com
1000000004 Fun4All dstephens@fun4all.com
Combining WHERE Clauses:
-- Additional filtering on view
SELECT cust_name, cust_email
FROM CustomerEMailList
WHERE cust_name LIKE 'Fun%';
-- Combines view's WHERE with query's WHERE
Using Views with Calculated Fields
Calculated Field View
CREATE VIEW OrderItemsExpanded AS
SELECT order_num,
prod_id,
quantity,
item_price,
quantity * item_price AS expanded_price
FROM OrderItems;
Using the Calculated View
SELECT *
FROM OrderItemsExpanded
WHERE order_num = 20008;
Output:
order_num prod_id quantity item_price expanded_price
--------- ------- --------- ---------- --------------
20008 RGAN01 5 4.99 24.95
20008 BR03 5 11.99 59.95
20008 BNBG01 10 3.49 34.90
Benefits:
- Automatic calculations: No need to repeat formulas
- Consistency: Same calculation logic everywhere
- Simplicity: Hide complex mathematical operations
Advanced View Patterns
Security View
-- Employee view without salary information
CREATE VIEW EmployeePublic AS
SELECT emp_id, emp_name, emp_department, emp_title
FROM Employees;
-- Excludes emp_salary, emp_ssn, etc.
Aggregation View
-- Sales summary by customer
CREATE VIEW CustomerSales AS
SELECT c.cust_id,
c.cust_name,
COUNT(o.order_num) AS total_orders,
SUM(oi.quantity * oi.item_price) AS total_spent
FROM Customers c
LEFT JOIN Orders o ON c.cust_id = o.cust_id
LEFT JOIN OrderItems oi ON o.order_num = oi.order_num
GROUP BY c.cust_id, c.cust_name;
Union View
-- Combine current and archived orders
CREATE VIEW AllOrders AS
SELECT order_num, order_date, cust_id, 'CURRENT' AS status
FROM Orders
UNION ALL
SELECT order_num, order_date, cust_id, 'ARCHIVED' AS status
FROM OrdersArchive;
Nested Views
-- Base view
CREATE VIEW ActiveCustomers AS
SELECT * FROM Customers
WHERE status = 'ACTIVE';
-- View using another view
CREATE VIEW ActiveCustomerOrders AS
SELECT ac.cust_name, o.order_num, o.order_date
FROM ActiveCustomers ac
JOIN Orders o ON ac.cust_id = o.cust_id;
Best Practices
1. Design for Reusability
-- ✅ Good: General purpose view
CREATE VIEW ProductCustomers AS
SELECT cust_name, cust_contact, prod_id
FROM Customers, Orders, OrderItems
WHERE Customers.cust_id = Orders.cust_id
AND OrderItems.order_num = Orders.order_num;
-- ❌ Bad: Too specific
CREATE VIEW RGAN01Customers AS
SELECT cust_name, cust_contact
FROM Customers, Orders, OrderItems
WHERE Customers.cust_id = Orders.cust_id
AND OrderItems.order_num = Orders.order_num
AND prod_id = 'RGAN01';
2. Use Meaningful Names
-- ✅ Good: Descriptive names
CREATE VIEW CustomerOrderSummary AS ...
CREATE VIEW ActiveProductCatalog AS ...
-- ❌ Bad: Unclear names
CREATE VIEW View1 AS ...
CREATE VIEW TempView AS ...
3. Document Complex Views
-- Customer sales performance view
-- Includes: total orders, revenue, average order value
-- Excludes: inactive customers, cancelled orders
CREATE VIEW CustomerSalesMetrics AS
SELECT
c.cust_id,
c.cust_name,
COUNT(o.order_num) AS total_orders,
SUM(oi.quantity * oi.item_price) AS total_revenue,
AVG(oi.quantity * oi.item_price) AS avg_order_value
FROM Customers c
JOIN Orders o ON c.cust_id = o.cust_id
JOIN OrderItems oi ON o.order_num = oi.order_num
WHERE c.status = 'ACTIVE'
AND o.status != 'CANCELLED'
GROUP BY c.cust_id, c.cust_name;
Challenge Solutions
Challenge 1: CustomersWithOrders View
-- Create view for customers who have placed orders
CREATE VIEW CustomersWithOrders AS
SELECT DISTINCT c.*
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id;
-- Test the view
SELECT cust_name, cust_contact
FROM CustomersWithOrders
ORDER BY cust_name;
Challenge 2: What's Wrong?
-- ❌ INCORRECT: ORDER BY not allowed in many DBMS views
CREATE VIEW OrderItemsExpanded AS
SELECT order_num,
prod_id,
quantity,
item_price,
quantity*item_price AS expanded_price
FROM OrderItems
ORDER BY order_num; -- This causes the error
-- ✅ CORRECT: Remove ORDER BY from view
CREATE VIEW OrderItemsExpanded AS
SELECT order_num,
prod_id,
quantity,
item_price,
quantity*item_price AS expanded_price
FROM OrderItems;
-- Apply ORDER BY when querying the view
SELECT * FROM OrderItemsExpanded ORDER BY order_num;
Common View Patterns
Reporting Views
-- Monthly sales report view
CREATE VIEW MonthlySalesReport AS
SELECT
YEAR(o.order_date) AS sale_year,
MONTH(o.order_date) AS sale_month,
COUNT(DISTINCT o.order_num) AS total_orders,
COUNT(DISTINCT o.cust_id) AS unique_customers,
SUM(oi.quantity * oi.item_price) AS total_revenue
FROM Orders o
JOIN OrderItems oi ON o.order_num = oi.order_num
GROUP BY YEAR(o.order_date), MONTH(o.order_date);
Data Warehouse Views
-- Fact table view for analytics
CREATE VIEW SalesFactView AS
SELECT
o.order_date,
c.cust_id,
c.cust_state,
p.prod_id,
p.vend_id,
oi.quantity,
oi.item_price,
oi.quantity * oi.item_price AS line_total
FROM Orders o
JOIN Customers c ON o.cust_id = c.cust_id
JOIN OrderItems oi ON o.order_num = oi.order_num
JOIN Products p ON oi.prod_id = p.prod_id;
Stored Procedures
Understanding Stored Procedures
Stored procedures are collections of one or more SQL statements saved for future use. Think of them as database "programs" that can execute complex operations with a single call.
Real-World Scenario: Order Processing
Consider processing a customer order:
- Check inventory: Verify items are in stock
- Reserve items: Update quantities to prevent overselling
- Handle backorders: Coordinate with vendors for out-of-stock items
- Notify customer: Send status updates on availability
This requires multiple SQL statements across many tables with conditional logic - perfect for a stored procedure.
Key Characteristics:
- Batch operations: Execute multiple statements as one unit
- Conditional logic: Include IF/ELSE, loops, and variables
- Reusable code: Write once, use many times
- Server-side execution: Run directly on database server
Why Use Stored Procedures
Primary Benefits:
1. Simplicity
- Encapsulate complexity: Hide complex operations behind simple calls
- Consistent interface: Same procedure works across applications
- Reduced coding: No need to rewrite complex logic
2. Security
- Access control: Grant procedure execution without table access
- Data protection: Prevent direct table manipulation
- Reduced attack surface: Limit SQL injection opportunities
3. Performance
- Compiled code: Pre-compiled for faster execution
- Reduced network traffic: Single call instead of multiple statements
- Optimized execution: Database can optimize procedure plans
4. Data Consistency
- Standardized operations: Everyone uses same business logic
- Error prevention: Centralized validation and processing
- Transaction control: Ensure data integrity across operations
5. Change Management
- Centralized updates: Modify logic in one place
- Version control: Track procedure changes over time
- Backward compatibility: Maintain consistent interfaces
Drawbacks and Considerations
Challenges:
Challenge | Description | Mitigation |
---|---|---|
Portability | Syntax varies dramatically between DBMSs | Use standard SQL where possible |
Complexity | Harder to write than simple SQL | Invest in training and documentation |
Debugging | More difficult to troubleshoot | Use comprehensive logging |
Version control | Database objects harder to track | Use database migration tools |
Testing | Unit testing more complex | Develop testing frameworks |
Database Compatibility:
-- ❌ SQLite: Does not support stored procedures
-- ✅ Oracle: Full support with PL/SQL
-- ✅ SQL Server: Full support with T-SQL
-- ✅ MySQL: Supports stored procedures
-- ✅ PostgreSQL: Supports stored functions
Executing Stored Procedures
Basic EXECUTE Syntax
-- Basic execution
EXECUTE procedure_name;
-- With parameters
EXECUTE procedure_name(param1, param2, param3);
Example: Adding a New Product
-- Execute stored procedure with parameters
EXECUTE AddNewProduct('JTS01',
'Stuffed Eiffel Tower',
6.49,
'Plush stuffed toy with the text La Tour Eiffel in red white and blue');
What This Procedure Does:
- Validates input: Ensures all required parameters have values
- Generates ID: Creates unique primary key automatically
- Inserts data: Adds new product to Products table
- Returns result: Confirms successful creation
Parameter Types:
-- Input parameters (default)
EXECUTE GetCustomer(1000000001);
-- Output parameters
EXECUTE GetCustomerCount(@count OUTPUT);
-- Input/Output parameters
EXECUTE UpdateInventory(@product_id, @quantity INOUT);
Execution Options:
Optional Parameters:
-- With default values
EXECUTE CreateOrder(@cust_id, @priority = 'NORMAL');
Named Parameters:
-- Out-of-order specification
EXECUTE CreateOrder(priority = 'HIGH', cust_id = 1000000001);
Return Values:
-- Capture return code
DECLARE @result INT;
EXECUTE @result = ValidateCustomer(1000000001);
Creating Stored Procedures
Oracle Syntax (PL/SQL)
-- Create procedure to count customers with email
CREATE PROCEDURE MailingListCount (
ListCount OUT INTEGER
)
IS
v_rows INTEGER;
BEGIN
SELECT COUNT(*) INTO v_rows
FROM Customers
WHERE cust_email IS NOT NULL;
ListCount := v_rows;
END;
/
Executing Oracle Procedure:
-- Declare variable for output
DECLARE
ReturnValue NUMBER;
BEGIN
-- Execute procedure
MailingListCount(ReturnValue);
-- Display result
DBMS_OUTPUT.PUT_LINE('Count: ' || ReturnValue);
END;
/
SQL Server Syntax (T-SQL)
-- Create procedure to count customers with email
CREATE PROCEDURE MailingListCount
AS
BEGIN
DECLARE @cnt INTEGER;
SELECT @cnt = COUNT(*)
FROM Customers
WHERE cust_email IS NOT NULL;
RETURN @cnt;
END;
Executing SQL Server Procedure:
-- Declare variable for return value
DECLARE @ReturnValue INT;
-- Execute procedure and capture return value
EXECUTE @ReturnValue = MailingListCount;
-- Display result
SELECT @ReturnValue AS CustomerCount;
Advanced Stored Procedure Examples
Order Creation Procedure (SQL Server)
CREATE PROCEDURE NewOrder @cust_id CHAR(10)
AS
BEGIN
-- Declare variable for order number
DECLARE @order_num INTEGER;
-- Get current highest order number
SELECT @order_num = MAX(order_num)
FROM Orders;
-- Determine next order number
SELECT @order_num = @order_num + 1;
-- Insert new order
INSERT INTO Orders(order_num, order_date, cust_id)
VALUES(@order_num, GETDATE(), @cust_id);
-- Return order number
RETURN @order_num;
END;
Using Auto-Generated IDs:
CREATE PROCEDURE NewOrderAuto @cust_id CHAR(10)
AS
BEGIN
-- Insert new order (order_num auto-generated)
INSERT INTO Orders(cust_id, order_date)
VALUES(@cust_id, GETDATE());
-- Return generated order number
SELECT order_num = @@IDENTITY;
END;
Complex Business Logic Example:
CREATE PROCEDURE ProcessOrder
@cust_id CHAR(10),
@prod_id CHAR(10),
@quantity INTEGER,
@order_status VARCHAR(20) OUTPUT
AS
BEGIN
DECLARE @available_qty INTEGER;
DECLARE @order_num INTEGER;
-- Check inventory
SELECT @available_qty = quantity_in_stock
FROM Products
WHERE prod_id = @prod_id;
-- Validate inventory
IF @available_qty >= @quantity
BEGIN
-- Create order
EXECUTE @order_num = NewOrder @cust_id;
-- Add order item
INSERT INTO OrderItems(order_num, prod_id, quantity, item_price)
SELECT @order_num, @prod_id, @quantity, prod_price
FROM Products
WHERE prod_id = @prod_id;
-- Update inventory
UPDATE Products
SET quantity_in_stock = quantity_in_stock - @quantity
WHERE prod_id = @prod_id;
SET @order_status = 'CONFIRMED';
END
ELSE
BEGIN
SET @order_status = 'BACKORDERED';
END;
RETURN @order_num;
END;
Parameter Types and Usage
Input Parameters (IN)
-- Oracle
CREATE PROCEDURE GetCustomerOrders(
p_cust_id IN VARCHAR2
)
IS
BEGIN
SELECT * FROM Orders
WHERE cust_id = p_cust_id;
END;
-- SQL Server
CREATE PROCEDURE GetCustomerOrders
@cust_id VARCHAR(10)
AS
BEGIN
SELECT * FROM Orders
WHERE cust_id = @cust_id;
END;
Output Parameters (OUT)
-- Oracle
CREATE PROCEDURE GetOrderTotal(
p_order_num IN INTEGER,
p_total OUT NUMBER
)
IS
BEGIN
SELECT SUM(quantity * item_price) INTO p_total
FROM OrderItems
WHERE order_num = p_order_num;
END;
-- SQL Server
CREATE PROCEDURE GetOrderTotal
@order_num INTEGER,
@total MONEY OUTPUT
AS
BEGIN
SELECT @total = SUM(quantity * item_price)
FROM OrderItems
WHERE order_num = @order_num;
END;
Input/Output Parameters (INOUT)
-- Oracle
CREATE PROCEDURE AdjustInventory(
p_prod_id IN VARCHAR2,
p_quantity IN OUT INTEGER
)
IS
v_current_qty INTEGER;
BEGIN
-- Get current quantity
SELECT quantity_in_stock INTO v_current_qty
FROM Products
WHERE prod_id = p_prod_id;
-- Calculate new quantity
p_quantity := v_current_qty + p_quantity;
-- Update inventory
UPDATE Products
SET quantity_in_stock = p_quantity
WHERE prod_id = p_prod_id;
END;
Error Handling in Stored Procedures
SQL Server Error Handling:
CREATE PROCEDURE SafeOrderCreation
@cust_id CHAR(10),
@result VARCHAR(50) OUTPUT
AS
BEGIN
BEGIN TRY
-- Start transaction
BEGIN TRANSACTION;
-- Validate customer exists
IF NOT EXISTS (SELECT 1 FROM Customers WHERE cust_id = @cust_id)
BEGIN
SET @result = 'ERROR: Customer not found';
RETURN -1;
END;
-- Create order
INSERT INTO Orders(cust_id, order_date)
VALUES(@cust_id, GETDATE());
-- Commit transaction
COMMIT TRANSACTION;
SET @result = 'SUCCESS: Order created';
RETURN @@IDENTITY;
END TRY
BEGIN CATCH
-- Rollback on error
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
SET @result = 'ERROR: ' + ERROR_MESSAGE();
RETURN -1;
END CATCH;
END;
Oracle Error Handling:
CREATE PROCEDURE SafeOrderCreation(
p_cust_id IN VARCHAR2,
p_order_num OUT INTEGER,
p_result OUT VARCHAR2
)
IS
v_count INTEGER;
BEGIN
-- Validate customer exists
SELECT COUNT(*) INTO v_count
FROM Customers
WHERE cust_id = p_cust_id;
IF v_count = 0 THEN
p_result := 'ERROR: Customer not found';
p_order_num := -1;
RETURN;
END IF;
-- Create order
INSERT INTO Orders(order_num, cust_id, order_date)
VALUES(order_seq.NEXTVAL, p_cust_id, SYSDATE)
RETURNING order_num INTO p_order_num;
p_result := 'SUCCESS: Order created';
EXCEPTION
WHEN OTHERS THEN
p_result := 'ERROR: ' || SQLERRM;
p_order_num := -1;
ROLLBACK;
END;
Best Practices
1. Code Documentation
-- ✅ Good: Well-documented procedure
/*
Purpose: Create new customer order with validation
Parameters:
@cust_id - Customer ID (required)
@priority - Order priority (optional, default 'NORMAL')
Returns: Order number if successful, -1 if error
Author: Database Team
Created: 2024-01-15
Modified: 2024-01-20 - Added priority parameter
*/
CREATE PROCEDURE CreateCustomerOrder
@cust_id CHAR(10),
@priority VARCHAR(10) = 'NORMAL'
AS
BEGIN
-- Validate customer exists
IF NOT EXISTS (SELECT 1 FROM Customers WHERE cust_id = @cust_id)
BEGIN
RETURN -1; -- Customer not found
END;
-- Create order with specified priority
INSERT INTO Orders(cust_id, order_date, priority)
VALUES(@cust_id, GETDATE(), @priority);
-- Return new order number
RETURN @@IDENTITY;
END;
2. Parameter Validation
CREATE PROCEDURE ValidatedOrderCreation
@cust_id CHAR(10),
@prod_id CHAR(10),
@quantity INTEGER
AS
BEGIN
-- Input validation
IF @cust_id IS NULL OR LEN(@cust_id) = 0
BEGIN
RAISERROR('Customer ID is required', 16, 1);
RETURN -1;
END;
IF @quantity <= 0
BEGIN
RAISERROR('Quantity must be positive', 16, 1);
RETURN -1;
END;
-- Business logic continues...
END;
3. Transaction Management
CREATE PROCEDURE AtomicOrderProcess
@cust_id CHAR(10),
@items OrderItemsTableType READONLY
AS
BEGIN
BEGIN TRANSACTION;
BEGIN TRY
-- Create order header
INSERT INTO Orders(cust_id, order_date)
VALUES(@cust_id, GETDATE());
DECLARE @order_num INTEGER = @@IDENTITY;
-- Add order items
INSERT INTO OrderItems(order_num, prod_id, quantity, item_price)
SELECT @order_num, prod_id, quantity, price
FROM @items;
-- Update inventory
UPDATE p SET quantity_in_stock = quantity_in_stock - i.quantity
FROM Products p
INNER JOIN @items i ON p.prod_id = i.prod_id;
COMMIT TRANSACTION;
RETURN @order_num;
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
THROW;
END CATCH;
END;
4. Performance Optimization
-- ✅ Good: Efficient procedure
CREATE PROCEDURE GetCustomerOrderSummary
@cust_id CHAR(10)
AS
BEGIN
-- Use SET NOCOUNT to reduce network traffic
SET NOCOUNT ON;
-- Single query instead of multiple lookups
SELECT
c.cust_name,
COUNT(o.order_num) AS total_orders,
SUM(oi.quantity * oi.item_price) AS total_spent,
MAX(o.order_date) AS last_order_date
FROM Customers c
LEFT JOIN Orders o ON c.cust_id = o.cust_id
LEFT JOIN OrderItems oi ON o.order_num = oi.order_num
WHERE c.cust_id = @cust_id
GROUP BY c.cust_id, c.cust_name;
END;
Common Stored Procedure Patterns
CRUD Operations
-- Create
CREATE PROCEDURE CreateCustomer
@cust_name VARCHAR(50),
@cust_email VARCHAR(100),
@cust_id CHAR(10) OUTPUT
AS
BEGIN
-- Generate new customer ID
SELECT @cust_id = 'C' + RIGHT('000000000' + CAST(NEXT VALUE FOR cust_seq AS VARCHAR), 9);
INSERT INTO Customers(cust_id, cust_name, cust_email)
VALUES(@cust_id, @cust_name, @cust_email);
END;
-- Read
CREATE PROCEDURE GetCustomer
@cust_id CHAR(10)
AS
BEGIN
SELECT * FROM Customers
WHERE cust_id = @cust_id;
END;
-- Update
CREATE PROCEDURE UpdateCustomer
@cust_id CHAR(10),
@cust_name VARCHAR(50),
@cust_email VARCHAR(100)
AS
BEGIN
UPDATE Customers
SET cust_name = @cust_name,
cust_email = @cust_email,
modified_date = GETDATE()
WHERE cust_id = @cust_id;
END;
-- Delete
CREATE PROCEDURE DeleteCustomer
@cust_id CHAR(10)
AS
BEGIN
DELETE FROM Customers
WHERE cust_id = @cust_id;
END;
Reporting Procedures
CREATE PROCEDURE SalesReport
@start_date DATE,
@end_date DATE
AS
BEGIN
SELECT
p.prod_name,
SUM(oi.quantity) AS units_sold,
SUM(oi.quantity * oi.item_price) AS revenue,
AVG(oi.item_price) AS avg_price
FROM Orders o
INNER JOIN OrderItems oi ON o.order_num = oi.order_num
INNER JOIN Products p ON oi.prod_id = p.prod_id
WHERE o.order_date BETWEEN @start_date AND @end_date
GROUP BY p.prod_id, p.prod_name
ORDER BY revenue DESC;
END;
Transaction Processing
Understanding Transaction Processing
Transaction processing maintains database integrity by ensuring that batches of SQL operations execute completely or not at all. This prevents partial operations from corrupting your data.
The Problem: Partial Operations
Consider adding an order to a system with multiple related tables:
- Check/add customer to Customers table
- Retrieve customer ID for reference
- Add order to Orders table
- Retrieve order ID for reference
- Add order items to OrderItems table
What happens if the system fails partway through?
Failure Point | Result | Problem |
---|---|---|
After customer, before order | Customer exists, no order | ✅ Acceptable - can retry |
After order, before items | Empty order in database | ❌ Orphaned order record |
During item insertion | Partial order | ❌ Incomplete, unknown state |
The Solution: Transaction Processing
-- All operations succeed together, or all fail together
BEGIN TRANSACTION
-- Add customer
-- Add order
-- Add order items
COMMIT TRANSACTION -- Save all changes
-- OR if any step fails:
ROLLBACK TRANSACTION -- Undo all changes
Key Transaction Concepts
Core Terminology:
- Transaction: Block of SQL statements treated as single unit
- Rollback: Process of undoing specified SQL statements
- Commit: Writing unsaved SQL statements to database tables
- Savepoint: Temporary placeholder for partial rollbacks
ACID Properties:
Property | Description | Benefit |
---|---|---|
Atomicity | All or nothing execution | Prevents partial updates |
Consistency | Database rules maintained | Data integrity preserved |
Isolation | Transactions don't interfere | Concurrent access safe |
Durability | Committed changes persist | Data survives system failures |
Statement Compatibility:
-- ✅ Can be rolled back
INSERT, UPDATE, DELETE
-- ❌ Cannot be rolled back
SELECT (no point anyway)
CREATE, DROP (structural changes)
Controlling Transactions
Database-Specific Syntax
SQL Server:
BEGIN TRANSACTION
-- SQL statements here
COMMIT TRANSACTION
MySQL/MariaDB:
START TRANSACTION
-- SQL statements here
COMMIT
Oracle:
SET TRANSACTION
-- SQL statements here
COMMIT
PostgreSQL (ANSI SQL):
BEGIN
-- SQL statements here
COMMIT
Transaction Lifecycle:
-- 1. Start transaction
BEGIN TRANSACTION
-- 2. Execute operations
INSERT INTO table1 VALUES (...);
UPDATE table2 SET ...;
DELETE FROM table3 WHERE ...;
-- 3. End transaction (choose one)
COMMIT TRANSACTION -- Save all changes
-- OR
ROLLBACK TRANSACTION -- Undo all changes
Using ROLLBACK
Basic Rollback Example
-- Undo a DELETE operation
DELETE FROM Orders;
ROLLBACK;
-- All deleted rows are restored
Conditional Rollback
BEGIN TRANSACTION
UPDATE Products
SET quantity_in_stock = quantity_in_stock - 10
WHERE prod_id = 'BR01';
-- Check if update was valid
IF (SELECT quantity_in_stock FROM Products WHERE prod_id = 'BR01') < 0
BEGIN
ROLLBACK TRANSACTION; -- Undo if negative inventory
PRINT 'Insufficient inventory';
END
ELSE
BEGIN
COMMIT TRANSACTION; -- Save if valid
PRINT 'Inventory updated';
END
Rollback Scenarios:
-- Error handling with rollback
BEGIN TRANSACTION
INSERT INTO Customers(cust_id, cust_name)
VALUES('C001', 'New Customer');
-- Simulate error condition
IF @@ERROR <> 0
BEGIN
ROLLBACK TRANSACTION;
RETURN;
END
INSERT INTO Orders(order_num, cust_id, order_date)
VALUES(1001, 'C001', GETDATE());
IF @@ERROR <> 0
BEGIN
ROLLBACK TRANSACTION;
RETURN;
END
COMMIT TRANSACTION;
Using COMMIT
Implicit vs Explicit Commits
-- Implicit commit (automatic)
INSERT INTO Products VALUES('P001', 'New Product', 19.99);
-- Automatically saved to database
-- Explicit commit (manual control)
BEGIN TRANSACTION
INSERT INTO Products VALUES('P002', 'Another Product', 29.99);
-- Not yet saved to database
COMMIT TRANSACTION;
-- Now saved to database
Complete Order Processing Example
-- SQL Server: Complete order transaction
BEGIN TRANSACTION
-- Delete order items first (foreign key constraint)
DELETE FROM OrderItems WHERE order_num = 12345;
-- Then delete the order
DELETE FROM Orders WHERE order_num = 12345;
-- Only commit if both operations succeeded
COMMIT TRANSACTION;
Oracle Equivalent:
SET TRANSACTION
DELETE FROM OrderItems WHERE order_num = 12345;
DELETE FROM Orders WHERE order_num = 12345;
COMMIT;
Benefits of Explicit Commits:
- Control timing: Decide exactly when changes are saved
- Batch operations: Group related changes together
- Error recovery: Ability to undo on problems
- Performance: Reduce disk I/O by batching writes
Using Savepoints
Creating Savepoints
-- SQL Server
SAVE TRANSACTION savepoint_name;
-- MySQL/MariaDB/Oracle
SAVEPOINT savepoint_name;
Rolling Back to Savepoints
-- SQL Server
ROLLBACK TRANSACTION savepoint_name;
-- MySQL/MariaDB/Oracle
ROLLBACK TO savepoint_name;
Complex Transaction with Savepoints
-- SQL Server: Order processing with savepoints
BEGIN TRANSACTION
-- Add customer (always keep this)
INSERT INTO Customers(cust_id, cust_name)
VALUES(1000000010, 'Toys Emporium');
-- Create savepoint after customer
SAVE TRANSACTION StartOrder;
-- Add order
INSERT INTO Orders(order_num, order_date, cust_id)
VALUES(20100, '2020/12/1', 1000000010);
IF @@ERROR <> 0
ROLLBACK TRANSACTION StartOrder; -- Keep customer, undo order
-- Add first order item
INSERT INTO OrderItems(order_num, order_item, prod_id, quantity, item_price)
VALUES(20100, 1, 'BR01', 100, 5.49);
IF @@ERROR <> 0
ROLLBACK TRANSACTION StartOrder; -- Keep customer, undo order & items
-- Add second order item
INSERT INTO OrderItems(order_num, order_item, prod_id, quantity, item_price)
VALUES(20100, 2, 'BR03', 100, 10.99);
IF @@ERROR <> 0
ROLLBACK TRANSACTION StartOrder; -- Keep customer, undo order & items
COMMIT TRANSACTION; -- Save everything if no errors
Multiple Savepoints Strategy:
BEGIN TRANSACTION
-- Customer operations
INSERT INTO Customers(...);
SAVE TRANSACTION customer_added;
-- Order operations
INSERT INTO Orders(...);
SAVE TRANSACTION order_added;
-- Order item operations
INSERT INTO OrderItems(...);
SAVE TRANSACTION item1_added;
INSERT INTO OrderItems(...);
SAVE TRANSACTION item2_added;
-- Can rollback to any savepoint as needed
-- ROLLBACK TRANSACTION item1_added; -- Keeps customer & order
-- ROLLBACK TRANSACTION order_added; -- Keeps only customer
COMMIT TRANSACTION;
Advanced Transaction Patterns
Nested Transaction Simulation
-- Simulate nested transactions with savepoints
BEGIN TRANSACTION MainTransaction
INSERT INTO Orders(...);
SAVE TRANSACTION OrderSavepoint;
-- "Nested" transaction for order items
BEGIN TRY
INSERT INTO OrderItems(...);
INSERT INTO OrderItems(...);
INSERT INTO OrderItems(...);
-- "Nested commit" - just continue
END TRY
BEGIN CATCH
-- "Nested rollback" - rollback to savepoint
ROLLBACK TRANSACTION OrderSavepoint;
THROW; -- Re-raise error
END CATCH
COMMIT TRANSACTION MainTransaction;
Batch Processing with Transactions
-- Process large datasets in batches
DECLARE @BatchSize INT = 1000;
DECLARE @RowsProcessed INT = 0;
WHILE EXISTS (SELECT 1 FROM TempTable WHERE processed = 0)
BEGIN
BEGIN TRANSACTION
-- Process batch
UPDATE TOP(@BatchSize) TempTable
SET processed = 1,
processed_date = GETDATE()
WHERE processed = 0;
SET @RowsProcessed = @@ROWCOUNT;
-- Commit batch
COMMIT TRANSACTION;
PRINT CAST(@RowsProcessed AS VARCHAR) + ' rows processed';
-- Brief pause to avoid blocking
WAITFOR DELAY '00:00:01';
END
Error Handling Patterns
Try-Catch with Transactions (SQL Server)
BEGIN TRANSACTION
BEGIN TRY
-- Risky operations
INSERT INTO Orders(order_num, cust_id, order_date)
VALUES(20100, 'INVALID_CUSTOMER', GETDATE());
INSERT INTO OrderItems(order_num, prod_id, quantity)
VALUES(20100, 'INVALID_PRODUCT', 5);
-- If we get here, commit
COMMIT TRANSACTION;
PRINT 'Transaction completed successfully';
END TRY
BEGIN CATCH
-- Error occurred, rollback
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
-- Report error details
PRINT 'Error: ' + ERROR_MESSAGE();
PRINT 'Transaction rolled back';
END CATCH
Manual Error Checking
BEGIN TRANSACTION
DECLARE @ErrorCount INT = 0;
INSERT INTO Customers(cust_id, cust_name)
VALUES('C001', 'Test Customer');
SET @ErrorCount = @ErrorCount + @@ERROR;
INSERT INTO Orders(order_num, cust_id, order_date)
VALUES(1001, 'C001', GETDATE());
SET @ErrorCount = @ErrorCount + @@ERROR;
INSERT INTO OrderItems(order_num, prod_id, quantity, item_price)
VALUES(1001, 'P001', 2, 19.99);
SET @ErrorCount = @ErrorCount + @@ERROR;
-- Check for any errors
IF @ErrorCount = 0
BEGIN
COMMIT TRANSACTION;
PRINT 'All operations successful';
END
ELSE
BEGIN
ROLLBACK TRANSACTION;
PRINT 'Errors occurred, transaction rolled back';
END
Performance Considerations
Transaction Size Management
-- ❌ Bad: Very large transaction
BEGIN TRANSACTION
-- Process 1 million rows in single transaction
UPDATE LargeTable SET status = 'PROCESSED'; -- Locks entire table
COMMIT TRANSACTION;
-- ✅ Good: Batch processing
DECLARE @BatchSize INT = 10000;
WHILE EXISTS (SELECT 1 FROM LargeTable WHERE status = 'PENDING')
BEGIN
BEGIN TRANSACTION
UPDATE TOP(@BatchSize) LargeTable
SET status = 'PROCESSED'
WHERE status = 'PENDING';
COMMIT TRANSACTION;
-- Allow other transactions to proceed
WAITFOR DELAY '00:00:00.100';
END
Lock Duration Minimization
-- ✅ Good: Minimize lock time
BEGIN TRANSACTION
-- Quick operations only
DECLARE @order_num INT;
INSERT INTO Orders(cust_id, order_date)
VALUES('C001', GETDATE());
SET @order_num = @@IDENTITY;
COMMIT TRANSACTION; -- Release locks quickly
-- Separate transaction for bulk operations
BEGIN TRANSACTION
INSERT INTO OrderItems(order_num, prod_id, quantity, item_price)
SELECT @order_num, prod_id, quantity, price
FROM TempOrderItems;
COMMIT TRANSACTION;
Best Practices
1. Keep Transactions Short
-- ✅ Good: Short, focused transaction
BEGIN TRANSACTION
UPDATE Inventory SET quantity = quantity - 1 WHERE prod_id = 'P001';
INSERT INTO Sales(prod_id, quantity, sale_date) VALUES('P001', 1, GETDATE());
COMMIT TRANSACTION;
-- ❌ Bad: Long-running transaction
BEGIN TRANSACTION
-- Complex calculations
-- File I/O operations
-- Network calls
-- Multiple large table scans
COMMIT TRANSACTION; -- Holds locks too long
2. Use Appropriate Isolation Levels
-- For read-heavy operations
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION
SELECT COUNT(*) FROM LargeTable; -- Dirty read OK for counts
COMMIT TRANSACTION;
-- For critical updates
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION
-- Critical financial operations
COMMIT TRANSACTION;
3. Handle Deadlocks Gracefully
DECLARE @RetryCount INT = 0;
DECLARE @MaxRetries INT = 3;
WHILE @RetryCount < @MaxRetries
BEGIN
BEGIN TRY
BEGIN TRANSACTION
-- Operations that might deadlock
UPDATE Account SET balance = balance - 100 WHERE account_id = 1;
UPDATE Account SET balance = balance + 100 WHERE account_id = 2;
COMMIT TRANSACTION;
BREAK; -- Success, exit loop
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
IF ERROR_NUMBER() = 1205 -- Deadlock
BEGIN
SET @RetryCount = @RetryCount + 1;
WAITFOR DELAY '00:00:01'; -- Wait before retry
CONTINUE;
END
ELSE
BEGIN
THROW; -- Re-raise non-deadlock errors
END
END CATCH
END
Common Transaction Patterns
Financial Transfer
-- Money transfer between accounts
BEGIN TRANSACTION
DECLARE @source_balance MONEY;
DECLARE @transfer_amount MONEY = 500.00;
-- Check source account balance
SELECT @source_balance = balance
FROM Accounts
WHERE account_id = 'ACC001';
IF @source_balance < @transfer_amount
BEGIN
ROLLBACK TRANSACTION;
RAISERROR('Insufficient funds', 16, 1);
RETURN;
END
-- Perform transfer
UPDATE Accounts
SET balance = balance - @transfer_amount
WHERE account_id = 'ACC001';
UPDATE Accounts
SET balance = balance + @transfer_amount
WHERE account_id = 'ACC002';
-- Log transaction
INSERT INTO TransactionLog(from_account, to_account, amount, transaction_date)
VALUES('ACC001', 'ACC002', @transfer_amount, GETDATE());
COMMIT TRANSACTION;
Inventory Management
-- Reserve inventory for order
BEGIN TRANSACTION
DECLARE @available_qty INT;
DECLARE @order_qty INT = 5;
-- Check availability
SELECT @available_qty = quantity_available
FROM Inventory
WHERE prod_id = 'P001';
IF @available_qty < @order_qty
BEGIN
ROLLBACK TRANSACTION;
RAISERROR('Insufficient inventory', 16, 1);
RETURN;
END
-- Reserve inventory
UPDATE Inventory
SET quantity_available = quantity_available - @order_qty,
quantity_reserved = quantity_reserved + @order_qty
WHERE prod_id = 'P001';
-- Create order
INSERT INTO Orders(cust_id, order_date, status)
VALUES('C001', GETDATE(), 'PENDING');
INSERT INTO OrderItems(order_num, prod_id, quantity)
VALUES(@@IDENTITY, 'P001', @order_qty);
COMMIT TRANSACTION;
Key Takeaways
- Transactions ensure data integrity by treating multiple operations as atomic units
- Use transactions for related operations that must succeed or fail together
- Keep transactions short to minimize lock duration and improve concurrency
- Use savepoints for complex transactions requiring partial rollback capability
- Always include error handling with appropriate rollback logic
- Consider performance implications of transaction size and isolation levels
- Test transaction logic thoroughly including error and rollback scenarios
- Understand your DBMS-specific syntax as implementations vary significantly
- Use batch processing for large datasets instead of single massive transactions
- Plan for deadlock handling in multi-user environments
Cursors
Understanding Cursors
Cursors enable row-by-row processing of SQL result sets, allowing applications to step through data one record at a time rather than processing entire result sets at once.
The Problem with Standard SQL
Standard SQL retrieval operations work with result sets - complete collections of rows returned by queries:
-- Returns entire result set at once
SELECT * FROM Customers WHERE cust_state = 'CA';
-- No way to get "first row", "next row", or "previous 10 rows"
What Cursors Provide:
- Sequential access: Move forward/backward through rows
- Positional control: Jump to specific row positions
- Row-by-row processing: Handle one record at a time
- Interactive browsing: Support user navigation through data
Key Characteristics:
Feature | Description | Benefit |
---|---|---|
Server-side storage | Query results stored on DBMS server | Efficient memory usage |
Scrollable navigation | Move in any direction through data | Flexible data access |
Stateful processing | Maintains current position | Resumable operations |
Controlled fetching | Retrieve specific rows on demand | Reduced network traffic |
Cursor Capabilities
Common Cursor Features:
1. Access Control
-- Read-only cursor
DECLARE ReadOnlyCursor CURSOR FOR
SELECT * FROM Products;
-- Updatable cursor (where supported)
DECLARE UpdatableCursor CURSOR FOR
SELECT * FROM Products FOR UPDATE;
2. Directional Movement
- Forward: Next row(s)
- Backward: Previous row(s)
- Absolute: Specific row number
- Relative: Move by offset from current position
- First/Last: Jump to beginning/end
3. Column Editability
-- Some columns editable, others read-only
DECLARE MixedCursor CURSOR FOR
SELECT prod_id, -- Read-only (primary key)
prod_name, -- Editable
prod_price, -- Editable
created_date -- Read-only (system field)
FROM Products;
4. Scope Control
- Local: Available only to creating procedure
- Global: Available to entire session
- Connection: Available across connection
5. Data Consistency
-- Static cursor (snapshot of data)
DECLARE StaticCursor CURSOR STATIC FOR
SELECT * FROM Orders;
-- Dynamic cursor (live data)
DECLARE DynamicCursor CURSOR DYNAMIC FOR
SELECT * FROM Orders;
Working with Cursors
Cursor Lifecycle:
- DECLARE: Define cursor and SELECT statement
- OPEN: Execute query and populate cursor
- FETCH: Retrieve individual rows as needed
- CLOSE: Release cursor resources
- DEALLOCATE: Free cursor memory (some DBMSs)
-- Complete cursor lifecycle
DECLARE MyCursor CURSOR FOR SELECT * FROM Customers; -- 1. Declare
OPEN MyCursor; -- 2. Open
FETCH NEXT FROM MyCursor INTO @variables; -- 3. Fetch
CLOSE MyCursor; -- 4. Close
DEALLOCATE MyCursor; -- 5. Deallocate
Creating Cursors
Database-Specific Syntax
SQL Server/MySQL/MariaDB/DB2:
DECLARE CustCursor CURSOR
FOR
SELECT * FROM Customers
WHERE cust_email IS NULL;
Oracle/PostgreSQL:
DECLARE CURSOR CustCursor
IS
SELECT * FROM Customers
WHERE cust_email IS NULL;
Advanced Cursor Declaration:
-- SQL Server: Cursor with options
DECLARE AdvancedCursor CURSOR
STATIC -- Snapshot of data
SCROLL -- Allow backward movement
READ_ONLY -- Prevent updates
FOR
SELECT cust_id, cust_name, cust_email
FROM Customers
WHERE cust_state = 'CA'
ORDER BY cust_name;
Cursor Options:
Option | Description | Use Case |
---|---|---|
STATIC | Snapshot of data at open time | Consistent view during processing |
DYNAMIC | Live data, sees changes | Real-time data processing |
FORWARD_ONLY | Can only move forward | Simple sequential processing |
SCROLL | Can move in any direction | Interactive browsing |
READ_ONLY | Cannot update through cursor | Data analysis and reporting |
FOR UPDATE | Can update current row | Data modification scenarios |
Opening and Using Cursors
Opening Cursors
-- Universal syntax
OPEN CURSOR CustCursor;
-- Or simply
OPEN CustCursor;
When OPEN executes:
- Query execution: SELECT statement runs
- Result population: Data stored in cursor
- Position initialization: Cursor positioned before first row
- Resource allocation: Memory allocated for result set
Fetching Data
Oracle Example:
DECLARE
TYPE CustCursor IS REF CURSOR RETURN Customers%ROWTYPE;
cursor_var CustCursor;
CustRecord Customers%ROWTYPE;
BEGIN
OPEN cursor_var FOR
SELECT * FROM Customers WHERE cust_email IS NULL;
-- Fetch single row
FETCH cursor_var INTO CustRecord;
-- Process the record
DBMS_OUTPUT.PUT_LINE('Customer: ' || CustRecord.cust_name);
CLOSE cursor_var;
END;
/
Oracle Loop Processing:
DECLARE
TYPE CustCursor IS REF CURSOR RETURN Customers%ROWTYPE;
cursor_var CustCursor;
CustRecord Customers%ROWTYPE;
BEGIN
OPEN cursor_var FOR
SELECT * FROM Customers WHERE cust_email IS NULL;
LOOP
FETCH cursor_var INTO CustRecord;
EXIT WHEN cursor_var%NOTFOUND; -- Exit when no more rows
-- Process each customer
DBMS_OUTPUT.PUT_LINE('Processing: ' || CustRecord.cust_name);
-- Add business logic here
-- Update customer record, send email, etc.
END LOOP;
CLOSE cursor_var;
END;
/
SQL Server Example:
-- Declare variables for each column
DECLARE @cust_id CHAR(10),
@cust_name CHAR(50),
@cust_address CHAR(50),
@cust_city CHAR(50),
@cust_state CHAR(5),
@cust_zip CHAR(10),
@cust_country CHAR(50),
@cust_contact CHAR(50),
@cust_email CHAR(255);
-- Declare cursor
DECLARE CustCursor CURSOR FOR
SELECT * FROM Customers WHERE cust_email IS NULL;
-- Open cursor
OPEN CustCursor;
-- Fetch first row
FETCH NEXT FROM CustCursor
INTO @cust_id, @cust_name, @cust_address,
@cust_city, @cust_state, @cust_zip,
@cust_country, @cust_contact, @cust_email;
-- Process all rows
WHILE @@FETCH_STATUS = 0
BEGIN
-- Process current customer
PRINT 'Processing customer: ' + @cust_name;
-- Add business logic here
-- Update records, generate reports, etc.
-- Fetch next row
FETCH NEXT FROM CustCursor
INTO @cust_id, @cust_name, @cust_address,
@cust_city, @cust_state, @cust_zip,
@cust_country, @cust_contact, @cust_email;
END;
-- Clean up
CLOSE CustCursor;
DEALLOCATE CustCursor;
Fetch Status Values:
| Status | SQL Server | Oracle | Meaning |-----|-----|----- | Success | @@FETCH_STATUS = 0 | cursor%FOUND | Row retrieved successfully | No data | @@FETCH_STATUS = -1 | cursor%NOTFOUND | No more rows available | Missing row | @@FETCH_STATUS = -2 | N/A | Row was deleted
Advanced Cursor Operations
Scrollable Cursors (SQL Server)
DECLARE ScrollCursor CURSOR SCROLL FOR
SELECT cust_id, cust_name FROM Customers
ORDER BY cust_name;
OPEN ScrollCursor;
-- Move to different positions
FETCH FIRST FROM ScrollCursor; -- First row
FETCH LAST FROM ScrollCursor; -- Last row
FETCH ABSOLUTE 5 FROM ScrollCursor; -- 5th row
FETCH RELATIVE -2 FROM ScrollCursor; -- 2 rows back
FETCH NEXT FROM ScrollCursor; -- Next row
FETCH PRIOR FROM ScrollCursor; -- Previous row
CLOSE ScrollCursor;
DEALLOCATE ScrollCursor;
Updatable Cursors
-- SQL Server: Update through cursor
DECLARE UpdCursor CURSOR FOR
SELECT cust_id, cust_email
FROM Customers
WHERE cust_email IS NULL
FOR UPDATE OF cust_email;
OPEN UpdCursor;
DECLARE @cust_id CHAR(10), @cust_email CHAR(255);
FETCH NEXT FROM UpdCursor INTO @cust_id, @cust_email;
WHILE @@FETCH_STATUS = 0
BEGIN
-- Generate email address
SET @cust_email = LOWER(@cust_id) + '@example.com';
-- Update current row
UPDATE Customers
SET cust_email = @cust_email
WHERE CURRENT OF UpdCursor;
FETCH NEXT FROM UpdCursor INTO @cust_id, @cust_email;
END;
CLOSE UpdCursor;
DEALLOCATE UpdCursor;
Cursor with Parameters (SQL Server)
-- Parameterized cursor
DECLARE StateCursor CURSOR FOR
SELECT cust_id, cust_name, cust_city
FROM Customers
WHERE cust_state = @state_param
ORDER BY cust_city;
-- Use with different states
DECLARE @state_param CHAR(2) = 'CA';
OPEN StateCursor;
-- Process California customers
CLOSE StateCursor;
SET @state_param = 'NY';
OPEN StateCursor;
-- Process New York customers
CLOSE StateCursor;
DEALLOCATE StateCursor;
Practical Cursor Examples
Data Migration with Validation
-- Migrate customer data with validation
DECLARE MigrationCursor CURSOR FOR
SELECT cust_id, cust_name, cust_email, cust_phone
FROM LegacyCustomers
WHERE migration_status IS NULL;
OPEN MigrationCursor;
DECLARE @cust_id CHAR(10), @cust_name VARCHAR(50),
@cust_email VARCHAR(100), @cust_phone VARCHAR(20);
DECLARE @error_count INT = 0;
FETCH NEXT FROM MigrationCursor
INTO @cust_id, @cust_name, @cust_email, @cust_phone;
WHILE @@FETCH_STATUS = 0
BEGIN
BEGIN TRY
-- Validate data
IF @cust_email IS NOT NULL AND @cust_email NOT LIKE '%@%'
BEGIN
RAISERROR('Invalid email format', 16, 1);
END;
-- Insert into new system
INSERT INTO Customers(cust_id, cust_name, cust_email, cust_phone)
VALUES(@cust_id, @cust_name, @cust_email, @cust_phone);
-- Mark as migrated
UPDATE LegacyCustomers
SET migration_status = 'SUCCESS',
migration_date = GETDATE()
WHERE CURRENT OF MigrationCursor;
END TRY
BEGIN CATCH
-- Log error
SET @error_count = @error_count + 1;
UPDATE LegacyCustomers
SET migration_status = 'ERROR',
migration_error = ERROR_MESSAGE()
WHERE CURRENT OF MigrationCursor;
END CATCH;
FETCH NEXT FROM MigrationCursor
INTO @cust_id, @cust_name, @cust_email, @cust_phone;
END;
CLOSE MigrationCursor;
DEALLOCATE MigrationCursor;
PRINT 'Migration completed with ' + CAST(@error_count AS VARCHAR) + ' errors';
Batch Processing with Progress Tracking
-- Process large dataset in batches
DECLARE BatchCursor CURSOR FOR
SELECT order_num, cust_id, order_date
FROM Orders
WHERE status = 'PENDING'
ORDER BY order_date;
OPEN BatchCursor;
DECLARE @order_num INT, @cust_id CHAR(10), @order_date DATE;
DECLARE @processed_count INT = 0;
DECLARE @batch_size INT = 100;
FETCH NEXT FROM BatchCursor INTO @order_num, @cust_id, @order_date;
WHILE @@FETCH_STATUS = 0
BEGIN
-- Process order
EXEC ProcessOrder @order_num;
SET @processed_count = @processed_count + 1;
-- Progress reporting
IF @processed_count % @batch_size = 0
BEGIN
PRINT 'Processed ' + CAST(@processed_count AS VARCHAR) + ' orders';
-- Brief pause to allow other operations
WAITFOR DELAY '00:00:01';
END;
FETCH NEXT FROM BatchCursor INTO @order_num, @cust_id, @order_date;
END;
CLOSE BatchCursor;
DEALLOCATE BatchCursor;
PRINT 'Total orders processed: ' + CAST(@processed_count AS VARCHAR);
Report Generation with Grouping
-- Generate hierarchical report
DECLARE ReportCursor CURSOR FOR
SELECT v.vend_name, p.prod_name, p.prod_price
FROM Vendors v
INNER JOIN Products p ON v.vend_id = p.vend_id
ORDER BY v.vend_name, p.prod_name;
OPEN ReportCursor;
DECLARE @vend_name VARCHAR(50), @prod_name VARCHAR(50), @prod_price MONEY;
DECLARE @current_vendor VARCHAR(50) = '';
DECLARE @vendor_total MONEY = 0;
FETCH NEXT FROM ReportCursor INTO @vend_name, @prod_name, @prod_price;
WHILE @@FETCH_STATUS = 0
BEGIN
-- Check for vendor change
IF @vend_name != @current_vendor
BEGIN
-- Print vendor total (if not first vendor)
IF @current_vendor != ''
BEGIN
PRINT ' Vendor Total: $' + CAST(@vendor_total AS VARCHAR);
PRINT '';
END;
-- Start new vendor section
PRINT 'VENDOR: ' + @vend_name;
PRINT '----------------------------------------';
SET @current_vendor = @vend_name;
SET @vendor_total = 0;
END;
-- Print product line
PRINT ' ' + @prod_name + ' - $' + CAST(@prod_price AS VARCHAR);
SET @vendor_total = @vendor_total + @prod_price;
FETCH NEXT FROM ReportCursor INTO @vend_name, @prod_name, @prod_price;
END;
-- Print final vendor total
IF @current_vendor != ''
BEGIN
PRINT ' Vendor Total: $' + CAST(@vendor_total AS VARCHAR);
END;
CLOSE ReportCursor;
DEALLOCATE ReportCursor;
Closing Cursors
Basic Closing
-- Most DBMSs
CLOSE CustCursor;
-- SQL Server also requires deallocation
CLOSE CustCursor;
DEALLOCATE CURSOR CustCursor;
Proper Cleanup Pattern
-- SQL Server: Safe cleanup
IF CURSOR_STATUS('global', 'CustCursor') >= -1
BEGIN
IF CURSOR_STATUS('global', 'CustCursor') > -1
BEGIN
CLOSE CustCursor;
END;
DEALLOCATE CustCursor;
END;
Resource Management:
Operation | Purpose | Required |
---|---|---|
CLOSE | Release result set, maintain definition | Always |
DEALLOCATE | Free cursor memory completely | SQL Server, some others |
Reopen | Can reopen closed cursor | After CLOSE, before DEALLOCATE |
Performance Considerations
When to Use Cursors
-- ✅ Good: Row-by-row processing required
DECLARE ProcessCursor CURSOR FOR
SELECT customer_id, complex_calculation_needed
FROM Customers;
-- Each row needs individual processing logic
-- ❌ Bad: Set-based operation possible
DECLARE UpdateCursor CURSOR FOR
SELECT customer_id FROM Customers;
-- Could use: UPDATE Customers SET status = 'ACTIVE'
Cursor vs Set-Based Operations
-- ❌ Slow: Cursor-based update
DECLARE SlowCursor CURSOR FOR
SELECT cust_id FROM Customers WHERE cust_state = 'CA';
OPEN SlowCursor;
DECLARE @cust_id CHAR(10);
FETCH NEXT FROM SlowCursor INTO @cust_id;
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE Customers SET discount = 0.10 WHERE cust_id = @cust_id;
FETCH NEXT FROM SlowCursor INTO @cust_id;
END;
CLOSE SlowCursor;
DEALLOCATE SlowCursor;
-- ✅ Fast: Set-based update
UPDATE Customers
SET discount = 0.10
WHERE cust_state = 'CA';
Optimization Tips:
- Use FAST_FORWARD cursors for sequential processing
- Limit result sets with WHERE clauses
- Avoid unnecessary columns in SELECT
- Process in batches for large datasets
- Consider alternatives like CTEs or window functions
Best Practices
1. Always Clean Up Resources
-- ✅ Good: Proper cleanup
DECLARE @cursor_name VARCHAR(50) = 'MyCursor';
BEGIN TRY
DECLARE MyCursor CURSOR FOR SELECT * FROM Customers;
OPEN MyCursor;
-- Cursor operations here
END TRY
BEGIN CATCH
-- Handle errors
PRINT ERROR_MESSAGE();
END CATCH;
-- Cleanup in all cases
IF CURSOR_STATUS('local', @cursor_name) >= 0
BEGIN
CLOSE MyCursor;
DEALLOCATE MyCursor;
END;
2. Use Appropriate Cursor Types
-- For sequential processing
DECLARE FastCursor CURSOR FAST_FORWARD FOR
SELECT * FROM LargeTable;
-- For random access
DECLARE ScrollCursor CURSOR SCROLL STATIC FOR
SELECT * FROM SmallTable;
-- For updates
DECLARE UpdateCursor CURSOR FOR
SELECT * FROM EditableTable
FOR UPDATE OF column_name;
3. Minimize Cursor Scope
-- ✅ Good: Local cursor
DECLARE LocalCursor CURSOR LOCAL FOR
SELECT * FROM Customers;
-- ❌ Avoid: Global cursor (unless necessary)
DECLARE GlobalCursor CURSOR GLOBAL FOR
SELECT * FROM Customers;
Common Cursor Patterns
Error Handling with Cursors
DECLARE ErrorCursor CURSOR FOR
SELECT risky_data FROM ProblematicTable;
OPEN ErrorCursor;
DECLARE @data VARCHAR(100);
DECLARE @error_count INT = 0;
FETCH NEXT FROM ErrorCursor INTO @data;
WHILE @@FETCH_STATUS = 0
BEGIN
BEGIN TRY
-- Risky operation
EXEC ProcessRiskyData @data;
END TRY
BEGIN CATCH
SET @error_count = @error_count + 1;
PRINT 'Error processing: ' + @data + ' - ' + ERROR_MESSAGE();
END CATCH;
FETCH NEXT FROM ErrorCursor INTO @data;
END;
CLOSE ErrorCursor;
DEALLOCATE ErrorCursor;
PRINT 'Processing completed with ' + CAST(@error_count AS VARCHAR) + ' errors';
Advanced Features
Understanding Constraints
Constraints are rules that govern how database data is inserted or manipulated. They enforce data integrity at the database level, ensuring consistent and valid data across all applications.
Why Use Database-Level Constraints?
Problem | Client-Side Validation | Database Constraints |
---|---|---|
Consistency | Each client must implement rules | Rules enforced universally |
Reliability | Some clients may skip validation | DBMS always enforces rules |
Performance | Network round-trips for validation | Efficient server-side checking |
Maintenance | Update multiple applications | Change rules in one place |
Benefits of Constraints:
- Data integrity: Prevent invalid data entry
- Referential integrity: Maintain relationships between tables
- Business rules: Enforce domain-specific requirements
- Error prevention: Catch problems before they corrupt data
Primary Keys
Primary keys uniquely identify each row in a table and serve as the foundation for referential integrity.
Primary Key Requirements:
- Uniqueness: No two rows can have the same primary key value
- Non-null: Every row must have a primary key value
- Immutable: Primary key values cannot be modified
- Non-reusable: Deleted primary keys cannot be reassigned
Creating Primary Keys:
-- Method 1: Inline definition
CREATE TABLE Vendors (
vend_id CHAR(10) NOT NULL PRIMARY KEY,
vend_name CHAR(50) NOT NULL,
vend_address CHAR(50) NULL,
vend_city CHAR(50) NULL,
vend_state CHAR(5) NULL,
vend_zip CHAR(10) NULL,
vend_country CHAR(50) NULL
);
-- Method 2: Constraint syntax
CREATE TABLE Vendors (
vend_id CHAR(10) NOT NULL,
vend_name CHAR(50) NOT NULL,
-- other columns...
CONSTRAINT pk_vendors PRIMARY KEY (vend_id)
);
-- Method 3: ALTER TABLE (most DBMSs)
ALTER TABLE Vendors
ADD CONSTRAINT pk_vendors PRIMARY KEY (vend_id);
Composite Primary Keys:
-- Multiple columns as primary key
CREATE TABLE OrderItems (
order_num INTEGER NOT NULL,
order_item INTEGER NOT NULL,
prod_id CHAR(10) NOT NULL,
quantity INTEGER NOT NULL,
item_price MONEY NOT NULL,
CONSTRAINT pk_orderitems PRIMARY KEY (order_num, order_item)
);
Foreign Keys
Foreign keys enforce referential integrity by ensuring values in one table correspond to valid values in another table's primary key.
Foreign Key Concepts:
-- Parent table (referenced)
CREATE TABLE Customers (
cust_id CHAR(10) PRIMARY KEY,
cust_name VARCHAR(50) NOT NULL
);
-- Child table (referencing)
CREATE TABLE Orders (
order_num INTEGER PRIMARY KEY,
order_date DATETIME NOT NULL,
cust_id CHAR(10) NOT NULL,
CONSTRAINT fk_orders_customers
FOREIGN KEY (cust_id) REFERENCES Customers(cust_id)
);
Foreign Key Creation Methods:
-- Method 1: Inline REFERENCES
CREATE TABLE Orders (
order_num INTEGER NOT NULL PRIMARY KEY,
order_date DATETIME NOT NULL,
cust_id CHAR(10) NOT NULL REFERENCES Customers(cust_id)
);
-- Method 2: CONSTRAINT syntax
ALTER TABLE Orders
ADD CONSTRAINT fk_orders_customers
FOREIGN KEY (cust_id) REFERENCES Customers(cust_id);
-- Method 3: Named constraint in CREATE TABLE
CREATE TABLE Orders (
order_num INTEGER NOT NULL PRIMARY KEY,
order_date DATETIME NOT NULL,
cust_id CHAR(10) NOT NULL,
CONSTRAINT fk_orders_customers
FOREIGN KEY (cust_id) REFERENCES Customers(cust_id)
);
Cascading Actions:
-- Cascade delete: Delete child records when parent is deleted
ALTER TABLE Orders
ADD CONSTRAINT fk_orders_customers
FOREIGN KEY (cust_id) REFERENCES Customers(cust_id)
ON DELETE CASCADE;
-- Cascade update: Update child records when parent key changes
ALTER TABLE Orders
ADD CONSTRAINT fk_orders_customers
FOREIGN KEY (cust_id) REFERENCES Customers(cust_id)
ON UPDATE CASCADE;
-- Set null: Set foreign key to NULL when parent is deleted
ALTER TABLE Orders
ADD CONSTRAINT fk_orders_customers
FOREIGN KEY (cust_id) REFERENCES Customers(cust_id)
ON DELETE SET NULL;
-- Restrict: Prevent deletion of parent if children exist (default)
ALTER TABLE Orders
ADD CONSTRAINT fk_orders_customers
FOREIGN KEY (cust_id) REFERENCES Customers(cust_id)
ON DELETE RESTRICT;
Benefits of Foreign Keys:
- Referential integrity: Prevent orphaned records
- Data consistency: Maintain valid relationships
- Accidental deletion prevention: Protect related data
- Documentation: Self-documenting table relationships
Unique Constraints
Unique constraints ensure column values are unique but allow more flexibility than primary keys.
Primary Key vs Unique Constraint:
Feature | Primary Key | Unique Constraint |
---|---|---|
Quantity per table | One only | Multiple allowed |
NULL values | Not allowed | Allowed |
Modification | Not allowed | Allowed |
Reuse after deletion | Not allowed | Allowed |
Foreign key reference | Yes | No |
Creating Unique Constraints:
-- Employee table with multiple unique constraints
CREATE TABLE Employees (
emp_id INTEGER PRIMARY KEY,
emp_ssn CHAR(11) UNIQUE, -- Inline unique
emp_email VARCHAR(100),
emp_phone VARCHAR(20),
emp_name VARCHAR(100) NOT NULL,
-- Named unique constraints
CONSTRAINT uk_emp_email UNIQUE (emp_email),
CONSTRAINT uk_emp_phone UNIQUE (emp_phone)
);
-- ALTER TABLE method
ALTER TABLE Employees
ADD CONSTRAINT uk_emp_badge UNIQUE (emp_badge_number);
Composite Unique Constraints:
-- Ensure combination of columns is unique
CREATE TABLE CourseEnrollments (
enrollment_id INTEGER PRIMARY KEY,
student_id INTEGER NOT NULL,
course_id INTEGER NOT NULL,
semester VARCHAR(20) NOT NULL,
-- Student can only enroll in same course once per semester
CONSTRAINT uk_student_course_semester
UNIQUE (student_id, course_id, semester)
);
Check Constraints
Check constraints validate data against specific criteria before allowing insertion or updates.
Common Check Constraint Uses:
-- Range validation
CREATE TABLE Products (
prod_id CHAR(10) PRIMARY KEY,
prod_name VARCHAR(50) NOT NULL,
prod_price MONEY CHECK (prod_price > 0),
prod_weight DECIMAL(8,2) CHECK (prod_weight BETWEEN 0 AND 1000),
prod_category VARCHAR(20) CHECK (prod_category IN ('Electronics', 'Clothing', 'Books'))
);
-- Date validation
CREATE TABLE Orders (
order_num INTEGER PRIMARY KEY,
order_date DATE NOT NULL,
ship_date DATE,
-- Ship date must be on or after order date
CONSTRAINT chk_ship_after_order
CHECK (ship_date >= order_date)
);
-- Pattern validation
CREATE TABLE Customers (
cust_id CHAR(10) PRIMARY KEY,
cust_email VARCHAR(100) CHECK (cust_email LIKE '%@%'),
cust_phone VARCHAR(15) CHECK (cust_phone LIKE '[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'),
cust_gender CHAR(1) CHECK (cust_gender IN ('M', 'F', 'O'))
);
Complex Check Constraints:
-- Business rule validation
CREATE TABLE OrderItems (
order_num INTEGER NOT NULL,
order_item INTEGER NOT NULL,
prod_id CHAR(10) NOT NULL,
quantity INTEGER NOT NULL CHECK (quantity > 0),
item_price MONEY NOT NULL CHECK (item_price >= 0),
discount_pct DECIMAL(5,2) DEFAULT 0,
-- Discount cannot exceed 50%
CONSTRAINT chk_discount_limit CHECK (discount_pct BETWEEN 0 AND 50),
-- Total discount amount cannot exceed item price
CONSTRAINT chk_discount_amount
CHECK ((item_price * quantity * discount_pct / 100) <= (item_price * quantity))
);
-- Cross-column validation
CREATE TABLE Employees (
emp_id INTEGER PRIMARY KEY,
hire_date DATE NOT NULL,
termination_date DATE,
birth_date DATE NOT NULL,
-- Termination date must be after hire date
CONSTRAINT chk_term_after_hire
CHECK (termination_date IS NULL OR termination_date > hire_date),
-- Employee must be at least 16 years old when hired
CONSTRAINT chk_minimum_age
CHECK (DATEDIFF(year, birth_date, hire_date) >= 16)
);
Adding Check Constraints:
-- Add to existing table
ALTER TABLE Products
ADD CONSTRAINT chk_prod_price_positive
CHECK (prod_price > 0);
-- Multiple conditions
ALTER TABLE Orders
ADD CONSTRAINT chk_order_dates
CHECK (order_date <= GETDATE() AND
(ship_date IS NULL OR ship_date >= order_date));
Understanding Indexes
Indexes improve query performance by creating sorted references to table data, similar to book indexes.
How Indexes Work:
-- Without index: Full table scan
SELECT * FROM Customers WHERE cust_state = 'CA';
-- DBMS must examine every row
-- With index on cust_state: Index seek
CREATE INDEX idx_cust_state ON Customers(cust_state);
SELECT * FROM Customers WHERE cust_state = 'CA';
-- DBMS uses index to find matching rows quickly
Index Benefits and Costs:
| Benefits | Costs |-----|-----|----- | Faster SELECT queries | Slower INSERT/UPDATE/DELETE | Faster ORDER BY operations | Additional storage space | Faster JOIN operations | Index maintenance overhead | Faster WHERE clause filtering | Memory usage for index cache
Creating Indexes:
-- Simple index on single column
CREATE INDEX idx_prod_name ON Products(prod_name);
-- Composite index on multiple columns
CREATE INDEX idx_cust_name_state ON Customers(cust_name, cust_state);
-- Unique index (enforces uniqueness)
CREATE UNIQUE INDEX idx_emp_ssn ON Employees(emp_ssn);
-- Descending index
CREATE INDEX idx_order_date_desc ON Orders(order_date DESC);
Index Types and Options:
-- Clustered index (SQL Server) - physically orders table data
CREATE CLUSTERED INDEX idx_cust_id ON Customers(cust_id);
-- Non-clustered index (default) - separate structure pointing to data
CREATE NONCLUSTERED INDEX idx_cust_email ON Customers(cust_email);
-- Partial index (PostgreSQL) - index only rows meeting condition
CREATE INDEX idx_active_customers ON Customers(cust_name)
WHERE cust_status = 'ACTIVE';
-- Covering index - includes additional columns for query optimization
CREATE INDEX idx_order_covering ON Orders(cust_id)
INCLUDE (order_date, order_total);
Index Strategy Guidelines:
Good Candidates for Indexing:
- Primary keys (usually indexed automatically)
- Foreign keys (for JOIN performance)
- Frequently searched columns (WHERE clauses)
- Columns used in ORDER BY
- Columns with high selectivity (many unique values)
Poor Candidates for Indexing:
- Small tables (full scan is faster)
- Columns with low selectivity (few unique values)
- Frequently updated columns (high maintenance cost)
- Very wide columns (large storage overhead)
-- ✅ Good: High selectivity, frequently searched
CREATE INDEX idx_customer_email ON Customers(cust_email);
CREATE INDEX idx_order_date ON Orders(order_date);
-- ❌ Poor: Low selectivity
CREATE INDEX idx_customer_gender ON Customers(cust_gender); -- Only M/F/O values
-- ✅ Good: Composite index for common query pattern
CREATE INDEX idx_product_category_price ON Products(prod_category, prod_price);
-- Supports: WHERE prod_category = 'Electronics' ORDER BY prod_price
Understanding Triggers
Triggers are special stored procedures that execute automatically in response to specific database events.
Trigger Types:
Timing | Event | Description |
---|---|---|
BEFORE | INSERT | Execute before row insertion |
AFTER | INSERT | Execute after row insertion |
BEFORE | UPDATE | Execute before row modification |
AFTER | UPDATE | Execute after row modification |
BEFORE | DELETE | Execute before row deletion |
AFTER | DELETE | Execute after row deletion |
INSTEAD OF | Any | Replace the triggering operation (views) |
Common Trigger Uses:
1. Data Validation and Transformation:
-- SQL Server: Convert state to uppercase
CREATE TRIGGER tr_customer_state_upper
ON Customers
FOR INSERT, UPDATE
AS
BEGIN
UPDATE Customers
SET cust_state = UPPER(cust_state)
WHERE cust_id IN (SELECT cust_id FROM inserted);
END;
-- Oracle/PostgreSQL version
CREATE TRIGGER tr_customer_state_upper
BEFORE INSERT OR UPDATE ON Customers
FOR EACH ROW
BEGIN
:NEW.cust_state := UPPER(:NEW.cust_state);
END;
2. Audit Trail Creation:
-- Create audit table
CREATE TABLE CustomerAudit (
audit_id INTEGER IDENTITY PRIMARY KEY,
cust_id CHAR(10),
action_type VARCHAR(10),
old_values VARCHAR(1000),
new_values VARCHAR(1000),
changed_by VARCHAR(50),
changed_date DATETIME
);
-- SQL Server audit trigger
CREATE TRIGGER tr_customer_audit
ON Customers
FOR INSERT, UPDATE, DELETE
AS
BEGIN
-- Log insertions
INSERT INTO CustomerAudit(cust_id, action_type, new_values, changed_by, changed_date)
SELECT cust_id, 'INSERT',
'Name: ' + cust_name + ', Email: ' + ISNULL(cust_email, 'NULL'),
SYSTEM_USER, GETDATE()
FROM inserted;
-- Log updates
INSERT INTO CustomerAudit(cust_id, action_type, old_values, new_values, changed_by, changed_date)
SELECT i.cust_id, 'UPDATE',
'Name: ' + d.cust_name + ', Email: ' + ISNULL(d.cust_email, 'NULL'),
'Name: ' + i.cust_name + ', Email: ' + ISNULL(i.cust_email, 'NULL'),
SYSTEM_USER, GETDATE()
FROM inserted i
INNER JOIN deleted d ON i.cust_id = d.cust_id;
-- Log deletions
INSERT INTO CustomerAudit(cust_id, action_type, old_values, changed_by, changed_date)
SELECT cust_id, 'DELETE',
'Name: ' + cust_name + ', Email: ' + ISNULL(cust_email, 'NULL'),
SYSTEM_USER, GETDATE()
FROM deleted;
END;
3. Business Rule Enforcement:
-- Prevent orders that exceed customer credit limit
CREATE TRIGGER tr_check_credit_limit
ON Orders
FOR INSERT, UPDATE
AS
BEGIN
DECLARE @exceeded_customers TABLE (cust_id CHAR(10));
-- Find customers who would exceed credit limit
INSERT INTO @exceeded_customers
SELECT i.cust_id
FROM inserted i
INNER JOIN Customers c ON i.cust_id = c.cust_id
WHERE i.order_total > c.credit_limit;
-- Reject transaction if any customer exceeds limit
IF EXISTS (SELECT 1 FROM @exceeded_customers)
BEGIN
ROLLBACK TRANSACTION;
RAISERROR('Order exceeds customer credit limit', 16, 1);
END;
END;
4. Automatic Timestamp Updates:
-- Add timestamp columns to table
ALTER TABLE Products
ADD created_date DATETIME DEFAULT GETDATE(),
modified_date DATETIME DEFAULT GETDATE();
-- Trigger to update modification timestamp
CREATE TRIGGER tr_product_timestamp
ON Products
FOR UPDATE
AS
BEGIN
UPDATE Products
SET modified_date = GETDATE()
WHERE prod_id IN (SELECT prod_id FROM inserted);
END;
5. Inventory Management:
-- Update inventory when order items are added/modified
CREATE TRIGGER tr_update_inventory
ON OrderItems
FOR INSERT, UPDATE, DELETE
AS
BEGIN
-- Handle insertions and updates
UPDATE Products
SET quantity_in_stock = quantity_in_stock -
(ISNULL(i.quantity, 0) - ISNULL(d.quantity, 0))
FROM Products p
INNER JOIN inserted i ON p.prod_id = i.prod_id
LEFT JOIN deleted d ON i.order_num = d.order_num AND i.order_item = d.order_item;
-- Handle deletions (restore inventory)
UPDATE Products
SET quantity_in_stock = quantity_in_stock + d.quantity
FROM Products p
INNER JOIN deleted d ON p.prod_id = d.prod_id
WHERE NOT EXISTS (SELECT 1 FROM inserted WHERE order_num = d.order_num AND order_item = d.order_item);
END;
Trigger Best Practices:
-- ✅ Good: Keep triggers simple and fast
CREATE TRIGGER tr_simple_audit
ON Customers
FOR UPDATE
AS
BEGIN
INSERT INTO AuditLog(table_name, action, user_name, timestamp)
VALUES('Customers', 'UPDATE', SYSTEM_USER, GETDATE());
END;
-- ❌ Bad: Complex logic in trigger
CREATE TRIGGER tr_complex_business_logic
ON Orders
FOR INSERT
AS
BEGIN
-- Multiple complex calculations
-- External API calls
-- Complex business rules
-- This should be in stored procedures or application code
END;
Database Security
Database security protects data through user authentication, authorization, and access control mechanisms.
Security Layers:
- Authentication: Verify user identity
- Authorization: Control what users can do
- Access Control: Limit data visibility
- Auditing: Track user activities
User Management:
-- Create database users
CREATE USER 'app_user'@'localhost' IDENTIFIED BY 'secure_password';
CREATE USER 'report_user'@'%' IDENTIFIED BY 'report_password';
CREATE USER 'admin_user'@'localhost' IDENTIFIED BY 'admin_password';
-- Create roles for easier management
CREATE ROLE 'app_role';
CREATE ROLE 'reporting_role';
CREATE ROLE 'admin_role';
GRANT Statement - Giving Permissions:
-- Grant table-level permissions
GRANT SELECT ON Customers TO 'report_user'@'%';
GRANT INSERT, UPDATE ON Orders TO 'app_user'@'localhost';
GRANT ALL PRIVILEGES ON Products TO 'admin_user'@'localhost';
-- Grant column-level permissions
GRANT SELECT (cust_name, cust_email) ON Customers TO 'marketing_user';
GRANT UPDATE (prod_price) ON Products TO 'pricing_user';
-- Grant database-level permissions
GRANT CREATE, DROP ON sales_db.* TO 'admin_user'@'localhost';
-- Grant role permissions
GRANT 'app_role' TO 'app_user'@'localhost';
GRANT 'reporting_role' TO 'report_user'@'%';
REVOKE Statement - Removing Permissions:
-- Revoke specific permissions
REVOKE INSERT ON Orders FROM 'app_user'@'localhost';
REVOKE ALL PRIVILEGES ON Products FROM 'temp_user'@'%';
-- Revoke role membership
REVOKE 'admin_role' FROM 'former_admin'@'localhost';
Advanced Security Features:
-- Row-level security (SQL Server)
CREATE FUNCTION dbo.fn_security_predicate(@user_id INT)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS result
WHERE @user_id = USER_ID() OR IS_MEMBER('db_owner') = 1;
-- Apply security policy
CREATE SECURITY POLICY CustomerSecurityPolicy
ADD FILTER PREDICATE dbo.fn_security_predicate(cust_id) ON Customers,
ADD BLOCK PREDICATE dbo.fn_security_predicate(cust_id) ON Customers;
-- View-based security
CREATE VIEW CustomerSafeView AS
SELECT cust_id, cust_name, cust_city, cust_state
FROM Customers
WHERE cust_region = USER_REGION();
-- Grant access to view, not table
GRANT SELECT ON CustomerSafeView TO 'regional_user';
REVOKE ALL ON Customers FROM 'regional_user';
Security Best Practices:
1. Principle of Least Privilege:
-- ✅ Good: Minimal necessary permissions
GRANT SELECT ON Orders TO 'report_user';
GRANT INSERT, UPDATE ON OrderItems TO 'order_entry_user';
-- ❌ Bad: Excessive permissions
GRANT ALL PRIVILEGES ON *.* TO 'app_user'; -- Too broad
2. Use Roles for Permission Management:
-- Create functional roles
CREATE ROLE order_processor;
CREATE ROLE customer_service;
CREATE ROLE financial_analyst;
-- Assign permissions to roles
GRANT SELECT, INSERT, UPDATE ON Orders TO order_processor;
GRANT SELECT ON Customers TO customer_service;
GRANT SELECT ON financial_tables TO financial_analyst;
-- Assign roles to users
GRANT order_processor TO 'john_doe';
GRANT customer_service TO 'jane_smith';
3. Secure Sensitive Data:
-- Create separate schema for sensitive data
CREATE SCHEMA sensitive_data;
-- Move sensitive tables
ALTER SCHEMA sensitive_data TRANSFER dbo.EmployeeSalaries;
ALTER SCHEMA sensitive_data TRANSFER dbo.CustomerCreditCards;
-- Restrict access to schema
REVOKE ALL ON SCHEMA::sensitive_data FROM public;
GRANT SELECT ON SCHEMA::sensitive_data TO 'hr_manager';
Putting It All Together
Complete Table with All Constraint Types:
CREATE TABLE ComprehensiveOrders (
-- Primary key
order_id INTEGER IDENTITY(1,1) PRIMARY KEY,
-- Foreign keys
cust_id CHAR(10) NOT NULL,
emp_id INTEGER,
-- Unique constraints
order_number VARCHAR(20) NOT NULL UNIQUE,
-- Check constraints
order_date DATE NOT NULL CHECK (order_date <= GETDATE()),
ship_date DATE CHECK (ship_date >= order_date),
order_total MONEY CHECK (order_total >= 0),
order_status VARCHAR(20) DEFAULT 'PENDING'
CHECK (order_status IN ('PENDING', 'PROCESSING', 'SHIPPED', 'DELIVERED', 'CANCELLED')),
-- Timestamps
created_date DATETIME DEFAULT GETDATE(),
modified_date DATETIME DEFAULT GETDATE(),
-- Foreign key constraints
CONSTRAINT fk_orders_customers
FOREIGN KEY (cust_id) REFERENCES Customers(cust_id),
CONSTRAINT fk_orders_employees
FOREIGN KEY (emp_id) REFERENCES Employees(emp_id),
-- Additional check constraints
CONSTRAINT chk_order_dates
CHECK (ship_date IS NULL OR ship_date >= order_date),
CONSTRAINT chk_order_total_reasonable
CHECK (order_total BETWEEN 0 AND 1000000)
);
-- Create indexes for performance
CREATE INDEX idx_orders_customer ON ComprehensiveOrders(cust_id);
CREATE INDEX idx_orders_date ON ComprehensiveOrders(order_date);
CREATE INDEX idx_orders_status ON ComprehensiveOrders(order_status);
-- Create trigger for automatic timestamp updates
CREATE TRIGGER tr_orders_timestamp
ON ComprehensiveOrders
FOR UPDATE
AS
BEGIN
UPDATE ComprehensiveOrders
SET modified_date = GETDATE()
WHERE order_id IN (SELECT order_id FROM inserted);
END;
Statement Syntax
Reading Syntax Conventions
Symbol Meanings:
- | indicates alternatives (choose one):
NULL|NOT NULL
- [ ] indicates optional elements:
[WHERE ...]
- ... indicates repeatable elements:
column, column, ...
- UPPERCASE indicates SQL keywords
- lowercase indicates user-defined names
ALTER TABLE
Purpose: Modify the structure of an existing table
ALTER TABLE tablename
(
ADD|DROP column datatype [NULL|NOT NULL] [CONSTRAINTS],
ADD|DROP column datatype [NULL|NOT NULL] [CONSTRAINTS],
...
);
Examples:
-- Add a new column
ALTER TABLE Customers
ADD cust_phone VARCHAR(15) NULL;
-- Add multiple columns
ALTER TABLE Products
(
ADD prod_weight DECIMAL(8,2) NULL,
ADD prod_color VARCHAR(20) NOT NULL DEFAULT 'Unknown'
);
-- Drop a column
ALTER TABLE Customers
DROP COLUMN cust_fax;
-- Add constraint
ALTER TABLE Orders
ADD CONSTRAINT fk_orders_customers
FOREIGN KEY (cust_id) REFERENCES Customers(cust_id);
-- Modify column datatype
ALTER TABLE Products
ALTER COLUMN prod_price DECIMAL(10,2) NOT NULL;
Related: See Lesson 17 - Creating and Manipulating Tables
COMMIT
Purpose: Save transaction changes to the database
COMMIT [TRANSACTION];
Examples:
-- Basic commit
BEGIN TRANSACTION;
INSERT INTO Customers VALUES(...);
UPDATE Orders SET status = 'SHIPPED';
COMMIT;
-- Named transaction commit
BEGIN TRANSACTION OrderProcessing;
-- Multiple operations
COMMIT TRANSACTION OrderProcessing;
-- Implicit commit (auto-commit mode)
INSERT INTO Products VALUES(...); -- Automatically committed
Related: See Lesson 20 - Managing Transaction Processing
CREATE INDEX
Purpose: Create an index to improve query performance
CREATE INDEX indexname
ON tablename (column, ...);
Examples:
-- Simple index on single column
CREATE INDEX idx_cust_name
ON Customers (cust_name);
-- Composite index on multiple columns
CREATE INDEX idx_order_date_customer
ON Orders (order_date, cust_id);
-- Unique index
CREATE UNIQUE INDEX idx_prod_code
ON Products (prod_code);
-- Descending index
CREATE INDEX idx_order_date_desc
ON Orders (order_date DESC);
-- Partial index (PostgreSQL)
CREATE INDEX idx_active_customers
ON Customers (cust_name)
WHERE cust_status = 'ACTIVE';
Related: See Lesson 22 - Understanding Advanced SQL Features
CREATE PROCEDURE
Purpose: Create a stored procedure for reusable SQL code
CREATE PROCEDURE procedurename [parameters] [options]
AS
SQL statement;
Examples:
-- SQL Server procedure
CREATE PROCEDURE GetCustomerOrders
@cust_id CHAR(10)
AS
BEGIN
SELECT * FROM Orders
WHERE cust_id = @cust_id;
END;
-- Procedure with output parameter
CREATE PROCEDURE GetOrderTotal
@order_num INTEGER,
@total MONEY OUTPUT
AS
BEGIN
SELECT @total = SUM(quantity * item_price)
FROM OrderItems
WHERE order_num = @order_num;
END;
-- Oracle procedure syntax
CREATE PROCEDURE GetCustomerOrders(
p_cust_id IN VARCHAR2
)
IS
BEGIN
SELECT * FROM Orders
WHERE cust_id = p_cust_id;
END;
Related: See Lesson 19 - Working with Stored Procedures
CREATE TABLE
Purpose: Create a new database table
CREATE TABLE tablename
(
column datatype [NULL|NOT NULL] [CONSTRAINTS],
column datatype [NULL|NOT NULL] [CONSTRAINTS],
...
);
Examples:
-- Basic table creation
CREATE TABLE Customers
(
cust_id CHAR(10) NOT NULL PRIMARY KEY,
cust_name VARCHAR(50) NOT NULL,
cust_email VARCHAR(100) NULL,
cust_phone VARCHAR(15) NULL
);
-- Table with constraints
CREATE TABLE Orders
(
order_num INTEGER NOT NULL PRIMARY KEY,
order_date DATE NOT NULL DEFAULT GETDATE(),
cust_id CHAR(10) NOT NULL,
order_total MONEY CHECK (order_total >= 0),
CONSTRAINT fk_orders_customers
FOREIGN KEY (cust_id) REFERENCES Customers(cust_id)
);
-- Table with auto-increment
CREATE TABLE Products
(
prod_id INTEGER IDENTITY(1,1) PRIMARY KEY,
prod_name VARCHAR(100) NOT NULL,
prod_price DECIMAL(8,2) NOT NULL CHECK (prod_price > 0),
created_date DATETIME DEFAULT GETDATE()
);
Related: See Lesson 17 - Creating and Manipulating Tables
CREATE VIEW
Purpose: Create a virtual table based on a SELECT statement
CREATE VIEW viewname AS
SELECT columns, ...
FROM tables, ...
[WHERE ...]
[GROUP BY ...]
[HAVING ...];
Examples:
-- Simple view
CREATE VIEW CustomerOrders AS
SELECT c.cust_name, o.order_num, o.order_date
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id;
-- View with calculations
CREATE VIEW OrderTotals AS
SELECT order_num,
SUM(quantity * item_price) AS order_total,
COUNT(*) AS item_count
FROM OrderItems
GROUP BY order_num;
-- Filtered view
CREATE VIEW ActiveCustomers AS
SELECT cust_id, cust_name, cust_email
FROM Customers
WHERE cust_status = 'ACTIVE'
AND cust_email IS NOT NULL;
-- Complex view with joins and aggregation
CREATE VIEW ProductSales AS
SELECT p.prod_name,
p.prod_price,
SUM(oi.quantity) AS total_sold,
SUM(oi.quantity * oi.item_price) AS total_revenue
FROM Products p
LEFT JOIN OrderItems oi ON p.prod_id = oi.prod_id
GROUP BY p.prod_id, p.prod_name, p.prod_price;
Related: See Lesson 18 - Using Views
DELETE
Purpose: Remove one or more rows from a table
DELETE FROM tablename
[WHERE ...];
Examples:
-- Delete specific rows
DELETE FROM Customers
WHERE cust_state = 'CA';
-- Delete with subquery
DELETE FROM Orders
WHERE cust_id IN (
SELECT cust_id FROM Customers
WHERE cust_status = 'INACTIVE'
);
-- Delete all rows (use with caution!)
DELETE FROM TempTable;
-- Delete with JOIN (SQL Server)
DELETE o
FROM Orders o
INNER JOIN Customers c ON o.cust_id = c.cust_id
WHERE c.cust_status = 'CLOSED';
-- Delete with LIMIT (MySQL)
DELETE FROM Orders
WHERE order_date < '2020-01-01'
LIMIT 100;
Related: See Lesson 16 - Updating and Deleting Data
DROP
Purpose: Permanently remove database objects
DROP INDEX|PROCEDURE|TABLE|VIEW indexname|procedurename|tablename|viewname;
Examples:
-- Drop table
DROP TABLE TempCustomers;
-- Drop view
DROP VIEW CustomerOrders;
-- Drop index
DROP INDEX idx_cust_name;
-- Drop procedure
DROP PROCEDURE GetCustomerOrders;
-- Drop with IF EXISTS (safer)
DROP TABLE IF EXISTS TempTable;
DROP VIEW IF EXISTS OldView;
-- Drop multiple objects
DROP TABLE Table1, Table2, Table3;
Related: See Lessons 17 and 18
INSERT
Purpose: Add a single row to a table
INSERT INTO tablename [(columns, ...)]
VALUES (values, ...);
Examples:
-- Insert with all columns
INSERT INTO Customers
VALUES ('1000000006', 'Toy Land', '123 Any Street', 'New York', 'NY', '11111', 'USA', NULL, NULL);
-- Insert with specified columns
INSERT INTO Customers (cust_id, cust_name, cust_city, cust_state)
VALUES ('1000000007', 'The Toy Store', 'Chicago', 'IL');
-- Insert multiple rows
INSERT INTO Products (prod_name, prod_price)
VALUES
('Product A', 19.99),
('Product B', 29.99),
('Product C', 39.99);
-- Insert with DEFAULT values
INSERT INTO Orders (cust_id, order_date)
VALUES ('1000000001', DEFAULT);
-- Insert with calculated values
INSERT INTO OrderSummary (order_num, order_total)
VALUES (20001, (SELECT SUM(quantity * item_price) FROM OrderItems WHERE order_num = 20001));
Related: See Lesson 15 - Inserting Data
INSERT SELECT
Purpose: Insert the results of a SELECT statement into a table
INSERT INTO tablename [(columns, ...)]
SELECT columns, ...
FROM tablename, ...
[WHERE ...];
Examples:
-- Copy all data
INSERT INTO CustomersBackup
SELECT * FROM Customers;
-- Copy specific columns
INSERT INTO CustomerContacts (cust_id, cust_name, cust_email)
SELECT cust_id, cust_name, cust_email
FROM Customers
WHERE cust_email IS NOT NULL;
-- Insert with calculations
INSERT INTO MonthlySales (month_year, total_sales)
SELECT
FORMAT(order_date, 'yyyy-MM') AS month_year,
SUM(order_total) AS total_sales
FROM Orders
WHERE order_date >= '2023-01-01'
GROUP BY FORMAT(order_date, 'yyyy-MM');
-- Insert from multiple tables
INSERT INTO CustomerOrderSummary (cust_id, cust_name, order_count, total_spent)
SELECT
c.cust_id,
c.cust_name,
COUNT(o.order_num) AS order_count,
SUM(o.order_total) AS total_spent
FROM Customers c
LEFT JOIN Orders o ON c.cust_id = o.cust_id
GROUP BY c.cust_id, c.cust_name;
Related: See Lesson 15 - Inserting Data
ROLLBACK
Purpose: Undo transaction changes
ROLLBACK [TO savepointname];
-- or
ROLLBACK TRANSACTION;
Examples:
-- Basic rollback
BEGIN TRANSACTION;
DELETE FROM Orders WHERE order_date < '2020-01-01';
-- Oops, wrong date!
ROLLBACK;
-- Rollback to savepoint
BEGIN TRANSACTION;
INSERT INTO Customers VALUES(...);
SAVE TRANSACTION customer_added;
INSERT INTO Orders VALUES(...);
-- Error occurred
ROLLBACK TRANSACTION customer_added; -- Keep customer, undo order
COMMIT;
-- Conditional rollback
BEGIN TRANSACTION;
UPDATE Inventory SET quantity = quantity - 10 WHERE prod_id = 'P001';
IF (SELECT quantity FROM Inventory WHERE prod_id = 'P001') < 0
BEGIN
ROLLBACK;
RAISERROR('Insufficient inventory', 16, 1);
END
ELSE
BEGIN
COMMIT;
END
Related: See Lesson 20 - Managing Transaction Processing
SELECT
Purpose: Retrieve data from one or more tables or views
SELECT columnname, ...
FROM tablename, ...
[WHERE ...]
[UNION ...]
[GROUP BY ...]
[HAVING ...]
[ORDER BY ...];
Examples:
-- Basic SELECT
SELECT cust_name, cust_email
FROM Customers;
-- SELECT with WHERE clause
SELECT prod_name, prod_price
FROM Products
WHERE prod_price BETWEEN 10 AND 50;
-- SELECT with JOIN
SELECT c.cust_name, o.order_num, o.order_date
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id;
-- SELECT with aggregation
SELECT cust_state, COUNT(*) AS customer_count
FROM Customers
GROUP BY cust_state
HAVING COUNT(*) > 5
ORDER BY customer_count DESC;
-- SELECT with subquery
SELECT cust_name
FROM Customers
WHERE cust_id IN (
SELECT DISTINCT cust_id
FROM Orders
WHERE order_date >= '2023-01-01'
);
-- SELECT with UNION
SELECT cust_name, cust_city FROM Customers WHERE cust_state = 'CA'
UNION
SELECT cust_name, cust_city FROM Customers WHERE cust_state = 'NY';
-- Complex SELECT with window functions
SELECT
cust_name,
order_total,
ROW_NUMBER() OVER (PARTITION BY cust_state ORDER BY order_total DESC) AS rank_in_state
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id;
Related: See Lessons 2-14 covering all aspects of SELECT
UPDATE
Purpose: Modify one or more rows in a table
UPDATE tablename
SET columnname = value, ...
[WHERE ...];
Examples:
-- Update specific rows
UPDATE Customers
SET cust_email = 'newemail@example.com'
WHERE cust_id = '1000000001';
-- Update multiple columns
UPDATE Products
SET prod_price = prod_price * 1.1,
modified_date = GETDATE()
WHERE prod_category = 'Electronics';
-- Update with calculation
UPDATE OrderItems
SET item_price = (
SELECT prod_price
FROM Products
WHERE Products.prod_id = OrderItems.prod_id
);
-- Update with JOIN (SQL Server)
UPDATE oi
SET item_price = p.prod_price
FROM OrderItems oi
INNER JOIN Products p ON oi.prod_id = p.prod_id;
-- Conditional update
UPDATE Customers
SET cust_status = CASE
WHEN last_order_date < DATEADD(year, -1, GETDATE()) THEN 'INACTIVE'
WHEN last_order_date < DATEADD(month, -6, GETDATE()) THEN 'DORMANT'
ELSE 'ACTIVE'
END;
-- Update all rows (use with caution!)
UPDATE Products
SET modified_date = GETDATE();
Related: See Lesson 16 - Updating and Deleting Data
Advanced Syntax Patterns
Common Table Expressions (CTE):
WITH CustomerTotals AS (
SELECT cust_id, SUM(order_total) AS total_spent
FROM Orders
GROUP BY cust_id
)
SELECT c.cust_name, ct.total_spent
FROM Customers c
INNER JOIN CustomerTotals ct ON c.cust_id = ct.cust_id;
Window Functions:
SELECT
cust_name,
order_total,
ROW_NUMBER() OVER (ORDER BY order_total DESC) AS rank,
SUM(order_total) OVER (PARTITION BY cust_state) AS state_total
FROM Customers c
INNER JOIN Orders o ON c.cust_id = o.cust_id;
MERGE Statement (SQL Server):
MERGE CustomerSummary AS target
USING (
SELECT cust_id, COUNT(*) AS order_count, SUM(order_total) AS total_spent
FROM Orders
GROUP BY cust_id
) AS source ON target.cust_id = source.cust_id
WHEN MATCHED THEN
UPDATE SET order_count = source.order_count, total_spent = source.total_spent
WHEN NOT MATCHED THEN
INSERT (cust_id, order_count, total_spent)
VALUES (source.cust_id, source.order_count, source.total_spent);
Quick Reference Summary
Statement | Purpose | Key Clauses |
---|---|---|
SELECT | Retrieve data | FROM, WHERE, GROUP BY, HAVING, ORDER BY |
INSERT | Add new rows | INTO, VALUES / SELECT |
UPDATE | Modify existing rows | SET, WHERE |
DELETE | Remove rows | FROM, WHERE |
CREATE TABLE | Create new table | Column definitions, constraints |
ALTER TABLE | Modify table structure | ADD, DROP, ALTER |
CREATE VIEW | Create virtual table | AS SELECT |
CREATE INDEX | Improve performance | ON table(columns) |
CREATE PROCEDURE | Create stored procedure | Parameters, AS, SQL statements |
COMMIT | Save transaction | Optional TRANSACTION |
ROLLBACK | Undo transaction | Optional TO savepoint |
DROP | Remove objects | Object type and name |
Database-Specific Variations
Remember that while this syntax works with most DBMSs, specific implementations may vary:
- SQL Server: Uses
IDENTITY
for auto-increment,GETDATE()
for current date - MySQL: Uses
AUTO_INCREMENT
,NOW()
for current timestamp - PostgreSQL: Uses
SERIAL
for auto-increment,CURRENT_TIMESTAMP
- Oracle: Uses
SEQUENCE
objects,SYSDATE
for current date - SQLite: Limited ALTER TABLE support, uses
AUTOINCREMENT
Datatypes
Understanding Datatypes
Datatypes are rules that define what data may be stored in a column and how that data is actually stored internally by the database management system.
Why Datatypes Matter:
Benefit | Description | Example |
---|---|---|
Data Validation | Restrict invalid data entry | Numeric column rejects text input |
Storage Efficiency | Optimize space usage | INTEGER uses 4 bytes vs VARCHAR(50) |
Correct Sorting | Enable proper ordering | Numbers: 1, 2, 10 vs Strings: "1", "10", "2" |
Performance | Enable indexing and optimization | Fixed-length fields process faster |
Data Integrity | Prevent data corruption | Date fields reject invalid dates |
Critical Considerations:
-- ❌ Wrong datatype choice
CREATE TABLE Orders (
order_id VARCHAR(10), -- Should be INTEGER
zip_code INTEGER, -- Should be VARCHAR (01234 becomes 1234)
order_date VARCHAR(20), -- Should be DATE
is_shipped INTEGER -- Should be BOOLEAN/BIT
);
-- ✅ Correct datatype choice
CREATE TABLE Orders (
order_id INTEGER PRIMARY KEY,
zip_code VARCHAR(10), -- Preserves leading zeros
order_date DATE, -- Proper date validation and functions
is_shipped BIT -- Boolean true/false values
);
String Datatypes
String datatypes store text data including names, addresses, descriptions, and codes.
Fixed-Length vs Variable-Length:
| Type | Storage | Performance | Use Case |-----|-----|----- | Fixed-Length | Always uses full allocated space | Faster processing and indexing | Codes, IDs, consistent-length data | Variable-Length | Uses only space needed | Slower but space-efficient | Names, descriptions, variable content
Common String Datatypes:
-- Fixed-length strings
CHAR(n) -- Exactly n characters, padded with spaces
NCHAR(n) -- Unicode fixed-length
-- Variable-length strings
VARCHAR(n) -- Up to n characters
NVARCHAR(n) -- Unicode variable-length
TEXT -- Large text blocks (also LONG, MEMO)
CLOB -- Character Large Object
Practical Examples:
CREATE TABLE Customers (
-- Fixed-length: consistent size, faster processing
cust_id CHAR(10), -- Always 10 characters: "CUST000001"
state_code CHAR(2), -- Always 2 characters: "CA", "NY"
country_code CHAR(3), -- Always 3 characters: "USA", "CAN"
-- Variable-length: space-efficient for varying content
cust_name VARCHAR(100), -- Up to 100 characters
cust_address VARCHAR(255), -- Up to 255 characters
cust_notes TEXT, -- Large text blocks
-- Unicode support for international characters
cust_name_local NVARCHAR(100) -- Unicode names: "José", "北京"
);
String Datatype Guidelines:
-- ✅ Good practices
CREATE TABLE Products (
prod_code CHAR(8), -- Fixed format: "PROD0001"
prod_name VARCHAR(100), -- Variable names
prod_desc TEXT, -- Long descriptions
sku VARCHAR(20), -- Variable SKU lengths
barcode VARCHAR(13) -- UPC codes (preserve leading zeros)
);
-- ❌ Common mistakes
CREATE TABLE BadExample (
phone_number INTEGER, -- Loses leading zeros, formatting
zip_code INTEGER, -- "01234" becomes 1234
product_name CHAR(200), -- Wastes space for short names
description VARCHAR(50) -- Too short for descriptions
);
Database-Specific Variations:
-- SQL Server
NVARCHAR(MAX) -- Up to 2GB Unicode text
VARCHAR(MAX) -- Up to 2GB non-Unicode text
-- MySQL
TINYTEXT -- Up to 255 characters
MEDIUMTEXT -- Up to 16MB
LONGTEXT -- Up to 4GB
-- PostgreSQL
TEXT -- Unlimited length (practical limit ~1GB)
VARCHAR -- Without length limit
-- Oracle
VARCHAR2(n) -- Variable-length (Oracle's preferred VARCHAR)
CLOB -- Up to 4GB character data
NCLOB -- Unicode CLOB
Numeric Datatypes
Numeric datatypes store numbers for calculations, measurements, and counting.
Integer Types:
-- Common integer datatypes
TINYINT -- 1 byte: 0 to 255 (or -128 to 127)
SMALLINT -- 2 bytes: -32,768 to 32,767
INT/INTEGER -- 4 bytes: -2,147,483,648 to 2,147,483,647
BIGINT -- 8 bytes: very large integers
BIT -- 1 bit: 0 or 1 (boolean flags)
Decimal and Floating-Point Types:
-- Exact decimal numbers
DECIMAL(p,s) -- p=precision (total digits), s=scale (decimal places)
NUMERIC(p,s) -- Same as DECIMAL
-- Floating-point numbers
FLOAT -- Approximate floating-point
REAL -- 4-byte floating-point
DOUBLE -- 8-byte floating-point (higher precision)
Practical Examples:
CREATE TABLE Products (
prod_id INTEGER, -- Whole numbers for IDs
prod_price DECIMAL(10,2), -- Money: 99999999.99
prod_weight DECIMAL(8,3), -- Weight: 12345.678 kg
prod_rating DECIMAL(3,2), -- Rating: 4.75 out of 5
stock_count INTEGER, -- Whole number count
is_featured BIT, -- Boolean: 1=yes, 0=no
discount_pct DECIMAL(5,2) -- Percentage: 15.50%
);
CREATE TABLE OrderItems (
quantity SMALLINT, -- Small positive numbers
item_price DECIMAL(8,2), -- Currency values
line_total DECIMAL(10,2), -- Calculated totals
tax_rate DECIMAL(6,4) -- Tax: 0.0825 (8.25%)
);
Choosing the Right Numeric Type:
-- ✅ Appropriate choices
CREATE TABLE Measurements (
temperature DECIMAL(5,2), -- -999.99 to 999.99°C
pressure DECIMAL(8,3), -- Precise measurements
count INTEGER, -- Whole number counts
percentage DECIMAL(5,2), -- 0.00 to 100.00%
is_active BIT -- Boolean flag
);
-- ❌ Poor choices
CREATE TABLE BadMeasurements (
temperature INTEGER, -- Loses decimal precision
money_amount FLOAT, -- Rounding errors in currency
small_count BIGINT, -- Wastes space
percentage INTEGER -- Can't represent 15.5%
);
Currency Handling:
-- Dedicated currency types
MONEY -- SQL Server: 8-byte currency
SMALLMONEY -- SQL Server: 4-byte currency
-- Best practice: Use DECIMAL for currency
CREATE TABLE Orders (
order_total DECIMAL(12,2), -- Up to $9,999,999,999.99
tax_amount DECIMAL(10,2), -- Tax calculations
shipping_cost DECIMAL(8,2) -- Shipping fees
);
Date and Time Datatypes
Date and time datatypes store temporal data with various levels of precision.
Common Date/Time Types:
DATE -- Date only: 2024-12-25
TIME -- Time only: 14:30:00
DATETIME -- Date and time: 2024-12-25 14:30:00
TIMESTAMP -- Date/time with timezone info
SMALLDATETIME -- Date/time with minute precision
Database-Specific Implementations:
-- SQL Server
DATETIME -- 1753-9999, 3.33ms precision
DATETIME2 -- 0001-9999, 100ns precision
SMALLDATETIME -- 1900-2079, 1 minute precision
DATE -- Date only
TIME -- Time only
DATETIMEOFFSET -- With timezone offset
-- MySQL
DATE -- 1000-01-01 to 9999-12-31
DATETIME -- 1000-01-01 00:00:00 to 9999-12-31 23:59:59
TIMESTAMP -- 1970-01-01 00:00:01 to 2038-01-19 03:14:07
TIME -- -838:59:59 to 838:59:59
YEAR -- 1901 to 2155
-- PostgreSQL
DATE -- Date only
TIME -- Time only
TIMESTAMP -- Date and time
TIMESTAMPTZ -- With timezone
INTERVAL -- Time intervals
-- Oracle
DATE -- Date and time (no fractional seconds)
TIMESTAMP -- With fractional seconds
TIMESTAMP WITH TIME ZONE
TIMESTAMP WITH LOCAL TIME ZONE
Practical Examples:
CREATE TABLE Events (
event_id INTEGER PRIMARY KEY,
event_name VARCHAR(100),
event_date DATE, -- 2024-07-04
start_time TIME, -- 14:30:00
end_time TIME, -- 17:00:00
created_at DATETIME, -- 2024-01-15 09:30:00
updated_at TIMESTAMP -- Auto-updated timestamp
);
CREATE TABLE Orders (
order_id INTEGER PRIMARY KEY,
order_date DATE, -- When order was placed
ship_date DATE, -- When order shipped
created_at DATETIME DEFAULT GETDATE(),
modified_at DATETIME DEFAULT GETDATE()
);
Date Format Considerations:
-- ✅ ISO format (recommended)
INSERT INTO Orders (order_date) VALUES ('2024-12-25');
INSERT INTO Events (start_time) VALUES ('14:30:00');
-- ✅ ODBC format (portable)
INSERT INTO Orders (order_date) VALUES ({d '2024-12-25'});
INSERT INTO Events (start_time) VALUES ({t '14:30:00'});
INSERT INTO Logs (timestamp) VALUES ({ts '2024-12-25 14:30:00'});
-- ❌ Ambiguous formats (avoid)
INSERT INTO Orders (order_date) VALUES ('12/25/2024'); -- US format
INSERT INTO Orders (order_date) VALUES ('25/12/2024'); -- European format
Working with Timezones:
-- PostgreSQL timezone handling
CREATE TABLE GlobalEvents (
event_id SERIAL PRIMARY KEY,
event_name VARCHAR(100),
event_time TIMESTAMP WITH TIME ZONE,
local_time TIMESTAMP WITHOUT TIME ZONE
);
-- SQL Server timezone handling
CREATE TABLE GlobalEvents (
event_id INT IDENTITY PRIMARY KEY,
event_name VARCHAR(100),
event_time DATETIMEOFFSET, -- 2024-12-25 14:30:00 -08:00
utc_time DATETIME2 -- UTC equivalent
);
Binary Datatypes
Binary datatypes store non-text data including images, documents, and multimedia files.
Common Binary Types:
BINARY(n) -- Fixed-length binary data
VARBINARY(n) -- Variable-length binary data
BLOB -- Binary Large Object
IMAGE -- Legacy image storage (deprecated)
Database-Specific Binary Types:
-- SQL Server
BINARY(n) -- Fixed-length: 1 to 8,000 bytes
VARBINARY(n) -- Variable-length: 1 to 8,000 bytes
VARBINARY(MAX) -- Up to 2GB
IMAGE -- Deprecated, use VARBINARY(MAX)
-- MySQL
BINARY(n) -- Fixed-length binary string
VARBINARY(n) -- Variable-length binary string
TINYBLOB -- Up to 255 bytes
BLOB -- Up to 65KB
MEDIUMBLOB -- Up to 16MB
LONGBLOB -- Up to 4GB
-- PostgreSQL
BYTEA -- Variable-length binary data
-- Oracle
RAW(n) -- Fixed-length: up to 2,000 bytes
LONG RAW -- Variable-length: up to 2GB
BLOB -- Binary Large Object: up to 4GB
Practical Examples:
CREATE TABLE Documents (
doc_id INTEGER PRIMARY KEY,
doc_name VARCHAR(255),
doc_type VARCHAR(50), -- 'PDF', 'DOCX', 'JPG'
doc_size INTEGER, -- Size in bytes
doc_content VARBINARY(MAX), -- Actual file content
doc_hash BINARY(32), -- SHA-256 hash (fixed 32 bytes)
created_at DATETIME DEFAULT GETDATE()
);
CREATE TABLE UserProfiles (
user_id INTEGER PRIMARY KEY,
username VARCHAR(50),
profile_image VARBINARY(MAX), -- Profile photo
thumbnail VARBINARY(MAX), -- Smaller version
image_type VARCHAR(10) -- 'JPG', 'PNG', 'GIF'
);
Binary Data Considerations:
-- ✅ Good practices
CREATE TABLE FileStorage (
file_id INTEGER PRIMARY KEY,
filename VARCHAR(255),
file_size BIGINT, -- Size in bytes
mime_type VARCHAR(100), -- Content type
file_hash BINARY(32), -- Integrity check
file_data VARBINARY(MAX), -- Actual content
upload_date DATETIME DEFAULT GETDATE()
);
-- ❌ Considerations
-- Large binary data can impact performance
-- Consider storing files externally and keeping paths in database
CREATE TABLE FileReferences (
file_id INTEGER PRIMARY KEY,
filename VARCHAR(255),
file_path VARCHAR(500), -- Path to external file
file_size BIGINT,
mime_type VARCHAR(100)
);
Specialized Datatypes
Modern databases offer specialized datatypes for specific use cases.
JSON and XML:
-- PostgreSQL
CREATE TABLE ApiLogs (
log_id SERIAL PRIMARY KEY,
request_data JSON, -- JSON data
response_data JSONB, -- Binary JSON (faster)
created_at TIMESTAMP DEFAULT NOW()
);
-- SQL Server
CREATE TABLE ConfigData (
config_id INT PRIMARY KEY,
config_name VARCHAR(100),
config_xml XML, -- XML data
config_json NVARCHAR(MAX) -- JSON as text
);
-- MySQL
CREATE TABLE UserPreferences (
user_id INT PRIMARY KEY,
preferences JSON -- Native JSON support
);
Geographic Data:
-- PostgreSQL with PostGIS
CREATE TABLE Locations (
location_id SERIAL PRIMARY KEY,
place_name VARCHAR(100),
coordinates GEOMETRY(POINT, 4326), -- Latitude/longitude
boundary GEOMETRY(POLYGON, 4326) -- Geographic boundary
);
-- SQL Server spatial types
CREATE TABLE GeoData (
id INT PRIMARY KEY,
location GEOGRAPHY, -- Earth-based coordinates
shape GEOMETRY -- Planar coordinates
);
UUID/GUID:
-- PostgreSQL
CREATE TABLE Sessions (
session_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id INTEGER,
created_at TIMESTAMP DEFAULT NOW()
);
-- SQL Server
CREATE TABLE Transactions (
transaction_id UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID(),
amount DECIMAL(10,2),
created_at DATETIME DEFAULT GETDATE()
);
Datatype Selection Guidelines
Best Practices for Choosing Datatypes:
1. Storage Efficiency:
-- ✅ Right-sized datatypes
CREATE TABLE Inventory (
item_id INTEGER, -- Not BIGINT for small datasets
quantity SMALLINT, -- Not INTEGER for small quantities
price DECIMAL(8,2), -- Not DECIMAL(20,10) for currency
is_active BIT -- Not INTEGER for boolean
);
-- ❌ Oversized datatypes
CREATE TABLE WastedSpace (
item_id BIGINT, -- Wastes 4 bytes per row
quantity INTEGER, -- Wastes 2 bytes per row
price DECIMAL(20,10), -- Unnecessary precision
is_active INTEGER -- Wastes 3 bytes per row
);
2. Data Validation:
-- ✅ Appropriate constraints with datatypes
CREATE TABLE Orders (
order_id INTEGER PRIMARY KEY,
order_date DATE CHECK (order_date <= GETDATE()),
ship_date DATE CHECK (ship_date >= order_date),
order_total DECIMAL(10,2) CHECK (order_total >= 0),
status VARCHAR(20) CHECK (status IN ('PENDING', 'SHIPPED', 'DELIVERED'))
);
3. Performance Considerations:
-- ✅ Indexable datatypes
CREATE TABLE Customers (
cust_id INTEGER PRIMARY KEY, -- Fast integer index
cust_code CHAR(10), -- Fixed-length, indexable
cust_name VARCHAR(100), -- Variable but reasonable length
created_date DATE -- Date indexes work well
);
-- Create appropriate indexes
CREATE INDEX idx_cust_code ON Customers(cust_code);
CREATE INDEX idx_cust_name ON Customers(cust_name);
CREATE INDEX idx_created_date ON Customers(created_date);
4. Future-Proofing:
-- ✅ Allow for growth
CREATE TABLE Products (
prod_id INTEGER, -- Room for millions of products
prod_code VARCHAR(20), -- Flexible code format
prod_name VARCHAR(200), -- Accommodate long names
prod_price DECIMAL(12,2), -- Handle expensive items
description TEXT -- Unlimited description length
);
Common Datatype Mistakes
Avoid These Common Pitfalls:
-- ❌ Wrong datatype for the data
CREATE TABLE BadChoices (
phone_number INTEGER, -- Loses formatting, leading zeros
zip_code INTEGER, -- "01234" becomes 1234
ssn INTEGER, -- Too large for INTEGER, formatting lost
price FLOAT, -- Rounding errors in currency
is_deleted VARCHAR(5), -- "true"/"false" instead of BIT
order_date VARCHAR(20) -- String instead of DATE
);
-- ✅ Correct datatype choices
CREATE TABLE GoodChoices (
phone_number VARCHAR(15), -- Preserves formatting: "(555) 123-4567"
zip_code VARCHAR(10), -- Preserves leading zeros: "01234"
ssn CHAR(11), -- Fixed format: "123-45-6789"
price DECIMAL(10,2), -- Exact currency values
is_deleted BIT, -- True boolean values
order_date DATE -- Proper date handling
);
Cross-Database Compatibility
Portable Datatype Choices:
-- Most compatible across databases
CREATE TABLE PortableTable (
id INTEGER, -- Widely supported
name VARCHAR(100), -- Standard variable text
description TEXT, -- Large text (varies in size limits)
price DECIMAL(10,2), -- Exact numeric
created_date DATE, -- Date only
is_active SMALLINT -- 0/1 for boolean (most compatible)
);
Database Migration Mapping:
| Generic Type | SQL Server | MySQL | PostgreSQL | Oracle |-----|-----|----- | Small Integer | SMALLINT | SMALLINT | SMALLINT | NUMBER(5) | Integer | INT | INT | INTEGER | NUMBER(10) | Large Integer | BIGINT | BIGINT | BIGINT | NUMBER(19) | Decimal | DECIMAL(p,s) | DECIMAL(p,s) | DECIMAL(p,s) | NUMBER(p,s) | Variable Text | VARCHAR(n) | VARCHAR(n) | VARCHAR(n) | VARCHAR2(n) | Large Text | TEXT | TEXT | TEXT | CLOB | Date | DATE | DATE | DATE | DATE | Date/Time | DATETIME | DATETIME | TIMESTAMP | DATE | Boolean | BIT | BOOLEAN | BOOLEAN | NUMBER(1)