Tuesday, January 4, 2011

SQL Server 2008 Architecture


Components of the SQL Server Engine
Figure 1-1 shows the general architecture of SQL Server, which has four major components (three of whose subcomponents are listed): protocols, the relational engine (also called the Query Processor), the storage engine, and the SQLOS. Every batch submitted to SQL Server for execution, from any client application, must interact with these four components. (For simplicity, I’ve made some minor omissions and simplifications and ignored certain “helper” modules among the subcomponents.)
The protocol layer receives the request and translates it into a form that the relational engine can work with, and it also takes the final results of any queries, status messages, or error messages and translates them into a form the client can understand before sending them back to the client. The relational engine layer accepts SQL batches and determines what to do with them. For Transact-SQL queries and programming constructs, it parses, compiles, and optimizes the request and oversees the process of executing the batch. As the batch is executed, if data is needed, a request for that data is passed to the storage engine. The storage engine manages all data access, both through transaction-based commands and bulk operations such as backup, bulk insert, and certain DBCC (Database Consistency Checker) commands. The SQLOS layer handles activities that are normally considered to be operating system responsibilities, such as thread management (scheduling), synchronization primitives, deadlock detection, and memory management, including the buffer pool.
Protocols
When an application communicates with the SQL Server Database Engine, the application programming interfaces (APIs) exposed by the protocol layer formats the communication using a Microsoft-defined format called a tabular data stream (TDS) packet. There are Net-Libraries on both the server and client computers that encapsulate the TDS packet inside a standard communication protocol, such as TCP/IP or Named Pipes. On the server side of the communication, the Net-Libraries are part of the Database Engine, and that protocol layer is illustrated in Figure 1-1. On the client side, the Net-Libraries are part of the SQL Native Client. The configuration of the client and the instance of SQL Server determine which protocol is used.
SQL Server can be configured to support multiple protocols simultaneously, coming from different clients. Each client connects to SQL Server with a single protocol. If the client program does not know which protocols SQL Server is listening on, you can configure the client to attempt multiple protocols sequentially. The following protocols are available:
• Shared Memory The simplest protocol to use, with no configurable settings. Clients using the Shared Memory protocol can connect only to a SQL Server instance running on the same computer, so this protocol is not useful for most database activity. Use this protocol for troubleshooting when you suspect that the other protocols are configured incorrectly. Clients using MDAC 2.8 or earlier cannot use the Shared Memory protocol. If such a connection is attempted, the client is switched to the Named Pipes protocol.
• Named Pipes A protocol developed for local area networks (LANs). A portion of memory is used by one process to pass information to another process, so that the output of one is the input of the other. The second process can be local (on the same computer as the first) or remote (on a networked computer).
• TCP/IP The most widely used protocol over the Internet. TCP/IP can communicate across interconnected networks of computers with diverse hardware architectures and operating systems. It includes standards for routing network traffic and offers advanced security features. Enabling SQL Server to use TCP/IP requires the most configuration effort, but most networked computers are already properly configured.
• Virtual Interface Adapter (VIA) A protocol that works with VIA hardware. This is a specialized protocol; configuration details are available from your hardware vendor.
Tabular Data Stream Endpoints
SQL Server 2005 also introduces a new concept for defining SQL Server connections: the connection is represented on the server end by a TDS endpoint. During setup, SQL Server creates an endpoint for each of the four Net-Library protocols supported by SQL Server, and if the protocol is enabled, all users have access to it. For disabled protocols, the endpoint still exists but cannot be used. An additional endpoint is created for the dedicated administrator connection (DAC), which can be used only by members of the sysadmin fixed server role. (I’ll discuss the DAC in more detail in configuration chapter.)
The Relational Engine
As mentioned earlier, the relational engine is also called the query processor. It includes the components of SQL Server that determine exactly what your query needs to do and the best way to do it. By far the most complex component of the query processor, and maybe even of the entire SQL Server product, is the query optimizer, which determines the best execution plan for the queries in the batch.
The relational engine also manages the execution of queries as it requests data from the storage engine and processes the results returned. Communication between the relational engine and the storage engine is generally in terms of OLE DB row sets. (Row set is the OLE DB term for a result set.) The storage engine comprises the components needed to actually access and modify data on disk.
The Command Parser
The command parser handles Transact-SQL language events sent to SQL Server. It checks for proper syntax and translates Transact-SQL commands into an internal format that can be operated on. This internal format is known as a query tree. If the parser doesn’t recognize the syntax, a syntax error is immediately raised that identifies where the error occurred. However, non-syntax error messages cannot be explicit about the exact source line that caused the error. Because only the command parser can access the source of the statement, the statement is no longer available in source format when the command is actually executed.
The Query Optimizer
The query optimizer takes the query tree from the command parser and prepares it for execution. Statements that can’t be optimized, such as flow-of-control and DDL commands, are compiled into an internal form. The statements that are optimizable are marked as such and then passed to the optimizer. The optimizer is mainly concerned with the DML statement SELECT, INSERT, UPDATE, and DELETE, which can be processed in more than one way, and it is the optimizer’s job to determine which of the many possible ways is the best. It compiles an entire command batch, optimizes queries that are optimizable, and checks security. The query optimization and compilation result in an execution plan.
The first step in producing such a plan is to normalize each query, which potentially breaks down a single query into multiple, fine-grained queries. After the optimizer normalizes a query, it optimizes it, which means it determines a plan for executing that query. Query optimization is cost based; the optimizer chooses the plan that it determines would cost the least based on internal metrics that include estimated memory requirements, CPU utilization, and number of required I/Os. The optimizer considers the type of statement requested, checks the amount of data in the various tables affected, looks at the indexes available for each table, and then looks at a sampling of the data values kept for each index or column referenced in the query. The sampling of the data values is called distribution statistics. Based on the available information, the optimizer considers the various access methods and processing strategies it could use to resolve a query and chooses the most cost-effective plan.
The SQL Manager
The SQL manager is responsible for everything related to managing stored procedures and their plans. It determines when a stored procedure needs recompilation, and it manages the caching of procedure plans so that other processes can reuse them.
The SQL manager also handles auto parameterization of queries. In SQL Server 2008, certain kinds of ad hoc queries are treated as if they were parameterized stored procedures, and query plans are generated and saved for them. SQL Server can save and reuse plans in several other ways, but in some situations using a saved plan might not be a good idea.
The Database Manager
The database manager handles access to the metadata needed for query compilation and optimization, making it clear that none of these separate modules can be run completely separately from the others. The metadata is stored as data and is managed by the storage engine, but metadata elements such as the data types of columns and the available indexes on a table must be available during the query compilation and optimization phase, before actual query execution starts.
The Query Executor
The query executor runs the execution plan that the optimizer produced, acting as a dispatcher for all the commands in the execution plan. This module steps through each command of the execution plan until the batch is complete. Most of the commands require interaction with the storage engine to modify or retrieve data and to manage transactions and locking.
The Storage Engine
The SQL Server storage engine has traditionally been considered to include all the components involved with the actual processing of data in your database. SQL Server 2005 separates out some of these components into a module called the SQLOS. In fact, the SQL Server storage engine team at Microsoft actually encompasses three areas: access methods, transaction management, and the SQLOS.
Transaction Services
A core feature of SQL Server is its ability to ensure that transactions are atomic–that is, all or nothing. In addition, transactions must be durable, which means that if a transaction has been committed, it must be recoverable by SQL Server no matter what–even if a total system failure occurs 1 millisecond after the commit was acknowledged. There are actually four properties that transactions must adhere to, called the ACID properties: atomicity, consistency, isolation, and durability.
Locking Operations Locking is a crucial function of a multi-user database system such as SQL Server, even if you are operating primarily in the snapshot isolation level with optimistic concurrency. SQL Server lets you manage multiple users simultaneously and ensures that the transactions observe the properties of the chosen isolation level. Even though readers will not block writers and writers will not block readers in snapshot isolation, writers do acquire locks and can still block other writers, and if two writers try to change the same data concurrently, a conflict will occur that must be resolved. The locking code acquires and releases various types of locks, such as share locks for reading, exclusive locks for writing, intent locks taken at a higher granularity to signal a potential “plan” to perform some operation, and extent locks for space allocation. It manages compatibility between the lock types, resolves deadlocks, and escalates locks if needed. The locking code controls table, page, and row locks as well as system data locks.
The SQLOS
Whether the components of the SQLOS layer are actually part of the storage engine depends on whom you ask. In addition, trying to figure out exactly which components are in the SQLOS layer can be rather like herding cats. I have seen several technical presentations on the topic at conferences and have exchanged e-mail and even spoken face to face with members of the product team, but the answers vary. The manager who said he was responsible for the SQLOS layer defined the SQLOS as everything he was responsible for, which is a rather circular definition. Earlier versions of SQL Server have a thin layer of interfaces between the storage engine and the actual operating system through which SQL Server makes calls to the OS for memory allocation, scheduler resources, thread and worker management, and synchronization objects. However, the services in SQL Server that needed to access these interfaces can be in any part of the engine. SQL Server requirements for managing memory, schedulers, synchronization objects, and so forth have become more complex. Rather than each part of the engine growing to support the increased functionality, all services in SQL Server that need this OS access have been grouped together into a single functional unit called the SQLOS. In general, the SQLOS is like an operating system inside SQL Server. It provides memory management, scheduling, IO management, a framework for locking and transaction management, deadlock detection, and general utilities for dumping, exception handling, and so on.
Another member of the product team described the SQLOS to me as a set of data structures and APIs that could potentially be needed by operations running at any layer of the engine. For example, consider various operations that require use of memory. SQL Server doesn’t just need memory when it reads in data pages through the storage engine; it also needs memory to hold query plans developed in the query processor layer. Figure 1-1 (shown earlier) depicts the SQLOS layer in several parts, but this is just a way of showing that many SQL Server components use SQLOS functionality.

What’s New in SQL Server 2008?

What’s New in SQL Server 2008?

New in SQL Server Installation
SQL Server 2008 has new Setup architecture for the following scenarios: installation, upgrade, maintenance, failover clustering, and command prompt installations.
The SQL Server Installation Wizard is Windows Installer-based. It provides a single feature tree for installation of all SQL Server components, so you do not have to install the following components individually:
• Database Engine
• Analysis Services
• Reporting Services
• Integration Services
• Replication
• Management tools
• Connectivity components
• Sample databases, samples, and SQL Server Books Online
New in SQL Server Database Engine:
This latest release of the SQL Server Database Engine introduces new features and enhancements that increase the power and productivity of architects, developers, and administrators who design, develop, and maintain data storage systems.
These are the areas in which the Database Engine has been enhanced.
Topic Description
Availability Enhancements (Database Engine) The availability of Microsoft SQL Server 2008 databases is improved by enhancements to database mirroring. Database mirroring enables the creation of hot standby servers that provide rapid failover support with no loss of data from committed transactions.
Manageability Enhancements (Database Engine) Manageability of the SQL Server 2008 Database Engine is simplified by enhancements to tools and monitoring features.
Programmability Enhancements (Database Engine) Programmability enhancements in the Database Engine include new data storage features, new data types, new full-text search architecture, and numerous improvements and additions to Transact-SQL.
Scalability and Performance Enhancements (Database Engine) Scalability and performance enhancements in the Database Engine include filtered indexes and statistics, new table and query hints, and new query performance and query processing features.
Security Enhancements (Database Engine) Security enhancements in the Database Engine include new encryption functions, the transparent data encryption and extensible key management features, and a clarification of DES algorithms.

What's New (Replication?)

Replication Monitor includes the following usability improvements:
• In most Replication Monitor grids, you can now do the following: select which columns to view; sort by multiple columns; and filter rows in the grid based on column values.

To access this functionality: right-click a grid, and then select Choose Columns to Show, Sort, Filter, or Clear Filter. Filter settings are specific to each grid. Column selection and sorting are applied to all grids of the same type, such as the publications grid for each Publisher.
• The Common Jobs tab for the Publisher node has been renamed to Agents. The Agents tab now provides a centralized location to view information about all the agents and jobs that are associated with publications at the selected Publisher. Agents and jobs that are associated with publications include the following:
• The Snapshot Agent, which is used by all publications.
• The Log Reader Agent, which is used by all transactional publications.
• The Queue Reader Agent, which is used by transactional publications that are enabled for queued updating subscriptions.
• Maintenance jobs, which are used by all publications.
The Distribution Agent and Merge Agent are associated with subscriptions to publications.

Sql Server DBA

Introduction to SQL Server 2008
What is SQL Server 2008/RDBMS?
As you most likely know, SQL Server 2008 is primarily thought of as a Relational Database Management System (RDBMS). It is certainly that, but it is also much more.
SQL Server 2008 can be more accurately described as an Enterprise Data Platform. It offers many new features and even more enhanced or improved features from previous editions of the product. In addition to traditional RDBMS duty, SQL Server 2008 also provides rich reporting capabilities, powerful data analysis, and data mining, as well as features that support asynchronous data applications, data-driven event notification, and more.
Database Engine
The Database Engine is the primary component of SQL Server 2008. It is the Online Transaction Processing (OLTP) engine for SQL Server, and has been improved and enhanced tremendously in this version. The Database Engine is a high-performance component responsible for the efficient storage, retrieval, and manipulation of relational and Extensible Markup Language (XML) formatted data.
SQL Server 2008’s Database Engine is highly optimized for transaction processing, but offers exceptional performance in complex data retrieval operations. The Database Engine is also responsible for the controlled access and modification of data through its security subsystem. SQL Server 2008’s Database Engine has many major improvements to support scalability, availability, and advanced (and secure) programming objects.
Analysis Services
Analysis Services delivers Online Analytical Processing (OLAP) and Data Mining functionality for business intelligence applications. As its name suggests, Analysis Services provides a very robust environment for the detailed analysis of data. It does this through user-created, multidimensional data structures that contain de-normalized and aggregated data from diverse data sources (such as relational databases, spreadsheets, flat files, and even other multidimensional sources).
Reporting Services
Reporting Services is a Web service–based solution for designing, deploying, and managing flexible, dynamic Web-based reports, as well as traditional paper reports. These reports can contain information from virtually any data source. Because Reporting Services is implemented as a Web service, it must be installed on a server with Internet Information Services (IIS). However, IIS does not have to be installed on a SQL Server. The Reporting Services databases are hosted on SQL Server 2008, but the Web service itself can be configured on a separate server.
Integration Services
SQL Server Integration Services (SSIS) is Microsoft’s new enterprise class data Extract, Transform, and Load (ETL) tool. SSIS is a completely new product built from the ashes of SQL Server 2000’s Data Transformation Services (DTS). SSIS offers a much richer feature set and the ability to create much more powerful and flexible data transformations than its predecessor. This huge improvement, however, is not without a cost. SSIS is a fairly complex tool and offers a completely different design paradigm than DTS. Database administrators adept at the former tool are very often intimidated and frustrated by the new SSIS. Their biggest mistake is in thinking that Integration Services would just be an upgrade of Data Transformation Services.
Replication Services
SQL Server 2008 Replication Services provides the ability to automate and schedule the copying and distribution of data and database objects from one database or server to another, while ensuring data integrity and consistency. Replication has been enhanced in SQL Server 2008 to include true Peer-to-Peer replication, replication over HTTP, the ability to replicate schema changes, and, very interestingly, the ability to configure an Oracle server as a replication publisher.
Multiple Instances
SQL Server 2008 provides the capability of installing multiple instances of the database application on a single computer. Depending on the edition of SQL Server being installed, up to 50 instances can be installed. This feature allows for one high-performance server to host multiple instances of the SQL Server services, each with its own configuration and databases. Each instance can be managed and controlled separately with no dependency on each other.
Database Mail
In the past SQL Server relied on a Messaging Application Programming Interface (MAPI) mail client configured on the server to facilitate email and pager notification for administrative and programmatic purposes. What this essentially meant was that to fully utilize administrative notifications, the administrator needed to install Outlook or some other MAPI-compliant client on the server, and then create a mail profile for the service account to use.
Many organizations wanted to take advantage of the SQL Server Agent’s ability to send job and event notification via email but were unwilling to install unnecessary and potentially risky software on production server assets. The SQL Server 2008 Database Mail feature removes this requirement by supporting Simple Mail Transfer Protocol (SMTP) for all mail traffic. In addition, multiple mail profiles can be created in the database to support different database applications.
SQL Server 2008 Services
SQL Server runs as a service. In fact, it runs as several services if all the different features of the product are installed. It is important to know what service is responsible for what part of the application so that each service can be configured correctly, and so that unneeded services can be disabled to reduce the overhead on the server and reduce the surface area of SQL Server.
MSSQLServer (SQL Server)
The MSSQLServer service is the database engine. To connect and transact against a SQL Server 2008 database, the MSSQLServer service must be running. Most of the functionality and storage features of the database engine are controlled by this service.
The MSSQLServer service can be configured to run as the local system or as a domain user. If installed on Windows Server 2003, it can also be configured to run under the Network System account.
SQLServerAgent (SQL Server Agent)
This service is responsible for the execution of scheduled jobs such as scheduled backups, import/export jobs, and Integration Services packages. If any scheduled tasks require network or file system access, the SQLServerAgent service’s credentials are typically used.
The SQLServerAgent service is dependent on the MSSQLServer service. During installation, the option is given to configure both services with the same credentials. Although this is by no means required, it is common practice. A frequent problem encountered by database administrators is that jobs that work perfectly when run manually fail when run by the agent. The reason for the failure is because the account that is used when testing the job manually is the logged-in administrator, but when the job is executed by the agent, the account the agent is running under does not have adequate permissions.
MSSQLServerOLAPService (SQL Server Analysis Services)
MSSQLServerOLAPService is the service that Analysis Services runs under. Analysis Services provides the services and functionality to support all of SQL Server 2008’s OLAP needs, as well as the new data mining engine included with SQL Server 2008.
SQLBrowser (SQL Server Browser)
The SQLBrowser service is used by SQL Server for named instance name resolution and server name enumeration over TCP/IP and VIA networks.
The default instance of SQL Server is assigned the TCP port 1433 by default to support client communication. However, because more than one application cannot share a port assignment, any named instances are given a random port number when the service is started. This random port assignment makes it difficult for clients to connect to it, because the client applications don’t know what port the server is listening on. To meet this need, the SQLBrowser service was created.
MSDTSServer (SQL Server Integration Services)
The MSDTSServer service provides management and storage support for SSIS. Although this service is not required to create, store, and execute SSIS packages, it does allow for the monitoring of SSIS package execution and displaying of a hierarchical view of SSIS packages and folders that are stored in different physical locations.
ReportServer (SQL Server Reporting Services)
The ReportServer service is the process in which Reporting Services runs. The service is accessible as a Web service and provides for report rendering, creation, management, and deploying.
MSDTC (Distributed Transaction Coordinator)
The MSDTC service is used to manage transactions that span more than one instance of SQL Server or an instance of SQL Server and another transaction-based system. It utilizes a protocol known as Two-Phased Commit (2PC) to ensure that all transactions that span systems are committed on all participating systems.
SQL Server 2008 Database Objects
SQL Server 2008 database objects are defined and exist within a defined scope and hierarchy. This hierarchy enables more control over security permissions and organization of objects by similar function. SQL Server 2008 objects are defined at the Server, Database, and Schema levels.
Server
The server scope encompasses all the objects that exist on the instance of SQL Server, regardless of their respective database or namespace. The database object resides within the server scope.
We can install multiple instances of the SQL Server 2008 Data Platform application on a single computer running a Windows operating system.
Database
The database scope defines all the objects within a defined database catalog. Schemas exist in the database scope.
Schema
Each database can contain one or more schemas. A schema is a namespace for database objects. All data objects in a SQL Server 2008 database reside in a specific schema.
Object Names
Every object in a SQL Server 2008 database is identified by a four-part, fully qualified name. This fully qualified name takes the form of server.database.schema.object. However, when referring to objects, the fully qualified name can be abbreviated. By omitting the server name SQL Server will assume the instance the connection is currently connected to. Likewise, omitting the database name will cause SQL Server to assume the existing connection’s database context.
SQL Server 2008 Databases
There are two types of databases in SQL Server: system databases and user databases. The system databases are used to store system-wide data and metadata. User databases are created by users who have the appropriate level of permissions to store application data.
System Databases
The system databases are comprised of Master, Model, MSDB, TempDB, and the hidden Resource database. If the server is configured to be a replication distributor, there will also be at least one system distribution database that is named during the replication configuration process.
The Master Database
The Master database is used to record all server-level objects in SQL Server 2008. This includes Server Logon accounts, Linked Server definitions, and EndPoints. The Master database also records information about all the other databases on the server (such as their file locations and names). Unlike its predecessors, SQL Server 2008 does not store system information in the Master database, but rather in the Resource database. However, system information is logically presented as the SYS schema in the Master database.
The Model Database
The Model database is a template database. Whenever a new database is created (including the system database TempDB), a copy of the Model database is created and renamed with the name of the database being created. The advantage of this behavior is that objects can be placed in the Model database prior to the creation of any new database and, when the database is created, the objects will appear in the new database.
The MSDB Database
I mostly think of the MSDB database as the SQL Server Agent’s database. That’s because the SQL Server Agent uses the MSDB database extensively for the storage of automated job definitions, job schedules, operator definitions, and alert definitions.
The TempDB Database
The TempDB database is used by SQL Server to store data temporarily. The TempDB database is used extensively during SQL Server operations, so careful planning and evaluation of its size and placement are critical to ensure efficient SQL Server database operations.
The TempDB database is used by the Database Engine to store temporary objects (such as temporary tables, views, cursors, and table-valued variables) that are explicitly created by database programmers. In addition, the TempDB database is used by the SQL Server database engine to store work tables containing intermediate results of a query prior to a sort operation or other data manipulation.
The Resource Database
The last system database is the Resource database. The Resource database is a read-only database that contains all the system objects used by an instance of SQL Server. The Resource database is not accessible during normal database operations. It is logically presented as the SYS schema in every database. It contains no user data or metadata. Instead, it contains the structure and description of all system objects. This design enables the fast application of service packs by just replacing the existing Resource database with a new one. As an added bonus, to roll back a service pack installation, all you have to do is replace the new Resource database with the old one. This very elegant design replaces the older method of running many scripts that progressively dropped and added new system objects.
User Databases
User databases are simply that: databases created by users. They are created to store data used by data applications and are the primary purpose of having a database server.
Distribution Databases
The distribution database stores metadata and transactional history to support all types of replication on a SQL Server. Typically, one distribution database is created when configuring a SQL Server as a replication Distributor. However, if needed, multiple distribution databases can be configured.
A model distribution database is installed by default and is used in the creation of a distribution database used in replication. It is installed in the same location as the rest of the system databases and is named distmdl.mdf.