Introduction to the SQL Server Data Profiler Task

If you need to analyze data in a SQL Server table, one of the tasks you might want to consider is profiling your data. By profiling the data, I mean looking for data patterns, such as the number of distinct values in each column or the number of rows associated with each of those distinct values. What tools are you using to perform data profiling? In this article I will explore how to use the SSIS Data Profiling Task to perform data profiling.

Data Profiling Task

Microsoft introduced an SSIS task to profile data, called "Data Profiling". It was first introduced with SQL Server 2. R2, and has been retained as an SSIS task in SQL Server 2. The Data Profiling task can be used to analyze data patterns within a SQL Server table. This analysis is useful for examining data prior to loading it into a final destination, like a data warehouse.
By analyzing data and determining the patterns in it, you can judge how clean the data is before loading it into the data warehouse. By running a profiling task on incoming data, you can verify that the new data meets the quality you expect before loading it into its final location. If the data doesn’t meet that quality, data profiling gives you the basis to reject the incoming data. The Data Profiling task within SSIS can look at your data using eight different profiles. Five of these profiles analyze data at an individual column level, while the other three analyze multiple columns and/or relationships between columns.
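To make the idea of profiling as a quality gate concrete, here is a minimal Python sketch. It is not how the SSIS task is implemented; it only illustrates two of the measurements described above (the share of rows per distinct value, and the ratio of nulls) on a small in-memory batch. The function names, the sample rows, and the 10% threshold are all assumptions made for illustration.

```python
from collections import Counter


def value_distribution(rows, column):
    """Return {value: fraction of rows} for one column (None is counted as a value)."""
    total = len(rows)
    if total == 0:
        return {}
    counts = Counter(row.get(column) for row in rows)
    return {value: n / total for value, n in counts.items()}


def null_ratio(rows, column):
    """Return the fraction of rows whose value in `column` is None."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(column) is None) / len(rows)


def accept_batch(rows, column, max_null_ratio):
    """Accept the batch only if the null ratio stays within the allowed limit."""
    return null_ratio(rows, column) <= max_null_ratio


# Hypothetical incoming batch; column names are illustrative only.
incoming = [
    {"State": "WA", "Zip": "98052"},
    {"State": "WA", "Zip": None},
    {"State": "OR", "Zip": "97201"},
    {"State": "WA", "Zip": "98101"},
]

print(value_distribution(incoming, "State"))
print(null_ratio(incoming, "Zip"))
print(accept_batch(incoming, "Zip", max_null_ratio=0.10))
```

In this sketch one of the four rows has no Zip value, so a 10% limit on missing codes would cause the batch to be rejected before it reaches the warehouse.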
Table 1 describes the eight different profiles, as documented in Microsoft’s Books Online. The following five profiles analyze individual columns.

Profiles that analyze individual columns

Column Length Distribution Profile: Reports all the distinct lengths of string values in the selected column and the percentage of rows in the table that each length represents. This profile helps you identify problems in your data, such as values that are not valid. For example, you profile a column of United States state codes that should be two characters and discover values longer than two characters.

Column Null Ratio Profile: Reports the percentage of null values in the selected column. This profile helps you identify problems in your data, such as an unexpectedly high ratio of null values in a column. For example, you profile a Zip Code/Postal Code column and discover an unacceptably high percentage of missing codes.

Column Pattern Profile: Reports a set of regular expressions that cover the specified percentage of values in a string column. This profile helps you identify problems in your data, such as strings that are not valid. This profile can also suggest regular expressions that can be used in the future to validate new values. For example, a pattern profile of a United States Zip Code column might produce the regular expressions \d{5}-\d{4}, \d{5}, and \d{9}. If you see other regular expressions, your data likely contains values that are not valid or in an incorrect format.

Column Statistics Profile: Reports statistics, such as minimum, maximum, average, and standard deviation for numeric columns, and minimum and maximum for datetime columns. This profile helps you identify problems in your data, such as dates that are not valid. For example, you profile a column of historical dates and discover a maximum date that is in the future.

Column Value Distribution Profile: Reports all the distinct values in the selected column and the percentage of rows in the table that each value represents. Can also report values that represent more than a specified percentage of rows in the table. This profile helps you identify problems in your data, such as an incorrect number of distinct values in a column. For example, you profile a column that is supposed to contain states in the United States and discover more distinct values than there are states.

The following three profiles analyze multiple columns or relationships between columns and tables.

Profiles that analyze multiple columns

Candidate Key Profile: Reports whether a column or set of columns is a key, or an approximate key, for the selected table. This profile also helps you identify problems in your data, such as duplicate values in a potential key column.

Functional Dependency Profile: Reports the extent to which the values in one column (the dependent column) depend on the values in another column or set of columns (the determinant column). This profile also helps you identify problems in your data, such as values that are not valid. For example, you profile the dependency between a column that contains United States Zip Codes and a column that contains states in the United States. The same Zip Code should always have the same state, but the profile discovers violations of this dependency.

Value Inclusion Profile: Computes the overlap in the values between two columns or sets of columns. This profile can determine whether a column or set of columns is appropriate to serve as a foreign key between the selected tables. This profile also helps you identify problems in your data, such as values that are not valid. For example, you profile the ProductID column of a Sales table and discover that the column contains values that are not found in the ProductID column of the Products table.
Table 1: Different Profiles Available in SSIS

To profile your data you need to build an SSIS package. In this package you identify your data sources, an output file, and the different profiles you want to run against your data sources. You can do all of this through the Data Profiling Task. The output of the Data Profiling Task is an XML file, which can be viewed graphically using the Data Profile Viewer. The Data Profile Viewer can be launched independently as an executable, or from within the Data Profiling Task Editor with the click of a button. Using the Data Profile Viewer you can examine every profile you ran and then drill down on specific items within the profile output to review the set of records associated with them. Rather than describe how this works, let me show you with an example.
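As a short aside before the SSIS walkthrough, the checks behind several of the profiles in Table 1 can be sketched in plain Python to make their definitions concrete. This is only an approximation under stated assumptions: the real Column Pattern profile derives regular expressions from the data, whereas this sketch validates values against the three Zip Code expressions quoted in Table 1; and the real Candidate Key profile can report approximate keys, whereas this sketch only tests for an exact key. All function names, column names, and sample rows are hypothetical.

```python
import re
from collections import defaultdict

# The three Zip Code expressions quoted in Table 1.
ZIP_PATTERNS = [re.compile(p) for p in (r"\d{5}-\d{4}", r"\d{5}", r"\d{9}")]


def pattern_violations(values):
    """Return the values that fully match none of the expected patterns."""
    return [v for v in values if not any(p.fullmatch(v) for p in ZIP_PATTERNS)]


def is_candidate_key(rows, columns):
    """True if the combination of `columns` is unique across all rows."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return len(set(keys)) == len(keys)


def dependency_violations(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}


def missing_references(child_rows, child_col, parent_rows, parent_col):
    """Return child values that do not appear in the parent column."""
    parent_values = {row[parent_col] for row in parent_rows}
    return sorted({row[child_col] for row in child_rows} - parent_values)


# Hypothetical sample data.
addresses = [
    {"AddressID": 1, "Zip": "98052", "State": "WA"},
    {"AddressID": 2, "Zip": "98052", "State": "OR"},
    {"AddressID": 3, "Zip": "97201-1234", "State": "OR"},
    {"AddressID": 4, "Zip": "9720", "State": "OR"},
]
products = [{"ProductID": 1}, {"ProductID": 2}]
sales = [{"SaleID": 10, "ProductID": 1}, {"SaleID": 11, "ProductID": 3}]

print(pattern_violations([row["Zip"] for row in addresses]))
print(is_candidate_key(addresses, ["AddressID"]), is_candidate_key(addresses, ["Zip"]))
print(dependency_violations(addresses, "Zip", "State"))
print(missing_references(sales, "ProductID", products, "ProductID"))
```

Each function returns the offending values rather than a single pass/fail flag, which mirrors the way the profile output lets you drill down to the rows behind a reported problem.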
Data to Analyze

As previously stated, the Data Profiling Task only works against data loaded into a SQL Server table. The Data Profiling Task can be run against any SQL Server data table that resides in a SQL Server 2. Therefore, in order to demo the Data Profiler Task, I need some data to analyze. I will be using the AdventureWorks2008R2 database for my demonstration. If you want to follow along you can download the same database from the following location: http://msftdbprodsamples. Once you have downloaded this database you will have to attach it to your SQL Server database engine. In my example I will be using the Visual Studio 2. Shell that was installed with the SQL Server Data Tools, as part of my SQL Server 2. Data Profiler Task. I will be running my data profiling against the AdventureWorks2008R2 database that resides within my SQL Server 2.

Defining Properties for a Data Profiling Task

To build my data profiling SSIS package I first need to open my Visual Studio 2. Integration Services project. Once my new project opens, I can drag the "Data Profiling Task" from the toolbox to the Control Flow area, as shown in Figure 1.

Figure 1: Data Profiling Task in Control Flow.

The next step is to drill into the Data Profiling Task and start defining its properties. To do this, I double-click the "Data Profiling Task". The Data Profiling Task Editor is then displayed, as in Figure 2.

Figure 2: Data Profiler Task Editor.

In the window displayed in Figure 2 you can see a number of different options in the left pane, while the properties for the General item are shown in the right pane of the Data Profiler Task Editor window. In the right pane I can set the General item's properties to identify where to store the output of the Data Profiling Task. The output will be an XML file that contains profile information about the table I will be profiling. I want my profile information to go to a file named C:\temp\ProfileDemo1.xml. To identify this location to the Data Profiling Task Editor, I click the cell next to the "Destination" label in Figure 2. This brings up a down arrow. When I select it, a drop-down list appears, and I then select the "<New File connection…>" item. When I do this, a File Connection Manager Editor window is displayed. The File Connection Manager Editor defaults the "Usage Type" to "Existing File". Since my XML file doesn’t exist yet, I need to use the drop-down menu to select a usage type of "Create File", and then type the name of my file in the "File" text box. When I’m done specifying my location, my File Connection Manager Editor window looks like the window shown in Figure 3.

Figure 3: Identifying Connection for my XML file location.

To finish creating the new connection to my XML file I just need to click the "OK" button. When I do this, I am taken back to the Data Profiling Task Editor window. The next step in setting up my Data Profiling Task is to identify the table or tables I want to profile, and the profiles I want to use to analyze those tables. There are two different ways to do this. One way is to click the "Profile Requests" item in the left pane and then identify, one by one, the different profiles I want to run against a specific table or view. The other is to use the "Quick Profile…" button to identify a number of profiles to run against a single table or view. In this article I will show you how to use the "Quick Profile…" button to identify the different profiles I want to run.
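Because the task's output is an ordinary XML file, it can also be inspected outside the Data Profile Viewer. The Python sketch below makes no assumptions about the real schema of that file: it simply counts elements by name, which is a reasonable first look at any unfamiliar XML document. The sample document in the code is a hypothetical stand-in, and its element names and namespace are invented for illustration, not taken from the actual profile output.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_elements(xml_text):
    """Count elements by local name, ignoring any XML namespace prefix."""
    root = ET.fromstring(xml_text)
    # ElementTree renders namespaced tags as "{namespace}name"; keep only "name".
    return Counter(elem.tag.rsplit("}", 1)[-1] for elem in root.iter())


# Hypothetical stand-in document; these element names are NOT the real schema.
sample = """<Profiles xmlns="urn:example:hypothetical">
  <Profile name="NullRatio"><Column>Zip</Column></Profile>
  <Profile name="ValueDistribution"><Column>State</Column></Profile>
</Profiles>"""

counts = summarize_elements(sample)
for name, n in sorted(counts.items()):
    print(name, n)
```

For a file on disk, `ET.parse(path).getroot()` can be used in place of `ET.fromstring`, so the same counting logic applies to the XML file written by the package once it has run.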