
Dynamically Generate SQL Server BCP Format Files


By: Jeffrey Yao

Problem

I frequently need to import data from CSV files into various databases on different systems using bcp.exe or BULK INSERT. Sometimes I need to import only a few fields of a file into one table, and a few other fields of the same file into columns of another table (on another system), even though the CSV file has 30+ fields. I usually have to generate a format file with bcp and then modify it to suit my needs. This process is tiresome and error-prone, especially when the number of source file fields is large, say 40+. Is there an easier way to address this issue? One thing to add is that all my CSV files are pure ASCII files.

Solution

In SQL Server Books Online (BOL), there is a detailed example of using a format file to map table columns to data file fields. I personally do not like to use XML format files, for two reasons stated in BOL and shown below (see the section "Using an XML Format File" in BOL for more info).

You cannot skip a column when you are using the bcp command or a BULK INSERT statement to import data directly.
You can use INSERT ... SELECT ... FROM OPENROWSET(BULK ...) with an XML format file, but you need to explicitly provide the column names in the SELECT list, which is an unnecessary overhead "cost" from a coding perspective.

So for these reasons, I prefer a non-XML format file.

The detailed format file structure specification can be found in Microsoft Books Online. The following image is copied from that page.



Figure 1 - Format Fields for Sample Non-XML Format File

I just want to make a few points here. The assumption is that the format file describes an ASCII delimited source data file, not a source file of SQL Server native data.

The rows in the format file do not necessarily need to be vertically aligned.
The blank spaces between fields in the format file are flexible, i.e. you can have [X] blank spaces, where X can be any number of 1 or more.
The [Server column name] field is not important in that you can put a fake name there, but the [Server column order] field is important.
Since the fields in my CSV source files are not fixed-length, I can set the [Host file data length] and [Prefix length] fields to 0.

Now to generate a non-XML format file, here is the "algorithm" (please refer to Figure 1 for component names in the format file).

[Version] can be set to any number >= 9.0, so any bcp utility of SQL Server 2005+ can use this format file. I will set it to 10.0 as I do not have any SQL Server 2005 in my environment.
[Number of columns] = # of fields in the data source file. We will calculate the # of fields by reading the first line of the source file.
[Host file field order] = 1 to [Number of columns].
[Host file data type] = 'SQLCHAR'. This is a fixed value as we are dealing with ASCII data files only.
[Prefix length] = 0, as we are dealing with ASCII data files only.
[Host file data length] = 0, as said in BOL: "If you are creating a non-XML format file for a delimited text file, you can specify 0 for the host file data length of every data field."
[Terminator] = value from an input parameter, such as "|" or comma ",".
[Server column order]: 0 = the column is ignored, N = the Nth column of the destination table.
[Server column name]: the target table's column name. This does not seem to be used by the bcp utility; instead, bcp uses [Server column order] to determine the column position.
[Column collation]: only used for columns with the char or varchar data type, defaults to the collation setting of the database.
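The rules above can be sketched in a few lines of PowerShell. This is only a minimal illustration under stated assumptions, not the function presented in this tip: the helper name New-FormatFileText and its -ColumnOrder parameter (column name to ordinal position, supplied by the caller instead of being looked up from the target table) are hypothetical, the row terminator is assumed to be "\r\n", unmapped fields get a made-up placeholder name, and the collation field is simply left as "".

```powershell
# Hypothetical helper (not the article's function): builds the text of a
# non-XML format file from values supplied by the caller.
function New-FormatFileText {
    param (
        [int]       $FieldCount,              # number of fields in the data file
        [hashtable] $Mapping,                 # field position -> column name
        [hashtable] $ColumnOrder,             # column name -> ordinal in target table (assumed input)
        [string]    $FieldTerminator = '|',
        [string]    $Version = '10.0',
        [string]    $RowTerminator = '\r\n'   # literal backslash sequences, written as-is into the file
    )

    # First two rows: [Version] and [Number of columns]
    $lines = @($Version, "$FieldCount")

    for ($i = 1; $i -le $FieldCount; $i++) {
        # The last field is terminated by the row terminator, all others by the field terminator
        $term = if ($i -eq $FieldCount) { $RowTerminator } else { $FieldTerminator }

        if ($Mapping.ContainsKey($i)) {
            $name  = $Mapping[$i]
            $order = $ColumnOrder[$name]
        }
        else {
            $name  = "skip$i"   # placeholder name for an unmapped field
            $order = 0          # 0 = the field is ignored
        }

        # Field order, data type, prefix length, data length, terminator,
        # server column order, server column name, collation
        $lines += ('{0} SQLCHAR 0 0 "{1}" {2} {3} ""' -f $i, $term, $order, $name)
    }

    # End the last row with a newline as well
    ($lines -join "`r`n") + "`r`n"
}

# Example: a 5-field file where fields 2, 4 and 5 feed columns 1, 2 and 3
$text = New-FormatFileText -FieldCount 5 `
    -Mapping @{2='account'; 4='balance'; 5='credit'} `
    -ColumnOrder @{account=1; balance=2; credit=3}
$text
```

Passing the column ordinals in by hand keeps the sketch free of any database connection; a real implementation would more likely read them from the target table.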

We also need to design the input parameters for generating the desired non-XML format file, which is actually simple and straightforward.

Here are the parameters:

[ServerInstance]: target SQL Server instance for data importation, default is the current machine name
[Database]: target database for data importation
[Schema]: schema name of the target table for data importation
[Table]: target table name
[FieldTerminator]: separator for fields in the source data file, defaults to '|', can be anything such as ';' or ',' or '#'
[Mapping]: field sequence number mapping to table column name in the format of (Field Position Number='column name'; ...), for example (1='id'; 2='firstName')
[SourceFile]: full path (local or UNC) to the source data file; we need to read at least one line from this file to retrieve information such as the # of fields
[FormatFile]: the format file to be generated; this is an ASCII text file, e.g. c:\temp\MyFormatFile.fmt

Source Code
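As a small aside on how the [SourceFile], [FieldTerminator] and [Mapping] parameters fit together, here is a minimal sketch that counts the fields on the first line of a file and checks that every mapped field position exists. The helper name Get-FieldCount and the demo file name are hypothetical, and the split is deliberately naive: it assumes the terminator never appears inside a quoted field.

```powershell
# Hypothetical helper: counts the fields on the first line of a delimited file.
function Get-FieldCount {
    param (
        [string] $SourceFile,
        [string] $FieldTerminator = '|'
    )
    # Read only the first line, not the whole file
    $firstLine = Get-Content -LiteralPath $SourceFile -TotalCount 1

    # String.Split treats the terminator literally; the -split operator
    # would interpret '|' as a regular expression
    $firstLine.Split([string[]]@($FieldTerminator), [System.StringSplitOptions]::None).Count
}

# Demo data file (made up for this example)
$demoFile = Join-Path ([System.IO.Path]::GetTempPath()) 'source_demo.txt'
Set-Content -LiteralPath $demoFile -Value 'id|account|name|balance|credit'

$fieldCount = Get-FieldCount -SourceFile $demoFile

# Every key in the mapping must be a valid field position in the file
$Mapping = @{2='account'; 4='balance'; 5='credit'}
$outOfRange = @($Mapping.Keys | Where-Object { $_ -lt 1 -or $_ -gt $fieldCount })
if ($outOfRange.Count -gt 0) {
    throw "Mapping refers to fields that are not in the file: $($outOfRange -join ', ')"
}

Remove-Item -LiteralPath $demoFile
```

Validating the mapping up front gives a clear error before any format file is written, instead of a less obvious bcp failure later.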

The following is the PowerShell code to create a format file:

# Function: generate a BCP format file so we can populate
# some destination table columns with some fields in a source data file.
# Assumes the SQL Server PS module (sqlps) is installed; this module is included in SQL Server 2012+.
Push-Location;
Import-Module sqlps -DisableNameChecking;
Pop-Location;
#requires -version 3.0
function Create-BCPFormatFile {
<#
.Synopsis
Generate a bcp format file based on parameter values
.Description
Generate a BCP format file based on the source data file and destination table so we can bulk insert into some destination table columns with some corresponding fields in a source data file
.Parameter ServerInstance
Target SQL Server instance, string value, defaults to the current machine name
.Parameter Database
Target database name, string value, mandatory
.Parameter Schema
Schema name of the target table, string value, defaults to 'dbo'
.Parameter Table
Target table name, string value, mandatory
.Parameter FieldTerminator
Field terminator in the source data file, string value, defaults to '|'
.Parameter Mapping
A hashtable to link the fields in the data source file with the table columns
.Parameter SourceFile
Full path of the source file
.Parameter FormatFile
Full path of the generated format file, which can be used by BCP directly
.Example
Create-BCPFormatFile -database tempdb -table t -Mapping @{2='account'; 4='balance'; 5='credit'} -FormatFile 'C:\temp\MyFmt.fmt' -SourceFile 'C:\temp\source.txt'
#>
    [CmdletBinding()]
    param (
        [Parameter (Mandatory=$false, HelpMessage='SQL Server Instance where the Destination table resides')]
        [string] $ServerInstance = $env:ComputerName,

        [Parameter (Mandatory=$true)]
        [string] $Database,

        [Parameter (Mandatory=$false)]
        [string] $Schema = 'dbo',

        [Parameter (Mandatory=$true)]
        [string] $Table,

        [Parameter (Mandatory=$false, HelpMessage="Field Terminator in the source data file, default to | ")]
        [string] $FieldTerminator = '|',

        # Keys are field positions, values are column names
        [Parameter (Mandatory=$true, HelpMessage="Field position number mapping to Column Name, such as @{1='id'; 2='firstname'}")]
        [hashtable] $Mapping = @{},

        [Parameter (Mandatory=$true, HelpMessage="The full path of the source data file, such as c:\temp\Source.csv")]
        [string] $SourceFile,
