Special situations comparing data
Missing columns. Normally you would expect
to compare data in tables whose structure is very
similar or identical. However it might happen that some of the columns
exist in one table but not in the other one. The column selection
dialog displays such columns as "ColumnName / [No column]". By default
such incomplete pairs are marked as "Ignored", however you can include
them into the comparison
as well. In which case the 'Compare To' expression described here will be
used instead of the missing column.
This may be convenient when a new "flag" column has been added to the
newer version of the table and it contains
NULLs for all records except for a few ones where this flag has been
specifically set. The data comparison
will clearly show these particular records.
Incompatible column types. If a given
column has different data types in the two tables, DataDiff
will make a reasonable effort to compare the data correctly. All data
is cached as VARIANT types anyway, so any
details specific to DBMS are lost (e.g. both 'datetime' and
'smalldatetime' would become the same type, all flavors of 'int'
will become just integer, all CHARs/VARCHARs/NVARCHARs become OLE
strings, etc). Any columns which are completely incompatible (e.g. BLOB
against INT) will be marked as such in the column selection dialog and
you won't be able to
include them into comparison.
Duplicate rows. Two rows in the same table
are considered duplicate if the combinations of key column values
in the rows are identical. Since the only purpose of the key columns is
to provide unique identification for each
row in the table, you should choose the key so as to avoid the
duplicates. It is recommended that you always choose
PK column(s) or UNIQUE column(s) as the key. If for some reason you run
data comparison with a non-unique key, the DataDiff will still be able
to handle them correctly.
BLOB values. Although the total size of the
loaded table data is limited only by the maximum size of system memory,
DataDiff can not keep in the cache any single
column value whose storage size exceeds 64Kb.
Furthermore, by default it only keeps values of up to 4Kb in size
(which is a user-configurable setting described here).
Whenever the Diff encounteres a BLOB value exceeding the 64Kb limit or
(more likely) the user-defined threshold,
it processes the whole BLOB into a 16-byte MD5 digest code and keeps
that instead. Such digested value takes part
in the comparison like any normal field, it is displayed as "[BLOB
Digest]" in the grid and it generates a warning
message when you try to produce SQL from rows containing digested
fields. The only way to reliably script long variable size columns is
to use server-side
scripting for them.
The hidden keys.
There are two situations when the DataDiff will load column(s) into the
buffer regardless of the "Ignore" status you've given them. The first such case is
when the ignored
columns belong to the actual primary key, whereas the "key" columns
you've chosen are not known to contain unique values. In this case the
program would not display or compare the PK columns, but will quietly
keep them in the buffer and use their values in the WHERE clause of
UPDATE statement, to make sure that any
row can be correctly addressed and synchronized.
Ignored NOT NULL-able columns. The second such situation is when values the ignored column must be included into the generated INSERTs anyway. If a column
is NOT NULLable and has no DEFAULT or IDENTITY attributes, it mustbe
included in any valid INSERT
statement. If you specify such column as "Ignored", the DataDiff will
either load it into the buffer anyway, or insert it directly from the
server (which is probably preferrable).
Scripting IDENTITY columns. Before
scripting, DataDiff checks if any of the modifications in the data
involve IDENTITY (autoincremental) columns. If they do,
the script is modified as follows:
- For the records where the IDENTITY column is updated,
DataDiff generated a pair od DELETE/INSERT statements, deleting and
reinserting the record. Note that it should be a very rare situation,
since
the IDENTITY is normally either the key field or the "hidden key" (see
above) which are never synchronized.
- Whenever the program needs to insert records with IDENTITY
column (including the DELETE/INSERT situation
described above), it will put "SET IDENTITY_INSERT
<table_name> ON" before the INSERTs and the corresponding
"OFF" setting afterwards.
- Deleting of records is handled in the usual way.
Computed columns. By default, any computed
columns are marked as "Ignored", don't take part in the comparison
and aren't included into the generated SQL. You may choose to include
them into comparison (so they will appear in the
data grid), but they won't be scripted anyway. If the same column
appears as computed in one table and as a regular
column in the other, it will be compared in the regular way and
scripted, when appropriate.
|