By Steve Bolton
………… Fuzzy set relations carry an added layer of complexity not seen in ordinary “crisp” sets, due to the need to derive new grades for membership in the resultset from the scores in the original sets. As I explained two weeks ago in this series of amateur self-tutorials, binary set relations like fuzzy intersections are a bit more complicated than reflexive set relations like fuzzy complements, since we have to perform fuzzy aggregation operations on multiple sets. The good news is that we can reuse most of the concepts introduced in the last article for fuzzy unions, which are handled in an almost identical way. With intersections, we’re basically requesting “all of the records that are members of a both sets,” but with unions, we’re asking SQL Server, “Give me any records that are members of either set.” When we use the fuzzy versions of either, that amounts to retrieving maximum values in the case of intersections and minimums in the case of unions. With the crisp sets SQL Server users work with every day, set relations are straightforward because no calculations of membership values are involved, but with fuzzy intersections there are myriad ways of defining these grades; fortunately, the mathematical functions for fuzzy unions are quite similar to those introduced in the last article. Last time around I provided sample code for four basic types of fuzzy intersections known as the standard intersection, drastic intersection, algebraic product and bounded difference; this week, I’ll demonstrate how to code the corresponding inverses, which are known as the Standard Union, Drastic Union, Algebraic Sum and Bounded Sum. Just as the concept of fuzzy intersections are best exemplified by a class of mathematical functions known as T-norms, so too are fuzzy unions typified by another class known as T-conorms. As we shall see, the authors and parameters of the most popular T-conorms are almost identical to those I coded for the last installment, based on my favorite mathematical reference on the topic, George J. Klir and Bo Yuan’s Fuzzy Sets and Fuzzy Logic: Theory and Applications . At the tail end of the last article I offered suggestions for coping with the odd data modeling issues that crop up in cases when it is desirable to persist parameter values for such advanced functions; I won’t rehash that here because the solutions are exactly the same for fuzzy unions. I’ll also omit discussion of the theorems and formulas that justify these functions, since it’s not really necessary to overload end users with that kind of information, for the same reason that it’s not necessary to give a dissertation on automotive engineering to get a driver’s license. Suffice it to say that T-conorms share many of the same mathematical properties as T-norms, except that they’re characterized by superidempotency instead of subidempotency, which you don’t really need to know unless you’re devising your own fuzzy aggregates.[i] Instead, I’ll spend more time discussing the use cases for the different varieties of fuzzy set relations; suffice it to say for now that it boils down to looking for different shades of meaning in conjunctions like “or” and “and” in natural language, which often correspond to fuzzy unions and joins.
………… First, let me get the code for fuzzy unions out of the way. To readers of the last article, the structure of Figures 1 through 4 ought to look familiar. Once again, I’m using the output of two stored procedures I wrote for Outlier Detection with SQL Server, part 2.1: Z-Scores and Outlier Detection with SQL Server, part 2.2: Modified Z-Scores as my membership functions (you can of course test it on other membership functions that have nothing to do with Z-Scores), then storing the results in two table variables. As usual, the GroupRank and OutlierCandidate columns can be safely ignored (they’re only included in the INSERT EXEC because they were part of the original procedures), while the ReversedZScore column is used in conjunction with the @RescalingMax, @RescalingMin and @RescalingRange variables to normalize the results on the industry-standard fuzzy set range of 0 to 1. In fact, there aren’t any differences whatsoever in the code till we insert the results of the union in a third table variable, so that we can demonstrate several T-conormswith it later on; in practice, it may be possible to perform fuzzy unions without these table variables, depending on how the membership functions are being calculated. We don’t have to perform any mathematical operations on the records returned by ordinary crisp relations, but this is not true with their fuzzy counterparts. Using a regular INTERSECT operator in the sample code in the last tutorial would have been awkward to say the least, since it would have excluded unmatched values for the MembershipScores of both sets, which must be preserved to calculate the grade for membership in the fuzzy relation. The same problem arises when we try to implement a fuzzy union with the T-SQL UNION operator, so I had to resort to a workaround similar to the one used for fuzzy intersections, with a few cosmetic changes. I use a FULL JOIN here instead of an INNER JOIN and had to add IsNull statements and a CASE to substitute grades of 0 whenever the Value of one of the columns is NULL. The same two WHERE clauses I used after the join condition in the last tutorial are present here, but are buried in two subqueries; thanks to the null checks, they’re assigned membership grades of 0 on one of the two sets, whenever the WHERE clause removes them from one side but not the other.
Figure 1: Sample Code for a Fuzzy Union on Two Different Types of Z-Scores
DECLARE
@ RescalingMax decimal ( 38 , 6 ), @ RescalingMin decimal ( 38 , 6 ), @ RescalingRange decimal ( 38 , 6
)
DECLARE
@ ZScoreTable
table
(
PrimaryKey
sql_variant
,
Value
decimal ( 38 , 6
),
ZScore
decimal ( 38 , 6
),
ReversedZScore
as CAST ( 1 as decimal ( 38 , 6 )) ABS ( ZScore
),
MembershipScore
decimal ( 38 , 6
),
GroupRank
bigint
)
DECLARE
@ ModifiedZScoreTable
table
(
PrimaryKey
sql_variant
,
Value
decimal ( 38 ,