In order to explore our data set we therefore need to apply a different body of mathematics which is appropriate for our cause. Recent developments in relational database technology, database mining methods, and knowledge elicitation (Expert Systems) came to our rescue. The following treatment is based on the ID3 induction algorithm. Because these new techniques may not be familiar to the reader and because of their importance in our debate, we will give a short explanation rather than simply quote the results.

For the purpose of discussion, consider a very small but typical portion of our database based on ten cases. (Note: these are for the illustration of these new methods of analysis and these cases are not intended to imply or categorize any stereotypes through these examples.)

Case  Purchasing decision  Country  Function        Gender
1     universalist         US       senior manager  male
2     universalist         UK       junior manager  male
3     particularist        UK       senior manager  female
4     universalist         US       senior manager  female
5     particularist        VEN      senior manager  female
6     particularist        VEN      senior manager  male
7     particularist        UK       senior manager  male
8     particularist        VEN      junior manager  male
9     universalist         UK       junior manager  female
10    universalist         US       junior manager  male

In the domain of data mining, the various items are called "attributes" rather than factors; this distinguishes them from the factors or variables of parametric methods such as factor analysis. For simplicity at this stage, the first attribute, the dimension score (shown as "purchasing decision" in the table), has been given only two values: whether a respondent is likely to adopt a "universalist" or a "particularist" purchasing decision. This is called the goal attribute.

We shall see later how we can use data mining where the goal attribute is not restricted in this way to two extreme values. Indeed, any of the attributes can be multistate.

The basic principle is to find the relative importance of the various attributes in determining the goal attribute. If we normalize (arrange) the data to the so-called third normal form in separate tables (as we would for representation in a relational database), we obtain:

1: Cases Sorted by Country

Case  Purchasing decision  Country
5     particularist        VEN
6     particularist        VEN
8     particularist        VEN
2     universalist         UK
3     particularist        UK
7     particularist        UK
9     universalist         UK
1     universalist         US
4     universalist         US
10    universalist         US

2: Cases Sorted by Manager Function

Case  Purchasing decision  Function
3     particularist        senior
1     universalist         senior
5     particularist        senior
6     particularist        senior
7     particularist        senior
4     universalist         senior
2     universalist         junior
8     particularist        junior
9     universalist         junior
10    universalist         junior

3: Cases Sorted by Gender

Case  Purchasing decision  Gender
1     universalist         male
2     universalist         male
6     particularist        male
7     particularist        male
8     particularist        male
10    universalist         male
3     particularist        female
4     universalist         female
5     particularist        female
9     universalist         female

When we look at the attribute gender in table 3, we see that gender alone cannot determine the goal attribute: both males and females include universalist and particularist purchasing decisions.

Similarly, for either a junior or a senior manager function, the goal attribute cannot be uniquely determined from table 2. When we look at the attribute country in table 1, however, we find that in all cases where country = US the goal is universalist, and in all cases where country = VEN the goal is particularist. Knowing "country" therefore lets us correctly classify six of the ten examples in our data set. In data mining terminology, the attribute "country" is said to have the highest information content.
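The grouping argument above can be sketched in a few lines of Python (a minimal illustration; the tuple encoding of the ten cases and the helper name `uniquely_classified` are mine, not from the text):

```python
# Minimal sketch: group the ten cases by one attribute and count how many
# cases fall into "pure" groups, i.e. groups containing only one goal value.
from collections import defaultdict

# (goal, country, function, gender) for the ten illustrative cases.
CASES = [
    ("universalist", "US", "senior", "male"),
    ("universalist", "UK", "junior", "male"),
    ("particularist", "UK", "senior", "female"),
    ("universalist", "US", "senior", "female"),
    ("particularist", "VEN", "senior", "female"),
    ("particularist", "VEN", "senior", "male"),
    ("particularist", "UK", "senior", "male"),
    ("particularist", "VEN", "junior", "male"),
    ("universalist", "UK", "junior", "female"),
    ("universalist", "US", "junior", "male"),
]

def uniquely_classified(attr_index):
    """Number of cases whose attribute value determines the goal outright."""
    goals_seen = defaultdict(set)   # attribute value -> goal values observed
    group_size = defaultdict(int)   # attribute value -> number of cases
    for case in CASES:
        goals_seen[case[attr_index]].add(case[0])
        group_size[case[attr_index]] += 1
    return sum(n for v, n in group_size.items() if len(goals_seen[v]) == 1)

for name, idx in [("country", 1), ("function", 2), ("gender", 3)]:
    print(name, uniquely_classified(idx))
```

Only "country" produces pure groups (US and VEN, three cases each); "function" and "gender" classify no case uniquely.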

For the full database, we can compute the entropy for each attribute. This gives us a measure of the uncertainty with which each attribute classifies our goal: the higher the entropy, the greater the uncertainty that remains once we know that attribute's value. However, what we really want to know is how much information there is when we know the value(s) of any particular attribute.

If HC(attribute value) is the entropy of classification of the goal classes c for a given attribute value, then this is given by:

HC(attribute value) = -Σc f(c | attribute value) × log2 f(c | attribute value)

where f(c | attribute value) is the fraction of the cases with that attribute value which belong to goal class c, and the sum runs over the goal classes.

Thus, the entropy of classification when the manager function is "senior" is:

HC(function is senior)
= -f(particularist | function is senior) × log2 f(particularist | function is senior)
  - f(universalist | function is senior) × log2 f(universalist | function is senior)
= -4/6 log2(4/6) - 2/6 log2(2/6)
= 0.918

Similarly,

HC(function is junior)
= -f(particularist | function is junior) × log2 f(particularist | function is junior)
  - f(universalist | function is junior) × log2 f(universalist | function is junior)
= -1/4 log2(1/4) - 3/4 log2(3/4)
= 0.811

Hence, for the overall value of HC(manager function), we weight these branch entropies by their share of the ten cases:

HC(manager function) = 6/10 × 0.918 + 4/10 × 0.811 = 0.8752
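The calculation above can be checked directly (a sketch; the `entropy` helper is mine, and `math.log2` supplies the base-2 logarithm the derivation uses):

```python
# Entropy of classification within one branch: H = -sum f * log2(f),
# where f runs over the goal-class frequencies in that branch.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

h_senior = entropy([4, 2])  # senior managers: 4 particularist, 2 universalist
h_junior = entropy([1, 3])  # junior managers: 1 particularist, 3 universalist

# Weight each branch entropy by its share of the ten cases.
h_function = 6 / 10 * h_senior + 4 / 10 * h_junior
print(round(h_senior, 3), round(h_junior, 3), round(h_function, 3))
```

Carrying the unrounded branch entropies gives 0.8755; the figure 0.8752 comes from weighting the already-rounded values 0.918 and 0.811.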

Repeating this procedure for the other attributes we obtain:

HC(gender) = 1.0

HC(country) = 0.4

Since HC(gender) = 1.0, i.e. maximum uncertainty, the attribute "gender" contains no information about the goal. This is consistent with table 3, which shows that half the males and half the females fall into each goal class.

Because HC(country) has the lowest entropy of classification, it corresponds to the least uncertainty. In other words, "country" has the highest information content, and thus "country" is the major contributor in explaining a consumer's cultural orientation on this dimension; manager function makes a smaller contribution.
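Putting the pieces together, ID3's choice of splitting attribute amounts to picking the attribute with the lowest weighted entropy of classification. A self-contained sketch (the data encoding and helper names are mine, not from the text):

```python
# Choose the ID3 split attribute: compute the weighted entropy of
# classification HC for each attribute and take the minimum, i.e. the
# attribute with the highest information content.
from collections import Counter, defaultdict
from math import log2

# (goal, country, function, gender) for the ten illustrative cases.
CASES = [
    ("universalist", "US", "senior", "male"),
    ("universalist", "UK", "junior", "male"),
    ("particularist", "UK", "senior", "female"),
    ("universalist", "US", "senior", "female"),
    ("particularist", "VEN", "senior", "female"),
    ("particularist", "VEN", "senior", "male"),
    ("particularist", "UK", "senior", "male"),
    ("particularist", "VEN", "junior", "male"),
    ("universalist", "UK", "junior", "female"),
    ("universalist", "US", "junior", "male"),
]

def branch_entropy(goals):
    """H of the goal distribution within one attribute-value branch."""
    total = len(goals)
    return -sum(n / total * log2(n / total) for n in Counter(goals).values())

def weighted_entropy(attr_index):
    """HC(attribute): branch entropies weighted by branch size."""
    branches = defaultdict(list)
    for case in CASES:
        branches[case[attr_index]].append(case[0])
    return sum(len(g) / len(CASES) * branch_entropy(g)
               for g in branches.values())

scores = {name: weighted_entropy(i)
          for name, i in [("country", 1), ("function", 2), ("gender", 3)]}
print(min(scores, key=scores.get))  # the attribute ID3 places at the root
```

On these ten cases the scores reproduce the values above (0.4 for country, about 0.875 for function, 1.0 for gender), so "country" becomes the root of the induced tree.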

Implementing the Induction Algorithm