Assume we have a collection of 100,000 people who each have a list of attributes such as:
height: "between 130 and 140 cm"
eyecolor: "blue"
age_range: "16-18"
favorite_music_type: "jazz"
home_city: "NYC"
owns_a_boat: "no"
preferred_flower: "hyacinth"
bathing_frequency_per_month: 60
car_type: "minivan"
house_type: "apartment"
wears_jeans: "often"
wears_sandals: "never"
wears_boots: "sometimes"
The set of attributes may vary somewhat from person to person: the number of attributes, their types, and of course their values can all differ. However, given one individual, we assume that his attributes overlap to some degree with those of a number of people in our collection.
The question I have is: "What is the best way of expressing these various attributes in a graph database so that I can most quickly select a group of, say, 50 people whose attributes most closely match those of a particular individual, and order them from best match to worst match?"
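To make the question concrete, here is a minimal in-memory sketch in Python of the kind of ranking I have in mind. It is not a graph database query, only a stand-in that assumes each distinct attribute/value pair is treated as one shared "feature" that people link to. The function names (build_index, most_similar) and the four sample people are made up for illustration.

```python
from collections import Counter, defaultdict

def build_index(people):
    """Map each (attribute, value) pair to the set of people who have it."""
    index = defaultdict(set)
    for name, attrs in people.items():
        for pair in attrs.items():
            index[pair].add(name)
    return index

def most_similar(target, people, index, k=50):
    """Return up to k (name, shared_count) pairs, best match first."""
    shared = Counter()
    for pair in people[target].items():
        for other in index[pair]:
            if other != target:
                shared[other] += 1
    # Sort by shared count (descending), then by name for a stable order.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))[:k]

# Hypothetical sample data, far smaller than the real 100,000-person collection.
people = {
    "ann": {"eyecolor": "blue", "home_city": "NYC", "car_type": "minivan"},
    "bob": {"eyecolor": "blue", "home_city": "NYC", "owns_a_boat": "no"},
    "cat": {"eyecolor": "brown", "home_city": "NYC"},
    "dan": {"eyecolor": "green", "home_city": "LA"},
}
index = build_index(people)
print(most_similar("ann", people, index, k=2))  # [('bob', 2), ('cat', 1)]
```

People who share no attribute value with the target never enter the count, so the work is proportional to the number of shared links rather than to the size of the whole collection.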
Thanks Kenny,
In your example Cypher query, do I understand correctly that each feature node contains a key:value pair that identifies the attribute and its corresponding value?
Here is a somewhat more complex congruence matching problem.
Assume we have a feature set, (A, B, C, D, E, F), and 100,000 people whose preferences match this feature set to some degree. For each feature, a person may have a preference, but may also have NO preference.
For example, Lena's preferences are (A, B, C, X, Y, Z), and Robert's preferences are (A, B, C, _, _, _), where an underscore (_) signifies that any choice is OK.
We would like to rate Robert higher than Lena in terms of preference matching because, while he and Lena have the same number of matching preferences, Robert has fewer mismatching preferences.
Here is a more concrete example:
Let's say we have 100,000 people who are interested in cars, and we know which car features are important to them. We have, say, 10 cars with various features, and for each of the 10 cars we want to select a group of, say, 50 people whose desired car features best match it.
Some people will have no preference about a subset of the car features. For example, Lester has no preference with respect to transmission: either 'automatic' or 'manual' would be just fine. Rebecca has no preference with respect to 'color', 'power_windows', and 'power_door_locks': any color would be fine, and she does not care whether the car has power windows and door locks.
So, for example, here is a car with a defined set of features:
engine: '4cylinder'
transmission: 'automatic'
color: 'dark blue'
size: 'subcompact'
age: 'less than 4 years'
power_windows: 'yes'
power_door_locks: 'yes'
average_gas_milage: 'greater than 30mpg'
And here we have two individuals, Lester and Rebecca, who have indicated the features that are important to them:
Lester:
engine: '4cylinder'
color: 'dark blue'
size: 'subcompact'
age: 'less than 4 years'
power_windows: 'yes'
power_door_locks: 'yes'
average_gas_milage: 'greater than 30mpg'
Rebecca:
engine: '4cylinder'
transmission: 'automatic'
size: 'subcompact'
age: 'less than 4 years'
average_gas_milage: 'greater than 30mpg'
So how can we best select and order a group of 50 people whose feature preferences best match each car? In this case we want the people with the most matching feature preferences ranked first, but we also want to include those people who would be happy with any value of particular attributes.
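As a rough illustration of the ordering I am after, here is a small Python sketch. It assumes that a feature missing from a person's list means "no preference" and counts as neither a match nor a mismatch, and that people are ranked by most matches first, with fewer mismatches breaking ties. The function names and the third person, Sam, are hypothetical additions, included only to show a mismatch.

```python
def score(car, prefs):
    """Return (matches, mismatches); features absent from prefs are ignored."""
    matches = sum(1 for k, v in prefs.items() if car.get(k) == v)
    mismatches = sum(1 for k, v in prefs.items() if k in car and car[k] != v)
    return matches, mismatches

def rank_people(car, people, k=50):
    """Return up to k (name, matches, mismatches) tuples, best match first."""
    scored = [(name, *score(car, prefs)) for name, prefs in people.items()]
    # Most matches first, then fewest mismatches, then name for a stable order.
    scored.sort(key=lambda row: (-row[1], row[2], row[0]))
    return scored[:k]

car = {
    "engine": "4cylinder", "transmission": "automatic", "color": "dark blue",
    "size": "subcompact", "age": "less than 4 years", "power_windows": "yes",
    "power_door_locks": "yes", "average_gas_milage": "greater than 30mpg",
}
people = {
    "Lester": {k: v for k, v in car.items() if k != "transmission"},
    "Rebecca": {k: v for k, v in car.items()
                if k not in ("color", "power_windows", "power_door_locks")},
}
# Sam is hypothetical: same preferences as Rebecca, plus a color the car lacks.
people["Sam"] = {**people["Rebecca"], "color": "red"}

for row in rank_people(car, people):
    print(row)
# ('Lester', 7, 0)
# ('Rebecca', 5, 0)
# ('Sam', 5, 1)
```

With this ordering, Rebecca is still included even though she lists fewer preferences than Lester, and she ranks ahead of Sam because her unstated preferences are not counted against her.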