//****************************************************************************//
//**************** Knowledge Engineering - October 7th, 2019 ****************//
//**************************************************************************//

- What? Professor Goel isn't here?
    - Instead, a man unknown to me gazes at the class
    - Spencer Rugaber is going to be talking to us - "about pizza-based AI!"
- In this lecture, we'll talk about:
    - Knowledge representations
    - Terminology
--------------------------------------------------------------------------------
- Imagine you've decided to open a pizza restaurant - what do you need to do?
    - You need to print menus, decide what types of pizza you're going to serve, what ingredients to order, etc.
    - Let's start with a more fundamental question, though: what IS a pizza?
        - "Things on bread" is too broad; "edible things on bread" narrows it down, but is still WAY too broad
        - To refine it, let's say pizza is a layered food with dough as a base, then food toppings
        - Are there any toppings that are required? Cheese and tomato sauce seem to be defaults, but what about white-sauce pizzas? Dessert pizzas?
        - What REALLY makes something pizza? It's hard!
    - As another example, Dell originally started as a "just-in-time" manufacturer, where they wouldn't build a computer until you ordered one
        - That reduced inventories, but meant that Dell had to communicate rapidly with their suppliers to get orders out
        - That requires a precise language to communicate this information quickly and accurately
        - This means building an "ontology" - which is what KNOWLEDGE ENGINEERING is all about!
- First off, let's define some terms we're going to use today:
    - DATA is the result of some sort of sensory input (whether from a human or a non-human sensor)
        - We should expect some of this data to be noisy and inconsistent, right? That's why "data wrangling" is a part of data science!
    - INFORMATION is data sufficient to make decisions
        - A BIT is the smallest unit of information: it tells us the outcome of a 50/50, "true or false" choice
    - KNOWLEDGE is a set of information that's been grouped or structured together in some way
    - WISDOM is knowing how to apply this knowledge effectively, often gathered through experience
- Now, we want to represent this knowledge so it can support different applications
    - There are 3 main "orders" of knowledge representations:
        - VOCABULARY is the agreed set of "words" representing the concepts we'll use
            - Note that words are NOT the same as concepts; there can be synonyms, words for different levels of abstraction, etc.
        - A TAXONOMY is where we have a hierarchy of concepts
            - For instance, we can have "hypernymy" or "is-a" relationships: the more general concept (like a parent class in OOP) is the "hypernym", and the more specific subclass is the "hyponym"
                - For instance, "meat" is a hyponym of "food"
            - We can represent this taxonomy as a DAG (not quite a tree, since a single subclass can have multiple superclasses)
        - Finally, an ONTOLOGY is a set of concepts and their relationships
            - Hierarchies are just one type of relationship, but there are other kinds!
                - A "part-of" or "has-a" relationship is different from an "is-a" relationship, for instance
                    - For instance, we might say a pizza is composed of multiple slices, each of which has some number of calories
                - We might also say water has a "does-not-mix-with" relationship with oil
            - There are a couple of standards for expressing ontologies, like OWL or RDF
    - Why might we want to break things into ontologies like this? There are a few different reasons:
        - We might want to better understand a domain
        - We might want to reuse our domain knowledge, which is easier once that knowledge is structured
            - For instance, if we ask Wikipedia "what was the phase of the moon when Michael Jackson was born?", it can't answer; DBpedia, though, has all that information stored as a structured database, so it can!
        - We might want to provide standards that support interoperability, like HTML
            - If the W3C hadn't standardized HTML, web browsers would be a mess - they'd have to support ridiculous numbers of custom languages!
- So, ontologies are great! How can we make them?
    - First, we have to decide the SCOPE of our ontology - what's the domain we care about modeling?
        - For our pizzeria example, we might decide we're not modeling the customers, the pricing, or our logistics with other businesses
    - We then want to decide what CONCEPTS and TAXONOMIES we need
        - CONCEPTS are the base units of knowledge in the domain, and represent a collection of examples with similar properties
            - For instance, "whale," "pizza," "fluke," "baleen," "pod," etc.
        - Concepts are then organized into taxonomies
    - We'll then characterize these concepts with ATTRIBUTES and VALUES, similar to variables on objects in OOP
        - For instance, a whale might have a "lifespan" attribute with a value of "90 years"
        - Attributes are part of the concept itself, while values are specific to individual instances of that concept
        - INSTANCES are specific, individual examples of a concept (e.g. instances of whale might be Moby Dick, Shamu, your favorite killer whale at SeaWorld, etc.)
            - Note that an instance of a concept is ALSO an instance of every ancestor of that concept in the hierarchy
        - We might also have INTRINSIC and EXTRINSIC concepts
            - INTRINSIC concepts have their definition inherent in the concept itself (e.g. "blowhole")
            - EXTRINSIC concepts depend on outside factors; for instance, what "endangered" means might vary from country to country
    - We'll then define the RELATIONSHIPS between our concepts and any CONSTRAINTS on their attribute values
        - A RELATIONSHIP is any connection between different concepts (e.g. a humpback is-a whale, dolphins sometimes travel-with whales, etc.)
        - CONSTRAINTS limit the possible values an attribute can have (e.g. a porpoise's fin shape MUST be triangular, a human must have an age between 0 and 150 years, etc.)
    - One common tool for making ontologies is PROTÉGÉ, a freely available tool from Stanford
- That's generally how we come up with ontologies of knowledge, but there are a few issues that can come up with this
    - (TODO:)
- In general, the lifecycle of an ontology looks something like this:
    - ACQUISITION is where you try to get the knowledge you need to start modeling things
        - Doing this manually is time-consuming and expensive, especially if you have to consult subject matter experts (SMEs)
        - Automated approaches help, but often run into limitations
    - ANALYSIS is where you try to make sure your ontology is correctly defined and appropriate
        - You can do this by checking for semantic consistency (any broken constraints, any cycles in the hierarchy, undefined terms, etc.), and looking for some red flags tha
    - You'll then EVALUATE your ontology by building applications with it, checking with SMEs to see if it fits their needs, and iterating
    - Finally, you'll have to deal with MAINTENANCE, keeping your ontology relevant as things change and requirements evolve
        - You may have to merge two ontologies together (for instance, if there's a company merger), or create a brand new ontology if
- Why do we care about these ontologies, again? The killer app for them is something called the SEMANTIC WEB
    - Currently, HTML gives pretty shallow knowledge about what a webpage is actually ABOUT, and just focuses on how the document should look and be organized; the hope is that we can eventually add more structured knowledge to websites and take advantage of that knowledge
    - XML was an early attempt at this, adding some structure and constraints; RDF added relations on top of this, while OWL added heavier constraints and attempts to formalize this knowledge more strictly
        - RDF and OWL are both heavily used for ontologies
    - This is all fancy stuff, but really, knowledge engineering at its simplest is just labeling stuff in a fancy way so we know what we're looking at, and what we can do with it
- Alright, thank you very much for listening! There are some resources on my website if you're interested in this stuff, but that's all!
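The ontology-building steps and the ANALYSIS checks described in these notes can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not how Protégé, RDF, or OWL actually store ontologies: the `Ontology` class, its field and method names, and the pizza/human example data are all made up for illustration. Each concept maps to a set of superclasses (so the taxonomy is a DAG), attribute constraints are plain predicate functions that subclasses inherit, other relationships are (subject, predicate, object) triples in the spirit of RDF's triple data model, and `check()` looks for undefined terms, cycles in the hierarchy, and broken constraints.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Ontology:
    # concept -> its direct superclasses ("is-a" edges); a set, so a concept may
    # have several parents and the taxonomy is a DAG rather than a tree
    parents: dict[str, set[str]] = field(default_factory=dict)
    # concept -> {attribute: predicate that every legal value must satisfy}
    constraints: dict[str, dict[str, Callable[[object], bool]]] = field(default_factory=dict)
    # instance name -> (its concept, {attribute: value})
    instances: dict[str, tuple[str, dict[str, object]]] = field(default_factory=dict)
    # other relationships as (subject, predicate, object) triples, RDF-style (hypothetical)
    triples: set[tuple[str, str, str]] = field(default_factory=set)

    def add_concept(self, name: str, *supers: str) -> None:
        self.parents.setdefault(name, set()).update(supers)

    def ancestors(self, concept: str) -> set[str]:
        """Every superclass reachable through is-a edges (transitive closure)."""
        seen: set[str] = set()
        stack = list(self.parents.get(concept, ()))
        while stack:
            c = stack.pop()
            if c not in seen:  # 'seen' also keeps us from looping forever on a cycle
                seen.add(c)
                stack.extend(self.parents.get(c, ()))
        return seen

    def is_instance_of(self, instance: str, concept: str) -> bool:
        # an instance of a concept is also an instance of all of its ancestors
        own = self.instances[instance][0]
        return concept == own or concept in self.ancestors(own)

    def related(self, subject: str, predicate: str) -> set[str]:
        return {o for s, p, o in self.triples if s == subject and p == predicate}

    def check(self) -> list[str]:
        """Semantic consistency checks: undefined terms, cycles, broken constraints."""
        problems: set[str] = set()
        used = {s for supers in self.parents.values() for s in supers}
        used |= {concept for concept, _ in self.instances.values()}
        for term in used - set(self.parents):
            problems.add(f"undefined concept: {term}")
        for c in self.parents:
            if c in self.ancestors(c):
                problems.add(f"cycle in hierarchy through: {c}")
        for name, (concept, values) in self.instances.items():
            # constraints are inherited from every ancestor concept
            for cls in {concept} | self.ancestors(concept):
                for attr, ok in self.constraints.get(cls, {}).items():
                    if attr in values and not ok(values[attr]):
                        problems.add(f"{name}.{attr}={values[attr]!r} violates {cls} constraint")
        return sorted(problems)


# Hypothetical example data, loosely following the lecture's pizza and human examples
onto = Ontology()
onto.add_concept("food")
onto.add_concept("dessert", "food")
onto.add_concept("pizza", "food")
onto.add_concept("dessert pizza", "pizza", "dessert")  # two superclasses
onto.add_concept("mammal")
onto.add_concept("human", "mammal")
onto.constraints["human"] = {"age": lambda age: 0 <= age <= 150}
onto.instances["alice"] = ("human", {"age": 30})
onto.instances["bob"] = ("human", {"age": 969})  # breaks the age constraint
onto.instances["nutella pizza"] = ("dessert pizza", {})
onto.triples.add(("water", "does-not-mix-with", "oil"))

print(onto.is_instance_of("nutella pizza", "food"))  # True
print(onto.related("water", "does-not-mix-with"))    # {'oil'}
print(onto.check())  # ['bob.age=969 violates human constraint']
```

Storing `parents` as a set rather than a single field is what lets "dessert pizza" be both a pizza and a dessert; a single-parent field would force the taxonomy into a tree.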