//****************************************************************************//
//**************** Knowledge Engineering - October 7th, 2019 ****************//
//**************************************************************************//
- What? Professor Goel isn't here?
- Instead, a man unknown to me gazes at the class
- Spencer Rugaber is going to be talking to us - "about pizza-based AI!"
- In this lecture, we'll talk about:
- Knowledge representations
- Terminology
-
--------------------------------------------------------------------------------
- Imagine you've decided to open a pizza restaurant - what do you need to do?
- You need to print menus, decide what types of pizza you're going to serve, what ingredients to order, etc.
- Let's start with a more fundamental question, though: what IS a pizza?
- "Things on bread" is too broad; "edible things" narrows it down, but is still WAY too broad
- To refine it, let's say pizza is a layered food with dough as a base, then food toppings
- Are there any toppings that are required? Cheese and tomato sauce seem to be defaults, but what about white-sauce pizzas? Desert pizzas?
- What REALLY makes something pizza? It's hard!
- As another example, Dell originally started as a "just-in-time" manufacturer, where they wouldn't build a computer until you ordered one
- That reduced inventories, but meant that Dell has to rapidly communicate with their suppliers to get things out
- That requires a precise language to communicate this information quickly and accurately
- This means building an "ontology" - which is what KNOWLEDGE ENGINEERING is all about!
- First off, let's define some terms we're going to use today:
- DATA is the result of some sort of sensory input (whether it's a human or non-human sensor)
- We should expect some of this data to be noisy and inconsistent, right? That's why "data wrangling" is a part of data science!
- INFORMATION is the data sufficient to make decisions
- A BIT is the smallest unit of information that tells us about the outcome of a 50/50, "true or false" choice
- KNOWLEDGE is a set of information that's been grouped or structured together in some way
- WISDOM is where we know how to apply this knowledge effectively, often gathered through experience
- Now, we want to represent this knowledge so it can support different applications
- There are 3 main "orders" of knowledge representations:
- VOCABULARY is the agreed set of "words" representing concepts that we'll use
- Note that words are NOT the same as a concept; there can be synonyms, words for different levels of abstraction, etc
- A TAXONOMY is where we have a hierarchy of concepts
- For instance, we can have "hypernymy" or "is-a" relationships (similar to a parent class in OOP, where the subclass is called a "hyponym")
- For instance, "meat" is a hyponym of "food"
- We can represent this taxonomy as a DAG (not quite a tree, since we can have multiple superclasses for a single subclass)
- Finally, an ONTOLOGY is a set of concepts and their relationships
- Hierarchies are just one type of relationship, but there are other kinds!
- A "part-of" or "has-a" relationship is different from an "is-a" relationship, for instance
- For instance, we might say a pizza is composed of multiple slices, each of which has some number of calories
- We might also say water has a "does-not-mix-with" relationship with oil
- There are a couple standards for expressing ontologies, like OWL or RDF
- Why might we want to break things into ontologies like this? There're a few different reasons:
- We might want to better understand a domain
- We might want to reuse our domain knowledge, which we can do more easily by structuring our knowledge
- for instance, if we ask Wikipedia "what was the phase of the moon when Michael Jackson was born?", it can't do that; DBPedia, though, has all that information saved as a database, so it can!
- We might want to provide standards for supporting interoperability, like HTML
- If W3 hadn't standardized HTML, web browsers would be a mess - they'd have to support ridiculous numbers of custom language!
- So, ontologies are great! How can we make them?
- First, we have to decide the SCOPE of our ontology - what's the domain we care about modeling?
- For our pizzeria example, we might decide we're not modeling the customers, or the pricing, our our logistics with other businesses
- We then want to decide what CONCEPTS and TAXONOMIES we need
- CONCEPTS are base units of knowledge in the domain, and represent a collection of examples with similar properties
- For instance, "whale," "pizza," "fluke," "baleen," "pod," etc.
- Concepts are then organized into taxonomies
- We'll then characterize these concepts with ATTRIBUTES and VALUES, similar to variables on objects in OOP
- For instance, a whale might have a "lifespan" attribute with a value of "90 years"
- Attributes are part of the concept itself, while values are specific to individual instances of that concept
- INSTANCES are specific, individual examples of a concept (e.g. instances of a whale might be Moby Dick, Shamu, your favorite Killer Whale at Seaworld, etc.)
- Note that an instance of a concept is ALSO an instance of any of that concept's superclasses/hierarchies
- We might also have INTRINSIC and EXTRINSIC concepts
- INTRINSIC concepts have their definition inherent in the concept itself (e.g. "blowhole")
- EXTRINSIC concepts are dependent on outside factors; for instance, what "endangered" means might vary from country to country
- We'll then define the RELATIONSHIPS between our concepts and any CONSTRAINTS on their attribute values
- A RELATIONSHIP is just any connection between different concepts (e.g. a humpback is-a whale, dolphins sometimes travel-with whales, etc.)
- CONSTRAINTS just limit what possible values an attribute can have (e.g. a porpoise's fin shape MUST be triangular, a human must have an age between 0 and 150 years, etc.)
- One common tool that's used to make ontologies is PROTEGE, which is a freely available tool from Stanford
- That's generally how we come up with ontologies of knowledge, but there are a few issues that can come up with this
- (TODO:)
- In general, the lifecycle of an ontology looks something like this:
- ACQUISITION is where you try to get the knowledge you need to start modeling things
- Manually doing this is time-consuming and expensive, especially if you converse with subject matter experts (SMEs)
- Automated approaches are good, but often run into limitations
- ANALYSIS is where you try to make sure your ontology is correctly defined and appropriate
- You can try doing this by checking for semantic consistency (any broken constraints, any cycles in the hierarchy, undefined terms, etc.), and looking for some red flags tha
- You'll then EVALUATE your ontology by building application with it, checking with SMEs to see if it fits their needs, and iterating
- Finally, you'll have to deal with MAINTENANCE and keeping your ontology relevant as things change and requirements evolve
- You may have to merge two ontologies together (for instance, if there's a company merger), or create a brand new ontology if
- Why do we care about these ontologies, again? The killer app for it is something called the SEMANTIC WEB
- Currently, HTML gives pretty shallow knowledge about what a webpage is actually ABOUT, and just focuses on how the document should look and be organized; the hope is that we can eventually add more structured knowledge to websites and take advantage of that knowledge
- XML was an early attempt at this by adding some structure and constraints; RDF tried to add some relations to this, while OWL added heavier constraints and is attempting to formalize this knowledge more strictly
- RDF and OWL are both heavily in use for ontologies
- This is all fancy stuff, but really, knowledge engineering its simplest is just labeling stuff in a fancy way so we know what we're looking at, and what we can do with it
- Alright, thank you very much for listening! There're some resources on my website if you're interested in this stuff, but that's all!