A beginner-friendly approach to Natural Language Processing and its Challenges

Adam Henderix
6 min read · Jan 31, 2021

Natural Language Processing, or NLP for short, is a domain in which machines translate between rigorous logic built on ones and zeros and our messy, improvised language. NLP is a bridge to how humans and machines might be able to communicate some day in the future. Today’s post walks through a series of problems we face when trying to break language down into binary code.

How language was invented

It’s probably a good starting point to understand how we got to where we are from an evolutionary standpoint. Our survival is often credited to the fact that we have thumbs, but more importantly in this context, to the fact that we have language. We made up words on the spot in an improvised manner; our only framework was what sounded better. The advantage of this is that it allowed us to quickly adapt our language to new environments; spending three hours carving out a new word based on mathematical principles, the way an artificial language like Esperanto does, would have taken us far too long.

The various challenges

NLP comes with a basket of challenges. The blame lies largely with ourselves and the way we constructed our language.

Implied intention versus literal intention

The first problem arises because machines lack the understanding of a sentence that we have. When humans talk about something, we often lean on context that never appears in the words themselves. Take the example: The fire truck was rapidly passing the other cars. This simple sentence assumes a lot of information that we take for granted:

  • Fire trucks are given priority in traffic.
  • Fire trucks are designed to put out fires.
  • A fire truck is operated by trained firefighters.
  • A fire truck carries large volumes of water.
  • A fire truck is distinguishable by its red color.

The list goes on.

Knowing these facts gives us an advantage in comprehending what would otherwise be a rather odd sentence. What’s important to note is that we have real-life experience and machines lack it, which makes them really bad at understanding a sentence like:

The fire truck managed to save the kitten caught in the tree.

The AI will fail to understand how the kitten was saved, because it doesn’t know what a tree is or that a fire truck carries a long ladder. For us, however, it’s easy to guess what most likely happened. We cannot know for certain, but we can use probability to our advantage. Sometimes we do get contextual meanings wrong, which occasionally turns into comedy, but generally we’re pretty good at making the connection.
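To make this concrete, here is a minimal sketch of what it would look like to hand a machine the fire-truck facts above as explicit world knowledge. The triple store and the lookup function are hypothetical, invented purely for illustration; real systems rely on far larger knowledge bases and learned representations rather than hand-written rules.

```python
# A tiny, hand-written "world knowledge" store of (subject, relation, object) triples.
# These facts are obvious to any person but invisible to a model that only sees text.
WORLD_KNOWLEDGE = [
    ("fire truck", "has_priority_in", "traffic"),
    ("fire truck", "designed_to", "put out fires"),
    ("fire truck", "operated_by", "trained firefighters"),
    ("fire truck", "carries", "large volumes of water"),
    ("fire truck", "carries", "a long ladder"),
    ("fire truck", "has_color", "red"),
]

def facts_about(entity):
    """Return every stored fact whose subject matches the given entity."""
    return [triple for triple in WORLD_KNOWLEDGE if triple[0] == entity]

# The sentence "The fire truck managed to save the kitten caught in the tree"
# only becomes explainable once the ladder fact is available to the machine:
for subject, relation, obj in facts_about("fire truck"):
    print(subject, relation, obj)
```

Even this toy example hints at the scale of the problem: every ordinary sentence leans on hundreds of such facts, and we never write them down because we absorbed them simply by living.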

This is an issue that we run into over and over again in machine learning. We approach a problem with a certain knowledge bias, assuming it will be an easy task because we are tremendously efficient at integrating this tangled information ourselves, but then we get a harsh reality check as we realize how complex it really is.

Coreference resolution

Coreference resolution is another, similar challenge that concerns how meaning carries over between sentences. If “John partakes in a circus and dances with a clown” is followed by “He liked dancing with John”, the machine might be confused about who “he” is, even though it is obvious to us that “he” refers to the clown.

Here’s another example:

Jenny gave Ortuga some chips.

She liked them.

Who is “she”? The answer is of course Ortuga, but how do we know? Some of us probably replayed a scenario with two characters named Jenny and Ortuga: Jenny gave Ortuga the chips, and Ortuga ate them and liked them. How do we know she ate them? We assume so because the next sentence states that she liked them, although she could just as well have liked them because they looked nice. Again, how would an AI know?
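To see why this trips machines up, here is a minimal sketch of two naive coreference heuristics applied to the Jenny and Ortuga example. The mention list and the resolver functions are hypothetical and purely illustrative; modern resolvers use learned models rather than rules like these, yet they still stumble on exactly this kind of ambiguity.

```python
# Two naive heuristics for resolving "She" in:
#   "Jenny gave Ortuga some chips. She liked them."

# Person mentions from the first sentence, in order of appearance,
# with the grammatical role each one plays.
mentions = [
    {"name": "Jenny", "role": "subject"},
    {"name": "Ortuga", "role": "object"},
]

def resolve_by_recency(mentions):
    """Heuristic 1: the pronoun refers to the most recently mentioned person."""
    return mentions[-1]["name"]

def resolve_by_subject(mentions):
    """Heuristic 2: the pronoun refers to the subject of the previous sentence."""
    return next(m["name"] for m in mentions if m["role"] == "subject")

print("Recency heuristic says 'she' is:", resolve_by_recency(mentions))   # Ortuga
print("Subject heuristic says 'she' is:", resolve_by_subject(mentions))   # Jenny
```

The two rules disagree, and nothing in the text itself settles the question. Choosing correctly requires knowing that the person who receives chips is the one likely to eat and enjoy them, which is exactly the real-world experience the machine lacks.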

This is a complicated problem, and the only real solution appears to be living. How else would a machine figure out these intricate resolution puzzles? You can’t teach a toddler all of these things with pen and paper; the best way is to let the kid socialize with their peers. The same solution, in a different format, might be how we teach machines NLP.

Real world experience

This is a big issue with AI. Humans have evolved into very adaptable beings, and that lived experience is key when a machine is supposed to learn our language. Humans live day to day in society, while machines sit in a lab all day being experimented on. It is not surprising that humans are still far better at living while machines sit at a major handicap, even though the concept of AI has existed for roughly 80 years, give or take.

One clue comes from savants: people in whom the part of the brain that organizes thought by importance has ceased to function properly. This leads them to remember rather meaningless things, such as memorizing different kinds of rocks at an extraordinary level. At the same time, they end up unable to live as most of us do; taking the bus, buying groceries and having a deep conversation about life become much harder for them. This serves as a hint to what humans do well, since machines correlate very much with savants in terms of memory and intelligence.

Savants are still better than AI at living; there is no arguing about that. It is clearly not the final answer, but rather a nice clue towards how we might solve some of these problems in the future.

On solving the challenges of NLP

One interesting thing to note is that if we manage to solve these issues, among many more, we might find clues to how we ourselves think. Right now we know very little about our brains and how they correspond to reality. If we were able to mimic our behavior in an artificial system, we would eventually have a working model of how our own minds work. We could analyze that and gain rare clues to how things like consciousness work.

It is not odd that NLP has taken decades to evolve. After all, language took humans millions of years (or billions, depending on where you put the threshold). Being able to put more precise definitions behind terminology like ‘consciousness’ would also be a relief. In Western societies we largely seem to consider something conscious if it can fulfil social contracts (mostly humans). Some cultures, such as the yogic traditions, consider all matter to have some form of consciousness, whether it lives or not. The honest answer is that we don’t fully know yet, and most people tend to draw their final conclusions from speculation.

Summary

To wrap things up without going too far into the problems we currently face (of which there are many more), I thought it would be interesting to reflect on how easy Natural Language Processing appears to be versus how hard it actually gets when you dig into the details.

Usually we’re pretty good at estimating how hard something might be; even when the timeline is uncertain because the task is new, we can generally guess it will take a week or two. In NLP we assume language is extremely simple, but that’s only because our intuition doesn’t account for the fact that we ourselves are really good at language.

No other animal does language as well as we do, so we need to take a step back, analyze language itself, and find a middle ground between AI and humans. It won’t be easy, since AIs are extremely rigid compared to us humans, and doing it right will be important. We already see this today: when two countries do not speak the same language, they become more isolated from each other, which can lead to war. As some speculation suggests, the same could arguably happen between AIs and humans.

Understanding is key.


Adam Henderix

I write about AI-related topics and publish on the weekends. I’m interested in empowering people towards having an impact as the optimal way of getting there.