Vittal Sirigiri, one of the top data scientists at Locus, has had quite the professional journey: investment banker, cofounder of a food business, self-taught coder. He has worn many hats. Vittal’s drive to keep expanding his horizons and his love for all things numbers led him to data science, and he has not looked back since. He is part of the core team that works on Locus’ proprietary geocoding engine.
Team Locus caught up with him to understand what really happens behind the scenes and why geocoding is going to be more important than ever in the coming days.
What is geocoding?
Geocoding is converting an address written in text form to a position on the map: a latitude and a longitude. Imagine these scenarios: figuring out where a particular place is, how to reach it, what else is nearby. All of these involve a map, and a map needs coordinates. Without coordinates, none of these services would work. Humans do not think in coordinates yet. They describe the coordinate in words, i.e. an address. Our job is to translate that address into a coordinate on which the rest of the magic can start working.
Geocoding is really important in today’s age. If you were to go back, say, 20 years, a post carrier delivered your mail. The carrier knew the area inside out. S/he was a human geocoder who knew exactly where an address was located.
Now look at how cities and businesses are growing: people in this (delivery business) space are often unfamiliar with the city, its routes, alleys, etc. If a new person starts work without any idea of the place, it is like searching for a needle in a haystack. Geocoding makes a huge difference. You can go about doing your job much more efficiently.
A good geocoding solution brings people and services together seamlessly and enhances the entire transaction experience. Geocoding is crucial for any of these services to take off. Pick any industry, and maps will be playing an important role somewhere in the supply chain. Geocoding makes your transaction simpler, smoother, and more efficient. Once you geocode well, downstream steps like optimization and routing become efficient too.
Another aspect of the geocoding service is address verification. We can verify the quality of an address and tell our clients if a certain address needs manual attention. In certain cases, customers might place fake orders and submit fake addresses. The address verification service can highlight and draw attention to such addresses. Imagine if we can tell that out of the 100 addresses given, these five addresses need manual intervention. This is useful in both developed and developing countries.
What are the real problems in addresses? Why is it tough for a machine to decipher addresses?
Geocoding essentially has two components: the database (where locations are mapped to coordinates, to put it simplistically) and the querying of this database when you get an address that needs to be geocoded. If you ask the wrong questions, you will get the wrong answers! Understanding the addresses and figuring out what to query can make a lot of difference.
When people start writing random addresses, it gets tricky. In many places, addresses are descriptive. People don’t just write the address, they even write, ‘Take left from here, behind the ATM,’ etc. It makes sense if a human is reading it, but imagine a machine looking at it. It gets conflicting information. The location intelligence lies in figuring out what the person wants to say, extracting what is probably the most relevant information in the address, and then doing the appropriate database search.
This part is hard in developing countries and much easier in developed countries.
What happens in the background?
What is an address, typically? It is a specific point: a house, then a street, a locality, a city and a pincode. When we get the address, we try to break it up into its respective components.
Imagine an inverted pyramid. The largest area is the city, and you keep narrowing it down, smaller and smaller, until you come to a fine point that is very precise. The main work that happens in the background is breaking the address into its various components. Once we get the components out, the rest is simple enough.
We just need to query the database. If you do a bad job in figuring out the exact thing to search for, you’ll get different answers. The intelligence is figuring out what is what in the given address and making the search.
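The component-splitting step described above can be sketched in a few lines of Python. This is only an illustration, not Locus’ actual parser: it assumes a tidy comma-separated address in the order house, locality, city, with an optional six-digit pincode, and the function name `parse_address` and the sample address are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumption: pincodes are standalone six-digit numbers (Indian style).
PINCODE_RE = re.compile(r"\b\d{6}\b")


def parse_address(address: str) -> Dict[str, Optional[str]]:
    """Split a comma-separated address into coarse components.

    Assumes the order house, locality, city; real addresses are far messier.
    """
    pin_match = PINCODE_RE.search(address)
    pincode = pin_match.group(0) if pin_match else None
    # Remove the pincode so it does not leak into the other components.
    cleaned = PINCODE_RE.sub("", address)
    parts = [p.strip(" -") for p in cleaned.split(",")]
    parts = [p for p in parts if p]

    city = parts[-1] if parts else None
    house = parts[0] if len(parts) >= 3 else None
    if len(parts) >= 3:
        locality = ", ".join(parts[1:-1])
    elif len(parts) == 2:
        locality = parts[0]
    else:
        locality = None
    return {"house": house, "locality": locality, "city": city, "pincode": pincode}


parsed = parse_address("House 12, Pocket A, Sector-52, Noida, 201301")
print(parsed)
```

Each component can then be used as a progressively narrower filter on the database, from city down to house.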
What is special about our Locus Geocoder?
Our uniqueness comes from our understanding of addresses. We have built intelligence in figuring out what are the different components of an address. It is not an easy thing to solve.
We also went for a slightly more practical approach rather than a purely technical one. To elaborate: I can try to give you the most accurate answer, but if I don’t find it, I could just say, ‘See, I don’t know the answer’ and give up.
Our idea is different. Given that humans are delivering the packages, even if we nudge them in the right direction, they will figure it out from there. If I don’t confidently know what the answer might be, let me at least tell you the nearest thing I figured out. If you can get there, you can figure out where the actual place is.
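One way to picture this "nearest thing I figured out" behaviour is a lookup that drops the narrowest component until something matches. The sketch below is an assumption about how such a fallback could look, not a description of the Locus geocoder; the `INDEX` contents and coordinates are made-up illustrative values.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]

# Hypothetical index keyed broad-to-narrow; coordinates are illustrative only.
INDEX: Dict[Tuple[str, ...], Coord] = {
    ("noida",): (28.54, 77.39),
    ("noida", "sector-52"): (28.59, 77.37),
}


def geocode_with_fallback(components: List[str]) -> Optional[Tuple[Coord, int]]:
    """Return (coordinate, matched_depth) for the most specific known level.

    `components` are ordered broad to narrow, e.g. [city, locality, house].
    matched_depth says how many components were matched, so the caller can
    tell a precise hit from a rough nudge.
    """
    key = tuple(c.lower() for c in components)
    # Try the full key first, then drop the narrowest component each time.
    while key:
        if key in INDEX:
            return INDEX[key], len(key)
        key = key[:-1]
    return None  # nothing known, not even the city


print(geocode_with_fallback(["Noida", "Sector-52", "House 12"]))
```

Here the house is unknown, so the lookup settles for the locality and reports a depth of 2, leaving the last stretch to the person on the ground.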
Our solution gets work done on the ground. We’ve built a lot of checks and balances to make sure it works on the ground.
Can you give us an idea of how you take into account the various quirks of different geographies while working on geocoding? Let’s face it, not all countries write addresses the same way.
In geographies like the US, Europe and Australia, the databases are very good. The government entities handling them have good information available. There is an established format in which addresses are written, and people stick to it. Thus, the accuracy of geocoding improves a lot. There is no need to guess anything. For instance, in Singapore each building has its own pincode. You just need to write the pincode and you’ll know exactly where that is. It is that accurate, and hence it is fairly easy to geocode there.
When you come to countries like India and Indonesia, things get a bit tricky. In Thailand, for example, they’ll write Number 33, Village 39, and that’s it.
Market-wise, it is easier in places where the data is good and trickier in countries where it is not.
Could you delve a bit deeper?
Here are a few more examples. Take this random address in India’s Noida: House no. 32, Pocket A, Sector-52, Noida. Sometimes people don’t write Pocket A, Sector-52; they leave out the important words. They write 32, 52A, Noida. A human who sees this will understand what it means. But when you throw that at a machine, the machine gets confused. There is a sector 32, there is a sector 52, there is a sector A, so which is the relevant one? We need to figure out what it means when people write addresses in their own unique way.
Now look at Vietnam. Say there is a main street; let’s call it CMT 8. A lane off CMT 8 is called 110 CMT 8. If lane 110 connects to another road, that road is 110/36 CMT 8. If 36 connects to yet another road called 69, it becomes 110/36/69 CMT 8. That is the way they write their streets, which is completely different from how it is written in India.
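The nesting in this Vietnamese convention can be unpacked mechanically. The sketch below assumes the address starts with slash-separated lane numbers followed by a space and the street name; the helper name `lane_hierarchy` is hypothetical.

```python
from typing import List


def lane_hierarchy(street_address: str) -> List[str]:
    """Expand '110/36/69 CMT 8' into the chain of roads leading to it.

    Assumes the form '<n1>/<n2>/... <street name>'.
    """
    numbers, _, street = street_address.strip().partition(" ")
    parts = numbers.split("/")
    chain = [street]  # start from the main street
    for depth in range(1, len(parts) + 1):
        # Each extra number is a lane branching off the previous one.
        chain.append("/".join(parts[:depth]) + " " + street)
    return chain


print(lane_hierarchy("110/36/69 CMT 8"))
# ['CMT 8', '110 CMT 8', '110/36 CMT 8', '110/36/69 CMT 8']
```

Read left to right, the chain is the same inverted pyramid as before: main street first, then each lane narrowing down to the final alley.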
What’s special about Locus is that we actually spend enough time in each geography to understand the nuances in the way people write addresses. Each country is different, right? One rule does not work for everyone. We come up with code and logic to suit various situations.
Our geocoder is also unique in that we have a proper feedback mechanism in our system. We can learn from our mistakes. If I say you are at ‘x,y’ but the actual answer is ‘x1,y1’, and I’m consistently told that the answer is ‘x1,y1’, the geocoder learns from that.
Since our clients end up executing on the ground, we have information on where they actually went. From that on-ground data we can compare what we predicted with what actually happened, which makes a nice feedback loop for improving our geocoding system. It is a big machine learning algorithm: the more data I get, the more accurate I become over time.
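A heavily simplified version of such a feedback loop might look like the following. It is a toy under stated assumptions, not the learning algorithm Locus uses: the class `FeedbackGeocoder`, the `min_reports` threshold and the averaging rule are all hypothetical, and a real system would also check that the reports agree with each other.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]


class FeedbackGeocoder:
    """Toy geocoder that overrides its guess once enough deliveries disagree."""

    def __init__(self, base: Dict[str, Coord], min_reports: int = 3) -> None:
        self.base = dict(base)  # initial address -> coordinate guesses
        self.reports: Dict[str, List[Coord]] = defaultdict(list)
        self.min_reports = min_reports  # assumption: 3 reports count as "consistent"

    def report_delivery(self, address: str, lat: float, lon: float) -> None:
        """Record where a delivery for this address actually happened."""
        self.reports[address].append((lat, lon))

    def geocode(self, address: str) -> Optional[Coord]:
        seen = self.reports.get(address, [])
        if len(seen) >= self.min_reports:
            # Enough on-ground evidence: use the average delivered location.
            return (mean(p[0] for p in seen), mean(p[1] for p in seen))
        return self.base.get(address)


geocoder = FeedbackGeocoder({"32, 52A, Noida": (10.0, 20.0)})
before = geocoder.geocode("32, 52A, Noida")
for _ in range(3):
    geocoder.report_delivery("32, 52A, Noida", 12.0, 22.0)
after = geocoder.geocode("32, 52A, Noida")
print(before, after)
```

The coordinates here are placeholders chosen to make the shift from ‘x,y’ to ‘x1,y1’ easy to see.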
Why is geocoding important in Covid-19 times?
Covid-19 makes geocoding even more important than it already is. Imagine a scenario where there is no geocoding. A person would reach a place by calling and figuring it out, right? Now what if, while figuring it out, you end up in a containment zone or a red zone? Having a precise geocode helps you avoid such things. I can now tell you upfront that this address is in a containment zone or a red zone, so please be more careful. Without geocoding this isn’t possible.
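Once an address has a coordinate, flagging a containment zone reduces to a point-in-polygon test. The sketch below uses the standard ray-casting method and treats latitude and longitude as flat plane coordinates, which is a reasonable assumption only for small zones; the zone shape and function name are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[float, float]  # (lat, lon)


def point_in_polygon(lat: float, lon: float, polygon: List[Coord]) -> bool:
    """Ray-casting test: count how many polygon edges a ray from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's longitude can be crossed.
        if (lon1 > lon) != (lon2 > lon):
            # Latitude at which the edge meets the point's longitude.
            cross_lat = lat1 + (lon - lon1) * (lat2 - lat1) / (lon2 - lon1)
            if lat < cross_lat:
                inside = not inside  # an odd number of crossings means inside
    return inside


# Hypothetical square containment zone, in made-up coordinates.
zone = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
print(point_in_polygon(5.0, 5.0, zone))   # True
print(point_in_polygon(15.0, 5.0, zone))  # False
```

A delivery system could run this check for every geocoded address against the published zone boundaries and warn the rider before dispatch.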
This matters now because the more accurate and precise your answers are, the lower the chances of things going wrong. I don’t end up putting you in a containment zone when you are not supposed to be there. A bad geocoder will give you random answers, and in today’s world random places come at a cost, both physically and business-wise. With good geocoding, business fleets are a lot safer. Earlier this was a good-to-have; now it is a need-to-have for businesses.