My Three Levels of Abstraction

I tell people that I am a software developer, but that’s not what I really do. I’m a problem solver that sometimes has to use code to solve problems I am faced with. When I was only a few years into my career, my boss once referred to me as a “hacker”, and I found a great truth in the term. My job wasn’t to package and sell applications to users, it was to hack away and eventually solve complex problems involving spatial data and network routing. As I matured as a problem solver, I came to identify three different levels of abstraction.

For context, consider a request for some processing in a spatial database on data representing a collection of points and roads. To join along or understand further, reference the MTFCC Classifications and an SQL script tailored for DuckDB.

  1. How many points are within 100m of a primary road?
  2. How many points are within 100m of a secondary road?


Level 1: Solves one known problem

In the first level, abstraction is nearly non-existent. We write two queries, one to answer each request. Our solution doesn’t provide the ability to answer any questions beyond the identified requests. These types of solutions lack reusability and are more likely to be redundant with other solutions created in the past and future. They only require knowledge of the task at hand.

select count(distinct p.id) Count
from points p
join roads r on st_dwithin(p.geom,r.geom,100)=true
where r.mtfcc='S1100';

Count
26

select count(distinct p.id) Count
from points p
join roads r on st_dwithin(p.geom,r.geom,100)=true
where r.mtfcc='S1200';

Count
107


Level 2: Solves multiple known problems

Instead of answering each request independently, we create a solution that can answer both requests. As a bonus, we reveal additional information as we can now see the distribution of nearest road types for any point within 100m. This depended on the fact that we had knowledge of more than one request. When working in teams or organizations, effective communication becomes increasingly important at this level. It’s easier to create solutions for known problems than unknown ones.

select mtfcc Classification
  ,count(distinct p.id) Count
from points p
join roads r on st_dwithin(p.geom,r.geom,100)=true
group by 1
order by 1;

ClassificationCount
S110026
S1200107
S14001182
S15002
S16303
S16402
S17101
S17508


Level 3: Solves multiple known and unknown problems

At this level, we are predicting and anticipating the future. We ask ourselves what requests will come next and what other questions we can answer. We don’t just answer the things we know have been asked of us, we explore further. The solution we create is capable of answering multiple known and unknown questions. We elect to create a new data set that provides the closest road_id for every point_id regardless of distance.

drop table if exists points_all;
create table points_all as
select distinct on (p.id)
   p.id point_id
  ,r.tlid road_id
  ,r.mtfcc classification
  ,st_distance(p.geom,r.geom) distancem
from points p,roads r
order by 1,3

point_idroad_idclassificationdistancem
1647157989S1400826.0732468995094
22828553S1400420.80429160397244
32822803S1400138.23585861539044
42825668S1400207.97482230400536
52840003S140070.72361199647597

Consider how much more we can now answer with our derivative table points_all showing every relationship, regardless of road classification or distance.

  • How many points are within 200m of any road? Further than 1km?
  • Statistics like average, percentiles, and more can be determined.
  • We can bin the results into distance bands (0-10m, 10-25m, 25-50m, etc.) for all roads or by classification.


Simple example, but with big implications

The example provided is simple. The implications are large. What if our data set was hundreds of millions of points? If we aproach with the mindset of level 1, each query may run for hours before returning a result. If subsequent requests are made to determine the count of points for different distances or classifications, the solution will, again, be hours away. With the knowledge of more information we get in level 2 or level 3, we may choose to ask different questions.

The provided example uses a spatial data analysis task, but the same principles can be applied to software development. The lessons are clear: Work with a more expansive understanding of the problems you are solving while focusing on building tools and capabilities rather than just generating one off answers.