My Three Levels of Abstraction

I tell people that I am a software developer, but that’s not what I really do. I’m a problem solver that sometimes has to use code to solve problems I am faced with. When I was only a few years into my career, my boss once referred to me as a “hacker”, and I found a great truth in the term. My job wasn’t to package and sell applications to users, it was to hack away and eventually solve complex problems involving spatial data and network routing. As I matured as a problem solver, I came to identify three different levels of abstraction.

For context, consider a request for some processing in a spatial database on data representing a collection of points and roads. To join along or understand further, reference the MTFCC Classifications and an SQL script tailored for DuckDB.

How many points are within 100m of a primary road?
How many points are within 100m of a secondary road?

Level 1: Solves one known problem

In the first level, abstraction is nearly non-existent. We write two queries, one to answer each request. Our solution doesn’t provide the ability to answer any questions beyond the identified requests. These types of solutions lack reusability and are more likely to be redundant with other solutions created in the past and future. They only require knowledge of the task at hand.

select count(distinct p.id) Count
from points p
join roads r on st_dwithin(p.geom,r.geom,100)=true
where r.mtfcc='S1100';

Count
26

select count(distinct p.id) Count
from points p
join roads r on st_dwithin(p.geom,r.geom,100)=true
where r.mtfcc='S1200';

Count
107

Level 2: Solves multiple known problems

Instead of answering each request independently, we create a solution that can answer both requests. As a bonus, we reveal additional information as we can now see the distribution of nearest road types for any point within 100m. This depended on the fact that we had knowledge of more than one request. When working in teams or organizations, effective communication becomes increasingly important at this level. It’s easier to create solutions for known problems than unknown ones.

select mtfcc Classification
  ,count(distinct p.id) Count
from points p
join roads r on st_dwithin(p.geom,r.geom,100)=true
group by 1
order by 1;

Classification	Count
S1100	26
S1200	107
S1400	1182
S1500	2
S1630	3
S1640	2
S1710	1
S1750	8

Level 3: Solves multiple known and unknown problems

At this level, we are predicting and anticipating the future. We ask ourselves what requests will come next and what other questions we can answer. We don’t just answer the things we know have been asked of us, we explore further. The solution we create is capable of answering multiple known and unknown questions. We elect to create a new data set that provides the closest road_id for every point_id regardless of distance.

drop table if exists points_all;
create table points_all as
select distinct on (p.id)
   p.id point_id
  ,r.tlid road_id
  ,r.mtfcc classification
  ,st_distance(p.geom,r.geom) distancem
from points p,roads r
order by 1,3

point_id	road_id	classification	distancem
1	647157989	S1400	826.0732468995094
2	2828553	S1400	420.80429160397244
3	2822803	S1400	138.23585861539044
4	2825668	S1400	207.97482230400536
5	2840003	S1400	70.72361199647597

Consider how much more we can now answer with our derivative table points_all showing every relationship, regardless of road classification or distance.

How many points are within 200m of any road? Further than 1km?
Statistics like average, percentiles, and more can be determined.
We can bin the results into distance bands (0-10m, 10-25m, 25-50m, etc.) for all roads or by classification.

Simple example, but with big implications

The example provided is simple. The implications are large. What if our data set was hundreds of millions of points? If we aproach with the mindset of level 1, each query may run for hours before returning a result. If subsequent requests are made to determine the count of points for different distances or classifications, the solution will, again, be hours away. With the knowledge of more information we get in level 2 or level 3, we may choose to ask different questions.

The provided example uses a spatial data analysis task, but the same principles can be applied to software development. The lessons are clear: Work with a more expansive understanding of the problems you are solving while focusing on building tools and capabilities rather than just generating one off answers.

Tags: software thinking abstraction duckdb spatial