Building an OSINT Investigation Workflow With Maltego and SpiderFoot

Two tools dominate practical open-source intelligence work, and they solve opposite halves of the same problem. Maltego turns relationships into a graph an analyst can reason about, surfacing how a domain connects to a person, how a person connects to a wallet, how a wallet connects to a forum handle. SpiderFoot brute-forces breadth: feed it a target and it queries hundreds of sources unattended while you work on something else. Run them in isolation and you get either a beautiful graph with thin data or a mountain of unsorted findings. Run them together — SpiderFoot collecting, Maltego correlating — and you get an investigation workflow that scales from a single email address to a multi-week intrusion attribution.

This guide walks the full pipeline: scoping the target, automating collection with SpiderFoot, importing into Maltego for graph analysis, pivoting to fill gaps, and producing something defensible at the end. It assumes you’ve heard of both tools but haven’t yet wired them into a repeatable process.

What Each Tool Actually Does

Maltego is a graph-based link analysis platform built around entities (a domain, an email, a person, a phone number, an AS number) and transforms (small functions that take an entity and return related entities). Right-click a domain entity, run To DNS Name [Robtex], and the graph populates with subdomains. Right-click those, run To IP Address, and so on. The Maltego Standard Transforms hub includes over 150 transforms covering DNS queries, search engines, social networks, and various APIs, with paid hubs from Shodan, SecurityTrails, WhoisXML, Have I Been Pwned, and others extending coverage.
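The entity/transform vocabulary can be made concrete with a toy model. The sketch below is not Maltego’s API: Entity, to_dns_name, run_transform, and the FAKE_DNS lookup table are all hypothetical names invented for illustration, and the canned data stands in for a real DNS source. It only shows the shape of the idea, where a transform is a function from one entity to a list of related entities and each result becomes an edge in the graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Entity:
    kind: str   # e.g. "Domain", "DNSName", "IPv4Address"
    value: str


# A transform takes one entity and returns related entities.
Transform = Callable[[Entity], List[Entity]]

# Hypothetical canned data standing in for a real DNS data source.
FAKE_DNS: Dict[str, List[str]] = {
    "example.com": ["www.example.com", "mail.example.com"],
}


def to_dns_name(entity: Entity) -> List[Entity]:
    """Toy transform: Domain -> DNS names, looked up in FAKE_DNS."""
    if entity.kind != "Domain":
        return []
    return [Entity("DNSName", name) for name in FAKE_DNS.get(entity.value, [])]


def run_transform(
    graph: Set[Tuple[Entity, Entity]], entity: Entity, transform: Transform
) -> List[Entity]:
    """Apply a transform and record an edge from the input to each result."""
    results = transform(entity)
    for found in results:
        graph.add((entity, found))
    return results


graph: Set[Tuple[Entity, Entity]] = set()
found = run_transform(graph, Entity("Domain", "example.com"), to_dns_name)
print([e.value for e in found])
```

Chaining is then just running another transform on each returned entity, which is what repeated right-click pivots do in the client.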

There are three editions worth knowing about. Community Edition is free, requires registration, and caps results per transform — fine for learning, frustrating for real cases. Maltego Pro unlocks unlimited results and the full Standard Transforms set. Enterprise adds collaboration and case management. The Community cap on transform results is the single most common reason analysts move to Pro.

SpiderFoot is the automation counterpart. SpiderFoot has over 200 modules, most of which don’t require API keys, and many of those that do require API keys have a free tier. Point it at an IP, domain, email, phone number, name, or username and it crawls outward through whichever modules you’ve enabled, feeding findings between modules so that — for example — a domain discovered by one module triggers WHOIS, breach-database, and certificate-transparency lookups in others. The current open-source release is 4.0.0, distributed via GitHub at smicallef/spiderfoot and bundled in Kali Linux. Intel 471 acquired SpiderFoot in November 2022, with founder Steve Micallef joining as VP of Attack Surface Technology. The open-source version remains on GitHub; commercial development continues as SpiderFoot HX, a hosted SaaS with additional modules, correlation rules, and change monitoring.

The shorthand: SpiderFoot is the collection engine, Maltego is the analysis surface. Neither replaces the other.

Setting Up Both Tools

Maltego runs as a desktop client on Windows, macOS, and Linux. Download from maltego.com, register an account, install the desired transform hubs from the Transform Hub panel, and configure API keys for any commercial transforms you intend to use. Free tiers exist for SecurityTrails, Shodan InternetDB, the Wayback Machine, and others — enough to run real investigations without spending money on day one.

SpiderFoot installs and launches from source in four commands:

git clone https://github.com/smicallef/spiderfoot.git
cd spiderfoot
pip3 install -r requirements.txt
python3 ./sf.py -l 127.0.0.1:5001

That last command starts the web UI on localhost:5001. It needs Python 3.7 or newer. Settings → Modules is where you enter API keys for the modules that need them: Shodan, Have I Been Pwned, GreyNoise, AlienVault OTX, SecurityTrails, VirusTotal, and roughly forty others. Modules left without a key won’t break the scan; they just contribute little or nothing until a key is set.

For headless or scripted use, SpiderFoot has a CLI. The flag surface is small and worth memorizing — it’s faster than the web UI for repeat scans.

Reference

SpiderFoot CLI Flags & Maltego Transform Categories

Verified against SpiderFoot 4.0.0 and Maltego Standard Transforms documentation.

SpiderFoot — sf.py

Flag                                    Purpose
-s TARGET                               Target for the scan (IP, domain, email, etc.)
-u {footprint,investigate,passive,all}  Auto-select modules by use case
-m mod1,mod2,…                          Manually enable specific modules
-t type1,type2,…                        Event types to collect; modules auto-selected
-x                                      Strict mode: only modules consuming the target directly
-o {tab,csv,json}                       Output format for export
-C scanID                               Run correlation rules against an existing scan
-l IP:port                              Start web UI listener (for example 127.0.0.1:5001)

Maltego — Standard Transforms

Transform                         Purpose
To DNS Name [Robtex]              Subdomain enumeration from a domain entity
To IP Address [DNS]               Resolve hostnames to IPs
To Email Address [Search Engine]  Pull contacts associated with a domain
To Snapshots [Wayback Machine]    Recover deleted pages and historical content
To Open Ports [Shodan]            Exposed services on an IP or netblock
To EXIF Info                      Extract metadata from posted images
To Breach [HIBP]                  Check email against Have I Been Pwned
To AS Number / Netblock           Map IP to its parent autonomous system

A Four-Stage Workflow

The integration that follows uses SpiderFoot for breadth-first collection and Maltego for depth-first analysis. The handoff is a CSV export from SpiderFoot imported as entities in Maltego.

Stage 1: Scope and choose a use case

SpiderFoot ships with three targeted use-case presets, plus all, which enables every module, and the one you pick shapes the entire scan. The targeted presets are footprint, investigate, and passive. Footprint maps your target’s external surface (subdomains, ports, technologies, exposed assets); use it for attack-surface work. Investigate assesses risk and reputation (breach data, blocklists, malware associations); use it for threat actor or fraud cases. Passive is the careful one: no module touches the target directly, and everything goes through third-party data sources. Use passive when you don’t want the target seeing scan traffic.

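Set scope before scanning. A scan with -u all against a corporate parent domain can run for many hours and produce tens of thousands of events, most of which are noise. Constrain by event type with -t (for example IP_ADDRESS,DOMAIN_NAME,EMAILADDR) or by module list with -m to keep results tractable. Event type names vary between releases, and vulnerability findings in particular may be split across several more specific types, so list the exact names your install supports with sf.py -T before relying on them.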
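One way to keep scoping decisions repeatable is to assemble the command line in a small script instead of retyping it. The sketch below only builds the argument list, using the flags from the reference above; it does not run a scan. The function name build_sf_command and the ./sf.py default path are assumptions made for illustration, and the event type names passed in should be checked against your own install.

```python
from typing import List, Optional, Sequence

USE_CASES = {"footprint", "investigate", "passive", "all"}
OUTPUT_FORMATS = {"tab", "csv", "json"}


def build_sf_command(
    target: str,
    use_case: Optional[str] = None,
    event_types: Optional[Sequence[str]] = None,
    modules: Optional[Sequence[str]] = None,
    output: str = "csv",
    sf_path: str = "./sf.py",  # assumed location of the SpiderFoot checkout
) -> List[str]:
    """Return an argv list for a scoped SpiderFoot CLI scan (does not run it)."""
    if not target.strip():
        raise ValueError("target must not be empty")
    if use_case is not None and use_case not in USE_CASES:
        raise ValueError(f"unknown use case: {use_case}")
    if output not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output}")

    argv = ["python3", sf_path, "-s", target]
    if use_case:
        argv += ["-u", use_case]
    if event_types:
        argv += ["-t", ",".join(event_types)]
    if modules:
        argv += ["-m", ",".join(modules)]
    argv += ["-o", output]
    return argv


cmd = build_sf_command("example.com", event_types=["IP_ADDRESS", "EMAILADDR"])
print(" ".join(cmd))
```

The returned list can be handed to subprocess.run with stdout redirected to a file, which keeps the scope of every scan visible in version control.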

Stage 2: Run SpiderFoot, export, then walk away

A typical investigative scan from the CLI:

python3 ./sf.py -s example.com -u investigate -o csv > scan.csv

Or via the web UI: New Scan → enter target → choose By Use Case → Investigate → run. Realistic scan times for a single domain run from twenty minutes to several hours depending on enabled modules and target depth. SpiderFoot 4.0 introduced correlation rules: 37 rules ship in the default rule set, alongside a template.yaml that serves as a walkthrough and reference for writing your own. Run them post-scan with -C scanID to surface findings the rules flag as interesting (open S3 buckets, cleartext credentials in pastes, expired certificates on production hosts). Don’t skip this step. Raw scan output is overwhelming; correlations cut it to what an analyst actually needs to look at.

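Export results as CSV from the web UI’s Export panel, or use the -o csv flag. The web UI export produces the columns Updated, Type, Module, Source, F/P (false positive flag), and Data, and these are what you’ll feed into Maltego. Output from -o csv on the command line may use a different, narrower column layout, so check the header row before mapping anything.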
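Before importing anything, it helps to see what the export actually contains. The following is a minimal sketch that counts rows per event type and skips rows flagged as false positives. It assumes the column names listed above appear as a header row and that a false positive is marked with 1 in the F/P column; both should be checked against a real export. The function name summarize_scan and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_scan(csv_text: str) -> Counter:
    """Count rows per event type, ignoring rows flagged as false positives."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: "1" in the F/P column marks a false positive.
        if (row.get("F/P") or "").strip() == "1":
            continue
        counts[(row.get("Type") or "").strip()] += 1
    return counts


# Invented sample rows in the assumed export layout.
SAMPLE = """Updated,Type,Module,Source,F/P,Data
2024-01-01 10:00:00,IP_ADDRESS,sfp_dnsresolve,example.com,0,192.0.2.10
2024-01-01 10:00:05,EMAILADDR,sfp_spider,example.com,0,admin@example.com
2024-01-01 10:00:09,IP_ADDRESS,sfp_dnsresolve,example.com,1,192.0.2.99
"""

for event_type, n in summarize_scan(SAMPLE).most_common():
    print(f"{event_type}\t{n}")
```

A summary like this shows quickly whether a scan is dominated by one noisy event type that is better filtered out before it reaches the graph.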

Stage 3: Import to Maltego, build the graph

Maltego accepts CSVs through Import → Import Graph from Table. The wizard maps CSV columns to entity types: SpiderFoot’s Type field maps to Maltego’s entity classes (IP_ADDRESS → maltego.IPv4Address, EMAILADDR → maltego.EmailAddress, INTERNET_NAME → maltego.DNSName). Spend time on this mapping — bad mapping produces a graph where everything is a generic “Phrase” entity and the transforms won’t fire correctly on it.
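The mapping step can be prepared outside Maltego so the import wizard has less to guess. The sketch below groups export values by the Maltego entity type named above and reports any SpiderFoot types with no mapping, so they are handled deliberately instead of landing as generic Phrase entities. TYPE_MAP contains only the three mappings mentioned in the text; the function name group_for_import, the sample rows, and the assumption of a header row with Type and Data columns are illustrative.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Mappings named in the text; extend for the types present in your own scans.
TYPE_MAP = {
    "IP_ADDRESS": "maltego.IPv4Address",
    "EMAILADDR": "maltego.EmailAddress",
    "INTERNET_NAME": "maltego.DNSName",
}


def group_for_import(csv_text: str) -> Tuple[Dict[str, List[str]], Set[str]]:
    """Group unique Data values by Maltego entity type; collect unmapped types."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    unmapped: Set[str] = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        sf_type = (row.get("Type") or "").strip()
        value = (row.get("Data") or "").strip()
        entity = TYPE_MAP.get(sf_type)
        if entity is None:
            unmapped.add(sf_type)
        elif value and value not in grouped[entity]:
            grouped[entity].append(value)
    return dict(grouped), unmapped


# Invented sample rows in the assumed export layout.
EXPORT = """Updated,Type,Module,Source,F/P,Data
2024-01-01 10:00:00,IP_ADDRESS,sfp_dnsresolve,example.com,0,192.0.2.10
2024-01-01 10:00:02,INTERNET_NAME,sfp_dnsresolve,example.com,0,www.example.com
2024-01-01 10:00:05,EMAILADDR,sfp_spider,example.com,0,admin@example.com
2024-01-01 10:00:07,TCP_PORT_OPEN,sfp_portscan_tcp,192.0.2.10,0,192.0.2.10:443
2024-01-01 10:00:08,IP_ADDRESS,sfp_dnsresolve,www.example.com,0,192.0.2.10
"""

entities, skipped = group_for_import(EXPORT)
for entity_type, values in entities.items():
    print(entity_type, values)
print("unmapped:", sorted(skipped))
```

Writing each group to its own file gives the wizard one entity type per import, which sidesteps most mis-mapping.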

Once imported, layout matters. The default Block layout buries clusters; switch to Organic for graphs over 50 nodes. Use Centrality view (Investigate → Centrality) to identify which entities have the most connections — these are usually the pivot points worth investigating further.
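The idea behind ranking by connections can be illustrated with plain degree centrality, computed here over an edge list such as Source and Data pairs taken from the export. This is a simplified stand-in and an assumption about what "most connections" means; Maltego’s own views may use other measures. The function name degree_ranking and the sample edges are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def degree_ranking(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank nodes by number of distinct neighbours, highest first."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Sort by degree descending, then by name for a stable order.
    return sorted(
        ((node, len(adjacent)) for node, adjacent in neighbours.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Invented sample edges.
EDGES = [
    ("example.com", "192.0.2.10"),
    ("mail.example.com", "192.0.2.10"),
    ("example.com", "admin@example.com"),
    ("www.example.com", "192.0.2.10"),
]

for node, degree in degree_ranking(EDGES)[:3]:
    print(f"{degree}\t{node}")
```

In this sample the shared IP address ranks first, which is the kind of node worth pivoting on.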

Stage 4: Pivot with transforms, fill the gaps

This is where Maltego earns its place. SpiderFoot found that a given email address exists. Now: right-click → To Breach [HIBP] confirms it appeared in a breach, To Phone Number [search] returns a candidate number, To Person infers identity, To Aliases finds reused usernames, To Social Media Profiles maps active accounts. Each step adds nodes and edges. Bookmark high-value entities. Add notes via the entity properties pane explaining why each finding matters; these notes survive into PDF and report exports.

For threat-actor work specifically, the MISP-Maltego integration is essential — a community transform set linking Maltego to a MISP threat-sharing instance and exposing the entire MITRE ATT&CK dataset as queryable entities. Pair this with SpiderFoot’s malware-association modules (AbuseIPDB, AlienVault OTX, Abuse.ch) and you can chase indicators from a single suspicious IP through to known TTPs and campaigns.

Workflow Stages

From Target to Defensible Report

Scope & Configure: Pick footprint / investigate / passive. Constrain event types. Add API keys. Tool: SpiderFoot UI / sf.py -h

Automated Collection: Run scan unattended. Apply 37 correlation rules. Export CSV. Tool: sf.py -s target -u investigate -o csv, then sf.py -C scanID

Graph Construction: Import CSV. Map columns to entity types. Switch to Organic layout. Tool: Maltego Import Wizard

Pivot & Enrich: Run transforms on high-centrality nodes. Annotate findings. Export report. Tool: Standard / MISP / Shodan transforms

Where the Workflow Breaks

Three failure modes recur enough to plan around.

The Community Edition wall. Maltego CE caps results per transform run, usually at 12. For a small case this is fine. For a domain with hundreds of subdomains it’s a problem — you’ll see twelve, the analysis will look thin, and you won’t know what’s missing. Either commit to Pro or do bulk enumeration in SpiderFoot and import results, since CSV import bypasses the transform cap.

Rate limits and API quotas. Free tiers on Shodan, HIBP, VirusTotal, and SecurityTrails burn through fast on real cases. SpiderFoot fails quietly when an API key hits its quota: modules just stop returning data. Check the scan’s log in the web UI for 429 errors or auth failures before concluding a target is “clean.” Maltego transforms surface API errors more clearly but also stop returning results once quota is hit.
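A quick pass over an exported log can flag which modules probably hit a quota or auth failure. The sketch below is heuristic: the log line format shown is invented, the patterns are guesses at how such failures are worded, and the only assumption about SpiderFoot itself is that module names start with sfp_. The function name quota_suspects is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Heuristic patterns for quota and authentication failures.
PROBLEM = re.compile(
    r"\b(429|401|403)\b|rate.?limit|quota|unauthori[sz]ed|invalid api key",
    re.IGNORECASE,
)
MODULE = re.compile(r"\bsfp_\w+")


def quota_suspects(log_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map module name -> log lines that look like quota or auth failures."""
    hits: Dict[str, List[str]] = defaultdict(list)
    for line in log_lines:
        if PROBLEM.search(line):
            match = MODULE.search(line)
            hits[match.group(0) if match else "unknown"].append(line.strip())
    return dict(hits)


# Invented log lines; real log wording and layout will differ.
LOG = [
    "2024-01-01 10:02:11 ERROR sfp_shodan: HTTP 429 returned, rate limit reached",
    "2024-01-01 10:02:15 INFO sfp_dnsresolve: resolved example.com",
    "2024-01-01 10:03:40 ERROR sfp_virustotal: API key rejected (401)",
]

for module, lines in quota_suspects(LOG).items():
    print(module, len(lines))
```

Any module that shows up here should be treated as "no data collected" for that source, not as a negative result.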

False confidence in correlations. SpiderFoot correlation rules and Maltego transform results are starting points, not conclusions. A “compromised credentials” hit from a breach module could be a real exposure or a generic password-reuse list with no specific link to your target. Cross-check breach claims against the original source (HIBP’s individual paste lookup, DeHashed, or the breach itself) before putting “compromised” in a report. Public data can be outdated or simply wrong, so every finding needs verification; that caveat is the most reliable thing anyone says about OSINT.

A fourth, looser issue: target awareness. Maltego transforms and many SpiderFoot modules query third-party services that log requests. If the target operates one of those services (a corporate Shodan account watching their own assets, a domain owner watching WHOIS lookups, a Cloudflare customer watching certificate-transparency requests on their cert), they may see the investigation. Use SpiderFoot’s passive use case and Maltego transforms tagged “passive” when this matters.

Frequently Asked Questions

Do I need both tools, or can SpiderFoot’s web UI alone cover it? For attack-surface monitoring of your own assets, SpiderFoot alone works. The moment investigation requires understanding relationships — same registrant on multiple domains, shared infrastructure between actors, a person tied to multiple aliases — Maltego’s graph buys you something the table view doesn’t.

Can I drive Maltego entirely from SpiderFoot output without manual transform runs? Mostly. The CSV-import path captures everything SpiderFoot found. But the Maltego transform layer (HIBP, MISP, Shodan, Wayback) reaches sources SpiderFoot doesn’t, and several Maltego transforms produce richer entity properties than SpiderFoot’s flat data field. Treat them as complementary, not redundant.

What’s the difference between SpiderFoot open-source and SpiderFoot HX? HX is the hosted commercial product. It adds change monitoring, multi-target scanning, additional HX-only modules, Slack/email notifications, account management, and a revamped UI. The open-source 4.0.0 release on GitHub remains free and feature-complete for single-analyst investigative work; HX targets teams running continuous monitoring.

Is any of this legal? OSINT collection from genuinely public sources is legal in most jurisdictions, but the boundary moves depending on local privacy law, terms of service of the platforms being queried, and the purpose of the collection. GDPR, CCPA, and similar regimes apply when you’re processing personal data about identifiable individuals. Get authorization in writing for any target you don’t own, and keep evidence of where each finding came from — Maltego’s per-entity notes and SpiderFoot’s source field both help here.
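Provenance is easier to keep when it is generated at export time and not reconstructed later. As a minimal sketch, the function below turns one export row into a note string that could be pasted into an entity’s notes. It assumes the web UI export columns described earlier; the function name provenance_note, the note wording, and the truncated SHA-256 fingerprint are illustrative choices, not a feature of either tool.

```python
import hashlib
from typing import Mapping


def provenance_note(row: Mapping[str, str]) -> str:
    """Build a one-line provenance note from a SpiderFoot export row."""
    data = row.get("Data", "")
    # Short fingerprint of the finding so later edits are detectable.
    digest = hashlib.sha256(data.encode("utf-8")).hexdigest()[:16]
    return (
        f"Collected {row.get('Updated', 'unknown time')} "
        f"by {row.get('Module', 'unknown module')} "
        f"from source '{row.get('Source', 'unknown')}'; "
        f"sha256(data)[:16]={digest}"
    )


# Invented sample row in the assumed export layout.
finding = {
    "Updated": "2024-01-01 10:00:05",
    "Type": "EMAILADDR",
    "Module": "sfp_spider",
    "Source": "example.com",
    "F/P": "0",
    "Data": "admin@example.com",
}
note = provenance_note(finding)
print(note)
```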

What to Build First

Pick a target you own — your personal domain, your homelab IP block, or a sandbox account — and run the full workflow once before doing real work. Scan with SpiderFoot’s footprint use case. Export. Import to Maltego. Run three or four transforms. Sit with the resulting graph for ten minutes and notice what’s wrong: stale DNS records, exposed services you forgot about, an old social profile linking back to your real name. The exercise teaches you what a target sees when someone investigates them, which is the only way to build a defensible workflow for investigating others.

Both tools are free to start. The discipline that turns them into an investigation capability — scoping, correlation, verification, attribution — is what takes time.