AttributeError: 'NoneType' Object Has No Attribute 'SetCallSite'

This error means your Spark DataFrame lost its active SparkContext; use one SparkSession or resync the context to fix it.

The message “AttributeError: 'NoneType' object has no attribute 'setCallSite'” shows up in PySpark when a DataFrame tries to run an action while its internal handle to the Java SparkContext is missing. In practice, that happens when the DataFrame and the active session fall out of sync. This is a known Spark quirk documented in the Apache Spark issue tracker and in community threads.

What The Error Means

PySpark wraps the JVM. When you call an action such as collect(), head(), or show(), PySpark enters a small helper (SCCallSiteSync) that tags the call site on the JVM for clearer stack traces. The helper runs self._context._jsc.setCallSite(...), where _jsc is the Python-side handle to the Java SparkContext. If the context the DataFrame points at has no such handle (_jsc is None, as it is once a context has been stopped), the attribute lookup for setCallSite lands on None and you get the message. Apache’s tracker shows this call path and ties the failure to mismatched context objects.
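The failure mechanism can be modelled without Spark at all. The sketch below is a simplified stand-in, not PySpark's actual source: FakeJavaContext, FakeSparkContext, CallSiteSync, and run_action are hypothetical names invented for illustration. It only shows how a None handle produces this exact message.

```python
class FakeJavaContext:
    """Hypothetical stand-in for the JVM-side context handle."""

    def setCallSite(self, site):
        self.site = site


class FakeSparkContext:
    """Hypothetical stand-in for SparkContext; stop() drops the JVM handle."""

    def __init__(self):
        self._jsc = FakeJavaContext()

    def stop(self):
        self._jsc = None


class CallSiteSync:
    """Simplified model of the helper that tags the call site around an action."""

    def __init__(self, context, call_site):
        self._context = context
        self._call_site = call_site

    def __enter__(self):
        # This is the line that fails when _jsc is None.
        self._context._jsc.setCallSite(self._call_site)

    def __exit__(self, exc_type, exc, tb):
        return False


def run_action(context):
    """Pretend to run an action; return the error text if the sync fails."""
    try:
        with CallSiteSync(context, "collect at job.py:1"):
            return "ok"
    except AttributeError as err:
        return str(err)


context = FakeSparkContext()
live_result = run_action(context)      # handle present, action proceeds
context.stop()
stopped_result = run_action(context)   # handle gone, lookup lands on None

print(live_result)
print(stopped_result)
```

The second call reports the missing attribute on None, which is the same shape of error the real helper surfaces.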

AttributeError: ‘NoneType’ Object Has No Attribute ‘SetCallSite’ — Quick Checklist

Confirm One Session — Keep one SparkSession and reuse it; avoid mixing sessions from different cells or modules.
Resync The DataFrame — If the DataFrame lost its JVM links, reattach them to the current session (workaround below).
Run Actions After Setup — Call actions only after the session is ready; don’t trigger actions during teardown.
Avoid Chained Show() — Don’t chain .show() mid-expression; put it on its own line.
Watch Third-Party Hooks — Tools that wrap Spark (Great Expectations, task runners) can surface the same failure if contexts diverge.

Fixing ‘NoneType’ Object Has No Attribute ‘SetCallSite’ In PySpark

Baseline: Use A Single SparkSession

Start One Session — Create and reuse one session across your code path:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("app")
    .getOrCreate()
)

# Reuse `spark` everywhere; don't shadow it.

Mixing sessions (or creating a DataFrame under one session and calling actions under another) is the fastest way to trigger the failure. The Spark issue thread ties the symptom to mismatched session and DataFrame internals.

Workaround: Resync The DataFrame’s JVM Links

Reattach Context — If you already have a “stuck” DataFrame, force it to adopt the current session and context before running actions:

# df is the DataFrame that fails on collect()/show()
df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
df._sc = spark._sc

# Try the action again
df.collect()

This small patch is the most cited field fix and appears in both the Apache tracker and community answers. Use it as a stop-gap; the long-term fix is consistent session ownership.
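If the workaround is needed in more than one place, it may be easier to keep it in a single helper so it is simple to find and remove later. The sketch below wraps the two assignments above in a hypothetical function, resync_dataframe. It touches private attributes that can change between Spark versions, and the SimpleNamespace objects are stand-ins used only so the example runs without a cluster.

```python
from types import SimpleNamespace


def resync_dataframe(df, spark):
    """Point df's private session/context handles at `spark` (stop-gap only)."""
    df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
    df._sc = spark._sc
    return df


# Stand-ins with the same attribute names as the real objects (assumed shapes).
live_spark = SimpleNamespace(_jsparkSession="jvm-session", _sc="live-context")
stale_df = SimpleNamespace(
    sql_ctx=SimpleNamespace(sparkSession=SimpleNamespace(_jsparkSession=None)),
    _sc=None,
)

resync_dataframe(stale_df, live_spark)
print(stale_df._sc)
print(stale_df.sql_ctx.sparkSession._jsparkSession)
```

With a real session and DataFrame, the call would be resync_dataframe(df, spark) followed by the action that failed.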

Safer Pattern: Don’t Chain .show()

Split Display From Logic — Call .show() on its own line, then continue transformations:

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
df.show()  # separate line keeps actions predictable
df2 = df.groupBy("val").count()
df2.collect()

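PySpark’s .show() prints the rows and returns None, so a chain that ends in .show() hands None to whatever comes next. Assigning that result (for example, df = spark.createDataFrame(...).show()) and then calling a DataFrame method on it raises a related “'NoneType' object has no attribute ...” error, as reported in PySpark posts.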
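DataFrame.show() prints and returns None, which is why assigning the result of a chain that ends in it causes trouble. The sketch below reproduces that behaviour with a hypothetical stand-in class, TinyFrame, so it runs without Spark; only the return-value behaviour is meant to match.

```python
class TinyFrame:
    """Hypothetical stand-in that mimics two DataFrame methods."""

    def __init__(self, rows):
        self.rows = list(rows)

    def show(self):
        for row in self.rows:
            print(row)
        # Like DataFrame.show(), this returns None implicitly.

    def count(self):
        return len(self.rows)


# Anti-pattern: the variable holds the return value of show(), which is None.
chained = TinyFrame([(1, "a"), (2, "b")]).show()

try:
    chained.count()
    chained_error = None
except AttributeError as err:
    chained_error = str(err)

# Safer: keep the frame in a variable and display it on its own line.
frame = TinyFrame([(1, "a"), (2, "b")])
frame.show()
row_count = frame.count()

print(chained_error)
print(row_count)
```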

Recreate The Session If You Stopped It

Restart Cleanly — If you called spark.stop(), rebuild the session before any action:

spark.stop()
spark = SparkSession.builder.getOrCreate()
# Recreate DataFrames under the new session; old ones may be invalid.

Old DataFrames keep references to an ended context; actions on them can fail with the same message. The tracker notes the failure occurs inside the call-site sync when the context is missing.
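A quick way to tell whether a DataFrame is still usable is to check that its context still has a JVM handle before running an action. The helper below is a minimal sketch, assuming the DataFrame exposes the private _sc attribute used in the workaround and that a stopped context has _jsc set to None. context_is_live is a hypothetical name, and the SimpleNamespace objects are stand-ins so the example runs without Spark.

```python
from types import SimpleNamespace


def context_is_live(df):
    """Return True if df's context still carries a JVM handle (private attrs)."""
    sc = getattr(df, "_sc", None)
    return sc is not None and getattr(sc, "_jsc", None) is not None


# Stand-ins: one frame with a live handle, one whose context was stopped.
fresh_df = SimpleNamespace(_sc=SimpleNamespace(_jsc=object()))
stale_df = SimpleNamespace(_sc=SimpleNamespace(_jsc=None))

fresh_ok = context_is_live(fresh_df)
stale_ok = context_is_live(stale_df)
print(fresh_ok)
print(stale_ok)
```

A False result suggests rebuilding the DataFrame under the current session rather than calling the action.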

Common Triggers And The Right Move

Session Mismatch — DataFrame built under one session; action runs while another session is active. Fix: keep a single shared spark object or resync the DataFrame.
Wrapper Libraries — Validation or orchestration tools wrap Spark actions and can expose the context gap. Fix: initialize the session before the tool takes over; avoid closing it mid-run.
Action During Setup/Teardown — An action fires while the context is not yet ready or already stopping. Fix: move actions to a clearly active phase.
Chained Display Calls — .show() buried in expressions. Fix: separate display from transforms.

Scenario	Why It Happens	Quick Fix
Built DF under old session; called `collect()` later	DF’s JVM links point to a dead context	Recreate DF under current session or resync links
Validation tool triggers action	Tool holds a DF created before session init	Initialize session first; pass session-scoped DFs
Chained `.show()` in expression	`.show()` returns None, so the rest of the chain has no DataFrame	Put `.show()` on its own line

Safe Patterns With A Minimal Working Example

Use One Owner Module — Centralize session creation. Import that owner anywhere you build DataFrames.

# session_owner.py
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("owner").getOrCreate()

# job.py
from session_owner import spark
from pyspark.sql import functions as F

data = [(0, 1.0), (0, 3.0), (1, 2.0), (1, 5.0)]
df = spark.createDataFrame(data, ["g", "x"])

# Display separate from logic
df.show()

out = (
    df.groupBy("g")
      .agg(F.avg("x").alias("mean_x"))
)

# Final action at the edge
rows = out.collect()
print(rows)

This pattern keeps DataFrames and the session aligned. The failure path shown in Spark’s tracker is never reached because _jsc remains valid.

Notes For Databricks, Great Expectations, And Orchestrators

Databricks Connect/Remote — Remote connectors or REPLs can create timing gaps where the DataFrame exists before the client session is fully available. Initialize the session first, then build DataFrames, and avoid closing the session while a library still holds references. Related GitHub issues show the same stack line inside traceback_utils.py.

Great Expectations — When switching from a pandas engine to the Spark engine, build or pass DataFrames after the SparkContext is live. If you hit the message during a checkpoint run, resync the DataFrame to the current session or rebuild it under that session.

Task Runners (Dagster, PyTest) — Test harnesses may run multiple jobs in one process. Keep one shared session per test run, or tear it down between suites and recreate. Community threads report the same message during collect() when sessions hop across tests.

Exact Reproduction And A Verified Fix

Reproduce — This snippet mirrors the documented failure state by creating a correlation DataFrame and collecting from it:

from pyspark.ml.linalg import Vectors
from pyspark.ml.stat import Correlation
import pyspark

spark = pyspark.sql.SparkSession.builder.getOrCreate()

dataset = [[Vectors.dense([1, 0, 0, -2])],
           [Vectors.dense([4, 5, 0, 3])]]
features = spark.createDataFrame(dataset, ["features"])

# Correlation.corr returns a new DataFrame holding the correlation matrix
df = Correlation.corr(features, "features", "pearson")

# On affected setups, the failure shows here, when df's context links are None
df.collect()

Spark’s bug thread documents the matching stack and a working resync, which you can apply to a real DataFrame. After reattaching the Java session and context, collect() succeeds.

# Given a real df that fails:
df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
df._sc = spark._sc
df.collect()  # works after resync

Community answers describe the same workaround and attribute the root cause to a missing initialization of a DataFrame’s internal SparkContext reference.

Prevention Tips That Keep Pipelines Stable

One Session Per App — Share a single spark object; pass it down rather than recreating it.
Construct, Then Act — Build DataFrames only after the session is ready; run actions at the edges.
Avoid Half-Chained Displays — Keep .show() on a clean line to prevent stray evaluations.
Be Careful With Library Context — When a library owns the run loop, initialize Spark before registering validations or tasks that call actions.

If you still see “AttributeError: 'NoneType' object has no attribute 'setCallSite'” after applying the session discipline and the resync, audit your process for hidden spark.stop() calls, multiple SparkSession builders, or tools that construct DataFrames before the session is ready. The reports listed under Sources show these patterns repeatedly.
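One way to start that audit is a plain text scan of your source files for stop calls and session builders. The sketch below is a rough first pass, not a real parser: audit_source and the two patterns are hypothetical, comment stripping is naive, and any .stop() call will match, not only Spark's.

```python
import re

# Patterns worth a second look; both are illustrative and deliberately broad.
PATTERNS = {
    "stop_call": re.compile(r"\.stop\(\s*\)"),
    "session_builder": re.compile(r"\.getOrCreate\(\s*\)"),
}


def audit_source(text):
    """Return (line_number, label, line) for each suspicious line in `text`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Naive comment stripping: ignores '#' inside string literals.
        code = line.split("#", 1)[0]
        for label, pattern in PATTERNS.items():
            if pattern.search(code):
                hits.append((lineno, label, line.strip()))
    return hits


sample = '''from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.range(3)
spark.stop()  # hidden teardown
other = SparkSession.builder.appName("b").getOrCreate()
'''

hits = audit_source(sample)
for hit in hits:
    print(hit)
```

To scan a project, read each .py file and pass its text to audit_source. More than one session_builder hit, or a stop_call that runs before later actions, is a good place to start looking.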

Sources: Apache Spark JIRA and community threads that document the failure path and field-tested fixes.